Question

I need advice. I have an application that imports 10,000 rows of name and address data from a text file into XElements, which are then added to a synchronized queue. When the import is complete, the app spawns worker threads that process the XElements by dequeuing them, making a database call, inserting the database output into the request document, and inserting the processed document into an output queue. When all requests have been processed, the output queue is written to disk as an XML document.

I used XElements for the requests because I needed the flexibility to add fields to a request during processing. For example, depending on the job type, the app might need to add a phone number, date of birth, or email address to a request based on a name/address match against a public-record database.

My question is: the XElements seem to use quite a bit of memory, and I know there is a lot of parsing as the document makes its way through the processing methods. I'm considering replacing the XElements with a Dictionary object, but I'm skeptical that the gain will be worth the effort. In essence it will accomplish the same thing.

Thoughts?


Solution

So you're not actually using any XML as such? You're just using XElement as a collection of name/value pairs? If so, I'd definitely use a dictionary. I would expect your code to potentially come out cleaner as well.
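To illustrate, a request held as a plain dictionary still supports adding fields during processing, without any XML parsing overhead. This is only a sketch; the field names and values are assumptions, not taken from the original app:

```csharp
using System;
using System.Collections.Generic;

// A request as name/value pairs instead of an XElement.
var request = new Dictionary<string, string>
{
    ["Name"] = "Jane Smith",          // assumed field names
    ["Address"] = "12 High Street"
};

// Fields can still be added mid-processing, just as with XElement.
request["PhoneNumber"] = "555-0100";

Console.WriteLine(request["PhoneNumber"]);  // prints 555-0100
```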

On the other hand if you're genuinely using XML, you probably want to stick with XElement.

Do you actually have a problem? You say it's using up quite a bit of memory - do you have enough memory? Could you buy more memory? That would almost certainly be cheaper than spending even a few hours refactoring, if it's just for the sake of saving memory. (It sounds like this app is only run on one box - I could be wrong. The more widely deployed it is, the more it probably makes sense to spend some time optimising it.)

EDIT: Okay, so buying more memory isn't really viable. Even so, do you actually have a problem? What's the impact of this perhaps using more memory than it needs? What's it really costing you?

OTHER TIPS

Using LINQ can make sense if you can avoid having to store the entire tree before using it.

I would look at doing as much processing as possible in building the query from each row.

You then take the query results, process them, and store the results in the database.

This will reduce memory issues, as each row is only read in when needed and then processed and saved.

You may find this helpful: http://www.onedotnetway.com/tutorial-reading-a-text-file-using-linq/

Take the results of your query, loop through each Customer, and save the record:

var query =
        from line in File.ReadLines(filePath)  // streams the file one line at a time
        let customerRecord = line.Split(',')
        select new Customer()
        {
            Firstname = customerRecord[0],
            Lastname = customerRecord[1],
            PhoneNumber = customerRecord[2],
            City = customerRecord[3],
            Country = customerRecord[4]
        } into c
        where c.Country == "UK"
        select c;

Note the use of File.ReadLines rather than File.ReadAllLines: ReadAllLines loads the entire file into memory up front, which would defeat the point of streaming each row in only when needed.
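Putting the pieces together, a minimal end-to-end sketch might look like the following. The Customer class and the SaveCustomer method are assumptions standing in for your actual record type and database insert:

```csharp
using System;
using System.IO;
using System.Linq;

class Customer
{
    public string Firstname { get; set; }
    public string Lastname { get; set; }
    public string PhoneNumber { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
}

class Program
{
    // Hypothetical stand-in for the real database insert.
    static void SaveCustomer(Customer c) =>
        Console.WriteLine($"Saved {c.Firstname} {c.Lastname}");

    static void Main()
    {
        var filePath = "customers.txt";  // assumed input file path

        var query =
            from line in File.ReadLines(filePath)  // lazy: one line at a time
            let fields = line.Split(',')
            select new Customer
            {
                Firstname = fields[0],
                Lastname = fields[1],
                PhoneNumber = fields[2],
                City = fields[3],
                Country = fields[4]
            } into c
            where c.Country == "UK"
            select c;

        // Each Customer is materialised only as the loop reaches it,
        // so the full file never needs to be in memory at once.
        foreach (var customer in query)
        {
            SaveCustomer(customer);
        }
    }
}
```

Because LINQ queries are deferred, no work happens until the foreach begins enumerating; each row is read, parsed, filtered, and saved before the next one is touched.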
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow