XML Parsing performance DOM vs XOM

https://stackoverflow.com/questions/17209578

01-06-2022

Question

I wrote the same XML parsing algorithm in Java using two different parsers: Parser X (XOM) and Parser Y (DOM). I ran the code inside a loop of 2 million iterations to imitate the number of operations I need to carry out, and used a Java profiler to monitor performance. The measurements are shown below.

Metric               Parser X (XOM)               Parser Y (DOM)

Heap memory          6.82                         7.9
Non-heap memory      14                           15
Garbage collector    617 collections \ 2 sec      523 collections \ 1 sec
Up time              1 m 53 s                     1 m 54 s
CPU time             1 m 2 s                      44.8 s
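A measurement loop of the kind described above can be sketched roughly as follows. This is only an illustration, not the original benchmark: the class name `DomTimingSketch`, the sample XML, and the iteration counts are assumptions, and it uses the JDK's built-in DOM parser with `ThreadMXBean` instead of an external profiler. Note that the CPU figure covers only the measuring thread, so work done by garbage-collector threads is not included.

```java
import java.io.ByteArrayInputStream;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DomTimingSketch {

    /** Parses the same XML text repeatedly and returns the total number of elements seen. */
    static long parseRepeatedly(String xml, int iterations) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        // A DocumentBuilder can be reused sequentially, but it is not thread-safe.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        long elements = 0;
        for (int i = 0; i < iterations; i++) {
            Document doc = builder.parse(new ByteArrayInputStream(bytes));
            // "*" matches every element in the document.
            elements += doc.getElementsByTagName("*").getLength();
        }
        return elements;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document; substitute real input.
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/></orders>";
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // Warm-up so that JIT compilation does not dominate the measurement.
        parseRepeatedly(xml, 10_000);

        long wallStart = System.nanoTime();
        long cpuStart = threads.isCurrentThreadCpuTimeSupported()
                ? threads.getCurrentThreadCpuTime() : -1;

        long elements = parseRepeatedly(xml, 100_000);

        long wallMillis = (System.nanoTime() - wallStart) / 1_000_000;
        System.out.println("elements parsed: " + elements);
        System.out.println("wall-clock time: " + wallMillis + " ms");
        if (cpuStart >= 0) {
            // CPU time of this thread only; GC threads are not counted.
            long cpuMillis = (threads.getCurrentThreadCpuTime() - cpuStart) / 1_000_000;
            System.out.println("CPU time (this thread): " + cpuMillis + " ms");
        }
    }
}
```

Printing both numbers side by side makes it easier to see whether a run is limited by computation or by something else, such as I/O or garbage collection on other threads.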

I have a few questions.

What if I want to process about 2 million XML files with sizes reaching 100 MB? Which parser gives better performance? Performance is measured by time: the one that finishes processing all the XML files faster wins, regardless of machine utilization, since I have a dedicated machine for this process. In short, which one is better in terms of memory vs. CPU time vs. up time?
Is it feasible to use the full CPU power to finish faster, for example with multi-threading?
If I want to measure performance, should I use CPU time or up time? As I understand it, CPU time is the time the CPU spends executing the process, while up time is the total elapsed (wall-clock) time the machine takes to finish the process.
Why does Parser Y take the same up time as Parser X but much less CPU time, even though these measurements are means rather than the result of a single run?
Is it feasible to make Parser Y's up time shorter, so that the difference in CPU time is reflected in real life?
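On the multi-threading question, one possible approach is to spread independent documents across a fixed thread pool. The sketch below is a minimal illustration under stated assumptions: the class and method names (`ParallelParseSketch`, `countElements`, `countAll`) are hypothetical, it uses the JDK DOM parser, and the documents are assumed to be independent of each other. Because `DocumentBuilder` is not thread-safe, each worker thread keeps its own instance.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

class ParallelParseSketch {

    // One DocumentBuilder per worker thread, since the class is not thread-safe.
    private static final ThreadLocal<DocumentBuilder> BUILDER = ThreadLocal.withInitial(() -> {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    });

    /** Parses one document and returns its element count (a stand-in for real work). */
    static int countElements(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return BUILDER.get()
                .parse(new ByteArrayInputStream(bytes))
                .getElementsByTagName("*")
                .getLength();
    }

    /** Parses all documents on a pool of the given size and sums the element counts. */
    static long countAll(List<String> documents, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String xml : documents) {
                tasks.add(() -> countElements(xml));
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical inputs; in practice these would be read from files.
        List<String> documents = Arrays.asList("<a><b/></a>", "<x/>", "<r><s/><t/></r>");
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("total elements: " + countAll(documents, threads));
    }
}
```

With tree-building parsers, every document being parsed at the same time is held in memory, so for inputs approaching 100 MB the pool size may need to be bounded by the available heap rather than by the number of cores.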

Solution 2

After expanding the code of both algorithms to cover a wider variety of operations, it turned out that the XOM parser was much faster in up time, with the same CPU time and a lower memory footprint. The XOM parser wins for me.

OTHER TIPS

If you want to process XML quickly, you should use a tool that generates a custom XML reader directly from your schema. Such readers avoid the general overhead of building a DOM. They also tend to give your application direct-access APIs for the specific XML content, with the data represented in a natural way (e.g., a float rather than a text string for real-number data).

Here are a few:

Altova
CodeSynthesis
XMLBooster (with some benchmarks)

I have no specific experience with these tools, although I did write one of this kind for internal purposes.
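To give a rough idea of what a schema-specific reader does, here is a hand-written sketch using the JDK's streaming StAX API. It is not the output of any of the tools listed above: the class name `TypedReaderSketch` and the `<value>` element are hypothetical, standing in for whatever a real schema would define. It streams through the input without building a tree and hands the application `Double` values instead of text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class TypedReaderSketch {

    /** Streams through the XML and returns every <value> element's content as a double. */
    static List<Double> readValues(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Double> values = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "value".equals(reader.getLocalName())) {
                    // getElementText reads the text content and leaves the
                    // reader positioned on the matching end tag.
                    values.add(Double.parseDouble(reader.getElementText().trim()));
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document shape, assumed for illustration only.
        String xml = "<readings><value>1.5</value><value>2.25</value></readings>";
        System.out.println(readValues(xml));
    }
}
```

A generated reader would typically go further, with one method or class per schema type, but the principle is similar: no generic tree is built, and the conversion to native types happens during parsing.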

Licensed under: CC-BY-SA with attribution
