Question

I have a huge XML file containing resumes. The data is available in two formats: a single master file containing all the resumes, for example:

<Resumes>
  <Resume>
    <Name>ABC</Name>
    ......
    ......
  </Resume>
  <Resume>
    <Name>PQR</Name>
    ......
    ......
  </Resume>
  ......
  ......
</Resumes>

and multiple files, for example:

File 1:

<Resumes>
  <Resume>
    <Name>ABC</Name>
    ......
    ......
  </Resume>
</Resumes>

File 2:

<Resumes>
  <Resume>
    <Name>PQR</Name>
    ......
    ......
  </Resume>
</Resumes>

and so on.

I want to use BaseX or eXist XML DB to store the XML. If I want to add more resumes (in XML format) in the future, which one will be better?


Solution

For eXist-db, let me quote from a post on exist-open by Wolfgang Meier in response to a similar question:

I don't think there's a theoretical limit, but there are certainly some practical considerations. Storing a very large document can block the db more than storing many small ones. It requires a single transaction (and sufficient disk space for the transaction log).

The dblp bibliography, which I use for some automated performance tests, comes as a single document with more than 600mb. This loads well if you slightly increase the cache size and memory settings. I know other users have to deal with much larger documents (many gigabytes), but if you have a choice, I would definitely recommend to split your data in smaller chunks, which are easier to handle.

Granted, eXist-db has become even more efficient and robust since November 2009 when Wolfgang wrote this post, but I think his advice still applies. Two final notes:

  1. Make sure you use the latest version of eXist, e.g. either 1.4.2 or the 2.0 Tech Preview. These benefit from the advances I spoke about.

  2. To squeeze the most performance out of eXist-db, read the eXist-db documentation article entitled "Performance Tuning".
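
If you only have the single master file, the advice above to split the data into smaller chunks could be sketched roughly as follows. This is a minimal Python sketch, not anything taken from eXist-db or BaseX: the function name split_resumes and the resume-000001.xml naming scheme are hypothetical, and it assumes every <Resume> is a direct child of the <Resumes> root, as in the question.

```python
import io
import xml.etree.ElementTree as ET
from pathlib import Path


def split_resumes(source, out_dir):
    """Stream a <Resumes> master file and write one small file per <Resume>.

    `source` is a filename or a binary file object. The output file naming
    is a hypothetical choice made for this illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    root = None
    # iterparse reads the file incrementally instead of loading it whole.
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if root is None:
            root = elem  # the first "start" event is the <Resumes> root
        elif event == "end" and elem.tag == "Resume":
            # Wrap the single resume in its own <Resumes> root, matching
            # the "multiple files" layout from the question.
            wrapper = ET.Element("Resumes")
            wrapper.append(elem)
            path = out_dir / f"resume-{len(written) + 1:06d}.xml"
            ET.ElementTree(wrapper).write(
                str(path), encoding="utf-8", xml_declaration=True
            )
            written.append(path)
            # Assumes <Resume> is a direct child of the root; dropping it
            # keeps memory use roughly flat on very large inputs.
            root.remove(elem)
    return written


if __name__ == "__main__":
    sample = (
        b"<Resumes>"
        b"<Resume><Name>ABC</Name></Resume>"
        b"<Resume><Name>PQR</Name></Resume>"
        b"</Resumes>"
    )
    for created in split_resumes(io.BytesIO(sample), "resumes_split"):
        print(created)
```

Each output file then has the same shape as "File 1" and "File 2" in the question, so new resumes can later be added as additional small documents rather than by rewriting one very large one.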

Licensed under: CC-BY-SA with attribution