Question

I am processing some input files and inserting the resulting records as CouchDB documents. I have noticed that the insert speed decreases as the database grows.

What I do is:

  1. Read data from input file
  2. Process the data to obtain the structured documents
  3. Put the documents in a local buffer
  4. As soon as the buffer has 1000 documents, perform a CouchDB bulk insert (sketched below)
  5. Repeat until input data has been fully processed
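For reference, here is a minimal sketch of the loop described above, posting batches to CouchDB's standard `_bulk_docs` endpoint with `requests`. The server URL, the database name `mydb`, the log file layout, and the `parse_line()` helper are assumptions for illustration, not part of the original setup.

    import json
    import time
    import requests

    COUCH_URL = "http://localhost:5984/mydb/_bulk_docs"  # assumed server and db name
    BATCH_SIZE = 1000

    def flush(buffer):
        """POST the buffered documents to CouchDB's _bulk_docs endpoint."""
        resp = requests.post(
            COUCH_URL,
            data=json.dumps({"docs": buffer}),
            headers={"Content-Type": "application/json"},
        )
        resp.raise_for_status()

    def load(path):
        buffer, total = [], 0
        last_report = time.time()
        with open(path) as infile:
            for line in infile:
                buffer.append(parse_line(line))  # hypothetical parser, not shown
                if len(buffer) >= BATCH_SIZE:
                    flush(buffer)
                    total += len(buffer)
                    # rate over the most recent batch, as in the log below
                    now = time.time()
                    rate = len(buffer) / (now - last_report)
                    last_report = now
                    print(f"docs={total} rate={rate:.2f} entries/s")
                    buffer = []
        if buffer:  # flush whatever is left at the end of the input
            flush(buffer)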

Here is the log of my current run:

2012-03-15 10:15:58,716 - docs= 10000 rate=2282.38 entries/s
2012-03-15 10:16:46,748 - docs=100000 rate=1822.76 entries/s
2012-03-15 10:17:47,433 - docs=200000 rate=1592.01 entries/s
2012-03-15 10:18:48,566 - docs=300000 rate=1358.32 entries/s
2012-03-15 10:19:54,637 - docs=400000 rate=1572.55 entries/s
2012-03-15 10:21:01,690 - docs=500000 rate=1560.41 entries/s
2012-03-15 10:22:09,400 - docs=600000 rate=1556.22 entries/s
2012-03-15 10:23:16,153 - docs=700000 rate=1550.21 entries/s
2012-03-15 10:24:30,850 - docs=800000 rate=1393.61 entries/s
2012-03-15 10:25:46,099 - docs=900000 rate=1336.83 entries/s
2012-03-15 10:27:09,290 - docs=1000000 rate= 871.37 entries/s
2012-03-15 10:28:31,745 - docs=1100000 rate=1256.36 entries/s
2012-03-15 10:29:53,313 - docs=1200000 rate=1140.49 entries/s
2012-03-15 10:31:29,207 - docs=1300000 rate=1080.79 entries/s
2012-03-15 10:33:23,917 - docs=1400000 rate= 741.65 entries/s
2012-03-15 10:35:45,475 - docs=1500000 rate= 567.96 entries/s
2012-03-15 10:39:04,293 - docs=1600000 rate= 564.01 entries/s
2012-03-15 10:42:20,160 - docs=1700000 rate= 499.29 entries/s
2012-03-15 10:46:06,270 - docs=1800000 rate= 505.04 entries/s
2012-03-15 10:50:24,745 - docs=1900000 rate= 402.14 entries/s
2012-03-15 10:55:23,800 - docs=2000000 rate= 346.19 entries/s
2012-03-15 11:02:03,217 - docs=2100000 rate= 274.59 entries/s
2012-03-15 11:08:21,690 - docs=2200000 rate= 269.57 entries/s

The "rate" shows the rate of insertion of the last thousand documents, which as you can see is degrading very fast.

  • Is this normal?
  • Can I do something to keep a high insert rate?
  • Do you have experience with big CouchDB databases?
  • Any advice that you would like to share?

Solution

The high insert rates at the start are the anomaly, caused by everything fitting neatly into your disk cache. As your database grows you will eventually need to read data from disk in order to update the b-tree. It would be better to run an insert test for longer and graph it; you should then see that the huge spike at the front is the oddity, not the lower but more or less constant rate that follows it.
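One quick way to do that graphing, as a sketch, is to parse the rate figures back out of log lines shaped like the ones quoted above and plot them; the log file name here is hypothetical.

    import re
    import matplotlib.pyplot as plt

    docs, rates = [], []
    with open("insert.log") as log:  # assumed log file name
        for line in log:
            m = re.search(r"docs=\s*(\d+)\s+rate=\s*([\d.]+)", line)
            if m:
                docs.append(int(m.group(1)))
                rates.append(float(m.group(2)))

    plt.plot(docs, rates)
    plt.xlabel("documents inserted")
    plt.ylabel("insert rate (entries/s)")
    plt.title("Bulk insert rate vs. database size")
    plt.show()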

From the other threads where you have asked this question, another significant factor is that you used fully random UUIDs. Because CouchDB is based on a B+tree, inserting fully random ids is the worst possible scenario for updates. CouchDB ships with a number of UUID algorithms; the default, called 'sequential', returns values with a very low chance of collision that are still sequential enough to give much better insert performance.
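In practice that means letting the server hand out the ids instead of generating random ones client-side. One way, shown as a sketch below, is to fetch a batch of server-generated UUIDs from the standard `GET /_uuids` endpoint and assign them as `_id` before the bulk insert; the server address is an assumption. The same effect can be had by simply omitting `_id` from the bulk documents, since CouchDB then fills it in using its configured algorithm, and if a server has been switched away from the default, it can be set back with `algorithm = sequential` under the `[uuids]` section of local.ini.

    import requests

    BASE = "http://localhost:5984"  # assumed server address

    def assign_sequential_ids(docs):
        """Fetch one server-generated UUID per document and set it as _id."""
        resp = requests.get(f"{BASE}/_uuids", params={"count": len(docs)})
        resp.raise_for_status()
        for doc, uuid in zip(docs, resp.json()["uuids"]):
            doc["_id"] = uuid
        return docs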

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow