Question

I am using RavenDB to bulk load some documents. Is there a way to get the count of documents loaded into the database?

For insert operations I am doing:

BulkInsertOperation _bulk = docStore.BulkInsert(null, 
             new BulkInsertOptions{ CheckForUpdates = true});

foreach(MyDocument myDoc in docCollection)
     _bulk.Store(myDoc);

_bulk.Dispose();

And right after that I call the following:

session.Query<MyDocument>().Count();

but I always get a number that is less than the count I see in Raven Studio.


Solution

By default, the query you are doing is limited to a sane number of results, as part of RavenDB's promise to be safe by default and not stream back millions of records.

To get the number of documents of a specific type in your database, you need a special map-reduce index whose job is to track the count for each document type. Because this type of index deals directly with document metadata, it's easier to define it in Raven Studio than to create it in code.

The source for that index is in this question but I'll copy it here:

// Index Name: Raven/DocumentCollections

// Map Query
from doc in docs
let Name = doc["@metadata"]["Raven-Entity-Name"]
where Name != null
select new { Name , Count = 1}

// Reduce Query
from result in results
group result by result.Name into g
select new { Name = g.Key, Count = g.Sum(x=>x.Count) }

Then to access it in your code you would need a class that mimics the structure of the anonymous type created by both the Map and Reduce queries:

public class Collection
{
    public string Name { get; set; }
    public int Count { get; set; }
}

Then, as Ayende notes in the answer to the previously linked question, you can get results from the index like this:

session.Query<Collection>("Raven/DocumentCollections")
       .Where(x => x.Name == "MyDocument")
       .FirstOrDefault();

Keep in mind, however, that indexes are updated asynchronously, so right after bulk-inserting a batch of documents the index may be stale. You can force the query to wait by adding .Customize(x => x.WaitForNonStaleResults()) right after the .Query(...) call.
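
Putting those pieces together, a minimal sketch of a helper that reads the count for one collection and waits for the index to catch up might look like the following. It assumes the Raven/DocumentCollections index and the Collection class shown above already exist; the CollectionCounts class and CountOf method are hypothetical names chosen only for illustration.

```csharp
using System.Linq;
using Raven.Client;

// Hypothetical helper, not part of the RavenDB client.
public static class CollectionCounts
{
    // Returns the document count the index reports for the given collection,
    // or 0 if the index has no entry for it.
    public static int CountOf(IDocumentSession session, string collectionName)
    {
        Collection entry = session.Query<Collection>("Raven/DocumentCollections")
            // Wait for indexing to finish instead of accepting stale results.
            .Customize(x => x.WaitForNonStaleResults())
            .Where(x => x.Name == collectionName)
            .FirstOrDefault();

        return entry == null ? 0 : entry.Count;
    }
}
```

With this in place, CollectionCounts.CountOf(session, "MyDocument") could be called right after the bulk insert has been disposed.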

Raven Studio actually gets this data from the index Raven/DocumentsByEntityName, which exists in every database, by sidestepping normal queries and reading only the query metadata from that index. You can emulate that like this:

QueryResult result = docStore.DatabaseCommands.Query("Raven/DocumentsByEntityName",
    new Raven.Abstractions.Data.IndexQuery
    {
        Query = "Tag:MyDocument",
        PageSize = 0
    },
    includes: null,
    metadataOnly: true);

var totalDocsOfType = result.TotalResults;

That QueryResult contains a lot of useful data:

{
    Results: [ ],
    Includes: [ ],
    IsStale: false,
    IndexTimestamp: "2013-11-08T15:51:25.6463491Z",
    TotalResults: 3,
    SkippedResults: 0,
    IndexName: "Raven/DocumentsByEntityName",
    IndexEtag: "01000000-0000-0040-0000-00000000000B",
    ResultEtag: "BA222B85-627A-FABE-DC7C-3CBC968124DE",
    Highlightings: { },
    NonAuthoritativeInformation: false,
    LastQueryTime: "2014-02-06T18:12:56.1990451Z",
    DurationMilliseconds: 1
}

A lot of that is the same data you get on any query if you request statistics, like this:

RavenQueryStatistics stats;

Session.Query<Course>()
    .Statistics(out stats)
    // Rest of query
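
For example, a sketch along these lines reads the total for a document type through the statistics object. The QueryTotals class and TotalOf method are hypothetical names; TotalResults and IsStale correspond to the fields of the same name in the QueryResult shown earlier.

```csharp
using System.Linq;
using Raven.Client;

// Hypothetical helper, not part of the RavenDB client.
public static class QueryTotals
{
    // Returns the total number of matches the server reports for a query over T.
    public static int TotalOf<T>(IDocumentSession session)
    {
        RavenQueryStatistics stats;

        session.Query<T>()
            .Statistics(out stats)
            .Take(1)    // only the statistics are needed, so keep the page small
            .ToList();  // stats is populated once the query has executed

        // stats.IsStale tells you whether the index was still catching up.
        return stats.TotalResults;
    }
}
```

For the documents in the question this would be called as QueryTotals.TotalOf<MyDocument>(session).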