Question

I'm using RavenDB to store a collection of incidents. These incidents have a date that I'm using to group by day (DateTime.Date). I'm trying to add some stats by hour, but I can't seem to find a way to do it cleanly.

The simple way:

public class DailyStats : AbstractIndexCreationTask<Incident, DateStat>
{
    public DailyStats()
    {
        Map = docs => from doc in docs
                      select new 
                                 {
                                     Date = doc.OccuredOn,
                                     Hour0 = doc.OccuredOn.Hour == 0 ? 1 : 0,
                                     Hour1 = doc.OccuredOn.Hour == 1 ? 1 : 0,
                                     //....
                                 };

        Reduce = mapped => from m in mapped
                           group m by new { m.Date.Date }
                           into g
                           select new
                                      {
                                          g.Key.Date,
                                          Hour0 = g.Sum(x => x.Hour0),
                                          Hour1 = g.Sum(x => x.Hour1)
                                          //....
                                      };
    }
}

but this is horribly repetitive. Instead, I'm trying to use a dictionary:

public class DailyStats : AbstractIndexCreationTask<Incident, DateStat>
{
    public DailyStats()
    {
        Map = docs => from doc in docs
                      select new 
                                 {
                                     Date = doc.OccuredOn,
                                     IncidentsByHour = Enumerable.Range(0, 24).ToDictionary(h => h, h => doc.OccuredOn.Hour == h ? 1 : 0),
                                 };

        Reduce = mapped => from m in mapped
                           group m by new { m.Date.Date }
                           into g
                           select new
                                      {
                                          g.Key.Date,
                                          IncidentsByHour = Enumerable.Range(0, 24).Select(h => g.Sum(x => x.IncidentsByHour[h])),
                                      };
    }
}

which throws the exception:

Line 201, Position 22: Error CS1502 - The best overloaded method match for 'System.Linq.Enumerable.ToDictionary(System.Collections.Generic.IEnumerable, System.Func, System.Collections.Generic.IEqualityComparer)' has some invalid arguments
Line 201, Position 72: Error CS1503 - Argument 2: cannot convert from 'System.Func' to 'System.Func'
Line 201, Position 106: Error CS1503 - Argument 3: cannot convert from 'System.Func' to 'System.Collections.Generic.IEqualityComparer'
Line 274, Position 22: Error CS1928 - 'System.Collections.Generic.IEnumerable' does not contain a definition for 'Select' and the best extension method overload 'System.Linq.Enumerable.Select(System.Collections.Generic.IEnumerable, System.Func)' has some invalid arguments
Line 274, Position 54: Error CS1503 - Argument 2: cannot convert from 'System.Func' to 'System.Func'

I'm not sure how to resolve this exception, since it's happening on the Raven side.

The reason for grouping by day is that I need to pull 365 days' worth of stats while still having some basic information by hour. Would it be better to have two indexes instead, one by day and one by hour (for a total of 365 + 24 records loaded)? My understanding is that fewer, larger indexes are best.

Solution

Try this:

public class DailyStats : AbstractIndexCreationTask<Incident, DateStat>
{
  public DailyStats()
  {
    Map = docs =>
      from doc in docs
      select new
      {
        Date = doc.OccuredOn,
        IncidentsByHour = new Dictionary<int, int> { { doc.OccuredOn.Hour, 1 } }
      };

    Reduce = mapped =>
      from m in mapped
      group m by new { m.Date.Date }
      into g
      select new
      {
        Date = g.Key.Date,
        IncidentsByHour = g.SelectMany(x => x.IncidentsByHour)
                           .GroupBy(x => x.Key)
                           .OrderBy(x => x.Key)
                           .ToDictionary(x => x.Key, x => x.Sum(y => y.Value))
      };
  }
}

The only difference here is that you won't get any items in your dictionary for hours that have no incidents.
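If you need all 24 hours present, one option is to fill in the gaps on the client after loading the results. The following is a minimal sketch in plain C# with no RavenDB dependency; the class and method names (HourlyCounts, FillMissingHours) are hypothetical and not part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of RavenDB.
public static class HourlyCounts
{
    // Returns a dictionary with keys 0..23. Hours that are missing
    // from the sparse dictionary produced by the index get a count of 0.
    public static Dictionary<int, int> FillMissingHours(IDictionary<int, int> sparse)
    {
        return Enumerable.Range(0, 24).ToDictionary(
            hour => hour,
            hour =>
            {
                int count;
                return sparse.TryGetValue(hour, out count) ? count : 0;
            });
    }
}
```

You would call it on the IncidentsByHour value of each result, for example HourlyCounts.FillMissingHours(stat.IncidentsByHour), assuming DateStat exposes that property as a dictionary.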

This does look like a bug in Raven. You should be able to write the map like this:

IncidentsByHour = Enumerable.Range(0, 24)
                      .ToDictionary(h => h, h => doc.OccuredOn.Hour == h ? 1 : 0)

But it fails for some strange reason. I'll report that as a bug.

And yes, it is usually better to have fewer larger indexes than many small ones.

OTHER TIPS

Depending on what you want your results to look like, you might try a faceted search. http://ravendb.net/docs/2.5/client-api/faceted-search

Obviously this only works if you have already drilled down to the day you are interested in. I would also write code to generate the ranges, but it would look something like the following:

var myCoolStuff = session.Query<Incident, SomeIndex>().Where(/* filter down to a single day */).ToFacets(
    new List<Facet>
          {
              new Facet
                  {
                      Name = "OccuredOn",
                      Mode = FacetMode.Ranges,
                      Ranges =
                          {
                              "[2013-01-01T00:00 TO 2013-01-01T01:00]",
                              "[2013-01-01T01:00 TO 2013-01-01T02:00]",
                              "[2013-01-01T02:00 TO 2013-01-01T03:00]",
                              "[2013-01-01T03:00 TO 2013-01-01T04:00]",
                              "[2013-01-01T04:00 TO 2013-01-01T05:00]",
                              "[2013-01-01T05:00 TO 2013-01-01T06:00]",
                              "[2013-01-01T06:00 TO 2013-01-01T07:00]",
                              "[2013-01-01T07:00 TO 2013-01-01T08:00]",
                              "[2013-01-01T08:00 TO 2013-01-01T09:00]",
                              "[2013-01-01T09:00 TO 2013-01-01T10:00]",
                              "[2013-01-01T10:00 TO 2013-01-01T11:00]",
                              "[2013-01-01T11:00 TO 2013-01-01T12:00]",
                              "[2013-01-01T12:00 TO 2013-01-01T13:00]",
                              "[2013-01-01T13:00 TO 2013-01-01T14:00]",
                              "[2013-01-01T14:00 TO 2013-01-01T15:00]",
                              "[2013-01-01T15:00 TO 2013-01-01T16:00]",
                              "[2013-01-01T16:00 TO 2013-01-01T17:00]",
                              "[2013-01-01T17:00 TO 2013-01-01T18:00]",
                              "[2013-01-01T18:00 TO 2013-01-01T19:00]",
                              "[2013-01-01T19:00 TO 2013-01-01T20:00]",
                              "[2013-01-01T20:00 TO 2013-01-01T21:00]",
                              "[2013-01-01T21:00 TO 2013-01-01T22:00]",
                              "[2013-01-01T22:00 TO 2013-01-01T23:00]",
                              "[2013-01-01T23:00 TO 2013-01-02T00:00]"
                          }
                  }
          });
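
Generating the ranges in code, as mentioned above, could look roughly like this. It is only a sketch: the HourlyRanges class and its ForDay method are hypothetical names, and the string format simply copies the hand-written list, so check it against what your Raven version expects for range facets.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of RavenDB.
public static class HourlyRanges
{
    // Builds 24 one-hour range strings for the given day, in the same
    // textual shape as the hand-written list: "[start TO end]".
    public static List<string> ForDay(DateTime day)
    {
        const string format = "yyyy-MM-dd'T'HH:mm";
        DateTime start = day.Date; // midnight of the requested day

        return Enumerable.Range(0, 24)
            .Select(h => string.Format(
                CultureInfo.InvariantCulture,
                "[{0} TO {1}]",
                start.AddHours(h).ToString(format, CultureInfo.InvariantCulture),
                start.AddHours(h + 1).ToString(format, CultureInfo.InvariantCulture)))
            .ToList();
    }
}
```

The resulting list can then be assigned to the facet, for example Ranges = HourlyRanges.ForDay(new DateTime(2013, 1, 1)).
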
Licensed under: CC-BY-SA with attribution