Question

Our company has a need to store and compute analytics related to content creation, review/approval and publishing workflow for documents. We are looking at something like Amazon SimpleDB.

We will store "events" which correspond to actions that users take in the system. For instance:

  • [User B] requested [document B] be reviewed at [Time] by [User A]
  • [User A] approved [document B] at [Time]
  • [User B] edited [document B] at [Time]
  • [User B] published [document B] at [Time]

Then we want to be able to create graphs (histogram/line plot) of this activity for given time periods. For instance:

  • Edits vs Time
  • Approvals vs Time
  • Publishes vs Time
  • Approvals vs Publishes vs Time

In SQL I assume this would be done by grouping results into "buckets". However, I am having a hard time figuring out how to do this with a NoSQL database like Amazon SimpleDB without batching the processing with Hadoop/MapReduce. This has to be real-time, so any batch processing is out of the question.
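For reference, the SQL "bucket" grouping mentioned above might look like the following minimal sketch. It uses Python's built-in sqlite3 module only so that it is runnable; the table layout, column names and sample rows are hypothetical and not taken from any real system.

```python
import sqlite3

# Hypothetical event rows: (actor, event_type, doc, timestamp as text).
EVENTS = [
    ("user_b", "edit", "doc_b", "2014-02-27 09:15:00"),
    ("user_b", "edit", "doc_b", "2014-02-27 09:40:00"),
    ("user_a", "approve", "doc_b", "2014-02-27 10:05:00"),
    ("user_b", "publish", "doc_b", "2014-02-27 10:30:00"),
    ("user_b", "edit", "doc_b", "2014-02-28 11:00:00"),
]


def hourly_counts(events):
    """Return (hour_bucket, event_type, count) rows, ordered by bucket then type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (actor TEXT, event_type TEXT, doc TEXT, ts TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
    # strftime truncates each timestamp to the start of its hour; that
    # truncated value is the bucket the rows are grouped into.
    rows = conn.execute(
        """
        SELECT strftime('%Y-%m-%d %H:00', ts) AS bucket,
               event_type,
               COUNT(*) AS n
        FROM events
        GROUP BY bucket, event_type
        ORDER BY bucket, event_type
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for bucket, event_type, n in hourly_counts(EVENTS):
        print(bucket, event_type, n)
```

Each returned row is one point on an "event type vs time" plot; changing the strftime format string changes the bucket width (for example '%Y-%m-%d' for daily buckets).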

We are also looking at Neo4j, so if someone has a solution for Neo4j I would be interested as well.

Thanks

Solution 2

And you would use "Action-Nodes" to model approvals, publications and edits, so that you can connect more than two things (for example a user, a document and a time) to each action.

For modeling time I'd recommend an ordered list of events or even a time tree: http://docs.neo4j.org/chunked/milestone/cypher-cookbook-path-tree.html

I created a small GraphGist to show it, check it out:

http://gist.neo4j.org/?9263624
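
As a rough, non-Neo4j illustration of the time tree idea, here is a minimal in-memory sketch in Python. It only mimics the year -> month -> day levels with nested dictionaries; the function names and the event tuple layout are hypothetical, and this is not how the GraphGist or Neo4j implements it.

```python
from collections import defaultdict
from datetime import datetime


def build_time_tree(events):
    """Index events under year -> month -> day, mimicking a time tree's levels."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for action, document, ts in events:
        tree[ts.year][ts.month][ts.day].append((action, document))
    return tree


def count_in_month(tree, year, month, action):
    """Count events of one action type by walking only that month's subtree."""
    days = tree.get(year, {}).get(month, {})
    return sum(
        1
        for day_events in days.values()
        for event_action, _ in day_events
        if event_action == action
    )


if __name__ == "__main__":
    sample = [
        ("edit", "doc_b", datetime(2014, 2, 27, 9, 15)),
        ("approve", "doc_b", datetime(2014, 2, 27, 10, 5)),
        ("edit", "doc_b", datetime(2014, 2, 28, 11, 0)),
        ("edit", "doc_b", datetime(2014, 3, 1, 8, 0)),
    ]
    tree = build_time_tree(sample)
    print(count_in_month(tree, 2014, 2, "edit"))
```

The point of the structure is that a query for one time period only touches the branch for that period instead of scanning every event.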

Other tips

In Neo4j's Cypher, you can collect things into buckets with CASE/WHEN and aggregation syntax.
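
The same pattern (a CASE/WHEN-style expression that assigns each event to a bucket, followed by a count per bucket) can be sketched in plain Python. This is only an analogy for the Cypher approach, not Cypher itself; the bucket names, the hour boundaries and the event tuple layout are made up for illustration.

```python
from collections import Counter


def bucket_label(hour):
    """Mirror a CASE/WHEN expression: the first matching condition picks the bucket."""
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def counts_by_bucket(events):
    """Count events per (bucket, action), like count(*) grouped by the CASE result."""
    return Counter((bucket_label(hour), action) for action, hour in events)


if __name__ == "__main__":
    # Hypothetical events: (action, hour of day).
    sample = [("edit", 9), ("edit", 10), ("approve", 14), ("publish", 20)]
    for (bucket, action), n in sorted(counts_by_bucket(sample).items()):
        print(bucket, action, n)
```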

I think data-driven visualization is more suitable for your scenario.

Yes, I am referring to D3, with MongoDB for storage.

Cube collects timestamped events for storage in a MongoDB database.

Cubism.js (a D3 plugin) does the visualization for you.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow