Question

I am working on a web application that will store billions of graphs, each with no more than 500 nodes. I am trying to use Neo4j to store all my graph data, and I have been researching how Neo4j can be scaled for this purpose. After a lot of research, I am asking SO for help with the issues I am still unclear about. Please excuse me if these are already answered on the Internet and I failed to find them.

  1. All my graphs are small and disconnected, so dividing them horizontally is not a problem for me. But I could not find anything on the Internet about how to create more than one Neo4j database (NOT instance) to support my web application. Is there any way to do that? If it is not possible, is it going to be included in the near future? EDIT: As mentioned in the comments below, I have used labels to identify the small graphs in my DB, but every article on horizontal sharding talks about dividing the graph optimally and storing the parts in separate databases. As of now I hardcode the db path '/data/graph.db' in neo4j-server.properties during configuration. Is there a way to hardcode two such paths for 2 different dbs and decide on the fly which one to connect to?

  2. For a single graph database, I read about combining Cache Sharding and HA clustering to achieve high performance. All the articles mention that every database instance can handle many requests. Could anyone give me an approximate number of requests each database instance (a slave in an HA cluster) can handle as of now, and say whether it's going to increase in the near future?

The graph is the soul of my web product and I would like to achieve the highest performance possible. Please help me better understand Neo4j to see whether it suits my purpose. Suggestions of other databases that would serve my purpose are most welcome. Thank you for your patience! :)

Solution

I can answer 1. Independent graphs, or graph islands, within a single Neo4j db can be implemented using labels, which are available starting with Neo4j 2.0.
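
As a rough illustration of the label idea, here is a minimal Python sketch that builds Cypher statements scoped to one graph island. The `Graph_<id>` naming scheme and the helper names (`graph_label`, `match_graph_query`, `create_node_query`) are assumptions made for this example, not anything prescribed by Neo4j. It only builds query strings; sending them to the server is left out. The `$name` parameter syntax is the one used by recent Neo4j versions (the 2.x series wrote `{name}` instead).

```python
import re

# Conservative pattern for a label that is safe to splice into a query string.
_LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def graph_label(graph_id: int) -> str:
    """Hypothetical naming scheme: one label per small graph."""
    if graph_id < 0:
        raise ValueError("graph_id must be non-negative")
    return f"Graph_{graph_id}"


def _check_label(label: str) -> str:
    # Labels are traditionally not accepted as Cypher parameters, so the
    # label is interpolated into the text and must be validated first.
    if not _LABEL_RE.match(label):
        raise ValueError(f"unsafe label: {label!r}")
    return label


def match_graph_query(label: str) -> str:
    """Cypher text that returns every node belonging to one graph island."""
    return f"MATCH (n:{_check_label(label)}) RETURN n"


def create_node_query(label: str) -> str:
    """Cypher text that creates a node inside one graph island.

    The node's name is left as the query parameter $name.
    """
    return f"CREATE (n:{_check_label(label)} {{name: $name}}) RETURN n"


if __name__ == "__main__":
    label = graph_label(42)
    print(match_graph_query(label))
    print(create_node_query(label))
```

Because every query names the island's label, a lookup touches only the nodes of that one small graph even though all graphs share a single database.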

Other tips

For 2), it depends heavily on the type of queries and whether you can cache reads or not. Normally, you would set up an HA cluster and direct queries for the same data vicinity to the same cluster nodes, thereby hitting warm caches. Do you have more details about your domain, data volume, etc.?
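
To make the "same data, same cluster node" routing concrete, here is a small Python sketch, assuming each small graph has a stable identifier and the application knows the list of HA instances. The function name `pick_instance` and the host names are hypothetical; this is not a Neo4j API, just application-side routing logic.

```python
import hashlib
from typing import Sequence


def pick_instance(graph_id: str, instances: Sequence[str]) -> str:
    """Map a graph id to one instance so repeat reads hit the same cache."""
    if not instances:
        raise ValueError("at least one instance is required")
    # Use a stable hash; Python's built-in hash() of a str varies per process.
    digest = hashlib.sha1(graph_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


if __name__ == "__main__":
    # Hypothetical host names for three HA cluster members.
    cluster = ["neo4j-a.example", "neo4j-b.example", "neo4j-c.example"]
    for gid in ("graph-1", "graph-2", "graph-1"):
        print(gid, "->", pick_instance(gid, cluster))
```

Plain modulo hashing remaps most graph ids whenever the instance list changes; a consistent-hashing ring would move fewer of them, at the cost of more code.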

License: CC-BY-SA with attribution
Not affiliated with StackOverflow