Question

I need to model airline flight data in a graph database (I am working with Neo4j specifically, though I will consider others if that becomes problematic). My question is about how to model this data in a way that eases traversal and discovery of different flight options. A few specific examples of the type of data I would like to store and later query:

1) A direct flight scenario like JFK->LAX. This seems straightforward: a simple relationship between two nodes. But many flights between these two nodes may be of interest. So, if I need to store individual flight details, is that best done in an array on the relationship between the JFK and LAX nodes?

2) A flight scenario with multiple stops, like JFK->LAX->SAN. In this scenario, it seems like modeling the relationships between all three nodes may be of limited utility if I'm only interested in the departure and arrival cities. That is, could I have a relationship from JFK->SAN, with the layover in LAX stored as a property on that relationship?

If I need to query or traverse the graph based on arrays of data in relationships between nodes, and those arrays become large (e.g. 100 different flights between JFK and LAX), will that introduce performance or scalability problems?

Hopefully this question isn't too open-ended - I'm just trying to avoid building something that works for a small example model with ~5 nodes but can't scale to hundreds of airports and tens of thousands of flights.


Solution

Hundreds of airports and tens of thousands of flights is still a very small data set, and I'd be surprised if that were a problem in Neo4j.

Off the top of my head, you could have each airport as its own node and each route as its own node, with a relationship from the route to every airport it touches. Each of those relationships could carry an "order" property that is local to the route, giving the position of that airport within it.

         (ROUTE1)---------
         /    \           \
*order=1/      \*order=2   \*order=3
       v        v           v
    (JFK)       (LAX)      (SAN)
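
To make the idea concrete, here is a minimal sketch of that model in plain Python. It is an in-memory stand-in, not Neo4j code: the class name FlightGraph and the methods add_route and find_routes are hypothetical names made up for illustration, and it assumes each airport appears at most once per route. A route counts as a match when it touches both airports and the "order" of the origin is lower than the "order" of the destination.

```python
class FlightGraph:
    """In-memory stand-in for the airport/route model (hypothetical, not a Neo4j API)."""

    def __init__(self):
        # route id -> {airport code: "order" of that stop within the route}
        self.stops = {}
        # airport code -> set of route ids that touch it (reverse lookup)
        self.routes_at = {}

    def add_route(self, route_id, airports):
        """Store a route node linked to each airport with a 1-based order.

        Assumption: an airport appears at most once in a given route.
        """
        self.stops[route_id] = {
            code: order for order, code in enumerate(airports, start=1)
        }
        for code in airports:
            self.routes_at.setdefault(code, set()).add(route_id)

    def find_routes(self, origin, destination):
        """Return (route_id, layovers) for routes visiting origin before destination."""
        candidates = (
            self.routes_at.get(origin, set())
            & self.routes_at.get(destination, set())
        )
        results = []
        for route_id in sorted(candidates):
            order = self.stops[route_id]
            if order[origin] < order[destination]:
                # Airports strictly between the two stops, in travel order.
                between = [
                    code
                    for code, pos in sorted(order.items(), key=lambda kv: kv[1])
                    if order[origin] < pos < order[destination]
                ]
                results.append((route_id, between))
        return results


graph = FlightGraph()
graph.add_route("ROUTE1", ["JFK", "LAX", "SAN"])
graph.add_route("ROUTE2", ["JFK", "LAX"])
graph.add_route("ROUTE3", ["SAN", "LAX", "JFK"])

print(graph.find_routes("JFK", "SAN"))  # [('ROUTE1', ['LAX'])]
print(graph.find_routes("JFK", "LAX"))  # [('ROUTE1', []), ('ROUTE2', [])]
```

With this shape, the layover in LAX falls out of the order values rather than being stored as a separate property, and both the direct and the multi-stop case are answered by the same lookup.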

I'm sure there are better solutions.

OTHER TIPS

Check out Neo4j's contribution page.

One of the winners of their contest was a gist describing US flights and airports. It is very well done.

Licensed under: CC-BY-SA with attribution