I am using a GTFS feed for an app I am working on. I am attempting to list all of the stops for a chosen route. Currently, I am attempting to order the list by stop_sequence, but this is not working properly since some trips do not go to every stop and the data I have received increments the stop_sequence by 1 per stop per trip. The significance of this is that the stop_sequence does not account for other trips that might have more or less stops.
Here's an example:
This is the order of the stops for a route, (ignoring the fact that not every trip will stop at each stop)
Stop A
Stop B
Stop C
Stop D
Stop E
Now here are some example trips for the route:
Trip 1: A, B, C, D
Trip 2: A, B, E
What my data is doing:
For Trip 1:
Stop A: stop_sequence = 1
Stop B: stop_sequence = 2
Stop C: stop_sequence = 3
Stop D: stop_sequence = 4
For Trip 2:
Stop A: stop_sequence = 1
Stop B: stop_sequence = 2
Stop E: stop_sequence = 3
So when I try to order all potential stops for a route I end up with this:
Stop A
Stop B
Stop C
Stop E
Stop D
which clearly is incorrect.
Does anyone know of any other potential ideas to correctly order the stops, perhaps using other data that comes with the GTFS Feed?
UPDATED with a real world example
Here is the example output of a database query that gets all of the stops for route 915.
This is for the AM schedule.
+---------+---------+---------------+------------------------------------------------+
| stop_id | trip_id | stop_sequence | stop_name |
+---------+---------+---------------+------------------------------------------------+
| 11771 | 1269287 | 1 | LOTTE PLAZA US 40 & US 29 |
| 11772 | 1269280 | 1 | HARPER'S FARM RD & CEDAR LA eb |
| 11773 | 1269280 | 2 | LITTLE PATUXENT & GRAY STAR wb |
| 11774 | 1269280 | 3 | LITTLE PATUXENT & WHITE CORD WAY wb |
| 11775 | 1269280 | 4 | LITTLE PATUXENT & BRIGHT PASSAGE eb |
| 11776 | 1269280 | 5 | LITTLE PATUXENT & HICKORY RID nb |
| 11777 | 1269280 | 6 | LITTLE PATUXENT & CEDAR LA eb |
| 11778 | 1269280 | 7 | LITTLE PATUXENT & HARPER'S FARM opp eb |
| 11779 | 1269280 | 8 | COLUMBIA MALL & SOUTH RING RD eb |
| 11782 | 1269280 | 9 | BROKEN LAND & HICKORY RIDGE sb |
| 11780 | 1269289 | 9 | LITTLE PATUXENT & GOV WARFIELD nb |
| 11783 | 1269280 | 10 | BROKEN LAND PARK & RIDE |
| 11781 | 1269289 | 10 | LITTLE PATUXENT & VANTAGE PT nb |
| 11784 | 1269280 | 11 | SCAGGSVILLE PARK & RIDE |
| 11785 | 1269280 | 12 | BURTONSVILLE PARK & RIDE |
| 11786 | 1269280 | 13 | COLESVILLE RD & FENTON ST sb |
| 11787 | 1269280 | 14 | SILVER SPRING METRO STATION |
| 11788 | 1269280 | 15 | WALTER REED HOSP & 16TH ST NW |
| 11789 | 1269280 | 16 | 16TH ST & P ST NW |
| 11790 | 1269280 | 17 | 16TH ST & M ST NW |
| 11718 | 1269280 | 18 | K ST & 16TH ST NW fs eb |
| 11719 | 1269280 | 19 | K ST & 14TH ST NW eb |
| 11791 | 1269280 | 20 | 13TH ST & H ST NW sb |
| 11759 | 1269280 | 21 | PENNSYLVANIA AVE & 12TH ST NW eb |
| 11793 | 1269280 | 22 | CONSTITUTION AVE & 10TH ST NW fs eb |
| 12046 | 1269280 | 23 | 7TH ST NW & CONSTITUTION AVE eb |
| 11650 | 1269280 | 24 | INDEPENDENCE AVE & 7/6 ST SW mid eb |
| 11601 | 1269280 | 25 | INDEPENDENCE AVE & 4TH/3RD ST SW eb |
| 13627 | 1269280 | 26 | M ST & 1st ST SE (NAVY YARD) sb |
| 13628 | 1269280 | 27 | M ST & 4th ST SE (SOUTHEAST FEDERAL CENTER) eb |
| 11569 | 1269280 | 28 | M ST & ISAAC HALL AVE SE eb |
| 11795 | 1269280 | 29 | M ST & 8/9TH STS mid eb |
+---------+---------+---------------+------------------------------------------------+
and here is the link to the pdf of the schedule that a lot of commuters are currently using. The first instance of where the two lists differ is after "COLUMBIA MALL & SOUTH RING RD eb"
http://mta.maryland.gov/sites/default/files/915May2011B.pdf
I am trying to make this app commuter friendly as possible, but when the stops are out of order when compared to what commuters usually use, it might cause a lot of confusion.
UPDATE 2:
I still do not see how topological sorting can be used to get the correct sequence. Yes it might give a valid sequence, but it is not guaranteed to be the correct sequence that a commuter will easily recognize. Let's look at another example using the pdf I provided. We will look at Trips 1 and 5 and up until the stop "Columbia Mall". I would create the following edges:
Edges created from Trip 1
Cedar Lane --> Gray Star Way
Gray Star Way --> White Cord Way
...
Harpers Farm Rd --> Columbia Mall
Edges created from Trip 5
Lotte Plaza --> Columbia Mall
The only thing that a topological sorting ensures is
for every directed edge uv from vertex u to vertex v, u comes before v in the ordering
That means that there are multiple valid orderings, but only one is the actual correct one that I want(but there is no way for me to progromatically choose this one over other valid orderings, at least not that I can think of).
A valid ordering might be (this is also the correct one):
Lotte Plaza,
Cedar Lane
Gray Star
...
Columbia Mall
or even
Cedar Lane
Gray Star
...
Lotte Plaza
Columbia Mall
As you can see, according to a topological sort, both of these are valid, but only one of them is the one I want. I cannot think of a way to consistently choose the correct sequence based upon the data provided by the GTFS feed.
Please let me know if I am looking at this the wrong way.