With almost 37,000 songs in your text-file database and many ways for users to query it (by year, year range, rank, rank range, ...), plus optional sorting, I think you all but need to pre-process your song database before your users ever get the chance to query it. The pre-processing runs once, and then again only when the database changes.
(I am assuming here that you can't use a real database, which would be the ideal solution.)
The first thing I would recommend is to assign a unique key (ID) to each song, starting with 1, which, in code, should be represented with a long. Make this the first column in your database and save the result as, for example, agazillionsongs_with_id.txt:
1 2008 50 Ashley Tisdale He Said, She Said
2 2008 123 Taylor Swift Teardrops On My Guitar
3 2008 233 Finger Eleven Paralyzer
4 2008 258 Paramore Misery Business
...
470 2007 251 Hannah Montana True Friend
471 2006 1 Beyonce Irreplaceable
...
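A minimal sketch of that ID-assignment step might look like the following. It assumes the columns are tab-separated (the question's exact delimiter isn't shown here), and the class name SongIdAssigner and the command-line arguments are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class SongIdAssigner {
    /** Prefixes every non-empty line with a sequential ID, starting at 1. */
    static List<String> withIds(List<String> lines) {
        List<String> out = new ArrayList<>(lines.size());
        long id = 1;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines get no ID
            }
            // Assumption: columns are separated by tabs.
            out.add(id++ + "\t" + line);
        }
        return out;
    }

    // Usage: java SongIdAssigner <original-db-file> agazillionsongs_with_id.txt
    public static void main(String[] args) throws IOException {
        List<String> original = Files.readAllLines(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), withIds(original));
    }
}
```

Note that IDs are only stable as long as the order of lines in the original file doesn't change, so rerun this (and rebuild every index) whenever the database is edited.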
Now create additional sub-tables (indexes) in separate text files, each of which simply refers to songs by their key. The simplest one is a year index, which could be stored in a file called agazillionsongs_sub_year.txt:
2008 1
2008 2
2008 3
2008 4
...
2007 470
...
2006 471
...
Alternatively, you could store each year on a single line, followed by all of its song IDs (I prefer this format):
2008 1, 2, 3, 4, ...
2007 470, ...
2006 471, ...
Either way, this could be represented in code as a yearMap object, which is a Map<Integer,List<Song>>, where each key is a year and each value is the list of references to that year's Song objects. Once these maps are created, writing them out to a file is easy.
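Here is a rough sketch of building the yearMap and writing it out in the one-line-per-year format. The Song class shown is my guess at what yours looks like, the columns are assumed to be tab-separated (id, year, rank, artist, title), and the names YearIndexBuilder, buildYearMap and toIndexLines are invented for the example:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

class YearIndexBuilder {
    /** Hypothetical Song class; adapt it to your own. */
    static final class Song {
        final long id; final int year; final int rank;
        final String artist; final String title;

        Song(long id, int year, int rank, String artist, String title) {
            this.id = id; this.year = year; this.rank = rank;
            this.artist = artist; this.title = title;
        }

        // Assumption: tab-separated columns id, year, rank, artist, title.
        static Song parse(String line) {
            String[] f = line.split("\t", 5);
            return new Song(Long.parseLong(f[0]), Integer.parseInt(f[1]),
                            Integer.parseInt(f[2]), f[3], f[4]);
        }
    }

    /** Groups songs by year; the TreeMap keeps years sorted, newest first. */
    static Map<Integer, List<Song>> buildYearMap(List<Song> songs) {
        Map<Integer, List<Song>> yearMap = new TreeMap<>(Comparator.reverseOrder());
        for (Song s : songs) {
            yearMap.computeIfAbsent(s.year, y -> new ArrayList<>()).add(s);
        }
        return yearMap;
    }

    /** Renders one line per year, e.g. "2008 1, 2, 3, 4". */
    static List<String> toIndexLines(Map<Integer, List<Song>> yearMap) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, List<Song>> e : yearMap.entrySet()) {
            StringJoiner ids = new StringJoiner(", ");
            for (Song s : e.getValue()) {
                ids.add(Long.toString(s.id));
            }
            lines.add(e.getKey() + " " + ids);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<Song> songs = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("agazillionsongs_with_id.txt"))) {
            if (!line.trim().isEmpty()) {
                songs.add(Song.parse(line));
            }
        }
        Files.write(Paths.get("agazillionsongs_sub_year.txt"),
                    toIndexLines(buildYearMap(songs)));
    }
}
```

I used a TreeMap rather than a HashMap only so that the index file comes out sorted by year; a HashMap works just as well if you don't care about the order.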
Do the same with rank: agazillionsongs_sub_rank.txt / rankMap / Map<Integer,List<Song>>.
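Since you also want rank-range queries, a sorted map is a natural fit for the rank index. The sketch below reads the index file back in and answers a range query. To keep it short it stores song IDs instead of Song references (a simplification of the Map<Integer,List<Song>> above), and the names RankIndex, load and idsInRankRange are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class RankIndex {
    /**
     * Parses index lines of either form, "50 1" or "50 1, 17, 93",
     * into a map sorted by rank.
     */
    static NavigableMap<Integer, List<Long>> load(List<String> indexLines) {
        NavigableMap<Integer, List<Long>> rankMap = new TreeMap<>();
        for (String line : indexLines) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length < 2) {
                continue; // blank or malformed line
            }
            List<Long> ids = rankMap.computeIfAbsent(
                    Integer.parseInt(parts[0]), r -> new ArrayList<>());
            for (String id : parts[1].split(",")) {
                ids.add(Long.parseLong(id.trim()));
            }
        }
        return rankMap;
    }

    /** Returns the IDs of all songs with lo <= rank <= hi, ordered by rank. */
    static List<Long> idsInRankRange(NavigableMap<Integer, List<Long>> rankMap,
                                     int lo, int hi) {
        List<Long> result = new ArrayList<>();
        // subMap throws IllegalArgumentException if lo > hi.
        for (List<Long> ids : rankMap.subMap(lo, true, hi, true).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

The same subMap trick works for year ranges if the year index is loaded into a TreeMap as well.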
Indexing by name and title is trickier: do you index the whole name, literally or with some concept of "fuzziness", or only the beginning, ...? It's a hard but important design decision.
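As one possible answer (by no means the only one), here is a sketch of the "only the beginning" option: names are normalized to lowercase ASCII letters and digits, and lookups match on a prefix. The class ArtistIndex and its methods are hypothetical, and the normalization is deliberately crude; it drops accented characters, for instance, so real data may need something better:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

class ArtistIndex {
    private final NavigableMap<String, List<Long>> byArtist = new TreeMap<>();

    /** Lowercases and collapses anything but a-z and 0-9 into single spaces. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
    }

    void add(String artist, long songId) {
        byArtist.computeIfAbsent(normalize(artist), k -> new ArrayList<>()).add(songId);
    }

    /** Returns the IDs of all songs whose normalized artist starts with the prefix. */
    List<Long> idsWithPrefix(String prefix) {
        String p = normalize(prefix);
        List<Long> result = new ArrayList<>();
        // Every key that starts with p sorts between p and p + '\uffff'.
        for (List<Long> ids
                : byArtist.subMap(p, true, p + Character.MAX_VALUE, false).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

With this, a query for "para" or "PARA" finds Paramore, but a misspelling such as "Paramoor" finds nothing; handling that is where real fuzziness (and a lot more work) comes in.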
The further you take this idea and the more ways you slice and dice your data, the faster your users can query the database. This is because it eliminates the need to read through the full song database on every query and put every line into a Song object.
Instead, this pre-processing tells you exactly which rows in the database need to be retrieved, so you can skip every other line without ever parsing it.
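To illustrate that last step, here is a sketch that takes the set of IDs produced by an index lookup and pulls only those rows from the database file. It again assumes tab-separated columns with the ID first; SongFetcher, fetchRows and idOf are names I made up:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SongFetcher {
    /** Reads the leading ID column; returns -1 if the line has no tab. */
    static long idOf(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? -1 : Long.parseLong(line.substring(0, tab));
    }

    /**
     * Keeps only the rows whose ID is wanted. Because IDs are unique,
     * limit() lets the stream stop reading once every wanted row is found.
     */
    static List<String> fetchRows(Stream<String> lines, Set<Long> wantedIds) {
        return lines.filter(line -> wantedIds.contains(idOf(line)))
                    .limit(wantedIds.size())
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Set<Long> wanted = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        try (Stream<String> lines =
                 Files.lines(Paths.get("agazillionsongs_with_id.txt"))) {
            fetchRows(lines, wanted).forEach(System.out::println);
        }
    }
}
```

Only the rows that come back need to be turned into Song objects. This still scans the file line by line, but it skips the parsing and object creation for everything you don't need.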
I hope this helps you.