Question

we store apple app data in a database (http://www.apple.com/itunes/affiliates/resources/documentation/itunes-enterprise-partner-feed.html).

we want to optimize for one type of query: find all apps that meet some criteria. criteria: (1) avg rating of app; (2) number of app ratings; (3) devices supported by app; (4) countries where app is sold; (5) current price of app; and (6) date when app went free. the query should be as fast as possible. example query: "find all apps with > 600 ratings, averages 5 stars, supports iPads and iPhones, is sold in the US, and dropped their price to $0.00 two days ago."

based on the apple schema, there is price information for every country. assuming apple supports 100 countries, each app will have 100 prices -- one for each country. we also need to store the historical prices for each app, meaning an app with 10 price changes will have 1000 prices (assuming 100 countries).

three questions:

1) how do you advise we store the price data in mongo to make queries fast? right now, we're thinking of storing prices as an array of objects. each object consists of three elements: (1) date; (2) country; (3) price.

2) if we store price data as objects in an array, what do we need to do to make searches against price data very fast? again, the common price search is something like, "find all apps that dropped their price to $0.00 2 days ago in the USA store."

3) any gotchas we should be mindful of in storing the data?

Solution

Personally, I would have a separate collection for the daily price data -- 1 record per day per app (the compound natural key), with that day's set of 100 numbers for that app. This way the records never need to grow or relocate, which is a big win. With proper indexes, almost any query against this collection can be made to perform well. Keep the field names short for more efficient storage.
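As a rough sketch of what one such record could look like, here is the document shape built as a plain Python dict. The field names `a`, `d` and `p`, the app id 12345 and the prices are all made up for illustration, and storing the per-country prices as a sub-document keyed by country code is just one possible layout. The index call in the comment assumes the pymongo driver and a collection handle named `prices`.

```python
from datetime import date, datetime, timezone

def make_daily_price_doc(app_id, day, prices_by_country):
    """Build one per-app, per-day price record with short field names."""
    return {
        "a": app_id,  # app id (hypothetical field name)
        # Store the day as a UTC midnight datetime so it sorts and matches exactly.
        "d": datetime(day.year, day.month, day.day, tzinfo=timezone.utc),
        "p": dict(prices_by_country),  # that day's price for each country
    }

# Compound natural key: one record per app per day.
# With pymongo this could be enforced with:
#   prices.create_index(PRICE_KEY, unique=True)
PRICE_KEY = [("a", 1), ("d", 1)]

doc = make_daily_price_doc(12345, date(2012, 5, 1), {"US": 0.0, "GB": 0.69})
```

Because each record always holds the same set of countries, it is written once and not appended to afterwards.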

I would keep a separate collection for the app "master data" -- 1 record per app. In those records you can memoize the most recent date the app went free, a snapshot of the most recent by-country price vector, and similar snapshot values of any other "summary" data that may form the selection criteria for an app search. Aggregations to compute and record such values, should they become costly, can then be performed in the background at convenient times.
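To make the memoizing idea concrete, here is a minimal sketch in plain Python. The field names (`rc` for rating count, `ra` for average rating, `dev` for devices, `p` for the latest prices, `free` for the went-free dates) and the sample values are assumptions, not part of the Apple feed. The filter uses standard MongoDB query operators (`$gt`, `$all`) and dot notation, and could be passed to a driver's `find` call.

```python
from datetime import datetime, timedelta, timezone

def apply_daily_prices(master, daily):
    """Fold one daily price record into the app's master record."""
    previous = master.get("p", {})
    went_free = master.setdefault("free", {})
    for country, price in daily["p"].items():
        # Record the day a price dropped to zero from a positive value.
        # An app that was free on its first recorded day is not marked.
        if price == 0 and previous.get(country, 0) > 0:
            went_free[country] = daily["d"]
    master["p"] = dict(daily["p"])  # snapshot of the most recent prices
    return master

def went_free_filter(country, day, min_ratings, avg_rating, devices):
    """Build a MongoDB-style filter for the example search in the question."""
    return {
        "rc": {"$gt": min_ratings},
        "ra": avg_rating,
        "dev": {"$all": devices},
        "free." + country: day,  # matches the memoized went-free date
    }

day1 = datetime(2012, 5, 1, tzinfo=timezone.utc)
day2 = day1 + timedelta(days=1)
master = {"_id": 12345, "rc": 750, "ra": 5, "dev": ["iPad", "iPhone"]}
apply_daily_prices(master, {"a": 12345, "d": day1, "p": {"US": 0.99, "GB": 0.69}})
apply_daily_prices(master, {"a": 12345, "d": day2, "p": {"US": 0.0, "GB": 0.69}})
query = went_free_filter("US", day2, 600, 5, ["iPad", "iPhone"])
```

With this layout the search touches only the master collection, so it never has to scan the price history at query time.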

Hope that's a help! Great that you're asking these questions up front. :)

Licensed under: CC-BY-SA with attribution