Question

I have a very complex data set that I need to aggregate easily, working with values at multiple levels.

For example, assume I have data on population and crime rate for each city in the US. Each city should roll up to a state, so the state population is the SUM of the populations of the cities within it, and the state crime rate is the AVERAGE of the crime rates of those cities. Then I need each state to roll up to the US overall, using the same calculation logic.

What is the best data structure for complex aggregations of hierarchically organized data in Python?

Ideally, I would be able to select a node and then call some method on it, passing an argument that says which data to aggregate and the logic to aggregate it with.

Solution

Two words: use pandas.

Link to tutorial: http://pandas.pydata.org/pandas-docs/stable/cookbook.html
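
As a rough illustration of how the city, state, and US roll-up from the question might look in pandas, here is a minimal sketch. The column names, the `rules` mapping, and all figures are made up for the example and are not from the original post; `groupby` and `agg` are standard pandas calls.

```python
import pandas as pd

# Hypothetical sample data: the numbers are invented for illustration only.
cities = pd.DataFrame({
    "state": ["CA", "CA", "TX", "TX"],
    "city": ["Los Angeles", "San Diego", "Houston", "Austin"],
    "population": [400, 100, 200, 100],
    "crime_rate": [5.0, 3.0, 6.0, 2.0],
})

# Which aggregation to apply to which column (assumed naming).
rules = {"population": "sum", "crime_rate": "mean"}

# City -> state: one row per state, aggregated per the rules.
states = cities.groupby("state").agg(rules)

# State -> US: apply the same rules to the state-level table.
us = states.agg(rules)

# "Selecting a node" is then a label lookup on the aggregated table.
california = states.loc["CA"]

print(states)
print(us)
```

Because the rules live in a plain dict, the same mapping can be reused at every level, or swapped for another one, without changing the roll-up code. One thing to keep in mind: averaging the state averages weights every state equally, which gives a different result from averaging all cities directly whenever states have different numbers of cities.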

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow