Question

I'm new to Python and was curious how, given a large data set of census information, I could plot a histogram or some other kind of graph. My main question is how to access the file, not how the graph itself should be coded. Do I import the file directly? How do I extract the data from the file?

Thanks


Solution

You cannot directly import a data file into a Python script; the import statement is for Python modules. Instead, you open the file for reading and then parse it according to the format of the data stored in it.

For reference, here is an example of how to read a text file:

# To read all data at once
with open("/path/to/file.txt") as file_handle:
    file_contents = file_handle.read()

# To read one line at a time
with open("/path/to/file.txt") as file_handle:
    for line in file_handle:
        line = line.strip()
        # Do more stuff with line

OTHER TIPS

What format is your data in? Python offers modules for reading a variety of data formats (CSV, JSON, XML, ...). CSV is a very common one that suffices in many cases, and the csv module is part of the standard library.
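As a minimal sketch of the CSV case, the standard-library csv module can turn each row into a dictionary keyed by the header line. The column names and values below are made up for illustration, and io.StringIO only stands in for a real file handle.

```python
import csv
import io

# Hypothetical sample data; with a real file you would use
# open("/path/to/file.csv", newline="") instead of io.StringIO.
SAMPLE_CSV = "state,population\nAlpha,1200\nBeta,3400\n"

def read_rows(file_handle):
    """Return every CSV row as a dict keyed by the header line."""
    return list(csv.DictReader(file_handle))

rows = read_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    # Every value is still a string at this point.
    print(row["state"], row["population"])
```

Note that the csv module does not convert types, so "1200" stays a string until you cast it yourself.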

Typically you write a small routine that casts each field to the expected type (strings to floating-point numbers, integers, dates, ...) and stores the data in a NumPy array (np.array) where each row corresponds to a sample and each column to an observed variable.
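A casting routine of that kind might look roughly like the sketch below. The column names (households, median_income) and the helper parse_row are hypothetical, chosen only to show one integer and one floating-point field.

```python
import csv
import io

# Hypothetical columns: a name, an integer count and a floating-point income.
SAMPLE_CSV = (
    "name,households,median_income\n"
    "Alpha,410,52000.5\n"
    "Beta,980,61250.0\n"
)

def parse_row(row):
    """Cast the numeric fields of one CSV row from strings to numbers."""
    return (int(row["households"]), float(row["median_income"]))

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
# One tuple per sample, one position per column.
table = [parse_row(row) for row in reader]
print(table)
```

If NumPy is installed, passing table to np.array should give a two-dimensional array with one row per sample.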

For the plots, check out Matplotlib. It makes generating graphs really easy, especially if you have some previous experience with Matlab.

It all depends on how the data is stored in the file (CSV, XML, YAML, JSON, Excel, ...).

Hopefully, you'll find that there is a library for that very format (e.g. csv: http://docs.python.org/2/library/csv.html)

Once you can read the file and get data, you have to store it in a suitable data structure and then pass it to some plotting library.

The plotting library can be a Python library (e.g. Matplotlib) or a separate piece of software (e.g. FusionCharts).

Here is a sample schema (you can skip some steps):

Data on disk (e.g. CSV) or on DB --> Read data and store internally --> transform data into plotting format (e.g. XML) --> give data to plotting library
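The steps in this schema can be sketched with the standard library alone. Everything here is an assumption for illustration: the "age" column, the sample values, the function names, and the text-based render function, which only stands in for a real plotting library.

```python
import csv
import io
from collections import Counter

# Hypothetical data: one age per line under an "age" header.
SAMPLE_CSV = "age\n23\n37\n41\n29\n35\n52\n38\n"

def load_ages(file_handle):
    """Read the CSV and store the ages internally as a list of ints."""
    return [int(row["age"]) for row in csv.DictReader(file_handle)]

def bin_by_decade(ages):
    """Transform the raw values into {decade: count}, ready for plotting."""
    return Counter(age // 10 * 10 for age in ages)

def render(bins):
    """Stand-in for a plotting library: one '#' per person in each bin."""
    return "\n".join(
        f"{decade}s {'#' * bins[decade]}" for decade in sorted(bins)
    )

ages = load_ages(io.StringIO(SAMPLE_CSV))
bins = bin_by_decade(ages)
print(render(bins))
```

Keeping reading, transforming and rendering in separate functions means the last step can be swapped for a real plotting call without touching the code that reads the file.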

Keep the MVC pattern in mind!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow