Object-oriented scientific data processing: how to cleverly fit data, analysis and visualization into objects?

StackOverflow https://stackoverflow.com/questions/17522492

Question

As a biology undergrad I often write Python software to do some data analysis. The general structure is always the same:

There is some data to load, some analysis to perform on it (statistics, clustering...), and then the results need to be visualized.

Sometimes, for the same experiment, the data can come in different formats, there can be different ways to analyse it, and there can be several possible visualizations, which may or may not depend on the analysis performed.

I'm struggling to find a generic, "pythonic" and object-oriented way to make this clear and easily extensible. It should be easy to add a new type of action or to make slight variations of existing ones, so I'm quite convinced that I should do this with OOP.

I've already written a Data object with methods to load the experimental data. If I have multiple data sources, I plan to create subclasses that override the load function.

After that... I'm not sure. Should I write an abstract Analysis class with a child class for each type of analysis (and use their attributes to store the results), do the same for Visualization, and have a general Experiment object hold the Data instance and the multiple Analysis and Visualization instances? Or should the visualizations be functions that take an Analysis and/or Data object(s) as parameter(s) in order to construct the plots? Is there a more efficient way? Am I missing something?


Solution

Your general idea would work. Here are some more details that will hopefully help you proceed:

  • Create an abstract Data class, with some generic methods like load, save, print, etc.
  • Create concrete subclasses for each specific form of data you are interested in. This might be task-specific (e.g. data for natural language processing) or form-specific (e.g. data given as a matrix, where each row corresponds to a different observation).
  • As you said, create an abstract Analysis class.
  • Create concrete subclasses for each form of analysis. Each concrete subclass should override a method process, which accepts a specific form of Data and returns a new instance of Data with the results (if you think the form of the results will differ from that of the input data, use a separate class Result).
  • Create a Visualization class hierarchy. Each concrete subclass should override a method visualize, which accepts a specific instance of Data (or Result, if you use a separate class) and returns a graph of some form.
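The hierarchy described in the list above could be sketched roughly as follows. This is only a minimal illustration, not a definitive design: the concrete class names (CsvLineData, SummaryStatistics, TextReport), the one-line comma-separated input format and the dictionary stored in Result are all assumptions made up for the example, and the visualization returns plain text instead of a real plot so that the sketch needs nothing beyond the standard library.

```python
from abc import ABC, abstractmethod
from statistics import mean, stdev


class Data(ABC):
    """Abstract container for experimental data."""

    def __init__(self):
        self.values = []

    @abstractmethod
    def load(self, source):
        """Fill self.values from a source; each subclass handles one format."""


class CsvLineData(Data):
    """Hypothetical format: a single line of comma-separated numbers."""

    def load(self, source):
        self.values = [float(x) for x in source.split(",")]
        return self  # returning self allows CsvLineData().load(...)


class Result:
    """Output of an analysis, kept separate from the input Data."""

    def __init__(self, name, summary):
        self.name = name
        self.summary = summary  # dict mapping a label to a number


class Analysis(ABC):
    @abstractmethod
    def process(self, data):
        """Take a Data instance and return a Result."""


class SummaryStatistics(Analysis):
    def process(self, data):
        summary = {"mean": mean(data.values), "stdev": stdev(data.values)}
        return Result("summary", summary)


class Visualization(ABC):
    @abstractmethod
    def visualize(self, result):
        """Take a Result and return some displayable object."""


class TextReport(Visualization):
    """Stand-in for a plot: formats the result as lines of text."""

    def visualize(self, result):
        lines = [f"{key}: {value:.2f}" for key, value in result.summary.items()]
        return "\n".join([result.name] + lines)


data = CsvLineData().load("1.0, 2.0, 3.0, 4.0")
result = SummaryStatistics().process(data)
report = TextReport().visualize(result)
print(report)
```

Adding a new data format, analysis or visualization then means adding one subclass, and any Visualization can be paired with any Analysis whose Result it understands.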

I do have a warning: Python is abstract, powerful and high-level enough that you often don't need to create your own OO design. It is frequently possible to do what you want with minimal code using numpy, scipy and matplotlib, so before you start the extra coding, be sure you need it :)
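For comparison, here is what the same load, analyse, visualize pipeline might look like with plain functions and no class hierarchy. It is a sketch under stated assumptions: the function names, the whitespace-separated input and the text bar chart are invented for illustration, and it sticks to the standard library, whereas in practice the three steps would more likely be a few numpy, scipy and matplotlib calls.

```python
from statistics import mean


def load_measurements(text):
    """Parse whitespace-separated numbers (hypothetical input format)."""
    return [float(x) for x in text.split()]


def center(values):
    """Analysis step: subtract the mean from every value."""
    average = mean(values)
    return [v - average for v in values]


def text_bar_chart(values, width=20):
    """Visualization step: one line per value, bar length scaled to |value|."""
    largest = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    rows = []
    for v in values:
        bar = "#" * round(abs(v) / largest * width)
        rows.append(f"{v:+.1f} {bar}")
    return "\n".join(rows)


centered = center(load_measurements("1 2 3"))
chart = text_bar_chart(centered)
print(chart)
```

Each step is an ordinary function that takes and returns plain lists, so variations are added by writing another function rather than another subclass.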

Other tips

It has been a while since you asked your question, but this might be interesting.

I created and actively develop a Python library to do exactly this (albeit with a slightly broader scope). It is designed so that you can fully customize your data processing, while still having some basic tools (including for plot

The library is called Experiment NoteBook (enb), and is available on GitHub (https://github.com/miguelinux314/experiment-notebook) and via pip (e.g., pip install enb).

I recommend that any interested reader take a look at the tutorial-like documentation (https://miguelinux314.github.io/experiment-notebook) to get an idea of the intended workflow.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow