Parsing CSV data based on header fields using Pyparsing

Question

CSV file processing

Check the documentation for the built-in csv module. There you will find DictReader, which processes a CSV file with a header row and provides an iterator that returns one dictionary per record (line), with the field names as keys and the field values as values.

Given this data in the file "data.csv":

name;surname
Jan;Vlcinsky
Pieter;Pan
Jane;Fonda

you can then test it from the console:

>>> from csv import DictReader
>>> fname = "data.csv"
>>> f = open(fname)
>>> reader = DictReader(f, delimiter=";")
>>> for rec in reader:
...     print rec
...
{'surname': 'Vlcinsky', 'name': 'Jan'}
{'surname': 'Pan', 'name': 'Pieter'}
{'surname': 'Fonda', 'name': 'Jane'}

Using your data and emulating open files using StringIO:

from StringIO import StringIO
from csv import DictReader

data1 = """
FirstName Surname Address Notes PurchaseOrder OrderDate
"Bob" "Smith" "123 Lucky Street" "Bad customer" "123ABC", 2013/10/20
"Zoe" "Jackson" "5 Mountain View Street" "Good customer" "abc211" 2014/01/01
""".strip()


data2 = """
FirstName Surname Address PhoneHome PhoneMobile PurchaseOrder OrderDate Total
"Bob" "Smith" "123 Lucky Street" "12345678" "1234567890" "123ABC" 2013/10/20, $100
"Zoe" "Jackson" "5 Mountain View Street" "87654321" "0987654321" "abc211" 2014/01/01 $1000
""".strip()

buf1 = StringIO(data1)
buf2 = StringIO(data2)

reader = DictReader(buf1, delimiter=" ")
for rec in reader:
    print rec

print "---next one comes---"

reader = DictReader(buf2, delimiter=" ")
for rec in reader:
    print rec

This prints:

{'Surname': 'Smith', 'FirstName': 'Bob', 'Notes': 'Bad customer', 'PurchaseOrder': '123ABC,', 'Address': '123 Lucky Street', 'OrderDate': '2013/10/20'}
{'Surname': 'Jackson', 'FirstName': 'Zoe', 'Notes': 'Good customer', 'PurchaseOrder': 'abc211', 'Address': '5 Mountain View Street', 'OrderDate': '2014/01/01'}
---next one comes---
{'Surname': 'Smith', 'FirstName': 'Bob', 'PhoneMobile': '1234567890', 'PhoneHome': '12345678', 'PurchaseOrder': '123ABC', 'Address': '123 Lucky Street', 'Total': '$100', 'OrderDate': '2013/10/20,'}
{'Surname': 'Jackson', 'FirstName': 'Zoe', 'PhoneMobile': '0987654321', 'PhoneHome': '87654321', 'PurchaseOrder': 'abc211', 'Address': '5 Mountain View Street', 'Total': '$1000', 'OrderDate': '2014/01/01'}

This way the parsing part is done, and the only remaining step is to create proper objects from the records.
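The trailing commas visible in values such as '123ABC,' and '2013/10/20,' could be removed right after reading. What follows is only a minimal sketch, written for Python 3 (io.StringIO instead of the StringIO module, print as a function). The helper name clean_record is hypothetical, and the sketch assumes a trailing comma is never a legitimate part of a value:

```python
from csv import DictReader
from io import StringIO

data = '''
FirstName Surname Address Notes PurchaseOrder OrderDate
"Bob" "Smith" "123 Lucky Street" "Bad customer" "123ABC", 2013/10/20
"Zoe" "Jackson" "5 Mountain View Street" "Good customer" "abc211" 2014/01/01
'''.strip()


def clean_record(rec):
    """Return a plain dict with trailing commas stripped from string values.

    Hypothetical helper; assumes a trailing comma is always a typo.
    """
    return {key: value.rstrip(",") if isinstance(value, str) else value
            for key, value in rec.items()}


# Read every record and clean it before any further processing.
records = [clean_record(rec)
           for rec in DictReader(StringIO(data), delimiter=" ")]
for rec in records:
    print(rec)
```

Cleaning the values in one place keeps the later object construction free of data-repair logic.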

Playing with classes and printing

The question uses Pyparsing to produce something like class instances. Here is an example of how we can create classes of our own instead.

File classes.py:

class Base(object):
    templ = """
    - Base:
        - ????
    """
    reprtempl = "<Base: {self.__dict__}>"
    def report(self):
        return self.templ.strip().format(self=self)
    def __repr__(self):
        return self.reprtempl.format(self=self)


class Customer(Base):
    templ = """
    - Customer:
        - Address: {self.address}
        - Firstname: {self.first_name}
        - Surname: {self.surname}
        - Notes: {self.notes}
    """
    reprtempl = "<Customer: {self.__dict__}>"

    def __init__(self, first_name, surname, address, phone_home=None, phone_mobile=None, notes=None, **kwargs):
        self.first_name = first_name
        self.surname = surname
        self.address = address
        self.notes = notes
        self.phone_home = phone_home
        self.phone_mobile = phone_mobile

class Order(Base):
    templ = """
    - Order:
        - Order_date: {self.order_date}
        - Purchase_order: {self.purchase_order}
        - Total: {self.total}
    """
    reprtempl = "<Order: {self.__dict__}>"

    def __init__(self, order_date, purchase_order, total=None, **kwargs):
        self.order_date = order_date
        self.purchase_order = purchase_order
        self.total = total

if __name__ == "__main__":
    customer_dct = {"first_name": "Bob", "surname": "Smith", "address": "Sezam Street 1A",
            "phone_home": "11223344", "phone_mobile": "88990077"}
    customer = Customer(**customer_dct)
    print customer
    print customer.report()
    order_dct = {"order_date": "2014/01/01", "purchase_order": "abc211", "total": "$12"}
    order = Order(**order_dct)
    print order
    print order.report()

The Base class implements __repr__ and report, and serves as the common base for the Customer and Order classes that follow.

The constructors use default values (for attributes we expect to be missing sometimes) and **kwargs, which makes each constructor tolerant of extra (unexpected) named parameters.
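To illustrate the effect of the default values and **kwargs, here is a minimal Python 3 sketch. The class is a cut-down stand-in for Order from classes.py (not the full class), and the record dictionary is made up for the illustration:

```python
class Order(object):
    """Cut-down stand-in for the Order class in classes.py."""

    def __init__(self, order_date, purchase_order, total=None, **kwargs):
        # **kwargs silently absorbs keys that do not belong to an order,
        # e.g. the customer fields present in the same CSV record.
        self.order_date = order_date
        self.purchase_order = purchase_order
        self.total = total  # stays None when the record has no "total" key


# A made-up record mixing customer and order fields, without "total".
rec = {"first_name": "Bob", "surname": "Smith",
       "order_date": "2013/10/20", "purchase_order": "123ABC"}

order = Order(**rec)
print(order.order_date, order.purchase_order, order.total)
```

Without **kwargs in the signature, the same call would raise a TypeError because of the unexpected first_name and surname keywords.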

The final section, if __name__ ..., contains short test code. If you run

$ python classes.py

you will see the classes being instantiated and used.

Using classes together with csv reading

Note: The following code uses slightly modified field names to follow the naming conventions of the Python classes. The original field names could still be used, but a keyword translation step would have to be added to map them to the constructor parameter names (I skipped that).

from StringIO import StringIO
from csv import DictReader
from classes import Customer, Order

data1 = """
first_name surname address notes purchase_order order_date
"Bob" "Smith" "123 Lucky Street" "Bad customer" "123ABC", 2013/10/20
"Zoe" "Jackson" "5 Mountain View Street" "Good customer" "abc211" 2014/01/01
""".strip()


data2 = """
first_name surname address phone_home phone_mobile purchase_order order_date total
"Bob" "Smith" "123 Lucky Street" "12345678" "1234567890" "123ABC" 2013/10/20, $100
"Zoe" "Jackson" "5 Mountain View Street" "87654321" "0987654321" "abc211" 2014/01/01 $1000
""".strip()

buf1 = StringIO(data1)
buf2 = StringIO(data2)

reader = DictReader(buf1, delimiter=" ")
for rec in reader:
    print rec
    customer = Customer(**rec)
    print customer.report()
    order = Order(**rec)
    print order
    print order.report()

print "---next one comes---"

reader = DictReader(buf2, delimiter=" ")
for rec in reader:
    print rec
    customer = Customer(**rec)
    print customer.report()
    order = Order(**rec)
    print order
    print order.report()
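The keyword translation step skipped in the note above could be sketched roughly as follows. This is only one possible approach, written for Python 3. The helper name to_snake_case is hypothetical, and the sketch assumes that all headers are CamelCase and that every row has exactly as many fields as the header:

```python
import re
from csv import DictReader
from io import StringIO


def to_snake_case(name):
    """Convert a CamelCase header such as 'FirstName' to 'first_name'."""
    # Insert "_" before every uppercase letter that is not at the start,
    # then lowercase the whole name.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


data = '''
FirstName Surname Address PhoneHome PhoneMobile PurchaseOrder OrderDate Total
"Zoe" "Jackson" "5 Mountain View Street" "87654321" "0987654321" "abc211" 2014/01/01 $1000
'''.strip()

reader = DictReader(StringIO(data), delimiter=" ")
# Rebuild each record with translated keys; the values stay untouched.
records = [{to_snake_case(key): value for key, value in rec.items()}
           for rec in reader]
print(records[0])
```

The translated records carry the same keys as the constructor parameters, so they can be passed on as Customer(**rec) and Order(**rec) without editing the header line of the data.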

Conclusions

The Python csv module provides DictReader, which returns each record as a dictionary.
Custom classes in Python can be constructed from a set of keyword parameters and can implement handy methods (here, e.g., report).
The example could be extended further, e.g. to manage relations between customer and order, but that is out of scope of this answer.