Question

Okay so I have a transaction file:

IN CU
     Customer_ID=
     Last_Name=Johnston
     First_Name=Karen
     Street_Address=291 Stone Cr
     City=Toronto
//
IN VE
     License_Plate#=LSR976
     Make=Cadillac
     Model=Seville
     Year=1996
     Owner_ID=779
//
IN SE
     Vehicle_ID=LSR976
     Service_Code=461
     Date_Scheduled=00/12/19

IN means insert, and CU (customer) identifies the file we are writing to, in this case customer.diff. The problem I'm having is that I need to go through each line and check the value of each field (Customer_ID, for example). See how Customer_ID is left blank? I need to replace any blank numeric field with 0, so Customer_ID=0 in this case. Here's what I have so far, but nothing is changing:

def insertion():
    field_names = {'Customer_ID=': 'Customer_ID=0',
'Home_Phone=':'Home_Phone=0','Business_Phone=': 'Business_Phone=0'}

    with open('xactions.two.txt', 'r') as from_file:
        search_lines = from_file.readlines()


    if search_lines[3:5] == 'CU':
        for i in search_lines:
            if field_names[i] == True:
                with open('customer.diff', 'w') as to_file:
                    to_file.write(field_names[i])

Thanks


Solution

Why not try something a little simpler? I haven't tested this code.

def insertion():
    # Numeric fields that should default to 0 when left blank
    field_names = {'Customer_ID=': 'Customer_ID=0',
                   'Home_Phone=': 'Home_Phone=0',
                   'Business_Phone=': 'Business_Phone=0'}

    with open('xactions.two.txt', 'r') as from_file:
        with open('customer.diff', 'w') as to_file:
            for line in from_file:
                line = line.rstrip("\n")
                found = False
                for field in field_names.keys():
                    # Match only a blank field, i.e. nothing follows the "=";
                    # a substring test would also turn "Customer_ID=779" into "Customer_ID=7790"
                    if line.strip() == field:
                        to_file.write(line.rstrip() + "0")
                        found = True
                if not found:
                    to_file.write(line)
                to_file.write("\n")
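
A variation on the same idea, as a rough sketch: a regular expression can decide whether a numeric field is blank, and a generator can stream the lines through. The names `NUMERIC_FIELDS`, `BLANK_FIELD` and `fill_blank_numeric` are made up for illustration, and the field list is an assumption taken from the question's dictionary.

```python
import re

# Assumed list of numeric fields; extend it to match the real file layout.
NUMERIC_FIELDS = ("Customer_ID", "Home_Phone", "Business_Phone")

# Matches e.g. "     Customer_ID=" with nothing but whitespace after the "=".
BLANK_FIELD = re.compile(
    r"^(\s*(?:%s)=)\s*$" % "|".join(re.escape(name) for name in NUMERIC_FIELDS)
)

def fill_blank_numeric(lines):
    """Yield each line without its newline; blank numeric fields get a 0 appended."""
    for line in lines:
        line = line.rstrip("\n")
        match = BLANK_FIELD.match(line)
        yield (match.group(1) + "0") if match else line

sample = [
    "IN CU\n",
    "     Customer_ID=\n",
    "     Last_Name=Johnston\n",
    "     Customer_ID=779\n",
]
result = list(fill_blank_numeric(sample))
```

Because the pattern is anchored at both ends, a field that already has a value (such as `Customer_ID=779`) passes through untouched.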

Other tips

Here's a fairly comprehensive approach; it's a bit long, but not as complicated as it looks!

I assume Python 3.x, although it should work in Python 2.x with few changes. I make extensive use of generators to stream data through rather than holding it in memory.

To start with: we are going to define the expected data-type for each field. Some fields do not correspond to built-in Python data types, so I start by defining some custom data types for those fields:

import time

class Date:
    def __init__(self, s):
        """
        Parse a date provided as "yy/mm/dd"
        """
        if s.strip():
            self.date = time.strptime(s, "%y/%m/%d")
        else:
            self.date = time.gmtime(0.)

    def __str__(self):
        """
        Return a date as "yy/mm/dd"
        """
        return time.strftime("%y/%m/%d", self.date)

def Int(s):
    """
    Parse a string to integer ("" => 0)
    """
    if s.strip():
        return int(s)
    else:
        return 0

class Year:
    def __init__(self, s):
        """
        Parse a year provided as "yyyy"
        """
        if s.strip():
            self.date = time.strptime(s, "%Y")
        else:
            self.date = time.gmtime(0.)

    def __str__(self):
        """
        Return a year as "yyyy"
        """
        return time.strftime("%Y", self.date)
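
To make the blank-value behaviour concrete, here is a small function-based sketch of the same idea. The names `int_or_zero` and `date_or_epoch` are hypothetical stand-ins for `Int` and `Date` above, not part of the original code.

```python
import time

def int_or_zero(s):
    """Parse an integer, treating a blank string as 0."""
    return int(s) if s.strip() else 0

def date_or_epoch(s):
    """Parse "yy/mm/dd" and format it back; a blank string falls back to the Unix epoch."""
    parsed = time.strptime(s, "%y/%m/%d") if s.strip() else time.gmtime(0)
    return time.strftime("%y/%m/%d", parsed)

blank_id = int_or_zero("")               # blank numeric field
owner_id = int_or_zero("779")            # ordinary numeric field
scheduled = date_or_epoch("00/12/19")    # round-trips unchanged
blank_date = date_or_epoch("")           # epoch, formatted as yy/mm/dd
```

Note that a blank date comes out as `70/01/01` (the epoch), which may or may not be the placeholder you want in the output file.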

Now we set up a table, defining what type each field should be:

# Expected data-type of each field:
#   data_types[section][field] = type
data_types = {
    "CU": {
        "Customer_ID":    Int,
        "Last_Name":      str,
        "First_Name":     str,
        "Street_Address": str,
        "City":           str
    },
    "VE": {
        "License_Plate#": str,
        "Make":           str,
        "Model":          str,
        "Year":           Year,
        "Owner_ID":       Int
    },
    "SE": {
        "Vehicle_ID":     str,
        "Service_Code":   Int,
        "Date_Scheduled": Date
    }
}

We parse the input file; this is by far the most complicated bit! It's a finite state machine implemented as a generator function, yielding a section at a time:

# Customized error-handling
class TransactionError         (Exception): pass
class EntryNotInSectionError   (TransactionError): pass
class MalformedLineError       (TransactionError): pass
class SectionNotTerminatedError(TransactionError): pass
class UnknownFieldError        (TransactionError): pass
class UnknownSectionError      (TransactionError): pass

def read_transactions(fname):
    """
    Read a transaction file
    Return a series of ("section", {"key": "value"})
    """
    section, accum = None, {}
    with open(fname) as inf:
        for line_no, line in enumerate(inf, 1):
            line = line.strip()

            if not line:
                # blank line - skip it
                pass
            elif line == "//":
                # end of section - return any accumulated data
                if accum:
                    yield (section, accum)
                section, accum = None, {}
            elif line[:3] == "IN ":
                # start of section
                if accum:
                    raise SectionNotTerminatedError(
                       "Line {}: Preceding {} section was not terminated"
                       .format(line_no, section)
                    )
                else:
                    section = line[3:].strip()
                    if section not in data_types:
                        raise UnknownSectionError(
                            "Line {}: Unknown section type {}"
                            .format(line_no, section)
                        )
            else:
                # data entry: "key=value"
                if section is None:
                    raise EntryNotInSectionError(
                        "Line {}: '{}' should be in a section"
                        .format(line_no, line)
                    )
                pair = line.split("=")
                if len(pair) != 2:
                    raise MalformedLineError(
                        "Line {}: '{}' could not be parsed as a key/value pair"
                        .format(line_no, line)
                    )
                key,val = pair
                if key not in data_types[section]:
                    raise UnknownFieldError(
                        "Line {}: unrecognized field name {} in section {}"
                        .format(line_no, key, section)
                    )
                accum[key] = val.strip()

        # end of file - nothing should be left over
        if accum:
            raise SectionNotTerminatedError(
               "End of file: Preceding {} section was not terminated"
               .format(section)
            )
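
One thing to be aware of: `line.split("=")` produces more than two pieces when a value itself contains `=`, so such a line is rejected as malformed. If that is not the behaviour you want, `str.partition` splits on the first `=` only. A minimal sketch, where `split_pair` is a hypothetical helper name:

```python
def split_pair(line):
    """Split "key=value" on the first "=" only; raise ValueError if there is no "="."""
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError("not a key/value pair: {!r}".format(line))
    return key.strip(), value.strip()

blank = split_pair("Customer_ID=")     # blank value is kept as an empty string
nested = split_pair("Note=a=b")        # everything after the first "=" is the value
```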

Now that the file is read, the rest is easier. We do type-conversion on each field, using the lookup table we defined above:

def format_field(section, key, value):
    """
    Cast a field value to the appropriate data type
    """
    return data_types[section][key](value)

def format_section(section, accum):
    """
    Cast all values in a section to the appropriate data types
    """
    return (section, {key:format_field(section, key, value) for key,value in accum.items()})
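
As an illustration of the table-driven conversion, here is a self-contained sketch with a trimmed-down table. `to_int`, `demo_types` and `convert_section` are illustrative names standing in for `Int`, `data_types` and `format_section`.

```python
def to_int(s):
    """Blank string becomes 0; anything else is parsed as an integer."""
    return int(s) if s.strip() else 0

# Trimmed-down stand-in for the data_types table (one section only).
demo_types = {"CU": {"Customer_ID": to_int, "Last_Name": str}}

def convert_section(section, raw):
    """Look up each field's converter by section and key, then apply it."""
    return {key: demo_types[section][key](value) for key, value in raw.items()}

converted = convert_section("CU", {"Customer_ID": "", "Last_Name": "Johnston"})
```

The blank `Customer_ID` comes back as the integer 0, while `Last_Name` is passed through `str` unchanged.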

and write the results back to file:

def write_transactions(fname, transactions):
    with open(fname, "w") as outf:
        for section,accum in transactions:
            # start section
            outf.write("IN {}\n".format(section))
            # write key/value pairs in order by key
            keys = sorted(accum.keys())
            for key in keys:
                outf.write("    {}={}\n".format(key, accum[key]))
            # end section
            outf.write("//\n")
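
Sorting the keys changes the field order relative to the input file. If the original order matters, one option (assuming Python 3.7+, where a dict keeps insertion order) is to skip the sort. A minimal sketch; `write_section` is a hypothetical helper, and `io.StringIO` stands in for a real file:

```python
import io

def write_section(outf, section, fields):
    """Write one section, keeping the fields in their insertion order."""
    outf.write("IN {}\n".format(section))
    for key, value in fields.items():   # no sorted(): keep the order fields were read in
        outf.write("    {}={}\n".format(key, value))
    outf.write("//\n")

buffer = io.StringIO()
write_section(
    buffer,
    "CU",
    {"Customer_ID": 0, "Last_Name": "Johnston", "First_Name": "Karen"},
)
text = buffer.getvalue()
```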

All the machinery is in place; we just have to call it:

def main():
    INPUT  = "transaction.txt"
    OUTPUT = "customer.diff"
    transactions = read_transactions(INPUT)
    cleaned_transactions = (format_section(section, accum) for section,accum in transactions)
    write_transactions(OUTPUT, cleaned_transactions)

if __name__=="__main__":
    main()

Hope that helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow