Hi I am trying to create a new CSV file from merge of specific fields in two CSV files based on a common column or primary key. I have tried doing the same thing in powershell and it worked but was very slow in completing the process - more than 30 minutes for merging 5000+ line files so trying this in Python. I am new so please go easy on me.

So two files are infile.csv and checkfile.csv and the columns in the output file created would be based on columns in infile.csv. The code checks the values in checkfile.csv, creates outfile.csv, copies columns from infile.csv and needs to rewrite values for two fields based on corresponding values in checkfile.com. Following are the details

infile.csv -

"StockNumber","SKU","ChannelProfileID","CostPrice"
"10m_s-vid#APTIIAMZ","2VV-10",3746,0.33
"10m_s-vid#CSE","2VV-10",3746,0.98
"1RR-01#CSE","1RR-01",3746
"1RR-01#PCAWS","1RR-01",3746,
"1m_s-vid_ext#APTIIAMZ","2VV-101",3746,0.42

checkfile.csv

ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock, Manufacturer, SupplierProductCode, ManuCode, LeadTime
2VV-03,3MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.43,,930,CDL,2VV-03,2VV-03,1
2VV-05,5MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.54,,1935,CDL,2VV-05,2VV-05,1
2VV-10,10MTR BLACK SVHS M - M GOLD CABLE - B/Q 50,Cables Direct Ltd,0.86,,1991,CDL,2VV-10,2VV-10,1

The outfile.csv I am getting is -

StockNumber,SKU,ChannelProfileID,CostPrice
10m_s-vid#APTIIAMZ,2VV-10,"(' ',)", 
10m_s-vid#CSE,2VV-10,"(' ',)", 
1RR-01#CSE,1RR-01,"(' ',)", 
1RR-01#PCAWS,1RR-01,"(' ',)", 
1m_s-vid_ext#APTIIAMZ,2VV-101,"(' ',)", 

But the outfile.csv I need is -

StockNumber,SKU,ChannelProfileID,CostPrice
10m_s-vid#APTIIAMZ,2VV-10,1991,0.86  
10m_s-vid#CSE,2VV-10,1991,0.86   
1RR-01#CSE,1RR-01
1RR-01#PCAWS,1RR-01          
1m_s-vid_ext#APTIIAMZ,2VV-101

Finally the code -

import csv

with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    product_result = dict(
        ((v['ProductCode'], v[' Stock']), (v['ProductCode'], v[' CostPrice']))  for v in checkreader
    )

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)

        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        for item in reader:
            result = product_result.get(item['SKU'], " ")

            item['ChannelProfileID'] = result,
            item['CostPrice'] = result

            writer.writerow(item)
有帮助吗?

解决方案

You could make it a little simpler:

import csv

with open('checkfile.csv', 'rb') as checkfile:
    product_result = {
        record['ProductCode']: record for record in csv.DictReader(checkfile)}

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()
        for item in reader:
            record = product_result.get(item['SKU'], None)
            if record:
                item['ChannelProfileID'] = record[' Stock']  # ???
                item['CostPrice'] = record[' CostPrice']
            else:
                item['ChannelProfileID'] = None
                item['CostPrice'] = None
            writer.writerow(item)

I wasn't sure about the line which I commented with ???.

Also, if you really want to produce broken CSV, please feel free to omit the else-clause.

I tested it with StringIO objects. It produced the result you specified, but with trailing commas, where there was no match in checkfile.

And I used Python 2.7 dict comprehension, since you tagged your question with python-2.7.

其他提示

import csv

product_result = {}

with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    for v in checkreader:
        product_result[v['ProductCode']] = (v[' Stock'], v[' CostPrice'])

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        for item in reader:
            result = product_result.get(item['SKU'])
            if result:
               item['ChannelProfileID'], item['CostPrice'] = result
            else:
               item['ChannelProfileID'] = item['CostPrice'] = None

            writer.writerow(item)
import re
import csv
import collections
import glob

# Variables

total_record = []
headerCount = 0

for file in glob.glob("*.csv"):
    print(file)

    with open(file, 'r') as f:
        reader = csv.reader(f)
        list_record = list(reader)
        if headerCount == 0:
            headerCount = 1
            total_record.extend(list_record)
        else:
            list_record.pop(0)
            total_record.extend(list_record)

with open('combine.csv', 'w') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerows(total_record)
许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top