I have found somewhat similar questions, but the answers that might work are too complex for me to adapt to what I need. I could use some help figuring out how to accomplish the following in Python:
I have a CSV file with three columns of data. Values in the first column are duplicated across rows, and I need to combine each set of duplicates into a single row along with specific data from columns two and three. The result should be another CSV file.
In addition, within each set of rows that share a column-one value, the data in columns two and three can appear in several arrangements that need to be combined. In other words, for the first instance of a column-one value: if column two is not empty, take that value and place it in column two of a "final" row; otherwise, take the value in column three and place it in column three of the "final" row. The rule I need to implement is: combine whatever column-two and column-three data exists in the first and last instances of each column-one value, keeping column-two data in column two and column-three data in column three. Also, no row in the source CSV ever has all three columns filled.
To better explain, here is the data as it appears in the source CSV.
These are examples of sets of rows in the source CSV that need to be combined:
Example 1: Here I have four rows with matching column-one data. As in all the examples, I need the result to be a single row containing the column-one value followed by the values found in the first and last instances of that value.
wp.xyz03.def02.01195.1,wp03.xyz03-c01_lc08_m00,
wp.xyz03.def02.01195.1,wp02.xyz03,
wp.xyz03.def02.01195.1,,wp01.def02
wp.xyz03.def02.01195.1,,wp02.def02-c02_lc14_m00
So the desired result for this group would be:
wp.xyz03.def02.01195.1,wp03.xyz03-c01_lc08_m00,wp02.def02-c02_lc14_m00
Example 2: Here I have three rows with matching column-one data. Again, I need the result to be a single row containing the column-one value followed by the values found in the first and last instances of that value.
wp.atl21.lmn01.01193.2,wp03.atl21-c06_lc14_m00,
wp.atl21.lmn01.01193.2,wp02.atl21,
wp.atl21.lmn01.01193.2,,wp03.lmn01
So the desired result for this group would be:
wp.atl21.lmn01.01193.2,wp03.atl21-c06_lc14_m00,wp03.lmn01
Example 3: Here I have three rows with matching column-one data. Again, I need the result to be a single row containing the column-one value followed by the values found in the first and last instances of that value. Note that in this example the first row has no value in column two; the desired value is in column three instead.
tp.ghi03.ghi05.02194.65,,tp05.ghi05:1
tp.ghi03.ghi05.02194.65,tp05.ghi03:2,
tp.ghi03.ghi05.02194.65,tp05.ghi03-c06_lc11_m00,
So the desired result for this group would be:
tp.ghi03.ghi05.02194.65,tp05.ghi03-c06_lc11_m00,tp05.ghi05:1
Putting it all together:
This:
wp.xyz03.def02.01195.1,wp03.xyz03-c01_lc08_m00,
wp.xyz03.def02.01195.1,wp02.xyz03,
wp.xyz03.def02.01195.1,,wp01.def02
wp.xyz03.def02.01195.1,,wp02.def02-c02_lc14_m00
wp.atl21.lmn01.01193.2,wp03.atl21-c06_lc14_m00,
wp.atl21.lmn01.01193.2,wp02.atl21,
wp.atl21.lmn01.01193.2,,wp03.lmn01
tp.ghi03.ghi05.02194.65,,tp05.ghi05:1
tp.ghi03.ghi05.02194.65,tp05.ghi03:2,
tp.ghi03.ghi05.02194.65,tp05.ghi03-c06_lc11_m00,
Needs to turn into this:
wp.xyz03.def02.01195.1,wp03.xyz03-c01_lc08_m00,wp02.def02-c02_lc14_m00
wp.atl21.lmn01.01193.2,wp03.atl21-c06_lc14_m00,wp03.lmn01
tp.ghi03.ghi05.02194.65,tp05.ghi03-c06_lc11_m00,tp05.ghi05:1
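To make the intended transformation concrete, here is a minimal sketch of the rule as I understand it, using `itertools.groupby`. It rests on assumptions: rows sharing a column-one value are adjacent (as they are in my data), every row has exactly three fields, and if the first and last rows both fill the same column, the first row's value wins. The names `SAMPLE` and `merge_groups` are made up for illustration.

```python
import csv
import io
from itertools import groupby

# The ten source rows from above, inlined so the sketch is self-contained.
SAMPLE = """\
wp.xyz03.def02.01195.1,wp03.xyz03-c01_lc08_m00,
wp.xyz03.def02.01195.1,wp02.xyz03,
wp.xyz03.def02.01195.1,,wp01.def02
wp.xyz03.def02.01195.1,,wp02.def02-c02_lc14_m00
wp.atl21.lmn01.01193.2,wp03.atl21-c06_lc14_m00,
wp.atl21.lmn01.01193.2,wp02.atl21,
wp.atl21.lmn01.01193.2,,wp03.lmn01
tp.ghi03.ghi05.02194.65,,tp05.ghi05:1
tp.ghi03.ghi05.02194.65,tp05.ghi03:2,
tp.ghi03.ghi05.02194.65,tp05.ghi03-c06_lc11_m00,
"""


def merge_groups(rows):
    """Yield one merged row per run of adjacent rows sharing column one."""
    rows = (r for r in rows if r)  # skip blank lines
    for key, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        first, last = group[0], group[-1]
        # Assumption: the first row's value wins if both rows fill a column.
        yield [key, first[1] or last[1], first[2] or last[2]]


merged = list(merge_groups(csv.reader(io.StringIO(SAMPLE))))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue(), end="")
```

Note that `groupby` only groups consecutive rows, so if the duplicates were scattered through the file the rows would need to be sorted on column one first.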
I've tried a number of things to accomplish this, but I cannot achieve the desired result without quickly getting into very unfamiliar territory.
This is my original attempt. It cuts off some of the necessary values because it writes the row out as soon as it has three values and never checks whether another row for the same column-one value might follow:
import csv

reader = csv.reader(open('parse_lur_luraz_clean_temp.csv', 'r'), delimiter=',')
final = ['-','-','-']
parselur = ['-']
lur_a = ""
lur_z = ""
for row in reader:
    if row[0] != parselur[0]:
        # New column-one value: start a fresh "final" row
        final = ['-','-','-']
        if row[1] != '': lur_a = row[1]
        if row[2] != '': lur_z = row[2]
        parselur[0] = row[0]
    elif row[0] == parselur[0]:
        # Same column-one value as the previous row
        if row[1] == '':
            lur_a = row[1]
        elif row[1] != '':
            lur_a = row[1]
        if row[2] == '':
            lur_z = row[2]
        elif row[2] != '':
            lur_z = row[2]
        parselur[0] = row[0]
    if parselur[0] != '' and parselur[0] not in final: final[0] = parselur[0]
    if lur_a != '':
        if final[1] == '-' or '_lc' not in final[1]: final[1] = lur_a
        lur_a = ''
    if lur_z != '':
        if final[2] == '-' or '_lc' not in final[2]: final[2] = lur_z
        lur_z = ''
    # Writes as soon as all three slots are filled, even if more rows follow
    if len(final) == 3 and '-' not in final:
        fd = open('final_alu_nsn_temp.csv','a')
        writer = csv.writer(fd)
        writer.writerow((final))
        fd.close()
        final = ['-','-','-']
    else:
        parselur[0] = row[0]
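What I think the loop should be doing instead is holding on to the first and last row of the current group and writing only when the column-one value changes (plus once more at the end of the file). Below is a rough sketch of that idea, not a tested solution for my real data. It assumes matching rows are adjacent and that every row has three fields; `combine_csv` is a made-up name, and choosing the first row's value when both rows fill the same column is my own assumption.

```python
import csv


def combine_csv(src_path, dst_path):
    """Write one row per run of identical column-one values in src_path."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        first = last = None  # first and last row seen for the current key

        def flush():
            # Emit the finished group, if any. Assumption: the first row's
            # value wins when both rows fill the same column.
            if first is not None:
                writer.writerow([first[0], first[1] or last[1], first[2] or last[2]])

        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines
            if first is not None and row[0] != first[0]:
                flush()       # key changed, so the previous group is complete
                first = None
            if first is None:
                first = row
            last = row
        flush()  # the final group has no key change after it to trigger a write
```

With my file names this would be called as `combine_csv('parse_lur_luraz_clean_temp.csv', 'final_alu_nsn_temp.csv')`. Unlike my attempt, this sketch overwrites the output file rather than appending to it.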