Modify URL-strings in CSV file - Output file contains each character in individual cell

https://stackoverflow.com/questions/23597174

20-07-2023
Question

I am trying to write a function which allows me to remove certain elements from URLs. These URLs are stored in a CSV called Backlink_Test. I would like to iterate over each item in that list of URLs, remove the unwanted elements from the URL, and then add the modified URLs to a new list, which is then stored in a new CSV called Cleaned_URLs.

The code works to the extent that I can open the source file, run the loop and then store the results in the destination file. However, I am encountering quite an annoying problem: in the destination file, the URLs are stored with each character in an individual cell, rather than the whole URL in one cell.

This surprised me, as I did a little test where I literally copied the contents from one CSV to another (without modifying anything) and words with multiple characters were stored just fine. So my suspicion is that the for-loop creates the problem?

Any help / insight would be much appreciated! Code below, and screenshot of destination file attached.

import csv

new_strings = []    

#replace unwanted elements and add cleaned strings to new list
with open("Backlink_Test.csv", "rb") as csvfile:
    reader = csv.reader(csvfile)
    for string in reader:
        string = str(string) 
        string = string.replace("www.", "").replace("http://", "").replace("https://", "")
        new_strings.append(string)

new_strings.sort()
print new_strings #for testing only; will be removed once function is working

cleaned_file = open("Cleaned_URLS.csv", "w")
writer = csv.writer(cleaned_file)
writer.writerows(new_strings)
cleaned_file.close()

Screenshot of destination file

Here is now the working code:

import csv

new_strings = []    

#replace unwanted elements and add cleaned strings to new list
with open("Backlink_Test.csv", "rb") as csvfile:
    reader = csv.reader(csvfile)
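    for row in reader:
        if not row:
            continue  # skip blank lines
        # each row is a list of cells; this assumes the URL is in the first cell
        string = row[0]
        string = string.replace("www.", "").replace("http://", "").replace("https://", "")
        new_strings.append(string)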

new_strings.sort()
print new_strings

cleaned_file = open("Cleaned_URLS.csv", "w")
writer = csv.writer(cleaned_file)
for url in new_strings:
    writer.writerow([url])

cleaned_file.close()

Solution

csvwriter.writerows expects an iterable of rows. A row is an iterable of cells.

You're feeding it a list of strings. Since a string is itself an iterable of characters, each string is treated as a row and every character in it as a cell, and that is exactly what gets written.

The underlying mistake is assuming that csv.reader yields strings. It yields rows, and each row is a list of strings (one per cell).
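The difference is easy to see with an in-memory buffer. The following is a minimal sketch in Python 3 (the question's code is Python 2), with io.StringIO standing in for the output file and the sample URLs made up for illustration:

```python
import csv
import io

urls = ["example.com/a", "example.org/b"]  # hypothetical sample data

# A list of strings: each string is treated as a row,
# so every character ends up in its own cell.
wrong = io.StringIO()
csv.writer(wrong).writerows(urls)

# A list of one-element lists: each URL is a row with a single cell.
right = io.StringIO()
csv.writer(right).writerows([[url] for url in urls])

print(wrong.getvalue())
print(right.getvalue())
```

The first buffer contains comma-separated single characters, while the second contains one whole URL per line.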

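Update: wrap each URL in a list so that it is written as a row with a single cell (urls here is your list of cleaned strings, new_strings in the question):

for url in urls:
    writer.writerow([url])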
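Putting the pieces together, the whole task could look roughly like the sketch below. It is written for Python 3 rather than the Python 2 used in the question. The function names strip_prefixes and clean_csv are hypothetical, and it assumes the source file holds one URL per row in the first column.

```python
import csv


def strip_prefixes(url):
    """Remove "www.", "http://" and "https://", as the question's code does."""
    return url.replace("www.", "").replace("http://", "").replace("https://", "")


def clean_csv(src_path, dst_path):
    """Read URLs from src_path, clean and sort them, write them to dst_path."""
    # newline="" is what the csv module documentation recommends in Python 3
    with open(src_path, newline="") as src:
        # Assumption: one URL per row, in the first column; blank rows are skipped
        urls = [strip_prefixes(row[0]) for row in csv.reader(src) if row]

    urls.sort()

    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for url in urls:
            writer.writerow([url])  # one-element list: one cell per row
    return urls
```

With the file names from the question, this would be called as clean_csv("Backlink_Test.csv", "Cleaned_URLS.csv").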

OTHER TIPS

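That's what Python does when you loop over a string instead of a list: you get one character at a time. Examine the return value from csv.reader() and adjust your code accordingly. In particular, string = str(string) flattens your input: it turns each row (a list) into its string representation, brackets and quotes included, rather than giving you the URL itself.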
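A quick way to examine what csv.reader() returns is to look at a single row. This is a small Python 3 sketch; the in-memory one-column input is made up and merely stands in for Backlink_Test.csv:

```python
import csv
import io

# Hypothetical one-column input standing in for Backlink_Test.csv
source = io.StringIO("http://www.example.com\r\n")

row = next(csv.reader(source))

print(type(row))        # the reader yields lists, not strings
flattened = str(row)    # the list's repr, brackets and quotes included
url = row[0]            # the actual contents of the first cell

print(flattened)
print(url)
```

Indexing the row (row[0]) gives the URL as a plain string, which is what the replace calls should operate on.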

Licensed under: CC-BY-SA with attribution
