I am trying to write a function that removes certain elements from URLs. These URLs are stored in a CSV called Backlink_Test. I would like to iterate over each item in that list of URLs, remove the unwanted elements from the URL, and then add the modified URLs to a new list, which is then stored in a new CSV called Cleaned_URLs.
The code works to the extent that I can open the source file, run the loop, and store the results in the destination file. However, I am encountering quite an annoying problem: in the destination file, each character of a URL is stored in its own cell, rather than the whole URL in one cell.
This surprised me, as I did a little test where I literally copied the contents from one CSV to another (without modifying anything) and words with multiple characters were stored just fine. So my suspicion is that the for-loop creates the problem?
Any help / insight would be much appreciated! Code below, and screenshot of destination file attached.
import csv
new_strings = []
# replace unwanted elements and add cleaned strings to new list
with open("Backlink_Test.csv", "rb") as csvfile:
    reader = csv.reader(csvfile)
    for string in reader:
        string = str(string)
        string = string.replace("www.", "").replace("http://", "").replace("https://", "")
        new_strings.append(string)
new_strings.sort()
print new_strings  # for testing only; will be removed once function is working
cleaned_file = open("Cleaned_URLS.csv", "w")
writer = csv.writer(cleaned_file)
writer.writerows(new_strings)
cleaned_file.close()
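A minimal reproduction of the symptom, as a sketch only: it assumes Python 3 (the code above is Python 2), uses an in-memory buffer instead of a real file, and the sample URLs are made up. `writerows` expects an iterable of rows, where each row is itself an iterable of fields, so a bare string gets treated as a row whose fields are its individual characters.

```python
import csv
import io

# Hypothetical sample data standing in for the cleaned URLs.
urls = ["a.com", "b.org"]

# Passing a list of plain strings: each string is iterated as a row,
# so every character becomes its own field.
buf = io.StringIO()
csv.writer(buf).writerows(urls)
split_output = buf.getvalue()

# Wrapping each string in a one-element list keeps it as a single field.
buf = io.StringIO()
csv.writer(buf).writerows([[u] for u in urls])
whole_output = buf.getvalue()

print(split_output)
print(whole_output)
```

This would also explain why copying one CSV to another worked: the rows coming from `csv.reader` are already lists, so each value stays in one cell.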
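Here is the working code now. The change is in the last step: each URL is wrapped in a one-element list and written with writerow, so the whole URL ends up in a single cell: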
import csv
new_strings = []
# replace unwanted elements and add cleaned strings to new list
with open("Backlink_Test.csv", "rb") as csvfile:
    reader = csv.reader(csvfile)
    for string in reader:
        string = str(string)
        string = string.replace("www.", "").replace("http://", "").replace("https://", "")
        new_strings.append(string)
new_strings.sort()
print new_strings
cleaned_file = open("Cleaned_URLS.csv", "w")
writer = csv.writer(cleaned_file)
for url in new_strings:
    writer.writerow([url])
cleaned_file.close()
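One thing that may still be worth checking: `csv.reader` yields each row as a list, so `str(string)` produces the list's text representation (for example `"['a.com']"`) rather than the bare URL. The sketch below is one possible variant, not a definitive version. It assumes Python 3, that the URL sits in the first column of each row, and it uses in-memory buffers with made-up sample URLs in place of the real files; `clean_url` and `clean_rows` are hypothetical helper names.

```python
import csv
import io

def clean_url(url):
    """Remove the scheme and 'www.' substrings from a URL string."""
    for unwanted in ("https://", "http://", "www."):
        url = url.replace(unwanted, "")
    return url

def clean_rows(reader):
    """Clean the first column of every non-empty row and return a sorted list."""
    return sorted(clean_url(row[0]) for row in reader if row)

# Hypothetical input standing in for Backlink_Test.csv.
source = io.StringIO("http://www.b.org\nhttps://a.com\n")
cleaned = clean_rows(csv.reader(source))

# Write one URL per row; each URL is wrapped in a list so it stays in one cell.
out = io.StringIO()
csv.writer(out).writerows([url] for url in cleaned)
print(out.getvalue())
```

With real files in Python 3, the same functions could be used with `open("Backlink_Test.csv", newline="")` for reading and `open("Cleaned_URLS.csv", "w", newline="")` for writing.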