Question
I'm running a Python script that reads a list of URLs and opens each one individually with urlopen. Some URLs are repeated in the list. An example of the list would be something like:
I would like to know if there's a way to implement a counter that tells me how many times a unique URL was opened previously by the code. I want a counter that returns, for each URL in the list, the value shown in bold.
Thanks!
Solution
Use a collections.defaultdict object:

from collections import defaultdict

urls = defaultdict(int)
for url in url_source:
    print('{}: {}'.format(url, urls[url]))
    # process
    urls[url] += 1
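A minimal, self-contained run of the approach above, with a small hypothetical list standing in for `url_source` (the name and contents are assumptions for illustration):

```python
from collections import defaultdict

# Hypothetical input standing in for url_source
url_source = [
    "www.example.com/page1",
    "www.example.com/page1",
    "www.example.com/page2",
]

urls = defaultdict(int)
seen_counts = []
for url in url_source:
    # urls[url] is how many times this URL was processed before now;
    # defaultdict(int) makes unseen keys start at 0 automatically
    seen_counts.append((url, urls[url]))
    urls[url] += 1

print(seen_counts)
# [('www.example.com/page1', 0), ('www.example.com/page1', 1), ('www.example.com/page2', 0)]
```

Because the counter is read before it is incremented, the value reported for each URL is the number of previous openings, matching the bold numbers the question asks for.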
OTHER TIPS
Using io.StringIO for simplicity:
import io
fin = io.StringIO("""www.example.com/page1
www.example.com/page1
www.example.com/page2
www.example.com/page2
www.example.com/page2
www.example.com/page3
www.example.com/page4
www.example.com/page4""")
We use collections.Counter:

from collections import Counter

fout = io.StringIO()

data = [line.strip() for line in fin]
counts = Counter(data)

new_data = []
for line in data[::-1]:
    counts[line] -= 1
    new_data.append((line, counts[line]))

for line in new_data[::-1]:
    fout.write('{} {:d}\n'.format(*line))
This is the result:
fout.seek(0)
print(fout.read())
www.example.com/page1 0
www.example.com/page1 1
www.example.com/page2 0
www.example.com/page2 1
www.example.com/page2 2
www.example.com/page3 0
www.example.com/page4 0
www.example.com/page4 1
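The Counter pass can be reproduced end to end with in-memory buffers; this sketch uses a shorter three-line input than the snippet above so the result is easy to check:

```python
import io
from collections import Counter

fin = io.StringIO("www.example.com/page1\n"
                  "www.example.com/page1\n"
                  "www.example.com/page2\n")
fout = io.StringIO()

data = [line.strip() for line in fin]
counts = Counter(data)  # total occurrences of each line

# Walk the data backwards: decrementing the total before recording it
# turns "total occurrences" into "occurrences before this position".
new_data = []
for line in data[::-1]:
    counts[line] -= 1
    new_data.append((line, counts[line]))

# Reverse again to restore the original order, then write out.
for line in new_data[::-1]:
    fout.write('{} {:d}\n'.format(*line))

print(fout.getvalue())
# www.example.com/page1 0
# www.example.com/page1 1
# www.example.com/page2 0
```

The double reversal is the trick: after the backward pass, each recorded count equals the number of earlier occurrences, which is exactly the per-URL counter requested.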
EDIT
Shorter version that works for large files because it holds only one line at a time:
from collections import defaultdict

counts = defaultdict(int)
for raw_line in fin:
    line = raw_line.strip()
    fout.write('{} {:d}\n'.format(line, counts[line]))
    counts[line] += 1
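A self-contained run of this streaming version, again with StringIO buffers standing in for the input and output files (`fin` and `fout` are assumed to be file-like objects):

```python
import io
from collections import defaultdict

fin = io.StringIO("www.example.com/page1\n"
                  "www.example.com/page1\n"
                  "www.example.com/page2\n")
fout = io.StringIO()

counts = defaultdict(int)
for raw_line in fin:
    line = raw_line.strip()
    # counts[line] is 0 the first time a URL is seen, then grows by 1
    # on each repeat, so memory use is bounded by the number of
    # distinct URLs, not the file size
    fout.write('{} {:d}\n'.format(line, counts[line]))
    counts[line] += 1

print(fout.getvalue())
# www.example.com/page1 0
# www.example.com/page1 1
# www.example.com/page2 0
```

Unlike the Counter version, this needs no reversal passes and never materializes the whole file as a list, which is why it scales to large inputs.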
I think you can't do it that way. Delete the duplicates in the list instead.