Question

I'm running Python code that reads a list of URLs and opens each one individually with urlopen. Some URLs are repeated in the list. An example of the list would be something like:

  • www.example.com/page1
  • www.example.com/page1
  • www.example.com/page2
  • www.example.com/page2
  • www.example.com/page2
  • www.example.com/page3
  • www.example.com/page4
  • www.example.com/page4
  • [...]

I would like to know if there's a way to implement a counter that tells me how many times each URL was previously opened by the code. For each URL in the list, the counter should return the number shown after the colon:

  • www.example.com/page1 : 0
  • www.example.com/page1 : 1
  • www.example.com/page2 : 0
  • www.example.com/page2 : 1
  • www.example.com/page2 : 2
  • www.example.com/page3 : 0
  • www.example.com/page4 : 0
  • www.example.com/page4 : 1

Thanks!


Solution

Use a collections.defaultdict() object:

from collections import defaultdict

urls = defaultdict(int)

for url in url_source:
    print('{}: {}'.format(url, urls[url]))

    # process the URL here (e.g. open it with urlopen)

    urls[url] += 1
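
For a version that can be run as-is, here is a minimal sketch of the same idea wrapped in a generator. The names with_previous_counts and sample_urls are made up for illustration, and the actual urlopen call is left out so that the example does not touch the network:

```python
from collections import defaultdict


def with_previous_counts(url_source):
    """Yield (url, n), where n is how many times url appeared earlier."""
    seen = defaultdict(int)  # missing keys start at 0
    for url in url_source:
        yield url, seen[url]
        # the real code would open the URL at this point
        seen[url] += 1


# hypothetical sample data, taken from the list in the question
sample_urls = [
    "www.example.com/page1",
    "www.example.com/page1",
    "www.example.com/page2",
    "www.example.com/page2",
    "www.example.com/page2",
    "www.example.com/page3",
    "www.example.com/page4",
    "www.example.com/page4",
]

results = list(with_previous_counts(sample_urls))
for url, n in results:
    print('{}: {}'.format(url, n))
```

Because the count is looked up before it is incremented, the first occurrence of each URL reports 0, which matches the output asked for in the question.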

OTHER TIPS

Using io.StringIO for simplicity:

import io
fin = io.StringIO("""www.example.com/page1
www.example.com/page1
www.example.com/page2
www.example.com/page2
www.example.com/page2
www.example.com/page3
www.example.com/page4
www.example.com/page4""")

We use collections.Counter:

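from collections import Counter
fout = io.StringIO()  # in-memory output file
data = [line.strip() for line in fin]
counts = Counter(data)  # total occurrences of each URL
new_data = []
# walk backwards, so each decrement gives the number of earlier occurrences
for line in data[::-1]:
    counts[line] -= 1
    new_data.append((line, counts[line]))
# restore the original order while writing
for line in new_data[::-1]:
    fout.write('{} {:d}\n'.format(*line))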

This is the result:

fout.seek(0)
print(fout.read())

www.example.com/page1 0
www.example.com/page1 1
www.example.com/page2 0
www.example.com/page2 1
www.example.com/page2 2
www.example.com/page3 0
www.example.com/page4 0
www.example.com/page4 1

EDIT

Shorter version that also works for large files, because it needs only one line at a time:

from collections import defaultdict
counts = defaultdict(int)

for raw_line in fin:
    line = raw_line.strip()
    fout.write('{} {:d}\n'.format(line, counts[line]))
    counts[line] += 1

I think you can't do it that way. Delete the duplicates in the list.
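
If the repeated entries are not needed at all, removing the duplicates as suggested above could look like the following minimal sketch. It assumes Python 3.7 or later, where dict preserves insertion order, and sample_urls is a made-up stand-in for the real list. Note that this discards the per-URL counts the question asks for, so it only fits when each URL should be opened once:

```python
# hypothetical sample data, taken from the list in the question
sample_urls = [
    "www.example.com/page1",
    "www.example.com/page1",
    "www.example.com/page2",
    "www.example.com/page2",
    "www.example.com/page2",
    "www.example.com/page3",
    "www.example.com/page4",
    "www.example.com/page4",
]

# dict.fromkeys keeps the first occurrence of each key, in order
unique_urls = list(dict.fromkeys(sample_urls))

for url in unique_urls:
    print(url)
```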

Licensed under: CC-BY-SA with attribution