Python: looping through urls in file

https://stackoverflow.com/questions/10056112

29-05-2021

Question

I'm trying to loop through a text file containing a list of URLs and have my Python script parse each URL in the file.

The code only processes the LAST line in the file, when it should process every line and append the results to the file.

I have no idea what to do, so I'd appreciate your help. Thanks!

import feedparser # pip install feedparser
from BeautifulSoup import BeautifulStoneSoup
from BeautifulSoup import BeautifulSoup
import re


urls = open("c:/a2.txt", "r") # file with rss urls

for lines in urls:

    d = feedparser.parse(lines) # feedparser is supposed to process every url in the file(urls)


statusupdate = d.entries[0].description


soup = BeautifulStoneSoup(statusupdate)

for e in d.entries:
    print(e.title)
    print(e.link)
    print(soup.find("img")["src"])
    print("\n") # 2 newlines


# writes title,link,image to a file and adds some characters
    f = open(r'c:\a.txt', 'a')
    f.writelines('"')
    f.writelines(e.title)
    f.writelines('"')
    f.writelines(",")
    f.writelines('"')
    f.writelines(e.link)
    f.writelines('"')
    f.writelines(",")
    f.writelines('"')
    f.writelines(soup.find("img")["src"])
    f.writelines('"')
    f.writelines(",")
    f.writelines("\n")
f.close()

Solution

for lines in urls:
    d = feedparser.parse(lines)

This loop does nothing except reassign the variable d on every iteration. When the loop finishes, d therefore holds only the result for the last line.
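The same effect can be seen without any network access. In this minimal sketch, the URLs are hypothetical stand-ins for the lines of c:/a2.txt, and line.strip() stands in for feedparser.parse(line):

```python
# Hypothetical stand-ins for the lines read from c:/a2.txt
lines = ["http://example.com/a.rss\n", "http://example.com/b.rss\n", "http://example.com/c.rss\n"]

# Same shape as the question's loop: d is rebound on every pass
for line in lines:
    d = line.strip()  # stands in for feedparser.parse(line)
print(d)  # only the last URL survives: http://example.com/c.rss

# Keeping each value instead of overwriting it
results = []
for line in lines:
    results.append(line.strip())
print(len(results))  # 3
```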

If you wish to process every line, you need to do something with every value of d. For example, you could collect each parsed feed in a list and then iterate over that list to process it.

import feedparser  # pip install feedparser
from BeautifulSoup import BeautifulStoneSoup

results = []
with open("c:/a2.txt", "r") as urls:  # file with rss urls
    for line in urls:
        url = line.strip()  # drop the trailing newline
        if url:  # skip blank lines
            results.append(feedparser.parse(url))

contents = []
for r in results:
    if not r.entries:  # a feed with no entries would make r.entries[0] fail
        continue
    statusupdate = r.entries[0].description
    soup = BeautifulStoneSoup(statusupdate)
    for e in r.entries:
        contents.append((e.title, e.link, soup.find("img")["src"]))

# open the output file once and write one line per entry
with open(r'c:\a.txt', 'a') as f:
    for c in contents:
        # same output as before: "title","link","image",
        f.write('"%s","%s","%s",\n' % c)
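In both the question's code and the answer's code, the image URL for every entry is taken from the first entry's description, and soup.find("img") returns None when that HTML has no <img> tag, so the ["src"] lookup raises a TypeError. If you want each entry's own image, and None instead of a crash, one option is a small helper built on the standard library's html.parser (swapped in here for BeautifulSoup so the sketch has no third-party dependency). FirstImgSrc and first_img_src are hypothetical names, not part of any library:

```python
from html.parser import HTMLParser


class FirstImgSrc(HTMLParser):
    """Record the src of the first <img> tag that has one (None if none does)."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing <img/> also lands here
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def first_img_src(html):
    parser = FirstImgSrc()
    parser.feed(html)
    parser.close()
    return parser.src


print(first_img_src('<p>hi <img src="http://example.com/a.png"> there</p>'))
print(first_img_src("<p>no image here</p>"))  # None
```

Inside the answer's loop this would become contents.append((e.title, e.link, first_img_src(e.description))), assuming every entry carries a description.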

OTHER TIPS

Maybe you shouldn't assign the value returned by feedparser.parse() to the same variable every time?

At least with your current indentation, that assignment is the only thing that happens inside the loop. statusupdate = d.entries[0].description runs only once, after the loop, so it operates on the last value of d.

There are a couple of issues in your program:

You loop over the input file without processing the data; the processing only happens for the last line.
You open the output file inside a loop (albeit in append mode), once per entry, but close it only once.

My suggestion is to move the open statement for the output file outside the loop, and to indent all your processing statements so that they are part of the loop that iterates over the input file.
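That suggestion can be sketched as follows, under some assumptions: write_feed_rows and fake_parse are hypothetical names, the parse argument stands in for feedparser.parse so the example runs offline, only title and link are written, and the standard csv module replaces the hand-written quoting (so there is no trailing comma):

```python
import csv
import io
from types import SimpleNamespace


def write_feed_rows(url_lines, parse, out):
    """Parse every URL in url_lines and write one quoted CSV row per entry to out.

    `parse` stands in for feedparser.parse (an assumption for this sketch):
    it must return an object whose .entries have .title and .link attributes.
    """
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)  # quote every field
    for line in url_lines:
        url = line.strip()  # drop the trailing newline
        if not url:
            continue  # ignore blank lines
        feed = parse(url)  # runs once per URL, inside the loop
        for e in feed.entries:
            writer.writerow([e.title, e.link])


# Hypothetical stand-in for feedparser.parse so the sketch runs without a network
def fake_parse(url):
    entry = SimpleNamespace(title="Post from " + url, link=url + "/1")
    return SimpleNamespace(entries=[entry])


out = io.StringIO()  # in real use: open(r'c:\a.txt', 'a', newline='')
write_feed_rows(["http://a.example\n", "\n", "http://b.example\n"], fake_parse, out)
print(out.getvalue())
```

The output file is opened once by the caller and passed in, so the loop over the input file does all the work and nothing is reopened per entry.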

Licensed under: CC-BY-SA with attribution
