Question

I wrote this code to replace URLs with their page titles. It does replace the URLs as required, but it prints each title on the next line instead of on the same line.

twfile.txt contains these lines:

link1 http://t.co/HvKkwR1c
no link line

Output (tw2file.txt):

link1
Instagram
no link line

But I want the output in this form:

link1 Instagram
no link line

What should I do?

My Code:

from bs4 import BeautifulSoup
import urllib

output = open('tw2file.txt','w')

with open('twfile.txt','r') as inputf:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):

                if "http" in list1[i]:
                    ##print list1[i]
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    ##print list1[i]


                    list1[i] = ''.join(ch for ch in list1[i])
                else:
                    list1[i] = ''.join(ch for ch in list1[i])
            line = ' '.join(list1)
            print line
            output.write(line)
        except:
            pass


inputf.close()
output.close()

Solution

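Try this code; the three changed lines are marked with # here. The title text comes back with newlines around it, and the last word of each input line still carries its '\n', so strip() every word and write exactly one newline per line yourself: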

from bs4 import BeautifulSoup
import urllib

with open('twfile.txt','r') as inputf, open('tw2file.txt','w') as output:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):
                if "http" in list1[i]:
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
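                    list1[i] = ''.join(ch for ch in list1[i]).strip() # here: strip the whitespace around the title
                else:
                    list1[i] = ''.join(ch for ch in list1[i]).strip() # here: strip the trailing newline from the last word
            line = ' '.join(list1)
            print line
            output.write('{}\n'.format(line))  # here: write exactly one newline per line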
        except:
            pass

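By the way, since you are using Python 2.7 or later, both open() calls can be combined in a single with statement. The explicit close() calls are then unnecessary, because the with statement closes both files when the block exits.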
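
To see why the title ended up on its own line, here is a minimal offline sketch written for Python 3 (the code above is Python 2). The names replace_urls and fake_title are hypothetical and introduced only for illustration: fake_title stands in for the urllib and BeautifulSoup lookup so the example runs without network access, and the newline-padded title it returns is an assumption about what the page's title text looked like.

```python
def replace_urls(line, fetch_title):
    """Replace every URL-like word in `line` with its page title.

    `fetch_title` is a hypothetical callable (url -> title string) standing
    in for the urllib + BeautifulSoup lookup used in the answer.
    """
    words = line.split(' ')
    for i, word in enumerate(words):
        if "http" in word:
            # The last word still carries the line's '\n', so strip the URL
            # before the lookup and strip the returned title as well.
            words[i] = fetch_title(word.strip()).strip()
        else:
            words[i] = word.strip()
    return ' '.join(words)


def fake_title(url):
    # Assumed stand-in: a title padded with newlines.
    return "\nInstagram\n"


lines = ["link1 http://t.co/HvKkwR1c\n", "no link line\n"]
for line in lines:
    # print adds the single newline that strip() removed
    print(replace_urls(line, fake_title))
```

Without the two strip() calls, the joined line would contain the newlines from both the title and the original line ending, which matches the broken output shown in the question.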

Other tips

Regarding the content written to the file: write() does not add a newline on its own, so include '\n' explicitly.

fileobject = open("bar", 'w' )
fileobject.write("Hello, World\n") # newline is inserted by '\n'
fileobject.close()

Regarding console output

Change "print line" to "print line," (note the trailing comma).

In Python 2, the print statement writes a '\n' character at the end unless the statement ends with a comma.
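
In Python 3, print is a function and the trailing comma form is no longer valid; the end parameter plays the same role. A small sketch, assuming Python 3 and a sample line that already ends with a newline (the output is captured into a buffer only so the difference is visible):

```python
import io
from contextlib import redirect_stdout

line = "link1 Instagram\n"  # sample line that already ends with '\n'

buffer = io.StringIO()
with redirect_stdout(buffer):
    print(line)          # print appends its own '\n', leaving a blank line
    print(line, end='')  # end='' suppresses the extra '\n'

captured = buffer.getvalue()
print(repr(captured))
```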

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow