Question

The only reliable method I have found for using a script to download text from Wikipedia is with cURL. So far the only way I have of doing that is to call os.system(). Even though the output appears correctly in the Python shell, I can't get the call to return anything other than the exit code (0). Alternatively, somebody could show me how to properly use urllib.

Solution

From Dive into Python:

import urllib

# urlopen returns a file-like object; read() pulls down the whole page
sock = urllib.urlopen("http://en.wikipedia.org/wiki/Python_(programming_language)")
htmlsource = sock.read()
sock.close()
print htmlsource

That will print out the source code for the Python Wikipedia article. I suggest you take a look at Dive into Python for more details.

Example using urllib2 from the Python Library Reference:

import urllib2

# read(100) returns just the first 100 bytes of the response
f = urllib2.urlopen('http://www.python.org/')
print f.read(100)

Edit: You might also want to take a look at wget.
Edit 2: Added the urllib2 example based on S.Lott's advice.

OTHER TIPS

To answer the question as asked: Python has a subprocess module that allows you to interact with spawned processes: http://docs.python.org/library/subprocess.html#subprocess.Popen

It allows you to read the stdout of the invoked process, and even send input to its stdin.
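For instance, here is a minimal sketch of capturing cURL's output with subprocess.Popen instead of os.system() (assuming curl is on your PATH; the Wikipedia URL is just an illustration):

import subprocess

# Run curl silently (-s) and capture its stdout through a pipe
proc = subprocess.Popen(
    ['curl', '-s', 'http://en.wikipedia.org/wiki/Python_(programming_language)'],
    stdout=subprocess.PIPE)
output, _ = proc.communicate()  # output holds the page source, not just the exit code
print output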

However, as you said, urllib is a much better option. If you search Stack Overflow, I am sure you will find at least 10 other related questions...

As an alternative to urllib, you could use the libcurl Python bindings (pycurl).
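For example, a minimal sketch with pycurl (assuming the pycurl package is installed; StringIO is used here to collect the response body):

import pycurl
from StringIO import StringIO

buf = StringIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://en.wikipedia.org/wiki/Python_(programming_language)')
c.setopt(c.WRITEFUNCTION, buf.write)  # write the response body into the buffer
c.perform()
c.close()
print buf.getvalue()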

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow