You can do without regexes here.
with open("file_path", "r") as f:
    lines = f.read()
lines = lines.replace("http://", "")
lines = lines.replace("www.", "")  # May replace some false positives ('www.com')
urls = [url.split('/')[0] for url in lines.split()]
print('\n'.join(urls))
Example file input:
http://foo.com/index.html
http://www.foobar.com
www.bar.com/?q=res
www.foobar.com
Output:
foo.com
foobar.com
bar.com
foobar.com
Edit:
A tricky URL such as foobarwww.com breaks the approach above: the plain replace strips the embedded www. too and leaves foobarcom. In that case we have to fall back to a regex after all.
Replace the line lines = lines.replace("www.", "")
with lines = re.sub(r'www\.(?!com)', '', lines)
and add import re at the top. The dot is escaped so it matches a literal '.' only. Of course, every possible TLD should be listed in the negative lookahead, not just com.
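To make the regex variant concrete, here is a minimal sketch with several TLDs in the lookahead. The names TLDS, WWW_RE and extract_hosts are hypothetical, the TLD list is deliberately incomplete, and the trailing \b is an added assumption (not part of the line above) so that hosts like www.comcast.com still lose their prefix.

```python
import re

# Hypothetical, deliberately incomplete TLD list; extend as needed.
TLDS = ("com", "org", "net")

# Match "www." only when it is NOT immediately followed by a bare TLD,
# so "foobarwww.com" keeps its "www." while "www.foobar.com" loses it.
# The \b stops "com" from matching the start of a longer label ("comcast").
WWW_RE = re.compile(r"www\.(?!(?:%s)\b)" % "|".join(TLDS))


def extract_hosts(text):
    """Return the host part of each whitespace-separated URL in text."""
    text = text.replace("http://", "")
    text = WWW_RE.sub("", text)
    return [url.split("/")[0] for url in text.split()]
```

Called on the example input above plus the tricky foobarwww.com, this should return foo.com, foobar.com, bar.com, foobar.com and foobarwww.com (left intact).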