Python script for downloading all Ctrl+Alt+Del webcomics? [closed]

https://stackoverflow.com/questions/1183111

19-09-2019

Question

Does anyone have one? I've tried Comicdownloader, but that only lets me download the newest comic, and I don't understand Python well enough to figure out how to change it to download all the comics.

Solution

Well, it's not Python, and the method is not very elegant either, but it'll work.

After a brief inspection of the Ctrl+Alt+Del archive site, I found that all the comics are stored in /comics/ as JPG files.

Assuming that holds, running the following command in a shell on a *nix machine will download all the comics. The filenames are dates in the format YYYYMMDD.

$ for filename in $(seq 20020101 20090726); do wget http://www.ctrlaltdel-online.com/comics/"$filename".jpg; done

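Oh, and it may take a while :). Since seq counts through every integer in the range, most of the roughly 70,000 requests are for numbers that aren't real dates (e.g. 20020132) and will simply fail.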
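For anyone who would rather stay in Python, here is a rough Python 3 sketch of the same idea that walks real calendar dates with datetime instead of every integer, so it never asks for a non-existent date like 20020132. It assumes the http://www.ctrlaltdel-online.com/comics/YYYYMMDD.jpg layout from the shell loop (which may well have changed since), skips files that already exist, and pauses between requests; the function names and the one-second delay are my own arbitrary choices.

```python
import datetime
import os
import time
import urllib.error
import urllib.request

# URL layout taken from the shell loop above (an assumption; it may have moved).
BASE_URL = "http://www.ctrlaltdel-online.com/comics/{}.jpg"


def comic_dates(start, end):
    """Yield every calendar date from start to end, inclusive, as YYYYMMDD."""
    day = start
    while day <= end:
        yield day.strftime("%Y%m%d")
        day += datetime.timedelta(days=1)


def download_all(start, end, delay=1.0):
    """Fetch one image per date, skipping files already on disk."""
    for stamp in comic_dates(start, end):
        filename = stamp + ".jpg"
        if os.path.exists(filename):
            continue  # downloaded on an earlier run
        try:
            urllib.request.urlretrieve(BASE_URL.format(stamp), filename)
            print("Downloaded", filename)
        except urllib.error.HTTPError as err:
            # No comic that day (or the URL pattern is wrong): note it and move on.
            print("Skipped", stamp, err.code)
        time.sleep(delay)  # go easy on the server
```

Calling download_all(datetime.date(2002, 1, 1), datetime.date(2009, 7, 26)) covers the same range as the shell loop. Days without a comic still cost one failed request each, but that is a few thousand requests instead of about 70,000.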

OTHER TIPS

Well, there are several approaches. You could look at the URLs of the comic images themselves to see if they follow a pattern, write a small script to generate URLs that match the pattern, and simply use wget or something similar to download them. To learn the requisite Python, I recommend the Python tutorial. Look specifically at ways of combining strings (string formatting, string operations) so you can build your URLs from component strings, numbers, and so forth. This sort of project may be a good excuse to learn Python, which will probably help you with other things in the future (and it's FUN!).
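For example, if a site numbered its images sequentially, the same URL could be built by plain concatenation or by any of Python's formatting styles. The base URL and the three-digit zero padding below are made up purely for illustration.

```python
# Hypothetical URL pattern, for illustration only.
base = "http://example.com/comics/"
number = 7

# Plain concatenation: numbers must be turned into strings with str() first.
url_concat = base + "comic" + str(number).zfill(3) + ".jpg"

# %-formatting, str.format() and f-strings can zero-pad the number themselves.
url_percent = "%scomic%03d.jpg" % (base, number)
url_format = "{}comic{:03d}.jpg".format(base, number)
url_fstring = f"{base}comic{number:03d}.jpg"

print(url_fstring)  # http://example.com/comics/comic007.jpg

# Generating a whole range of URLs is then a single list comprehension.
urls = [f"{base}comic{n:03d}.jpg" for n in range(1, 101)]
```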

Alternatively, I imagine that people have done this before; there are almost certainly apps out there that do what you want, given the URLs of the images, or just the pattern of URLs. Have a look around.
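wget itself is one such tool: it can read a list of URLs from a text file, one per line, via its -i (--input-file) option. A minimal sketch of writing that file from Python; the function names and the default filename urls.txt are placeholders of my own.

```python
def url_list_text(urls):
    """Join URLs one per line, the format wget's -i option reads."""
    return "".join(url + "\n" for url in urls)


def write_url_list(urls, path="urls.txt"):
    """Write the URL list to a file for `wget -i path`."""
    with open(path, "w") as f:
        f.write(url_list_text(urls))
```

After write_url_list(urls), running `wget --wait=1 -i urls.txt` downloads each entry in turn, pausing one second between retrievals.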

You could write to the author of the comic and ask if they are available in bulk somehow, possibly for purchase, maybe as a book. I imagine Ctrl+Alt+Del is popular enough to absorb the extra load of someone downloading the entire archive, but smaller websites might not appreciate it, though I don't think they can really stop you.

As for your question, this site should help you with your programming questions, help you with problems you're having, and nudge you along the way, but mostly people don't appreciate it if you ask them to just write your code for you. Learn some Python and try it yourself; if you run into any problems, ask then.

I use urllib with a modified user-agent (since Timmy blocked the original) and BeautifulSoup. Anyway, here's the date generator:

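import datetime

t1 = datetime.date(2002, 10, 22)
t2 = datetime.date.today()
one_day = datetime.timedelta(days=1)  # same value as datetime.date.resolution
# t1 is advanced before it is used, so the first page fetched is the day after t1
while t2 > t1:
    t1 = t1 + one_day
    toon = t1.strftime("%Y%m%d")  # e.g. "20021023"; strftime already returns a str
    main(toon)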

And my main() looks like this (I took the "check if file exists" part out):

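import httplib
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2

def main(url):
    # url is a date string such as "20021023"; fetch the page /cad/<date>
    con = httplib.HTTPConnection("www.cad-comic.com")
    con.request("GET", "/cad/" + url)
    r1 = con.getresponse()
    soup = BeautifulSoup(r1.read())
    # relies on the comic being the second <img> on the page
    img = soup.findAll("img")[1]
    # read the src attribute directly instead of splitting the tag text on quotes
    imgUrl = img.get("src", "")
    if imgUrl.startswith('http://www.cad-comic.com/comics/'):
        MeOpans().retrieve(imgUrl, url + ".jpg")
        print "Downloaded " + url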
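The [1] index ties main() to the page layout: if another image ever appears before the comic, index 1 points at the wrong tag, the prefix check fails, and that day is silently skipped. A sketch of a more tolerant approach, written for Python 3 with the standard-library html.parser instead of BeautifulSoup, is to collect every <img> src and keep the first one under the comics directory. The prefix is the one main() already checks; the class and function names are my own.

```python
from html.parser import HTMLParser

COMIC_PREFIX = "http://www.cad-comic.com/comics/"


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is a list of pairs
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def find_comic_url(html, prefix=COMIC_PREFIX):
    """Return the first <img> src that starts with prefix, or None."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    for src in parser.srcs:
        if src.startswith(prefix):
            return src
    return None
```

Because it searches by URL prefix rather than position, a logo or banner image placed before the comic no longer matters.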

"MeOpans()" looks like this:

import urllib

class MeOpans(urllib.URLopener):
    # URLopener sends its "version" attribute as the User-Agent header
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
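The code above is Python 2. On Python 3, URLopener only survives as the deprecated urllib.request.URLopener; a more current way to send a custom User-Agent is to set it as a header on a urllib.request.Request. A minimal sketch: the browser string is copied from MeOpans, the helper names are hypothetical, and whether the site still filters on user-agent is unknown.

```python
import shutil
import urllib.request

# Same browser string that the MeOpans class sends.
USER_AGENT = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) "
              "Gecko/20071127 Firefox/2.0.0.11")


def make_request(url):
    """Build a GET request that identifies itself with USER_AGENT."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def retrieve(url, filename):
    """Rough stand-in for MeOpans().retrieve(url, filename)."""
    with urllib.request.urlopen(make_request(url)) as response, \
            open(filename, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the body to disk
```

Note that Request normalizes header names with str.capitalize(), so the stored key is "User-agent" if you inspect it with get_header().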

Licensed under: CC-BY-SA with attribution