Question

I'm not sure what language fits this best (or if there's already a program for this), but here's what I basically want to do: given a URL, I want the program to go to that page, capture the text between certain HTML tags (just once per page), then click the "Next" button to move to the next page, and repeat until it reaches the last one. Then export the whole thing as a .pdf or something similar (a .txt would work too). It'd be useful if the program could print a horizontal rule between posts, but that's not required.
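The loop described above (fetch a page, keep one snippet, follow "Next", join everything with a rule) can be sketched in Python. This is a minimal sketch under stated assumptions, not a tested scraper for that site: `crawl`, `http_fetch`, `extract` and `next_url` are hypothetical names, and the site-specific parts (where a post starts and ends, what the Next link looks like) are passed in as functions because the page markup isn't shown here.

```python
import urllib.request
from typing import Callable, Optional

# Separator written between posts (a plain-text "horizontal rule").
RULE = "\n\n" + "-" * 43 + "\n\n"


def http_fetch(url: str) -> str:
    """Download a page and decode it as text (undecodable bytes are replaced)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str,
          extract: Callable[[str], str],
          next_url: Callable[[str], Optional[str]],
          fetch: Callable[[str], str] = http_fetch,
          max_pages: int = 1000) -> str:
    """Follow "Next" links from start_url, keeping one extract per page.

    `extract` (hypothetical, site-specific) pulls the post out of a page's HTML;
    `next_url` (hypothetical, site-specific) returns the next page's URL,
    or None on the last page.
    """
    posts = []
    seen = set()
    url: Optional[str] = start_url
    # Stop at the last page, on a link cycle, or after max_pages pages.
    while url is not None and url not in seen and len(posts) < max_pages:
        seen.add(url)
        page = fetch(url)
        posts.append(extract(page))
        url = next_url(page)
    return RULE.join(posts)
```

For example, `text = crawl("http://www.trailjournals.com/entry.cfm?id=336394", extract=my_extract, next_url=my_next)` followed by writing `text` to a .txt file. The `seen` set and the `max_pages` cap are there so a malformed or circular Next link can't loop forever; passing `fetch` in also lets you test the loop on saved pages without touching the network.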

I only need this to work once and, in fact, here's the blog I want to copy the posts from: http://www.trailjournals.com/entry.cfm?id=336394 (I basically just don't want to spend the time clicking through all of them).

I know some JavaScript, some basic regex, and some HTML along with a couple others that aren't really applicable here (and I'm a quick learner), so I'm here to learn, not just asking for someone to do something for me.

Thanks!

No correct solution

OTHER TIPS

There's probably a better way to do it, but since I'm an engineering student (and MATLAB is currently the language I'm best at), I decided to see if I could do it in MATLAB. And it worked.

Granted, some things could probably have been done better (I don't really know regex that well, so I used a lot of "findstr" instead).
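For comparison, the "text between certain tags" step can be done with a short regular expression instead of chains of `findstr` offsets. A minimal Python sketch, assuming the post sits between a known start marker and end marker (the `<blockquote>` pair in the test is only an example taken from the MATLAB fallback, not confirmed markup); `between` and `to_plain_text` are hypothetical helper names. Regexes on HTML are fragile, so a real parser such as Python's `html.parser` is the sturdier choice if the markup varies.

```python
import html
import re


def between(page: str, start: str, end: str) -> str:
    """Text between the first `start` marker and the next `end` marker ('' if not found)."""
    # re.escape treats the markers literally; (.*?) is non-greedy, so it
    # stops at the nearest `end` rather than the last one on the page.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    # DOTALL lets the match span newlines; IGNORECASE also matches <BLOCKQUOTE>.
    m = re.search(pattern, page, flags=re.DOTALL | re.IGNORECASE)
    return m.group(1) if m else ""


def to_plain_text(fragment: str) -> str:
    """Crude HTML-to-text: newline for <br> and </p>, drop other tags, decode entities."""
    text = re.sub(r"<br\s*/?>|</p>", "\n", fragment, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)
    return html.unescape(text).strip()
```

Running `to_plain_text` on each extracted post keeps raw tags and entities like `&amp;` out of the final .txt file.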

clear;
clc;
% Open the output file once; 'w' creates it or truncates an existing copy.
fid=fopen('journals.txt','w');

id=input('Enter the starting id number: ','s');
while true
    clc;

    url=strcat('http://www.trailjournals.com/entry.cfm?id=',id)

    strContents=urlread(url);

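    % Start of the post: just after the first </TABLE> (the +13 offset was tuned by hand for this site).
    f=findstr('</TABLE>',strContents);
    f=f(1)+13;
    % End of the post: a few characters before the last <p> on the page.
    l=findstr('<p>',strContents);
    l=l(end)-5;
    % If that range comes out inverted, start after the first <blockquote> instead.
    if f>l
        f=findstr('<blockquote>',strContents);
        f=f(1)+14;
    end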

    p=strContents(1,f:l)

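    if isempty(p)
        % cprintf (colored output) is a File Exchange add-on, not built into MATLAB;
        % fprintf to fileID 2 (stderr) shows red text in the Command Window.
        fprintf(2,'EMPTY ENTRY!\n');
        fclose(fid);   % close the output file before leaving the script
        return;
    end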


%     disp(p);
%     disp('------------');
%     ques=input('Does this look good? (y/n): ','s');
%     disp('------------');
%     
%     while ques=='n'
%         firstword=input('Enter the first word: ','s');
%         lastword=input('Enter the last word: ','s');
%         f=findstr(firstword,strContents);
%         l=findstr(lastword,strContents);
%         p=strContents(1,f:l+length(lastword));
%         disp(p);
%         disp('------------');
%         ques=input('Does this look good? (y/n): ','s');
%         disp('------------');
%     end

    % Write p through a '%s' format so any '%' or '\' in the post is printed literally.
    fprintf(fid,'%s\n',p);
    fprintf(fid,'\r\n\r\n-------------------------------------------\r\n\r\n');

    % The next entry's id is assumed to be the 6 characters just before '">Next</a>'.
    next=findstr('">Next</a>',strContents);
    if isempty(next)
        break;   % no "Next" link: this was the last entry
    end
    next=next(1)-6;
    id=strContents(1,next:next+5);
    % The URL for this id is rebuilt at the top of the loop.
end

fclose(fid);
fprintf('The process has been completed\n');
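The `next-6` / `next+5` slice above only works while ids are exactly six digits. A regex capture of the digits before `">Next</a>` avoids that assumption. Below is a minimal Python sketch, assuming (like the MATLAB code) that the Next link's `href` ends in `id=<digits>` immediately followed by `">Next</a>`; `next_id` and `entry_url` are hypothetical helper names.

```python
import re
from typing import Optional

# Assumption (same as the MATLAB code): the next entry's id is the run of digits
# immediately before '">Next</a>', e.g. <a href="entry.cfm?id=336395">Next</a>.
NEXT_LINK = re.compile(r'id=(\d+)">Next</a>', re.IGNORECASE)


def next_id(page: str) -> Optional[str]:
    """Return the id from the first "Next" link, or None if the page has none."""
    m = NEXT_LINK.search(page)
    return m.group(1) if m else None


def entry_url(entry_id: str) -> str:
    """Build an entry URL the same way the MATLAB loop does."""
    return "http://www.trailjournals.com/entry.cfm?id=" + entry_id
```

In MATLAB the same idea would be `tok = regexp(strContents, 'id=(\d+)">Next</a>', 'tokens', 'once');` followed by `id = tok{1};` when `tok` is not empty.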
Licensed under: CC-BY-SA with attribution