As Robin suggests, you should really do this kind of thing in a programming language with a decent HTML parser. You can always use command-line tools for various tasks, but in this case I would probably have chosen Perl.
If you really want to try it with command-line tools, I would suggest curl, grep, sort and sed.
I always find it easier when I have something to play with, so here's something to get you started.
I would not use this kind of code to produce anything useful, though; it is just meant to give you some ideas.
The member pages seem to be xxx://xxx.xxx/index1.html, where the 1 indicates the page number. Therefore the first thing I would do is extract the number of the last member page. Once I have that, I know which URLs to feed curl.
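One way to do that (just a sketch, and it assumes the pagination on page 1 links to the other pages with the same indexN.html pattern, which I have not verified against the real markup) is to fetch the first page and take the highest page number found in it:

# Pull every indexN.html link out of page 1, keep just the numbers,
# and treat the largest one as the last page (assumption, see above).
last_page=$(curl --silent 'http://www.marksdailyapple.com/forum/memberslist/index1.html' |
    grep -Eo 'index[0-9]+\.html' | grep -Eo '[0-9]+' | sort -n | tail -n 1)
echo "last member page: ${last_page}"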
Every username sits inside an anchor element with the class "username"; with that information we can use grep to get the relevant data.
#!/bin/bash
number_of_pages=2
# Fetch every member page, keep the anchors with class="username",
# strip the markup and sort the names.
curl --silent "http://www.marksdailyapple.com/forum/memberslist/index[1-${number_of_pages}].html" |
    grep -Eo 'class="username">[^<]*</a>' | sed 's/.*>\(.*\)<\/a>/\1/' | sort
The idea here is to give curl the addresses in the format index[1-XXXX].html; curl expands that range itself and fetches all the pages. We then grep for the username class and pass the matches on to sed, which extracts the relevant data (the username). Finally the resulting username list goes through sort to get the usernames sorted. I always like sorted things ;)
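If you want to see that globbing in isolation: curl can also write each expanded URL to its own file using the #1 placeholder, which stands for the current value of the first range (example.com is just a stand-in here):

# Fetches index1.html through index3.html into page_1.html, page_2.html, page_3.html.
curl --silent "http://example.com/index[1-3].html" -o "page_#1.html"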
Big notes, though:
- You should really be doing this some other way. Again, I recommend Perl for these kinds of tasks.
- There is no error checking, no validation of usernames, etc. If you are going to use this in any sort of production setting there are no shortcuts; do it right. Try to read up on how to parse web pages in different programming languages. (A minimal error-checking sketch follows these notes.)
- On purpose I set number_of_pages to two. You'll have to figure out for yourself how to get the number of the last member page (the sketch earlier in this answer shows one possible approach). There were a lot of pages, though, and I imagine it would take some time to iterate through them all.
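As a small taste of the error checking mentioned above, here is a minimal sketch that at least aborts when a fetch fails (--fail makes curl return a non-zero exit status on HTTP errors):

url='http://www.marksdailyapple.com/forum/memberslist/index1.html'
# Abort if the download fails instead of silently parsing an error page.
if ! page=$(curl --fail --silent "$url"); then
    echo "failed to fetch $url" >&2
    exit 1
fi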
Hope that helps!