Delete text with GREP in TextWrangler

Question 1

I use a multi-step method to process this kind of file.

First, you want only one HTML tag per line. GREP works line by line, so this minimises the need for complicated patterns. I usually replace every > with >\n
Then develop a pattern for each occurrence of the item you want, in this case title=".*?". Put it between parentheses (), then add some filler around it so the pattern matches the whole line: .*?(title=".*?").*
Replace everything that matches .*?(title=".*?").* with \1
Finally, make smart use of the TextWrangler function Process Lines Containing to filter out any remaining rubbish.
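
For anyone who wants to try the same steps outside TextWrangler, here is a rough sketch in Python. The sample HTML is made up (the question's actual file isn't shown here), and Python's re module is only a stand-in for TextWrangler's GREP engine, so treat it as an illustration of the idea rather than an exact equivalent.

```python
import re

# Hypothetical sample input; the real file from the question is not shown.
html = (
    '<li><a href="/wiki/Foo" title="Foo Game">Foo</a></li>'
    '<li><a href="/wiki/Bar" title="Bar Game">Bar</a></li>'
)

# Step 1: put one tag per line by replacing every > with > plus a newline.
one_per_line = html.replace(">", ">\n")

# Steps 2 and 3: on each line, replace the whole match with group 1,
# which leaves only the title="..." attribute on lines that contain one.
reduced = [
    re.sub(r'.*?(title=".*?").*', r"\1", line)
    for line in one_per_line.splitlines()
]

# Step 4: a crude stand-in for "Process Lines Containing":
# keep only the lines that contain title=
titles = [line for line in reduced if 'title="' in line]

for t in titles:
    print(t)
```

Lines without a title attribute are left untouched by the replace, which is why the final filtering step is still needed.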

Notes

The \1 refers to the text matched by the first pair of parentheses (). You can also reorder things by using several pairs of parentheses, for example searching for (.*?), (.*) and replacing with \2, \1 to swap two columns.
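
A quick way to see the column swap in action, sketched in Python with a made-up line of data (Python's re.sub accepts the same \1 and \2 references in the replacement string):

```python
import re

# Hypothetical two-column line, separated by a comma and a space.
line = "Mario, Nintendo"

# Group 1 is everything before the first ", " and group 2 is the rest;
# the replacement writes them back in the opposite order.
swapped = re.sub(r"(.*?), (.*)", r"\2, \1", line)

print(swapped)
```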

Learn how to write lazy regular expressions. The use of ? in these patterns is very powerful. Basically, putting ? after * makes the match stop at the next occurrence of whatever follows in your pattern, instead of at the last place on the line where it occurs.
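
The difference is easiest to see side by side. This is a small Python sketch with a made-up tag; the greedy and lazy quantifiers behave the same way in most GREP flavours, but the sample line is an assumption.

```python
import re

# Hypothetical line with two quoted attributes.
line = '<a href="/wiki/Foo" title="Foo">'

# Greedy: .* runs to the LAST closing quote on the line.
greedy = re.search(r'href="(.*)"', line).group(1)

# Lazy: .*? stops at the NEXT closing quote.
lazy = re.search(r'href="(.*?)"', line).group(1)

print(greedy)
print(lazy)
```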

Question 2

I've figured this problem out; it was quite simple. Instead of retrieving the content of the title attribute, I retrieve the page name.

To make sure I only hit the lines that actually hold the content, I search the code with the following pattern.

(.*)/wiki/(.*)" Returning \2

After that, I simply remove any cases where there is HTML code:

<(.*) Returning ''

Finally, I remove the remaining content after the page name:

"(.*) Returning ''

After a bit of cleaning up of the spacing, I have a list of all the game names.
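
For reference, the three replacements above can be sketched in Python like this. The sample lines are invented and assume each link sits on its own line with a single /wiki/ URL; real wiki markup may need extra care.

```python
import re

# Hypothetical input lines; the real page source is not shown in the post.
lines = [
    "<ul>",
    '<li><a href="/wiki/Foo_Game" title="Foo Game">Foo Game</a></li>',
    '<li><a href="/wiki/Bar_Game" title="Bar Game">Bar Game</a></li>',
    "</ul>",
]

names = []
for line in lines:
    # 1. Keep what follows /wiki/ (up to the last double quote on the line).
    line = re.sub(r'(.*)/wiki/(.*)"', r"\2", line)
    # 2. Drop any HTML that is left, from the first < to the end of the line.
    line = re.sub(r"<(.*)", "", line)
    # 3. Drop everything from the first remaining quote onwards.
    line = re.sub(r'"(.*)', "", line)
    # Spacing cleanup: trim whitespace and skip lines that ended up empty.
    line = line.strip()
    if line:
        names.append(line)

print(names)
```

Because the second group is greedy, step 1 leaves some trailing attribute text behind, which is exactly what steps 2 and 3 clean up.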