Question

I am trying to parse the Yahoo Answers feed at http://answers.yahoo.com/rss/allq. The issue is that every title starts with a prefix like this:

[ Category ] : Open Question :

I do not want that prefix, so I would like to write a regexp to remove it.

Anything that removes all the characters from the leading [ through the first : should do it.

There is also a space after the :, and that needs to be removed too.

Thanks in advance. I will also try to find a solution myself.


Solution

Have you considered using Yahoo's YQL service to parse this feed (or other web pages)?

They already have sample queries for you to get at Yahoo Answers data:

(Just an FYI in case you weren't aware of this convenient service. I use it instead of screen scraping with regexes.)

OTHER TIPS

The following regex should do the job:

^\[.*?: 

Usage sample in C#:

string resultString = Regex.Replace(subjectString, @"^\[.*?: ", "");

What it does: it anchors at the start of the string, matches the opening [ bracket, takes as few characters as possible until it reaches the first :, and then matches the following space as well.
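To see the pattern at work, here is a minimal sketch as a complete C# program. The sample title is made up to follow the format described in the question, and the names TitleCleaner, StripFirstSegment and StripBothSegments are only illustrative. Since the pattern stops at the first ": ", the "Open Question : " part is left in place, so the sketch also includes a hypothetical variant (not part of the answer above) that assumes every title carries two such segments.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleCleaner
{
    // Pattern from the answer: a leading '[', as few characters as possible,
    // then the first colon followed by a space.
    private static readonly Regex FirstSegment = new Regex(@"^\[.*?: ");

    // Hypothetical variant (an assumption, not from the answer): also consumes
    // a second "text : " segment such as "Open Question : ".
    private static readonly Regex BothSegments = new Regex(@"^\[.*?: .*?: ");

    public static string StripFirstSegment(string title)
    {
        return FirstSegment.Replace(title, "");
    }

    public static string StripBothSegments(string title)
    {
        return BothSegments.Replace(title, "");
    }

    public static void Main()
    {
        // Made-up title in the format described in the question.
        string title = "[ Pets ] : Open Question : Why does my cat purr?";

        Console.WriteLine(StripFirstSegment(title)); // Open Question : Why does my cat purr?
        Console.WriteLine(StripBothSegments(title)); // Why does my cat purr?
    }
}
```

The variant is only safe if the status segment is always present. If a title has just one ": " after the bracket, the variant either fails to match or eats into the question text, so the single-segment pattern is the more conservative choice.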

Hope this helps, Tom.

Thanks to @cmptrgeekken for pointing out that the quantifier needs to be non-greedy!
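The difference the non-greedy quantifier makes can be sketched as follows. This is an illustration only: the title is invented, and GreedyVsLazy, StripGreedy and StripLazy are hypothetical names. A greedy .* runs to the last ": " in the string, so it can swallow part of the actual question, while the lazy .*? stops at the first one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Greedy: ".*" grabs as much as possible, so the match ends at the LAST ": ".
    public static string StripGreedy(string title) =>
        Regex.Replace(title, @"^\[.*: ", "");

    // Lazy: ".*?" grabs as little as possible, so the match ends at the FIRST ": ".
    public static string StripLazy(string title) =>
        Regex.Replace(title, @"^\[.*?: ", "");

    public static void Main()
    {
        // Made-up title whose question text itself contains ": ".
        string title = "[ Pets ] : Open Question : Cats: friendly or not?";

        Console.WriteLine(StripGreedy(title)); // friendly or not?
        Console.WriteLine(StripLazy(title));   // Open Question : Cats: friendly or not?
    }
}
```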

Licensed under: CC-BY-SA with attribution