Question

I am trying to parse the Yahoo Answers feed at http://answers.yahoo.com/rss/allq. The issue is that every title starts with a prefix like

[ Category ] : Open Question :

which I do not want, so I would like a regexp that removes it.

Anything that removes all the characters from the opening [ through the first : should do it.

There is also a space after the :, which needs to be removed too.

Thanks in advance. I will also keep trying to find a solution myself.

Solution

Have you considered using Yahoo's YQL service to parse this feed (or other web pages)?

They already have sample queries for you to get at Yahoo Answers data:

(Just an FYI in case you weren't aware of this convenient service. I use it instead of screen scraping with regexes.)

Other tips

The following regex should do the job:

^\[.*?: 

Usage sample in C#:

string resultString = Regex.Replace(subjectString, @"^\[.*?: ", "");

It matches an opening [ bracket at the start of the string, then lazily consumes any characters up to the first :, and finally takes the space that follows it.
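
For anyone not on .NET, here is a minimal sketch of the same idea in Python, using the same pattern as above. The function name strip_prefix and the sample titles are made up for illustration and are not taken from the actual feed.

```python
import re

# Same pattern as the C# sample: anchored at the start of the string,
# an opening "[", a lazy match up to the first ":", then one space.
PREFIX = re.compile(r"^\[.*?: ")


def strip_prefix(title: str) -> str:
    """Remove a leading "[ ... ] : " prefix; titles without one are returned unchanged."""
    return PREFIX.sub("", title, count=1)


if __name__ == "__main__":
    # Hypothetical titles modeled on the format described in the question.
    for sample in (
        "[ Category ] : Open Question : How do I parse RSS?",
        "A title without any prefix",
    ):
        print(strip_prefix(sample))
```

Because the match stops at the first colon, as requested, a title in the format shown in the question would still keep its "Open Question : " part. Removing that as well would need a second pass or a longer pattern.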

Hope this helps, Tom.

Thanks @cmptrgeekken for pointing out the non-greedy quantifier!

License: CC-BY-SA with attribution
Not affiliated with StackOverflow