Question

I would like to extract just the Twitter handles from the following page: http://twitaholic.com/top100/followers/

All the Twitter handles start with an @.

So I tried something like wget twitaholic.com/top100/followers/ | grep -oh "@" to print just the handles, but that doesn't work (it only prints the @). What's wrong?


Solution

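With the -o option, grep prints only the part of each line that matches the pattern. Your pattern is the single character @, so that is all you get back. You also don't need the -h option: it only suppresses file names in the output, and there are none when grep reads from a pipe.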

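Try this (the -qO- options make wget write the page quietly to standard output instead of saving it to a file, so it can be piped into grep):

wget -qO- twitaholic.com/top100/followers/ | grep -o "@[^<]*"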

What we are telling grep here is to look for the @ symbol and match everything up to, but not including, the next < symbol. This works because the line that carries the handle looks like this:

;@BarackObama<br

So you effectively need to extract the text that starts at @ and stops just before <.
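To see the difference between the two patterns without hitting the network, here is a minimal sketch that runs both against one sample line. The variable names and the sample markup are hypothetical, modeled on the fragment quoted above rather than copied from the live page. The last command is an alternative that assumes handles contain only letters, digits and underscores.

```shell
# Hypothetical sample line, modeled on the markup fragment quoted above.
sample='<td>1.;@BarackObama<br />Barack Obama</td>'

# Original attempt: the pattern is only "@", so -o prints only "@".
only_at=$(printf '%s\n' "$sample" | grep -o "@")

# Fixed pattern: "@" followed by any run of characters that are not "<".
handle=$(printf '%s\n' "$sample" | grep -o "@[^<]*")

# Stricter alternative (assumption: handles are letters, digits, underscores).
strict=$(printf '%s\n' "$sample" | grep -oE "@[A-Za-z0-9_]+")

echo "$only_at"
echo "$handle"
echo "$strict"
```

The stricter pattern does not depend on a < following the handle, so it may hold up better if the page markup changes.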

Output:

$ wget -qO- twitaholic.com/top100/followers/ | grep -o "@[^<]*" | head -10
@katyperry
@justinbieber
@BarackObama
@ladygaga
@YouTube
@taylorswift13
@britneyspears
@rihanna
@jtimberlake
@instagram
License: CC-BY-SA with attribution
Not affiliated with StackOverflow