Question

I am trying to use grep to extract a list of URLs beginning with http and ending with jpg.

grep -o 'picturesite.com/wp-content/uploads/.......' filename

The command above is as far as I've gotten. I then need to pass the extracted URLs to curl. Here is a sample line from the input file:

title : "Family Vacation", jpg:"http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg", owner : "PhotoTaker"

Solution

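sed -nr 's/.*(http[^"[:space:]]*\.(jpg|gif|other|ext)).*/curl $CURLOPTS \1 >$OUT/p' <$infile | sh -n

The above command searches $infile for a string that begins with "http", continues with any run of characters other than whitespace or double quotes, and ends with a dot and one of the "|" separated file extensions in the inner parentheses. With -r (extended regular expressions) the parentheses and "|" are written without backslashes.

Once sed finds such a string, it replaces the whole line with a curl command line in which "\1" stands for the captured URL; the leading and trailing ".*" discard the rest of the line so that only the command remains. It then pipes the command string to sh for execution. Because the sed script is in single quotes, $CURLOPTS and $OUT reach sh unexpanded, so they must be exported in the environment for sh to see them.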

Remember, sed is the stream editor, not just the stream searcher, so it can very capably pre-process input for other commands to make them do what you want.

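Note: sh is currently passed the -n ('noexec') option, which makes it read and syntax-check the generated commands without executing any of them. It prints nothing unless it finds a syntax error, so to see the commands themselves, drop the "| sh -n" and read sed's output directly. When you've run it a few times and are satisfied it's doing the right thing, replace "sh -n" with plain "sh" so the commands actually run.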
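
One way to preview the generated commands before running anything is to capture sed's output in a variable, print it, and only then let sh -n check the syntax. The sketch below is illustrative only: the sample file and its contents are made up, the extension list is shortened to jpg and gif, and -E is used as the more portable spelling of GNU sed's -r. $CURLOPTS and $OUT are left unexpanded on purpose, so that whichever shell finally runs the commands expands them.

```shell
#!/bin/sh
# Hypothetical sample input; a real file would hold many such lines.
infile=$(mktemp)
cat >"$infile" <<'EOF'
title : "Family Vacation", jpg:"http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg", owner : "PhotoTaker"
title : "No picture here", owner : "Nobody"
EOF

# Turn every line containing an image URL into one curl command.
# \1 is the captured URL; the surrounding .* discards the rest of the line.
cmds=$(sed -nE 's/.*(http[^"[:space:]]*\.(jpg|gif)).*/curl $CURLOPTS \1 >$OUT/p' "$infile")

# Step 1: look at what would be run.
printf '%s\n' "$cmds"

# Step 2: syntax-check only; -n reads the commands but executes nothing.
if printf '%s\n' "$cmds" | sh -n; then
    syntax_ok=yes
else
    syntax_ok=no
fi

rm -f "$infile"
```

Only the first sample line yields a command; the second has no URL, so sed prints nothing for it.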

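Note 2: sed's 'g' flag makes a substitution replace every match on a line instead of only the first. It helps only when the pattern matches nothing but the URL itself; if several URLs can share a line, extracting them one per line first (for example with grep -o) is simpler.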

OTHER TIPS

You can capture the URLs with a pattern that stops at the closing quote and escapes the dot before the extension:

grep -o 'http[^"]*\.jpg' file

$ grep -o 'http[^"]*\.jpg' <<EOF
> title : "Family Vacation", jpg:"http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg", owner : "PhotoTaker"
> EOF
http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg
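
A quick way to check a pattern of this kind is to feed it a line that holds more than one URL. The sketch below uses a made-up two-URL line and compares a greedy http.*jpg with a pattern that cannot cross a double quote; the variable names are illustrative.

```shell
#!/bin/sh
# Hypothetical line with two quoted image URLs.
line='a:"http://picturesite.com/one.jpg", b:"http://picturesite.com/two.jpg"'

# Greedy: .* runs from the first "http" to the last "jpg" on the line.
greedy=$(printf '%s\n' "$line" | grep -o 'http.*jpg')

# Bounded: [^"]* cannot cross a double quote, so each URL matches separately.
bounded=$(printf '%s\n' "$line" | grep -o 'http[^"]*\.jpg')

printf 'greedy:\n%s\n\nbounded:\n%s\n' "$greedy" "$bounded"
```

The greedy pattern returns a single match spanning both URLs and the text between them, while the bounded one returns each URL on its own line.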

curl does not read a plain list of URLs from standard input, so your best bet is to store the extracted URLs in a file, then read that file one line at a time and pass each line to the curl command.
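
The store-then-loop approach might look like the following sketch. The temporary files stand in for the real input and URL list, and curl is wrapped in a hypothetical fetch function that only prints the command, so the loop can be tried without touching the network.

```shell
#!/bin/sh
# Hypothetical input file and URL list; both are temporary here.
input=$(mktemp)
urls=$(mktemp)
cat >"$input" <<'EOF'
title : "Family Vacation", jpg:"http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg", owner : "PhotoTaker"
EOF

# Dry-run stand-in for curl: prints the command instead of running it.
# For real downloads the body could be: curl -O "$1"
fetch() {
    echo "curl -O $1"
}

# Step 1: store the extracted URLs, one per line.
grep -o 'http[^"]*\.jpg' "$input" >"$urls"

# Step 2: read the list line by line and hand each URL to fetch.
planned=$(
    while IFS= read -r url; do
        fetch "$url"
    done <"$urls"
)
printf '%s\n' "$planned"

rm -f "$input" "$urls"
```

Replacing the echo inside fetch with an actual curl call turns the dry run into a real download; -O tells curl to save each file under its remote name.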

Licensed under: CC-BY-SA with attribution