How do I substring piped output from grep in Linux?

https://stackoverflow.com/questions/9049571

03-12-2019

Question

I'm trying to write a script to log in to a Drupal website automagically and put it into maintenance mode. Here's what I have so far, and the grep gives me back the line I want.

curl 'http://www.drupalwebsite.org/?q=user' | grep '<input type="hidden" name="form_build_id" id="form-[a-zA-Z0-9]*" value="form-[a-zA-Z0-9]*"  />'

Now I'm kind of a Linux newbie, and I'm using Cygwin with BASH. How would I then pipe the output and use a command to get the value of the id attribute from the output that grep generated? I'll be using this substring later to do another curl request to actually submit the login.

I was looking at using expr, but I don't really understand how I would tell expr "oh hey, here's this stdin data I want you to manipulate in this way". It seems like the only way to do this would be to save the grep output in a variable and then feed the variable to expr.
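For reference, here is a minimal sketch of the expr approach described above, assuming the grep output has already been captured in a shell variable. The sample line and the variable names line and id are hypothetical stand-ins for the real curl | grep result. expr's ':' operator matches a basic regular expression anchored at the start of the string and prints whatever the \( \) group captured.

```shell
#!/bin/sh
# Hypothetical sample of the line grep returns; in the real script this
# would be the captured output of the curl | grep pipeline.
line='<input type="hidden" name="form_build_id" id="form-abc123" value="form-xyz789"  />'

# expr STRING : REGEX prints the text captured by \( \).
# The regex is implicitly anchored at the start, hence the leading .*
id=$(expr "$line" : '.* id="\([^"]*\)"')

printf 'id=%s\n' "$id"
```

Shell parameter expansion can do the same without starting another process: tmp=${line#* id=\"} strips everything up to and including id=", and id=${tmp%%\"*} then strips everything from the next quote onward.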

Solution

Use sed to trim the results you get from grep, i.e.:

Edit: added the myID variable; use any name you like.

myID=$(
  curl 'http://www.drupalwebsite.org/?q=user' \
  | grep '<input type="hidden" name="form_build_id" id="form-[a-zA-Z0-9]*" value="form-[a-zA-Z0-9]*"  />' \
  | sed 's/^.* id="//;s/" value=.*$//'
)

# use ${myID} later in the script
printf 'myID=%s\n' "${myID}"

The first substitution removes the front of the string, everything up to and including id=", while the second removes everything from " value= to the end of the line.

Note that you can chain multiple substitute commands in a single sed script by separating them with ';'.
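To try the two chained substitutions without hitting the website, you can feed sed a sample line. The HTML below is a hypothetical stand-in for the real grep output, and the variable names are made up for illustration; the second command shows the equivalent form with one -e option per command.

```shell
#!/bin/sh
# Hypothetical sample of the line grep returns.
line='<input type="hidden" name="form_build_id" id="form-abc123" value="form-xyz789"  />'

# Two substitute commands separated by ';' in a single script.
id1=$(printf '%s\n' "$line" | sed 's/^.* id="//;s/" value=.*$//')

# The same two commands, given as separate -e expressions.
id2=$(printf '%s\n' "$line" | sed -e 's/^.* id="//' -e 's/" value=.*$//')

printf '%s\n' "$id1" "$id2"   # both print form-abc123
```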

Edit 2: Also, once you're using sed, there's no reason to use grep as well; try this:

myID=$(
  curl 'http://www.drupalwebsite.org/?q=user' \
  | sed -n '\@<input type="hidden" name="form_build_id" id="form-[a-zA-Z0-9]*" value="form-[a-zA-Z0-9]*"  />@{
       s@^.* id="@@
       s@" value=.*$@@p
   }'
)

(It's a good habit to get rid of unnecessary processes. It may not matter in this case, but if you end up writing code that runs thousands of times an hour, an unneeded grep creates thousands of extra processes that never needed to exist.)

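You don't need to escape the < and > characters: in basic regular expressions they are ordinary characters. Be careful not to write \< or \>, because GNU sed and grep treat those as word-boundary anchors. If you want to be explicit, [<] and [>] always match the literal characters.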

I'm using '@' as the regex delimiter to avoid having to escape the '/' characters in the search string (the '/>' at the end), and I use it throughout the example for consistency. When a sed address (the pattern that selects which lines a block applies to) uses a non-standard delimiter, it must start with a backslash, hence the leading \@ at the front of the block. The s command needs no backslash; it takes whatever character follows the s as its delimiter.
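Here is a small sketch, using made-up input strings, that compares the default / delimiter (which forces you to escape the slash in />) with @ used both in an s command and in a \@...@ address.

```shell
#!/bin/sh
# Hypothetical input containing the '/>' that makes '/' awkward as a delimiter.
tag='<input name="form_build_id"  />'

# With '/' as the delimiter, the '/' inside '/>' has to be escaped.
a=$(printf '%s\n' "$tag" | sed 's/ *\/>$//')

# With '@' as the delimiter, the s command needs no escaping.
b=$(printf '%s\n' "$tag" | sed 's@ */>$@@')

# In an address, a custom delimiter is introduced with a backslash.
# Only the first of the two input lines contains '/>', so only it is printed.
c=$(printf '%s\n' "$tag" 'no slash here' | sed -n '\@/>@p')

printf '%s\n' "$a" "$b" "$c"
```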

The -n option means "don't print each input line by default". Because of that, we add the p flag to the last s command, which prints the pattern space (the current line) only when that substitution succeeds.
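The following sketch shows -n and the p flag at work on a hypothetical three-line page fragment. To keep it short, it uses a simplified address that matches only name="form_build_id" rather than the full pattern from the answer; the sample HTML and variable names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical page fragment; only the middle line is the hidden input.
html='<p>unrelated line</p>
<input type="hidden" name="form_build_id" id="form-abc123" value="form-xyz789"  />
<p>another line</p>'

# -n turns off automatic printing, so the unrelated lines produce no output.
# The p flag on the last s command prints the pattern space only if that
# substitution succeeded.
id=$(printf '%s\n' "$html" | sed -n '\@name="form_build_id"@{
s@^.* id="@@
s@" value=.*$@@p
}')

printf 'id=%s\n' "$id"
```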

Finally, I'm not sure about your regular expression, particularly the -[a-zA-Z0-9]*: the * means zero or more of the preceding character (in this case, the character class), so it also matches an empty value. People who want at least one alphanumeric character typically write -[a-zA-Z0-9][a-zA-Z0-9]* or -[[:alnum:]][[:alnum:]]*, but I don't know your data well enough to say for sure.
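To see the difference between "zero or more" and "at least one", here is a quick check with grep -c (which prints the number of matching lines) on two hypothetical inputs, one with an empty suffix after form- and one with a real value.

```shell
#!/bin/sh
# Two hypothetical inputs: one with an empty suffix, one with a real value.
samples='id="form-"
id="form-abc123"'

# [[:alnum:]]* allows zero characters, so both lines match.
loose=$(printf '%s\n' "$samples" | grep -c 'id="form-[[:alnum:]]*"')

# [[:alnum:]][[:alnum:]]* requires at least one, so only the second matches.
strict=$(printf '%s\n' "$samples" | grep -c 'id="form-[[:alnum:]][[:alnum:]]*"')

printf 'loose=%s strict=%s\n' "$loose" "$strict"   # loose=2 strict=1
```

With grep -E (extended regular expressions), the same "at least one" constraint can be written more compactly as [[:alnum:]]+.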

I hope this helps.

OTHER TIPS

You could also use grep with the -o option, possibly as two consecutive greps so that the surrounding id="..." part is filtered out too.

   -o, --only-matching
          Print only the matched (non-empty) parts  of  a  matching  line,
          with each such part on a separate output line.
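A minimal sketch of the two-grep idea, run on a hypothetical sample line. The second grep assumes the id value starts with form-, as in the question's pattern. Note that -o is not part of POSIX grep, but GNU grep (which Cygwin ships) and BSD grep both support it.

```shell
#!/bin/sh
# Hypothetical sample of the line returned by the first grep in the question.
line='<input type="hidden" name="form_build_id" id="form-abc123" value="form-xyz789"  />'

# The first grep keeps only the ' id="..."' attribute; the second keeps only
# the value, assuming it starts with 'form-' and ends at the closing quote.
id=$(printf '%s\n' "$line" | grep -o ' id="[^"]*"' | grep -o 'form-[^"]*')

printf 'id=%s\n' "$id"
```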

Licensed under: CC-BY-SA with attribution
