Question

I am trying to write a bash script that picks out particular data files from a folder so I can make plots from them, and I want to use grep to do this. At this point I am piping the output of ls -1 into grep with a regular expression to generate a list of file names. The following are the file names I am sorting through; the pattern continues. The names I would like to match are the ones that end in the number, with no further extension (ifrontThermal64.00490 and ifrontThermal64.00500 in this excerpt), using the regex: ifrontThermal.\d+

ifrontThermal64.00490
ifrontThermal64.00490.HeI
ifrontThermal64.00490.HeII
ifrontThermal64.00490.HI
ifrontThermal64.00490.radFlux
ifrontThermal64.00490.radTens
ifrontThermal64.00490.u
ifrontThermal64.00490.uNoncool
ifrontThermal64.00500
ifrontThermal64.00500.HeI
ifrontThermal64.00500.HeII
ifrontThermal64.00500.HI
ifrontThermal64.00500.radFlux
ifrontThermal64.00500.radTens
ifrontThermal64.00500.u
ifrontThermal64.00500.uNoncool

These commands return nothing

$ (ls -1)|(grep ifrontThermal64.\d+)
$ (ls -1)|(grep ifrontThermal64\.\d+)
$ (ls -1)|(grep ifrontThermal64.[0-9]+)

These commands return what I expect, but not what I want.

 $ (ls -1)|(grep ifrontThermal64.)
 $ (ls -1)|(grep ifrontThermal64.[0-9])

When I test the three that don't work at http://regexpal.com/, they seem to be fine.

Thanks in advance for any help!


Solution

If the number suffixes of interest are of fixed length and all you care about is filtering out the files that have an additional extension, the following glob (NOT a regex, but a wildcard expression) will do:

ifrontThermal64.[0-9][0-9][0-9][0-9][0-9]

E.g.:

printf "%s\n" ifrontThermal64.[0-9][0-9][0-9][0-9][0-9]

Note that globs always match against the entire filename, whereas grep performs substring matching by default.
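
To make the difference concrete, here is a minimal sketch that runs the same five-digit pattern once as a glob and once through grep. The scratch directory and the four sample names are assumptions made up for the demo; they are not your real data.

```shell
#!/usr/bin/env bash
# Sketch: compare whole-name glob matching with grep's substring matching.
# The scratch directory and sample file names exist only for this demo.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch ifrontThermal64.00490 ifrontThermal64.00490.HeI \
      ifrontThermal64.00500 ifrontThermal64.00500.u

# A glob must match the whole name, so names with an extra extension are excluded.
glob_matches=$(printf '%s\n' ifrontThermal64.[0-9][0-9][0-9][0-9][0-9])

# grep matches anywhere in the line, so the longer names are accepted as well.
grep_matches=$(printf '%s\n' * | grep 'ifrontThermal64\.[0-9][0-9][0-9][0-9][0-9]')

printf 'glob:\n%s\n\ngrep:\n%s\n' "$glob_matches" "$grep_matches"
```

The glob lists only the two base files, while the unanchored grep lists all four.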

As for why your approach didn't work:

  • Your regex isn't quoted, so the shell's parsing 'eats' the \, thereby altering it.
  • Also, whether grep recognizes \d is platform-dependent; to rule out such issues, use [0-9] instead.
  • If you use grep without -E, it uses so-called basic regular expressions, which require that the quantifier + be escaped as \+. While you could do that, the generally better option is grep -E, which uses extended regexes that mostly behave like regular expressions in other languages. (egrep is an older name for grep -E; recent GNU grep releases flag it as obsolescent.)
  • A . that is meant as a literal dot should be escaped as \. in a regex (which you did in one of your attempts).
  • The -1 option of ls is implied when ls is not outputting to a terminal.
  • grep uses substring matching by default, so use -x to match against the entire input line (alternatively, use the anchors ^ and $) so as to rule out filenames that match the expression but have an additional extension.
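
Two of the points above can be checked directly in a shell. The following is a small sketch rather than a full diagnosis. It feeds grep two sample names from a variable instead of ls, and the variable names are chosen only for illustration.

```shell
#!/usr/bin/env bash
# 1) Quoting: without quotes the shell strips the backslashes before grep runs.
unquoted=$(printf '%s' ifrontThermal64\.\d+)    # grep would receive: ifrontThermal64.d+
quoted=$(printf '%s' 'ifrontThermal64\.\d+')    # backslashes preserved
printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

# 2) Basic vs. extended regexes: a bare + is a literal character without -E.
names='ifrontThermal64.00490
ifrontThermal64.00490.HeI'

bre=$(printf '%s\n' "$names" | grep -x 'ifrontThermal64\.[0-9]+')    # matches nothing
ere=$(printf '%s\n' "$names" | grep -Ex 'ifrontThermal64\.[0-9]+')   # matches the base name

printf 'without -E: [%s]\nwith -E:    [%s]\n' "$bre" "$ere"
```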

Thus, a corrected version of the original command is:

 ls | grep -Ex 'ifrontThermal64\.[0-9]+'

As an aside: there's no point in enclosing your commands in parentheses; you'll needlessly create subshells (unless they're optimized away, but the point is that they're not needed).
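
Since the list is going to be consumed by a script anyway, one option is to skip ls and grep altogether and filter inside bash. The sketch below is only one possible approach: list_snapshots is a hypothetical helper name, and the echo stands in for whatever plotting command you actually run.

```shell
#!/usr/bin/env bash
# list_snapshots DIR: print the base snapshot file names in DIR, one per line.
# (Hypothetical helper for illustration; not part of any existing tool.)
list_snapshots() {
    local dir=$1 path name
    local re='^ifrontThermal64\.[0-9]+$'    # digits only after the dot, nothing after them
    for path in "$dir"/ifrontThermal64.*; do
        [[ -f $path ]] || continue          # also skips the unexpanded glob when nothing matches
        name=${path##*/}                    # strip the directory part
        if [[ $name =~ $re ]]; then
            printf '%s\n' "$name"
        fi
    done
}

# Usage: replace the echo with the real plotting command.
list_snapshots "${1:-.}" | while IFS= read -r name; do
    echo "would plot: $name"
done
```

Unlike the fixed-length glob, the regex test accepts any number of digits after the dot.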

Other tips

You can use the regex with grep -E (egrep), which uses extended regexes:

ifrontThermal64\.[0-9]+$

The $ is an anchor that matches the end of the string. [0-9] is used rather than \d because grep -E does not portably support \d.

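For your case, there is no need to go through the steps ls -> grep -> plot. You can use find directly; note that -name takes a glob, not a regex, so the digits have to be spelled out:

find . -name 'ifrontThermal64.[0-9][0-9][0-9][0-9][0-9]' -exec plot {} \;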
Licensed under: CC-BY-SA with attribution