Question

I'm only learning to use REGEX, AWK and SED. I currently have a group of files that I'd like to rename - they all sit in one directory.

The naming pattern is consistent, but I would like to re-arrange the filenames, here is the format:

01._HORRIBLE_HISTORIES_S2.mp4
02._HORRIBLE_HISTORIES_S2.mp4

I'd like to rename them to HORRIBLE_HISTORIES_s02e01.mp4, where the e01 is taken from the first column. I think I need to grab "01" from the first column, store it in a variable, and paste it after the S2 in each filename. At the same time I want to remove it from the beginning of the filename along with the "._", and change the "S2" to "s02".

If anyone would be so kind, could you help me write something using awk/sed and explain the procedure, that I might learn from it?


Solution

# Preview only: echo prints each mv command instead of running it.
for f in *.mp4; do
  echo mv "$f" \
    "$(awk -F '[._]' '{ si = sprintf("%02d", substr($5,2));
                          print $3 "_" $4 "_s" si "e" $1 "." $6 }' <<<"$f")"
done
  • Loops over all *.mp4 files.
  • Renames each to the result of the awk command, provided via command substitution ($(...)).
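  • The awk command splits the input filename into tokens on "." or "_" (the first token is available as $1, the second as $2, and so on). Because "." and "_" are adjacent in "01._", $2 is an empty token, which is why the title starts at $3.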
  • First, the number in "_S{number}" is left-padded to 2 digits with a 0 (i.e., a 0 is only prepended if the number doesn't already have 2 digits) and stored in variable si (season index); if it's OK to always prepend 0, the awk "program" can be simplified to: { print $3 "_" $4 "_s0" substr($5,2) "e" $1 "." $6 }
  • The result, along with the remaining tokens, is then rearranged to form the desired filename.

Note the echo before mv to allow you to safely preview the resulting command - remove it to perform actual renaming.
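
To see which tokens awk produces for a given filename, it can help to print them one per line. The following is a minimal sketch; show_fields is a hypothetical helper name, not part of the answer above, and it only prints text without touching any file.

```shell
# show_fields (hypothetical helper): print each field awk produces
# when the field separator is "." or "_".
show_fields() {
  printf '%s\n' "$1" |
    awk -F '[._]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields "01._HORRIBLE_HISTORIES_S2.mp4"
```

For the sample filename this prints six lines, from $1=[01] through $6=[mp4], with an empty $2=[] where "." and "_" sit next to each other.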

Alternative: a pure bash solution using a regular expression:

for f in *.mp4; do
  # Skip any file that does not match the expected pattern.
  [[ $f =~ ^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$ ]] || continue
  echo mv "$f" \
    "${BASH_REMATCH[2]}_s0${BASH_REMATCH[3]}e${BASH_REMATCH[1]}.${BASH_REMATCH[4]}"
done
  • Uses bash's regular-expression matching operator, =~, with capture groups (the substrings in (...)) to match against each filename and extract substrings of interest.
  • The matching results are stored in the special array variable $BASH_REMATCH, with element 0 containing the entire match, 1 containing what matches the first capture group, 2 the second, and so on.
  • The mv command's target argument then assembles the capture-group matches in the desired order; note that in this case, for simplicity, I've made the zero-padding of s{number} unconditional - a 0 is simply prepended.

As above, you need to remove echo before mv to perform actual renaming.
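
The capture-group logic can also be wrapped in a function so it can be tried on individual names before running it over a directory. This is a sketch that requires bash; episode_name is a hypothetical helper name, and like the loop above it assumes a single-digit season and always prepends a 0.

```shell
#!/usr/bin/env bash
# episode_name (hypothetical helper): print the rearranged name, or return 1
# without printing anything if the filename does not match the pattern.
episode_name() {
  local re='^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$'
  [[ $1 =~ $re ]] || return 1
  # BASH_REMATCH[1]=episode, [2]=title, [3]=season number, [4]=extension
  printf '%s_s0%se%s.%s\n' \
    "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[4]}"
}

episode_name "01._HORRIBLE_HISTORIES_S2.mp4"
episode_name "notes.txt" || echo "skipped: notes.txt"
```

The first call prints HORRIBLE_HISTORIES_s02e01.mp4; the second prints the "skipped" message because the name does not match.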

OTHER TIPS

A common way to rename multiple files according to a pattern is the Perl-based rename command. It uses Perl regular expressions and is very powerful. Use -n -v to test the pattern without touching the files:

$ rename -n -v 's/^(\d+)\._(.+)_S2\.mp4/$2_s02e$1.mp4/' *.mp4
01._HORRIBLE_HISTORIES_S2.mp4 renamed as HORRIBLE_HISTORIES_s02e01.mp4
02._HORRIBLE_HISTORIES_S2.mp4 renamed as HORRIBLE_HISTORIES_s02e02.mp4

Use parentheses to capture strings into the variables $1 (first capture), $2 (second capture), and so on:

  • ^(\d+) captures the digits at the beginning of the filename (into $1)
  • \._(.+)_S2\.mp4 captures everything between ._ and _S2.mp4 (into $2); the backslash makes each . match a literal dot
  • $2_s02e$1.mp4 assembles the new filename from the captured data

When you are happy with the result, remove -n from the command and it will rename all the files for real.

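The Perl-based rename is packaged for most Linux distributions, but note that the rename shipped with util-linux is a different tool with a different syntax and does not accept Perl expressions. There is a similar discussion on SO with more details about finding/installing the right command.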
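
If the Perl-based rename is not installed, roughly the same substitution can be previewed with sed. This is a sketch, not a replacement for rename: new_name is a hypothetical helper name, it assumes a sed that supports -E for extended regular expressions, and names that do not match the pattern are printed unchanged.

```shell
# new_name (hypothetical helper): apply the same substitution as the rename
# example, using sed back-references \1 and \2 instead of Perl's $1 and $2.
new_name() {
  printf '%s\n' "$1" | sed -E 's/^([0-9]+)\._(.+)_S2\.mp4$/\2_s02e\1.mp4/'
}

# Preview only: the loop iterates over literal names and just prints commands.
for f in 01._HORRIBLE_HISTORIES_S2.mp4 02._HORRIBLE_HISTORIES_S2.mp4; do
  echo mv "$f" "$(new_name "$f")"
done
```

To use it on real files, replace the literal names with *.mp4 and, once the printed commands look right, remove the echo.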

You can do it with almost pure bash, using substring expansion (offsets are zero-based):

for f in *mp4 ; do
  newfilename="${f:4:18}_s02e${f:0:2}.mp4"
  echo mv "$f" "$newfilename"
done

If the output from this command suits your needs, remove the echo from the loop or, more simply (if the loop above was your last command), run: !! | bash
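
Fixed offsets only work while every title has the same length. A variant that cuts at the delimiters instead can be sketched with pattern-based parameter expansion; rearrange is a hypothetical helper name, and the sketch assumes a single-digit season so that a 0 can always be prepended.

```shell
# rearrange (hypothetical helper): build the new name using only parameter
# expansion, cutting at the delimiters rather than at fixed offsets.
rearrange() {
  f=$1
  episode=${f%%.*}      # text before the first "."       e.g. 01
  rest=${f#*._}         # drop everything through "._"    e.g. HORRIBLE_HISTORIES_S2.mp4
  ext=${rest##*.}       # text after the last "."         e.g. mp4
  stem=${rest%.*}       # drop the extension              e.g. HORRIBLE_HISTORIES_S2
  title=${stem%_S*}     # drop the trailing "_S<number>"  e.g. HORRIBLE_HISTORIES
  season=${stem##*_S}   # text after the last "_S"        e.g. 2
  # Assumption: single-digit season, so a 0 is always prepended.
  printf '%s_s0%se%s.%s\n' "$title" "$season" "$episode" "$ext"
}

rearrange "01._HORRIBLE_HISTORIES_S2.mp4"
```

For the sample filename this prints HORRIBLE_HISTORIES_s02e01.mp4.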

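Using awk: rename each file to its first, second and fourth dot-separated parts. Note that this runs mv for real:

for file in *; do
  # Build the new name from fields 1, 2 and 4 of the dot-separated filename.
  newfile=$(printf '%s\n' "$file" | awk -F . '{ print $1 "." $2 "." $4 }')
  echo "$newfile"
  mv -- "$file" "$newfile"
done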

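Write the filenames to a text file (input.txt here), then use a loop and awk to rename each file:

while IFS= read -r oldname; do
  # First awk: drop the leading "_" and move the episode number to the end.
  # Second awk: keep both title words and turn "S2" into "s02".
  newname=$(awk -F'.' '{ print substr($2, 2) "_e" $1 "." $3 }' <<< "${oldname}" | \
        awk -F'_' '{ print $1 "_" $2 "_s0" substr($3, 2) $4 }');
  mv "${oldname}" "${newname}";
done<input.txt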

If you're willing to use gawk, the regex matching really comes in handy. I find this pipe-based solution a little nicer than worrying about looping constructs.

ls -1 | \
    gawk 'match($0, /.../, a) { printf ... | "sh" } \
    END { close("sh") }'

For ease of reading I've replaced the regex and the mv command with ellipses.

  • Line 1 lists all the file names in the current directory, one line each and pipes that to the gawk command.
  • Line 2 runs the regex match, assigning captured groups to the array variable a. The action converts this into our desired command with printf which is itself piped to sh to execute.
  • Line 3 closes the shell that was implicitly opened when we started piping things to it.

So then you just fill in your regex and command syntax (borrowing from mklement0). For example (LIVE CODE WARNING):

ls -1 | \
    gawk 'match($0, /^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$/, a) { printf "mv %s %s_s0%se%s.%s\n",a[0],a[2],a[3],a[1],a[4] | "sh" } \
    END { close("sh") }'

To preview that command (as you should) you can simply remove the | "sh" from the second line.

Licensed under: CC-BY-SA with attribution