Question

This is a script to concatenate multiple files with the same name pattern into one big file. The folder contains 77k files, and I get "Argument list too long".

There is one file per minute, for example: cartreset-2014-05-08-01-12.log

rm -f /tmp/temp.files
ls -1 /var/log/processing/*.log | \
    xargs -n1 basename > /tmp/temp.files
cat /tmp/temp.files | \
    sed -r "s~(.*)-[0-9]{4}(-[0-9]{2})+\.log~cat /var/log/processing/\1* >> /var/log/processing/\1$(date  +"-%Y-%m-%d-%H-%M").log~" | \
    uniq | \
    sh
cd /var/log/processing
xargs rm -rf < /tmp/temp.files
rm -f /tmp/temp.files

sh: /bin/ls: Argument list too long

What options do I have?


Solution

Your full code is:

rm -f /tmp/temp.files
ls -1 /var/log/processing/*.log | xargs -n1 basename > /tmp/temp.files
cat /tmp/temp.files | sed -r "s~(.*)-[0-9]{4}(-[0-9]{2})+\.log~cat /var/log/processing/\1* >> /var/log/processing/\1$(date  +"-%Y-%m-%d-%H-%M").log~" | uniq | sh
cd /var/log/processing
xargs rm -rf < /tmp/temp.files
rm -f /tmp/temp.files

The problem, though, lies in the ls -1 /var/log/processing/*.log part, so I will skip the rest.

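The shell expands /var/log/processing/*.log into so many file names that, together, they exceed the kernel's limit on the total size of the arguments (plus environment) that can be passed to a newly executed program (ARG_MAX). The shell therefore cannot even start /bin/ls, and reports "Argument list too long".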
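To check this on your own system, here is a minimal sketch assuming bash: getconf ARG_MAX prints the limit in bytes, and a glob assigned to a bash array is expanded entirely inside the shell, so it is not affected by the limit. The helper name count_logs is hypothetical.

```bash
#!/usr/bin/env bash
# Print the limit (in bytes) on arguments plus environment that can be passed
# to a newly executed program; /bin/ls failed because the glob exceeded it.
getconf ARG_MAX

# count_logs DIR (hypothetical helper): count the *.log files in DIR.
# The glob is expanded into an array inside bash and never handed to an
# external program, so ARG_MAX does not apply. The body runs in a subshell,
# so the nullglob setting does not leak into the caller.
count_logs() (
    shopt -s nullglob          # no match -> empty array instead of literal "*.log"
    files=( "$1"/*.log )
    echo "${#files[@]}"
)
```

For example, count_logs /var/log/processing prints the number of log files even in a directory where ls /var/log/processing/*.log fails.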

You can use a find command like this instead:

find /var/log/processing -name "*.log" -exec basename {} \; > /tmp/temp.files

Note that this does not parse the output of ls (see the interesting article "Why you shouldn't parse the output of ls").
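The -exec basename {} \; form starts a separate basename process for each of the 77k files. A batched sketch, assuming a POSIX sh is available: with {} + find passes as many names per call as fit within the argument limit, and the inner sh strips the directory with the ${f##*/} parameter expansion instead of calling basename. The helper name list_log_names is hypothetical, and -maxdepth 1 (supported by GNU and BSD find, though not POSIX) and -type f are additions that restrict the output to regular files directly inside the directory.

```bash
#!/usr/bin/env bash
# list_log_names DIR (hypothetical helper): print the base name of every
# regular *.log file directly inside DIR, one per line.
list_log_names() {
    # "{} +" hands the file names to sh in batches that fit within ARG_MAX;
    # ${f##*/} removes everything up to the last "/", as basename would.
    find "$1" -maxdepth 1 -type f -name '*.log' \
        -exec sh -c 'for f; do printf "%s\n" "${f##*/}"; done' sh {} +
}
```

Usage: list_log_names /var/log/processing > /tmp/temp.files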

OTHER TIPS

Parsing the output of ls is always wrong.

Anyway, the problem is that the pattern expands to so many files that the maximum permissible length of the command line is exceeded. There are two ways to avoid it:

  1. Using a for loop. The pattern in a for loop is expanded internally by the shell and never passed to an external program, so it is not limited by the command-line length:

    for file in /var/log/processing/*.log
    do
        basename "$file"
    done > /tmp/temp.files
    

    (Yes, the output of the loop as a whole can be redirected after the done keyword.) Since xargs -n1 also runs basename once per file, the loop, which additionally avoids starting ls and xargs, is even slightly more efficient.

  2. Using find:

    find /var/log/processing \
        -maxdepth 1 -name '*.log' \
        -printf '%f\n' > /tmp/temp.files
    

    The %f directive prints just the file name, which saves calling basename. The -maxdepth 1 option makes the command equivalent to the pattern, which does not descend into subdirectories; if there are no subdirectories, you can omit it. If there are subdirectories and you do want to include them, you may want %P instead, which prints the path relative to the starting point (/var/log/processing). Note that -printf is a GNU find extension.
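The loop in option 1 still runs basename once per file, and option 2 relies on GNU find. If bash is available, a minimal sketch can strip the directory with the ${file##*/} parameter expansion, so the loop runs without starting any external program. The helper name write_log_names is hypothetical, and the nullglob setting (kept local to the function's subshell) is an addition so that a directory without logs produces no output instead of a literal *.log.

```bash
#!/usr/bin/env bash
# write_log_names DIR (hypothetical helper): print the base name of each
# *.log file in DIR without running basename or any other external command.
# The body runs in a subshell, so the nullglob setting does not leak out.
write_log_names() (
    shopt -s nullglob                 # no matches -> the loop does not run
    for file in "$1"/*.log; do
        printf '%s\n' "${file##*/}"   # drop everything up to the last "/"
    done
)
```

Usage: write_log_names /var/log/processing > /tmp/temp.files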

I would probably try to get rid of the temporary file altogether and do the work in one or more loops.

To get rid of the temporary file, you can use bash arrays:

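cd /var/log/processing || exit 1

logs=( *.log )
prefixes=( "${logs[@]/-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9].log/}" )

date=$(date "+%Y-%m-%d-%H-%M")

printf "%s\n" "${prefixes[@]}" |
sort -u |
while IFS= read -r prefix; do
    # Expand the glob into an array before the output file exists, then let
    # xargs split the list across several cat calls: with many files per
    # prefix, cat "$prefix"* would hit "Argument list too long" again.
    files=( "$prefix"* )
    printf "%s\0" "${files[@]}" | xargs -0 cat -- >> "$prefix-$date.log"
done

# NUL-separated list, so names with spaces or quotes reach rm intact
printf "%s\0" "${logs[@]}" | xargs -0 rm --

cd -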
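A single-pass variant of the same idea, sketched under the assumption that every input name ends in -YYYY-MM-DD-HH-MM.log: each file is appended to its prefix's output file and removed only if cat succeeded, so a failed append does not lose data. The helper merge_logs and its STAMP parameter are hypothetical; it starts one cat per file, trading some speed for simplicity.

```bash
#!/usr/bin/env bash
# merge_logs DIR STAMP (hypothetical helper): append every
# PREFIX-YYYY-MM-DD-HH-MM.log in DIR to PREFIX-STAMP.log, deleting each
# input only after it was appended successfully. Runs in a subshell, so
# the cd and nullglob do not affect the caller.
merge_logs() (
    cd "$1" || exit 1
    shopt -s nullglob
    # Captured before any output file is created, so outputs are not re-read
    logs=( *-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9].log )
    for f in "${logs[@]}"; do
        # Strip the trailing -YYYY-MM-DD-HH-MM.log to get the prefix
        prefix=${f%-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9].log}
        out="$prefix-$2.log"
        [ "$f" = "$out" ] && continue    # never append a file to itself
        cat -- "$f" >> "$out" && rm -- "$f"
    done
)
```

Usage: merge_logs /var/log/processing "$(date +%Y-%m-%d-%H-%M)"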
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow