Question

I need to import old HTTP log files from my Domino web server into Piwik. The problem is the log format when a user is logged in. Example of the normal (good) format:

123.123.123 www.example.com - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810 "" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 234 "" "example"

Example of the bad format, produced when a user is logged in:

123.123.123 www.example.com  "CN=SomeUser/OU=SomeOU/O=SomeO" - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810 "" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 234 "" "example"

I am looking for a Bash one-liner that removes the CN information when it is present.

UPDATE:
This is my solution: a one-liner that imports Domino log files into Piwik. Maybe someday someone will find it and won't have to flip their table.

for i in $(ls -v *.log); do date && echo " processing $i" && echo " " && awk '{sub(/ +"CN=[^"]+" +/," - ")}1' "$i" | grep -v http.monitor | grep -v nagios > "$i.cleanTmp" && python /var/www/piwik/misc/log-analytics/import_logs.py --url=http://127.0.0.1/piwik --idsite=8 "$i.cleanTmp" --dry-run && rm "$i.cleanTmp"; done
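The cleaning step of that loop can be pulled into a small function so it can be tried on a sample before anything is imported. This is a sketch assuming Bash and a POSIX awk; the name clean_domino_log is hypothetical, and the two grep -v calls are merged into a single grep -v -e ... -e ..., which drops exactly the same lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirrors the awk | grep | grep part of the loop above.
# The CN block is replaced with " - " as in the loop, so a logged-in line
# comes out as "host - - [date] ..." (one more "-" field than the good format).
clean_domino_log() {
    # "${1:--}" makes awk read stdin when no file name is given
    awk '{sub(/ +"CN=[^"]+" +/," - ")}1' "${1:--}" |
        grep -v -e 'http.monitor' -e 'nagios'   # drop monitoring requests
}

# Usage example on a throwaway sample file
tmp=$(mktemp)
cat >"$tmp" <<'XXX'
123.123.123 www.example.com  "CN=SomeUser/OU=SomeOU/O=SomeO" - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810
123.123.123 www.example.com - [17/Mar/2013:00:00:40 +0100] "GET /nagios/check HTTP/1.1" 200 12
XXX
clean_domino_log "$tmp"   # only the first line survives, with the CN block replaced
rm -f "$tmp"
```

Inside the loop, the filtering part would then read clean_domino_log "$i" > "$i.cleanTmp".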

Solution

If you need a pure Bash solution, you can do something like this:

Example file

cat >infile <<XXX
123.123.123 www.example.com  "CN=SomeUser/OU=SomeOU/O=SomeO" - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810 "" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 234 "" "example"
XXX

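re=' +"CN=[^"]+" +'    # spaces, "CN=...", spaces
while IFS= read -r x; do
    # replace the matched CN block (taken literally, not as a glob) with one space
    [[ $x =~ $re ]] && x=${x/"${BASH_REMATCH[0]}"/ }
    printf '%s\n' "$x"
done <infile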

Output:

123.123.123 www.example.com - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810 "" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 234 "" "example"

It looks for a string that starts with one or more spaces, followed by "CN=, then any run of non-" characters, then a closing ", then one or more spaces. If this pattern is found, it is replaced with a single space.
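To see exactly what the pattern captures, the match can be printed in brackets so the surrounding spaces are visible. A minimal sketch, assuming Bash 3.2 or later (for [[ =~ ]] and BASH_REMATCH); the shortened sample line is taken from the question.

```shell
#!/usr/bin/env bash
re=' +"CN=[^"]+" +'   # spaces, "CN=", non-quote characters, closing quote, spaces
x='123.123.123 www.example.com  "CN=SomeUser/OU=SomeOU/O=SomeO" - [17/Mar/2013:00:00:39 +0100]'

if [[ $x =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match, including the spaces on both sides
    printf '[%s]\n' "${BASH_REMATCH[0]}"
    # Swapping the match for one space joins the host and the "-" field
    printf '%s\n' "${x/"${BASH_REMATCH[0]}"/ }"
fi
```

This prints [  "CN=SomeUser/OU=SomeOU/O=SomeO" ] followed by 123.123.123 www.example.com - [17/Mar/2013:00:00:39 +0100]. Because the match swallows the spaces on both sides, replacing it with a single space restores the normal spacing.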

If the log files are large (over 1 MB) and this has to be done periodically, use awk instead of the pure Bash solution:

awk '{sub(/ +"CN=[^"]+" +/," ")}1' infile
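In the awk version, sub() edits the line only when the CN block is present, and the trailing 1 is an always-true pattern whose default action prints the line, edited or not. So lines that are already in the good format pass through unchanged. A small check, assuming Bash and any POSIX awk; strip_cn is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk one-liner; reads stdin, writes stdout
strip_cn() {
    # sub() rewrites $0 only on a match; "1" then prints every line
    awk '{sub(/ +"CN=[^"]+" +/," ")}1'
}

good='123.123.123 www.example.com - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810'
bad='123.123.123 www.example.com  "CN=SomeUser/OU=SomeOU/O=SomeO" - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810'

# Both input lines should come out identical to $good
printf '%s\n' "$good" "$bad" | strip_cn
```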

Other tips

So you just want to remove the "CN=SomeUser/OU=SomeOU/O=SomeO" part?

The regex to match that looks like this:

"CN=\w+\/OU=\w+\/O=\w+" 
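This regex only matches when every component is a single run of word characters and all three parts (CN, OU, O) are present. The sketch below compares it with the broader "CN=[^"]+" pattern from the solution above. It assumes Bash, uses [[:alnum:]_] in place of \w (which is not part of POSIX ERE), and the second and third sample values are hypothetical.

```shell
#!/usr/bin/env bash
# The regex above, with [[:alnum:]_] standing in for \w
narrow='"CN=[[:alnum:]_]+/OU=[[:alnum:]_]+/O=[[:alnum:]_]+"'
# The pattern used in the solution: anything up to the closing quote
broad='"CN=[^"]+"'

# Hypothetical user fields: a space in the CN, and a name with no OU part
samples=(
  '"CN=SomeUser/OU=SomeOU/O=SomeO"'
  '"CN=Some User/OU=SomeOU/O=SomeO"'
  '"CN=SomeUser/O=SomeO"'
)

for s in "${samples[@]}"; do
    n=no; b=no
    [[ $s =~ $narrow ]] && n=yes
    [[ $s =~ $broad ]] && b=yes
    printf '%-34s narrow=%s broad=%s\n' "$s" "$n" "$b"
done
```

Only the first sample matches the narrow pattern; the broad one matches all three.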
Licensed under: CC-BY-SA with attribution