When importing with the Piwik (or Matomo) log importer, you can pass the --debug option twice, which makes it print every line it rejects as invalid.
Here is an example script that does this (the regex matches my preferred log format, so adjust it to yours):
python /opt/piwik.git/misc/log-analytics/import_logs.py \
--debug --debug \
--url=$piwik_site \
--log-format-regex='(?P<host>\S+) (?P<ip>\S+) \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "\S+ (?P<path>.*?) \S+" (?P<status>\d+) (?P<length>\d+) "(?P<referrer>.*?)"$' \
--add-sites-new-hosts \
--enable-http-errors \
--enable-http-redirects \
--enable-static \
--strip-query-string \
--show-progress \
--show-progress-delay 2 \
--recorders $cpu \
"$1"
$1 is the name of the file I'm importing from (my Apache, Nginx and Lighttpd boxen all use this same format).
The output will have a few lines that look like this:
2013-09-03 19:42:34,145: [DEBUG] Invalid line detected (line did not match): edoceo.com 10.0.0.1 - [03/Sep/2013:16:41:03 -0700] "GET / HTTP/1.1" 301 - "-" "Some Bad Robot v0.1"
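Those lines show you exactly what was rejected and give you the clues for tuning your regexp. In the line above, for instance, the response length is "-" rather than a number, so (?P<length>\d+) cannot match.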
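To try a tweak without re-running the whole import, you can test the pattern against a rejected line in plain Python. This is only a rough sketch: it assumes the importer applies the pattern with Python's re.match, the parse helper is a name I made up, and TUNED_FORMAT is a hypothetical variant (not something the importer ships) that also accepts "-" as the length.

```python
import re

# Same pattern as passed to --log-format-regex in the script above.
LOG_FORMAT = (
    r'(?P<host>\S+) (?P<ip>\S+) \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\d+) (?P<length>\d+) '
    r'"(?P<referrer>.*?)"$'
)

# Hypothetical tweak: allow "-" as well as digits in the length field.
TUNED_FORMAT = LOG_FORMAT.replace(r'(?P<length>\d+)', r'(?P<length>\d+|-)')

# The line reported by --debug --debug in the output above.
REJECTED = (
    'edoceo.com 10.0.0.1 - [03/Sep/2013:16:41:03 -0700] '
    '"GET / HTTP/1.1" 301 - "-" "Some Bad Robot v0.1"'
)


def parse(pattern, line):
    """Return the named groups if the line matches, otherwise None."""
    match = re.match(pattern, line)
    return match.groupdict() if match else None


print(parse(LOG_FORMAT, REJECTED))    # None: the original pattern rejects it
print(parse(TUNED_FORMAT, REJECTED))  # dict of captured fields
```

Once the tuned pattern matches, check the captured groups as well. With this pattern there is no separate user-agent group, so the trailing user-agent string ends up inside the referrer group.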
I've got a few details of my setup at http://edoceo.com/howto/piwik#import