Question

Analyse Queries with Pt-query-digest

I am using pt-query-digest to analyse queries and would like to analyse a specific time range only, in order to measure the effect of adding new indices and tuning the server.

The Query log is of course turned on (MySQL 5.6) and queries are written to the log files as expected. I transfer the log files to my local machine and concatenate all compressed log files into one file like this:

zcat -f $(ls -tr mysql-slow.log*) > MySQL-All-Slowlogs.log

When I then execute pt-query-digest without any parameters, I get the output as expected:

pt-query-digest MySQL-All-Slowlogs.log > output.log

The date range in this output file is the whole range that is covered within the log files and the header states this:

# 490ms user time, 10ms system time, 35.09M rss, 103.70M vsz
# Current date: Fri Apr  7 10:58:38 2017
# Hostname: LocalMachine
# Files: MySQL-All-Slowlogs.log
# Overall: 2.33k total, 125 unique, 0.00 QPS, 0.05x concurrency __________
# Time range: 2017-03-31 03:16:49 to 2017-04-07 06:00:10
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======

Thats fine. Now I would like to investigate on the queries from the last two days, for instance

Problem

Pt-query-digest does not respect date ranges I provide, I still get the full list starting end of march (2017-03-31)

# Gives all queries
pt-query-digest --since 2d MySQL-All-Slowlogs.log > output.log 
# Also all queries
pt-query-digest --since '2017-04-05'  MySQL-All-Slowlogs.log > output.log

And specifying the until date gives no results at all

pt-query-digest --since '2017-04-05' --until '2017-04-07'  MySQL-All-Slowlogs.log > output.log 

# No events processed.

I also provided a DSN with the same results while connecting via I an SSH tunnel

  pt-query-digest --since '2017-04-05' h=127.0.0.1,P=$PORT,u=$USER,p=$PASSWORD MySQL-All-Slowlogs.log > output.log

Expected Result

The documentation says this:

      --since
type: string

Parse only queries newer than this value (parse queries since this date).

This option allows you to ignore queries older than a certain value and parse only those queries which are more recent than the value. The value can be several types:

* Simple time value N with optional suffix: N[shmd], where
  s=seconds, h=hours, m=minutes, d=days (default s if no suffix
  given); this is like saying "since N[shmd] ago"
* Full date with optional hours:minutes:seconds:
  YYYY-MM-DD [HH:MM:SS]
* Short, MySQL-style date:
  YYMMDD [HH:MM:SS]
* Any time expression evaluated by MySQL:
  CURRENT_DATE - INTERVAL 7 DAY
If you give a MySQL time expression, and you have not also specified a DSN for --explain, --processlist, or --review, then you must specify a DSN on the command line so that pt-query-digest can connect to MySQL to evaluate the expression.

The MySQL time expression is wrapped inside a query like “SELECT UNIX_TIMESTAMP(<expression>)”, so be sure that the expression is valid inside this query. For example, do not use UNIX_TIMESTAMP() because UNIX_TIMESTAMP(UNIX_TIMESTAMP()) returns 0.

Events are assumed to be in chronological: older events at the beginning of the log and newer events at the end of the log. --since is strict: it ignores all queries until one is found that is new enough. Therefore, if the query events are not consistently timestamped, some may be ignored which are actually new enough.

What am I missing here?

Thanks!

Was it helpful?

Solution

Preserve the correct sequence

I finally figured out the problem. The way how I extract all the gzipped log files and add them to a consolidated log file did not preserve the correct sequence and therefore pt-query-digest did not find the queries in the order it was expecting them.

The manual even says:

Events are assumed to be in chronological: older events at the beginning of the log and newer events at the end of the log

Solution

The solution is to process the logrotate files in reverse order in order to end up with a logfile that has the oldest logs first and the newest ones at the end of the file.

ls mysql-slow.log* |sort -nr -t . -k 3,3 | xargs zcat -f > MySQL-All-Slowlogs.log

The command above does the following things:

  1. ls: list all files starting with the file name mysql-slow.log.*
  2. sort -nr -t: sort based on a numeric literal (otherwise the sequence number of the file will be interpreted as string alphabetically, sorting 111 before 2. -r reverses the order. -t uses "." as field seperator. k specifies the key to sort on.
  3. xargs: For each line cat the zipped or unzipped slow log file to the culmulative file.

Then you can use the following comand to obtain

 pt-query-digest --since '2017-04-06 12:00:00' --until '2017-04-09 12:00:00'  MySQL-All-Slowlogs.log  > output.txt
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top