pt-query-digest for MongoDB profiler logs

hingo's picture

One of my favorite MySQL tools ever is pt-query-digest. It's the tool you use to generate a report of your slow query log (or some other supported sources), and is similar to for example the Query Analyzer in MySQL Enterprise Monitor. Especially in the kind of job I am, where I often just land out of nowhere on a server at the customer site, and need to quickly get an overview of what is going on, this tool is always the first one I run. I will show some examples below.

When I started engaging with MongoDB projects at Nokia, I needed some project of my own to tinker with. So I decided to add support for MongoDB "slow query log". Back then, my first task was to figure out if MongoDB even has such logging for slow queries. Lucky me, it has: it's known as the MongoDB Profiler.

Nokia allowed me to contribute changes back upstream to Percona. In the past weeks I finally had some time to upgrade it to understand the changed log format of MongoDB 2.4. (Earlier versions are no longer supported.)

And therefore, I now proudly present to you, pt-query-digest with MongoDB support: https://gist.github.com/henrikingo/6065357/raw/pt-query-digest-mongo (or branch the whole bzr repo: https://code.launchpad.net/~hingo/percona-toolkit/pqd-mongodb-24 )

Btw, the other reason I like this and other Percona Toolkit tools: They ship with all their dependencies embedded into the same file. This means, when I'm on a customer server without root access, I can just do

wget https://gist.github.com/henrikingo/6065357/raw/pt-query-digest-mongo

...and I'm good to go! Command line tools for the win :-)

Example usage

My pt-query-digest supports reading slow queries from the mongodb log file. To get some queries there, you need to set the threshold to lower than 100 ms. In fact, I like to collect as much as possible, so I set it to -1 ms. (Updated 2013-10-18...)For some reason you cannot set the threshold to 0, but -1 works and has the same effect.

db.setProfilingLevel(1,-1)

Then run pt-query digest with the --type=mongolog option. Using --limit is optional and means the same is in normal pt-query-digest, see --help for other options.

pt-query-digest --type=mongolog --limit=20 < ../mongodb.log

At the top of the output you first get some summary information of all the queries in the log.

# 19.2s user time, 50ms system time, 25.96M rss, 95.67M vsz
# Current date: Tue Jul 23 14:21:53 2013
# Hostname: hingo-sputnik
# Files: STDIN
# Overall: 84.96k total, 9 unique, 0 QPS, 0x concurrency _________________
# Time range: Sat Jul 20 08:08:57 to Jul
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time          2404s     2ms    83ms    28ms    42ms    13ms    33ms
# Rows sent        665.96k       0     100    8.03   97.36   26.46       0
# Rows examine     733.62k       0     200    8.84   97.36   30.16       0
# Query size         3.33M       6     173   41.09   76.28   30.35   31.70
# Key updates         1020       0       1    0.01       0    0.11       0
# Result bytes      97.74M       0  12.91k   1.18k  12.50k   3.47k   92.72
# Rlock time           29s       0    18ms   341us     3ms   972us       0
# Scan & order       6.11k       0       1    0.07    0.99    0.26       0
# Wlock time         392ms       0     7ms     4us       0    63us       0
# Yields                 0       0       0       0       0       0       0

From the above we can already see that we are looking at a fairly well behaved system. For example Execution time for all queries is well below 100 ms (so wouldn't even show up by default in the log). Largest result sizes are just below 13k, there was really little time spent in write lock, and so forth.

Next we get a summary of the breakdown per query. Now we get to the parts where pt-query-digest really shines. The typical use of a slow query log would be to just look at whichever queries happen to be slowest, and then try to fix them. Of course, if you have a query that takes tens of seconds to run, there may or may not be something wrong. But experience shows, your energy is often better spent somewhere else completely, and pt-query-digest will tell you where to focus.

Pt-query-digest instead will de-parameterize all of the queries, so that it can recognize which queries are actually the same, just that the parameters were different. It then sums up and sorts all these queries (by default) by execution time. A typical application will only have between 10 to 100 different queries, so this is the perfect tool to give you an overview of what the app is doing to your database:

# Profile
# Rank Query ID           Response time   Calls R/Call V/M   Item
# ==== ================== =============== ===== ====== ===== =============
#    1 0xA9297704DC845CFE 2354.4260 97.9% 71264 0.0330  0.00 COMMAND sbtest.$cmd
#    2 0x056DECD9CABFBFFA   18.1960  0.8%  4171 0.0044  0.00 COMMAND sbtest.$cmd
#    3 0xF6661101714FC832   17.3660  0.7%  6257 0.0028  0.00 QUERY sbtest.sbtest1
#    4 0x102C8F18966084AF    7.8110  0.3%  1383 0.0056  0.00 UPDATE sbtest.sbtest1
#    5 0x411E607C59AC3D1F    2.1300  0.1%   440 0.0048  0.01 INSERT sbtest.sbtest1
#    6 0xE097817C3B6F6EE3    1.4490  0.1%   565 0.0026  0.00 QUERY sbtest.sbtest1
#    7 0xFC580F8AC66DC809    1.1850  0.0%   442 0.0027  0.00 COMMAND sbtest.$cmd
#    8 0x094C2DDF61C1708A    1.1330  0.0%   253 0.0045  0.01 REMOVE sbtest.sbtest1
#    9 0x9696A6AE83C26BF7    0.5080  0.0%   188 0.0027  0.00 QUERY sbtest.sbtest1

Now, if this was a MySQL based application, I would be very worried with the above results. There is one query that is taking up 98% of the total execution time of everything that happens in the database! It turns out, the benchmark I used to generate some "noise" for this example was set to use WriteConcern j=1, so the top line is simply wait time resulting from getLastError calls:

# Query 1: 0 QPS, 0x concurrency, ID 0xA9297704DC845CFE at byte 8790360 __
# Scores: V/M = 0.00
# Time range: Sat Jul 20 08:08:57 to Jul
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         83   71264
# Exec time     97   2354s     2ms    83ms    33ms    44ms     8ms    34ms
# Rows sent      0       0       0       0       0       0       0       0
# Rows examine   0       0       0       0       0       0       0       0
# Query size    65   2.17M      32      32      32      32       0      32
# Key updates    0       0       0       0       0       0       0       0
# Result bytes   6   5.98M      79     169   88.02   92.72    8.26   92.72
# Rlock time     0       0       0       0       0       0       0       0
# Scan & order   0       0       0       0       0       0       0       0
# Wlock time     0       0       0       0       0       0       0       0
# Yields         0       0       0       0       0       0       0       0
# String:
# Databases    sbtest.$cmd
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  #
#  10ms  ################################################################
# 100ms
#    1s
#  10s+
command sbtest.$cmd { getlasterror: 1, fsync: true }

If your application is doing this, it might not be what you want. As you can see, your clients end up waiting a lot compared to what it actually takes to execute the real queries. But for the MongoDB server this is actually harmless, it's not executing or blocking anything, it's just you that is waiting all the time.

Ok, let's fix that and run pt-query-digest again. We get a different distribution that's more interesting to look at:

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ===============
#    1 0x056DECD9CABFBFFA 18.1960 36.6%  4171 0.0044  0.00 COMMAND sbtest.$cmd
#    2 0xF6661101714FC832 17.3660 34.9%  6257 0.0028  0.00 QUERY sbtest.sbtest1
#    3 0x102C8F18966084AF  7.8110 15.7%  1383 0.0056  0.00 UPDATE sbtest.sbtest1
#    4 0x411E607C59AC3D1F  2.1300  4.3%   440 0.0048  0.01 INSERT sbtest.sbtest1
#    5 0xE097817C3B6F6EE3  1.4490  2.9%   565 0.0026  0.00 QUERY sbtest.sbtest1
#    6 0xFC580F8AC66DC809  1.1850  2.4%   442 0.0027  0.00 COMMAND sbtest.$cmd
#    7 0x094C2DDF61C1708A  1.1330  2.3%   253 0.0045  0.01 REMOVE sbtest.sbtest1
#    8 0x9696A6AE83C26BF7  0.5080  1.0%   188 0.0027  0.00 QUERY sbtest.sbtest1

So it seems we have 2 queries that together consume over 70% of the total execution time spent in the database and the top 3 queries use up 95% of your MongoDB capacity! This means, it makes sense to focus solely on optimizing these top 2-3 queries and ignore everything else.

This is what they look like in more detail:

# Query 1: 0 QPS, 0x concurrency, ID 0x056DECD9CABFBFFA at byte 389327 ___
# Scores: V/M = 0.00
# Time range: Sat Jul 20 08:11:09 to Jul
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         30    4171
# Exec time     36     18s     2ms    46ms     4ms    11ms     4ms     3ms
# Rows sent      0       0       0       0       0       0       0       0
# Rows examine   0       0       0       0       0       0       0       0
# Query size    58 695.74k     167     173  170.81  166.51       0  166.51
# Key updates    0       0       0       0       0       0       0       0
# Result bytes   0 309.57k      76      76      76      76       0      76
# Rlock time    21      6s   357us     7ms     1ms     3ms   930us     1ms
# Scan & order   0       0       0       0       0       0       0       0
# Wlock time     0       0       0       0       0       0       0       0
# Yields         0       0       0       0       0       0       0       0
# String:
# Databases    sbtest.$cmd
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  ################################################################
#  10ms  #####
# 100ms
#    1s
#  10s+
command sbtest.$cmd { aggregate: "sbtest1", pipeline: [ 
{ $match: { _id: { $gte: 8184, $lte: 8283 } } }, 
{ $project: { k: 1, _id: 0 } },
{ $group: { _id: null, average: { $sum: "$k" } } } 
] }

# Query 2: 0 QPS, 0x concurrency, ID 0xF6661101714FC832 at byte 2281363 __
# Scores: V/M = 0.00
# Time range: Sat Jul 20 08:09:10 to Jul
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         45    6257
# Exec time     34     17s     2ms    61ms     3ms     5ms     2ms     2ms
# Rows sent     91 610.80k      97     100   99.96   97.36    0.06   97.36
# Rows examine  92 677.46k      97     200  110.87  192.76   29.75   97.36
# Query size    34 408.30k      63      68   66.82   65.89    0.88   65.89
# Key updates    0       0       0       0       0       0       0       0
# Result bytes  85  78.85M  12.52k  12.91k  12.91k  12.50k    0.00  12.50k
# Rlock time    66     19s     2ms    18ms     3ms     5ms     1ms     3ms
# Scan & order 100   6.11k       1       1       1       1       0       1
# Wlock time     0       0       0       0       0       0       0       0
# Yields         0       0       0       0       0       0       0       0
# String:
# Databases    sbtest.sbtest1
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  ################################################################
#  10ms  #
# 100ms
#    1s
#  10s+
query sbtest.sbtest1 { $query: { _id: { $gte: 7957, $lte: 8056 } }, $orderby: { c: 1 } }

# Query 3: 0 QPS, 0x concurrency, ID 0x102C8F18966084AF at byte 1229838 __
# Scores: V/M = 0.00
# Time range: Sat Jul 20 08:11:13 to Jul
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         10    1383
# Exec time     15      8s     2ms    44ms     6ms    13ms     4ms     4ms
# Rows sent      0       0       0       0       0       0       0       0
# Rows examine   0    1020       0       1    0.74    0.99    0.44    0.99
# Query size     1  17.41k      10      14   12.89   12.54    0.24   12.54
# Key updates  100    1020       0       1    0.74    0.99    0.44    0.99
# Result bytes   0       0       0       0       0       0       0       0
# Rlock time     0       0       0       0       0       0       0       0
# Scan & order   0       0       0       0       0       0       0       0
# Wlock time    57   226ms    19us     7ms   163us   301us   319us   119us
# Yields         0       0       0       0       0       0       0       0
# String:
# Databases    sbtest.sbtest1
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  ################################################################
#  10ms  ##########
# 100ms
#    1s
#  10s+
update sbtest.sbtest1 { _id: 4482 }

As you can see, there is a lot of information presented for each query. And thanks to the wonder of open source and a few regular expressions, I can now benefit from this reporting tool also in my work with MongoDB.

If you want to read up on what each of the columns mean, consult the Percona Toolkit documentation.

I used Tim Callaghan's Sysbench for MongoDB to generate some queries for this demo.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
Anonymous friend's picture

no

Sequence (?\{.*\}) (?.*) locks\(micros\) (?.*) (?\d+)ms/ at ./pt-query-digest-mongo line 9739.

Neeraj's picture

thanks it looks great in your example...need help

I got this error while parsing my log:
Sequence (?\{.*\}) (?.*) locks\(micros\) (?.*) (?\d+)ms/ at /var/lib/./pt-query-digest-mongo line 9739.

Have you seen this before?

Thanks

hingo's picture

No. What MongoDB version are

No. What MongoDB version are you running? The log format changes a lot, my pt-query-digest-mongo only works for MongoDB 2.4.

Neeraj's picture

Hi again, I didn't get

Hi again,

I didn't get notification of your reply so sorry for the delay. I m using 2.4.3. I think I also saw your post on github or google code as in a development phase.

Thanks

Neeraj's picture

Hi again, I didn't get

Hi again,

I didn't get notification of your reply so sorry for the delay. I m using 2.4.3. I think I also saw your post on github or google code as in a development phase.

Thanks

hingo's picture

Hi Neeraj Well, it's a

Hi Neeraj

Well, it's a script, so it's hacky by nature. But it works for me. If you have a mongodb.log that it doesn't work for, email it to me at henrik.ingo@avoinelama.fi and I'll see if I can fix it.

neeraj's picture

it works

hi Henrik,

i was able to make it work on one of my other servers log. I think I have the old version as it is still skipping Remove queries..

# Profile
# Rank Query ID Response time Calls R/Call V/M Item
# ==== ================== ================== ====== ======== ===== =======
# 1 0xC354C34E8454BF3F 4697965.0900 91.0% 15662 299.9595 0.02
# 2 0xAD3C065CA60E7F53 366289.8590 7.1% 449980 0.8140 0.54 UPDATE player.PlayerAttribute
# 3 0x90633CC6352D65DA 95220.4230 1.8% 131960 0.7216 0.39 UPDATE player.PlayerAttribute
# 4 0xE0AAF4BADCA83EC4 407.8860 0.0% 573 0.7118 0.52 UPDATE local.slaves
# 5 0x52A17FB22F5EEEC4 142.8080 0.0% 293 0.4874 0.51
# 6 0x5547DD2E7C1A53F1 33.9470 0.0% 126 0.2694 0.53
# 7 0x2733F52119625936 32.9650 0.0% 128 0.2575 0.56
# 8 0xF0639ECDEE175FDD 25.7950 0.0% 119 0.2168 0.14
# 9 0x65F3C9A522472D93 23.4380 0.0% 124 0.1890 0.14
# 10 0x0EFFA7D2E0EEAC35 12.5790 0.0% 6 2.0965 0.41
# 11 0x827D318DB17D645A 10.8000 0.0% 6 1.8000 0.69
# 12 0x9A6EFF4725382C8D 8.9060 0.0% 4 2.2265 0.55
# 13 0x125828DBCC1D55C1 5.4390 0.0% 3 1.8130 0.04
# 14 0x8249E2526FD48604 5.0430 0.0% 4 1.2607 0.29
# 15 0xD7E3DBD6951BF4F1 4.9370 0.0% 2 2.4685 0.00
# 16 0x9278CA4598BE4247 3.7070 0.0% 2 1.8535 0.06
# 17 0x83472312FE8E43E7 3.5300 0.0% 2 1.7650 0.12
# 18 0xA29BC954CD64A643 2.8940 0.0% 2 1.4470 0.29
# 19 0xFBAC4BDDD57B926D 2.6380 0.0% 3 0.8793 0.29
# 20 0x58B1B7FE44831848 2.4690 0.0% 1 2.4690 0.00
# MISC 0xMISC 13.1780 0.0% 49 0.2689 0.0 <16 ITEMS>

I will download the latest one from launchpad and check again....thanks again.

hingo's picture

You really need to use the

You really need to use the --type=mongolog switch. Now it's trying to parse SQL, that's why remove ops are missing.

neeraj's picture

it works

hi Henrik,

i was able to make it work on one of my other servers log. I think I have the old version as it is still skipping Remove queries..

# Profile
# Rank Query ID Response time Calls R/Call V/M Item
# ==== ================== ================== ====== ======== ===== =======
# 1 0xC354C34E8454BF3F 4697965.0900 91.0% 15662 299.9595 0.02
# 2 0xAD3C065CA60E7F53 366289.8590 7.1% 449980 0.8140 0.54 UPDATE player.PlayerAttribute
# 3 0x90633CC6352D65DA 95220.4230 1.8% 131960 0.7216 0.39 UPDATE player.PlayerAttribute
# 4 0xE0AAF4BADCA83EC4 407.8860 0.0% 573 0.7118 0.52 UPDATE local.slaves
# 5 0x52A17FB22F5EEEC4 142.8080 0.0% 293 0.4874 0.51
# 6 0x5547DD2E7C1A53F1 33.9470 0.0% 126 0.2694 0.53
# 7 0x2733F52119625936 32.9650 0.0% 128 0.2575 0.56
# 8 0xF0639ECDEE175FDD 25.7950 0.0% 119 0.2168 0.14
# 9 0x65F3C9A522472D93 23.4380 0.0% 124 0.1890 0.14
# 10 0x0EFFA7D2E0EEAC35 12.5790 0.0% 6 2.0965 0.41
# 11 0x827D318DB17D645A 10.8000 0.0% 6 1.8000 0.69
# 12 0x9A6EFF4725382C8D 8.9060 0.0% 4 2.2265 0.55
# 13 0x125828DBCC1D55C1 5.4390 0.0% 3 1.8130 0.04
# 14 0x8249E2526FD48604 5.0430 0.0% 4 1.2607 0.29
# 15 0xD7E3DBD6951BF4F1 4.9370 0.0% 2 2.4685 0.00
# 16 0x9278CA4598BE4247 3.7070 0.0% 2 1.8535 0.06
# 17 0x83472312FE8E43E7 3.5300 0.0% 2 1.7650 0.12
# 18 0xA29BC954CD64A643 2.8940 0.0% 2 1.4470 0.29
# 19 0xFBAC4BDDD57B926D 2.6380 0.0% 3 0.8793 0.29
# 20 0x58B1B7FE44831848 2.4690 0.0% 1 2.4690 0.00
# MISC 0xMISC 13.1780 0.0% 49 0.2689 0.0 <16 ITEMS>

I will download the latest one from launchpad and check again....thanks again.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Use [fn]...[/fn] (or <fn>...</fn>) to insert automatically numbered footnotes.
  • Allowed HTML tags: <h1> <h2> <h3> <h4> <p> <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <br> <sup> <div> <blockquote> <pre> <img>
  • Lines and paragraphs break automatically.
  • Web page addresses and e-mail addresses turn into links automatically. (Better URL filter.)

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
8 + 5 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.