One of my favorite MySQL tools ever is pt-query-digest. It's the tool you use to generate a report of your slow query log (or some other supported sources), and is similar to for example the Query Analyzer in MySQL Enterprise Monitor. Especially in the kind of job I am, where I often just land out of nowhere on a server at the customer site, and need to quickly get an overview of what is going on, this tool is always the first one I run. I will show some examples below.
When I started engaging with MongoDB projects at Nokia, I needed some project of my own to tinker with. So I decided to add support for MongoDB "slow query log". Back then, my first task was to figure out if MongoDB even has such logging for slow queries. Lucky me, it has: it's known as the MongoDB Profiler.
Nokia allowed me to contribute changes back upstream to Percona. In the past weeks I finally had some time to upgrade it to understand the changed log format of MongoDB 2.4. (Earlier versions are no longer supported.)
And therefore, I now proudly present to you, pt-query-digest with MongoDB support: https://gist.github.com/henrikingo/6065357/raw/pt-query-digest-mongo (or branch the whole bzr repo: https://code.launchpad.net/~hingo/percona-toolkit/pqd-mongodb-24 )
Btw, the other reason I like this and other Percona Toolkit tools: They ship with all their dependencies embedded into the same file. This means, when I'm on a customer server without root access, I can just do
wget https://gist.github.com/henrikingo/6065357/raw/pt-query-digest-mongo
...and I'm good to go! Command line tools for the win :-)
Example usage
My pt-query-digest supports reading slow queries from the mongodb log file. To get some queries there, you need to set the threshold to lower than 100 ms. In fact, I like to collect as much as possible, so I set it to -1 ms. (Updated 2013-10-18...)For some reason you cannot set the threshold to 0, but -1 works and has the same effect.
db.setProfilingLevel(1,-1)
Then run pt-query digest with the --type=mongolog option. Using --limit is optional and means the same is in normal pt-query-digest, see --help for other options.
pt-query-digest --type=mongolog --limit=20 At the top of the output you first get some summary information of all the queries in the log.# 19.2s user time, 50ms system time, 25.96M rss, 95.67M vsz # Current date: Tue Jul 23 14:21:53 2013 # Hostname: hingo-sputnik # Files: STDIN # Overall: 84.96k total, 9 unique, 0 QPS, 0x concurrency _________________ # Time range: Sat Jul 20 08:08:57 to Jul # Attribute total min max avg 95% stddev median # ============ ======= ======= ======= ======= ======= ======= ======= # Exec time 2404s 2ms 83ms 28ms 42ms 13ms 33ms # Rows sent 665.96k 0 100 8.03 97.36 26.46 0 # Rows examine 733.62k 0 200 8.84 97.36 30.16 0 # Query size 3.33M 6 173 41.09 76.28 30.35 31.70 # Key updates 1020 0 1 0.01 0 0.11 0 # Result bytes 97.74M 0 12.91k 1.18k 12.50k 3.47k 92.72 # Rlock time 29s 0 18ms 341us 3ms 972us 0 # Scan & order 6.11k 0 1 0.07 0.99 0.26 0 # Wlock time 392ms 0 7ms 4us 0 63us 0 # Yields 0 0 0 0 0 0 0From the above we can already see that we are looking at a fairly well behaved system. For example Execution time for all queries is well below 100 ms (so wouldn't even show up by default in the log). Largest result sizes are just below 13k, there was really little time spent in write lock, and so forth.
Next we get a summary of the breakdown per query. Now we get to the parts where pt-query-digest really shines. The typical use of a slow query log would be to just look at whichever queries happen to be slowest, and then try to fix them. Of course, if you have a query that takes tens of seconds to run, there may or may not be something wrong. But experience shows, your energy is often better spent somewhere else completely, and pt-query-digest will tell you where to focus.
Pt-query-digest instead will de-parameterize all of the queries, so that it can recognize which queries are actually the same, just that the parameters were different. It then sums up and sorts all these queries (by default) by execution time. A typical application will only have between 10 to 100 different queries, so this is the perfect tool to give you an overview of what the app is doing to your database:
# Profile # Rank Query ID Response time Calls R/Call V/M Item # ==== ================== =============== ===== ====== ===== ============= # 1 0xA9297704DC845CFE 2354.4260 97.9% 71264 0.0330 0.00 COMMAND sbtest.$cmd # 2 0x056DECD9CABFBFFA 18.1960 0.8% 4171 0.0044 0.00 COMMAND sbtest.$cmd # 3 0xF6661101714FC832 17.3660 0.7% 6257 0.0028 0.00 QUERY sbtest.sbtest1 # 4 0x102C8F18966084AF 7.8110 0.3% 1383 0.0056 0.00 UPDATE sbtest.sbtest1 # 5 0x411E607C59AC3D1F 2.1300 0.1% 440 0.0048 0.01 INSERT sbtest.sbtest1 # 6 0xE097817C3B6F6EE3 1.4490 0.1% 565 0.0026 0.00 QUERY sbtest.sbtest1 # 7 0xFC580F8AC66DC809 1.1850 0.0% 442 0.0027 0.00 COMMAND sbtest.$cmd # 8 0x094C2DDF61C1708A 1.1330 0.0% 253 0.0045 0.01 REMOVE sbtest.sbtest1 # 9 0x9696A6AE83C26BF7 0.5080 0.0% 188 0.0027 0.00 QUERY sbtest.sbtest1Now, if this was a MySQL based application, I would be very worried with the above results. There is one query that is taking up 98% of the total execution time of everything that happens in the database! It turns out, the benchmark I used to generate some "noise" for this example was set to use WriteConcern j=1, so the top line is simply wait time resulting from getLastError calls:
# Query 1: 0 QPS, 0x concurrency, ID 0xA9297704DC845CFE at byte 8790360 __ # Scores: V/M = 0.00 # Time range: Sat Jul 20 08:08:57 to Jul # Attribute pct total min max avg 95% stddev median # ============ === ======= ======= ======= ======= ======= ======= ======= # Count 83 71264 # Exec time 97 2354s 2ms 83ms 33ms 44ms 8ms 34ms # Rows sent 0 0 0 0 0 0 0 0 # Rows examine 0 0 0 0 0 0 0 0 # Query size 65 2.17M 32 32 32 32 0 32 # Key updates 0 0 0 0 0 0 0 0 # Result bytes 6 5.98M 79 169 88.02 92.72 8.26 92.72 # Rlock time 0 0 0 0 0 0 0 0 # Scan & order 0 0 0 0 0 0 0 0 # Wlock time 0 0 0 0 0 0 0 0 # Yields 0 0 0 0 0 0 0 0 # String: # Databases sbtest.$cmd # Query_time distribution # 1us # 10us # 100us # 1ms # # 10ms ################################################################ # 100ms # 1s # 10s+ command sbtest.$cmd { getlasterror: 1, fsync: true }If your application is doing this, it might not be what you want. As you can see, your clients end up waiting a lot compared to what it actually takes to execute the real queries. But for the MongoDB server this is actually harmless, it's not executing or blocking anything, it's just you that is waiting all the time.
Ok, let's fix that and run pt-query-digest again. We get a different distribution that's more interesting to look at:
# Profile # Rank Query ID Response time Calls R/Call V/M Item # ==== ================== ============= ===== ====== ===== =============== # 1 0x056DECD9CABFBFFA 18.1960 36.6% 4171 0.0044 0.00 COMMAND sbtest.$cmd # 2 0xF6661101714FC832 17.3660 34.9% 6257 0.0028 0.00 QUERY sbtest.sbtest1 # 3 0x102C8F18966084AF 7.8110 15.7% 1383 0.0056 0.00 UPDATE sbtest.sbtest1 # 4 0x411E607C59AC3D1F 2.1300 4.3% 440 0.0048 0.01 INSERT sbtest.sbtest1 # 5 0xE097817C3B6F6EE3 1.4490 2.9% 565 0.0026 0.00 QUERY sbtest.sbtest1 # 6 0xFC580F8AC66DC809 1.1850 2.4% 442 0.0027 0.00 COMMAND sbtest.$cmd # 7 0x094C2DDF61C1708A 1.1330 2.3% 253 0.0045 0.01 REMOVE sbtest.sbtest1 # 8 0x9696A6AE83C26BF7 0.5080 1.0% 188 0.0027 0.00 QUERY sbtest.sbtest1So it seems we have 2 queries that together consume over 70% of the total execution time spent in the database and the top 3 queries use up 95% of your MongoDB capacity! This means, it makes sense to focus solely on optimizing these top 2-3 queries and ignore everything else.
This is what they look like in more detail:
# Query 1: 0 QPS, 0x concurrency, ID 0x056DECD9CABFBFFA at byte 389327 ___ # Scores: V/M = 0.00 # Time range: Sat Jul 20 08:11:09 to Jul # Attribute pct total min max avg 95% stddev median # ============ === ======= ======= ======= ======= ======= ======= ======= # Count 30 4171 # Exec time 36 18s 2ms 46ms 4ms 11ms 4ms 3ms # Rows sent 0 0 0 0 0 0 0 0 # Rows examine 0 0 0 0 0 0 0 0 # Query size 58 695.74k 167 173 170.81 166.51 0 166.51 # Key updates 0 0 0 0 0 0 0 0 # Result bytes 0 309.57k 76 76 76 76 0 76 # Rlock time 21 6s 357us 7ms 1ms 3ms 930us 1ms # Scan & order 0 0 0 0 0 0 0 0 # Wlock time 0 0 0 0 0 0 0 0 # Yields 0 0 0 0 0 0 0 0 # String: # Databases sbtest.$cmd # Query_time distribution # 1us # 10us # 100us # 1ms ################################################################ # 10ms ##### # 100ms # 1s # 10s+ command sbtest.$cmd { aggregate: "sbtest1", pipeline: [ { $match: { _id: { $gte: 8184, $lte: 8283 } } }, { $project: { k: 1, _id: 0 } }, { $group: { _id: null, average: { $sum: "$k" } } } ] } # Query 2: 0 QPS, 0x concurrency, ID 0xF6661101714FC832 at byte 2281363 __ # Scores: V/M = 0.00 # Time range: Sat Jul 20 08:09:10 to Jul # Attribute pct total min max avg 95% stddev median # ============ === ======= ======= ======= ======= ======= ======= ======= # Count 45 6257 # Exec time 34 17s 2ms 61ms 3ms 5ms 2ms 2ms # Rows sent 91 610.80k 97 100 99.96 97.36 0.06 97.36 # Rows examine 92 677.46k 97 200 110.87 192.76 29.75 97.36 # Query size 34 408.30k 63 68 66.82 65.89 0.88 65.89 # Key updates 0 0 0 0 0 0 0 0 # Result bytes 85 78.85M 12.52k 12.91k 12.91k 12.50k 0.00 12.50k # Rlock time 66 19s 2ms 18ms 3ms 5ms 1ms 3ms # Scan & order 100 6.11k 1 1 1 1 0 1 # Wlock time 0 0 0 0 0 0 0 0 # Yields 0 0 0 0 0 0 0 0 # String: # Databases sbtest.sbtest1 # Query_time distribution # 1us # 10us # 100us # 1ms ################################################################ # 10ms # # 100ms # 1s # 10s+ query sbtest.sbtest1 { $query: { _id: { $gte: 7957, $lte: 8056 } }, $orderby: { c: 1 } } # Query 3: 0 QPS, 0x concurrency, ID 0x102C8F18966084AF at byte 1229838 __ # Scores: V/M = 0.00 # Time range: Sat Jul 20 08:11:13 to Jul # Attribute pct total min max avg 95% stddev median # ============ === ======= ======= ======= ======= ======= ======= ======= # Count 10 1383 # Exec time 15 8s 2ms 44ms 6ms 13ms 4ms 4ms # Rows sent 0 0 0 0 0 0 0 0 # Rows examine 0 1020 0 1 0.74 0.99 0.44 0.99 # Query size 1 17.41k 10 14 12.89 12.54 0.24 12.54 # Key updates 100 1020 0 1 0.74 0.99 0.44 0.99 # Result bytes 0 0 0 0 0 0 0 0 # Rlock time 0 0 0 0 0 0 0 0 # Scan & order 0 0 0 0 0 0 0 0 # Wlock time 57 226ms 19us 7ms 163us 301us 319us 119us # Yields 0 0 0 0 0 0 0 0 # String: # Databases sbtest.sbtest1 # Query_time distribution # 1us # 10us # 100us # 1ms ################################################################ # 10ms ########## # 100ms # 1s # 10s+ update sbtest.sbtest1 { _id: 4482 }As you can see, there is a lot of information presented for each query. And thanks to the wonder of open source and a few regular expressions, I can now benefit from this reporting tool also in my work with MongoDB.
If you want to read up on what each of the columns mean, consult the Percona Toolkit documentation.
I used Tim Callaghan's Sysbench for MongoDB to generate some queries for this demo.
- Log in to post comments
- 50123 views
no
Sequence (?\{.*\}) (?.*) locks\(micros\) (?.*) (?\d+)ms/ at ./pt-query-digest-mongo line 9739.
thanks it looks great in your example...need help
I got this error while parsing my log:
Sequence (?\{.*\}) (?.*) locks\(micros\) (?.*) (?\d+)ms/ at /var/lib/./pt-query-digest-mongo line 9739.
Have you seen this before?
Thanks
No. What MongoDB version are
No. What MongoDB version are you running? The log format changes a lot, my pt-query-digest-mongo only works for MongoDB 2.4.
Hi again, I didn't get
Hi again,
I didn't get notification of your reply so sorry for the delay. I m using 2.4.3. I think I also saw your post on github or google code as in a development phase.
Thanks
Hi again, I didn't get
Hi again,
I didn't get notification of your reply so sorry for the delay. I m using 2.4.3. I think I also saw your post on github or google code as in a development phase.
Thanks
Hi Neeraj Well, it's a
Hi Neeraj
Well, it's a script, so it's hacky by nature. But it works for me. If you have a mongodb.log that it doesn't work for, email it to me at henrik.ingo@avoinelama.fi and I'll see if I can fix it.
it works
hi Henrik,
i was able to make it work on one of my other servers log. I think I have the old version as it is still skipping Remove queries..
# Profile
# Rank Query ID Response time Calls R/Call V/M Item
# ==== ================== ================== ====== ======== ===== =======
# 1 0xC354C34E8454BF3F 4697965.0900 91.0% 15662 299.9595 0.02
# 2 0xAD3C065CA60E7F53 366289.8590 7.1% 449980 0.8140 0.54 UPDATE player.PlayerAttribute
# 3 0x90633CC6352D65DA 95220.4230 1.8% 131960 0.7216 0.39 UPDATE player.PlayerAttribute
# 4 0xE0AAF4BADCA83EC4 407.8860 0.0% 573 0.7118 0.52 UPDATE local.slaves
# 5 0x52A17FB22F5EEEC4 142.8080 0.0% 293 0.4874 0.51
# 6 0x5547DD2E7C1A53F1 33.9470 0.0% 126 0.2694 0.53
# 7 0x2733F52119625936 32.9650 0.0% 128 0.2575 0.56
# 8 0xF0639ECDEE175FDD 25.7950 0.0% 119 0.2168 0.14
# 9 0x65F3C9A522472D93 23.4380 0.0% 124 0.1890 0.14
# 10 0x0EFFA7D2E0EEAC35 12.5790 0.0% 6 2.0965 0.41
# 11 0x827D318DB17D645A 10.8000 0.0% 6 1.8000 0.69
# 12 0x9A6EFF4725382C8D 8.9060 0.0% 4 2.2265 0.55
# 13 0x125828DBCC1D55C1 5.4390 0.0% 3 1.8130 0.04
# 14 0x8249E2526FD48604 5.0430 0.0% 4 1.2607 0.29
# 15 0xD7E3DBD6951BF4F1 4.9370 0.0% 2 2.4685 0.00
# 16 0x9278CA4598BE4247 3.7070 0.0% 2 1.8535 0.06
# 17 0x83472312FE8E43E7 3.5300 0.0% 2 1.7650 0.12
# 18 0xA29BC954CD64A643 2.8940 0.0% 2 1.4470 0.29
# 19 0xFBAC4BDDD57B926D 2.6380 0.0% 3 0.8793 0.29
# 20 0x58B1B7FE44831848 2.4690 0.0% 1 2.4690 0.00
# MISC 0xMISC 13.1780 0.0% 49 0.2689 0.0 <16 ITEMS>
I will download the latest one from launchpad and check again....thanks again.
You really need to use the
You really need to use the --type=mongolog switch. Now it's trying to parse SQL, that's why remove ops are missing.
it works
hi Henrik,
i was able to make it work on one of my other servers log. I think I have the old version as it is still skipping Remove queries..
# Profile
# Rank Query ID Response time Calls R/Call V/M Item
# ==== ================== ================== ====== ======== ===== =======
# 1 0xC354C34E8454BF3F 4697965.0900 91.0% 15662 299.9595 0.02
# 2 0xAD3C065CA60E7F53 366289.8590 7.1% 449980 0.8140 0.54 UPDATE player.PlayerAttribute
# 3 0x90633CC6352D65DA 95220.4230 1.8% 131960 0.7216 0.39 UPDATE player.PlayerAttribute
# 4 0xE0AAF4BADCA83EC4 407.8860 0.0% 573 0.7118 0.52 UPDATE local.slaves
# 5 0x52A17FB22F5EEEC4 142.8080 0.0% 293 0.4874 0.51
# 6 0x5547DD2E7C1A53F1 33.9470 0.0% 126 0.2694 0.53
# 7 0x2733F52119625936 32.9650 0.0% 128 0.2575 0.56
# 8 0xF0639ECDEE175FDD 25.7950 0.0% 119 0.2168 0.14
# 9 0x65F3C9A522472D93 23.4380 0.0% 124 0.1890 0.14
# 10 0x0EFFA7D2E0EEAC35 12.5790 0.0% 6 2.0965 0.41
# 11 0x827D318DB17D645A 10.8000 0.0% 6 1.8000 0.69
# 12 0x9A6EFF4725382C8D 8.9060 0.0% 4 2.2265 0.55
# 13 0x125828DBCC1D55C1 5.4390 0.0% 3 1.8130 0.04
# 14 0x8249E2526FD48604 5.0430 0.0% 4 1.2607 0.29
# 15 0xD7E3DBD6951BF4F1 4.9370 0.0% 2 2.4685 0.00
# 16 0x9278CA4598BE4247 3.7070 0.0% 2 1.8535 0.06
# 17 0x83472312FE8E43E7 3.5300 0.0% 2 1.7650 0.12
# 18 0xA29BC954CD64A643 2.8940 0.0% 2 1.4470 0.29
# 19 0xFBAC4BDDD57B926D 2.6380 0.0% 3 0.8793 0.29
# 20 0x58B1B7FE44831848 2.4690 0.0% 1 2.4690 0.00
# MISC 0xMISC 13.1780 0.0% 49 0.2689 0.0 <16 ITEMS>
I will download the latest one from launchpad and check again....thanks again.
Hi, your tool is very easy to
Hi, your tool is very easy to use, it helped me a lot, thank you very much. However, I found that it could not analyze mongodb2.6's log,can you improve it? Thanks again.
Hi Hisoka This tool is no
Hi Hisoka
This tool is no longer maintained. Much of the same functionality is available in the mloginfo tool here: https://github.com/rueckstiess/mtools
Hi Hisoka Can this tool
Hi Hisoka
Can this tool support mongodb 3.0.7 version profile log? And then, will you continue to develop this tool to adapt latest version?
Thank you so much.
I have miss out your reply to
I have miss out your reply to others, and it's so a pity.
Thank you for your dedication once again.