One-liner for condensing sysbench output into a csv file

An important part of benchmarking is to draw graphs. A graph can reveal results you wouldn't have spotted just by looking at raw numbers. By the way, the process of massaging the raw numbers into graphs will often reveal things too.

Sysbench output tends to be quite wordy, especially when you have a script that runs 1, 2, 4, 8... threads with the same test. To manually copy paste the numbers into a spreadsheet is tiresome. So I came up with this monster shell one-liner to condense the output into a csv file. I'm posting it here so I will find it the next time I need it:

cat sysbench.log | egrep " cat|threads:|transactions|deadlocks|read/write|min:|avg:|max:|percentile:" | tr -d "\n" | sed 's/Number of threads: /\n/g' | sed 's/\[/\n/g' | sed 's/[A-Za-z\/]\{1,\}://g'| sed 's/ \.//g' | sed -e 's/read\/write//g' -e 's/approx\.  95//g' -e 's/per sec.)//g' -e 's/ms//g' -e 's/(//g' -e 's/^.*cat //g' | sed 's/ \{1,\}/\t/g' > sysbench.csv

Example input:

[sqlclust@esitbi126lab sysbench]$ cat mysql51
sysbench 0.5: multi-threaded system evaluation benchmark
 
Running the test with following options:
Number of threads: 2
Random number generator seed is 0 and will be ignored
 
 
Threads started!
 
OLTP test statistics:
queries performed:
read: 702226
write: 200636
other: 100318
total: 1003180
transactions: 50159 (334.38 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 902862 (6018.90 per sec.)
other operations: 100318 (668.77 per sec.)
...

Example output:

mysql51,
2,50159,334.38,0,0.00,902862,6018.90,4.48,5.98,24.08,6.88
8,159546,1063.60,0,0.00,2871828,19144.76,5.25,7.52,24.88,9.74
16,243102,1620.59,0,0.00,4375836,29170.55,6.04,9.87,646.63,13.62
...

Explanation:

egrep " cat|threads:|transactions|deadlocks|read/write|min:|avg:|max:|percentile:"

List of words for the statistics I want to filter. "cat" is to find the line which is a delimiter between runs. In my case I produce one big logfile of sysbench output, where different runs are delimited with a shell prompt and commands "cat mysql51" or "cat mysql55". The filenames mysql51 and mysql55 are kept as descriptive names for the csv tables. If you collect statistics in some other format, you could replace cat with some other keyword.

tr -d "\n" | sed 's/Number of threads: /\n/g' | sed 's/\[/\n/g'

Delete all newlines. Add a newline in front of the number of threads. Add a newline in front of the shell command prompt. (The last regexp could really also be just "cat", I suppose.)

sed 's/[A-Za-z\/]\{1,\}://g'| sed 's/ \.//g'

Remove all text followed by colon, such as "deadlocks:". Remove a few stray dots that are not a decimal dot but just floating around amidst white space.

sed -e 's/read\/write//g' -e 's/approx\. 95//g' -e 's/per sec.)//g' -e 's/ms//g' -e 's/(//g' -e 's/^.*cat //g'

Remove more text not caught by the previous.

sed 's/ \{1,\}/\t/g'

Replace whitespace with a tab "\t". If you want your csv delimiter to be something else, use that character here instead of \t.

Thank you very much for for sharing this, I had a very very lengthy time compiling sysbench results for charting months ago. This will make the job alot easier :)

Of course as Baron said, having better or alternative output formats for sysbench would be great as well.

Henrik, good one-liner there. In regard to your statement "The other thing you always need to do for sysbench is to write a script that will run it in a loop for an increasing number of threads." - this has already been taken care of and will even generate graphs from the sysbench output which you can then load as HTML (it generates CSV and the required HTML+JS to view the graphs). See the google code here: http://code.google.com/p/quadrant-framework/

There are a lot of features for loop control, including the ability to run os-system commands in between loops, control threads increments, change mysql dynamic variables in between loops, and you can save your tests as templates for later use.

Derek Downey (not verified)

Thu, 2011-10-20 21:35

Thanks for sharing this oneliner. Just started investigating sysbench and was going to start working on a solution. Glad you and Google beat me to it.

Also note the quadrant framework in previous comment. Didn't try it myself yet, but if it even draws the graphs for you, sounds like heaven!

Gaurav Tomar (not verified)

Tue, 2022-11-22 13:58

Thanks, for inspiring, modified a bit as per my use case, try and let me know.

cat sysbench.log | egrep " cat|threads:|transactions|queries:|deadlocks|read:|write:|ignored errors:|reconnects:|total time:|min:|avg:|max:|percentile:"|awk -F':' '{print $2}'|sed 's/[a-zA-Z(]//g'|sed 's/.)//g'|xargs -n16|cut -d' ' -f1-8,11-|sed 's/ /,/g'|sed '1s/^/Threads,Reads,Writes,Transactions,TPS,Queries,QPS,Ignored Errors,Reconnects,Total Time,Min Lat,Avg Lat,Max La,95th Pct Lat\n/'

About the bookAbout this siteAcademicAccordAmazonAppleBeginnersBooksBuildBotBusiness modelsbzrCassandraCloudcloud computingclsCommunitycommunityleadershipsummitConsistencycoodiaryCopyrightCreative CommonscssDatabasesdataminingDatastaxDevOpsDistributed ConsensusDrizzleDrupalEconomyelectronEthicsEurovisionFacebookFrosconFunnyGaleraGISgithubGnomeGovernanceHandlerSocketHigh AvailabilityimpressionistimpressjsInkscapeInternetJavaScriptjsonKDEKubuntuLicensingLinuxMaidanMaker cultureMariaDBmarkdownMEAN stackMepSQLMicrosoftMobileMongoDBMontyProgramMusicMySQLMySQL ClusterNerdsNodeNoSQLNyrkiöodbaOpen ContentOpen SourceOpenSQLCampOracleOSConPAMPParkinsonPatentsPerconaperformancePersonalPhilosophyPHPPiratesPlanetDrupalPoliticsPostgreSQLPresalespresentationsPress releasesProgrammingRed HatReplicationSeveralninesSillySkySQLSolonStartupsSunSybaseSymbiansysbenchtalksTechnicalTechnologyThe making ofTransactionsTungstenTwitterUbuntuvolcanoWeb2.0WikipediaWork from HomexmlYouTube