performance

A scalability model for Cassandra

One thing that struck me when reading up on Cassandra is the very strong mindset in the Cassandra community around linear scalability, and therefore around primary-key-based data models. De-normalizing your data, for example by using materialized views, is considered a best practice.

However, de-normalization has challenges of its own. Both Cassandra-managed materialized views and any denormalization managed on the application side run the risk of becoming inconsistent. And of course it means you're multiplying your database size.

Automated System Performance Testing at MongoDB - the end of a trilogy

We are pleased to announce that our paper Automated System Performance Testing at MongoDB will be presented at DBTest 2020, a workshop held in conjunction with SIGMOD/PODS. It is available today on arXiv.org.

This paper presents the framework we developed to do automated performance testing on realistic MongoDB clusters. We have used and evolved this system to run hundreds of benchmarks every day as part of our Continuous Integration system. In conjunction with publishing this paper, we have also finally open sourced the Python code for this framework. The framework is called Distributed Systems Infrastructure 2.0, or DSI for short.

Writing a data loader for database benchmarks

A task that I've done many times in my career in databases is to load data into a database as a first step in some benchmark. To do it efficiently you want to use multiple threads. Dividing the work among many threads requires only third-grade math, yet it can be surprisingly hard to get right.

A typical setup looks like this:

  1. The benchmark framework launches N independent threads. For example, in Sysbench these are completely isolated Lua environments, with no shared data structures and no communication possible between the threads.
  2. Each thread gets as input its thread id i and the total number of threads launched N.
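Given only those two inputs, each thread has to work out its own share of the rows without talking to the others. Below is a minimal sketch in Python of one way to do that. It is an illustration, not the code from any particular benchmark framework: the function name `partition`, the parameter `total_rows`, and the assumption that thread ids are numbered from 0 to N-1 are all hypothetical.

```python
def partition(total_rows, num_threads, thread_id):
    """Return the half-open range [start, end) of rows for one thread.

    Assumes thread_id is 0-based (0 .. num_threads - 1). A framework that
    numbers threads from 1 would need thread_id - 1 passed in instead.
    """
    # base rows per thread, plus `extra` leftover rows to hand out.
    base, extra = divmod(total_rows, num_threads)
    # The first `extra` threads each take one additional row, so this
    # thread's start is shifted by however many such threads precede it.
    start = thread_id * base + min(thread_id, extra)
    end = start + base + (1 if thread_id < extra else 0)
    return start, end


if __name__ == "__main__":
    # Each thread computes its own range independently from (N, i).
    for i in range(3):
        print(i, partition(10, 3, i))
```

Because every thread derives its range from the same formula, the ranges neither overlap nor leave gaps, even when the row count does not divide evenly by the number of threads. The usual mistakes are off-by-one errors at the range boundaries and silently dropping the remainder rows.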

Video x2: Measuring performance variability of EC2

I was recently invited to speak at Fwdays Highload in Kyiv. This was my first ever visit to Ukraine, so I was excited to go and visit this large and beautiful European capital. Over a thousand years ago Vikings would row their boats through the rivers in Russia, and take the Dnieper southward to Kyiv and ultimately Turkey. It was exciting to travel in the footsteps of my forefathers.

My talk isn't really MongoDB-specific; rather, it is about an EC2 performance tuning project we did in 2017.
