Between following (from a distance) the talks at Fosdem and anticipating the ones at MySQL User Conference in April, I was reminded of 2 interesting MySQL talks that have had a deeper meaning to me than their original speakers probably intended. I thought today could be a good time to share these 2 stories that for me personally are filed in the "things I learned from MySQL AB and Sun" folder...
"If you can't solve the problem, try solving some other problem"
These genial words were uttered by MySQL consultant and instructor Kai Voigt at the MySQL Customer Conference in London, Oct 23, 2008. (The slides seem to be gone from mysql.com following the Oracle takeover and revamp of the site. The talk was titled something like "Top 5 performance tuning tips for MySQL".)
The problem Kai was solving at a customer was to select all data points (x, y)
that are within a given radius r
from a point (a, b)
. Quite a common problem given how maps and spatial databases are becoming increasingly popular. So something like:
SELECT userid, x, y FROM locations WHERE
r >= sqrt((a-x)*(a-x)+(b-y)*(b-y));
(r, a and b are parameters and would be substituted by "?" or a number in an actual SQL query.)
The problem is that to calculate the radius from something, you end up using the formula from Pythagoras theorem. While a simple concept, the squares and square roots are too complex for MySQL to make any useful optimization on which index to use, so it ends up calculating the radius for all records in the entire table!
Failing to modify the query to something that MySQL could handle more optimally, Kai eventually had to go back to the customer to say: "Would you mind if instead of looking up all records that are within the circle with (a,b) as the center point, we could return all records that are within a square with (a,b) as a center point"
The customer was ok with that, so Kai modified the query to this simpler form:
SELECT userid, x, y FROM locations WHERE
ABS(x-a)
To really make it easy for MySQL to use an index for this query, we still need to "solve" x
and y
from the formula:
SELECT userid, x, y FROM locations WHERE
(x a-r) AND (y b-r);
With the above query, MySQL can quickly return you all userid's that are within a certain square that has point (a,b) at its center.
However, the really funny thing is that once you have found the records that are within the square, it becomes possible to go back to the original problem of finding records that are also within the original circle. This is because the circle is fully inside the square, so you can now calculate the Pythagoras theorem only for the records you found with the previous query. Essentially you filter out points that are in the "corners" of the square, and keep the ones that are inside the circle. This is feasible to do efficiently since you do it for the small resultset, not for the entire table.
So by adding the "complex" Pythagoras theorem back into the SQL query, Kai eventually got the answer he was originally looking for:
SELECT userid, x, y FROM locations WHERE
(x a-r) AND (y b-r);
AND r >= sqrt((a-x)*(a-x)+(b-y)*(b-y));
In addition to learning a useful SQL trick, this particular one has stuck with me for a year now, because I really like the general principle: "If you can't solve the problem, try solving some other problem". If you're stuck on a problem, maybe you're trying to solve the wrong thing? Or maybe it's the right thing but you need to make it simpler first, before you can solve it?
Very useful!
Outrun the Lions
The other "deep thought" talk I never heard myself, but read about in the blog post by Baron Schwartz from Percona, called Outrun the Lions. This is based on the story from the African savanna: "Every day, a gazelle wakes and knows it must outrun the fastest lion; every day a lion awakes and knows it must outrun the slowest gazelle.
I've heard the same insight stated in another way: If you wake up in a cave, next to one dragon and one halfling... remember that you don't need to outrun the dragon, you only need to outrun the halfling! I do like the analogy of lions and gazelles better though, since in that story it doesn't matter if you are the gazelle or the lion, you face exactly the same problem.
I'm actually not sure how exactly the lion's and gazelles map into Baron's advice on MySQL performance tuning. Whether you see yourself as the gazelle or the lion, the moral of the story is that you don't need to be the fastest runner on the planet, you only need to be fast enough to survive.
Taken the right way, that can often be comforting advice and also reminds you not to over optimize everything.
- Log in to post comments
- 18018 views
It's a great advice. It's
It's a great advice. It's make me think. I've stuck with a problem for a while, may be I looking the wrong way or may be I should make it simple to understand. Thanks any way. nice blog :)