| 45 million row table - Two BETWEENs not using a composite index optimally [message #1575] |
Mon, 06 August 2007 23:05  |
tazman Messages: 2 Registered: August 2007 |
Junior Member |
|
|
45 million row MyISAM table in MySQL 5.1.20.
select * from M
where B between 726642 and 733649
and C between 8224 and 8230;
Result set has 311 rows.
Full table scan: 32.08 seconds.
Using index on (B): 37.90 seconds.
Using index on (B,C): 39.75 seconds.
Potential performance: 0.01 seconds.
I'd hoped that MySQL would apply the composite index (B,C), with an index range scan on B, and a subindex range scan on C (for each matching B). But it seems not. The similarities between the response times for using the index on (B) and the index on (B,C), indicate MySQL is probably index range scanning on B, and then looking at every C element and applying a WHERE filter. That is, MySQL does not seem to take advantage of the fact that for each matching B, the C elements are available in the index in sorted order, and so could be subindex range scanned, rather than a full subindex scan.
It seems MySQL scans 4,776,432 index entries, rather than just the 74,865 index entries it needs to examine.
To check this, I collapsed the ranges, by making all the numbers in each range just the first number:
UPDATE m SET b = 726642 WHERE b BETWEEN 726642 AND 733649;
UPDATE m SET c = 8224 WHERE c BETWEEN 8224 AND 8230;
Now I run the query as: select * from M where B = 726642 and C = 8224;
This returns in 0.01 seconds.
If the BETWEENS were more cleverly optimized by MySQL, the original query should return in the same time - 0.01 seconds.
Is my understanding correct?
Is there any way to get MySQL to optimally use a composite index (index range scan plus subindex range scan) to satisfy two BETWEEN statements?
We cannot easily flatten these ranges into single numbers. These ranges encode a complex data structure. Flattening the ranges would involve many code changes and many extra columns and many extra indexes.
Thanks for your expertise and illumination,
Tasman
p.s. Note this is a cross post from the MySQL Optimizer forum; I hadn't gotten any response there in over a day, so I thought I'd try here.
|
|
|
| Re: 45 million row table - Two BETWEENs not using a composite index optimally [message #1594 is a reply to message #1575 ] |
Thu, 16 August 2007 08:11   |
Peter Messages: 405 Registered: August 2006 |
Senior Member Super Guru |
|
|
MySQL as of MySQL 5.0 can only have one range per index - as soon as there is the range everything in that range will be just scanned.
Sometimes you can replace first range with IN
A IN (5,6,7) and B between 5 and 10 will use key A,B
If the range is too large you can sometimes divide it in number of intervals and use that instead
Say now you need 545 to 789 in the query above - you can use
A100 IN (5,6,7) and A BETWEEN 545 to 789 and B BETWEEN 5 and 10
with key(A100,B)
when A100 keps A/100 numbers.
Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
|
|
|
|
| Re: 45 million row table - Two BETWEENs not using a composite index optimally [message #1640 is a reply to message #1635 ] |
Fri, 17 August 2007 05:47  |
Peter Messages: 405 Registered: August 2006 |
Senior Member Super Guru |
|
|
Right if you have cascaded INs it may blow up MySQL optimizer.
That was very easy DOS on MySQL as it not only takes time but also a lot of memory to build all combinations (ie 1000 in 1st in and 1000 in second in will give 1.000.000 of combinations)
This was recently fixed inM ySQL 5.0
Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
|
|
|