CREATE TABLE `test` (
`id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
`month` timestamp NOT NULL DEFAULT '2018-01-01 00:00:00',
`b` varchar(255) NOT NULL DEFAULT '',
`c` varchar(255) NOT NULL DEFAULT '',
`d` varchar(255) NOT NULL DEFAULT '',
`e` varchar(255) NOT NULL DEFAULT '',
...
PRIMARY KEY (`id`),
INDEX `idx_month_b`(`month`, `b`) USING BTREE,
INDEX `idx_c`(`c`) USING BTREE,
...
);
Background
The total number of records in the table is about 4 million;
The data distribution on the month field is not very even, with more than half of the data concentrated after the year 2023.
The cardinality of the index on the month field is 80; the cardinality of the index on the c field is 2.65 million+.
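For reference, the cardinality figures above presumably come from the index statistics; a minimal way to refresh and inspect them, assuming the table name test from the definition above:

-- refresh the sampled index statistics (the cardinality values are estimates)
analyze table test;
-- list each index together with its estimated per-column cardinality
show index from test;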
Problem 1
select * from test where month>='2024-07-01' and month<='2024-07-31' and d=? order by id desc limit 10;
Under the condition month>='2024-07-01' and month<='2024-07-31', there are actually 210,000 records.
The actual execution result of the SQL is empty.
The execution plan shows that the primary key index was used, with rows=85. When I explicitly forced the (month, b) index, the plan showed rows=440,000. I expected the optimizer to pick the (month, b) index on its own, and the query is in fact significantly faster with the forced index.
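For reference, a minimal sketch of the comparison above; the index name idx_month_b comes from the table definition, and 'x' is a hypothetical value standing in for the placeholder bound to d:

-- plan the optimizer picks on its own (reported the primary key with rows=85)
explain select * from test where month>='2024-07-01' and month<='2024-07-31' and d='x' order by id desc limit 10;
-- plan with the (month, b) index forced (reported rows=440,000, but the query runs faster)
explain select * from test force index (idx_month_b) where month>='2024-07-01' and month<='2024-07-31' and d='x' order by id desc limit 10;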
Why?
Why did it not use the expected index, and why is the estimated number of scanned rows so low when using the primary key?
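One way to see where the low row estimate comes from is MySQL's optimizer trace, which records the row and cost estimates computed for each candidate index; a minimal sketch, again with 'x' as a hypothetical value for the placeholder:

set optimizer_trace='enabled=on';
select * from test where month>='2024-07-01' and month<='2024-07-31' and d='x' order by id desc limit 10;
-- the trace shows, per candidate index, the rows and cost the optimizer compared
select * from information_schema.optimizer_trace;
set optimizer_trace='enabled=off';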
Problem 2
select * from test where c='test1' order by id desc limit 1;
select * from test where c='test2' order by id desc limit 1;
The index on column c has good selectivity; the number of records per c value ranges from 1 to 4,000.
Under the condition c='test1' there are 3,000 records, while under c='test2' there are 2,900 records.
The actual execution result of the SQL is not empty.
When c='test1', the query used the index on c, while for c='test2' it used the primary key index, resulting in a significant performance difference.
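For reference, a minimal sketch of how the two plans compare, and of forcing the index on c (named idx_c in the definition above) for the slower case:

-- reportedly uses the secondary index on c
explain select * from test where c='test1' order by id desc limit 1;
-- reportedly scans the primary key instead
explain select * from test where c='test2' order by id desc limit 1;
-- forcing the index on c for the slower case, for comparison
select * from test force index (idx_c) where c='test2' order by id desc limit 1;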
Why?
Given that the row counts are so close, what could cause the query with the smaller result set to use the primary key index? My research suggests that within a single-column secondary index, the primary key values may not be completely ordered; is that correct?
Why did it not use the expected index (c), and why is the estimated number of scanned rows so low when the primary key is used?
After increasing the LIMIT, the query does switch to the expected index.
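For example, a sketch of that observation; the value 100 is an assumption, since the exact LIMIT at which the plan switches is not stated:

-- with a larger limit the optimizer reportedly switches back to the index on c
explain select * from test where c='test2' order by id desc limit 100;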