Let’s take the employees database, and slightly modify the employees table:

mysql> ALTER TABLE employees ADD INDEX idx_first (first_name), ENGINE=InnoDB;

Now let’s look at this query:

mysql> SELECT * FROM employees ORDER BY first_name;
This query can of course be executed by running a full table scan, but we could also take advantage of the idx_first index, which translates to a full index scan.
Let’s see how the optimizer will execute it:

mysql> EXPLAIN SELECT * FROM employees ORDER BY first_name\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300363
        Extra: Using filesort
mysql> EXPLAIN SELECT * FROM employees FORCE INDEX(idx_first) ORDER BY first_name\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: index
possible_keys: NULL
          key: idx_first
      key_len: 16
          ref: NULL
         rows: 300363
        Extra: NULL
Honestly, it looks better: the number of rows is the same, but we get a full index scan instead of a full table scan, and no filesort. But predicting how a query will perform by looking only at the execution plan is not enough; we must run both queries and compare execution times.
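One way to script such a comparison is sketched below, using Python’s built-in sqlite3 module as a stand-in for MySQL. The miniature schema and data are made up, and timings on an in-memory SQLite database will not mirror MySQL on disk; the point is only the harness: run both plans, check they agree, and compare elapsed time.

```python
import sqlite3
import time

# Toy employees table (hypothetical data, stand-in for the MySQL setup above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_no INTEGER, first_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(i, "n%04d" % (9999 - i)) for i in range(10000)],  # unique made-up names
)
conn.execute("CREATE INDEX idx_first ON employees (first_name)")

def timed(sql):
    """Run a query, return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

# NOT INDEXED / INDEXED BY are SQLite's hints, playing the role of the
# optimizer's default plan vs MySQL's FORCE INDEX(idx_first).
rows_scan, t_scan = timed("SELECT * FROM employees NOT INDEXED ORDER BY first_name")
rows_index, t_index = timed("SELECT * FROM employees INDEXED BY idx_first ORDER BY first_name")

# Both plans must return identical results; only the timings differ.
assert rows_scan == rows_index
print("table scan: %.4fs, index scan: %.4fs" % (t_scan, t_index))
```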
First case: the employees table does not fit in memory
With the full table scan, the query runs in about 4s.
With the full index scan, it runs in about 30s.
So the optimizer was right after all. But why? Simply because not all access patterns are equal. When we do a full table scan, we do sequential reads, which are quite fast even on slow disks. But when we use the index, we first have to do a full index scan (fast sequential reads, no problem) and then lots of random reads to fetch rows by index value. And random reads are orders of magnitude slower than sequential reads.
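The two access patterns can be modeled in a few lines of plain Python (a toy sketch, not MySQL internals; the heap rows and names are hypothetical). The "heap" is the table in storage order, and the secondary index maps each first_name to a heap position, so reading in index order means jumping around the heap:

```python
# Hypothetical table rows, in on-disk (insertion) order.
heap = [(10001, "Georgi"), (10002, "Bezalel"), (10003, "Parto"), (10004, "Chirstian")]

# Secondary index on first_name: sorted (key, heap position) pairs.
index = sorted((first_name, pos) for pos, (_, first_name) in enumerate(heap))

# Plan 1: full table scan (sequential reads over the heap) + filesort.
plan_table_scan = sorted(heap, key=lambda row: row[1])

# Plan 2: full index scan (sequential reads over the index), then one
# random heap read per index entry to fetch the full row.
plan_index_scan = [heap[pos] for _, pos in index]

# Same result either way; what differs is the physical access pattern.
assert plan_table_scan == plan_index_scan
```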
The optimizer has this knowledge, so it is able to pick the right execution plan.
Second case: the employees table fits in memory
With the full table scan, the query runs in about 3.3s.
With the full index scan, the query runs in about 2.6s.
Here we can see a limitation of the optimizer: it does not know what kind of media the data is stored on. If the data is stored on spinning disks, the assumption that random reads are much slower than sequential reads is correct, but that is no longer the case if the data is in memory. That’s why, when execution plans look similar, you should always execute the queries to see which execution plan should really be chosen. Here, if we know that the table will always be in memory, we should add the FORCE INDEX hint to ensure optimal response time.
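For reference, SQLite exposes an equivalent hint, INDEXED BY. A minimal sketch using Python’s built-in sqlite3 module (the miniature schema is a made-up stand-in for the employees table; MySQL’s own syntax is the FORCE INDEX form shown earlier):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_no INTEGER, first_name TEXT)")
conn.execute("CREATE INDEX idx_first ON employees (first_name)")

# INDEXED BY is SQLite's counterpart to MySQL's FORCE INDEX(idx_first):
# the planner must use the named index or raise an error.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM employees INDEXED BY idx_first ORDER BY first_name"
).fetchall()
print(plan[0][-1])  # e.g. "SCAN employees USING INDEX idx_first"
```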
Now let’s modify the query by selecting only the first_name field instead of selecting all the fields:
mysql> EXPLAIN SELECT first_name FROM employees ORDER BY first_name\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: index
possible_keys: NULL
          key: idx_first
      key_len: 16
          ref: NULL
         rows: 300584
        Extra: Using index
The optimizer chooses the full index scan. It is a good choice because the index now covers the query, which means that reading the index is enough to get the results.
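The covering-index effect is easy to reproduce outside MySQL as well. A small sketch using Python’s built-in sqlite3 module (the miniature schema is made up): once the query only touches first_name, SQLite’s plan explicitly reports a covering index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_no INTEGER, first_name TEXT, last_name TEXT)"
)
conn.execute("CREATE INDEX idx_first ON employees (first_name)")

# Only first_name is selected, so the index alone can answer the query:
# no extra lookups into the table are needed.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT first_name FROM employees ORDER BY first_name"
).fetchall()
print(plan[0][-1])  # e.g. "SCAN employees USING COVERING INDEX idx_first"
```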
Conclusions:

- For a non-covering index, the difference between a full table scan and an execution plan based on a full index scan is basically the difference between sequential reads and random reads: it can be close if you have fast storage, or it can be very different if you have slow storage.
- A full index scan can become interesting when the index is covering.
- Don’t forget to measure response time when you are trying different execution plans. It is too easy to get focused on having a good-looking execution plan, but the end user only cares about response time!