MySQL 인덱스 통계

2021년 6월 17일 · 약 5분

쿼리를 실행시에 예상했던 인덱스를 타지 않는 현상을 알아보고자 합니다.

다음과 같은 테이블이 있다고 가정해보자.

테이블 생성

CREATE TABLE `payment_history` (
    `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, -- 결제 id
    `user_id` varchar(100) DEFAULT NULL, -- 유저 정보
    `status` varchar(10) DEFAULT NULL, -- 결제 상태
    ...
    PRIMARY KEY (`id`),
    KEY `idx_user_id` (`user_id`),
    KEY `idx_user_id_status` (`user_id`, `status`),
    ...
)

explain

그리고 다음과 같은 쿼리를 실행한다면 어떤 인덱스를 타게 될까요?

SELECT  *
FROM    `payment_history`
WHERE   `user_id` = 'AAAAAAA'
AND     `status` = 'Completed'

다음과 같은 실행 계획을 예상해볼 수 있습니다.

... | possible_keys                   | key                | ... | rows  | ...
------------------------------------------------------------------------
... | idx_user_id, idx_user_id_status | idx_user_id_status | ... | 17714 | ...

하지만 경우에 따라, 이는 다른 실행 계획 결과가 도출될 수 있습니다.

... | possible_keys                   | key                | ... | rows  | ...
------------------------------------------------------------------------
... | idx_user_id, idx_user_id_status | idx_user_id        | ... | 16670 | ...

그 이유는 옵티마이저가 예상과 다른 인덱스를 탐색했기 때문입니다. 옵티마이저는 최적의 실행계획을 세우기 위해 인덱스 통계 정보를 의존하게 되는데, MySQL 같은 경우는 mysql.innodb_index_stats 라는 테이블에서 해당 정보를 얻을 수 있습니다.

innodb_index_stats

SELECT  *
FROM    `mysql.innodb_index_stats`
WHERE   `database_name` = 'test' 
AND     `table_name` = 'payment_history';

... | table_name      | index_name         | ... | stat_name    | stat_value | ... | stat_description
-------------------------------------------------------------------------------------------------------------------------
... | payment_history | PRIMARY            | ... | n_diff_pfx01 |        ... | ... | id
... | payment_history | PRIMARY            | ... | n_leaf_pages |        ... | ... | Number of leaf pages in the index
... | payment_history | PRIMARY            | ... | size         |        ... | ... | Number of pages in the index
... | payment_history | idx_user_id        | ... | n_diff_pfx01 |  1,360,401 | ... | id
... | payment_history | idx_user_id        | ... | n_diff_pfx02 | 17,432,752 | ... | user_id,id
... | payment_history | idx_user_id        | ... | n_leaf_pages |     66,084 | ... | Number of leaf pages in the index
... | payment_history | idx_user_id        | ... | size         |     75,904 | ... | Number of pages in the index
... | payment_history | idx_user_id_status | ... | n_diff_pfx01 |  1,566,677 | ... | user_id
... | payment_history | idx_user_id_status | ... | n_diff_pfx02 |  1,660,688 | ... | user_id,status
... | payment_history | idx_user_id_status | ... | n_diff_pfx03 | 15,970,551 | ... | user_id,status,id
... | payment_history | idx_user_id_status | ... | n_leaf_pages |     81,288 | ... | Number of leaf pages in the index
... | payment_history | idx_user_id_status | ... | size         |     93,540 | ... | Number of pages in the index

여기서 각 인덱스에 대한 stat_value를 보고 대략적으로 판단할 수 있는건, 위 쿼리에서 idx_user_id를 탐색하는 것이 보다 빠르다는 것을 기대할 수 있을 것 같습니다.

인덱스 통계 구성법

https://dev.mysql.com/doc/refman/5.6/en/innodb-performance-optimizer-statistics.html

1. innodb_stats_persistent=ON

innodb_stats_persistent은 MySQL 5.6.2에서 생겼으며, 5.6.6부터 기본값이 ON입니다.

innodb_stats_auto_recalc은 자동 계산 여부를 지정하는 옵션입니다. (기본값은 ON) 테이블의 10% 변경이 있을 때 재계산을 하는데, 아쉽게도 이 값은 하드코딩 되어있어 변경할 수 없습니다. (mysql/mysql-server)

2. innodb_stats_persistent=OFF

innodb_stats_on_metadata을 ON하면 다음 쿼리시 재계산을 할 수 있습니다. (기본값은 OFF)

SHOW TABLE STATUS
SHOW INDEX
INFORMATION_SCHEMA.TABLES
INFORMATION_SCHEMA.STATISTICS

3. ANALYZE TABLE `table_name`

ANALYZE TABLE table_name

인덱스 통계 테이블을 재계산합니다.
하지만 역시 막쓰는게 아닌 것이... 너무 느리기도 하고 테이블 read lock이 걸리기 때문에 조심해서 사용해야 합니다

During the analysis, the table is locked with a read lock for InnoDB and MyISAM.
또한 재계산한다고 좋아진다는 보장이 없습니다. 결국 랜덤으로 샘플링을 하기 때문에 좋아질 수도, 나빠질 수도 있습니다.

그렇다면 어떻게 해야할까?

테이블 별로 innodb_stats_persistent_sample_pages를 조절하자
적절히 ANALYZE TABLE를 사용하자

테이블 생성​

explain​

innodb_index_stats​

인덱스 통계 구성법​

1. innodb_stats_persistent=ON​

2. innodb_stats_persistent=OFF​

3. ANALYZE TABLE table_name​

그렇다면 어떻게 해야할까?​

참고​