Optimization of MySQL

Optimization of MySQL is generally focused on query operations. First, we need to know the operations that SQL queries need to go through.

iShot_2023-02-28_08.58.53

1. Connection Configuration Optimization#

1.1 Server-Side Optimization#

Modify the default maximum number of connections on the server (usually 151 by default) and the default client timeout time (usually 8 hours by default). Of course, they do not need to be modified if not necessary. The following is the way to query their default values:

iShot_2023-02-28_09.11.29

iShot_2023-02-28_09.13.11

1.2 Client-Side Optimization#

For client optimization, we do not want clients to frequently create and destroy connections. In this case, we need to use a database connection pool to reuse connections and reduce the overhead of database resources. The most popular database connection pools currently are Druid and HikariCP. ~~DBCP and C3P0 have...~~

Of course, we need to pay attention that the size of the connection pool is not the bigger the better. Because each connection requires a thread to maintain, when the number of threads exceeds the number of CPUs, frequent thread context switching is required, which leads to increased performance overhead.

2. Architecture Optimization#

2.1 Use Caching#

Add caching between queries and the database to store data that does not change frequently.

2.2 Read-Write Separation#

This method is mainly used to reduce the pressure on servers. Use a server cluster, with one server as the master node and the other servers as slave nodes. When the data on the master server changes, the other servers also need to be synchronized. The master node is only responsible for insert, update, and delete operations, while the slave nodes are responsible for queries. This method effectively reduces the pressure on servers.

2.3 Sharding#

2.3.1 Vertical Sharding#

iShot_2023-02-28_14.08.52

Based on a monolithic database, vertically divide it according to business logic and distribute different tables to different databases, which are located on different servers. This is called vertical sharding.

iShot_2023-02-28_14.12.03

2.3.2 Vertical Partitioning#

iShot_2023-02-28_14.03.01

Vertical partitioning means dividing a single table into multiple smaller tables based on the original table. This operation needs to be judged based on specific business logic. Usually, frequently used fields (hot fields) are divided into one table, while less frequently used or not immediately used fields (cold fields) are divided into another table. This not only solves the problem of large data volume in a single table but also improves query speed.

iShot_2023-02-28_14.05.19

Taking the above diagram as an example: Usually, the details of a product are relatively long, and when viewing the product list, it is often not necessary to immediately display the product details (usually displayed when clicking the details button), but to display more important information about the product (such as price). According to this business logic, we have vertically partitioned the original product table.

2.3.3 Horizontal Sharding#

After vertical sharding, the database performance problem is solved to a certain extent. However, with the growth of business volume, the data stored in the monolithic database has exceeded the estimate, and a single server can no longer support it. However, from a business perspective, vertical sharding is no longer possible at this time. In this case, you can try horizontal sharding based on certain rules, such as putting product information with odd IDs and even IDs in two different databases.

iShot_2023-02-28_14.14.57

iShot_2023-02-28_14.15.44

2.3.4 Horizontal Partitioning#

Save the data of a single table on multiple data tables according to certain rules (known as sharding rules). This is horizontal partitioning.

iShot_2023-02-28_14.19.27

iShot_2023-02-28_14.20.14

2.3.5 Summary#

Horizontal partitioning is mainly used to solve storage bottlenecks, while vertical partitioning is mainly used to reduce concurrency pressure.

2.4 Peak Shaving with Message Queues#

Usually, user requests directly access the database. If there are a large number of online users at the same time, it is likely to overwhelm the database (similar to the status of Weibo when a celebrity has an affair or announces a relationship).

In this case, you can reduce the pressure on the database by using a message queue. Regardless of the number of simultaneous user requests, store them in the message queue first, and then systematically consume the requests from the message queue.

3. Optimizer - SQL Analysis and Optimization#

At this point, we have entered the territory of the parser and optimizer. Generally speaking, if your SQL syntax is correct, there will be no problems with the parser. In addition, in order to prevent your poorly written SQL from running inefficiently, the optimizer will automatically optimize it. However, if it is really bad, the optimizer cannot help you, and you can only watch your SQL query become a slow query.

3.1 Slow Queries#

As the name suggests, slow queries are queries that are very slow. You can use the following command to check the slow query status of MySQL:

iShot_2023-02-28_22.42.48

Use the following command to view the criteria for slow queries:

iShot_2023-02-28_22.44.30

3.2 View Running Threads#

Run show full processlist to view all running threads in MySQL.

iShot_2023-02-28_23.03.44

Among them,

Id: The unique identifier of the thread, which can be used to kill a specific thread;
User: The user who started the thread. Ordinary accounts can only view their own threads;
Host: The IP and port from which the connection was initiated;
db: The database the thread is operating on;
Command: The command of the thread;
Time: The duration of the operation, in seconds;
State: The state of the thread;
Info: The first 100 characters of the SQL statement.

3.3 View Server Running Status#

Use SHOW STATUS to view the running status of the MySQL server, which has two scopes: session and global.

iShot_2023-02-28_23.12.45

The above code is used to view the number of selects.

3.4 View Storage Engine Running Information#

SHOW ENGINE INNODB STATUS;

3.5 `EXPLAIN` Execution Plan#

The so-called plan refers to whether the optimizer in our MySQL will optimize the SQL statement we wrote (such as changing outer join to inner join, optimizing subqueries to join queries, etc.). The optimizer estimates the cost of executing this SQL statement and decides which index to use (or ultimately chooses not to use an index and perform a full table scan). The optimizer's strategy for executing a single table is also determined based on this SQL statement, and so on.

EXPLAIN execution plan used to be commonly used in query statements, but now it can also be used in insert, update, and delete statements.

3.6 SQL and Index Optimization#

3.6.1 SQL Optimization#

Use small tables to drive large tables; rewrite subqueries with join or change them to union;
In join queries, try to reduce the fan-out (number of records) of the driving table, and the cost of accessing the driven table should be as low as possible. It is recommended to create an index on the join column of the driven table to reduce the access cost. The join column of the driven table is best the primary key or a unique secondary index column of the table, so that the cost of the driven table will be even lower;
For large offset limits in limit, filter first and then sort.

iShot_2023-03-01_23.06.18

3.6.2 Index Optimization#

Avoid returning to the table if possible. If it is necessary to return to the table, reduce the number of returns to the table (returning to the table has a cost and may require loading data pages from external storage);
Use index covering.

4. Storage Structure and Table Structure#

4.1 Selecting Storage Engines#

It is recommended to choose different storage engines based on different businesses. For example:

For business tables with frequent query and insert operations, it is recommended to use MyISAM;
Use Memory for temporary tables;
Use InnoDB for businesses with a large number of concurrent users and frequent updates;
Use the default if you are not sure.

4.2 Field Optimization#

Principle: Use the smallest data type that can correctly store the data.

4.2.1 Integer Types#

MySQL provides 6 integer types, which are:

tinyint
smallint
mediumint
int
integer
bigint

4.2.2 Character Types#

If the field length is uncertain, use varchar without hesitation, because varchar will have additional space to record the length of the field. However, if the field length is fixed, try to use char, which will save a lot of memory space.

4.2.3 Not Null#

Set non-null fields as NOT NULL and provide default values, or use special values instead of NULL. Storing and optimizing NULL types will have performance issues.

4.2.4 Avoid Using Foreign Keys, Triggers, and Views (Especially for Large Data)#

This is also mentioned in the "Alibaba Development Manual". There are three reasons:

It reduces readability. When checking the code, you also need to check the database code;
Hand over the calculation work to the program (business layer) instead of the database. The database should only focus on storage and do it well;
In general, when dealing with large data, in order to solve the storage bottleneck, horizontal partitioning is used;
Triggers can be understood as capturing data changes in real-time and performing corresponding operations. The original intention is very good, but the disadvantages outweigh the advantages. Insert, update, and delete operations will frequently trigger triggers, resulting in a large consumption of resources;
The verification of data integrity should be completed by the developer, not relying on foreign keys. Once foreign keys are used, you will find it difficult to delete garbage data during testing (foreign key constraints will cause the table structure to be very messy and may even cause circular constraints).

4.2.5 Storage of Images, Audio, and Video#

Store their addresses only.

4.2.6 Large Field Splitting and Data Redundancy#

Large field splitting is actually vertical partitioning mentioned earlier, which separates infrequently used fields or fields with large data volumes to avoid having too many columns and too much data. This is especially important when you habitually write SELECT *, as the problems caused by the large number of columns and data volume will be greatly magnified!

Field redundancy is not in line with the database design normalization principle, but it is very beneficial for fast retrieval. For example, in the contract table, in addition to storing the customer ID, you can also redundantly store the customer name, so that you do not need to retrieve the customer name based on the customer ID. Therefore, appropriate redundancy based on business logic is also a good optimization technique. (Usually used in scenarios with more queries and fewer modifications)

5. Business Optimization#

Strictly speaking, business optimization is not considered as MySQL tuning, but business optimization can effectively reduce the pressure on database access. A typical example is Taobao. Here are a few simple examples to provide some ideas:

Pre-sale diversion: In the past, it was the mode of buying everything on the evening of Double 11. In recent years, the pre-sale period of Double 11 has become longer and longer, starting more than half a month in advance. Various deposit and red envelope modes have emerged. This method is called pre-sale diversion. This can divert customer service requests, so there is no need to wait until the early morning of Double 11 to place orders together;
Degradation strategy: In the early morning of Double 11, you may want to query orders beyond that day, but the query fails; even the chicken feed in Alipay is delayed, which is a degradation strategy. Gather the computing resources of less important services to ensure the most critical business;
During Double 11, Alipay strongly recommends using Huabei for payment instead of bank cards. Although part of the consideration is to increase software stickiness, on the other hand, using Yu'ebao actually uses Alibaba's internal servers, which have faster access speed. Using a bank card requires calling the bank's interface, which is slower in comparison.