Tải bản đầy đủ (.pdf) (389 trang)

Silvia botros, jeremy tinley high performance MySQL proven strategies

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (7.23 MB, 389 trang )

th
ur ion
Fo dit
E

High Performance

MySQL

Proven Strategies for Operating at Scale

Silvia Botros &
Jeremy Tinley

Foreword by
Jeremy Cole



Praise for High Performance MySQL, 4th Edition

I love that this new edition shifts the book’s emphasis to a modern, pragmatic mindset of
team players delivering business value. It moves past the previous editions’ myopic focus
on gnarly internals and theory to wring out diminishing returns, toward a more holistic
perspective. The ‘how databases work’ material is still covered thoroughly, but now with a
fresh, humanistic take that is greatly needed.
—Baron Schwartz, Lead author of High Performance MySQL,
2nd and 3rd editions
High Performance MySQL has been a staple of the MySQL world since the first edition 17
years ago. MySQL moves ever onward, and Silvia and Jeremy have done an excellent job
bringing this essential work up to date with modern MySQL.


—Jeremy Cole
Updated to meet modern practices, this latest edition is rich with solid advice for
MySQL administrators and developers.
—Shlomi Noach, Database Engineer, PlanetScale


High Performance MySQL has a new focus. It’s no longer about squeezing every
ounce of power from MySQL. We now have a large ecosystem of tools and providers.
Silvia and Jeremy beautifully cover how MySQL fits into the new picture. This book is
a must-have if you run MySQL in any form.
—Sugu Sougoumarane, CTO of PlanetScale, cocreator of Vitess
Silvia and Jeremy did a fantastic job keeping the original spirit of the book strong while
updating it to cover the rapidly changing MySQL space.
—Peter Zaitsev, Founder and CEO of Percona and coauthor of
High Performance MySQL, 3rd edition


FOURTH EDITION

High Performance MySQL

Proven Strategies for Operating at Scale

Silvia Botros and Jeremy Tinley
Foreword by Jeremy Cole

Beijing

Boston Farnham Sebastopol


Tokyo


High Performance MySQL
by Silvia Botros and Jeremy Tinley
Copyright © 2022 Silvia Botros and Jeremy Tinley. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (). For more information, contact our corporate/institutional
sales department: 800-998-9938 or

Editors: Virginia Wilson and Andy Kwan
Production Editor: Elizabeth Faerm
Copyeditor: Shannon Turlington
Proofreader: Kim Cofer
November 2021:

Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

Fourth Edition

Revision History for the Fourth Edition
2021-11-17:

First Release


See for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. High Performance MySQL, the cover
image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author(s) and do not represent the publisher’s views.
While the publisher and the author(s) have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author(s) disclaim all responsibil‐
ity for errors or omissions, including without limitation responsibility for damages resulting from the use
of or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.

978-1-492-08051-0
[LSI]


Table of Contents

Foreword. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1. MySQL Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
MySQL’s Logical Architecture
Connection Management and Security
Optimization and Execution
Concurrency Control
Read/Write Locks
Lock Granularity
Transactions
Isolation Levels
Deadlocks

Transaction Logging
Transactions in MySQL
Multiversion Concurrency Control
Replication
Datafiles Structure
The InnoDB Engine
JSON Document Support
Data Dictionary Changes
Atomic DDL
Summary

1
2
3
3
4
4
6
8
9
10
11
13
15
16
16
17
17
18
18


2. Monitoring in a Reliability Engineering World. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
The Impact of Reliability Engineering on DBA Teams
Defining Service Level Goals

20
20
v


What Does It Take to Make Customers Happy?
What to Measure
Defining SLIs and SLOs
Monitoring Solutions
Monitoring Availability
Monitoring Query Latency
Monitoring for Errors
Proactive Monitoring
Measuring Long-Term Performance
Learning Your Business Cadence
Tracking Your Metrics Effectively
Using Monitoring Tools to Inspect the Performance
Using SLOs to Guide Your Overall Architecture
Summary

22
23
23
24
25

27
27
29
36
36
37
37
38
39

3. Performance Schema. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Introduction to Performance Schema
Instrument Elements
Consumer Organization
Resource Consumption
Limitations
sys Schema
Understanding Threads
Configuration
Enabling and Disabling Performance Schema
Enabling and Disabling Instruments
Enabling and Disabling Consumers
Tuning Monitoring for Specific Objects
Tuning Threads Monitoring
Adjusting Memory Size for Performance Schema
Defaults
Using Performance Schema
Examining SQL Statements
Examining Read Versus Write Performance
Examining Metadata Locks

Examining Memory Usage
Examining Variables
Examining Most Frequent Errors
Examining Performance Schema Itself
Summary

vi

|

Table of Contents

41
42
44
45
46
46
46
48
48
48
50
51
51
52
53
53
53
62

63
64
66
70
71
73


4. Operating System and Hardware Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
What Limits MySQL’s Performance?
How to Select CPUs for MySQL
Balancing Memory and Disk Resources
Caching, Reads, and Writes
What’s Your Working Set?
Solid-State Storage
An Overview of Flash Memory
Garbage Collection
RAID Performance Optimization
RAID Failure, Recovery, and Monitoring
RAID Configuration and Caching
Network Configuration
Choosing a Filesystem
Choosing a Disk Queue Scheduler
Memory and Swapping
Operating System Status
Other Helpful Tools
Summary

75
76

76
76
77
78
78
79
79
81
83
86
87
89
90
92
95
96

5. Optimizing Server Settings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
How MySQL’s Configuration Works
Syntax, Scope, and Dynamism
Persisted System Variables
Side Effects of Setting Variables
Planning Your Variable Changes
What Not to Do
Creating a MySQL Configuration File
Minimal Configuration
Inspecting MySQL Server Status Variables
Configuring Memory Usage
Per-Connection Memory Needs
Reserving Memory for the Operating System

The InnoDB Buffer Pool
The Thread Cache
Configuring MySQL’s I/O Behavior
The InnoDB Transaction Log
Log Buffer
The InnoDB Tablespace
Other I/O Configuration Options
Configuring MySQL Concurrency
Safety Settings

100
101
103
103
104
105
106
107
108
109
109
109
110
111
112
113
113
115
118
119

120

Table of Contents

|

vii


Advanced InnoDB Settings
Summary

122
124

6. Schema Design and Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Choosing Optimal Data Types
Whole Numbers
Real Numbers
String Types
Date and Time Types
Bit-Packed Data Types
JSON Data
Choosing Identifiers
Special Types of Data
Schema Design Gotchas in MySQL
Too Many Columns
Too Many Joins
The All-Powerful ENUM
The ENUM in Disguise

NULL Not Invented Here
Schema Management
Schema Management as Part of the Data Store Platform
Summary

126
127
127
128
135
136
139
142
144
145
145
145
145
146
146
146
147
154

7. Indexing for High Performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Indexing Basics
Types of Indexes
Benefits of Indexes
Indexing Strategies for High Performance
Prefix Indexes and Index Selectivity

Multicolumn Indexes
Choosing a Good Column Order
Clustered Indexes
Covering Indexes
Using Index Scans for Sorts
Redundant and Duplicate Indexes
Unused Indexes
Index and Table Maintenance
Finding and Repairing Table Corruption
Updating Index Statistics
Reducing Index and Data Fragmentation
Summary

viii

|

Table of Contents

156
156
161
162
162
165
167
170
178
180
182

185
186
186
187
188
189


8. Query Performance Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Why Are Queries Slow?
Slow Query Basics: Optimize Data Access
Are You Asking the Database for Data You Don’t Need?
Is MySQL Examining Too Much Data?
Ways to Restructure Queries
Complex Queries Versus Many Queries
Chopping Up a Query
Join Decomposition
Query Execution Basics
The MySQL Client/Server Protocol
Query States
The Query Optimization Process
The Query Execution Engine
Returning Results to the Client
Limitations of the MySQL Query Optimizer
UNION Limitations
Equality Propagation
Parallel Execution
SELECT and UPDATE on the Same Table
Optimizing Specific Types of Queries
Optimizing COUNT() Queries

Optimizing JOIN Queries
Optimizing GROUP BY with ROLLUP
Optimizing LIMIT and OFFSET
Optimizing SQL_CALC_FOUND_ROWS
Optimizing UNION
Summary

191
192
192
194
198
198
199
200
201
202
204
205
217
218
219
219
220
220
220
221
221
223
223

223
225
225
226

9. Replication. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
Replication Overview
How Replication Works
Replication Under the Hood
Choosing Replication Format
Global Transaction Identifiers
Making Replication Crash Safe
Delayed Replication
Multithreaded Replication
Semisynchronous Replication
Replication Filters
Replication Failover
Planned Promotions

227
229
230
230
231
232
233
234
237
237
239

239

Table of Contents

|

ix


Unplanned Promotions
Trade-Offs of Promotion
Replication Topologies
Active/Passive
Active/Read Pool
Discouraged Topologies
Replication Administration and Maintenance
Monitoring Replication
Measuring Replication Lag
Determining Whether Replicas Are Consistent with the Source
Replication Problems and Solutions
Binary Logs Corrupted on the Source
Nonunique Server IDs
Undefined Server IDs
Missing Temporary Tables
Not Replicating All Updates
Excessive Replication Lag
Oversized Packets from the Source
No Disk Space
Replication Limitations
Summary


240
240
241
241
242
244
247
247
248
249
251
251
251
252
252
252
252
254
254
254
255

10. Backup and Recovery. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Why Backups?
Defining Recovery Requirements
Designing a MySQL Backup Solution
Online or Offline Backups?
Logical or Raw Backups?
What to Back Up

Incremental and Differential Backups
Replication
Managing and Backing Up Binary Logs
Backup and Recovery Tools
MySQL Enterprise Backup
Percona XtraBackup
mydumper
mysqldump
Backing Up Data
Logical SQL Backups
Filesystem Snapshots
Percona XtraBackup
Recovering from a Backup

x

|

Table of Contents

258
259
260
261
263
265
266
268
269
269

269
270
270
270
270
270
272
278
281


Restoring Logical Backups
Restoring Raw Files from Snapshot
Restoring with Percona XtraBackup
Starting MySQL After Restoring Raw Files
Summary

282
283
284
285
286

11. Scaling MySQL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
What Is Scaling?
Read- Versus Write-Bound Workloads
Understanding Your Workload
Read-Bound Workloads
Write-Bound Workloads
Functional Sharding

Scaling Reads with Read Pools
Managing Configuration for Read Pools
Health Checks for Read Pools
Choosing a Load-Balancing Algorithm
Queuing
Scaling Writes with Sharding
Choosing a Partitioning Scheme
Multiple Partitioning Keys
Querying Across Shards
Vitess
ProxySQL
Summary

287
289
289
290
291
291
292
294
295
297
298
299
300
302
302
303
306

311

12. MySQL in the Cloud. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Managed MySQL
Amazon Aurora for MySQL
GCP Cloud SQL
MySQL on Virtual Machines
Machine Types in Cloud
Choosing the Right Machine Type
Choosing the Right Disk Type
Additional Tips
Summary

313
314
317
318
318
319
320
322
324

13. Compliance with MySQL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
What Is Compliance?
Service Organization Controls Type 2
Sarbanes–Oxley Act
Payment Card Industry Data Security Standard

326

326
326
327

Table of Contents

|

xi


Health Insurance Portability and Accountability Act
Federal Risk and Authorization Management Program
General Data Protection Regulation
Schrems II
Building for Compliance Controls
Secrets Management
Separation of Roles and Data
Tracking Changes
Backup and Restore Procedures
Summary

327
327
327
328
328
329
332
333

338
341

A. Upgrading MySQL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
B. MySQL on Kubernetes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

xii

|

Table of Contents


Foreword

A fresh copy of High Performance MySQL has been the first book to get plopped
down on the desk of every newly-hired DBA, systems engineer, or database-minded
developer since it came out nearly two decades ago.
When Jeremy Zawodny and Derek Balling set out to write a book about running
MySQL at scale, to bring clarity and structure to years of mystery, it was destined to
become an instant classic in the MySQL world. Through the years and several
updates, some of the content of the original and the subsequent updates has held up,
and some not so much.
MySQL itself has advanced, the MySQL community has changed a lot, and the ways
in which we use MySQL have changed. Now in the 4th edition, Silvia and Jeremy
undertake a thankless and gargantuan task to update this classic for the modern era—
and they are just the pair for the task.
In my time (now more than 20 years!) in the MySQL community, the one consistent
thing has been, well, inconsistency. Everyone uses MySQL (and databases in general)

in slightly different ways, and they each have different expectations of it. Everyone
makes some good decisions, some well-intentioned but questionable decisions, and,
always, their share of bad ones. Sometimes, progress is easy, but sometimes it takes
sage advice and a new way of thinking about the problem learned straight from an
expert.
Silvia and Jeremy are just such experts. Everything from MySQL architecture, optimi‐
zation, replication, backups, and more, stood to benefit from them sharing their
extensive experience in the trenches with MySQL. In this new 4th edition, many sub‐
jects got new treatment, a lot of outdated material was removed, errors were correc‐
ted, and a new and fresh style was brought to the material.
Like the original (now vintage, and quaintly small) 1st edition, the 4th edition prom‐
ises to help carry the newest generation of developers, DBAs, and their bosses into

xiii


the new world of MySQL; sometimes with excitement, but perhaps sometimes kick‐
ing and screaming.
Thanks, Silvia and Jeremy, for your hard work to nurture the next generation of
MySQL geeks who will be keeping the world’s data safe and the world’s top websites
and other data-driven systems performing at their peak.
Congratulations on getting this done through COVID and everything else. The rest
of us will make sure to get all the new DBAs a copy.
— Jeremy Cole
near Reno, Nevada
October 2021

xiv

|


Foreword


Preface

The official documentation maintained by Oracle gives you the knowledge necessary
to install, configure, and interact with MySQL. This book serves as a companion to
that documentation, helping you understand how best to leverage MySQL as a pow‐
erful data platform for your use case.
This edition also expands on the growing role of compliance and security as parts of
operating a database footprint. New realities such as privacy laws and data sover‐
eignty have changed how companies build their products, and that naturally introdu‐
ces new complexities in how the technical architecture evolves.

Who This Book Is For
This book is first and foremost for engineers looking to grow their expertise in run‐
ning MySQL. This edition assumes its audience is familiar with the basic principles of
why you want to use a relational database management system (RDBMS). We also
assume some experience with general system administration, networking, and oper‐
ating systems.
We will offer you proven strategies for running MySQL at scale with a modern archi‐
tecture and more up-to-date tooling and practices.
Ultimately, we hope that the knowledge you gain from this book of MySQL’s internals
and scaling strategies will help you in scaling the data storage layer at your organiza‐
tion. And we hope that your newfound insight will help you to learn and practice a
methodical approach to designing, maintaining, and troubleshooting an architecture
that is built on MySQL.

xv



What Is Different in This Edition
High Performance MySQL has been a part of the database engineering community for
years, with past editions released in 2004, 2008, and 2012. In these previous editions,
the goal was always to teach developers and administrators how to optimize MySQL
for every drop of performance by focusing on deep internal design, explaining what
various tuning settings mean, and arming the user with the knowledge to be effective
in changing these settings. This edition maintains the same goal but with a different
focus.
Since the third edition, the MySQL ecosystem has seen a lot of changes. Three new
major versions have been released. The tooling landscape expanded significantly
beyond Perl and Bash scripts and into full-fledged tooling solutions. Entirely new
open source projects have been built that change how organizations manage scaling
MySQL.
Even the traditional database administrator (DBA) role has evolved. There’s an old
joke in the industry that says that DBA stands for “Don’t Bother Asking.” DBAs had a
reputation for being speed bumps in the software development life cycle (SDLC), not
explicitly because of any curmudgeonly attitude, but simply because databases weren’t
evolving as fast as the rest of the SDLC around them.
With books like Database Reliability Engineering: Designing and Operating Resilient
Database Systems by Laine Campbell and Charity Majors (O’Reilly), it has become the
new reality that technical organizations look to database engineers more as enablers
of business growth and less as the sole operators of all databases. Where once a DBA’s
primary day-to-day involved schema design and query optimization, they now are
responsible for teaching those skills to developers and managing systems that allow
developers to deploy their own schema changes quickly and safely.
With these changes, the focus should no longer be on optimizing MySQL to get a few
percentage points faster. We think that High Performance MySQL is now about giving
people the information they need to make educated decisions about how to best use

MySQL. This begins by understanding how MySQL is designed, which gives way to
understanding what MySQL is and is not good at.1 Modern releases of MySQL offer
reasonably sane defaults, and there’s very little tuning you need to do unless you’re
experiencing a very specific scaling problem. Modern teams are now dealing with
schema changes, compliance issues, and sharding. We want High Performance MySQL
to be a comprehensive guide to how modern companies run MySQL at scale.

1 Famously, people often used MySQL as a queue and then learned the hard way why it was bad. The most cited

reasons were the overhead of polling for new queue actions, the management of locking records for process‐
ing, and the unwieldy size of queue tables as data grows over time.

xvi

|

Preface


Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width

Used for program listings, as well as within paragraphs to refer to program ele‐
ments such as variable or function names, databases, data types, environment
variables, statements, and keywords.
Constant width bold


Shows commands or other text that should be typed literally by the user.
Constant width italic

Shows text that should be replaced with user-supplied values or by values deter‐
mined by context.
This icon signifies a tip or suggestion.

This icon signifies a general note.

This icon indicates a warning or caution.

Preface

|

xvii


O’Reilly Online Learning
For more than 40 years, O’Reilly Media has provided technology
and business training, knowledge, and insight to help companies
succeed.

Our unique network of experts and innovators share their knowledge and expertise
through books, articles, and our online learning platform. O’Reilly’s online learning
platform gives you on-demand access to live training courses, in-depth learning
paths, interactive coding environments, and a vast collection of text and video from
O’Reilly and 200+ other publishers. For more information, visit .

How to Contact Us

Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional
information. You can access this page at />Email to comment or ask technical questions about this
book.
For news and information about our books and courses, visit .
Find us on Facebook: />Follow us on Twitter: />Watch us on YouTube: />
xviii

| Preface


Acknowledgments for the Fourth Edition
From Silvia
First and foremost, I’d like to thank my family. My parents, who sacrificed stable jobs
and lives in Egypt to bring me and my brother to the United States. My husband,
Armea, for supporting me through this and all the past years of my career as I took
on one challenge after the next, culminating in this accomplishment.
I started off in tech as an immigrant who left her college years in the Middle East to
achieve her dream of moving to the United States. After earning my degree in a state
university in California, I took a job in New York City, and I remember the second
edition of this book being the very first tech book I bought with my own money that
was not a college book. I owe the authors of the previous editions teaching me a lot of
fundamental lessons that prepared me to manage databases during my career.
I am grateful for the support of so many people I have worked with in my career.

Their encouragement has gotten me to write this edition of this book that taught me
so much earlier in my career. I’d like to thank Tim Jenkins, the former CTO of Send‐
Grid, for hiring me for the job of a lifetime even though I told him in my interview
that he was using MySQL replication the wrong way, and for trusting me with what
turned out to be a rocket ship.
I’d like to thank all the amazing women in tech who have been my support network
and cheerleaders. Special thanks to Camille Fournier and Dr. Nicole Forsgren for
writing the two books that have influenced the past few years of my career and
changed my view on my day-to-day work.
Thank you to my team at Twilio. To Sean Kilgore for making me a much better engi‐
neer who cares about a lot more than just the databases. To John Martin for being the
most optimistic human I ever worked with. Thanks to Laine Campbell and her Palo‐
minoDB team (later acquired by Pythian) who helped support me and taught me so
much during the toughest years, and to Baron Schwartz for encouraging me to write
about my experiences.
Finally, thanks to Virginia Wilson for being an excellent editor, for helping turn my
stream of ideas into sentences that make sense and for helping me through this pro‐
cess with so much support and grace.

From Jeremy
When Silvia approached me to help with this book, it was in the middle of an extra‐
ordinarily stressful period of most people’s lives—the global pandemic, which started
in 2020. I was unsure that I wanted to add any more stress to my life. My wife, Selena,
told me that I would regret it if I didn’t accept, and I know better than to argue with

Preface

|

xix



her. She has always supported me and encouraged me to be the best human being I
can be. I will forever love her for all that she does for me.
To my family, coworkers, and community friends: I would have never gotten to this
point without you. You all taught me how to be who I am today. My career is the sum
of my experiences with you all. You taught me how to accept criticism, how to lead by
example, how to fail and recover, and most importantly, that the sum is better than
the individual.
Lastly, I want to thank Silvia, who trusted me to bring a shared understanding but
different perspective to this book. I hope I met your expectations.

A Thank You to Tech Reviewers
The authors also want to recognize the tech reviewers who helped get this book to
where it is today: Aisha Imran, Andrew Regner, Baron Schwartz, Daniel Nichter,
Hayley Anderson, Ivan Mora Perez, Jam Leoni, Jaryd Remillard, Jennifer Davis,
Jeremy Cole, Keith Wells, Kris Hamoud, Nick Vyzas, Shubheksha Jalan, Tom
Krouper, and Will Gunty. Thank you all for your time and effort.

xx

|

Preface


CHAPTER 1

MySQL Architecture


MySQL’s architectural characteristics make it useful for a wide range of purposes.
Although it is not perfect, it is flexible enough to work well in both small and large
environments. These range from a personal website up to large-scale enterprise appli‐
cations. To get the most from MySQL, you need to understand its design so that you
can work with it, not against it.
This chapter provides a high-level overview of the MySQL server architecture, the
major differences between the storage engines, and why those differences are impor‐
tant. We’ve tried to explain MySQL by simplifying the details and showing examples.
This discussion will be useful for those new to database servers as well as readers who
are experts with other database servers.

MySQL’s Logical Architecture
A good mental picture of how MySQL’s components work together will help you
understand the server. Figure 1-1 shows a logical view of MySQL’s architecture.
The topmost layer, clients, contains the services that aren’t unique to MySQL. They’re
services most network-based client/server tools or servers need: connection handling,
authentication, security, and so forth.
The second layer is where things get interesting. Much of MySQL’s brains are here,
including the code for query parsing, analysis, optimization, and all the built-in func‐
tions (e.g., dates, times, math, and encryption). Any functionality provided across
storage engines lives at this level: stored procedures, triggers, and views, for example.
The third layer contains the storage engines. They are responsible for storing and
retrieving all data stored “in” MySQL. Like the various filesystems available for GNU/
Linux, each storage engine has its own benefits and drawbacks. The server communi‐

1


cates with them through the storage engine API. This API hides differences between
storage engines and makes them largely transparent at the query layer. It also con‐

tains a couple of dozen low-level functions that perform operations such as “begin a
transaction” or “fetch the row that has this primary key.” The storage engines don’t
parse SQL1 or communicate with one another; they simply respond to requests from
the server.

Figure 1-1. A logical view of the MySQL server architecture

Connection Management and Security
By default, each client connection gets its own thread within the server process. The
connection’s queries execute within that single thread, which in turn resides on one
core or CPU. The server maintains a cache of ready-to-use threads, so they don’t need
to be created and destroyed for each new connection.2
When clients (applications) connect to the MySQL server, the server needs to authen‐
ticate them. Authentication is based on username, originating host, and password. X.
509 certificates can also be used across a Transport Layer Security (TLS) connection.
Once a client has connected, the server verifies whether the client has privileges for
each query it issues (e.g., whether the client is allowed to issue a SELECT statement
that accesses the Country table in the world database).

1 One exception is InnoDB, which does parse foreign key definitions because the MySQL server doesn’t yet

implement them itself.

2 MySQL 5.5 and newer versions support an API that can accept thread-pooling plug-ins, though not com‐

monly used. The common practice for thread pooling is done at access layers, which we discuss in Chapter 5.

2

|


Chapter 1: MySQL Architecture


Optimization and Execution
MySQL parses queries to create an internal structure (the parse tree) and then applies
a variety of optimizations. These can include rewriting the query, determining the
order in which it will read tables, choosing which indexes to use, and so on. You can
pass hints to the optimizer through special keywords in the query, affecting its
decision-making process. You can also ask the server to explain various aspects of
optimization. This lets you know what decisions the server is making and gives you a
reference point for reworking queries, schemas, and settings to make everything run
as efficiently as possible. There is more detail on this in Chapter 8.
The optimizer does not really care what storage engine a particular table uses, but the
storage engine does affect how the server optimizes the query. The optimizer asks the
storage engine about some of its capabilities and the cost of certain operations as well
as for statistics on the table data. For instance, some storage engines support index
types that can be helpful to certain queries. You can read more about schema optimi‐
zation and indexing in Chapters 6 and 7.
In older versions, MySQL made use of an internal query cache to see if it could serve
the results from there. However, as concurrency increased, the query cache became a
notorious bottleneck. As of MySQL 5.7.20, the query cache was officially deprecated
as a MySQL feature, and in the 8.0 release, the query cache is fully removed. Even
though the query cache is no longer a core part of the MySQL server, caching fre‐
quently served result sets is a good practice. While outside the scope of this book, a
popular design pattern is to cache data in memcached or Redis.

Concurrency Control
Any time more than one query needs to change data at the same time, the problem of
concurrency control arises. For our purposes in this chapter, MySQL has to do this at

two levels: the server level and the storage-engine level. We will give you a simplified
overview of how MySQL deals with concurrent readers and writers, so you have the
context you need for the rest of this chapter.
To illustrate how MySQL handles concurrent work on the same set of data, we will
use a traditional spreadsheet file as an example. A spreadsheet consists of rows and
columns, much like a database table. Assume the file is on your laptop and only you
have access to it. There are no potential conflicts; only you can make changes to the
file. Now, imagine you need to collaborate with a coworker on that spreadsheet. It is
now on a shared server that both of you have access to. What happens when both of
you need to make changes to this file at the same time? What if we have an entire
team of people actively trying to edit, add, and remove cells from this spreadsheet?
We can say that they should take turns making changes, but that is not efficient. We
need an approach for allowing concurrent access to a high-volume spreadsheet.

Concurrency Control

|

3


×