4 – Managing data growth
Common causes of space issues
The following issues are among the most common causes of database space-related sorrow:
• Poorly configured Model database – meaning that subsequent
databases adopt properties (AutoGrowth, Recovery Model and so on)
that are inappropriate for their intended use.
• Inefficient Delete, Insert or Bulk Insert statements – such processes,
plus those that create temp tables, can very quickly fill the log file with
unnecessary data. The situation is exacerbated by incorrect Model
database configuration.
• Indexes and large row counts – clustered indexes can take up a lot of
space for tables that contain millions of rows of data. However, you
simply need to plan for this space, because not having these indexes can
severely impact performance.
• Blatant misuse of TempDB – Temporary tables often play an
important role when developers are tasked with comparing millions of
rows of data to return a small subset of results. This practice can have
unwanted consequences, such as inadvertently filling the TempDB
database. It is our job, as DBAs, to make sure this does not happen, often
by performing a code review and offering an alternate solution.
Over the coming sections, I am going to delve into each of these issues, and
discuss the techniques I have used to analyze and fix each one, where possible. I
say "where possible" because sometimes data growth really does exceed all
expectation and confound even the most rigorous capacity planning. The only
course of action, in such cases, is to expand disks or add additional SAN space,
things only peripherally known to many DBAs.
I want to stress that this chapter is not going to shine a light on SQL Server
internals. I will not be taking you on a journey to the heart of the database engine
to explore the esoteric concepts of leaf level storage. Every DBA needs to
understand where and how objects, such as tables and indexes, use up space on
your servers, and be very familiar with core concepts such as pages, extents, fill
factors, as well as internal and external fragmentation. However, I will leave those
details to Books Online. Here, I intend to drive the All Terrain Vehicle of my
experience right to the source of the space allocation issues that wreak havoc on
the waking and sleeping life of the on-call DBA.
Being a model DBA
This chapter is about space utilization in SQL Server and there is no better place
to begin than with the Model database. The first thing I will say about the Model
database is that, if it were up to me, I would rename it. Out of the box, there is
nothing "model" about it; it is not a "model" citizen, nor should it be considered a
"role model" for other databases. Nevertheless, it is the template upon which all
subsequent databases are based, including TempDB. In other words, new
databases created on the server, unless otherwise specified, will inherit the
configuration settings of the Model database.
The full list of options for the Model database, including their default settings, can
be found in Books Online. The defaults for most of the options are fine for most
databases. Most significantly, however, the Model database settings determine the
following:
• Autogrowth properties for the data and log files
• Recovery model for the database
The default settings for each of these are definitely not appropriate for all
databases, and it's easy for new DBAs, or even us old, haggard DBAs, to forget to
check these settings, especially when working with a server configured by a
previous DBA.
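As a quick sanity check, you can query the system catalog to see exactly what new databases will inherit. This is a minimal sketch using sys.databases and sys.master_files (SQL Server 2005 and later); size and growth are reported in 8KB pages, so I convert them to MB:

    -- Recovery model that new databases will inherit
    SELECT name, recovery_model_desc
    FROM   sys.databases
    WHERE  name = 'model';

    -- Initial size and growth settings of the Model data and log files
    SELECT name,
           size * 8 / 1024 AS size_mb,
           CASE WHEN is_percent_growth = 1
                THEN CAST(growth AS varchar(10)) + ' percent'
                ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
           END AS growth_setting
    FROM   sys.master_files
    WHERE  database_id = DB_ID('model');

If the first query returns FULL and the second shows 1MB growth increments, you are looking at the out-of-the-box defaults discussed next.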
Beware of default autogrowth and recovery
By default, the data file (modeldev) for the Model database, in both SQL Server
2005 and 2008, is roughly 3MB in size initially, and is set to autogrow in 1MB
(1,024KB) increments, unrestricted, until the disk is full. The log file has an
initial size of 2MB and is set to grow in 10% increments, again until the disk is full.
These settings are shown in Figure 4.1.
NOTE
Microsoft SQL Server 2008 Books Online states: "The sizes of these files can
vary slightly for different editions of SQL Server." I am using Standard Edition
for the examples in this chapter.
In SQL Server storage terms, 1,024KB is 128 pages, since each page is 8KB.
For applications that will potentially load millions of records, growing the
data file of a database every 128 pages incurs a large performance hit, given that
one of the major bottlenecks of SQL Server is I/O requests.
Figure 4.1: Initial sizes and growth characteristics for the model database
data and log files.
Rather than accept these defaults, it is a much better practice to size the data file
appropriately at the outset, at, say, 2GB. The same advice applies to the log file.
Generally, growth based on a percentage is fine until the file reaches a threshold
where the next growth will consume the entire disk. Let's say you had a 40GB log
file on a 50GB drive. It would take only two 10% growths to fill the disk, and then
the alerts go out and you must awake, bleary-eyed, to shrink log files and curse the
Model database.
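One way to stop repeating this fix on every new database is to apply it once, to the Model database itself. The following is a sketch only; modeldev and modellog are the standard logical file names, but the 2GB initial size and the fixed-MB growth increments are illustrative choices, not universal recommendations:

    ALTER DATABASE model
    MODIFY FILE (NAME = modeldev, SIZE = 2GB, FILEGROWTH = 512MB);

    ALTER DATABASE model
    MODIFY FILE (NAME = modellog, SIZE = 512MB, FILEGROWTH = 256MB);

Every database created after this change starts life with these more sensible values; databases that already exist are unaffected.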
Coupled with the previously described file growth characteristics, our databases
will also inherit from the default Model database a recovery model of Full.
Transactions in the log file of a Full recovery database are only ever removed
from the log upon a transaction log backup. This is wonderful for providing point-
in-time recovery for business-critical applications with stringent Service Level
Agreements (SLAs), but it does mean that if you do not back up the transaction
log, you run the risk of eventually filling up your log drive.
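The remedies are straightforward, and which one you choose depends on the recovery requirements of the database in question. A sketch, where SalesDB is a hypothetical database and the backup path is an assumption:

    -- Option 1: the database does not need point-in-time recovery
    ALTER DATABASE SalesDB SET RECOVERY SIMPLE;

    -- Option 2: keep Full recovery, but back up the log regularly
    -- (typically via a scheduled job) so that it can be truncated
    BACKUP LOG SalesDB TO DISK = 'D:\SQLBackups\SalesDB_log.trn';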
If you have a database that is subject to hefty and/or regular (e.g. daily) bulk
insert operations, and you are forcing the data file to be incremented in size
regularly, by small amounts, then it's likely that the performance hit will be
significant. It is also likely that the size of your log file will increase rapidly, unless
you are performing regular transaction log backups.
To find out how significant an impact this can have, let's take a look at an
example. I'll create a database called
All_Books_Ever_Read, based on a default
model database, and then load several million rows of data into a table in that
database, while monitoring file growth and disk I/O activity, using Profiler and
PerfMon, respectively. Loading this amount of data may sound like an extreme
case, but it's actually "small fry" compared to many enterprise companies, that
accumulate, dispense and disperse Terabytes of data.
NOTE
I just happen to own a file, Books-List.txt, that allegedly contains a listing of all
books ever read by everyone on the planet Earth, which I'll use to fill the table.
Surprisingly, the file is only 33MB. People are just not reading much anymore.
The first step is to create the All_Books_Ever_Read database. The initial sizes of
the data and log files, and their growth characteristics, will be inherited from the
Model database, as described in Figure 4.1. Once I've created the database, I can
verify that the initial data (mdf) and log (ldf) file sizes are around 3MB and 2MB
respectively, as shown in Figure 4.2.
Figure 4.2: Data and log file sizes prior to data load.
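For reference, here is roughly the T-SQL equivalent of what I did, a sketch that creates the database with all of Model's inherited settings and then checks the resulting file sizes (size is reported in 8KB pages):

    CREATE DATABASE All_Books_Ever_Read;

    SELECT name,
           physical_name,
           size * 8 / 1024.0 AS size_mb
    FROM   sys.master_files
    WHERE  database_id = DB_ID('All_Books_Ever_Read');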
The next step is to back up the database. It's important to realize that, until I have
performed a full database backup, the log file will not act like a typical log file in a
database set to Full recovery mode. In fact, when there is no full backup of the
database, it is not even possible to perform a transaction log backup at this point,
as demonstrated in Figure 4.3.
Figure 4.3: Can't back up the log if no full database backup exists.
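In T-SQL, the attempt looks something like this; the backup path is an assumption, and the error text is paraphrased from memory:

    BACKUP LOG All_Books_Ever_Read
    TO DISK = 'D:\SQLBackups\All_Books_Ever_Read_log.trn';

    -- Msg 4214: BACKUP LOG cannot be performed because there is
    -- no current database backup.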
Until the first full backup of the database is performed, this database is acting as if
it is in Simple recovery mode and the transaction log will get regularly truncated at
checkpoints, so you will not see the full impact of the data load on the size of the
log file.
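Taking that first full backup is a one-line statement; again, the path shown is just an illustrative choice:

    BACKUP DATABASE All_Books_Ever_Read
    TO DISK = 'D:\SQLBackups\All_Books_Ever_Read_full.bak';

From this point on, the log chain is established, log backups become possible, and the log file will grow rather than truncate at checkpoints.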
With the database backed up, I need to set up Profiler and PerfMon so that I can
monitor the data load. To monitor auto growth behavior using Profiler, simply
start it up, connect to the SQL Server 2008 instance that holds the
All_Books_Ever_Read database, and then set up a trace to monitor Data and
Log file Auto Grow events, as shown in Figure 4.4.
Figure 4.4: Setting SQL Server Profiler to capture data and log file growth.
All you have to do then is click "Run".
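If you'd rather not leave Profiler running, note that the default trace in SQL Server 2005 and later already captures these events. The following query is a sketch; event classes 92 and 93 are the Data File Auto Grow and Log File Auto Grow events:

    SELECT t.DatabaseName,
           t.FileName,
           t.StartTime,
           te.name AS EventName
    FROM   sys.traces s
    CROSS APPLY fn_trace_gettable(s.path, DEFAULT) t
    JOIN   sys.trace_events te
      ON   te.trace_event_id = t.EventClass
    WHERE  s.is_default = 1
      AND  t.EventClass IN (92, 93);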
Next, I'll set up Perfmon (Administrative Tools | Performance) in order to
monitor disk I/O activity. Click on the "+" button in the toolbar of the graph;
Perfmon will connect to the local server by default. Select "Physical Disk" as the
performance object, as shown in Figure 4.5, and then select "% Disk Time" as the
counter and click "Add".
Next, still under the Physical Disk object, select the "Average Disk Queue
Length" and "Current Disk Queue Length" counters. These settings will capture
the amount of disk activity, to review after the data load.
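As a server-side complement to PerfMon, you can also ask SQL Server itself how much I/O each file has absorbed. The DMV below reports cumulative figures since the instance last started, so take a snapshot before and after the load; a minimal sketch:

    SELECT DB_NAME(vfs.database_id)            AS database_name,
           mf.name                             AS file_name,
           vfs.num_of_writes,
           vfs.num_of_bytes_written / 1048576  AS mb_written,
           vfs.io_stall_write_ms
    FROM   sys.dm_io_virtual_file_stats(DB_ID('All_Books_Ever_Read'), NULL) vfs
    JOIN   sys.master_files mf
      ON   mf.database_id = vfs.database_id
     AND   mf.file_id     = vfs.file_id;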