CHAPTER 26 • MONITORING AND OPTIMIZING SQL SERVER 2000
970
12. Click Next.
13. Under Select Tables to Tune, click Select All Tables.
14. Click Next; the Wizard will now start tuning your indexes.
15. You will be asked to accept the index recommendations; click Next.
2627ch26.qxd 8/22/00 11:22 AM Page 970
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
971
16. If there were recommendations, you would be asked to schedule them for later or
run them now, but because there are no recommendations for this workload file,
you are taken directly to the final screen. Click Finish to complete the Wizard.
17. When you receive a message stating that the Wizard has completed, click OK.
18. Exit Profiler.
Tips and Techniques
If you want the best results from SQL Server’s monitoring tools, you need to know
and use the proper techniques. If you don’t, the end result will not be what you are
hoping for—or what you need.
Setting a Measurement Baseline
You will never know if your system is running slower than normal unless you know
what normal is, which is what a measurement baseline does: It shows you the resources
(memory, CPU, etc.) SQL Server consumes under normal circumstances. You create
the measurement baseline before putting your system into production so that you
have something to compare your readings to later on.
TIPS AND TECHNIQUES
Advanced Topics
PART
VI
2627ch26.qxd 8/22/00 11:22 AM Page 971
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 26 • MONITORING AND OPTIMIZING SQL SERVER 2000
972
The first thing you need to create an accurate measurement baseline is a test net-
work with just your SQL Server and one or two client machines. You limit the number
of machines involved because all networks have broadcast traffic, which is processed
by all the machines on the network. This broadcast traffic can throw your counts
off—sometimes a little, sometimes quite a bit. You may instead want to consider shut-
ting down as many machines as possible and generating your baseline off-hours if
your budget does not allow for a test network.
You can then start your baseline. The Windows NT counters mentioned at the out-
set of this chapter as well as the preset SQL Server counters should provide an accu-
rate baseline with which you can compare future readings. Then you can move to the
next technique.
Data Archiving and Trend Tracking
Although the consequences of throwing away your SQL Server monitoring records are
not quite as severe as facing an IRS auditor without records and receipts, you still
need to save, or archive, your records. One of the primary reasons to do so is to back
up requests for additional equipment. For example, if you ask for funds to buy more
memory for the SQL Server, but don’t bring any proof that the system needs the RAM,
you are probably not going to get the money. If you bring a few months’ worth of
reports, however, and say, “After tracking SQL Server for a time, we’ve found this…”
management may be far more willing to give you the money you need. Using
archived data in such fashion is known as trend tracking.
One of the most valuable functions of using your archived data for trend tracking
is proactive troubleshooting—that is, anticipating and avoiding problems before they
arise. Suppose you added 50 new users to your network about three months ago and
are about to do it again. If you archived your data from that period, you would be
able to recall what those 50 users did to the performance of the SQL Server, and you
could compensate for it. On the other hand, if you threw that data away, you might
be in for a nasty surprise when your system unexpectedly slows to a crawl.
Optimization Techniques
SQL Server can dynamically adjust most of its settings to compensate for problems. It
can adjust memory use, threads spawned, and a host of other settings. In some cases,
unfortunately, those dynamic adjustments may not be enough—you may need to
make some manual changes.
We’ll look at a few specific areas that may require your personal attention.
2627ch26.qxd 8/22/00 11:22 AM Page 972
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
973
Queries and Stored Procedures
The first thing to ask yourself when you are getting slow response times is whether
you could be using a stored procedure instead of a local query. Stored procedures are
different from local code in two ways: They are stored on the SQL Server, so they do
not need to be transmitted over the network, which causes congestion. In addition,
stored procedures are precompiled on the server; this saves system resources, because
local code must be compiled once it gets to the system.
Overall, stored procedures are the way to go, but if you need to use local queries,
you should consider how they are written, because poorly constructed queries can
wreak havoc on your system. If, for example, you have a query that is returning every
row of a table when only half of that is required, you should consider rewriting the
query. Improper use of WHERE clauses can also slow your queries down. Make sure
that your WHERE clauses reference indexed columns for optimal performance.
Tempdb
Is your tempdb big enough to handle the load that your queries put on it? Think of
tempdb as a scratchpad for SQL Server; when queries are performed, SQL Server uses
this scratchpad to make notes about the result set. If tempdb runs out of room to
make these notes, system response time can slow down. Tempdb should be between
25 and 40% of the size of your largest database (for example, if your largest database is
100MB, tempdb should be 25 to 40MB).
Query Governor
Right out of the box, SQL Server will run any query you tell it to, even if that query is
poorly written. You can change that by using the Query Governor. This is not a sepa-
rate tool, but is part of the database engine and is controlled by the Query Governor
Cost Limit. This setting tells SQL Server not to run queries longer than x (where x is a
value higher than zero). If, for example, the Query Governor Cost Limit is set to 2, any
query that is estimated to take longer than 2 seconds would not be allowed to run. SQL
Server can estimate the running time of a query because SQL Server keeps statistics
about the number and composition of records in tables and indexes. The Query Gover-
nor Cost Limit can be set by using the command sp_configure ‘query governor
cost limit’, ‘1’ (the 1 in this code can be higher). The Cost Limit can also be set on
the Server Settings tab of the Server Properties page in Enterprise Manager.
OPTIMIZATION TECHNIQUES
Advanced Topics
PART
VI
2627ch26.qxd 8/22/00 11:22 AM Page 973
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 26 • MONITORING AND OPTIMIZING SQL SERVER 2000
974
NOTE If the Query Governor Cost Limit is set to zero (the default), all queries will be
allowed to run.
Setting Trace Flags
A trace flag is used to temporarily alter a particular SQL Server behavior. Much like a
light switch can be used to turn off a light and then turn it back on again, a trace flag
can be used to turn off (or on) a behavior in SQL Server. Trace flags are enabled with
DBCC TRACEON and turned off with DBCC TRACEOFF. The command to enable
trace flag 1204 would look like this: DBCC TRACEON(1204). Table 26.3 lists some of
the trace flags available to you.
TABLE 26.3: USES OF TRACE FLAGS
Trace Flag Use
107 This instructs the server to interpret numbers with a decimal point as type
float instead of decimal.
260 This trace flag prints version information for extended stored procedure
Dynamic Link Libraries. If you write your own extended stored procedures,
this trace flag will prove useful in troubleshooting.
1204 This will tell you what type of locks are involved in a deadlock and what com-
mands are affected.
1205 This flag returns even more detailed information about the commands
affected by a deadlock.
1704 This will print information when temporary tables are created or dropped.
2528 This trace flag disables parallel checking of objects by the DBCC CHECKDB,
DBCC CHECKFILEGROUP, and DBCC CHECKTABLE commands. If you know
that the server load is going to increase while these commands are running,
you may want to turn these trace flags on so that SQL Server checks only a
single object at a time and therefore places less load on the server. Under
ordinary circumstances, though, you should let SQL Server decide on the
degree of parallelism.
3205 This will turn off hardware compression for backups to tape drives.
3604 When turning on or off trace flags, this flag will send output to the client.
3605 When turning on or off trace flags, this flag will send output to the error log.
7505 This enables 6.x handling of return codes when a call to dbcursorfetchx
causes the cursor position to follow the end of the cursor set.
2627ch26.qxd 8/22/00 11:22 AM Page 974
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
975
Max Async I/O
It should go without saying that SQL Server needs to be able to write to disk, because
that’s where the database files are stored—but is it writing to disk fast enough? If you
have multiple hard disks connected to a single controller, multiple hard disks con-
nected to multiple controllers, or a RAID system involving striping, the answer is
probably no. The maximum number of asynchronous input/output (Max Async I/O)
threads by default in SQL Server is 32. This means that SQL Server can have 32 out-
standing read and 32 outstanding write requests at a time. Thus, if SQL Server needs
to write some data to disk, SQL Server can send up to 32 small chunks of that data to
disk at a time. If you have a powerful disk subsystem, you will want to increase the
Max Async I/O setting.
The value to which you increase this setting depends on your hardware, so if you
increase the setting, you must then monitor the server. Specifically, you will need to
monitor the Physical Disk: Average Disk Queue Performance Monitor counter, which
should be less than two (note that any queue should be less than two). If you adjust
Max Async I/O and the Average Disk Queue counter goes above two, you have set Max
Async I/O too high and will need to decrease it.
NOTE You will need to divide the Average Disk Queue counter by the number of phys-
ical drives to get an accurate count. That is, if you have three hard disks and a counter
value of six, you would divide six by three—which tells you that the counter value for each
disk is two.
LazyWriter
LazyWriter is a SQL Server process that moves information from the data cache in
memory to a file on disk. If LazyWriter can’t keep enough free space in the data
cache for new requests, performance slows down. To make sure this does not hap-
pen, monitor the SQL Server: Buffer Manager – Free Buffers Performance Monitor
counter. LazyWriter tries to keep this counter level above zero; if it dips or hits zero,
you have a problem, probably with your disk subsystem. To verify this, you need to
check the Physical Disk: Average Disk Queue Performance Monitor counter and ver-
ify that it is not more than two per physical disk (see above). If the queue is too high,
LazyWriter will not be able to move data efficiently from memory to disk, and the
free buffers will drop.
OPTIMIZATION TECHNIQUES
Advanced Topics
PART
VI
2627ch26.qxd 8/22/00 11:22 AM Page 975
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 26 • MONITORING AND OPTIMIZING SQL SERVER 2000
976
RAID
RAID (Redundant Array of Inexpensive Disks) is used to protect your data and speed
up your system. In a system without RAID, data that is written to disk is written to
that one disk. In a system with RAID, that same data would be written across multiple
disks, providing fault tolerance and improved I/O. Some forms of RAID can be imple-
mented inexpensively in Windows NT, but this uses such system resources as proces-
sor and memory. If you have the budget for it, you might consider getting a separate
RAID controller that will take the processing burden off Windows NT. RAID is dis-
cussed in detail in Chapter 4, but here is a quick refresher:
RAID 0 Stripe Set: This provides I/O improvement, but not fault tolerance.
RAID 1 Mirroring: This provides fault tolerance and read-time improve-
ment. This can also be implemented as duplexing, which is a mirror that has
separate controllers for each disk.
RAID 0+1 Mirrored Stripe Set: This is a stripe set without parity that is
duplicated on another set of disks. This requires a third-party controller,
because Windows NT does not support this level of RAID natively.
RAID 5 Stripe Set with Parity: This provides fault tolerance and
improved I/O.
Adding Memory
SQL Server, like most BackOffice products, needs significant amounts of RAM. The
more you put in, the happier SQL Server will be. There is one caveat about adding
RAM, however: your level 2 cache. This is much faster (and more expensive) than
standard RAM and is used by the processor for storing frequently used data. If you
don’t have enough level 2 cache to support the amount of RAM in your system, your
server may slow down rather than speed up. Microsoft tells you that the minimum
amount of RAM that SQL Server needs is 32 to 64MB, but because SQL Server benefits
greatly from added RAM, you should consider using 256MB of RAM, which requires
1MB of level 2 cache.
Manually Configuring Memory Use
Although SQL Server can dynamically assign itself memory, it is not always best to let
it do so. A good example of this is when you need to run another BackOffice program,
such as Exchange, on the same system as SQL Server. If SQL Server is not constrained,
it will take so much memory that there will be none left for Exchange. The relevant
2627ch26.qxd 8/22/00 11:22 AM Page 976
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
977
constraint is the max server memory setting; by adjusting it, you can stop SQL Server
from taking too much RAM. If, for example, you set it to 102,400—100 × 1024 (the
size of a megabyte)—SQL Server will never use more than 100MB of RAM.
You could also set min server memory, which tells SQL Server to never use less than
the set amount; this should be used in conjunction with set working size. Windows NT
uses virtual memory, which means that data that is in memory and has not been
accessed for a while can be stored on disk. The set working size option stops Windows
NT from moving SQL Server data from RAM to disk, even if SQL Server is idle. This
can improve SQL Server’s performance, because data will never need to be retrieved
from disk (which is about 100 times slower than RAM). If you decide to use this
option, you should set min server memory and max server memory to the same size, and
then change the set working size option to 1.
Summary
This chapter has stressed the importance of monitoring and optimization. Monitor-
ing allows you to find potential problems before your users find them; without it, you
have no way of knowing how well your system is performing.
Performance Monitor can be used to monitor both Windows NT and SQL Server.
Some of the more important counters to watch are Physical Disk: Average Disk Queue
(which should be less than two) and SQLServer:Buffer Manager: Buffer Cache Hit
Ratio (which should be as high as possible).
Query Analyzer allows you to see how a query will affect your system before you
place the query in production. The Profiler is used to monitor queries after they have
been placed in general use; it is also useful for monitoring security and user activity.
Once you have used Profiler to log information about query use to a trace file, you
can run the Index Tuning Wizard to optimize your indexes.
Once you have created all logs and traces, you need to archive them. The various
log files can be used later for budget justification and trend tracking. For example,
suppose you added 50 users to your system six months ago and are about to add 50
more. If you kept records on what kind of load the last 50 users placed on your sys-
tem, you will be better prepared for the next 50.
This chapter also presented some tips for repairing a slow-running system. You can
change the Max Async I/O setting if your disk is not working hard enough to support
the rest of the system, and you may need to upgrade your disk subsystem if the SQL
Server: Buffer Manager – Free Buffers Performance Monitor counter hits zero. RAID
can also speed up your SQL Server. If you can afford a separate controller, you should
SUMMARY
Advanced Topics
PART
VI
2627ch26.qxd 8/22/00 11:22 AM Page 977
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 26 • MONITORING AND OPTIMIZING SQL SERVER 2000
978
get one to take some of the burden off Windows NT. If you can’t afford one, you can
use Windows NT RAID level 1 for fault tolerance and speed.
Now that you know how to optimize your server and keep it running at peak per-
formance, it will be much easier to perform all of the tasks on your SQL Server. This is
especially true of the next topic that we will discuss—replication.
2627ch26.qxd 8/22/00 11:22 AM Page 978
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 27
Replication
FEATURING:
Understanding Replication 980
Setting Up Replication 990
Creating and Subscribing to a
Transactional Publication 999
Creating and Subscribing to a
Snapshot Publication 1017
Creating and Subscribing to a
Merge Publication 1028
Using Replication Monitor 1040
Summary 1046
2627ch27.qxd 8/22/00 11:24 AM Page 979
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
F
or one reason or another, many companies have more than one database
system, especially in larger companies where there is more than one location
or multiple departments keep their own servers. Regardless of the reason,
many of these servers need to have copies of each other’s databases. For
example, if you have two servers for your human resources department (one in New
York and one in Singapore), you may need to keep a copy of each database on each
server so that all of your human resources personnel can see the same data. The best
way to copy this data is through replication.
Replication is designed specifically for the task of copying data and other objects
(such as views, stored procedures, and triggers) between servers and making certain
that those copies stay up-to-date. In this chapter, we will look into the inner work-
ings of replication. First we will discuss some terminology that is used to describe
the various parts of replication. After you have an understanding of the terms, we
can discuss the roles that SQL Servers can play in the replication process. Next we
will move into the types and models of replication, and finally we will replicate.
Let’s get started.
Understanding Replication
The sole purpose of replication is to copy data between servers. There are several good
reasons for doing so:
• If your company has multiple locations, you may need to move the data closer
to the people who are using it.
• If multiple people want to work on the same data at the same time, replication
is a good way of giving them that access.
• Replication can separate the functions of reading from writing data. This is espe-
cially true in OLTP (online transaction processing) environments where reading
data can place quite a load on the system.
• Some sites may have different methods and rules for handling data (perhaps the
site is a sister or child company). Replication can be used to give these sites the
freedom of setting their own rules for dealing with data.
• Mobile sales users can install SQL Server 2000 on a laptop, where they might
keep a copy of an inventory database. These users can keep their local copy of
the database current by dialing in to the network and replicating.
You may be able to come up with even more reasons to use replication in your
company, but to do so, you need to understand the publisher/subscriber concept.
2627ch27.qxd 8/22/00 11:24 AM Page 980
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
981
The Publisher/Subscriber Metaphor
Microsoft uses the publisher/subscriber metaphor to make replication easier to under-
stand and implement. It works a lot like a newspaper or magazine company. The
newspaper company has information that people around the city want to read; there-
fore the newspaper company publishes this data and has newspaper carriers distribute
it to the people who have subscribed. As shown in Figure 27.1, SQL Server replication
works much the same in that it too has a publisher, a distributor, and a subscriber:
Publisher: In SQL Server terminology, the publisher is the server with the
original copy of the data that others need—much like the newspaper company
has the original data that needs to be printed and distributed.
Distributor: Much like the newspaper company needs paper carriers to dis-
tribute the newspaper to the people who have subscribed, SQL Servers need
special servers called distributors to collect data from publishers to distribute
to subscribers.
Subscriber: A subscriber is a server that requires a copy of the data that is
stored on the publisher. The subscriber is akin to the people who need to read
the news, so they subscribe to the newspaper.
FIGURE 27.1
SQL Server can
publish, distribute,
or subscribe to
publications
in replication.
NOTE A SQL Server can be any combination of these three roles.
The analogy goes even further: All of the information is not just lumped together
in a giant scroll and dropped on the doorstep—it is broken up into various publica-
Publication
Article
Article
Article
Publisher
Contains original
copy of data
Distributor
Collects changes
from publishers
Subscriber
Receives a
copy of data
UNDERSTANDING REPLICATION
Advanced Topics
PART
VI
2627ch27.qxd 8/22/00 11:24 AM Page 981
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 27 • REPLICATION
982
tions and articles so that it is easier to find the information you want to read. SQL
Server replication follows suit:
Article: An article is just data from a table that needs to be replicated. Of
course, you probably do not need to replicate all of the data from the table, so
you don’t have to. Articles can be horizontally partitioned, which means that
not all records in the table are published, and they can be vertically parti-
tioned, which means that not all columns need be published.
Publication: A publication is a collection of articles and is the basis for sub-
scriptions. A subscription can consist of a single article or multiple articles, but
you must subscribe to a publication as opposed to a single article.
Now that you know the three roles that SQL Servers can play in replication and
that data is replicated as articles that are stored in publications, you need to know the
types of replication.
Replication Types
It is important to control how publications are distributed to subscribers. If the news-
paper company does not control distribution, for example, many people may not get
the paper when they need it, or other people may get the paper for free. In SQL
Server, you need to control distribution of publications for similar reasons, so that the
data gets to the subscribers when it is needed. There are a few factors to consider
when choosing a replication type:
Autonomy: Autonomy is the amount of independence that your subscribers
have over the data they receive. Some servers may need a read-only copy of the
data, while others may need to be able to make changes to the data they receive.
Latency: This refers to how long a subscriber can go without getting a fresh
copy of data from the server. Some servers may be able to go for weeks without
getting new data from the publisher, while other instances may require a very
short wait time.
Consistency: Possibly the most popular form of replication is transactional
replication, where transactions are read from the transaction log of the pub-
lisher, moved through the distributor, and applied to the database on the sub-
scriber. This is where transactional consistency comes in. Some subscribers may
need all of the transactions in the same order they were applied to the server,
while other subscribers may need only some of the transactions.
Once these factors have been considered, you are ready to choose the replication
type that will work best for you.
2627ch27.qxd 8/22/00 11:24 AM Page 982
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
983
Distributed Transactions
In some instances, multiple servers may need the same transaction at the exact same
time, as in a bank, for example. Suppose that the bank has multiple servers for storing
customer account data, each server storing a copy of the same data—all servers can
modify the data in question. Now suppose that a customer comes to an Automatic
Teller Machine and withdraws money from their account. The action of withdrawing
money is a simple Transact-SQL transaction that removes money from the customer’s
checking account record, but remember that more than one server holds this data. If
the transaction makes it to only one of the bank’s servers, the customer could go to
ATMs all over town and withdraw enough money to retire on, and the bank would
have a very hard time stopping them.
To avoid such a scenario, you need to get the exact same transaction to all of the
subscribers at the same time. If the transaction is not applied to all of the servers, it is
not applied to any of the servers. This type of replication is called distributed transac-
tions or two-phase commit (2PC). Technically this is not a form of replication; 2PC uses
the Microsoft Distributed Transaction Coordinator and is controlled by the way the
Transact-SQL is written. A normal, single-server transaction looks like this:
BEGIN TRAN
TSQL CODE
COMMIT TRAN
A distributed transaction looks like this:
BEGIN DISTRIBUTED TRAN
TSQL CODE
COMMIT TRAN
Using distributed transactions will apply the same transaction to all required
servers at once or to none of them at all. This means that this type of replication has
very low autonomy, low latency, and high consistency.
Transactional
All data modifications made to a SQL Server database are considered transactions,
whether or not they have an explicit BEGIN TRAN command and corresponding
COMMIT TRAN (if the BEGIN…COMMIT is not there, SQL Server assumes it). All of
these transactions are stored in a transaction log that is associated with the database.
With transactional replication, each of the transactions in the transaction log can be
replicated. The transactions are marked for replication in the log (because not all
transactions may be replicated), then they are copied to the distributor, where they
are stored in the distribution database until they are copied to the subscribers.
UNDERSTANDING REPLICATION
Advanced Topics
PART
VI
2627ch27.qxd 8/22/00 11:24 AM Page 983
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 27 • REPLICATION
984
The only real drawback is that subscribers to a transactional publication must treat
the data as read-only, meaning that users cannot make changes to the data they receive.
Think of it as being like a subscription to a newspaper—if you see a typo in an ad in the
paper, you can’t change it with a pen and expect the change to do any good. No one
else can see your change, and you will just get the same typo in the paper the next day.
So, transactional replication has high consistency, low autonomy, and middle-of-the-
road latency.
Transactional with Updating Subscribers
This type of replication is almost exactly like transactional replication, with one major
difference: The subscribers can modify the data they receive. You can think of this type
of replication as a mix of 2PC and transactional replication in that it uses the Distrib-
uted Transaction Coordinator and distributed transactions to accomplish its task.
The publisher still marks its transactions for replication, and those transactions get
stored on the distributor until they are sent to the subscriber. On the subscriber,
though, there is a trigger that is marked NOT FOR REPLICATION. This special trigger
will watch for changes that come from users of the server, but not for changes that
come from the distributor as a process of replication. This trigger on the subscriber
database will watch for changes and send those changes back to the publisher, where
they can be replicated out to any other subscribers of the publication.
Snapshot
While transactional replication copies only data changes to subscribers, snapshot
replication copies entire publications to subscribers every time it replicates. In
essence, it takes a snapshot of the data and sends it to the subscriber every time it
replicates. This is useful for servers that need a read-only copy of the data and do not
require updates very often—in fact, they could wait for days or even weeks for
updated data.
A good example of where to use this type of replication is in a department store
chain that has a catalog database. The headquarters keeps and publishes the master
copy of the database where changes are made. The subscribers can wait for updates to
this catalog for a few days if necessary.
The data on the subscriber should be treated as read-only here as well, because all of
the data is going to be overwritten anyway each time replication occurs. This type of
replication is said to have high latency, high autonomy, and high consistency.
Snapshot with Updating Subscribers
The only difference between this type of replication and standard snapshot replica-
tion is that this type will allow the users to update the data on their local server. This
2627ch27.qxd 8/22/00 11:24 AM Page 984
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
985
is accomplished the same way it is accomplished in transactional replication with
updating subscribers—a trigger is placed on the subscribing database that watches for
local transactions and replicates those changes to the publishing server by means of
the Distributed Transaction Coordinator. This type of replication has moderate
latency, high consistency, and high autonomy.
Merge
By far, this is the most complex type of replication to work with, but also the most
flexible. Merge replication allows changes to be made to the data at the publisher as
well as at all of the subscribers. These changes are then replicated to all other sub-
scribers until finally your systems reach convergence, the blessed state at which all of
your servers have the same data.
The biggest problem with merge replication is known as a conflict. This problem
occurs when more than one user modifies the same record on their copy of the data-
base at the same time. For example, if a user in Florida modifies record 25 in a table at
the same time that a user in New York modifies record 25 in their own copy of the
table, a conflict will occur on record 25 when replication takes place, because the
same record has been modified in two different places, and therefore SQL Server has
two values from which to choose. The default method of choosing a winner in this
conflict is based on site priority (which you will see how to set later in this chapter).
Merge replication works by adding triggers and system tables to the databases on
all of the servers involved in the replication process. When a change is made at any of
the servers, the trigger fires off and stores the modified data in one of the new system
tables, where it will reside until replication occurs. This type of replication has the
highest autonomy, highest latency, and lowest transactional consistency.
But how does all of this occur? What is the driving force behind replication? Let’s
look at the four agents that make replication run.
Replication Agents
Any of the types of subscriptions listed in the last section can be either push or pull
subscriptions. A push subscription is configured and controlled at the publisher. This
method of subscription is like the catalogs that you receive in the mail—the publisher
decides when you get updates because the publisher knows when changes have been
made to the information inside the catalog. The same is true of a push subscription in
replication—the publisher decides when changes will be sent to the subscribers.
Pull subscriptions are more like a magazine subscription. You write to the publisher
of the magazine and request a subscription—the magazine is not automatically sent to
UNDERSTANDING REPLICATION
Advanced Topics
PART
VI
2627ch27.qxd 8/22/00 11:24 AM Page 985
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 27 • REPLICATION
986
you. Pull subscriptions work much the same in that the subscriber requests a subscrip-
tion from the publisher—the subscription is not sent unless the subscriber asks for it.
With either method of replication, four agents are used to move the data from the
publisher to the distributor and finally to the subscriber:
Log reader agent: This agent is used primarily in transactional replication.
It reads the transaction log of the published database on the publisher and looks
for transactions that have been marked for replication. When it finds such a
transaction, the log reader agent copies the transaction to the distribution
server, where it is stored in the distribution database until it is moved to the sub-
scribers. This agent runs on the distributor in both push and pull subscriptions.
Distribution agent: This agent moves data from the distributor to the sub-
scribers. This agent runs on the distributor in a push subscription, but in a pull
subscription, it runs on the subscriber. Therefore, if you have a large number of
subscribers, you may want to consider using a pull subscription method to
lighten the load on the distribution server.
Snapshot agent: Just by reading the name of this agent, you would expect
it to work with snapshot replication, but it works with all types of replication.
This agent makes a copy of the publication on the publisher and either copies
it to the distributor, where it is stored in the distribution working folder
(\\ distribution_server\Program Files\Microsoft SQL Server\MSSQL$
(instance)\REPLDATA), or places it on removable disk (such as a CD-ROM or
zip drive) until it can be copied to the subscriber. With snapshot replication,
this agent runs every time replication occurs; with the other types of replica-
tion, this agent runs on a less frequent basis and is used to make sure that the
subscribers have a current copy of the publication, including the most up-to-
date structure for the data. This agent runs on the distributor in either a push
or a pull subscription.
TIP New to SQL Server 2000 is the ability to compress snapshot files. This can save
quite a bit of hard-disk space, because snapshot files can be sizable.
Merge agent: This agent controls merge replication. It takes changes from
all of the subscribers, as well as the publisher, and merges the changes with all
other subscribers involved in replication. This agent runs on the distributor in a
push subscription and on the subscriber in a pull subscription.
2627ch27.qxd 8/22/00 11:24 AM Page 986
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
987
Once you have selected the type of replication you need, you can pick the physical
model to go with it.
Replication Models
There are three roles that a SQL Server can play in replication: publisher, distributor,
and subscriber. Before you can successfully implement replication, you need to know
where to place these servers in your scheme. Microsoft has a few standard replication
models that should make it easier for you to decide where to put your servers
Single Publisher, Multiple Subscribers
In this scenario, there is a single, central publishing server where the original copy of
the data is stored and several subscribers that need copies of the data. This model
lends itself well to transactional or snapshot replication.
A good example of when to use this is if you have a catalog database that is main-
tained at company headquarters and your satellite offices need a copy of the catalog
database. The database at headquarters could be published, and your satellite offices
would subscribe to the publication. If you have a large number of subscribers, you
could create a pull subscription so that the load is removed from the distribution
server, making replication faster. Figure 27.2 should help you visualize this concept.
FIGURE 27.2
Several servers can
subscribe to a single
publisher.
Publisher/
Distributor
Subscriber
Subscriber
Subscriber
UNDERSTANDING REPLICATION
Advanced Topics
PART
VI
2627ch27.qxd 8/22/00 11:24 AM Page 987
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 27 • REPLICATION
988
Multiple Publishers, Single Subscriber
This model has a single server that subscribes to publications from multiple servers.
As shown in Figure 27.3, this lends itself to the following scenario:
Suppose that you work for a company that sells auto parts and you need to
keep track of the inventory at all of the regional offices. The servers at all of
the regional offices can publish their inventory databases, and the server at
company headquarters can subscribe to those subscriptions. Now the folks
at company headquarters will know when a regional office is running low on
supplies, because headquarters has a copy of everyone’s inventory database.
FIGURE 27.3
A single server can
also subscribe to
multiple publishers.
Multiple Publishers, Multiple Subscribers
In this model, each server is a publisher, and each server is a subscriber (see Figure 27.4).
You may instantly think that this lends itself to merge replication, but that is not always
the case. This model can lend itself to other types of replication as well.
For example, suppose that you work at a company that rents videos. Each video
store needs to know what the other video stores have in stock so that when a cus-
tomer wants a specific video, they can be instantly directed to a video store that has a
copy of the desired video. To accomplish this, each video store would need to publish
a copy of their video inventory, and each store would need to subscribe to the other
stores’ publications. In this way, the proprietors of the video store would know what
Publisher/
Distributor
Publisher/
Distributor
Publisher/
Distributor
Publisher/
Distributor
Subscriber
2627ch27.qxd 8/22/00 11:24 AM Page 988
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
989
the other video stores have in stock. If this is accomplished using transactional repli-
cation, there will be very little latency, because the publication would be updated
every time a transaction takes place.
FIGURE 27.4
Servers can both
publish and subscribe
to one another.
Remote Distributor
In many instances, the publishing server also serves as the distributor, and this works
fine. However, there are instances when it is advantageous to devote a server to the
task of distribution. Take the following scenario, for example (as shown in Figure 27.5):
Many international companies need data replicated to all of their subsidiaries
overseas. A company with headquarters in New York may need to have data
replicated to London, Frankfurt, and Rome, for example. If the server in New
York is both the publisher and the distributor, the process of replication would
involve three very expensive long-distance calls: one to each of the three sub-
scribers. If you place a distributor in London, though, the publisher in New
York would need to make only one call, to the distributor in London. The dis-
tributor would then make connections to the other European servers and there-
fore save money on long-distance calls between servers.
Publisher/
Subscriber
Publisher/
Subscriber
Publisher/
Subscriber
Publisher/
Subscriber
UNDERSTANDING REPLICATION
Advanced Topics
PART
VI
2627ch27.qxd 8/22/00 11:24 AM Page 989
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 27 • REPLICATION
990
FIGURE 27.5
A server can be
dedicated to the
task of distribution.
Heterogeneous Replication
Not all replication takes place between SQL Servers. Sometimes you need to have
duplicate data on a Sybase, Oracle, Access, or other database server. Heterogeneous
replication is the process of replicating data from SQL Server to another type of data-
base system. The only requirement for the subscriber in this case is that it must be
Open Database Connectivity (ODBC) compliant. If the target is ODBC compliant, it
can be the recipient of a push subscription from SQL Server. If you find that you
need to pull a subscription from SQL Server to a third-party database system, you
will need to write a custom program that accesses the SQL-DMO (Distributed Man-
agement Objects). In this way, you can make a program that will pull a subscription
to a third-party system.
With a thorough understanding of the terminology, the types, and the models of
replication, you are ready to start setting it up. Let’s replicate, shall we?
Setting Up Replication
There are a few steps to setting up and configuring replication. First you need a dis-
tributor to collect changes from the publishers and copy them to the subscribers.
Distributor
London
Publisher
New York
Subscriber
Frankfurt
Subscriber
Rome
Subscriber
London
2627ch27.qxd 8/22/00 11:24 AM Page 990
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
991
Then you need a publisher on which to create articles and publications. Finally you
need a subscriber to accept these publications.
The distributor will have a lot of work to do, especially if it is servicing more than
one publisher and more than one subscriber, so give it plenty of RAM (about 256MB
should do the trick). Also, all of the changes from the publishers are stored in one of
two places: For transactional replication, all of the changes are stored in the distribu-
tion database; for other types, the changes are stored in the distribution working
directory (\\distribution_server\Program Files\Microsoft SQL Server\MSSQL$
(instance)\REPLDATA), so make sure you have enough disk space to handle all of the
changes that will be flowing through the system.
NOTE The distribution database stores changes and history for transactional replica-
tion; for all other types, the distribution database merely stores history—changes are stored
in the distribution working folder.
WARNING Because only administrators have access to the C$ share on any given
server, the account used by the SQLServerAgent service needs to be an administrator on
the distribution server, or replication will fail.
Once the distributor is ready to go, you can proceed with replication. The first step
is to configure the distributor, which we will do now.
NOTE For the exercises in this chapter, you will need to have two instances of SQL
Server running. To configure this, please see Appendix B.
1. Open Enterprise Manager by selecting it from the SQL Server 2000 program
group under Programs on the Start menu.
2. Select your default instance of SQL Server and then on the Tools menu, point to
Replication and click Configure Publishing, Subscribers and Distribution. This
starts the Configure Publishing and Distribution Wizard. Click Next.
SETTING UP REPLICATION
Advanced Topics
PART
VI
2627ch27.qxd 8/22/00 11:24 AM Page 991
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 27 • REPLICATION
992
3. On the second screen, you are asked to select a distributor; this is where the dis-
tribution database and distribution working folder reside. You will work with
the local server, so check the radio button labeled Make ‘Server’ Its Own Distrib-
utor and click Next.
2627ch27.qxd 8/22/00 11:24 AM Page 992
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
993
4. The next screen asks whether you would like to customize the distribution
server properties or let SQL Server do it for you. If you want to place the distrib-
ution database on a different hard disk than the default or you want to place the
distribution working folder elsewhere, you should customize the properties. In
most cases, customization is recommended, so select the radio button labeled
Yes, Let Me Set the Distribution Database Properties and click Next.
5. On the next screen, you need to provide some information about the distribution
database: its name, data file location, and transaction log location. It is best to
have the data file and transaction log on different physical hard disks for recover-
ability, but for this exercise, accept all of the defaults and click Next to continue.
SETTING UP REPLICATION
Advanced Topics
PART
VI
2627ch27.qxd 8/22/00 11:24 AM Page 993
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 27 • REPLICATION
994
6. The next screen allows you to enable publishers. Enabling a publisher means
that it is allowed to use this server as a distributor. This keeps unauthorized
servers from bogging down your distributor. In this case, select both the primary
and SECOND servers (if you do not have a SECOND server, please refer to
Appendix B).
2627ch27.qxd 8/22/00 11:24 AM Page 994
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.