Nielsen c35.tex V4 - 07/21/2009 2:10pm Page 812
Part V Data Connectivity
Service Broker can handle complex message groups, such as multiple line items of an order that may not
appear consecutively in the queue due to other messages being received simultaneously. The conversa-
tion group can be used to select out the related messages.
Monitoring Service Broker
While Management Studio has no visibility in the activity of a queue, nor summary page reports for
the queue object, you can select directly from the queue or select a
count(). In addition, there are
database catalog views to shed light on the queue:
* sys.dm_broker_activated_tasks
* sys.dm_broker_connections
* sys.dm_broker_forwarded_messages
* sys.dm_broker_queue_monitors
SQL Trace/Profiler has a Broker event class with 10 Service Broker–related events that can be traced.
Summary
Service Broker is one of those technologies that provides no benefit ‘‘out of the box.’’ Unless you make
the effort to architect the database using Service Broker, it offers no advantage. However, if you do
take the time to design the database using Service Broker, you’ll see significant scalability benefits, as
Service Broker queues buffer the workload.
The next chapter continues the progression through SQL Server technologies and discusses ADO.NET
2.0 and its powerful methods for connectivity.
812
www.getcoolebook.com
Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 813
Replicating Data
IN THIS CHAPTER
Replication concepts
Configuring replication
R
eplication is an optional native SQL Server 2008 component that is used
to copy data and other database objects from one database or server to
another.
Replication is used for many purposes, listed here in order from most popular to
rarely used:
■ Offloading reporting from an OLTP server to a reporting server
■ Data consolidation — for example, consolidating branch office data to a
central server
■ Data distribution — for example, distributing data from a central server
to a set of member servers to improve read performance
■ Disaster recovery — replication can be used to keep a DR (disaster
recovery) server synchronized with the main server, and clients can be
manually redirected to the DR with minimal interruption
■ Synchronizing data with a central server and a mobile sales force
■ Synchronizing data with handheld devices (such as PDAs and
smartphones)
Replication processes can be made to be highly scalable, and typically can syn-
chronize data between servers/databases with acceptable latency. Latency reflects
the lag of time between when data is sent (replicated) from the source server and
received at the destination server.
Replication is not the o nly way to move data between servers. There a re several
alternatives, each with its own pros and cons:
■ bcp utility
■ SSIS
813
www.getcoolebook.com
Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 814
Part V Data Connectivity
■ Distributed transactions
■ Triggers
■ Copy Database Wizard
■ Backup and restore
■ Log shipping and database mirroring
Bulk copy program (bcp) is a command-line tool that can be used to send tabular data to the file sys-
tem, and from there to a remote server. While it can be scripted, it is slower than replication processes,
requires significant work to set up, and the DBA/developer needs to ensure that all objects are in place
on the destination server. For example, all tables, views, stored procedures, and functions need to be on
the destination server. There is no provision for change tracking. In other words, bcp can’t tell what has
changed in the data, and only sends the changes to the destination server. The solution requires change
tracking — a way to determine what has been inserted/updated/deleted on the source server. These may
involve using Change Data Capture or the Change Tracking features in SQL 2008.
SSIS can be thought of as a programmatic interface to a high-performance bcp utility. It can be faster
than bcp. As with bcp, it requires that the DBA/developer place all objects on the destination server, and
there is no provision for change tracking.
Distributed transactions normally involve using MS DTC (Microsoft Distributed Transaction Coordi-
nator). With a distributed transaction, the transaction is committed on the source server, then on the
destination server, and then the application can do the next unit of work. (This is sometimes called
a split write.) The application has to be configured to use distributed transactions, and the network
connection must be stable and have ample bandwidth; otherwise, the transactions will fail. With
distributed transactions, only changes are ‘‘replicated.’’ The DBA/developer needs to place all tables
(along with the initial data), stored procedures, views, and functions on the destination server.
Triggers are very similar to distributed transactions. With distributed transactions, all application code
(for example, stored procedures, and sometimes ADO.NET code) must be rewritten for the distributed
transactions. With triggers, the ‘‘replication’’ logic is incorporated on the trigger. And like distrib-
uted transactions, only changes are ‘‘replicated.’’ The DBA/developer needs to place all tables (along
with the initial data), stored procedures, views, and functions on the destination server. There is also
overhead with using triggers, especially over a network.
The Copy Database Wizard will move or copy a database from one server to another. It is intended for
a single use move or copy. In the move mode, you can only move the database one time. In the copy
mode, the database can be copied multiple times if you specify the options to delete the database and
the database files that might exist on the destination server.
Backup and restore will copy the entire database to the destination server. The level of granularity possi-
ble for the preceding options are tables (bcp, SSIS) and transactions (triggers, distributed transactions).
Backup and restore, log shipping, database mirroring, and the Copy Database Wizard ‘‘replicate’’ entire
databases. As the name suggests, backup and restore involves backing up the database on the source
server and restoring it on the destination server. This option is not scalable for large databases, and the
database must go offline while the database is being restored. It is not a good option in environments
with real-time data requirements, as the data becomes progressively out of date until the latest backup is
restored on the destination server.
814
www.getcoolebook.com
Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 815
Replicating Data 36
Log shipping is continuous backup and restore. The log is backed up on the source server and applied
to a previously restored database backup on the destination server. Log shipping is not considered to
be scalable, especially for large databases or large numbers of databases. The database on the destina-
tion server is not accessible with log shipping. There are options to make it accessible, but it will be in
read-only mode, and users need to be kicked off when the next log is r eady to be applied.
Database mirroring is continuous log shipping. Changes to the database transaction log are continually
shipped from the source server to the destination server. The database on the destination server will be
inaccessible while being mirrored. There are two modes of database mirroring: high performance and
high safety. With high safety, application writes on the source server are not committed on the source
server until they are also committed on the destination server. This can cause increased latency for all
writes on the destination server, which may make database mirroring not a good fit for your particular
requirements. High-performance mode does not have this problem, as changes occurring on the source
server are applied to the destination server asynchronously. However, the high-performance option is
only available on the Enterprise Edition of SQL Server 2005 and SQL Server 2008.
What’s New with Replication?
T
here are several new features in SQL Server 2008 replication:
■ Integration with database mirroring. If you have a remote distributor, you can con-
figure your publisher on your principal to failover to your mirror without having to
reinitialize y our subscribers.
■ Much f aster snapshot delivery on Windows 2008 servers
■ A new Wizard for deploying nodes in a peer-to-peer topology. For more information,
consult />■ Conflict detection in peer-to-peer replication
■ Ability to make schema changes in peer-to-peer replication without having to stop all
users from using the topology while changes are deployed.
Replication Concepts
SQL Server replication operates according to a publishing metaphor. There can be three types of servers
in a replication topology:
■ Publisher: The source server
■ Distributor: For transactional replication and peer-to-peer replication, the distributor is where
the changes are stored until they are replicated to the destination server. For merge replication,
the distributor is merely a repository for replication process history. Changes and historical
information are stored in a database called the distribution database.
■ Subscriber: The destination server
815
www.getcoolebook.com
Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 816
Part V Data Connectivity
Types of replication
Based on the publishing metaphor, SQL Server 2008 offers five basic types of replication, each serving a
different purpose:
■ Snapshot replication: A point-in-time image of database objects (a snapshot) is copied
from the source server to the destination server. This image generation and deployment can
be scheduled at whatever interval makes sense for your requirements; however, it is best
used when the majority of your data seldom changes, and when it does, it changes at the
same time.
■ Transactional replication: Transactions occurring on the source server are asynchronously
captured and stored in a repository (called a distribution database) and then applied, again
asynchronously, on the destination server.
■ Oracle publishing: This is a variant of transactional replication. Instead of SQL Server being
the source server, an Oracle server is the source server, and changes are replicated from the
Oracle server to SQL Server. This SQL Server can be the final destination for the Oracle
server’s data, or it can act as a gateway, and changes can be replicated downstream to other
SQL Servers, or other RDBMs. Oracle publishing is only available on SQL Server Enterprise
Edition and above.
■ Peer-to-peer replication: Another variant of transactional replication that is used to replicate
data to one or more nodes. Each node can publish data to member nodes in a peer-to-peer
replication topology. Should one node go offline, changes occurring on the offline node and
the other member servers will be synchronized when that node comes back online. Changes
are replicated bi-directionally, so a change occurring on Node A will be replicated to Node B,
and changes occurring on Node B will be replicated to Node A. Peer-to-peer replication
is an Enterprise Edition–only feature that is scalable to approximately 10 nodes, but your
results may vary depending on your replicated workload, your hardware, and your available
bandwidth.
■ Merge replication: As the name indicates, merge replication is used to merge changes occur-
ring on the destination server with changes occurring on the source server, and vice versa. It
is highly scalable to hundreds if not thousands of destination servers. With merge replication,
there is a central clearinghouse for changes that determines which changes go where. With
peer-to-peer replication, any member node in the topology can assume the clearinghouse role.
Replication agents
As you might imagine, a lot of work is required to move data between the various servers in the pub-
lishing metaphor. To do so, SQL Server replication makes use of three agents:
■ Snapshot agent: Generates the tabular and schema data or schemas for the objects you
wish to replicate. The tables and schema data, and related replication metadata, is frequently
referred to as the snapshot. The snapshot agent is used by all replication types. The snapshot
agent writes the tabular/schema data to the file system.
■ Distribution agent: Used by snapshot replication to apply the snapshot on the subscribers,
and used by transactional replication to apply the snapshot on the subscriber and to replicate
subsequent changes occurring on the publisher to the subscriber.
816
www.getcoolebook.com
Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 817
Replicating Data 36
■ Merge agent: Detects changes that have occurred on the publisher and the subscriber
since the last time these agents ran and merges them together to form a consistent set on
both the publisher and the subscriber. In some cases, the same primary key value will be
assigned on the publisher and one or more subscribers between runs of the merge agent
(called a sync). When the merge agent runs it detects this conflict and l ogs it to conflict
tables that can be viewed using the conflict viewer. With merge replication, the data that
is in conflict will persist on the publisher and the subscriber by default. For example, if
a primary key value of 1,000 for a table is assigned on the publisher, and then the same
value is assigned on the same table on the subscriber, when the merge agent runs it will log
the conflict, but keep the publisher’s values for the row with a PK (primary key) of 1,000
on the publisher, and keep the subscriber’s values for the row with a PK of 1,000 on the
subscriber.
Merge replication has a rich set of features to handle conflicts, including one that skips
changes to different columns occurring on the same row between publisher and subscriber.
This is termed column-level conflict tracking. For example, a change to John Smith’s home
phone number occurring on the publisher and his cell phone number occurring on the
subscriber would be merged to have both changes persisting on both the publisher and sub-
scriber. By default, merge replication uses row-level conflict tracking that might result in the
change to John Smith’s home phone number being updated on both the publisher and the
subscriber, but his cell phone change being rolled back, with this conflict and the conflicting
values being logged to the conflict tables.
Best Practice
A
single server can serve as both the publisher and the distributor and even as the subscriber. An excellent
configuration for experimenting with replication is a server with multiple SQL Server instances. However,
when performance is an issue, a dedicated distributor server is the best plan. This remote distributor can act
as a distributor for multiple publishers; in fact, you can configure this remote distributor to have a separate
distribution database for each publisher.
The publisher server organizes multiple articles (an article is a data source: a single table, view, function,
or stored procedure) into a publication. You may find that you get better performance by grouping large
articles (tables) into their own publication. The distributor server manages the replication process. The
publisher can initiate the subscription and push data to the subscriber server, or the subscriber can set
up the subscription and pull the subscription from the publisher.
Transactional consistency
The measure of transactional consistency is the degree of synchronization between two replicated
servers. As the lag time between synchronizations increases, transactional consistency decreases. If
the data is identical on both servers most of the time, transactional consistency is said to be high.
Conversely, a replication system that passes changes every two weeks by e-mail has low transactional
consistency.
817
www.getcoolebook.com
Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 818
Part V Data Connectivity
Configuring Replication
Using wizards is the simplest way to implement replication. Developers and DBAs generally avoid wiz-
ards because they have limited features, b ut implementing replication without wizards requires numer-
ous calls to arcane stored procedures and is a tedious and painful process prone to user errors.
Before configuring r eplication, it is important to understand the limitations of various SQL Server
editions. For example, SQL Server Express can only act as a subscriber, and the number of subscribers
each edition can have is limited. For example, the Standard Edition can only have five subscribers, the
Web 25 subscribers. Merge replication can only be used to replicate to subscribers with same version or
lower. For example, you can’t have a SQL Server 2005 publisher merge replicating to SQL Server 2008
subscribers; however, a SQL Server 2008 publisher can replicate to a SQL Server 2005 subscriber. Merge
replication is the only replication type that can replicate to SQL Server CE subscribers.
Creating a publisher and distributor
To enable a server as a publisher you must first configure it as a subscriber. While you can configure
the publisher with a local or remote distributor, it is recommended that you configure the distrib-
utor first, before creating your first publication. This way, if there is a problem, it will be easier to
troubleshoot.
The following steps walk you through the process of creating your first distributor:
1. Connect to the server that will be acting as your publisher/distributor or remote distributor
using SQL Server Management Studio. You need to use the SQL Server 2008 version of SQL
Server Management Studio for this.
2. Once you have connected, right-click on the replication folder and select the menu option
Configure Distribution.
If you do not see the Configure Distribution option, either your SQL Server edition is SQL
Server Express or you do not have the replication components installed. To install the
replication components, you need to run Setup again.
3. After clicking through the initial splash screen, you will have the option to select which
server you should use as your distributor: either the local server or a remote server. If you are
using a remote server, you need to ensure that the remote server is already configured as a
distributor. Because this is a local distributor, select the default option and click Next.
4. You will be prompted for a folder to serve as the default location where the snapshot agent
deposits the snapshot. Select a different location if the default folder does not have adequate
space for your snapshots, or if you want to minimize I/O contention. The snapshot generation
process is an I/O-intensive process during snapshot generation. You do have the option to
select a snapshot folder or share for each publication when you create it, so the snapshot
folder location is not of critical importance.
5. Once you have selected the location for your snapshot folder or snapshot share, click Next.
The Distribution Database dialog enables you to name your distribution database and select
folders where the database data and log files will reside.
818
www.getcoolebook.com
Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 819
Replicating Data 36
Best Practice
I
f you have a large number of subscribers, or you are replicating over a WAN, you should use a share
for your snapshot folder, or use FTP along with pull subscribers (you will configure FTP server details
when you create your publication). With pull subscribers, the merge or distribution agent or process runs on
the s ubscriber. With push subscribers, the distribution and merge agents run on the publisher/distributor or
distributor. If you are using push subscribers with a remote distributor, your snapshot folder must also be
configured as a snapshot share. It is not a good security practice to use the Admin shares (i.e., C$), but rather
a share name that hides the path of the actual physical snapshot folder location, and does not require the
distribution or merge agents to run under an account that has rights to access the snapshot share.
Best Practice
O
ptimal configuration of a distributor or a distribution server is on a 64-bit server with ample RAM and
RAID 10 drives. The distributor server will be I/O and network bound, so the more RAM available for
caching and the greater the available network bandwidth, the greater the throughput of your transactional
replication solution. Merge replication is CPU and network bound, so these best practices do not apply for it.
6. Click Next to enable the publishers that you wish to use this distributor. If this is a local
publisher/distributor, your publisher will already be selected. If not, you need to click the Add
button if you want to enable other publishers to use this distributor.
7. Click Next to assign a distributor password. This allows remote publishers to use this
distributor as their distributor.
8. Click Next, Next again, and Finish to com plete the creation of your distributor.
To configure a publisher to use a remote distributor, follow these steps:
1. Connect to the publisher using SQL Server Management Studio, right-click on the Replication
folder, and select Configure Distribution.
2. When you get to the option to select which server you wish to use as your distributor, select
the ‘‘Use the f ollowing server as the Distributor’’ option.
3. Click the Add button and enter the connection information to connect to the remote distribu-
tor. You will be prompted for the password you configured to access the remote distributor.
4. Click Next, Next again, and Finish.
Your remote distributor is now ready to use.
Creating a snapshot/transactional publication
Once a distributor is set up for your server, you can create your publications. A publication is defined
as a collection of articles, where an article is an item to be published. An article in SQL Server can be a
819
www.getcoolebook.com
Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 820
Part V Data Connectivity
table, a view, an indexed view, a user-defined function, or even a stored procedure or its execution. If
you choose to replicate the execution of a stored procedure, the stored procedure call will be executed
on the subscriber.
For example, if you fire a stored procedure that updates 10,000 rows on a table, this table is replicated,
and the execution of the stored procedure is executed, only the stored procedure call will be executed.
If the replication of the stored procedure execution was not replicated, 10,000 update statements would
have to be replicated by the publisher, through the distributor to the subscriber. As you can imagine,
there are considerable benefits to doing this.
Typically tables are published, but views can also be published. You just need to ensure that the base
tables referenced by the views are also published.
To create a publication, execute the following steps:
1. Connect to your publisher using SQL Server Management Studio, and expand the Replication
folder, then right-click on the local publication folder and select New Publication.
2. After clicking through the initial splash screen, select the database you wish to replicate from
the Publication Databases section.
3. Click Next. In the Replication Types dialog that appears, select the replication type you wish
to use. You will then get a dialog entitled Articles, from which you can select the type of
objects you wish to replicate.
4. Expand each object type tree and select the articles you wish to replicate. For e xample, if
you wish to replicate tables, expand the table tree and select the individual tables you wish
to replicate. You can elect to replicate all tables by selecting the check box next to the table
tree. You a lso have the option to replicate only a subset of the columns in tables you are
replicating.
If you see a table with what appears to be a red circle with a slash through it next to
the table, this table does not have a primary key and you will be unable to replicate
it in a transactional publication. Snapshot and merge replication allow you replicate tables without
primary keys.
If you highlight a table and then click the article properties drop-down box, you can configure
options regarding how the table will be replicated to the subscriber. For example, you can
replicate user triggers, include foreign key dependencies, and determine what will happen if a
table with the same name already exists on the subscriber. The options are as follows:
■ Drop the subscriber table
■ Do nothing
■ Keep the table, but delete all of its data
■ Keep the table, but delete all of the data that meets your filtering c riteria (covered in the
next step)
5. Once you have selected the objects you wish to replicate, click Next. The Filter Table Rows
dialog will appear. From here, you can configure filtering criteria that sends only a subset of
the rows to the subscriber. For example, if you were replicating a table with a state column,
820
www.getcoolebook.com
Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 821
Replicating Data 36
you may decide that the subscriber should have only rows from California. To enable that, you
would click the Add button, select the table in the drop-down box in the ‘‘Select table to filter
option,’’ click the State column in the ‘‘Complete the filter statement’’ section, and then add the
state value. In code, it might look like this:
SELECT <published_columns> FROM [dbo].[SalesStaff] WHERE [State]=’CA’
This would ensure that the subscriber only receives data and changes from sales staff when the
value of
State is CA.
6. Once you have e nabled your filters, click Next. The next dialog is Snapshot Agent, which
controls two snapshot options:
■ Create a snapshot immediately and keep the snapshot available to initialize subscriptions.
■ Schedule the Snapshot Agent to run at the following times.
The first option generates the snapshot immediately; and every replicated change that occurs
in the publication is not only replicated to the subscriber, but also added to the snapshot files.
This is a great option when you have to deploy a lot of snapshots frequently, but it does add
a constant load to your publisher. The second option to schedule the snapshot agent generates
a snapshot on a schedule, so the snapshot files are updated each time you run the snapshot
agent. Changes not in the snapshot have to be stored i n the distribution agent, which may
mean extra storage requirements on the distributor. For most DBAs/developers, it is not a good
practice to enable these options.
7. Click Next to configure Agent Security. This option allows you to select the security context
you wish your replication agents to run under. By default, SQL Server runs the replication
agents under the same account under which the SQL Server agent account runs.
This is not considered to be a good security practice, as buffer overflow, worm attacks, or
Trojan attacks might be able to hijack the replication agent and run commands with the
same security context as the SQL Server Agent on the publisher, distributor, or subscriber. This dialog
enables you to control which account the replication agent is going to run under; ideally, this will be an
account with as few rights as possible on the publisher, distributor, or subscriber. Figure 36-1 illustrates
this dialog.
8. Click the Security Settings button to display the Snapshot Agent Security dialog shown in
Figure 36-2. From here, you can enter the Windows or SQL Server Agent account under
which you wish the snapshot agent to run. If you choose a Windows account, it needs to be
added using the following syntax: DomainName\AccountName.
9. Once you have selected the agents you wish to use, click OK, and then Next to exit the Agent
Security dialog.
10. The next dialog is the Wizard Actions dialog. This enables you to create the publication imme-
diately, create a script to create the publication, or both. Once you have made your selection,
click Next. In the Complete the Wizard dialog that appears, you can name your publication.
11. Once you have given your publication a name, click Finish to create it.
After you have created your publication, you can now create one or more subscriptions to it.
821
www.getcoolebook.com