
Chapter 53  SQL Server Audit, change tracking, and change data capture
ALTER SERVER AUDIT ServerAudit
WITH ( STATE = ON );
GO
USE [HR];
GO
CREATE DATABASE AUDIT SPECIFICATION HRAudit
FOR SERVER AUDIT ServerAudit
ADD ( SELECT, INSERT, UPDATE, DELETE ON dbo.Employees BY [public] )
WITH ( STATE = ON );
GO
When reviewing audit information (whether in a file or in the event log), there is a variety of information available to you, including the time of the action, the session_id (SPID) of the user that performed the action, the database, server, and object that was the target of the action, and whether or not the action succeeded. For a full listing of the columns written to an audit row, see the Books Online topic, "SQL Server Audit Records," located at http://msdn.microsoft.com/en-us/library/cc280545.aspx. I was disappointed to see that the host name and/or IP address of the session_id is not recorded. This can be important information in some instances, and is difficult to determine after the session has disconnected from the server. For example, if SQL Authentication is enabled, and the sa (or another sysadmin) password is commonly known, then anyone can connect that way via their own machine and be relatively untraceable.
Another important note here is that the type of action (for example, SELECT or ALTER) is recorded, but in the case of SELECT or DML queries, none of the data involved is included. For example, if you run the statement in listing 3, the event log entry will look like listing 4 (I've left out some of the columns).

Listing 3  Updating the Employees table

UPDATE dbo.Employees
SET Salary = Salary * 1.8
WHERE EmployeeID = 5;

Listing 4  Event log entry for the UPDATE command in listing 3

Log Name: Application
User: N/A
Event ID: 33205
Audit event: event_time:2008-10-05 18:14:31.3784745
action_id: UP
session_id: 56
session_server_principal_name: sa
server_instance_name: SENTINEL\SQL2008
database_name: HR
schema_name: dbo
object_name: Employees
statement: UPDATE [dbo].[Employees] set [Salary] = [Salary]*@1 WHERE
[EmployeeID]=@2
Licensed to Kerri Ross <>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
How does SQL Server 2008 solve these problems?
(Note that you may also see other events in the event log corresponding to the auditing activity itself.)

Because the literal values in the statement are replaced by parameter placeholders, and because the previous version of the data is not included, it is going to be difficult to find the entries where salaries were increased. Also, the information in the event log entry does not include the host name and/or IP address of the computer that issued the statement. So, if you are using SQL Authentication and your developers share a single login, it will be difficult with auditing alone to figure out who performed this update. You can work your way around this by adding the SUCCESSFUL_LOGIN_GROUP to the Server Audit Specification, as shown in listing 5.
USE [master];
GO
CREATE SERVER AUDIT SPECIFICATION CaptureLogins
FOR SERVER AUDIT ServerAudit
ADD ( SUCCESSFUL_LOGIN_GROUP )
WITH ( STATE = ON );

GO
Once you do this, you will have login records in the log or file that you can correlate with session_id and event_time to the relevant database audit activity. The successful login entry will have (in addition to session_id and other data observed above) host name information in the following form, under the Additional information field:

Additional information:<action_info...>...
<address>local machine / host name / IP</address>
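If the audit writes to a file target, one way to perform this correlation is with the sys.fn_get_audit_file function. The sketch below assumes the audit was created with a file target under C:\AuditLogs\ (a hypothetical path); it joins each UPDATE event to the most recent successful login event (action_id LGIS) for the same session_id.

```sql
-- Hedged sketch: assumes a file-target audit writing to C:\AuditLogs\
-- (hypothetical path). action_id 'LGIS' = login succeeded; 'UP' = UPDATE.
SELECT
    dml.event_time,
    dml.session_id,
    dml.server_principal_name,
    dml.statement,
    lgn.additional_information  -- contains the <address> element with host/IP
FROM sys.fn_get_audit_file(N'C:\AuditLogs\*', DEFAULT, DEFAULT) AS dml
OUTER APPLY
(
    SELECT TOP (1) l.additional_information
    FROM sys.fn_get_audit_file(N'C:\AuditLogs\*', DEFAULT, DEFAULT) AS l
    WHERE l.action_id = 'LGIS'
      AND l.session_id = dml.session_id
      AND l.event_time <= dml.event_time
    ORDER BY l.event_time DESC
) AS lgn
WHERE dml.action_id = 'UP';
```

The OUTER APPLY keeps the DML row even when no matching login record is still in the file.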
If you are using Windows Authentication, on the other hand, then this seems like a reasonable way to capture exactly who executed the statement (without having to correlate to login events), but not necessarily what values they passed in. Take the case where you find that Employee 5's salary has been increased from $100,000 to $250,000. Three such events appear in the Application Log, from three different users, with the exact same UPDATE statement. The first could have updated the salary to $250,000, and the other two could have left it that way (by explicitly defining $250,000 in their UPDATE statement, even though it did not ultimately change the data in the table). Or, the increment could have been performed by the second or third person, or each person could have increased the salary by $50,000. There are millions of other possible permutations, and this is a simple case. Imagine trying to unravel this mystery on a busy system with thousands of simultaneous users all affecting the same table.

Before moving on to the next section, if you have created the sample code above, you can remove it using the code in listing 6.
Listing 5  Creating a server audit specification with the SUCCESSFUL_LOGIN_GROUP

USE [HR];
GO
IF EXISTS
(
SELECT 1
FROM sys.database_audit_specifications
WHERE name = 'HRAudit'
)
BEGIN
ALTER DATABASE AUDIT SPECIFICATION HRAudit
WITH (STATE = OFF);
DROP DATABASE AUDIT SPECIFICATION HRAudit;
END
GO

USE [master];
GO
IF EXISTS
(
SELECT 1
FROM sys.server_audit_specifications
WHERE name = 'CaptureLogins'
)
BEGIN
ALTER SERVER AUDIT SPECIFICATION CaptureLogins
WITH (STATE = OFF);
DROP SERVER AUDIT SPECIFICATION CaptureLogins;
END
GO
IF EXISTS
(
SELECT 1
FROM sys.server_audits
WHERE name = 'ServerAudit'
)
BEGIN
ALTER SERVER AUDIT ServerAudit
WITH (STATE = OFF);
DROP SERVER AUDIT ServerAudit;
END
GO
Listing 6  Cleaning up the audit specification

Change tracking

Change tracking is a feature that adds the ability to determine, at a glance, which rows in a table have changed in a specified period of time. This can be useful for synchronizing data between the primary database and a middle-tier data cache, and for allowing semi-connected applications to detect conflicts when updates have been made on both sides. Change tracking is meant to allow you to identify the rows that changed, but does not keep any information about the values that were changed (for example, a previous version of the row). Change tracking occurs synchronously, so there is some overhead to the process. In general, the overhead is equivalent to the maintenance costs of adding an additional nonclustered index to the table.

The process assumes that you can always get the current version of the row directly from the table, and that you only care about knowing whether or not a row has changed. (Change tracking is described more in-depth in Books Online, starting at the topic, "Change Tracking," at http://msdn.microsoft.com/en-us/library/cc280462.aspx.)

To set up change tracking on a table, the table must have a primary key, and you must first enable the feature at the database level. (Books Online also suggests that the database must be at least at a compatibility level of 90, and that snapshot isolation is enabled.) Using the HR database and the dbo.Employees table created in the previous section, you can enable change tracking as shown in listing 7.
ALTER DATABASE HR SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE HR SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 3 DAYS, AUTO_CLEANUP = ON);
GO
USE HR;
GO

ALTER TABLE dbo.Employees
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = OFF);
GO
At the database level, the option CHANGE_RETENTION indicates how long you keep information about rows that have changed. If the applications last checked for changed data before that period started, then they will need to proceed as if the entire table is brand new (so a caching application, for example, will need to reload the entire table and start from scratch). AUTO_CLEANUP is the option that specifies whether this periodic purging should take place, and it can be disabled for troubleshooting purposes. Although this sounds like something that requires SQL Server Agent, it is handled by an internal background task. It will work on all editions of SQL Server, including Express Edition, with or without SQL Server Agent enabled.
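To confirm what was configured, you can query the change tracking catalog views; the following sketch (using the HR database from listing 7) shows the database-level retention settings and the tables currently being tracked.

```sql
-- Inspect change tracking configuration at the database level.
SELECT d.name,
       ctd.retention_period,
       ctd.retention_period_units_desc,
       ctd.is_auto_cleanup_on
FROM sys.change_tracking_databases AS ctd
JOIN sys.databases AS d
  ON d.database_id = ctd.database_id;

-- Run inside the tracked database (for example, HR) to list tracked tables.
SELECT OBJECT_SCHEMA_NAME(ctt.object_id) AS schema_name,
       OBJECT_NAME(ctt.object_id)        AS table_name,
       ctt.is_track_columns_updated_on
FROM sys.change_tracking_tables AS ctt;
```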
At the table level, the TRACK_COLUMNS_UPDATED option is used to specify whether the system should store information about which columns were changed, or store just the fact that the row was changed. The former can be useful for an application that tries to synchronize or cache data from a table that contains both an INT column and a LOB column (for example, VARCHAR(MAX)). Instead of pulling an identical copy of the LOB column for a row that changed, it can ignore that column and keep its local copy if it knows that it was not a part of any update that has happened since it was last loaded.
Listing 7 Enabling change tracking
Once change tracking is enabled, what an application can do is connect to the database and determine the current baseline version of the table. This is a BIGINT value that is returned by calling the new function CHANGE_TRACKING_CURRENT_VERSION() (this represents the most recent committed transaction). Once the application knows this value, it can load all of the data from the table, and then can check for further updates later using the CHANGETABLE() function. This function will return a set of data representing any rows that have changed in the specified table since the baseline version retrieved above. The following is all in T-SQL, but you can envision how an application would use the same logic. Open two new query windows in Management Studio, connected to the HR database, and run the code in listing 8.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
-- Check the current baseline:
SELECT Baseline = CHANGE_TRACKING_CURRENT_VERSION();
-- Load the current version of the table:
SELECT EmployeeID, FirstName, LastName, Salary FROM dbo.Employees;
-- Now, switch to the second query window, and make some updates to the table:
UPDATE dbo.Employees SET LastName = 'Kinison' WHERE EmployeeID = 2;
DELETE dbo.Employees WHERE EmployeeID = 5;
INSERT dbo.Employees
(
EmployeeID, FirstName, LastName, Salary
)
SELECT
6, 'Kirby', 'Quigley', 62500;
Listing 9 shows the code to retrieve the changes made to the Employees table. Replace <x> with the result from the baseline query in listing 8.
SELECT
NewBaseLine = CHANGE_TRACKING_CURRENT_VERSION(),
cv = SYS_CHANGE_VERSION,
ccv = SYS_CHANGE_CREATION_VERSION,
op = SYS_CHANGE_OPERATION,
EmployeeID
FROM CHANGETABLE(CHANGES dbo.Employees, <x>) AS ChT;
The results should look something like this:

NewBaseLine cv ccv op EmployeeID
3 3 NULL U 2
3 2 NULL D 5
3 1 1 I 6
Listing 8 Determining (and updating) the baseline version of a table
Listing 9 Retrieving changes to the Employees table
Now, the application can use this output to determine which of the following it needs to do:

- Reload the data from the row for EmployeeID 2, because it has been updated since the last time it was loaded.
- Load the new row for EmployeeID 6, because it has been created since the table was last loaded.
- Remove the row for EmployeeID 5 from its local cache, because it has been deleted.
- Record the new baseline value (3), as it will need to use this as the version to check against the next time it polls for changes.
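One more check worth doing before applying a delta (a sketch, not part of the original listings): if the stored baseline has aged past the CHANGE_RETENTION window, the results of CHANGETABLE() are no longer trustworthy, and the application must fall back to a full reload. The built-in CHANGE_TRACKING_MIN_VALID_VERSION() function makes this test straightforward.

```sql
-- Sketch: verify that a saved baseline (@baseline, assumed to have been
-- stored by the application after its last poll) is still valid for
-- dbo.Employees before requesting a delta.
DECLARE @baseline BIGINT = 3;  -- value saved from the previous poll

IF @baseline >= CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.Employees'))
BEGIN
    SELECT EmployeeID, SYS_CHANGE_OPERATION
    FROM CHANGETABLE(CHANGES dbo.Employees, @baseline) AS ChT;
END
ELSE
BEGIN
    PRINT 'Baseline expired; reload the full table and take a new baseline.';
END
```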
Note that change tracking does not record any information about the user who made the change. It only records the fact that a row changed. But as with SQL Server Audit, there are ways around this limitation. SQL Server 2008 supports new syntax to allow change tracking to add some contextual information to the DML statement, allowing that data to be stored along with other details of the change. This was intended to provide an application the ability to differentiate its own updates from those of other applications, but you can use it for whatever other devious purposes you can dream up. For example, as shown in listing 10, you can easily add information such as host name and user name using an INSTEAD OF UPDATE trigger, by utilizing the new WITH CHANGE_TRACKING_CONTEXT() construct, in order to store information about users performing updates to your table.
CREATE TRIGGER dbo.AppendEmployeeUpdates
ON dbo.Employees
INSTEAD OF UPDATE
AS
BEGIN
SET NOCOUNT ON;
DECLARE @i VARBINARY(128);
SET @i = CONVERT(VARBINARY(128), SUSER_SNAME() + '|' + HOST_NAME());
WITH CHANGE_TRACKING_CONTEXT (@i)
UPDATE e
SET e.FirstName = i.FirstName,
e.LastName = i.LastName,
e.Salary = i.Salary
FROM dbo.Employees e
INNER JOIN inserted i
ON e.EmployeeID = i.EmployeeID;
END
GO
Listing 10  Using WITH CHANGE_TRACKING_CONTEXT() in an INSTEAD OF trigger
In this case, because you are not tracking individual column updates, you don't have to worry about only updating those columns that have changed. If you do implement a solution where individual columns matter, you might want more complex logic such that the trigger only touches the base table columns that should now contain different values. And for an even more bulletproof trigger, you would also want to handle the case where the primary key might change (even though, in theory, this should never happen). You could do this in a stored procedure instead, if you can prevent direct updates to the table itself, and enforce all access via stored procedures. That is possible in some environments, but not all.
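Where all writes already flow through stored procedures, the same context can be attached there rather than in a trigger. A hedged sketch (the procedure name and column type are hypothetical, not from the original listings):

```sql
-- Hypothetical wrapper procedure: tags each update with user|host context,
-- so the application layer never issues direct UPDATEs against the table.
CREATE PROCEDURE dbo.UpdateEmployeeLastName
    @EmployeeID INT,
    @LastName   NVARCHAR(64)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @ctx VARBINARY(128) =
        CONVERT(VARBINARY(128), SUSER_SNAME() + '|' + HOST_NAME());

    WITH CHANGE_TRACKING_CONTEXT (@ctx)
    UPDATE dbo.Employees
    SET LastName = @LastName
    WHERE EmployeeID = @EmployeeID;
END
GO
```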
Once the trigger is in place, you can run the following UPDATE statement:
UPDATE dbo.Employees SET LastName = 'Malone' WHERE EmployeeID = 2;
And now when you call the CHANGETABLE function, as shown in listing 11, you can add a new column that will return that contextual information (assuming the existing baseline was 3 after the above statements).
SELECT
NewBaseLine = CHANGE_TRACKING_CURRENT_VERSION(),
[user|host] = CONVERT(NVARCHAR(128), SYS_CHANGE_CONTEXT),
cv = SYS_CHANGE_VERSION,
ccv = SYS_CHANGE_CREATION_VERSION,
op = SYS_CHANGE_OPERATION,
EmployeeID
FROM CHANGETABLE(CHANGES dbo.Employees, 3) AS ChT;
-- results:
NewBaseLine user|host cv ccv op EmployeeID
4 SENTINEL\Aaron|SENTINEL 11 NULL U 2
Arguably, you could also use the trigger to store the old and new values off in a table somewhere for deferred analysis. But that would require you to manually create tables to capture all of that information, and come up with your own cleanup mechanism. And, without spoiling any surprises, you would be duplicating the functionality of another feature added in SQL Server 2008.

Before proceeding, you can disable change tracking on the HR database and the dbo.Employees table using the code in listing 12.
USE [HR];
GO
ALTER TABLE dbo.Employees
DISABLE CHANGE_TRACKING;
ALTER DATABASE HR
SET CHANGE_TRACKING = OFF;
Listing 11  Calling the CHANGETABLE function
Listing 12 Disabling change tracking
Change data capture
Change data capture (CDC) is similar to change tracking in that it captures information about changes to data. But the information it captures (and how) is significantly different. Instead of capturing the primary key for each row that has changed, it records the data that has changed, for all columns or for the subset of columns you specify. It records all of the data for INSERTs and DELETEs, and in the case of UPDATEs, it records both the before and after image of the row. And it does this by periodically retrieving data from the SQL transaction log, so the activity does not interfere directly with your OLTP processes. It does require that SQL Server Agent is enabled and running.
The primary motivation for including CDC in SQL Server 2008 was to facilitate an easier process for extract, transform, and load (ETL) applications. Making all of the changed data available separately allows the application to pull only the updated data, without having to go to the base tables for the data (or to verify timestamp columns or perform expensive joins to determine deltas). You can investigate CDC in much more depth starting with the Books Online topic, "Overview of Change Data Capture," located at http://msdn.microsoft.com/en-us/library/cc627397.aspx.
To set up CDC, you must be running Enterprise or Developer Edition, and you must enable it at the database level first, and then for each table you want to capture. Note that unlike SQL Server Audit and change tracking, CDC features are enabled and disabled via system stored procedure calls. Using the same HR database and dbo.Employees table as in previous sections, listing 13 shows the commands necessary to start capturing data changes.
USE HR;
GO
EXEC sys.sp_cdc_enable_db;
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Employees',
@supports_net_changes = 1,
@role_name = NULL;
GO
Listing 13  Enabling a database and table for change data capture

The first two parameters to the enable_table stored procedure are self-explanatory, but the last two are not. The @supports_net_changes parameter dictates whether the changed data can be retrieved as a data set that includes one row per key value, summarizing all of the changes that took place in the indicated timeframe (in this way, it works similarly to change tracking, but you will also see the data in each column in addition to the primary key value). Note that to support net changes, the source table must have a primary key or a unique index defined. You will still be able to investigate each individual change, but if you look at the net, this will allow you to perform one DML statement on the target instead of multiple, in the case where your extract, transform, and load (ETL) program is replicating changes to another system. The @role_name parameter is used to specify who can access the changed data table. This can be a fixed server role, a database role, or left as NULL (in which case, sysadmin and db_owner have full access, and other users inherit their permissions from the base table).
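To review how a capture instance ended up configured (including the role name and whether net changes are supported), you can call the sys.sp_cdc_help_change_data_capture procedure; a quick sketch:

```sql
-- List CDC configuration for the capture instance(s) on dbo.Employees.
USE HR;
GO
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'dbo',
    @source_name   = N'Employees';
```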
The sys.sp_cdc_enable_table stored procedure has five other optional parameters. One is called @captured_column_list, which allows you to capture only changes to a specific subset of the columns. For example, you may not want to capture VARCHAR(MAX) or VARBINARY(MAX) contents when all that has changed is a BIT column. Another is @filegroup_name, which lets you place the captured data on a filegroup other than PRIMARY/DEFAULT. The other three are @capture_instance, which allows you to specify a name for your CDC instance (because you can have multiple captures on the same table); @index_name, allowing you to specify a unique index instead of the primary key; and @allow_partition_switch, which lets you dictate whether partition switches are allowed against the source table. The @capture_instance parameter can be particularly useful in preventing the system from trying to create conflicting names for the capture instance table. For example, if you have a table called dbo_foo.bar and another table called dbo.foo_bar, enabling both for CDC without specifying a value for @capture_instance will fail. This is because CDC tries to name both capture tables "dbo_foo_bar." Although this is a fairly contrived case, if in doubt, use the @capture_instance parameter to ensure you have unique names.
To retrieve information about data changes to a table, you use the new CDC functions cdc.fn_cdc_get_all_changes_<capture_instance> and, if you have enabled net changes, cdc.fn_cdc_get_net_changes_<capture_instance>. These functions require from and to parameters, but they are not based on time; instead you must determine the range of log sequence numbers (LSNs) that you wish to query. To obtain this information, you can use the function sys.fn_cdc_map_time_to_lsn.
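If you simply want everything currently retained in the capture instance, rather than a time-bounded window, the sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn functions can supply the range directly; a sketch:

```sql
-- Pull every change currently retained for the dbo_Employees capture instance.
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Employees'),
        @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Employees(@from_lsn, @to_lsn, 'all');
```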
Now that CDC is enabled for the dbo.Employees table (make sure once again that SQL Server Agent is running), you can make some changes to the data, and see how you (or your applications) might query for the individual or net changes. Run the DML statements in listing 14.
Listing 14  Inserting data into the Employees table

SELECT CURRENT_TIMESTAMP;
INSERT dbo.Employees
(
EmployeeID,
FirstName,
LastName,
Salary
)
SELECT
7,
'Howard',
'Jones',
80000;
UPDATE dbo.Employees SET LastName = 'Donaldson' WHERE EmployeeID = 3;
UPDATE dbo.Employees SET Salary = Salary * 2 WHERE EmployeeID = 4;
DELETE dbo.Employees WHERE EmployeeID = 6;
UPDATE dbo.Employees SET LastName = 'Stern' WHERE EmployeeID = 7;
UPDATE dbo.Employees SET LastName = 'Malone' WHERE EmployeeID = 3;
Be sure to copy the result from the very first line in the query. You will need this to determine the range of LSNs you will need to pull from the CDC table. Now you can run the query in listing 15.
DECLARE
@start DATETIME,
@end DATETIME,
@lsn_A BINARY(10),
@lsn_B BINARY(10);
SELECT
@start = '<plug in the value from above>',
@end = CURRENT_TIMESTAMP,
@lsn_A = sys.fn_cdc_map_time_to_lsn('smallest greater than', @start),
@lsn_B = sys.fn_cdc_map_time_to_lsn('largest less than', @end);
SELECT operation = CASE __$operation
WHEN 1 THEN 'D'
WHEN 2 THEN 'I'
WHEN 4 THEN 'U' ELSE NULL END,
EmployeeID, FirstName, LastName, Salary
FROM cdc.fn_cdc_get_all_changes_dbo_Employees(@lsn_A, @lsn_B, 'all');
-- result:
Operation  EmployeeID  FirstName  LastName   Salary
I          7           Howard     Jones      80000.00
U          3           Don        Donaldson  125000.00
U          4           Teemu      Selanne    227000.00
D          6           Kirby      Quigley    62500.00
U          7           Howard     Stern      80000.00
U          3           Don        Malone     125000.00
This result set does not include the before images of rows affected by UPDATE statements, because it is intended to be used to make a target data source look like the source. Hopefully you can see here that it would be easy to reconstruct all of the DML statements, in order to apply the same changes to another table that looked identical to this one before you made changes. If you change the final SELECT to use the get_net_changes function instead, as shown in listing 16, you can see that the set is compressed. Only the values necessary to make the target table look like the source (with one row per key) are included.
Listing 15 Query against (and results from) a change data capture function

SELECT operation = CASE __$operation
WHEN 1 THEN 'D'
WHEN 2 THEN 'I'
WHEN 4 THEN 'U' ELSE NULL END,
EmployeeID, FirstName, LastName, Salary
FROM cdc.fn_cdc_get_net_changes_dbo_Employees(@lsn_A, @lsn_B, 'all');
-- result:
Operation EmployeeID FirstName LastName Salary
U 4 Teemu Selanne 227000.00
D 6 Kirby Quigley 62500.00
I 7 Howard Stern 80000.00
U 3 Don Malone 125000.00
And finally, as per listing 17, you can see the before and after image of each key row throughout all updates by looking directly at the CDC table.
SELECT [image] = CASE __$operation
WHEN 3 THEN 'BEFORE'
WHEN 4 THEN 'AFTER' ELSE NULL END,
EmployeeID, FirstName, LastName, Salary
FROM cdc.dbo_Employees_CT
WHERE __$operation IN (3,4)
ORDER BY __$start_lsn, __$operation;
-- result:
Image   EmployeeID  FirstName  LastName   Salary
BEFORE  3           Don        Mattingly  125000.00
AFTER   3           Don        Donaldson  125000.00
BEFORE  4           Teemu      Selanne    113500.00
AFTER   4           Teemu      Selanne    227000.00
BEFORE  7           Howard     Jones      80000.00
AFTER   7           Howard     Stern      80000.00
BEFORE  3           Don        Donaldson  125000.00
AFTER   3           Don        Malone     125000.00
One challenge you might come across is when your schema changes. In this case you will need to disable CDC for the table and re-enable it when the change is complete. CDC will not break without this action, but if you add, remove, or rename columns, your captured data will be incomplete.

Also, because change tracking and SQL Server Audit are synchronous, and CDC polls the transaction log after the fact, it is not so straightforward to capture the username responsible for the change. If this is an important part of your solution, then you are probably better off sticking to one of the other features discussed in this chapter, or resorting to more traditional means (for example, triggers, log reading utilities).

To clean up the CDC settings, you can use the code in listing 18.
Listing 16  Using the get_net_changes function
Listing 17 Viewing the before and after image of each key row
USE HR;

GO
EXEC sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'Employees',
@capture_instance = 'dbo_Employees';
GO
EXEC sys.sp_cdc_disable_db;
GO
Comparison of features

At first glance, these three new features in SQL Server 2008 seem quite similar. As demonstrated here, their functionality may overlap in some cases, but they are clearly different and serve unique purposes. This treatment should help equip you with much of the information you will need to decide which feature you will need to use. To wrap up, table 1 should help you decide whether to use SQL Server Audit, change tracking, or CDC.
Listing 18 Cleaning up change data capture settings
Table 1  Comparing SQL Server Audit, change tracking, and change data capture

Criteria                              SQL Server Audit   Change tracking   Change data capture
Capture DML statements                Yes (1)            No                No
Capture result of DML statements      No                 Yes               Yes
Capture before and after values       No                 No                Yes
Capture intermediate values           No                 No                Yes
Capture SELECT statements             Yes                No                No
Capture user name / spid              Yes                Yes (4)           No
Capture host name                     Yes (2)            Yes (4)           No
Should use specific isolation level   No                 Yes (5)           No
Require SQL Server Agent              No                 No                Yes
Available in all SKUs                 No (3)             Yes               No (3)

1. You can see a tokenized copy of the DML statement, but the values in the statement are replaced by parameter placeholders.
2. You can capture host name in a separate login audit event, then correlate it manually with the event in question.
3. This feature is available in Enterprise, Evaluation, and Developer Editions only.
4. You can capture this information using a trigger to affect the context information included with the change tracking data.
5. Using snapshot isolation level can significantly impact tempdb usage and performance. Additionally, this may be a concern if you use distributed transactions, change schema frequently, disable constraints when bulk loading, or take databases offline (for example, detach or auto-close). You should read up on snapshot isolation level in Books Online: http://msdn.microsoft.com/en-us/library/ms177404(SQL.100).aspx
Summary

SQL Server 2008 provides a healthy offering of features that can assist you in tracking and dealing with changes to your data and schema. My goal for this chapter was to provide a useful and practical guide to help you decide how these features might help solve data management issues in your environment. Hopefully this will give you a good starting point on implementing one or more of these features where you need it most.
About the author

Aaron Bertrand is the Senior Data Architect at One to One Interactive, a global marketing agency headquartered in Boston, Massachusetts. At One to One, Aaron is responsible for database design and application architecture. Due to his commitment to the community, shown through blogging at http://www.sqlblog.com, peer-to-peer support on forums and newsgroups, and speaking at user group meetings and code camps, he has been awarded as a Microsoft MVP since 1998. Aaron recently published a technical white paper for Microsoft, detailing how to use the new Resource Governor feature in SQL Server 2008.
Chapter 54  Introduction to SSAS 2008 data mining

Dejan Sarka

With SQL Server 2008, you get a complete business intelligence (BI) suite. You can use the SQL Server Database Engine to maintain a data warehouse (DW), SQL Server Reporting Services (RS) to create managed and ad hoc reports, SQL Server Integration Services (SSIS) to build and use extract, transform, and load (ETL) applications, and SQL Server Analysis Services (SSAS) to create Unified Dimensional Model (UDM) cubes.
Probably the easiest step into business intelligence is using reports created with RS. But this simplicity has a price. End users have limited dynamic capabilities when they view a report. You can extend the capabilities of RS with report models, but using report models to build reports is an advanced skill for end users. You also have to consider that the performance is limited; for example, aggregating two years of sales data from a production database could take hours. Therefore, RS reports aren't useful for analyses of large quantities of data over time directly from production systems.

In order to enable end users to do dynamic analysis, or online analytical processing (OLAP), you can implement a data warehouse and SSAS UDM cubes. In addition to dynamic change of view, end users also get lightning-speed analyses. End users can change the view of information in real time, drilling down to see more details or up to see summary information. But they're still limited with OLAP analyses. Typically, there are too many possible combinations of drilldown paths, and users don't have time to examine all possible graphs and pivot tables using all possible attributes and hierarchies. In addition, analysts are limited to searching only for patterns they anticipate. OLAP analysis is also usually limited to basic mathematical operations, such as comparing sums over different groups, operations that end users can solve graphically through the client tool GUI.
Data mining (
DM
) addresses most of these limitations. In short, data mining is
data-driven analysis. When you create a
DM

model, you don’t anticipate results in
Licensed to Kerri Ross <>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
688
C
HAPTER
54
Introduction to SSAS 2008 data mining
advance. You examine data with advanced mathematical methods, using data mining
algorithms, and then you examine patterns and rules that your algorithms find. The
SSAS
data mining engine runs the algorithms automatically after you set up all of the
parameters you need; therefore, you can check millions of different pivoting options
in a limited time. In this chapter, you're going to learn how to perform data mining
analyses with
SSAS
2008.
Data mining basics
The first question you may ask yourself is what the term data mining means. In short, data mining enables you to deduce hidden knowledge by examining, or training, your data with data mining algorithms. Algorithms express knowledge found in patterns and rules. Data mining algorithms are based mainly on statistics, although some are based on artificial intelligence and other branches of mathematics and information technology as well.
Nevertheless, the terminology comes mainly from statistics. What you're examining is called a case, which can be interpreted as one appearance of an entity, or a row in a table. The attributes of a case are called variables. After you find patterns and rules, you can use them to perform predictions. In SSAS 2008, the DM model is stored in the SSAS database as a kind of table. It's not a table in the relational sense, as it can include nested tables in columns. The model stores information about the variables, the algorithms used, and the parameters of the algorithms. Of course, after the training, the extracted knowledge is stored in the model as well. The data used for training isn't part of the model, but you can enable drillthrough on a model and use drillthrough queries to browse the source data.
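As a rough sketch of how a model with drillthrough might look in DMX, consider the following; the model name, column names, and use of the Microsoft Decision Trees algorithm are illustrative assumptions, not examples from this chapter:

```sql
-- Hypothetical DMX sketch: define a mining model with drillthrough enabled.
-- Model and column names are assumptions for illustration only.
CREATE MINING MODEL [TM_DecisionTree]
(
    [CustomerKey]     LONG KEY,
    [Gender]          TEXT DISCRETE,
    [NumberCarsOwned] LONG DISCRETE,
    [BikeBuyer]       LONG DISCRETE PREDICT
)
USING Microsoft_Decision_Trees
WITH DRILLTHROUGH;

-- After the model is trained, a drillthrough query
-- returns the source cases used for training:
SELECT * FROM [TM_DecisionTree].CASES;
```

The CASES syntax is what makes browsing the training data possible even though that data isn't stored in the model itself.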
Most of the literature divides DM techniques into two main classes: directed algorithms and undirected algorithms. With a directed approach, you have a target variable that supervises the training in order to explain its values with selected input variables. The directed algorithms then apply the gleaned information to unknown examples to predict the value of the target variable. With the undirected approach, you're trying to discover new patterns inside the dataset as a whole, without any specific target variable. For example, you use a directed approach to find reasons why users purchased an article and an undirected approach to find out which articles are commonly purchased together.
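A directed prediction against new cases can be sketched in DMX with a PREDICTION JOIN; again, the model, data source, and column names here are hypothetical:

```sql
-- Hypothetical DMX sketch of a directed (predictive) query.
-- The model, linked data source, and columns are assumptions.
SELECT
    t.[CustomerKey],
    Predict([BikeBuyer])            AS PredictedBuyer,
    PredictProbability([BikeBuyer]) AS Probability
FROM [TM_DecisionTree]
PREDICTION JOIN
    OPENQUERY([Adventure Works DW],
              'SELECT CustomerKey, Gender, NumberCarsOwned
               FROM dbo.ProspectiveBuyer') AS t
ON  [TM_DecisionTree].[Gender]          = t.[Gender]
AND [TM_DecisionTree].[NumberCarsOwned] = t.[NumberCarsOwned];
```

The ON clause maps the new cases' input variables to the model's variables, and the Predict functions return the target variable's value and its probability for each case.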
You can answer many business questions with data mining. Some examples include the following:

- A bank might ask what the credit risk of a customer is.
- A customer relationship management (CRM) application can ask whether there are any interesting groups of customers based on similarity of values of their attributes.
- A retail store might be interested in which products appear in the same market basket.
- A business might be interested in forecasting sales.
- If you maintain a website, you might be interested in usage patterns.
- Credit card issuers would like to find fraudulent transactions.
- Advanced email spam filters use data mining.

And much, much more, depending on your imagination!
Data mining projects
The Cross-Industry Standard Process for Data Mining (CRISP-DM) defines four main distinct steps of a data mining project. The steps, also shown in figure 1, are as follows:

- Identifying the business problem
- Using DM techniques to transform the data into actionable information
- Acting on the information
- Measuring the result
In the first step, you need to contact business subject matter experts in order to identify business problems. The second step is where you use the SQL Server BI suite to prepare the data and train the models on the data. This chapter is focused on the transform step. Acting means putting the learned patterns and rules to use in production. You can use data mining models as UDM dimensions; you can use them for advanced SSIS transformations; you can use them in your applications to implement constraints and warnings; you can create RS reports based on mining models and predictions; and more. After deployment in production, you have to measure improvements to your business. You can use UDM cubes with mining model dimensions as a useful measurement tool. As you can see from figure 1, the project doesn't have to finish here: you can continue it or open a new project by identifying new business problems.
The second step, the transform step, has its own internal cycle. You need to understand your data; you need to make an overview. Then you have to prepare the data for data mining. Then you train your models. If your models don't give you the desired results, you have to return to the data overview phase and learn more about your data, or to the data preparation phase and prepare the data differently.
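A simple way to start the data overview phase is with T-SQL aggregates over the candidate training data; the view and column names below (dbo.vTargetMail, Gender, YearlyIncome) are assumptions for illustration:

```sql
-- Hedged T-SQL sketch of a data overview: basic distribution statistics
-- per group. The source view and columns are hypothetical.
SELECT
    Gender,
    COUNT(*)            AS NumberOfCases,
    AVG(YearlyIncome)   AS AvgIncome,
    STDEV(YearlyIncome) AS StDevIncome,
    MIN(YearlyIncome)   AS MinIncome,
    MAX(YearlyIncome)   AS MaxIncome
FROM dbo.vTargetMail
GROUP BY Gender;
```

Summaries like this help you spot skewed distributions, outliers, and missing values before you decide how to prepare the data for training.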
Figure 1 The CRISP-DM standard process for data mining projects (Identify, Transform, Act, Measure, in a cycle)