Tải bản đầy đủ (.pdf) (10 trang)

Hands-On Microsoft SQL Server 2008 Integration Services part 48 pps

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (478.22 KB, 10 trang )

448 Hands-On Microsoft SQL Server 2008 Integration Services
Dimension Columns Change Type
Country Historical attribute
Email Fixed attribute
Home Phone Changing attribute
Work Phone Changing attribute
Mobile Phone Changing attribute
When you’ve configured all the columns, the wizard screen should look as shown
in Figure 10-20. Click Next to move on.
16. In the Fixed and Changing Attribute Options screen, uncheck the “Fail the
transformation if changes are detected in a fixed attribute” option, because
we will make changes to the Email field to test how it goes. In a production
environment, you need to consider both of the options in this page in light of
your requirements. Click Next to open the Historical Attribute Options screen.
Figure 10-20 Setting change types on the dimension columns
Chapter 10: Data Flow Transformations 449
17. You have two options to record a Historical attribute. If you select a single column
option, you can then select a column and the value pair to indicate the Current/
Expired or True/False values. Alternatively, you can select start and end dates to
identify current and expired records. Select the radio button next to the “Use start
and end dates to identify current and expired records” option. Select StartDate in the
Start Date column and EndDate in the End Date column fields. Then select System::
StartTime in the “Variable to set date values” field, as shown in Figure 10-21.
This will update the values of StartDate and EndDate fields using the start time
of the package provided by the specified system variable. Click Next.
18. By default, the inferred member support is enabled in the SCD transformation,
and you can select from either of the two options to identify the inferred member
record. In the first option, you can specify null as the column values for all columns
with an inferred member change type, and in the second option, you specify an
indicator column. Select the radio button for “Use a Boolean column to indicate
whether the current record is an inferred member.” Select IsInferred Column from


the drop-down list in the Inferred Member Indicator field as shown in Figure 10-22.
Click Next to review the outputs summary of the SCD Wizard.
Figure 10-21 Setting Historical attribute options in the SCD Wizard
450 Hands-On Microsoft SQL Server 2008 Integration Services
19. Click Finish after reviewing the summary information. The SCD Wizard will
take a while to build the data flow for you, but when it’s done, you will have a lot
to review. After moving some of the components around, your data flow should
look like the one shown in Figure 10-23.
The SCD Wizard has added a data flow to the four outputs out of six outputs
available. Let’s do a quick review.
Inferred Member Updates Output
c Double-click the OLE DB Command
transformation attached to this output and go to the Component Properties tab.
Check the SQL Statement in the SqlCommand field to find that this statement
will update the DimCustomer for the Inferred member record (IsInferred = 1)
and also reset its status while updating (IsInferred = 0). Click Cancel.
Changing Attribute Updates Output
c Double-click the OLE DB Command
attached to this output and go to the Component Properties tab. Check the SQL
Statement in the SqlCommand field to find that this statement will update the
Changing attributes—i.e., the phone numbers for the member records. Click
Cancel to exit.
Figure 10-22 Inferred member support in the SCD Wizard
Chapter 10: Data Flow Transformations 451
Historical Attribute Inserts Output c Double-click the Derived Column
component attached to this output and note that this transformation creates a
new column EndDate and populates the package start time in this new column.
Click Cancel. Double-click the OLE DB Command attached to the output of the
Derived Column transformation and go to the Component Properties tab. Check
the SQL Statement in the SqlCommand field to find that this statement will

update the member record of the DimCustomer table with the EndDate derived
in the previous component hence expiring the active record. Click Cancel. After
the OLE DB Command, the output is then combined with the data flow of New
Output using the Union All component. Double-click the Union All component
to open the editor and note that the EndDate field created earlier has not been
passed on to the New Output data flow. is will actually leave the EndDate column
value null, indicating that the added record is the active member. So, in the Historical
Figure 10-23 Data flow created by the SCD Wizard
452 Hands-On Microsoft SQL Server 2008 Integration Services
Attribute Inserts Output data flow, the current member is first expired and then
a new row is inserted for the same member with the new attributes as an active
member.
New Output
c is output contains new records that are combined with the “to be
inserted” records from Historical Attribute Inserts output as stated in the preceding
item. After that, the output is connected to Derived Column component. Double-
click the Derived Column component and note that it adds a new StartDate column
and populates it with a package start time using a system variable. Finally, it connects
to an OLE DB Destination to insert these new records into the DimCustomer table.
Two other outputs, Fixed Attribute Output and Unchanged Output, are available
at the Loading DimCustomer SCD transformation for which SCD Wizard has not
created any data flow. However, if you want to capture the rows from those outputs,
you can create a downstream data flow for them.
Exercise (Execute Loading Slowly Changing Dimension Package)
In this final part, you will execute the package to see how SCD deals with different
types of updates.
20. Double-click the New Output Data Flow Path to open the Data Flow Path Editor.
Go to Data Viewers page and then click Add. Click OK to add a grid-type data
viewer in the Configure Data Viewer dialog box. Click OK again to close the
editor. Similarly add data viewers to the other three outputs.

21. You are now ready to run this package. First, though, let’s see the data that is
going to be played and replayed to see the responses of SCD transformation to
different changes. Open SQL Server Management Studio and connect to the
database engine. Open a new query pane and run the following query to see
records in the Customer table:
Select * from [Campaign].[dbo].[Customer]
Note that the Customer table contains 14 records and the CustomerID of the
last four records is NULL. As per our package design, the Conditional Split
transformation should filter out these records. Switch to BIDS, right-click the
Loading Slowly Changing Dimension.dtsx package in the Solution Explorer, and
choose Execute Package from the context menu.
As the package executes, the data viewer that gets data first will pop up, which is
the one on New Output in our case. If you move around on the screen, as shown
Chapter 10: Data Flow Transformations 453
in Figure 10-24, you will Note that the Customers Filter component diverts
four rows to the Excel destination and sends the other ten rows to Loading
DimCustomer SCD transformation. Further, the SCD transformation identifies
that all these records are in fact new records and diverts them to New Output.
No record flows through any other output. Click the Detach button on the New
Output Data Viewer to let the data flow to the destination and be inserted into
DimCustomer table. Click the Detach button on other data viewers and close
them. Press -
5 to return to design mode.
Figure 10-24 Executing Loading Slowly Changing Dimension Package
454 Hands-On Microsoft SQL Server 2008 Integration Services
22. Let’s make changes to Customer’s data and see how this transformation works
with them. Run the CustomerChanges.sql from the C:\SSIS\RawFiles folder in
the query pane of SQL Server Management Studio. This script makes changes to
the first three records in Customer table as follows:
CustomerID Column Changed New Value

101 [Home Phone] 020885711000
102 [Email]
103 [Address Line 1]
[Address Line 2]
15 Abercrombie Avenue Wooburn Green
Then it adds an Inferred member record first in DimCustomer table with
minimal information as shown here:
CustomerID Columns Value
111 [IsInferred]
[StartDate]
1
getdate()
Then inserts additional information for inferred member in the Customer table,
as shown next, to simulate a real-life scenario:
CustomerID Columns Value
111 [CustomerID]
[FirstName]
[LastName]
[Address Line 1]
[Address Line 2]
[Address Line 3]
[City]
[State]
[Postcode]
[Country]
[Email]
[Home Phone]
[Work Phone]
[Mobile Phone]
111

Mark
Morris
Flat 22, Crescent Flats The Ridgeway
Sketty
Chertsey
West Midlands
PE7 3RQ
United Kingdom


079576516756
23. After running the CustomerChanges.sql script successfully, switch to BIDS and
execute the package again. As the package executes, this time three data viewers
will pop up. First, you will see a record in the data viewer attached to Changing
Attribute Updates output. Note that this is the record for which you changed the
Chapter 10: Data Flow Transformations 455
home phone number, which is allowed, and no history is to be kept. Click Detach
to let it process. Then look at the data viewer attached to the Historical Attribute
Inserts Output will display a record with CustomerID = 103. Note that this is the
record that changed the address for which a history is to be kept. This record will
first update the existing record with EndDate and then will be combined with the
New Output using Union All transformations so as to be inserted as a new record.
Click Detach to let it process. Next look at the data viewer attached to Inferred
Member Updates Output that shows a record with CustomerID = 111. This
record will be updated in the DimCustomer table. Click Detach to let it process.
The New Output path will also be processed and the package will complete.
24. Press -5 to switch back to design mode. Press -- to save all the
files and close the project.
25. Switch to SQL Server Management Studio and check the DimCustomer table to
see that it has been loaded with the expected values.

Review
This exercise has explained loading a dimension, which is a daily chore of a DBA’s life.
Loading a data warehouse dimension has been made easy with the Slowly Changing
Dimension transformation, and configuring this transformation has been made easier
still by the SCD Wizard. The SCD Wizard does so much work for you that you will
probably use it every time you need to configure a SCD transformation. With the
ease of use and coverage for most of the loading scenarios for a dimension, you may
think perhaps SCD has nailed the problem; however, SCD has a shortcoming: its
performance when it comes to loading into a large dimension. The use of OLE DB
components to update each member in a dimension slows down the SCD to almost
unacceptable levels of performance. You will need to do some work here to improve
performance. One solution could be to create indexes on input and destination
columns, while the other one could be to replace OLE DB Command with a staging
OLE DB destination and an Execute SQL task to upload a data set virtually converting
a row-based operation into a set-based operation. I’ve seen other customized solutions
where developers have derived different flags, first using a script task and then loading
the data set straightaway, converting the row-based decision-making logic at run time
to a set-based operation and gaining performance. However, whichever method you
use, testing for performance is the last, must-do step.
Data Mining Query Transformation
You can use a Data Mining Query transformation whenever you want to perform
prediction queries against data mining models. This transformation has one input and
one output and no error output. As you can envision, to execute your Data Mining
456 Hands-On Microsoft SQL Server 2008 Integration Services
Extensions (DMX) query will require you to create a connection to the data mining
model, so the user interface of this transformation provides two tabs—Mining Model
and Query—for you to configure. In the Mining Model tab, you specify the Analysis
Services Connection Manager to connect to an Analysis Services project or server.
You select the mining structure from the drop-down list in the Mining Structure field.
Once you’ve selected a mining structure, the Mining Models field lists the mining

models associated with that mining structure.
In the Query tab, you can either type in your query or use the graphical query
designer by clicking Build New Query. The graphical query designer helps you build
DMX queries. If you type directly in the query mode and later on switch to design
mode, you will get a prompt alerting you that your changes may get lost. The designer
in the Query Builder allows you to select mining models and data flow input columns
from two list boxes. You can drag and drop fields from these boxes onto the designer
cells and build custom DMX queries for evaluating data flow input data against an
existing mining model You can create more than one prediction query using multiple
mining models. However, the mining models you select must belong to the same
mining structure.
Term Lookup Transformation
Using the Term Lookup transformation, you can count the number of times a text term
occurs in the input data row and create custom word lists and word frequency statistics.
This transformation reads the terms from a lookup table to look for matches in an
input column and then, by default, adds two columns named Term and Frequency
to the output containing the term and the count for the term. The transformation
supports one input and one output. The reference data can be in SQL Server 2000
and later or in an Access database, and this transformation can do a lookup only on
a column that is either the DT_WSTR or the DT_NTEXT data type.
When you open the editor, you will see the following three tabs:
Reference Table
C Specify an OLE DB Connection Manager and a reference
table name from which the transformation can read lookup terms.
Term Lookup
C Map an Input column to a Reference column to indicate that the
lookup terms in the Reference column are to be counted in the Input column. You
also select the columns that you want to pass through the data flow. Based on your
selections, an advanced property—the InputColumnType of the Input column—
is set that is available in Advanced Editor. Selecting a column only for pass-through

sets the InputColumnType property to 0, mapping a column for lookup only sets
the InputColumnType property to 1, and selecting a column for both pass-through
and lookup operations sets the InputColumnType property to 2.
Chapter 10: Data Flow Transformations 457
Advanced c Select to use a case-sensitive term lookup in which uppercase words
are treated separate from lowercase words. However, if a word is a first word in
a sentence and its first letter is capitalized, this can still match with a lowercase
equivalent—so, for example, the word travel will match with the word Travel in
the sentence “Travel in style and comfort.”
It is relatively straightforward to configure this transformation. The mechanics are
also simple. This transformation loads the terms it is to look up from the reference
table in its private memory. It works in a fully precached mode only, so it loads all the
values before processing lookups against the input column. However, to get accurate
results you need to understand how this transformation behaves for different types of
matches. The Term Lookup transformation extracts the required term from the input
column by breaking the text into sentences, breaking sentences into words, and then
normalizing the words. To extract the matching term, this transformation observes the
following rules:
If you specify the singular form of the word or phrase in the reference table, this
c
transformation will match both singular and plural forms of the word or phrase.
If you use a plural form of the noun or noun phrase in the reference table, the
c
transformation matches it only with a plural form of the noun or noun phrase in
the input data.
If you want to do a match for nouns and noun phrases that contain special
c
characters such as %, @, &, $, #, *, :, ;, ., , , !, ?, <, >, +, =, ^, ~, |, \, /, (, ), [, ], {, }, “,
and ‘, you can do so by including these special characters in the nouns and noun
phrases in the reference table.

e Term Lookup transformation returns only one result for any lookup input c
column in which multiple overlapping terms are involved.
While normalizing the input column words, the Term Lookup transformation
c
will affect the last word in the lemmatized noun phrase for normalization.
Term Extraction Transformation
Using a Term Extraction transformation, you can extract terms from the text of an
input column and can thus build a list of terms used repeatedly in the input column
for text mining and data analysis. This transformation, however, is limited in that it
can extract only nouns or noun phrases or a combination of both, in English text only.
It is aware of linguistic information about English and comes with its own English
dictionary. At run time, the Term Extraction transformation reads the specified input

×