Hands-On Microsoft SQL Server 2008 Integration Services, Part 43


specify in the Cache Size fields. Two Cache Size fields have been provided in the
Partial caching section—one is for a 32-bit system and the other one applies to a
64-bit system. In partial cache mode, reference data rows are loaded into the cache up to the cache size value, and as the lookup operation runs, the input rows are matched against the reference data in the cache. If a match is found, the row is directed to the match output; if a match is not found in the cache, a query is sent to the OLE DB source to check whether a match exists there. Again, if a match is found, the row is sent to the match output; however, if a match is not found in the reference database, the non-matching row is directed to the error output or the no-match output, depending on how you've chosen to handle the no-match rows. This trip to the external reference database to find a match or no match can be quite expensive and can hurt the performance of a lookup operation. If your reference data is not changing while the package is running and you have sufficient free memory available on the server, you can choose to store the non-matching rows in the cache using the option "Enable cache for rows with no matching entries." You can also limit the amount of cache that non-matching rows can use by specifying a percentage value in the Allocation From Cache field. The next time a non-matching row with the same key appears in the input data set, the Lookup transformation already knows about it, as it is in the cache, and doesn't make a trip out to the database. This process goes on until the cache is completely full, at which time the Lookup transformation removes the least frequently used rows from the cache.
You can also improve query performance by selecting the Modify The SQL Statement check box. Using this option, you can use an SQL statement to select a subset of the reference data set. For example, if the table or view you have selected as the reference data set on the Connection page contains address details for the whole country while your input contains records for one state only, you don't need to load the entire reference data set into memory; you only need to load that state's address details. To limit the reference data set dynamically, you can also use values or parameters in the WHERE clause of the SQL statement and use the Parameters button to map the parameters to input columns (an illustrative query follows the cache option descriptions below). When you use parameters in the WHERE clause mapped to input columns, the parameters are evaluated from the input data set at run time, which limits the data set that needs to be referenced and hence improves overall performance.
No cache   This option does not load the reference data set into the cache; rather, it makes a connection to the external data source using an OLE DB Connection Manager to perform the lookup operation. This option works like partial cache mode, with the difference that all the input rows are matched against data in the database table, other than the last retrieved row, which always resides in the cache.
When you choose this option, the Partial Caching section in the Advanced option
is disabled as there is no cache to configure. However, the Custom query can be
used to modify the SQL statement that is applied in the Connection page.
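For illustration, such a parameterized reference query could look something like the sketch below. The table and column names are hypothetical, and the ? placeholder is the parameter marker the OLE DB provider expects; it is mapped to an input column through the Parameters button.

    -- Hypothetical reference query: load address details for one state only.
    -- The ? marker is mapped to an input column (for example, StateCode)
    -- via the Parameters button.
    SELECT PostalCode, City, StateCode
    FROM dbo.AddressDetails
    WHERE StateCode = ?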
You can specify the connection settings for the reference data set in the Connection
page. The graphical interface in this page changes to match the options you have
selected in the General page. When you select full cache mode, you can choose a
Connection type of either a Cache Connection Manager or an OLE DB Connection
Manager. If you choose the Cache Connection Manager option, the Connection page
has only one field in which you can specify the Cache Connection Manager. However,
if you choose an OLE DB Connection Manager in the case of full cache mode or are
using a partial cache mode or no cache mode, the interface of the Connection page
changes to allow you to specify settings for an OLE DB Connection Manager. In this
case, the reference data set can be a table or view in a database or the result of an SQL statement. So, on the Connection page, you specify an OLE DB Connection Manager and then select either a table or view or an SQL query. On this page, you can either select an existing table or create a new table by clicking the New button provided for the "Use a table or a view" option and populate that table at run time before this Lookup transformation executes. If you select "Use results of an SQL query," you can type your SQL statement directly in the option field, or you can build the query with the Query Builder, which you can invoke by clicking Build Query. You can also read the query from a file by clicking Browse.
You specify the join columns in the Columns tab and can also select reference columns
that you want to include in the data flow. To make a join, drag a column from Available
Input Columns and drop it on the column you want to join with in the Available Lookup
Columns list. You won’t be able to map or join two columns that don’t have matching
data types. To select a column to include in the output, check the box provided for it in the Available Lookup Columns list. As you select a column, it will be listed in the lower half of the editor window, where you can choose the lookup operation and assign an output alias to the selected column as well. You can add the selected column as a new column or replace the values of the input column with the same name.
Finally, note a few cautions when you add this transformation to the data flow. The Lookup transformation's lookup operation is case sensitive; that is, the string comparison in the lookup is case sensitive, while this may not be the case with your database if you're using a case-insensitive collation. So you need to be careful with the case of the input data set vis-à-vis the case of the reference data set, and you also need to understand where the comparison is made. If you're using full cache mode, the comparison is made within the Lookup transformation and is case sensitive; however, if you are using no-cache mode, the lookup operation happens in the database engine and, depending on the database collation, could be case insensitive. Similarly, you need to be clear how you are configuring your cache in partial cache mode and where the comparison will happen. One solution may be to convert both string data sets to either lowercase or uppercase for this transformation. Another reason for lookup failures could be null values in the data. If the lookup operation matches columns that contain null values, it will fail when you are using no caching or partial caching and the comparison happens in the database engine instead of in the Lookup transformation. A possible solution could be either to use full cache mode or to remove null values by modifying the SQL statement. Last, you can join (map) columns in the Lookup transformation with any data type other than DT_R4, DT_R8, DT_TEXT, DT_NTEXT, or DT_IMAGE, because these are not supported data types for joins.
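As an illustration of removing null values by modifying the SQL statement, a reference query along these lines (table and column names are hypothetical) keeps null join keys out of the reference data set:

    -- Hypothetical reference query: filter out null join keys from the
    -- reference data, one of the workarounds described above.
    SELECT PostalCode, City
    FROM dbo.AddressDetails
    WHERE PostalCode IS NOT NULL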
Hands-On: Updating PersonContact Data
You receive contact details for persons interested in your products from different
sources. As a policy, you standardize data as you insert it into a PersonContact table by
performing lookups for address details on the basis of postal code, which is a mandatory
requirement for the data to be accepted. This data mart will be used for marketing purposes, and hence you want to make sure that all the contacts have a Salutation and there are no duplicate contacts in the table.
Method
In this exercise, you receive data in two different file formats, the Microsoft Excel file
format and the flat file format, and you will load this data into a PersonContact table in
the Campaign database after applying the required transformations. In the data flow of
this package, you’ll combine the data from these two files, format it to the required data
types, generate a salutation in the data flow, perform a lookup on the postal code to get
the correct city, and delete duplicate records before loading it into the table. Here is the
step-by-step process:
- Add data flow sources to get the data into the data flow.
- Convert the data so that the data types output by both data flow sources match, and then combine the data from the two data streams.
- Derive a salutation.
- Enhance address details.
- Delete duplicates and insert data into the PersonContact table.
- Execute the package to see the results.
Exercise (Add Data Flow Sources)
You will create a new Integration Services project that will subsequently be used for all the packages developed in this chapter. You will add a data flow to the package and then add data flow sources to the data flow.
1. Create a new Integration Services project with the following details:
Name: Data Flow transformations
Location: C:\SSIS\Projects
2. When the blank project is created, rename the package as Updating
PersonContact.dtsx.
3. Drop the Data Flow task from the Toolbox onto the Control Flow Designer
surface and then double-click it to go to the Data Flow tab.
4. In the Data Flow tab, drag and drop a Flat File source and an Excel source from
the Toolbox onto the Designer surface.
5. Double-click the Flat File source to open the Flat File Source Editor. Click
the New button opposite the Flat File Connection Manager field to create a
connection manager.
6. In the Flat File Connection Manager Editor, type PersonDetails01 in the
Connection Manager Name field. Specify C:\SSIS\RawFiles\PersonDetails01.txt
in the File Name field. Select the check box for the “Column names in the first
data row” option. The rest of the options will be automatically filled in for you with defaults, as shown in Figure 10-1.
7. Go to the Columns page; the columns will be listed for you, as the Flat File Connection Manager automatically selects {CR}{LF} as the row delimiter and comma {,} as the column delimiter.
8. Go to the Advanced page, where you can specify the column data type details. In this exercise, our final destination is the PersonContact table, so you can check the schema of this table (a sketch of a likely schema follows the settings table below) and update the column data types accordingly. The following table shows these details, which you need to apply to the column properties on this page. For example, to specify settings for the Postcode column, click the column and then, in the right-side pane, go to the OutputColumnWidth field and change it to 12. Make sure DataType is set to string [DT_STR]. Similarly, apply the following settings to all the columns. After having applied all the settings, click OK to close the page and return to the Flat File Source Editor.
Name OutputColumnWidth DataType
FirstName 50 string [DT_STR]
LastName 50 string [DT_STR]
Gender 1 string [DT_STR]
Married 1 string [DT_STR]
AddressLine1 255 string [DT_STR]
AddressLine2 255 string [DT_STR]
Postcode 12 string [DT_STR]
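For reference, the destination table could be sketched roughly as follows. This is only an assumption pieced together from the column settings above plus the Salutation and City columns produced later in the exercise; the Campaign sample database may define PersonContact differently.

    -- Assumed sketch of the destination table, not the book's exact schema.
    CREATE TABLE dbo.PersonContact (
        FirstName    VARCHAR(50),
        LastName     VARCHAR(50),
        Salutation   VARCHAR(4),    -- derived later in the data flow
        Gender       CHAR(1),
        Married      CHAR(1),
        AddressLine1 VARCHAR(255),
        AddressLine2 VARCHAR(255),
        Postcode     VARCHAR(12),
        City         VARCHAR(50)    -- width assumed; supplied by the Lookup
    );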
Figure 10-1 Flat File Connection Manager settings
9. Go to the Columns page and check that all the Available External Columns have been selected. Click OK to close the editor. Right-click the Flat File source and rename it PersonDetails01.
10. Right-click the Excel source and rename it PersonDetails02. Double-click
PersonDetails02 to open the Excel Source Editor. Click the New button provided
opposite to the OLE DB Connection Manager field.
11. In the Excel Connection Manager dialog box, specify C:\SSIS\RawFiles\
PersonDetails02.xls in the Excel File Path field. Leave Microsoft Excel 97-2003
specified in the Excel Version field, and leave the check box selected for “First
row has column names.” Click OK to create this connection manager.
12. In the Excel Source Editor, leave “Table or view” selected in the Data Access Mode field. Select PersonDetails02$ in the “Name of the Excel sheet” field from
the drop-down list and go to the Columns page to see that all the Available
External Columns have been selected. Click OK to close this editor.
13. Rename the Excel Connection Manager to PersonDetails02.
Exercise (Combine Two Data Streams)
In this part of the exercise, you will add a Data Conversion transformation to convert the data types of the columns coming from the Excel source to match the data types of the columns coming from the Flat File source, and then use the Union All transformation to combine these two data streams.
14. Right-click the Excel source and choose Show Advanced Editor from the context menu. Go to the Input and Output Properties tab, expand the Excel Source Output, and check the properties of the columns listed under External Columns and Output Columns. Note that these columns are of the Unicode string [DT_WSTR] data type with a column width (Length) of 255. These columns need to be converted to matching data types so that you can combine them with the columns from the Flat File source. Close the Advanced Editor.
15. Drop a Data Conversion transformation from the Toolbox onto the Data Flow
surface just below the Excel source. Join the Excel source with this transformation
using the green arrow.
16. Double-click the Data Conversion transformation to open its editor. As you
want to convert the data type of all the columns, select the check boxes provided
in front of all the columns in the Available Input Columns. As you select the
columns, a row for each column will be added in the lower grid section, where
you can configure the changes.
17. For the FirstName input column, note its Output Alias, Copy of FirstName, which means the converted column will be added to the transformation output as Copy of FirstName. Click in the Data Type field and change it to string [DT_STR] from the drop-down list, and change the Length to 50. A Code Page will be selected by default based on the code page of the computer; however, you can change it if you need to by selecting a different code page from the drop-down list. Similarly, change the settings for all the columns as shown in Figure 10-2.
18. Click OK to close the transformation. Rename the Data Conversion transformation
Converting PersonDetails02.
19. Drop the Union All transformation from the Toolbox onto the Data Flow surface between PersonDetails01 and Converting PersonDetails02. Join both of these components to the Union All transformation using the green arrows.

Figure 10-2 Converting Excel source data using Data Conversion transformation
20. Double-click the Union All transformation to open the editor. Here, you will
see Output Column Name columns automatically mapped to Union All Input
1 columns, whereas you will need to map Union All Input 2 columns yourself.
Click in the FirstName column under Union All Input 2 and select the converted Copy of FirstName column. If you select FirstName here instead, you will get an error for incompatible data types. Map all the output columns to the converted columns as you did for the FirstName column (Figure 10-3).
21. Click OK to close the editor. Rename the Union All transformation Merging
PersonDetails01 and PersonDetails02.
Figure 10-3 Combining two data streams using the Union All transformation
Exercise (Derive Salutation)
You will use a Derived Column transformation in this part to derive a salutation. You
need to derive a salutation as Mr., Mrs., or Miss.
22. Drop a Derived Column transformation from the Toolbox below the Merging
PersonDetails01 and PersonDetails02 and join both of them by dragging the
green arrow onto the Derived Column transformation.
23. Double-click the Derived Column transformation to open the editor. This
transformation has an Expression Builder interface. Click in the Derived Column
Name field and type Salutation. By default, <add as new column> will be displayed
in the Derived Column field, which you can change to replace an existing column
from the drop-down list.
24. Type the following expression in the Expression field (refer to Figure 10-4):
Gender == "m" ? "Mr." : (Gender == "f" && Married == "y" ? "Mrs." : "Miss")
Figure 10-4 Deriving a salutation using Derived Column transformation
In this expression, you are using the conditional operator, which returns one of two expressions based on the evaluation of a Boolean expression; the nested conditional distinguishes a married female from an unmarried one.
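If it helps to read the rule in a more familiar notation, the same logic written as a T-SQL CASE expression would look like this; it is shown for comparison only, since the transformation itself uses the SSIS expression above.

    -- The salutation rule expressed in T-SQL, for comparison only.
    SELECT CASE
               WHEN Gender = 'm' THEN 'Mr.'
               WHEN Gender = 'f' AND Married = 'y' THEN 'Mrs.'
               ELSE 'Miss'
           END AS Salutation
    FROM dbo.PersonContact;  -- any table with Gender and Married columns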

25. Select string [DT_STR] from the drop-down list in the Data Type field and specify 4 in the Length column if it is not automatically specified. If you can't change the data type here, you can change it on the Input and Output Properties tab of the Advanced Editor. Click OK to close the editor. Rename this transformation Deriving Salutation.
Exercise (Enrich Address Details)
One of the requirements you will address in this exercise is to standardize the address details by including City on the basis of the Postcode available in the data. You have a table called PostalCity in the Campaign database that contains a list of cities against postal codes, and you will perform a lookup operation to get the correct city for a postal code. As the lookup operation is case sensitive in SSIS, you will convert the Postcode column in both places, that is, in the pipeline as well as in the reference data, to uppercase. Let's first use a Character Map transformation to convert the pipeline data.
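For the reference-data side of that conversion, one option is to point the Lookup transformation at an SQL query rather than the bare table, uppercasing the postal code on the way in. The column names below are assumed from the description of PostalCity rather than taken from the sample database.

    -- Assumed reference query for the Lookup: uppercase the join key so it
    -- matches the uppercased Postcode column in the pipeline.
    SELECT UPPER(PostalCode) AS Postcode, City
    FROM dbo.PostalCity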
26. Drop a Character Map transformation from the Toolbox on the Data Flow
Designer. Join the Character Map transformation with Deriving Salutation by
dragging and dropping the green arrow from the latter.
27. Double-click the Character Map transformation to open the editor. Click in the
check box next to the Postcode column to select it. A row will be added in the
lower grid area with Input Column selected as Postcode.
28. Check that the Destination column in the lower grid has only two possible values: New Column and In-place Change. Select the In-place Change value in the Destination column. Note that the Output Alias also changes to Postcode (see Figure 10-5).
29. Invoke the drop-down list of available operations in the Operation field. Select
Uppercase by clicking the check box in front of it. Click OK to complete your
selection.
30. Click OK to close the editor. Rename the Character Map transformation Uppercasing Postcode.
31. Now drop a Lookup transformation on the Data Flow surface after the Uppercasing
Postcode transformation. Join both of these transformations using a green data
flow path.

32. Double-click the Lookup transformation to open the editor. Select the Partial
Cache mode in the General page.
33. In our data stream, some of the postal codes will not find matching cities, so those rows would fail the Lookup transformation. You will configure this transformation to divert these non-matching records to a flat file so that you can review them later on. Select the “Redirect rows to no match output” option under the “Specify how to handle rows with no matching entries” field.
