Tải bản đầy đủ (.pdf) (10 trang)

Hands-On Microsoft SQL Server 2008 Integration Services part 44 pot

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (476.11 KB, 10 trang )

408 Hands-On Microsoft SQL Server 2008 Integration Services
34. Go to Connection page, click the New button next to the OLE DB Connection
Manager field, and choose localhost.Campaign from the Data Connections list;
then click OK to add this connection manager.
35. Select the Use results of an SQL query option and type in the following SQL query.
SELECT [City], UPPER([Postcode]) Postcode FROM [Campaign].[dbo].[PostalCity]
36. Go to the Columns page and map the Postcode column in the Available Input
Columns list to the Postcode in the Available Lookup columns.
Figure 10-5 Configurations of Character Map transformation
Chapter 10: Data Flow Transformations 409
37. Click in the City Column check box to add this column as a new column in the
output, as shown in Figure 10-6.
38. Click OK to close the Lookup transformation. Rename this transformation as
Adding City Column.
39. From the Data Flow destinations section of the Toolbox, drag and drop the Flat
File destination below the Adding City Column transformation. Drag the green
arrow from the Adding City Column and drop it on the
Flat File destination.
This will open the Input Output Selection dialog box. Select Lookup No Match
Output in the Output field and click OK to close.
Figure 10-6 Adding a new column in the output using the Lookup transformation
410 Hands-On Microsoft SQL Server 2008 Integration Services
40. Double-click the Flat File destination to open its editor and click the New button
next to Flat File Connection Manager. Select the Delimited Option radio button
in the pop-up dialog box asking you to select a Flat File Format, and then click
OK. This will open the Flat File Connection Manager Editor.
41. Type No Match Lookups in the Connection Manager Name field and C:\SSIS\
RawFiles\NoMatchLookups.txt in the File Name field. Click to check the
Column names in the first data row option.
42. Go to the Columns page and check out that {CR}{LF} is selected as Row delimiter
and Comma {,} is selected as Column delimiter. Go to the Advanced page and


check out the properties of the columns. Note that the connection manager has
correctly picked up the data types and the length of the columns. Click OK to close
it and return to the Flat File Destination Editor. Leave the Overwrite Data in the
file option selected.
43. Go to the Mappings page to create mappings between the Available Input Columns
and Available Destination Columns and then click OK to close it. Rename Flat File
destination as No Match Lookups File.
Exercise (Delete Duplicates and Load PersonContact)
In this part of the exercise, you will first delete the records from the PersonContact
table that are to be updated and are also coming in the pipeline and will then insert all
the records coming in the data flow into the PersonContact table.
44. Drop the OLE DB command from the Toolbox below the Adding City Column
transformation. Connect both of these transformations using the green data
flow path.
45. Double-click the OLE DB command to open the Advanced Editor for OLE
DB Command. Choose localhost.Campaign in the Connection Manager field
from the drop-down list in the Connection Managers tab. Go to the Component
Properties tab.
46. Type Deleting Duplicates in the Name field. Type the following SQL statement
in the SqlCommand field and click OK.
DELETE PersonContact WHERE FirstName = ? AND LastName = ?
47. Go to Column Mappings page, where you will see Param_0 and Param_1 columns
in the Available Destination Columns. These represent parameters that you’ve
used in the preceding SQL statement. Map Param_0 to FirstName and Param_1
to LastName columns in the Available Input columns (see Figure 10-7) and click
Refresh. This should remove any validation error appearing at the bottom of the
Advanced Editor. Click OK to close.
48. Drop an OLE DB destination on the Designer surface and join it with the green
data flow path from the Deleting Duplicates transformation.
Chapter 10: Data Flow Transformations 411

49. Double-click the OLE DB destination to open the editor. You will see localhost
.Campaign selected in the OLE DB Connection Manager field, as this is the
only OLE DB Connection Manager configured in the package. Leave “Table
or view—fast load” selected in the Data access mode field. Select the [dbo].
[PersonContact] table from the drop-down list in the “Name of the table or the
view” field.
50. Go to the Mappings page to create the required mappings. Click OK to close
this. Rename this destination PersonContact. After some adjustments, your
package should look like one shown in Figure 10-8. Press -- to save
all the items in the project.
Figure 10-7 Mapping parameters in the OLE DB Command transformation
412 Hands-On Microsoft SQL Server 2008 Integration Services
Exercise (Execute the Package)
Here, you will add a data viewer on the Data Flow path to see the data flowing through
the pipeline at run time and execute the package.
51. You can add as many data viewers to the package as you like before executing
the package, but as a minimum, add a data viewer before the Deriving Salutation
component and before the Deleting Duplicates component. To add a data viewer,
double-click the green line joining the two components to open the Data Flow
Path Editor. Go to the Data Viewers page, click Add, and then click OK in the
Configure Data Viewer dialog box to return; then click OK again to add a grid
type data viewer.
52. Press 5 to execute this package. As the package is executed, you will see two
data viewer output windows appear. If you adjust them on the screen (without
Figure 10-8 Updating the PersonContact package
Chapter 10: Data Flow Transformations 413
dropping them on each other) so that you can see the background package as
well, you can notice that the package execution is halted till the place where the
data viewer is added in the data flow.
53. In the beginning of the package execution, note that 20 rows from flat file and

30 rows from Excel file are combined by the Union All transformation. In the
first data viewer, you can see that 50 records (see in the status bar at the bottom
of the data viewer window) have been combined and are flowing as a single data
collection. Detach this data viewer and see the data in the second data viewer,
where the Salutation has been derived and the City column has been populated
with city names. This data viewer shows only 46 records, as 4 records did not
find an exact match in the lookup table and so were sent out to the No Match
Lookups output—i.e., No Match Lookups File. Click Detach to complete the
package execution.
54. Press -5 to stop debugging. Close the project and exit BIDS.
Review
In this first data flow Hands-On exercise, you used several components that are the
basic building blocks in a data flow. You also saw how you can use these components
one after another to convert data to match the succeeding components requirements,
while keeping in line with the final destination’s requirements. You also captured
records from the Lookup transformation for which there were no exact matches in
the Lookup table. The concept of a Lookup table is extensively used to standardize
data and to identify updates and inserts in the loading process. There are examples in
Books Online that you can refer to understand more about configurations of lookup
transformation in full cache mode or partial or no cache mode. Later in the exercise
you have used an OLE DB Command transformation that uses parameters to map
to input column values. If you are updating a table with a large number of columns, it
will become confusing and troublesome to configure OLE DB transformation as the
parameter names are not intuitive. Fortunately, this problem can be solved by using
a stored procedure than using raw SQL in the SqlCommand property. For example,
in the preceding case you could also create a stored procedure as follows:
CREATE PROCEDURE uspDeletingDuplicates
@FirstName VARCHAR(50),
@LastName VARCHAR(50)
AS

BEGIN
SET NOCOUNT ON;
DELETE PersonContact
WHERE FirstName = @FirstName AND LastName = @LastName
END
414 Hands-On Microsoft SQL Server 2008 Integration Services
And you could execute the stored procedure using SqlCommand field as
EXEC dbo.uspDeletingDuplicates ?, ?
And in Column Mappings tab, you could have mapped columns using the variable
names as shown in Figure 10-9. As you can make out, it is much easier to understand
and maintain.
Figure 10-9 Updating the PersonContact package
Chapter 10: Data Flow Transformations 415
Rowset Transformations
The Rowset transformations work on record sets. These transformations first receive
all the rows and then do the data processing as the operations they perform need all
the rows upfront. Among these are the Aggregate transformation, Sort transformation,
Percentage and Row Sampling transformations, and Pivot and Unpivot transformations.
In this section, you will work through two Hands-On exercises built around these
transformations—one built around the Aggregate transformation and the other one
built around the Pivot transformation, in which you will also use the Sort transformation
along with other data flow components.
Sort Transformation
The Sort transformation allows you to sort input rows in ascending or descending order
by selecting one or more input columns for sort order criteria, similar to the ORDER
BY clause of T-SQL. To sort the records, as you can imagine, this transformation
requires collecting all the rows before applying sorting order. While sifting through the
records, this transformation can also look for duplicate records with the same sort key
values and can “de-dupe” them. The Sort transformation supports one input and one
output to perform its operation.

The user interface of this transformation is simple and has a list of Available Input
Columns in the upper half of the dialog box, along with check boxes before and after
the column names. The selection of the check boxes after the column names—i.e., the
Pass Through check boxes—allow the columns to be included in the sorted output; the
selection of check boxes before the column names allow you to sort the input records
on those columns. The lower half of the dialog has five fields to specify the columns
you want to work with and the criteria to apply for sort order.
When you open the Sort Transformation Editor, you will realize that all the Pass
Through check boxes have already been selected. This means by default all the columns
will be sent to the Sort Transformation output columns. To configure the sort order, you
select the input columns by clicking the check boxes before the Available Input Columns.
Alternatively, you can select a column from the drop-down list invoked by clicking in
the Input Column field. As you select an input column, notice that Output Alias, Sort
Type, and Sort Order are automatically assigned to this column, though you can modify
these values. The default value for the Sort Type is ascending, but it can be changed
to descending. The Sort Type setting applies a sorting order to the data in the selected
column and has nothing to do with the Sort Order field, which applies a sort order to
the columns. The Sort Order is a numerical value assigned to a column on the basis of
its position in the list of columns selected for sorting. This value starts with 1 for the first
column and is increased by 1 for subsequent columns selected. The Sort Order value
416 Hands-On Microsoft SQL Server 2008 Integration Services
determines which column is sorted first. The column with the smaller value is always
sorted before a column with bigger Sort Order value—i.e., the column with a Sort Order
value of 1 will be sorted before the column with a Sort Order value of 2, and so on.
The Sort Type and Sort Order are represented by a single property of the input column
in the Advanced Editor. If you open the Advanced Editor for Sort Transformation, go
to the Input and Output Properties tab, expand the Sort Input and then expand Input
columns, you will see a list of input columns that are available to the transformation and
have been selected to pass through. If you click any of the input columns and scroll to
the bottom of its properties, you will see the property NewSortKeyPosition, which holds

a positive or negative value. The positive value indicates that the sort type is ascending,
and a negative value indicates that the sort type is descending. The numerical value—
i.e., 1, 2, or 3 and so on—represents the sort order property of customer user interface.
A value of 0 indicates that the column is not included in the sort criteria; in fact, these
are the columns that have only Pass Through check boxes selected against them. The
input columns that are not included in the sort criteria, but are selected to pass through,
are copied along with the sorted columns to the output columns.
Once the Sort transformation collects all the rows, it can apply the sorting criteria to the
data rows. A sort operation can be an expensive process and can cause performance issues.
You need to decide what you want to achieve before using this transformation; for example,
do you want to provide all the server power to the sort process or want to limit usage of
resources for sort operation? In the Advanced Editor for Sort Transformation, you can
specify the maximum number of threads that can be used by the Sort transformation in the
MaximumThreads property in the Component Properties tab. The default value for this
property is –1, which indicates an infinite number of threads available to the transformation.
Returning to the Sort Transformation Editor, where you’ve selected input columns
and specified sort criteria, notice the Comparison Flags field. If you click in this field,
you will see a list of options. The purpose of this field is to specify how the sorting
should handle the data comparison. As you sort columns, the Sort transformation
compares the data to allocate a proper sort order for the data rows. You can specify
the options in this field that can affect the sensitivity of the comparison of the data.
For example, you can choose the Ignore Case option to specify that uppercase and
lowercase data is to be treated equally. Six options are available, and you can choose
more than one option here:
Ignore Case
c You choose this option to specify that while comparing data, the
Sort transformation will not distinguish between uppercase and lowercase letters.
Ignore Kana Type
c If you are using the Japanese language, selecting this option
requires the Sort transformation to ignore the distinction between hiragana and

katakana, the two types of Japanese kana characters. For the benefit of those who
Chapter 10: Data Flow Transformations 417
are less acquainted with Japanese, kana is a general term used to express the two
types of Japanese syllabic scripts. Hiragana is a cursive and flowing variety of kana
used in most modern Japanese texts, and katakana is a relatively angular kana used
for writing foreign words or official documents such as telegrams.
Ignore Nonspacing Characters c e data you deal with may contain diacritics,
especially in the age of Internet when the data entry is left in the hands of end
users across the globe. A diacritic is a mark added to a letter to indicate a special
phonetic value—such as the acute accent of resumé. Selecting the check box for
this option treats the spacing characters and diacritics alike.
Ignore Character Width
c Using this option allows you to treat the single-byte
characters (non-Unicode) and double-byte characters (Unicode characters) alike.
Integration Services may automatically convert the data to Unicode data before
comparing text data.
Ignore Symbols c Sometimes it is useful to compare the string data by
eliminating the symbols and white-space characters. is may be due to poor
data quality or standardization, again due to free style data entered by users. e
Sort transformation ignores the symbols when you select this option and treats
HNO# and HNO as identical.
Sort Punctuation As Symbols
c You can configure the Sort transformation to
remove all punctuation symbols except hyphens and apostrophes appended before the
string data before comparison. Using this option treats .NET and NET as identical.
The sort criteria you specify for the input column generates a sort key value used to
compare string data and sort it appropriately. If duplicate rows are included and you
want to remove these, you can do so by selecting the “Remove rows with duplicate sort
values” option provided on the lower-left side of the Sort Transformation Editor.
If you can extract data from the source in a sorted way, then you can avoid using

Sort transform. However, some components in data flow need to know that the data
is sorted or not and on what fields; for example, the Merge Join transformation will
need this information before joining data from two different sources. You can do so by
using the IsSorted and SortKeyPosition advanced properties on source adapters. Refer
to Chapter 15, particularly Figure 15-4 and Figure 15-5 and the explanation around
them, to understand how to configure them.
Percentage Sampling Transformation
When you need to give out data to call centers for telesales activities, you are generally
asked to create a sample set from a data segmentation. Sometimes the requirement
is defined as a percentage—for example, you may be asked to create a sample set of

×