Hands-On Microsoft SQL Server 2008 Integration Services, part 41

378 Hands-On Microsoft SQL Server 2008 Integration Services
Most business requirements need you to work with data and make it suitable for deriving some business value from it. The data and schema
are modified in Integration Services using data flow transformations. In
this chapter, you will explore the preconfigured transformations in detail. You will
also be working with Hands-On exercises that will cover most of the transformations
from the perspective of their usability. As you progress in the chapter, the Hands-On
exercises will include more and more complex transformations and business scenarios,
and of course they will be more interesting as well. To keep such a progression and your
interest in the chapter, the transformations may not appear in the order in which they
appeared in Chapter 9. We’ll start with the easiest ones first.
Row Transformations
Row transformations are simple compared to other types of transformations. Starting
with the simplest transformation—the Copy Column transformation, which copies an input column to an output column—we will study the Character Map transformation, the Data Conversion transformation, and then the Derived Column transformation. The next two
transformations deal with columnar data—the Export Column transformation exports
partial data from the pipeline into a file, and the Import Column transformation reads data
from a file and adds it into the pipeline. The complexity of the functions grows as we go, until we reach the last two transformations in this category, the Script Component and the OLE DB Command transformation, which can be programmed to perform the functions you want. Let's start our journey with the Copy Column transformation.
Copy Column Transformation
The Copy Column transformation is probably the simplest of the transformations
available in SQL Server Integration Services. It does precisely what it says—it copies a column in the data flow to add a new column to the data flow. Sometimes you may need to output a column twice for other applications, and as SSIS does not let you map a column twice in the destination adapters, you can use this transformation to fill the gap. This new column can then be given a name
within the transformation. To perform the desired function, this transformation has been designed to support one input and one output.
The Copy Column transformation has a custom user interface defined by the Copy
Column Transformation Editor. In the Copy Column Transformation Editor, you
can select the columns you want to copy either from the upper half by checking the
boxes in front of columns in the list under Available Input Columns or select from
the drop-down list provided in the Input column. You can select multiple columns to
be copied or choose to create multiple copies of a column in the same transformation.
The selected columns will be copied and added as new columns to the data flow output
Chapter 10: Data Flow Transformations 379
columns of this transformation. When you select a column in the Input column, the
Output Alias shows the Input column name prefixed with a Copy Of string by default.
You can change this alias and specify a more appropriate name for the column. As this transformation does not cause any changes to the data schema or data quality, it doesn't introduce any errors into the data flow and hence doesn't have an error output.
Character Map Transformation
You use this transformation to apply string functions to the string data type columns.
Configuring this transformation and applying any of the following string functions
to one of the columns are simple operations. The converted data can then either be placed in a new column or applied as an in-place data conversion. An in-place
change means that the data will be modified in the existing column and no new column
will be added to the data flow. The string operation you specify is applied row by row as
the data flows through this transformation. To perform a function, this transformation
supports one input, one output, and one error output.
To add the Character Map transformation to the data flow and open its editor UI, you need to drop this transformation into your data flow task and join it with an upstream component before you double-click it. The user interface in the Character Map
Transformation Editor is simple and intuitive. In the upper part, a list provides the
Available Input Columns with a check box in front of each column. Select a column
by clicking the check box, and a line is added in the lower half of the window with the selected column shown in the Input column. Alternatively, you can select a column for
applying string operations by clicking in the Input column and selecting a column from
the drop-down list. Note that the Character Map transformation can be used only for
columns with string data types. If you configure this transformation with a column that
has a data type other than string, validation errors will occur with the error message
stating that the input column has an unsupported data type. The lower half of the
transformation UI has four columns.
Input Column: Here you can select the column to which you want to apply Character Map operations.

Destination: The Destination column allows you to specify whether you want the modified data to be copied into a New Column, which is the default option, or treated as an in-place change. You select either option by clicking in the column and choosing it from the drop-down list. When you select the New Column value, the Output Alias field shows the column name prefixed with Copy Of for the new column, and when you choose In-Place Change, the Output Alias shows the column name as is.
Operation: The Operation column allows you to select any of the string operations listed in the following table. You can select multiple operations by clicking multiple check boxes in the list; the Operation column will show the selected operations in a comma-delimited list. However, there are some restrictions on selecting multiple operations for the same column—for example, you cannot apply both the Uppercase and Lowercase operations to the same column in one transformation. Refer to Microsoft SQL Server 2008 Books Online for more details on these restrictions.
Operation            Description
Lowercase            Converts the characters of a column to lowercase.
Uppercase            Converts the characters of a column to uppercase.
Byte reversal        Reverses the byte order of a column's data.
Hiragana             Converts katakana characters in a column's data to hiragana characters.
Katakana             Converts hiragana characters in a column's data to katakana characters. Both
                     hiragana and katakana are part of the Japanese syllabary.
Half width           Converts full-width characters in a column's data to half-width characters.
Full width           Converts half-width characters in a column's data to full-width characters.
Linguistic casing    Applies linguistic casing rules instead of the system rules.
Simplified Chinese   Converts traditional Chinese characters in a column's data to simplified
                     Chinese characters.
Traditional Chinese  Converts simplified Chinese characters in a column's data to traditional
                     Chinese characters.
Output Alias: This column is used to specify a name for the newly added data column in the data flow.
You can specify the error handling behavior of this transformation by clicking the
Configure Error Output button. As this component applies string transformations to
the text columns, truncations can occur in the transformed data. For example, when
you convert Half Width data to Full Width data, the column length may not support
that and can cause the data to be truncated. In such cases, you can use the Configure
Error Output button to specify the action that this component should take. When you
click this button, the Configure Error Output dialog box opens, where you can specify
whether errors should fail the component, be ignored, or redirect the failing rows to the
error output for each of the input or output columns that you’ve selected or created in
the transformation.
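As an aside (not from the book's exercises), the Uppercase and Lowercase operations have direct equivalents in the SSIS expression language, so a Derived Column transformation could achieve the same result; the column name CustomerName is an illustrative assumption:

```
UPPER( [CustomerName] )
LOWER( [CustomerName] )
```

Using the Character Map transformation is still preferable for the Japanese and Chinese character conversions in the table above, which have no expression-language counterparts.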
Data Conversion Transformation
When data is coming from disparate sources into a data warehouse or a data mart,
data type mismatches can occur during the data upload process. One of the functions performed while loading data into a data warehouse or a data mart is to convert the data type of a column to one that matches the data type of the destination column.
This transformation supports one input, one output, and one error output to perform its functions.
When the data is read from the data source, depending on the data and the data
source, a data type is assigned to the Input column, which may not be exactly what
you want. For example, a date column read from a text file is generally assigned a
string data type that needs to be converted into a date time data type before loading
into a destination expecting a date time data type. This problem was handled in DTS
2000 using CAST or CONVERT functions of T-SQL within the Execute SQL task.
SSIS provides Data Conversion transformation as a preconfigured task to perform
this operation, though you still can write T-SQL code using OLE DB Command
transformation within the data flow.
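As a reminder of the DTS 2000 approach mentioned above, such a conversion was typically written in T-SQL; the table and column names here are illustrative assumptions, not from the book:

```sql
-- Hypothetical staged load: convert a text column to DATETIME
-- with T-SQL, the pre-SSIS way of handling the type mismatch.
SELECT OrderID,
       CAST(OrderDateText AS DATETIME) AS OrderDate
FROM   dbo.StagingOrders;
```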
The Data Conversion transformation allows you to convert the data type of an input
column to a different data type and add the converted data to a new output column.
During data conversion, the data of the input column is parsed and then converted into
a new Integration Services data type before being written to a new output column. By
now you know that Integration Services provides two types of parsing routines—locale-
insensitive fast parsing and the locale-sensitive standard parsing routine. You can
specify the parsing routines for each of the output columns in the Advanced Editor for
Data Conversion. To specify the parsing method, open the Advanced Editor for Data
Conversion, go to the Input And Output Properties tab, click the appropriate output
column by expanding the Output Columns folder under Data Conversion Output. Go
to the Custom Properties section of the selected output column and choose between
the False and True values for the FastParse option.
In the Data Conversion Transformation Editor, you can select the input columns
that you want to convert to a different data type in the input column or select check
boxes in the list of Available Input Columns. You can select multiple columns for
conversion or apply multiple conversions to the same column. As the converted data is
copied into a new output column, you can define an output alias for the newly created
column in the data flow. In the Data Type column, you can specify the data type to
which you want to convert the selected input column. The Data Type column helps
you to select a data type for the column from the drop-down list of available Integration

Services data types. Depending on the data type you select, you can set other attributes
of the new data type such as length, precision, scale, and code page. While specifying
the code page for the new string data type output column, you must keep the code
page for the new output column the same as the input column code page when you are
converting data between string data type columns.
As Data Conversion transformation creates new columns with modified data types
and modifies the data flow, error and truncation occurrences in data are possible.
To handle these, this transformation provides an error-handling mechanism via the
Configure Error Output button. In the Configure Error Output dialog box, you will
find a list of all the output columns that you have created in this transformation. You can specify whether to fail the component, ignore the error, or redirect the failing row for errors and truncations on each of the columns.
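As a conceptual aside (not from the book's exercise), the same conversion can also be expressed as an SSIS expression type cast in a Derived Column transformation; the column name OrderDateText is an illustrative assumption:

```
(DT_DBTIMESTAMP)[OrderDateText]
```

The dedicated Data Conversion transformation remains the clearer choice when you are converting many columns at once, since each conversion is a configured row rather than a hand-written expression.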
Derived Column Transformation
The Derived Column transformation enables you to perform more complex derivations
on an input column data. Until now you’ve been performing simple operations on the
data, such as copying a column, applying string transformations using Character Map
transformation, or even changing data types. With this transformation in your toolkit,
you can derive your columns using complex derivation rules. Using this transformation,
you will be able to perform operations such as deriving sensible information from
the Notes or Comments column, concatenating two text strings using two columns
or a column and a variable to populate in a single column, performing mathematical
operations on a numerical column, using conditional derivations, and much more. This
transformation is designed to have one input, one regular output, and one error output.
As this transformation will be making changes to the data and the data schema, an
error or a truncation can occur in the data. To handle such an instance, this transformation
provides a Configure Error Output button to invoke the Configure Error Output
dialog box, where you can specify whether to fail the transformation, ignore the error, or redirect the failing rows to an error output for errors and truncations on each of the derived columns.

This is a very versatile transformation within the data flow task; you will use it repeatedly in your packages, and its expression capability also appears in other components and transformations. In fact, the transformations you have studied so far can be seen as special applications of the Derived Column transformation, though with a slightly different UI. The user interface of the Derived Column transformation resembles the Expression Builder dialog box. In the
Derived Column field in the lower half of the window, you can either select the Add
As New Column option to add the derived data into a new column or choose to replace
an existing column. Depending on your choice, the Derived Column Name field will
be filled in with a default name that you can change to a more appropriate name.
The Expression field is the most important field in the Derived Column Transformation
Editor. Here you specify the expression to derive the data. You can use any combination of
variables, input columns, mathematical functions, string functions, date/time functions,
null functions, type casts, and operators to build this expression. After specifying an
expression to derive data, you can specify the data type, length, precision, scale, and code
page for the data. Once you have successfully configured one row in the Expression
Builder, you can add more rows to derive additional columns and hence can perform
multiple derivations using the same transformation. The left-upper half of the window
lists variables and input columns that you can drag and drop in the Expression field.
Similarly, the right-upper half of the window lists functions and operators that can also
be embedded in the expression simply by dragging and dropping.
There are six categories of such functions and operators.

Mathematical Functions: Allow you to perform mathematical operations such as returning the absolute value with ABS, rounding a numeric value to the closest integer with ROUND, or calculating the square of a numeric value with SQUARE.

String Functions: Allow you to perform string operations such as FINDSTRING, SUBSTRING, TRIM, LOWER, UPPER, LEN, and RIGHT.

Date/Time Functions: Allow you to manipulate date and time values using functions such as DAY, MONTH, YEAR, DATEADD, DATEPART, and DATEDIFF.

Null Functions: Allow you to find whether an expression is null using the ISNULL function, or return a null value of a specified data type using functions such as NULL(DT_WSTR, <length>).

Type Casts: Allow you to cast your expressions to a specified data type; for instance, the (DT_I4) cast will convert the result of your expression into a four-byte signed integer.

Operators: Allow you to perform various types of operations on expressions, including add, subtract, multiply, divide, concatenate, equal, greater than or equal to, logical AND, logical OR, and the conditional operation. You will be using these versatile operators quite frequently, so make sure you acquaint yourself with all the available operators.
While configuring the Derived Column transformation is not difficult, writing expressions for deriving the data can sometimes be challenging, especially for a newbie. So, here are some examples to get you started quickly. These examples are only for practice and do not cover most of the instances you might come across in real life; however, you might find them a good starting point.
To trim a column CustomerName, you can use either TRIM([CustomerName]) or RTRIM(LTRIM([CustomerName])).

The following expression returns the position of the first semicolon (;) in the string:

FINDSTRING("abcd;cc;c",";",1)
You can combine functions with other functions to solve complex problems. For example, using SUBSTRING with FINDSTRING, you can truncate the preceding string to the characters before the first semicolon:

SUBSTRING("abcd;cc;c",1,FINDSTRING("abcd;cc;c",";",1) - 1)
In the following example, you use simple operators to derive a PercentageProfit column and round the result to a precision of 2:

ROUND(((SalePrice - CostPrice) * 100) / CostPrice, 2)
In this example, you are working with a MiddleName column. When you get a null value in this column and do not want to pass it through, but prefer to replace the null with a blank string value, you can use the ISNULL function along with the conditional operator to return a blank string:

IsNull(MiddleName) ? " " : MiddleName

The conditional operator evaluates a Boolean condition, which is on the left side of this expression, before the question mark—i.e., IsNull(MiddleName). If it is true, the operator returns the first value, shown after the question mark; if the Boolean evaluates to false, it returns the second value, shown after the colon in the preceding expression. Another similar example is with numeric data, when DamageCost contains null but you want to pass a 0 instead:

ISNULL(DamageCost) ? 0 : DamageCost

You can build quite complex expressions with this conditional operator, as you can also nest conditional operations.
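As a sketch of such nesting (the Region column and its codes are illustrative assumptions, not from the book), a nested conditional can translate single-letter region codes while handling nulls:

```
ISNULL(Region) ? "Unknown" : (Region == "N" ? "North" : (Region == "S" ? "South" : Region))
```

The parentheses around each inner conditional keep the nesting readable; both branches of every conditional must return the same data type.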
This example shows the use of a conditional operator along with an OR operator to create a slightly more complex Boolean condition. Here, the Country field contains two-character country codes, which are fine for all countries other than the UK, for which two values exist in the data: UK and GB. You want to pass only UK if the country code is either UK or GB, and let other codes pass through as is:

Country == "UK" || Country == "GB" ? "UK" : Country
Last is again an example of the conditional operator, but this time using dates. Here you return a 1900-01-01 date in the datetime format if EndDate is null; otherwise, you let the datetime pass through as is:

ISNULL(EndDate) ? (DT_DBTIMESTAMP)"1900-01-01 00:00:00.000" : EndDate

As you have used the DT_DBTIMESTAMP type cast here, you can similarly cast a datetime into any other required data type using one of the several date and time type casts.
Export Column Transformation
The Export Column transformation has one input, one output, and one error output.
Using this transformation, you can export data from the data flow to an external file.
For example, you can export images from the data flow and save them to individual files.
To use this transformation, two columns must exist in the data flow: one that contains
data—i.e., images—and the second that contains the paths for the files to which you want
to save those images. By properly configuring the Export Column transformation, you
can save each image in every row to a separate file specified by the column that contains
the file path information. When you open the transformation editor after connecting an input, you can select the columns that you want to extract in the Extract Column field and select the column in the File Path Column field that specifies the file paths for the extracted data.
The Extract Column and the File Path Column fields provide drop-down lists of the
columns in the data flow; however, the Extract Column displays only the columns that
have the DT_TEXT, DT_NTEXT, or DT_IMAGE data type.
You will be able to select Allow Append and Force Truncate check boxes as
well. The Allow Append check box allows the transformation to append data to an
existing file, and the Force Truncate check box allows transformation to delete and
re-create the file before writing data into it. The editor's user interface doesn't allow you to select both check boxes simultaneously; if you set both of these properties to true programmatically, the component fails. At run time, the transformation exports data
row by row and writes the exported data to the files. Depending on your selections of
Allow Append and Force Truncate options and the existence of files to which you want
to save data, the transformation decides the appropriate action. For example, if the file
to which data is to be inserted does not exist, the transformation creates a new file and
writes the data to it irrespective of your settings of these check boxes. However, if the file to which data is to be inserted already exists, the transformation uses information
from the check boxes. In this case, if the Allow Append check box is selected, the
transformation opens the file and writes the data at the end of the file, and if the Force
Truncate option is checked, the transformation deletes and re-creates the file and writes
the data in to the file. If you do not select any of the check boxes and the file exists, a
run-time error occurs because you are effectively not letting the transformation append
or truncate the file, prohibiting it from writing data to the file.
The last option you may find on the user interface is a check box to Write Byte-Order
Mark. A byte-order mark (BOM) is a ZERO WIDTH NO-BREAK SPACE character used to denote the endianness of the data. This is particularly relevant for images saved in the TIFF format, which stores pixel values as words, so the byte order makes a difference; images stored in JPEG or GIF format are not word-oriented, and byte order does not matter. Note that the BOM is written only when the data type is
DT_NTEXT and data is written to a new file.
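As a sketch of the kind of input this transformation expects, a source query could supply both the BLOB column and a file path column; the table, columns, and path pattern here are illustrative assumptions, not from the book:

```sql
-- Hypothetical source for an Export Column transformation:
-- ThumbNailPhoto is the image (DT_IMAGE) column to extract,
-- and PhotoPath tells the transformation where to write each
-- row's image file.
SELECT ProductPhotoID,
       ThumbNailPhoto,
       'C:\Export\Photo_' + CAST(ProductPhotoID AS VARCHAR(10)) + '.gif' AS PhotoPath
FROM   Production.ProductPhoto;
```

In the editor, ThumbNailPhoto would then be chosen in the Extract Column field and PhotoPath in the File Path Column field.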
Import Column Transformation
As you can expect, the Import Column transformation is the reverse of the Export
Column transformation and has one input, one output, and one error output. Using
this transformation, you can read data from files and add the data to the columns in the
data flow. For example, you can add images using this transformation in the data flow
along with the text fields. An input column in the transformation input contains the
file paths for the files that contain data that you want to add to the data flow. At run
time, this transformation processes each row, gets the file path information from the
input column, and then loads the data from it to an output column in the data flow.
This transformation doesn’t have its own user interface but uses the Advanced
Editor to expose its properties for configurations. When you open the Advanced Editor
for Import Column transformation, you can select an input column that contains
the file path information in the Input Columns tab. Then you can add an output
column into which you want to copy the data in the Input And Output Properties
tab. To add an output column, select the Output Columns collection in the Import Column Output node and click Add Column. When this new column is added, select
the desired data type for the column. The data type of the output column must be
DT_TEXT, DT_NTEXT, or DT_IMAGE. Note the ID of this column and expand
Input Columns collection in the Import Column Input node. You will see the column
that you’ve selected in the Input Columns tab earlier. Scroll down in the Custom
Properties section to locate the FileDataColumnID property and specify the ID of the
new output column in this property. By specifying the ID of the output column in the FileDataColumnID property of the input column, you tell the transformation the ID of the column that should receive the imported data. If the data type of the
new output column is DT_NTEXT, you can also specify True in the ExpectBOM
field if the file is expected to begin with a BOM.
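For example (hypothetical names again, not from the book), the input column carrying file paths might come from a source query like this, with each file's contents then loaded into the new output column at run time:

```sql
-- Hypothetical source feeding an Import Column transformation:
-- DocumentPath points at the file whose contents should be
-- imported into the data flow for each row.
SELECT DocumentID,
       'C:\Import\Docs\' + DocumentName + '.txt' AS DocumentPath
FROM   dbo.DocumentList;
```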
Script Component
You have learned about this Data Flow component in the Data Flow Sources and
Destinations. By now you understand that this component can also be used as a
transformation. The basic functionality of this component remains the same—i.e., it enables you to write custom code and include that custom code as a component inside the package. Using the Script component as a transformation, you can write custom code to call functions that are not available in SSIS—i.e., you can call a .NET assembly to use working code that is available outside SSIS; build transformations that are not available in SSIS; or apply multiple transformations with custom code to enhance the performance of your package. In fact, you could write custom code to do all the transformations within the same Script component, but you probably shouldn't, for reasons of simplicity, modularity, and performance. All this is covered in greater detail in Chapter 11.
OLE DB Command Transformation
The Execute SQL task of DTS 2000 was commonly used to perform a variety of functions, ranging from workflow operations to data-related operations such as truncating tables and log files; creating or dropping tables and views; and updating, deleting, or inserting data in database objects. As the control flow and the data flow engines have been separated in Integration Services, this functionality has been made available in both engines so that each performs the operations appropriate to it.
The Execute SQL task provided in the control flow of Integration Services is
designed to perform the workflow functions such as truncating tables, creating and
dropping tables, and updating variable values using the results of an SQL statement,
while the OLE DB Command transformation in the data flow engine provides
for performing data-related operations. This transformation allows you to run an
SQL statement that can insert, update, or delete rows in a database table. As this
transformation is designed to be a Row transformation, the SQL statement is actually
run for each row of the data flowing through this transformation. The real power of the OLE DB Command transformation lies in its ability to run a parameterized query to perform insert, update, or delete operations against each row. This also means that you can use
a stored procedure with parameters to run against each row of the data. Though this component is easy to use, it processes rows one by one, so for a rowset of considerable size its performance will be noticeably worse than alternative set-based approaches such as staging the data and loading it with a single statement.
The OLE DB transformation has one input and one output and also supports
one error output; however, it doesn’t have a custom user interface and hence uses the
Advanced Editor to expose its properties. In the Connection Managers tab of the
Advanced Editor for OLE DB Command, you specify the connection to the database
that you want to update, delete, or insert into.
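To make the parameterized-query idea concrete, the transformation's SqlCommand property typically holds a statement like the following, where each ? placeholder is later mapped to an input column in the Column Mappings tab; the table and column names are illustrative assumptions:

```sql
-- Runs once per row of the data flow; the ? parameters are
-- bound to input columns (e.g., Param_0 -> Email,
-- Param_1 -> CustomerID) in the Column Mappings tab.
UPDATE dbo.Customers
SET    Email = ?
WHERE  CustomerID = ?;
```

Because this statement executes once per row, it is exactly the row-by-row behavior whose performance cost the preceding paragraph warns about.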
