Figure 9-5 Flat File Advanced Editor showing FastParse and UseBinaryFormat properties
OLE DB Source
You will be using an OLE DB source whenever you need to extract data from a
relational database that is OLE DB–compliant, such as Microsoft SQL Server, an
Oracle database server, IBM’s DB2 database server, an Access database, or even an
Excel file. An OLE DB provider is used in an OLE DB Connection Manager to
connect to the database, and an OLE DB source uses the OLE DB Connection
Manager to bring the data into the data flow. Keep in mind that if you use an
Integration Services Data Source inside an OLE DB Connection Manager, you can
make that Data Source or Data Source View available through the OLE DB
Connection Manager to all the packages within an Integration Services project. Not
all OLE DB providers are the same; check the documentation of the OLE DB
provider you want to use in your package, as different types of OLE DB providers
have different limitations. The OLE DB source has one regular output and one
error output.
This data flow component has a custom user interface that is similar to the Excel
source adapter discussed earlier, and its properties can also be configured using the
Advanced Editor. When you open the OLE DB Source Editor, it takes you to the
Connection Manager page. Here you can specify an OLE DB Connection Manager
from the drop-down list provided in the OLE DB Connection Manager field. The
OLE DB source provides the following four data access mode options:
Table Or View: Specify the name of the table or the view from which you want
to extract data.

Table Name Or View Name Variable: Specify the name of the variable that holds
the name of the table or the view.

SQL Command: Write an SQL statement to extract data. You have the option
of using a parameterized SQL query here, as shown in the example below.

SQL Command From Variable: Specify the name of the variable that holds the
SQL statement to be passed on to the OLE DB source.
For more details on these options, refer to the section “Excel Source,” where they
are discussed in detail.
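For instance, when you choose the SQL Command access mode, you can place OLE DB
parameter markers (question marks) in the query and then map them to package variables
by clicking the Parameters button. A minimal sketch follows; the table and column names
are hypothetical and only illustrate the syntax:

    SELECT SalesOrderID, OrderDate, TotalDue
    FROM Sales.SalesOrderHeader
    WHERE OrderDate >= ?
      AND TerritoryID = ?;

Each question mark is then matched to a variable, for example User::StartDate and
User::TerritoryID, in the Set Query Parameters dialog box.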
In the Columns page, you can see the mappings of the Output Columns to the External
Columns. The matching name is written for you in the Output Column, which you can
change if you wish.
In the Error Output page, you can specify how the OLE DB source should handle
an error or a truncation for each column.
Raw File Source
All the prebuilt Integration Services sources other than the Raw File source require a
connection manager to connect to a data source. The Raw File source doesn’t require
a connection manager; it establishes a direct connection to the file containing the raw
data and brings that raw data into the data flow. The raw data written in the raw file is
native to the source and requires no parsing or translation during import, so the Raw
File source can extract the data much faster. You can use raw files to stage data because
of their fast read and write operations; for example, in scenarios in which you export
data from your system for later transformation and loading back into the same or a
similar system, this may be an ideal choice. The Raw File source has only one output
and no error output.
The Raw File source has been given a custom user interface in this version of
Integration Services, though you can still use the Advanced Editor to configure its
properties. The custom UI is very simple and contains only two pages:
Connection Manager: Though this page is named like the ones in other
components where you select a connection manager, this component doesn’t
use any SSIS connection manager; rather, it connects to the raw file directly. In
the Access Mode field, you can specify a filename or choose to get the filename
from a variable. Depending on your choice of access mode, the interface changes
the available fields to collect the relevant information. Choosing File Name lets you
specify the raw filename in the File Name field, and choosing File Name From
Variable lets you specify the name of the variable that holds the raw filename in
the Variable Name field.

Column Mappings: Shows the mappings of external columns to output columns
and allows you to rename the output columns.
Script Component Source
The preconfigured data flow sources in Integration Services have only one output
available—for example, Flat File source, Excel source, and OLE DB source all have
single outputs. If you need to output data to more than one downstream component in
the data flow, you can’t do that using these preconfigured components. The only option
you have in this case, or in other similar cases when existing source components do not
meet your requirements, is to create your own data flow source. You can write yourself
a custom component, but an easier option is to use the script component as a source.
The script component is not shown under Data Flow Sources in the Toolbox, but
it can be configured as a data source. When you drop the script component onto the
Data Flow Designer surface, you are asked to select whether you want to configure this
component as a source, a transformation, or a destination. Based on your choice, the
script component customizes the interface and options appropriate for the purpose. As
a source, the script component has no input and only one output to start with.
You can add additional outputs using the Add Output button in the Inputs and Outputs
page. Configuring a script component as a data flow source is covered in Chapter 11.
XML Source
The XML source reads XML data from an XML file or from a variable containing
XML data and brings that data into the data flow. This source has a custom user
interface to edit properties but also uses the Advanced Editor for configuring some
of its properties.
When you open the XML Source Editor, the Connection Manager page appears,
where you specify how you want to connect to and access data from the XML data file.
Depending upon the option you choose, the interface changes to collect the relevant
information:
XML file location: Lets you specify the location and the filename of the XML
data file in the XML Location field.

XML file from variable: Allows you to use a variable to specify the XML data
filename and location. You then provide the name of the variable containing the
XML data file details in the Variable Name field.

XML data from variable: Lets you access XML data directly from a variable by
specifying the name of the variable in the Variable Name field.
Next you can choose schema options: use an inline schema or provide an XML
schema definition (XSD) file. When the XML data file itself contains the XSD
schema to validate its structure and data, you will use the inline schema option;
otherwise, you will have to supply an external schema definition (XSD) file. If you
don’t have an XSD file, you can generate one by clicking the Generate XSD button
and providing a location and name for the XSD file. This file is required to interpret
the relationships among the elements in the XML data file.
In the Columns page, you can map output columns to external columns and in the
Error Output page, you can specify how the XML Source should handle errors and
truncations of data for each column. This source can have multiple regular outputs and
multiple error outputs.
Data Flow Transformations
Once the data has been captured by source adapters and passed on to the data flow path,
you can modify this data using a wide range of data flow transformations provided
in SSIS. You can use data flow transformations to aggregate column values, update
column values, add columns to the data flow, merge data, and accomplish many
more data modifications. The data flow transformations can have single or multiple
inputs or outputs, depending upon the type of transformation you choose. A data
flow transformation receives data on its input columns from the output columns
of a data flow source or another transformation. After applying transformations to the
data, the data flow transformation provides the output through its output columns
to the input columns of the downstream component, which can be another data flow
transformation or a data flow destination. Some of the data flow transformations can
also have error outputs, which receive the data rows that fail to transform so that they
can be dealt with separately. Data flow transformations do not have external columns,
but you can programmatically create a custom transformation that has external columns.
The following tables list the 29 data flow transformations, grouped into categories on
the basis of the function they perform in Integration Services. Future service packs or
add-ons may bring in more transformations. These transformations provide a rich set
of functionality in many areas, such as data cleansing, data standardization, business
intelligence operations, loading slowly changing data warehouse dimension tables,
pivoting and unpivoting, and a facility to write scripts using the Script Component
transformation. However, if you still need functionality that can't be met by the
preconfigured components, you can write custom transformations with synchronous
outputs or asynchronous outputs.
Transformations with synchronous outputs make modifications to data inline—i.e.,
as the data rows flow through the component one at a time—whereas transformations
with asynchronous outputs cannot process each row independently of all other rows—for
example, an aggregate transformation needs all the rows before it can perform an
operation across rows. You will learn more about synchronous and asynchronous
components and programming options in Chapter 11.
The following sections contain brief definitions of the data flow transformations;
the details of configuration and usage exercises are covered in the next chapter.
Business Intelligence Transformations
This category groups together the transformations that allow you to perform business
intelligence operations such as data cleansing, data standardizing, text mining, and
running DMX prediction queries.

Transformation Description
Fuzzy Grouping  Performs data cleansing and standardizes values in column data.
Fuzzy Lookup  Uses fuzzy matching to cleanse or standardize data.
Slowly Changing Dimension  Configures the updating of slowly changing dimensions in data warehouse dimension tables.
Term Extraction  Extracts terms from text in the input columns and writes the extracted terms to an output column.
Term Lookup  Performs a lookup and count for the terms in the input text that are defined in a lookup table.
Data Mining Query  Runs DMX queries for performing prediction queries against data mining models.
Row Transformations
The transformations in this category allow you to update column values or create new
columns on a row-by-row basis.
Transformation Description
Character Map  Applies string functions to string data type columns.
Copy Column  Creates new columns in the transformation output by copying input columns.
Data Conversion  Converts the data type of a column and optionally copies the converted data to a new output column.
Derived Column  Creates new derivations of data by applying expressions using a combination of input columns, variables, functions, and operators; the results of this derivation can be used to modify an existing column or can be copied to a new column in the output.
Export Column  Exports data from a pipeline column to a file. This transformation can be especially useful to export DT_TEXT, DT_NTEXT, or DT_IMAGE data type data from the pipeline into a file.
Import Column  Reads data from a file and adds it to the columns in the data flow.
Script Component  Hosts and runs a script that can be used to transform data.
OLE DB Command  Updates, inserts, or deletes rows by running an SQL command for each row in the data flow (see the example following this table).
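To illustrate the OLE DB Command transformation, the statement it runs is an ordinary
parameterized SQL command in which each question mark is mapped to an input column
of the data flow. A minimal sketch, using hypothetical table and column names, might
look like this:

    UPDATE dbo.Customer
    SET EmailAddress = ?,
        ModifiedDate = GETDATE()
    WHERE CustomerID = ?;

Here the first parameter would be mapped to an e-mail input column and the second to a
customer key column in the transformation's column mappings. Because the command runs
once for every row, this transformation suits low-volume updates rather than bulk loading.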
Rowset Transformations
The transformations in this category work on collections of rows and allow you to
perform operations such as aggregate, sort, sample, pivot, and unpivot row sets.
Transformation Description
Aggregate  Performs aggregate functions such as average, sum, and count and copies the results to the output columns (see the example following this table).
Sort  Sorts input data in ascending or descending order and copies the sorted data to the output columns.
Percentage Sampling  Creates a sample data set by specifying a percentage to randomly select input rows.
Row Sampling  Creates a sample data set by specifying the exact number of output rows to randomly select from the input rows.
Pivot  Pivots the input data on a column value to get a less normalized but more compact view of the data.
Unpivot  Creates a more normalized version of a denormalized table.
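As a point of reference, the output of the Aggregate transformation is conceptually
equivalent to what a GROUP BY query would return if it were run against the rows flowing
through the pipeline. A rough SQL sketch, with hypothetical table and column names, of
an Aggregate configured for a sum and a count:

    SELECT TerritoryID,
           SUM(TotalDue) AS TotalSales,
           COUNT(*) AS OrderCount
    FROM Sales.SalesOrderHeader
    GROUP BY TerritoryID;

The difference is that the Aggregate transformation performs this work in memory on the
data flow buffers, which is also why it must receive all input rows before it can produce
any output, making it a transformation with an asynchronous output.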
Split and Join Transformations
This category groups the transformations that allow you to distribute rows to different
outputs, multicast input rows, join multiple inputs into one output, and perform lookup
operations.
Transformation Description
Conditional Split  Routes data rows to different outputs, depending on the content of the data.
Multicast  Distributes the input data set to multiple outputs.
Union All  Merges multiple input data sets into one output.
Merge  Merges two sorted data sets into a single output data set.
Merge Join  Joins two sorted input data sets using a FULL, LEFT, or INNER join into a single output.
Cache Transform  Prepares a cache file to be used by a Lookup transformation by writing data in the pipeline to a Cache connection manager.
Lookup  Performs lookups by joining data in the input columns with columns in a reference data set (see the example following this table).
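For orientation, the result of a Lookup transformation is comparable to joining the
pipeline rows to a reference table and picking up extra columns from it. A rough SQL
sketch with hypothetical table and column names:

    SELECT o.OrderID,
           o.CustomerCode,
           c.CustomerName   -- column added by the lookup
    FROM dbo.IncomingOrders AS o
    INNER JOIN dbo.Customer AS c
        ON c.CustomerCode = o.CustomerCode;

Rows that find no match in the reference data are not silently dropped, though; depending
on how the Lookup transformation is configured, they can fail the component, be redirected
to the error output, or, in SSIS 2008, be sent to a dedicated no-match output.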
Auditing Transformations
These transformations are used to add audit information and count rows.
Transformation Description
Audit  Includes the system environment information in the data flow.
Row Count  Counts the rows flowing through the data flow and writes the final count to a variable.
Data Flow Destinations
A data flow destination is the last component in the Integration Services data flow
and writes the data received to an external data store. This component has no output
columns, though it may have error output columns for redirecting error rows. A data
flow destination writes the data received at its input columns to an external data store
or to an in-memory data set via its external columns interface. You can configure the
data flow destination to handle errors in data in the same way as you configure a data
flow source. You can choose to fail the component, ignore the error, or redirect the
error rows to the error output columns. If the data flow destination can’t write some of
the data rows to the external data store due to errors in the data or data not matching
the external columns, it can redirect those rows to the error output columns, depending
on how it is configured to handle error rows.
Integration Services provides 12 preconfigured destinations plus a script component
that can be used as a destination. The following tables list the available destinations and
the functions they perform.
Destination Description
ADO NET destination  Writes data to ADO.NET–compliant databases.
Data Mining Model Training destination  Passes the data through the data mining model algorithms to train data mining models.
DataReader destination  Exposes data in the data flow using the ADO.NET DataReader interface for applications such as SQL Server Reporting Services.
Dimension Processing destination  Loads and processes an SQL Server Analysis Services dimension.
Excel destination  Writes data from a data flow to an Excel workbook.
Flat File destination  Writes data from a data flow to a flat file.
OLE DB destination  Loads data from a data flow into OLE DB–compliant data stores.
Partition Processing destination  Loads and processes an SQL Server Analysis Services partition.
Raw File destination  Writes data to a raw file.
Recordset destination  Creates and populates an in-memory ADO recordset.
Script Component  Hosts and runs a script that can be used to load data. Though not shown under Data Flow Destinations in the Toolbox, the Script Component can be used as a data flow destination. This component is covered in Chapter 11.
SQL Server Compact destination  Inserts rows into an SQL Server Compact database.
SQL Server destination  Bulk-loads data into an SQL Server table or view.
All these destinations have an input, and some of them also have an error output, which
together should meet most of your requirements for writing data to external stores.
However, if you find that the existing data flow destinations do not do what you want,
you can custom-build a destination to suit your requirements using the Integration
Services object model. The following table lists the inputs and outputs available in each
data flow destination.
Data Flow Destination  Input  Output  Error Output  Custom UI  Connection Manager
ADO NET destination  1  No  Yes  Yes  ADO.NET Connection Manager
Data Mining Model Training destination  1  No  No  Yes  Analysis Services Connection Manager
DataReader destination  1  No  No  No  Not required
Dimension Processing destination  1  No  No  Yes  Analysis Services Connection Manager
Excel destination  1  No  Yes  Yes  Excel Connection Manager
Flat File destination  1  No  No  Yes  Flat File Connection Manager
OLE DB destination  1  No  Yes  Yes  OLE DB Connection Manager
Partition Processing destination  1  No  No  Yes  Analysis Services Connection Manager
Raw File destination  1  No  No  Yes  Not required
Recordset destination  1  No  No  No  Not required
SQL Server Compact destination  1  No  Yes  No  OLE DB Connection Manager
SQL Server destination  1  No  No  Yes  OLE DB Connection Manager
ADO NET Destination
You can use an ADO NET destination to load data into a table or view of an ADO.NET–
compliant database. If you have a database into which you can load data using either an
ADO NET destination or an OLE DB destination, it is generally preferred to use the
OLE DB destination due to its performance and resilience; however, an ADO NET
destination does provide more options for you to load data into the databases when an
OLE DB destination cannot be used. The ADO.NET Destination Editor has three
pages: Connection Manager, Mappings, and Error Output. The Connection Manager
page enables you to specify an ADO.NET Connection Manager and the table or
view into which the data is to be loaded. You have options in this page to create a new
connection manager or a new table. In the Mappings page, you can map input columns
to the destination columns. Finally, in the Error Output page, you can configure the
error response conditions.
Data Mining Model Training Destination
Data changes with time, and the type of information that can be gathered from the data
changes along with it. Mining data for useful and relevant information requires regular
data mining model training so that it can keep up with the challenges of ever-changing
data. The data mining model training destination passes data through the data mining
model algorithms to train the data mining models.