Review
Event Handlers are like subpackages that can respond to events occurring at run time,
and if properly designed and developed, they can let you have a good night’s sleep
most nights. In this exercise, you created simple event handlers for OnTaskFailed
and OnPostExecute events and have seen how these event handlers wake up on the
occurrence of these events and perform the work they are configured to perform. You
can create quite a complex event handler to perform alternative actions when a particular
event occurs. However, you also need to be careful while using event handlers, as events
are raised up to the parent containers at the cost of CPU cycles unless you filter out the
events that should not be propagated to the parent container. Finally, you can disable event
handlers during design and testing using the DisableEventHandlers property on the
objects on which the event handlers have been defined.
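If you prefer to flip this switch outside BIDS, the same property is exposed through the SSIS .NET object model. The following is a minimal sketch, assuming a hypothetical package path and task name; normally you would simply set DisableEventHandlers in the Properties window.

using Microsoft.SqlServer.Dts.Runtime;

class DisableHandlersSketch
{
    static void Main()
    {
        // Load an existing package (the path and task name are illustrative only).
        Application app = new Application();
        Package package = app.LoadPackage(@"C:\SSIS\Packages\Mailing Opportunities.dtsx", null);

        // Suppress event handlers for the whole package while testing...
        package.DisableEventHandlers = true;

        // ...or only for a single task on which event handlers have been defined.
        TaskHost task = (TaskHost)package.Executables["Mailing Opportunities"];
        task.DisableEventHandlers = true;

        // Persist the change back to the package file.
        app.SaveToXml(@"C:\SSIS\Packages\Mailing Opportunities.dtsx", package, null);
    }
}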
SSIS as a Data Source for a Reporting Services Report
This is an excellent feature that allows you to use an SSIS package as a data source
inside a Reporting Services report. You will study the DataReader Destination of the
Data Flow task in Chapter 9, which can act as a data source for external applications. The
DataReader Destination is not actually a destination that writes data to a data store;
rather, it is an interface for ASP.NET applications to connect to and read data from an
SSIS package. This feature is not enabled by default in Reporting Services, so you have
to enable it before using it. The following describes the steps you need to perform.
Enable SSIS as a Data Source
To enable SSIS as a data source, you need to configure two reporting services config
files—RSReportDesigner.Config and RSReportServer.Config files.
1. First, locate the RSReportDesigner.Config file, which exists in the C:\Program
Files\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies folder in
a default installation.
2. Open this file using any text editor or Visual Studio 2008. The latter
displays XML in a nicely structured format that is easy to work with. Locate the Extensions
section and note that there are three different types of extensions: Render, Data,
and Designer extensions. SSIS being a data source, add the following
code to the Data element, as shown in Figure 8-15:
<Extension Name="SSIS"
    Type="Microsoft.SqlServer.Dts.DtsClient.DtsConnection,Microsoft.SqlServer.Dts.DtsClient, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91"/>
3. Add the following code to the Designer element:
<Extension Name="SSIS"
    Type="Microsoft.ReportingServices.QueryDesigners.GenericQueryDesigner,Microsoft.ReportingServices.QueryDesigners"/>
4. Now locate and open the RSReportServer.Config file, which in a default installation
should be located in the C:\Program Files\Microsoft SQL Server\MSRS10_50.MSSQLSERVER\
Reporting Services\ReportServer folder. In the file, locate the Extensions section
and add the following code to the Data element (the surrounding XML context is
shown after these steps):
<Extension Name="SSIS"
    Type="Microsoft.SqlServer.Dts.DtsClient.DtsConnection,Microsoft.SqlServer.Dts.DtsClient, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91"/>
5. You may have to restart the Reporting Services service for the changes to
take effect.
Figure 8-15 Editing the RSReportDesigner.Config file
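For reference, the entry added in step 4 sits inside the Data element of the Extensions section. A trimmed-down sketch of the relevant part of RSReportServer.Config looks roughly like this (the real file lists many other extensions, which you should leave in place):

<Extensions>
  <Data>
    <!-- existing data extensions (SQL, OLEDB, and so on) remain in place -->
    <Extension Name="SSIS"
        Type="Microsoft.SqlServer.Dts.DtsClient.DtsConnection,Microsoft.SqlServer.Dts.DtsClient, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91"/>
  </Data>
  <!-- other extension groups, such as Render, follow -->
</Extensions>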
Using SSIS as a Data Source
After you have enabled SSRS to use SSIS as a data source, you can use this feature.
1. First, build an SSIS package in which the Data Flow task uses a
DataReader Destination as its destination. The name you assign to the
DataReader Destination will be used in Reporting Services, so keep it short.
In this example, I'm using the default name, DataReaderDest.
2. Now start a new Reporting Services project in BIDS and add a data source. In
the Type field, you will see SSIS listed as a data source. Select SSIS as shown in
Figure 8-16, and then add -f C:\SSIS\Packages\Sales.dtsx in the connection
string field. The SSRS data source will execute the package using the DTEXEC
command and needs only the connection path in the connection string (an
equivalent DTEXEC invocation is shown after these steps).
Figure 8-16 Using SSIS as a data source in the SSRS Report and specifying the connection path
3. Next, add a report and select the data source created in the preceding step. Then,
in the query field, specify the name of the DataReader Destination, which in our
case is DataReaderDest, as shown in Figure 8-17. That is all that is
required to use the SSIS DataReader Destination as a source in an SSRS report.
Now proceed to complete the report. When you run the report or preview it, you
will see the data served at run time.
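To verify the package outside of Reporting Services, you can run it yourself from a command prompt; the connection string in step 2 is essentially the file argument that the data source hands to DTEXEC (the package path shown is the illustrative one used in this example):

dtexec /F "C:\SSIS\Packages\Sales.dtsx"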
This method is easily configurable, though Microsoft does not recommend it. The
primary concern is the security settings, as the package runs under the Reporting
Services service account, which might be an issue in some situations. You need to
be careful when you deploy this method. However, this example is very useful in
demonstrating that the DataReader Destination can serve data dynamically to external
ASP.NET applications.
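To illustrate that last point, here is a minimal sketch, assuming the Microsoft.SqlServer.Dts.DtsClient assembly referenced in the config edits above and the illustrative package path and destination name used in this exercise, of how a .NET application can read rows straight from the DataReader Destination:

using System;
using System.Data;
using Microsoft.SqlServer.Dts.DtsClient;

class ReadFromPackageSketch
{
    static void Main()
    {
        using (DtsConnection connection = new DtsConnection())
        {
            // Same connection string format as the report data source: -f <package path>
            connection.ConnectionString = @"-f C:\SSIS\Packages\Sales.dtsx";
            connection.Open();

            using (DtsCommand command = new DtsCommand(connection))
            {
                // The command text is the name of the DataReader Destination in the package.
                command.CommandText = "DataReaderDest";

                // Executing the reader runs the package and streams its rows back.
                IDataReader reader = command.ExecuteReader(CommandBehavior.Default);
                while (reader.Read())
                {
                    Console.WriteLine(reader[0]);
                }
            }
        }
    }
}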
Figure 8-17 Specifying the DataReader Destination name in the Query String
Summary
This was an absorbing chapter, as you’ve worked through various advanced features
provided in Integration Services to enhance the service quality of your packages.
You’ve worked with logging and log providers and have enabled logging for a package
in the Hands-On exercise. Then you worked with transactions and configured your
package to use transactions to handle three different kinds of scenarios for processing
transactional data. You also used checkpoints to restart your package from the point
of failure and learned about the effect of using transactions on checkpoints. Later in
the chapter you extended the Contacting Opportunities package to create personalized
e-mails using variables and property expressions. In the last Hands-On exercise,
you created event handlers in your package and saw how they become active when a
specified event happens.
With this chapter, you’ve concluded the workflow configurations of Integration
Services projects and their administration. In the next two chapters, you will learn
all about the data flow engine and how to transform data using Integration Services
Data Flow transformations. Until now, your packages did not include any data
flow or Data Flow tasks; however, from now on, we will focus mainly on data flow
components and their configurations and will not do much in the control flow. Note
that you cannot exclude control flow from your packages, as it must be part of an
Integration Services package.
Chapter 9
Data Flow Components
In This Chapter
From Control Flow to Data Flow
Data Flow Component Interfaces
Considerations for Bringing Data into Data Flow
Data Flow Sources
Data Flow Transformations
Data Flow Destinations
Data Flow Paths
Summary
Integration Services provides several data-oriented services and the related
components that can be configured in the data flow task designer. The data
flow within a package extracts the data from a data source; performs the
transformations, computations, and derivations; and then loads the transformed data
to the data destination. The assembly used to represent a data flow is called a data
flow pipeline. The data flow engine of Integration Services is a data-driven execution
engine that provides a steady flow of data streams onto which atomic computations or
processes are applied.
The data flow constitutes three main components: data flow sources, data flow
transformations, and data flow destinations. Additionally, these components use data
flow paths to move the data from one component to the next in the data flow.
This chapter discusses the separation of data flow from the control flow and the
benefits thus achieved. It then discusses the data flow components consisting of data flow
sources, data flow transformations, data flow destinations, and data flow paths, plus the
inputs, outputs, error outputs, and the external columns to explain their functions. At the
end of the chapter, you will work through a Hands-On exercise to use these components.
From Control Flow to Data Flow
An Integration Services package relies on control flow and data flow as integral parts,
though other parts such as event handlers and log providers also support the package.
The data flow consists of three major components—data flow sources, data flow
transformations, and data flow destinations—which are connected via data flow paths.
One of the key enhancements in Integration Services over DTS 2000 is the separation
of control flow and data flow engines. The immediate predecessor to SSIS—i.e., SQL
Server 2000 Data Transformation Services—had all the control and data flow features
available in one engine and hence was difficult to use for creating complex packages to
meet some stringent data requirements. Integration Services introduces two engines,
called the Integration Services run-time engine and the Integration Services data flow engine,
which separate data flow from control flow.
Inherently, the requirements of control flow and data flow are different for operational
and performance reasons. The requirements that drive the creation of an ETL package
are to extract data, perform some transformations, and load; when you get to the design
board, you realize that the package needs to populate variables along the way, wait for
data to be available that might be populated by some other part of the package, perform
operations in a particular order, move data from buffer to buffer, and so on. This order
of tasks, availability of data, and ability to move data as fast as possible from one buffer
to another after applying transformations pull the designers in two opposing
directions: applying control to the package workflow and applying transformations
to the data flow that are fast enough. Although DTS 2000 fits the bill for tackling these
operations, it lacks the ability to strike a perfect balance between control and data flow
requirements for the packages you design for various purposes.
The provision of a separate run-time engine and data flow engine provides better
control over packages and enhances the way the packages can be designed with separate
data flow. The run-time engine controls the order of the package workflow and provides
associated services, such as the ability to define an alternative workflow on the occurrence
of an event, to apply breakpoints on the tasks, to manage connections to the data sources,
to log data, and to manage transactions and variables. The data flow engine provides
high-performance data movement functionality, transformations optimized for speed,
and great extensibility: you can create custom transformations in addition to the prebuilt
data sources, transformations, and destinations, which are capable of meeting most of
your requirements. You can include multiple Data Flow tasks in a package, with each Data
Flow task able to support multiple data sources, transformations, and destinations.
When you drop the Data Flow task on the Control Flow Designer surface,
you invoke the data flow engine. The Data Flow task is a special task provided in
Integration Services that allows you to create data movements and transformations
for data in a package. The Data Flow task replaces the DTS 2000 tasks, such as the
Data Transformation task and the Data Driven Query task, which used to provide data
movement and manipulation functionality. Unlike other tasks, double-clicking the
Data Flow task doesn’t invoke the Properties or Editor dialog box; instead, it opens
another designer surface represented by the Data Flow tab in BI Development Studio
(BIDS). The Data Flow task, accessed through the Data Flow tab, provides many
additional components such as data sources, transformations, and destinations to build
your data movement and transformation part of the package. The following section
covers the components available in the Data Flow tab.
Data Flow Component Interfaces
The Data Flow task consists of source and destination adapters that extract and load
data between heterogeneous data stores; transformations that modify, summarize,
cleanse, and extend data; and the paths that link these components together by
connecting output of one component to the input of the other component, thus
providing a sequence to the data flow. Some components accept multiple inputs and
can have multiple outputs. As mentioned, the Data Flow task can have multiple data
sources, transformations, and destinations. These components are designed for high
performance with a focus on data movement and manipulation efficiency.
All these components are available in the Toolbox window of the Data Flow tab
and can be added to the data flow just like control flow tasks. These components fall
into three categories: data flow sources, data flow transformations, and data flow
destinations. Data flow sources are used to extract data, data flow transformations help
in modifying the extracted data, and this modified data is finally loaded into the data
silos using data flow destinations. While these components perform their functions,
they use data flow paths to move data from a source component to the destination
component. These data flow paths are similar to pipes in a pipeline. A data flow path
is quite interesting and helpful for debugging and visualizing the data. You can see the data
and metadata of the data flowing through a data flow path using data viewers. Data flow
paths are covered in detail later in the chapter.
Each of the data flow components may have one or more inputs, outputs, or error
outputs associated with it. The data flow source reads the data from the external data
source, such as a flat file or a table of a relational database, using its external interface. It
then makes this data available to the downstream components. A data flow source uses
outputs to send the data in one or more columns via the data flow path to the inputs of
the transformations or destinations. A transformation receives data at its inputs and sends
data out at its outputs. Finally, a destination receives data at its inputs and sends data out
through its external interface to the destination. These components can optionally be
configured to have error outputs as well, which contain all the input columns plus two
extra columns, ErrorCode and ErrorColumn, that indicate the reason for
failure and the failing column. Let's discuss these interfaces in detail.
External Metadata
This interface writes or reads data to and from external data stores (for example, an Excel
worksheet) and keeps a copy of their metadata. During design time, when you create a data
flow and add sources to your package, the metadata of the data from the sources is copied
to the external columns on data flow sources. Similarly, a data flow destination keeps a
copy of the destination metadata in its external columns as a snapshot of the destination
data store and compares the data schema with this snapshot before writing to the data
store. Because the external columns keep a schema snapshot of the external data stores, they
help in the process of package validation. If a data store is to be created at run time and
doesn't exist at design time, or you make changes to a data store without updating the
package, you will get validation errors or warnings displayed by SSIS components. Some
of these warnings can be handled by using the ValidateExternalMetadata property on
the components, which works like the DelayValidation property on tasks and delays the
validation until the run time of the component.
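As a rough illustration, the property can also be toggled programmatically. The following is a hedged sketch, assuming a hypothetical package path and Data Flow task name, that turns off design-time metadata validation for every component in one data flow through the pipeline object model:

using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class RelaxMetadataValidationSketch
{
    static void Main()
    {
        // Load an existing package (the path and task name are illustrative only).
        Application app = new Application();
        Package package = app.LoadPackage(@"C:\SSIS\Packages\Sales.dtsx", null);

        // Reach into the Data Flow task to get at its pipeline.
        TaskHost host = (TaskHost)package.Executables["Data Flow Task"];
        MainPipe pipeline = (MainPipe)host.InnerObject;

        // Defer metadata validation for every component in this data flow.
        foreach (IDTSComponentMetaData100 component in pipeline.ComponentMetaDataCollection)
        {
            component.ValidateExternalMetadata = false;
        }

        app.SaveToXml(@"C:\SSIS\Packages\Sales.dtsx", package, null);
    }
}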
Inputs
Inputs receive data from the data flow path in the input columns. The components
keep a copy of the metadata in the input interface as well, and the data received at
the inputs can be validated against the cached schema. You can configure an input
to fail a component, ignore failure, or redirect a row in case it receives errors in
the input columns. In an Integration Services data flow, only the destinations and
transformations have inputs. You might think that as the data flow sources bring in
data to the data flow, they have inputs, but this is not the case. The data flow sources
use external columns to bring data into the data flow. Inputs and outputs (discussed
in the next section) are used when the data flows from one data flow component to
another data flow component. Whenever external data sources or data stores are involved,
the data flows directly through the data flow sources or data flow destinations via the
external interface.
Outputs
The data is sent out from the outputs of the sources and transformations through the data
flow path to the inputs of the downstream components that could be transformations
or destinations. The output columns’ data can be validated against the external column
schema, depending on the data flow component's configuration. The output columns
are exposed in the Input and Output Properties tab of the Advanced Editor, where you
can configure them.
Error Outputs
When the data flows through the processing logic of the component, errors and mismatches
can occur. These errors could be related to the data type mismatch between the input
data and the corresponding metadata of an external column, or the data coming in may
be longer than the length defined in the corresponding external column. These kinds
of mismatches may cause data rows to produce errors or truncations. You can configure the
error handling in the component to fail the component on an error, ignore the error, or
redirect the failing rows to an error output. You can specify the error handling behavior
for each column using one of three options. Whenever the failing rows are redirected to
the error output, two additional columns, ErrorColumn and ErrorCode, are also added to
indicate the failing column and the code of failure.
Considerations for Bringing Data into Data Flow
Integration Services uses the data flow engine to extract data from a data source,
transform and modify the data once it is in the data flow, and then load it to an external
data store. The data flow engine uses data flow sources for extracting data from a data
store. Figure 9-1 shows how the data is being extracted and parsed by a data source; it
also shows how an erroneous record will be handled.
In Figure 9-1, two records demonstrate how data flows through a data flow source.
As the data flow source reads the data from the external source, it translates the source