Tải bản đầy đủ (.pdf) (140 trang)

Microsoft Press microsoft sql server 2005 PHẦN 10 pdf

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (4.45 MB, 140 trang )

813
Chapter 21
Creating Full-Text Catalogs
SQL Server, like all database platforms, is built to store and retrieve large amounts of
data. The system enables efficient data management by imposing a structure on the
data it stores in its tables. However, not all data has a well-defined structure, and not
all queries conform to basic true/false rules for retrieving data. To manage this type of
data and its associated queries, other platforms rely on third-party tools. But SQL
Server’s Full-Text Search component provides a powerful and flexible feature called
full-text indexing to manage queries issued against unstructured data. This chapter
provides an overview of full-text search elements and terminology, explains how to
create full-text catalogs and indexes, and shows how to populate the indexes and
keep them up to date. Then the chapter shows you how to execute full-text queries to
search full-text indexed columns for matching words.
Exam objectives in this chapter:
■ Implement a full-text search.
❑ Create a catalog.
❑ Create an index.
❑ Specify a full-text population method.
Lessons in this chapter:
■ Lesson 1: Creating a Full-Text Catalog. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817
■ Lesson 2: Creating a Full-Text Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 820
■ Lesson 3: Populating a Full-Text Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825
■ Lesson 4: Querying Data by Using a Full-Text Index . . . . . . . . . . . . . . . . . . . . 828
Before You Begin
To complete the lessons in this chapter, you must have
■ SQL Server 2005 installed.
■ Full-text indexing installed.
■ A copy of the AdventureWorks sample database installed in the instance.
C2162271X.fm Page 813 Friday, April 29, 2005 8:07 PM
814 Chapter 21 Creating Full-Text Catalogs


NOTE Full-text search
SQL Server 2005 provides Full-Text Search as a separately installable component. You can find the
option to install full-text functionality under the Database Engine node within the SQL Server 2005
Setup Wizard. If you specify default settings for installing the Database Engine, Full-Text Search is
selected and installed. Full-text indexing has its own service, called Microsoft Full-Text Engine for
SQL Server (MSFTESQL), for populating and managing full-text catalogs. One instance of full-text
indexing is installed for each SQL Server instance, with each instance having its own MSFTESQL
service and service account.
MORE INFO Installing full-text search
For complete information about installing full-text search, see the SQL Server 2005 Books Online
article “Installing and Upgrading Full-Text Search.” SQL Server 2005 Books Online is installed as part
of SQL Server 2005. Updates for SQL Server 2005 Books Online are available for download at
www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx.
Real World
Michael Hotek
One of the largest recruiting agencies in the world spent years developing a pro-
prietary application that allowed recruiters to quickly and flexibly search the
agency’s database for resumes that matched desired criteria. On any given day,
agency employees ran thousands of queries against several hundred thousand
resumes to fill thousands of openings spanning every industry and job function.
To its competitors, this company was the model of success. However, this suc-
cess came at the cost of hundreds of hard-working research assistants who spent
35–40 hours a week parsing resumes into a massive keyword index because the
programming team couldn’t keep pace with the industry’s rate of change.
Every week, the recruiting agency had to deal with hundreds of new job titles,
technology changes, and terminology shifts. The IT team loaded all these
changes into the automated parsing routines on which the search system was
based. Then the team executed hundreds of tests to ensure accurate results
before releasing the new code base. After the new search code was released, the
IT team had to reparse the entire database of resumes, compare it with the pre-

vious parsing, and then rebuild the keyword index. When the system was origi-
nally deployed, this process took two to three days. Two years later, it was taking
four to five weeks and growing longer all the time. The company had to find a
solution.
C2162271X.fm Page 814 Friday, April 29, 2005 8:07 PM
Before You Begin 815
I was called in to help, and after spending about three hours gaining an under-
standing of the company’s environment, I asked the IT staff if we could run a
simple set of tests on a prototype solution. The staff was hesitant because all pre-
vious “tests” they had performed with a variety of vendors required days or
weeks of effort and yielded mixed results. But after assuring them that the initial
tests should be completed by the end of the day, I was able to proceed.
We installed SQL Server 2000’s Full-Text Search component, built a full-text cat-
alog, and added two indexes. The entire process took about an hour on the sub-
set of test data we were using. We then executed hundreds of the IT team’s test
queries and compared the results with previous results. The results weren’t
encouraging. Less than 10 percent of the results from the full-text queries
matched the results from the proprietary search algorithms. We then looked at
the results more closely. It turns out that our full-text queries were picking up
thousands of resumes that the proprietary algorithms missed due to misspell-
ings, synonyms, and other factors. The full-text results were also more accurate
when dealing with the series of keywords on which recruiters normally
searched.
Our simple test turned into a full-blown pilot program. In less than a day, the
developers could switch over the application’s querying capability to use the
full-text index. Three days later, the application was in production with spectac-
ular results. The day the new application went into production, the company
shattered all previous records for matching potential candidates to job openings.
Over the next two months, the company hit a record for placements, only to
break it the following week. The agency no longer needed the position of

research assistant, so it moved its research assistants into other roles, with most
of them receiving promotions to junior recruiter.
Implementing the full-text feature also let the company eliminate the entire scan-
ning and optical character recognition (OCR) process it previously used.
Resumes submitted in plain-text format were loaded into one column. Resumes
that were submitted in any other format were converted to Microsoft Word or
PDF format and loaded directly into the database. The IT team then used the
full-text engine with add-in filters that could break the resumes down into words
and index them in native document format without requiring any of the previ-
ous time-consuming text conversions.
C2162271X.fm Page 815 Friday, April 29, 2005 8:07 PM
816 Chapter 21 Creating Full-Text Catalogs
In one case, one of the agency’s sales representatives was visiting a potential new
customer, hoping to sign a contract to manage the customer’s recruiting efforts.
The customer decided to give the agency a test on the spot and handed the rep-
resentative a profile for a new job title that it was creating based on changes in its
industry that had occurred just two weeks earlier. The sales rep did not know
the customer was considering four other recruiting agencies. After getting a net-
work connection, the sales rep immediately found 15 potential candidates for
the new position. She walked out of the meeting with a contract in hand because
none of the competitors could even find a reference to the skill set the customer
was asking for.
Over the next two years, this company expanded operations to span the globe,
recording a corresponding 50× increase in number of placements. All this success
came with very little investment in IT because full-text indexing could adapt
itself to any language needed. As we write this book, the recruiting agency is fin-
ishing its pilot program for upgrading to SQL Server 2005 and is expecting to
reap significant performance improvements.
NOTE Chapter conventions
As with many technologies within SQL Server 2005, you can use SQL Server Management Studio

(SSMS) to administer full-text indexing by pointing and clicking your way through administration
screens. And you might choose to use SSMS to manage full-text functionality in your organization.
However, walking through the screens in the SSMS graphical user interface (GUI) doesn’t explain
very much about the functionality you can leverage. Because the SSMS screens and wizards submit
Transact-SQL commands to SQL Server to perform the specified tasks, this chapter uses this code
to explain what you can do to take advantage of full-text indexing in a variety of situations.
C2162271X.fm Page 816 Friday, April 29, 2005 8:07 PM
Lesson 1: Creating a Full-Text Catalog 817
Lesson 1: Creating a Full-Text Catalog
Full-Text Search is based on the technology of full-text indexes. Although you create full-
text indexes on columns within tables in SQL Server databases, the full-text indexes are
maintained in a structure outside of SQL Server called a full-text catalog. A full-text cata-
log stores one or more full-text indexes. In this lesson, you will see how to use the Trans-
act-SQL CREATE FULLTEXT CATALOG command to create a full-text catalog.
After this lesson, you will be able to:
■ Create a full-text catalog.
Estimated lesson time: 20 minutes
How to Create a Full-Text Catalog
The first step in creating full-text indexing is to create a full-text catalog to hold the
indexes. You create a catalog by using the CREATE FULLTEXT CATALOG Transact-
SQL command, as the following general syntax shows:
CREATE FULLTEXT CATALOG catalog_name
[ON FILEGROUP filegroup ]
[IN PATH 'rootpath']
[WITH <catalog_option>]
[AS DEFAULT]
[AUTHORIZATION owner_na me ]
<catalog_option>::=
ACCENT_SENSITIVITY = {ON|OFF}
After giving the catalog a name, you specify a filegroup for the catalog, which needs to

be part of the database for which the catalog will contain indexes. Although you can
put the catalog on the default filegroup, it is a good practice to put a catalog on a sec-
ondary filegroup and to use this filegroup only for full-text catalogs. This configura-
tion lets you use filegroup backup and restore to back up and restore a full-text
catalog independently of the rest of the database.
You use the command’s IN PATH clause to specify the root directory in which the full-
text catalog will be stored. For full-text catalogs, the filegroup specification simply
associates a full-text catalog to a filegroup for use with backup and restore operations.
However, the actual catalog is stored within a physical directory structure outside a
database. When you create a catalog, a directory with the same name as your catalog
is created in this root directory. If a directory that uses the same name as your catalog
already exists, a suffix is appended to the name to create a unique directory structure.
C2162271X.fm Page 817 Friday, April 29, 2005 8:07 PM
818 Chapter 21 Creating Full-Text Catalogs
Within this directory structure, as indexes are added to the catalog, subdirectories are
created to contain them.
You use the command’s WITH clause to specify accent sensitivity. If you don’t specify
an option for this clause, the full-text catalog uses the setting from the database’s col-
lation. Otherwise, you can explicitly specify whether the catalog should be sensitive
to accents. If you change this option later, you must rebuild all full-text indexes within
the catalog.
The next clause, AS DEFAULT, serves a similar purpose as setting a default filegroup.
When you create full-text indexes without explicitly specifying a catalog, SQL Server
creates the indexes within the default catalog.
The command’s AUTHORIZATION clause simply specifies the user or role that owns
the catalog.
Quick Check
1. What is the purpose of a full-text catalog?
2. Where is a full-text catalog stored?
Quick Check Answers

1. A full-text catalog provides the basic storage container for one or more full-
text indexes.
2. Full-text catalogs, along with their associated indexes, are stored in a direc-
tory structure that is external to SQL Server.
PRACTICE Create a Full-Text Catalog
In this practice, you create a full-text catalog to use with the AdventureWorks database.
1. Create a directory on the operating system named C:\test.
2. Launch SSMS, connect to your instance, and open a new query window.
3. Add a new filegroup to the AdventureWorks database that you will use for the full-
text catalog by executing the following batch:
USE master
GO
ALTER DATABASE AdventureWorks ADD FILEGROUP FTFG1
GO
ALTER DATABASE AdventureWorks ADD FILE ( NAME = N'AdventureWorksFT_data',
C2162271X.fm Page 818 Friday, April 29, 2005 8:07 PM
Lesson 1: Creating a Full-Text Catalog 819
FILENAME = N'C:\TEST\AdventureWorksFT_data.ndf' , SIZE = 2048KB , FILEGROWTH =
1024KB ) TO FILEGROUP [FTFG1]
GO
NOTE Filegroup must have primary file
Although full-text catalogs and indexes are stored in a directory structure external to SQL
Server, the filegroup on which a full-text catalog is placed must have at least one active file.
This file cannot be marked READ ONLY or taken OFFLINE.
4. Create a full-text catalog on the FTFG1 filegroup by executing the following
command:
USE AdventureWorks;
GO
CREATE FULLTEXT CATALOG AWCatalog ON FILEGROUP FTFG1 IN PATH 'C:\TEST' AS DEFAULT;
GO

Lesson Summary
■ The first step in setting up full-text indexing is to define a catalog to store one or
more full-text indexes that are used to process queries.
■ You use the CREATE FULLTEXT CATALOG Transact-SQL command to create a
full-text catalog.
■ Although you must associate a full-text catalog with a filegroup for backup and
restore purposes, full-text catalogs are stored in a directory structure external to
the database.
Lesson Review
The following questions are intended to reinforce key information presented in this
lesson. The questions are also available on the companion CD if you prefer to review
them in electronic form.
NOTE Answers
Answers to these questions and explanations of why each answer choice is right or wrong are
located in the “Answers” section at the end of the book.
1. Where does the full-text catalog physically exist?
A. Within the database in which it is associated
B. In the msdb database
C. In an external directory structure
D. In a filegroup for the database
C2162271X.fm Page 819 Friday, April 29, 2005 8:07 PM
820 Chapter 21 Creating Full-Text Catalogs
Lesson 2: Creating a Full-Text Index
After you have created a full-text catalog, you need to create one or more full-text
indexes before you can execute full-text queries. In this lesson, you will review the
powerful architecture of full-text indexing and then see how to create an index by
using the CREATE FULLTEXT INDEX Transact-SQL command.
After this lesson, you will be able to:
■ Explain the terminology associated with full-text indexing.
■ Create a full-text index.

Estimated lesson time: 20 minutes
Full-Text Index Architecture
You can build full-text indexes on textual data stored in char, nchar, varchar, nvarchar,
varchar(max), text, ntext, image, varbinary, varbinary(max), and xml columns. How-
ever, the image, varbinary, and varbinary(max) columns require special handling if
you want to use them for full-text processing.
You use multiple helper services to build a compact and efficient full-text index. These
services include word breakers and stemmers, language files, noise word files, filters,
and protocol handlers.
Word breakers are routines that find the breaks between words and generate a basic
word list for each row within the column or columns that you are indexing. Stemmers
conjugate verbs. Word breakers and stemmers work with language files to understand
the words that are in the input stream. Language files, in conjunction with word
breakers and stemmers, allow full-text indexing to handle multiple languages without
requiring translation routines or specialized processing.
Commonly used words in a language are referred to as noise words. Noise words are
contained in language-specific noise files, which contain basic structural elements
that are not useful for search routines. Examples of noise words for the English lan-
guage are “the,” “a,” and “an.” When the word-breaker routine encounters a noise
word for the particular language being processed, it ignores the word. Thus, a full-text
index does not include all possible words in a column, but only those that are inter-
esting for queries.
C2162271X.fm Page 820 Friday, April 29, 2005 8:07 PM
Lesson 2: Creating a Full-Text Index 821
NOTE Configuring noise words
SQL Server ships with a default set of noise word files for each language. These files are stored in
$SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\. The files are simple text
files that you can edit to include noise words specific to your application that you want to exclude.
If a word exists in this file, it is not indexed and is excluded from any full-text queries.
At this point, you might be thinking that you can create full-text indexes only on text-

based columns. This is not true. You use protocol handlers and filters when you want to
create a full-text index on a varbinary, varbinary(max), or image column. These ser-
vices let you extract text from Word, Excel, and PowerPoint files as well as PDF and
other files that are stored in a native format inside SQL Server. For the filters to work,
you need to add a column to the table to contain a value that indicates the type of doc-
ument stored in the column. The filter then loads up the binary stream stored in the
column, strips all the formatting information, and returns the text within the docu-
ment to the word-breaker routine.
BEST PRACTICES Filters
By taking advantage of filters, you no longer have to convert files to a text-based format before
being able to use full-text indexing on them. You can store files in their native format inside SQL
Server while still allowing full-search capability.
After the word-breaker routine has a list of valid words for a row within a column, the
full-text engine calculates tokens to represent the words. A token is simply a com-
pressed form of the original word that saves space and ensures that full-text indexes
can be created in as compact a form as possible.
The full-text functionality then builds all the tokens in a column into an inverted,
stacked, compressed structure within a file that is used for search operations. This
unique structure allows ranking and scoring algorithms to efficiently satisfy possible
queries.
How to Create a Full-Text Index
To create a full-text index, you use the CREATE FULLTEXT INDEX Transact-SQL com-
mand, as the following generic syntax shows:
CREATE FULLTEXT INDEX ON table_name
[(column_name [TYPE COLUMN type_column_name]
[LANGUAGE language_term] [, n])]
C2162271X.fm Page 821 Friday, April 29, 2005 8:07 PM
822 Chapter 21 Creating Full-Text Catalogs
KEY INDEX index_name
[ON fulltext_catalog_name]

[WITH
{CHANGE_TRACKING {MANUAL | AUTO | OFF [, NO POPULATION]}}
]
The first part of this command specifies the table on which you want to create the full-
text index. Although you can index multiple columns in a table, only one full-text
index per table is allowed.
You then specify the column or columns you want to index. If you specify a column
of type varbinary, varbinary(max), or image for indexing, you must also specify the
TYPE COLUMN clause. This clause refers to the column discussed earlier that you
need to add to the table to designate the format of the column’s data.
NOTE Type columns
A type column is a character column that contains an abbreviation that corresponds to the con-
tents of a column being indexed. For example, a value of .doc indicates a Word document. This
value is entered on a row-by-row basis, so multiple different document types can be stored in a
single column. This column is used to load the correct filter for the word-breaker routine when the
index is built on a varbinary, varbinary(max), or image column.
As you are specifying the column and column type for the index, you can also specify
an explicit language for the column. You might need to specify this clause when you
are indexing a table that contains multiple columns in which each column contains
different languages, such as a column that is translated into multiple languages.
The command’s KEY INDEX clause specifies the table’s unique column. This column
uniquely identifies each row in the table so that the full-text index can be correlated
to rows in the table. The key must be a single column in the table; compound keys are
not allowed.
The next clause, ON, enables you to specify the full-text catalog on which the index is
created.
And the final clause specifies whether changes to the indexed data are tracked. With reg-
ular indexes, SQL Server always maintains the index in sync with the underlying data by
causing changes in the index at the same time as changes to the referenced data are
made. Full-text indexes, however, are separated from normal database transaction pro-

cesses so that changes to data in columns that are full-text indexed are propagated into
the index via a background process that does not immediately reflect the data changes.
When the change-tracking value is set to MANUAL, changes to the data in the col-
umns need to be propagated into the index either manually or by scheduling a job in
C2162271X.fm Page 822 Friday, April 29, 2005 8:07 PM
Lesson 2: Creating a Full-Text Index 823
SQL Server Agent to propagate the changes. The default value of AUTO causes a
change to be propagated into the index by using a background process that occurs
outside of the transaction making the change. And when this value is set to OFF, SQL
Server does not track any changes, which causes the index to become further and fur-
ther out of date until it is rebuilt either manually or via a SQL Server Agent job. The
OFF option also includes a NO POPULATION clause that you can specify to cause a
full-text index to be created without populating the index.
BEST PRACTICES Initial catalog population
Populating a full-text index is a very resource- and input/output (I/O)-intensive operation. The ini-
tial creation of a full-text index should usually be performed when activity in the database is very
low, so most database administrators (DBAs) create full-text indexes by using the OFF and NO
POPULATION clauses, and then they create a job to populate all full-text indexes when minimal
database activity is occurring. After the index is populated, if the column on which the index is
created does not change frequently, you can then normally set change tracking to AUTO to keep
the index up to date.
Quick Check
■ What are the requirements for creating a full-text index?
Quick Check Answer
■ Only one full-text index can be created on a table. The columns in a full-text
index can be character (all types), varbinary, and image data types. A single-
column unique key must exist on the table.
PRACTICE Create a Full-Text Index
In this practice, you create two full-text indexes. The first index is on a character-based
column. The second index takes advantage of the filters that ship with SQL Server to

index a column containing Word documents stored in a varbinary(max) column.
1. If necessary, launch SSMS, connect to your instance, and open a new query
window.
2. Create a full-text index on the ProductionDescription column in the Produc-
tion.ProductDescription table in the AdventureWorks database by executing the
following command:
CREATE FULLTEXT INDEX ON Production.ProductDescription (Description) KEY INDEX
PK_ProductDescription_ProductDescriptionID ON AWCatalog WITH CHANGE_TRACKING AUTO;
C2162271X.fm Page 823 Friday, April 29, 2005 8:07 PM
824 Chapter 21 Creating Full-Text Catalogs
3. Create a full-text index on the Document column of the Production.Document
table by executing the following command:
CREATE FULLTEXT INDEX ON Production.Document (Document TYPE COLUMN FileExtension) KEY
INDEX PK_Document_DocumentID ON AWCatalog WITH CHANGE_TRACKING AUTO;
4. Observe the changes on the file system after the indexes are created.
5. View the full-text catalog and associated indexes inside SSMS.
Lesson Summary
■ To create a full-text index, you use the CREATE FULLTEXT INDEX Transact-SQL
command.
■ You can create full-text indexes on a variety of columns, including text-based,
binary, and image columns.
■ Varbinary and image columns let you store files in their native format within
SQL Server while still making these files available for full-text indexing and
searching.
■ To build a compact and efficient full-text index, you use multiple helper services,
including word-breaker routines, language files, noise word files, filters, and pro-
tocol handlers.
Lesson Review
The following questions are intended to reinforce key information presented in this
lesson. The questions are also available on the companion CD if you prefer to review

them in electronic form.
NOTE Answers
Answers to these questions and explanations of why each answer choice is right or wrong are
located in the “Answers” section at the end of the book.
1. Which of the following are requirements for creating a full-text index? (Choose
all that apply.)
A. Primary key
B. Single-column unique index
C. Image column
D. Text-based column
C2162271X.fm Page 824 Friday, April 29, 2005 8:07 PM
Lesson 3: Populating a Full-Text Index 825
Lesson 3: Populating a Full-Text Index
As Lesson 2 noted, because of the external structure for storing full-text indexes,
changes to underlying data columns are not immediately reflected in the full-text
index. Instead, a background process enlists the word breakers, filters, and noise
word files to build the tokens for each column, which are then merged back into the
main index either automatically or manually. This update process is called population
or a crawl. To keep your full-text indexes up to date, you must periodically populate
them. This lesson shows you how to perform a full or partial population of a full-text
index.
After this lesson, you will be able to:
■ Specify an index-population method.
Estimated lesson time: 20 minutes
Specifying an Index-Population Method
You can choose from three modes for full-text index population:
■ Full
■ Incremental
■ Update
A full population causes the full-text engine to read and process all rows from the

table for the indexed columns. Because full population is very resource-intensive, you
typically use full population for the initial population of the full-text index and then
use either an incremental or update population to keep the index up to date.
Incremental population automatically populates the index for rows that were modi-
fied since the last population. Incremental population requires a timestamp column
on the table, which the full-text engine uses to determine which rows have changed.
If any metadata for the index has changed since the last population, the incremental
population is performed as a full population.
Update population uses the changes that SQL Server tracks to process any inserts,
updates, and deletes since the last time a change-tracked index was populated. With
this population mode, you can specify how you want to propagate the changes to
the index. Specifying AUTO for change tracking enables automatic processing; with
MANUAL you can implement a manual method for processing changes.
C2162271X.fm Page 825 Friday, April 29, 2005 8:07 PM
826 Chapter 21 Creating Full-Text Catalogs
You use the ALTER FULLTEXT INDEX Transact-SQL command to populate a full-text
index, as the following general syntax shows:
ALTER FULLTEXT INDEX ON table_name
{ SET CHANGE_TRACKING { MANUAL | AUTO | OFF }
| START { FULL | INCREMENTAL | UPDATE } POPULATION
| STOP POPULATION
}
Populating a Full-Text Catalog
In addition to periodically populating your full-text indexes, you might also need to
rebuild or reorganize a full-text catalog to update all the indexes in the catalog. The
following syntax shows the ALTER FULLTEXT CATALOG Transact-SQL command
that enables you to operate on all indexes in a full-text catalog at the same time:
ALTER FULLTEXT CATALOG catalog_name
{ REBUILD [ WITH ACCENT_SENSITIVITY = { ON | OFF } ]
| REORGANIZE

| AS DEFAULT
}
When you use the REBUILD option for this command, the full-text catalog is deleted
from the file system and rebuilt. You generally use this option only when you need to
change the ACCENT_SENSITIVITY setting for the catalog.
Specifying the REORGANIZE option causes all indexes in the catalog to have all
changes merged. This operation frees up disk and memory resources, and you should
run the ALTER FULLTEXT CATALOG command with this option periodically to
achieve maximum full-text performance.
Quick Check
■ Why do you need to periodically perform index populations?
Quick Check Answer
■ Full-text indexes are an external structure, so they are not updated at the
same time as changes are made to the underlying data columns. A back-
ground process enlists the word breakers, filters, and noise word files to
build the tokens for each column, which are then merged back into the
main index either automatically or manually.
C2162271X.fm Page 826 Friday, April 29, 2005 8:07 PM
Lesson 3: Populating a Full-Text Index 827
PRACTICE Populate a Full-Text Index
In this practice, you perform a full repopulation of the two indexes you created earlier.
1. If necessary, launch SSMS, connect to your instance, and open a new query
window.
2. Execute a full population of the full-text indexes on the ProductDescription and
Document columns by executing the following batch:
ALTER FULLTEXT INDEX ON Production.ProductDescription START FULL POPULATION;
ALTER FULLTEXT INDEX ON Production.Document START FULL POPULATION;
3. Explain why the output of these commands was a warning that the commands
will be ignored.
Lesson Summary

■ To keep full-text indexes in sync with the columns they are built on and to per-
form maintenance on the indexes, you must periodically populate the indexes.
■ You can completely rebuild or incrementally populate an individual index by
using the ALTER FULLTEXT INDEX Transact-SQL command.
■ On a periodic basis, you can also reorganize a full-text catalog to free up disk and
memory for all full-text indexes in the catalog by using the ALTER FULLTEXT
CATALOG command.
Lesson Review
The following questions are intended to reinforce key information presented in this
lesson. The questions are also available on the companion CD if you prefer to review
them in electronic form.
NOTE Answers
Answers to these questions and explanations of why each answer choice is right or wrong are
located in the “Answers” section at the end of the book.
1. Which of the following is a valid population option for a full-text index?
A. REORGANIZE
B. INCREMENTAL
C. REBUILD
D. COMPLETE
C2162271X.fm Page 827 Friday, April 29, 2005 8:07 PM
828 Chapter 21 Creating Full-Text Catalogs
Lesson 4: Querying Data by Using a Full-Text Index
Full-text indexes are useful only if they are used to satisfy requests. However, regular
Transact-SQL statements will not cause the query optimizer to automatically select a
full-text index. You gain full-text query capability by using the CONTAINS, CONTAIN-
STABLE, FREETEXT, and FREETEXTTABLE full-text query keywords. Full-text query
keywords are available in two types: predicate functions and rowset functions. The
CONTAINS and FREETEXT functions are query predicates that return a simple True
or False result to limit the result set. The CONTAINSTABLE and FREETEXTTABLE
functions return a rowset that must be joined to another table based on a key value;

you can use these functions to extend the capabilities of your queries. This lesson
looks at each of these full-text keywords and describes which are appropriate for dif-
ferent needs, showing you the query syntax you need to use as well as query examples
to help you understand the different results you can achieve.
After this lesson, you will be able to:
■ Explain the differences between the full-text query keywords.
■ Submit full-text queries.
Estimated lesson time: 20 minutes
Query Execution
When you execute queries that use full-text functions, SQL Server first parses and
compiles them and then hands them to the query optimizer. The optimizer recog-
nizes the full-text functions and routes them to the full-text search engine. The full-
text search engine takes the search terms passed and routes them through the
dedicated query processor for full-text queries. Before the query processor can
search for the keywords in the index, the keywords must be transformed into
matching tokens. For this transformation, the full-text query processor launches
the word breakers, stemmers, and noise word files discussed previously in this
chapter. It also interrogates a thesaurus, which returns a list of synonyms that are
also searched.
It is this extended capability for matching on derivatives as well as synonyms that
makes full-text searches so powerful and flexible—capable of even handling common
word misspellings. Let’s look at each of the full-text query keywords in turn.
C2162271X.fm Page 828 Friday, April 29, 2005 8:07 PM
Lesson 4: Querying Data by Using a Full-Text Index 829
FREETEXT
The FREETEXT function accepts one or more columns to search and a search argu-
ment. This function performs a fuzzy search in that it automatically searches for inflec-
tional forms (stemming) as well as related words that the thesaurus identifies. The
general syntax of the FREETEXT function is as follows:
FREETEXT ( { column_name |(column_list)|*}

,'freetext_string' [ , LANGUAGE language_term ])
This function does not provide the customization or precision that you typically want
in production applications. For example, searching for the keyword “bike” by using
the FREETEXT function would return “bike”, “biker”, “bike riding”, “bike-riding”, and
various synonyms as well as any word that contains the word “bike”.
Let’s say you execute the following query against the Production.ProductDescription
table using first the FREETEXT function and then the CONTAINS function:
SELECT ProductDescriptionID, Description FROM Production.ProductDescription
WHERE FREETEXT(Description, N'bike');

SELECT ProductDescriptionID, Description FROM Production.ProductDescription
WHERE CONTAINS(Description, N'bike');
The FREETEXT query returns 16 rows of data as opposed to the 14 rows that the
CONTAINS query returns.
Search Argument Data Types for Full-Text Functions
When the SQL Server query optimizer decides how to efficiently satisfy a query,
it uses an inspection algorithm that looks at value-distribution statistics to deter-
mine whether an index should be used for the query. The full-text engine also
maintains a set of distribution statistics.
The optimization process includes an algorithm generically referred to as
“parameter sniffing,” which can handle explicit values as well as values con-
tained within variables. The algorithm gives each potential path a basic score
that indicates how selective a given path is for a query. These numbers are then
used to determine whether an index seek, index scan, nested loop, or other
method is used to satisfy a segment of a query.
What does parameter sniffing have to do with full-text queries? When the full-text
optimizer cannot use parameter sniffing for one reason or another, it essentially
C2162271X.fm Page 829 Friday, April 29, 2005 8:07 PM
830 Chapter 21 Creating Full-Text Catalogs
makes a guess. This educated guess assigns a value based on the number of rows

in the table up to a maximum value. In SQL Server 2000, the maximum value for
a full-text function was 1,000. In SQL Server 2005, this value has been increased
to 10,000. Obviously, guessing in the context of such a large maximum value has
a significant potential to generate an inefficient query plan.
You can inspect these estimated values by looking in the TotalSubtreeCost col-
umn after using the SET STATISTICS PROFILE ON command.
Many people unknowingly prevent the optimizer from using parameter sniffing
by the search arguments they use with the full-text functions. The full-text func-
tions expect a Unicode data type for the search argument. Failure to pass in a
Unicode argument prevents the optimizer from using parameter sniffing to eval-
uate distribution statistics. On large tables, this problem is magnified, forcing
the optimizer to make an educated guess that can result in a suboptimal query
plan.
FREETEXTTABLE
The FREETEXTTABLE function works exactly like the FREETEXT function except that
it returns a rowset that contains a rank column. The RANK column provides a
numeric value between 1 and 1,000 that is a relative number indicating how well the
row matches the search criteria. The KEY column returns the unique key that is used
to identify the row. The general syntax of this function is as follows:
FREETEXTTABLE (table ,{column_name |(column_list)|*}

,'freetext_string'

[ ,LANGUAGE language_term ]

[,top_n_by_rank ])
The FREETEXTTABLE version of the previous FREETEXT query would look like this:
SELECT PD.ProductDescriptionID, PD.Description, KEYTBL.[KEY], KEYTBL.RANK
from Production.ProductDescription AS PD


INNER JOIN FREETEXTTABLE(Production.ProductDescription,Description,N'bike')
AS KEYTBL ON PD.ProductDescriptionID = KEYTBL.[KEY];
Note that the column named KEY in the result set must be enclosed in brackets
because it is a Transact-SQL keyword. A higher value for RANK indicates a less-precise
C2162271X.fm Page 830 Friday, April 29, 2005 8:07 PM
Lesson 4: Querying Data by Using a Full-Text Index 831
match to the search terms. The results of this sample query provide additional insight
into why the FREETEXT and CONTAINS queries return different results. Keys 1187
and 1188 provide close but not exact matches by returning rows that have words with
a substring of the search term.
CONTAINS
The CONTAINS function lets you use precise as well as fuzzy matching algorithms to
satisfy full-text queries. As you can see from the following general syntax for the func-
tion, it accepts a variety of parameters to let you specify exact behaviors:
CONTAINS
({column_name |(column_list)|*}
, '< contains_search_condition >'
[ , LANGUAGE language_term ]
)
< contains_search_condition > ::=
{ < simple_term >
| < prefix_term >
| < generation_term >
| < proximity_term >
| < weighted_term >
}
| { ( < contains_search_condition > )
[{<AND>|<ANDNOT>|<OR>}]
< contains_search_condition > [ n ]
}

< simple_term > ::=
word | " phrase "
< prefix term > ::=
{ "word * " | "phrase *" }
< generation_term > ::=
FORMSOF ( { INFLECTIONAL | THESAURUS } , < simple_term > [ , n ])
< proximity_term > ::=
{ < simple_term > | < prefix_term > }
{{NEAR|~}
{ < simple_term > | < prefix_term > }
} [ n ]
< weighted_term > ::=
ISABOUT
({{
< simple_term >
| < prefix_term >
| < generation_term >
| < proximity_term >
}
[ WEIGHT ( weight_value )]
}[, n ]
)
C2162271X.fm Page 831 Friday, April 29, 2005 8:07 PM
832 Chapter 21 Creating Full-Text Catalogs
< AND > ::=
{AND|&}
< AND NOT > ::=
{ANDNOT|&!}
<OR>::=
{OR||}

You can specify search arguments as exact matches or as prefixes. The following
query, for example, finds all rows that have an exact match for the word “bike”:
SELECT ProductDescriptionID, Description FROM Production.ProductDescription
WHERE CONTAINS(Description, N'');
The next query returns rows that have an exact match for the word “bike” and rows
that contain any words that start with “bike”. You specify “bike” as a prefix by using
an asterisk (*) after the term and enclosing the search term in double quotation
marks:
SELECT ProductDescriptionID, Description FROM Production.ProductDescription
WHERE CONTAINS(Description, N'"bike*"');
The keywords FORMSOF, INFLECTIONAL, and THESAURUS allow matches on vari-
ants of a search term. INFLECTIONAL causes the search to consider word stems in a
search. For example, searching for the word “drive” will also produce matches on
“drove”, “driven”, “driving”, and so on. By specifying the use of a THESAURUS, the
query processor also returns synonyms as matches for the search term. For example,
“metal” also returns results for “gold”, “aluminum”, “steel”, and so on.
Examples of each of these queries are as follows:
SELECT ProductDescriptionID, Description FROM Production.ProductDescription
WHERE CONTAINS(Description, N' FORMSOF (INFLECTIONAL, drive) ');

SELECT ProductDescriptionID, Description FROM Production.ProductDescription
WHERE CONTAINS(Description, N' FORMSOF (THESAURUS, metal) ');
NOTE Thesaurus files
All thesaurus files are shipped empty. For thesaurus matches to work, these files must be popu-
lated. All thesaurus files are XML documents; you can find them in the $SQL_Server_Install_Path\
Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\ directory. For information about populating
thesaurus files, see the SQL Server 2005 Books Online article “Configuring Thesaurus Files.”
Word proximity is a common way of searching documents for multiple keywords
or phrases. This type of query uses the NEAR (
~

) keyword. The closer words are to
C2162271X.fm Page 832 Friday, April 29, 2005 8:07 PM
Lesson 4: Querying Data by Using a Full-Text Index 833
each other, the better the match for these types of queries. The proximity is used as
part of the RANK calculation for rows matching the search criteria. This keyword
is rarely used with the CONTAINS predicate because the rank of matched results
cannot be evaluated directly. The following two queries are equivalent to each
other:
SELECT ProductDescriptionID, Description FROM Production.ProductDescription
WHERE CONTAINS(Description, N'mountain NEAR bike');

SELECT ProductDescriptionID, Description FROM Production.ProductDescription
WHERE CONTAINS(Description, N'mountain ~ bike');
As if all these options were not enough, you can also assign relative weights to par-
ticular search terms. This weighting affects the ranking score that the full-text opti-
mizer returns by causing a particular term to be considered more or less significant.
The WEIGHT clause, a value between 0.0 and 1.0, has no effect on queries that use
CONTAINS, but it does affect the RANK value returned with CONTAINSTABLE.
Because the AdventureWorks sample database deals with bikes, the following query
uses weighting to place more emphasis on the word “mountain” than the word
“bike”:
SELECT Description FROM Production.ProductDescription
WHERE CONTAINS(Description,'ISABOUT (mountain weight (.8), bike weight (.2) )' );
Also note that you can use multiple keywords in a search. To specify multiple key-
words, you separate the terms by the keywords AND, AND NOT, or OR to include or
exclude rows.
CONTAINSTABLE
The CONTAINSTABLE function has the same capabilities as the CONTAINS function.
However, like the FREETEXTTABLE function, it returns a rowset that contains a
RANK and a KEY column that can be used to return the best matches to a search. The

general syntax for this function is the following:
CONTAINSTABLE ( table ,{column_name |(column_list )|*},'
< contains_search_condition > '
[ , LANGUAGE language_ter m]
[,top_n_by_rank ]
)
C2162271X.fm Page 833 Friday, April 29, 2005 8:07 PM
834 Chapter 21 Creating Full-Text Catalogs
Quick Check
■ What is the difference between the CONTAINS and FREETEXT functions?
Quick Check Answer
■ FREETEXT is a less-precise way of querying full-text data because it auto-
matically searches for all forms and synonyms of a word or words. CON-
TAINS allows a precise specification for a query, including the capability to
search by word proximity, weighting, and complex pattern matching.
PRACTICE Query a Full-Text Index
In this practice, you execute a query by using the full-text indexes that you previously
created.
1. If necessary, launch SSMS, connect to your instance, and open a new query
window.
2. Execute the following batch:
SELECT ProductDescriptionID, Description FROM Production.ProductDescription
WHERE CONTAINS(Description, 'alloy');

SELECT ProductDescriptionID, Description FROM Production.ProductDescription
WHERE CONTAINS(Description, 'same');
3. Why does the first query return results, whereas the second query does not?
Lesson Summary
■ SQL Server passes full-text queries to the full-text search engine, which routes
search terms through the dedicated query processor for full-text queries.

■ The full-text query processor provides extended capability for matching on
derivatives as well as synonyms, making full-text searches powerful, flexible, and
even capable of handling common word misspellings.
■ You can use the FREETEXT and FREETEXTTABLE full-text functions to provide
a general sampling of rows that might match the search argument.
■ You use the CONTAINS and CONTAINSTABLE functions in production applica-
tions to allow for very precise criteria to target search results.
C2162271X.fm Page 834 Friday, April 29, 2005 8:07 PM
Lesson 4: Querying Data by Using a Full-Text Index 835
Lesson Review
The following questions are intended to reinforce key information presented in this
lesson. The questions are also available on the companion CD if you prefer to review
them in electronic form.
NOTE Answers
Answers to these questions and explanations of why each answer choice is right or wrong are
located in the “Answers” section at the end of the book.
1. Which of the following is a valid option for a FREETEXT or FREETEXTTABLE
query?
A. THESAURUS
B. NEAR
C. WEIGHT
D. LANGUAGE
C2162271X.fm Page 835 Friday, April 29, 2005 8:07 PM
836 Chapter 21 Review
Chapter Review
To further practice and reinforce the skills you learned in this chapter, you can
■ Review the chapter summary.
■ Review the list of key terms introduced in this chapter.
■ Complete the case scenario. This scenario sets up a real-world situation involv-
ing the topics of this chapter and asks you to create a solution.

■ Complete the suggested practices.
■ Take a practice test.
Chapter Summary
■ SQL Server’s Full-Text Search component, based on full-text indexes, lets you
efficiently query unstructured data stored within SQL Server.
■ To implement full-text indexing, you need to take the following steps:
❑ Create a full-text catalog to contain the full-text indexes.
❑ Create one or more full-text indexes within a full-text catalog.
❑ Specify a method to populate the full-text indexes to keep them up to date
with underlying data.
■ After you create the full-text indexes, you can execute full-text queries by using
the CONTAINS, CONTAINSTABLE, FREETEXT, and FREETEXTTABLE functions.
Key Terms
Do you know what these key terms mean? You can check your answers by looking up
the terms in the glossary at the end of the book.
■ catalog population
■ crawl
■ filter
■ full-text catalog
■ full-text index
■ helper service
■ index population
C2162271X.fm Page 836 Friday, April 29, 2005 8:07 PM
Chapter 21 Review 837
■ language file
■ noise words
■ protocol handler
■ token
■ word breaker
Case Scenario: Building Full-Text Indexes

In the following case scenario, you apply what you’ve learned in this chapter. You can
find answers to these questions in the “Answers” section at the end of this book.
Contoso Limited, a health care company located in Bothell, WA, maintains a large
database of patient claims records. Each patient claim contains documents for the ini-
tial claim, documentation to justify the claim, and supporting documents such as doc-
tor evaluations and records, as well as documents that describe treatments. Contoso
has captured and stored all this data within a SQL Server 2005 database. Data exists
in the database in a variety of formats: as discrete data in varchar columns, as Word
documents, as PDFs, and as scanned images (OCR to text) in image columns.
Now Contoso wants to add several new features within the existing application to
enhance the company’s analysis capabilities. The company needs to implement a
fraud-detection system to find out whether particular doctors are involved in numer-
ous claims, far beyond normal. After doctors in this group are identified, an analyst
needs to be able to cross-reference diagnoses and claims by using flexible criteria.
Contoso also needs to be able to check a claim against prior records for the same
patient to determine whether this is a recurring injury and whether previous claims
were accepted or rejected. The company also wants to analyze claim volumes by com-
pany as well as break down the claim amounts by specific type of injury.
How would you implement these features into the application by using the least
amount of time and effort?
Suggested Practices
To help you successfully master the exam objectives presented in this chapter, com-
plete the following practice tasks.
C2162271X.fm Page 837 Friday, April 29, 2005 8:07 PM

×