Professional
Microsoft® Search
SharePoint® 2007 and Search Server 2008
Thomas Rizzo
Richard Riley
Shane Young
Wiley Publishing, Inc.
Professional Microsoft® Search
Introduction
Chapter 1: Introduction to Enterprise Search
Chapter 2: Overview of Microsoft Enterprise Search Products
Chapter 3: Planning and Deploying an Enterprise Search Solution
Chapter 4: Configuring and Administering Search
Chapter 5: Searching LOB Systems with the BDC
Chapter 6: User Profiles and People Search
Chapter 7: Extending Search with Federation
Chapter 8: Securing Your Search Results
Chapter 9: Customizing the Search Experience
Chapter 10: Understanding and Tuning Relevance
Chapter 11: Building Applications with the Search API and Web Services
Index
Professional Microsoft® Search:
SharePoint® 2007 and Search Server 2008
Published by
Wiley Publishing, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com
Copyright © 2008 by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-0-470-27933-5
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1
Library of Congress Cataloging-in-Publication Data
Rizzo, Thomas, 1972-
Professional Microsoft SharePoint search / Thomas Rizzo, Richard Riley, Shane Young.
p. cm.
Includes index.
ISBN 978-0-470-27933-5 (paper/website)
1. Querying (Computer science)--Computer programs. 2. Business enterprises--Computer networks.
3. Intranet programming. 4. Microsoft SharePoint (Electronic resource) 5. Search engines--Computer
programs. 6. Internet searching--Computer programs. I. Riley, Richard, 1973- II. Young, Shane, 1977- III. Title.
QA76.625.R58 2008
006.7'6—dc22
2008029091
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections
107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or
authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood
Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be
addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317)
572-3447, fax (317) 572-4355.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties
with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties,
including without limitation warranties of fitness for a particular purpose. No warranty may be created or
extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for
every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal,
accounting, or other professional services. If professional assistance is required, the services of a competent
professional person should be sought. Neither the publisher nor the author shall be liable for damages arising
herefrom. The fact that an organization or Website is referred to in this work as a citation and/or a potential source
of further information does not mean that the author or the publisher endorses the information the organization or
Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites
listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services please contact our Customer Care Department within
the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Trademarks: Wiley, the Wiley logo, Wrox, the Wrox logo, Wrox Programmer to Programmer, and related trade dress
are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other
countries, and may not be used without written permission. Microsoft and SharePoint are registered trademarks of
Microsoft Corporation in the United States and/or other countries. All other trademarks are the property of their
respective owners. Wiley Publishing, Inc. is not associated with any product or vendor mentioned in this book.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be
available in electronic books.
For my lovely daughter, Lexi, and my amazing wife, Stacy, this book is dedicated to you, for the sacrifices you made
and the support you gave me throughout the process. Also, for her example of strength and courage in the face of
fierce adversity, this book is dedicated to Dyana.
—Tom Rizzo
For my incredibly understanding wife, Sarah, and growing bump, thank you for putting up with me over the past
few months and not complaining when I’ve been doing this instead of what I should have been doing;
I promise I’ll paint the nursery now!
—Richard Riley
About the Authors
Tom Rizzo is a director in the Microsoft SharePoint product management team. Before joining the
SharePoint team, Tom worked in the Microsoft Exchange and SQL Server product management teams.
Tom is the author of six development books on a range of Microsoft technologies.
Richard Riley is a senior technical product manager in the Microsoft SharePoint product management
team. He is responsible for driving Technical Readiness, both within and outside of Microsoft, and
specializes in Search Server 2008 and the Search features of SharePoint Server 2007. He has more than
seven years of experience at Microsoft and has worked as a consultant in Microsoft Consultancy
Services, and as a technical specialist in sales. He has over 10 years of industry experience and is a
frequent speaker at Microsoft Technical Events.
Shane Young is the owner of SharePoint911. He has over 12 years of experience designing and
administering large-scale server farms using Microsoft enterprise technologies. For the past three years,
he has been working exclusively with SharePoint products and technologies as a consultant and trainer
for www.SharePoint911.com. Shane has been recognized by Microsoft as an authority on SharePoint
and is among an elite group of Microsoft Office SharePoint Server 2007 MVPs. Shane also maintains a
popular SharePoint-focused blog, which contains a lot of beneficial
technical information about SharePoint administration.
About the Technical Editor
Andrew Edney has been an IT professional for more than twelve years and has worked for a range of
high-tech companies, including Microsoft, Hewlett-Packard, and Fujitsu Services. He has a wide range
of experience in virtually all aspects of Microsoft’s computing solutions, having designed and built large
enterprise solutions for government and private-sector customers. Andrew is also a well-known speaker
and presenter on a wide range of information systems subjects. He has appeared at the annual Microsoft
Exchange Conference in Nice. Andrew is currently involved in numerous Microsoft beta programs,
including next-generation Windows operating systems and next-generation Microsoft Office products,
and he actively participates in all Windows Media Center beta programs. In addition, Andrew has
written a number of books, including Windows Home Server User’s Guide (Apress, 2007), Pro LCS: Live
Communications Server Administration (Apress, 2007), Getting More from Your Microsoft Xbox 360 (Bernard
Babani, 2006), How to Set Up Your Home or Small Business Network (Bernard Babani, 2006), Using Microsoft
Windows XP Media Center 2005 (Bernard Babani, 2006), Windows Vista: An Ultimate Guide (Bernard Babani,
2007), PowerPoint 2007 in Easy Steps (Computer Step, 2007), Windows Vista Media Center in Easy Steps
(Computer Step, 2007) and Using Ubuntu Linux (Bernard Babani, 2007).
Credits
Acquisitions Editor: Katie Mohr
Development Editor: Christopher J. Rivera
Technical Editor: Andrew Edney
Production Editor: Debra Banninger
Copy Editor: Foxxe Editorial Services
Editorial Manager: Mary Beth Wakefield
Production Manager: Tim Tate
Vice President and Executive Group Publisher: Richard Swadley
Vice President and Executive Publisher: Joseph B. Wikert
Project Coordinator, Cover: Lynsey Stanford
Proofreader: Nancy Carrasco
Indexer: Jack J. Lewis
Acknowledgments
There are a lot of folks to acknowledge who helped make this book possible. If I miss anyone, I
apologize! First, I want to thank Jim Minatel, Katie Mohr, and Christopher Rivera at Wiley. The three of
them made this book possible and also pushed us along in the process at the right times. I also want to
thank our production editor Debra Banninger and our technical editor Andrew Edney. Both of them
made our words and technical concepts crystal clear. I also want to thank my coauthors who went on
this exciting and chaotic journey with me. Finally, I want to thank the SharePoint search team at
Microsoft. They are one of the most dedicated teams in delivering high-quality, customer-centric
solutions and are always willing to answer questions or provide feedback.
—Tom Rizzo
Writing a book takes much more than one person and a keyboard, and this one is no exception. I’d like to
say a huge thank you to the very patient team at Wiley, particularly Katie Mohr and Christopher Rivera,
and my coauthors, who I’m sure were all quietly tearing their hair out at my habitual lateness with
content (including this page). I’d also like to say a heartfelt thanks to my colleagues in the Search team at
Microsoft, whom I’ve repeatedly peppered with questions: Puneet Narula, Keller Smith, Sage Kitamorn,
Sid Shah, Dan Blood, Michal Gideoni, Dmitriy Meyerzon, Karen Beattie Massey, Dan Evers, and Brenda
Carter. Last, but definitely not least, a thank you to Steve Caravajal, who rescued me from a deep hole
with the People Search chapter — I owe you one.
—Richard Riley
I would like to thank the SharePoint MVPs, my friends on the Microsoft product team, and the awesome
staff at SharePoint911. I want to send out a special thanks to my wife, Nicola. Without her understanding
and support, writing two books at the same time would never have been possible. Also, I have to send a
shout out to my two dogs, Tyson and Pugsley. I am sure I missed out on several rounds of throwing the
ball while I was busy typing away, but through thick and thin, they lay at my feet. I love you little
Sparky!
—Shane Young
Contents
Introduction
Chapter 1: Introduction to Enterprise Search
Why Enterprise Search
A Tale of Two Content Types
Security, Security, Security
Algorithms to the Rescue
We All Love the Web and HTTP
Conclusion
Chapter 2: Overview of Microsoft Enterprise Search Products
Enterprise Search Product Overviews
Windows Desktop Search/Windows Vista
Features in Windows Vista Search
Windows SharePoint Services
SharePoint Search Architecture
Crawling Content
Searching Content
Configuring Search
Platform Services
Microsoft Search Server 2008
Simplified Setup and Administration
Federation Capabilities
Different Editions of Search Server 2008
What about WSS and Microsoft Office SharePoint Server?
Microsoft Office SharePoint Server
People Search
Business Data Catalog
The Microsoft Filter Pack
Connectors for Documentum and FileNet
Windows Live
FAST and SharePoint
Other Server Products (Exchange, SQL)
Conclusion
Chapter 3: Planning and Deploying an Enterprise Search Solution
Key Components
The Index Role
The Query Role
The Shared Services Provider
The Database Server
Search Topologies
Single Server
A Small Farm
A Three-Server Farm
A Medium Server Farm
Larger Farms
Search Software Boundaries
Hardware Sizing Considerations
The Index Server
Query Servers
Database Servers
Testing
Performance Monitoring
Search Backups
Index Server Recovery Options
Using Federation to Scale?
Conclusion
Chapter 4: Configuring and Administering Search
Configuring Search from Central Administration
The Search Services
The Office SharePoint Server Search Service
Windows SharePoint Services Search
Manage Search Service
Manage Content Database — Search Server 2008
Configuring Search from the Shared Services Provider
Creating or Editing the SSP Settings
SSP Search Administration
The Default Content Source
Full versus Incremental Crawls
Search Schedule
Additional Content Sources
Interacting with a Content Source
Crawl Rules
Crawl Logs
File Types
Reset All Crawled Content
Search Alerts
Authoritative Pages
Federated Locations
Managed Properties
Shared Search Scopes
Server Name Mappings
Search Result Removal
Search Reporting
The Other Search Settings
Configuring Search Locally on the Server
IFilters
Installing the Microsoft Filter Pack
Maximum Crawl Size
Reset the Search Services
Crawling Case-Sensitive Web Sites
Diacritic-Sensitive Search
Conclusion
Chapter 5: Searching LOB Systems with the BDC
BDC Architecture and Benefits
The Application Definition File
XSD Schema File
BDC Definition Editor Tool
BDC Metadata Model Overview
MetadataObject Base Class
LobSystem
LobSystemInstances and LobSystemInstance
Entities and Entity Element
Identifiers and Identifier Element
Methods and Method Element
Parameters and Parameter Element
FilterDescriptors and FilterDescriptor Element
Actions, Action, and ActionParameter Elements
MethodInstance Element
TypeDescriptors, TypeDescriptor, DefaultValue Elements
Associations and Association Element
Complete BDC XML Samples
BDC Web Parts, Lists, and Filters
Business Data List Web Part
Business Data Related List Web Part
Business Data Item Web Part
Business Data Actions Web Part
Business Data Item Builder Web Part
BDC in SharePoint Lists
Modifying Your BDC Profile Page
Searching the BDC
Adding a Content Source for Crawling BDC Data
Mapping Crawled Properties
Create a Search Scope
SharePoint Designer and the BDC
The BDC API
The BDC Assemblies
The Microsoft.Office.Server Namespaces
Putting It Together: Building Custom Applications for the BDC
Connecting to the Shared Services Database
Displaying LOBSystemInstances
Working with Entities
Working with an Entity – Finders, Fields, and Methods
Executing a Method and Displaying the Results
Working with Associations and Actions
Troubleshooting the BDC
Conclusion
Chapter 6: User Profiles and People Search
User Profiles
Managing User Profiles
Profile Services Connections
Configuring Profile Imports
Profile Properties
Configuring Profile Properties
BDC Supplemental Properties
Configuring for BDC Import
People Search
The Search Center
People Search Page and Tab
Results Page
Customizing People Search
Customizing the Advanced Search Options Pane
Adding Custom Properties to the People Results Page
Summary
Chapter 7: Extending Search with Federation
The Concept
Crawl and Index or Federate?
Unified Result Set
Freshness of Results
Avoiding “Double Indexing” Content
Content outside Your Firewall
Federated Search Locations
Triggers
Always
Prefix
Pattern
Using Named Capture Groups to Send Only Specific Query Terms
Location Type
Query Template
OpenSearch 1.0/1.1
Required and Optional Query Template Tokens
Search Index on this Server
More Results Link
Display Information
Specifying a Branding Icon
Customizing the Title of a Federated Results Web Part
Writing Custom XSL
Properties
Sample Data
Restrictions
Credentials
Anonymous
Common
User
Configuring Search Server 2008 for “User” Authentication
Configuring SharePoint Server for User Authentication
Security Trimming Federated Results
Federation Web Parts
Federated Search Results Web Part
Federated Search Results Web Part Properties
Location
Results per Page
Use Location Visualization
Retrieve Results Asynchronously
Results Query Options
Show More Results Link
Appearance Settings
Caching Location Settings
Top Federated Results Web Part
Multiple Locations
Retrieve Results Asynchronously
Summary
Chapter 8: Securing Your Search Results
Security Architecture in SharePoint
Best Bets
Controlling Indexing to Secure Content
What about IRM-Protected Documents?
Custom Security Trimmers
Implementing a Custom Security Trimmer
Registering the Custom Security Trimmer
Performance Considerations
Bringing It All Together: Building a Custom Security Trimmer
Getting Started
Signing Your DLL and Adding It to the GAC
Registering Your Custom Security Trimmer
Debugging Your Custom Security Trimmer
BDC Security Trimming
Authentication and the BDC
BDC and Search
Performance Implications
Writing Your Custom Security Trimmer
Deploying Your BDC Security Trimmer
Debugging Your BDC Security Trimmer
Securing Your Search Server 2008
Default Content Access Account
Single Server Deployment
Server Farm Deployment
Conclusion
Chapter 9: Customizing the Search Experience
Describing the Flow of a Typical Search
Customizing the Search Center — No Code
Tab User Interface
Thesaurus and Synonyms
Restarting the SharePoint Search Service
Stemming
Customizing People Results
SharePoint Designer Support
Customizing Your Search Results — XSLT
Stepping through the XSLT
Root Template
Parameter Section
No Search Results
Search Results
Hit Highlighting Template
Display Size Template
Display String Template
Document Collapsing Template
More Results for Fixed Query
Discovered Definitions
Working with SPD to Create Your XSLT
Customizing Hit Highlighting in Search
Adding and Rendering Custom Properties
Adding a New Managed Property
Customizing Core and Advanced Search Results
Customizing People Search and Results with Custom Properties
Using Fixed Queries
Adding Custom Web Parts
Search Community Toolkit
Conclusion
Chapter 10: Understanding and Tuning Relevance
What Is Relevance?
Built-In Ranking Elements and Algorithms
Understanding and Tuning Relevance
Things You Can Change outside of the Ranking Algorithms
Content Management
Language
Understanding Users’ Query Behavior with Query Logs