Full Text Indexing – Part 1
Posted by Josh | Posted in Full Text Indexing, Notes From The Lab | Posted on 01-05-2011
Tags: Full-Text Index, Lab, SQL Server
3
Introduction
This series of lab notes will look at Full Text Indexes in some detail, and, at least initially, will largely cover the material I’m reading for the MCTS 70-432 exam.
What is Full Text Indexing?
From Books On-Line:
SQL Server 2008 provides the functionality for applications and users to issue full-text queries against character-based data in SQL Server tables. Before full-text queries can be run on a given table, the database administrator must create a full-text index on the table. The full-text index includes one or more character-based columns in the table. These columns can have any of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, or varbinary(max). Each full-text index indexes one or more columns from the base table, and each column can have a specific language. Beginning in SQL Server 2008, full-text search supports more than 50 diverse languages, such as English, Spanish, Chinese, Japanese, Arabic, Bengali, and Hindi.
Basically, Full Text Indexing allows for users to perform text based searching against large data fields, in what I would call a search-engine-like style.
Where Would I Use It?
I can think of several cases where Full-Text indexing could be useful, such as:
- Browsing a catalog
- Searching a technical manual or book
- Querying against a trouble ticket system to, for example, find older instances of a currently occurring problem (like searching for “Transaction log full”).
Steps to Setup Full-Text Indexing
Create The Full-Text Catalog
First, we need to create the catalog. This is the storage structure where all the full-text indexes are stored. The CREATE FULLTEXT CATALOG syntax is used, and takes parameters such as FILEGROUP, IN PATH (which is deprecated and should not be used), DEFAULT (to specify that the created catalog should be used as the default one, similar to file groups), and AUTHORIZATION (to specify an owner of the catalog).
Example
CREATE FULLTEXT CATALOG [FTC_Production] ON FILEGROUP [Primary] AS DEFAULT
This creates a catalog called FTC_Production, stored on the Primary filegroup, marked as the default full-text catalog. Because no AUTHORIZATION clause is specified, the catalog will be owned by the ‘dbo’ user.
I’m thinking an avenue to investigate in a later session is what benefits there are to segregating the full text catalog onto another filegroup, such as recovery and / or balancing of IO. The MCTS book specifically recommends this as a best practice.
Note that in SQL 2008+, the entire catalog is stored within the database (hence the deprecated IN PATH option). While I did not see anything in the MCTS book, I would guess that means easier recoverability and backup.
Creating the Index
Full-text indexes can be created on several column types, including:
- VARCHAR/CHAR
- XML
- VARBINARY
Indexes on CHAR based columns use direct parsing, whereas XML and BINARY columns are parsed using special processors. In the case of files stored in VARBINARY(MAX) columns, SQL can understand and parse numerous standard file formats, such as HTML and most Office documents.
At this point I had a “WOW” moment. I had no idea full text search could parse and read binary documents. That is pretty darn cool and I could see how that might be of tremendous value to those using SQL Server.
The index engine users special “helper services” (MCTS’s language) called stemmers and breakers to divide and conquer the contents of the indexed columns. You can exclude common words from being indexed (the examples given include “a”, “the”, etc) using the stop words functionality.
To create the index we use the CREATE FULLTEXT INDEX syntax, and specify options for the column to be included, the key column (which must uniquely identify a row), the name of the full-text catalog in which the index should be stored (which was created above), and the option for change tracking. Some notes about these options:
- While you can specify more than one column in a full-text index definition, only one index per table can be created.
- The
CHANGE_TRACKINGoption is particularly important, because unless you set it toAUTO, you will have to manually schedule population of the full-text index.
Example
CREATE FULLTEXT INDEX ON Production.ProductDescription(Description) KEY INDEX PK_ProductDescription_ProductDescriptionID ON FTC_Production WITH CHANGE_TRACKING = AUTO
This will create a full text index on the Production.ProductDescription table, containing the Description column, with the PK_ProductDescription_ProductDescriptionID index serving as the unique key. The index will be stored in the FTC_Production catalog, and will be automatically populated thanks to the CHANGE_TRACKING = AUTO option.
Interestingly enough, here I ran into an odd choice of error message on Microsoft’s part. Let’s say we accidentally mistype the name of the primary key index, instead calling it PK_ProductDescription_ProductionDescriptionID. Now, you’d think that the statement would fail with something along the lines of “Index ‘PK_ProductDescription_ProductionDescriptionID’ does not exist.” Instead, we get:
Msg 7653, Level 16, State 1, Line 1
'PK_ProductDescription_ProductionDescriptionID' is not a valid index to enforce a full-text search key. A full-text search key must be a unique, non-nullable, single-column index which is not offline, is not defined on a non-deterministic or imprecise nonpersisted computed column, does not have a filter, and has maximum size of 900 bytes. Choose another index for the full-text key.
This lead to me chasing my tail for a good 10 minutes before realizing I’d just fat fingered the index name. Now yes, it does say ” ‘Index_xxx’ is not a valid index…”, but I’d call that language pretty ambiguous. I could easily interpret it as “The index exists, but is not valid for use as a full text key.”, as opposed to “The index does not exist at all, and therefore is not valid for use as a full text key.” Fail trombone for that design decision IMHO.
Summary
At this point the index has been created and is online. Next lab we’ll try our hand at querying the index using various sample business use scenarios.


This appears to come from Mike Hotek’s MCTS training kit.
@DBA – right you are, that excellent book is one of my major study points. It’s a great walk through, though I will say you really do need to check out the corrections / comments KB to be sure you’re getting accurate information.
[...] problem? Well, because I’ve got many many other things that I should be learning about, like Full-Text indexes, and the Entity Framework. And if I keep on finding more things to take my attention, I’ll [...]