DDL in Stored Procedures and The Magic Of WITH EXECUTE AS

Posted by Josh | Posted in Notes From The Lab, SQL Server | Posted on 05-04-2011

Tags: ,

1

Twice over the last week I’ve run into a scenario where a user is trying to execute a stored procedure and gets a message similar to the following:

Msg 1088, Level 16, State 7, Procedure p_TruncateTest, Line 5
Cannot find the object "TestTable" because it does not exist or you do not have permissions.

At first this was a bit puzzling, because the user in question had both EXECUTE rights on the stored procedure, along with VIEW DEFINITION rights across the schema. I’m used to the idea that as long as a user is granted rights to execute a procedure, and the procedure is owned by the same principal as the underlying object being manipulated, no other permissions are checked. It’s one of the fundamental reasons why stored procedures are such a good practice.

In this case though, things were a little more complex. It turns out that the line in question was issuing a TRUNCATE statement. Now, if this were executed via dynamic SQL, I could understand this being an issue. But it was hardcoded; nothing dynamic about it. That being the case, I wanted to see if the behavior was reproducible.

To the lab!

CREATE TABLE dbo.TestTable (c CHAR(1));
CREATE USER TruncateUser WITHOUT LOGIN;
GO

CREATE PROCEDURE dbo.p_TruncateTest
AS
BEGIN
	TRUNCATE TABLE TestDB;
END
GO
GRANT EXECUTE ON dbo.p_TruncateTest TO [TruncateUser];
GO

EXECUTE AS USER = 'TruncateUser';
EXEC dbo.p_TruncateTest;

Sure enough, the same message popped up. So it seems the TRUNCATE statement does not behave the same way as other DML style statements, such as SELECT,INSERT,UPDATE,DELETE. So that got me thinking: was this the same for other DDL statements?

REVERT;
GO

CREATE PROCEDURE dbo.p_AlterTest
AS
BEGIN
	ALTER TABLE TestTable ADD c2 CHAR(1);
END
GO
GRANT EXECUTE ON dbo.p_AlterTest TO TruncateUser;

EXECUTE AS USER = 'TruncateUser';
EXEC dbo.p_AlterTest;
GO

After executing this I was greeted with the same exact error message.

Ok, so the behavior is consistent with other forms of DDL. That being the case, I needed to find some alternatives to actually granting the user the full ALTER rights on the objects in question, since that would allow them to do all sorts of other nasty things.

Enter EXECUTE AS.

From this BOL article, we see that we can use this nifty clause to effectively elevate a calling user’s rights for the duration of the stored procedure execution. In this case, we’re going to have the procedure execute as the ‘dbo’ user. Normally I’d say it’s a better practice to have the execution be under a less privileged user, such as one that owns the application schema, and not one in the db_owner role.

REVERT;
GO

ALTER PROCEDURE dbo.p_TruncateTest
WITH EXECUTE AS 'dbo'
AS
BEGIN
	TRUNCATE TABLE dbo.TestTable;
END
GO

EXECUTE AS USER = 'TruncateUser';
EXEC dbo.p_TruncateTest;
GO

Voila! The procedure executes without issue.

Now there is one catch here: when using that WITH EXECUTE AS clause you have be careful about how you determine the name of the user performing the operation. If, for example, you use the CURRENT_USER function to log who issued the command, what you’ll see is the user named in the WITH EXECUTE AS clause, not the calling user. Instead, make sure you use the ORIGINAL_LOGIN function. Thanks to Kimberly Tripp for pointing this out.

HA? We don’t need no stinkin’ HA (Part 3)

Posted by Josh | Posted in SQL Server | Posted on 04-04-2011

Tags: , ,

0

This is part three of a series cataloging my search for the ideal High Availability solution for my development and QA environments. The other parts:

  1. Part 1 – an introduction to why dismissing the need for HA in development / test systems is not necessarily the right decision.
  2. Part 2 – a quick poll of the community to see what others are doing.

In part three, we’re going to take a look at the first option for providing HA: clustering.

So what exactly is a cluster? From MSDN:

A cluster is a group of independent computer systems, referred to as nodes, working together as a unified computing resource. A cluster provides a single name for clients to use and a single administrative interface, and it guarantees that data is consistent across nodes.

Windows Clustering encompasses two different clustering technologies. These technologies implement the following two types of clusters.

  • network load balancing cluster filters and distributes TCP/IP traffic across a range of nodes, regulating connection load according to administrator-defined port rules.
  • failover cluster provides high availability for services, applications, and other resources through an architecture that maintains a consistent image of the cluster on all nodes and that allows nodes to transfer resource ownership on demand.

In this post we’re going to look exclusively at failover clustering. Network load balancing will be looked at in a later post, combined with the log-shipping HA methodology.

Briefly, a SQL failover cluster is composed of one or more (yes, you can make a one node cluster, though it obviously negates the HA feature) nodes, each with SQL Server installed on them. The nodes are backed by a common set of shared storage, such as a series of SAN-provided LUNs. The shared storage is used to house all SQL related files (except binaries, which are stored locally on each cluster), as well as any other items that are part of the cluster, such as backup volumes or custom components. An example of the latter might include custom Powershell or other scripts used by Agent jobs or whatnot.

Prior to Windows Server 2008, Microsoft only supported clustering between identical hardware, and that hardware had to be in the Windows Server Catalog. Basically, this a huge list of hardware that has been certified by Microsoft as being a supported configuration. If your hardware isn’t in that list, and you call Microsoft for support of a cluster issue, they will “offer troubleshooting tips “(reference this KB), but a resolution is not guaranteed.

Now in my environments (and I would guess in many other development environments), the vast majority of systems are virtual machines, rather than physical servers. That being the case, we would refer to this KB article from VMWare, listing all the documents and links to information about clustering under VMWare. To me, the most useful one is the link to the Cluster team’s blog, wherein we find this tidbit:

Windows Server 2003

For a cluster solution to be supported by Microsoft it must be a tested solution which has been qualified and verified to function properly with the Failover Clustering feature.  The full Windows Server 2003 cluster support policy is documented here:  http://support.microsoft.com/kb/309395

When a cluster solution has been qualified it will receive a ‘Designed for Microsoft® Windows® Server 2003′ logo and be listed on the Windows Server Catalog under “Cluster Solutions”.

Two separate VMware configurations have received a logo and are supported in Windows Server 2003 with vSphere 4.0 and EMC storage.  One configuration is with EMC V-Max storage and the other with EMC CLARiiON CX4 storage.  Details can be found here:

These are the only two supported Windows Server 2003 guest clustering configurations.  The Windows Server 2003 cluster logo program stopped accepting new submissions as of 12/31/09, no additional configurations will be added in the future.

Basically, for Windows 2003 clustering, only the two configurations listed are supported. Outside of that, you’re on your own.

For Windows Server 2008, the outlook is slightly better:

Windows Server 2008 / Windows Server 2008 R2

The Microsoft support policy for Failover Clustering radically changed with Windows Server 2008 to become much more flexible.  The following criteria must be met for a solution to be supported by Microsoft:

1.       On a host Windows Server Failover Cluster all hardware and software components must meet the qualifications to receive the appropriate “Certified for Windows Server 2008” or “Certified for Windows Server 2008 R2” logo.

a.       If a guest cluster is running inside a virtual machine on non-Microsoft hardware virtualization software, the virtual machine must be hosted by a virtualization solution that is listed in the Server Virtualization Validation Program

2.       The solution must not fail any of the tests in the cluster Validation tool.  With virtualized servers in a cluster, run the cluster validation wizard as you would with any other new cluster. The requirement for running the wizard is the same regardless of whether you have a “host cluster” (where failover will occur between two physical computers), a “guest cluster” (where failover will occur between guest operating systems all on the same physical computer), or some other configuration that includes one or more virtualized servers.

I read this as “so long as your hosts and VM provider are certified, and the cluster validation wizard passes you, you’re supported.”

There is also an excellent whitepaper from VMWare on the subject, available here. Some points of note:

  • It includes a detailed, step-by-step set of instructions on how to configure VMs to allow for clustering.
  • Only Windows Server 2003 and Windows Server 2008 R2 are considered supported.
  • By configuring your VMs in this manner, you lose the following VMWare features:
    • HA
    • DRS
      Note: my VMWare admins tell me that these settings also break snapshots and vRanger backups, though that’s not mentioned here.

Conclusion

For SQL 2008 R2 / Windows Server 2008 R2, the cluster solution appears valid, albeit a bit of a complicated setup and at the cost of DRS and live vMotion capability. I can vouch that the configuration does work as advertised, once you get all the pieces right. If you’re looking to implement this, do yourself (and your VMWare admins) a big favor: read the [fantastic] manual.

For Windows 2008 / SQL 2008 and Windows 2003 / SQL 2005 (our standard builds) on the other hand, this doesn’t appear to be a viable solution. The two specific setups listed in the Windows Server Catalog (linked above) aren’t in use where I work, and outside of purchasing physical hardware or relying on hand-me-downs as setups are decommissioned (neither of which I would consider a good approach), I don’t see a way to get a supportable configuration.

HA? We Don’t Need No Stinkin’ HA (Part 2)

Posted by Josh | Posted in SQL Server, The Rookie DBA | Posted on 27-03-2011

Tags: ,

1

In part 1 of the series, we went through a little visualization exercise in order to feel what it would be like to have a development system go “casters up” in the middle of an important project. I don’t know about you, but the imagery sure convinced me of the need to do some level of planning around availability, even in development environments.

I wanted to continue the series by opening up the question to the community, and seeing how others have handled this scenario. I’ll post the results later on in the series.

What level of availability planning do you engage in for your Development environments?

View Results

Loading ... Loading ...

Which of the following HA technologies are you using in your Development environments? (Pick all that apply)

View Results

Loading ... Loading ...

HA? We don’t need no stinkin’ HA (Part 1)

Posted by Josh | Posted in SQL Server, The Rookie DBA | Posted on 21-03-2011

Tags: ,

3

As a DBA, one of our primary responsibilities is always to ensure that we have a solid availability plan in place for the databases we support. Naturally the plan itself could vary greatly depending upon the product using the database in question. An internal-only reporting database might not require a double redundant, cluster and mirroring backed setup, but a time critical trading application or a patient tracking system for a hospital might well. The real point is, as long as the strategy meets the needs of our business and has been thoroughly vetted and tested, we’re on solid ground.

While I’ve found that it rarely takes much convincing to get support behind availability planning with regard to production systems, I usually find the opposite attitude present with regards to the development environment. When I took over in my new position as the development DBA, about 90% of all my servers had no backups (SQL or OS wise), no build documentation (i.e. who to call if it goes down, what customization such as linked servers or custom assemblies are present, etc), not even so much as a maintenance plan doing regular integrity checks. I was horrified; what happens if one of the databases gets corrupted? What about if a server goes down? Hell, what about if the data center gets flooded or burns down, and we’re forced into a company wide disaster scenario?

The almost universal answers was “It’s development, it doesn’t matter.”

To some degree, I agree with this. If a development server goes down (or two, or three), we’re likely not going to face immediate financial penalties, miss trading deadlines, or cause general mayhem and chaos. We don’t have 24 hour support for development systems, and even when things do happen during normal business hours, our response is usually along the lines of “OK, we’ll look into it when we have a chance”, unless there’s some deliverable in jeopardy.

Sidenote: this isn’t always true. I’ve seen so-called “Development” systems turn into mission-critical ones because the funding to productionize the product dried up, and before you know it, you’ve got developers supporting an enterprise platform running on a dusty server sitting in the data center floor with a note saying “Do not unplug without calling ext xxxx, MISSION CRITICAL” stuck to it. Don’t laugh too hard, it’s probably more common than you think.

On the other hand, let’s consider this scenario:

A development team has been working on a massive, multi-man-month effort to launch a new flagship product for the company. It’s coming down to the wire, and people are working long hours to get the deliverable out. Multiple clients have signed contracts based on the availability of this product, with millions of dollars in revenue streams on the line.

Suddenly, for whatever reason, the main development database gets corrupted or goes offline. How isn’t terribly important, other than to say in my (somewhat short) experience, I’ve concluded that the shoddy setup of Dev environments lends itself more to corruption and loss of data, so it’s eminently possible. Regardless, the database is gone, along with all the code, logic, and data within it.

Let’s assume there are no backups available. What are our options to recover the system?

  1. Rebuild from scratch, using script in source control.

    This could work, but it assumes that developers have been incredibly disciplined about keeping all their changes and work checked in. This is something that I’ve grown to be very skeptical about in my time. Also, any sample data (whether generated or loaded from actual future sources) would not be recovered, and would have to be re-created or re-loaded.

  2. Simply refresh all development environments from Production.

    This assumes that a) there is a production copy available, which if this is a new product, is not likely; b) there are no auditory or security concerns around letting developers have full access to production data (assuming no scrub-scripts are available, again something I’ve grown skeptical about); c) there is some way to determine the difference between the production system and the state of the codebase at the time the database was lost.

Now imagine the pressure that would be coming down on the heads of the developers and their managers, as well as the downstream teams that own the development environment. Clients are breathing down the necks of the business, who is, naturally, passing the pressure right down the chain to the technology groups. Everyone is scrambling madly, trying to get things back on track. Millions are on the line, and clients are threatening to pull out of their contracts if the launch isn’t delivered on time.

Now, how’s your blood pressure? Pulse? Probably just a little elevated; I know mine sure went up when I first imagined this scenario shortly after taking over the Dev systems. I already get a lot of pressure from teams to turn around code reviews and new builds; just think of how crazy they would be going if their whole system disappeared!

I use this little visualization exercise whenever I’m talking to the development teams about the need for me to have tighter control around their systems, and to justify the effort required to keep the backups running and other maintenance going behind the scenes. Almost without fail, their eyes get about as wide a tea saucers, and their demands of me dropping what I’m doing to review their (usually horrible) T-SQL code quiet. Occasionally some folks still balk and don’t get it; that’s fine, I just tell them this strategy is mandated by management (which it is), and leave it there.

So how do we ensure that we’ve got a solid availability plan in place in our Development systems, while still balancing the classic “It’s Development, we don’t care / no one wants to spend any money / time / effort”? Since I’m still struggling with that question myself, I thought I’d ask for everyone’s opinion on the matter. In the next post in the series, I’m going to lay out a poll to see how the community responds to that question. Then, I’m going walk through some options in terms of cost, effort, and whether it meets the needs of the customer (the developers).

It Pays To Understand Your Database

Posted by Josh | Posted in SQL Server | Posted on 21-02-2011

Tags: , ,

0

Recently I was engaged at work to do some performance tuning on an old legacy application that was being brought up to snuff. I used the wonderful RML Utility toolset to find the most costly statements for the application’s workload, hoping to find some simple, low-hanging fruit such as missing indexes. While I did find some of those items, what I also found was a whole lot of really complex, over-engineered code. Here’s a rough reproduction (obviously I can’t post the real thing) of an example:

SELECT    am.AccountID,
              SUM(am.AccountValue) / MAX(ts.NumDays) AvgAccountValue,
               SUM(bp.FeeAmt) / COUNT(DISTINCT am.RecordDate) AvgFeeAmt
FROM       AccountMaster am JOIN TotalSum ts
              ON am.FundType = ts.FundType
              JOIN BasisPoint bp
              ON am.FundType = bp.FundType
                AND am.RecordDate = bp.RecordDate
WHERE     ClientID = 'AAA'
GROUP BY am.AccountID, am.FundType

At its base the statement seems fairly trivial: calculate an average balance for all accounts for client ‘AAA’ over their lifetime, along with some aggregate fee information. The execution plan was ugly though; some huge table spools and hash joins, with logical reads well over 2 million.

I spent several hours digging into this to try and determine any optimizations. There were no obvious changes to be made, such as index additions or statistics modification, and after some time I was ready to give up. Then, as I was examining the database closer, I noticed two important items:

  1. The AccountMaster table’s primary key was on the ClientID, AccountID, and RecordDate columns. Meaning, of course, that there could not be multiple rows with the same combination of those columns. This made the DISTINCT clause above redundant, since there could only be one occurrence of each RecordDate value per ClientID - AccountID combination (and we were selecting a ClientID value and grouping by the AccountID value).
  2. The TotalSum table was dynamically populated by an earlier step in the same stored procedure as the trouble statement. Because of how it was done, there was no way there would be more than one value of NumDays for any given FundType key. I put a unique index on it, and sure enough no key violations occurred. This means that the MAX aggregate function above was also worthless, since there would be only one NumDays value per FundType. Instead I moved the NumDays column into the GROUP BY clause.

Low and behold, those two changes made a world of difference in the statement’s execution. Logical reads went down to under 100,000 total, a reduction of over 90%. Naturally I had to verify that the results were the same, but, as expected, they were.

The two lessons I take away from this are:

Sometime SQL’s Optimization Engine Is Stupid

Ok, that’s probably putting it a little harshly. But still, you might think that SQL Server would be smart enough to see “Oh, because of the underlying database structure, those DISTINCT and MAX clauses aren’t needed, so I’m going to ignore them.” Then again, computers are only as smart as the people writing code for them, and the optimizer does pretty frakkin’ spectacularly for the most part, so I won’t complain.

Know What Your Database Structure Means

I’m guessing that this code was either written before the database constraints were in place, or written by someone who didn’t a) take the time to look at the constraints and indexes in place, or b) simply didn’t understand what they meant. In the first case, this should have been caught as part of reviewing the affected objects when making changes to the base table structure. In the case of the second… well, don’t get me started on people writing SQL code who don’t understand what it means. That’s one rant I could go all day on.

No You May Not Have Sysadmin Mr Developer

Posted by Josh | Posted in SQL Server | Posted on 25-01-2011

Tags: ,

0

I am a very firm believe that only a DBA should have sysadmin rights on a SQL server instance, including in development systems. The fact of the matter is that in the majority of cases, system administrative rights are nothing but overkill; developers don’t need (nor in many cases understand the how / why) to do things like change sp_configure settings or alter database level settings like auto_shrink (don’t get me started). What they do need is this:

  • Full control over their own database (db_owner). This allows them to:
    1. Backup up the database before applying changes
    2. Change schema and security at the database level
  • The ability to create and remove logins (but not system administrator logins) for use by their applications.
  • The ability to restore a backup of their database to roll back changes as part of testing or development.

In my environments I give developers this basic set of rights by giving them these permissions:

  1. db_owner on their own databases
  2. db_creator at the server level

    When I give them this right it is for restoring their own backups, not for creating databases willy-nilly. This is explicitly communicated to them and violations result in revocation of this privilege.

  3. ALTER ANY LOGIN at the server level

    See here for why I didn’t say securityadmin.

In most cases this takes care of the developer’s needs and I don’t get complaints. But from time to time, special requests come up. Yesterday I has one such item, where the developers wanted to be able to using Database Tuning Advisor and the built in SSMS reports such as the “Schema Change” report. Naturally, their request was for sysadmin rights. Not convinced, however, I looked into what was actually needed.

Database Tuning Advisor

In another case of RTFBOL, here is a clear explanation of what rights are required. Basically, a user with sysadmin rights has to initialize the system first. After that, any user with db_owner rights can use Tuning Advisor.

Built-In SSMS Reports

While I couldn’t find any built-in documentation on this subject, I was able to run a trace while running several reports (inlcuding index usage and schema changes). What I found was that the reports were doing one of two things: 1) querying various management DMVs, or 2) reading the default trace using fn_get_tracetable. Bases on these findings, I granted the VIEW SERVER STATE and ALTER TRACE permissions. Problem solved.

So no, Mr. Developer, you still can’t have sysadmin rights on my box.

SSMS Tools Custom Script For Index Information

Posted by Josh | Posted in SQL Server | Posted on 14-01-2011

Tags: , , ,

2

The other day I was having a chat with a developer around our SQL coding standards, specifically the naming conventions. He made an interesting point: the name shouldn’t matter, because it’s the metadata ( what table its on, the index key columns, included columns, etc) that matters. My response was that while that’s true, it’s helpful as a production DBA to have the name, which is readily available via a few clicks in SSMS, be a meaningful one that gives you some inkling of what the index is. This is especially true at 3AM when you’re troubleshooting issues. His response: write a script that you can run to give the information and use that instead of the GUI.

And while I’m not switching my position on naming standards (Have you seen SSMS’s default naming for indexes? Ugh…), I’m up for any challenge, so write a script I did. I actually use this with SSMS Tool‘s “Custom Script” functionality so when I right-click a table, it is easily run.

SELECT 'Index Information'
SELECT	i.name [Index Name],
		i.type_desc [Index Type],
		CASE i.is_primary_key
			WHEN 1 THEN 'Yes'
			ELSE 'No'
		END [Is Primary Key],
		(
			SELECT	c2.name + ', '
			FROM	sys.index_columns ikc
						JOIN sys.columns c2
							ON ikc.column_id = c2.column_id
							AND ikc.object_id = c2.object_id
							AND ikc.is_included_column = 0
			WHERE	ikc.index_id = i.index_id
					AND ikc.object_id = i.object_id
			ORDER BY ikc.index_column_id ASC
			FOR XML PATH('')
		) [Index Columns],
		(
			SELECT	c1.name + ', '
			FROM	sys.index_columns ikc
						JOIN sys.columns c1
							ON ikc.column_id = c1.column_id
							AND ikc.object_id = c1.object_id
							AND ikc.is_included_column = 1
			WHERE	ikc.index_id = i.index_id
			ORDER BY ikc.index_column_id ASC
			FOR XML PATH('')
		) [Included Columns]

FROM	sys.indexes i
			JOIN sys.objects o
				ON i.object_id = o.object_id
			JOIN sys.schemas s
				ON o.schema_id = s.schema_id
WHERE	o.name = '|ObjectName|'
		AND s.name = '|SchemaName|'

It returns some basic metadata about the index definition, such as the name, type, and columns (both included and key). I’ll be trying this out and undoubtedly adding more columns in the coming weeks, but wanted to share it.

What useful SSMS ToolPack custom scripts do you have?

Full Text Indexing – Part 1

Posted by Josh | Posted in Full Text Indexing, Notes From The Lab | Posted on 05-01-2011

Tags: , ,

3

Introduction

This series of lab notes will look at Full Text Indexes in some detail, and, at least initially, will largely cover the material I’m reading for the MCTS 70-432 exam.

What is Full Text Indexing?

From Books On-Line:

SQL Server 2008 provides the functionality for applications and users to issue full-text queries against character-based data in SQL Server tables. Before full-text queries can be run on a given table, the database administrator must create a full-text index on the table. The full-text index includes one or more character-based columns in the table. These columns can have any of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, or varbinary(max). Each full-text index indexes one or more columns from the base table, and each column can have a specific language. Beginning in SQL Server 2008, full-text search supports more than 50 diverse languages, such as English, Spanish, Chinese, Japanese, Arabic, Bengali, and Hindi.

Basically, Full Text Indexing allows for users to perform text based searching against large data fields, in what I would call a search-engine-like style.

Where Would I Use It?

I can think of several cases where Full-Text indexing could be useful, such as:

  • Browsing a catalog
  • Searching a technical manual or book
  • Querying against a trouble ticket system to, for example, find older instances of a currently occurring problem (like searching for “Transaction log full”).

Steps to Setup Full-Text Indexing

Create The Full-Text Catalog

First, we need to create the catalog. This is the storage structure where all the full-text indexes are stored. The CREATE FULLTEXT CATALOG syntax is used, and takes parameters such as FILEGROUP, IN PATH (which is deprecated and should not be used), DEFAULT (to specify that the created catalog should be used as the default one, similar to file groups), and AUTHORIZATION (to specify an owner of the catalog).

Example

CREATE FULLTEXT CATALOG [FTC_Production]
  ON FILEGROUP [Primary]
  AS DEFAULT

This creates a catalog called FTC_Production, stored on the Primary filegroup, marked as the default full-text catalog. Because no AUTHORIZATION clause is specified, the catalog will be owned by the ‘dbo’ user.

I’m thinking an avenue to investigate in a later session is what benefits there are to segregating the full text catalog onto another filegroup, such as recovery and / or balancing of IO. The MCTS book specifically recommends this as a best practice.

Note that in SQL 2008+, the entire catalog is stored within the database (hence the deprecated IN PATH option). While I did not see anything in the MCTS book, I would guess that means easier recoverability and backup.

Creating the Index

Full-text indexes can be created on several column types, including:

  • VARCHAR/CHAR
  • XML
  • VARBINARY

Indexes on CHAR based columns use direct parsing, whereas XML and BINARY columns are parsed using special processors. In the case of files stored in VARBINARY(MAX) columns, SQL can understand and parse numerous standard file formats, such as HTML and most Office documents.

At this point I had a “WOW” moment. I had no idea full text search could parse and read binary documents. That is pretty darn cool and I could see how that might be of tremendous value to those using SQL Server.

The index engine users special “helper services” (MCTS’s language) called stemmers and breakers to divide and conquer the contents of the indexed columns. You can exclude common words from being indexed (the examples given include “a”, “the”, etc) using the stop words functionality.

To create the index we use the CREATE FULLTEXT INDEX syntax, and specify options for the column to be included, the key column (which must uniquely identify a row), the name of the full-text catalog in which the index should be stored (which was created above), and the option for change tracking. Some notes about these options:

  • While you can specify more than one column in a full-text index definition, only one index per table can be created.
  • The CHANGE_TRACKING option is particularly important, because unless you set it to AUTO, you will have to manually schedule population of the full-text index.

Example

CREATE FULLTEXT INDEX ON Production.ProductDescription(Description)
	KEY INDEX PK_ProductDescription_ProductDescriptionID
	ON FTC_Production
	WITH CHANGE_TRACKING = AUTO

This will create a full text index on the Production.ProductDescription table, containing the Description column, with the PK_ProductDescription_ProductDescriptionID index serving as the unique key. The index will be stored in the FTC_Production catalog, and will be automatically populated thanks to the CHANGE_TRACKING = AUTO option.

Interestingly enough, here I ran into an odd choice of error message on Microsoft’s part. Let’s say we accidentally mistype the name of the primary key index, instead calling it PK_ProductDescription_ProductionDescriptionID. Now, you’d think that the statement would fail with something along the lines of “Index ‘PK_ProductDescription_ProductionDescriptionID’ does not exist.” Instead, we get:


Msg 7653, Level 16, State 1, Line 1
'PK_ProductDescription_ProductionDescriptionID' is not a valid index to enforce a full-text search key. A full-text search key must be a unique, non-nullable, single-column index which is not offline, is not defined on a non-deterministic or imprecise nonpersisted computed column, does not have a filter, and has maximum size of 900 bytes. Choose another index for the full-text key.

This lead to me chasing my tail for a good 10 minutes before realizing I’d just fat fingered the index name. Now yes, it does say ” ‘Index_xxx’ is not a valid index…”, but I’d call that language pretty ambiguous. I could easily interpret it as “The index exists, but is not valid for use as a full text key.”, as opposed to “The index does not exist at all, and therefore is not valid for use as a full text key.” Fail trombone for that design decision IMHO.

Summary

At this point the index has been created and is online. Next lab we’ll try our hand at querying the index using various sample business use scenarios.

It’s The Business’s Way Or The Highway – TSQL Tuesday #13

Posted by Josh | Posted in SQL Server, The Rookie DBA | Posted on 13-12-2010

Tags: ,

1


It’s that time again! When SQL Server professionals across the internet gather together to wax poetic about their craft and provide some good reading material. That’s right, it’s T-SQL Tuesday!

This month is being hosted by Steve Jones (blog | twitter), and the topic is the challenges we face as database professionals in our dealings with the businesses we support.

You can have two, and they’re not picking quality…

There’s a common saying I’ve heard in software development: you can have it good, cheap, and take a long time, or you can have it good and expensive and take a short time.” Unfortunately there’s usually a third option, which is “fast and cheap”. It seems all too common that businesses want to rush a product out the door in an attempt to be first on the market, in the process neglecting such items as proper QA testing and performance evaluation. To the salesperson promising things to clients, these things don’t matter; like the mortgage brokers that brought our economy to its knees, once the contract is signed they’re off the hook.

What they don’t see is the tremendous hidden costs associated with throwing bad software on the table. I’m talking about hours spent deploying without good instructions (hell, I’d settle for any instructions half the time), more time spent troubleshooting countless issues discovered after the fact, and even more time rolling out patch after patch. Ultimately the cost may even be the support of the customers, as they take their business elsewhere or simply refuse to use this “brilliant” new product because of the bad taste left in their mouths.

How do we prevent this from happening? Here’s where I’d love to hear from others, because I have by no means found any kind of reliable solution for this. Certainly there are things to do that will help prevent it: thorough code reviews, tracking all the time spent fixing issues to use as justification to management, getting involved early on in the development cycle, even engaging directly with the business (something I think DBAs should do more of personally). But in the end, if it’s in the culture of the place, it seems that it’s going to be a real uphill battle to fix.

All I can suggest is that we try not to say the word “No” too often. If we continually shoot down every release in an effort to achieve perfection, we end up being perceived as roadblocks and naysayers, not helpers and teachers. Mind you, I’m not saying we should cave in completely either; I’ve nailed fellow technology people to the wall before because they’ve been far too enabling when dealing with businesses. But we need to learn to negotiate and compromise, instead of yell and complain. Maybe that crap code gets out the door now, but only if the Dev team agrees (in writing) to fix a list of major issues in the next month. That way, both sides get some of what they want, and everybody wins.

Be A Teacher, Not A Punisher

Posted by Josh | Posted in SQL Server, The Rookie DBA | Posted on 05-12-2010

Tags: , ,

1

Here’s a situation I’m sure we’ve all been in:

You receive a request to run some code against a production system. Upon opening the build folder, you realize that:

  • There is no README file explaining what you’re supposed to do.
  • All the stored procedures are named “sp_xxx” and have no header blocks explaining what they are. (BTW, read here why that “sp_xxx” naming convention is bad.)
  • Multiple procedures are using dynamic SQL.
  • The script is creating a user with ‘sysadmin’ rights.
  • etc…

I could go on for hours about all the mistakes and bad practices I’ve seen in my still short time as a DBA (and I’d love to hear more funny ones from my readers), but that’s not my point.

It’s All About Choice

At this point you have two paths you can go:

  1. You stomp angrily down to the developer’s desk, demand to know how they could dare to waste your time with such garbage, and end up in a Wild West style shootout in the courtyard of your building.
  2. You take the teaching approach.

So what, you may ask, is this “teaching approach”? Quite simple, actually.

Try To See Their Side

I’ve found a lot of times that the folks writing the code really don’t understand what it is they’re doing. Maybe they assume all database users must have sysadmin rights. Perhaps they are new and their manager neglected to point them to your standards document (you do have one, right?), so they don’t know they need to include a header in all their procedures. And I’m sure they don’t understand the deep down internals that make using “sp_” as a procedure name a bad idea; all they know is the rest of the procedures in that database are all named that.

Instead of accusing them of being lazy, first see if they were given the tools they needed to be successful. By listening to them and trying to understand their perspective, you’ll also be gaining their trust.

Show Them The Error Of Their Ways

If they didn’t know that you had a standards document, take a moment to point it out to them and go over some relevant sections. Give them links to Books OnLine and recommend a few good books on T-SQL tuning. If you want to be really helpful, you might even offer to sit with them for an hour or so and do some pair-coding.

I’ve always said that I’d rather teach someone to fish on their own, rather than catch the fish for them. If I blindly enforce my rules on those handing me code, the code might get fixed, but we’ll run into the same problems the next time. Instead, by teaching them how to be better T-SQL thinkers, I’m making my job easier going forward.

Will this approach work all the time? Absolutely not. I’m not going to sit here and tell you that this magical technique will transform all C# developers into instant DBAs. Some people won’t want to learn, and you might have to go the route of drawing a line in the sand from time to time to safeguard your systems.

But if you take the time to be a teacher, and reach out to those you work with, you just might see this in revision notes some day:

  1. Removed cursor functionality as it was not needed and will perform poorly.
  2. Corrected JOIN clause to use indexed columns.
  3. Added DBA standard header and updated revision history.