SQL Centre of Excellence

Indexing “Strategy”

by Stephen Archbold 16. March 2013 20:47

Hi,

Welcome to my first Prodata blog! First, allow me to introduce myself, my name is Stephen Archbold and I joined the Prodata team around 6 months ago. Having spent 5+ years in the same role as a SQL DBA, I was lucky enough to nab a role working as a SQL consultant with Prodata in order to expand my horizons. While some of the expansion has been terrifying, I’m delighted I made the move.

The Prodata blog is known for its excellent technical content and as a reliable source of information thanks to the hard work of Bob and Fintan (And sometimes Carmel when we make her!), and I hope I can keep up that tradition.

My previous blog http://simplesql.blogspot.com was geared towards “Reluctant DBA’s” and beginners, and trying to get people starting out more comfortable with the SQL product. I still feel that the gap between “Beginner” and “Comfortable” can be quite hard to bridge, so I want to introduce some of the more advanced concepts, in what I hope is an easy to follow and structured approach.

“Indexing” Versus “Indexing Strategy”

With that, I wanted to talk about a subject which has been cropping up more and more in the field. I will assume that most people know what indexes are, and why they’re useful (If not, hop over here, then hop back! http://simplesql.blogspot.ie/2011/11/why-are-indexes-good-for-your-server.html).

A pretty common indexing methodology is to play “Fix the broken query”. Find a query which is dying, give it a good index, watch it revive. And there’s not necessarily anything wrong with that. But what happens when over a 6 month period, you’ve address two poor queries per month, and they all happen to use the same table? You’ve applied the best index for your queries individually, but what impact is that having across the board. Is it better to have 12 individual “Best” indexes for 12 individual queries, or have 2 “Good enough” indexes for a workload of 12 queries?

This post is not intended to be a scare tactic against applying indexes, they are super, super important in performance tuning, and in most cases are your best friend for a poor performing query. This is just to give you an idea of some of the follow on steps you should consider after applying indexes to make sure they are effective enough to warrant the overhead they carry.

A phrase I’ve been hearing a lot when discussing indexes is “I know they can add some over head, but is it that much?”. Aside from the obvious additional overhead on maintenance (Rebuilds, Re-orgs), what about the day to day work of getting new data into your table? Well, let’s find out.

Overhead on DML

Let’s take a 3 column table, with a clustered index, and do a very simple insert.

(Please note this doesn’t include any of the Allocation Unit work (GAM, SGAM, IAM etc.), and won’t, that’s another topic for another day!)

clip_image001

2 Logical reads performed when updating a clustered index. Not so bad.

Now let’s add a non-clustered index on two of the columns and try that insert again.

clip_image002

6 logical reads, a 3X increase on the clustered index alone. Traversing the non-clustered index to perform the update added some additional IO’s.

Now let’s add a second Non Clustered Index on only one column this time and try the insert again :

clip_image003

It had a smaller index to traverse, but still incurred two additional reads to perform the insert, on top of the other Non Clustered Index we added previously.

This is a very simple demonstration, but I hope it gets the point across, that the “strategy” part of “Index strategy” is very important, as you need to make sure that what you’re gaining in read performance, you’re not losing on write performance.

And also, just because the index you add is to service one query, doesn’t stop it impacting on every insert, update and delete which occurs on the table.

Identify indexes who don’t justify their existence

This is very much “it depends” as to what quantifies justification, but the result of the below script should be a good guide. It calculates number of reads vs number of writes on the index, and if the ratio is write heavy, the index may not be the most efficient. I would advise looking for a ratio of at least 1:1 to justify keeping it, as anything below probably means you’re incurring write over head for little return on reads.

Important Note: This only tracks reads/writes since the last restart of the instance. If this doesn’t include a full business cycle, you may throw away an important month end index

This script is a slightly modified version of Brent Ozar’s (t|b) unused index script (Found over at http://sqlserverpedia.com/wiki/Find_Indexes_Not_In_Use). I’ve only dropped the “Drop Index” create statement so you have to think about what you’re doing :)

SELECT
o.name
, indexname=i.name
, i.index_id
, reads=user_seeks + user_scans + user_lookups
, writes = user_updates
, rows = (SELECT SUM(p.rows) FROM sys.partitions p WHERE p.index_id = s.index_id AND s.object_id = p.object_id)
, CASE
WHEN s.user_updates < 1 THEN 100
ELSE 1.00 * (s.user_seeks + s.user_scans + s.user_lookups) / s.user_updates
END AS reads_per_write
, 'DROP INDEX ' + QUOTENAME(i.name)
+ ' ON ' + QUOTENAME(c.name) + '.' + QUOTENAME(OBJECT_NAME(s.object_id)) as 'drop statement'
FROM sys.dm_db_index_usage_stats s
INNER JOIN sys.indexes i ON i.index_id = s.index_id AND s.object_id = i.object_id
INNER JOIN sys.objects o on s.object_id = o.object_id
INNER JOIN sys.schemas c on o.schema_id = c.schema_id
WHERE OBJECTPROPERTY(s.object_id,'IsUserTable') = 1
AND s.database_id = DB_ID()
AND i.type_desc = 'nonclustered'
AND i.is_primary_key = 0
AND i.is_unique_constraint = 0
AND (SELECT SUM(p.rows) FROM sys.partitions p WHERE p.index_id = s.index_id AND s.object_id = p.object_id) > 10000
ORDER BY reads

Identify potentially redundant indexes

Kimberly Tripp has already done extensive work in this area, and I would struggle to do it justice in the way she does, so I’ll give a link to her blog, and specifically sp_helpindex at the end.

What makes an index potentially redundant (Apart from never being used)? Take the following scenario:

You have two queries

· Select Col1 from MyTable

· Select Col1, Col2 from MyTable

You create two indexes :

· One on Col1

· One on Col1 and Col2

The first index is optimal for the first query, the second index is optimal for the second query. But both indexes share the “Col1” column. This comes back to “Best” vs “Good enough”, the second index cover both queries, and only requires one index update per write. If they are used equally, the first index could be considered redundant, as the second index will also service the first query.

Summarising

Try the sp_helpindex query, and do some experimenting with your queries (on a test system!) and indexes, and try to strike the right balance between read performance and write performance.

Indexes are absolutely a good thing, and can drag a system/process back from the brink of falling over, to completing in milliseconds. But there’s no such thing as free money, so make sure you keep the “Strategy” portion in your mind when considering your Indexing Strategy.

Further reading

http://www.sqlskills.com/blogs/kimberly/removing-duplicate-indexes/ - Kimberly Tripp sp_helpindex

http://www.sqlskills.com/blogs/kimberly/category/sp_helpindex-rewrites/ - sp_helpindex blog archives

blog comments powered by Disqus

Page List

Page List