SQL Centre of Excellence

I must be going mad. I’ve worked on thousands of database schemas over the years, including a lot of “conformed data warehouses” where many different business areas live in one user database, separated either by schema (newer style) or by some funky naming convention (older style). I’ve always found that SSMS can get a bit clunky and slow when there are lots of objects – in fact on a recent customer site with Navision (which uses a silly number of objects) SSMS couldn't enumerate all the tables and bombed out. We had to resort to just querying sys.objects to list tables.
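
If SSMS does struggle, you can get the same list straight from the catalog views. A minimal sketch (standard catalog views, nothing environment-specific):

    -- List user tables grouped by schema when Object Explorer can't cope
    SELECT s.name AS SchemaName, t.name AS TableName
    FROM sys.tables t
    INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
    WHERE t.is_ms_shipped = 0
    ORDER BY s.name, t.name;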

I’ve just noticed that you can filter tables in SSMS to only show a subset. Very handy to select just the tables in one schema, or just the “Sales” tables. Here's a screenshot of the feature in action.

How did I miss that – it’s been sitting on the toolbar since SSMS 2005… doh.

[Screenshot: the table filter in SSMS Object Explorer]

Dr Greg Low is a very distinguished SQL speaker, mentor and trainer from down under – he is a certified MCM and now also works as a trainer on the Advanced Certification program.

He is in Ireland over the next few weeks running some BI Architecture courses for Microsoft Partners, and Niall managed to grab him to run a SQL Users Group session on Tuesday 27th July (next Tuesday).

The session will cover Reporting Services R2 enhancements. You can register for the session via the link below and read more in the synopsis.

SQL User Group with Greg Low on Reporting Services R2

I’m on vacation, but I’m hoping I’ll be able to pop along for the session. We do a lot of work in Reporting Services R2 (Carmel is the queen of reporting!!), but I have to admit that there are some features I’m not very up on, like “Writing mode” and “Domain Scope”.

Whenever we chat about IO and storage there are actually FOUR factors that determine requirements. Often people only discuss the first one (Capacity), so we have to tease out the others:

1. Capacity

2. Throughput

3. Latency

4. Service Levels

1. Capacity

Capacity is the easy one that no one misses. How many GB do I need?

The obvious mistake we see people making is sizing only for the raw data. You also need to size for:

  • Indexes. These can be 20% to 100% of the data size, depending on how heavily indexed the database is. A database that is used for both OLTP and relational reporting (and is tuned) is likely to be heavily indexed. A data warehouse that just exists to load a cube might not have any indexes on it at all!! (A query sketch for checking current data and index sizes follows this list.)
  • TempDB. How to size TempDB is a topic in itself, but as a rough rule of thumb, you'll need to store at least 1.5 to 2 times your largest table (or partition) in TempDB.
  • Backup and General Space. You are going to need somewhere to back up your SQL Server to, or restore from. You may also need more than one full copy to be stored. How much space you need can depend on whether your backups are compressed, but in a lot of cases the space required is 2 or more times the database + index size. The good news is that this “general” storage space does not often need top-of-the-range performance, so using near-line storage in RAID 5 can save buckets.
  • OLAP. If you have cubes, then these need to be sized. A cube can be 15% to 100% of the size of the data warehouse, and you may need multiple copies. Often, cubes don’t need terribly performant storage if queries are served from RAM.
  • Data Processing. It’s not uncommon to have ETL processes drop files or pick files up from a SQL Server, and this requires both space and a drive letter allocated. You can’t really put this on the same volume as SQL data files or you will get fragmentation.
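
As a rough starting point for the index and TempDB figures above (and for finding your largest table), here is a sketch against sys.dm_db_partition_stats. This is my own back-of-an-envelope approach rather than a formal sizing method, so treat the numbers as estimates:

    -- Approximate reserved, data and nonclustered index space per table (MB)
    SELECT s.name + '.' + t.name AS TableName,
           SUM(ps.reserved_page_count) * 8 / 1024.0 AS ReservedMB,
           SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.used_page_count ELSE 0 END) * 8 / 1024.0 AS DataMB,
           SUM(CASE WHEN ps.index_id > 1 THEN ps.used_page_count ELSE 0 END) * 8 / 1024.0 AS NonClusteredIndexMB
    FROM sys.dm_db_partition_stats ps
    INNER JOIN sys.tables t ON t.object_id = ps.object_id
    INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
    GROUP BY s.name, t.name
    ORDER BY ReservedMB DESC;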

2. Throughput

Throughput is how fast the IO needs to be, or the rate of transfer. This is typically measured in IOPS, which maps to the Windows perfmon counter “Logical Disk\Disk Transfers/sec”, or it can also be measured in MB/sec.

Usually, we measure how many 64k random read/write IOs we need per second at the correct latency (8-20ms) for data files.

An exception to this is sometimes when designing data warehouses, where we speak in terms of MB/sec. The reasoning here is that a lot of research has gone into Microsoft’s “Fast Track” design for data warehouses, which estimates that a single CPU core can consume about 200 MB/sec of sequential reads. While a full-blown “Fast Track” design might be too expensive for most Irish pockets, the underlying principles and estimator calculations are still really useful for sizing.
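
As a rough worked example (my own illustrative numbers, not taken from the Fast Track documentation): a warehouse server with 8 cores dedicated to scanning would be balanced against roughly 8 x 200 MB/sec = 1,600 MB/sec of sequential read throughput, which in turn tells you how much SAN bandwidth and how many spindles you need to provision.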

http://www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx

 

3. Latency

 

This is how responsive the disks need to be, and it is measured by the perfmon counter “Logical Disk\Avg. Disk sec/Transfer”. SQL Server is very prescriptive here:

  • Data Files need 8-20ms latency
  • Log files need 1-5ms latency

The exception to this is data warehouses, where we accept higher latency (8-50ms). The reasoning is very simple – if a query is going to take 30 seconds to scan millions of rows, then having to wait 50ms to get started isn't going to lose anyone any sleep…
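
If you want a quick sanity check against those latency targets from inside SQL Server rather than from perfmon, here is a minimal sketch using sys.dm_io_virtual_file_stats. Bear in mind the figures are cumulative since the instance started, so they smooth out peaks:

    -- Average read/write latency per database file since instance startup
    SELECT DB_NAME(vfs.database_id) AS DatabaseName,
           mf.physical_name,
           vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS AvgReadLatencyMs,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS AvgWriteLatencyMs
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
    INNER JOIN sys.master_files mf
        ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
    ORDER BY AvgReadLatencyMs DESC;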

4. Service Levels

This is how resilient do we need the storage to be:

  • Do we need the throughput and latency guaranteed 9-5 or 24x7?
  • What throughput and latency can we accept during a disk failure event?
  • What throughput and latency can we accept during a backup event?
  • Following a failover to a DR site, what storage throughput and latency do you expect (or is it “all bets are off”)?

As an example, imagine you have a 24x7 SQL Server on RAID 5 and a disk blows and will take eight hours to rebuild. What if the IOPS drop from 1,000 to 500 during the rebuild and latency goes from 8ms to 60ms? This could mean that this event is technically an outage for eight hours. If a disk blows a few times a year you will have lots of outages. Therefore the requirements should state “we need 1,000 IOPS and 8ms latency DURING a disk failure event”, or “we can accept 800 IOPS and 10ms latency during a disk failure event”. This will help the storage designer make the correct choice on configuration. For example, they may use 15 spindles in RAID 5 rather than 8 to ensure the requirements are met, or they may change from RAID 5 to RAID 6. If you don’t specify the requirements, you can’t really blame the storage designer or vendor for not meeting them!!

For a real mission-critical SQL Server, I would run the SQLIO test under these conditions to baseline expectations and check that the requirements are met.

Bottom line: Service Levels are the stick you use to get the storage designer to over-spec the IO beyond the basic requirements ;-) It will raise the cost, of course.

 

Ok, this topic comes up all the time and I partly feel like I’m going to get less work out of posting it by giving away some trade secrets, but it’s all in the public domain anyway. So here goes…

The scenario is that you are an experienced infrastructure manager considering purchasing a new SAN, and you have lots of workloads to consider: file and print, VMware servers, Exchange, SQL Server, ERPs, web servers and lots of application servers. For a lot of these workloads you just tell the SAN guy how much space you need and away you go.

But what sort of questions are you going to want to ask a SQL consultant to help with storage design for your SAN? I’ve made up ten questions that I think are quite common:

1. What are the storage requirements for SQL Server?
2. Why can’t I use the 128k or 1MB block size JimBob recommended?
3. What are the best RAID types and disk speeds for SQL Server?
4. How do I measure what my current SQL Server needs?
5. Any special Cache Settings?
6. Can I share SQL Server with my other workloads?
7. How do I know that my storage meets SQL Server's requirements?
8. How do I convince the SAN vendor to meet SQL Server's requirements?
9. How do I know if things are working well or not?
10. Useful links on SQL IO and Storage Area Networks

1. What are the Storage Requirements for Running SQL Server?

The magic numbers that every SQL consultant has in their back pocket are 8-20ms and 1-5ms ;-)

This means that the number one requirement for SQL Server is that when it requests information from the disk subsystem, or writes to it, it MUST get the response back fast enough. If it does not, it will kind of go into “throttle mode” and slow down the IO process.

Your SQL Server will be split into three basic components: data files, log files and TempDB.

  • Data files must have a response time averaging about 8ms and a maximum response time of around 20ms. This is the Windows performance counter “Logical Disk\Avg. Disk sec/Transfer”.
  • Log files must have a response time averaging 1-5ms. The log files are often the most important thing to consider, as log writes can be synchronous in nature, so users will be waiting a lot if the log disks are slow.
  • TempDB performance must be about the same as data files (1-8ms). If a SQL consultant has performed a review of your workload and warned of “heavy” TempDB usage, then placing TempDB on your fastest disk subsystem will bring a lot of benefit.

Additionally, we can assume the following workload characteristics:

  • Assume that data files use 64k IOs that are random in nature, with both reads and writes performed by SQL Server.
  • Assume that log files use 64k IOs that are sequential in nature, writes only.
  • Assume that an OLTP workload like your ERP or Navision is 60% reads and 40% writes. If you have existing SQL workloads you should establish the answer to “what is my read/write ratio?” (a query sketch follows this list). It is very common that we see storage optimised for read performance when the workload is 90% writes!
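
A sketch for answering the read/write ratio question from inside SQL Server (again cumulative since instance start, so treat it as a rough guide rather than a peak-hour profile):

    -- Cumulative read/write ratio per database since instance startup
    SELECT DB_NAME(database_id) AS DatabaseName,
           SUM(num_of_reads) AS Reads,
           SUM(num_of_writes) AS Writes,
           CAST(100.0 * SUM(num_of_reads)
                / NULLIF(SUM(num_of_reads) + SUM(num_of_writes), 0) AS decimal(5, 1)) AS ReadPct
    FROM sys.dm_io_virtual_file_stats(NULL, NULL)
    GROUP BY database_id
    ORDER BY SUM(num_of_reads) + SUM(num_of_writes) DESC;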

It really is that simple, in that latency is the ONLY real requirement SQL Server has; everything else is just guidance to help you meet this requirement!! If the IO requests that SQL Server issues to your SAN meet those figures, then you have a highly performant storage subsystem and happy SQL Server users.

The main thing to specify apart from your latency requirement is the throughput (IOPS). It is no good meeting the 8ms target for 100 IOPS and then finding your workload needs 5,000 IOPS. You won’t be able to meet the 8ms target!!

2. Why can’t I use the 128k or 1MB block size JimBob recommended?

Well, if you can meet that 8ms target for data and 1ms for log writes, then knock yourself out – use a 5MB block size for all SQL Server cares ;-)

But realistically, you really need to use a 64k block size. There have been hundreds if not thousands of empirical tests showing that if the block size is larger than 64k then you will get a massive increase in latency and not meet the requirements for running SQL Server. For example, if the block size is 512k, then the SAN will need to return 512k when SQL Server has only asked for 64k – this could take 8 times longer, right…

3. What are the best RAID types and disk speeds for SQL Server?

Nowadays most SQL consultants try not to talk about RAID types and disk types; it can be best to leave that up to the storage guys. If the storage team can meet my requirement for 5,000 random 64k read/write IOPS at 8ms latency by using 50 old SATA drives at 5,400 rpm in RAID 5, then knock yourself out – I’m happy. Well, maybe I’m happy till we have that chat about Service Level requirements during a disk degrade event, but that's a different story…

Unfortunately, we often do have to discuss RAID or disk speed because the core requirements are not met, so we need to optimise what we have to lower latency at the target IOPS.

As a rule of thumb:

  • RAID 5 is generally optimised for read operations and will often give you a performance hit on writes, maybe an extra latency of 1-2ms or so. It is therefore a poor candidate for logs but may be a good candidate for read-intensive data volumes. It is probably a poor choice for TempDB, as TempDB is very write intensive. The golden rule here is “know your read/write ratio”. If your workload is 80% reads then maybe RAID 5 will work for you.
  • RAID 10 is generally better optimised for writes and all-round performance. It is usually the best choice for TempDB and logs.
  • RAID 1 (mirror) is when you dedicate a single mirrored pair of spindles to a disk group. This is generally considered the holy grail for configuring log files – one disk per log file. This is because a single disk can deliver extremely fast and reliable sequential writes, which is perfect for log files. Now why don’t we just put every log on its own disk? Cost is the main factor, so what we generally recommend is: IF you have a huge SQL Server that needs the very best in performance, then place the log files on their own RAID 1 disk. Otherwise you’re going to want a RAID 10 pool for sharing log files.
  • RAID 6 is the new kid on the block. Often it is preferred over RAID 5, as RAID 5 sucks when a disk degrades. The switch from RAID 5 to RAID 6 usually happens if there is a conversation about guaranteed service levels during a disk degradation event.

4. How do I measure what my SQL Server Needs?

Easy. Just run the perfmon counters “Logical Disk\Disk Reads/sec” and “Logical Disk\Disk Writes/sec”. Over a period of time this will tell you the average IOPS you need and the maximum IOPS you need.

If you have SCOM installed then the Windows management pack actually captures these metrics, aggregates them up to the hour mark and puts them into a data warehouse. Give us a shout and we will gladly come in and suck the data out and present it in a nice report. We will even add reports into SCOM so you can get this data out at the push of a button.

You should quickly be able to say: “I need 750 IOPS, made up of 250 writes and 500 reads”.
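
If you can't get at perfmon or SCOM easily, a rough alternative is to diff sys.dm_io_virtual_file_stats over a sampling window from inside SQL Server. This is instance-wide and is my own sketch rather than the perfmon method described above:

    -- Estimate current read/write IOPS by sampling the file stats DMV over 60 seconds
    DECLARE @reads bigint, @writes bigint;

    SELECT @reads = SUM(num_of_reads), @writes = SUM(num_of_writes)
    FROM sys.dm_io_virtual_file_stats(NULL, NULL);

    WAITFOR DELAY '00:01:00';   -- sampling interval

    SELECT (SUM(num_of_reads) - @reads) / 60.0 AS ReadIOPS,
           (SUM(num_of_writes) - @writes) / 60.0 AS WriteIOPS
    FROM sys.dm_io_virtual_file_stats(NULL, NULL);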

Two words of warning:

1. Don’t use the average figure to specify requirements or else SQL Server will become dog slow during peak usage. Consider specifying the maximum, or some figure in between.

2. If your target IOPS is 1,000 then you probably don’t want to run the storage subsystem “hot”. Agree with your storage team what sort of headroom is needed, e.g. “My SQL Server needs 1,000 IOPS. Can you give me a storage subsystem that can do 2,000 IOPS at a latency of 8ms please?”

5. Any special Cache Settings?

Well, if the storage can meet my requirement for 5,000 random 64k read/write IOPS at 8ms latency by using standard SAN cache settings, then who cares, right!! Knock yourself out.

But, if you need to optimise the storage to lower latency, then there are a few common tricks:

  • Read cache is pretty much useless for SQL Server. The reasoning is that SQL Server uses most of its memory (when over 2GB) for the buffer pool, which is basically its own disk cache. It therefore won’t really ever request the same block from the SAN twice. Change that read cache to only 10%, or off.
  • If the SAN cache is “overloaded” with writes, in some cases disabling the write cache and the cache “mirror” to the other controller can help. This is not a blanket recommendation though; it is something to test when desperate…

6. Can I share SQL Server disks with my other workloads on the SAN?

Well, if you can run Exchange, VMware and all your other workloads alongside SQL Server and still meet my requirement for 5,000 random 64k read/write IOPS at 8ms latency, then knock yourself out!!

Otherwise, a side effect of sharing SQL Server storage with other workloads is that SQL Server may sometimes have to wait longer than it would hope for (beyond the 8-20ms target), and this will cause performance issues.

As a rule of thumb, sharing SQL data volumes with other workloads “can” sometimes be beneficial. For example, rather than each getting 1.5 disks, your 10 workloads can now enjoy 15 spindles.

As another rule of thumb, sharing log files is generally a no-no. This is because the log files need sequential IO, and sharing workloads will degrade this IO pattern to random.

As a third rule of thumb, the Service Level Agreement is king. If the end users of your SQL Server need “guaranteed” performance of max 50ms per query, then you will need to supply the SQL Server with storage that can guarantee “8ms” per IO request. Sharing disks with Exchange might not be such a good idea here.

7. How do I know that my storage meets SQL Server's requirements?

This is quite easy. There is a freely available tool called SQLIO.EXE which you can run, and it will output the number of IOPS that the volume can do along with average and maximum latency figures. The results look something like the sample below:

sqlio v1.5.SG
using system counter for latency timings, 3306972 counts per second
parameter file used: param.txt
    file E:\testfile.dat with 1 thread (0) using mask 0x0 (0)
1 thread reading for 120 secs from file E:\testfile.dat
    using 64KB random IOs
    enabling multiple I/Os per thread with 6 outstanding
using specified size: 5000 MB for file: E:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:   500.59
MBs/sec:    31.28
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 8
Max_Latency(ms): 690

histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  1  1  4 10 11 13 15 10  8  6  4  3  2  2  1  2  1  1  1  1  1  0  0  0  4

Note that in the example above, although the Max_Latency is recorded as 690ms, we can see that around 96% of the IOs returned in 20ms or under, so this might be very acceptable.

If you speak to your friendly SQL consultant (cough, Prodata) they will have some hand-rolled tool that automates SQLIO measurement and presents back a nice graph that looks something like this:

[Chart: IOPS versus average and maximum latency from the automated SQLIO runs]

We can see from the chart above that the storage can sustain 8,944 IOPS and still hit SQL requirements (8ms avg and 20ms max). As it goes beyond that the SAN is saturated and latency massively increases.

8. How do I convince the SAN vendor to meet SQL Server's requirements?

Most of the conversations I have about poor SAN performance are based around the scenario where requirements are not met. How do you convince the SAN team or vendor to help meet the basic requirements for running SQL Server?

Well, ultimately it’s going to cost cash. There are some quick wins we discussed: block size, RAID type, increasing the spindle count, and volume alignment. It may also need some help from a performance tuning specialist to highlight bottlenecks in the IO path such as HBA cards, switches, the VMware environment, lack of multipathing, or driver issues.

The “best” advice is to convince your SAN vendor to let you test their SAN BEFORE you pay for it. Pre-sales help is a lot cheaper than post-sales. In Ireland a lot of the SAN manufacturers even have performance centres and know a lot about SQL Server. Not favouring anybody, but a big shout out to the EMC guys, who regularly attend every advanced SQL training seminar going in Ireland and really know their stuff.

If you can go armed with your requirements (remember: 5,000 random 64k IOPS at 8-20ms), they can show you a configuration meeting those requirements and everyone will be happy.

9. How do I know if things are working well or not ?

Apart from your SQL users moaning, you will know if your storage is causing a performance problem for SQL Server by using a few techniques:

The most basic is the Windows perfmon counters discussed above: you can monitor the IOPS and latency. If you are seeing consistent latency of more than 20ms then you have a problem.

Another route is to check SQL Server’s wait stats. SQL Server records every single time a query is halted from running due to resource waits, and you can query these to see what the biggest problems are (a query sketch follows the list below). At a high level:

  • The WRITELOG wait stat indicates that the log file cannot be written to fast enough.
  • PAGEIOLATCH, ASYNC_IO_COMPLETION and IO_COMPLETION waits are a sign of potential disk subsystem issues.
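
Here is a sketch for pulling the top waits (cumulative since the instance started, and the list of "benign" waits to exclude is my own rough cut). If WRITELOG or PAGEIOLATCH_* waits dominate, storage is a prime suspect:

    -- Top waits by total wait time, excluding some common idle/background waits
    SELECT TOP (10)
           wait_type,
           wait_time_ms / 1000.0 AS WaitTimeSec,
           signal_wait_time_ms / 1000.0 AS SignalWaitSec,
           waiting_tasks_count
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN ('LAZYWRITER_SLEEP', 'SLEEP_TASK', 'WAITFOR',
                            'LOGMGR_QUEUE', 'CHECKPOINT_QUEUE',
                            'REQUEST_FOR_DEADLOCK_SEARCH', 'SQLTRACE_BUFFER_FLUSH',
                            'BROKER_TASK_STOP', 'BROKER_TO_FLUSH')
    ORDER BY wait_time_ms DESC;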

10. Useful links on SQL IO and Storage Area Networks

Storage Top 10 Best Practices

http://technet.microsoft.com/en-ie/library/cc966534(en-us).aspx

Complete I/O Best practices whitepaper - http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx

Predeployment I/O Best Practices - http://sqlcat.com/whitepapers/archive/2007/11/21/predeployment-i-o-best-practices.aspx

SQLIO.EXE Tool

SQLIO Disk Subsystem Benchmark Tool 

Disk partition Alignment – Make the Case by Jimmy May

http://blogs.msdn.com/b/jimmymay/archive/2009/05/08/disk-partition-alignment-sector-alignment-make-the-case-with-this-template.aspx

Performance Tuning with Wait Stats by Joe Sack

http://www.digitalconcourse.com/dropzone/MSCOMM/PASSMN/PASSEVT20090616/Joe%20Sack%20Performance%20Troubleshooting%20with%20Wait%20Stats.pdf

Here is some handy TSQL to determine all the FK relationships in a database. The basic idea is that we can use this metadata to automatically determine whether a form should have a combo box rendered instead of a textbox.

    -- List all foreign key relationships in user tables
    -- Note: uses the legacy sysreferences view and returns only the first column (fkey1) of each foreign key
    SELECT so.name AS ForeignTableName, sc.name AS ColumnName,
           s.name + '.' + pkt.name AS PrimaryTableName
    FROM sysreferences r
    INNER JOIN sysconstraints c ON r.constid = c.constid
    INNER JOIN sys.tables pkt ON pkt.object_id = r.rkeyid
    INNER JOIN sys.columns sc ON sc.column_id = r.fkey1 AND r.fkeyid = sc.object_id
    INNER JOIN sys.schemas s ON s.schema_id = pkt.schema_id
    INNER JOIN sys.tables so ON r.fkeyid = so.object_id
    WHERE so.is_ms_shipped = 0
    ORDER BY so.name, sc.name

TSQL to Determine Primary Keys

by Bob Duffy 21. July 2010 10:39

I’m working on an “intelligent” Universal Table Editor at the moment. The basic idea is that we are sick to death of writing “CRUD” style search forms and editing forms when it’s really just sitting on top of a SQL table.

We have various frameworks in place to dynamically render editing and search forms from the base table plus some specific metadata. This allows support and customisation folk to add fields, search criteria, field-level validation, and screen layout dynamically without the need for a design-time recompile and ship. Very useful for CRM-style applications where things change quickly.

One challenge is how to get the primary keys of all the user tables, along with enough metadata to be useful. Here is a sample query:

-- Sample TSQL to retrieve primary keys on all user tables
select so.name as TableName, sc.name as PrimaryKeyName, st.name as PrimaryKeyType, sc.is_identity
from sys.indexes si 
inner join sys.objects so on so.object_id = si.object_id
inner join sys.sysindexkeys sik on sik.id = so.object_id and si.index_id=sik.indid 
inner join sys.columns sc on sc.object_id =so.object_id and sc.column_id=sik.colid
inner join sys.types st on st.system_type_id=sc.system_type_id
where si.is_primary_key =1 and so.type='U' and is_ms_shipped=0

Thanks to everyone who attended the SQL Users Group on 28th June 2010. It was a great session, with Paul Randal and Kimberly Tripp waxing lyrical on some design problems that they commonly see in the field, such as nested transactions, poor primary keys and abuse of various features (ahh, triggers…).

The deck was especially good if you are about to embark on a new project or re-visiting an old one. You can use it to hit someone over the head with an “I told you so”, or at least help ensure you start on the right foot.

I’ve uploaded the session below so you can refer to it.

 

P.S. I have permanently moved my blog to http://blogs.prodata.ie/bob – please update any RSS feeds or links. I’m going to be focused on SQL and BI technology as well as SQL community events in Ireland.

0610DublinSQLDesign.pdf (1.66 mb)

My Microsoft MVP status has been renewed for 2010/11. This is really cool as I get to go to the MVP Summit in January, and it’s nice recognition as well ;-)

Thanks to Enda in Microsoft for endorsing me and everyone for listening to me at events in 2009/10!!

I’ve been having a few problems with the following error message in Dashboard Designer 2010, when refreshing a data connection to SSAS.

“The following lists cannot be accessed from SharePoint server. The lists have been deleted or you no longer have permission to open them.”

[Screenshot: the error dialog in Dashboard Designer]

There was also a note in the application log: “Failed to Read the ACLS for an item”.

[Screenshot: the “Failed to Read the ACLS for an item” entry in the application log]

The crux of the problem is that the application pool account for PerformancePoint is used to invoke SharePoint web services, and it needs at least “Read” access on the data connections folder. You can right-click on the data connections library in Dashboard Designer and select “Edit Permissions”. Note that the application log tells you the user for the application pool, but if you miss it you can look in IIS for the application pools. Be warned though, as PerformancePoint defaults to a GUID for the name of the application pool.

I set the permissions for the service account and everything is flying along now… happy days.

Our SQL Immersion: Dublin 2010 event is fully underway with Paul Randal and Kimberly Tripp (http://www.prodata.ie/events/sqlimmersiondublin2010/). Chatting to the course attendees, this is all good stuff; Paul and Kimberly are truly the best of the best instructors worldwide. Now that they have Brent Ozar working with them, the mind boggles at how much SQL expertise is in one company. Now that’s just greedy!!

I’m definitely going to sit in on some of the Performance Tuning master classes and the DR master classes next week.

The bit that I’m really looking forward to is that we have booked the chef’s table at Gordon Ramsay's restaurant in Powerscourt with Paul, Kimberly, myself, Carmel and Sandra on Friday. Carmel has been touring Ireland delivering technical presentations for Microsoft on SQL 2008, SQL 2008 R2 and the BI stack, so we’re going to have to work hard to avoid discussing the benefits of “optimize for ad hoc workloads” in SQL 2008 or something like that ;-)

