SQL Centre of Excellence

Storage area networks are getting larger and more sophisticated. One of the problems I face as SQL consultants is running sqlio on them.

One common stumbling block is the size of the cache on some storage area networks can be huge (128GB plus), especially with the use of solid state devices as a tier “zero” cache accelerator. If our test file is too small we simply get the “from cache” speed of the SAN and while this is interesting its not indicative of performance when we throw a couple of TB of SQL Server data onto it. The fact is that often read cache on a SAN is a waste of money for SQL Server anyway as the cache hit ratio on the SAN will be incredibly low – SQL Server already has its own cache in the buffer pool, so its not going to be reading data from the cache of the SAN anytime soon unless the cache on the SAN is larger than the size of the SQL Servers buffer pool (possible but unlikely).

So what do we do when the SAN has a 64GB cache ? Well guidelines we use is that the size of the test file should be about four times the size of the cache. So your going to need a 256GB test file.

What if you don’t want to wait for a 256GB test file to be created as you only have limited test time?

Here’s a neat trick that Joe Sack from the SQL MCM program showed me for using SQL Servers “Instant File Initialisation” feature to massively speed up creation of test files.

1. Ensure that SQL DB Engine is installed and that Instant File Initialisation is configured in the local Security Policy Editor.

2. Create a database MDF file of the correct size (say 256GB) and detach the database

3. Modify the parameter file that sqlio uses to specify that you have pre-sized the file (see below)

4. Away you go.

Modifying the sqlio parameter file:

Usually the parameter file for sqlio is called something like “param.txt” and looks like this:


You can simply change the filename to be the MDF file and then delete the last parameter so that the size is not specified.


Using this technique I recently had a 1TB test file generated in under three seconds. Nice Winking smile

Technorati Tags:

SQL IO Characteristics

by Bob Duffy 8. August 2010 05:20

This is fairly well documented in various books and white papers, but can be tricky to find out on the Internet so I am summarising it here in this blog.

The key questions is “what are SQL Server’s I/O Characteristics”. This helps answer questions like why is a 1MB block size bad; why don’t we often care about random I/O for log files; what extra benefit for I/O does enterprise edition have; and why do we mainly test random I/O for data files in OLTP systems

The table below shows general I/O characteristics for the storage engine that will be important to most OLTP workloads. If you “really” know your workload you can optimise your storage. For example:

  • if you know that most of your workload is index seeks, you may benefit from a 8k block size on the SAN - good luck convincing your SAN team of this ;-)
  • If you are doing mostly very large table scans and you have enterprise edition, then it is possible that a 1MB block size may be most performant.
  • If you are not running enterprise edition, then a 1MB block size will never be optimal as the maximum IO size SQL Server will issue is 256k, so a minimum of three quarters of your I/O will be wasted.


Random / Sequential

Read / Write

Size Range

OLTP – Log



Sector Aligned Up to 60K

OLTP – Log



Sector Aligned Up to 120K

OLTP – Data (Index Seeks)




OLTP - Lazy Writer (scatter gather)



Any multiple of 8K up to 256K

OLTP – Checkpoint (scatter gather)



Any multiple of 8K up to 256K

Read Ahead (DSS, Index/Table Scans)



Any multiple of 8KB up to 256K (1024 Enterprise Edition)

Bulk Insert



Any multiple of 8K up to 128K


This secondary table is useful as trivia, but we rarely optimise storage for these operations. One  observation is that backup and restore can issue up to 4MB IO sizes, and the importance of instant file initialisation in helping create database performance (or file growth).


Random / Sequential

Read / Write

Size Range

CREATE DATABASE (or file growth)



512KB (SQL 2000) , Up to 4MB (SQL2005+)

(Only log file is initialized in SQL Server 2005+ if instant file initialisation is enabled)




Multiple of 64K (up to 4MB)




Multiple of 64K (up to 4MB)




8K – 64K

(Read Phase)



Any multiple of 8KB up to 256K

(Write Phase)



Any multiple of 8K up to 128K

DBCC – SHOWCONTIG (deprecated, use sys.dm_db_index_physical_stats)



8K – 64K


Further References:

Hitachi – Tuning SL Server 2005 Performance http://www.servicesorientedstorage.com/assets/pdf/tuning-microsoft-sql-server-2005-performance-wp.pdf

High Availability Best Practices: I/O Subsystem


SQL Server 2000 I/O Basics (Old but still mostly relevant today)


Whenever we chat about IO and storage there are actually FOUR factors to determine requirements. Often people only discuss the first one (Capacity) so we have to tease out the others:

1. Capacity

2. Throughput

3. Latency

4. Service Levels

1. Capacity

Capacity is the easy one that no one misses. How many GB do I need ?

The obvious mistakes we see people making is to only size for the raw data. You also need to size for:

  • Indexes. These can be 20% to 100% of the data size, depending on how heavily indexed. A database that is used for both OLTP and relational reporting (and is tuned) is likely to be heavily indexed. A data warehouse that just exists to load a cube, might not have any indexes on it at all!!
  • TempDB. How to size TempDB is a topic in itself, but as a rough rule of thumb, you'll need to store at least 1.5 to 2 times your largest table (or partition) in TempDB.
  • Backup and General Space. You are going to need somewhere to back up your SQL Server, or restore from. You may also need more than one full copy to be stored. How much space you need can depend on if your backups are compressed, but in a lot of cases the space required is 2 or more times the database + index size. The good news it that this “general” storage space does not often need top of the range performance, so use of near line storage in RAID 5 can save buckets.
  • OLAP If you have cubes, then these need to be sized. A cube can 15% to 100% of the size of the data warehouse, and you may need multiple copies. Often, cubes don’t need as terribly performant storage if queries are served from RAM.
  • Data Processing. Its not uncommon to have ETL processes drop files or pick files up from a SQL Server and this requires both space and a drive letter allocated. You can’t really put this on the same volume as SQL Data files or you will get fragmentation.

2. Throughout

Throughput is how fast do I need the IO to be or rate of transfer. This is typically measures in IOP’s which is the windows perfmon counter “Logical Disk\Disk Transfers per Sec” or it can also be measured in MB/Sec.

Usually, we measure how many 64k random read/write IO’s we need per second at the correct latency (8-20ms) for data files.

An exception to this is sometimes when designing data warehouses, we speak in terms of MB/Sec. The reasoning here is that a lot of research has gone into Microsoft’s “Fast Track” design for data warehouses that estimate a single CPU can consume about 200 MB/Sec sequential reads. While a full blown “fast Track” design might be too expensive for most Irish pockets, the underlying principles and estimator calculations are still really useful for sizing.



3. Latency


This is how responsive do I need the disks to be and is measured by the perfmon counter “Logical Disk\Avg Secs per Transfer”. SQL Server is very prescriptive here:

  • Data Files need 8-20ms latency
  • Log files need 1-5ms latency

The exception to this is data warehouses where we accept higher latency (8-50ms). The reasoning is very simple – if a query is going to take 30 seconds to scan millions of rows then having to wait 50ms to get started isn't going to loose anyone any sleep …

3. Service Levels

This is how resilient do we need the storage to be:

  • Do we need the throughput and Latency guaranteed 9-5 or 24x7
  • What throughput and Latency can we accept during a disk failure event
  • What throughput and Latency can we accept during a backup event.
  • Following a failover to DR site what storage throughput and latency do you expect (or is this all bets are off!)

As an example, imagine if you have a 24x7 SQL Server on RAID 5 and a disk blows and will take eight hours to rebuild. What if the IOPS drops from 1,000 to 500 during the rebuild and latency goes from 8ms to 60ms? This could mean that this event is technically an outage for eight hours. if a disk blows a few times a year you will have lots of outages. Therefore the requirements should state “we need 1,000 IOPS and 8ms latency DURING a disk failure event”, or “We can accept 800 IOPS and 10ms latency during a disk failure event”. This will help the storage designer make the correct choice on configuration. For example he may use 15 spindles on RAID 5 rather than 8 to ensure requirements are met, or he may change from RAID 5 to RAID 6. If you don’t specify the requirements, you can’t really blame the storage designer or Vendor for not meeting them!!

For a real mission critical SQL Server, i would run the “SQLIO” test during these conditions to baselines expectations and check requirements met.

Bottom line is Service Levels is the stick that you use to get the storage designer to over spec the IO requirements from the basic requirements  ;-) It will raise the cost of course.


Ok, this topic comes up all the time and I partially feel like I’m going to get less work out of posting it, by giving away some trade secrets, but its all in the public domain anyway. So here goes…

The scenario is you are an experienced Infrastructure manager considering purchasing a new SAN and you have lots of workloads to consider: File and Print, VMware servers, exchange, SQL Server, ERP’s, Web Servers and lots of application servers. For a lot of these workloads you just tell the SAN guy how much space you need and away you go.

But, what are the sort of questions your going to want to ask a SQL consultant to help with storage design for your SAN. I’ve made up ten questions that I think are quite common

1. What are the storage requirements for SQL Server?
2. Why can’t I use the 128k or 1MB block size JimBob recommended?
3. What the best RAID Types and disk speeds for SQL Server ?
4. How do I measure what my current SQL Server needs ?
5. Any special Cache Settings?
6. Can I share SQL Server with my other workloads?
7. How do I know that my storage meets SQL Servers requirements?
8. How do I convince the SAN vendor to meet SQL Servers requirements?
9. How do I know if things are working well or not ?
10. Useful links on SQL IO and Storage Area Networks

1. What are Storage Requirement for Running SQL Server

The magic numbers that every SQL consultant has in his back pocket are 8- 20 ms and 1-5 ms  ;-)

This means that the number one requirement for SQL Server is that when it requests information from the disk subsystem or writes to it it MUST get the information back fact enough. If it does not it will kind of go into “throttle mode” and slow down the IO process.

Your SQL Server will be split into three basic components: Data files,  log files and TempDB

  • Data files must have a response time averaging about 8ms and a maximum response time of around 20ms. This is the windows performance counter “Logical Disk: Disk Secs/Transfer”
  • Log Files must have a response time averaging from 1-5ms. The Log files are often the most important thing to consider as log writes can be synchronous in nature so those users will be waiting a lot if the log disks are slow.
  • TempDB performance must be about the same as data files (1-8ms). If a SQL consultant has performance a review of your workload and warned of “heavy” TempDB usage, then placing TempDB on your fastest disk subsystem will bring a lot of benefit.

Additionally we can assume the following workload characteristics

  • Assume that data files use 64k size IOs and that are random in nature and both reading and writing is performed by SQL.
  • Assume that log files use 64k IO size and are sequential in nature, only writing.
  • assume that an OLTP workload like your ERP or Navision is 60% reads and 40% writes. If you have existing SQL workloads you should establish that answer to “what is my read/write ratio”. It is very common that we see storage optimised for read performance when the workloads is 90% writes!

It really is that simple in that latency is the ONLY real requirement SQL Server has, everything else is just guidance to help you meet this requirement!! if your IO requests that SQL server issues to your SAN meet those figures then you have a highly performance storage subsystem and happy SQL Server users.

The main thing to specify apart form your latency requirement is the throughput (IOPs). It is no good meeting the 8ms target for 100 IOPs and then finding your workloads needs 5,000 IOPs. You wont be able to meet the 8ms target!!

2. Why can’t I use the 128k or 1MB block size JimBob recommended?

Well if you can meet that 8ms for data and 1ms for log writes target, then knock yourself out, use a 5MB block size for all SQL Server cares ;-)

But realistically, you really need to use a 64k block size. There have been hundreds if not thousands of empirical tests showing that if the block size is larger than 64k then you will get a massive drop in latency and not meet the requirements for running SQL Server. For example if the block size is 512K, then the SAN will need to return 512k when SQL Server has only asked for 64k, this could take 8 times longer right…

3. What the best RAID Types and disk speeds for SQL Server ?

Nowadays most SQL consultants try and not talk about RAID types and types of disk, it can be best to leave that up to the storage guys. If the storage team can meet my requirement for 5,000 random 64k read/write IOPs at 8ms latency by using 50 old SATA drives at 5,400 rpm in RAID 5 then knock yourself out – I’m happy. Well maybe I’m happy till we have that chat about Service Level requirements during a disk degrade event but that's a different story…

Unfortunately we often do have to discuss RAID or disk speed because the core requirements are not met, so we need to optimise what we have to lower latency at the target IOPs.

As a rule of Thumb

  • RAID 5 is generally optimised for read operations and will often give you a performance hit, maybe an extra latency of 1-2 ms or so. It is therefore a poor candidate for logs buy may be a good candidate for read intensive data volumes. It is probably a poor choice for “TempDB” as TempDB is very write intensive. The golden rule here is “know your read/write ratio”. If your workload is 80% reads then maybe raid 5 will work for you.
  • RAID 10 is generally optimised better for writes and all round performance. It is usually the best choice for TempDB and Logs.
  • RAID 1 (Mirror) is when you only have one spindle in a disk group. This is generally considers the holy grail for configuring log files – one disk per log file. This is because a single disk can deliver extremely fast and reliable sequential writes which is perfect for log files. Now why don’t we just put every log on its own disk ? Cost is the main factor, so what we generally recommend is IF you have a huge SQL Server that needs the very best in performance then place the log files on their own RAID 1 disk. Otherwise your going to want a RAID 10 pool for sharing log files.
  • RAID 6 is the new kid on the blog. Often is is preferred over RAID 5 as RAID 5 sucks when a disk degrades. The switch from RAID 5 to RAID 6 usually happens if there is a conversation about guaranteed server levels during a disk degradation event.

4. How do I measure what my SQL Server Needs?

Easy. Just run the perfmon counters “Logical Disk: Disk Writes/Sec and Logical Disk: Disk Reads/Sec”. Over a period of time this will tell you your average IOP’s needed and your maximum IOPs needed.

If you have SCOM installed then the windows management pack actually captures these metrics, aggregates them up to the hour mark and puts them into a data warehouse. give us a shout and we will gladly come in and suck the data out and present it in a nice report. We will even add reports in to SCOM so you can get this data out at the push of a button.

You should quickly be able to now say. “I need 750 IOPS made up of 250 writes and 500 reads”

Two words of warning:

1 – don’t use the average figure to specify requirements or else SQL Server will become dog slow during peak usage. Consider specifying the maximum or some figure in between.

2. If your target IOPS is 1,000 then you probably don’t want to run the storage subsystem “hot”. Agree with your storage teams what sort of headroom is needed. E.g. My sql server needs 1,000 IOPS. Can you give me a storage subsystem that can do 2,000 IOPS at a latency of 8ms please.

5. Any special Cache Settings?

Well, If the storage can meet my requirement for 5,000 random 64k read/write IOPs at 8ms latency by using standard SAN cache settings, then who cares right!! knock yourself out.

But, if you need to optimise the storage to lower latency then then are a few common tricks:

  • read cache is pretty much useless for SQL Server. the reasoning is that SQL Server uses most of its memory (when over 2GB) for the buffer pool which is basically its own disk cache. It therefore won’t really ever request the same block from the SAN. Change that read cache to only 10% or off.
  • If the SAN cache is “overloaded” with writes in some cases, disabling the write cache and the cache “mirror” to the other controller can help. This is not a blanket recommendation though, it is something to test when desperate..

6. Can I share SQL Server disks with my other workloads on the SAN ?

Well, If you can run exchange, VMware and all other workloads with SQL Server and still meet my requirement for 5,000 random 64k read/write IOPs at 8ms latency then knock yourself out !!

Otherwise a side effect of sharing SQL Server storage with other workloads is that SQL Server may have to sometimes wait longer than it would hope for (8-20ms) and this will cause performance issues.

As a rule of thumb. Sharing SQL Data volumes with other workloads “can” sometimes be beneficial. For example rather than each get 1.5 disks, your 10 workloads can now enjoy 15 spindles.

As another rule of thumb, sharing log files is generally a no-no. This is because the log files need sequential IO and the sharing of workloads will degenerate this entire IO pattern to random.

As a third rule of thumb. The Service Level Agreement is king. If the end users of your SQL Server need “guaranteed” performance of max 50ms per query, then you will need to supply the SQL Server with storage that can guarantee “8ms” per IO request. Sharing disks with exchange might not be such a good idea here.

7. How do I know that my storage meets SQL Servers requirements?

This is quite easy. there is a freely available tool called “SQLIO.EXE” which you can run and it will output the amount of IOPS that the volume can do along with avg and max latency figures. the results look something like the sample below

sqlio v1.5.SG
using system counter for latency timings, 3306972 counts per second
parameter file used: param.txt
    file E:\testfile.dat with 1 thread (0) using mask 0x0 (0)
1 thread reading for 120 secs from file E:\testfile.dat
    using 64KB random IOs
    enabling multiple I/Os per thread with 6 outstanding
using specified size: 5000 MB for file: E:\testfile.dat
initialization done
throughput metrics:
IOs/sec:   500.59
MBs/sec:    31.28
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 8
Max_Latency(ms): 690

ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  1  1  4 10 11 13 15 10  8  6  4  3  2  2  1  2  1  1  1  1  1  0  0  0  4

Note that in the example above although the max_Latency is recorded as 690ms, we can see that 96% of the IO did return in 20ms or under, so this might be very acceptable.

If you speak you your friendly SQL consultant (cough Prodata) they will have some hand rolled tool that automates SQL IO measuring and presents back a nice graph that looks something like:


We can see from the chart above that the storage can sustain 8,944 IOPS and still hit SQL requirements (8ms avg and 20ms max). As it goes beyond that the SAN is saturated and latency massively increases.

8.How do I convince the SAN vendor to meet SQL Servers requirements?

Most of the conversations I have with poor SAN performance are based around the scenario where requirements are not met. how do you convince the SAN team or vendor to help meet the basic requirements for running SQL Server?

Well, ultimately its going to cost cash. There are some quick wins we discussed: block size, raid type, increasing the spindle count, and volume alignment. It may also need some help from a performance tuning specialist to highlight bottlenecks in IO such as HBA cards, switches, VMware environment slack of multipathing, or driver related.

The “best” advice is to convince your SAN vendor to let you test their SAN BEFORE you pay it. pre sales help is a lot cheaper that post sales. In Ireland a lot of the SAN manufacturers even have performance centres and know a lot about SQL Server. Not favouring anybody, but a big shout out to the EMC guys who regularly attend every advanced SQL training seminar there is going in Ireland and really know their stuff.

If you can go armed with your requirements (remember: 5,000 random 64k IOPS at 8-20ms), they can show you a configuration meeting those requirements and everyone will be happy.

9. How do I know if things are working well or not ?

Apart from your SQL users moaning you will know if your storage is causing a performance problem for SQL Server by using a few techniques:

The most basic is the windows perfmon counters discussed, You can monitor the IOPS and latency. If you are seeing consistent latency > 20ms then you have a problem.

Another route is to check SQL Server’s wait stats. SQL Server records every single time a query is halted from running due to resource waits and you can query these to see what the biggest problems are. At a high level:

  • the WRITELOG wait stat indicates that the log file cannot be written to fast enough
  • PAGEIOLATCH or ASYNC_IO_COMPLETION and IO_COMPLETION is a sign of potential disk subsystem issues.

10. Useful links on SQL IO and Storage Area Networks

Storage Top 10 Best Practices


Complete I/O Best practices whitepaper - http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx

Predeployment I/O Best Practices - http://sqlcat.com/whitepapers/archive/2007/11/21/predeployment-i-o-best-practices.aspx


SQLIO Disk Subsystem Benchmark Tool 

Disk partition Alignment – Make the Case by Jimmy May


Performance  Tuning with Wait Stats by Joe Sack


Page List

Page List