SQL Centre of Excellence

Thanks to everyone who attended my session. Here are some links that might be useful for further reading:

- Presentation Deck (attached to this blog)

- Microsoft Assessment and Planning Application Toolkit


- SQL Consolidation Planning Add-In for Excel


- Demo of using MAP and Excel to produce IO Histogram


- Demo of using MAP and PowerPivot to analyse IO for a SQL Estate


Some great questions from the audience, which I'll try to summarise here:

Q: What is the maximum recommended CPU limit to decide what SQL Servers to virtualise?
In Hyper-V this is an easy question to answer, as it only supports 4 virtual CPUs. VMware supports more cores, BUT the general recommendation is NOT to use more CPUs than are available on a single NUMA node, which is typically either 4 or 8. The reasoning is that your virtual operating system will not be "NUMA aware", and having some of the CPUs on one NUMA node and some on another will be very expensive. There is not a lot of research yet on this type of scenario, so most people avoid it. Your mileage may vary, but we avoid crossing NUMA boundaries with virtualisation.

Q: Does the MAP tool account for growth?

No, it only captures a snapshot of resource usage. My concern is not usually the growth in CPU usage and IOPS of workloads, but the fact that as time goes on MORE workloads will be added to the virtualised environment. You need to figure out what spare capacity will be allocated for this on the virtual hosts in terms of CPU, IOPS and disk space.

How much disk space you need to park for “growth” depends on how you are virtualising. If you are only moving physical SQL Servers and moving drives like for like, then they may already have had capacity planning for X time.

If you are doing SQL consolidation (moving data and log files to new environment), then this is trickier. It is possible to run the MAP Inventory tool at different months to measure % growth in databases to help plan for growth.

Q: Why don't we ask for IOPS based on the MAX amount used?

This would be cool to do for only one server, as you get guaranteed performance no matter how busy it is! However, imagine you have 100 servers, each doing an average of 50 IOPS, with an 80th percentile of 80 and a max of 250 IOPS.

Commissioning 25,000 IOPS at 8ms could be quite expensive, but 8,000 IOPS might be more practical.

Note we are not saying that the maximum IOPS we would ask for is 8,000, just that we want 8,000 at an average of 8ms latency. Hopefully the storage will be able to spike above 8,000 IOPS.

It's unlikely that all 100 servers will need their "max" at the same time, so we can save some costs with shared storage.
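The arithmetic behind this answer can be sketched in a few lines (a toy calculation using the illustrative per-server numbers above, not real measurements):

```python
# Back-of-envelope sizing for 100 servers on shared storage.
# Per-server figures are the illustrative numbers from the example above.
servers = 100
avg_iops = 50      # average per server
p80_iops = 80      # 80th percentile per server
max_iops = 250     # peak per server

sizing_for_max = servers * max_iops   # provision for every server peaking at once
sizing_for_p80 = servers * p80_iops   # provision for the 80th percentile instead

print(sizing_for_max, sizing_for_p80)  # 25000 8000
```

Provisioning for the simultaneous max costs you 25,000 IOPS; provisioning for the 80th percentile needs only 8,000, which is the saving shared storage lets you bank.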


Thanks. Do post any follow-up questions ;)












The Microsoft Assessment and Planning Toolkit is a fantastic tool for helping to gather resource usage for a SQL consolidation / virtualisation project.


One thing that is quite basic, though, is its presentation of disk IOPS (see below). This article shows how to analyse the data in a bit more detail using the underlying SQL database and Excel.


1) IOPS distribution graph

This is useful for seeing the distribution of IOPS, so we can determine what percentage of IO falls within percentile boundaries. For example, in the chart above, would we want to purchase a device only capable of performing 15 IOPS? Probably not, as this means that only 50% of IO is within performance targets. We may want 80% or 95% of IO coming in within performance targets. So how do we determine this…

a) Locate the database on the "MAPS" instance and query the database for IO broken down into 15-minute periods:

SELECT CAST(FLOOR(CAST(pd.collection_datetime AS float(53)) * 24 * 4) / (24 * 4) AS smalldatetime) AS [time],
       CEILING(SUM(pd.disk_transfers_per_sec) / COUNT(*) * COUNT(DISTINCT pd.device_number)) AS IOPS
FROM dbo.performance_disk AS pd
INNER JOIN dbo.devices AS d ON pd.device_number = d.device_number
WHERE pd.instance = N'_Total'
GROUP BY CAST(FLOOR(CAST(pd.collection_datetime AS float(53)) * 24 * 4) / (24 * 4) AS smalldatetime)

OK – I know the query is horrible, but it's a once-off!
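If you want to sanity-check what the ugly part of the query is doing, here is a small Python sketch of the same 15-minute rounding trick (the function name is just for illustration):

```python
from datetime import datetime, timedelta

def to_quarter_hour(ts: datetime) -> datetime:
    """Round a timestamp down to its 15-minute bucket, mirroring the
    FLOOR(datetime_as_float * 24 * 4) / (24 * 4) trick in the T-SQL above
    (24 * 4 = 96 quarter-hours per day)."""
    return ts - timedelta(minutes=ts.minute % 15,
                          seconds=ts.second,
                          microseconds=ts.microsecond)

print(to_quarter_hour(datetime(2011, 4, 6, 10, 37, 22)))
# 2011-04-06 10:30:00
```

Every sample collected between 10:30 and 10:45 lands in the same bucket, which is what lets the GROUP BY aggregate IOPS per interval.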

b) Copy and paste the results into Excel


c) Click Insert > PivotTable and make a pivot table showing the IOPS on the rows and the count of time (twice) on the columns. Name the first count of time "Freq" and the second "Quartile".


d) Format the second series (which we named "Quartile") as "% Running Total In" IOPS


e) Click Insert > 2D Line Chart, right-click on the second series (Quartile), select Format Data Series > Plot Series on Secondary Axis, and tidy up the chart with nice formatting.
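For anyone who wants to see what the pivot is computing under the hood, here is a rough Python equivalent of the "Freq" and "% Running Total In" columns (the sample IOPS figures are made up):

```python
from collections import Counter

# Illustrative per-interval IOPS values, standing in for the query output.
iops_samples = [12, 15, 15, 18, 19, 20, 22, 25, 25, 30, 34, 37]

freq = Counter(iops_samples)          # the "Freq" column
total = len(iops_samples)
cumulative = 0
for iops in sorted(freq):
    cumulative += freq[iops]
    running_pct = 100 * cumulative / total   # the "% Running Total In" column
    print(f"{iops:>3} IOPS  freq={freq[iops]}  running={running_pct:.0f}%")
```

Reading off the IOPS value where the running total crosses 80% or 95% gives you the percentile figures used to specify storage requirements.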



We can now see some interesting data: while the average IOPS is about 19.5, the 80th percentile is about 25 and the 90th percentile is about 34.

This might help specify SAN requirements as we could now say that our workload requires:

  • 25 IOPS at a latency of 8ms (80th percentile)
  • 34 IOPS at 20ms (90th percentile)
  • a max of 37 IOPS (100th percentile)

Ok, these are just small play numbers, but the concept is there. Some wider points to consider:

a) This is assuming that we don’t want to massively increase performance

b) We may need to add more IOPS for future growth and other factors

A sample Excel sheet with results is attached. Love to hear if anyone finds this useful!


Download File - IO_Distribution.zip

As mentioned previously (http://blogs.prodata.ie/post/Using-MAP-Tool-and-Excel-to-Analyse-IO-for-SQL-Consolidation-Part-I-e28093-basic-distribution.aspx), the MAP tool has a nice database containing all your disk IOPS data, neatly held in a single table called "performance_disk".

We can use PowerPivot to analyse the IOPS from different perspectives and start to apply some logic, such as: excluding non-business hours, seeing demand by date, time and other factors, and examining read/write ratios.

I’ve attached a sample PowerPivot mashup that shows us some more details on one sheet (sample below)


As it's PowerPivot, we can do some funky stuff like dragging "ServerName" onto a chart to see the breakdown by actual servers, to help locate the IO gobblers (see below).

Non-BI folk can stop reading now. Go away and play with the PowerPivot sheet, refresh it against your MAPS database, and hopefully it's useful.


For those BI folk, a few interesting challenges for PowerPivot:

  • The IOPS measure is semi-additive: we need to average the IOPS across time, but then sum it across servers to get the correct figure. We do this by simply averaging the IOPS and then multiplying by the number of unique servers. The number of unique servers is determined by the DAX expression as below:
  • For a better distribution graph we want to "band" IOPS into buckets, for example 20-25, 25-30 and so forth. To do this we can use the CEILING DAX function, which is more powerful than the T-SQL equivalent as it allows rounding up to whole numbers like 5, 10, 50 or 100. The example below rounds IOPS up to the nearest two.
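As a rough illustration of both points outside DAX, here is a Python sketch (the sample data, server names and function name are invented):

```python
import math

def band(iops: float, width: float = 2) -> float:
    """Round IOPS up to the nearest `width`, like DAX CEILING(iops, 2)."""
    return math.ceil(iops / width) * width

# Semi-additive total: average the per-sample IOPS across time, then
# multiply by the number of distinct servers (made-up samples below).
samples = [("srvA", 21.0), ("srvA", 23.0), ("srvB", 10.0), ("srvB", 14.0)]
avg_iops = sum(v for _, v in samples) / len(samples)
distinct_servers = len({s for s, _ in samples})
total_iops = avg_iops * distinct_servers

print(band(21.1), total_iops)   # 22 34.0
```

Summing the raw samples would double-count time slices; averaging first and scaling by the distinct-server count gives the estate-wide figure the PowerPivot measure is after.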


Download File - MAP_IO_Analysis.xlsx

SSIS Deep Dive Presentation

by Bob Duffy 6. April 2011 03:13

Here are the presentation slides from the SSIS Deep Dive. Enjoy !

1. What's New

2. Design and Performance Best Practices

3. Data Warehouse Design Patterns



Thanks to everyone who came along to the SSIS SQL Masterclass last Thurs (31/05/2011). The SSIS design patterns in particular seemed to be very popular, and we had lots of requests for the sample packages I used in the demos.

I’ve attached the source code. You’ll need to play with it to get it to work as I haven’t added configuration files - that’s another session!

Some tips on getting it to work:
a) You'll need to download the source database "AdventureWorks" from CodePlex

b) The SQL Server name was "katmai1"; you can add an alias in SQL Server Configuration Manager to get it to work

c) Inside the zip file are two SQL scripts to create the extra required databases: [AdventureWorksStaging] and [AdventureWorksDWH]

d) We use two third-party components as examples, available from these links:


Here is a list of the design patterns and which samples demonstrate the pattern:





- The Full Load
- SCD Wizard
- Master Reference Load
- Basic Lookup SCD I
- Hash Lookup SCD II
- Staged Update
- Pre-staged Keys
- Merge Delta
- SQL Delta
- Partial Lookup
- Inline Infer Script
- Batch Infer
- Parent/Child Load


Drop me an email or post comments to the blog if you can’t get the samples to work or have any questions/comments.

Happy ETL’ing,




Download File - AdventureWorksETL_Sample

I’ve been involved in helping some customers move databases (user and system) off the C: drive on a few projects now. This is particularly an issue when C: is the default, or when vendors are left to install databases on a server by themselves – quite a few don’t realize where the data and log files go, even if the volumes are called “SQL Data” and “SQL Logs”.

Here is a script I have been using. Apologies that it’s not very polished, but it has worked well for me. It can be used to move one or many databases, and can either do the move or just generate the script so you can do it manually. It can also handle databases with multiple files and non-standard logical names – a few other sample scripts I found fell just short on these two points.

/* Sample script to move one or ALL user databases to another location */

DECLARE @single_database sysname        -- name of single database if only one
DECLARE @logical_name sysname           -- logical file name
DECLARE @path sysname                   -- path for data files
DECLARE @log_path sysname               -- path for log files
DECLARE @dbid bigint                    -- db id
DECLARE @db_name sysname
DECLARE @physical_name nvarchar(4000)
DECLARE @type_desc sysname
DECLARE @enableCMDPermanently bit
DECLARE @file_name sysname
DECLARE @cmd nvarchar(4000)
DECLARE @cmdExec nvarchar(4000)
DECLARE @NoExecute bit                  -- set to 1 to print the script only, not execute it

-- ============USER CONFIGURABLE VARIABLES START=============
SET @path = 'F:\Data\'                  -- destination path for all data files
SET @log_path = 'G:\Logs\'              -- destination path for all log files
SET @single_database = 'MyDatabase'     -- empty string means ALL databases on C: drive are moved
SET @enableCMDPermanently = 1
SET @NoExecute = 1
-- ============USER CONFIGURABLE VARIABLES FINISH============
-- =====Please do not edit variables below this line=========
IF NOT EXISTS (SELECT * FROM sys.configurations WHERE name = 'xp_cmdshell' AND CAST(value AS int) = 1)
BEGIN
    EXEC master.dbo.sp_configure 'show advanced options', 1
    RECONFIGURE
    EXEC master.dbo.sp_configure 'xp_cmdshell', 1 -- enable CMD
    RECONFIGURE
END

-- Table variable for db file details
DECLARE @sysfiles TABLE (
    name nvarchar(max),
    physical_name nvarchar(max),
    database_id bigint,
    type_desc nvarchar(max)
)
INSERT INTO @sysfiles (name, physical_name, database_id, type_desc)
SELECT [name], [physical_name], [database_id], type_desc
FROM sys.master_files
WHERE db_name(database_id) NOT IN ('master', 'model', 'msdb', 'tempdb')
AND DATABASEPROPERTYEX(db_name(database_id), 'Status') = 'ONLINE'
AND type_desc <> 'FULLTEXT'
AND (db_name(database_id) = @single_database
     OR (physical_name LIKE 'C:%' AND @single_database = ''))

-- Start CURSOR to iterate through database ids
DECLARE db_cursor CURSOR FOR
SELECT [database_id], db_name(database_id)
FROM @sysfiles
GROUP BY [database_id]
ORDER BY db_name(database_id)
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @dbid, @db_name
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Force disconnect active connections to database
    SET @cmd = 'ALTER DATABASE ' + quotename(@db_name) + ' SET SINGLE_USER WITH ROLLBACK IMMEDIATE'
    PRINT @cmd
    IF @NoExecute = 0 EXECUTE sp_executesql @cmd
    SET @cmd = 'ALTER DATABASE ' + quotename(@db_name) + ' SET OFFLINE WITH ROLLBACK IMMEDIATE'
    PRINT @cmd
    IF @NoExecute = 0 EXECUTE sp_executesql @cmd

    -- Move files to the new location
    DECLARE file_cursor CURSOR FOR
    SELECT name, physical_name, type_desc
    FROM @sysfiles
    WHERE database_id = @dbid
    OPEN file_cursor
    FETCH NEXT FROM file_cursor INTO @logical_name, @physical_name, @type_desc
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @file_name = RIGHT(@physical_name, CHARINDEX('\', REVERSE(@physical_name)) - 1)
        -- Build the move command and the ALTER DATABASE definition
        IF @type_desc = 'ROWS'
        BEGIN
            SET @cmdExec = 'move "' + @physical_name + '" "' + @path + @file_name + '"'
            SET @cmd = 'ALTER DATABASE ' + quotename(@db_name) + ' MODIFY FILE ( NAME = ' + quotename(@logical_name) + ', FILENAME = ''' + @path + @file_name + ''' )'
        END
        ELSE
        BEGIN
            SET @cmdExec = 'move "' + @physical_name + '" "' + @log_path + @file_name + '"'
            SET @cmd = 'ALTER DATABASE ' + quotename(@db_name) + ' MODIFY FILE ( NAME = ' + quotename(@logical_name) + ', FILENAME = ''' + @log_path + @file_name + ''' )'
        END

        -- xp_cmdshell command to move the file
        SET @cmdExec = 'exec xp_cmdshell ''' + @cmdExec + ''', no_output'
        PRINT @cmdExec
        PRINT @cmd
        IF @NoExecute = 0 EXECUTE sp_executesql @cmdExec

        -- Execute ALTER DATABASE
        IF @NoExecute = 0 EXECUTE sp_executesql @cmd

        FETCH NEXT FROM file_cursor INTO @logical_name, @physical_name, @type_desc
    END
    CLOSE file_cursor
    DEALLOCATE file_cursor

    SET @cmd = 'ALTER DATABASE ' + quotename(@db_name) + ' SET ONLINE'
    PRINT @cmd
    IF @NoExecute = 0 EXECUTE sp_executesql @cmd

    SET @cmd = 'ALTER DATABASE ' + quotename(@db_name) + ' SET MULTI_USER'
    PRINT @cmd
    IF @NoExecute = 0 EXECUTE sp_executesql @cmd

    FETCH NEXT FROM db_cursor INTO @dbid, @db_name
END
CLOSE db_cursor
DEALLOCATE db_cursor

IF @enableCMDPermanently = 0
BEGIN
    EXEC master.dbo.sp_configure 'xp_cmdshell', 0 -- disable CMD
    RECONFIGURE
END
