SQL Centre of Excellence

I’m a huge fan of RML utilities for SQL DB Engine workload analysis. It’s a fantastic tool for capturing a trace and pointing a finger at offending users, applications, batches and queries, and you can even see plans and resource usage.

You can download RML from this link


One issue is that the Report Viewer hyperlinks don't work out of the box on Windows 8.1. A patch is needed for the SSRS 2012 Report Viewer component, which you can download here.


The quality of RML utilities is one of the biggest reasons that our SQL consultants (including myself) are slow to adopt extended events in a mainstream fashion. While extended events can have great instrumentation, they still lack the presentation layer and analytics that RML brings. Who knows, maybe we can convince CSS to write an extended events version of RML utilities; that would change the scene overnight!

It’s no secret that optimising distinct count in SSAS/MOLAP is painful. The normal optimisation steps are covered in various papers, such as:

- INI file settings to increase PageSize

- Separate Measure Groups

- Trying to partition by the distinct count item so that each partition’s min and max values don't overlap.

We recently did all this for a customer but noticed one annoying thing. When we fired up SQL Profiler and ran a simple query, we were flooded with “Progress Report End” noise, all hitting the same partition.


In the case of one customer this generated over a million profiler events, even though they only had 150 or so partitions. On the blown-up Adventure Works cube with 150 million rows, this generates some 1,500 events when there are only four partitions!

I chatted with Microsoft CSS and with Alex Whittles, who has done some interesting benchmarks on Distinct Count (http://www.purplefrogsystems.com/blog/2014/03/analysis-services-tabular-or-multidimensional-a-performance-comparison/).

The reason for this seemingly annoying flood is that Distinct Count outputs a progress report message PER SEGMENT, and it pretty much has to scan all segments. I’m cool with it having to scan all the segments (this is part of the challenge of the algorithm), but does it “really” have to send a message to profiler every 64k or so…

So far feedback from MS is that this behaviour is by design. I can’t help but think that such verbose instrumentation must hurt performance somewhere.

If you do notice this, don’t panic: it’s by design. Of course there are many funky ways to avoid a physical distinct count, but that’s another blog!


I filed a Connect item asking the product team to tone down the verbosity of the Distinct Count events. Please upvote it if you come across this.


Bitmap Indexes (*.map files) play an important role within Analysis Services. They provide a mechanism for the storage engine sub cube event to efficiently locate the relevant segments within the fact file without having to scan the whole thing. This is how you can have a 20 GB *.fact file in the partition but still get storage engine events of only 50ms or so.

However, it can be best practice to turn off Bitmap Indexes in a few scenarios:

a) Where the attribute is not used much.

b) Where the attribute is really just a related property of an existing attribute.

c) Where the cardinality of the attribute is pretty much the same as the key.
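The three rules of thumb above can be expressed as a simple screening function. This is a minimal sketch, assuming you have already collected per-attribute metadata yourself: the `AttributeInfo` fields, the query-usage count and the 0.9 cardinality ratio are hypothetical choices for illustration, not SSAS properties or official thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttributeInfo:
    """Hypothetical per-attribute metadata gathered by hand or by your own tooling."""
    name: str
    cardinality: int                   # distinct members of this attribute
    key_cardinality: int               # distinct members of the dimension key
    query_count: int                   # how often queries use it (e.g. from a query log)
    property_of: Optional[str] = None  # attribute it merely describes, if any


def bitmap_index_candidates(attrs: List[AttributeInfo],
                            min_queries: int = 1,
                            grain_ratio: float = 0.9) -> List[str]:
    """Return attributes where turning off the bitmap index is worth considering.

    min_queries and grain_ratio are assumed thresholds, not documented values.
    """
    candidates = []
    for a in attrs:
        rarely_used = a.query_count < min_queries                 # rule a)
        just_a_property = a.property_of is not None               # rule b)
        near_key_grain = (a.key_cardinality > 0 and               # rule c)
                          a.cardinality / a.key_cardinality >= grain_ratio)
        if rarely_used or just_a_property or near_key_grain:
            candidates.append(a.name)
    return candidates


if __name__ == "__main__":
    attrs = [
        AttributeInfo("Color", 10, 600, query_count=50),
        AttributeInfo("Product Alternate Key", 590, 600, query_count=5),
        AttributeInfo("Weight", 200, 600, query_count=0),
        AttributeInfo("Large Photo", 300, 600, query_count=3, property_of="Product"),
    ]
    print(bitmap_index_candidates(attrs))
```

Anything the function flags is only a candidate; it still needs checking against real reporting requirements before AttributeHierarchyOptimized is changed.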

Further reading on this is below



One word of warning: when you turn bitmap indexes off or on by adjusting the AttributeHierarchyOptimized property, this may not actually have the desired effect.

The key thing we need to be aware of is that the AttributeHierarchyOptimized property can be set in TWO places, and just changing it in the shared dimension may not actually change it for your cube:

  • Once in the shared dimension
  • Once in the cube dimension

1) Changing AttributeHierarchyOptimized on the shared dimension



2) Changing AttributeHierarchyOptimized on a cube dimension



This caused havoc on a recent tuning project where we turned off too many bitmap indexes in the normal dimension editor and then didn’t realise that when we turned them back on in the same place they were now stuck in the cube in the “off” state.

Now this behaviour is pretty much by design. The intention is that turning off bitmap indexes in the shared dimension will turn them off for all cubes, but if you leave them on, you are free to turn them off on specific cubes. However, it can lead to some confusion!
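One way to reason about the two settings is to treat the effective state as the more restrictive of the two. The sketch below encodes that reading of the behaviour described above; it is an assumed mental model, not SSAS code. The `FullyOptimized`/`NotOptimized` strings are the values I'd expect to see for the property in the designer, and `find_stuck_off` is a hypothetical audit helper that flags the “stuck off” case we hit.

```python
from typing import Dict, List, Tuple

FULLY_OPTIMIZED = "FullyOptimized"
NOT_OPTIMIZED = "NotOptimized"


def effective_state(shared_dim: str, cube_dim: str) -> str:
    """Assumed model: a bitmap index is built only if neither level turns it off."""
    if NOT_OPTIMIZED in (shared_dim, cube_dim):
        return NOT_OPTIMIZED
    return FULLY_OPTIMIZED


def find_stuck_off(settings: Dict[str, Tuple[str, str]]) -> List[str]:
    """List attributes that are on in the shared dimension but still off in the cube.

    `settings` maps attribute name -> (shared dimension state, cube dimension state),
    collected by hand or by your own (hypothetical) scripting.
    """
    return sorted(
        name for name, (shared, cube) in settings.items()
        if shared == FULLY_OPTIMIZED and cube == NOT_OPTIMIZED
    )
```

For example, `find_stuck_off({"Email Address": (FULLY_OPTIMIZED, NOT_OPTIMIZED), "Color": (FULLY_OPTIMIZED, FULLY_OPTIMIZED)})` returns `["Email Address"]`: the attribute we thought we had switched back on.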

Speaking at #SqlBits

by Bob Duffy 17. April 2014 08:20

Just got the emails, and I’ve been selected to speak at SqlBits on the 18th and 19th of July. Looking forward to Europe's largest SQL event!


My sessions are:

  • Friday - Optimising Cube Processing
  • Saturday - Migrating to the Cloud

In addition, I am making a special guest appearance in Carmel's session as “data monkey”:

  • Saturday – The Irish Economic Crisis Visualised with Power BI

Now to think up some funky ideas for the steampunk costumes.

Cube Processing Deep Dive

by Bob Duffy 10. April 2014 09:58

Thanks to everyone who came to the Dublin SQL UG last night!

Here's a link to the slides from the session:


Best Regards,


If you are coming to the Dublin SQL Users Group tonight, we have a special guest: one of the leading authors and world-class experts on Tabular Models and Analysis Services, Alberto Ferrari.

We are going to finish at about 20:00, and Alberto will help answer any questions on DAX, tabular models, etc.

We also have some books as prizes at the event:

  • SQL Server 2012 Analysis Services – The BISM Tabular Model
  • Expert Cube Development with SSAS Multidimensional models

Alberto is over in Dublin teaching our course on the Tabular Model. He is back again on the 4th of June to run a 3-day advanced hands-on DAX workshop.

If you are working with DAX, this will be a great chance to get deeper into it. You can read more about the upcoming DAX course below:




The link for tonight's user group is below. There are over 120 people registered already!


There are lots of good blogs on how to optimise the performance of a tabular model by looking at where the space is going. The alternative approach we will discuss here is to look at the trace file to determine where time has been spent on processing. This approach is very valuable for MOLAP models, so can we do the same for tabular models?


The diagram below shows events that can be viewed in SQL profiler when you examine the “Progress Report End” event during model processing.


By importing the trace file into a SQL table and then building a PowerPivot Excel workbook on top we can see where time has been spent.

On my blown-up Adventure Works example, which takes about 20 minutes to process, I can see that time is spent quite evenly on the two main fact tables, which are about 100 million rows apiece.


I can also see which columns took the most time to compress and break this down by table.
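If you would rather do this roll-up outside PowerPivot, the grouping itself is simple. This is a minimal sketch that assumes the trace rows have already been loaded and filtered to the compression events, and that each row carries hypothetical `Table`, `Column` and `Duration` fields. A raw trace stores object names and paths, so deriving the table and column is your own (assumed) step at import time.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def _ranked(totals: Dict) -> List[Tuple]:
    """Sort (key, total) pairs from most to least time."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def time_by_table_and_column(rows: Iterable[dict]):
    """Sum Duration per table and per (table, column).

    Rows are assumed to be pre-filtered to compression-type events.
    """
    by_table: Dict[str, int] = defaultdict(int)
    by_column: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in rows:
        by_table[row["Table"]] += row["Duration"]
        by_column[(row["Table"], row["Column"])] += row["Duration"]
    return _ranked(by_table), _ranked(by_column)
```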



Limitations with Tabular Trace Files

Unfortunately we don’t get as much value out of the tabular trace file as we do with the MOLAP trace file. There are a number of issues/bugs:

1. The ExecuteSQL complete message is recorded in the trace file as soon as the first row is returned, which makes it not very useful for estimating how long a query took to run. You would need to run a DBEngine profiler trace to examine the efficiency of the underlying queries. I hope that the product team can fix this, as it would be really cool to be able to determine how long the query took to run by looking at the trace file.

2. A lot of the events are “blocking”, so total duration is pretty much the same for the different types of event (ReadData, VertiPaq, Process). As a result we can see where time was spent at a high level, but not what it was spent on.


A Better Approach

While the trace file approach is “ok” for baselining processing time, you are probably better off using Kasper de Jonge’s space tool to determine which tables and columns need optimising; this will naturally reduce processing time.




Getting Started

If you want to play around with the PowerPivot model that reads in the trace file, my sample Excel workbook is below.



1. Capture a trace file with at least the “Progress Report End” event while processing a tabular model.

2. Import the trace file into a SQL Server data table

3. Update the PowerPivot connection to point to the data table

4. Refresh the PowerPivot Model.

You are given a cube you have never seen before and need to make it process faster: where do you start? Do we add indexes to the data warehouse, add faster disks, delete some aggregations, add more memory, or start messing with the “INI” files? I would say none of the above!

While there are lots of resources on the internet with tips for improving processing time, the best place to start is with a “baseline”, i.e. we need to understand how long the Analysis Services database takes to process and, more importantly, where time is being spent and why. Once we understand this, we can focus in on a specific area and re-baseline to see improvements in that area.

The best way to baseline is to capture a profiler trace, and we actually only need a single event: “Progress Report End”. I’ll hopefully get to post the procedure for using XMLA to automate a server-side trace, but for the moment let’s assume you have a trace file and want to visually analyse it.

This blog shows some visualisations and data generated from an Excel tool I wrote to help analyse the trace data. Feel free to use and abuse the Excel workbook, which is attached at the end of the blog.

Understanding Analysis Services Trace events and sub events

The chart below shows the possible “event subclass” messages that profiler will generate, and this helps tell the story of where time is being spent (and where we should look to optimise).



You can see how this corresponds to an actual trace file, which can look daunting at first.



Enter the Excel Tool for Parsing SSAS Trace Files

I have attached an Excel PowerPivot workbook I created to help analyse profiler traces for processing. The example below is based on a 100 million row blown-up Adventure Works cube I created.

The table below shows that it took 575 seconds, or 0.16 of an hour, to do the ProcessFull.



The table below shows us that the product dimension did take some 35 seconds. There is probably a lot of room for tuning there, but initially 35 seconds does not seem like a lot of time compared to the overall 575 seconds processing time. (We’ll find out later that the product dimension does actually kill processing performance of measure groups because of bitmap indexes.)


The table below shows that internet sales and reseller sales are where the time is being spent. If we want the cube to process faster, this is where we need to look first!



The two charts below are where it starts to get interesting. What we need to do is determine which processing event the time is being spent in, and then we can look to find out why.

a) Most of the time is spent in the “Read Data” event, but the duration for “ExecuteSQL” is really small. This means that the SQL Server DBEngine is returning the data really fast, but Analysis Services is struggling to convert it into the required format.

b) 46% of the time is spent building bitmap indexes for attributes, most of which are never going to be used to slice data (e.g. phone number or email address). For some of my customers bitmap indexes can creep up to over 80% of processing time, a sure sign that we need to optimise the dimension attributes.
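The percentages in these charts are just each phase's share of the total. A minimal sketch, assuming trace rows with `EventSubclass` and `Duration` fields and using the event names from this post as labels: only phases that do not overlap are summed, because Process encloses the other events and I am treating ExecuteSQL as overlapping ReadData, so including them would double count. The default phase list is an assumption; check which subclasses nest in your own trace.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

# Assumed non-overlapping processing phases (labels as used in this post).
DEFAULT_PHASES = ("ReadData", "BuildIndex", "Aggregate")


def phase_share(rows: Iterable[dict],
                phases: Sequence[str] = DEFAULT_PHASES) -> Dict[str, float]:
    """Percentage of total phase time spent in each phase, rounded to 0.1%."""
    totals: Counter = Counter()
    for row in rows:
        if row["EventSubclass"] in phases:
            totals[row["EventSubclass"]] += row["Duration"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: round(100 * t / grand_total, 1) for name, t in totals.items()}
```

With illustrative (made-up) durations of ReadData 300, BuildIndex 500 and Aggregate 200, plus Process and ExecuteSQL rows that are ignored, the result is 30%, 50% and 20%.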


The “Detailed Trace” sheet shows statistics for each event for the objects in the cube, so we can drill down to see the main offenders. From the below we can see that one partition spent 28 seconds on ReadData and a whopping 172 seconds on the bitmap indexes.



Comparing ReadData and ExecuteSQL

Where ReadData is significantly greater than ExecuteSQL, you will always see a corresponding ASYNC_NETWORK_IO wait statistic on the DBEngine. Essentially there is not much point in making the DBEngine query faster, as Analysis Services is too slow to consume it.
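A rough triage rule can be built from the ReadData/ExecuteSQL ratio. This is a minimal sketch of my own heuristic, not a documented rule: the `low` and `high` thresholds are arbitrary assumptions you should tune against your own traces.

```python
def triage_read(read_data_ms: float, execute_sql_ms: float,
                low: float = 0.2, high: float = 0.5) -> str:
    """Rough triage of where read time goes; low/high are hypothetical thresholds."""
    if read_data_ms <= 0:
        return "no read time recorded"
    ratio = execute_sql_ms / read_data_ms
    if ratio < low:
        # SQL Server hands rows over quickly and SSAS is the slow consumer
        # (this is where ASYNC_NETWORK_IO waits show up on the DBEngine).
        return "analysis services"
    if ratio > high:
        # The relational query itself is a large part of the read time.
        return "dbengine"
    return "mixed"
```

For a partition with 28 seconds of ReadData and 1 second of ExecuteSQL, `triage_read(28000, 1000)` returns `"analysis services"`.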

Why would Analysis Services be too slow to consume the data from SQL Server? Well, the most common culprits are (IMO):

  1. Incorrect data types resulting in implicit conversion, which is very slow in Analysis Services. This is discussed by Henk and Dirk in this blog (smart guys!): http://henkvandervalk.com/how-to-process-a-ssas-molap-cube-as-fast-as-possible-part-2
  2. Huge aggregation difference between the DBEngine fact data and the MOLAP cube data. You can try fixing this with a GROUP BY query, a faster CPU or more memory, or by considering a fact table in the data warehouse at a higher grain.
  3. Very high grain keys on dimensions or string keys which are slower to map data to.
  4. In very rare cases Analysis Services cannot write data fast enough so has to slow down reads.
  5. Maybe the data warehouse is servicing data from the buffer pool and is very quick.

What if ExecuteSQL is really high

If ExecuteSQL is really high then we have a problem with DBEngine side of the house:

a) Are we using proper physical tables for facts, or nasty views with joins? (This is the most common issue by far.)

b) Do we need indexes?

c) Is the data warehouse table too wide?

d) Is the data warehouse table optimised for sequential reads? E.g. are we getting that magic 512k read-ahead IO, or something much smaller with less throughput?

What if BuildIndex is High

This is one of the most common issues we face and can be tough to solve. The basic problem is that MOLAP will create a bitmap index for every single attribute in a dimension by default. We need to optimise these by:

a) Don’t store too many attributes in the cube: use the data warehouse for data dumps!

b) Turn off bitmap indexes by using the AttributeHierarchyOptimized Property

c) Turn off hierarchies for attributes that are not used

d) Ensure attributes have narrow keys (not strings or compound keys)

e) Don’t use bitmap indexes on attributes which are almost the same grain as the key.

f) Use attribute relationships as much as possible

g) Avoid large dimensions like the plague. Sure we may have a lot of customers, but having a dimension for the transaction grain is usually a big no no. This sort of model is not really suited to MOLAP.

h) If you do have huge dimensions and lots of attributes, be aware that a bitmap index is created in EVERY partition, so partitioning your fact table by day and then doing a Process Full is going to really hurt, both in size and in processing time.
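Point h) is really just multiplication, and a quick back-of-the-envelope calculation makes the cost visible. This is a minimal sketch, assuming one bitmap index per optimized attribute in every partition and ignoring leap years; the 3 years and 40 optimized attributes in the example are hypothetical figures, not from a real cube.

```python
from typing import Tuple


def bitmap_index_count(partitions: int, optimized_attributes: int) -> int:
    """Assumption: one bitmap index per optimized attribute in every partition."""
    return partitions * optimized_attributes


def daily_vs_monthly(years: int, optimized_attributes: int) -> Tuple[int, int]:
    """Compare daily and monthly partitioning over the same period (no leap years)."""
    daily = bitmap_index_count(365 * years, optimized_attributes)
    monthly = bitmap_index_count(12 * years, optimized_attributes)
    return daily, monthly


print(daily_vs_monthly(3, 40))  # (43800, 1440)
```

Either reducing the number of optimized attributes or partitioning at a coarser grain shrinks the number; doing both multiplies the savings.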

What if Aggregate is High

This means that we may have too many (or too big) aggregations on a measure group. Best practice is to try to avoid more than, say, 20 aggregations on a measure group.


Locating Expensive Attributes

If you use the “cube size” workbook, you can see which attributes are taking up the most space on disk; these are the ones you should check to see if they can be optimised.


If you are in a hurry, you can just look at the files in a partition folder on disk. The “Product Alternate Key” below from Adventure Works is a classic example. Across all the partitions this chews up some 177 objects and 327 MB by itself. The grain is almost the same as the product key, so we could:

a) Turn off the bitmap index

b) Use an integer key

c) Turn it into a reporting property rather than a hierarchy.
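Eyeballing folders works, but a small script can total the bitmap index files across every partition folder. This is a minimal sketch using only the Python standard library. It assumes the bitmap indexes are the `*.map` files and that files for the same attribute share a file name across partition folders; if your data directory names them differently, adjust the grouping key.

```python
from collections import defaultdict
from pathlib import Path
from typing import List, Tuple


def map_file_usage(data_dir: str) -> List[Tuple[str, int, int]]:
    """Return (file name, file count, total bytes) for *.map files, largest first.

    Assumption: files for the same attribute share a file name across
    partition folders; change the grouping key if yours differ.
    """
    stats = defaultdict(lambda: [0, 0])  # file name -> [count, total bytes]
    for path in Path(data_dir).rglob("*.map"):
        if path.is_file():
            entry = stats[path.name]
            entry[0] += 1
            entry[1] += path.stat().st_size
    return sorted(((name, count, size) for name, (count, size) in stats.items()),
                  key=lambda item: item[2], reverse=True)
```

Point it at a copy of the measure group's folder (or a read-only path) rather than a live data directory you might accidentally touch.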



So what difference did tuning the bitmap indexes/attributes make?

I turned off the bitmap indexes on attributes that were not needed on the customer and product dimensions, and here is the improvement in cube size and ProcessIndex time for my blown-up Adventure Works.

We achieved a 42% reduction in ProcessIndex time on the basic Adventure Works. For some of my customers this has meant a 5-10x improvement in cube processing time!


Useful Links for tuning MOLAP Processing

Here are some of my favourite links on tuning cube processing:

SQL Server 2008 White Paper: Analysis Services Performance Guide


The basics of faster fact processing


Henks tech blog



Analysis Services Processing Best Practises


Download Link for the Excel Workbook for Analysing Profiler Trace Files

Below is a link to download the PowerPivot model for analysing processing trace files.


a) Capture a trace file while the database is processing

b) Import the trace file into a SQL Server data table

c) Set the variables on the first page of the Excel workbook: Server, Database and Cube

d) Update the Worksheet “Objects” by clicking the button to run the VBA macro. This is some VBA code that relates partitions to measure groups and attributes to dimensions as the trace file doesn't contain this linkage and we need it to be able to “drill down” on performance detail.

e) Update the PowerPivot data table “Trace”. By default it uses a local database called “SSAS_Processing” and a table called “Trace”, but feel free to change the connection in PowerPivot.

AW_Processing_1.0.xlsm
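Step d) matters because totals per partition alone don't tell you which measure group is expensive. Once you have a partition-to-measure-group mapping (the workbook's VBA macro builds it; here it is just a hypothetical dictionary), the roll-up is a simple grouping. A minimal sketch, assuming trace rows with `ObjectName` and `Duration` fields:

```python
from collections import defaultdict
from typing import Dict, Iterable


def roll_up_to_measure_group(rows: Iterable[dict],
                             partition_to_group: Dict[str, str]) -> Dict[str, int]:
    """Sum Duration per measure group; unmapped partitions stay visible."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        group = partition_to_group.get(row["ObjectName"], "(unmapped)")
        totals[group] += row["Duration"]
    return dict(totals)
```

Filter the rows to a single event subclass (for example the partition-level Process event) before rolling up. Otherwise nested events that share an object name would be counted more than once.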

Want to learn more about cube processing and tuning?

Come to one of my sessions on cube processing!


I was messing around at #sqlsat275 with Mark Stacy (blog) and Glenn Berry (blog), with the goal of benchmarking some Azure images to see whether different VMs had different CPU and memory performance profiles in the Amsterdam data centre.

We used Geekbench, which is available here:


The initial conclusion is that while there was not much variation in processor performance between the A2-A7 images, there was a lot of variation in memory speed between older and newer images.

On the memory benchmark, an older A4 image which I have had for 4+ months only achieved a score of 697, whereas new images consistently scored 996 in the Amsterdam centre. While the overall Geekbench score was only 4% faster, the memory score was some 43% higher (or, put the other way round, the older image was about 30% slower)!! Pretty important for workloads like SSAS.
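The headline percentage depends on which score you divide by, which is easy to mix up. A quick check of the two memory scores quoted above:

```python
def pct_higher(new: float, old: float) -> float:
    """How much higher `new` is than `old`, as a percentage of `old`."""
    return 100 * (new - old) / old


def pct_lower(old: float, new: float) -> float:
    """How much lower `old` is than `new`, as a percentage of `new`."""
    return 100 * (new - old) / new


print(round(pct_higher(996, 697), 1))  # 42.9
print(round(pct_lower(697, 996), 1))   # 30.0
```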



So if you have any older images, or you get a slower one, and you care about memory speed, I guess you could just provision another image until you get the one you want ;)

How do I know if I have the faster memory?

The quickest way to test is to look at the clock speed of the CPU in either Task Manager or the computer properties. If it says “2.10” you are on the newer platform; if it says “2.09” you are on the older hardware. I don’t know if the extra “.01” reflects an actual clock speed difference or is just something Azure does so that they can internally distinguish newer hardware (I suspect the latter).


Unfortunately I haven't figured out a way to get the memory clock speed in Azure (yet). Tools like CPU-Z don’t report this in virtual environments, but Geekbench can help here. Any suggestions welcome.

Is Dublin any faster than Amsterdam?

I couldn’t resist a quick test of a Dublin VM to compare it on Geekbench to an Amsterdam VM. The CPU benchmark scores were almost identical, but the memory was very slightly faster again (about 2%).



Now don't get me started on disk performance!! Quite a hot topic in the SQL community at the moment. It does seem like Azure has some work to do before a lot of SQL professionals start to take it seriously IO-wise.

Don't forget that soon the A9 images may become available for IaaS which should help memory performance no end. Hmmm 1600 memory clock speed…yum.


A lot has changed since the last time I spoke in Copenhagen on Azure and the Cloud: prices have come down, features have popped up all over the place and images have got bigger and faster. There is no doubt that the value proposition for moving to the cloud is getting stronger as time goes on.

If you can make it to my session at #sqlsat275, I hope you learn something new about what's involved in moving to the cloud, what sort of options you have and how it affects performance.

You can download my slides for the session on “Migrating to Azure and the Cloud” here.


