SQL Centre of Excellence

Cube Processing Deep Dive

by Bob Duffy 10. April 2014 09:58

Thanks to everyone who came to the Dublin SQL UG last night!

Here's a link to the slides from the session:

https://skydrive.live.com/redir?page=view&resid=A24CC88336E6B2AB!16565&authkey=!AGqooaaRdMJe0qo

Best Regards,

Bob

If you are coming to the Dublin SQL Users Group tonight, we have a special guest: one of the leading authors and world-class experts on Tabular models and Analysis Services – Alberto Ferrari.

We are going to finish at about 20:00 and Alberto will help answer any questions on DAX, tabular models, etc.

We also have some books as prizes at the event:

  • SQL Server 2012 Analysis Services – The BISM Tabular Model
  • Expert Cube Development with SSAS Multidimensional models

Alberto is over in Dublin teaching our course on the Tabular Model. He is back again on the 4th June to run a 3-day advanced DAX hands-on workshop.

If you are working with DAX, this will be a great chance to get deeper into the language. You can read more about the upcoming DAX course below:

http://www.prodata.ie/events/2014/Advanced_DAX_Workshop.aspx

 

 

The link for tonight's user group is below. There are over 120 people registered already!

http://www.eventbrite.ie/e/business-intelligence-datawarehousing-cube-processing-deep-dive-tickets-9549674333?aff=eorg

There are lots of good blogs on how to optimise the performance of a tabular model by looking at where the space is going. The alternative approach we will discuss here is to look at the trace file to determine where time has been spent on processing. This approach is very valuable for the MOLAP model, so can we do the same for the tabular model?

http://blogs.prodata.ie/post/Tool-for-baselining-MOLAP-Processing.aspx

The diagram below shows the events that can be viewed in SQL Server Profiler when you examine the “Progress Report End” event during model processing.

image

By importing the trace file into a SQL table and then building a PowerPivot Excel workbook on top, we can see where time has been spent.
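If you prefer to do the import in T-SQL rather than through the Profiler UI, a minimal sketch is below. It assumes the trace was saved as a .trc file, which sys.fn_trace_gettable should be able to read just like a DBEngine trace; the file path and the dbo.[Trace] table name are placeholders for this example.

-- Load the saved SSAS profiler trace into a SQL table for PowerPivot to read.
-- The file path and table name are examples only - change them to suit.
SELECT *
INTO   dbo.[Trace]
FROM   sys.fn_trace_gettable(N'C:\Traces\TabularProcessing.trc', DEFAULT);

Once the rows are in the table, the PowerPivot workbook just needs its connection pointed at it.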

On my blown-up Adventure Works example, which takes about 20 minutes to process, I can see that time is spent quite evenly on the two main fact tables, which are about 100 million rows apiece.

image

I can also see which columns took the most time to compress and break this down by table.

image

 

Limitations with Tabular Trace Files

Unfortunately we don’t get as much value out of the trace file as we do with the MOLAP trace file. There are a number of issues/bugs:

1. The ExecuteSQL complete message is recorded in the trace file as soon as the first row is returned, which makes it not very useful for estimating how long a query took to run. You would need to use a DBEngine profiler trace to examine the efficiency of the underlying queries. I hope the product team can fix this, as it would be really cool to be able to determine how long the query took to run just by looking at the trace file.

2. A lot of the events are “blocking”, so the total duration is pretty much the same for the different types of event (ReadData, VertiPaq, Process). As a result we can see where time was spent at a high level, but not what it was spent on.

 

A Better Approach

While the trace file approach is “ok” for baselining processing time, you are probably better off using Kasper de Jonge's space tool to determine which tables and columns need optimising – this will naturally reduce processing time.

http://www.powerpivotblog.nl/what-is-using-all-that-memory-on-my-analysis-server-instance/

image

 

Getting Started

If you want to play around with the PowerPivot model that reads in the trace file, my sample Excel workbook is below:

https://skydrive.live.com/redir?page=view&resid=A24CC88336E6B2AB!16777&authkey=!AG8Kzei2taPJtCw

Instructions

1. Capture a trace file with at least the “Progress Report End” event while processing a tabular model.

2. Import the trace file into a SQL Server data table.

3. Update the PowerPivot connection to point to the data table

4. Refresh the PowerPivot Model.

You are given a cube you have never seen before and need to make it process faster – where do you start? Do we add indexes to the data warehouse, add faster disks, delete some aggregations, add more memory, or start messing with the “INI” files? I would say none of the above!

While there are lots of resources on the internet with tips for improving processing time, the best place to start is with a “baseline”, i.e. we need to understand how long the Analysis Services database takes to process and, more importantly, where time is being spent and why. Once we understand this we can focus in on a specific area and re-baseline to see improvements in that area.

The best way to baseline is to capture a profiler trace – we actually only need a single event, “Progress Report End”. I’ll hopefully get to post the procedure for using XMLA to automate a server-side trace, but for the moment let's assume you have a trace file and want to analyse it visually.
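If you just want a quick look at the numbers before opening the workbook, a simple query like the sketch below works once the trace has been imported into a SQL table (dbo.[Trace] is a placeholder name here). EventSubClass comes through as a numeric code – the chart in the next section shows the subclass names it maps to.

-- Where did processing time go? Total duration per event subclass.
-- Assumes only the "Progress Report End" event was captured, so no filter is needed.
SELECT EventSubClass,
       COUNT(*)      AS Events,
       SUM(Duration) AS TotalDuration
FROM   dbo.[Trace]
GROUP BY EventSubClass
ORDER BY TotalDuration DESC;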

This blog shows some visualisations and data generated from an Excel tool I wrote to help analyse the trace data. Feel free to use and abuse the Excel workbook, which is attached at the end of the blog.

Understanding Analysis Services Trace events and sub events

The chart below shows the possible “event subclass” messages that Profiler will generate, and this helps tell the story of where time is being spent (and where we should look to optimise).

image

 

You can see how this corresponds to an actual trace file, which can look daunting at first.

image

 

Enter the Excel Tool for Parsing SSAS Trace Files

I have attached an Excel PowerPivot workbook I created to help analyse profiler traces for processing. The example below is based on a 100-million-row blown-up Adventure Works cube I created.

The table below shows that it took 575 seconds, or 0.16 of an hour, to do the ProcessFull.

 image

 

The table below shows us that the product dimension did take some 35 seconds. There is probably a lot of room for tuning there, but initially 35 seconds does not seem like a lot of time compared to the overall 575 seconds of processing time (we'll find out later that the product dimension does actually kill processing performance of the measure groups because of bitmap indexes).

image

The table below shows that internet sales and reseller sales are where the time is being spent. For the cube to process faster, this is where we need to look first!

image

 

The two charts below are where it starts to get interesting. What we need to do is determine which event the processing time is being spent in, and then we can look to find out why.

a) Most of the time is spent in the “Read Data” event, but the duration for “ExecuteSQL” is really small. This means that the SQL Server DBEngine is returning the data really fast, but Analysis Services is struggling to convert it into the required format.

b) 46% of the time is spent building bitmap indexes for attributes, most of which are never going to be used to slice data (e.g. phone number or email address). For some of my customers bitmap indexes can creep up to over 80% of processing time – a sure sign that we need to optimise the dimension attributes.

imageimage

The “Detailed Trace” sheet shows statistics for each event for the objects in the cube, so we can drill down to see the main offenders. From the screenshot below we can see that one partition spent 28 seconds on ReadData and a whopping 172 seconds on the bitmap indexes.

image
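If you prefer T-SQL to the workbook for this drill-down, something like the sketch below works against the same imported trace table (again, dbo.[Trace] is a placeholder, and it assumes the ObjectName column was included when the trace was captured):

-- Duration per object and event subclass - the worst offenders first.
SELECT ObjectName,
       EventSubClass,
       SUM(Duration) AS TotalDuration
FROM   dbo.[Trace]
GROUP BY ObjectName, EventSubClass
ORDER BY TotalDuration DESC;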

 

Comparing ReadData and ExecuteSQL

Where ReadData is significantly greater than ExecuteSQL you will always see a corresponding ASYNC_NETWORK_IO wait on the DBEngine. Essentially there is not much point in making the DBEngine query faster, as Analysis Services is too slow to consume the data.
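A quick way to sanity check this on the relational side is to look at the wait stats while the cube is processing. The sketch below just reads the cumulative counters, so take a snapshot before and after processing and compare the difference:

-- Is the DBEngine spending its time waiting for SSAS to consume rows?
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
FROM   sys.dm_os_wait_stats
WHERE  wait_type = N'ASYNC_NETWORK_IO';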

Why would Analysis Services be too slow to consume the data from SQL Server? Well, the most common culprits are (IMO):

  1. Incorrect data types resulting in implicit conversion, which is very slow in Analysis Services. This is discussed by Henk and Dirk in this blog (smart guys!): http://henkvandervalk.com/how-to-process-a-ssas-molap-cube-as-fast-as-possible-part-2
  2. A huge aggregation difference between the DBEngine fact data and the MOLAP cube data. You can try fixing this with a GROUP BY query, faster CPU and memory, or by considering a fact table in the data warehouse at a higher grain.
  3. Very high-grain keys on dimensions, or string keys, which are slower to map data to.
  4. In very rare cases Analysis Services cannot write data fast enough, so it has to slow down reads.
  5. Maybe the data warehouse is serving data from the buffer pool and is very quick.

What if ExecuteSQL is really high

If ExecuteSQL is really high then we have a problem on the DBEngine side of the house:

a) Are we using proper physical tables for facts, or nasty views with joins? (the most common issue by far)

b) Do we need indexes?

c) Is the data warehouse table too wide?

d) Is the data warehouse table optimised for sequential reads? E.g. are we getting that magic 512k read-ahead IO, or something much smaller and with less throughput? (a quick check is sketched below)
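One rough way to check the IO size is the virtual file stats DMV. The numbers are cumulative since the database files were opened, so compare snapshots from before and after a process run, and swap in your own data warehouse database name for the AdventureWorksDW placeholder below. If the average read size is well below 512k, you are probably not getting the large sequential reads you want.

-- Average bytes per read for the data warehouse files (placeholder database name).
SELECT DB_NAME(vfs.database_id)                            AS database_name,
       vfs.file_id,
       vfs.num_of_reads,
       vfs.num_of_bytes_read / NULLIF(vfs.num_of_reads, 0) AS avg_bytes_per_read
FROM   sys.dm_io_virtual_file_stats(DB_ID(N'AdventureWorksDW'), NULL) AS vfs;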

What if BuildIndex is High

This is one of the most common issues we face and can be tough to solve. The basic problem is that MOLAP will create a bitmap index for every single attribute in a dimension by default. We need to optimise these by:

a) Not storing too many attributes in the cube – use the data warehouse for data dumps!

b) Turn off bitmap indexes by using the AttributeHierarchyOptimizedState property

c) Turn off hierarchies for attributes that are not used

d) Ensure attributes have narrow keys (not strings or compound keys)

e) Don’t use bitmap indexes on attributes which are almost the same grain as the key.

f) Use attribute relationships as much as possible

g) Avoid large dimensions like the plague. Sure, we may have a lot of customers, but having a dimension at the transaction grain is usually a big no-no. This sort of model is not really suited to MOLAP.

h) If you do have huge dimensions and lots of attributes, be aware that a bitmap index is created in EVERY partition, so partitioning your fact table by day and then doing a ProcessFull is going to really hurt, both size-wise and processing-time-wise.

What if Aggregate is High

This means that we may have too many (or too big) aggregations on a measure group. The best practice is to try to avoid having, say, more than 20 aggregations on a measure group.

 

Locating Expensive Attributes

If you use the “cube size” workbook you can see which attributes are taking up the most space on disk – these are the ones you should check to see if they can be optimised:

http://blogs.prodata.ie/post/Disk-Space-Analyser-for-Analysis-Services-(MOLAP).aspx

If you are in a hurry then you can just look at the files in a partition folder on disk. The “Product Alternative Key” below from Adventure Works is a classic example. Across all the partitions this chews up some 177 objects and 327 MB by itself. The grain is almost the same as the product key, so we could:

a) Turn off the bitmap index

b) Use an integer key

c) Turn it into a reporting property rather than a hierarchy.

image

 

So what difference did tuning the bitmap indexes/attributes make?

I turned off the bitmap indexes on attributes that were not needed on the customer and product dimensions, and here is the improvement in cube size and ProcessIndex time for my blown-up Adventure Works.

We achieved a 42% reduction in ProcessIndex time on the basic Adventure Works. For some of my customers this is a 5-10x improvement in cube processing time!

image

Useful Links for tuning MOLAP Processing

Here are some of my favourite links on tuning cube processing:

SQL Server 2008 White Paper: Analysis Services Performance Guide

http://www.microsoft.com/en-us/download/details.aspx?id=17303

The basics of faster fact processing

http://blogs.prodata.ie/post/The-basics-of-faster-fact-processing-in-Analysis-Services.aspx

Henk's tech blog

http://henkvandervalk.com/how-to-process-a-ssas-molap-cube-as-fast-as-possible-part-1

http://henkvandervalk.com/how-to-process-a-ssas-molap-cube-as-fast-as-possible-part-2

Analysis Services Processing Best Practices

http://technet.microsoft.com/en-us/library/cc966525.aspx

Download Link for the Excel Workbook for Analysing Profiler Trace Files

Below is a link to download the PowerPivot model for analysing processing trace files.

Instructions

a) Capture a trace file while the database is processing

b) Import the trace file into a SQL Server data table (a minimal T-SQL sketch for this step is below)

c) Set the variables on the first page of the Excel workbook: Server, Database and Cube

d) Update the worksheet “Objects” by clicking the button to run the VBA macro. This is some VBA code that relates partitions to measure groups and attributes to dimensions, as the trace file doesn't contain this linkage and we need it to be able to “drill down” on performance detail.

e) Update the PowerPivot data table “Trace”. By default it uses a local database called “SSAS_Processing” and a table called “Trace”, but feel free to change the connection in PowerPivot.
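For step b), a minimal T-SQL sketch is below. It uses the default database and table names the workbook expects (SSAS_Processing and Trace); the .trc file path is just a placeholder.

-- Create the database the workbook looks for by default and load the trace into it.
CREATE DATABASE SSAS_Processing;
GO
SELECT *
INTO   SSAS_Processing.dbo.[Trace]
FROM   sys.fn_trace_gettable(N'C:\Traces\CubeProcessing.trc', DEFAULT);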

AW_Processing_1.0.xlsm

Want to learn more about cube processing and tuning?

Come to one of my sessions on cube processing!

https://www.eventbrite.ie/e/business-intelligence-datawarehousing-cube-processing-deep-dive-tickets-9549674333

I was messing around while at #sqlsat275 with Mark Stacy (blog) and Glenn Berry (blog) with a goal to benchmark some Azure images, to see if different VMs had different CPU and memory performance profiles in the Amsterdam data centre.

We used Geekbench, which is available here:

http://www.primatelabs.com/geekbench/

The initial conclusion is that, while there was not much variation in processor performance between the A2-A7 images, there was a lot of variation in memory speed between older and newer images.

On the memory benchmark, an older A4 image which I have had for 4+ months only achieved a score of 697, whereas new images consistently scored 996 in the Amsterdam centre. While the overall Geekbench score was only 4% faster, the memory performance was some 30% faster! Pretty important for workloads like SSAS.

image

 

So if you have any older images, or you get a slower one and you care about memory speed, you could I guess just provision another image until you get the one you want ;-)

How do I know if I have the faster memory?

The quickest way to test is to look at the clock speed of the CPU in either Task Manager or the computer properties. If it says “2.10” you are on the newer platform. If it says “2.09” you are on the older hardware. I don’t know if the extra “.01” on the processor is an actual clock speed difference or just something Azure does so that they can internally distinguish the newer hardware (I suspect the latter).

image

Unfortunately I haven't figured out a way to get the memory clock speed in Azure (yet). Tools like CPU-Z don’t report this in virtual environments, but Geekbench can help here. Any suggestions welcome.

Is Dublin any faster than Amsterdam?

I couldn’t resist a quick test of a Dublin VM to compare it on Geekbench to an Amsterdam VM. The CPU benchmark scores were almost identical, but the memory was very slightly faster again (around 2% faster).

image

 

Now don't get me started on disk performance!! Quite a hot topic in the SQL community at the moment. It does seem like Azure has some work to do before a lot of SQL professionals start to take it seriously IO-wise.

Don't forget that soon the A9 images may become available for IaaS which should help memory performance no end. Hmmm 1600 memory clock speed…yum.

http://blogs.prodata.ie/post/Hope-for-running-SSAS-in-the-cloud-with-e2809cBig-Computee2809d!.aspx

A lot has changed since the last time I spoke in Copenhagen on Azure and the cloud: prices have come down, features have popped up all over the place, and images have got bigger and faster. There is no doubt that the value proposition for moving to the cloud is getting stronger as time goes on.

If you can make it to my session at #sqlsat275 I hope you learn something new about what's involved in moving to the cloud, what sort of options you have, and how it affects performance.

You can download my slides for the session on “Migrating to Azure and the Cloud” here:

http://www.sqlsaturday.com/viewsession.aspx?sat=275&sessionid=18201

Thanks!

One early confusion with Power BI licensing was whether you needed to pay for both the Office 365 E3 licence at €20.00 a month and the Power BI licence at €32.70.

Well, it does look like you can just buy the Power BI licence, and this does actually include a full licence to use SharePoint Online to store documents – so if you have also paid for the E3 licence you are paying twice for the same thing WRT SharePoint Online.

 

image

Of course, one major caveat: presently some of the Power BI features, like Q+A synonyms, are mainly available if you have Office 365 as opposed to Office 2013, so there is a “better together” story for the moment.

You can’t start learning about DAX and Tabular models within SSAS without coming across Marco Russo and Alberto Ferrari, who run SQLBI in Italy. They co-authored, with Chris Webb, the leading book on Tabular model design.

The good news is we have partnered with SQLBI to bring two of their courses to Dublin.

On the 7th-8th April, a 2-day Tabular Workshop:

http://www.prodata.ie/events/2014/SSAS_TabularWorkshop.aspx

and

On the 4th-6th June, a 3-day Advanced DAX Workshop:

http://www.prodata.ie/events/2014/Advanced_DAX_Workshop.aspx

I’m super excited about these. DAX is a must-know language for anyone working with PowerPivot or the Tabular model, and who better to learn it from than the world's best experts, here in Dublin!

Hope to see you there.

Often we have advised customers wanting performance out of their Analysis Services engines, be that MOLAP, Tabular or PowerPivot, to avoid the public cloud offerings from Microsoft and in some cases even to avoid virtualisation on premise.

What – why shouldn’t we virtualise Analysis Services?

The issue is not so much the virtualisation products like Azure, Hyper-V or VMware. All of these products have no problem running Analysis Services virtualised with little overhead compared to a physical server. The days of consultants and naysayers saying that “virtualisation” is the bottleneck are largely gone, or have moved to extreme-scale cases.

The real problem stems from the choice of hardware for the host server. This is a gross oversimplification, but basically we can buy two different types of server:

1. Very fast PCs/servers with a small number of cores and excellent single-threaded and memory performance. Think of your one-socket, almost-4GHz beast you play games on, with SSDs and 1,800+ FSB. Maybe something like Aaron Bertrand's new toy: http://www.sqlperformance.com/2014/01/system-configuration/justifying-the-new-mac-pro
image

2. Servers with high capacity, lots of cores and much slower CPU and memory performance. Think of your four-NUMA-node, 1.9GHz, 80-core virtualisation host with 800 FSB RAM speed and, ahem, “Power Saving” features to slow the CPU down some more!

Guess which type of server people typically buy for virtualisation projects and guess which type suits the CPU and memory intensive workloads of Analysis Services?

I discuss hardware selection for Analysis Services more in this blog here:

http://blogs.prodata.ie/post/Selecting-Hardware-for-Analysis-Services-(10GB-1TB-size).aspx

Some of my customers have fixed this problem by deploying “high compute” pods within their VMware infrastructure, specially suited to the single-threaded sensitivity of Analysis Services, but sadly the Microsoft Azure IaaS offerings have historically been very much “one size fits all” from a compute performance perspective (RAM and core counts do vary).

Just to be clear, there is nothing stopping you from virtualising SSAS workloads right now, and I’m sure some people have and are quite happy with the “adequate” performance they get. However, performance is often an implicit requirement, and customers may not be happy when the “production” solution runs slower than, say, my 1,000 euro notebook or my old desktop.

So What is Changing? Enter “Big Compute”

After initially tackling the broader workload of web applications and OLTP, Microsoft is now starting to look deeply at analytical workloads in the cloud, both with its platform offerings and also by starting to provide VM servers aimed at high compute workloads.

http://msdn.microsoft.com/library/windowsazure/dn594431.aspx

What does “Big Compute” actually mean? Well, something like this:

  • Intel Xeon E5-2670 2.6 GHz CPU
  • 16 cores
  • 112GB DDR3-1600 MHz RAM
  • 40 Gbps back-end connectivity (wow!)

Some limitations

  • This is only available in EU West (Amsterdam).
  • This is only available for PaaS; it is not available as an IaaS image for you to install SQL Server on.

Obviously I am hoping that these limitations lift eventually, so we can put Tabular models and SSAS in the cloud without the embarrassing massive drop in performance compared to our laptops.

Call to action – ask Microsoft to:

  • Offer “small compute” images. I want a 4/8-core VM with 1,600 FSB, a 2.8+ GHz CPU and 64-128GB of RAM.
  • Offer the “Big Compute” images for IaaS and for customers with analytics workloads on SSAS. Big compute is not just for HPC nerds, guys!!

This is just a short guide or Q&A to help speakers who are submitting for SQL Saturday.

http://www.sqlsaturday.com/310/eventhome.aspx

Often we have had many more speakers and topics suggested than we can accommodate, so we have to select based on the number of tracks and what we think will appeal to the Irish technical community at the time.

How are Sessions and Speakers Selected?

We plan to use a minimum of five people from the Irish SQL community to vote on speakers and topics. This is used to narrow the list. Once the list is narrowed down we review it to make sure that we have covered the tracks as evenly as we can.

  • Some consideration is often given to speakers who are popular in Ireland because they have spoken before and scored in the top quartile at previous events. The community will recognise their names and want to hear them again.
  • We also have an allowance for speakers who “nearly” made it into a previous event but had to be bumped due to numbers or to balance the agenda. In this case we will try to rotate so that they, say, get accepted every other year and do not always “just miss”.
  • Where possible we will give local speakers a fair chance, as it's a SQL Saturday/PASS remit to grow local speaking talent in the community. In Ireland there are only a few local speakers anyway, so often we ask really new speakers to consider lightning talks or suggest they do a local UG first. SQL Saturday Dublin may not be the right event for your first public speaking experience, as it's a large and technically excellent audience, which can be intimidating.
  • As it's a Microsoft technology event, we love having Microsoft FT and product team speakers over.
  • We have to consider the tracks. In previous years, for example, we may have had only a few developer-focused sessions. So if you submitted a session called “The Entity Framework Deep Dive for Developers” it doesn't matter who you are – you would have a high probability of being selected.

What Level Should I Pitch the Talk at?

Dublin has a “lot” of senior data professionals (5+ years experience). With this in mind I would suggest:

  • If you are presenting on a technology where most people know the basics, then people may expect intermediate or advanced content – for example TSQL, indexing, mirroring, replication. Focus on deep dives and troubleshooting, and expect people to know the basics.
  • If you are presenting on an emerging technology – for example Extended Events, Azure, SQL 2014 features, PDW, Hadoop – then a lower level may be appropriate. Saying that, “Extended Events Deep Dive” may be a crowd-puller too.
  • A session entitled “Beginners Guide to Installing SQL Server” is never going to be selected, but “Automating SQL Deployments with Sysprep and PowerShell” or “Deep Dive into Troubleshooting Installation” may be.

What Tracks are you running?

We accept topics covering a few areas:

  • DBA topics such as monitoring, performance tuning, backup, data movement, high availability, features of the database engine, internals, query plans, Extended Events, virtualisation, consolidation, storage.
  • Developer topics. Either data modelling, DBEngine development (TSQL, XML, CLR) or data-related application development topics such as data caching, ADO.NET, Entity Framework, messaging, or design patterns for data-centric applications and the cloud.
  • Business Intelligence. Subjects across Power BI, MOLAP, Tabular, Emerging cloud BI capabilities, SharePoint, data warehousing, SSIS, Big Data, reporting and data visualisation.
  • The Cloud: Any subject related to managing and consuming data in the cloud
  • Special Interest: Personal Development, Certification, Soft Skills, Career growth.

What Topics are Hot this Year?

  • As it's a SQL 2014 launch event, this is a major focus. Almost everyone is new to SQL 2014 and we want to help people skill up, at least to understand its capabilities and features.
  • Some core topics we will always need to cover: performance troubleshooting, high availability and DR, TSQL best and worst practices, indexing deep dives.
  • In previous years we have been short of developer topics: The Entity Framework, emerging data access tools, data unit testing, design patterns, data caching frameworks, data modelling.

Things to Avoid

  • If you have submitted sessions to #sqlsatdublin in a previous year and they were not selected, I wouldn't recommend just blanket re-submitting them. It really isn’t a lottery, and if they were not accepted previously I would say they have zero chance of being selected.
  • If you have already spoken on a topic at a previous SQL Saturday Dublin then it may not be worth re-submitting the same session.
  • If you have been using the same topic at every other SQL Saturday across Europe it's OK to submit it, but like a movie we may have seen it before. This can be a good thing if we liked it and we think no one else in Ireland has seen it, but it can be a bad thing if we think a lot of people may have seen it.
  • Funny topics are good only IF your topic is funny. For example Rob Volk's “How to loose friends with Unicode”. However, in most cases it's better to try and use the title to help explain the content. For example, we changed one of our session titles last year from “IO IO its off to work we go” to “File and File Group Design Patterns for Optimising Storage IO”.
  • SQL 2012 is not the new kid on the block any more, so try and avoid any “Introduction to SQL 2012” topics. We are expecting talks aimed at SQL 2012 to be deeper now, unless they are aimed at very obscure features.
  • If you are presenting on a SQL 2012 feature which is improved in SQL 2014 we will be favouring sessions that take this into account. Column Store and Availability Groups are key examples here.
  • If you are reading this, and decide to choose a topic based on what you think the audience wants to see, MAKE SURE you know the topic first. We do bother to check out a speaker's fit for a topic by reading blogs, etc. We are aiming to have an awesome event and make sure that both speakers and attendees get the most from the experience – we don’t want anyone to be eaten alive because they have copied slides from books online and are struggling to fill in the blanks.

 

So Why Speak at SQL Saturday Dublin?

Here’s a blog on why William Durkin started speaking at SQL Saturdays

http://sqlsouthwest.co.uk/2014/02/09/community-interview-whats-it-like-to-speak-at-a-sql-saturday/

For speakers coming to SQL Saturday we promise you:

  • An interactive and interested audience. There are hundreds of SQL professionals in Ireland who love to learn more about SQL Server.
  • An exclusive speaker event on Friday and a unique gift for coming to Dublin. Previous years have included a party on the floating barge and a night out at the famous Guinness brewery.
  • A speaker T-shirt for your collection.
  • A copy of your feedback from the audience and your ranking (if you opt in); this may help you gain experience and credibility for speaking at other events.
  • Grow your reputation as a technical speaker in Europe.
  • Take part in the SQL Saturday activities and after event.
  • Our undying gratitude – you are awesome.

 

Hope to see you there,

Bob
