There are generally three reasons for partitioning an Analysis Services model:
- To improve processing time by running partitioning in parallel
- To improve processing time by allowing for incremental processing of some sort
- To improve query time where most queries only need a smaller slice of data – say a week or month.
To help test this I created an Adventure Works tabular model with about 100 million rows in the two sales tables and ran some tests on both partitioned and partitioned models. I also created an equivalent MOLAP model and ran the same tests.
1. Partitioning to Improve Processing Performance
The chart below shows total processing time for out 100 million row AdventureWorks model on an 8 core laptop with an SSD drive.
When we partitioned the models the MOLAP model dropped by 31% to around 12 minutes. the Tabular model actually took 5% longer, going from 20:54 to 21:56.
Why did the Tabular modal take longer to process when we partitioned it. Well a few reasons come to mind:
- The Tabular model does not support parallel processing of partitions.
- The underlying data warehouse was not partitioned so a full table scan was performed by the DBEngine per partition. In MOLAP this was achievable in parallel so we can take advantage of SQL Servers ability to execute parallel table scans more efficiently. we cant do this in tabular world (yet).
- We need or marshal more jobs to do the same work, which doesn't help much if they are running sequentially.
So partitioning a tabular model to improve processing may do more harm than good from this perspective.
2. Partitioning to allow Incremental Processing
This is a valid use case for partitioning. note that you do not “have” to partition to get incremental processing as the “ProcessAdd” command can be used as well. This is discussed more on Marcos blog below.
3. Partitioning to improve Query Performance
With MOLAP cubes its a well known practise to partition a model to improve storage engine performance by:
- Allowing for partition elimination
- having smaller sub cube events and less storage work
Does this hold true for the tabular model? Well to test this we ran a load test of 5,000 queries on four threads against a tabular model with and without partitioning and measured the average number of queries served per second.
With MOLAP, as we partitioned the model, we saw a 21% increase in throughput from 28 Tests/Sec to 34 Tests/Sec.
With the tabular model we saw a 11% decrease in performance, from 17.23 Tests/Sec to 15.3 Tests/Sec.
So the Conclusion?
While partitioning is often a must have for the molap world it is much more limited value in the tabular world. Consider carefully if you even need to bother partitioning your model!
This is underscored in section 4.0 of the excellent Tabular Performance Tuning Guide written by John Sirmon and Co.
of course the only way to know for sure how much partitioning may hurt you is to do some performance testing, so come to one of my load testing sessions at a SQL Saturday style event