SQL Centre of Excellence

Know any SQL Server experts who are going for a technical job interview?

Have some fun by giving them 60 seconds to answer as many of these true/false questions as possible.

#      Question                                                        True   False
1      A clustered table is better than a heap?
2      You should never use a GUID as a primary key?
3      Mirroring is better than replication?
4      The datetime2 data type should always be used over datetime?
5      Page compression is recommended over row compression?
6      Compressed tables are faster than uncompressed tables?
7      Adding SSDs will make queries faster?
8      Adding more memory will make queries faster?
9      Cursors should never be used?
10     Table variables are better than temp tables?
Bonus  The Tabular model is faster than the MOLAP model?

Thanks to Vedran Kesegic (SQL MCM) who popped over for dinner last night and inspired these interview questions.

In my session on load testing Analysis Services we cover a few areas:

  • How to create unit tests for Analysis Services using a variety of tools
  • How to capture and analyse a workload using a server-side trace
  • How to use Visual Studio to run load test scenarios and do regression testing in Excel.
  • A case study on load testing Analysis Services to support thousands of users.
  • A case study on converting an enterprise BI solution from MOLAP to tabular and testing it.

If you want to play with the deck, source code and demos I used you can download them below.

If you want to recreate the MOLAP model I used with 100 million rows in the sales table, there is a previous blog post on how to scale Adventure Works below:

I’ll post another blog entry later with my 100 million row tabular model with the same schema as Adventure Works. This is useful as you can run almost exactly the same MDX on both MOLAP and Tabular.

Deck and load test source download Link: ssas_load_test_deck_source.zip

Update (06/07/2015): As a few people have pointed out, this is more of a named-instance issue than a Tabular-specific issue. I added some details on how to optimise SSAS connections here:

http://blogs.prodata.ie/post/Optimising-Connections-to-SSAS-(MOLAPTabular).aspx

http://blogs.prodata.ie/post/Solved-slow-SSAS-connections-on-Hyper-V.aspx

 

-- original post ------------------------------------------------------------------------------------------------------

Seasoned developers are often shocked when they find out that SSAS doesn't have any support for connection pooling.

With SSAS MOLAP this didn't matter so much, as connections are much faster to acquire than with, say, the database engine; the cost is considered trivial, so we don't pool connections and neither does the product.

As a quick test I wrote a console application to open 1,000 connections on both the MOLAP and Tabular engines and counted the time taken.

[Image: chart of time taken to open 1,000 connections against MOLAP and Tabular]

In this test the Tabular engine was some 236% slower at opening connections; on a test against a remote server it was still about 2x slower.

This will really hurt workloads like Reporting Services, which opens a lot of connections for small parameter queries, and so far it has really hampered the Tabular model in competing against MOLAP models in load testing. Even if the Tabular queries are efficient, the additional work to establish a connection drags down throughput under load.

If you are writing applications that open Tabular connections at high concurrency, it certainly raises the question of whether you should manually pool the connections rather than rely on Analysis Services.
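As a rough illustration, here is a minimal sketch of what a hand-rolled pool might look like, assuming the ADOMD.NET client and a single fixed connection string. The class name and design are mine, not part of the product, and it is deliberately simplistic: no size cap, no connection validation, and it ignores the fact that each AdomdConnection carries its own session state.

Imports System.Collections.Concurrent
Imports Microsoft.AnalysisServices.AdomdClient

' Minimal sketch of a manual pool for ADOMD.NET connections (illustrative only).
Public Class AdomdConnectionPool
    Private ReadOnly _connectionString As String
    Private ReadOnly _pool As New ConcurrentBag(Of AdomdConnection)

    Public Sub New(connectionString As String)
        _connectionString = connectionString
    End Sub

    ' Hand out an already-open connection if one is available,
    ' otherwise pay the cost of opening a new one.
    Public Function Acquire() As AdomdConnection
        Dim con As AdomdConnection = Nothing
        If _pool.TryTake(con) Then Return con
        con = New AdomdConnection(_connectionString)
        con.Open()
        Return con
    End Function

    ' Return the connection to the pool instead of closing it,
    ' so the next caller skips the expensive Open().
    Public Sub Release(con As AdomdConnection)
        _pool.Add(con)
    End Sub
End Class

In real code you would want to cap the pool size, validate connections before reuse and think about whether session-scoped behaviour matters to your application; treat the above as a starting point rather than a drop-in component.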

If you are using, say, Reporting Services, it also raises the importance of considering cached datasets for re-usable parameter queries and slowly changing data.

Here is the source code if you want to repeat the test.

Imports Microsoft.AnalysisServices.AdomdClient
Module Module1

    Sub Main()
        ' Use tick counts to time 1,000 open/close cycles against each engine.
        Dim t1 As Integer = System.Environment.TickCount
        Dim t2 As Integer
        Dim t3 As Integer

        Console.WriteLine("Testing Speed of 1000 Connections to MOLAP")
        For x = 1 To 1000
            Using con As New AdomdConnection("Data Source=localhost;Catalog=AdventureWorksDW")
                con.Open()
                con.Close()
            End Using
        Next
        t2 = System.Environment.TickCount

        Console.WriteLine(String.Format("Time Taken={0} seconds", (t2 - t1) / 1000))

        Console.WriteLine("Testing Speed of 1000 Connections to Tabular")
        For x = 1 To 1000
            Using con As New AdomdConnection("Data Source=localhost\tabular;Catalog=AdventureWorksDW")
                con.Open()
                con.Close()
            End Using
        Next
        t3 = System.Environment.TickCount
        Console.WriteLine(String.Format("Time Taken={0} seconds", (t3 - t2) / 1000))

        Console.WriteLine("Press any Key")
        Console.ReadKey()
    End Sub

End Module

There are generally three reasons for partitioning an Analysis Services model:

  1. To improve processing time by processing partitions in parallel
  2. To improve processing time by allowing for incremental processing of some sort
  3. To improve query time where most queries only need a smaller slice of data – say a week or month. 

To help test this I created an Adventure Works tabular model with about 100 million rows in the two sales tables and ran some tests on both partitioned and unpartitioned models. I also created an equivalent MOLAP model and ran the same tests.

1. Partitioning to Improve Processing Performance

The chart below shows total processing time for our 100 million row Adventure Works model on an 8 core laptop with an SSD drive.

When we partitioned the models, the MOLAP model's processing time dropped by 31% to around 12 minutes. The Tabular model actually took 5% longer, going from 20:54 to 21:56.

[Image: chart of total processing time, partitioned vs unpartitioned, for the MOLAP and Tabular models]

Why did the Tabular model take longer to process when we partitioned it? Well, a few reasons come to mind:

  • The Tabular model does not support parallel processing of partitions.
  • The underlying data warehouse was not partitioned, so the database engine performed a full table scan per partition. In MOLAP the partitions were processed in parallel, so we could take advantage of SQL Server's ability to execute parallel table scans more efficiently; we can't do this in the Tabular world (yet).
  • We need to marshal more jobs to do the same work, which doesn't help much if they are running sequentially.

So partitioning a tabular model to improve processing may do more harm than good from this perspective.

2. Partitioning to Allow Incremental Processing

This is a valid use case for partitioning. Note that you do not “have” to partition to get incremental processing, as the “ProcessAdd” command can be used as well. This is discussed in more detail on Marco's blog below.

http://www.sqlbi.com/articles/incremental-processing-in-tabular-using-process-add
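For the curious, here is a minimal sketch of issuing ProcessAdd from managed code via AMO. It assumes a reference to Microsoft.AnalysisServices, a pre-1200 compatibility-level tabular database (which AMO exposes through the cube/measure group metaphor), and placeholder object names and filter values; Marco's article above covers the out-of-line binding approach properly.

Imports Microsoft.AnalysisServices

Module ProcessAddSketch

    Sub Main()
        ' All names below (instance, database, measure group, date key) are
        ' placeholders - substitute the ones from your own model.
        Dim srv As New Server()
        srv.Connect("Data Source=localhost\tabular")

        Dim db As Database = srv.Databases.GetByName("AdventureWorksDW")
        ' In a pre-1200 tabular database a table shows up as a measure group
        ' with one or more partitions.
        Dim mg As MeasureGroup = db.Cubes(0).MeasureGroups.GetByName("Reseller Sales")
        Dim part As Partition = mg.Partitions(0)

        ' Out-of-line query binding so only the new rows are read and appended,
        ' rather than reprocessing the whole partition.
        Dim newRows As New QueryBinding(db.DataSources(0).ID,
            "SELECT * FROM dbo.FactResellerSales WHERE OrderDateKey > 792")

        part.Process(ProcessType.ProcessAdd, newRows)
        srv.Disconnect()
    End Sub

End Module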

3. Partitioning to Improve Query Performance

With MOLAP cubes it's a well-known practice to partition a model to improve storage engine performance by:

  • Allowing for partition elimination
  • Having smaller subcube events and less storage engine work

Does this hold true for the Tabular model? To test this we ran a load test of 5,000 queries on four threads against each model, with and without partitioning, and measured the average number of queries served per second.
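The numbers below come from the Visual Studio load test rig covered in my session, but as a rough idea of what such a test does, here is a minimal hand-rolled sketch: four workers each firing their share of 5,000 MDX queries and reporting throughput. The connection string, cube name and query are placeholders, not the actual workload used here.

Imports System.Threading.Tasks
Imports Microsoft.AnalysisServices.AdomdClient

Module ConcurrencySketch

    Sub Main()
        ' Placeholder workload - swap in your own connection string and MDX.
        Dim connStr As String = "Data Source=localhost\tabular;Catalog=AdventureWorksDW"
        Dim mdx As String = "SELECT [Measures].[Reseller Sales Amount] ON 0 FROM [Model]"
        Dim totalQueries As Integer = 5000
        Dim threads As Integer = 4

        Dim sw = System.Diagnostics.Stopwatch.StartNew()

        ' Four workers, each running its share of the queries.
        Parallel.For(0, threads,
            Sub(t)
                For i = 1 To totalQueries \ threads
                    Using con As New AdomdConnection(connStr)
                        con.Open()
                        Using cmd As New AdomdCommand(mdx, con)
                            cmd.ExecuteCellSet()
                        End Using
                    End Using
                Next
            End Sub)

        sw.Stop()
        Console.WriteLine("Queries/sec: {0:N1}", totalQueries / sw.Elapsed.TotalSeconds)
    End Sub

End Module

The throughput figures that follow are from the full load test run, not from this sketch.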

With MOLAP, as we partitioned the model, we saw a 21% increase in throughput, from 28 Tests/Sec to 34 Tests/Sec.

[Image: MOLAP load test throughput, partitioned vs unpartitioned]

With the Tabular model we saw an 11% decrease in performance, from 17.23 Tests/Sec to 15.3 Tests/Sec.

[Image: Tabular load test throughput, partitioned vs unpartitioned]

So the Conclusion?

While partitioning is often a must-have in the MOLAP world, it is of much more limited value in the Tabular world. Consider carefully whether you even need to bother partitioning your model!

This is underscored in section 4.0 of the excellent Tabular Performance Tuning Guide written by John Sirmon and Co.

http://msdn.microsoft.com/en-us/library/dn393915.aspx

Of course, the only way to know for sure how much partitioning may hurt you is to do some performance testing, so come to one of my load testing sessions at a SQL Saturday style event ;-)

Developing Tabular models in Visual Studio can be super painful, as the interface gets very “sluggish” as tables get larger. In some cases customers have told me they wait 5-10 minutes at a time to, say, change the name of a column or add a calculation.

Another of the annoying things about Tabular models is that a partitioned fact can only be processed on one thread PER TABLE. If we start processing and profile the queries sent to SQL Server we will only see one of them, rather than one per partition or one per core as you get with MOLAP. There is pretty much no way around this ;-(

[Image: Profiler trace showing a single relational query during Tabular processing]

One solution is to use a smaller volume of data for development and then increase the volume once iterative development has finished.

My favourite approach so far is to use a stored procedure for each partition instead of inline SQL, and then use TABLESAMPLE to reduce the volume. We can then have a setting which we change to control the percentage of facts we want in the model: just change the setting and reprocess to increase the volume. Here is a simple example based on the Adventure Works Reseller Sales.

ALTER PROCEDURE usp_FactResellerSalesPartition
    @MinDateKey int = 762
    ,@MaxDateKey int = 792
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql nvarchar(max)
    DECLARE @PercentSample int = 1    -- set this dynamically from, say, a settings table

    -- Build the partition query, adding a TABLESAMPLE clause when only a sample of the facts is wanted
    SET @sql = 'SELECT * FROM [dbo].[FactResellerSales] '
    IF @PercentSample < 100
        SET @sql = @sql + 'TABLESAMPLE (' + CONVERT(varchar(3), @PercentSample) + ' PERCENT) REPEATABLE (100)'
    SET @sql = @sql + ' WHERE OrderDateKey BETWEEN @MinDateKey AND @MaxDateKey'

    EXEC sp_executesql @sql, N'@MinDateKey int, @MaxDateKey int', @MinDateKey, @MaxDateKey
END
GO
EXEC usp_FactResellerSalesPartition

Within Visual Studio we can then just call the stored procedure within the Partition Manager dialog box.

[Image: Partition Manager dialog in Visual Studio calling the stored procedure]

An added benefit of this approach is that if we need to add a column to the fact table we only need to update the single stored procedure rather than every single hard-coded T-SQL statement!

Note that we “could” also use a TVF, and we actually do this in a lot of customer systems, but for Tabular models I chose to use a stored procedure as TABLESAMPLE is not supported within a TVF.

As a final note, TABLESAMPLE needs a physical table rather than a view. This is a best practice for most models anyway, but if you are using a view you'll need to expand the view out into the stored procedure, or use the ETL to create a physical table.

Using this approach I got my processing time in Visual Studio down from 30 minutes to 90 seconds. Much nicer, and iterative development was a dream.

Anyone like this approach or have other suggestions? Leave a comment.

Bob’s Speaking Schedule early 2014

by Bob Duffy 19. February 2014 12:38

Here’s a list of events I am speaking at this year so far. I foolishly have different topics at each event rather than delivering the same content every time.

1) Load Testing Analysis Services. 1st March. #sqlsat278 Budapest

Ever deployed an Analysis Services cube that worked perfectly well with one user on the development server, only to find that it doesn't meet the required volumes of user concurrency? This session focuses on tools and methodology to load test Analysis Services in highly concurrent environments, including how to locate resource bottlenecks and the configuration settings that can help improve performance. Sample source code will be supplied to help you load test Analysis Services. We will be focused on the MOLAP engine, but the techniques are equally applicable to the Tabular model.

www.sqlsaturday.com/227/eventhome.aspx

A video of an older version of this presentation at SqlBits in 2012 is available online:

http://sqlbits.com/Sessions/Event10/Load_Testing_Analysis_Services

2) Migrating to Azure and the Cloud. 29th March #sqlsat267 Denmark

So your CTO wants you to migrate your mission critical database application to the cloud. What's involved and what are the pitfalls? In one hour we will take a sample ecommerce application and database, show tools available for both schema and data migration and then demo performance and scalability impacts.

Both database and application options are discussed.

 http://www.sqlsaturday.com/275/eventhome.aspx

3) Cube Processing Deep Dive. 8th April Dublin SQLUG, Ireland

Ever wondered what actually happens when a cube is “processed”, why it takes so long, how you can configure and optimise cube processing, or what strategies people commonly use for incremental processing?

This session provides a deep dive into cube processing for MOLAP cubes, to help understand how it works and what you can do to work with it.

Come to this session if you are working with Microsoft Analysis Services from a development, administrative or architectural perspective.
Most of the content is aimed at MOLAP cubes, but we will also include a brief discussion of the newer Tabular models.

http://www.eventbrite.ie/e/business-intelligence-datawarehousing-cube-processing-deep-dive-tickets-9549674333?aff=eorg

4) Optimising a dot.net Solution on the SQL Platform. 12th April #sqlsat267 Portugal

In this case study we will take a sample dot.net ecommerce application running in a load test harness and show common performance worst practices, how to correct them and the impact on performance. During a series of live demos we aim to test just how far we can optimise transactional processing. This session aims to help developers and DBAs understand the performance impact of choices made within a dot.net application and its interaction with SQL Server.

www.sqlsaturday.com/267/eventhome.aspx
