SQL Centre of Excellence

We all know that Analysis Services uses the windows file system cache when querying, but how does it use the file system write cache when processing dimensions and measure groups?

I couldn't find much information in the will, so opened up the wonderful process monitor tool from sysinternals to check it out.

As background you'll need to get some appreciation for the windows API calls "CreateFile” and WriteFile and how they facilitate file caching which is documented here

I looked at three scenarios:

1. ProcessData on a MeasureGroup

2. ProcessFull on a dimension (specifically key stores and attribute stores)

3. Meta data files. As you may know SSAS uses a side by side processing strategy and then updates meta data XML files when the processing is complete so that the new version can now be utilised.

1. Measure Group ProcessData

The screenshot from process monitor shows that SSAS is calling the “WriteFile” API and is not specifying any flags to cause write through. We can also see the “system” process is kicking in shortly after to actually start writing the data to disk from the windows cache.

clip_image002

Interestingly enough, while the IO size is 64k into the windows file system cache we can also see that the lazy writer in the system cache optimises the IO by actually writing to disk in much larger sizes. Potentially up to 2MB in size! (see below)

clip_image002[5]

 

2. Dimension Processing

For dimension processing, we see that the CreateFile API call does not use any “pass through” flag to write through to the disk, so the windows file system cache is fully utilised (screenshot below) in exactly the same way as measure group processing

clip_image002[7]

3. Meta Data Files

when processing meta data files we can see that Analysis Services forces a flush to the disk subsystem by calling the “FlushBufferFile” API call to force the write to the disk. Assumabley this is to help ensure that server failure does not result in a the switch to the new version of an object not happening. Note that this flushing does not seem to happen when processing measure groups.

clip_image002[9]

 

So How useful is this information ?  Well it does indicate two things:

1. Having a larger block size for processing may be beneficial as although writes are in 64k for measure groups the lazy writer may aggregate this up to 2MB in size. Will try to lab test this …

2. The actual writing to the disk continues in the background by windows cache Lazy Writer for some time after SSAS reports that processing has “completed”. In my simple test I could see the lazy writer still writing to the disk some 4-5 seconds after processing was completed. This begs the question of what happens if the server or disk subsystem fails before the system lazy writer has completed the writes?

The agenda for 2012 SQL Master Class has been announced at. First one is on the 14th September!!!

http://www.prodata.ie/Events/2012/SQL-Masterclass-Programme.aspx

If you haven;t been to previous MasterClass sessions, these are advanced sessions aimed at SQL Server professionals in Ireland. We aim to bring some of the best speakers to talk on topical subjects related to SQL Server Database Engine and Business Intelligence to Ireland for short half day sessions throughout the year.  The intention is that for busy professionals this is this an ideal form of professional development.

If the topic is a new technology then we aim to bring someone who can help you get up to speed as quickly as possible and answer real world questions. If the topic is an established area then we want to bring advanced (level 400+) content which you will defiantly not find at a regular training course Ireland.

The dates and topics are:

Date Topic Speaker
Fri 14th September, 2012 Business Intelligence with Office 2013
More Details
Chris Webb
Fri 26th October, 2012 Mission Critical Availability – Availability Groups and Failover Clustering
More Details
Gavin Payne
Thu 6th Dec 2012 Azure and the Cloud for SQL Professionals
More Details
Bob Duffy
Thu 24th Jan 2013 Advanced Performance Tuning
More Details
Klaus Aschenbrenner
28th Feb 2013 Optimising SQL Server for Dot.Net Applications
More Details
Bob Duffy
Thu 25th Apr 2013 SSIS – What's new in 2012 TBA

This is a “community project”, meaning that its completely non profit. There is a small charge for the year of 350 euro to cover some speaker costs. That works out  41 euro per half day session to get access to some of the best SQL Server Server content available.

BTW do you have  a session that is not listed, if so we are always looking for feedback to shape more MasterClass Sessions. Email events@prodata.ie with feedback !

Hope to see you all there,

Bob

Thomas Kejser is going to deliver a once off SQL User Group session on Thurs 5th July at the Microsoft Atrium Building at 18:30 (http://www.mtug.ie/UserGroups/SQLServerUG/tabid/82/Default.aspx).

For those of you who don’t know Thomas, he was a principle Programme Manager for the SQLCAT team and is one of the greatest minds on SQL tuning globally. Thomas has been responsible for some of the worlds largest extreme scale OLTP and data warehouse solutions.

The sessions likely to be level 500  covering some advanced tuning topics showing how far you can really push SQL Server and the sort of challenges you will face when trying to get 100K messages a second.

If you are dealing with high volume transactions or simply want to go deeper into how to optimise SQL Server, this is a must. 

Thomas Kejsers blog is here

http://blog.kejser.org/category/grade-of-the-steel/

My Videos from SqlBits

by Bob Duffy 18. June 2012 03:33

My videos and links from SqlBits X are available online now

http://sqlbits.com/Sessions/Event10/Load_Testing_Analysis_Services

http://sqlbits.com/Sessions/Event10/SQL_2012_Alwasy_On-Deep_Dive

haven’t looked at them myself as I hate watching myself speak, but if you want to know more about either of the topics, could be worth a watch.

If you liked SQL Saturday #105 in Dublin, or if you missed it. SQL Saturday #162 is in Cambridge/UK on 8th September.

We’re going to go over for the craic and to soak up some learning's from some experts in the wider SQL community.

The registration and list of sessions so far is here:
http://www.sqlsaturday.com/162/eventhome.aspx

When setting up an AlwaysOn Availability Group with a synchronous replica you may find that the secondary replica never becomes synchronized and you are presented with the following error “Data synchronization state of some availability database is not healthy”. Effectively the databases restore to the replicas, but synchronization does not start.

The first thing to check is network connectivity. We must bear in mind that the setup Wizard uses the netbios name and the endpoint then transmitts the compressed log stream uses the fully qualified domain name on the endpoint. (AlwaysOn uses the same endpoints as mirroring with the same default port -  5022)

In my case the cluster installs a hidden virtual network device for heartbeat communication and this network connection had made its way to the top of the binding order, messing up connectivity.

You can double check the network by using ipconfig to verify the binding order and also pinging the node and its partners to check correct IP addresses are used.

We can see from the below that the ip used is one from the failover cluster and not the proper network card

image

The screenshot below shows that routing is coming from the hidden failover
cluster virtual network and not the proper network card.

image

 

So what’s the solution ?

Two options spring to mind.

a) Use fixed IP addresses for all nodes and then add entries to the “hosts” files for both netbios and fqdn.

b) Download the Nvspbind tool to change the binding order from the command line
http://archive.msdn.microsoft.com/nvspbind

After using nvspbind and rebooting I now get the proper IP address and not the automatically assigned one from the hidden “Failover Cluster Virtual Network Interface” (see below).

image

 

Lessons for Production Servers for Always On

1. When setting up a Failover cluster for Always On consider using fixed IP addresses for nodes.

2. Do a ping test of both computer name and FQDN of all nodes before creating Availability Groups!

The next SSAS Maestro course is coming to Milan, Italy on July the 9th-13th and is open for places!

http://sqlblog.com/blogs/marco_russo/archive/2012/05/07/registrations-open-for-ssas-maestro-in-july-2012-ssasmaestro-sas.aspx

For anyone serious about Analysis Services, this is a MUST attend course.Sure it’s €4,500 euro, but the early SQL Ranger courses were about €20,000 excluding accommodation and having to take a few weeks off work as a billable resources. This is only a week for a chance to learn from some of the top SSAS minds on the planet.

I really like how the Maestro has evolved from an MS internal and SQLCAT initiative into a privately run offering. I was lucky enough to attend the second series of internal rotations and it looks like Chris, Marco and Thomas have taken on board a lot of feedback from earlier attempts:

  • Its now open to anyone, but you’d probably need to have at least a good few years experience to get the most out of the course
  • It has been expanded to a full week. The original was only really three days and bulleting through 60 slides and hour its hard to soak as much in as you need to.
  • Most importantly Marco and Chis have committed to have results out within three months (with 2 months for time to submit homework). This is a breath of fresh air as marking on the first two rotations was a lot longer than anyone would have hoped for, up to 9 months!
  • On my rotation the Instructors didn't get a lot of prep time to own the content, as it was shorter notice for them. Marco, Chris and Thomas will have delivered this a few times now and ironed out any kinks I am sure. As well as adding a lot of unique value from their own backgrounds. 
  • I like Redmond myself as you get more guest appearances from developers like Akshai, but Milan looks like a good setting as well. Not that you will get much spare time to sight see.
  • It has been expanded to include some Tabular content

An old complaint about Analysis Services is we don’t really have as many “rock stars” as the SQL Server DB Engine community like Paul Randal and Kimberly Tripp who have arguably the most slick, professional, and high end courses with their flawless MCM track record and the “immersion” series. This Maestro course is definitely the closest there is Winking smile

Personally I would love this to be expanded to a three week course and cover a broader area such as: SSIS, SSRS, SharePoint, and more on MDX and DAX tuning. This will help it grow into the “BI MCM” Maybe there could be 2-3 weeks with each week optional – this is the format that the immersion now follows.

For those of you who didn't get to my sponsor session at #sqlsat105, I’m presenting a deep dive on SQL 2012 Always on at the Dublin SQL Users Group on the  23rd May in Microsoft Atrium Building, details are here:

http://www.mtug.ie/Home/tabid/38/ctl/Details/Mid/369/ItemID/100/Default.aspx?ContainerSrc=[G]Containers/_default/No+Container

If you are working with high availability or using replication to offload reporting, this session is a must see for you.

I plan to step through the main features of SQL 2012 Always on and migrate a live enterprise application from SQL 2005 to SQL 2012 to always on, while demoing features such as:

  • zero down time migration
  • Installation and Configuration
  • planned failover
  • cluster enhancements
  • unplanned failover
  • Using secondary replicas
  • Disaster Recovery
  • Some insight into the deeper workings: Virtual Network Names, Listeners, node voting, quorum, health model, etc.

Come along on the 23rd May. I’m planning to go for some pints afterwards too, so will be around  to discuss anything SQL or BI related.

Note this is on Wed, not the usual Tuesday night!

BISM allows you to start either an PowerView or an Excel report direct from the model in a SharePoint Gallery. Pretty cool stuff and I really like how we have equal choice between which tool to use. You can start the report simply by clicking one of the linked report icons top right in a Gallery view (see below)

image

One initial problem we had is the Excel report does not work out of the box with BISM. The Excel 2010 client will not recognise the linked report and you will get a dialog box asking for the Data Link properties (see below)

image

The solution is to install the SQL 2012 Microsoft Analysis Services OLEDB Provider for Microsoft SQL Server 2012 which is part of the free feature pack below.

http://www.microsoft.com/download/en/details.aspx?id=29065

Thanks to Greg Galloway (fellow MVP) who pointed me in the right direction

http://www.artisconsulting.com/blogs/greggalloway/default.aspx

As you move your platform from running on a single database and single server to running on a highly available scenario, be that a Mirror, AlwaysOn, or a cloud based platform like Windows or SQL Azure, you will find out that highly available does not equal highly reliable! The service will still go down during failover and this will not be covered by your “up time SLA”.

The solution is unfortunately the need to re-code the application tier to detect these “transient” errors and retry the offending code – hoping that the service is now running again on another node.

I will explore three common scenarios that need fault retry logic:

  • SQL Mirror (SQL 2005+)
  • SQL Always On (SQL 2012)
  • Windows or SQL Azure

1) Dealing with Fault Retry Logic on a SQL Mirror

The sales blurb may claim that a SQL Mirror is zero down time with its “transparent” client redirection.  The truth is that during a failover you will loose your current connection, and a rollback of any uncommitted transactions will occur.

If you are happy that your application crashes for a little while (e.g. web users see say and error page), and that in progress transactions are lost, then this is ok. Otherwise you will need the application code to support Fault Retry Logic.

The good news is that there is some sample code blocks available on codeplex that can be used as a starting point if you are developing on dot.net

http://sqlfaultretryprovide.codeplex.com/

If you are not developing on dot.net, then this code is still useful as you can read from the code the typical error codes that you need to look out for on failover. You can see the error numbers along with a nice KB article below.

 

  // The relevant error codes which are documented in http://support.microsoft.com/kb/109787
        // which seem to be worthwhile for retrying.  If errors occur which are not in this list they 
        // are simply rethrown.
        readonly static int[] RetryErrorNumbers = new int[] 
        { 
             1608,  // A network error was encountered while sending results to the front end
            10008,  // Bad token from SQL Server: datastream processing out of sync
            10010,  // Read from SQL Server failed
            10018,  // Error closing network connection
            10025,  // Write to SQL Server failed
            10053,  // Software caused connection abort
            10054,  // Connection reset by peer
            10058,  // Can't send after socket shutdown
            17824,  // Unable to write to server-side connection
            17825,  // Unable to close server-side connection
            17832,  // Unable to read login packet(s)            
        };

The sample code block uses a delegate to ensure facilitate the following algorithm:

a) Try method 
b) If it throws and exception, check to see if the exception number is one of the known errors associated with mirror failover (see above)
c) If it is a known error message then go to sleep and retry
d) each retry attempt takes exponentially longer until the max retries are reached

 

   1:  /// <summary>
   2:          /// This method invokes the supplied delegate with retry logic wrapped around it.  No values are returned.  If the delegate raises 
   3:          /// recoverable SQL server or client errors, then the supplied delegate is reinvoked after a certain amount of delay
   4:          /// (which grows exponentially for each retry) until the retry limit is exceeded, at which point the exception
   5:          /// is rethrown.  Other exceptions are not caught and will be visible to callers.
   6:          /// </summary>
   7:          /// <param name="body">The delegate to invoke within the retry code.</param>
   8:          internal void WithMirrorRetry (RetryBody body)
   9:          {
  10:              int retryCount = 0;
  11:   
  12:              while (true)
  13:              {
  14:                  try
  15:                  {
  16:                      body();
  17:                      return;
  18:                  }
  19:                  catch (SqlException se)
  20:                  {
  21:                      if (!RetryErrorNumbers.Contains<int>(se.Number))
  22:                          throw;
  23:                      else
  24:                          if (retryCount >= _maxRetries) throw;
  25:                  }
  26:                  //Sleep exponentially more each time through the retry loop
  27:                  Thread.Sleep((int)Math.Pow(_delayBase, retryCount));
  28:                  retryCount += 1;
  29:              }
  30:          }

 

2) Dealing with Fault Retry Logic for SQL 2012 Always On

In theory SQL 2012 is similar needs a similar approach to that of the Mirror Fault Retry Provider (above). However there is now an expanded list of potential error numbers with new messages specific to Always On. A quick search on sys.messages shows some 306 new messages that mention replicas or availability groups

image

I haven’t managed to find a definitive list of which ones are relevant for detecting failover, but so far in my testing I am only hitting error codes 976 and 983 on top of the standard mirror ones listed in http://support.microsoft.com/kb/109787

Hopefully there will be an official KB article or code block from the patterns and practices group to help narrow down the list of error codes to use.

 

3) Dealing with Transient Errors in Windows Azure

Just to make matters more complex, environments like Windows Azure also have “transient” errors relating to either failover, changed connections or just the nature of the cloud.

You can read about the Transient Fault Handling Block and download it from this link

A Word of Warning on Application Transactions

If your application is using Microsoft DTC for transactions, or explicitly starting database transactions outside of a stored procedure, then the failover will cause the entire transaction to roll back. Your retry logic needs to do a lot more work than just retry the last statement after re-opening the connection. It will need to retry all statements, including the start of the transactions.

The sample code on transient errors, or fault retry will not help you here. You will need to ensure that retry happens for the whole batch of statements within the application. This seems to be a very common practice with Java development, so make sure the project includes a review of this risk.

Page List

Page List