SQL Centre of Excellence

As a challenge for the recent Power BI demo competition, I attempted to take our entire 90-minute Irish Economic Crisis BI demo and do three things:

  • Reduce it to six minutes, with a 2-3 minute making-of video added on.
  • Do the entire demo using Q&A (well, apart from the Power Map bit).
  • Try to get one of every type of visualisation possible with Q&A into the demo.

On the way I learned a lot about what you could and couldn’t do with the latest addition to the Power BI Stack – the natural language visualisation tool Q&A.

Getting Started

You don’t have to do anything special to get started with Q&A. Just right-click your Power Pivot workbook in the Power BI App and select “Add to Q&A”.


Once you do that, a new type of search will appear at the top of the web application.


When you are in, you can ask any question using natural language, or click on a featured question.

While you may start out using natural language, you very quickly realise that the tool is keyword driven, using value names, table names or column names plus a few reserved words such as “map” and “column chart”. Commas help in places for lists. The tool shows you which words it has ignored by greying them out, gives you hints on how to re-express the question, and suggests what the next question could be.
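As a rough mental model only (this is not Microsoft's actual parser, and the vocabulary below is made up), the behaviour feels like matching each token against the model's table/column/value names plus a small set of reserved words, and greying out everything else:

```python
# Toy illustration of keyword-driven parsing, NOT the real Q&A engine.
# Tokens that match the model vocabulary (table/column/value names) or
# a reserved word are kept; everything else would be "greyed out".

MODEL_VOCAB = {"countries", "gdp", "population", "europe"}  # hypothetical model names
RESERVED = {"map", "column", "chart", "line", "scatter"}

def classify(question: str):
    kept, ignored = [], []
    for token in question.lower().replace(",", " ").split():
        (kept if token in MODEL_VOCAB | RESERVED else ignored).append(token)
    return kept, ignored

kept, ignored = classify("show me the GDP of countries in Europe as a map")
print(kept)     # tokens a Q&A-style matcher would act on
print(ignored)  # tokens that would be greyed out
```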

Let’s look at the different types of visualisation you can get out of Q&A.

1. The Card View – Good for scalar values and more

Not much to say here. It can display multiple answers if you use a comma or the word “and”.


I am kinda disappointed that Q&A doesn't support external images in the card view (shoot the security teams!). External images allow for some nice views in the normal version of Power View (see below).


2. The Table

Q&A treats the name of a table as a noun and each value in a table as a proper noun. This allows for queries like the one below. (Note the spelling mistake on “show” – we could have just typed in “Countries Europe” and achieved the same result.)


You can customise what appears as columns for a noun (or entity, in IT speak) by using the Default Field list in Power Pivot.


One limitation is displaying default measures alongside each country, e.g. its population. To date Q&A doesn't have a way to describe an entity that crosses multiple tables. I got around this by adding calculated columns in Power Pivot to denormalise attributes onto the right tables, such as adding population to the Countries table.

Another limitation: I wanted to have multiple related measures coming from one table. E.g. the economic indicators table may have entities such as “growth”, “debt” and “income”, each comprising 2-3 measures. Sure, I could split the measure group, but this would be messy model-wise.

3. Ask the Map, Ask the Map…

The map visualisation works very seamlessly.


One annoyance is the continual privacy warning. Sure, I get that you want to ask me if I’m OK sending my data to Bing to display a Bing map. But can we not have a “yes, now go away forever” option rather than a “Mr Clippit” style interruption mid-demo?


If possible, it's a good idea to use the advanced reporting properties in Power Pivot to tell the model which fields contain which types of geographic data. Q&A handled continents, countries, counties and custom groups (e.g. the EuroZone) very well.


4. The Column Chart

With the reserved words “column chart” you can force the same data to come out as a column chart, or Q&A will often choose a column chart as the appropriate visualisation.

Go Ireland on GDP per capita for 2007! (pity about 2013).



The axis always seems to start at zero. Most data visualisation gurus, such as Stephen Few (just read his latest book, “Now You See It”), prefer to start the axis just below the lowest value, as the purpose of a bar chart is to demonstrate differences, right?

5. The Line Chart

The line chart supports both a single series and multiple series.



Unlike the bar chart, the line chart avoids starting the axis at zero (weird – I would have started the line chart at zero and the bar at a staggered value).


Still, the non-zero Y axis does help make better use of the screen space, even if the inconsistency with the column chart doesn't make sense to me.

6. The Trellis Chart

Nice!


7. The Scatter Chart (The Holy Grail of visualisations)

A great visualisation if used right. The best way to use an animated scatter seems to be to think of it as a “Gartner quadrant” chart. E.g. in the chart below we have Govt Debt per Capita versus GDP Growth, so we can see how Ireland moved from the top-left quadrant (high growth and low debt) to the far-right, nearly bottom quadrant (high debt and low growth).

In this chart, GDP growth below zero means countries are in “recession”.


One major obvious issue: where's the legend, dude! Try as I may, I could not get a legend out of Q&A or find the words to express that I wanted the country as a label/colour item. I hope that this is made easier in the RTM version.

Here is the complete demo video along with a 2 minute behind the scenes at the end.


The Irish Economic Crisis - a tale of money loss and Power BI Q+A

Don’t Forget Synonyms!

You may find that there are multiple natural language words to describe an entity (table or column), or that the name used in the model is not quite right.

A new synonym editor has been added to Power Pivot to support editing these. In the example below we have many types of debt, but we just want “Govt Debt (€m)” to be known as “Debt”.



Two warnings:

  1. You NEED to use Office 365 as downloaded from the Power BI site, as opposed to deployed in another fashion (like from MSDN, or the RTM Office 365 site).
  2. If you open the workbook in a version of Excel that does not support synonyms, you lose all the synonyms. I found this out the hard way, as my desktop has Office 365 RTM and my laptop had the Power BI version – I didn’t realise there was a difference until I kept losing all the synonyms.

My Wish List for Q&A

  1. Let ME choose if I want to allow external images, and then please go away and stop pestering me with warnings while I am trying to work.
  2. Add Q&A into Excel; this helps those of us who use iterative development, e.g. we don't have to keep uploading the workbook to see the result.
  3. Allow reports to be saved as first-class Power View reports.
  4. Allow column charts to have a staggered axis like line charts (in both normal Power View and Q&A).
  5. Show me the legends on scatter charts, or let me choose with a “legend” keyword.
  6. Allow control of the “colour” attribute in scatter charts, maybe with a reserved word like “colour as”.
  7. Allow me to make a report with the designer and then show me the language that would have made the same report.
  8. The Q&A window seems to only work well at 1,280 pixels wide. The projectors of the world are still mainly 1,024!
  9. Add microphone support like Google search.

If EVER there was a feature that needed voice recognition, then Q&A is it!


Behind the Scenes

For those of you who don’t want to see the actual demo video, here is a 90 second behind-the-scenes look at a Power BI project.

If you want to see this event, note that registrations are almost full!

Carmel is presenting a session on analysing the Irish Economic Crisis using the Microsoft Power BI stack. Come and see if you are interested in how to interpret and analyse data using Microsoft technology, or just if you want to know where all of your money went.

The event is on Tuesday the 5th of November at the Microsoft Atrium in Sandyford.


As a bonus, Carmel has also published some free videos on using Excel to create enterprise dashboards.


Hope to see you there!


Thanks to everyone who made it to my #SqlSatCambridge talk on migrating to the cloud. I’ve attached the slide deck so you can download it if needed.

Migrating to Azure

In addition, there are a few more days left for the chance to win a free Aston Martin and get FREE Azure credits. Just click the link here and you can start playing with Azure.




When reviewing the design or performance of a MOLAP Analysis Services database, we need to determine where the space is going. Common problems are:

  • Too many or too large aggregations
  • Too many bitmap indexes
  • Large attributes with string or binary data in them
  • Fact tables growing too much

The only real way to do this is to look at the size of the various files on disk, but this can get pretty time consuming, with tens of thousands of files and many different file types to analyse for even a small Analysis Services project.

I have written an Excel macro which reads all the files in a folder and presents a PivotTable to help analyse space.

We can see from the output below that it’s actually the Customer and Product dimensions in Adventure Works which are hogging all the space.
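The macro itself lives in Excel, but the core idea is simple enough to sketch: walk the data folder and total file sizes by extension, then pivot the result. The Python below is a hypothetical equivalent, not the downloadable macro, and the folder path is whatever your SSAS data directory is:

```python
# Sketch of the space-analysis idea: sum file sizes per extension so the
# breakdown can be pivoted by file type. The real macro also splits by
# dimension/measure group, which follows from the folder structure.
import os
from collections import defaultdict

def space_by_extension(root):
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += os.path.getsize(os.path.join(dirpath, name))
    return dict(totals)
```

Loading the resulting dictionary into a PivotTable (or just sorting it descending) gives the same kind of breakdown by file type as the screenshots.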



If we expand the Product dimension, we can see that Large Photo has a “bstore” file for storing a binary image. Should we put this in a cube? Maybe not.


The table below shows the space breakdown by file type. We can see that the Adventure Works cube disk space is mostly consumed by fact data rather than by bitmap indexes and aggregations, which often run away on customer sites.


The Excel macro spreadsheet is available for download here.

SSAS Space Tool

Update: a few people said the file link didn’t appear in Chrome, so here is a direct link.


We have been deploying Office 2013 on some customer servers for POCs. One new feature in 2013 is that on startup it looks for templates online. This is fine for a connected desktop, but does not work so well behind firewalls.

A typical error message looks like the one below: “Connecting to proxy Server officeimg.vo.msecnd.net”.


Assuming that you want to disable online content, you can follow this KB article.

The DWORD to add, with a zero value, is below:


When migrating a SQL Server workload, we need to think about a few things:

  • A baseline of what resources our workload needs at normal and peak operation. Often we may just capture performance counters for CPU and disk, maybe some SQL Server counters such as Batch Requests per second, and ideally a business counter such as OPM (Orders Per Minute).
  • A way to measure the target platform to make sure it has sufficient resources to run the workload. For disk resources the de facto tool is SQLIO. I have some notes on how to specify requirements here (http://blogs.prodata.ie/post/How-to-Specify-SQL-Storage-Requirements-to-your-SAN-Dude.aspx). For CPU resources you may simply check the number of cores and frequency, or use an industry standard benchmark such as those available on www.spec.org.
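For the baseline step above, one simple way to reduce a set of captured counter samples to planning numbers is an average for typical load plus a high percentile for peaks. This is a sketch with illustrative numbers; the 95th percentile is a common choice, not a rule:

```python
# Summarise a list of perfmon samples (e.g. Batch Requests/sec) into the
# two numbers capacity planning usually needs: typical load and peak load.

def baseline(samples, pct=0.95):
    ordered = sorted(samples)
    # nearest-rank percentile: index of the pct-th ranked sample
    idx = min(len(ordered) - 1, int(pct * len(ordered)))
    return {"avg": sum(ordered) / len(ordered), "peak": ordered[idx]}

# Illustrative samples with a couple of spikes in them
print(baseline([120, 130, 125, 400, 122, 128, 119, 131, 127, 900]))
```

Sizing to the average alone would miss the spikes entirely, which is exactly the situation where a migrated workload disappoints at month-end.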

The cloud presents some challenges. If we are migrating to Infrastructure as a Service, it's really easy to verify we have the correct resources by simply running tools like SQLIO, or running the workload and checking the performance counters for throughput and latency. If we are migrating a database to “Windows Azure SQL Database” (or SQL Azure, as it is still commonly called), we are a bit more restricted: WASD doesn’t allow us to use perfmon or SQLIO, so the best benchmark we can use is something measured from the application, like OPM.

So what does SQLIO tell me about running SQL in the cloud?

As a test, we created a four core VM in the West Europe data centre and then ran a series of SQLIO tests to measure the “saturation” point for disk IO (thanks Fintan and Stephen!). The resulting chart is below.

This shows that the disk we tested scaled up to about 30MB/Second at just over 30ms latency. The highlighted section on the chart shows the “saturation point”, where increasing the outstanding IO gives no more throughput. This is essentially the limit.
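The shape of that curve is what queuing arithmetic predicts: by Little's Law, throughput ≈ outstanding IOs ÷ latency, so once the device (or throttle) is saturated, adding outstanding IO only adds latency. A sketch with an illustrative 1,000 IOPS ceiling:

```python
# Little's Law for a saturated disk: IOPS ≈ outstanding_ios / latency,
# so at a fixed IOPS ceiling, latency grows linearly with queue depth.

def expected_latency_ms(outstanding_ios, max_iops):
    # latency (seconds) = outstanding / throughput; convert to ms
    return outstanding_ios / max_iops * 1000.0

for q in (8, 16, 32, 64):
    print(q, expected_latency_ms(q, 1000))
```

At 32 outstanding IOs against a 1,000 IOPS cap this predicts ~32ms, which is in the same ballpark as the ~30ms knee on the chart.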


30MB/Sec equates to almost exactly 1,000 IOPS. Quite a nice round number! So much so that I think it's safe to say this is probably a limit imposed by throttling rather than an actual limitation of the hardware.
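As a quick sanity check on those two figures, dividing the measured throughput by the measured IOPS gives the effective IO size the throttle is seeing (pure arithmetic on the numbers above, no other assumptions):

```python
# Effective IO size implied by the measured figures above.
throughput_mb_per_sec = 30   # measured SQLIO throughput at saturation
iops = 1000                  # measured IOPS at saturation

io_size_kb = throughput_mb_per_sec * 1024 / iops
print(io_size_kb)  # 30.72, i.e. roughly 30 KB per IO
```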

One key challenge for SQL infrastructure running in the cloud is log drive performance. Traditionally we like to keep the latency at, say, 3ms, but presumably due to the replication and the fact the VHD is mounted on top of a blob store, we see very high write latency. At low yields, say only a few MB/Sec, it maintains a 10-20ms average, but then quickly increases to a 100ms-plus average!


I ran a sample ecommerce OLTP workload supplied by Dell on a four core VM running in IaaS. We can see that the number one wait statistic was indeed WRITELOG, so a primary bottleneck is the disk subsystem. You can see the delays from the DMV sys.dm_exec_requests.



So what does this mean again?

This gives us a few facts we can use when doing capacity planning for workloads moving to the cloud:

  • The Azure IaaS (4 core) offering currently has a max ceiling of 1,000 IOPS for a disk. How much you get in terms of MB/Sec will depend on the block size: 30MB/Sec with an 8K block size and 200MB/Sec with a 64K block size. You won’t get beyond 200MB/Sec or 1,000 IOPS due to throttling, but maybe this will change in the future.
  • While the max is 1,000 IOPS, I was just in a session with the SQLCAT team at SQLBits and they recommend considering 500 IOPS per disk for capacity planning.
  • Different VM sizes allow you to add more disks, and therefore get more IOPS. It seems that you probably want to use multiple disks as a matter of course in Azure IaaS, as the charge is not based on the number of disks.
  • Write latency will affect service quality more than it does with on-premise disk subsystems, which are not replicated. You need to make sure that this is either not important, or design applications to work more asynchronously so that it does not translate into poor user experience with log writes (as our sample OLTP workload suffered from).

Thoughts for some more testing

I note that there are two options for disk configuration in Azure IaaS: host caching and whether geo-replication is enabled. I would love to test the effect of both of these on IO throughput and latency. This could, for example, affect placement of TempDB. We ran the tests with geo-replication disabled.

I’m also going to see if we can get more throughput or better latency by utilising more than one logical disk. I will hopefully post the results ;-)

If you want to repeat these tests you can sign up for Azure IaaS here.


My article here has some links and tips on running SQLIO.


As a recap for anyone who has made it this far, here are the other three blog entries in the series:

This week’s questions are on waits and extended events. OK, admission – I suck at extended events. I never really got on the gravy train with SQL 2008 and am still clinging to the hope that RML utilities will be upgraded to support SQL 2012. I’d even do the upgrade myself if the source code were on CodePlex, such is my aversion to extended events. Waits, on the other hand, are the bread and butter of performance tuning – to be a SQL MCM I would expect you to not only be able to articulate the main wait types, but also share some horror stories of situations where you’ve seen them. Mere mention of CXPACKET should be enough for a one-hour debate.

Waits (not just wait stats)

1. Can you define “signal wait time”?

2. In the DMV sys.dm_os_wait_stats, does the column “wait_time” include or exclude signal wait time?

3. What does a high “signal wait time” usually mean?

4. What does a high wait time but low signal wait time mean?

5. What is the difference between the WRITELOG and LOGBUFFER wait types?

6. When running a query, what could cause an IO_COMPLETION wait type?

7. When running a query, what could cause an ASYNC_IO_COMPLETION wait type?

8. What can cause PAGELATCH_XX waits?

9. If you have a RESOURCE_SEMAPHORE wait for a large query, how could you resolve this?

10. How could you resolve THREADPOOL waits on a server with a high number of concurrent users?

11. How would you determine why a log file was continually growing?

12. What can sys.dm_io_virtual_file_stats show you that you cannot see from perfmon counters?

Extended Events (Assume SQL 2008R2)

  1. What is a predicate WRT extended events?
  2. Is a predicate at the event or session scope?
  3. Explain the difference between sys.dm_xe_packages and sys.dm_xe_objects.
  4. Name some targets available in SQL 2008R2 (six available).
  5. How big is an asynchronous buffer by default?
  6. What happens if an event is bigger than this default size?
  7. Explain the difference between the three options for dealing with a full extended events buffer using EVENT_RETENTION_MODE.
  8. If the service is restarted, will an xevents session continue?
  9. Why would you want to use extended events to track page splits instead of just using the performance counter?
  10. What is the purpose of the xevents node “//RingBufferTarget/event/action/value”?

I’m at the Belfast User Group on Thursday the 11th of April to demo and discuss two of my favourite topics (below). If you are interested in SQL Server, or want to know a bit more about Windows Azure hype versus feature, then please do come along.

SQL 2012 Always On Deep Dive (Beginner/Intermediate)

Always On Availability Groups offer a huge leap forward in terms of high availability. This session is a demo-based introduction to the high availability changes in SQL 2012, emphasising the key features and benefits of Availability Groups.

During the demonstration we will show how to seamlessly upgrade a database from SQL 2005+ to Denali with ZERO down time, and then use the Always On features to simplify reporting, availability and disaster recovery.

This session will be useful for anyone who is working in a high availability environment or currently using replication to provide reporting and looking for a neater solution.

Windows Azure for SQL Folk (Beginner)

So what is this cloud stuff and how does it affect database professionals?

In this session we aim to cover the fundamental concepts that you should know on the Cloud and Windows Azure. In this demo based hour we will step through building Azure applications, deploying and managing them and what storage choices you have when developing applications in the cloud.

This session is ideally suited to anyone who wants to understand more about how Azure works, what it offers and what is probably just hype. Note that we are looking at the whole of Azure rather than just SQL Azure.

Date: Thursday 11th April 2013
Time: 5:45pm-8:45pm
Venue: Liberty IT, Adelaide Exchange, 24-26 Adelaide St, Belfast.

Registration is at http://www.eventbrite.com/event/5705276636#

Last year one of the most popular sessions at SQL Saturday was the lightning talks. No need to register in advance – just sign up at the event for a FIVE minute slot. One hour, maybe ten speakers.

The rules are simple:

- Topic can be ANYTHING loosely related to SQL Server
- Strict five minute time, buzzer provided for cut off

We will have some mystery prizes. Last year prize categories were:

  • Best “New” speaker (no MVPs and product team folk)
  • Most entertaining
  • Best speaker

This year promises to be a mixture of short nuggets from industry experts with some light-hearted content thrown in.

If you missed Matt Masson’s hilarious “How to use SSIS and slipstream to manage facebook”, he is presenting a sequel: “How to use SSIS to manage online dating”.

@CarmelGunn penned “Paradise Lost? Advise Found” with the catchy tag of “When your data fits on a USB stick, more hardware’s unlikely to do the trick” http://blogs.prodata.ie/post/Paradise-Lost-Advise-Found.aspx

To register and for more information on this years SQL Saturday Dublin go to

or catch up at #sqlsatdublin on twitter

These questions are focused on availability; more next week on performance and tuning. I helped write some of the mirroring questions on the v1.0 MCM exam, which must be well retired by now, and I must admit they were pretty obscure. Learn those failure conditions and the resources/threads required from the MSDN articles and MCM videos!

HA - Availability

1. Define the difference between RPO and RTO.

2. What are the key components of an SLA?

3. What are the key characteristics of an SLO (or Service Level Target)? (eight possible)

4. Name five different types of disaster, what they affect and recommended mitigation steps.

5. Articulate the key feature differences between log shipping and replication.

6. When and why would you use failover clustering instead of mirroring?

7. What would be a blocker to using failover clustering? (name three blockers)

8. A major web site wants “zero down time”, “zero data loss” and “zero user impact”. What SQL Server technology will achieve this by itself?

9. How can developers ensure that mirroring and Always On do not affect the user experience when they fail over, WRT uncommitted transactions?

10. Now that you have answered (9), do you want to change your answer to (8)? ;-)

11. What is the max failover time with a mirror with default settings?

12. What types of failures can cause this?

13. Explain the difference between “Looks Alive” and “Is Alive” in WSFC.

14. How does this change in SQL 2012?

15. Windows 2008 cluster components no longer need the HCL, but what “certification” do they need?

16. A customer has added a new disk to the cluster. Do they need to rerun the validation tool to maintain supported status?

17. Should a cluster use the vendor MPIO driver or the MS MPIO driver?

18. Any special anti-virus considerations for clustering? Name them.

19. There is a three-node cluster with a single instance. How many nodes should be removed as possible owners before patching?

20. Why might log shipping offer better performance for a data warehouse workload than mirroring?

21. What’s the maximum number of databases that it is recommended to mirror on a single instance?

22. How many additional threads does a mirror consume on the principal per database mirrored?

23. At what level of hardware does Enterprise edition offer more redo threads with mirroring?

24. What OS and SQL editions can be supported with the SVVP?

25. Is VMware or Hyper-V “snapshot” technology supported by MS?

26. What does RHS.exe do in Windows 2008 clustering?
