SQL Centre of Excellence

There are some well-known design patterns and blog posts for handling running sums, which are commonly used on reports such as Profit and Loss statements.
https://www.daxpatterns.com/cumulative-total/

I’ll show here an alternative approach that I commonly use when we have a many-to-many table such as:
- A table mapping account codes to Financial Reporting Ranges

- A Date Range table to show YTD, WTD without any DAX

If we take the first example, we may have a Profit and Loss report with four lines on it as below, with line 4 being a running sum (Net Invoice Sales).

image

The Rules table that stores the mappings from account codes to reporting ranges and headings may look something like the below for the first three lines of the report.

image

If we add Line #4 as below, we now have a “running sum” without the need for any DAX; the many-to-many join performs the same function as a running sum.

image
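As a rough sketch of the idea (the table, column, line names and account ranges below are my own placeholders, not the exact rules table shown above), the rules rows might look like this, with Line #4 simply spanning the account ranges of lines 1 to 3:

-- Hypothetical rules table: each report line maps to a range of account codes.
-- Because Line #4 covers the ranges of lines 1-3, the many-to-many join
-- aggregates it as a running sum with no DAX needed.
CREATE TABLE dbo.GlRules (
    ReportLineNo int         NOT NULL,
    ReportLine   varchar(50) NOT NULL,
    AccountFrom  int         NOT NULL,
    AccountTo    int         NOT NULL
);

INSERT dbo.GlRules (ReportLineNo, ReportLine, AccountFrom, AccountTo)
VALUES (1, 'Gross Sales',       4000, 4099),
       (2, 'Sales Discounts',   4100, 4199),
       (3, 'Sales Returns',     4200, 4299),
       (4, 'Net Invoice Sales', 4000, 4299); -- spans lines 1-3 = running sum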

Your mileage may vary and there is nothing wrong with the DAX pattern, but in some scenarios we have an M2M table anyway, so we can also leverage it to support custom aggregates like running sums.

1) For multiple Reports and Report Variations

As discussed in a previous blog article, a simple Profit and Loss may assume a single reporting hierarchy rolling up account numbers into a reporting line and heading.

As this starts to break down, we may have a many-to-many mapping between account codes and reporting lines, like below. For convenience we might do the mapping based on “ranges” of account numbers, even if storage is de-normalised.

As an example, we may have three financial reports all using account codes 4130 to 4139 as below. The standard P&L uses report No 5, but other reports use other definitions and rollups.

image

This may then be expanded to a more classical many-to-many table in a physical model.

image

To use this in a many-to-many model, we follow the ugly-looking pattern below with the bridge table highlighted and, to make DAX people turn in their graves, a bi-directional join to boot!

https://www.sqlbi.com/articles/many-to-many-relationships-in-power-bi-and-excel-2016/

image
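If you would rather not set the bi-directional relationship on the model itself, a minimal alternative sketch is to widen the filter direction only inside the measure with CROSSFILTER. The table and column names here ('General Ledger', Rules, Account, AccountKey, BaseAmount) are assumptions based on the tables discussed, not the exact model:

GL Amount (M2M) =
CALCULATE (
    SUM ( 'General Ledger'[BaseAmount] ),
    // temporarily make the bridge-to-dimension relationship bi-directional
    CROSSFILTER ( Rules[AccountKey], Account[AccountKey], BOTH )
)

The trade-off is the usual one: the filter direction is only widened for this measure, so other measures in the model are unaffected.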

Once in play, we can have many different versions of a financial report, or have many financial reports using account codes in different ways (e.g. cashflow definitions versus P&L definitions).

For financial reporting, can we use the Chart of Accounts as a hierarchy for all our reporting needs?

The short answer is: in the ideal world – yes; in the not-so-ideal world – no.

The Chart of Accounts provides us with a way to categorise account codes (aka GL codes) into an account name and sometimes a heading that can be used as a rollup for P&L and BS based reporting.

It’s often portrayed as parent/child in sample databases and some smaller accounting packages (QuickBooks loves their hierarchies), but more often than not it’s a fixed two-level hierarchy above the account code.

Typically we have a hierarchy like this:
- Account Heading, e.g. Operating Expenses
- Account Range, e.g. Labour Expenses
- Account Code / Name, e.g. Account Code 5020 (Bob’s Wages)

So why can’t we use the default chart of accounts for financial reports? There are a few reasons:

1) This assumes that there is ONE version of the truth and ONE pro forma hierarchy and rollup for Profit and Loss.

For small to mid-sized companies that is certainly true, but as we move to larger organisations operating in many countries with many verticals, we end up with variations on the Profit and Loss.

For one division, say, price variation might be realised at one line on the report, while in another division it may be realised elsewhere.

Definitions like Net Profit may include different items above and below the line, especially as we switch from an external focus to internal accounting for, say, manufacturing.

So rather than one single Profit and Loss Report, we may need to support multiple Profit
and Loss hierarchies with an Account Code “moving” between sections of the report.

The company may have an accounting review and decide to restructure how account codes rollup, but want the BI platform to show both the old and new style of Profit and Loss Reports.

2) The P&L and BS are not the only Financial Reports

We definitely have one “Account Name” per “Account Code”, and an account will either be a movement (P&L) or a balance (BS), but when we come to reporting ranges and reporting headers there are other financial reports, like Cash Flow, which roll up account codes from both the P&L and the BS into a different type of report showing both movements and balances and then reconciling them.

This gives us a choice. Do we:
- Have a single hierarchy “per report”
- Support multiple hierarchies with the chart of accounts being the “main” one.

I opt for the second option, which then means that one account code needs to map to multiple reporting ranges – effectively a many-to-many.

2) The Chart of Accounts doesn’t include calculated members.

A financial report will typically have many calculated members which are either
running sums, custom aggregates or alternate rollups.

If we look at the example below, the following members are calculated:

- Net Invoice Sales: Running Sum
- Net Sales: Running Sum. Although accountants will explain it as Net Invoice Sales + Adjustments
- EBITA: Running Sum
- Profit Before Tax: Running Sum
- Profit After Tax: Running Sum

  image

To handle these we need a custom hierarchy with these members added in, and the correct sort order to implement running sums.

There are a few ways to implement running sums. Within DAX, the SQLBI folks have it covered here:
https://www.sqlbi.com/articles/computing-running-totals-in-dax/
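For completeness, here is a minimal sketch of the classic cumulative-total pattern over report lines. The 'Report Lines'[Line Sort] column and the [GL Amount] base measure are assumed names for illustration, not the actual model:

Running Amount =
CALCULATE (
    [GL Amount],
    FILTER (
        ALL ( 'Report Lines' ),
        // include every report line whose sort order is at or before the current line
        'Report Lines'[Line Sort] <= MAX ( 'Report Lines'[Line Sort] )
    )
)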

For my session at SQLBits on Financial Modelling in PowerBI there are a couple of useful downloads if you want to follow along or dissect at home.

1) AdventureWorksDW2017
This data warehouse has an (albeit basic) version of an accounting fact table and Account dimension, so it is good to use as a demo and starting point.
https://github.com/Microsoft/sql-server-samples/releases/download/adventureworks/AdventureWorks2017.bak

2) PowerBI P&L Report using “Parent Child” Chart of Accounts
https://github.com/ProdataSQL/FinancialModelling/raw/main/downloads/01_P%26L_ParentChild.pbix

This sample report shows some of the aspects of a basic P&L Statement using a parent/child hierarchy.

  • Dealing with Expense and Income signage (Credit/Debit)
  • Ragged Hierarchies and Parent/Child
  • Pro forma layout and sorting

image

3) PowerBI Report using “Many to Many” Custom Chart of Accounts

https://github.com/ProdataSQL/FinancialModelling/raw/main/downloads/02_P%26L_CustomM2M.pbix

This report uses a “custom rules” table to extend the account codes into reporting ranges and headers on a many-to-many basis.

image


4) Sample Balance Sheet Report

https://github.com/ProdataSQL/FinancialModelling/raw/main/downloads/03.%20Balance%20Sheet.pbix

image

5) Sample GL Range Rules for “Many to Many” Custom Chart of Accounts

This is a sample master data spreadsheet showing account ranges and also the rules for how GL or account codes map to those ranges.

Typically when we get “custom rules” from Financial Accountants we get something like this in a spreadsheet that becomes master data.

https://github.com/ProdataSQL/FinancialModelling/raw/main/downloads/glRules.xlsx

image

image

6) Customised AdventureworksDW2017.bak

Here we have taken the “out of the box” AdventureWorksDW2017 and added the extra “custom rules” table to support a custom Chart of Accounts.

https://github.com/ProdataSQL/FinancialModelling/raw/main/downloads/AdventureWorksDW2017.bak

image

While the Matrix control does have some new features for automatically handling ragged hierarchies, as described by Reza Rad and others (https://radacad.com/removing-blanks-from-organizational-ragged-hierarchy-in-power-bi-matrix-visual), it falls short of what we want to achieve on most of my projects, and still doesn’t come up to where MOLAP was with its automatic support via HideMemberIf.

This is the approach that I use, and I was pleased to see that DAX Patterns, Second Edition also covers this type of design pattern.

https://www.daxpatterns.com/parent-child-hierarchies/#

As an example, we can take the “DimAccount” parent-child hierarchy from the AdventureWorksDW database and render it in PowerBI. On first cut it comes out as below.

The challenge is that we need to be able to determine when the current scope of a node is below the maximum depth.

image

To eliminate these blanks in DAX there are a few approaches, but I’ll cover off using ISINSCOPE. The approach is as below.

1) I assume that there is a PATH column in order to render the hierarchy, like below:

AccountPath = PATH ( Account[AccountKey], Account[ParentAccountKey] )

2) We add a calculated column to determine the depth of the node, as below:

Account Depth = PATHLENGTH ( Account[AccountPath] )

3) We add a calculated measure to determine the scope of the current node that is being navigated to. In this case the hierarchy can be between 2 and 6 levels deep.

AccountScope =
SWITCH (
    TRUE (),
    ISINSCOPE ( Account[Account L6] ), 6,
    ISINSCOPE ( Account[Account L5] ), 5,
    ISINSCOPE ( Account[Account L4] ), 4,
    ISINSCOPE ( Account[Account L3] ), 3,
    ISINSCOPE ( Account[Account L2] ), 2
)

When that is in play, we can see when the current scope of a selected node is below the maximum depth of the hierarchy (example below).
image

So the final step is to create a measure that returns 1 or null depending on whether the level is in scope, and multiply the report measures by this, as below:

HideAccount = IF ( Account[AccountScope] <= MIN ( Account[Account Depth] ), 1 )

and

GL Amount = CALCULATE ( SUM ( 'General Ledger'[BaseAmount] ) * Account[HideAccount] )

You lose a bit of DAX performance using a “helper” measure rather than “inlining” the IF, but I do find this approach easier to support, so the choice is yours.

Once we put this into play, we can now get the desired result of a ragged hierarchy with no blanks!

image

Coming off the tails of the hugely successful SQL Saturday Dublin events, Data Ceili – Dublin 2020 is shaping up to be Ireland’s largest Microsoft Data Platform event.

We still have until the end of Jan 2020 for speaker submissions, and we already have over 180 submissions from about 60 local and international speakers – a who’s who of MVPs, experts and product team speakers sharing their knowledge.

What has changed a lot since the first event in 2013 in Dublin is the sheer breadth of the Data Platform, hence the name change. This new style of event embraces the fact that modern data professionals in Ireland use a diverse toolset including Power BI, Power Automate, PowerShell, AI, ML, Bots, ADF, Containers, Kubernetes, Databricks, Analysis Services and more Azure features than one person can possibly master.

For me, I love getting an insight into emerging technologies which I’m not using yet but plan to in the near future. I like speakers like Simon Whiteley who combine architectural principles I like, such as metadata-based frameworks, with bleeding-edge technologies like ADF and Databricks, which people are only just getting to grips with.

I also love taking a technology that I am weaker in (cough PowerShell) and seeing how more dedicated veterans like Rob Sewell use it at a different level.

Often I will go and see a seasoned speaker like Kevin Kline, not so much for the topic, but more to improve the soft skills of delivery. Kevin is a god of the speaker world – I even saw him deliver a lightning talk at 10 seconds’ notice about a bar of chocolate of all things, with the entire audience transfixed.

Personally, much of the takeaway from an event this size is also the people you meet: speakers with new ideas, bumping into customers, and mixing with fellow data platform professionals. I’ve got multi-year projects on the back of events, found new employees and built a network of go-to experts I can subcontract to when things need real specialists.

Inspired by the dbatools (https://dbatools.io/) library for automating all things SQL Server DBEngine, I created some PowerShell library functions to automate SSIS build and deployment. We use these on projects to move SSIS closer to the world of DevOps and that magic “10 deploys a day”. These PowerShell functions work well when added to a VSTS deployment and release pipeline.

 

Start-DbaSsisBuild

This library function uses the command-line version of Visual Studio to build an SSIS project and create the “ispac” file that contains the packages.

Example usage below

image

 

Start-DbaSsisDeploy

This library function uses the Microsoft.SqlServer.Management.IntegrationServices namespace to deploy the ispac file to a folder in the SSISDB database. This is similar functionality to the Integration Services Deployment Wizard. An example of it in action is below.

 

image

 

Installation and Source Code

You can get the source code and installation instructions to play around with the PowerShell from my GitHub project here:

https://github.com/bobduffyie/ssisTools

Cannot Start PolyBase Services

by Bob Duffy 22. February 2017 12:45

One of my pet hates when attending SQL talks is when the “expert” speaker spends most of the time blabbing on about the “install” and “configuration” difficulties they had. It kinda paints a picture that they probably haven’t actually done much with the technology yet.

Saying that, I thought I would add some notes on problems starting the PolyBase services that are installed with SQL 2016, as I’ve struggled with them a bit – so, you guessed it, by my own logic I am certainly not an expert on PolyBase!

Three common gotchas found:

1) TCP/IP Must be enabled

As PolyBase is designed to connect to Hadoop and other cloud-based data sources it obviously uses TCP/IP, but bear in mind that Developer Edition, which we all install on our laptops, defaults to not enabling TCP/IP. Luckily Greg Low and some other MVPs figured this out before me: http://sqlblog.com/blogs/greg_low/archive/2016/07/06/sql-server-2016-polybase-services-stuck-in-change-pending-state.aspx

Hopefully Microsoft can add a warning or check to the install to fix this in the future.

2) Server Name Change is not supported

In my rush to set up my new laptop I installed SQL Server BEFORE changing the computer name from its default. While the DBEngine supports a name change via the sp_dropserver and sp_addserver procs, the PolyBase service will just not start, with an error message in the application log of “No such host is known”.
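For reference, the standard DBEngine rename after a computer name change looks like the below (the machine names are placeholders); PolyBase will still refuse to start after it, hence the workaround that follows.

-- Standard DBEngine rename after the computer name has changed
EXEC sp_dropserver 'OLD-MACHINE-NAME';
EXEC sp_addserver 'NEW-MACHINE-NAME', 'local';
-- Restart the SQL Server service, then verify the new name
SELECT @@SERVERNAME;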

The only workaround we found was to uninstall and re-install the PolyBase Services – which actually doesn't take that long.

I added a connect item to vote for a fix on this

https://connect.microsoft.com/SQLServer/feedback/details/3124189

3) Service Account Change needs a CU

If you wanted to change the service account (scale-out PolyBase needs a domain account rather than the default Network Service account), then in the past this was an uninstall and re-install job to get the correct rights assigned to the account.

Now this is supported, but only if you have applied a newer CU

https://support.microsoft.com/en-us/help/3209896/update-to-enable-changing-the-polybase-service-accounts-without-reinstalling-the-polybase-feature-in-sql-server-2016

I’ve recently been working almost exclusively on projects running SQL Server in Azure (IaaS), and the question came up of the best way to handle backup and data protection. We accept that for pretty much all SQL Server based solutions we need data and potentially log backups if the RTO is more frequent than about 24 hours, but the choice is: do we back up to local disk/file share/3rd-party product, or use the new kid in town, backup to Azure blob storage (aka backup to URL)?
https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-use-storage-sql-server-backup-restore

Well, when we say a “good” choice we probably mean three principal attributes:

  • Is it safe
  • Is it cheap
  • Is it fast

Is backing up to an Azure blob store (TO URL) safe?

There are two types of “safe” to consider: safe from data loss, and safe from some 14-year-old expert hacker in China or a disgruntled ex-employee.

WRT data loss, Azure storage accounts come with a 99.99% SLA for “hot” storage and a 99.9% SLA for the cheaper cool storage, which we would recommend for backups as it’s dirt cheap. In addition, it keeps a whopping THREE copies of your files within a data centre, and if you pay double the wonga for geo-replication to another data centre (Amsterdam for us here in Dublin), you get SIX copies of the data. So I think it’s safe to say (you see what I did there) that putting SQL backups onto Azure storage is a safe bet, probably safer than anything any of our customers have on their on-premises infrastructure.


Another big plus for Azure storage is that it’s automatically “off site” and hopefully not in the same fault domain as your production services, so no eggs in one basket like you may have on-premises. If you decide NOT to put your SQL backups into the cloud and then have no offsite copy, you’ve probably increased the risk of data loss, not reduced it!

WRT being safe from hackers and disgruntled ex-employees, that’s not so clear. As an Azure storage account is classed as a “public” service, it’s pretty much always available over HTTPS if you have the right account credentials or access keys on the storage account. It relies on good people and process to secure (which doesn’t sound good).

One massive mitigation is the move away from single-point access keys to Shared Access Signatures, or SAS. These allow a per-service security token to be granted which can restrict by time duration, allowed IP addresses, container and access rights. For example, the production SQL Server could have only write access to the Azure storage account, for a specified IP address, for a specific container. If it were compromised, the attacker can’t do much.

One complaint I have is that the SAS token generated by using the wizard in SSMS grants pretty much “full” access rights. What happened to “secure by default”, Microsoft! Hopefully later revisions of SSMS will allow for more locked-down SAS tokens for backup. In the interim, no worries, you can generate your own SAS tokens from the Azure portal or PowerShell if you want the more beefed-up security.
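To show the shape of it, a minimal sketch of a SAS-based backup to URL is below (the storage account, container, database name and the truncated SAS secret are all placeholders):

-- The credential name must match the container URL when using a SAS token
CREATE CREDENTIAL [https://mystorageacct.blob.core.windows.net/sqlbackups]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = 'sv=...';  -- SAS token, without the leading '?'

BACKUP DATABASE [MyDatabase]
TO URL = 'https://mystorageacct.blob.core.windows.net/sqlbackups/MyDatabase.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;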

Is backing up to an Azure blob store cheap?

Currently 1TB of cool Azure blob storage comes in at around 8 euro per TB per month before discounts, and if you’re running a large 24x7 estate in Azure I would hope you at least have an EA or some other form of reduced-price scheme and don’t “pay as you go”.

Geo-replication is about double that, with a bit more if you want the geo site to be readable, and IMO it’s rude to geo-replicate data and not have it readable at the remote site.
https://azure.microsoft.com/en-us/pricing/details/storage/blobs/

So whether you consider it “cheap” depends on how much you currently pay for backup storage. If you are a SOHO using a small NAS unit with cheapo 5TB SATA drives that cost a hundred euro each and have near-zero OPEX/support, then maybe you think this is expensive. If you are paying for a ridiculously expensive SAN, along with support and OPEX costs, then you might be paying anything from 2k to 10k per TB, so 8 euro a month is definitely cheap, especially given the “offsite” nature.

Is it fast?

Well, this is probably the most important consideration for a DBA thinking about a maintenance window or recovery runbook.

Each storage account is limited to about 20,000 IOPS (and to be honest I haven’t yet figured out how that maps to MB/sec throughput limits), but what I do know is that I ran some simple tests backing up a 150 GB database, recording the effective throughput below on different VM sizes in the “DSv2” series, using a single 1TB P30 SSD for the data volume.

The 2-core VM put out a modest 79 MB/sec, and the most interesting thing to note was that this was faster than backing up to the local SSD. Another interesting point is that backing up to cool, read-only geo-replicated storage had about the same performance profile.

image

The four-core VM with an identical configuration managed to get 127 MB/sec.

image

The eight-core VM achieved 253 MB/sec.

image

And the sixteen-core VM got to a tasty 346 MB/sec.

image

I have no idea if anyone else got similar backup performance from VMs in Azure – post to the comments if you did/didn’t.

One major recommendation: if you are ever doing an emergency restore in Azure on a big data estate, you probably want to increase the core count to “max”, do the restore, and then shrink the core count back to normal.

I did mess around with using multiple storage accounts to “stripe” backups across, but it only made about a 10% difference in throughput, and only in two scenarios:

a) When I was backing up from on-premises, with higher latency to the storage account. Striping the backup increased the thread count, which as we all know is mucho better for saturating bandwidth on high-latency links. By the way, fact of the day: by default the backup uses six threads per URL backed up to (or so we observed).

b) On the 16-core VM we started to see some benefit from striping. I guess that on smaller VMs it’s just not possible for a single VM to “stress” the storage account, so using multiple storage accounts wasn’t worth the trouble for us.
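As a sketch, striping across two storage accounts just means listing multiple URLs, each backed by its own SAS credential as shown earlier (the account and database names below are placeholders):

-- Stripe the backup across two storage accounts (two block-blob URLs)
BACKUP DATABASE [MyDatabase]
TO URL = 'https://storageacct1.blob.core.windows.net/sqlbackups/MyDatabase_1.bak',
   URL = 'https://storageacct2.blob.core.windows.net/sqlbackups/MyDatabase_2.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;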

With a “ragged” hierarchy we often need a way to make sure that the hierarchy is displayed in a user-friendly fashion when measures can be linked to any level (including parents).

There are two established solutions

1) The Pure DAX approach

Alberto was one of the first to publish a DAX workaround for the problem of ragged hierarchies here:

http://sqlblog.com/blogs/alberto_ferrari/archive/2011/07/19/parent-child-hierarchies-in-tabular-with-denali.aspx

This workaround starts to get very complex when you have multiple hierarchies, and performance can suffer on large hierarchies as the evaluation is at run time, but it does work and is “supported”.

2) HideMemberIf

BIDS Helper exposes an “unsupported” technique to add HideMemberIf into the XMLA of the model definition. https://bidshelper.codeplex.com/

This solution looks more appealing as a lot of the work is done during ProcessRecalc when it does hierarchy processing – hopefully giving a benefit at query time. It also involves fewer calculated measures, columns and code, which sounds nice.

However two words of warning:

a) There is no guarantee that this will work in later editions

b) There are stability issues with a large number of levels. On the model we are working on, ProcessRecalc and Visual Studio will just hang when we try to set this property on a NINE-level hierarchy.

 

In the end we did use HideMemberIf, but only set the property on specific levels which we know can have measures attached to them to avoid the stability issues of having this turned on for all levels.
