I’ve recently been working almost exclusively on projects running SQL Server in Azure (IaaS) and the question came on the best way to handle backup and data protection. We accept that for pretty much all SQL Server based solutions we need to have data and potentially logs backups if RTO is more frequent than about 24 hours, but the choice is do we backup to local disk/file share/3rd party product, or use the new kid in down backup to azure blob store (aka backup to URL)
Well, when we say a “good” choice we probably mean three principal attributes:
- Is it safe
- Is it cheap
- Is it fast
Is backing up to a Azure blob store (TO URL) Safe ?
Two types of “safe” to consider: safe from data loss and safe from some 14 year old expert hacker in china or disgruntled ex employee.
WRT data loss the Azure storage accounts come with a 99.99% SLA for “hot” storage and 99.9% SLA for the cheaper cool storage which we would recommend for backup as its dirt cheap. In addition it keeps a whopping THREE copies of your files within a data centre and if you pay the double wonga for geo replication to another data centre (Amsterdam for us here in Dublin) , you get SIX copies of the data. So I think its safe to say (you see what I did there ), that putting sql backups onto azure storage is a safe bet, probably safer than anything any of our customers have on their on premise infrastructure.
Another big plus for azure storage is its automatically “off site” and hopefully not in the same fault domain as your production services, so no eggs in one basket like you may have on premise. If you decide to NOT put your sql backups into the cloud and then have no offsite copy then you've probably increased risk of data loss, not reduced it !
WRT being safe from hackers and disgruntled ex employees that's not so clear. As azure storage account is classed as a “public” service its pretty much always available over https if you have the right account credentials, or access keys on the storage account. It relies on good people and process to secure (doesn't sound good).
One massive mitigation is the move away from single point access keys to Shared Access Signatures or SAS. These allow for a per service security token to be granted which can restrict based on time duration, allowed IP addresses, container and access rights allowed. For example maybe the production sql server could just have write access to the azure storage account for the specified IP address for a specific container. If it was compromised, then the attacker can’t do much.
One complaint I have is that the SAS token generated by using the wizard in SSMS is pretty much “full” access rights. What happened to “secure by default” Microsoft! Hopefully later revisions of SSMS will allow for more locked down SAS tokens for backup. In the interim, no worries, you can generate your own SAS tokens from azure portal or Powershell if you want the more beefed up security.
Is backing up to a Azure blob store Cheap?
Currently 1TB of cold azure blob storage comes in at around 8 euro per TB per month before discounts, and if your running a large 24x7 in azure I would hope you at least have an EA or some other form of reduced price scheme and don’t “pay as you go”.
Geo replication is about double that with a bit more if you want the geo site to be readable, and IMO its rude to geo replicate data and not have it readable at the remote site.
So whether you consider it “cheap” depends on how much you currently pay for backup storage. If you are a SOHO using a small NAS unit with cheapo 5TB SATA drives that cost a hundred euro each and have near zero OPEX/support, then maybe you think this is expensive. If you are paying for a ridiculously expensive SAN, along with support and OPEX costs, then you might be paying anything from 2k to 10k per TB, so 8 euro a month is definitely cheap. Especially give then “offsite” nature.
Is it fast ?
Well, this is probably the most important consideration for a DBA thinking about maintenance window or recovery run book.
Each storage account is limited to about 20,000 IOPS (and to be honest I haven't yet figured out how that maps to MB/Sec throughout limits), but what I do know is that I ran some simple tests by backing up a 150 GB database recording the effective throughput below on different VM sizes in the “DSv2” series using a single 1TB P30 SSD for the data volume.
The 2 core VM put out a modest 79 MB/Sec and the most interesting thing to note was that this was faster than backing up to the Local SSD. Another interesting point is that backing up to cold read only geo-replicated storage was about the same performance profile.
The four core VM with identical configuration managed to get 127MB/Sec
The eight core VM achieve 253 MB/Sec
And the sixteen core VM got to a tasty 346 MB/Sec
I have no idea if anyone else got similar backup performance from VMs in Azure ? post to comments if you did/didn’t.
One major recommendation is that if you are every doing a emergency restore in Azure on a big data estate, you probably want to increase the core count to “max”, do the restore and then shrink the core count back to normal.
I did mess around with using multiple storage accounts to “stripe” backups across, but it only made a 10% difference throughput wise in two scenarios:
a) When I was backing up from on-premise with higher latency to storage account. Striping the backup increased the thread count which as we all know is mucho better for saturating bandwidth on high latency links. By the way fact of the day is that by default the backup uses six threads per URL backed up to (or so we observed).
b) On the 16 core VM we started to see some benefit from striping. I guess that on smaller VMs it just not possible for the a single VM to “stress” the storage account, so using multiple storage accounts wasn't worth the trouble for us.