SQL Centre of Excellence

Most SQL folk know that we need to run DBCC frequently and the lucky ones who have attended some Paul Randal training (www.sqlskils.com) will be well aware of the intricacies of DBCC CHECKDB and when/how the repair databases and data loss risks.

Well what if you also have a SAN and clustered Hyper-V environment instead of just SQL and you get some disk corruption on your non sql servers.

1) What is the SAN equivalent of DBCC?

Most SAN’s will have a background process that checks for a disk inconsistencies like bad sectors, differences between the mirror and and the partner disk in a RAID set and other things like that. On our MD3000 its turned on by default (see below)

image

2) How do I know when a get a problem?

The SAN can pick up a corrupted disk issue in to ways:

a) the media scan can find it on a periodic check
b) When reading IO the controller may get a physical error or a Logical error (data was found on a second attempt). This will appear in the SAN event log and if configured correctly the SAN can usually send out an email (see error below)

image

In out case we got the error and then thought “oh dear”, we have like 50 virtual servers running on that storage array with over 10 volumes. How do we narrow down the issue!

2) Narrowing Down the Issue to which disk and Hyper-V image

The MD3000 like most SAN’s can produce support information, and in this case we can get ZIP file with lots of config and diagnostics data in it. one of the files contains more detail on unreadable sectors

image

 

Looking into the “UnreadableSectors” text file we can now see which actual disk had the Physical Errors and what volume is affected:

image

On the down side in our case this volume is 2TB and runs 20 hyper-v images, and we really don’t want to attempt to restore all of them, so we want narrow down the problem to which hyper-v image and which virtual host.

3) What is the Clustered Shared Volume version of DBCC ?

Now that we know which volumes affected we can ran a “chkdsk” on it to look for bad sectors and/or attempt a repair. We can choose to attempt the repair or just find out which images are affected.

One minor issue is in our case we use “Clustered Shared Volumes” so the normal chkdsk can not be used, we need to use PowerShell to locate the offending image (see below)

image

Ok, so that explains why the mail server just started blue screening Winking smile, but good news is that VHD file is just the operating system disk. We could restore that or see if the damage is fixable.

2) Narrowing Down to which files “inside” the Hyper-V image. the disk version of DBCC

Now that the parent volume is “repaired” we can go into the guest and run a chkdsk to see what the damage is and if it is “fixable”. in this case a bunch of windows DLL’s are going to be deleted, some re to do with wireless networking (no big deal on a server), others not too sure…

image

 

3) Should we restore backup or repair now we know what the problem is and can delete the offending DLL’s ?

Tricky one, I think  like DBCC CHECKDB REPAIR_ALLOW_DATA_LOSS the general guidance is to restore to last known good backup, but In our case we are due to upgrade in a ahem “week or so” to a new operating system/email server and we don’t have any SLA on our email server, so we opted to just fix the errors and run it for another short while then go with the upgrade to exchange 2010.

If the corruption had have been in the exchange data files we would have had to use the “eseutil” which is exchange servers version of “DBCC”, but as these volumes were not affected we were quite lucky.

4) Lessons Learned?

There are may tools that do the DBCC thing for different products. Learn how one works and they kind of all work in the same way

- SQL Server had DBCC CHECKDB
- The SAN has media scan settings in its GUI or CLI
- Clustered Shared Volumes use PowerShell “Repair-ClusteredSharedVolume” cmdLet
- normal windows volumes use “chkdsk”
- exchange server uses “eseutil.exe”

Compress an entire database

by Bob Duffy 16. September 2011 04:30

I posted a script earlier to generate the script to compress an entire database. here is one to do the compress/rebuild partition by partition. one thing to bear in mind is that ONLINE rebuild is not supported  when rebuilding partition by partition. See http://technet.microsoft.com/en-us/library/ms188388.aspx 

My simpler script to just do compression at the table level is here

http://blogs.msdn.com/b/boduff/archive/2010/02/23/how-good-is-data-compression-in-sql-2008.aspx

So I think the best advice is “if” you can go offline, then do it a partition at a time. if you cannot, then do the entire table and accept the longer process.

/*
    Script to Produce Compression Statements  
    Notes: Can change MAXDOP and only select one type of index if needed

*/

select 'ALTER ' + case when si.type =1 then 'INDEX [' + si.name + '] ON ' else 'TABLE ' END +  ' [' + s.name + '].[' + o.name 
+ '] REBUILD PARTITION=' 
+ case when ds.type='PS' then convert(varchar,sp.partition_number) else 'ALL' end 
+ '  WITH (DATA_COMPRESSION=PAGE, MAXDOP=8, FILLFACTOR=100'
+ case when  ds.type='FG' then ' ,ONLINE = ON' else '' end -- Cannot REBUILD INDEX ONLINE for a partition
+'); ' 
from sys.indexes  si
inner join sys.objects o on o.object_id =si.object_id
inner join sys.schemas s on s.schema_id=o.schema_id 
inner join sys.partitions sp on si.index_id =sp.index_id  and sp.object_id =o.object_id 
inner join sys.data_spaces ds on ds.data_space_id =si.data_space_id 
where si.type>0 and o.type='U'
/* comment these out as appropriate if you only want to do one type of index or one table*/
--and si.type_desc ='CLUSTERED'
--and si.type_desc ='NONCLUSTERED'
--and o.object_id=object_id('DatabaseLog')
order by s.name, o.name, si.type, si.index_id

Page List

Page List