SQL Centre of Excellence

How to Generate a Hash in SSIS

by Bob Duffy 30. March 2011 16:33

Its quite common to have a requirement to generate a SHA1 or MD5 Hash in SSIS to help with comparing non-key columns for “delta” management.

One question that comes up is “What’s the fastest way to generate a Hash”. I just did a little test to compare two methods:

Method a) SQL Servers HashBytes(‘SHA1’, Column1+ coalesce(Column2,’’)) function

Method b) In SSIS Using the MultipleHash Component http://ssismhash.codeplex.com/

The target data set was a 10 million row table based on the Adventure Works Sales.SaleOrderHeader table scaled up on my 64 bit laptop with an 8 core i7 CPU.

Method

Time (Secs)

Avg %Processor Time

DTEXEC Working Set (MB)

  Method A - SQL HashBytes function on source

104

50

47

Method B - SSIS MultipleHash Component

644

107

156

image

 

Conclusions

Not sure if anyone else has got better performance from generating the Hash within SSIS as opposed to SQL Server but my conclusion so far is:

  • SQL Server HashBytes function is a lot faster than doing the Hash calculation in the Dataflow (x6 in my test).
  • There are other factors to choose which method is appropriate:
    • The SSIS Custom Component is easier to use (point and click)
    • The SSIS Custom Component allows you to hash columns which are  “introduced” in the pipeline
    • The SSIS component required the shipping of a DLL (bad)
    • The HashBytes function does not support nulls so needs a lot of casting to strings and checking for nulls.
    • A checksum function is faster again, but carries more risk of not detecting a change.
blog comments powered by Disqus

Page List

Page List