When setting up an AlwaysOn Availability Group with a synchronous replica, you may find that the secondary replica never becomes synchronized and you are presented with the error “Data synchronization state of some availability database is not healthy”. In effect, the databases restore to the replicas, but synchronization does not start.
The first thing to check is network connectivity. Bear in mind that the setup wizard uses the NetBIOS name, whereas the endpoint that transmits the compressed log stream uses the fully qualified domain name. (AlwaysOn uses the same endpoints as mirroring, with the same default port, 5022.)
In my case the cluster had installed a hidden virtual network device for heartbeat communication, and this network connection had made its way to the top of the binding order, messing up connectivity.
You can double-check the network by using ipconfig to verify the binding order, and by pinging the node and its partners to check that the correct IP addresses are used.
We can see from the below that the IP address used is one from the failover cluster and not that of the proper network card.
The screenshot below shows that routing is coming from the hidden failover cluster virtual network and not the proper network card.
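The name check described above can also be scripted. The following is a minimal Python sketch, not a definitive tool: it resolves both the NetBIOS name and the FQDN of a node and flags any address that looks automatically assigned. It assumes the hidden cluster adapter's address falls in the link-local range (169.254.x.x), and the node names in the usage comment are hypothetical placeholders.

```python
import ipaddress
import socket


def is_link_local(ip: str) -> bool:
    """True for automatically assigned addresses (169.254.0.0/16 or fe80::/10)."""
    return ipaddress.ip_address(ip).is_link_local


def resolve_ipv4(name: str) -> list[str]:
    """Return the IPv4 addresses the local resolver gives for a name ([] on failure)."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        return []


def suspicious_addresses(names: list[str]) -> dict[str, list[str]]:
    """Map each name to the resolved addresses that are link-local."""
    report = {}
    for name in names:
        bad = [ip for ip in resolve_ipv4(name) if is_link_local(ip)]
        if bad:
            report[name] = bad
    return report


# Usage (hypothetical node names, replace with your own):
# print(suspicious_addresses(["SQLNODE1", "SQLNODE1.example.local"]))
```

An empty result means none of the names resolved to a link-local address. A non-empty result suggests the name is resolving to the cluster's virtual adapter rather than the proper network card.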
So what’s the solution?
Two options spring to mind.
a) Use fixed IP addresses for all nodes and then add entries to the “hosts” files for both the NetBIOS name and the FQDN.
b) Download the Nvspbind tool to change the binding order from the command line.
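For option (a), the entries needed in each node's hosts file can be generated rather than typed by hand. This is only a sketch: the node names, domain and IP addresses shown are hypothetical placeholders (the addresses come from a documentation-only range), and the output still has to be pasted into the hosts file manually.

```python
def hosts_entries(nodes: dict[str, str], domain: str) -> list[str]:
    """Build hosts-file lines: one for the NetBIOS name and one for the FQDN of each node."""
    lines = []
    for name, ip in nodes.items():
        lines.append(f"{ip}\t{name}")            # short (NetBIOS) name
        lines.append(f"{ip}\t{name}.{domain}")   # fully qualified domain name
    return lines


# Hypothetical example values, replace with your own fixed addresses:
example_nodes = {"SQLNODE1": "192.0.2.11", "SQLNODE2": "192.0.2.12"}
for line in hosts_entries(example_nodes, "example.local"):
    print(line)
```

Writing one name per line keeps the file easy to read and makes it obvious that both name forms point at the same fixed address.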
After using nvspbind and rebooting I now get the proper IP address and not the automatically assigned one from the hidden “Failover Cluster Virtual Network Interface” (see below).
Lessons for Production Servers for Always On
1. When setting up a Failover cluster for Always On consider using fixed IP addresses for nodes.
2. Do a ping test of both computer name and FQDN of all nodes before creating Availability Groups!
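The second lesson can be turned into a small pre-flight script. The sketch below goes one step beyond a ping: instead of ICMP, it attempts a TCP connection to the endpoint port (5022 by default, as noted earlier) using both the computer name and the FQDN of every node. The node names and domain in the usage comment are hypothetical, and the check assumes the endpoint already exists and is listening on each node.

```python
import socket


def can_connect(host: str, port: int = 5022, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections and timeouts.
        return False


def preflight(nodes: list[str], domain: str, port: int = 5022) -> dict[str, bool]:
    """Check every node by computer name and by FQDN; map each name to the result."""
    results = {}
    for node in nodes:
        for name in (node, f"{node}.{domain}"):
            results[name] = can_connect(name, port)
    return results


# Usage (hypothetical names, replace with your own):
# for name, ok in preflight(["SQLNODE1", "SQLNODE2"], "example.local").items():
#     print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

If a node is reachable by one name form but not the other, that points to the kind of name resolution mismatch described above.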