I have been noticing one very common error that occurs when failing over an Availability Group in a SQL Server 2012 AlwaysOn setup: the Availability Group fails to come online during the failover.
Here is the snippet of the error message.
Failed to bring availability group ‘availability_group‘ online. The operation timed out. Verify that the local Windows Server Failover Clustering (WSFC) node is online. Then verify that the availability group resource exists in the WSFC cluster. If the problem persists, you might need to drop the availability group and create it again.
This error is usually caused by the [NT AUTHORITY\SYSTEM] account lacking the permissions it needs on the availability group. Running the SQL script below on all the secondary replicas fixed the issue.
GRANT ALTER ANY AVAILABILITY GROUP TO [NT AUTHORITY\SYSTEM]
GO
GRANT CONNECT SQL TO [NT AUTHORITY\SYSTEM]
GO
GRANT VIEW SERVER STATE TO [NT AUTHORITY\SYSTEM]
GO
According to Microsoft, the [NT AUTHORITY\SYSTEM] account is used by SQL Server AlwaysOn health detection to connect to the SQL Server computer and to monitor health. When you create an availability group, health detection is initiated when the primary replica in the availability group comes online. If the [NT AUTHORITY\SYSTEM] account does not exist or does not have sufficient permissions, health detection cannot be initiated, and the availability group cannot come online during the creation process.
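To check whether the account described above exists and what it is allowed to do, something like the following can be run on each replica. This is only a sketch and not part of the original fix: steps 1 and 2 just read the sys.server_principals and sys.server_permissions catalog views, and the guarded CREATE LOGIN in step 3 is my assumption for the case where the login is missing.

```sql
-- 1. Does the login exist, and is it enabled?
SELECT name, type_desc, is_disabled
FROM sys.server_principals
WHERE name = N'NT AUTHORITY\SYSTEM';

-- 2. Which server-level permissions have been explicitly granted or denied to it?
SELECT pr.name AS login_name, pe.permission_name, pe.state_desc
FROM sys.server_permissions AS pe
JOIN sys.server_principals AS pr
    ON pr.principal_id = pe.grantee_principal_id
WHERE pr.name = N'NT AUTHORITY\SYSTEM'
ORDER BY pe.permission_name;

-- 3. Assumption: create the login only when step 1 returned no row.
IF NOT EXISTS (SELECT 1 FROM sys.server_principals
               WHERE name = N'NT AUTHORITY\SYSTEM')
    CREATE LOGIN [NT AUTHORITY\SYSTEM] FROM WINDOWS;
```

Step 2 lists explicit grants only, so permissions inherited through a server role such as sysadmin will not appear there.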
NT AUTHORITY\SYSTEM runs sp_server_diagnostics, a procedure introduced in SQL Server 2012 that runs on a continuous basis. It captures diagnostic data and health information about SQL Server to detect potential failures, much as the default trace (MSSQL\LOG\*.TRC) did in the past.
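One way to see this on a running instance is to look for sessions owned by NT AUTHORITY\SYSTEM. The query below is a rough sketch, assuming the health-detection session is connected at the time you run it; the columns are just a convenient subset of sys.dm_exec_sessions and sys.dm_exec_requests.

```sql
-- List sessions opened by NT AUTHORITY\SYSTEM and what they are running.
SELECT s.session_id,
       s.login_name,
       s.program_name,
       r.command,
       r.wait_type,
       t.text AS running_sql
FROM sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_exec_requests AS r
    ON r.session_id = s.session_id
-- OUTER APPLY keeps idle sessions that have no active request.
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE s.login_name = N'NT AUTHORITY\SYSTEM';
```

If health detection is active, I would expect a row whose statement text or wait type mentions sp_server_diagnostics. Running EXEC sp_server_diagnostics; by hand (it needs VIEW SERVER STATE) returns a single snapshot of the same health data.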
Hope this article helped you understand the importance of the NT AUTHORITY\SYSTEM account on SQL Server 2012 servers configured for AlwaysOn.
via Failed to bring availability group ‘[availability group name]’ online.
Initially got this error..
Connecting to WIN2K8R2-3…
Msg 41131, Level 16, State 0, Line 3
Failed to bring availability group ‘SQL00CansaAG01’ online.
The operation timed out. Verify that the local Windows Server Failover Clustering (WSFC) node is online. Then verify that the availability group resource exists in the WSFC cluster. If the problem persists, you might need to drop the availability group and create it again.
Disconnecting connection from WIN2K8R2-3…
After adding NT AUTHORITY\SYSTEM to sysadmin on the 3rd node, I started getting a different error.
My third node is in a multi-subnet environment and gets the error below.
Connecting to WIN2K8R2-3…
Msg 41066, Level 16, State 0, Line 3
Cannot bring the Windows Server Failover Clustering (WSFC) resource (ID ‘6d9c7675-aabb-40ab-903e-f54b5eaf472d’) online (Error code 5942).
The WSFC service may not be running or may not be accessible in its current state, or the WSFC resource may not be in a state that could accept the request. For information about this error code, see “System Error Codes” in the Windows Development documentation.
Msg 41160, Level 16, State 0, Line 3
Failed to designate the local availability replica of availability group ‘SQL00CansaAG01’ as the primary replica. The operation encountered SQL Server error 41066 and has been terminated. Check the preceding error and the SQL Server error log for more details about the error and corrective actions.
Disconnecting connection from WIN2K8R2-3…
Hi Rakesh, did you ever manage to resolve this error? Can you please share here if you did? Thank you
You are a lifesaver!!! I was playing with Windows Failover Cluster in our QA environment, evicting nodes and such (long story) … Anyway, when I tried to create an availability group on one node, it failed with the exact same error you posted here. However, creating an availability group on another node in the same cluster group worked. I couldn’t find a solution online and was close to just whacking my WSFC install. Thank you!
Were you able to resolve this error? I am getting the same error!
Hi, I’m getting the same error. All the nodes are online, but when I try a manual failover to the secondary node I get:
The Cluster service failed to bring clustered role ‘AG-name ‘ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
Could you please tell me what went wrong?
SQL 2022 and it’s still useful 🙂 Thanks!