Exchange 2013 Dag BSOD

We’ve recently been having issues with an Exchange 2013 DAG running CU2v2.

The server would reboot randomly with a BSOD. Exchange services would not start. I took the dump file and ran it through windbg and got the following output:

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

CRITICAL_PROCESS_DIED (ef)
        A critical system process died
Arguments:
Arg1: fffffa800b415080, Process object or thread object
Arg2: 0000000000000000, If this is 0, a process died. If this is 1, a thread died.
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

----- ETW minidump data unavailable-----

PROCESS_OBJECT: fffffa800b415080

IMAGE_NAME:  wininit.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  0

MODULE_NAME: wininit

FAULTING_MODULE: 0000000000000000 

PROCESS_NAME:  MSExchangeHMWo

BUGCHECK_STR:  0xEF_MSExchangeHMWo

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT_SERVER

CURRENT_IRQL:  0

ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) amd64fre

LAST_CONTROL_TRANSFER:  from fffff80230e04795 to fffff802308db440

STACK_TEXT:  
fffff880`083419a8 fffff802`30e04795 : 00000000`000000ef fffffa80`0b415080 00000000`00000000 00000000`00000000 : nt!KeBugCheckEx
fffff880`083419b0 fffff802`30d9be2e : fffffa80`0b415080 00000000`144d2c41 00000000`00000000 fffff802`30a57794 : nt!PspCatchCriticalBreak+0xad
fffff880`083419f0 fffff802`30d12a01 : fffffa80`0b415080 00000000`144d2c41 fffffa80`0b415080 00000000`00000000 : nt! ?? ::NNGAKEGL::`string'+0x4a25a
fffff880`08341a50 fffff802`30d1880e : ffffffff`ffffffff fffffa80`0c889380 fffffa80`0b415080 00000000`00000001 : nt!PspTerminateProcess+0x6d
fffff880`08341a90 fffff802`308da453 : fffffa80`0b415080 fffffa80`0ce7c080 fffff880`08341b80 00000000`ffffffff : nt!NtTerminateProcess+0x9e
fffff880`08341b00 000007ff`5f312eaa : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`237fdeb8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x000007ff`5f312eaa


STACK_COMMAND:  kb

FOLLOWUP_NAME:  MachineOwner

IMAGE_VERSION:  

FAILURE_BUCKET_ID:  0xEF_MSExchangeHMWo_IMAGE_wininit.exe

BUCKET_ID:  0xEF_MSExchangeHMWo_IMAGE_wininit.exe

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:0xef_msexchangehmwo_image_wininit.exe

FAILURE_ID_HASH:  {3f0b292e-4bc2-5c6e-99a7-f9a74df1101c}

Followup: MachineOwner

The failed process is MSExchangeHMWo. This is the Microsoft Exchange Health Monitor service.

Reading more in to this, it seems to happen when the server has an issue, like low memory, slow IO etc. It also seems to be a bug in CU2 and CU3.

A good thread on the issue can be found here.

You can run the following powershell command on a CU2 server to disable automatic reboots (BSOD).

Add-GlobalMonitoringOverride -Identity ExchangeActiveDirectoryConnectivityConfigDCServerReboot -ItemType Responder -PropertyName Enabled -PropertyValue 0 -ApplyVersion “15.0.712.24

If you have CU3 or another version, change the ApplyVersion to the specific version of Exchange. The versions can be found here.

Leave a Reply