Friday, March 10, 2017

SCOM 2016 Agent Crashing Legacy IIS Application Pools

SCOM 2016 has been generally available since late last year and, as is usually the case with new versions of software, compatibility issues begin to surface as more organizations adopt it.


During one of our recent SCOM 2016 deployments we encountered an issue where the agent (referred to as the Microsoft Monitoring Agent) was deployed to an IIS server - initially without any apparent problems. However, when the IIS server was restarted some time later to accommodate some Windows updates, the IIS Application Pools began to crash regularly. A check of the Windows Event Log on the server threw up the following Event ID 1000 error:

Log Name: Application
Source: Application Error
Date: 24.02.2017 10:42:30
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
User: N/A
Computer: H-SPDEMO01.nimbuscorp.com
Description:
Faulting application name: w3wp.exe, version: 8.0.9200.16384, time stamp: 0x50108835
Faulting module name: PerfMon64.dll, version: 8.0.10918.0, time stamp: 0x577fd168
Exception code: 0xc0000409
Fault offset: 0x0000000000149794
Faulting process id: 0x2c38
Faulting application start time: 0x01d24405d195eb6a
Faulting application path: c:\windows\system32\inetsrv\w3wp.exe
Faulting module path: C:\Program Files\Microsoft Monitoring Agent\Agent\APMDOTNETAgent\V8.0.10918.0\PerfMon64.dll

Identifying the Issue

The 'Faulting Module Path' in the above application error pointing to the Application Performance Monitoring (APM) component of the agent was the first give-away to us that SCOM was the culprit. A quick uninstall of the SCOM 2016 agent and recycle of the Application Pools gave us confirmation when the errors and crashes went away.
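If you have a lot of servers to check, the same reasoning can be scripted. The following is only a minimal Python sketch, assuming you've already exported the Event ID 1000 description text from the Application log; the function names are my own for illustration and aren't part of any SCOM tooling. It simply checks that the faulting application is w3wp.exe and that the faulting module lives under the agent's APMDOTNETAgent folder.

```python
from pathlib import PureWindowsPath

# Folder name taken from the faulting module path in the event above.
APM_DIR_MARKER = "apmdotnetagent"

# Trimmed copy of the event description shown earlier in this post.
SAMPLE_EVENT = r"""
Faulting application name: w3wp.exe, version: 8.0.9200.16384, time stamp: 0x50108835
Faulting module name: PerfMon64.dll, version: 8.0.10918.0, time stamp: 0x577fd168
Exception code: 0xc0000409
Faulting module path: C:\Program Files\Microsoft Monitoring Agent\Agent\APMDOTNETAgent\V8.0.10918.0\PerfMon64.dll
"""


def parse_event_description(text):
    """Split 'Key: value' lines into a dict keyed by the lower-cased key."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so drive letters in paths survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def is_apm_crash(text):
    """True if w3wp.exe faulted in a module under the agent's APM folder."""
    fields = parse_event_description(text)
    app = fields.get("faulting application name", "").split(",")[0].strip().lower()
    module_path = PureWindowsPath(fields.get("faulting module path", ""))
    folders = [part.lower() for part in module_path.parts]
    return app == "w3wp.exe" and APM_DIR_MARKER in folders


print(is_apm_crash(SAMPLE_EVENT))  # True for the event in this post
```

A match here only tells you the crash happened inside the APM module; uninstalling the agent and recycling the pools, as we did, is still the real confirmation.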

The Microsoft Monitoring Agent APM component comes bundled in the form of a Windows service as part of the initial agent installation but is disabled by default as shown below:


APM is typically enabled through the SCOM console on a server-by-server basis and delivers some really nice DevOps scenarios for monitoring .NET workloads at a code level.

I've previously blogged about SCOM APM, presented on it at conferences and even wrote a chapter in the Mastering SCOM 2012 R2 book about it. A number of our customers also use this feature and it's always been very successful.

The weird thing about this particular IIS crashing issue though was that the APM feature was never enabled.

We needed to dig deeper to see how we could continue monitoring this server with SCOM without crashing the IIS Application Pools. As the faulting module path in the Event Log error referenced the APM component, I decided to focus there first.

If you use the command line to install your SCOM agents, you can specify a parameter that removes the APM feature from the agent installation (check out this link for command line options) and I figured this was the best place to start as the IIS server in question didn't require the APM feature.

Removing the Agent APM Feature

In the following steps, I'll walk you through a process to remove the APM feature on the SCOM 2016 agent using the command line. The first walk-through will perform an in-place repair on an existing agent to save you from having to uninstall the agent first. An added benefit of the repair option is that the agent will stay registered as Remotely Manageable in the console, so you'll avoid having to follow the separate process to change that setting back.

The second walk-through will show you how to use the command line to perform a new agent installation that doesn't contain the APM feature.

(Repair Agent Install Option)

Copy the SCOM 2016 Agent installation folder (amd64) from your SCOM server to the IIS server (this folder is located at "C:\Program Files\Microsoft System Center 2016\Operations Manager\Server\AgentManagement")


Log on to the IIS server and open a command prompt using an administrative account. From there, browse to the location where you saved the SCOM 2016 Agent folder and run the following command:

msiexec.exe /i momagent.msi NOAPM=1


This command will then launch the Microsoft Monitoring Agent Setup installer shown in the following image.


Click Next and if you've already installed the SCOM 2016 agent to your IIS server, then you'll be presented with the Program Maintenance window shown below.


Select the Repair option and hit Next to move on.

Hit Install at the next window and the agent repair should kick off. After a minute or so, the agent repair job will be complete and you'll be presented with the following confirmation of success...


Now if you open the Windows Services (services.msc) snap-in and check the services listed for the Microsoft Monitoring Agent, you'll see that the APM component is no longer installed, as shown here.



With the agent successfully repaired, it'd be a good idea to check the Update Rollup version of the agent and, if need be, re-apply the latest one (UR2 at this time). You can check the UR version of your agent by importing the awesome SCOM Agent Version Addendum Management Pack from Microsoft's Kevin Holman.

Recycle the IIS Application Pools on your IIS server to ensure the new agent changes take effect.
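If you'd rather not click through IIS Manager for each pool, the recycle can be done with appcmd.exe. Here's a small Python sketch that just builds the command lines; it assumes appcmd.exe is in its usual inetsrv location, and the pool names are made-up examples rather than anything from our environment.

```python
# Assumed default location of appcmd.exe on the IIS server.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"


def recycle_commands(pool_names):
    """Return one 'appcmd recycle apppool' argument list per pool name."""
    return [
        [APPCMD, "recycle", "apppool", f"/apppool.name:{name}"]
        for name in pool_names
    ]


# Hypothetical pool names, for illustration only.
for command in recycle_commands(["DefaultAppPool", "LegacyPool"]):
    print(command)
```

Each list can be handed to subprocess.run from an elevated prompt on the IIS server; I've left that call out so the sketch has no side effects.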

(New Agent Install Option)

If you've already uninstalled the SCOM 2016 agent from your IIS servers or haven't yet deployed it, then follow these steps to get it deployed without the APM feature:

Copy the SCOM 2016 Agent installation folder (amd64) from your SCOM server to the IIS server (this folder is located at "C:\Program Files\Microsoft System Center 2016\Operations Manager\Server\AgentManagement")


Log on to the IIS server and open a command prompt using an administrative account. From there, browse to the location where you saved the SCOM 2016 Agent folder and run the following command:

msiexec.exe /i momagent.msi NOAPM=1


This command will then launch the Microsoft Monitoring Agent Setup installer where you will need to click Next and then hit the I Agree button in the following window to accept the license agreement.

At the Destination Folder window, confirm the installation path for the agent and click Next to continue.

When you see the Agent Setup Options window, select Connect the agent to System Center Operations Manager (shown below), then hit Next.


At the Management Group Configuration window, fill in the information required to connect the agent to your SCOM environment and remember that the Management Group Name field is case sensitive!


Click Next to continue and at the Agent Action Account window, leave the Local System option selected, then hit Next again.


Review your installation settings at the Ready to Install window, then click Install to deploy the agent without the APM feature.

When the agent has installed, open the Windows Services (services.msc) snap-in and check the services listed for the Microsoft Monitoring Agent. You'll see that the APM component is not installed (see image below).


As this is a new manually installed agent, you will need to change the Remotely Manageable status of the agent back to Yes by following the steps in Kevin Holman's post here (although this post references SCOM 2007, the steps are still the same for SCOM 2016).

You will also need to install the latest update rollup - which is currently at UR3 - and if you've set the Remotely Manageable status back, you should be able to push the update rollup out from the SCOM console. Make sure to reference the SCOM Agent Version Addendum Management Pack to deliver easier visibility of your agent UR versions.

Recycle the IIS Application Pools on your IIS server to ensure the new agent changes take effect.
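If you have more than a handful of servers to do, the wizard choices above can also be passed to momagent.msi as properties for an unattended install. The Python sketch below only assembles the command line. The property names are the ones Microsoft documents for manual MOMAgent.msi installs, while the management group and management server values are placeholders I've made up, so treat it as a starting point to test rather than a finished deployment script.

```python
def build_agent_install_command(msi_path, management_group, management_server,
                                no_apm=True):
    """Build an unattended msiexec argument list for the SCOM agent."""
    args = [
        "msiexec.exe", "/i", msi_path, "/qn",        # /qn = no installer UI
        "USE_SETTINGS_FROM_AD=0",
        "USE_MANUALLY_SPECIFIED_SETTINGS=1",
        f"MANAGEMENT_GROUP={management_group}",      # case sensitive
        f"MANAGEMENT_SERVER_DNS={management_server}",
        "ACTIONS_USE_COMPUTER_ACCOUNT=1",            # Local System action account
        "AcceptEndUserLicenseAgreement=1",
    ]
    if no_apm:
        args.append("NOAPM=1")                       # leave out the APM feature
    return args


# Placeholder values: swap in your own management group and server.
command = build_agent_install_command(
    "momagent.msi", "MyManagementGroup", "scom-ms01.example.com"
)
print(" ".join(command))
```

Values containing spaces would need quoting before you paste the joined string into a command prompt, which is why the placeholders here avoid them.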

***Update 20th August 2017 - Handy Tip for Reinstalling Agents: Microsoft's Kevin Holman and Brian Barrington have come up with a nice workaround to easily reinstall your agents with the NOAPM switch. Check it out here.***

More Information About This Issue

This issue was the first time in years that I've encountered a scenario where the SCOM agent 'broke' something and as such, I wanted to investigate it a bit further and raise it with Microsoft. One of the massive benefits of being a Microsoft MVP for me is that I get the opportunity to interact with the SCOM Product Group on a regular basis.

After a few emails back and forth, the awesome folks on the Product Group came back to me with the following detailed information about the issue:

  • Issue affects only IIS Application Pools running .NET Framework 2.0/3.5 and can be seen on any version of Windows Server or IIS that hosts these pools.
  • Switching the IIS pool to .NET Framework 4.0 (or higher) will solve the issue. However, this is not a suitable workaround for SharePoint, as SharePoint 2010 doesn't support 4.0 pools.
  • If you need to deploy to a system where IIS only runs 4.0+ pools, no action is needed and a default installation of the SCOM 2016 Agent will work fine.
  • For now, if you need to deploy to a server running .NET Framework 2.0/3.5 application pools, then you'll need to either install the SCOM 2016 agent with the NOAPM=1 switch (following my walkthrough above) or you can continue to use the SCOM 2012 R2 agent as it’s forward compatible with SCOM 2016 and doesn't crash these application pools.
  • A permanent fix for this issue should be included in Update Rollup 3 for the SCOM 2016 agent.
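To work out which servers need the NOAPM=1 treatment, you could check the managed runtime version of each application pool. The Python sketch below parses text in the shape I'd expect from "appcmd list apppool" (that output format is an assumption on my part, and the pool names are invented), then flags pools reporting v2.0. Pools targeting .NET Framework 3.5 also report v2.0, as 3.5 runs on the 2.0 CLR.

```python
import re

# Assumed line shape from 'appcmd list apppool', for example:
# APPPOOL "DefaultAppPool" (MgdVersion:v4.0,MgdMode:Integrated,state:Started)
POOL_LINE = re.compile(r'^APPPOOL "(?P<name>[^"]+)" \((?P<attrs>[^)]*)\)$')

# Hypothetical sample output, for illustration only.
SAMPLE_OUTPUT = """\
APPPOOL "DefaultAppPool" (MgdVersion:v4.0,MgdMode:Integrated,state:Started)
APPPOOL "SharePoint - 80" (MgdVersion:v2.0,MgdMode:Integrated,state:Started)
APPPOOL "StaticSite" (MgdVersion:,MgdMode:Integrated,state:Started)
"""


def parse_pools(appcmd_output):
    """Map each pool name to its managed runtime version ('' if none)."""
    pools = {}
    for line in appcmd_output.splitlines():
        match = POOL_LINE.match(line.strip())
        if not match:
            continue
        attrs = dict(
            item.split(":", 1)
            for item in match.group("attrs").split(",")
            if ":" in item
        )
        pools[match.group("name")] = attrs.get("MgdVersion", "")
    return pools


def affected_pools(pools):
    """Return the names of pools on the 2.0 runtime (.NET 2.0/3.5)."""
    return sorted(name for name, version in pools.items() if version == "v2.0")


print(affected_pools(parse_pools(SAMPLE_OUTPUT)))  # ['SharePoint - 80']
```

If the list comes back empty, a default agent installation should be fine according to the Product Group's guidance above; anything in the list means NOAPM=1 or the SCOM 2012 R2 agent for that server.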

***Update March 21st 2017: The Product Group have also just released a blog on this issue and in their post, they have confirmed that it will be resolved in Update Rollup 3, with a chance that a hotfix may be released sooner. Check out their blog post here.***

***Update 24th May 2017: Microsoft has just released Update Rollup 3 and you can read all about it in my post here.***

***Update 31st May 2017: Microsoft has just posted that this agent crash issue is still not resolved with UR3 (not cool...). I'll update here when we have more info.***

***Update 6th June 2017: Microsoft has posted more information about this issue remaining after UR3 and has mentioned a hotfix is still in the works. Check out their latest post here.***

Conclusion

Although this issue has been an annoyance, it's good to know that it only affects a small subset of legacy systems and the workaround is relatively simple to implement. Hopefully this blog post will help to serve people who've already deployed SCOM 2016 (or who are about to deploy it) and need to monitor legacy IIS application pools prior to the release of UR3.