Wednesday, December 15, 2010

OpsMgr Custom Alert: Alert on process using too much memory (without a monitor)

I needed to create a rule in SCOM that could alert if a process consumed more than x memory. To my surprise there is nothing out of the box in the Authoring Console that allows for this. I find this a bit strange, as it is something that would be used quite often. There are ways to create the rule I am looking for through a custom application monitor or a VBScript, but those are time consuming and I feel they are more effort than reward. Right-clicking in the Authoring Console under the Rules section proves my point.

For this example I am going to create an alert that will fire if notepad.exe is using more than 1.2 MB of memory. Please note: if you are testing this pack you will need to open Notepad before importing this rule. If OpsMgr cannot resolve the performance counter at the time of workflow initialization, the workflow will be unloaded. The reason for choosing Notepad is simple: it ships with every version of Windows and we can easily change the amount of memory that it uses by opening a big text file.

Let’s Begin.

Open up the Authoring Console and create or open a management pack. Navigate to Health Model, Rules, right click, and select New -> Custom Rule. Give the rule an ID and click OK. For this example my ID will be “CustomAlerts.AlertOnNotepadMemoryUsage”.

Under General give the rule a name and description, and target it at the Microsoft.Windows.Server.Computer class (or any other class that you want).

Click on the Modules Tab and create a new data source.

Select the System.Performance.DataProvider module, give it an ID and click OK.

Under the Data Source Module section select the module you just added and click Edit. On the screen that appears click Configure in the bottom left corner. The performance counter selection wizard appears. Select your counter (in this case it's Process \ Working Set \ notepad). Use the picker below to choose how often the rule should sample the counter, then click OK twice to return to the Modules Tab.

Now that we have our counter, the next thing we need to do is decide if an alert should be generated. Since the counter could be below our threshold, we need to compare its value against the threshold first. To do this we add a Condition Detection module to our rule: under the Condition Detection section click Create.

Select the System.Performance.SimpleThresholdCondition module and click OK.

Note: you can use any other module in the list that appears for the condition detection (e.g. average threshold), but for now we are going to keep it simple by using the System.Performance.SimpleThresholdCondition module.

Once added, click the Edit button for the new module. There is no configuration wizard, so you will have to either edit the module through Notepad or use the dialog box on screen. As you can see there are 2 options that can be changed: Threshold and Operator. These are pretty self-explanatory, so I am going to go ahead and enter 1258291 (1.2 MB expressed in bytes) and Greater respectively.
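To show where the 1258291 figure comes from, here is a minimal Python sketch of the arithmetic and of the comparison the "Greater" operator performs. The function names are hypothetical and only illustrate the idea; they are not part of OpsMgr.

```python
# Hypothetical helpers for illustration only; nothing here is an OpsMgr API.
BYTES_PER_MB = 1024 * 1024


def mb_to_bytes(megabytes: float) -> int:
    """Convert megabytes to whole bytes, dropping any fractional byte."""
    return int(megabytes * BYTES_PER_MB)


def breaches_threshold(working_set_bytes: int, threshold_bytes: int) -> bool:
    """Mirror the 'Greater' operator: true only when strictly above the threshold."""
    return working_set_bytes > threshold_bytes


threshold = mb_to_bytes(1.2)
print(threshold)                                 # 1258291
print(breaches_threshold(2_000_000, threshold))  # True
print(breaches_threshold(1_000_000, threshold))  # False
```

The Working Set counter reports bytes, which is why the threshold has to be entered in bytes rather than megabytes.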

Click OK twice to close the current window and return to the Modules Tab.

The last thing we need to do is create an alert if the condition detection returns true. To do this, simply add the “System.Health.GenerateAlert” module to the Actions section. Click Edit and then Configure to bring up the configuration wizard for the alert. Enter a name for the alert and any message that you want to appear. You can make use of the $Data$ variables to pull back information about the process, and the fly-outs on the message editor list all the values available.

Once complete your alerts screen should look something like this.

Click OK twice to get back to the Modules Tab.

At this point you are pretty much done creating your alert. I would recommend disabling the rule by default and then targeting it at the servers (or class) that you want to monitor. Additionally, take the time to add a KB entry to the alert so support staff know what to do when it appears in the console. I normally like to change the category of custom rules to the correct type (Alert in this case); you can do this on the Options Tab. You should now have a rule looking something like this.

Right, let’s test this rule.

Fire up your lab and import the pack, add any overrides to get the rule targeted at the correct computer.

I made a bit of a stuff up with the amount of memory to alert on so straight away I get an alert :/.

This means that the rule is working. I close Notepad and wait for 5 min to see if I get any more alerts coming through (to ensure that the logic is working correctly). As mentioned earlier, if the performance counter cannot be found the workflow will be unloaded from the OpsMgr agent. Like clockwork, 1 min later I get the following warning in the agent's event log.

Log Name: Operations Manager
Source: Health Service Modules
Date: 12/15/2010 11:24:38 AM
Event ID: 10103
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
In PerfDataSource, could not find counter Process, Working Set, notepad in Snapshot. Unable to submit Performance value. Module will not be unloaded.
One or more workflows were affected by this.
Workflow name: CustomAlerts.AlertOnNotepadMemoryUsage
Instance name:
Instance ID: {1A5EB665-A107-EEC5-7394-60D9B5EF8882}
Management group: HOME

I expected that. Something interesting to note here: although the performance counter cannot be resolved, the workflow is not unloaded. I suspect this has something to do with the fact that we have already submitted data back to SCOM from this workflow. After leaving Notepad closed on my target computer for a while I decide to re-open it and open a 500 KB text file (“hello world” +- 5000 times), and after a minute a new alert appears in the SCOM console.

Comparing the 2 alerts side by side I see that the memory usage reflects that the 500 KB file has been opened.

That’s all there is to creating a custom alert in OpsMgr based on the memory usage of an application. Although this particular example will not be useful in production, it should enable you to create an alert on any performance counter easily.

Feel free to leave questions / comments / requests.

SCCM - migrating from WSUS to SCCM Updates

OK - so we wanted to start using SCCM to deploy updates instead of our existing WSUS hierarchy. The problem was we had a couple of years' worth of approved updates in WSUS and needed this list in SCCM. I started scratching around on the interwebs and couldn't find any kind of end-to-end solution. I'm lazy: if someone has done it before, why re-invent the wheel?

So I broke down the problem into steps
  • Connect to WSUS DB and get all the updates in approved updates
  • Create an Update list in SCCM and import the approved updates from previous step
  • Tidy up the list to get rid of unwanted/unneeded updates etc
Pretty straightforward on the surface, until you start trying to tie up article IDs and knowledgebase IDs etc. I ended up using one SQL script, one PowerShell script and one VBScript. I could have tidied these all up into one PowerShell script, but we only needed this as a one-off and didn't need to redo it on a daily basis.

OK - let's get the approved updates out of WSUS. Our DB is running on the WSUS default, the Windows Internal Database. As it turns out the interwebs were going to help me after all, every step of the way, so big thanks to these guys for their articles. You can connect to your instance using SQL Server Management Studio on the local machine with this connection string :
Once you are connected - this will get you a list of ArticleID's from your approved updates in WSUS:
select distinct KnowledgebaseArticle from PUBLIC_VIEWS.vUpdate UPD
join PUBLIC_VIEWS.vUpdateApproval APP on upd.UpdateId = app.UpdateId
order by KnowledgebaseArticle
You will get NULL and 000000 - These need to be removed from the ArticleID list and added manually to the update list.

Get these Updates by doing :
select distinct * from PUBLIC_VIEWS.vUpdate UPD
join PUBLIC_VIEWS.vUpdateApproval APP on upd.UpdateId = app.UpdateId
where KnowledgebaseArticle = '000000'
--And to get NULL
select distinct * from PUBLIC_VIEWS.vUpdate UPD
join PUBLIC_VIEWS.vUpdateApproval APP on upd.UpdateId = app.UpdateId
where KnowledgebaseArticle is NULL
OK - so now we have a list of ArticleIDs. We now need to determine the CI_ID value for each ArticleID using the script below. The script needs to be modified: on the line starting with Query1 = "Select..., insert your ArticleID values in between the brackets as shown. Run this on your SCCM server and just redirect the output to a text file.
' WMI query flags; VBScript does not define these constants by itself.
Const wbemFlagReturnImmediately = 16
Const wbemFlagForwardOnly = 32

Dim connection, Query1, ListOfResources1, Resource1

' Connect to the SMS Provider on the local machine with the current credentials.
Set connection = Connect(".", "", "")
If connection Is Nothing Then
    WScript.Quit 1
End If

'On Error Resume next
' This is the line to change.
Query1 = "Select * from SMS_SoftwareUpdate where ArticleID in ('940060','940357')"
' Run query.
Set ListOfResources1 = connection.ExecQuery(Query1, , wbemFlagForwardOnly Or wbemFlagReturnImmediately)
' The query returns a collection that needs to be enumerated.
Wscript.Echo " "
Wscript.Echo "Query: " & Query1
Wscript.Echo "--------------------------------------------------------"
For Each Resource1 In ListOfResources1
    Wscript.Echo Resource1.CI_ID
    'Wscript.Echo "Name: " & Resource1.LocalizedDisplayName
    'Wscript.Echo "ArticleID: " & Resource1.ArticleID
Next

' Returns a connection to the site namespace, or Nothing on failure.
Function Connect(server, userName, userPassword)
    On Error Resume Next
    Dim net
    Dim localConnection
    Dim swbemLocator
    Dim swbemServices
    Dim providerLoc
    Dim location
    Set swbemLocator = CreateObject("WbemScripting.SWbemLocator")
    swbemLocator.Security_.AuthenticationLevel = 6 'Packet Privacy.
    ' If the server is local, don't supply credentials.
    Set net = CreateObject("WScript.NetWork")
    If UCase(net.ComputerName) = UCase(server) Then
        localConnection = true
        userName = ""
        userPassword = ""
        server = "."
    End If
    ' Connect to the server.
    Set swbemServices = swbemLocator.ConnectServer _
        (server, "root\sms", userName, userPassword)
    If Err.Number<>0 Then
        Wscript.Echo "Couldn't connect: " + Err.Description
        Set Connect = Nothing
        Exit Function
    End If
    ' Determine where the provider is and connect.
    Set providerLoc = swbemServices.InstancesOf("SMS_ProviderLocation")
    For Each location In providerLoc
        If location.ProviderForLocalSite = True Then
            Set swbemServices = swbemLocator.ConnectServer _
                (location.Machine, "root\sms\site_" + _
                location.SiteCode, userName, userPassword)
            If Err.Number<>0 Then
                Wscript.Echo "Couldn't connect: " + Err.Description
                Set Connect = Nothing
                Exit Function
            End If
            Set Connect = swbemServices
            Exit Function
        End If
    Next
    Set Connect = Nothing ' Failed to connect.
End Function
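With a couple of years of approvals, hand-editing the Query1 line gets tedious. As a rough sketch (not part of the original scripts), the query string could be generated from the ArticleID list exported out of WSUS. The function name is hypothetical, and the clean-up rule (drop NULL and all-zero IDs, drop duplicates) is an assumption based on the note above about NULL and 000000.

```python
def build_article_query(article_ids):
    """Build the WQL query string, skipping NULL/empty and all-zero ArticleIDs."""
    cleaned = []
    for raw in article_ids:
        if raw is None:
            continue  # NULL rows from the WSUS query
        value = str(raw).strip()
        if not value or value.strip("0") == "":
            continue  # empty strings and IDs such as 000000
        if value not in cleaned:
            cleaned.append(value)  # keep first occurrence, preserve order
    quoted = ",".join("'{}'".format(v) for v in cleaned)
    return "Select * from SMS_SoftwareUpdate where ArticleID in ({})".format(quoted)


print(build_article_query(["940060", None, "000000", "940357", "940060"]))
```

The printed line can be pasted over the Query1 assignment in the VBScript.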
OK, so now we need to create an update list in SCCM. For that purpose I used a PowerShell script written by Joachim Meyer. It has to be run on the SCCM server with a bunch of parameters and needs to point to a reference machine (which actually isn't used for our purposes). The CI_IDs returned by the previous step get added to an array in the PowerShell script: edit the lines starting with [void] and replace the values there with your own CI_IDs.

param (
    [string] $ReferenceClient,
    [string] $UpdateListName,
    [string] $SiteServer,
    [switch] $force,
    [switch] $verbose
)

$AppName = "Create-UpdateList"
$manpage = @'
Creates a Configuration Manager update list.

Create-UpdateList -ReferenceClient <name> -UpdateListName <name> [-SiteServer <name>] [-Force] [-Verbose]

This script creates an update list based on the inventory data of a reference client. The reference client should
represent the baseline for a specific operating system used within your organization. This client needs to be
present in the Configuration Manager database with valid inventory data. This script then creates an update list
which includes all the software updates reported as missing from the reference client.

-ReferenceClient  Specifies the name of the reference client.
-UpdateListName   Specifies the name of the update list to be created.
-SiteServer       Optional: Specifies the Configuration Manager site server. If not specified, the local computer is assumed
                  to be the site server.
-Force            Optional: Creates an update list even if an update list with the same display name already exists.
-Verbose          Optional: Generates detailed information about the script's operations.
'@

# Show the help text and stop if a required parameter is missing.
if (!$ReferenceClient -or !$UpdateListName) {
    Write-Host $manpage
    exit 1
}
# A single leftover argument is treated as the site server name.
if ($args.count -eq 1) {
    $siteserver = $args[0]
}
elseif ($args.count -gt 1) {
    Write-Host $manpage
    exit 1
}

if (!$siteserver) { $siteserver = $env:computername }
$namespace = "root\sms"
if ($verbose) {
    Write-Host "`nReferenceClient: $ReferenceClient"
    Write-Host "UpdateListName : $UpdateListName"
    Write-Host "SiteServer : $SiteServer`n"
}

# Build the WMI context and connection options for the SMS Provider.
$smsContext = New-Object System.Management.ManagementNamedValueCollection
$smsContext.Add("ApplicationName", $AppName)
$smsContext.Add("MachineName", $env:computername)
$smsContext.Add("LocaleID", 1033)
$connOptions = New-Object System.Management.ConnectionOptions
$connOptions.Context = $smsContext
$path = New-Object System.Management.ManagementPath
$path.NamespacePath = "\\$siteserver\" + $namespace
$scope = New-Object System.Management.ManagementScope($path, $connOptions)

# Try the current credentials first; prompt for credentials if that fails.
$ErrorActionPreference = "silentlycontinue"
$scope.Connect()
if (!$?) {
    $ErrorActionPreference = "continue"
    $cred = Get-Credential
    if (!$cred) {
        Write-Host "No credentials supplied." -foregroundcolor Red -backgroundcolor Black
        exit 1
    }
    # Property "SecurePassword" requires .NET Framework 2.0 SP1 or higher!
    $connOptions.Username = $cred.Username
    $connOptions.SecurePassword = $cred.Password
    $scope.Options = $connOptions
    $ErrorActionPreference = "silentlycontinue"
    $scope.Connect()
    if (!$?) {
        Write-Host "Could not connect to site server $siteserver." -foregroundcolor Red -backgroundcolor Black
        Write-Host $error[0] -foregroundcolor Red -backgroundcolor Black
        exit 1
    }
    elseif ($verbose) {
        Write-Host "Successfully connected to \\$siteserver\$namespace."
    }
}
$ErrorActionPreference = "continue"

# Find the provider for the local site so we know the site code.
$wqlquery = "SELECT * FROM SMS_ProviderLocation"
$query = New-Object System.Management.ObjectQuery($wqlquery)
$searcher = New-Object System.Management.ManagementObjectSearcher($scope, $query)
$providerLoc = $searcher.Get()
if (!$providerLoc) {
    Write-Host "Could not get instances from the SMS_ProviderLocation class." -foregroundcolor Red -backgroundcolor Black
    exit 1
}
foreach ($providerInst in $providerLoc) {
    if (!$providerInst.ProviderForLocalSite) {
        Write-Host "SMS Provider $($providerInst.SiteCode) not set as local site server." -foregroundcolor Red -backgroundcolor Black
    }
    else {
        $sitecode = $providerInst.SiteCode
    }
}

# Reconnect to the site-specific namespace.
$namespace = "root\sms\site_$sitecode"
$ErrorActionPreference = "silentlycontinue"
$scope.Path = "\\$siteserver\$namespace"
$scope.Connect()
if (!$?) {
    Write-Host "Could not connect to site server $siteserver." -foregroundcolor Red -backgroundcolor Black
    exit 1
}
elseif ($verbose) {
    Write-Host "Successfully connected to \\$siteserver\$namespace."
}
$ErrorActionPreference = "continue"

# Check if the specified reference client already exists within the ConfigMgr database
$wqlquery = 'SELECT ResourceID FROM SMS_R_System WHERE Name = ' + '"' + "$ReferenceClient" + '"' + ' AND Active = 1'
if ($verbose) {
    Write-Host "Verifying if the reference client $ReferenceClient actually exists in the ConfigMgr database."
    Write-Host "Running WQL query: $wqlquery."
}
$query = New-Object System.Management.ObjectQuery($wqlquery)
$searcher = New-Object System.Management.ManagementObjectSearcher($scope, $query)
$searcher.Get() | Foreach-Object { $rscID = $_.ResourceID }
if (!$rscID) {
    Write-Host "Could not find the reference client $ReferenceClient in the Configuration Manager database." -foregroundcolor Red -backgroundcolor Black
    exit 1
}
if ($verbose) {
    Write-Host "Found $ReferenceClient in the database with Resource ID $rscID."
}

# Check if the specified name for the update list is already in use
$wqlquery = "SELECT * FROM SMS_AuthorizationList WHERE LocalizedDisplayName = " + "'"
$wqlquery += $UpdateListName + "'"
if ($verbose) {
    Write-Host "Check if the specified name for the update list is already in use."
    Write-Host "Running WQL query: $wqlquery."
}
$query = New-Object System.Management.ObjectQuery($wqlquery)
$searcher = New-Object System.Management.ManagementObjectSearcher($scope, $query)
$searcher.Get() | Foreach-Object { $ListID = $_.CI_ID }
if ($ListID) {
    if (!$force) {
        $msg = "`nAn update list with the name $UpdateListName already exists. If you want the update list to be created, "
        $msg += "please specify the -force switch."
        Write-Host $msg -foregroundcolor Yellow -backgroundcolor Black
        exit 1
    }
}
elseif ($verbose) {
    Write-Host "An update list with the name $UpdateListName does not exist."
}

# Get the missing software updates reported for the reference client
$wqlquery = "SELECT css.CI_ID FROM SMS_UpdateComplianceStatus css "
$wqlquery += "JOIN SMS_SoftwareUpdate ui ON css.CI_ID = ui.CI_ID "
$wqlquery += "WHERE css.MachineID = $rscID AND css.Status = 2"
if ($verbose) {
    Write-Host "Getting required software updates for $ReferenceClient from the ConfigMgr database."
    Write-Host "Running WQL query: $wqlquery."
}

# CI_IDs for the update list; replace these values with the ones returned by the VBScript.
$swupdates = New-Object System.Collections.ArrayList
[void] $swupdates.Add(41200)
[void] $swupdates.Add(41200)
[void] $swupdates.Add(41202)
[void] $swupdates.Add(41227)
[void] $swupdates.Add(41230)
[void] $swupdates.Add(41252)
[void] $swupdates.Add(41277)
[void] $swupdates.Add(41282)

# Get the LocaleID of the site server installation
$wqlquery = 'SELECT * FROM SMS_Identification'
$query = New-Object System.Management.ObjectQuery($wqlquery)
$searcher = New-Object System.Management.ManagementObjectSearcher($scope, $query)
$searcher.Get() | Foreach-Object { $LocaleID = $_.LocaleID }
if (!$LocaleID) { $LocaleID = 1033 }
if ($verbose) {
    Write-Host "Using LocaleID $LocaleID."
}

# Create the localized display name for the new update list.
$options = New-Object System.Management.ObjectGetOptions
$options.Context = $smsContext
$path = New-Object System.Management.ManagementPath("\\$siteserver\$namespace" + ":SMS_CI_LocalizedProperties")
$smsCiLoc = (New-Object System.Management.ManagementClass($scope, $path, $options)).CreateInstance()

# Workaround a PowerShell V1 issue
[void] $smsCiLoc.psbase.Properties
$smsCiLoc.DisplayName = $UpdateListName
$smsCiLoc.LocaleID = 1033
[System.Management.ManagementObject[]] $newDescriptionInfo += $smsCiLoc

# Create the update list itself and attach the updates and display name.
$options = New-Object System.Management.ObjectGetOptions
$options.Context = $smsContext
$path = New-Object System.Management.ManagementPath("\\$siteserver\$namespace" + ":SMS_AuthorizationList")
$newUpdateList = (New-Object System.Management.ManagementClass($scope, $path, $options)).CreateInstance()
# Workaround a PowerShell V1 issue
[void] $newUpdateList.psbase.Properties
$newUpdateList.Updates = $swupdates
$newUpdateList.LocalizedInformation = $newDescriptionInfo
$putOptions = New-Object System.Management.PutOptions($smsContext)
$ErrorActionPreference = "silentlycontinue"
[void] $newUpdateList.Put($putOptions)
if (!$?) {
    Write-Host "Could not create update list $UpdateListName."
    Write-Host $error[0]
}
else {
    Write-Host "Successfully created update list $UpdateListName."
}

Run the PowerShell script on the Central Site Server. In a PowerShell console execute the following command:

.\reftest.ps1 -ReferenceClient <ReferenceClient> -UpdateListName <UpdateListName> -SiteServer <SiteServer> -force -verbose
Remember to update the variables between <> for your site. If you look at your SCCM console now under Update Lists you will see your new update list. Check to see if there are any updates you can delete from here (odd languages etc.) before creating your update deployments from this list.
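If the list of CI_IDs is long, the [void] lines for the PowerShell array can be generated from the text file the VBScript wrote. This is only a sketch: the function name is hypothetical, and it assumes the file holds one CI_ID per line plus the "Query:" header and dashed separator that the VBScript echoes.

```python
def ci_ids_to_powershell(lines):
    """Turn raw VBScript output into '[void] $swupdates.Add(n)' lines."""
    seen = set()
    result = []
    for line in lines:
        text = line.strip()
        if not text.isdigit():
            continue  # skips blanks, the "Query:" header and the dashed separator
        if text in seen:
            continue  # drop duplicate CI_IDs
        seen.add(text)
        result.append("[void] $swupdates.Add({})".format(text))
    return result


sample_output = [" ", "Query: Select * from SMS_SoftwareUpdate ...", "-----", "41200", "41200", "41202"]
for ps_line in ci_ids_to_powershell(sample_output):
    print(ps_line)
```

For a real run you would pass the lines of your redirected output file instead of the sample list.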

Good luck - DT

Wednesday, February 17, 2010

Stale Discovery Data - Argh!

Right - so you have a discovery with stale data. This could happen in a few ways, such as disabling a discovery for a certain group, wanting to 'clean' discovery data for an entire class, etc. The annoying thing is that once you have disabled the discovery, the previously discovered objects remain discovered.

Annoying, huh? Well, there is a solution. This behaviour is by design because there are two methods of submitting discovery data (Snapshot and Incremental). We are able to remove all stale data by performing the following steps:

1) Create an Enabled=False override for the class/group of your choice.
2) Open the Command Line and use the following:

This will remove all "disabled" discovery data.

...and you're done!

What Management Server do my Gateway servers report to?

If you have a rather large environment, troubleshooting where a problem lies can be quite tricky. It can originate at a Gateway Server, Management Server, etc.

SCOM doesn't provide an easy way to see where a Gateway Server reports to. In our environment, we have around 20 Gateway servers and trying to keep track of what reports where is a nightmare.

I knew there would be a way in Powershell, so I started playing around.

I eventually got this right. Here's the Powershell - it'll export the results to c:\ms.csv


$collection = @()
foreach ($gatewayServer in Get-GatewayManagementServer)
{
    # One row per gateway: its name and the management server it reports to
    $info = "" | Select GatewayServerName, ReportsToServer
    $info.GatewayServerName = $gatewayServer.Name
    $info.ReportsToServer = $gatewayServer.GetPrimaryManagementServer().DisplayName
    $collection += $info
}
$collection | Export-Csv c:\ms.csv

Monday, February 8, 2010

Creating Dashboards Using SQL Server Reporting Services 2008.

I have found that SSRS 2008 is great for creating dashboards so we can view what is happening in SCOM at all times. We have mounted four 23” monitors with dashboards that cycle through pages to show us what is going on in our SCOM environment. The dashboards run against the data warehouse and are pretty much real time.

To get this right I used SSRS 2008 embedded in an inline frame and then run IE in Kiosk mode. If you want to check this out below is an example of SSRS Dashboard that you can download.
The file contains everything mentioned in this blog.

Just change the Shared Data source to point to your OperationsManagerDW and publish it. You can do this by double clicking on DataSource1.rds under Shared Data Sources in the Solution Explorer.

If you can’t see the Report Data window on the left click on “View” and select “Report Data”. Then if you right click on DataSet1 or Num Alerts and select query you can view the SQL used.

The example dashboard I have provided displays a graph that shows the alert trends over the last 48 hours. It also shows the top 5 alerting Rules in the last hour. It also offers the ability to drill into things to get more detail. You can click on the graph to get detail about that point in time and click on the gauges to get more detail. I find it useful for letting me know what is currently happening in SCOM.

Once you have your reports working you are going to want to strip some SSRS stuff out, like the toolbar. You also want to ensure that the reports refresh properly and do not use a cached page. You are also going to want to strip out the scroll bars, which I do using an inline frame. The IFrame is also my way of being able to add JavaScript and custom HTML to SSRS pages.

To strip out the SSRS toolbar use the following in your URL: rc:Toolbar=false

To stop caching use: rs:ClearSession=true
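As an illustration of how these two switches fit into a full report URL, here is a small Python sketch. The function name is hypothetical; the URL layout follows the ReportViewer.aspx URLs used in AlertsDashboard.html, and the server and report path are placeholders.

```python
from urllib.parse import quote_plus


def build_dashboard_url(server, report_path):
    """Build a ReportViewer URL with the toolbar hidden and the session cache cleared."""
    encoded_path = quote_plus(report_path)  # "/" becomes %2F, spaces become "+"
    options = "rs:Command=Render&rc:Toolbar=false&rs:ClearSession=true"
    return "http://{}/ReportServer/Pages/ReportViewer.aspx?{}&{}".format(
        server, encoded_path, options
    )


print(build_dashboard_url("YourReportServer", "/SCOM Dashboard/Alerts Dashboard"))
```

Python emits the escape as %2F in upper case, whereas the HTML file uses %2f; percent-escapes are case-insensitive, so both forms point at the same report.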

I have provided an example HTML file, called AlertsDashboard.html, that will call the URLs in an inline frame, strip out the unwanted toolbars, refresh them properly and cycle through them. I also call a page that acts as a screen saver. You could publish the AlertsDashboard.html page to an IIS server or just run it from a share like I do.

You will need to modify the AlertsDashboard.html file so it points to your Report Server. Find the following code in the file and replace YourReportServer with your SSRS server name.

dashboards: [
{url: "http://YourReportServer/ReportServer/Pages/ReportViewer.aspx?%2fSCOM+Dashboard%2fAlerts+Dashboard&rs:Command=Render&rc:Toolbar=false&rs:ClearSession=true", time: 600},
{url: "file:///C:/SCOMSites/ScreenSaver.htm", time: 60},
{url: "http://YourReportServer/ReportServer/Pages/ReportViewer.aspx?%2fSCOM+Dashboard%2fAlerts+Dashboard&rs:Command=Render&rc:Toolbar=false&rs:ClearSession=true", time: 600}
]
The time specified after each URL is the time to wait, in seconds, before calling the next page.
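To make the cycling behaviour concrete, the sketch below works out which page would be on screen a given number of seconds after the dashboard starts. It is a Python illustration of the idea only, not the JavaScript from AlertsDashboard.html, and the short URLs are placeholders.

```python
def page_at(dashboards, elapsed_seconds):
    """Return the URL shown after elapsed_seconds, cycling through the list forever."""
    cycle = sum(page["time"] for page in dashboards)
    if cycle <= 0:
        raise ValueError("total display time must be positive")
    position = elapsed_seconds % cycle  # position within the current cycle
    for page in dashboards:
        if position < page["time"]:
            return page["url"]
        position -= page["time"]


dashboards = [
    {"url": "alerts", "time": 600},
    {"url": "screensaver", "time": 60},
    {"url": "alerts-again", "time": 600},
]
print(page_at(dashboards, 0))     # alerts
print(page_at(dashboards, 600))   # screensaver
print(page_at(dashboards, 660))   # alerts-again
print(page_at(dashboards, 1260))  # alerts
```

With the times from the example file, one full cycle takes 1260 seconds (21 minutes).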

Finally to load the dashboard, create a shortcut that points to the html file something like this:

"C:\Program Files\Internet Explorer\iexplore.exe" -k "C:\SCOMSites\AlertsDashboard.html"

The -k switch opens IE in Kiosk Mode. If you have multiple monitors and want one IE window on each, use “Shift” + “Windows Key” + left arrow to move the IE windows between monitors.

It’s as easy as that.

Automating "Proxying Enabled" for Cluster Nodes

If you perform a "mass installation" of OpsMgr, the task of turning agent proxying on is quite a mission. The way I see it is as follows: most people use clusters for SQL redundancy, meaning that it is pretty safe to assume that every object contained in the Microsoft.Windows.Cluster.Service class will need to have agent proxying enabled for this to actually work.

If we use this Powershell code, we can loop through all objects contained within the Cluster class and enable it in one step.


$rootMS = "RMS1"
$targetClass = "Microsoft.Windows.Cluster.Service"
Set-Location OperationsManagerMonitoring::
New-ManagementGroupConnection $rootMS
Set-Location $rootMS
# $matches is a PowerShell automatic variable, so the result set gets its own name
$clusterObjects = Get-MonitoringClass -name $targetClass | Get-MonitoringObject
foreach ($object in $clusterObjects) {
    [string]$currentAgent = $object.Path
    $agent = Get-Agent | Where-Object {$_.Name -eq $currentAgent}
    # -match compares against the string form of the setting ("True" / "False")
    if ($agent.proxyingEnabled -match 'False') {
        Write-Host "$currentAgent doesn't have proxying enabled yet, enabling now"
        $agent.proxyingEnabled = $true
        # Commit the change back to the management group
        $agent.ApplyChanges()
    }
    else {
        Write-Host "Proxying already enabled for $currentAgent, skipping..."
    }
}

Easy peasy.

Maintenance Mode - Clusters, Nodes, SQL Instances

One frustrating thing about OpsMgr's Maintenance Mode system is that if you put both nodes of a cluster into maintenance mode the "Virtual" system will continue to alert. This is a common oversight and the cause of many unnecessary cluster and/or SQL alerts.

The reason for this is that the virtual computer is the top parent which hosts the Cluster Service and SQL DB Engine. Although the nodes form part of this, they are not all-encompassing and will not suffice. Although standard logic dictates that if you're putting a node into maintenance mode you're probably working on the cluster, there is an extra step needed when dealing with this.

This powershell snippet demonstrates how to put a related "Virtual Computer" into maintenance mode. This will also result in the SQL Instance being in maintenance mode as the Virtual Computer is the host of it.


$class = get-monitoringclass -name:"Microsoft.Windows.Server.Computer"
$node = get-monitoringobject -monitoringclass:$class -criteria:"DisplayName = 'Node1'"
$agent = get-agent | ?{$_.DisplayName -eq $node.DisplayName}
$clusterMachines = $agent.GetRemotelyManagedComputers()
foreach ($cluster in $clusterMachines)
{
    # Look up the virtual computer object that this node manages
    get-monitoringobject -monitoringclass:$class -criteria:"DisplayName = '$($cluster.DisplayName)'" | Select DisplayName
    #Here we can carry on and put these associated objects into maintenance mode too
}

As you can see, we make use of $agent.GetRemotelyManagedComputers() to get a list of computers which the specific agent is managing. As you are well aware, we need to set Proxying = On if we're using clusters, so this is a sure-fire way of determining if anything "virtual" is running off it.

Maintenance Mode Reminder Emails

Anyone working in a fairly large organisation will be familiar with the frustrations experienced when people don't manage maintenance mode correctly. This results in machines staying in maintenance mode way longer than needed, or the machines coming out before the work is completed. This results in severe alert storms and really messes up your alert statistics.

I developed a clever little SQL script which will get all machines in maintenance mode and send a reminder when the machine has breached a "reminder" threshold.
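The reminder condition itself is simple arithmetic: the percentage of the maintenance window that has elapsed is compared to a threshold (75% in the SQL further down). A minimal Python sketch of that check follows. The function names are hypothetical, and it works on exact elapsed minutes, which is an approximation of T-SQL's DATEDIFF(minute, ...).

```python
import math
from datetime import datetime


def percent_elapsed(start, scheduled_end, now):
    """Percentage of the maintenance window elapsed, rounded half up."""
    total_minutes = (scheduled_end - start).total_seconds() / 60
    elapsed_minutes = (now - start).total_seconds() / 60
    return math.floor(elapsed_minutes / total_minutes * 100 + 0.5)


def needs_reminder(start, scheduled_end, now, threshold_percent=75):
    """True once more than threshold_percent of the window has passed."""
    return percent_elapsed(start, scheduled_end, now) > threshold_percent


start = datetime(2010, 2, 8, 10, 0)
end = datetime(2010, 2, 8, 14, 0)
print(needs_reminder(start, end, datetime(2010, 2, 8, 13, 0)))  # False (exactly 75%)
print(needs_reminder(start, end, datetime(2010, 2, 8, 13, 3)))  # True (about 76%)
```

Note that the comparison is strictly greater than, so a window that is exactly 75% elapsed does not trigger a mail yet.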

There are a few steps needed here and I'm not going to go through all of them. This script was customised for our environment but hey, it may help you in some way. You will however need the following:

1) A SQL Linked Server connection to Active Directory - this allows us to lookup the email address of the AD user who put the machine into maintenance mode. In our company we make use of a centralised server details page (with MM integration). If this page is used to put the machine into maintenance mode, another "service" account is used, but the username is stored in the comments section as follows: DOMAIN\USER: Comments

2) SQL Mail will need to be setup.

3) There is another portion to this - a Web Service which receives the request to extend the maintenance mode. You may choose to omit this portion but it's nice functionality to have. This is in C#.
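The user lookup in point 1 comes down to a bit of string handling: when the service account put the machine into maintenance mode, the real user sits in the comment as DOMAIN\USER: Comments; otherwise the maintenance mode user field is used. Below is a hedged Python sketch of that rule. The function name and sample account names are made up, and the default service account mirrors the @actionAccount value in the SQL.

```python
def extract_user(mm_user, comments, service_account="DOMAIN\\ServerDetailUser"):
    """Return the account name (without domain) of whoever started maintenance mode."""
    if mm_user.strip().lower().startswith(service_account.lower()):
        # Started from the server details page: the user is in "DOMAIN\USER: Comments"
        account = (comments or "").split(":", 1)[0]
    else:
        # Started directly: the user field already holds DOMAIN\USER
        account = mm_user
    # Drop the domain prefix, keeping only the part after the backslash
    return account.strip().split("\\", 1)[-1]


print(extract_user("DOMAIN\\ServerDetailUser", "DOMAIN\\jsmith: patching SQL"))  # jsmith
print(extract_user("DOMAIN\\adoe", "reboot"))                                    # adoe
```

The resulting account name is what gets matched against SAMAccountName in the Active Directory query.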

SQL Code:
DECLARE @now datetime;
DECLARE @LocalGMTOffset int;
DECLARE @MMWindowAlertThresholdPerc int;
DECLARE @startPos int;
DECLARE @endPos int;
DECLARE @userName varchar(200);
DECLARE @sql nvarchar(4000);
DECLARE @AD varchar(200);
DECLARE @WebServer varchar(200);
DECLARE @actionAccount varchar(200);

SET @LocalGMTOffset = +2
SET @MMWindowAlertThresholdPerc = 75;
SET @NOW = dateadd(hour,(@LocalGMTOffset * -1),getdate());
SET @AD = 'LDAP://AD';
SET @WebServer = 'WebServer';
SET @actionAccount = 'DOMAIN\ServerDetailUser';

-- Remove this when live - need it for testing because temp mails are only cleared on disconnection
--drop table #tempMailsToSend

--- Get all records needing to be mailed
INSERT [monitor].[dbo].[tb_MMWindows] SELECT bme.BaseManagedEntityId, mm.StartTime, mm.ScheduledEndTime, 0
FROM maintenancemode mm
INNER JOIN BaseManagedEntity bme ON (mm.BaseManagedEntityId = bme.BaseManagedEntityId)
LEFT JOIN [monitor].[dbo].[tb_MMWindows] MMW on MMW.BaseManagedEntityId = mm.BaseManagedEntityId
WHERE mm.EndTime IS NULL AND mm.IsInMaintenanceMode = 1 AND mmw.ack IS NULL AND bme.isDeleted = 0 AND
((round(((convert(float, DATEDIFF(minute,mm.StartTime,@NOW)) / convert(float,DATEDIFF(minute,mm.StartTime,mm.ScheduledEndTime)))*100),0)) > @MMWindowAlertThresholdPerc)
ORDER BY mm.ScheduledEndTime DESC

-- Filter only on the Top Level IDs
SELECT identity(int,1,1) as id, a.* INTO #tempMailsToSend FROM (SELECT count(*) as childitems, mm.StartTime, mm.ScheduledEndTime, bme2.DisplayName, bme2.BaseManagedEntityId,
CASE WHEN([MM].[User] LIKE @actionAccount + '%') THEN
ELSE SUBSTRING(LTRIM([MM].[User]),CHARINDEX('\',[MM].[User])+1,LEN([MM].[User])) END as UserName, mm.Comments
FROM [monitor].[dbo].[tb_MMWindows] MMW
INNER JOIN MaintenanceMode MM ON (MM.BaseManagedEntityId = MMW.BaseManagedEntityId)
INNER JOIN BaseManagedEntity bme ON (mm.BaseManagedEntityId = bme.BaseManagedEntityId)
INNER JOIN BaseManagedEntity bme2 ON (bme.ToplevelHostEntityId = bme2.BaseManagedEntityId)
WHERE MMW.ack = 0 AND bme2.DisplayName != 'Microsoft.SystemCenter.AgentWatchersGroup'
GROUP BY bme2.TopLevelHostEntityId, MM.Comments, [MM].[User], bme2.DisplayName, mm.StartTime, mm.ScheduledEndTime, bme2.BaseManagedEntityId) a

--SELECT * FROM #tempMailsToSend

UPDATE [monitor].[dbo].[tb_MMWindows] SET ack=1

SELECT @startPos = 1, @endPos = count(*) FROM #tempMailsToSend

WHILE @startPos <= @endPos

DECLARE @bodyVal nvarchar(4000);
DECLARE @subjectVal varchar(255);
DECLARE @mailBody nvarchar(500);
DECLARE @DisplayName varchar(255);
DECLARE @StartTime datetime;
DECLARE @ScheduledEndTime datetime;
DECLARE @Comments nvarchar(1000);
DECLARE @ChildItems int;
DECLARE @BmeId varchar(100);
DECLARE @mail varchar(200);

SET @sql = 'SELECT @mail = mail FROM OPENQUERY( ADSI, ''SELECT mail FROM '''''+ @AD + '''''
WHERE objectCategory = ''''Person'''' AND objectClass = ''''user'''' AND SAMAccountName = '''''+(SELECT UserName FROM #tempMailsToSend WHERE id=@startPos)+''''''')'

EXEC sp_executesql
@stmt = @sql,
@params = N'@mail varchar(200) OUTPUT',
@mail = @mail OUTPUT

SELECT @BmeId = BaseManagedEntityId, @DisplayName = DisplayName,@StartTime = StartTime, @ScheduledEndTime = ScheduledEndTime, @Comments = Comments FROM #tempMailsToSend WHERE id=@startPos

IF @Comments IS NULL
SET @Comments = ''

SET @subjectVal = 'MM Reminder: ' + @DisplayName
SET @bodyVal = '
<TITLE>Maintenance Mode Reminder</TITLE>
body {
font-family: Arial, Helvetica, sans-serif;
font-size: 14px;

td {
font-family: Arial, Helvetica, sans-serif;
font-size: 12px;
h1 {
margin:0px 0px 5px 0px;
font-size: 16px;

table td {
text-align: center;
background-color: #fff;
font-size: 12px;

table th{
text-align: center;
background-color: #000;
color: #fff;
font-size: 12px;
.button {
display: inline-block;
width: 80px !important;
background-color: #ECECEC;
padding: 2px;
text-align: center;
color: #000;
text-decoration: none;
border: 1px solid #000;
.button:hover {
background-color: #D6D5C3;

<h1>SCOM Automated Maintenance Mode Reminder</h1>
Hi,'+ '<br />
<br />
Please note that the machine listed below is approaching the end of its maintenance window. It is currently passed the set threshold of <strong>' + CAST(@MMWindowAlertThresholdPerc as varchar(2)) + '% total time elapsed.'+ '</strong><br /><br />
<th>DisplayName</th><th>Start Time</th><th>Scheduled End Time</th><th>Comments</th></tr>
<tr><td>'+@DisplayName + '</td><td>' +CAST(dateadd(hour,@LocalGMTOffset,@StartTime) as nvarchar(50)) + '</td><td>' +CAST(dateadd(hour,@LocalGMTOffset,@ScheduledEndTime) as nvarchar(50))+ '</td><td>' +@Comments + '</td></tr>
<br /><br /><br />
Extend by:<br /><br />
<a href="http://' + @WebServer + '/mm/Default.aspx?bmid='+ @BmeId + '&extend=1&DisplayName='+@DisplayName+'" class="button">1 hour</a>
<a href="http://' + @WebServer + '/mm/Default.aspx?bmid='+ @BmeId + '&extend=2&DisplayName='+@DisplayName+'" class="button">2 hours</a>
<a href="http://' + @WebServer + '/mm/Default.aspx?bmid='+ @BmeId + '&extend=4&DisplayName='+@DisplayName+'" class="button">4 hours</a>
<a href="http://' + @WebServer + '/mm/Default.aspx?bmid='+ @BmeId + '&extend=8&DisplayName='+@DisplayName+'" class="button">8 hours</a><br />
<br /><br />
Other options:<br /><br />
<a href="http://' + @WebServer + '/mm/Default.aspx?bmid='+ @BmeId + '&extend=0&DisplayName='+@DisplayName+'" class="button">Take out of maintenance mode</a><br />


EXEC msdb.dbo.sp_send_dbmail @recipients=@mail,
@subject = @subjectVal,
@body = @bodyVal,
@body_format = 'HTML';

SET @startPos = @startPos + 1

--MUST REMOVE THIS WHEN LIVE - this is left here for testing purposes so it'll always send
--delete from [monitor].[dbo].[tb_MMWindows]
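
The filter at the top of this query only picks up maintenance windows that have used up more than a set percentage of their total time. Here is that arithmetic as a small sketch, written in Python purely so it can be run on its own. The function names are mine and not part of the solution. DATEDIFF(minute, ...) counts minute boundaries rather than exact elapsed time, so the SQL can differ by a minute at the edges.

```python
from datetime import datetime


def percent_elapsed(start: datetime, scheduled_end: datetime, now: datetime) -> int:
    """Share of the maintenance window already used, as a whole percentage."""
    total_minutes = (scheduled_end - start).total_seconds() / 60
    if total_minutes <= 0:
        raise ValueError("scheduled_end must be later than start")
    elapsed_minutes = (now - start).total_seconds() / 60
    return round(elapsed_minutes / total_minutes * 100)


def needs_reminder(start: datetime, scheduled_end: datetime, now: datetime,
                   threshold_percent: int) -> bool:
    """Mirror the strict '>' comparison against the threshold used in the query."""
    return percent_elapsed(start, scheduled_end, now) > threshold_percent


if __name__ == "__main__":
    start = datetime(2010, 12, 15, 12, 0)
    end = datetime(2010, 12, 15, 14, 0)
    now = datetime(2010, 12, 15, 13, 30)
    print(percent_elapsed(start, end, now), needs_reminder(start, end, now, 70))
```

Because the comparison is strict, a window that sits exactly on the threshold does not trigger a reminder yet.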


C# Web Service Code:

using System;
using System.Configuration;
using System.Data;
using System.Linq;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Xml.Linq;
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Configuration;
using Microsoft.EnterpriseManagement.Monitoring;
using System.Collections.ObjectModel;
using System.Collections.Generic;

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        int gmtOffset = 2;
        string bmid = Request.QueryString["bmid"];
        int extendHours = System.Convert.ToInt32(Request.QueryString["extend"]);
        // HTML-encode the display name because it is echoed back into the page
        string pcDisplayName = Server.HtmlEncode(Request.QueryString["displayname"]);
        if (bmid != null && pcDisplayName != null)
        {
            ManagementGroup mg = new ManagementGroup("RMS1");

            string mcCriteria = "Name = 'Microsoft.Windows.Computer'";
            string query = "Id = '" + bmid + "'";
            MonitoringClassCriteria criteria = new MonitoringClassCriteria(mcCriteria);
            MonitoringClass monClass = mg.GetMonitoringClasses(criteria)[0];
            MonitoringObjectCriteria objCriteria = new MonitoringObjectCriteria(query, monClass);
            List<MonitoringObject> monObjects = new List<MonitoringObject>(mg.GetMonitoringObjects(objCriteria));
            if (monObjects.Count == 0)
            {
                litOutput.Text = "Could not find an object with that display name. System Center has been notified.";
            }
            foreach (MonitoringObject monObject in monObjects)
            {
                if (extendHours == 0)
                {
                    try
                    {
                        DateTime scheduledEndTime = DateTime.UtcNow;
                        monObject.StopMaintenanceMode(scheduledEndTime, Microsoft.EnterpriseManagement.Common.TraversalDepth.Recursive);
                        litOutput.Text = pcDisplayName + " has successfully been taken out of maintenance mode.";
                    }
                    catch (Exception Ex)
                    {
                        litOutput.Text = "Encountered an error stopping maintenance mode. System Center has been notified.<br /><br />" + Ex.Message;
                    }
                }
                else if (!monObject.InMaintenanceMode)
                {
                    try
                    {
                        DateTime startTime = DateTime.UtcNow;
                        DateTime scheduledEndTime = DateTime.UtcNow.AddHours(extendHours);
                        string comments = extendHours + " hour maintenance mode window requested";
                        monObject.ScheduleMaintenanceMode(startTime, scheduledEndTime, 0, comments, Microsoft.EnterpriseManagement.Common.TraversalDepth.Recursive);
                        litOutput.Text = pcDisplayName + " has already been taken out of maintenance mode. Starting a new maintenance mode window for " + extendHours + " hours." +
                            "<br />Scheduled end time is: " + scheduledEndTime.AddHours(gmtOffset).ToString();
                    }
                    catch (Exception Ex)
                    {
                        litOutput.Text = "Encountered an error placing machine into maintenance mode. System Center has been notified.<br /><br />" + Ex.Message;
                    }
                }
                else
                {
                    try
                    {
                        MaintenanceWindow myWindow = monObject.GetMaintenanceWindow();
                        DateTime scheduledEndTime = myWindow.ScheduledEndTime.ToUniversalTime().AddHours((extendHours + gmtOffset));
                        string updatedComments = myWindow.Comments + " || " + extendHours + " hour extension requested";
                        monObject.UpdateMaintenanceMode(scheduledEndTime, 0, updatedComments, Microsoft.EnterpriseManagement.Common.TraversalDepth.Recursive);
                        litOutput.Text = pcDisplayName + " has had its maintenance mode extended by the requested " + extendHours + " hours." +
                            "<br />New scheduled end time is: " + scheduledEndTime.AddHours(gmtOffset).ToString();
                    }
                    catch (Exception Ex)
                    {
                        litOutput.Text = "Encountered an error extending maintenance mode. System Center has been notified.<br /><br />" + Ex.Message;
                    }
                }
            }
        }
    }
}


And that's it... this works nicely once it's set up. It may take you a while to get all the components right - if you need help give me a shout.
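
For anyone adapting the web page, its core is a three-way decision based on the extend value from the link and on whether the machine is still in maintenance mode. The sketch below is my reading of that flow, written in Python only so it can be run without the SDK. The function name and the returned labels are hypothetical.

```python
def choose_action(extend_hours: int, in_maintenance_mode: bool) -> str:
    """Pick what the reminder link should do for one monitoring object."""
    if extend_hours == 0:
        # extend=0 is the "Take out of maintenance mode" button
        return "stop"
    if not in_maintenance_mode:
        # The window already ended, so a fresh one has to be scheduled
        return "start"
    # Still in maintenance mode: push the scheduled end time out
    return "extend"


if __name__ == "__main__":
    for hours, active in [(0, True), (2, False), (4, True)]:
        print(hours, active, choose_action(hours, active))
```

One thing worth knowing about the C# version: Convert.ToInt32 returns 0 for a missing value, so a link without an extend parameter behaves like the stop button.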

Changing which Gateway server an agent reports to - eh?!

Picture the scenario: you've got 500 machines, 2 gateway servers. You've got half of them reporting to Gateway 1, the other half reporting to Gateway 2. You've done a bit of research and even enabled failover with a bit of nifty Powershell.

Then one day Gateway 1 dies. All agents fail over and start reporting to Gateway 2. Yep, that's all good, but you need a new Gateway server ASAP as you now have no redundancy.

We quickly find a new server -- Gateway 3 and are now left with the task of reassigning all the agents currently reporting to Gateway 1 onto Gateway 3, whilst keeping Gateway 2 as the failover.

Never fear, there is a solution.

There are two parts to this: server and client side. I will discuss each individually.


Client side:

OpsMgr stores its connection details in two registry values: NetworkName and AuthenticationName. These need to be changed to the new gateway. This won't work, however, unless the server portion has been carried out. An optional step here is deleting the Health Service State. In reality we only need to delete the Connector Configuration Cache, but you will be absolutely amazed at how many problems can be solved with a simple deletion of Health Service State data. This is a step you can choose to take: as far as I'm aware, the Connector Configuration will be updated if the registry values have changed. If you want to delete the State data too, uncomment the lines in the PS script.

Server side:

Here we need to tell the RMS (and DB) that the agent has changed its PrimaryManagementServer (Gateway). This cannot be done with the GUI, so Powershell will be used.


Assumptions: This solution assumes that the CLIENT portion will be executed on a machine with Powershell installed, and with remote registry and WMI (RPC/DCOM) access to each machine needing to be changed. x86/x64 isn't a problem as the script detects the install directory. What the script does is connect to each remote machine to change the registry values, optionally delete the State data, and restart the service. If you don't have this access it WON'T work! It also assumes you're running as an admin and have admin rights on each machine.


Client Portion:

The first thing we need is a list of all machines that report to Gateway1. We'll export this to a one-column CSV, with NetworkName as the column name, because the client script uses it to address each machine. Passing -NoTypeInformation to Export-Csv stops Powershell writing its "#TYPE" header line at the top of the file, so there is nothing to clean up by hand.

[This will obviously be run through an OpsMgr shell on a server -- not a client]:

$oldGatewayName = "Gateway1"
Get-Agent | ?{ $_.PrimaryManagementServerName -eq $oldGatewayName } | Select NetworkName | Export-Csv c:\agents.csv -NoTypeInformation

[Now, the client script - ensure you've copied the CSV from the script above to the C:\]:

$MachineList = Import-Csv c:\agents.csv
$NewGatewayName = "Gateway3"

Foreach($MachineName in $MachineList) {

Write-Host "Attempting " $MachineName.NetworkName
#Change Registry Keys
$reg = [Microsoft.Win32.RegistryKey]::OpenRemoteBaseKey('LocalMachine', $MachineName.NetworkName)
$regKey= $reg.OpenSubKey("SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Agent Management Groups\DERSCOM\Parent Health


#Get Install Path for Health Service State folder
$path = $reg.OpenSubKey("SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup")
$HealthServiceState = ($path.GetValue("InstallDirectory")).Replace("\","\\") + "Health Service State"

#Stop HealthService
$service = Get-WmiObject -query "SELECT * FROM Win32_Service WHERE Name = 'HealthService'" -computer $MachineName.NetworkName


#Delete Health Service State Folder
#$directory = Get-WmiObject -query "SELECT * FROM Win32_Directory WHERE Name='$HealthServiceState'" -computer

#foreach($item in $directory) {

#Start Service


Server Portion:

#Unfortunately there is no way to pass Criteria Syntax to get-agent. You may extend the SDK directly if you so desire.

$oldGatewayName = "Gateway1"
$gateway2 = "Gateway2"
$gateway3 = "Gateway3"

$PrimaryMS = Get-ManagementServer | Where { $_.DisplayName -match $gateway3 }
$FailoverMS = Get-ManagementServer | Where { $_.DisplayName -match $gateway2 }

$agents = Get-Agent | ?{ $_.PrimaryManagementServerName -eq $oldGatewayName }

foreach($agent in $agents)
{
    Write-Host "Setting MS/Failover for:" $agent.DisplayName
    Set-ManagementServer -AgentManagedComputer: $agent -PrimaryManagementServer: $PrimaryMS -FailoverServer: $FailoverMS
}

And that's that! Once you've run those parts your agents will take a few minutes and start talking to the new Gateway server. Simple as pie.

Powershell Script: Bulk Maintenance Mode

If you've ever needed to put a whole lot of machines into maintenance mode you'll know how tedious and time consuming this can be if you go through the GUI. There's no "Ctrl + click" functionality, meaning you'll need to go to each machine and put it into maintenance mode individually.

There are obviously instances where certain machines are related by business process rather than an actual identifiable link.

My solution to this was to do the following: Allow for an import of a CSV file with a list of hostnames (just hostname, not FQDN) - loop through these and put each one into maintenance mode.

So simple. Here's the PS script (check comments for usage):

#Right, this is pretty simple. Use as follows:
#To put a BULK list of computers into maintenance mode, you will need a CSV Formatted with ONE column - that being the hostname of the machine.
#!!!!! VERY NB !!!!!!!!!!! VERY NB !!!!!!
#1) Make sure the first column is titled HostName otherwise this won't work!
#!!!!! VERY NB !!!!!!!!!!! VERY NB !!!!!!
#2) Change the $rootMS to the correct RMS.
#3) Usage is as follows:
# Start Maintenance Mode: ./mm.ps1 START PathToCSVFile "Maintenance Mode Reason (be sure to encase in quotation marks like here)" DurationInHours
# Stop Maintenance Mode: ./mm.ps1 STOP PathToCSVFile
# Note: If you don't have START or STOP as your first parameter the script will not continue.
#4) This script implements strict error handling. A summary of each operation will be displayed after the operation has completed.
#5) If any errors are encountered, these are trapped and will be saved in CSV format in C:\errorMaintenanceMode.csv

param($action, $pathToCSV, $maintenanceModeReason, $durationHours)

$rootMS = "RMS"

#-ne is case-insensitive, so start/Start/START are all accepted
if(($action -ne "START") -and ($action -ne "STOP"))
{
    Write-Host "Can't continue without a START or STOP action. Please read the comments in this .ps1 file for instructions."
    return
}

$ErrorActionPreference = "Continue"

Set-Location "OperationsManagerMonitoring::" -ErrorVariable errSnapin;
New-ManagementGroupConnection -ConnectionString:$rootMS -ErrorVariable errSnapin;
Set-Location $rootMS -ErrorVariable errSnapin;

$computers = Import-Csv $pathToCSV
$resultsetCollection = @();
$errorFound = $false

$currObjMaintDesc = "Bulk Maintenance Mode Update. Reason Given: " + $maintenanceModeReason
$startTime = [System.DateTime]::Now
$endTime = $startTime.AddHours($durationHours)

#The class lookup is the same for every machine, so do it once
$computerClass = Get-MonitoringClass -name:Microsoft.Windows.Computer

foreach($currentObj in $computers)
{
    $currentObjName = $currentObj.HostName
    $computerCriteria = "DisplayName matches '(?i:" + $currentObjName + ")\.'"
    $computer = Get-MonitoringObject -monitoringclass:$computerClass -criteria:$computerCriteria

    if($action.ToUpper() -eq 'START')
    {
        "Starting Maintenance Mode on: " + $currentObjName
        New-MaintenanceWindow -startTime:$startTime -endTime:$endTime -monitoringObject:$computer -comment:$currObjMaintDesc
        #Capture the result straight away, before another statement overwrites $?
        $succeeded = $?
    }
    elseif($action.ToUpper() -eq 'STOP')
    {
        "Stopping Maintenance Mode on: " + $currentObjName
        Set-MaintenanceWindow -monitoringObject:$computer -endTime:$startTime
        $succeeded = $?
    }

    $resultObj = "" | Select HostName, Status, ErrorMessage
    $resultObj.HostName = $currentObjName
    if(-not $succeeded)
    {
        $errorFound = $true
        $resultObj.Status = "Failed"
        $resultObj.ErrorMessage = $error[0]
    }
    else
    {
        $resultObj.Status = "Successful"
        $resultObj.ErrorMessage = "No errors reported during operation"
    }
    $resultsetCollection += $resultObj
}

if($errorFound)
{
    Write-Host "Errors were encountered whilst trying to update computers. Please see the file c:\errorMaintenanceMode.csv for Error Info."
    $resultsetCollection | Export-Csv "c:\errorMaintenanceMode.csv"
}
else
{
    Write-Host "Completed entire Bulk Maintenance Mode Operation without any errors."
}
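
A note on the criteria string, since it is the part most likely to surprise you. It turns a bare hostname into a case-insensitive regular expression followed by a literal dot, which is why the CSV must hold hostnames and not FQDNs. The sketch below reproduces the pattern with Python's re module, which supports the same scoped (?i:...) syntax, to show what it does and does not match. Whether OpsMgr anchors a "matches" expression at the start of the value is something I'm not assuming either way here. The helper names and the contoso.com hosts are made up for illustration.

```python
import re


def criteria_pattern(hostname: str) -> str:
    """Build the same expression the script embeds in its criteria string."""
    return "(?i:" + hostname + r")\."


def anchored_pattern(hostname: str) -> str:
    """Stricter variant: escape the name and pin it to the start of the value."""
    return "^(?i:" + re.escape(hostname) + r")\."


def is_match(pattern: str, display_name: str) -> bool:
    # re.search looks anywhere in the string, i.e. an unanchored match
    return re.search(pattern, display_name) is not None


if __name__ == "__main__":
    for name in ["WEB01.contoso.com", "web01", "devweb01.contoso.com"]:
        print(name,
              is_match(criteria_pattern("web01"), name),
              is_match(anchored_pattern("web01"), name))
```

If the expression turns out to be unanchored in your environment, a hostname that is the tail end of another hostname would match both machines, so the anchored form is the safer one to try.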

Retrieve a list of monitors in a critical state

One thing the OpsMgr team 'forgot' to do was allow us to view all Monitors in a Critical state. This is obviously something a lot of people would need as it's nice to have an overview of what isn't in a healthy state without trawling through Health Explorer Windows.

Unfortunately there is no elegant way to do this with the SDK and/or Powershell, as this needs to be addressed on a "per monitoringobject" basis, meaning we have to loop through classes and objects and that will take a crazy amount of time.

Once again, SQL to the rescue:

Select [TimeGenerated],a.BaseManagedEntityId, a.DisplayName, a.TopLevelHostEntityId, e.DisplayName as ParentDisplayname, e.FullName as ParentFullName,
a.DisplayName[Problem],d.DisplayName[Problem Description],d.Description[Detailed Description]
from BaseManagedEntity a
INNER JOIN ManagedEntityAvailabilityView b on a.BaseManagedEntityId =b.BaseManagedEntityId
INNER JOIN StateView c on b.BaseManagedEntityId =c.BaseManagedEntityId
INNER JOIN MonitorView d on c.MonitorID= d.ID
INNER JOIN BaseManagedEntity e on e.BaseManagedEntityId = a.TopLevelHostEntityId
INNER JOIN ManagedType f on a.BaseManagedTypeId=f.ManagedTypeId
INNER JOIN
(Select BaseManagedEntityID,NewHealthState,max(TimeGenerated)[TimeGenerated]
From StateChangeEventView
Group By BaseManagedEntityID,NewHealthState)SC
On b.BaseManagedEntityID = SC.BaseManagedEntityId

Where HealthState = 3
and AlertPriority = 1
and IsUnitMonitor = 1
and SC.NewHealthState = 3
--and e.DisplayName = 'COMPUTERNAME'
Order By a.[Path]
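
The SC sub-select is the piece that gives us a timestamp: for every entity and health state it keeps only the most recent state change, and the outer query then keeps the rows where that state is 3 (critical). Here is the same grouping as a small Python sketch, just to make the logic concrete. The function names and sample rows are made up.

```python
from datetime import datetime


def latest_state_changes(events):
    """Latest time per (entity, state); events are (entity_id, new_state, time) tuples."""
    latest = {}
    for entity_id, new_health_state, time_generated in events:
        key = (entity_id, new_health_state)
        if key not in latest or time_generated > latest[key]:
            latest[key] = time_generated
    return latest


def latest_critical(events, critical_state=3):
    """Keep only the newest change into the critical state for each entity."""
    return {
        entity_id: when
        for (entity_id, state), when in latest_state_changes(events).items()
        if state == critical_state
    }


if __name__ == "__main__":
    sample = [
        ("A", 3, datetime(2010, 12, 1, 8, 0)),
        ("A", 3, datetime(2010, 12, 2, 9, 0)),
        ("B", 1, datetime(2010, 12, 1, 7, 0)),
    ]
    print(latest_critical(sample))
```

On its own this would also report an entity that has since recovered, which is why the query additionally filters on the current HealthState.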

Basic OpsMgr concepts - Classes and Objects

One abstract OpsMgr concept people generally battle with is the difference between an object and a class.

In this post I'll use the following two as examples:

Class: Microsoft.Windows.Computer

Object: Computer01

To see an example of this, navigate to the "Discovered Inventory" pane in OpsMgr. If you change the scope to "Windows Computer", you will (in my hypothetical example) see Computer01 listed. This means that the OBJECT Computer01 is of CLASS type Windows Computer.

To demonstrate this, let's look at two Powershell cmdlets: get-monitoringclass and get-monitoringobject:

$class = get-monitoringclass -name "Microsoft.Windows.Computer"

This assigns the object for the class Microsoft.Windows.Computer to the variable $class.

If we now wish to get a list of all monitoring objects contained within that class, we need to use get-monitoringobject:

$class | get-monitoringobject

As you can see, this supports pipeline usage - we take our $class variable and pipe it to get-monitoringobject. This then iterates through the class returning all its objects. We can then further format the data by using Select-Object to select the fields we're wanting to view. Additionally, if we're just wanting a total, we can pipe it to "measure-object" to get an object count.

What we've done here is mimicked the "Discovered Inventory" view in the GUI. This is very helpful if we're needing certain objects for something.
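
If the class/object split still feels abstract, it may help to see it outside OpsMgr. The sketch below is a plain Python analogy, not the OpsMgr SDK: every discovered object carries the name of its class, and "getting the monitoring objects for a class" is just a filter over the inventory. Computer02 and the Contoso class are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringObject:
    display_name: str   # e.g. Computer01
    class_name: str     # e.g. Microsoft.Windows.Computer


def get_monitoring_objects(inventory, class_name):
    """Return every object in the inventory that is an instance of class_name."""
    return [obj for obj in inventory if obj.class_name == class_name]


inventory = [
    MonitoringObject("Computer01", "Microsoft.Windows.Computer"),
    MonitoringObject("Computer02", "Microsoft.Windows.Computer"),
    MonitoringObject("OrderService", "Contoso.Custom.Application"),
]

computers = get_monitoring_objects(inventory, "Microsoft.Windows.Computer")
names = [obj.display_name for obj in computers]  # roughly what Select-Object gives you
total = len(computers)                           # roughly what Measure-Object gives you

if __name__ == "__main__":
    print(names, total)
```

The class is the definition and the objects are the things of that type which were actually discovered.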

If you require assistance or would like a particular topic covered, be sure to leave a comment on a post.

Next up: Using "criteria" to search instead of Where-Object

Stuff to help you with these products, yo

Right, so here's the deal:

There are a few of us working for a rather large, rather international company - we make sure a whole lot of servers are working most of the time.

I'm sure if any of you have ever worked with OpsMgr and/or SCCM, you will know that these products are quite large and the documentation/help isn't all that great.

The purpose of this blog is two-fold: it's going to help us categorise all the things we've done and also serve to contribute to the System Center community.

Our team is as follows:

DT - SCCM - he's been using this product for years. I know stuff all about it so any post regarding this topic will definitely be from him.

AB - SQL - despite what you may think, there are some things you cannot do with the SDK. If the data is there, Andre will find a way to use it.

RN - MP Authoring - need help understanding relationships? Hosted entities? Abstract classes? Eh...

CM - Infrastructure - installing Gateways, Management Servers, Certificate Authentication, etc.

KC - SDK, Powershell, Custom OpsMgr applications, MP Authoring, general blog writer.


So here's hoping this won't turn into yet another "one post" blog.