Wednesday, February 17, 2010

Stale Discovery Data - Argh!

Right - so you have a discovery with stale data. This could happen in a few ways, such as disabling a discovery for a certain group, wanting to 'clean' discovery data for an entire class, etc. The annoying thing is that once you have disabled the discovery, the previously discovered objects remain discovered.

Annoying, huh? Well, there is a solution. This behaviour is by design, because there are two methods of submitting discovery data (Snapshot and Incremental). We can remove all the stale data by performing the following steps:

1) Create an Enabled=False override on the discovery for the class/group of your choice.
2) Open the Operations Manager Command Shell and run the following:
Remove-DisabledMonitoringObject


This will remove all "disabled" discovery data.

...and you're done!

What Management Server do my Gateway servers report to?

If you have a rather large environment, troubleshooting where a problem lies can be quite tricky. It can originate at a Gateway Server, Management Server, etc.

SCOM doesn't provide an easy way to see where a Gateway Server reports to. In our environment, we have around 20 Gateway servers and trying to keep track of what reports where is a nightmare.

I knew there would be a way in Powershell, so I started playing around.

I eventually got this right. Here's the Powershell - it'll export the results to c:\ms.csv

------

# Build one row per gateway: its name and the management server it reports to
$collection = @()
foreach ($gatewayServer in Get-GatewayManagementServer)
{
    $info = "" | Select GatewayServerName, ReportsToServer
    $info.GatewayServerName = $gatewayServer.Name
    $info.ReportsToServer = $gatewayServer.GetPrimaryManagementServer().DisplayName
    $collection += $info
}
$collection | Export-Csv c:\ms.csv

Monday, February 8, 2010

Creating Dashboards Using SQL Server Reporting Services 2008.

I have found that SSRS 2008 is great for creating dashboards so we can view what is happening in SCOM at all times. We have mounted 4 23” monitors with dashboards that can cycle pages to help us know what is going on in our SCOM environment at all times. The dashboards run against the warehouse and are pretty much real time.

To get this right, I embedded the SSRS 2008 reports in an inline frame and ran IE in Kiosk mode. If you want to check this out, below is an example SSRS dashboard that you can download.
The file contains everything mentioned in this blog.

http://sites.google.com/site/scomstuff/files/AlertsDashboard.rar

Just change the Shared Data source to point to your OperationsManagerDW and publish it. You can do this by double clicking on DataSource1.rds under Shared Data Sources in the Solution Explorer.

If you can’t see the Report Data window on the left, click on “View” and select “Report Data”. Then, if you right-click on DataSet1 or Num Alerts and select Query, you can view the SQL used.

The example dashboard I have provided displays a graph that shows the alert trends over the last 48 hours. It also shows the top 5 alerting Rules in the last hour. It also offers the ability to drill into things to get more detail. You can click on the graph to get detail about that point in time and click on the gauges to get more detail. I find it useful for letting me know what is currently happening in SCOM.

Once you have your reports working, you are going to want to strip some SSRS stuff out, like the toolbar. You also want to ensure that the reports refresh properly and do not use a cached page. You are also going to want to strip out the scroll bars, which I do using an inline frame. The IFrame is also my way of adding JavaScript and custom HTML to SSRS pages.

To strip out the SSRS toolbar use the following in your URL: rc:Toolbar=false

To stop caching use: rs:ClearSession=true
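
As a rough sketch of how the two switches fit into a full report URL, the helper below builds one. The function name Get-DashboardUrl and its parameters are made up for illustration; the URL layout is copied from the AlertsDashboard.html example later in this post, and the encoding only handles slashes and spaces.

```powershell
# Hypothetical helper (not part of the download): builds a ReportViewer URL
# that hides the toolbar and bypasses the cached session.
function Get-DashboardUrl {
    param(
        [string]$ReportServer,   # placeholder, e.g. "YourReportServer"
        [string]$ReportPath      # e.g. "/SCOM Dashboard/Alerts Dashboard"
    )
    # Minimal encoding only: "/" becomes %2f and spaces become "+"
    $encodedPath = $ReportPath.Replace("/", "%2f").Replace(" ", "+")
    "http://$ReportServer/ReportServer/Pages/ReportViewer.aspx?$encodedPath" +
        "&rs:Command=Render&rc:Toolbar=false&rs:ClearSession=true"
}

$url = Get-DashboardUrl -ReportServer "YourReportServer" -ReportPath "/SCOM Dashboard/Alerts Dashboard"
$url
```

Building the URL in one place makes it harder to forget one of the switches when you add more dashboards to the rotation.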

I have provided an example HTML file, called AlertsDashboard.html, that will call the URLs in an inline frame, strip out the unwanted toolbars, refresh them properly and cycle through them. I also call a page that acts as a screen saver. You could publish the AlertsDashboard.html page to an IIS server or just run it from a share like I do.

You will need to modify the AlertsDashboard.html file so it points to your Report Server. Find the following code in the file and replace YourReportServer with your SSRS server name.

dashboards: [
{url: "http://YourReportServer/ReportServer/Pages/ReportViewer.aspx?%2fSCOM+Dashboard%2fAlerts+Dashboard&rs:Command=Render&rc:Toolbar=false&rs:ClearSession=true", time: 600},
{url: "file:///C:/SCOMSites/ScreenSaver.htm", time: 60},
{url: "http://YourReportServer/ReportServer/Pages/ReportViewer.aspx?%2fSCOM+Dashboard%2fAlerts+Dashboard&rs:Command=Render&rc:Toolbar=false&rs:ClearSession=true", time: 600}
]

The time specified after each URL is the time to wait, in seconds, before calling the next page.


Finally, to load the dashboard, create a shortcut that points to the HTML file, something like this:

"C:\Program Files\Internet Explorer\iexplore.exe" -k "C:\SCOMSites\AlertsDashboard.html"

The -k switch opens IE in Kiosk Mode. If you have multiple monitors and want one IE window on each, use Shift + Windows key + left arrow to move the IE windows between monitors.

It’s as easy as that.

Automating "Proxying Enabled" for Cluster Nodes

If you perform a "mass installation" of OpsMgr, the task of turning agent proxying on is quite a mission. The way I see it is as follows: most people use clusters for SQL redundancy, meaning that it is pretty safe to assume that every object contained in the Microsoft.Windows.Cluster.Service class will need to have agent proxying enabled for this to actually work.

If we use this Powershell code, we can loop through all objects contained within the Cluster class and enable proxying for each one in a single step.

Code:


$rootMS = "RMS1"
$targetClass = "Microsoft.Windows.Cluster.Service"
Set-Location OperationsManagerMonitoring::
New-ManagementGroupConnection $rootMS
Set-Location $rootMS
# Avoid the name $matches: it is an automatic variable that -match overwrites
$clusterObjects = Get-MonitoringClass -name $targetClass | Get-MonitoringObject
# Fetch the agent list once instead of once per cluster object
$allAgents = Get-Agent
foreach ($object in $clusterObjects) {
    [string]$currentAgent = $object.Path
    $agent = $allAgents | Where-Object {$_.Name -eq $currentAgent}
    if (-not $agent)
    {
        Write-Host "No agent found for $currentAgent, skipping..."
    }
    elseif ($agent.proxyingEnabled -match 'False')
    {
        Write-Host "$currentAgent doesn't have proxying enabled yet, enabling now"
        $agent.proxyingEnabled = $true
        $agent.applyChanges()
    }
    else
    {
        Write-Host "Proxying already enabled for $currentAgent, skipping..."
    }
}


Easy peasy.

Maintenance Mode - Clusters, Nodes, SQL Instances

One frustrating thing about OpsMgr's Maintenance Mode system is that if you put both nodes of a cluster into maintenance mode the "Virtual" system will continue to alert. This is a common oversight and the cause of many unnecessary cluster and/or SQL alerts.

The reason for this is that the virtual computer is the top parent which hosts the Cluster Service and SQL DB Engine. Although the nodes form part of this, they are not all-encompassing and will not suffice. Although standard logic dictates that if you're putting a node into maintenance mode you're probably working on the cluster, there is an extra step needed when dealing with this.

This powershell snippet demonstrates how to put a related "Virtual Computer" into maintenance mode. This will also result in the SQL Instance being in maintenance mode as the Virtual Computer is the host of it.

Code:

$class = get-monitoringclass -name:"Microsoft.Windows.Server.Computer"
$node = get-monitoringobject -monitoringclass:$class -criteria:"DisplayName = 'Node1'"
$agent = get-agent | ?{$_.DisplayName -eq $node.DisplayName}
$clusterMachines = $agent.GetRemotelyManagedComputers()
foreach($cluster in $clusterMachines)
{
    # Build the criteria by concatenation: "$cluster.DisplayName" inside a string
    # would expand only $cluster, and the value also needs single quotes
    $criteria = "DisplayName = '" + $cluster.DisplayName + "'"
    get-monitoringobject -monitoringclass:$class -criteria:$criteria | Select DisplayName
    #Here we can carry on and put these associated objects into maintenance mode too
}


As you can see, we make use of $agent.GetRemotelyManagedComputers() to get a list of computers which the specific agent is managing. As you are well aware, agent proxying needs to be enabled on cluster nodes, so this is a sure-fire way of determining whether anything "virtual" is running off the node.

Maintenance Mode Reminder Emails

Anyone working in a fairly large organisation will be familiar with the frustrations experienced when people don't manage maintenance mode correctly. This results in machines staying in maintenance mode way longer than needed, or the machines coming out before the work is completed. This results in severe alert storms and really messes up your alert statistics.

I developed a clever little SQL script which will get all machines in maintenance mode and send a reminder when the machine has breached a "reminder" threshold.
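
To make the "reminder" threshold concrete: the SQL below flags a machine once the elapsed share of its maintenance window exceeds @MMWindowAlertThresholdPerc (75 in our script). Here is the same arithmetic as a small Powershell sketch. The function name and the sample times are invented for illustration, and it assumes the window is longer than zero minutes.

```powershell
# Hypothetical helper mirroring the percentage check in the SQL script:
# elapsed minutes divided by total window minutes, times 100, rounded.
function Get-MaintenanceElapsedPercent {
    param(
        [datetime]$StartTime,
        [datetime]$ScheduledEndTime,
        [datetime]$Now
    )
    $elapsed = ($Now - $StartTime).TotalMinutes
    $window  = ($ScheduledEndTime - $StartTime).TotalMinutes
    [math]::Round(($elapsed / $window) * 100, 0)
}

# Sample values only: a four-hour window checked after 192 minutes
$start = [datetime]'2010-02-17 08:00'
$end   = [datetime]'2010-02-17 12:00'
$now   = [datetime]'2010-02-17 11:12'

$percent = Get-MaintenanceElapsedPercent -StartTime $start -ScheduledEndTime $end -Now $now
$needsReminder = $percent -gt 75   # same comparison as the SQL WHERE clause
"$percent% elapsed, reminder due: $needsReminder"
```

With these sample times, 192 of 240 minutes have passed, which is 80%, so a reminder would be due.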

There are a few steps needed here and I'm not going to go through all of them. This script was customised for our environment but hey, it may help you in some way. You will however need the following:

1) A SQL Linked Server connection to Active Directory - this allows us to look up the email address of the AD user who put the machine into maintenance mode. In our company we make use of a centralised server details page (with MM integration). If this page is used to put the machine into maintenance mode, another "service" account is used, but the username is stored in the comments section as follows: DOMAIN\USER: Comments

2) SQL Database Mail will need to be set up.

3) There is another portion to this - a Web Service which receives the request to extend the maintenance mode. You may choose to omit this portion but it's nice functionality to have. This is in C#.
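
The username lookup from point 1 is the fiddly bit, so here it is as a minimal Powershell sketch. It mirrors the CASE expression in the SQL below: if the service account started maintenance mode, take the user from the "DOMAIN\USER: Comments" prefix, otherwise take the account that started it. The function name and the sample accounts are made up for illustration.

```powershell
# Hypothetical helper: works out which AD user to mail.
function Get-MaintenanceModeUser {
    param(
        [string]$User,            # account recorded against the maintenance window
        [string]$Comments,        # maintenance mode comments
        [string]$ActionAccount    # service account used by the server details page
    )
    if ($User -like "$ActionAccount*") {
        # Service account: the real user is the part of the comment before the colon
        $source = $Comments.TrimStart().Split(':')[0]
    }
    else {
        $source = $User.TrimStart()
    }
    # Drop the DOMAIN\ prefix so only the SAMAccountName remains
    $source.Substring($source.IndexOf('\') + 1)
}

$viaPage  = Get-MaintenanceModeUser -User 'DOMAIN\ServerDetailUser' -Comments 'DOMAIN\jsmith: patching window' -ActionAccount 'DOMAIN\ServerDetailUser'
$directly = Get-MaintenanceModeUser -User 'DOMAIN\adoe' -Comments 'Replacing disk' -ActionAccount 'DOMAIN\ServerDetailUser'
```

The result is what gets passed to the Active Directory query as the SAMAccountName.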

SQL Code:
 
DECLARE @now datetime;
DECLARE @LocalGMTOffset int;
DECLARE @MMWindowAlertThresholdPerc int;
DECLARE @startPos int;
DECLARE @endPos int;
DECLARE @userName varchar(200);
DECLARE @sql nvarchar(4000);
DECLARE @AD varchar(200);
DECLARE @WebServer varchar(200);
DECLARE @actionAccount varchar(200);

SET @LocalGMTOffset = +2
SET @MMWindowAlertThresholdPerc = 75;
SET @NOW = dateadd(hour,(@LocalGMTOffset * -1),getdate());
SET @AD = 'LDAP://AD';
SET @WebServer = 'WebServer';
SET @actionAccount = 'DOMAIN\ServerDetailUser';

-- Remove this when live - need it for testing because temp mails are only cleared on disconnection
--drop table #tempMailsToSend


--- Get all records needing to be mailed
INSERT [monitor].[dbo].[tb_MMWindows] SELECT bme.BaseManagedEntityId, mm.StartTime, mm.ScheduledEndTime, 0
FROM maintenancemode mm
INNER JOIN BaseManagedEntity bme ON (mm.BaseManagedEntityId = bme.BaseManagedEntityId)
LEFT JOIN [monitor].[dbo].[tb_MMWindows] MMW on MMW.BaseManagedEntityId = mm.BaseManagedEntityId
WHERE mm.EndTime IS NULL AND mm.IsInMaintenanceMode = 1 AND mmw.ack IS NULL AND bme.isDeleted = 0 AND
((round(((convert(float, DATEDIFF(minute,mm.StartTime,@NOW)) / convert(float,DATEDIFF(minute,mm.StartTime,mm.ScheduledEndTime)))*100),0)) > @MMWindowAlertThresholdPerc)
ORDER BY mm.ScheduledEndTime DESC


-- Filter only on the Top Level IDs
SELECT identity(int,1,1) as id, a.* INTO #tempMailsToSend FROM (SELECT count(*) as childitems, mm.StartTime, mm.ScheduledEndTime, bme2.DisplayName, bme2.BaseManagedEntityId,
CASE WHEN([MM].[User] LIKE @actionAccount + '%') THEN
SUBSTRING(SUBSTRING(LTRIM(Comments),0,CHARINDEX(':',LTRIM(Comments))),CHARINDEX('\',Comments)+1,LEN(Comments))
ELSE SUBSTRING(LTRIM([MM].[User]),CHARINDEX('\',[MM].[User])+1,LEN([MM].[User])) END as UserName, mm.Comments
FROM [monitor].[dbo].[tb_MMWindows] MMW
INNER JOIN MaintenanceMode MM ON (MM.BaseManagedEntityId = MMW.BaseManagedEntityId)
INNER JOIN BaseManagedEntity bme ON (mm.BaseManagedEntityId = bme.BaseManagedEntityId)
INNER JOIN BaseManagedEntity bme2 ON (bme.ToplevelHostEntityId = bme2.BaseManagedEntityId)
WHERE MMW.ack=0 AND bme2.DisplayName != 'Microsoft.SystemCenter.AgentWatchersGroup'
GROUP BY bme2.TopLevelHostEntityId, MM.Comments, [MM].[User], bme2.DisplayName, mm.StartTime, mm.ScheduledEndTime, bme2.BaseManagedEntityId) a


--SELECT * FROM #tempMailsToSend

--NEED THIS UPDATE HERE
UPDATE [monitor].[dbo].[tb_MMWindows] SET ack=1


SELECT @startPos = 1, @endPos = count(*) FROM #tempMailsToSend

WHILE @startPos <= @endPos
BEGIN

DECLARE @bodyVal nvarchar(4000);
DECLARE @subjectVal varchar(255);
DECLARE @mailBody nvarchar(500);
DECLARE @DisplayName varchar(255);
DECLARE @StartTime datetime;
DECLARE @ScheduledEndTime datetime;
DECLARE @Comments nvarchar(1000);
DECLARE @ChildItems int;
DECLARE @BmeId varchar(100);
DECLARE @mail varchar(200);


SET @sql = 'SELECT @mail = mail FROM OPENQUERY( ADSI, ''SELECT mail FROM '''''+ @AD + '''''
WHERE objectCategory = ''''Person'''' AND objectClass = ''''user'''' AND SAMAccountName = '''''+(SELECT UserName FROM #tempMailsToSend WHERE id=@startPos)+''''''')'

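-- Reset so a failed lookup cannot reuse the previous iteration's address
SET @mail = NULL

-- @stmt is the documented name of the statement parameter
EXEC sp_executesql
@stmt = @sql,
@params = N'@mail varchar(200) OUTPUT',
@mail = @mail OUTPUT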

SELECT @BmeId = BaseManagedEntityId, @DisplayName = DisplayName,@StartTime = StartTime, @ScheduledEndTime = ScheduledEndTime, @Comments = Comments FROM #tempMailsToSend WHERE id=@startPos

IF @Comments IS NULL
BEGIN
SET @Comments = ''
END


SET @subjectVal = 'MM Reminder: ' + @DisplayName
SET @bodyVal = '
<HTML>
<HEAD>
<TITLE>Maintenance Mode Reminder</TITLE>
<style>
body {
background-color:#ECECEC;
font-family: Arial, Helvetica, sans-serif;
font-size: 14px;
}

td {
background-color:#ECECEC;
font-family: Arial, Helvetica, sans-serif;
font-size: 12px;
}
h1 {
margin:0px 0px 5px 0px;
font-size: 16px;
}

table td {
text-align: center;
background-color: #fff;
font-size: 12px;
}

table th{
text-align: center;
background-color: #000;
color: #fff;
font-size: 12px;
}
.button {
display: inline-block;
width: 80px !important;
background-color: #ECECEC;
padding: 2px;
text-align: center;
color: #000;
text-decoration: none;
border: 1px solid #000;
}
.button:hover {
background-color: #D6D5C3;
}


</style>
</HEAD>
<BODY>
<h1>SCOM Automated Maintenance Mode Reminder</h1>
Hi,'+ '<br />
<br />
Please note that the machine listed below is approaching the end of its maintenance window. It is currently past the set threshold of <strong>' + CAST(@MMWindowAlertThresholdPerc as varchar(3)) + '% total time elapsed.'+ '</strong><br /><br />
<table><tr>
<th>DisplayName</th><th>Start Time</th><th>Scheduled End Time</th><th>Comments</th></tr>
<tr><td>'+@DisplayName + '</td><td>' +CAST(dateadd(hour,@LocalGMTOffset,@StartTime) as nvarchar(50)) + '</td><td>' +CAST(dateadd(hour,@LocalGMTOffset,@ScheduledEndTime) as nvarchar(50))+ '</td><td>' +@Comments + '</td></tr>
</table>
<br /><br /><br />
Extend by:<br /><br />
<a href="http://' + @WebServer + '/mm/Default.aspx?bmid='+ @BmeId + '&extend=1&DisplayName='+@DisplayName+'" class="button">1 hour</a>
<a href="http://' + @WebServer + '/mm/Default.aspx?bmid='+ @BmeId + '&extend=2&DisplayName='+@DisplayName+'" class="button">2 hours</a>
<a href="http://' + @WebServer + '/mm/Default.aspx?bmid='+ @BmeId + '&extend=4&DisplayName='+@DisplayName+'" class="button">4 hours</a>
<a href="http://' + @WebServer + '/mm/Default.aspx?bmid='+ @BmeId + '&extend=8&DisplayName='+@DisplayName+'" class="button">8 hours</a><br />
<br /><br />
Other options:<br /><br />
<a href="http://' + @WebServer + '/mm/Default.aspx?bmid='+ @BmeId + '&extend=0&DisplayName='+@DisplayName+'" class="button">Take out of maintenance mode</a><br />

</BODY>
</HTML>'

EXEC msdb.dbo.sp_send_dbmail @recipients=@mail,
@subject = @subjectVal,
@body = @bodyVal,
@body_format = 'HTML';

SET @startPos = @startPos + 1
END


--MUST REMOVE THIS WHEN LIVE - this is left here for testing purposes so it'll always send
--delete from [monitor].[dbo].[tb_MMWindows]


---------------------------



C# Web Service Code:
 

using System;
using System.Configuration;
using System.Data;
using System.Linq;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Xml.Linq;
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Configuration;
using Microsoft.EnterpriseManagement.Monitoring;
using System.Collections.ObjectModel;
using System.Collections.Generic;

public partial class _Default : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
int gmtOffset = 2;
string bmid = Request.QueryString["bmid"];
int extendHours = System.Convert.ToInt32(Request.QueryString["extend"]);
string pcDisplayName = Request.QueryString["displayname"];
if (bmid != null && pcDisplayName != null)
{
ManagementGroup mg = new ManagementGroup("RMS1");

string mcCriteria = "Name = 'Microsoft.Windows.Computer'";
string query = "Id = '" + bmid + "'";
MonitoringClassCriteria criteria = new MonitoringClassCriteria(mcCriteria);
MonitoringClass monClass = mg.GetMonitoringClasses(criteria)[0];
MonitoringObjectCriteria objCriteria = new MonitoringObjectCriteria(query, monClass);
List<MonitoringObject> monObjects = new List<MonitoringObject>(mg.GetMonitoringObjects(objCriteria));
if (monObjects.Count == 0)
{
litOutput.Text = "Could not find an object with that display name. System Center has been notified.";
}
foreach (MonitoringObject monObject in monObjects)
{
if (extendHours == 0)
{
DateTime scheduledEndTime = DateTime.UtcNow;
try
{
monObject.StopMaintenanceMode(scheduledEndTime, Microsoft.EnterpriseManagement.Common.TraversalDepth.Recursive);
litOutput.Text = pcDisplayName + " has successfully been taken out of maintenance mode.";
}
catch(Exception Ex)
{
litOutput.Text = "Encountered an error stopping maintenance mode. System Center has been notified.<br /><br />" + Ex.Message;
}
}
else
{
if (!monObject.InMaintenanceMode)
{
try
{
DateTime startTime = DateTime.UtcNow;
DateTime scheduledEndTime = DateTime.UtcNow.AddHours(extendHours);
string comments = extendHours + " hour maintenance mode window requested";
monObject.ScheduleMaintenanceMode(startTime, scheduledEndTime, 0, comments, Microsoft.EnterpriseManagement.Common.TraversalDepth.Recursive);
litOutput.Text = pcDisplayName + " has already been taken out of maintenance mode. Starting a new maintenance mode window for " + extendHours + " hours." +
"<br />Scheduled end time is: " + scheduledEndTime.AddHours(gmtOffset).ToString();
}
catch(Exception Ex)
{
litOutput.Text = "Encountered an error placing machine into maintenance mode. System Center has been notified.<br /><br />" + Ex.Message;
}
}
else
{
try
{
MaintenanceWindow myWindow = monObject.GetMaintenanceWindow();
DateTime scheduledEndTime = myWindow.ScheduledEndTime.ToUniversalTime().AddHours((extendHours + gmtOffset));
string updatedComments = myWindow.Comments + " || " + extendHours + " hour extension requested";
monObject.UpdateMaintenanceMode(scheduledEndTime, 0, updatedComments, Microsoft.EnterpriseManagement.Common.TraversalDepth.Recursive);
litOutput.Text = pcDisplayName + " has had its maintenance mode extended by the requested " + extendHours + " hours" +
"<br />New scheduled end time is: " + scheduledEndTime.AddHours(gmtOffset).ToString();
}
catch(Exception Ex)
{
litOutput.Text = "Encountered an error extending maintenance mode. System Center has been notified.<br /><br />" + Ex.Message;
}
}
}
}

}
}
}


And that's it... this works nicely once it's set up. It may take you a while to get all the components right - if you need help give me a shout.

Changing which Gateway server an agent reports to - eh?!

Picture the scenario: you've got 500 machines, 2 gateway servers. You've got half of them reporting to Gateway 1, the other half reporting to Gateway 2. You've done a bit of research and even enabled failover with a bit of nifty Powershell.

Then one day Gateway 1 dies. All agents failover and start reporting to Gateway 2. Yep, that's all good but you now need to get a new Gateway server ASAP as you now have no redundancy.

We quickly find a new server -- Gateway 3 and are now left with the task of reassigning all the agents currently reporting to Gateway 1 onto Gateway 3, whilst keeping Gateway 2 as the failover.

Never fear, there is a solution.

There are two parts to this: server and client side. I will discuss each individually.

Client:

OpsMgr stores its connection details in two registry values: NetworkName and AuthenticationName. These need to be changed to the new gateway's name. This, however, won't work unless the server portion has been carried out. An optional step here is deleting the Health Service State. In reality we only need to delete the Connector Configuration Cache, but you will be absolutely amazed at how many problems can be solved with a simple deletion of Health Service State data. This is a step you can choose to take: as far as I'm aware, the Connector Configuration will be updated if the registry values have changed. If you want to delete the State data too, uncomment the lines in the PS script.


Server:

Here we need to tell the RMS (and DB) that the agent has changed its PrimaryManagementServer (Gateway). This cannot be done with the GUI and hence, Powershell will be used.

------

Assumptions: This solution assumes that the CLIENT portion will be executed on a machine with Powershell installed, and with WMI (RPC/DCOM) access to each machine needing to be changed. X86/64 isn't a problem as it detects the install directory. What this Powershell script does do is make a WMI call to the remote machines to change registry keys, optionally delete State data, and restart the service. If you don't have this access it WON'T work! Also, this assumes you're running as an admin and you do have admin rights on each machine.

-----

Client Portion:

The first thing we need is a list of all machines that report to Gateway1. We need to export this to a CSV because we'll be using it to address each machine. We're going to export this to a one-column CSV, with NetworkName as the column name (remember to remove the computer-generated "#TYPE" line at the top of the CSV, but keep the NetworkName header row).

[This will obviously be run through an OpsMgr shell on a server -- not a client]:

$oldGatewayName = "Gateway1"
$agents = Get-Agent | ?{ $_.PrimaryManagementServerName -eq $oldGatewayName } | Select NetworkName | export-csv c:\agents.csv


[Now, the client script - ensure you've copied the CSV from the script above to the C:\]:

$MachineList = Import-Csv c:\agents.csv
$NewGatewayName = "Gateway3"

Foreach($MachineName in $MachineList) {

Write-Host "Attempting " $MachineName.NetworkName
#Change Registry Keys
$reg = [Microsoft.Win32.RegistryKey]::OpenRemoteBaseKey('LocalMachine', $MachineName.NetworkName)
# DERSCOM is our management group name - substitute your own
$regKey= $reg.OpenSubKey("SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Agent Management Groups\DERSCOM\Parent Health Services\0",$true)
$regKey.SetValue("NetworkName",$NewGatewayName,"String")
$regKey.SetValue("AuthenticationName",$NewGatewayName,"String")

#Get Install Path for Health Service State folder
$path = $reg.OpenSubKey("SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup")
$HealthServiceState = ($path.GetValue("InstallDirectory")).Replace("\","\\") + "Health Service State"


#Stop HealthService
$service = Get-WmiObject -query "SELECT * FROM Win32_Service WHERE Name = 'HealthService'" -computer $MachineName.NetworkName
$service.StopService()

Start-Sleep(10)

#Delete Health Service State Folder
#$directory = Get-WmiObject -query "SELECT * FROM Win32_Directory WHERE Name='$HealthServiceState'" -computer $MachineName.NetworkName
#foreach($item in $directory) {
#$item.Delete()
#}

#Start Service
$service.StartService()

}

Server Portion:

#Unfortunately there is no way to pass Criteria Syntax to get-agent. You may extend the SDK directly if you so desire.


$oldGatewayName = "Gateway1"
$gateway2 = "Gateway2"
$gateway3 = "Gateway3"

$PrimaryMS = Get-ManagementServer | Where { $_.DisplayName -match $gateway3 }
$FailoverMS = Get-ManagementServer | Where { $_.DisplayName -match $gateway2 }

$agents = Get-Agent | ?{ $_.PrimaryManagementServerName -eq $oldGatewayName }

foreach($agent in $agents)
{
Write-Host "Setting MS/Failover for:" $agent.DisplayName
Set-ManagementServer -AgentManagedComputer: $Agent -PrimaryManagementServer: $PrimaryMS -FailoverServer: $FailoverMS
}


And that's that! Once you've run those parts your agents will take a few minutes and start talking to the new Gateway server. Simple as pie.

Powershell Script: Bulk Maintenance Mode

If you've ever needed to put a whole lot of machines into maintenance mode you'll know how tedious and time consuming this can be if you go through the GUI. There's no "Ctrl + click" functionality, meaning you'll need to go to each machine and put it in individually.

There are obviously instances where certain machines are related by business process rather than an actual identifiable link.

My solution to this was to do the following: Allow for an import of a CSV file with a list of hostnames (just hostname, not FQDN) - loop through these and put each one into maintenance mode.
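
As a quick illustration of the input, here is a sketch of what the CSV looks like and the search criteria each row turns into. The server names are invented, and the here-string stands in for a real file so the snippet runs anywhere. In the real script the rows come from Import-Csv.

```powershell
# Example CSV content: one column, titled HostName (hostnames only, no FQDN).
# SERVER01 and SERVER02 are made-up names.
$csvText = @"
HostName
SERVER01
SERVER02
"@

# ConvertFrom-Csv parses text the same way Import-Csv parses a file
$computers = $csvText | ConvertFrom-Csv

# Build the same criteria string the script uses for each row:
# a case-insensitive match on the hostname followed by a dot
$criteriaList = foreach ($row in $computers) {
    "DisplayName matches '(?i:" + $row.HostName + ")\.'"
}
$criteriaList
```

The trailing "\." in the criteria is what lets a bare hostname match the fully qualified DisplayName without also matching longer hostnames that merely start with the same characters.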

So simple. Here's the PS script (check comments for usage):

#Right, this is pretty simple. Use as follows:
#To put a BULK list of computers into maintenance mode, you will need a CSV Formatted with ONE column - that being the hostname of the machine.
#!!!!! VERY NB !!!!!!!!!!! VERY NB !!!!!!
#1) Make sure the first column is titled HostName otherwise this won't work!
#!!!!! VERY NB !!!!!!!!!!! VERY NB !!!!!!
#2) Change the $rootMS to the correct RMS.
#3) Usage is as follows:
# Start Maintenance Mode: ./mm.ps1 START PathToCSVFile "Maintenance Mode Reason (be sure to encase in quotation marks like here)" DurationInHours
# Stop Maintenance Mode: ./mm.ps1 STOP PathToCSVFile
# Note: If you don't have START or STOP as your first parameter the script will not continue.
#4) This script implements strict error handling. A summary of each operation will be displayed after the operation has completed.
#5) If any errors are encountered, these are trapped and will be saved in CSV format in C:\errorMaintenanceMode.csv

param($action, $pathToCSV, $maintenanceModeReason, $durationHours)

$rootMS = "RMS"

if(@("START","STOP") -notcontains ([string]$action).ToUpper())
{
Write-Host "Can't continue without a START or STOP action. Please read the comments in this .ps1 file for instructions."
Exit
}
$ErrorActionPreference = "Continue"
$error.Clear()

Set-Location "OperationsManagerMonitoring::" -ErrorVariable errSnapin;
New-ManagementGroupConnection -ConnectionString:$rootMS -ErrorVariable errSnapin;
Set-Location $rootMS -ErrorVariable errSnapin;

$computers = Import-Csv $pathToCSV
$resultsetCollection = @();

$currObjMaintDesc = "Bulk Maintenance Mode Update. Reason Given: " + $maintenanceModeReason
$startTime = [System.DateTime]::Now
$endTime = $startTime.AddHours($durationHours)

foreach($currentObj in $computers)
{
$currentObjName = $currentObj.HostName
$computerClass = Get-MonitoringClass -name:Microsoft.Windows.Computer
$computerCriteria = "DisplayName matches '(?i:" + $currentObjName + ")\.'"
$computer = Get-Monitoringobject -monitoringclass:$computerClass -criteria:$computerCriteria

if($action.ToUpper() -eq 'START')
{
"Starting Maintenance Mode on: " + $currentObjName
New-MaintenanceWindow -startTime:$startTime -endTime:$endTime -monitoringObject:$computer -comment:$currObjMaintDesc
}
elseif($action.ToUpper() -eq 'STOP')
{
"Stopping Maintenance Mode on: " + $currentObjName
Set-MaintenanceWindow -monitoringObject:$computer -endTime:$startTime
}
if(-not $?)
{
$errorFound = $true
$resultObj = "" | Select HostName, Status, ErrorMessage
$resultObj.HostName = $currentObjName
$resultObj.Status = "Failed"
$resultObj.ErrorMessage = $error[0]
$resultsetCollection += $resultObj
$error.Clear()

}
else
{
$resultObj = "" | Select HostName, Status, ErrorMessage
$resultObj.HostName = $currentObjName
$resultObj.Status = "Successful"
$resultObj.ErrorMessage = "No errors reported during operation"
$resultsetCollection += $resultObj
}

}
$resultsetCollection
Write-Host
if($errorFound)
{
Write-Host "Errors were encountered whilst trying to update computers. Please see the file c:\errorMaintenanceMode.csv for Error Info."
$resultsetCollection | Export-Csv "c:\errorMaintenanceMode.csv"
}
else
{
Write-Host "Completed entire Bulk Maintenance Mode Operation without any errors."
}

Retrieve a list of monitors in a critical state

One thing the OpsMgr team 'forgot' to do was allow us to view all Monitors in a Critical state. This is obviously something a lot of people would need as it's nice to have an overview of what isn't in a healthy state without trawling through Health Explorer Windows.

Unfortunately there is no elegant way to do this with the SDK and/or Powershell, as this needs to be addressed on a "per monitoringobject" basis, meaning we have to loop through classes and objects and that will take a crazy amount of time.

Once again, SQL to the rescue:

Select [TimeGenerated],a.BaseManagedEntityId, a.DisplayName, a.TopLevelHostEntityId, e.DisplayName as ParentDisplayname, e.FullName as ParentFullName,
f.TypeName,
DateDiff(mi,TimeGenerated,GETUTCDate())[TimeInState],ISNULL(a.path,a.DisplayName)[Path],OperationalStateName,
a.DisplayName[Problem],d.DisplayName[Problem Description],d.Description[Detailed Description]
from BaseManagedEntity a
INNER JOIN ManagedEntityAvailabilityView b on a.BaseManagedEntityId =b.BaseManagedEntityId
INNER JOIN StateView c on b.BaseManagedEntityId =c.BaseManagedEntityId
INNER JOIN MonitorView d on c.MonitorID= d.ID
INNER JOIN BaseManagedEntity e on e.BaseManagedEntityId = a.TopLevelHostEntityId
INNER JOIN ManagedType f on a.BaseManagedTypeId=f.ManagedTypeId
INNER JOIN

(Select BaseManagedEntityID,NewHealthState,max(TimeGenerated)[TimeGenerated]
From StateChangeEventView
Group By BaseManagedEntityID,NewHealthState)SC
On b.BaseManagedEntityID = SC.BaseManagedEntityId

Where HealthState = 3
and AlertPriority = 1
and IsUnitMonitor = 1
and SC.NewHealthState = 3
--and e.DisplayName = 'COMPUTERNAME'
Order By a.[Path]

Basic OpsMgr concepts - Classes and Objects

One abstract OpsMgr concept people generally battle with is distinguishing the difference between an object and a class.

In this post I'll use the following two as examples:

Object: Computer01.Pretend.com
Class: Microsoft.Windows.Computer

To see an example of this, navigate to the "Discovered Inventory" pane in OpsMgr. If you change the scope to "Windows Computer", you will (in my hypothetical example) see Computer01 listed. This means that the OBJECT Computer01 is of CLASS type Windows Computer.

To demonstrate this, let's look at two powershell cmdlets: get-monitoringclass and get-monitoringobject:

$class = get-monitoringclass -name "Microsoft.Windows.Computer"


This assigns the object for the class "Microsoft.Windows.Computer" to the variable $class.

If we now wish to get a list of all monitoring objects contained within that class, we need to use get-monitoringobject:

$class | get-monitoringobject


As you can see, this supports pipeline usage - we take our $class variable and pipe it to get-monitoringobject. This then iterates through the class returning all its objects. We can then further format the data by using Select-Object to select the fields we're wanting to view. Additionally, if we're just wanting a total, we can pipe it to "measure-object" to get an object count.
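
Here's a small sketch of those last two steps. So that it runs without an OpsMgr connection, it uses stand-in objects that I've made up to look like monitoring objects (Computer02 and the HealthState values are invented). In the Command Shell you would replace $objects with the output of the get-monitoringobject pipeline shown earlier.

```powershell
# Stand-in data only: in the Command Shell this would be
#   $objects = $class | get-monitoringobject
$objects = @(
    New-Object PSObject -Property @{ DisplayName = 'Computer01.Pretend.com'; HealthState = 'Success' }
    New-Object PSObject -Property @{ DisplayName = 'Computer02.Pretend.com'; HealthState = 'Error' }
)

# Select-Object keeps only the fields we want to view
$names = $objects | Select-Object DisplayName

# Measure-Object gives us a total instead of the objects themselves
$total = ($objects | Measure-Object).Count

$names
"Total objects: $total"
```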

What we've done here is mimicked the "Discovered Inventory" view in the GUI. This is very helpful if we're needing certain objects for something.

If you require assistance or would like a particular topic covered, be sure to leave a comment on a post.

Next up: Using "criteria" to search instead of Where-Object

Stuff to help you with these products, yo

Right, so here's the deal:

There's a few of us working for a rather large, rather international company - we make sure a whole lot of servers are working most of the time.

I'm sure if any of you have ever worked with OpsMgr and/or SCCM, you will know that these products are quite large and the documentation/help isn't all that great.

This blog is two-fold: it's going to help us categorise all the things we've done and also serve to contribute to the System Center community.

Our team is as follows:

DT: SCCM - he's been using this product for years. I know stuff all about it so any post regarding this topic will definitely be from him.

AB - SQL - despite what you may think, there are some things you cannot do with the SDK. If the data is there, Andre will find a way to use it.

RN - MP Authoring - need help understanding relationships? Hosted entities? Abstract classes? Eh...

CM - Infrastructure - installing Gateways, Management Servers, Certificate Authentication, etc.

KC - SDK, Powershell, Custom OpsMgr applications, MP Authoring, general blog writer.

---

So here's hoping this won't turn into yet another "one post" blog.