
The problem

If you run availability reports or performance reports with an aggregation type of daily or hourly, the reports are empty. This problem is described a lot on the web, and I have also written a couple of blog posts on how to fix it. But as you know, we are using SCOM to monitor stuff, so why not monitor this aggregation processing and alert when a processing delay occurs? That’s our mission today….

Analyze

Using SQL Enterprise Manager and a SQL query on the data warehouse DB, we can read out the state of the aggregation processing. The query looks like this:

SELECT AggregationTypeId, DatasetId,
       (SELECT SchemaName FROM StandardDataSet WHERE DatasetId = StandardDataSetAggregationHistory.DatasetId) AS SchemaName,
       COUNT(*) AS [Count], MIN(AggregationDateTime) AS [First], MAX(AggregationDateTime) AS [Last]
FROM StandardDataSetAggregationHistory
WHERE LastAggregationDurationSeconds IS NULL
GROUP BY AggregationTypeId, DatasetId

The output shows how many aggregations still have to be processed per aggregation type (20 = hourly, 30 = daily).

So in this case we have no problem. But I have seen SCOM environments where the state aggregations were so far behind that it was almost impossible to fix. This brings up a point: the state aggregations especially are the tricky ones. If you have many ‘flipping’ monitors there will be a lot of state changes, and so a lot of aggregation data to process. This processing takes a lot of SQL CPU power and also disk space. In most of these cases the root cause of the failure was a lack of free tempdb data space or the transaction log.

Solution

In SCOM we have a target for every aggregation. This target is named ‘Standard data set’. You can find it here:

If you compare the screenshot with the results on your SCOM console, you will notice that you don’t have the green healthy state… And that’s why you are reading this post. So let’s add this state.

I wanted to give every dataset that has to be processed a health state based on how many aggregations it still has to process. So we make a monitor that executes the query above for every dataset and changes the health state when a threshold is hit. We will also add a rule so that this ‘aggregations behind’ count is put in a trend graph.
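As a rough sketch of the idea (not the code from the actual MP), a per-dataset version of the query could return the hourly and daily backlog in a single row. It only uses the tables and columns from the query above. The GUID is a placeholder; in the monitor it would come from the target's dataset property.

```sql
-- Placeholder GUID: replace with the DatasetId of the dataset you want to check.
DECLARE @DatasetId uniqueidentifier = '00000000-0000-0000-0000-000000000000';

SELECT
    -- 20 = hourly, 30 = daily (see the aggregation types above)
    ISNULL(SUM(CASE WHEN AggregationTypeId = 20 THEN 1 ELSE 0 END), 0) AS HourlyBehind,
    ISNULL(SUM(CASE WHEN AggregationTypeId = 30 THEN 1 ELSE 0 END), 0) AS DailyBehind
FROM StandardDataSetAggregationHistory
WHERE LastAggregationDurationSeconds IS NULL   -- not processed yet
  AND DatasetId = @DatasetId;
```

The ISNULL wrappers make sure a dataset with nothing outstanding returns 0 instead of NULL, which is easier to compare against a threshold.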

I have used VSAE for this, and I will not share the code but only the idea. Why not? I believe you have to know what you are doing, and by copying and pasting you don’t learn from it if you haven’t done it once from start to end.

The real work

Open a new VSAE project and add an empty MP fragment and a PowerShell fragment.

Then you make a datasource that reads the aggregation count. This is done using PowerShell and the SQL snap-in.

The PowerShell script takes the GUID of the dataset (a property of the target) as input and returns a property bag with the aggregation counts (daily and hourly) as output. I made the script somewhat intelligent by having it read from the registry where the data warehouse is located.

Now we use this datasource in a monitor module type to create a 3-state monitor. And since we have created a datasource module, we can also create a rule that collects the aggregations-behind count for the trend graph. Yes, I know, this is easier to type than to do…
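The 3-state logic itself is just a comparison of the behind count against two thresholds. In the MP that comparison would typically sit in the monitor type's expressions, but to get a feel for sensible thresholds you could try it out directly in SQL first. A sketch, where @Warning and @Critical are made-up example values and not the ones from my monitor:

```sql
-- Hypothetical thresholds: tune them to what is normal in your environment.
DECLARE @Warning int = 5, @Critical int = 24;

SELECT ds.SchemaName,
       ah.AggregationTypeId,              -- 20 = hourly, 30 = daily
       COUNT(*) AS Behind,
       CASE
           WHEN COUNT(*) >= @Critical THEN 'Critical'
           WHEN COUNT(*) >= @Warning  THEN 'Warning'
           ELSE 'Healthy'
       END AS HealthState
FROM StandardDataSetAggregationHistory AS ah
INNER JOIN StandardDataSet AS ds ON ds.DatasetId = ah.DatasetId
WHERE ah.LastAggregationDurationSeconds IS NULL
GROUP BY ds.SchemaName, ah.AggregationTypeId;
```

Datasets that have nothing outstanding do not show up in this result at all, so in the monitor a missing row has to be treated as healthy.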

Below is a snap of the datasource module.

And below is a snap of the monitor module type,

and the monitor. Create one for hourly (not shown) and one for daily.

Finally, for trending we have to create a collection rule.

Notice that the monitor and collection rule have “Microsoft.SystemCenter.DataWarehouse.DataSet” (alias “standard dataset”) as target, and notice the Run As profile.

The result

When you have constructed the MP and built/deployed it, you will see 2 extra monitors on the standard dataset targets, as shown above. Open the health explorer to see if all is OK.

The dataset above has had a problem. To see some details, view the performance counters and you will see the aggregations trend.

In this case the state hourly aggregations were way behind. So I followed one of my own blog posts to solve this one, where I manually executed the state aggregation process in a loop to speed up the processing.
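For reference, such a loop could look roughly like the sketch below. It is not the script from that blog post. It assumes the data warehouse has the default name OperationsManagerDW, that the dataset to catch up has the SchemaName 'State', and that the maintenance stored procedure is called StandardDatasetMaintenance and takes the dataset GUID as @DatasetId. Verify all of that in your own environment before running anything. @MaxRuns is a hypothetical safety limit so the loop cannot run forever.

```sql
USE OperationsManagerDW;   -- assumption: default data warehouse database name

DECLARE @DatasetId uniqueidentifier;
DECLARE @Runs int = 0;
DECLARE @MaxRuns int = 500;   -- hypothetical safety limit

-- Assumption: 'State' is the SchemaName of the dataset that is behind.
SELECT @DatasetId = DatasetId
FROM StandardDataSet
WHERE SchemaName = 'State';

-- Keep running maintenance while more than one aggregation is outstanding.
-- Assumption: the most recent interval is normally still open, so 1 is "caught up".
WHILE @Runs < @MaxRuns
  AND (SELECT COUNT(*)
       FROM StandardDataSetAggregationHistory
       WHERE DatasetId = @DatasetId
         AND LastAggregationDurationSeconds IS NULL) > 1
BEGIN
    EXEC StandardDatasetMaintenance @DatasetId = @DatasetId;
    SET @Runs = @Runs + 1;
END
```

Keep an eye on tempdb and the transaction log while this runs, since those were the usual root cause in the first place.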

The End.

Yes, I know this post is a bit ‘cloudy’ and not something you can download and import. But I hope that by sharing the idea I triggered you to try it yourself.

Here we are again. I am sure you will remember a situation where you were certain everything was configured correctly, and you had even restarted the server, but still it didn’t work…… NO ??

Okay so this time it was my turn…

The problem:

I was trying to discover a new network device, but network discovery wouldn’t find my newly added device. Why? I checked everything, and it wasn’t connectivity or security. Normally when those issues apply you will see a pending failed discovery attempt. What could be wrong..

Analyze:

1) First I checked the discovery rule. The network devices are explicitly specified.

I made sure the device was listed in the explicit discovery.

2) I started the discovery and checked the progress events.

Hey, what’s that! One device excluded?? Strange, this option can only be applied to recursive discoveries….

So it must be something in the discovery rule….

The solution:

So this excluded event triggered my attention.

1) Open the discovery rule and change the Discovery method to Recursive discovery.

2) Now select the Exclude Filters tab.

And what do we see here? The network device I was trying to discover.

3) Now remove this exclude filter entry and save the discovery template without RUNNING IT.

4) Reopen the discovery rule and change the discovery mode back to ‘explicit’ and save it again.

5) Now run the discovery.

6) And what do we see here … the device is discovered successfully!

Conclusion

It looks like this rule was initially created in recursive discovery mode. After a while the operator decided to remove a device from the discovered device inventory. If you do that, an exclude filter entry is automatically created. In the end the operator decided to change the discovery mode back to explicit (don’t ask why), and that caused the problem I faced…