Last week I was working on monitoring a business-critical application. This application does not write events to the Windows event log, but only to its own log files. With System Center Operations Manager you have several options for monitoring these log files. In this post I share some tips and tricks on how to do log file monitoring with System Center Operations Manager. There are some important considerations to take into account when monitoring log files with Operations Manager. The following lines are taken from the Microsoft support site (http://support2.microsoft.com/kb/2691973):
When monitoring a log file, Operations Manager remembers the last line read within the file (a ‘high water mark’). It will not re-read data before this point unless the file is deleted and recreated, or renamed and recreated, which will reset the high water mark. If a logfile is deleted and recreated with the same name within the same minute, the high water mark will not be reset, and log entries will be ignored until the high water mark is exceeded.

An implication of this is that log files that are cleared periodically without being renamed and recreated, or deleted and recreated, will not have entries in them processed until the high water mark from before the log is cleared is exceeded. Operations Manager cannot monitor ‘circular log files’ (i.e. log files that get to a certain size or line count, then start writing the newest entries at the beginning of the log) for the same reason. The log file must be deleted or renamed and then recreated, or the application configured to write to a new log once the current log is filled. Example:
- 100 lines are written to logfile.txt
- logfile.txt is cleared of all entries
- Log entries are written to logfile.txt (position 0 of the file)
- None of the new entries will be processed until line 101 is written
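To make this behaviour concrete, here is a minimal Python sketch of the high water mark idea; it illustrates the concept only and is not how Operations Manager is actually implemented:

```python
# Minimal sketch of a "high water mark" log reader: it remembers how
# many lines it has already consumed and only returns lines past that
# point. (Concept illustration only, not SCOM's implementation.)

class LogReader:
    def __init__(self):
        self.high_water_mark = 0  # last line number already processed

    def read_new_lines(self, path):
        with open(path) as f:
            lines = f.readlines()
        new_lines = lines[self.high_water_mark:]  # only lines past the mark
        self.high_water_mark = max(self.high_water_mark, len(lines))
        return new_lines

reader = LogReader()

# 100 lines are written to logfile.txt ...
with open("logfile.txt", "w") as f:
    f.writelines(f"line {i}\n" for i in range(100))
print(len(reader.read_new_lines("logfile.txt")))  # 100

# ... the file is cleared and 5 new entries are written ...
with open("logfile.txt", "w") as f:
    f.writelines(f"new entry {i}\n" for i in range(5))

# ... but the mark is still at 100, so nothing is picked up until the
# file grows past line 100 again.
print(len(reader.read_new_lines("logfile.txt")))  # 0
```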
Each line of a log file must end with a new line (0x0D0x0A hex sequence) before it will be read and processed by Operations Manager.

If a rule or monitor is configured to match a pattern for log file names (e.g. using the ? or * wildcard characters), it is important that only ONE log that matches the pattern is written. If multiple logs that match the pattern are being written to, the high water mark is reset to the beginning of the file with each write to a different file. The result is that all previous log entries will be reprocessed. Example:
- The log file name pattern is generic_csv??.txt
- The current log is generic_csv01.txt and writes happen to this log.
- A new log, generic_csv02.txt, is created. Writes occur to this log.
- When the next line is written to generic_csv01.txt, Operations Manager will read from the beginning of generic_csv01.txt, not from the last point that was read. Lines previously processed will be processed again, possibly resulting in alerts or other actions (depending on the rule configuration).
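The same sketch can be extended to show the wildcard pitfall: there is one high water mark per rule, not one per file, so alternating writes between two files that match the pattern keep resetting it (again, an illustration of the concept, not actual product code):

```python
# Sketch of the wildcard pitfall: ONE high water mark per rule.
# When a different matching file is written to, the mark resets and
# previously processed lines are replayed.

class PatternLogReader:
    def __init__(self):
        self.current_file = None
        self.high_water_mark = 0

    def read_new_lines(self, path):
        if path != self.current_file:
            # A different matching file was written to: the mark resets.
            self.current_file = path
            self.high_water_mark = 0
        with open(path) as f:
            lines = f.readlines()
        new_lines = lines[self.high_water_mark:]
        self.high_water_mark = len(lines)
        return new_lines

# reader = PatternLogReader()
# reader.read_new_lines("generic_csv01.txt")  # whole file processed
# reader.read_new_lines("generic_csv02.txt")  # mark reset, whole file again
# reader.read_new_lines("generic_csv01.txt")  # mark reset AGAIN: every line
#                                             # of generic_csv01.txt replays
```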
Another consideration is that you won't get an alert when the log file you configured does not exist.

When monitoring log files you again have the choice between a rule and a monitor. If you want it to affect the health status of your object, use a monitor; in all other cases, use a rule. When using a monitor, you preferably also have a 'healthy' log entry, so the object can automatically turn healthy again.

Where do I start? You want to know which entries you can expect in a log file. Some entries will affect the functionality of the application, while others will not. I found that in many cases it's difficult for application administrators to specify which errors and warnings can be written to an application log. If you are not sure about this, consult the supplier and gather all the information you can. Another option is reading through old log files in search of errors and/or warnings. I got the tip to use Notepad++ for this: its 'Find in Files' option gives you all the matching entries in the log files in a specific folder. You can also simply get ALL the entries containing the word Error or Warning.

In this case I used regular expressions to filter the entries that should be picked up. The application I have been working on this week logs the entry: Error code = <0>. In this application that means everything is OK. All higher numbers are BAD. You could configure a regular expression like this:
You will find that using regular expressions gives you a certain flexibility (and not only in log file monitoring…). A nice tool I found this week is http://regexpal.com/. Here you can test your regular expressions, so you don't have to wonder whether you got the syntax right:
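If you prefer something scriptable, the same check can be done in a few lines of Python (the sample lines below are made up for illustration):

```python
import re

# Test the expression against a couple of sample log lines,
# the same idea as regexpal but scriptable.
pattern = re.compile(r"Error code = <[1-9][0-9]*>")

samples = [
    "10:15:01 Error code = <0>",   # healthy, should NOT match
    "10:15:02 Error code = <12>",  # should match
]

for line in samples:
    print(line, "->", bool(pattern.search(line)))
```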
You should create a class in Operations Manager for your application, so you can target the monitors and rules at it. A computer or group can also be targeted, but in my opinion that is not the way to go. So how do I create these monitors and rules? Setting up monitors and rules for log files can be done from the Operations Console: go to Authoring, select Rules or Monitors and choose Create new. From there the different monitoring templates are available:
Maintenance mode: when the targeted object is in maintenance mode, the high water mark keeps running but no alerts will be raised.
One last tip: you can display the Params you create in the alert description. This way you can quickly view the entry that raised the alert:
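For a generic text log rule the matched line is exposed through the Params collection, so an alert description along these lines shows the offending entry directly in the alert (Param[1] is the usual case for the generic log data source; check the documentation for the module you use):

```
Log entry: $Data/Params/Param[1]$
```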
I hope this post gives you a bit more insight into the possibilities. If any questions come up, please let me know. Let's make it manageable!