(RP15) Automatic Classification of System Logs
Time: Tuesday, June 26th, 8:30am - 10am
Description: System logs (syslogs) are a valuable source of information for analyzing the behavior of computing systems.
The message part of each syslog entry contains detailed information about the corresponding event.
Syslog is the de facto logging protocol of high-performance computing systems.
The Syslog protocol specification (RFC 5424) provides general guidelines for generating syslog messages.
However, the message part of system logs is unstructured, and each application formats its syslog messages independently.
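To make the split between the fixed-position header and the free-form message concrete, here is a minimal Python sketch that separates an RFC 5424 line into its header fields and the unstructured MSG part. The `parse_rfc5424` helper and its regular expression are illustrative assumptions, not part of the study; the pattern is deliberately simplified and does not validate timestamps, field lengths, or character sets. The sample line is adapted from the examples in RFC 5424.

```python
import re

# Simplified RFC 5424 layout:
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
# Every field before MSG has a fixed position; only MSG is free-form text.
RFC5424_LINE = re.compile(
    r'^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) '
    r'(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app>\S+) '
    r'(?P<procid>\S+) (?P<msgid>\S+) '
    # STRUCTURED-DATA is "-" (nil) or one or more [SD-ID param="value" ...] elements
    r'(?P<sd>-|(?:\[(?:[^\]"]|"(?:[^"\\]|\\.)*")*\])+)'
    r'(?: (?P<msg>.*))?$'
)


def parse_rfc5424(line):
    """Split one syslog line into header fields and the unstructured MSG part.

    Returns None when the line does not follow the simplified layout above.
    """
    m = RFC5424_LINE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # PRI encodes facility * 8 + severity.
    fields["facility"], fields["severity"] = divmod(int(fields.pop("pri")), 8)
    # MSG may be absent, and a UTF-8 message may start with a byte-order mark.
    fields["msg"] = (fields["msg"] or "").lstrip("\ufeff")
    return fields


entry = parse_rfc5424(
    "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - "
    "'su root' failed for lonvick on /dev/pts/8"
)
# entry["facility"] == 4, entry["severity"] == 2, entry["app"] == "su"
# entry["msg"] == "'su root' failed for lonvick on /dev/pts/8"
```

The header can be parsed with one fixed pattern for every producer, whereas the contents of `msg` differ from application to application, which is why classification has to target the message part.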
Automatic methods are required to efficiently analyze the large volume of syslog entries generated by large-scale computing systems.
The unstructured nature of syslog messages is a major challenge for automatic syslog analysis.
Automatic text classification is a well-known approach to address this challenge (Suryawanshi et al., 2015; Leydesdorff et al., 2017).
However, to use this approach, the target classes must be predefined.
The common approach to identifying the target classes is to apply machine learning techniques (e.g., deep learning) to text samples, producing a classifier specific to the formats of those samples.
The present study proposes a general and automatic classification method for system logs.
The proposed method exploits the high repetition of frequent system log entries to automatically generate accurate classifiers in the form of regular expressions.
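The abstract does not detail how the regular expressions are derived, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes a hypothetical heuristic: tokens containing a digit are treated as variable fields, messages that share the same constant tokens form one template, and only templates seen at least `min_support` times (a hypothetical threshold standing in for "high repetition") become regex classifiers. All function names and sample messages are invented for illustration.

```python
import re
from collections import Counter


def template_of(message):
    """Hypothetical heuristic: any token containing a digit is a variable field (None)."""
    return tuple(None if any(ch.isdigit() for ch in tok) else tok for tok in message.split())


def template_label(template):
    """Human-readable class name, with <*> marking variable fields."""
    return " ".join("<*>" if tok is None else tok for tok in template)


def template_to_regex(template):
    """Constant tokens are matched literally; variable fields match any non-space run."""
    parts = [r"\S+" if tok is None else re.escape(tok) for tok in template]
    return re.compile(r"\s+".join(parts))


def build_classifiers(messages, min_support=3):
    """Turn every template seen at least `min_support` times into a regex classifier."""
    counts = Counter(template_of(m) for m in messages)
    # most_common() orders classifiers so that frequent templates are tried first.
    return {
        template_label(t): template_to_regex(t)
        for t, n in counts.most_common()
        if n >= min_support
    }


def classify(message, classifiers):
    """Return the label of the first matching classifier, or None if no class fits."""
    text = message.strip()
    for label, regex in classifiers.items():
        if regex.fullmatch(text):
            return label
    return None


# Hypothetical sample messages, for illustration only.
sample = [
    "Connection closed by 10.0.0.1 port 5022",
    "Connection closed by 10.0.0.7 port 6133",
    "Connection closed by 192.168.1.4 port 40111",
    "CPU 3 temperature above threshold",
]
classifiers = build_classifiers(sample, min_support=3)
label = classify("Connection closed by 172.16.0.9 port 22", classifiers)
# label == "Connection closed by <*> port <*>"
```

Here the rare "CPU ..." message stays unclassified because it occurs only once, which mirrors the abstract's point that repetition is what makes a class reliable. The digit heuristic is crude; a real system would need a better rule for separating constant and variable tokens.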
Preliminary results from analyzing one month of system logs from a production high-performance computer indicate very high classification accuracy.
The classification accuracy is directly related to the amount of available system logs.
The classifiers are updated dynamically as new log entries arrive.
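As an illustration of how classifiers could be maintained incrementally, here is a minimal Python sketch under stated assumptions; it is not the update procedure of the study, which the abstract does not describe. The hypothetical `OnlineLogClassifier` counts the template of each unmatched entry (using the assumption that digit-bearing tokens are variable fields) and promotes a template to a regex classifier once it has been seen `min_support` times.

```python
import re
from collections import Counter


class OnlineLogClassifier:
    """Hypothetical incremental classifier: a regex is added once a template is frequent."""

    def __init__(self, min_support=3):
        self.min_support = min_support
        self.counts = Counter()   # template -> occurrences not yet covered by a classifier
        self.classifiers = {}     # label -> compiled regex

    @staticmethod
    def _template(message):
        # Assumption: tokens containing a digit are variable fields (None).
        return tuple(None if any(ch.isdigit() for ch in tok) else tok for tok in message.split())

    def classify(self, message):
        text = message.strip()
        for label, regex in self.classifiers.items():
            if regex.fullmatch(text):
                return label
        return None

    def update(self, message):
        """Classify a new entry; if unmatched, count its template and promote it when frequent."""
        label = self.classify(message)
        if label is not None:
            return label
        template = self._template(message)
        self.counts[template] += 1
        if self.counts[template] < self.min_support:
            return None
        # The template is now frequent enough: build its regex and stop counting it.
        label = " ".join("<*>" if tok is None else tok for tok in template)
        parts = [r"\S+" if tok is None else re.escape(tok) for tok in template]
        self.classifiers[label] = re.compile(r"\s+".join(parts))
        del self.counts[template]
        return label
```

With `min_support=2`, the first "Connection closed by 10.0.0.1 port 5022" entry returns None, the second similar entry creates the class "Connection closed by <*> port <*>", and later matching entries are labeled directly. The sketch never forgets old counts or retires classifiers; a production system would likely need both.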