Optimal Security Alerts: Specific, Relevant, Actionable, Scalable
Author: Luke Wolcott
Luke Wolcott is a Senior Data Scientist at ActZero.ai. He loves trying to understand complicated systems, and finds computer networks and their users to be sufficiently complicated. Prior to ActZero, Luke did data science in healthcare tech, and taught math and wrote papers in academia. He has a PhD in algebraic topology from the University of Washington, and a BA from Swarthmore College.
Part of my job as a Data Scientist at ActZero is to create smart high-fidelity detections, with fancy machine learning or with straightforward analytic rules. When these detections trigger, they usually send off an alert to one of our threat hunters, or directly to a customer’s IT point person. So I spend a lot of time thinking about alerts — not just when to send them, but what to include to make them most useful. ActZero’s detection alerts are designed and tested not only to accurately identify indicators of attack, but to ease alert fatigue and speed up investigation and response.
In this blog post, I will explain why not all alerts are created equal and what makes an alert informative and actionable, with the goal of reducing alert fatigue and enabling other positive security outcomes.
Alerts without context yield anxiety (especially at volume)
I’ll work through an example to show you what I mean. Imagine you are an IT admin in charge of keeping your network safe, and someone in your organization just used admin privileges to turn off Office 365’s default malware filter policy. You would want to know about it! This could be the early stage of an attack, with an attacker preparing to deliver malware soon.
The alert email from Microsoft looks something like this:
Now you’re super stressed out and anxious, and wondering what to do next. And you have tons of questions. Who turned off the policy? Is it another regular admin, or someone who recently had their privileges elevated? When did they log on, and from where? What else have they been up to? The links in the Microsoft alert email just send you to more details about the alert itself (such as whether it has been assigned to anyone for processing, or what MITRE ATT&CK category it covers), but don’t help you in your investigation or response efforts.
Proactive alerts: where the investigation (and response) has already begun
The alert messages that we send include useful details about the triggering event(s), as well as important contextual information and analytics to jumpstart the investigation and response. When a detection triggers, it initiates a quick analytics script that calculates useful data points to include in the alert. These data points answer the questions that you, a threat hunter, or a security analyst would (or should) ask upon receiving the alert.
In an effort to address such questions, ActZero alerts tell you useful stats about the user in question and their recent login activity: how many successful logins, how many failed logins, from how many different IPs, etc. We tell you about their admin status: has it been changed or been elevated recently? We tell you about their recent post-login activity: what Office 365 apps have they been using, with what frequency?
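To make the idea of login context concrete, here is a minimal sketch of how such statistics could be computed from a list of login events. It is not ActZero's actual analytics script: the `LoginEvent` fields, the `login_context` function, and the seven-day window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LoginEvent:
    # Hypothetical event schema; real Office 365 audit logs carry more fields.
    user: str
    timestamp: datetime
    ip: str
    success: bool


def login_context(events, user, now, window=timedelta(days=7)):
    """Summarize one user's recent login activity for inclusion in an alert."""
    # Keep only this user's events that fall inside the look-back window.
    recent = [
        e for e in events
        if e.user == user and now - window <= e.timestamp <= now
    ]
    return {
        "user": user,
        "successful_logins": sum(1 for e in recent if e.success),
        "failed_logins": sum(1 for e in recent if not e.success),
        "distinct_ips": len({e.ip for e in recent}),
    }
```

A dictionary like this can be rendered directly into the alert body, so the recipient sees the counts without having to query the logs first.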
How do we know this information is useful? Threat hunters told us so! They are experts “on the ground,” handling alerts and threats 24/7, so they know what investigative steps work and what changes would improve response times. The R&D team works closely with our threat hunters to design, test and iterate on the content of alert messages to make them practically applicable.
In addition to automating some of the investigative work, our alerts include a link to recommended remediation steps, so the recipient (whether a threat hunter or an IT stakeholder on the customer side) knows immediately what to do next and how to do it. These instructions have been crafted carefully by our security engineers and threat hunters to be maximally useful and efficient. They explain what the detection means, how to investigate whether the activity is malicious, the recommended step-by-step remediation, and additional precautions to consider.
Many critical detections require the immediate involvement of threat hunters when they are triggered. However, we have also identified a set of detections for which the remediation steps are so straightforward that the response can be completely automated. For a provider like us, the motivation is efficiency, but automation benefits anybody responding to threats by speeding up the response and reducing fatigue. In such cases, your alert email (if you get one at all) will inform you of the incident and tell you what steps have already been taken.
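The split between automated and human-handled detections can be sketched as a simple routing step. This is only an illustration under stated assumptions: the detection name in `AUTO_REMEDIABLE`, the `route_alert` function, and the `remediate` callback are hypothetical and do not describe ActZero's actual pipeline.

```python
# Hypothetical set of detections whose remediation is simple enough to automate.
AUTO_REMEDIABLE = {"example_auto_detection"}


def route_alert(detection_name, remediate):
    """Auto-remediate straightforward detections; escalate everything else.

    `remediate` is a caller-supplied function (an assumption of this sketch)
    that performs the fix and returns a list of the steps it took.
    """
    if detection_name in AUTO_REMEDIABLE:
        steps = remediate(detection_name)
        # The alert can then report what has already been done.
        return {"handled_by": "automation", "steps_taken": steps}
    # Anything not on the list goes to a human for investigation.
    return {"handled_by": "threat_hunter", "steps_taken": []}
```

Keeping the allow-list explicit means a detection is only automated after someone has decided its remediation is safe to run without review.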
Compare the alert from above with this alert email, sent out after we detect that a malware filter policy has been removed:
If you’re curious, the link to recommended remediation steps is here.
High-quality detections yield relevant alerts
The above apples-to-apples comparison used a very simple detection: malware policy removal detected in logs. We also use machine learning (ML) models trained on our large volume of data to capture more complex malicious behavior. The features that help the model decide whether behavior is sufficiently suspicious also help the human responder understand the situation and decide what to do next. So when our ML models catch a signal that is both high-fidelity (strongly indicative of an attack) and high-severity (high potential impact of the malicious action), the alert that is sent out includes feature values as contextual information.
For example, our Cloud Suspicious Login model uses login behavior to detect a suspicious login. It compares the details of a suspicious login to that user’s typical behavior: what type of device do they usually use to log in, and what type did they use this time? Some of those statistics are compiled and included in the alert message, to help paint a picture of what is going on, what looks suspicious, and what needs to be done in response. For a detailed look at user behavior in the context of logins, check out our post on o365 account takeover here.
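As a rough illustration of comparing a login to a user's typical behavior, the sketch below contrasts the device type of the current login with the user's history. It is not the Cloud Suspicious Login model itself: the `device_context` function and its output fields are assumptions, and a real model would weigh many more features.

```python
from collections import Counter


def device_context(history, current_device):
    """Compare the current login's device type with the user's past logins.

    `history` is a list of device-type strings from earlier logins
    (a hypothetical input format for this sketch).
    """
    counts = Counter(history)
    total = sum(counts.values())
    # Most frequently seen device type, or None if there is no history.
    usual = counts.most_common(1)[0][0] if counts else None
    # Fraction of past logins that used the same device type as this one.
    share = counts[current_device] / total if total else 0.0
    return {
        "usual_device": usual,
        "current_device": current_device,
        "current_device_share": share,
    }
```

A low `current_device_share` alongside a different `usual_device` gives the responder a quick, readable reason why the login stood out.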
I hope this post has helped you see why good detections deserve good alerts, and what steps you can take to achieve them. I also hope it has suggested potential applications, whether within your own IT department or at your security provider, for making alerts more conducive to (A) providing the right information to enable remediative action by IT teams, (B) maximizing the efficiency and efficacy of threat hunters to support investigations and encourage automation, and (C) ultimately reducing the alert fatigue that plagues so many IT and security stakeholders.
To see our high-quality alerts in action, you can request a demo of our service here. Or, for more on our machine-learning detections, check out this post on detecting account takeover in o365.