Cyber Security has a Machine Learning and Artificial Intelligence Problem


I recently finished a demo of a cloud security product that wouldn’t stop talking about the machine learning and AI in the product. Being the skeptical cyber security engineer that I am, when I see the terms ML and AI used liberally I have to dig in and check whether they’ve actually built a new learning algorithm or whether they’re just relabeling old technology with new words.

The answer: nope. Cyber security companies are full of bullshit when they use the terms ML and AI.

Use Case #1: A lot of cloud instances are spinning up. Compared to a previous 30-day baseline, the product determines this is an anomaly.

Bullshit Meter: HIGH

My problem with touting the power of your ML in this use case: this is the same use case that was solved 20 years ago with security baselines. It’s not a new use case or a new algorithm. It’s merely mapping the raw data to the same detection method the security space was using two decades ago.
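To make the point concrete, here’s a minimal sketch of what this “ML-powered” anomaly detection usually reduces to: a static threshold over a rolling baseline. The function name, counts, and the 3-sigma threshold are all hypothetical, not taken from any vendor’s product.

```python
# Hypothetical sketch: "anomaly detection" as a simple threshold
# over a 30-day baseline. No learning happens here.
from statistics import mean, stdev

def is_anomalous(daily_counts, today_count, sigmas=3):
    """Flag today's instance count if it exceeds the baseline
    mean by more than `sigmas` standard deviations."""
    baseline_mean = mean(daily_counts)
    baseline_std = stdev(daily_counts)
    return today_count > baseline_mean + sigmas * baseline_std

# 30 days of roughly stable instance counts, then a spike.
history = [10, 12, 11, 10, 13, 12, 11, 10, 12, 11] * 3
print(is_anomalous(history, 50))   # → True  (spike well above baseline)
print(is_anomalous(history, 12))   # → False (within normal range)
```

That’s a baseline comparison, the same math a security baseline used two decades ago, just with “ML” stamped on the marketing slide.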

Use Case #2: A user logs in daily from Santa Clara, CA. Today we see him logging in from Brazil.

Bullshit Meter: HIGH

Give me a break. Take your cheap MaxMind GeoIP data, map user1:location1, and set that as a static detection: if the current location doesn’t equal the previous location, fire an alert. This is not machine learning. This is not an advanced algorithm. This is not artificial intelligence. Can we please call this a geographic mismatch detection, or some other normal term, and not keep smearing the AI and ML peanut butter across everything?
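The whole detection fits in a dozen lines. A hedged sketch, with made-up user names and locations standing in for the GeoIP lookup:

```python
# Hypothetical sketch of the "impossible travel" detection:
# a static user -> last-location map and an inequality check.
last_known_location = {"user1": "Santa Clara, CA"}

def geo_mismatch_alert(user, login_location):
    """Fire an alert when the login location differs from the
    stored one, then update the stored value. No learning here."""
    previous = last_known_location.get(user)
    last_known_location[user] = login_location
    return previous is not None and previous != login_location

print(geo_mismatch_alert("user1", "Santa Clara, CA"))    # → False
print(geo_mismatch_alert("user1", "Sao Paulo, Brazil"))  # → True
```

A key-value lookup and a `!=` comparison. Calling that an algorithm at all is generous; calling it AI is marketing.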

Use Case #3: Automation using a Lambda function that’s kicked off when an alert fires or a user clicks a button.

Bullshit Meter: HIGH

My problem: a triggered script is not “artificial intelligence”. It’s a fucking script! It’s running a triggered script, my lord, people. You wrote a Lambda function that terminates an EC2 instance. This is not artificial intelligence. There is no neural network; there is no learning happening. You hit a correlation condition and then ran your script.
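Stripped of the branding, the “automation AI” is an if-statement and a function call. A minimal sketch, where the alert fields and the terminate action are hypothetical stand-ins for the real Lambda handler and EC2 API call:

```python
# Hypothetical sketch of Use Case #3: alert matches a condition,
# script runs. That's the entire "AI".
terminated = []

def terminate_instance(instance_id):
    # Stand-in for the real call, e.g.
    # ec2.terminate_instances(InstanceIds=[instance_id]) via boto3.
    terminated.append(instance_id)

def handler(alert):
    """Triggered 'intelligence': if the correlation condition
    matched, run the script."""
    if alert.get("severity") == "critical" and alert.get("instance_id"):
        terminate_instance(alert["instance_id"])
        return "terminated"
    return "ignored"

print(handler({"severity": "critical", "instance_id": "i-0abc123"}))  # → terminated
print(handler({"severity": "low"}))                                   # → ignored
```

One condition, one side effect. Useful automation, sure, but there is nothing in there that learns.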

Conclusion: ML and AI in cyber security still rate HIGH on the bullshit meter. If a company’s ML and AI are the leading terms of their demo, show me the AI and ML. It shouldn’t be so hard to demonstrate the real AI and ML you’re so proud of. The use cases above are all they can demonstrate. The complete strength of their AI and ML is encompassed in those lousy use cases.

Find me a product that uses a clustering algorithm to group truly related assets. Now describe your detection. And don’t bullshit me with a correlation rule that you’re applying a scoring meter to.
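For contrast, here’s a hedged sketch of what actual clustering of assets might look like: grouping hosts whose behavior vectors sit close together, with no hand-written correlation rule. The asset names, feature vectors, and distance threshold are all invented for illustration; this is a toy greedy single-linkage pass, not any vendor’s method.

```python
# Hypothetical sketch: group assets by proximity of their behavior
# vectors (e.g. requests/min, admin logins/day), stdlib only.
import math

def cluster_assets(assets, max_distance=2.0):
    """Greedy single-linkage clustering: an asset joins a cluster
    if it is within max_distance of any existing member."""
    clusters = []
    for name, vec in assets.items():
        for cluster in clusters:
            if any(math.dist(vec, assets[m]) <= max_distance for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

assets = {
    "web-1": (100.0, 2.0),   # high traffic, few admin logins
    "web-2": (98.0, 1.0),
    "db-1": (5.0, 30.0),     # low traffic, many admin logins
    "db-2": (6.0, 29.0),
}
print(cluster_assets(assets, max_distance=5.0))
# → [['web-1', 'web-2'], ['db-1', 'db-2']]
```

The groups fall out of the data itself, not out of a rule someone wrote. That’s the bar: show me detections built on structure the product discovered, not a scored correlation rule wearing an ML costume.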

No security vendors are doing anything new or novel with ML or AI. They’re just wrapping fancy words around correlations, thresholds, and triggered scripts. Find me a company that’s using a learning algorithm and will demonstrate its learning capability in front of your eyes. During the demo, show me the product learning without a human classifier doing all the work. And no stupid botnet domain-name learning use cases either; that was new ten years ago.