
A Lazy Introduction to AI for InfoSec

Artificial Intelligence is no match for Natural Stupidity

What is AI in the first place?

The simulation of human intelligence by computers is called AI. This simulation includes learning, understanding, logical reasoning, and improvisation. AI built to perform only a specific set of tasks is called Artificial Narrow Intelligence, while AI capable of self-correcting and making decisions much like a human is called Artificial General Intelligence. Real-world examples of Artificial Narrow Intelligence include Siri, Google Home, Alexa, IBM Watson, and Microsoft Cognitive Services.

Key Insights:

  • 61% of enterprises say they cannot detect breach attempts today without the use of AI technologies
  • 64% say that AI lowers the cost to detect and respond to breaches and reduces the overall time taken by 12%
  • 73% of enterprises are testing use cases for AI for cybersecurity across their organizations

    Positive Consequences:

  • Securing access: Integrating and unifying security solutions into a cohesive physical security management system
  • Fraud detection: Identifying possible predictors of fraud from the known actions of past fraudsters
  • Malware detection: Determining the functionality, origin, and potential impact of a given malware sample
  • Intrusion detection: Detecting vulnerability exploits against a target application or computer
  • Scoring risk in a network: Identifying and quantifying cyber-risk, which is essential for effective risk prioritization
  • User and entity behavior analytics: Applying algorithms and statistical analysis to detect meaningful anomalies in patterns of human behavior
  • Authentication with keystroke dynamics: Using the detailed timing of exactly when each key is pressed and released as a person types on a keyboard (a minimal feature-extraction sketch follows this list)
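
    As a rough illustration of the keystroke-dynamics idea, here is a minimal Python sketch that turns per-key press/release timestamps into the two classic timing features, dwell time and flight time. The sample events and field names are made up for illustration; a real system would capture timestamps from a keyboard hook and feed the resulting features to a classifier trained on the user's enrolled typing profile.

        # Toy keystroke-dynamics feature extraction (illustrative data only).
        # Dwell time  = how long each key is held down.
        # Flight time = gap between releasing one key and pressing the next.
        events = [
            {"key": "p", "press": 0.000, "release": 0.090},
            {"key": "a", "press": 0.140, "release": 0.210},
            {"key": "s", "press": 0.260, "release": 0.350},
            {"key": "s", "press": 0.420, "release": 0.500},
        ]

        dwell_times = [e["release"] - e["press"] for e in events]
        flight_times = [
            events[i + 1]["press"] - events[i]["release"] for i in range(len(events) - 1)
        ]

        # These timing vectors are the features an authentication model would
        # compare against a user's enrolled typing profile.
        print("dwell:", [round(t, 3) for t in dwell_times])
        print("flight:", [round(t, 3) for t in flight_times])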

    Negative Consequences:

  • Modeling Password Guessability Using Neural Networks: Neural networks can often guess passwords more effectively than state-of-the-art approaches
  • Weaponized drones: Machines that attack on their own
  • Poisoning of deep learning algorithms: A coordinated attack in which a fraction of the training data is controlled by the attacker and manipulated to subvert the learning process
  • One-pixel attack for fooling deep neural networks: The mapping learned by a DNN is highly sensitive to tiny perturbations of the input vector, so adversarial images can be crafted by modifying just a single pixel (see the sketch after this list)
  • Poisoning Behavioral Malware Clustering: Clustering algorithms can be significantly compromised if an attacker can exercise some control over the input data
  • AI-powered cyberattacks detected in the wild: Attacks using rudimentary machine learning to observe and learn patterns of normal user behavior inside a network
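
    To make the one-pixel attack idea concrete, here is a small Python sketch that randomly perturbs a single pixel of an image and checks whether a simple classifier's prediction flips. It uses scikit-learn's 8x8 digits dataset and a basic MLP as a stand-in model; the actual one-pixel attack uses differential evolution against deep convolutional networks, so treat this purely as an illustration of the idea.

        # Random-search sketch of a one-pixel perturbation (not the real attack).
        import numpy as np
        from sklearn.datasets import load_digits
        from sklearn.neural_network import MLPClassifier

        digits = load_digits()
        X, y = digits.data / 16.0, digits.target           # 8x8 images, 64 features
        model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
        model.fit(X[:1500], y[:1500])

        image = X[1500].copy()
        baseline = model.predict(image.reshape(1, -1))[0]  # model's original answer
        rng = np.random.default_rng(0)

        for _ in range(200):                               # try 200 single-pixel edits
            candidate = image.copy()
            pixel = rng.integers(0, candidate.size)
            candidate[pixel] = rng.uniform(0.0, 1.0)       # overwrite one pixel
            pred = model.predict(candidate.reshape(1, -1))[0]
            if pred != baseline:
                print(f"Changing pixel {pixel} flips the prediction {baseline} -> {pred}")
                break
        else:
            print("No single-pixel flip found within this search budget")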


    Why do we need private ML algorithms?

    Machine learning algorithms work by studying large amounts of data and updating their parameters to capture the patterns in that data. Ideally, we want the parameters of a machine learning model to encode general patterns ("patients who smoke are more likely to have heart disease") rather than facts about specific training examples ("Alice Parker has heart disease"). Unfortunately, algorithms do not learn to ignore these specifics by default. If we use machine learning for a crucial task, such as building a cancer-diagnosis model, then publishing that model (for example, as an open-source diagnosis model for doctors around the globe to use) may inadvertently reveal information about the training set. A malicious attacker could inspect the published model and learn private information about Alice Parker. This is where differential privacy comes in.

    Differential privacy makes it possible for tech companies to collect and share aggregate information about user habits while protecting the privacy of individual users. It is a framework for measuring the privacy guarantees provided by an algorithm. One key building block is a family of algorithms called Private Aggregation of Teacher Ensembles (PATE), in which several "teacher" models trained on disjoint partitions of the sensitive data vote on each label and noise is added to their aggregated votes (a toy sketch of this aggregation step follows). OpenMined is an open-source community whose goal is to make the world more privacy-preserving by lowering the barrier to entry to private AI technologies. With OpenMined, an AI model can be governed by multiple owners and trained securely on an unseen, distributed dataset.
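
    The following is a toy Python sketch of that PATE-style noisy aggregation step: a handful of teacher votes are tallied and Laplace noise is added before taking the arg-max. The teacher predictions and the epsilon-to-noise mapping are simplified placeholders; real PATE trains actual teacher models on disjoint sensitive partitions and tracks the privacy budget far more carefully.

        # Simplified PATE-style noisy vote aggregation (illustrative only).
        import numpy as np

        def noisy_aggregate(teacher_predictions, num_classes, epsilon, rng):
            """Return the label with the highest noisy vote count."""
            votes = np.bincount(teacher_predictions, minlength=num_classes)
            noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=num_classes)
            return int(np.argmax(votes + noise))

        rng = np.random.default_rng(42)
        teacher_predictions = np.array([3, 3, 3, 1, 3, 3, 2, 3, 3, 3])  # 10 teachers' labels
        label = noisy_aggregate(teacher_predictions, num_classes=10, epsilon=0.5, rng=rng)
        print("Privacy-preserving aggregated label:", label)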


    Now the BIG questions ahead are:

  • How to make the most of AI for cybersecurity?
  • What is ethical AI?
  • Is there any scope for privacy in the future?

    Harnessing the power of AI can open up endless possibilities in cyber. But respecting data privacy and setting standards for how data is used should be given priority.

    Although it's a lazy intro, if it made any sense, let's start caring about our data. Stay tuned for The Rajappan Project.