
Understanding AI security: Machine learning system vulnerabilities

Machine learning (ML) systems – the technology behind most modern AI – can be vulnerable to a range of security threats. Unlike traditional software, where security issues typically live in the code, machine learning vulnerabilities often stem from the data, the learning process, or the mathematical properties of the models themselves.

Understanding these vulnerabilities is essential for anyone building, deploying, or relying on AI systems. This guide walks through the key security concerns at each stage of the machine learning lifecycle.

How machine learning systems work

To understand machine learning security, it helps to distinguish between two stages of a machine learning system's lifecycle:

Training: Building the model

Training is typically a one-time process that combines three elements—training data, model architecture, and a training algorithm—to produce the trained model. Any issues introduced here become permanently baked into the system.
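As a toy sketch (a hypothetical illustration, not drawn from any real system), the three ingredients can be shown with a one-parameter linear model trained by gradient descent:

```python
# Toy sketch of the three training ingredients producing a trained model.
# Real systems use frameworks such as PyTorch or TensorFlow; this is
# deliberately minimal.

# 1. Training data: the examples the model learns from
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # pairs (x, y) where y = 2x

# 2. Model architecture: here, a single-parameter linear model y = w * x
def predict(w, x):
    return w * x

# 3. Training algorithm: gradient descent on mean squared error
def train(data, lr=0.05, steps=200):
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (predict(w, x) - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w  # the trained model: just the learned weight

w = train(data)
print(round(w, 2))  # converges to w ≈ 2.0
```

Anything wrong with `data` at this stage — mislabeled, biased, or maliciously crafted examples — is absorbed into `w` and carried into every later prediction.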

Inference: Using the model

Once deployed, the trained model processes new data and makes predictions. This is when users interact with the system—and when attackers can probe its weaknesses.
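A minimal sketch of the inference stage, assuming a trained linear model whose learned weight is already frozen (the weight value here is hypothetical):

```python
# Toy inference sketch: at deployment the model's parameters are fixed;
# only new inputs flow through it.

TRAINED_WEIGHT = 2.0  # produced earlier by training; frozen at deployment

def predict(x):
    """Inference: apply the trained model to new data."""
    return TRAINED_WEIGHT * x

print(predict(5.0))  # → 10.0
```

Because the parameters no longer change, an attacker at this stage cannot alter the model directly — but they can probe it freely through its inputs and outputs, which is what the inference-time attacks below exploit.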


The key components:

  • Training data: the examples the model learns from
  • Model architecture: the mathematical structure of the model
  • Training algorithm: the optimization process that fits the model
  • Trained model: the output of training, used in inference

Training security concerns

Vulnerabilities introduced during training become permanent properties of the model.

  • Data poisoning
    Manipulated training data introduces errors or hidden vulnerabilities.
  • Backdoor attacks
    Hidden triggers embedded during training cause targeted failures.
  • Inherited bias
    Biased data or architecture choices perpetuate harmful patterns.
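To make data poisoning concrete, here is a hedged toy example (hypothetical data, not a real attack): injecting a single mislabeled point into the training set of a simple nearest-centroid classifier shifts its decision boundary.

```python
# Toy data-poisoning sketch: one mislabeled training point moves the
# learned decision boundary of a 1-D nearest-centroid classifier.

def train_centroids(data):
    """Learn one centroid (mean) per class from labeled 1-D points."""
    sums, counts = {}, {}
    for x, label in data:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, x):
    """Predict the class whose centroid is nearest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

clean = [(0.0, "A"), (1.0, "A"), (9.0, "B"), (10.0, "B")]
poisoned = clean + [(9.5, "A")]  # attacker injects one mislabeled point

print(classify(train_centroids(clean), 6.0))     # "B"
print(classify(train_centroids(poisoned), 6.0))  # "A" — boundary has moved
```

The poisoned point drags class A's centroid from 0.5 to 3.5, so inputs near 6.0 flip from "B" to "A". Real poisoning attacks are subtler, but the mechanism — corrupt data becoming corrupt parameters — is the same.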
At inference time, data flows through a simple pipeline:

  • Input: new data fed to the model
  • Trained model: the output of training, used in inference
  • Prediction: the model's output or decision

Inference security concerns

Deployed models face attacks through their inputs and outputs.

  • Adversarial inputs
    Carefully crafted inputs exploit model weaknesses, e.g., prompt injection.
  • Training data privacy leakage
    Sensitive training data can be extracted from the model.
  • Inference data privacy leakage
    User inputs or model outputs at inference time are exposed to parties who shouldn't see them.
  • Model extraction
    Attackers collect input/output pairs to build a functional replica of the model.
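A hedged toy sketch of model extraction (hypothetical "secret" parameters, chosen for illustration): if a deployed model happens to be linear, an attacker can recover it exactly from just two queries.

```python
# Toy model-extraction sketch: an attacker reconstructs a hidden linear
# model purely from black-box queries.

def victim_model(x):
    """Deployed black box; the attacker sees only inputs and outputs."""
    return 3.0 * x + 1.0  # secret weight and bias

# The attacker queries the black box at two points...
x0, x1 = 0.0, 1.0
y0, y1 = victim_model(x0), victim_model(x1)

# ...and solves for the parameters of a surrogate model.
stolen_weight = (y1 - y0) / (x1 - x0)
stolen_bias = y0 - stolen_weight * x0

def stolen_model(x):
    return stolen_weight * x + stolen_bias

print(stolen_model(7.0) == victim_model(7.0))  # the replica matches
```

Real models need far more queries and only yield approximate replicas, but the principle holds: every answered query leaks information about the parameters, which is why deployed models often rate-limit access or add noise to outputs.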
