👨‍💻
f3dai blog
  • 🧑about
  • Articles
    • ✨Artificial Intelligence
      • Using Gemini to query MITRE ATT&CK
      • Mapping AI safety regulation
      • Poisoning Models
      • Threat modelling generative AI
      • o1 coding capabilities
      • Multi-agent adversarial AI systems
      • Deep reinforcement learning for red teaming
    • ⚙️ICS / OT
      • Consequence-driven Cyber-informed Engineering (CCE)
      • Energy plant cyber simulation
      • OT threat landscape
    • ☁️Cyber engineering
      • Building a cyber lab
        • 1️⃣Design
        • 2️⃣Deploy
        • 3️⃣Test
        • 4️⃣Automating
      • Threat modelling
      • Automating incident response
    • 🚩Capture The Flag
      • Hackthebox - Golfer - Reversing
      • Hackthebox - Behind the Scenes - Reversing
      • Hackthebox - Bypass - Reversing
      • Harder - TryHackMe Walkthrough
    • 🎓Career
      • Domains and roles
Powered by GitBook
On this page
  • How do you train a model?
  • Poison and backdoor a model
  • Mitigations
  • Useful resources

Was this helpful?

  1. Articles
  2. Artificial Intelligence

Poisoning Models

Data poisoning is a type of cyberattack in which an adversary intentionally compromises a training dataset used by an AI or machine learning (ML) model to influence or manipulate the operation of that

PreviousMapping AI safety regulationNextThreat modelling generative AI

Last updated 9 months ago

Was this helpful?

State of threats to AI systems

Microsoft's Mark Russinovich hosted a talk about AI security and talked about the different threats to AI systems, specifically GenAI apps to date. This is a good representation:

A lot of threats in the cyber space focus on attacks once a application or infrastructure has been deployed / developed. However in the AI space, there's high importance on confidentiality and integrity during the model training phase. A compromise in the training phase could lead to techniques such as backdoors and poisoning.

MITRE have also done a great job at creating their attack matrix:

How do you train a model?

According to Gemini, AI model training can be described in simple terms as:

Teaching a child to recognise a cat. You'd show them lots of pictures of cats, pointing out their features. Over time, the child learns to identify cats based on what they've seen.

Instead of pictures, we feed computers vast amounts of data. This data could be text, images, or numbers, depending on what we want the AI to learn. Models will analyse this data, looking for patterns.

AI model training involves data collection and preparation, model selection and architecture (such as choosing which algorithm - linear regression, decision trees, neural networks, etc), the training process (optimisation, loss function, epochs and batches), and evaluation. Tools may include Python libraries such as TensorFlow, PyTorch, Keras. Hadoop, Spark for handling large datasets. Matplotlib, Seaborn for understanding data and model performance.

Read this article before you start using random models!

Poison and backdoor a model

Adversaries may attempt to poison datasets used by a ML model by modifying the underlying data or its labels. This allows the adversary to embed vulnerabilities in ML models trained on the data that may not be easily detectable.

The embedded vulnerability is activated at a later time by data samples. The trigger may be imperceptible or non-obvious to humans. This allows the adversary to produce their desired effect in the target model.

A model is being developed on Wikipedia data. An adversary is aware of the data source being used and plants poisoned data.

A Wikipedia page is modified maliciously and is planted into the model whilst training. The adversary reverts the changes to cover their tracks.

An adversary poisons just 1% of instruction tuning data, which leads to a Performance Drop Rate (PDR) of around 80%.

This research highlights the need for stronger defences against data poisoning attack, offering insights into safeguarding LLMs against these more sophisticated attacks.

Mitigations

  • Data validation:

    • Obtain data from trusted sources.

    • Validate data quality

  • Data Sanitisation and pre-processing

    • Pre-process data by removing irrelevant, redundant, or potentially harmful information that can hinder the LLM's learning effectiveness / output.

    • Quality filtering (classifier-based filtering to help distinguish between high and low-quality content)

    • De-duplication

    • Privacy redaction (PII)

  • AI Red Teaming

    • Regular reviews, audits, and proactive testing strategies constitutes an effective red teaming framework.

  • AI "SecOps"

    • Take on a DevSecOps approach by integrating security into the model training pipeline / process.

Useful resources

If you're interested in using the models without having to train them, there are lots of pre trained, open-source models available on Hugging Face:

This gif illustrates the kind of data poisoning attack on AI Model. It basically shows how the alphas or weights are influenced by the new training samples which the model uses to update itself.
https://huggingface.co/
✨
https://github.com/dahmansphi/attackai
GitHub - RookieZxy/GBTL-attackGitHub
GitHub - JonasGeiping/data-poisoning: Implementations of data poisoning attacks against neural networks and related defenses.GitHub
Logo
Logo
GenAI Threats -
Threats mapped to AI attack vectors -
ATLAS Matrix -
https://www.youtube.com/watch?v=f0MDjS9-dNw
https://www.youtube.com/watch?v=f0MDjS9-dNw
https://atlas.mitre.org/matrices/ATLAS
https://github.com/RookieZxy/GBTL-attack/blob/main/README.md
Page cover image