AI-Powered Honeypots
Honeypots are decoy computing systems that mimic real environments to trick attackers into revealing their tools. Can AI enhance honeypot deception to increase cybersecurity?
Motivation
Any system—whether a home, a safe, or a network of computing devices—can be breached with enough time, effort, and resources. Securing a system involves creating asymmetry: making the effort required to successfully attack it so great that it is not worth the attempt. Cybersecurity is the practice of ensuring the confidentiality, integrity, and availability of networked computing resources.
For defending against known attack methods, existing cybersecurity tools are quite effective. For instance, signatures of known attack patterns are used in automated detection and prevention technologies. This trend is further strengthened by the established and automated dissemination of cyber intelligence. However, detecting and preventing zero-day attacks—novel, never-before-seen exploits—is often not possible.
As a result, serious hacker groups must develop custom, novel attack procedures, a costly process. These valuable zero-day tools become obsolete once discovered and patched. This makes attackers hesitant to deploy expensive techniques for fear of exposure, saving them for carefully selected targets. This is where cyber deception technologies come in: by simulating highly attractive, high-stakes targets, they incentivize attackers to “spend” their zero-day exploits, thereby increasing the cost of hacking.
Honeypots are decoy computing systems designed to mimic real environments and lure attackers into revealing their presence or tools. This project focuses on empowering honeypots with AI in order to gain threat intelligence. Since any activity on a honeypot is likely from an attacker, it allows direct observation of their techniques, tactics, and procedures, helping to adapt defenses proactively.
Background
The success of a honeypot depends on its ability to attract and convincingly deceive attackers. The honeypot’s configuration—including its network position, services, implemented security tools, and contents—can give away its nature as a decoy if not carefully managed.
This project aims to use AI to enhance honeypot positioning and realism, maximizing the collection of threat intelligence. With continuous data from the honeypot, AI can be used to dynamically update its configuration, optimizing effectiveness.
The advent of multipurpose large language models (LLMs) presents an exciting opportunity for improving honeypot interactions. For instance, could an LLM dynamically adjust honeypot behavior to entice attackers into exposing their tools? While some research has explored LLM-powered honeypots, current efforts are still in their infancy and provide limited advancements toward holistic honeypot deployment. So far, research has only used LLMs at the shell level, overlooking other critical aspects like network posture and available services, which are key to creating a convincing honeypot.
Challenges
The core research challenge of this project is determining which AI techniques to use—and how to apply them—to advance honeypot systems toward greater realism and threat intelligence.
Large language models (LLMs) may be ineffective for certain aspects of honeypots, or even all aspects, and they come with their own set of challenges. Research is needed to understand how and when to use LLMs to enhance honeypot functionality.
Designing and engineering a system that is both configurable and self-adapting is another key challenge. Building an architecture that supports scalable deployment is essential for broad data collection. Additionally, the deployed honeypot system must incorporate a learning feedback loop, enabling it to adapt over time.
A critical component of the system’s learning process and ongoing evaluation is the ability to measure the success of the honeypot quantitatively. Developing metrics of success is part of the project.
Using federated learning in this application presents unique challenges, particularly in ensuring the privacy of network data and preventing over-generalization of instance-specific details.
Project purpose
Our research aims to develop an AI-enhanced honeypot system with the following goals:
- Learning: The honeypot system should improve its ability to gather threat intelligence over time. It must continuously learn from the data it collects and update itself accordingly.
- Federated Learning: A key objective is to enable federated systems of honeypots to learn from each other without leaking sensitive, network-specific data. Private learning techniques may be required to safeguard this sensitive information. The goal is to empower diverse, often non-collaborative security teams to collectively enhance their threat intelligence without compromising their individual security.
- Shapeshifting: Honeypots simulating different environments—such as web servers hosting websites or automated tank gauge servers monitoring physical measurements—must adapt to the specific needs of each setting. Ideally, limited domain expertise will be needed to deploy the honeypot system in niche environments, as it should learn over time what works and what doesn’t in each context.
Our expected outcomes
- An open-source AI-enhanced honeypot system at TRL 4 (a tested prototype). The system should have exhibited success in ideally diverse system deployments (e.g., a web server, an edge device used in the automotive sector, healthcare device). Based on our metrics of success, the system should be tested head-to-head with the state-of-the-art and show substantial advancement.
- Academic publications and presentations disseminating the results
Facts
Funding: Vinnova
Total project budget: 4 028 600 SEK
Project period: June 2024 - June 2026
Participants: AI Sweden, Volvo Group, Aixia AB, Västra Götalandsregionen (VGR), Scaleout Systems AB and Dakota State University