Artificial intelligence has been part of cybersecurity for years, but almost exclusively on the defensive side. Blue team tools such as intrusion detection systems, SIEM platforms, and anomaly detection engines have long relied on machine learning to search through massive volumes of logs, spot unusual patterns, and flag potential threats faster than any human analyst could. These systems excel at recognizing deviations from normal behaviour, clustering suspicious events, and correlating signals across distributed environments.
For a long time, this created an asymmetry: defenders had AI-enhanced visibility, while attackers and red team operators still relied primarily on manual techniques, custom scripts, and human intuition. Recent advances in Large Language Models (LLMs) and agentic AI systems have started to change this. They give red team operators new ways to reason about environments, automate complex tasks, and explore attack paths with unprecedented speed. What began as a defensive advantage is becoming a shared capability. Attackers can integrate it into their operations to accelerate their workflows, with the potential to outpace both purely manual operations and the detections tuned to them.
An example of this shift appeared in a 2024 study published by Springer in the International Journal of Information Security (Hilario et al., 2024), in which researchers used ChatGPT 3.5 to help compromise a vulnerable machine from VulnHub. The AI was not acting as a simple chatbot. It behaved more like an agentic assistant capable of interpreting scan results, proposing next steps, and generating exploitation ideas. It suggested reconnaissance commands, helped interpret Nmap output, and even drafted a full penetration testing report. The researchers noted that the model occasionally hallucinated or misread data, but with human oversight the workflow became significantly faster and more structured. The experiment demonstrated how AI can support offensive reasoning in much the same way that blue team systems support defensive detection.
At the same time, the limitations of AI in offensive security are becoming clearer. The same study that showcased ChatGPT's strengths also highlighted its weaknesses: hallucinated vulnerabilities, misinterpreted scan results, and commands that did not exist. These issues reflect broader concerns in AI-for-security research, where reliability and explainability remain major challenges. Agentic systems can propose creative exploitation ideas, but they do not truly understand the environment they are operating in. Human judgment remains essential to validate outputs, correct errors, and ensure that ethical boundaries are respected.
AI's main strengths in penetration testing can be summarized as follows:
- Speed and Scalability
AI can process logs, scan outputs, and documentation far faster than humans. Tasks like reconnaissance and enumeration become dramatically more efficient.
- Pattern Recognition
Machine learning models excel at spotting anomalies or correlations that might be missed by manual analysis.
- Creativity in Attack Paths
LLMs can propose unconventional exploitation ideas, payload variations, or privilege escalation techniques that spark new lines of thinking.
- Reporting and Documentation
AI-generated reports are surprisingly coherent, consistent, and detailed. This saves considerable time for pentesters, who must fit the reporting phase into every test and attack.
The main weaknesses identified are the following:
- Hallucinations and Inaccuracies
LLMs sometimes invent vulnerabilities, misinterpret scan results, or propose impossible commands.
- Lack of Contextual Awareness
AI may not fully understand the nuances of a specific environment, leading to irrelevant or unsafe suggestions.
- Ethical and Security Risks
Uncontrolled AI tools could be misused, leak sensitive data, or generate harmful content if not properly governed.
- Limited Real-World Autonomy
Despite progress, AI cannot yet autonomously conduct a full, reliable penetration test without human oversight.
The rise of agentic AI is accelerating this trend. These models can chain tasks together, maintain context across long interactions, and adapt their suggestions based on evolving information. In penetration testing, this means an AI system can help map a network, analyze misconfigurations, propose exploitation paths, and refine its approach as new data emerges. While these systems are not autonomous attackers, they provide a level of strategic assistance that was previously unavailable to red teams.
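The following Python sketch makes the idea of chaining steps while keeping context more concrete. It shows one way an assistant loop might carry memory from earlier steps and screen model-suggested actions before a human approves them. It is a minimal illustration under stated assumptions. It is not CyberAId's design or any existing tool's API. The `Observation`, `Proposal`, and `AgentContext` classes, the `validate` and `step` functions, and the `ALLOWED_TOOLS` allowlist are all hypothetical. No model is called: proposals are passed in as plain data, as if already parsed from an LLM response. The validation step targets two of the weaknesses listed above, namely invented tools and suggestions that contradict the scan data. The `approve` callback keeps a human operator in the loop.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical allowlist of tools actually installed in the test environment.
ALLOWED_TOOLS = {"nmap", "curl", "smbclient"}


@dataclass
class Observation:
    """Facts confirmed by a real tool run (e.g. parsed scan output)."""
    host: str
    open_ports: set[int]


@dataclass
class Proposal:
    """One next step suggested by the model (illustrative fields only)."""
    tool: str
    target: str
    port: int | None
    rationale: str


@dataclass
class AgentContext:
    """Memory carried across steps so later suggestions are checked against earlier results."""
    observations: dict[str, Observation] = field(default_factory=dict)
    history: list[Proposal] = field(default_factory=list)

    def record(self, obs: Observation) -> None:
        # Merge new evidence into what is already known about the host.
        known = self.observations.get(obs.host)
        if known is None:
            self.observations[obs.host] = Observation(obs.host, set(obs.open_ports))
        else:
            known.open_ports |= obs.open_ports


def validate(p: Proposal, ctx: AgentContext) -> list[str]:
    """Return reasons to reject a proposal; an empty list means it is grounded in context."""
    issues = []
    if p.tool not in ALLOWED_TOOLS:
        issues.append(f"tool not available: {p.tool}")
    obs = ctx.observations.get(p.target)
    if obs is None:
        issues.append(f"target never observed: {p.target}")
    elif p.port is not None and p.port not in obs.open_ports:
        issues.append(f"port {p.port} not seen open on {p.target}")
    if p in ctx.history:
        issues.append("repeats an earlier step")
    return issues


def step(
    ctx: AgentContext,
    proposals: list[Proposal],
    approve: Callable[[Proposal], bool],
) -> tuple[list[Proposal], list[tuple[Proposal, list[str]]]]:
    """Filter model proposals against context, then ask a human to approve the survivors."""
    accepted, rejected = [], []
    for p in proposals:
        issues = validate(p, ctx)
        # Only proposals that pass validation are shown to the operator.
        if not issues and not approve(p):
            issues = ["declined by operator"]
        if issues:
            rejected.append((p, issues))
        else:
            ctx.history.append(p)
            accepted.append(p)
    return accepted, rejected


if __name__ == "__main__":
    ctx = AgentContext()
    ctx.record(Observation("10.0.0.5", {22, 80}))
    proposals = [
        Proposal("curl", "10.0.0.5", 80, "inspect the web server headers"),
        Proposal("curl", "10.0.0.5", 8080, "model assumed a proxy port"),
        Proposal("autopwn", "10.0.0.5", 22, "tool that does not exist"),
    ]
    accepted, rejected = step(ctx, proposals, approve=lambda p: True)
    for p in accepted:
        print("accepted:", p.tool, p.target, p.port)
    for p, issues in rejected:
        print("rejected:", p.tool, p.port, "->", "; ".join(issues))
```

In this example, a scan has already confirmed ports 22 and 80 on a host. A suggestion to query port 8080 is rejected because the stored observations do not support it. A suggestion naming a tool outside the allowlist is rejected before the operator is ever asked. Keeping each rejected item together with its reasons also gives the operator a simple record of what the model suggested and why it was filtered, which is a small step toward the explainability concerns raised earlier.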
These developments make it clear that offensive AI is advancing quickly, but they also highlight exactly where the next breakthroughs need to occur. The weaknesses identified, from hallucinations to limited contextual awareness, indicate where innovation can have the greatest impact, and many emerging research directions already target these gaps. Agentic systems capable of maintaining long-term context directly address the problem of environment-blind suggestions. Domain-specific fine-tuning on validated security datasets can reduce hallucinations and improve the accuracy of exploitation guidance. Reinforcement Learning (RL) models that continuously adapt to new defensive patterns can help mitigate the rigidity and unpredictability seen in current LLM-based tools. At the same time, the industry is beginning to explore hybrid approaches that combine the strengths of multiple AI paradigms. Pairing LLMs with RL-driven decision engines, or integrating AI reasoning layers into existing pentesting frameworks, opens the door to tools that are not only faster but also more reliable and explainable.
The CyberAId project is positioned to explore several of these approaches and to test them within a more complete software architecture than individual pentesting studies typically use. CyberAId can advance AI-based pentesting tools by developing techniques that address the general AI weaknesses described earlier: reducing hallucinations, incorporating additional contextual information, and adding explanation points that support the decision logic framework. Some of the strategies described above can also inform the techniques developed for the CyberAId scenario. In that scenario, AI reasoning is used to attack a digital twin environment by searching and analysing the system, deciding when and how to act through a dynamic sequence of actions, and evaluating the information obtained from those attacks.
Hilario, E., Azam, S., Sundaram, J. et al. Generative AI for pentesting: the good, the bad, the ugly. Int. J. Inf. Secur. 23, 2075–2097 (2024). https://doi.org/10.1007/s10207-024-00835-x



