AI-Driven Multi-Agent System Enhances Phishing Email Detection

Jun 4, 2025

Russell Fairweather, Cybersecurity Consultant

Phishing emails, which masquerade as legitimate communications to trick users into divulging sensitive information, remain a persistent and evolving cybersecurity threat. As attackers grow more sophisticated, traditional detection methods have struggled to keep pace with their varied and fast-changing tactics. Phishing schemes now range from simple scams to intricate operations that rely on social engineering and precisely targeted spear-phishing. Attackers also use AI to craft highly convincing fraudulent messages, which makes defense harder still. This growing sophistication calls for solutions that not only identify threats but do so with enough transparency and reliability to build trust in automated systems. In response, researchers from the University of Auckland have developed MultiPhishGuard, a system that uses a multi-agent architecture built on Large Language Models (LLMs) to address the problem comprehensively. The approach could significantly strengthen defenses against phishing, promising improvements in both detection accuracy and user understanding.

Evolving Threats in Phishing

Phishing remains a formidable problem in cybersecurity, with attackers continually refining their strategies to exploit human and technological vulnerabilities. The Anti-Phishing Working Group has reported a sharp rise in phishing incidents, with more than 930,000 recorded in a single quarter, which underscores how prevalent the threat has become. Traditional defenses that rely heavily on rule-based filters or static blocklists fall short against such evolving threats: their rigidity leaves them ill-equipped to counter tactics such as domain spoofing or sophisticated URL manipulation. Machine learning offers some improvement, but its reliance on historical data and known features limits its effectiveness against novel or cleverly disguised threats. More advanced tools such as deep learning models and LLMs, despite their potential, are often deployed as a single model that analyzes an email in isolation and can miss important context or metadata signals. The result can be a “black box” in which the logic behind a decision remains obscure, complicating validation and reducing user trust. A system that combines multiple AI agents with distinct specializations could therefore be the key to overcoming these challenges.

Multi-Agent Approach of MultiPhishGuard

To address the limitations of conventional phishing detection, MultiPhishGuard employs a multi-agent approach, akin to assembling a team of specialists who each focus on a different aspect of an email. Built on LLMs, which are strong at natural language processing, the system assigns specific tasks to three primary “Basic Agents,” distributing the workload and capitalizing on each agent’s strengths. The Text Analysis Agent examines the email’s body for linguistically suspicious patterns or phishing-related keywords that might indicate malicious intent. The URL Analysis Agent scrutinizes links embedded in the email, assessing potential obfuscation techniques and checking domain reputations to determine whether any link leads to a harmful destination. The Metadata Analysis Agent examines the technical components of the email, such as headers and sender authentication records, to identify discrepancies or abnormalities that might suggest phishing. Together, these independent evaluations allow MultiPhishGuard to render an overall assessment of each email’s legitimacy.
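To make the URL and metadata checks more concrete, here is a minimal Python sketch of the kinds of signals such agents might look at. It is not MultiPhishGuard’s implementation: the article describes LLM-based agents, whereas these are simple hand-written heuristics, and the function names `url_signals` and `metadata_signals`, the chosen indicators, and the subdomain threshold are all assumptions made for illustration.

```python
import ipaddress
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlsplit


def url_signals(url: str) -> dict:
    """Return a few simple obfuscation indicators for one URL (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        raw_ip = True
    except ValueError:
        raw_ip = False
    return {
        "raw_ip_host": raw_ip,                                  # link points at a bare IP address
        "punycode_host": any(label.startswith("xn--") for label in host.split(".")),
        "userinfo_in_url": "@" in parts.netloc,                 # e.g. https://bank.com@evil.example/
        "many_subdomains": host.count(".") >= 4,                # arbitrary, hypothetical threshold
    }


def metadata_signals(raw_email: str) -> dict:
    """Return simple header-based indicators (illustrative only)."""
    msg = message_from_string(raw_email)
    auth = (msg.get("Authentication-Results") or "").lower()
    from_domain = parseaddr(msg.get("From", ""))[1].rpartition("@")[2].lower()
    reply_domain = parseaddr(msg.get("Reply-To", ""))[1].rpartition("@")[2].lower()
    return {
        "spf_fail": "spf=fail" in auth,
        "dkim_fail": "dkim=fail" in auth,
        "reply_to_mismatch": bool(reply_domain) and reply_domain != from_domain,
    }
```

Heuristics like these are exactly the kind of static rules that attackers learn to evade, which is part of the motivation for using LLM-based agents that can reason about the same evidence in context.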

Each agent not only provides its analysis but also assigns a confidence score to its findings, further supporting the reliability of the system’s final decision. These individual scores are synthesized to present a comprehensive and robust classification, determining whether an email is phishing or legitimate. This collaborative method enhances the system’s versatility and accuracy, enabling it to adapt to a wide variety of phishing tactics. Moreover, by offering transparency in its reasoning, the system aspires to foster greater trust among users, facilitating informed action.
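One simple way to picture how per-agent verdicts and confidence scores could be combined is a signed, confidence-weighted vote. The sketch below is an assumption for illustration, not the fusion rule used by MultiPhishGuard: the `AgentReport` structure, the `fuse` function, and the equal treatment of all agents are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentReport:
    agent: str          # e.g. "text", "url", "metadata"
    is_phishing: bool   # the agent's own verdict
    confidence: float   # how sure the agent is, from 0.0 to 1.0


def fuse(reports: list[AgentReport]) -> tuple[str, float]:
    """Combine agent verdicts into one label and a score in [-1, 1].

    Each agent votes +confidence for phishing or -confidence for legitimate;
    the mean vote decides the label. All agents count equally here.
    """
    if not reports:
        raise ValueError("at least one agent report is required")
    votes = [r.confidence if r.is_phishing else -r.confidence for r in reports]
    score = sum(votes) / len(votes)
    label = "phishing" if score > 0 else "legitimate"
    return label, score
```

For example, if the text and URL agents flag an email with confidences 0.9 and 0.6 while the metadata agent considers it legitimate with confidence 0.4, the mean vote is positive and the email is labeled phishing.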

Dynamic Decision-Making and Continuous Learning

MultiPhishGuard stands out for its ability to adapt and refine its decisions through learning, which helps it maintain high accuracy in detecting phishing emails. A pivotal component is the dynamic weight adjustment mechanism, which uses reinforcement learning techniques such as Proximal Policy Optimization to determine the relative importance of each agent’s findings in real time. This allows MultiPhishGuard to assign varying levels of significance to the agents’ analyses based on email-specific features. For instance, when an email contains numerous suspicious URLs, the URL Analysis Agent’s input is weighted more heavily. Such flexibility helps the system minimize false positives while improving its capacity to detect threats accurately. The system is further strengthened through adversarial training, in which an adversarial agent generates novel phishing and legitimate email examples to challenge MultiPhishGuard. This training process helps the system recognize and adapt to emerging phishing tactics, so that it keeps improving as scams become more sophisticated. Because the adversarial agent is confined to a controlled testing environment, the knowledge gained can be applied without risking misuse.
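The idea of email-dependent agent weights can be sketched as a small policy that maps email features to a weight per agent. The Python below is only a sketch under stated assumptions: it does not implement Proximal Policy Optimization, the `POLICY` parameters are hand-picked rather than learned, and the feature names (`suspicious_url_count`, `auth_failures`, `body_length_k`) are hypothetical. In the system described above, parameters of this kind would be what reinforcement learning adjusts.

```python
import math

AGENTS = ("text", "url", "metadata")

# Hypothetical, hand-set parameters: how strongly each feature pushes weight
# toward each agent. A learned policy would replace these values.
POLICY = {
    "text": {"body_length_k": 0.5},
    "url": {"suspicious_url_count": 1.0},
    "metadata": {"auth_failures": 1.0},
}


def softmax(values: list[float]) -> list[float]:
    """Numerically stable softmax: non-negative outputs that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def agent_weights(features: dict[str, float]) -> dict[str, float]:
    """Map email features to one weight per agent; the weights sum to 1."""
    logits = [
        sum(POLICY[agent].get(name, 0.0) * value for name, value in features.items())
        for agent in AGENTS
    ]
    return dict(zip(AGENTS, softmax(logits)))
```

With features such as `{"suspicious_url_count": 3, "auth_failures": 0, "body_length_k": 0.2}`, the URL agent receives the largest weight, mirroring the example in the text where many suspicious URLs cause the URL Analysis Agent’s input to count more heavily.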

Enhancing Interpretability and User Trust

Ensuring transparency in AI-driven systems is crucial in building user trust and acceptance. MultiPhishGuard addresses this by employing an Explanation Simplifier Agent. This agent consolidates the evaluations made by the basic agents into succinct, coherent summaries, making the system’s reasoning accessible to users of all technical backgrounds. Instead of merely judging an email as “phishing,” the system might explain: “This email is likely a phishing attempt due to a forged sender address, urgency in the message, and a suspicious URL mimicking a well-known banking site.” Such explanations notably demystify the technical processes involved, helping users understand why specific emails are flagged. In addition, an “Expert Mode” is anticipated, offering more detailed explanations for cybersecurity professionals requiring greater depth. This focus on transparency is not only pivotal for fostering trust but also aids users in making informed decisions regarding the safety of emails.
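As a rough illustration of what the simplification step produces, the sketch below turns a final label and a list of agent findings into a single plain-language sentence. It is a template-based stand-in, not the LLM-based Explanation Simplifier Agent itself; the function names `join_reasons` and `simplify` are hypothetical.

```python
def join_reasons(reasons: list[str]) -> str:
    """Join reasons into a readable English list, e.g. "a, b, and c"."""
    if len(reasons) == 1:
        return reasons[0]
    if len(reasons) == 2:
        return f"{reasons[0]} and {reasons[1]}"
    return ", ".join(reasons[:-1]) + f", and {reasons[-1]}"


def simplify(label: str, reasons: list[str]) -> str:
    """Produce a one-sentence explanation for a non-technical reader."""
    if label != "phishing":
        return "This email appears legitimate; no agent reported a strong phishing indicator."
    if not reasons:
        return "This email is likely a phishing attempt."
    return f"This email is likely a phishing attempt due to {join_reasons(reasons)}."
```

Calling `simplify("phishing", ["a forged sender address", "urgency in the message", "a suspicious URL mimicking a well-known banking site"])` reproduces the example explanation quoted above.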

Empirical Validation and Performance Metrics

MultiPhishGuard’s effectiveness has been tested in experiments on nearly 4,000 emails, each classified as either phishing or genuine, drawn from publicly available datasets such as the Nazario phishing corpus, the Enron-Spam dataset, and the SpamAssassin public corpus. The system achieved an accuracy of 97.89%, with a false positive rate of 2.73% and a false negative rate of 0.20%. These metrics highlight its ability to distinguish phishing emails from legitimate correspondence and support its practical applicability in email security solutions.

Compared with other methods, including single-agent approaches such as RoBERTa-base and Chain-of-Thought prompting, MultiPhishGuard consistently came out ahead across the evaluation metrics. While traditional models demonstrated high recall, they faltered in precision because of higher false positive rates, which reduced their overall reliability.

Ablation studies, in which individual components were removed and the effect on performance observed, showed that each part of MultiPhishGuard matters. Omitting any component, such as the URL or metadata agent, notably degraded results. Using static weights or removing the adversarial training module reduced the system’s adaptability. Even removing the Explanation Simplifier Agent reduced the clarity of explanations, though not detection accuracy.
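For readers less familiar with these metrics, the sketch below shows how accuracy, false positive rate, false negative rate, precision, and recall are computed from a confusion matrix, treating phishing as the positive class. The function name `detection_metrics` is hypothetical, and any counts passed to it are the caller’s own; the article does not report the underlying confusion matrix, so this does not reproduce the study’s figures.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard detection metrics from confusion-matrix counts.

    tp: phishing emails correctly flagged    fp: legitimate emails wrongly flagged
    tn: legitimate emails correctly passed   fn: phishing emails missed
    Assumes each class has at least one email and at least one email was flagged;
    otherwise a division by zero occurs.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),  # share of legitimate mail flagged
        "false_negative_rate": fn / (fn + tp),  # share of phishing mail missed
        "precision": tp / (tp + fp),            # share of flagged mail that is phishing
        "recall": tp / (tp + fn),               # share of phishing mail that is caught
    }
```

Note that recall equals one minus the false negative rate, which is why a model can show high recall while still having poor precision: it catches most phishing emails but also flags many legitimate ones.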

Redefining Cybersecurity with MultiPhishGuard

To recap, MultiPhishGuard addresses the shortcomings of traditional phishing detection with a multi-agent design, comparable to assembling a team of experts in which each member focuses on a distinct feature of an email. The system is built on LLMs, which are strong at natural language processing, and divides its tasks among three primary Basic Agents to make use of each agent’s strengths.

The Text Analysis Agent rigorously inspects the content of the email for suspicious linguistic patterns and phishing-related keywords that might suggest malicious intent. In parallel, the URL Analysis Agent scrutinizes embedded links within the email, evaluating potential obfuscation strategies and checking domain reputations to see if any links lead to harmful sites. Complementary to these roles, the Metadata Analysis Agent analyzes the technical elements of the email, such as headers and sender authentication records, to spot discrepancies or unusual activities indicative of phishing attempts.

This collaborative approach gives MultiPhishGuard the capability to deliver a comprehensive evaluation of an email’s legitimacy. Each agent provides its analysis and assigns a confidence score to support the system’s final judgment. These scores are integrated to offer a thorough classification, determining if an email is phishing or legitimate. This coordinated technique improves the system’s adaptability and precision, targeting an array of phishing strategies while ensuring transparency, thereby building user trust and promoting decisive action.