Measure how much AI models increase the cyber attack capabilities of non-expert actors through controlled experiments
We need empirical data quantifying whether, and to what extent, frontier AI models enable individuals with limited cybersecurity expertise to conduct malicious cyber activities that would otherwise be beyond their capabilities. Current risk assessments rely heavily on theoretical analysis rather than behavioral evidence. This research calls for rigorously designed experiments comparing attack success rates, sophistication, and speed between AI-assisted and non-AI-assisted conditions, across a representative sample of cyber attack types and threat actor skill levels.
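To make the core comparison concrete, here is a minimal sketch of the simplest version of that contrast, assuming a two-condition design with binary task outcomes; the counts and the choice of a two-proportion z-test are illustrative assumptions, not a prescribed protocol:

```python
# Minimal sketch: comparing task success rates between an AI-assisted
# condition and an internet-only control condition.
# All counts below are hypothetical placeholders.
from statsmodels.stats.proportion import proportions_ztest

successes = [14, 6]       # hypothetical successes: [AI-assisted, internet-only]
participants = [20, 20]   # hypothetical participants per condition

# Two-proportion z-test on success rates across conditions.
z_stat, p_value = proportions_ztest(successes, participants)

# Point estimate of uplift: difference in success proportions.
uplift = successes[0] / participants[0] - successes[1] / participants[1]

print(f"AI-assisted: {successes[0] / participants[0]:.0%}, "
      f"internet-only: {successes[1] / participants[1]:.0%}")
print(f"Estimated uplift: {uplift:+.0%} (z = {z_stat:.2f}, p = {p_value:.3f})")
```

A full analysis would go beyond a single proportion test, e.g., logistic regression with task difficulty and participant skill as covariates, plus time-to-completion comparisons for the speed dimension.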
AI systems have demonstrated increasing proficiency at coding, vulnerability analysis, and other security-relevant tasks. There are theoretical concerns that these capabilities could lower barriers to entry for cybercrime and cyber attacks, but little empirical research has measured the actual capability uplift they provide to non-expert actors. Prior work includes RAND's 2023 study on AI-assisted cybersecurity tasks, which showed mixed results; academic red-teaming exercises demonstrating that AI models can assist with exploit development; and anecdotal reports of AI-generated phishing and malware. The UK AISI and US AISI both identify cyber misuse as a priority risk area requiring better measurement frameworks. Key open questions include whether AI primarily accelerates existing capabilities or enables entirely new attack classes, and what skill threshold is required for effective AI assistance. This research connects to broader questions about AI-driven democratization of advanced capabilities and the offense-defense balance in cybersecurity.
IN SCOPE: Behavioral experiments with human participants representing non-expert threat actors; task batteries covering reconnaissance, exploitation, social engineering, malware development, and defense evasion; comparison between frontier LLM access and internet-only access; quantitative and qualitative analysis of capability uplift; assessment of potential model-level mitigations.

OUT OF SCOPE: Advanced persistent threat (APT) or nation-state actor capabilities; zero-day vulnerability discovery; attacks on specific production systems; actual deployment of malicious capabilities; highly specialized attack types that require deep domain expertise even with AI assistance; automated AI agent attacks without human direction.

CONSTRAINTS: Must comply with ethical research standards and obtain IRB approval; requires sandboxed testing environments; limited to publicly available or research-access AI models; participants must remain anonymous; no creation of real-world harmful artifacts.
Required: Research methodology and experimental design experience; cybersecurity fundamentals (network security, common attack vectors, vulnerability assessment); familiarity with AI capabilities and limitations; human subjects research ethics and IRB processes.

Helpful: Prior experience with behavioral studies; statistical analysis (R/Python); sandboxed security testing environments; prompt engineering with LLMs.
Created: 2/5/2026
Last updated: 2/9/2026
START HERE:
(1) Review foundational literature: read 'Cybersecurity and the Age of AI' by RAND (2023), 'Large Language Models and Cybersecurity' survey papers, and AISI evaluation frameworks for cyber capabilities.
(2) Examine existing cyber evaluation frameworks: the MITRE ATT&CK framework for attack taxonomy, the NIST Cybersecurity Framework, and academic CTF (Capture the Flag) challenge designs.
(3) Study behavioral experiment design: research on human subjects studies in security, IRB requirements for dual-use research, and statistical power analysis for comparative studies (a minimal power calculation is sketched after this list).
(4) Technical preparation: set up isolated virtual lab environments using tools like EVE-NG or GNS3, familiarize yourself with prompt engineering for technical tasks, and explore existing AI red-teaming methodologies.
(5) Initial pilot: design one simple task (e.g., phishing email creation) and test it with 5-10 participants in each condition to validate the methodology before scaling up.
(6) Connect with domain experts in both AI safety and cybersecurity to refine the experimental design.
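As a companion to step (3), the sketch below estimates how many participants per condition a full study would need to detect a given uplift in success rates at conventional thresholds (alpha = 0.05, power = 0.80). The baseline and uplift values are hypothetical assumptions:

```python
# Minimal sketch of a power calculation for a two-condition comparison
# of success rates. Baseline and AI-assisted rates are hypothetical.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.30  # assumed success rate in the internet-only condition

for ai_rate in (0.45, 0.55, 0.70):  # assumed AI-assisted success rates
    effect = proportion_effectsize(ai_rate, baseline)  # Cohen's h
    n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                     power=0.80, alternative="two-sided")
    print(f"{baseline:.0%} -> {ai_rate:.0%}: ~{n:.0f} participants per condition")
```

Under these hypothetical rates, detecting a 15-percentage-point uplift takes on the order of 80 participants per condition, which is why the 5-10 participant pilot in step (5) can validate tasks and procedures but cannot itself measure uplift.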