FAR.AI Newsletter - 2025 Q4

From Discovery to Deployment: Shaping Safer AI Systems

In this issue:

  • European Commission tender award for CBRN risk assessment
  • Research insights on AI persuasion, honesty, and sandbagging
  • San Diego Alignment Workshop highlights and recordings
  • Upcoming events in London, Seoul, and the Bay Area (express interest)
  • We're hiring across research, communications, and operations, with a fellowship program coming soon!

My conversations in 2025 with policymakers and AI developers felt markedly different from previous years. The debate was no longer over abstract, long-term risks. Instead, people were grappling with AI systems that pose credible threats of large-scale harm in the near term. In 2025, large language models were used in terrorist planning, nation-state–sponsored cyberattacks, and large-scale psychological operations. Developers' own assessments indicate that frontier models are becoming increasingly capable of assisting malicious actors across chemical, biological, radiological, and nuclear (CBRN) domains.

Although safeguards are improving, our red-teaming—including work on GPT-5—shows that determined misuse remains difficult to prevent. Against this backdrop, I'm excited to announce that FAR.AI will work with the European Commission and our partners at SaferAI and SecureBio to conduct CBRN risk assessment of general-purpose AI systems, complementing our work with the NIST AI Safety Consortium and UK AISI.

At the same time, the widespread deployment of LLM agents in 2025 has coincided with a rise in misaligned behavior, ranging from sycophantic responses to increasingly sophisticated deception. A major focus of our research has been developing scalable oversight methods to prevent such behavior. Our initial findings were presented at NeurIPS last month, and we are now working to scale these methods and to release honesty-tuned variants of frontier open-weight models.

Beyond research, our San Diego Alignment Workshop brought together leaders from academia, industry, and government, while our Journalism Workshop on AGI Impacts & Governance (co-hosted with the Tarbell Center for AI Journalism) equipped reporters with stronger frameworks for covering AI safety. If you would like to attend one of our events or know someone who would, please fill out or share our expression of interest!

I'm encouraged by the momentum building at the intersection of technical research and policy engagement. Thank you for being part of this important work with us.

FAR.Research: Highlights

European Commission Tender Award

FAR.AI has been awarded the European Commission contract for CBRN risk assessment of general-purpose AI (GPAI) systems. Our core partners are SaferAI and SecureBio. Over the next year, we will help the EU AI Office identify and assess how advanced AI models could increase chemical, biological, radiological, or nuclear risks, and test whether proposed safeguards effectively reduce those risks, helping shape technical governance mechanisms under the AI Act.

When LLMs Persuade, Does the Truth Prevail?

Our new research with nearly 3,000 participants shows that AI persuasion is a double-edged sword. GPT-4o was just as effective at increasing belief in conspiracy theories as it was at reducing those beliefs. Strikingly, participants' trust in AI rose more after interacting with a system that promoted conspiratorial claims than after interacting with one that challenged them. However, we found a simple way to shift the balance towards truth: instructing the AI to use only accurate information.
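The mitigation is a prompt-level instruction rather than a change to the model itself. Below is a minimal sketch of that idea using the OpenAI Python client; the system-prompt wording, model choice, and example topic are illustrative assumptions, not the exact setup used in the study.

```python
# Minimal sketch: constraining a persuasive assistant to accurate claims via the
# system prompt. Wording, model, and topic are illustrative, not the study's setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ACCURACY_INSTRUCTION = (
    "You may argue persuasively, but you must rely only on accurate, "
    "well-supported information. Never invent or exaggerate evidence."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": ACCURACY_INSTRUCTION},
        {"role": "user", "content": "Is the moon landing a hoax? Try to change my mind."},
    ],
)
print(response.choices[0].message.content)
```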

Will AI convince you?

Can we prevent AI from lying to us?

AI systems can learn to deceive evaluators and oversight mechanisms—misrepresenting their capabilities, intentions, or reasoning to circumvent safety measures and human control. If advanced AI systems hide their true capabilities or goals, our ability to ensure safe deployment becomes fundamentally compromised.

FAR.AI's Chris Cundy outlines where we stand on preventing AI deception and one line of work we're pursuing to mitigate this core AI safety challenge:

Detecting Hidden Capabilities in AI Models

AI systems can intentionally hide their capabilities during evaluation—a behavior called sandbagging. We tested detection methods using an auditing game where a red team fine-tuned models to underperform while a blue team attempted to spot them. The most promising technique for identifying sandbagging was retraining the model to answer correctly, but more robust detection methods are still needed.
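To make the "retrain it to answer correctly" probe concrete, here is a minimal sketch of an elicitation-style check: measure accuracy on questions with known answers, briefly fine-tune the suspect model on those correct answers, and measure again. The model name, toy questions, and hyperparameters below are illustrative assumptions, not the auditing game's actual setup.

```python
# Sketch of an elicitation check for sandbagging: compare accuracy before and
# after a brief fine-tune on correct answers. All details here are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; a real audit targets the suspect model
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Tiny toy eval set: (prompt, correct answer whose first token we check).
qa_pairs = [
    ("Q: What is 2 + 2?\nA:", " 4"),
    ("Q: What is the capital of France?\nA:", " Paris"),
]

def accuracy(model):
    """Greedy next-token check: does the top token match the answer's first token?"""
    model.eval()
    hits = 0
    for prompt, answer in qa_pairs:
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            next_id = model(ids).logits[0, -1].argmax().item()
        hits += int(tok.decode([next_id]).strip() == answer.strip().split()[0])
    return hits / len(qa_pairs)

print("accuracy before elicitation fine-tuning:", accuracy(model))

# Brief supervised fine-tuning on the correct answers ("elicitation").
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for _ in range(20):  # only a handful of passes: recovery should be fast
    for prompt, answer in qa_pairs:
        ids = tok(prompt + answer, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

print("accuracy after elicitation fine-tuning:", accuracy(model))
```

In the auditing game, a rapid jump back to strong performance is evidence that the capability was present but suppressed, which is why this was the most promising signal, even though more robust detection methods are still needed.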

Your turn: Can you figure out if an AI is sandbagging?

FAR.AI's Programs & Events

San Diego Alignment Workshop

The San Diego Alignment Workshop took place on December 1–2, immediately before NeurIPS 2025, and brought together more than 300 leading researchers from academia, industry, and government to examine critical challenges in AI alignment.

Building on earlier Alignment Workshops (Singapore, Bay Area, Vienna, and New Orleans), the program combined talks, breakout sessions, and 1:1 meetings with technical discussions that deepened shared understanding of AGI risks, focusing on agentic AI risk and security, evaluation reliability, interpretability, and scalable oversight.

Journalism Workshop on AGI Impacts & Governance

Co-hosted with the Tarbell Center for AI Journalism, the inaugural Journalism Workshop on AGI Impacts & Governance was a half-day, journalist-only event designed to strengthen reporting on advanced AI. Speakers included former U.S. presidential AI advisors Ben Buchanan and Dean Ball, the Center for Security and Emerging Technology's Helen Toner, New York State Assemblymember Alex Bores, Anthropic's Evan Hubinger, and experts from Deloitte Consulting and leading universities. Journalists engaged directly with researchers and policymakers to understand potential AGI timelines, economic impacts, geopolitical dynamics, and biosecurity risks, gaining clearer frameworks and sources for future coverage.

Labs Seminars

Check out the newly released recordings from our weekly FAR.Labs seminar series, featuring Marius Hobbhahn (Apollo Research), Jacob Hilton (Alignment Research Center), Jesse Hoogland (Timaeus), and Aaron Scher (MIRI).

Upcoming Events

We have several exciting events planned for 2026:

  • London Alignment Workshop (March 2–3)
  • Technical Innovations for AI Policy Conference (March 30–31)
  • Seoul Alignment Workshop (July 6)
  • Bay Area Alignment Workshop (Winter 2026)

Interested in attending? Submit an Expression of Interest early!

Welcome new team members!

We continue to grow and have welcomed many new colleagues over the last three months. As part of this growth, our former COO, Karl Berzins, has stepped into the role of President, and we are excited to welcome Vits Voronkov as our new COO.

We're thrilled to have them on board and look forward to all the great work ahead.

Your Support Matters!

Support FAR.AI

Work with us

Research:

Communications:

  • Digital Media Manager: Develop and execute our digital strategy, manage our online presence, and create engaging content that communicates our research findings to policymakers, researchers, and the public.

Operations:

  • Finance Analyst: Partner with our Head of Finance to build and run an exceptional finance function.

Know someone who might be a good fit for any of these roles? Contact talent@far.ai.

FAR.AI Fellows Program

We're launching a new Fellows Program: a 3–6 month, paid, full-time role (remote-friendly, open globally), focused on a specific research project with a pathway toward permanent positions. The program is expected to launch in late January.

Express interest to learn more when applications open.

Help Us Drive Change

We're grateful to our funders and partners in the AI safety community and are actively seeking to diversify our support by increasing engagement outside the traditional AI safety space.

If you know funders, foundations, or institutions that we should connect with, we would value an introduction: please reach out at hello@far.ai. Diverse strategic partnerships strengthen our ability to tackle these problems from multiple angles.

Let's Keep in Touch

Follow us on Twitter/X, LinkedIn, and YouTube for updates. Know someone who'd be a good fit for our work? Share our newsletter!