Reviewing AI safety regranting: our wins, and what didn’t quite fit
By Jesse Richardson
AI moves at a crazy pace. To keep up, AI safety needs to move similarly fast, and that means funding has to move with it. Take the AI safety org Timaeus: our regrantors weren’t their biggest funders, but they were first — and in a world of exponential curves, accelerating potential impact by months really matters.
This is a big part of why Manifund runs our AI safety regranting program: allocating budgets of $50K-$400K to experts (“regrantors”) who choose promising projects to fund. As we’ve just announced our 2025 regrantors, now is a good time to review past regrants: some we think were great, and others that weren’t such good fits. What makes a great regrant? We look for early-stage projects that need quick funding, opportunities OpenPhil might miss, and chances to leverage our regrantors’ unique expertise.
We think our regranting program is one of the best opportunities for donors who care about AI safety, want to seed ambitious new projects, and value transparency and moving fast. If you’re interested in funding our 2025 program, please contact [email protected]!
About the author: Jesse Richardson recently joined Manifund after working at Mila - Quebec AI Institute, and also has a background in trading on prediction markets. This post consists of Jesse’s low-confidence hot takes on our regrants.

The first regrant I’ll cover was made in late 2023 to Jesse Hoogland and the rest of what is now the Timaeus team, to explore Developmental Interpretability (DevInterp) as a new AI alignment research agenda. Four of our regrantors — Evan Hubinger, Rachel Weinberg, Marcus Abramovitch and Ryan Kidd — made regrants to this project totalling $143,200. Evan had previously mentored Jesse Hoogland as part of MATS, and therefore had additional context on the value of funding Hoogland’s future research. This is the sweet spot for regranting: donors may have the public information that Evan Hubinger is an expert who does good work, and could donate to his work on that basis, but regranting allows them to leverage his private information about other valuable projects, such as DevInterp.
Regarding the grant itself: success for this project looked like determining whether DevInterp was a viable agenda to move forward with, rather than producing seminal research outputs. I recommend reading more about DevInterp if you’re interested, but my shallow understanding is that it aims to use insights from Singular Learning Theory (SLT) to make progress on AI alignment through interpretability, focusing on how phase transitions in the training process lead to internal structure in neural networks.
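For readers without the background (like me), here’s a toy illustration of the core idea of watching a training run for sudden transitions. This is my own minimal sketch, not Timaeus’s methodology: the model, task, and hyperparameters are invented for illustration, and real DevInterp work tracks more sophisticated SLT-derived quantities (such as the local learning coefficient) rather than raw loss.

```python
# Toy sketch (illustrative only): train a tiny network and log its loss curve,
# the crudest possible way to look for "phase transitions" during training.
# Real developmental interpretability work studies much richer quantities.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Invented synthetic task: a student network learns to imitate a random teacher.
teacher = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
X = torch.randn(2048, 8)
with torch.no_grad():
    y = teacher(X)

student = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

losses = []
for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(student(X), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())
    if step % 200 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")

# In settings like grokking, a curve such as `losses` shows an abrupt drop
# rather than a smooth decline; DevInterp asks what internal structure forms
# in the model around such moments.
```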
I’m not well placed to form an inside view on how likely DevInterp was (or is) to succeed, but this proposed research agenda had a number of things going for it, all markers of projects I am excited to see funded through Manifund regranting. Besides our four regrantors, the agenda was also endorsed by respected AI safety researchers Vanessa Kosoy and John Wentworth, and it makes sense to update on their judgment.
In addition to the agenda itself, I think this was a good team to bet on for this kind of work; they seem capable and have relevant experience, e.g. ML research and running the 2023 SLT & Alignment Summit.
This regrant is a strong example of where Manifund’s regranting program can have the biggest impact: being early to support new projects & organizations, thereby providing strong signals to other funders as well as some runway for those organizations to move quickly. In this case, Manifund’s early funding helped Hoogland’s team get off the ground; they subsequently started a new organization (Timaeus) and received significantly more funding from other sources, such as $500,000 from the Survival & Flourishing Fund. It’s probable that they would’ve gotten this other funding regardless, but not guaranteed, and I’m happy that Manifund helped bring Timaeus into existence several months sooner and with increased financial security. Jesse Hoogland notes:
Getting early support from Manifund made a real difference for us. This was the first funding we received for research and meant that we could start months earlier than we otherwise would have. The fact that it was public meant other funders could easily see who was backing our work and why. That transparency helped us build momentum and credibility for developmental interpretability research when it was still a new idea. I'm pretty sure it played a significant role in us securing later funding through SFF and other grantmakers.
In terms of concrete outcomes, there’s a lot to be happy with here. Timaeus and its collaborators have published numerous papers on DevInterp since this regrant was made, and it seems that DevInterp’s key insight about the existence and significance of phase transitions has been validated. My sense is that the question of whether DevInterp is a worthwhile alignment research agenda to pursue has been answered in the affirmative. It’s also nice to see strong outreach and engagement with the research community on Timaeus’s part: November 2023 saw the first DevInterp conference, and they’ve given talks at OpenAI, Anthropic, and DeepMind.

In 2023 & 2024, Manifund regrantors Joel Becker and Evan Hubinger granted a total of $37,000 to ChinaTalk, a newsletter and podcast covering China, technology, and US-China relations. ChinaTalk has over 50,000 subscribers and is also notable for the quality of its coverage and the praise and attention it receives from elites and policymakers.
Before this regrant, ChinaTalk had been run by Jordan Schneider and Caithrin Rintoul, both part-time, on a budget of just $35,000/year. What they were able to accomplish with those limited resources was impressive, and I believe it merited additional funding, even just to allow Jordan to work on this full-time. More funding would also have allowed ChinaTalk to bring on a full-time fellow who, per Jordan, “would be, to my knowledge, the only researcher in the English-speaking world devoted solely to covering China and AI safety.” ChinaTalk has since received further funding and is in the process of growing to five full-time employees, but we would’ve loved for this to happen sooner through an expanded regranting program.
Even putting aside ChinaTalk’s specific track record, it seems clear to me that the intersection of China and AI safety is an incredibly important area to cover, and that at a high level it is valuable to fund organizations doing this kind of work. It is hard to imagine plausible scenarios in which the next decade goes well with respect to AI that don’t run through US-China relations, and I am persuaded by Jordan’s case that the amount of energy currently being expended here is grossly inadequate.
Since the first regrant, ChinaTalk’s Substack audience has grown from 26,000 subscribers to 51,000, and they’ve put out regular high-quality content, including an English translation of an interview with DeepSeek CEO Liang Wenfeng, coverage of chip policy, and analysis of what the 2024 elections in the US and Taiwan mean for China. The ChinaTalk team has expanded to six people, allowing for a greater diversity and quantity of coverage, including YouTube videos. Jordan has also announced plans to launch a think tank—ChinaTalk Institute—this year, in a similar vein to IFP.
Among their varied coverage, I was particularly impressed to see how ChinaTalk was ahead of the curve in covering the rise of DeepSeek, while most of the West seemed to be taken by total surprise in January 2025. As a trader and forecaster, I suspect that kind of advance insight might have been worth a lot of money to me by anticipating the market freakout, which suggests I should pay more attention to ChinaTalk in the future.
ChinaTalk has continued on the strong trajectory it was on in late 2023, and I’m glad Manifund was able to support that success. For more on why this grant was likely good ex ante, I encourage you to read regrantor Joel Becker’s comment on the subject. Joel’s explanation of why ChinaTalk was, at the time, insufficiently funded
Philanthropists are scared to touch China, in part because of lack of expertise and in part for political reasons. Advertisers can be nervous for similar reasons… Jordan was hoping to support this work through subscriptions only.
makes me more optimistic that this regrant was the kind of thing the program should be doing: plugging holes in the funding landscape.

Gavin Leech co-wrote the 2023 post “Shallow review of live agendas in alignment and safety” (https://www.lesswrong.com/posts/zaaGsFBeDTpCsYHef/shallow-review-of-live-agendas-in-alignment-and-safety), which was well-received and considered a useful resource for people looking to get a top-level picture of AI safety research. For something intended as a shallow review, the post has a lot of helpful detail and links for the various research agendas it covers, e.g. the amount of resources currently devoted to each, and notable criticisms.
Last year, he sought funding to create an updated 2024 version of this post. He received $9,000 from Manifund regrantors Neel Nanda and Ryan Kidd, as well as $12,000 from other donors through the Manifund site.
Big picture, I believe there should be an accessible and up-to-date resource of this kind: for people starting out in AI safety who don’t yet know anything, for funders trying to get a sense of the landscape, and for anyone else who might need it. In 2022 I was at the stage of wanting to contribute to AI safety without knowing much about it or where to start, and I would likely have found Gavin’s review useful, along with the other resources that existed. Based on this, Gavin’s record in a variety of fields, and the quality of the 2023 version, I think this regrant looked very promising.
In terms of output, the new post, “Shallow review of technical AI safety, 2024” (https://www.lesswrong.com/posts/fAW6RXLKTLHC3WXkS/shallow-review-of-technical-ai-safety-2024), came out in December 2024 and appears to be similarly comprehensive to the 2023 version, although it has gotten less attention (roughly half the upvotes on LessWrong, and not curated). That’s a somewhat worse outcome than I would’ve hoped for, but I still would have endorsed this grant had I known the result in advance. Presumably an updated version is inherently less eye-catching than the original, while still being necessary.
The funding of this project also shows the advantages of the Manifund regranting program. Gavin asked for between $8,000 (an MVP version) and $17,000 (a high-end version) and was quickly funded for the MVP by Neel and Ryan. He then got an additional $5,000 from OpenPhil, after Matt Putz learned about the proposal via our EA Forum post, and a further $12,000 from other donors. I am happy with how the regranting program is able both to provide the small amount of funding needed to get a project off the ground and to increase that project’s visibility so that other donors can step in and fund it further. A couple of small negatives: (1) regrantor Neel Nanda is less optimistic than I am that this was a particularly good grant, and (2) the high-end version was supposed to include a “glossy formal report optimised for policy people”, which didn’t get made (OpenPhil opted against funding it); however, the excess money is instead going towards the 2025 edition. I look forward to it!