Austin's Scores
Misc reflections on judging
- I thought retro eval was going to be “hey, how impactful were these two things, which are hard to quantify?”, and that I'd spend a lot of time thinking about the value of a blog post with X readers vs a software tool with Y readers
- But in reality, a lot of the retro eval was just “did they actually deliver the thing they said they wanted to do?”
- It was much easier to compare a bunch of projects to each other in a relatively narrow domain where I felt like I knew some things (forecasting), but still not that easy. I have a lot of sympathy for Scott in trying to assess grant applications from widely disparate fields.
- I’m not that confident about my $-based evaluations, and could imagine being persuaded to adjust any of them by a factor of 2 in either direction. Somewhat more confident in my rank ordering of them.
- In doing $-based evals, I also felt somewhat more anchored on inputs (how much money they raised, how much time they spent, how much it would have cost to hire someone to produce that work) than on actual impact. I still don’t know a good solution to this.
- The final results seemed very hits-based. OPTIC, Manifolio, and BRT account for almost all of the value. My fuzzy recall of what I thought of their initial applications:
- OPTIC looked very promising to me (and I bid to invest, but other investors won out?)
- Manifolio seemed cool but too geeky (my unfair, judgy gut reaction was “nobody promoting the Kelly criterion would actually ship anything”; see the quick Kelly sketch below)
- BRT looked like a scam, or at least too grandiose a vision (calling yourself “Base Rate Times”? Really?)
- Very happy to have been proven wrong on the last two cases!
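- For context, the Kelly criterion that Manifolio promotes tells you what fraction of your bankroll to stake, given your probability estimate and the odds on offer. A rough Python sketch of the idea (my illustration of the standard formula, not Manifolio's actual code):

```python
def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Fraction of bankroll to stake on a simple binary bet.

    p_win: your estimated probability of winning.
    net_odds: profit per unit staked if you win (e.g. 1.0 for even money).
    Returns 0 if the bet has no positive edge.
    """
    edge = p_win * net_odds - (1 - p_win)
    return max(edge / net_odds, 0.0)

# e.g. a 60% shot at even money -> stake 20% of your bankroll
print(kelly_fraction(0.6, 1.0))  # 0.2
```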
- It’s unclear to what degree the investors were actually good at picking the eventual winners here; my sense is “not very”, my own investing included
- I think Scott did a better job of picking the ACX Grantees in the first round than we did in this round, at least in terms of median grantee quality.
- One confounding factor, though: the first ACX Grants had much higher dollar amounts, which plausibly attracted more serious/competent individuals, or bought more time to work on the projects
- Forecasting needs less thinking & more doing
- Fewer essays & papers, more “thing that gets used by lots of people”
- Or essays are good, but they need to translate into lots of readership, discussion, etc
- More emphasis on marketing/promoting/making your work accessible to others
- Some regrets/process improvements for next time:
- More regular check-ins with the people working on these ideas (monthly calls?)
- Provide some guidance on “hey, this is the kind of thing Manifold/Austin would pay lots of money for”
- Nudge promising people to apply?
- Provide better benchmarks for “this is how much equity to keep” and “this is how much your project might be worth if it succeeds (blog post vs tool vs meetup)”.