On an RCT for ESPR.

Introduction

There is a certain valuable way of thinking, which is not yet taught in schools, in this present day. This certain way of thinking is not taught systematically at all. It is just absorbed by people who grow up reading books like Surely You’re Joking, Mr. Feynman or who have an unusually great teacher in high school.

Most famously, this certain way of thinking has to do with science, and with the experimental method. The part of science where you go out and look at the universe instead of just making things up. The part where you say “Oops” and give up on a bad theory when the experiments don’t support it.

But this certain way of thinking extends beyond that. It is deeper and more universal than a pair of goggles you put on when you enter a laboratory and take off when you leave. It applies to daily life, though this part is subtler and more difficult. But if you can’t say “Oops” and give up when it looks like something isn’t working, you have no choice but to keep shooting yourself in the foot. You have to keep reloading the shotgun and you have to keep pulling the trigger. You know people like this. And somewhere, someplace in your life you’d rather not think about, you are people like this. It would be nice if there was a certain way of thinking that could help us stop doing that.

- Eliezer Yudkowsky, https://www.lesswrong.com/rationality/preface

The evidence on CFAR’s workshops.

The evidence for or against CFAR in general is of interest here, because I take it as likely that it is strongly correlated with the evidence on ESPR. For example, if reading programs in India show that dividing students by initial level improves their learning outcomes, then you’d expect similar processes to be at play in Kenya. Thus, if the evidence on CFAR were robust, we might be able to afford being less rigorous when it comes to ESPR.

I’ve mainly studied CFAR’s 2015 Longitudinal Study together with the more recent Case Studies and the 2017 CFAR Impact report. Here, I will make some comments about them, but will not review their findings.

The first study notes that a control group would be difficult to implement, since it would require finding people who would like to attend the program and then forbidding them from doing so. The study tries to compensate for the lack of a control by being statistically clever, and seems to be as rigorous as you can get without an RCT.

But I feel that this is only partially reassuring. The magnitude of the effect found could be wildly overestimated; MIT’s Abdul Latif Jameel Poverty Action Lab (J-PAL) provides the following slides 1:

I find them scary; depending on the method used to test your effect, you can get an effect size 4-5 times as large as the one an RCT would find, or an effect of similar magnitude in the opposite direction. The effects the CFAR study finds, f.ex. the one most prominently displayed on CFAR’s webpage, an increase in life satisfaction of 0.17 standard deviations (i.e., moving a median participant from the 50th to roughly the 56.75th percentile), are small enough for me to worry about such inconveniences.
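As a sanity check on that percentile conversion (my own arithmetic, not CFAR’s): a shift of 0.17 standard deviations corresponds to the standard normal CDF evaluated at 0.17.

```python
from scipy.stats import norm

# A 0.17 SD improvement moves someone at the median of the original
# distribution to the percentile given by the standard normal CDF at 0.17.
print(norm.cdf(0.17))  # ~0.5675, i.e., roughly the 56.75th percentile
```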

Thus, I feel that an RCT could be delayed on the strength of the evidence that CFAR currently has, including its logical model (see below), but not indefinitely. In particular, if CFAR had plans for a more ambitious expansion, it would be a good idea to run an RCT beforehand. If MIT’s J-PAL didn’t specialize in poverty interventions, I would suggest teaming up with them; it seems like a good idea to try anyway. J-PAL would suggest strategies like the following: randomly admit people for either this year or the next, and take as the control the group which has been left waiting. It is not clear to me why this hasn’t been done yet.
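A minimal sketch of what this waitlist design could look like, assuming we randomize at the level of admitted applicants (the function and names below are purely illustrative):

```python
import random

def waitlist_randomize(admitted, seed=2018):
    """Illustrative sketch: randomly split admitted applicants into a
    this-year cohort (treatment) and a next-year cohort (waitlist control)."""
    rng = random.Random(seed)  # fixed seed keeps the assignment auditable
    pool = list(admitted)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (this_year, waitlisted)

this_year, waitlisted = waitlist_randomize([f"applicant_{i}" for i in range(60)])
```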

With regard to the second and third documents, I feel that they provide powerful intuitions for why CFAR’s logical model is not totally bullshit. This model would be something like: CFAR students are taught rationality techniques + have an environment in which they can question their current decisions and consider potentially better choices = they go on to do more good in the world, f.ex. by switching careers. From the Case Studies mentioned above:

Eric (Bruylant) described the mindset of people at CFAR as “the exact opposite of learned helplessness”, and found that experiencing more of this mindset, in combination with an increased ability to see what was going on with his mind, was particularly helpful for making this shift.

Yet William MacAskill’s book, Doing Good Better, is full of examples of NGOs with great-sounding premises, e.g., Roundabout Water Solutions, which were woefully ineffective. Note that Arbital, one of CFAR’s success stories, has now failed. Additionally, when reading CFAR’s own Rationality Checklist, I notice that acquiring the mental movements mentioned seems more like a long-term project than a skill acquirable in 4 days. This is something which CFAR itself also underscores.

Furthermore, asking alumni to estimate the program’s impact does not seem like a good way to estimate impact, particularly when these people are sympathetic to CFAR. To get a better idea of why, take the outside view and substitute the Center for Nonviolent Communication (CNVC) for CFAR.

Outside view: The evidence on Nonviolent Communication (NVC).

The Center for Nonviolent Communication provides a list of all the research about NVC known to them, of which Juncadella (2016) provides an overview up to 2013, after which not much else has been undertaken. From this review: “Eleven of the 13 studies used quantitative designs. Seven used a control group and 4 a pre-post testing comparison. Of the 7 studies that used a control group, none used a random assignation of participants. In five, the treatment and control were assigned by researcher action and criteria, and in two, the assignment protocol is not reported”.

The main problem this research presents is that it is a little chaotic: although Steckal (1994) provides a measuring instrument whose consistency seems to have been validated, every researcher seems to use their own instruments and to investigate a slightly different question, i.e., for different demographics, in different settings, with different workshop lengths. All in all, there seems to be a positive effect, but its magnitude is very uncertain.

NVC is also supported by testimonial evidence that is both extremely copious and extremely effusive, to be found in Marshall Rosenberg’s book Nonviolent Communication: A Language of Life and on their webpage. The logical model also appears consistent and robust: by providing a way to connect with our emotions and needs, and those of others, NVC workshops give participants the skills necessary to relate to others, reduce tension, etc.

Given the above, what probability do I assign to NVC being full of bullshit, i.e., to the ~$3,000 courses it offers being merely more expensive, not significantly more effective, than the $15 book? Actually quite high. NVC seems to have a certain disdain for practical solutions: f.ex., in Q4 of the measure developed by Steckal, “When I listen to another in a caring way, I like to analyze or interpret their problems”, an affirmative answer is scored negatively.


ESPR as distinct from CFAR.

It must be noted that ESPR gets little love from the main organization, being mainly run by volunteers, with some instructors coming in to give classes. Eventually, it might make sense to establish ESPR as a separate organization with a focus on Europe, instead of as an American side project.

ESPR’s logical model.

I think that the logical model underpinning ESPR is fundamentally solid, i.e., as solid as CFAR’s, which is itself pretty solid. In the words of a student who came back this year as a Junior Counselor:

[ESPR teaches] smart people not to make stupid mistakes. Examples: betting, prediction markets decrease overconfidence. Units of exchange class decreases likelihood of spending time, money, other currency in counterproductive ways. The whole asking for examples thing prevents people from hiding behind abstract terms and to pretend to understand something when they don’t. Some of this is learned in classes. A lot of good techniques from just interacting with people at espr.

I’ve had conversations with otherwise really smart people and thought “you wouldn’t be stuck with those beliefs if you’d gone though two weeks of espr”

ESPR also increases self-awareness. A lot of espr classes / techniques / culture involves noticing things that happen in your head. This is good for avoiding stupid mistakes and also for getting better at accomplishing things.

It is nice to be surrounded by very smart, ambitious people. This might be less relevant for people who do competitions like IMO or go to very selective universities. Personally, it is a fucking awesome and rare experience every time I meet someone really smart with a bearable personality in the real world. Being around lots of those people at espr was awesome. Espr might have made a lot of participants consider options they wouldn’t seriously have before talking to the instructors like founding a startup, working on ai alignment, everything that galit talked about etc

espr also increased positive impact participants will have on the world in the future by introducing them to effective altruism ideas. I think last year’s batch would have been affected more by this because I remember there being more on x-risk and prioritizing causes and stuff 1.

I spent 15 mins =)

Additionally, ESPR gives some of its alumni the opportunity to come back as Junior Counselors, who take on a position of some responsibility and keep improving their own rationality skills.

1. This year, being in Edinburgh, we didn’t bring in an FHI person to give a talk. We did have an AI risk panel, and EA/x-risk were an important (~10%) focus of conversations. However, I will make a note to bring someone from the FHI next year. We also continued grappling with the boundaries between presenting an important problem and indoctrinating and mindfucking impressionable young persons.

Perverse incentives

As with CFAR’s, I think that the alumni profiles in the following section provide useful intuitions. However, while perhaps narratively compelling, they involve no control group, which is supremely shitty. These profiles may not allow us to falsify any hypothesis, i.e., to meaningfully change our priors, because these students come from a pool of incredibly bright applicants. The evidence is weak in the sense that, on its basis alone, I would feel uncomfortable saying that ESPR should be scaled up.

To the extent that Open Philanthropy prefers these and other weak forms of evidence now, rather than stronger evidence two or three years later, Open Philanthropy might be giving ESPR perverse incentives. Note that with 20-30 students per year, even after we start an RCT, a number of years must pass before we can amass meaningful statistical power (see the power calculations). On the other hand, taking a process of iterated improvement as an admission of failure would also be pretty shitty.
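To give a rough sense of the timescales involved, here is a back-of-the-envelope power calculation (my own illustration, taking CFAR’s 0.17 SD life-satisfaction effect as the hypothesized effect size, not ESPR’s official numbers):

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per arm needed to detect d = 0.17 with a two-sample t-test,
# at the conventional alpha = 0.05 and 80% power.
n_per_arm = TTestIndPower().solve_power(effect_size=0.17, alpha=0.05, power=0.8)
print(round(n_per_arm))  # ~544 per arm, i.e., ~1,088 students in total
# At 20-30 treated students per year (plus as many waitlisted controls),
# reaching this sample size would take well over a decade.
```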

The questions designing an RCT poses are hard, but the bigger problem is that there’s an incentive not to ask them at all. That, however, would be against CFAR’s ethos, as outlined in the introduction.

Alternatives to ESPR: The cheapest option.

One question which interests me is: what is the cheapest version of the program which is still cost-effective? What happens if you just record the classes, send them to bright people, and answer their questions? What if you set up a course on edX? Interventions based at universities and high schools are likely to be much cheaper, given that neither board, nor flights, nor classrooms would have to be paid for. Is there a low-cost, scalable approach?

I’m told that some of the CFAR instructors have strong intuitions that in-person teaching is much more effective, based on their own experience and perhaps also on a small 2012 RCT, which is either unpublished or unfindable.

Still, I want to test this assumption because, almost by definition, doing so would be pretty cheap. As a plus, we can take the population that takes the cheaper course to be a second control group.
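Extending the earlier waitlist sketch (again, purely illustrative, with hypothetical names), the same randomization could produce three arms: the in-person program, the cheap online variant, and the waitlist control.

```python
import random

def three_arm_randomize(admitted, seed=2018):
    """Illustrative: split admitted applicants into in-person (full program),
    online-only (cheap variant), and waitlist (control) arms."""
    rng = random.Random(seed)
    pool = list(admitted)
    rng.shuffle(pool)
    k = len(pool) // 3
    return pool[:k], pool[k:2 * k], pool[2 * k:]  # (in_person, online, waitlist)
```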