Research was conducted to determine which features, formats, and designs of training programmes seem most likely to be cost-effective. Summaries of educational research and health behaviour research were reviewed, and additional searches of Google Scholar were conducted. A number of characteristics were identified that seem likely to enhance the effectiveness of training programmes, including the use of spaced repetition, practice, feedback, content-focused education for novices, distance learning, group education, and small group sizes.
Animal Advocacy Careers (AAC) is a new organisation that plans to address the talent and careers bottlenecks in the effective animal advocacy (EAA) community, i.e. the overlap of the animal advocacy and effective altruism communities, focused on how we can most effectively help animals. AAC will conduct small-scale trials of several possible interventions to address these bottlenecks and then focus on implementing those that seem most cost-effective. One of the interventions that will be trialled is the provision of training programmes on areas of expertise that are undersupplied relative to the needs of the community.
The goal of this short literature review was to determine which features, formats, and designs of training programmes seem most likely to be cost-effective. This would then enable AAC to focus initial trials and evaluations on training programmes that would more closely resemble the final products that AAC might later offer to the EAA community on a larger scale.
AAC’s initial hypotheses are listed on the first tab of the “Hypotheses and updates” spreadsheet.
This was a time-capped report. A flexible limit of 20 hours was set for initial research and note-taking (15 hours on Google Scholar searches, 5 on summaries of educational research), plus 5 additional hours for clarifying the write-up of the findings. This research was only intended to secure the lowest-hanging fruit of learnings from relevant research and to identify areas that required further, more rigorous research.
The findings of this research are separated into two topic areas:
Initially, summaries of educational research were reviewed. Relevant research was identified non-systematically:
Subsequently, searches of Google Scholar were conducted, with results limited to 1990 onwards. A full list of the search terms used, with brief comments on the content, is provided here. The first 5 pages of results were skimmed or reviewed for each search term. Research items were identified non-systematically. That is, there were no strict inclusion and exclusion criteria, and the likely relevance of research was assessed predominantly by the phrasing of the title, rather than by reviews of the abstracts or the content of all returned results. Nevertheless, the following criteria were used to decide which items to include:
The conclusions of the research items were not grounds for exclusion. That is, research items were not (intentionally) omitted if their conclusions were surprising or contrasted with the findings of other included research.
I also added in some findings from a much more thorough review of the health behaviour literature that I had conducted previously.
The scoring system
For the results relating to features and formats of training programmes, each included item of research was assigned a “value of inclusion” score. The scores were given on a possible range of -5 to +5. Each item of research was also assigned a “strength of evidence” score. The results for “design and delivery” used similar scoring systems, except that the “value of inclusion” score was replaced with a “proportion of the programme” score.
Given the short timeframe for research and small number of included studies, I opted not to use any statistical or quantitative procedures to aggregate the results from this scoring system.
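The scoring rubric above can be represented as a small data structure. The sketch below is purely illustrative: the class and field names are my own, not AAC’s terminology in the spreadsheet, and it simply records and validates the two scores described in the footnotes (value of inclusion from -5 to +5, strength of evidence from 0 to 5) without performing any aggregation, consistent with the decision not to aggregate quantitatively.

```python
from dataclasses import dataclass


@dataclass
class ScoredItem:
    """One research item scored under the rubric described above.

    Field names are illustrative, not AAC's actual spreadsheet columns.
    """
    title: str
    value_of_inclusion: int    # -5 (strong negative impact) to +5 (very high positive impact)
    strength_of_evidence: int  # 0 (no relevant evidence) to 5 (very strong/certain)

    def __post_init__(self):
        # Enforce the ranges defined in the scoring-system footnotes.
        if not -5 <= self.value_of_inclusion <= 5:
            raise ValueError("value_of_inclusion must be between -5 and +5")
        if not 0 <= self.strength_of_evidence <= 5:
            raise ValueError("strength_of_evidence must be between 0 and 5")


# Example with hypothetical scores for a hypothetical study:
item = ScoredItem("Spaced repetition trial", value_of_inclusion=3, strength_of_evidence=4)
```

A list of such records could then be sorted or filtered by either score when reviewing the findings, without committing to any particular weighting scheme.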
The full results are available in the “Summary of findings” spreadsheet. The identified research items provide evidence for the following claims, among others:
Of course, given the short amount of time spent on this research, I do not have high confidence in any of these claims. Given that the research was focused on AAC’s needs, the findings may not be as relevant for the needs of other training programmes. AAC’s view updates can be seen on the second tab of the “Hypotheses and updates” spreadsheet.
Suggestions for further research
In the end, 19 hours were spent on research (14 on Google Scholar searches, 4.5 on summaries of educational research, and 0.5 on adding in the results from my previous review of the health behaviour literature). About 9 hours were spent on the write-up, editing, updates, and discussion of the results, and an additional 12 hours on initial planning and discussion. AAC had initially hoped that this research might provide insight into the methods and criteria that could be used to evaluate the impact of the programmes, but the limit of 20 hours’ worth of research proved insufficient to evaluate this very thoroughly. This will be assessed through subsequent research.
Items that failed to meet some of the inclusion criteria were not necessarily excluded if they seemed to perform especially well on others. These inclusion criteria were pre-planned.
 Jamie Harris, “Lessons for Consumer Behavior Interventions from the Health Behavior Interventions Literature” (forthcoming).
-5 means that if this was the only relevant evidence on this issue, I would expect this feature or format to have strong negative impacts on the participants; 0 means that I would expect it to have no impact on the participants (i.e. useless but not harmful); 1 means very low positive impacts, 2 quite low positive impacts, 3 moderate positive impacts, 4 quite high positive impacts, and 5 very high positive impacts. If the feature or format is presented in an “A vs. B” format, then positive numbers count in favour of A and negative numbers in favour of B.
 The scores were given on a possible range from 0 to 5, where 0 = no relevant evidence, 1 = very weak/uncertain, 2 = quite weak/uncertain, 3 = moderate strength/certainty, 4 = quite strong/certain, and 5 = very strong/certain evidence. With hindsight, it would have been preferable to specify more clearly what these ratings meant. For example, I could have specified that a rating of “3” meant a single randomised controlled trial of reasonable methodological quality, or similar.
 The scores were given on a possible range from 0% to 100%, where 100% means that if this was the only relevant evidence on this issue, I would make this activity type account for 100% of the programme.
 For example, several referenced:
The full list of references is available in the last tab of the “Summary of findings” spreadsheet.