Abstract
Research was conducted to determine which features, formats, and designs of training programmes seem most likely to be cost-effective. Summaries of educational research and health behaviour research were reviewed. Additional searches of Google Scholar were conducted. A number of characteristics were identified that seem likely to enhance the effectiveness of training programmes, including the use of spaced repetition, practice, feedback, content-focused education for novices, distance learning, group education, and small group sizes.
Introduction
Animal Advocacy Careers (AAC) is a new organisation that plans to address the talent and careers bottlenecks in the effective animal advocacy (EAA) community, that is, the overlap of the animal advocacy and effective altruism communities, focused on how we can most effectively help animals. AAC will conduct small-scale trials of several possible interventions to address these bottlenecks and then focus on implementing those that seem most cost-effective. One of the interventions that will be trialled is the provision of training programmes on areas of expertise that are undersupplied relative to the needs of the community. The goal of this short literature review was to determine which features, formats, and designs of training programmes seem most likely to be cost-effective. This would then enable AAC to focus initial trials and evaluations on training programmes that more closely resemble the final products that AAC might later offer to the EAA community on a larger scale. AAC’s initial hypotheses are listed on the first tab of the “Hypotheses and updates” spreadsheet.
Methodology
This was a time-capped report. A flexible limit of 20 hours was set for initial research and note-taking (15 hours on Google Scholar searches, 5 on summaries of educational research), with 5 additional hours for clarifying the write-up of the findings. This research was only intended to secure the lowest-hanging fruit of learnings from relevant research and to identify areas that required further, more rigorous research.
Research questions
The findings of this research are separated into two topic areas:
The features and formats of possible training options, sub-divided into the various tradeoffs and considerations identified prior to the commencement of research.
Other heuristics for the design and delivery of training programmes and ideas about specific activity types that are especially promising or inadvisable.
Search strategy
Initially, summaries of educational research were reviewed. Relevant research was identified non-systematically:
Several recommendations for further reading provided by those sites were explored.
The first few pages of results for a Google search for “evidence-based teaching” were explored.
Subsequently, searches of Google Scholar were conducted, with results limited to 1990 onwards. A full list of the search terms used and brief comments on the content are provided here. The first 5 pages of results were skimmed or reviewed for each search term. Research items were identified non-systematically. That is, there were no strict inclusion and exclusion criteria, and the likely relevance of research was assessed predominantly by the phrasing of the title, rather than by reviewing the abstracts or the content of all returned results. Nevertheless, the following criteria were used to decide which items to include:
Are the findings directly applicable or at least fairly comparable to the training contexts that AAC is likely to use, e.g. training for adult professionals?
Does the research item contain (or summarise) substantial empirical findings?
Is the research item unlikely to have been made predominantly redundant by subsequent research? Relevant factors affecting this criterion include the date of publication and any impressions that I have of the thoroughness of the literature on the subtopic that it covers.
Is the item published in a peer-reviewed academic journal?
The conclusions of the research items were not grounds for exclusion. That is, research items were not (intentionally) omitted if their conclusions were surprising or contrasted with the findings of other included research. I also added in some findings from a much more thorough review of the health behaviour literature that I had conducted previously.
The scoring system
For the results relating to features and formats of training programmes, each included item of research was assigned a “value of inclusion” score. The scores were given on a possible range from -5 to +5. Each item of research was also assigned a “strength of evidence” score. The results for “design and delivery” used similar scoring systems, except that the “value of inclusion” score was replaced with a “proportion of the programme” score. Given the short timeframe for research and the small number of included studies, I opted not to use any statistical or quantitative procedures to aggregate the results from this scoring system.
Results
The full results are available in the “Summary of findings” spreadsheet. The identified research items provide evidence for the following claims, among others:
Distance learning is, on average (though not in all individual cases), similarly or more effective than face-to-face learning.
There is some evidence that group education may, on average, be similarly or more effective than individual education.
Smaller groups are preferable, but changing the size of the group only has small effects, perhaps especially for distance learning.
An effective training programme may need to be spread out over a relatively long period of time, rather than condensed into a single course without preparation and follow-up. Notably, “spaced repetition,” “practice,” and “feedback” have been highlighted by the education research literature as important components of effective education.
With less experienced participants, a training programme should have a heavier emphasis on the teaching and transmission of core content.
Participant evaluations of internships and shadowing are highly positive. Though the evidence for improvements in the career success of participants is weak, these programme types may have a variety of other benefits, such as contributing to participant wellbeing and satisfaction.
Of course, given the short amount of time spent on this research, I do not have high confidence in any of these claims. Given that the research was focused on AAC’s needs, the findings may not be as relevant for the needs of other training programmes. AAC’s view updates can be seen on the second tab of the “Hypotheses and updates” spreadsheet.
Suggestions for further research
The exploration of each of the topics considered here was very brief. A number of strategies could be used to more comprehensively check the evidence base for each claim:
Additional search terms could be added that vary from those used here.
For searches that returned a large number of studies or reviews (noted here), searches could be limited to more recent date ranges to increase the chances that the results are up-to-date.
For searches that returned a large number of studies or reviews, additional search terms could be added to limit the results to higher quality research methods (such as randomised controlled trials only, rather than observational analyses) or to meta-analyses and reviews, rather than individual studies.
Citations of the most rigorous and recent reviews or studies could be checked.
Additional research could look to more indirect evidence. For example, what effect does greater work experience have on job performance? How does this compare to greater diversity in work experiences? This could help to understand the value of shadowing and internships.
The results of a Google search for “evidence-based teaching” returned multiple pages summarising some of the key takeaways from educational research. Often, these pages provided similar summaries, frequently referencing the same sources. More detailed research could evaluate these sources and assess the strength of the evidence for their various recommendations.
This research is not intended to better understand the methods and criteria that could be used to evaluate the impact of training programmes. This will be assessed through subsequent research by AAC.
This research is not intended to better understand how to most effectively train particular skillsets, i.e. what content the training sessions or course would include. This needs to be evaluated separately and will vary by skillsets. AAC intends to conduct research into the characteristics of successful management and leadership.
Footnotes
In the end, 19 hours were spent on research (14 on Google Scholar searches, 4.5 on summaries of educational research, and 0.5 on adding in the results from my previous review of the health behaviour literature). About 9 hours were spent on the write-up, editing, updates, and discussion of the results, and an additional 12 hours were spent on initial planning and discussion. AAC had initially hoped that this research might provide insight into the methods and criteria that could be used to evaluate the impact of the programmes, but the limit of 20 hours’ worth of research proved insufficient to evaluate this thoroughly. This will be assessed through subsequent research.
I did not exclude all items that failed to meet some of the inclusion criteria if they seemed to perform especially well on others. These inclusion criteria were pre-planned.
Jamie Harris, “Lessons for Consumer Behavior Interventions from the Health Behavior Interventions Literature” (forthcoming).
-5 means that if this was the only relevant evidence on this issue, I would expect this feature or format to have strong negative impacts on the participants; 0 means that I would expect it to have no impact on the participants (i.e. useless but not harmful); 1 means very low positive impacts; 2 means quite low positive impacts; 3 means moderate positive impacts; 4 means quite high positive impacts; 5 means very high positive impacts. If the feature or format is presented as an “A vs. B” format, then positive numbers count in favour of A and negative numbers count in favour of B.
The scores were given on a possible range from 0 to 5, where 0 = no relevant evidence, 1 = very weak/uncertain, 2 = quite weak/uncertain, 3 = moderate strength/certainty, 4 = quite strong/certain, and 5 = very strong/certain evidence. With hindsight, it would have been preferable to specify more clearly what these ratings meant. For example, I could have specified that a rating of “3” meant a single randomised controlled trial of reasonable methodological quality, or similar.
The scores were given on a possible range from 0% to 100%, where 100% means that if this was the only relevant evidence on this issue, I would make this activity type account for 100% of the programme.
For example, several referenced:
John Hattie, Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement (Abingdon, UK: Routledge, 2008), which has 14,638 citations on Google Scholar.
Robert Marzano, Debra Pickering, and Jane E. Pollock, Classroom Instruction that Works: Research-based Strategies for Increasing Student Achievement (London, UK: Pearson, 2004), which has 4,811 citations on Google Scholar.
Geoff Petty, Evidence Based Teaching: A Practical Approach (Oxford, UK: Oxford University Press, 2009), which has 387 citations on Google Scholar.
References
The full list of references is available in the last tab of the “Summary of findings” spreadsheet.