The 2 sigma myth: The bar just got lowered

In 1984, the educational psychologist Benjamin Bloom published what became a highly influential study measuring the effectiveness of three kinds of instruction: conventional classroom teaching, mastery learning (in which students keep working on the material until they reach a set level of competency), and one-to-one tutoring. Unsurprisingly, he found that mastery learning and one-to-one tutoring were vastly superior to traditional classroom learning.

Bloom found that mastery learning outperformed classroom instruction by 1 sigma (one standard deviation, roughly the equivalent of going from a grade of C to a B) and that one-to-one tutoring by a human tutor raised performance by an astounding 2 sigmas (akin to going from a C to an A) [1].
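Another way to read an effect size in sigmas is as a percentile. A minimal sketch, assuming test scores are normally distributed with the same standard deviation in both groups (an illustrative assumption, not something Bloom's data guarantees): an average student shifted up by d standard deviations outscores a fraction Φ(d) of the comparison group, where Φ is the standard normal cumulative distribution function. The helper name `percentile_of_control` is hypothetical.

```python
from statistics import NormalDist


def percentile_of_control(effect_size_sigma: float) -> float:
    """Percentile of the classroom (control) group that an average treated
    student would outscore, assuming normally distributed scores with equal
    standard deviations in both groups (an illustrative assumption)."""
    return 100 * NormalDist().cdf(effect_size_sigma)


# The two effect sizes Bloom reported, relative to classroom instruction.
for label, d in [("mastery learning", 1.0), ("one-to-one tutoring", 2.0)]:
    print(f"{label}: {d:.1f} sigma -> {percentile_of_control(d):.1f}th percentile")
```

Under these assumptions, 1 sigma places the average mastery-learning student at about the 84th percentile of the classroom group, and 2 sigmas places the average tutored student at about the 98th.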

The nascent computer-aided instruction (CAI) industry adopted Bloom’s 2 sigma level as the “gold standard” to aim for when designing instructional tools. The goal was to become as effective as a human tutor while realizing the economies of scale of machine delivery, since one-to-one human tutoring is prohibitively costly to deploy widely.

AI researchers came to regard the tutorial model of pedagogy as the technique to emulate and began developing systems, called Intelligent Tutoring Systems (ITS), that could behave (and, they hoped, perform) like human tutors.

ITS designs differ significantly from their CAI predecessors. Rather than following CAI’s one-size-fits-all strategy of delivering content to a passive learner, an ITS customizes the learning experience based on factors such as the student’s pre-existing knowledge, allowing the student to progress efficiently through the instructional material.

Intelligent tutoring systems vary considerably, but most share the same overall structure: (1) a content model (also called a knowledge model, content map, expert model, or domain model) that holds the material to be learned; (2) a student model, unique to each learner, that works in parallel with the content model to record what the student does and does not understand; and (3) a pedagogical model, the method of delivering instruction to the learner.

Most intelligent tutoring systems begin the instructional process by determining what the student already knows, typically through an assessment, and then update the student model as instruction proceeds. The system compares what the student needs to know with what the student already knows (i.e., it compares the student model with the content model) and delivers the pedagogically appropriate unit of instruction. Instruction is often embedded with assessments and/or highly interactive problem-solving activities, so that the student model is continually updated to reflect the student’s current knowledge. Because the content is so fine-grained and so well matched to the student model, the ITS can offer just the right amount of remediation, theoretically yielding shorter learning times [2].
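The three-part structure and the assess, compare, and deliver loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular ITS is built: the skill names, the prerequisite graph, the 0.9 mastery threshold, the moving-average update rule, and every function and class name here are hypothetical choices made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Content model (hypothetical domain): each skill mapped to its prerequisite skills.
CONTENT_MODEL: dict[str, list[str]] = {
    "counting": [],
    "addition": ["counting"],
    "subtraction": ["addition"],
    "multiplication": ["addition"],
}

MASTERY_THRESHOLD = 0.9  # assumed cut-off for "knows this skill"
LEARNING_RATE = 0.5      # assumed weight given to the newest assessment result


@dataclass
class StudentModel:
    """Per-learner estimate (0.0 to 1.0) of how well each skill is known."""
    mastery: dict[str, float] = field(default_factory=dict)

    def knows(self, skill: str) -> bool:
        return self.mastery.get(skill, 0.0) >= MASTERY_THRESHOLD

    def update(self, skill: str, correct: bool) -> None:
        # Move the estimate part of the way toward 1 (correct) or 0 (incorrect).
        # Real systems use richer models, e.g. Bayesian knowledge tracing.
        old = self.mastery.get(skill, 0.0)
        target = 1.0 if correct else 0.0
        self.mastery[skill] = old + LEARNING_RATE * (target - old)


def next_unit(student: StudentModel) -> str | None:
    """Pedagogical model: compare the student model with the content model and
    return the first unmastered skill whose prerequisites are all mastered."""
    for skill, prereqs in CONTENT_MODEL.items():
        if not student.knows(skill) and all(student.knows(p) for p in prereqs):
            return skill
    return None  # every skill in the content model is mastered


# Usage: a learner whose initial assessment shows they already know counting.
student = StudentModel(mastery={"counting": 0.95})
for _ in range(4):
    unit = next_unit(student)           # "addition" until it is mastered
    student.update(unit, correct=True)  # assume each answer is correct
print(next_unit(student))  # addition is now mastered, so this prints "subtraction"
```

Keeping the three models separate mirrors the structure described above: the content model is shared by all learners, each learner gets their own student model, and the pedagogical model only reads the other two to choose what to teach next.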

Most CAI systems deliver up to a 0.3 sigma improvement over traditional classroom instruction, a gain that can be explained by their typical reliance on self-paced mastery learning techniques. The more ambitious intelligent tutoring systems claim to reach the 1 sigma level [3]. While these are impressive gains, both fall short of Bloom’s 2 sigma bar, the standard the world has used to judge a successful product.

But what if Bloom was overreaching when he conducted his study three decades ago? In 2011, Kurt VanLehn carefully deconstructed the 1984 study and found methodological issues that artificially inflated the results of the one-to-one tutoring condition. Students in Bloom’s mastery learning group were held to an 80% competency threshold, whereas students in the one-to-one tutoring group were held to a more rigorous 90% threshold. This difference alone ensured significantly higher outcomes for the one-to-one group [4].

The idea of rejoicing in the lowering of standards may seem heretical, but this is good news for educational technology. It would be wonderful if technology could outperform human tutors, but an unfairly high bar deprives potential students of effective tools by judging those tools against an inflated standard. For over 50 years, educators have sought technology that can help students achieve the kind of results that costly one-to-one tutoring has traditionally delivered, and the current crop of intelligent tutoring systems is just about there.

About Bill Ferster

Bill Ferster is a research professor at the University of Virginia and a technology consultant for organizations using web applications for ed-tech, data visualization, and digital media. He is the author of Sage on the Screen (2016, Johns Hopkins), Teaching Machines (2014, Johns Hopkins), and Interactive Visualization (2012, MIT Press), and has founded a number of high-technology startups in past lives. For more information, see

  1. Bloom, B. (1984). The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher, 13(6), 4–16.
  2. Shute, V., & Psotka, J. (1996). Intelligent tutoring systems: Past, present, and future. In D. Jonassen (Ed.), Handbook of Research on Educational Communications and Technology. Scholastic Publications.
  3. Kulik, J., & Fletcher, J. (2016). Effectiveness of intelligent tutoring systems: A meta-analytic review. Review of Educational Research, 86(1), 42–78.
  4. VanLehn, K. (2011). The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems. Educational Psychologist, 46(4), 197–221.