## Teaching how to teach Bayes's Theorem (& covariance recognition) -- in less than 2 blog posts!

*The 14.7 billion regular readers of this blog know that one of my surefire tricks for securing genuine edification for them is for **me** to hold myself forward as actually knowing something of importance in order to lure/provoke **an actual expert** into intervening to set the record straight. It worked again! After reading my post "Conditional probability is hard -- but teaching it *shouldn't* be!," Adam Molnar, a statistician and former college stats instructor who is currently completing his doctoral studies in mathematics education at the University of Georgia, was moved to compose this great guide on teaching conditional probability & covariance detection. Score!*

**Conditional Probability: The Teaching Challenge**

A few days ago, Dan wrote a post presenting the results on how members of a 2000-person general population sample did on two problems, named BAYES and COVARY.

Dan posed the following questions:

- "Which"--COVARY or BAYES--"is more difficult?"
- "Which is easier to teach someone to do correctly?" and
- "How can it be that only 3% of a sample as well educated and intelligent as the one [he] tested"--over half had a college or postgraduate degree--"can do a conditional probability problem as simple as" he understood BAYES to be. "Doesn't that mean," he asked, "that too many math teachers are failing to *use* the empirical knowledge that has been developed by great education researchers & teachers?"

As it turns out, these are questions that figure in my own research on effective math instruction. As part of my dissertation, I conducted interviews of 25 US high school math teachers. In the interviews, I included versions of both COVARY and BAYES. My version of COVARY described a different hypothetical experiment but used the same numbers as Dan's, while BAYES had slightly different numbers (I used the version from Bar-Hillel 1980).

So with this background, I'll offer my responses to Dan's questions.

**Which is more difficult?**

According to actual results, Bayes by far.

Dan reports that 55% of the people in his sample got COVARY correct, compared to 3% for BAYES.

Other studies have shown a similar gap.

In one Dan and some collaborators conducted, 41% of a nationally diverse sample gave the correct response to a similarly constructed covariance problem. Eighty percent of the members of my math teacher sample computed the correct response.

In contrast, on conditional-probability problems similar to BAYES, samples rarely reach double digits. I got 1 correct response out of 25--*4%*--in my math-teacher sample. Bar-Hillel (1980) asked Israeli students on the college entrance exam and had *6%* correct. Only *8%* of doctors got a similar problem right (Gigerenzer, 2002).

**Teaching Covary**

Solving COVARY, like many problems, involves three critical steps.

*Step 1* is reading comprehension.

As worded, COVARY is not a long problem, but it includes a few moderately hard words like "experiment" and "effectiveness." These phrases may not challenge the "14.6 billion" readers of this blog, but they can challenge English language learners or students with limited reading skills. Even for people who know all the words, one might misread the problem.

*Step 2* is recognition. In this problem, a solver needs to compare probabilities or ratios by knowing "more likely to survive" leads to likelihood, and that likelihood involves computation, not just comparing counts. Comparing counts across a row (223 against 75) or a column (223 against 107) will lead to the wrong answer.

Taking this step involves recognizing a term, "more likely to survive". Learning the term requires work, but the US education system includes this type of problem. In the Common Core adopted by most states, standard 8.SP.A.4 states "Construct and interpret a two-way table summarizing data on two categorical variables collected from the same subjects. Use relative frequencies calculated for rows or columns to describe possible association between the two variables." High school standard HSS.CP.A.4 repeats the tables and adds independence. Although students may not study under the Common Core, and adults had older curricula, almost everyone has seen 2 by 2 tables. Therefore, teaching the term "more likely to survive" is not a big step.

*Step 3* is computation.

Dan suggested likelihood ratios, but almost all teachers will work with probabilities (relative frequencies) as mentioned in the standard. Problem solvers need to create two numbers and compare them. The basic "classical" way to create a probability is successes over total. The classical definition works as long as solvers remember to use row totals (298 and 128), not the grand total of 426. People will make errors, but as mentioned previously, most people educated in the US have some familiarity with 2 by 2 tables. Instruction is required, but the steps do not include any brand new techniques.
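
To make the computation step concrete, here is a minimal Python sketch using the counts quoted in this post (223 and 107 survivors, row totals 298 and 128). The row labels `row_1` and `row_2` are placeholders of mine, since the problem wording is not reproduced here. It contrasts the row-total denominator with the grand-total mistake.

```python
# Counts quoted in the post; the row labels are placeholders (an assumption).
survived = {"row_1": 223, "row_2": 107}
row_total = {"row_1": 298, "row_2": 128}
grand_total = sum(row_total.values())  # 298 + 128

# Classical probability: successes over the total for that row.
rate = {row: survived[row] / row_total[row] for row in survived}

# Common error: dividing by the grand total instead of the row total.
wrong_rate = {row: survived[row] / grand_total for row in survived}

more_likely = max(rate, key=rate.get)
wrong_pick = max(wrong_rate, key=wrong_rate.get)

for row in survived:
    print(f"{row}: row-total rate {rate[row]:.3f}, grand-total rate {wrong_rate[row]:.3f}")
print("Higher survival rate:", more_likely)
print("Row picked when dividing by the grand total:", wrong_pick)
```

Dividing by the grand total only rescales the raw counts, so it reproduces the same wrong answer as comparing 223 against 107 directly.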

Of the five errors in my sample, one came from misreading (*Step 1*), one came from recognition (*Step 2*) comparing 223 against 107, and three came from computation (*Step 3*) using the grand total of 426 as the denominator instead of 298 and 128.

**Teaching Bayes**

For BAYES, a conditional-probability problem, reading comprehension (*Step 1*) is more difficult than for COVARY. COVARY provides a table, while BAYES has only text. Errors will occur when transferring numbers from the sentences in the problem. Even very smart people make occasional transfer errors.

The best-performing teacher in my interviews made only one mistake--a transfer, choosing the wrong number from earlier in a problem despite verbally telling me the correct process.

As an educator, I would like to try a version of COVARY where the numbers appeared in text without the table, and see how often people correctly built tables or other problem solving structures.

*Step 2*, recognition, is easier. The problem explicitly asks for "chance (or likelihood)" which means probability to most people. Additionally, all numbers in the problem are expressed as percentages. These suggestions lead most people to offer some percentage or decimal number between 0 and 1. All the teachers in my study gave a number in that range.

*Step 3*, computation, is much, much harder.

As demonstrated in the recent sample and other research work including Bar-Hillel (1980), many people will just select a number from the problem, either the rate of correct identification or the base rate. Both values are between 0 and 1, inside the range of valid probability values, thus not triggering definitional discomfort. Neither value is correct, of course, but I am not surprised by these results. A correct solution path generally requires training.

Interestingly, the set of possible solution paths is much larger in BAYES. COVARY had probabilities and ratios; BAYES has at least eight approaches. Some options might be familiar to US adults, but none are computationally well known. In the list below, I describe each technique, comment on level of familiarity, and mention computational difficulty.

*Venn Diagrams*: A majority of adults could recognize a Venn diagram, because they are useful in logic and set theory. Mathematicians like them. Although Venn diagrams are not specified in the Common Core, they have appeared in many past math classes and I suspect they will remain in schools. I do not believe a majority of adults could correctly compute probabilities with a Venn diagram, however. Doing so requires knowing conditional probability and multiplicative independence rules, plus properly accounting for the overlapping And event. Knowing how to solve the Bayes problem with a Venn diagram almost always means one knows enough to use at least one other technique on this list, such as probability tables or Bayes Theorem. Those techniques are more direct and often simpler.

*Bayes's Theorem*: (which has several different names, including formula, law, and rule; Bayes might end with 's or ' or no apostrophe at all). If you took college probability or a mathy statistics course, you likely saw this approach. When I asked statisticians in the UGA statistics education research group to work this problem, they generally used Bayes' rule. This is not a good teaching technique, however, because the computation is challenging. It requires solid knowledge of conditional probability and remembering a moderately difficult formula. Other approaches are less demanding.
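
For readers who want to see the formula version in action, here is a small Python sketch. The numbers are assumptions of mine, not a restatement of either problem: a 10% base rate of blue buses and a witness who is right 90% of the time, chosen only because they reproduce the 9% True Blue value that comes up in this post. The function name `bayes_posterior` is likewise hypothetical.

```python
def bayes_posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem for two hypotheses: P(H | E)."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1 - prior)
    return numerator / evidence

# Assumed illustrative numbers (not taken from the problem text).
p_blue = 0.10                   # base rate of blue buses
p_says_blue_if_blue = 0.90      # witness correctly calls a blue bus blue
p_says_blue_if_green = 0.10     # witness wrongly calls a green bus blue

posterior = bayes_posterior(p_blue, p_says_blue_if_blue, p_says_blue_if_green)
print(f"P(bus is blue | witness says blue) = {posterior:.2f}")
```

Even in this compact form, the solver has to remember that the denominator sums over both ways the witness could say "blue", which is the part people tend to forget.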

*Bayesian updating*: A more descriptive name for the approach Dan wrote about, where Posterior odds = prior odds x likelihood ratio. This is even more rare than the formula version of Bayes rule; I first saw this in my masters program. Updating is easier computationally than the formula, but I would not expect untrained people to discover it independently.
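
A minimal sketch of the odds form, again under assumed numbers (10% blue buses, a witness who is right 90% of the time) that are illustrative rather than quoted from the problem. The helper names are hypothetical.

```python
def update_odds(prior_odds, likelihood_ratio):
    """Posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    """Convert odds in favor of an event into a probability."""
    return odds / (1 + odds)

# Assumed illustrative numbers (not taken from the problem text).
prior_odds = 0.10 / 0.90        # blue : green before any testimony
likelihood_ratio = 0.90 / 0.10  # P(says blue | blue) / P(says blue | green)

posterior_odds = update_odds(prior_odds, likelihood_ratio)
posterior_probability = odds_to_probability(posterior_odds)
print(f"posterior odds = {posterior_odds:.2f}, probability = {posterior_probability:.2f}")
```

The multiplication itself is a single step; the extra work lies in converting between probabilities and odds at each end.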

*Probability-based tables*: Many teachers attempted this method, with some reaching a usable representation (but none correctly selected numbers from the table). This method requires setting up table columns and rows, and then using independence to multiply probabilities and fill entries. After that, the solver needs to combine values from two boxes (True Blue and False Blue) to find the total chance that Wally perceived a blue bus, and then find the true blue probability by dividing True Blue / (True Blue + False Blue). Computation requires table manipulation, understanding independence, and knowing which numbers to divide. Choosing the correct boxes stumped the teachers most often. They tended to just answer the value of True Blue, 9% in this version.

This approach was popular because it involves tables and probabilities, ideas teachers and students have seen. Independence is also included in the Common Core. Thus, it's not too far a stretch. The problem is difficulty, between building the table using multiplicative probability and then combining boxes in a specific way. Other approaches are easier.
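
The table method can be sketched in a few lines of Python. The inputs (10% blue buses, 90% witness accuracy) are assumed values picked because they yield the 9% True Blue figure mentioned above; the variable names are mine.

```python
# Assumed illustrative numbers (not taken from the problem text).
p_color = {"blue": 0.10, "green": 0.90}
p_correct = 0.90

# Fill the four boxes, keyed by (actual color, reported color).
joint = {}
for color, p in p_color.items():
    other = "green" if color == "blue" else "blue"
    joint[(color, color)] = p * p_correct        # witness reports the true color
    joint[(color, other)] = p * (1 - p_correct)  # witness reports the wrong color

true_blue = joint[("blue", "blue")]
false_blue = joint[("green", "blue")]

# The step that stumped most teachers: combine BOTH "says blue" boxes.
answer = true_blue / (true_blue + false_blue)

print(f"True Blue = {true_blue:.2f}, False Blue = {false_blue:.2f}")
print(f"P(actually blue | says blue) = {answer:.2f}")
```

Stopping at `true_blue` gives the 9% answer many teachers offered; the final division is what turns a box in the table into the conditional probability.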

*Probability-based trees*: The excellent British mathematics teaching site NRICH has an introduction. AP Statistics students frequently learn tree diagrams. Some teachers used them, including the one teacher who got the explanation completely correct. Several other teachers made the same mistake as with probability tables; they built the representation, but only gave the True Blue probability and neglected the False Blue possibility.

Although trees are mentioned briefly in the Common Core as one part of one Grade 7 standard, I don't expect trees to become a popular solution. Because they were uncommon in the past, few (but not zero) non-teacher adults would attempt this approach.

*Grid representations*: Dan cited a 2011 paper by Spiegelhalter, Pearson, and Short, but the idea is older. A reference at Illuminations, the NCTM's US website for math teaching resources, included a 1994 citation. The idea is to physically color boxes representing possibilities, which allows one to find the answer by counting boxes. At Georgia, we've successfully taught grid shading in our class for prospective math teachers. It works well and it's not very difficult. One study showed that 75% of pictorial users found the correct response (Cosmides & Tooby, 1996). Unfortunately, it's never been part of any standards I know. It also requires numbers expressible out of 100, which works in this problem but not in all cases.
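
As a rough text-only stand-in for a shaded grid, the sketch below lays out 100 boxes and counts them. The percentages (10% blue buses, 90% witness accuracy) are assumptions for illustration, and the letter codes are my own: `B` is a blue bus called blue, `b` a blue bus called green, `G` a green bus called blue, and `g` a green bus called green.

```python
TOTAL = 100
PCT_BLUE = 10     # assumed base rate of blue buses
PCT_CORRECT = 90  # assumed witness accuracy

blue = TOTAL * PCT_BLUE // 100
green = TOTAL - blue
blue_called_blue = blue * PCT_CORRECT // 100
green_called_blue = green - green * PCT_CORRECT // 100

# One character per box; uppercase marks "witness says blue".
cells = (
    ["B"] * blue_called_blue
    + ["b"] * (blue - blue_called_blue)
    + ["G"] * green_called_blue
    + ["g"] * (green - green_called_blue)
)
rows = ["".join(cells[i:i + 10]) for i in range(0, TOTAL, 10)]
grid = "\n".join(rows)
print(grid)

# Finding the answer is now just counting boxes.
shaded = grid.count("B") + grid.count("G")
answer = grid.count("B") / shaded
print(f"{grid.count('B')} of the {shaded} 'says blue' boxes are actually blue")
```

The integer division only works cleanly because these percentages land on whole boxes, which is the limitation of a 100-box grid noted above.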

*Frequency-based tables*: In the 1990s, psychological researchers started publishing about a major realization: Frequency counts are more understandable than probabilities. Classic papers include Gigerenzer (1991) and Cosmides & Tooby (1996). The basic idea is to convert probabilities to frequencies by starting with a large grand total, like 1000 or 100,000, and then multiply probabilities to find counts. The larger starting point makes it likely that all computations result in integers, one problem in grid representation.

After scaling, the solver can form a table. In this problem, getting from the table to the correct answer still requires work, as one must know to divide True Blue / (True Blue + False Blue) as in the probability-based table. I know one college textbook with a "hypothetical hundred thousand table", Mind on Statistics by Utts and Heckard, which has included the idea since at least 2003. There are many college statistics textbooks, though, and frequency-based tables do not appear in US school standards. They are not commonly known.
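
Here is a short sketch of a frequency table built from a hypothetical 1,000 buses. The 10% base rate and 90% accuracy are assumed for illustration, not quoted from the problem.

```python
TOTAL = 1000      # hypothetical number of buses
PCT_BLUE = 10     # assumed base rate of blue buses
PCT_CORRECT = 90  # assumed witness accuracy

blue = TOTAL * PCT_BLUE // 100
green = TOTAL - blue

# Rows are the actual color; columns are what the witness reports.
table = {
    "actually blue": {"says blue": blue * PCT_CORRECT // 100},
    "actually green": {"says green": green * PCT_CORRECT // 100},
}
table["actually blue"]["says green"] = blue - table["actually blue"]["says blue"]
table["actually green"]["says blue"] = green - table["actually green"]["says green"]

for row, counts in table.items():
    print(f"{row:>14}: says blue {counts['says blue']:>4}, says green {counts['says green']:>4}")

true_blue = table["actually blue"]["says blue"]
false_blue = table["actually green"]["says blue"]
answer = true_blue / (true_blue + false_blue)
print(f"{true_blue} of the {true_blue + false_blue} 'says blue' buses are actually blue")
```

Every entry is a whole number of buses, but the solver still has to pick the "says blue" column and divide within it.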

*Frequency-based trees*: Because tables don't make it obvious which boxes to select, a tree-based approach can combine the natural intuition of counts and the visual representation of trees. This increases teaching time because students are less familiar with trees. In exchange, the problem becomes easier to solve. This might be the most effective approach to teach, but it's very new. Great Britain has included frequency trees and tables in the 2015 version of GCSE probability standards for all Year 10 and 11 students, but they have not appeared in schools on this side of the pond.
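
A frequency tree can be sketched as nested counts. As before, the inputs (1,000 hypothetical buses, 10% blue, 90% witness accuracy) are illustrative assumptions, and `build_tree` and `render` are names I made up for this example.

```python
def build_tree(total, pct_blue, pct_correct):
    """Return nested (label, count, children) tuples of whole-number counts."""
    blue = total * pct_blue // 100
    green = total - blue
    blue_right = blue * pct_correct // 100
    green_right = green * pct_correct // 100
    return ("all buses", total, [
        ("actually blue", blue, [
            ("Wally says blue", blue_right, []),
            ("Wally says green", blue - blue_right, []),
        ]),
        ("actually green", green, [
            ("Wally says green", green_right, []),
            ("Wally says blue", green - green_right, []),
        ]),
    ])

def render(node, depth=0):
    """Indent each level of the tree so the branching is visible."""
    label, count, children = node
    lines = ["    " * depth + f"{label}: {count}"]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

tree = build_tree(1000, 10, 90)  # assumed illustrative numbers
print("\n".join(render(tree)))

# Collect every "Wally says blue" leaf, keyed by the branch it sits on.
says_blue = {
    branch_label: leaf_count
    for branch_label, _, leaves in tree[2]
    for leaf_label, leaf_count, _ in leaves
    if leaf_label == "Wally says blue"
}
answer = says_blue["actually blue"] / sum(says_blue.values())
print(f"P(actually blue | says blue) = {answer:.2f}")
```

Because the two "Wally says blue" leaves are labeled identically on separate branches, the tree makes it visually hard to overlook the False Blue case.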

**The Teaching Challenge**

Neither COVARY nor BAYES is easy, because both require expertise beyond what was previously taught in K-12 schools.

In the current US system, looking at Common Core and other standards, COVARY will be easier to teach. COVARY requires less additional information because it can extend easily from two ideas already taught, count tables and classical relative frequency probability. It fits very well inside the Common Core standards on conditional probability.

BAYES has lots of possible approaches. Some, like grid representations and frequency trees, are less challenging than COVARY. But they are relatively new in academic terms. Many were developed outside the US and none extend easily from current US standards. I'm not even sure the sort of conditional-probability problem reflected in BAYES should be considered under Common Core (unlike the new British GCSE standards), even though I believe decision making under conditional uncertainty is a vital quantitative literacy topic. Most teachers and I believe it falls under AP Statistics.

Furthermore, educational changes take a lot of time. Hypothetically (lawyers like hypotheticals, right?), let's say that today we implement a national requirement for conditional probability. States would have to add it to their standards documents. Testing companies would need to write questions. Textbook publishers would have to create new materials. Schools would have to procure the new materials. Math teachers would need training; they're smart enough to handle the problems but don't yet have the experience.

The UK published new guidelines in November 2013 for teaching in September 2015 and exams in June 2017. In the US? 2020 would be a reasonable target.

Right now, Bayes-style conditional probability is unfamiliar to almost all adults.

In Dan's sample, over half had a college degree. That's nice, but that doesn't imply much about conditional probability.

The CBMS reports on college mathematics and statistics. A majority of college grads never take statistics. In 2010, there were about 500,000 enrollments in college statistics classes, plus around 100,000 AP Statistics test takers, but there were about 15,000,000 college students. (For comparison, there were 3,900,000 mathematics course enrollments.) Of the minority that take any statistics, most people take only one semester. Conditional probability is not a substantial part of most introductory courses; perhaps there would be 30 minutes on Bayes' rule.

Putting this together, less than 10% of 2010 college students covered conditional probability. Past numbers would not be higher, since probability and statistics have recently gained in popularity.
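
The arithmetic behind that estimate can be checked quickly. The sketch below uses only the rounded 2010 figures quoted above and makes one simplifying assumption: every statistics enrollment and every AP test taker is counted as a distinct college student. That assumption makes the share an overestimate if anything.

```python
# Rounded 2010 figures quoted in the post.
stats_enrollments = 500_000
ap_stats_takers = 100_000
college_students = 15_000_000

# Simplifying assumption: each enrollment or test taker is a distinct student.
share_with_any_stats = (stats_enrollments + ap_stats_takers) / college_students
print(f"Share with any statistics exposure: {share_with_any_stats:.1%}")
```

Even under this generous count, the share stays well below the 10% ceiling.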

I think it's fair to say that less than 5% of the US adult population has ever covered the topic--making that 3% correct response rate sound logical.

In an earlier blog post, Dan wrote "If you don't get Bayes, it's not your fault. It's the fault of whoever was using it to communicate an idea to you." Yes, there are better and worse ways to solve Bayes-style problems. Teachers can and should use more effective approaches. That's what I research and try to help implement. But for the US adult population, the problem is not poor communication; rather, it's never been communicated at all.

**References**

Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychologica, 44, 211-233.

Cosmides, L., & Tooby, J. (1996). Are humans good intuitive statisticians after all?: Rethinking some conclusions of the literature on judgment under uncertainty. Cognition, 58, 1-73.

Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond "heuristics and biases". In W. Stroebe & M. Hewstone (Eds.), European Review of Social Psychology (Vol. 2, pp. 83-115). Chichester: Wiley.

Gigerenzer, G. (2002). Calculated risks: How to know when numbers deceive you. New York: Simon & Schuster.

Spiegelhalter, D., Pearson, M., & Short, I. (2011). Visualizing uncertainty about the future. Science, 333, 1393-1400.

Utts, J., & Heckard, R. (2012). Mind on Statistics, 4th edition. Independence, KY: Cengage Learning.

## Reader Comments (4)

Both treated and untreated participants in your disease experiment show a survival rate considerably greater than the expected more than 50% death rate for the untreated. It is a really wonderful experiment since multiple possible conclusions may be considered.

The design protocol may have been mis-specified as to length. The protocol was violated after the start of the experiment by shortening the time-frame of the experiment. The least advanced cases might have been singled out to be in the untreated group.

I can only draw one firm conclusion. Do not hire these experimenters in the future. :)

@Bob:

Or at least hire careful proofreaders to spot a glitch like the inconsistency between the info in the introduction & the reported results in the "untreated" condition. Embarrassing error on my part in adapting the problem for the battery of items I administered.

Fortunately, the version of the problem that Adam used didn't have this defect.

The "covariance problem" is a favorite of scholars who study critical reasoning & cognitive reasoning dispositions. For classic studies, see

Wasserman, E. A., Dorner, W. W., & Kao, S. F. (1990). Contributions of Specific Cell Information to Judgments of Interevent Contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16(3), 509-521 ; and

Arkes, H. R., & Harkness, A. R. (1983). Estimates of Contingency Between Two Dichotomous Variables. J. Experimental Psychol., 112, 117-135.

Just a general question about Bayesian questions of the sort you described:

Wouldn't the question do better with some kind of primer to remind the test taker to think about conditional probability? So, for example, tack on at the end "keeping in mind that Wally may have seen a green bus and thought it was blue."

I worry when I see the run-of-the-mill phrasing in questions about Bayesian reasoning being used to support propositions like "people are bad at conditional probability." They certainly might be, and it's obviously not intuitive. But I think it would be valuable to see if people do any better on the test questions if primed to think conditionally.

Ryan,

yes, your suggestion would likely improve the quality of responses. When working with the teachers, many gave me just the "actually blue and says blue" probability. My standard hint was to say "what are the ways [Wally] could have said blue?" which got several teachers to the correct answer.

If you want someone to test the wording, feel free to donate and I'll do it!

A reminder isn't included to try to increase realism. Written questions are not real life scenarios, of course. They try to represent reality, where there aren't hints about possible alternate causes. Therefore, there isn't a reminder in most questions. Getting people to remember the reminder - like "correlation is not causation" - is part of training.