Human Language & Communication: Bayes’ Theorem & Deciphering Your Boss’s Behavior
Welcome to the second article in our Human Language & Communication series. If you haven’t already, we recommend that you start with the first article in the series: Words, Meaning, & Misunderstandings.
In our previous article, we used the concepts of manifest and latent meaning to describe the literal and inferred levels of meaning contained within verbal communication. We then explored the “meaning gap” that emerges from the discrepancy between a speaker’s intended meaning and a listener’s interpreted meaning. We concluded by noting that greater precision in communication does not require completely eliminating the meaning gap, an all but impossible feat given the complexities of human communication. Instead, we argued that by acknowledging the meaning gap, we could preserve the humble curiosity required to bridge it when possible and to admit incomplete understanding when a bridge is impossible. In this article, we will describe a process by which a listener can bridge the meaning gap and decipher a speaker’s latent meaning.
We will begin our discussion today with a most improbable topic: probability. Reverend Thomas Bayes (1701-1761) was an English theologian and mathematician whose eponymous work on probability wasn’t even published until two years after his death.2 Bayes’ theorem codifies a method for updating existing beliefs when new evidence becomes available.3 A full treatment of Bayes’ theorem, frequentist interpretations of probability, and newer empirical Bayes methods is far beyond the scope of today’s discussion,3–6 so we must beg the forgiveness of any statisticians in the audience who find the following discussion incomplete or, frankly, inaccurate. We will use Harvard-trained theoretical physicist Professor Sean M. Carroll’s adaptation of Bayes’ theorem from his excellent book, The Big Picture: On the Origins of Life, Meaning, and the Universe Itself,7 to develop a Bayesian approach to bridging the meaning gap that we introduced in our previous article.
Bayes’ theorem formalizes a method for updating our degree of belief in a given proposition, that is, the probability we assign to its being true.3,7 Take, for example, the proposition: A coin toss will land heads up. Before receiving any additional information, we would likely assume that the coin in question is “fair” (ie, equally likely to land heads or tails up) and estimate a 50% probability that the coin will land heads up (ie, that the proposition is true).
The 50% probability that a tossed coin will land heads up probably seems obvious, but what should we do when we learn new information about a given proposition? For example, imagine that we learn that the coin in question is not fair but is instead weighted so that landing heads up is far more likely than landing tails up. Our belief in the proposition that the coin toss will land heads up, before we learn any new information, is called, appropriately enough, the prior probability.3,7 The new evidence that the unfair coin is more likely to land heads up enters the calculation as a likelihood.7 Incorporating the newly learned likelihood into our prior probability, we arrive at what is known as a posterior probability.3,6,7
If we were then to learn still more about the coin (ie, further likelihoods), we would simply repeat the process: we would treat our posterior probability as a new prior probability, update the new prior with the additional likelihood, and arrive at a new posterior probability. In this way, Bayes’ theorem offers a method by which we can repeatedly update our beliefs as new evidence comes to light. Because Bayes’ theorem can be difficult to grasp in the abstract, it may be helpful to look at an example with actual numbers to solidify our understanding and to explore some Bayesian nuances.
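The repeat-the-process loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Carroll’s formulation: the function name `bayes_update` is our own, the belief being tracked is “this coin is weighted toward heads,” and the 80% heads rate for a weighted coin is a hypothetical number chosen only for illustration.

```python
def bayes_update(prior, p_e_given_b, p_e_given_not_b):
    """Return the posterior P(B|E) from P(B), P(E|B), and P(E|not B)."""
    # Total probability of the evidence: P(E) = P(E|B)P(B) + P(E|not B)P(not B)
    p_e = p_e_given_b * prior + p_e_given_not_b * (1 - prior)
    return p_e_given_b * prior / p_e


# B is the (hypothetical) belief "this coin is weighted toward heads".
# Assumed: a weighted coin lands heads 80% of the time; a fair coin, 50%.
P_HEADS_WEIGHTED, P_HEADS_FAIR = 0.8, 0.5

belief = 0.5  # start undecided between "weighted" and "fair"
history = []
for toss in ["H", "H", "H"]:
    if toss == "H":
        belief = bayes_update(belief, P_HEADS_WEIGHTED, P_HEADS_FAIR)
    else:
        belief = bayes_update(belief, 1 - P_HEADS_WEIGHTED, 1 - P_HEADS_FAIR)
    # Each posterior becomes the prior for the next toss
    history.append(round(belief, 4))

print(history)  # [0.6154, 0.7191, 0.8038]
```

Each heads result nudges the belief upward, and no single toss settles the question; the belief simply accumulates evidence one update at a time.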
Let’s use the example of a fictional disease called Zebriasis to study Bayes’ theorem in action. Suppose that Zebriasis is found in about 1% of the population; if this were the case, then without any additional information, your prior probability of having Zebriasis would be 1%.
Now imagine that, despite the low probability, you are worried that you might have Zebriasis. Eventually, this worry may lead you to request a ZebraTest from your doctor. Suppose that the doctor draws your blood, runs the ZebraTest in an analyzer, and an hour later tells you that the ZebraTest returned a positive result. Is it time to freak out? Not yet. Your doctor has more to tell you.
She tells you that 95% of people who actually have Zebriasis test positive with the ZebraTest, but also that 10% of people without Zebriasis get a positive result. Now what?
You can use this new evidence—your positive test result and the likelihoods of testing positive with and without the disease—to update your prior probability and calculate a posterior probability of your actually having Zebriasis given your positive ZebraTest result.
Using the magic of Bayes’ theorem, you determine that even with your positive ZebraTest your posterior probability of actually having Zebriasis is only about 9%! How is this possible?
The exact mathematics of how one calculates posterior probabilities from prior probabilities and likelihoods is less important than the appreciation that Bayes’ theorem offers a formal method for updating existing beliefs with newly acquired evidence. For those interested in the calculation, please read on; if you’d rather not know how the sausage is made, skip the equation and the paragraph that follow.
$$P(B \mid E) = \frac{P(E \mid B)\,P(B)}{P(E)}$$
B is short for belief and, in the above example, stands for the proposition that you have Zebriasis. E is short for evidence and represents your positive ZebraTest result. The left-hand side of the equation, P(B|E), can be read as “the probability of having Zebriasis given a positive ZebraTest”; this is the posterior probability that we are trying to calculate using Bayes’ theorem. P(B) is the 1% prior probability of having Zebriasis without any additional information. P(E|B) is the likelihood of getting a positive ZebraTest if you actually have Zebriasis (ie, 95%). P(E) is the probability of getting a positive ZebraTest in general, whether or not you have Zebriasis. To derive P(E), first multiply the 1% prior probability of Zebriasis by the 95% likelihood of a positive ZebraTest if you actually have the disease (0.01 × 0.95 = 0.0095). Then multiply the probability of the alternate proposition, not having Zebriasis (ie, 99%), by the 10% likelihood of a positive ZebraTest without the disease (0.99 × 0.10 = 0.099). Adding these two products, we find that P(E), the probability of getting a positive ZebraTest in general, is 10.85%. Plugging all of these values into Bayes’ theorem gives the posterior probability of your actually having Zebriasis given your positive ZebraTest: 0.0095 / 0.1085 ≈ 8.8%, or only about 9%!
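The same arithmetic can be written as a short Python sketch; the variable names are our own, and the numbers are the ones from the Zebriasis example.

```python
prior = 0.01               # P(B): 1% of the population has Zebriasis
p_pos_given_d = 0.95       # P(E|B): positive ZebraTest if you have the disease
p_pos_given_not_d = 0.10   # P(E|not B): positive ZebraTest if you do not

# P(E): total probability of a positive ZebraTest, with or without the disease
p_pos = p_pos_given_d * prior + p_pos_given_not_d * (1 - prior)

# Bayes' theorem: P(B|E) = P(E|B) * P(B) / P(E)
posterior = p_pos_given_d * prior / p_pos

print(f"P(E)   = {p_pos:.4f}")      # P(E)   = 0.1085
print(f"P(B|E) = {posterior:.4f}")  # P(B|E) = 0.0876
```

The posterior of roughly 0.088 is the “only about 9%” figure quoted above.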
By working with actual values, we can more clearly see how inaccurate our intuitive probabilities often are. When your doctor tells you that 95% of people with Zebriasis get a positive ZebraTest, you might intuit that you have only a 5% chance of not having the disease. Bayes’ theorem reveals that this intuition dramatically underestimates the probability that you are healthy; in fact, the probability that you do not have the disease is still a whopping 91% even with a positive ZebraTest!
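Another way to see the intuitive error is to restate the percentages as counts of people. The sketch below assumes an illustrative population of 10,000 (any population size gives the same proportion) and uses integer arithmetic so the counts come out exact.

```python
population = 10_000                    # illustrative population size
sick = population * 1 // 100           # 1% prevalence -> 100 people
healthy = population - sick            # 9,900 people

true_positives = sick * 95 // 100      # 95% of sick people test positive -> 95
false_positives = healthy * 10 // 100  # 10% of healthy people test positive -> 990

all_positives = true_positives + false_positives  # 1,085 positive tests
share_sick = true_positives / all_positives

print(f"{true_positives} of {all_positives} positive tests are true positives ({share_sick:.1%})")
# 95 of 1085 positive tests are true positives (8.8%)
```

Because healthy people vastly outnumber sick ones, the 10% false-positive rate produces about ten times as many positive tests as the disease itself does.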
Now that we have introduced Bayes’ theorem, let’s return to the meaning gap and examine how the theorem can help us navigate human communication. In so doing, we will shift from informative priors (ie, priors that reflect known probabilities) to uninformative priors (ie, priors that are estimates based on the best available information).3 We acknowledge that, from a statistical standpoint, we should switch to a frequentist approach; however, for the purposes of this article we will continue to employ Bayesian methods because the exact values of the posterior probabilities are less important than the method by which we derive them.
Because we are now working with estimated rather than known priors and likelihoods, we need a set of standards to improve the reliability of our estimates:
Our first standard will be to replace continuous probability measures (eg, 0.001-99.999%) with categorical probability measures (eg, “low,” “equivocal,” and “high”).
It is important to note that the exact quantitative probabilities that we assign to each category are arbitrary and could be swapped out for any comparably low, equivocal, or high percentage. That being said, for our purposes, a low probability will be equal to 15%, an equivocal probability will be equal to 50%, and a high probability will be equal to 85%.
For example, if you are handed an unfamiliar coin and don’t know whether it is fair, you will likely estimate an equivocal (ie, 50%) prior probability that the coin will land heads up when tossed, based on your previous experience with fair coins. However, if this unknown coin keeps landing heads up toss after toss, you may need to update your estimate toward a high probability (ie, 85%) that the next toss will be heads.
Our second standard will employ the principle of parsimony.
The Franciscan friar William of Ockham is credited with articulating, in the 1300s, the principle now known as Ockham’s razor (alternatively, “Occam’s razor”).8 Ockham’s razor suggests that the simplest (ie, most parsimonious) explanation should be favored over more complex ones.9–11
Let’s flip a few more hypothetical coins to see how Ockham’s razor can help us choose among competing likelihoods. Suppose that you toss a coin 10 times and it comes up heads every time. If this coin were fair, the chance of flipping 10 heads in a row would be about 0.1% (ie, 50% raised to the power of 10). You might try to convince yourself that the coin was still fair by citing the favorable wind speed, atmospheric pressure, and lunar position as auspicious environmental conditions that dramatically increased the likelihood of the coin landing heads up. A simpler, more parsimonious explanation is that the coin is not fair and is thus, by design, more likely to land heads up. Ockham’s razor would guide you to assign a low likelihood to the convoluted environmental explanation that relies on weather systems and lunar calendars, and a high likelihood to the simpler explanation that the coin’s design made heads more likely.
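The 0.1% figure, and how strongly ten straight heads favor a weighted coin, can be checked with a quick calculation. The 80% heads rate assumed for the weighted coin is hypothetical, and so is the 50/50 starting point between “fair” and “weighted.” Note that the code compares two simple explanations on likelihood alone; it does not itself encode Ockham’s razor, which in the text is the judgment used to assign those likelihoods.

```python
p_ten_heads_fair = 0.5 ** 10      # fair coin: 1/1024, about 0.1%
p_ten_heads_weighted = 0.8 ** 10  # hypothetical weighted coin: about 10.7%

prior_weighted = 0.5              # assumed equivocal starting point
posterior_weighted = (p_ten_heads_weighted * prior_weighted) / (
    p_ten_heads_weighted * prior_weighted + p_ten_heads_fair * (1 - prior_weighted)
)

print(f"P(10 heads | fair)     = {p_ten_heads_fair:.4%}")
print(f"P(10 heads | weighted) = {p_ten_heads_weighted:.1%}")
print(f"P(weighted | 10 heads) = {posterior_weighted:.3f}")
```

Under these assumptions, ten heads are about 110 times more probable from the weighted coin than from the fair one (0.8^10 / 0.5^10 ≈ 110), so the posterior for “weighted” lands near 0.99.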
A corollary to Ockham’s razor is that actions (ie, behaviors) are more parsimonious than words; as such, an individual’s actions should be weighed more heavily than his words when we attempt to evaluate the underlying meaning and intent of a given communication. In our previous article, we quoted Mark Twain’s1 famous refrain: “Action speaks louder than words, but not nearly as often.” This witticism points to a deeper truth: words are constrained only by the speaker’s imagination, whereas actions must obey the laws of the physical world. A speaker can easily say that he is sorry, but if he proceeds to repeat the apologized-for offense, we would be justified in concluding that, whatever his words, he has not learned his lesson.
Our third standard will be to emphasize the importance of gathering additional evidence (ie, likelihoods) to increase the accuracy of our posterior probabilities.
Through a process of repeated updating, we can turn even the most biased priors into ever more accurate posterior probabilities. Suppose, for example, that you believe with a high prior probability that smoking does not cause cancer. You may be able to hold on to this belief for a short while, but if you are open to incorporating additional evidence (ie, likelihoods), it won’t be long before you have updated your belief to reflect the reality that smoking does increase your risk of cancer.
Before we examine a real-life example, let’s summarize our guiding standards for Bayesian inference in human communication analysis:
- We will constrain ourselves to low (ie, 15%), equivocal (ie, 50%), and high (ie, 85%) prior probability and likelihood estimates.
- We will follow Ockham’s razor and assign high probabilities/likelihoods to simpler explanations for verbal/behavioral communication, and low probabilities/likelihoods to overly complex explanations.
- Corollary: We will weigh actions more heavily than words.
- We will attempt to gather as much external evidence (ie, likelihoods) as possible so that we might update our priors to generate ever more accurate posterior probabilities.
With our method thus explicated, let’s turn to an example. Imagine that a manager—let’s call him Mr. Manager—unprofessionally hung up on you in a fit of frustration. Now imagine that Mr. Manager’s boss—let’s call her Ms. Boss—reprimands Mr. Manager. As a result of this reprimand, Mr. Manager calls you back to apologize.
Suppose that Mr. Manager begins this second call by saying, “I’m sorry if you were offended,” but then proceeds to rehash the reasons why his frustrations are actually your fault. We’ll stop here, and see what we can make of the communication thus far.
Giving Mr. Manager the benefit of the doubt, let’s imagine that when you picked up the phone and heard the words “I’m sorry,” you believed there was a high prior probability that Mr. Manager had called to apologize for his unprofessional behavior. However, as Mr. Manager continued to speak, his three-syllable apology was soon buried under a mountain of blame. His subsequent statements thus require you to update your high prior probability of a true apology with new likelihoods. You will likely estimate that the likelihood of such blame-laden statements coming from someone who truly regrets his behavior is low, and that the converse (ie, the likelihood of these statements coming from someone without true regret) is high. If we crunch the numbers through Bayes’ theorem, we arrive at an equivocal (ie, 50%) posterior probability that Mr. Manager is expressing true regret.
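Here is a sketch of that calculation, assuming the article’s categorical values (low = 15%, high = 85%) and reading “low” as the probability of a blame-laden follow-up if Mr. Manager truly regrets hanging up, and “high” as its probability if he does not. The variable names are our own.

```python
HIGH, LOW = 0.85, 0.15

prior = HIGH                 # "I'm sorry" -> high prior of a true apology
p_words_if_regret = LOW      # blame-laden follow-up is unlikely from true regret
p_words_if_no_regret = HIGH  # ...and likely if there is no true regret

# P(E): overall probability of hearing these statements
p_evidence = p_words_if_regret * prior + p_words_if_no_regret * (1 - prior)
posterior = p_words_if_regret * prior / p_evidence

print(round(posterior, 2))   # 0.5
```

Because the two products in P(E) are identical (0.85 × 0.15 and 0.15 × 0.85), the posterior lands exactly on the equivocal 50%.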
Notice that the posterior probability does not suddenly plummet to a low (ie, 15%) probability after only one likelihood update. However, if Mr. Manager continued to put his foot in his mouth and we updated our new prior (ie, our calculated equivocal posterior probability) with another low likelihood, then our new posterior would indeed be low (ie, 15%). A pattern becomes apparent upon repeated calculation, and this pattern provides an excellent shorthand for navigating the complexities of Bayesian inference:
- Any prior updated with an equivocal likelihood results in an unchanged posterior probability (eg, a low prior probability updated with an equivocal likelihood results in a low posterior probability)
- A low likelihood moves a prior probability one step down the probability ladder (eg, a high prior probability updated with a low likelihood results in an equivocal posterior probability)
- A low likelihood updates a low prior probability to a very low (ie, 3%) posterior probability
- A high likelihood moves a prior probability one step up the probability ladder (eg, a low prior probability updated with a high likelihood results in an equivocal posterior probability)
- A high likelihood updates a high prior probability to a very high (ie, 97%) posterior probability
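All five rules can be reproduced by running every combination of the three categories through Bayes’ theorem. This sketch assumes, as in the Mr. Manager example, that a likelihood of x for a proposition implies a likelihood of 1 − x for its converse; the helper name `update` is our own.

```python
LEVELS = {"low": 0.15, "equivocal": 0.50, "high": 0.85}


def update(prior, likelihood):
    """Bayes' theorem with the converse likelihood taken as 1 - likelihood."""
    p_e = likelihood * prior + (1 - likelihood) * (1 - prior)
    return likelihood * prior / p_e


for prior_name, prior in LEVELS.items():
    for lik_name, lik in LEVELS.items():
        posterior = update(prior, lik)
        print(f"{prior_name} prior + {lik_name} likelihood -> {posterior:.0%}")
```

The ladder has a tidy explanation in odds form: each update multiplies the prior odds by likelihood / (1 − likelihood). A high likelihood therefore multiplies the odds by 0.85 / 0.15 ≈ 5.67, a low likelihood divides them by the same factor, and an equivocal likelihood multiplies them by 1. Since 15%, 50%, and 85% correspond to odds of 0.15/0.85, 1, and 0.85/0.15, each step lands exactly on the next rung, and one step past the ends gives about 3% and 97%.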
A few points deserve emphasis. Only low or high likelihoods shift the posterior probability, and each low or high likelihood shifts a prior probability only one step down or up the probability ladder. Importantly, whether we start from a low or a high prior, two likelihood updates are enough to carry us to the other end of the probability spectrum. This final observation should reinforce the importance of holding our convictions loosely and always remaining open to updating our beliefs.
Let’s return to Mr. Manager for a final discussion. Instead of trying to determine whether he is apologetic, let’s turn to the question of why he called back to “apologize” in the first place. Mr. Manager seems to have called only to remind you that he was right and you were wrong. But is that all that is going on here?
Perhaps this is not the first time that Mr. Manager has been rude to a subordinate. If so, you might rightly hold a low prior probability that Mr. Manager genuinely regrets hanging up on you. What, then, do you make of the new information provided by his later verbal “apology”? Instead of examining the probability of a single explanation, let’s examine two competing theories.
Theory one explains the second, ostensibly apologetic phone call by citing true contrition, while theory two seeks an alternate explanation. As we examined in the preceding paragraphs, Mr. Manager’s words during the second telephone call roundly reject theory one: updating a low prior with a low likelihood, we quickly arrive at a very low probability that true contrition motivated his call. To develop theory two, let’s “mute” Mr. Manager, ignore his words, and instead study his behavior.
Mr. Manager was reprimanded by Ms. Boss, and as a result he called you back to “apologize”; however, he quickly dispelled any notion that he truly wanted to apologize. So why did he call?
A probable explanation may be that he feared further reprimands, sanctions, or consequences from Ms. Boss if he did nothing. In other words, maybe Mr. Manager called you out of fear.
Your belief (ie, prior probability) in the fear explanation should increase as alternate explanations become less likely.
Each successive alternate explanation that is rejected thus becomes a likelihood that can be used to update your prior probability that Mr. Manager’s phone call was motivated by fear.
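The idea that ruling out one explanation strengthens the others can be sketched with several competing explanations at once: multiply each explanation’s prior by the likelihood of the evidence under it, then rescale so the results sum to 1. Every number here, and the catch-all “other” explanation, is hypothetical and chosen only for illustration; the helper name `normalize` is our own.

```python
def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical priors over competing explanations for the call-back
priors = {"contrition": 0.15, "fear": 0.15, "other": 0.70}

# Hypothetical likelihoods of a blame-laden "apology" under each explanation
likelihoods = {"contrition": 0.05, "fear": 0.50, "other": 0.50}

# Bayes' theorem for several explanations: posterior is proportional to prior * likelihood
posteriors = normalize({h: priors[h] * likelihoods[h] for h in priors})
print({h: round(p, 3) for h, p in posteriors.items()})
# {'contrition': 0.017, 'fear': 0.173, 'other': 0.809}
```

Because contrition explains the blame-laden call poorly, most of its probability is redistributed to the remaining explanations, and fear rises from 15% to about 17% even though nothing about fear itself was observed.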
Suppose that you play devil’s advocate and start from a low prior probability that Mr. Manager’s behavior can be explained by fear. To update your prior, you must consider alternate explanations for Mr. Manager’s behavior, such as true contrition. As noted in the preceding paragraphs, Bayesian updating soon reveals true contrition to be a very low probability explanation.
Rejecting a single alternate explanation for Mr. Manager’s behavior may have no effect (ie, it may be treated as an equivocal likelihood) on your prior probability that fear was the primary motivator. However, after you have rejected many alternate explanations, the accumulated evidence should count as a high likelihood, updating your low prior to an equivocal posterior probability. If you then continue to eliminate alternate explanations, you will eventually find that you have updated your probability of fear as an explanation for Mr. Manager’s behavior to a very high posterior. In this way, Bayes’ theorem can be used to decide between two or more competing explanations.
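The sequence just described can be traced step by step. This sketch assumes the article’s categorical values, treats the first rejected alternative as an equivocal likelihood, and treats each later batch of rejected alternatives as a single high likelihood; how many rejections make up a “batch” is a judgment call that the code does not model.

```python
def update(prior, likelihood):
    """Bayes' theorem with the converse likelihood taken as 1 - likelihood."""
    p_e = likelihood * prior + (1 - likelihood) * (1 - prior)
    return likelihood * prior / p_e


belief_fear = 0.15  # devil's advocate: a low prior that fear explains the call
trajectory = []
# One rejected alternative (equivocal), then three batches of rejections (high)
for likelihood in [0.50, 0.85, 0.85, 0.85]:
    belief_fear = update(belief_fear, likelihood)
    trajectory.append(f"{belief_fear:.0%}")

print(trajectory)  # ['15%', '50%', '85%', '97%']
```

The equivocal update leaves the low prior untouched, and each high likelihood then climbs one rung: low, equivocal, high, and finally very high.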
As we have seen, Bayes’ theorem is an extraordinarily powerful probabilistic tool. In addition to the statistical lessons it contains, it also offers some philosophical lessons that we will highlight in our concluding remarks.
- Our statistical intuition is often biased by heuristics (ie, mental shortcuts that can produce emotional or reflexive reasoning errors), and thus can be improved using a logical method such as Bayes’ theorem.
- No matter how strongly we believe in something, with enough additional information, we should be ready to revise our belief 180 degrees to reflect new evidence.
- The rejection of an alternate explanation for a set of behaviors or events makes the remaining explanations more likely.
- Actions speak louder than words, are more difficult to use in a deceptive manner, and, as a result, are often good indicators of true motivation.
- Twain M. The Wit and Wisdom of Mark Twain: A Book of Quotations. Mineola, N.Y: Dover Publications; 1999.
- Bellhouse DR. The Reverend Thomas Bayes, FRS: A Biography to Celebrate the Tercentenary of His Birth. Stat Sci. 2004;19(1):3-43. doi:10.1214/088342304000000189.
- Efron B. Bayes’ Theorem in the 21st Century. Science. 2013;340(6137):1177-1178. doi:10.1126/science.1236536.
- Gelman A, Shalizi CR. Philosophy and the practice of Bayesian statistics. Br J Math Stat Psychol. 2013;66(1):8-38. doi:10.1111/j.2044-8317.2011.02037.x.
- Gigerenzer G, Hoffrage U. How to improve Bayesian reasoning without instruction: Frequency formats. Psychol Rev. 1995;102(4):684-704. doi:10.1037/0033-295X.102.4.684.
- van de Schoot R, Winter SD, Ryan O, Zondervan-Zwijnenburg M, Depaoli S. A systematic review of Bayesian articles in psychology: The last 25 years. Psychol Methods. 2017;22(2):217-239. doi:10.1037/met0000100.
- Carroll SM. The Big Picture: On the Origins of Life, Meaning, and the Universe Itself. 2017.
- Spade PV. The Cambridge Companion to Ockham. Cambridge University Press; 1999.
- Myung IJ, Pitt MA. Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychon Bull Rev. 1997;4(1):79-95. doi:10.3758/BF03210778.
- Green KC, Armstrong JS. Simple versus complex forecasting: The evidence. J Bus Res. 2015;68(8):1678-1685. doi:10.1016/j.jbusres.2015.03.026.
- Bonawitz EB, Lombrozo T. Occam’s rattle: Children’s use of simplicity and probability to constrain inference. Dev Psychol. 2012;48(4):1156-1164. doi:10.1037/a0026471.