Info We Trust

  LEWIS CARROLL, 1893

  Without Truth

  Why do we even put up with these abstract expressions of trust? Well, precision is expensive. Absolute certainty is often impossible. You cannot measure the entire universe, so you look at a slice, a subset—a shadow of the world that has already passed—and learn to make do with what the sample tells you. Sampling theory is not only important to data storytelling, it is also wrapped up with the entire epic saga we call science.

  Science progresses on the principle of falsifiability. Karl Popper's landmark The Logic of Scientific Discovery explains, “Science never pursues the illusory aim of making its answers final.” A scientific claim is a statement that could be proven wrong. In contrast, existential claims are unscientific because they cannot be falsified. “There is” statements, such as “there are blue giraffes,” can never be proven wrong: we cannot search the entire universe to check whether they exist, have never existed, and will never exist. Science cannot help us evaluate these kinds of statements.

  Science is a method of rejecting carefully qualified opinions with new evidence. As exceptions are discovered, general claims get knocked down to more specific cases, or abandoned entirely. There are no truths in science because a truth cannot be overthrown. Science trades in corroboration, truthfulness, and trust — not absolute truth.

  If there is no possible way to determine whether a statement is true then that statement has no meaning whatsoever. For the meaning of a statement is the method of its verification.

  FRIEDRICH WAISMANN, 1930

  There is also no absolute precision. Even simple physical measurements, the wellspring of so much of our data, are not absolutely precise. When we measure the length of an object with a ruler, the number we report falls somewhere along the narrow width of a single tick or between two tick marks. In either case, the single number recorded actually represents a range. We could narrow the range by measuring with more precise equipment, but we do not bother because our measurement is good enough as is. Measurements are not absolutely true, but they are true enough for what we need.

  Science does not rest upon solid bedrock. The bold structure of its theories rises, as it were, above a swamp. It is like a building erected on piles. The piles are driven down from above into the swamp, but not down to any natural or “given” base; and if we stop driving the piles deeper, it is not because we have reached firm ground. We simply stop when we are satisfied that the piles are firm enough to carry the structure, at least for the time being.

  KARL POPPER, 1935

  A scientific view is interested in claims that can be proven false. A scientific view embraces numeric measurements as ranges. A scientific view knows our goal cannot be absolute certainty. That is why we must consider expressions like error, confidence, and significance. When we acknowledge the limits of truthfulness, we are able to offer stories as worthier of trust. When you grapple with uncertainty, you put yourself directly in the middle of the struggle between the world as it really is—reality—and the world as we perceive it. We strive to bring them into a little more harmony.

  If I were to suggest that between the Earth and Mars there is a china teapot revolving about the sun in an elliptical orbit, nobody would be able to disprove my assertion provided I were careful to add that the teapot is too small to be revealed even by our most powerful telescopes.

  BERTRAND RUSSELL, 1952

  With Certainty

  I recently went to an evening lecture about Enlightenment thinking with a friend. Before the talk started, she asked me what I thought the ratio of men to women attendees was. The effort of counting everyone in the theater would be silly—we did not need an exact ratio to satisfy our curiosity. So instead, we each counted the men and women in a couple of sections and together determined that the room was about two-thirds men. The estimate from this quick sample led us to then discuss why there might be an imbalance.

  No theory that involves just the probabilities of outcomes without considering their consequences could possibly be adequate in describing the importance of uncertainty to a decision maker. It is necessary to be concerned not only with the probabilistic nature of the uncertainties that surround us, but also with the economic impact that these uncertainties will have on us.

  RONALD A. HOWARD, 1966

  Statistical confidence relates a random sample to the actual larger universe it came from. Measuring the entirety of anything is expensive, often impossible. To get a sense of a candidate's chances you do not call every voter with polling questions. A scientist's experiments cannot continue forever. Even if you could record a complete snapshot of the entire universe, where would you store it? In fact, if we learn how to have confidence in the relationship between the sample and the true world, then we do not have to measure the entire universe. In 1937, statistics pioneer Jerzy Neyman explained this sample as an “estimate, which presumably does not differ very much from the true value of the numerical character.”
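  To see how far a modest sample can carry us, here is a minimal sketch in Python. The electorate, its size, and the 54 percent share are invented purely for illustration, and the helper name estimate_share is hypothetical. It draws a simple random sample of 1,000 voters and compares the sample's share with the share in the full population.

```python
import random


def estimate_share(population, sample_size, rng):
    """Estimate the share of True values from a simple random sample."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical electorate (an assumption, not data from the text):
# 100,000 voters, 54% of whom favor the candidate.
voters = [True] * 54_000 + [False] * 46_000

true_share = sum(voters) / len(voters)
estimate = estimate_share(voters, 1_000, rng)

print(f"true share:      {true_share:.3f}")
print(f"sample estimate: {estimate:.3f}")
```

  The estimate will rarely match the true share exactly, but calling one voter in a hundred is usually enough to land close to it.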

  Clinical significance describes the practical importance of a treatment's effect on daily life.

  Confidence is expressed as a duo: level and interval. The confidence level is a percentage, often 95 percent, picked prior to analysis according to some cultural norm. The confidence interval is a range of values, calculated using the predetermined level and the sample data. The interval is supposed to help us understand the range within which the actual parameter is estimated to occur. Supposed to.

  A 95% confidence interval conveys that there is a 95% probability that the calculated confidence interval from some future experiment encompasses the true value of the population parameter. It is a ratio of the number of imaginary confidence intervals that contain the true value to all imaginary confidence intervals.

  A 95 percent confidence level is not the probability that the actual parameter lies within the confidence interval. Instead, it is a probability statement about future imaginary confidence intervals. Like I warned, statistical confidence contains imaginary futures and layers of abstraction. Just like the chances of two candidates winning an election, confidence quickly becomes a comparison game. We only really make any sense of confidence once we are able to contrast one interval against a cultural norm or other competing range.
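  Those imaginary futures can be made a little less imaginary with a simulation. The sketch below is an illustration under stated assumptions, not a recipe from any particular study: it assumes a normally distributed population with an invented mean of 170 and standard deviation of 10, and it uses the common normal approximation (z = 1.96) to build each interval. It repeats the experiment 2,000 times and counts how often the interval captures the true mean.

```python
import random
import statistics


def confidence_interval(sample, z=1.96):
    """Approximate 95% interval for the mean (normal approximation)."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - z * standard_error, mean + z * standard_error


def coverage(true_mean, true_sd, sample_size, experiments, rng):
    """Share of repeated experiments whose interval contains the true mean."""
    hits = 0
    for _ in range(experiments):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        low, high = confidence_interval(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / experiments


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population values, chosen only for illustration.
share = coverage(true_mean=170.0, true_sd=10.0,
                 sample_size=100, experiments=2_000, rng=rng)

print(f"{share:.1%} of the intervals contain the true mean")
```

  The share should hover near 95 percent. Notice what the number describes: the long-run behavior of the procedure across many experiments, not the odds for any single interval you happen to hold in your hand.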

  The credible interval is the Bayesian analog to the frequentist confidence interval. It determines the probability of a parameter falling within a predefined range of values.

  We often mention the confidence level in a footnote and depict the confidence interval directly on the graphic. Suddenly, we are back home as we compare tall and short bars—but do not miss how conceptually convoluted what you are comparing is. The confidence interval may do a good impression of Tukey's box plot, but what it indicates is far more abstract.

  The confidence interval around the mean is often represented in the same way a box plot displays the interquartile range around the median.

  Significance characterizes truthfulness based on how likely a finding is merely a product of random chance. It works by a kind of reverse logic called reductio ad absurdum: If it were not true, things would be absurd. Significance is often used to evaluate the relationship, or correlation, between two data variables in pursuit of making predictions about the future. But, an observed correlation does not necessarily make for an interesting correlation.

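  Significance level is the complement of the confidence level: significance level = 1 − confidence level. A 95 percent confidence level therefore pairs with a 5 percent significance level.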

  Imagine that childbirths increase in the spring, the same time of year that migrating storks return. This does not indicate that the birds brought the babies. Spurious correlations occur when events appear to be causally related due to coincidence or the presence of an unseen factor, called a lurking variable. Children's intelligence is not caused by the size of their feet. But, reading ability and shoe size both increase as a child grows up. It may appear that A causes B, or B causes A, when Z causes both. Connectivity does not have to dip into the illogical for us to question causation. Chicken-and-egg interdependence makes one unsure of which variable is responsible for the other. Writer Darrell Huff described, “The more money you make, the more stock you buy, and the more stock you buy, the more income you get; it is not accurate to say simply that one has produced the other.”
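  The shoe-size example can be sketched with made-up numbers. Everything in the simulation below is an assumption for illustration: the age range, the noise levels, and the straight-line relationships are invented, and the helper names pearson and residuals are hypothetical. Age (Z) drives both shoe size (A) and reading score (B). The raw correlation between A and B looks impressive, but it fades once the effect of age is removed from each.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """What is left of ys after removing a least-squares line fit on zs."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [y - (mean_y + slope * (z - mean_z)) for z, y in zip(zs, ys)]


rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Invented data: age is the lurking variable behind both measurements.
ages = [rng.uniform(5, 12) for _ in range(5_000)]
shoe_sizes = [age + rng.gauss(0, 1) for age in ages]
reading_scores = [10 * age + rng.gauss(0, 10) for age in ages]

raw = pearson(shoe_sizes, reading_scores)
adjusted = pearson(residuals(shoe_sizes, ages),
                   residuals(reading_scores, ages))

print(f"shoe size vs. reading, raw:              {raw:.2f}")
print(f"shoe size vs. reading, age accounted for: {adjusted:.2f}")
```

  The first number is large and the second sits near zero. Feet and reading move together only because both follow age.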

  All this shows how important it is to keep your wits sharp and not forget common sense as you voyage through statistical truths. So, what about some of the other measures associated with numerical trust? The margin of error is just half the width of the confidence interval. As with its parent interval, a smaller margin of error indicates more confidence. Say you were sailing the Caribbean and knew a treasure was buried on one of 10 possible islands. One spy tells you he can help narrow it to five of these islands. Another spy tells you she can help you narrow the hunt to just two of the possible 10 islands. She is more confident, and having fewer options on the chart is good if you are racing to find the treasure.
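  As a worked illustration, return to the lecture hall where about two-thirds of the audience were men. The head counts below are hypothetical, since the story never recorded them, and the formula is the usual normal approximation for a proportion, which assumes a random sample (a couple of convenient sections is only a rough stand-in). The sketch shows that the margin of error is half the width of the interval, and that counting four times as many people cuts the margin in half.

```python
import math


def margin_of_error(successes, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)


def interval(successes, n, z=1.96):
    """Confidence interval: the estimate plus or minus the margin."""
    p = successes / n
    m = margin_of_error(successes, n, z)
    return p - m, p + m


# Hypothetical counts: 40 men among 60 people, then 160 among 240.
small_margin = margin_of_error(40, 60)
large_margin = margin_of_error(160, 240)
low, high = interval(40, 60)

print(f"60 counted:  2/3 men, give or take {small_margin:.3f}")
print(f"240 counted: 2/3 men, give or take {large_margin:.3f}")
print(f"interval width for 60 counted: {high - low:.3f}")
```

  With 60 people counted, the margin is roughly 12 percentage points; with 240 it is roughly 6. More counting buys a narrower range, but each halving costs four times the effort.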

  Reliability is a coefficient that indicates consistency or agreement. How reliability is calculated depends on the situation.

  Sigma (the Greek letter σ), also known as the standard deviation … or the square of sigma, which is known as the variance or σ² … all mean pretty much the same thing. Basically, they all measure how wide the distribution of an uncertain number is… The wider the distribution is, the greater is the possible variation and the higher the value of sigma and the variance.

  SAM SAVAGE, 2009
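  A small sketch makes the relationship concrete. The two lists below are invented for illustration: both are centered on 50, but one is spread ten times as wide as the other. Python's standard statistics module computes the population standard deviation (pstdev) and population variance (pvariance).

```python
import statistics

# Two made-up sets of values with the same mean but different spread.
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

sigma_narrow = statistics.pstdev(narrow)
sigma_wide = statistics.pstdev(wide)
variance_narrow = statistics.pvariance(narrow)
variance_wide = statistics.pvariance(wide)

print(f"narrow: sigma = {sigma_narrow:.2f}, variance = {variance_narrow}")
print(f"wide:   sigma = {sigma_wide:.2f}, variance = {variance_wide}")
```

  The wider list has the larger sigma, and in both cases the variance is simply sigma squared.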

  On the number line, shorter ranges indicate more confidence. But a visual representation that is physically small and meaningfully large goes against the bigger-is-more-important convention. We should want our eye to be grabbed by things we are more confident in, but here the opposite occurs. We must rise to the challenge of spotlighting confidence using visual channels beyond size. Try playing with density, as if the true position is lost in a blurry haze and we can merely paint a picture of the cloud.

  The truthful art: Truth is unattainable, but trying to be truthful is a realistic and worthy goal.

  ALBERTO CAIRO, 2016

  If a man will begin with certainties, he shall end in doubts; but if he will be content to begin with doubts he shall end in certainties.

  FRANCIS BACON, 1605

  Remember how troublesome the mean (average) is. It is often packaged with certainty-inducing qualifiers or distribution summary metrics, such as the variance or standard deviation. Remember that summaries, especially nongraphical summaries, reduce. If you are going to accept any of these intimidating statistical qualifiers without a picture, you better be able to supply the right context that lets you make a real human comparison. Otherwise, do not give blind faith to what looks like an impressive fact.

  The only certainty is that nothing is certain.

  PLINY THE ELDER, 79

  Certainty, our last term, is a generic word that has no consistent meaning across fields. Yet, our craft is often called upon to communicate uncertainty. We now know what a complicated, challenging, and worthwhile task this is. In 2009 Howard Wainer taught that effective display of data accuracy must:

  Remind us that the data contains some uncertainty

  Characterize the size of the uncertainty with respect to the inferences made

  Help us avoid incorrect conclusions through the lack of a full appreciation of the imprecision of our knowledge

  It's time to leave behind any presumption of absolute control and universal truth and embrace an informed depiction of the big numbers and small imperfections that work together to describe reality.

  GIORGIA LUPI, 2017

  Statistical trust reveals how tricky truthfulness can be. It is difficult to meaningfully illustrate some of the basic concepts that statistics uses to convey trust. Be careful you are not dazzled by their abstraction as you qualify how findings relate to past and future worlds. We cannot ignore statistical trust. Uncertainty is a critical part of the message that must be conveyed.

  Return of the Hero

  At a certain point in any voyage of discovery, the ship's hull can fit no more. It must sail home to tell stories of adventures and deliver precious cargo. We explored how to make meaningful pictures with data using position, size, and color. Then we got a better sense of how to conjure more interesting comparisons and patterns, with an eye for understanding what truthfulness is all about. We have now circumnavigated the probe-humanize cycle.

  So, how do you know when it is time to step out of the cycle, and turn your attention toward shore? In the best of circumstances, you discover something so exciting that you cannot bear keeping it to yourself. Full of energy, you rush to tell the world. Often though, you will have merely attained a personal comfort in your familiarity with the data. You develop a good sense that it does not have any more secrets to share right now. Sometimes you just run out of time and have to push on, unable to carry all of the treasure home.

  We shall not cease from exploration / And the end of all our exploring / Will be to arrive where we started / And know the place for the first time.

  T.S. ELIOT, 1942

  John Tukey used to end discussions with: “I am convinced that this is as good as we can do so far.” Inquiries into data, like so many creative endeavors, are never completed, just abandoned (or sometimes, taken away). By the end of it all, you will know your data better, perhaps better than anyone ever has. You will know what it can and cannot tell you about the world. Perhaps it has revealed something astonishing. Our attention will now turn from exploring data and toward informing the world.

  Now, a little surprise. So far, we have clutched our data and methodically advanced toward the moment when we inject it into the world. Before we do, let us first step ahead and marvel at the rich landscape of information already there, serving your authentic experience as a creature of the world. Your embodied cognition powers successful data storytelling, but it also delights in dancing, music, and laughter. It is time we flip our perspective.

  All of the world, in some way, is serving you signals all the time. Most of them are not data-driven insights; very little of the world arrives on a statistical platter. Yet, you crave it all. Broader human experience has lots to teach data storytellers. We will now sample a handful of the many ways we receive all kinds of information. Doing so will help us examine higher concepts that relate to our own craft, such as engagement, emotion, and explanation. It will also reorient our perspective from inward data facing to outward people facing. Data storytelling is often an insulated practice. It is just you and the data and maybe a small team. But it is ultimately a people-facing art. Each of the following five chapters is a focused sample of a specific, non-data domain that can illuminate our craft. Some connections will be made directly back to data storytelling, but I can promise you that it would be impossible to catalog them all. There is a world of rich experience for you to draw from.

  CHAPTER 11

  ENCOUNTER

  A cool marble doorway separates me from a velvety black space. I step in. Walking forward, I look up and notice I cannot see the ceiling, there is just inky black. As my eyes adjust to the darkness, a dozen halos of light sharpen. Approaching the nearest one, I see it is an exquisitely lit glass box with a small clay vase inside. It is decorated with iconic geometry in madder red, burnt orange, and black. Not much bigger than my hand, the vase is at the perfect height for me to view.

  The atmospheres of some museum galleries invoke temples of worship and their spiritual predecessor, the cave. Museums, in fact, are temples. Ancient Alexandria's Temple of the Muses, called the Musaeum or Mouseion, was a vast institute that attracted many scholars. Today, behind-the-scenes activities liken the museum to a library or archive. On display, museum exhibitions compete with other entertainment outings, such as going to the movies. This public-facing function is the side that interests us most.

  Museum curators and exhibit designers create spaces that manage visitor attention. Where maps imitate and reflect the real world of experience, museums actually do it. They are real-world experiences. Examining how museums inform visitors can inspire better data storytelling. The principal distinction between our craft and museum design is just the nature of the canvas and the dimensionality of the data. Artifacts and objects, like the small vase in the dark gallery, are the museum's raw data. New acquisitions are studied, cataloged, preserved, and, mostly, tucked away in storage.

  There they wait in climate-controlled boxes and racks for a probing researcher or creative curator who might need them.

  Objects are where we deposit information. … For tens of thousands of years we have embodied information in solid objects, from arrows and spears to espresso machines and jetliners. More recently, we have learned to embody information in photons transmitted by our cellphones and wireless routers. Yet, what is most amazing about the information that we embody is not the physicality of the encasing but the mental genesis of the information that we encase. Humans do not simply deposit information in our environment, we crystallize imagination.

  CÉSAR HIDALGO, 2015

  Each museum object is an information vessel. A physical description, such as the vase's size, color, and material, is the foundation for many types of data. Archeology is layered on to these physical features. The vase I admire was found in southern Italy, but from its material composition we are sure it was created in Greece, probably Attica, 25 centuries ago. These object-specific qualities can then be connected to the greater flow of history. We know this vase's form. It is a lekythos, used to perfume brides with oil. We know this name because others like it were labeled: This is my lekythos. What does this example tell us about other vases? What does it tell us about the people who owned it?