Stephen Baker

The Numerati

Wall Street Journal excerpt from Intro
March 5, 2011

Can a Computer Win on 'Jeopardy'?

Defeating a chess champion is a piece of cake compared to parsing puns and analyzing language.


Developed over four years at an estimated cost of more than $30 million, IBM's "Jeopardy"-playing computer, Watson, will face the quiz show's grand masters, Ken Jennings and Brad Rutter, in two games to be aired Feb. 14, 15 and 16. As Stephen Baker relates in the following excerpt from his new book, "Final Jeopardy: Man vs. Machine and the Quest to Know Everything," doubts remain about how well Watson can process the endless subtleties of human language.

Watson paused. The closest thing it had to a face, a glowing orb on a flat-panel screen, turned from forest green to a dark shade of blue. Filaments of yellow and red streamed steadily across it, like the paths of jets circumnavigating the globe. This pattern represented a state of quiet anticipation as the supercomputer awaited the next clue.


It was a September morning in 2010 at IBM Research, in the hills north of New York City, and the computer, known as Watson, was annihilating two humans, both champion-caliber players, in practice rounds of the knowledge game of "Jeopardy." Within months, it would be playing the game on national TV in a million-dollar man vs. machine match-up against two of the show's all-time greats.

As Todd Crain, an actor and the host of these test games, started to read the next clue, the filaments on Watson's display began to jag and tremble. Watson was thinking—or coming as close to it as a computer could. The $1,600 clue, in a category called "The eyes have it," read: "This facial wear made Israel's Moshe Dayan instantly recognizable world-wide."

The three players—two human and one electronic—could read the words as soon as they appeared on the big "Jeopardy" board. But they had to wait for Mr. Crain to read the entire clue before buzzing. That was the rule. At the moment the host pronounced the last word, a light would signal that contestants could buzz. The first to hit the button could win $1,600 with the right answer—or lose the same amount with a wrong one. (In these test matches, they were playing with funny money.)

This pause for reading gave Watson three or four seconds to hunt down the answer. The first step was to figure out what the clue meant. One of its programs promptly picked apart the grammar of the sentence, identifying the verbs, objects and key words. In another section of its cluster of computers, research focused on Moshe Dayan. Was this a person? A place in Israel? Perhaps a holy site?

During these seconds, Watson's cognitive apparatus—2,208 computer processors working in concert—mounted a massive research operation around Moshe Dayan and his signature facial wear. It plowed through thousands of documents stored in the machine. After a second or so, different programs in the computer, or algorithms, began to suggest hundreds of possible answers. To humans, many of them would look like wild guesses. Some were phrases that Mr. Dayan uttered, others were references to his military campaigns and facts about Israel. Still others proposed various articles of his clothing. At this point, the computer launched its second stage of analysis, figuring out which response, if any, merited its confidence. It proceeded to check and recheck facts, making sure that Moshe Dayan was indeed a person, an Israeli, and that the answer referred to something he wore on his face.

A human looking at Watson's frantic and repetitive labors might conclude that the player was unsure of itself, laughably short on common sense, and scandalously wasteful of computing resources. This was all true. Watson barked up every tree, and from every conceivable angle. The pattern on its screen during this process, circles exploding into little stars, provided only a hint of the industrial-scale computing at work.

In a room behind the podium, visible through a horizontal window, Watson's complex of computers churned, and the fans cooling them roared. This time, its three seconds of exertion paid off. Watson had come up with a response. The computer sent a signal to a mechanical device on the podium. It was the size of a large aspirin bottle with a clear plastic covering. Inside was a buzzer. About one one-hundredth of a second later, a metal finger inside this contraption shot downward, pressing the button.

Justin Bernbach, a 38-year-old from Brooklyn, stood to Watson's left. The airline lobbyist had pocketed $155,000 while winning seven straight "Jeopardy" matches in 2009. Unlike Watson, Mr. Bernbach understood the sentence. He knew precisely who Moshe Dayan was as soon as he saw the clue, and he carried an image of the Israeli leader in his mind. He had the answer. He gripped the buzzer in his fist and frantically pressed it four or five times as the light came on.

But Watson had arrived first.

"Watson?" said Mr. Crain.

The computer's amiable male voice arranged the answer, as "Jeopardy" demands, in the form of a question. "What is eye patch?" it said.

"Very good," Mr. Crain said. "An eye patch on his left eye. Choose again, Watson."

Mr. Bernbach slumped at his podium. This match with the machine wasn't going well.


It was going magnificently for David Ferrucci. As the chief scientist of the team developing the "Jeopardy"-playing computer, Mr. Ferrucci was feeling vindicated. Only three years earlier, the suggestion that a computer might match wits and word skills with human champions in "Jeopardy" sparked opposition bordering on ridicule in the halls of IBM Research. And the final goal of the venture, a nationally televised match against two "Jeopardy" legends, Ken Jennings and Brad Rutter, seemed risky to some, a bit déclassé to others. "Jeopardy," a TV quiz show, appeared to lack the timeless cachet of chess, which IBM computers had mastered a decade earlier.

Nonetheless, Mr. Ferrucci and his team went ahead and built their machine. Months earlier, it had fared well in a set of test matches. But the games revealed flaws in the machine's logic and game strategy. It was a good player, but to beat Messrs. Jennings and Rutter, who would be jousting for a million-dollar top prize, it would have to be great. So they had worked long hours over the summer to revamp Watson. This September event was the coming-out party for Watson 2.0. It was the first of 50 test matches against a higher level of competitor: humans, like Justin Bernbach, who had won enough matches to compete in the show's Tournament of Champions.

Watson, in these early matches, was having its way with them. Mr. Ferrucci, monitoring the matches from a crowded observation booth, was all smiles. Keen to promote its "Jeopardy"-playing phenom, IBM's advertising agency, Ogilvy & Mather, had hired a film crew to follow Mr. Ferrucci's team and capture the drama of this opening round of championship matches. The observation room was packed with cameras. Microphones on long booms recorded the back and forth of engineers as they discussed algorithms and Watson's response time, known as latency.

It was almost as if Watson, like a human giddy with hubris, was primed for a fall. The computer certainly had its weaknesses. Even when functioning smoothly, it would commit its share of wacky mistakes. Right before the lunch break, one clue read, "The inspiration for this title object in a novel and a 1957 movie actually spanned the Mae Khlung." Now, it would be reasonable for a computer to miss "The Bridge Over the River Kwai," especially since the actual river has a different name. Perhaps Watson had trouble understanding the sentence, which was convoluted, even for humans. But how did the computer land upon its outlandish response, "What is Kafka?" Mr. Ferrucci didn't know. Those things happened, but Watson still won the two morning matches.

It was after lunch that things deteriorated. Mr. Bernbach, so frustrated in the morning, started to beat Watson to the buzzer. Meanwhile, the computer was making risky bets and flubbing entire categories of clues. Defeat, which seemed so remote in the morning, was now just one lost bet away. This came in the fourth match. Watson was winning by $4,000 when it stumbled on this final clue: "On Feb. 8, 2010, the headline in a major newspaper in this city read: 'Amen! After 43 years, our prayers are answered.'" Watson missed the reference to the previous day's Super Bowl, won by the New Orleans Saints. It bet $23,000 on Chicago. Mr. Bernbach also botched the clue, guessing New York. But he bet less than Watson, which made him the first human to defeat the revamped machine. He pumped his fist.

In the sixth and last match of the day, Watson trailed Mr. Bernbach, $16,200 to $21,000. The computer landed on a Daily Double, which meant it could bet everything it had on nailing the clue. It was under the category "Colleges and Universities." A $5,000 bet would have brought Watson into a tie with Mr. Bernbach. A larger bet, while risky, could have catapulted the computer toward victory. "I'll take five," Watson said.

Five. Not $5,000, not $500. Five measly dollars of funny money. The engineers in the observation booth were stunned. But they kept quieter than usual, since cameras were rolling.

Then Watson crashed. It occurred at some point between placing that lowly bet and attempting to answer a clue about the first Catholic college in Washington. Watson's "front end," its voice and avatar, were waiting for its thousands of processors, or "back end," to deliver an answer. It received nothing. Anticipating these situations, the engineers had prepared Watson with set phrases. "Sorry," Watson said, reciting one of them, "I'm stumped." Its avatar displayed a dark blue circle with a single filament orbiting mournfully in the antarctic latitudes.

What to do? Everyone had ideas. Maybe they should finish the game with an older version of Watson. Or perhaps they could hook up Watson to another up-to-date version of the program at the company's Hawthorne labs, six miles down the road. But some worried that a remote connection would slow Watson's response time, causing it to lose more often on the buzz. In the end, as often happens with computers, a reboot brought the hulking "Jeopardy" machine back to life. But Mr. Ferrucci and his team got an all-too-vivid reminder that their "Jeopardy" player, even as it prepared for a national TV debut, could go haywire at any moment. When Watson was lifted to the podium, facing banks of TV lights, it was anybody's guess how the computer would perform.


Only four years earlier, in 2006, Watson was a prohibitive long shot, not just to win at "Jeopardy," but even to be built. For more than a year, the head of IBM Research, a physicist named Paul Horn, had been pressing different teams at the company to pursue a "Jeopardy"-playing machine. The way Mr. Horn saw it, IBM had triumphed in 1997 with its chess challenge. The company's machine, Deep Blue, had defeated the reigning world champion, Garry Kasparov. This burnished IBM's reputation among the global computing elite while demonstrating to the world that computers could rival humans in certain domains associated with intelligence.

That triumph had left IBM's top executives hungry for an encore. Mr. Horn felt the pressure. But what could the researchers get a computer to do? Deep Blue had rifled through millions of scenarios per second, calculated probabilities, and made winning moves. But it had skipped the far more complex domain of words. This, Mr. Horn thought, was where the next challenge would be. The next computer should charge into the vast expanse of human language and knowledge. For the test, Mr. Horn settled on "Jeopardy." The quiz show, which debuted in 1964, attracted some nine million viewers every weeknight. It was the closest thing in the United States to a knowledge franchise. "People associated it with intelligence," Mr. Horn later said.

There was one small problem. For months, he couldn't get any takers. "Jeopardy," with its puns and strangely phrased clues, seemed too hard for a computer. IBM already had teams building machines to answer questions, and their performance, in speed and precision, came nowhere close to even a moderately informed human. How could the next machine grow so much smarter?

Mr. Horn eventually enticed David Ferrucci and his team to pursue his vision. An expert in artificial intelligence, Mr. Ferrucci had a wide-ranging intellect. He was comfortable conversing about everything from the details of computational linguistics to the evolution of life on Earth and the nature of human thought. This made him an ideal ambassador for a "Jeopardy"-playing machine. After all, his project would raise all sorts of issues, and fears, about the role of brainy machines in society. Would they compete for jobs? Could they establish their own agendas, like the infamous computer, HAL, in "2001: A Space Odyssey," and take control? What was the future of knowledge and intelligence, and how would brains and machines divvy up the cognitive work?

For humans, knowledge is an entire universe, a welter of sensations and memories, desires, facts, skills, songs and images, words, hopes, fears and regrets, not to mention love. But for those hoping to build intelligent machines, it has to be simpler. Broadly speaking, it falls into three categories: sensory input, ideas and symbols.

Consider the color blue. It's something that computers and people alike can perceive, each in their own fashion. Sensory perception is the raw material of knowledge. Now think of the three-letter word "sky." Those letters are a symbol for the biggest piece of blue in our world. Computers can handle such symbols. But how about this snippet from Lord Byron? "Friendship is love without his wings." That sentence represents the third realm of knowledge: ideas. How can a machine make sense of these? In these early years of the 21st century, ideas remain the dominion of humans—and the frontier for thinking machines.

Over the next four years, Mr. Ferrucci set about creating a world in which people and their machines often appeared to switch roles. He didn't know, he later said, whether humans would ever be able to "create a sentient being." But when he looked at fellow humans through the eyes of a computer scientist, he saw patterns of behaviors that often appeared to be pre-programmed: the zombie-like commutes, the near-identical routines, from tooth-brushing to feeding the animals, the retreat to the same chair, the hand reaching for the TV remote. "It's more interesting," he said, "when humans delve inside themselves and say, 'Why am I doing this? And why is it relevant and important to be human?' "

His machine, if successful, would nudge people toward that line of inquiry. Even with an avatar for a face and a robotic voice, the "Jeopardy" machine would invite comparisons to the other two contestants on the stage. This was inevitable. And whether it won or lost on a winter evening in 2011, the computer might lead millions of spectators to rethink the nature, and probe the potential, of their own humanity.

—From "Final Jeopardy: Man vs. Machine and the Quest to Know Everything" by Stephen Baker, to be published by Houghton Mifflin Harcourt on Feb. 17. Copyright © by Stephen Baker. Reprinted by arrangement with Houghton Mifflin Harcourt.


Book excerpt: Educating Watson
February 28, 2011

From McKinsey Quarterly. This is taken from the fourth chapter of Final Jeopardy. I've cut-and-pasted it here.

The programmer’s dilemma: Building a Jeopardy! champion

IBM computer scientist David Ferrucci and his team set out to build a machine that could beat the quiz show’s greatest players. The result revealed both the potential—and the limitations—of computer intelligence.

FEBRUARY 2011 • Stephen Baker

In 2007, IBM computer scientist David Ferrucci and his team embarked on the challenge of building a computer that could take on—and beat—the two best players of the popular US TV quiz show Jeopardy!, a trivia game in which contestants are given clues in categories ranging from academic subjects to pop culture and must ring in with responses that are in the form of questions. The show, a ratings stalwart, was created in 1964 and has aired for more than 25 years. But this would be the first time the program would pit man against machine.

In some sense, the project was a follow-up to Deep Blue, the IBM computer that defeated chess champion Garry Kasparov in 1997. Although a TV quiz show may seem to lack the gravitas of the classic game of chess, the task was in many ways much harder. It wasn’t just that the computer had to master straightforward language, it had to master humor, nuance, puns, allusions, and slang—a verbal complexity well beyond the reach of most computer processors. Meeting that challenge was about much more than just a Jeopardy! championship. The work of Ferrucci and his team illuminates both the great potential and the severe limitations of current computer intelligence—as well as the capacities of the human mind. Although the machine they created was ultimately dubbed “Watson” (in honor of IBM’s founder, Thomas J. Watson), to the team that painstakingly constructed it, the game-playing computer was known as Blue J.

The following article is adapted from Final Jeopardy: Man vs. Machine and the Quest to Know Everything (Houghton Mifflin Harcourt, February 2011), by Stephen Baker, an account of Blue J’s creation.

It was possible, Ferrucci thought, that someday a machine would replicate the complexity and nuance of the human mind. In fact, in IBM’s Almaden Research Center, on a hilltop high above Silicon Valley, a scientist named Dharmendra Modha was building a simulated brain equipped with 700 million electronic neurons. Within years, he hoped to map the brain of a cat, and then a monkey, and, eventually, a human. But mapping the human brain, with its 100 billion neurons and trillions or quadrillions of connections among them, was a long-term project. With time, it might result in a bold new architecture for computing, one that could lead to a new level of computer intelligence. Perhaps then, machines would come up with their own ideas, wrestle with concepts, appreciate irony, and think more like humans.

But such machines, if they ever came, would not be ready on Ferrucci's schedule. As he saw it, his team had to produce a functional Jeopardy!-playing machine in just two years. If Jeopardy!'s executive producer, Harry Friedman, didn't see a viable machine by 2009, he would never green-light the man–machine match for late 2010 or early 2011.

This deadline compelled Ferrucci and his team to build their machine with existing technology—the familiar semiconductors etched in silicon, servers whirring through billions of calculations and following instructions from many software programs that already existed. In its guts, Blue J would not be so different from the battered ThinkPad Ferrucci lugged from one meeting to the next. No, if Blue J was going to compete with the speed and versatility of the human mind, the magic would have to come from its massive scale, inspired design, and carefully tuned algorithms. In other words, if Blue J became a great Jeopardy! player, it would be less a triumph of science than of engineering.

Blue J’s literal-mindedness posed the greatest challenge. Finding suitable data for this gullible machine was only the first job. Once Blue J was equipped with its source material—from James Joyce to the Boing Boing blog—the IBM team would have to teach the machine to make sense of those texts: to place names and facts into context, and to come to grips with how they were related to each other. Hamlet, just to pick one example, was related not only to his mother, Gertrude, but also to Shakespeare, Denmark, Elizabethan literature, a famous soliloquy, and themes ranging from mortality to self-doubt, just for starters. Preparing Blue J to navigate all of these connections for virtually every entity on earth, factual or fictional, would be the machine’s true education. The process would involve creating, testing, and fine-tuning thousands of algorithms. The final challenge would be to prepare the machine to play the game itself. Eventually, Blue J would have to come up with answers it could bet on within three to five seconds. For this, the Jeopardy! team would need to configure the hardware of a champion.

Every computing technology Ferrucci had ever touched had a clueless side to it. The machines he knew could follow orders and carry out surprisingly sophisticated jobs. But they were nowhere close to humans. The same was true of expert systems and neural networks. Smart in one area, clueless elsewhere. Such was the case with the Jeopardy! algorithms that his team was piecing together in IBM’s Hawthorne, New York, labs. These sets of finely honed computer commands each had a specialty, whether it was hunting down synonyms, parsing the syntax of a Jeopardy! clue, or counting the most common words in a document. Outside of these meticulously programmed tasks, though, each was fairly dumb.

So how would Blue J concoct broader intelligence—or at least enough of it to win at Jeopardy!? Ferrucci considered the human brain. “If I ask you what 36 plus 43 is, a part of you goes, ‘Oh, I’ll send that question over to the part of my brain that deals with math,’” he said. “And if I ask you a question about literature, you don’t stay in the math part of your brain. You work on that stuff somewhere else.” Ferrucci didn’t delve into how things work in a real brain; for his purposes, it didn’t matter. He just knew that the brain has different specialties, that people know instinctively how to skip from one to another, and that Blue J would have to do the same thing.

The machine would, however, follow a different model. Unlike a human, Blue J wouldn’t know where to start answering a question. So with its vast computing resources, it would start everywhere. Instead of reading a clue and assigning the sleuthing work to specialist algorithms, Blue J would unleash scores of them on a hunt, and then see which one came up with the best answer. The algorithms inside of Blue J, each following a different set of marching orders, would bring in competing results. This process, a lot less efficient than the human brain, would require an enormous complex of computers. More than 2,000 processors would each handle a different piece of the job. But the team would concern itself later with these electronic issues—Blue J’s body—after they got its thinking straight.
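The "start everywhere" strategy can be sketched in a few lines. Everything below is illustrative, not IBM's code: the generator names, the canned candidates, and the additive scoring are assumptions. The point is only the shape of the process—many specialist algorithms each propose answers, and the candidate with the strongest combined support wins.

```python
# Hypothetical sketch of Blue J's strategy: run every candidate-generating
# algorithm on the clue, pool the competing results, pick the best-supported one.
from collections import defaultdict

def anagram_solver(clue):
    return []  # a specialist: useful only for anagram-type clues

def keyword_search(clue):
    # canned results standing in for a document search
    return [("North Korea", 0.4), ("Cuba", 0.2)]

def string_matcher(clue):
    return [("North Korea", 0.3), ("Bhutan", 0.1)]

GENERATORS = [anagram_solver, keyword_search, string_matcher]

def answer(clue):
    scores = defaultdict(float)
    for algorithm in GENERATORS:               # unleash every specialist
        for candidate, confidence in algorithm(clue):
            scores[candidate] += confidence    # competing results accumulate
    return max(scores, key=scores.get) if scores else None

print(answer("Of the four countries..."))  # → North Korea
```

The inefficiency is the design: most generators return junk for any given clue, but some generator usually returns the right answer.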

To see how these algorithms carried out their hunt, consider one of the thousands of clues the fledgling system grappled with. Under the category Diplomatic Relations, one clue read: “Of the four countries the United States does not have diplomatic relations with, the one that’s farthest north.”

In the first wave of algorithms to handle the clue was a group that specialized in grammar. They diagrammed the sentence, much the way a grade-school teacher would, identifying the nouns, verbs, direct objects, and prepositional phrases. This analysis helped to clear up doubts about specific words. In this clue, "the United States" referred to the country, not the Army, the economy, or the Olympic basketball team. Then the algorithms pieced together interpretations of the clue. Complicated clues, like this one, might lead to different readings—one more complex, the other simpler, perhaps based solely on words in the text. This duplication was wasteful, but waste was at the heart of Blue J's strategy. Duplicating or quadrupling its effort, or multiplying it by 100, was one way the computer could compensate for its cognitive shortcomings, and also play to its advantage: speed. Unlike humans, who can instantly understand a question and pursue a single answer, the computer might hedge, launching searches for a handful of different possibilities at the same time. In this way and many others, Blue J would battle the efficient human mind with spectacular, flamboyant inefficiency. "Massive redundancy" was how Ferrucci described it. Transistors were cheap and plentiful. Blue J would put them to use.

While the machine’s grammar-savvy algorithms were dissecting the clue, one of them searched for its focus, or answer type. In this clue about diplomacy, “the one” evidently referred to a country. If this was the case, the universe of Blue J’s possible answers was reduced to a mere 194, the number of countries in the world. (This, of course, was assuming that “country” didn’t refer to “Marlboro Country” or “wine country” or “country music.” Blue J had to remain flexible, because these types of exceptions often surfaced.)

Once the clue was parsed into a question the machine could understand, the hunt commenced. Each expert algorithm went burrowing through Blue J's trove of data in search of the answer. One algorithm, following instructions developed for decoding the genome, looked to match strings of words in the clue with similar strings elsewhere, maybe in some stored Wikipedia entry or in articles about diplomacy, the United States, or northern climes. Another, linguistically minded algorithm focused on rhymes with key words in the clue. Another used a Google-like approach and focused on documents that matched the greatest number of keywords in the clue, paying special attention to the ones that popped up most often.
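The Google-like approach reduces to a keyword-overlap score. Here is a minimal sketch over two invented mini-documents; the tokenizer and scoring are assumptions for illustration, not Watson's actual ranking function.

```python
# Rank stored documents by how many of the clue's words they contain,
# counting repeated occurrences. The documents are invented examples.
import re
from collections import Counter

DOCS = {
    "North Korea": "North Korea has no diplomatic relations with the United States.",
    "Canada": "Canada lies north of the United States and shares a long border.",
}

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def score(clue, doc):
    clue_words = set(tokens(clue))
    counts = Counter(tokens(doc))
    # sum the frequency of every clue word found in the document
    return sum(counts[w] for w in clue_words)

def best_match(clue):
    return max(DOCS, key=lambda title: score(clue, DOCS[title]))
```

For the diplomacy clue, the North Korea document shares more keywords ("diplomatic," "relations," "united," "states") and wins—even though nothing in this scorer understands what the clue is asking.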

While the algorithms worked, software within Blue J would be comparing the clue to thousands of others it had encountered. What kind was it—a puzzle? A limerick? A historical factoid? Blue J was learning to recognize more than 50 types of questions, and it was constructing the statistical record of each algorithm for each type of question. This would guide it in evaluating the results when they came back. If the clue turned out to be an anagram, for example, the algorithm that rearranged the letters of words or phrases would be the most trusted source. But that same algorithm would produce gibberish for most other clues.

What kind of clue was this one on diplomatic relations? It appeared to require two independent analyses. First, the computer had to come up with the four countries with which the United States had no diplomatic ties. Then it had to figure out which of those four was the farthest north. A group of Blue J’s programmers had recently developed an algorithm that focused on these so-called nested clues, in which one answer lay inside another. This may sound obscure, but humans ask these types of questions all the time. If someone wonders about “cheap pizza joints close to campus,” the person answering has to carry out two mental searches, one for cheap pizza joints and another for those nearby. Blue J’s “nested decomposition” led the computer through a similar process. It broke the clues into two questions, pursued two hunts for answers, and then pieced them together. The new algorithm was proving useful in Jeopardy!. One or two of these combination questions came up in nearly every game. They are especially common in the all-important Final Jeopardy, which usually features more complex clues.
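The two-step hunt that nested decomposition performs can be illustrated directly. The country list and latitudes are hard-coded here for the sketch; a real system would retrieve both from its corpus.

```python
# Hedged sketch of "nested decomposition": answer the inner question first
# (which countries lack US diplomatic ties?), then apply the outer filter
# (of those, which is farthest north?).
NO_DIPLOMATIC_TIES = ["Bhutan", "Cuba", "Iran", "North Korea"]

LATITUDES = {  # approximate capital-city latitudes, in degrees north
    "Bhutan": 27.5, "Cuba": 23.1, "Iran": 35.7, "North Korea": 39.0,
}

def answer_nested_clue():
    candidates = NO_DIPLOMATIC_TIES          # inner question's answer set
    return max(candidates, key=LATITUDES.get)  # outer question: farthest north

print(f"What is {answer_nested_clue()}?")  # → What is North Korea?
```

The "cheap pizza joints close to campus" example in the text has the same structure: one search produces a candidate set, a second search filters it.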

It would take Blue J's algorithms almost an hour to churn through the data and return with their candidate answers. Most were garbage. There were failed anagrams of country names and laughable attempts to rhyme "north" with "diplomatic." Some suggested the names of documents or titles of articles that had strings of the same words. But the nested algorithm followed the right approach. It found the four countries on the outs with the United States (Bhutan, Cuba, Iran, and North Korea), checked their geographical coordinates, and came up with the answer: "What is North Korea?"

At this point, Blue J had the right answer. But the machine did not yet know that North Korea was correct, or that it even merited enough confidence for a bet. For this, it needed loads of additional analysis. Since the candidate answer came from an algorithm with a strong record on nested clues, it started out with higher-than-average confidence in that answer. The machine would proceed to check how many of the answers matched the question type: “country.” After ascertaining from various lists that North Korea appeared to be a country, confidence in “What is North Korea?” rose further up the list. For an additional test, it would place the words “North Korea” into a simple sentence generated from the clue: “North Korea has no diplomatic relations with the United States.” Then it would see if similar sentences showed up in its data trove. If so, confidence climbed higher.
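The sequence of checks described above—start from the generating algorithm's track record, then raise confidence as independent tests pass—can be sketched as a running score. The weights (0.6, 0.2, 0.15) and the tiny corpus are invented for illustration; the real system learned its weights from thousands of training clues.

```python
# Confidence starts from the generating algorithm's historical accuracy on
# this clue type, then rises with each passing check. All numbers are made up.
KNOWN_COUNTRIES = {"North Korea", "Cuba", "Iran", "Bhutan"}

CORPUS = [
    "North Korea has no diplomatic relations with the United States.",
    "Pyongyang is the capital of North Korea.",
]

def confidence(candidate, algorithm_track_record, answer_type):
    conf = algorithm_track_record            # e.g. 0.6 for nested-clue answers
    if answer_type == "country" and candidate in KNOWN_COUNTRIES:
        conf += 0.2                          # type check passed: it's a country
    # build a simple sentence from the clue and look for it in the data trove
    evidence = f"{candidate} has no diplomatic relations with the United States."
    if any(evidence in s or s in evidence for s in CORPUS):
        conf += 0.15                         # supporting sentence found
    return min(conf, 1.0)
```

A candidate like "Kafka" would fail both checks and stay at its generator's baseline, too low to bet on.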

In the end, it chose North Korea as the answer to bet on. In a real game, Blue J would have hit the buzzer. But being a machine, it simply moved on to the next clue.

About the Author
Stephen Baker is the author of The Numerati (Houghton Mifflin Harcourt, 2008) and was previously a writer at BusinessWeek.


Gizmodo excerpt: Could Watson Have Been Defeated by Homebrew?
February 18, 2011


(from Gizmodo, Feb. 18, 2011)

Well before Ken Jennings and Brad Rutter, IBM's design team grappled with a different challenge - getting beaten to the punch by someone else inventing a trivia-savvy artificial mind. Final Jeopardy discusses Watson's early development and how this Q&A juggernaut overcame the "Basement Baseline."

In the early days of 2007, before he agreed to head up a Jeopardy project, IBM's David Ferrucci harbored two conflicting fears. The first of his nightmare scenarios was perfectly natural: A Jeopardy computer would fail, embarrassing the company and his team.

But his second concern, failure's diabolical twin, was perhaps even more terrifying. What if IBM spent tens of millions of dollars and devoted centuries of researcher years to this project, played it up in the press, and then saw someone beat them to it? Ferrucci pictured a solitary hacker in a garage, cobbling together free software from the Web and maybe hitching it to Wikipedia and other online databases. What if the Jeopardy challenge turned out to be not too hard but too easy?

That would be worse, far worse, than failure. IBM would become the laughingstock of the tech world, an old-line company completely out of touch with the technology revolution - precisely what its corporate customers paid it billions of dollars to track. Ferrucci's first order of business was to make sure that this could never happen. "It was due diligence," he later said.

He had a new researcher on his team, James Fan, a young Chinese American with a fresh doctorate from the University of Texas. As a newcomer, Fan was free of institutional pre-conceptions about how Q-A systems should work. He had no history with the annual government-sponsored competitions, in which IBM's technology routinely botched two questions for every one it got right. Trim and soft-spoken, his new IBM badge hanging around his neck, Fan was an outsider. And he now faced a singular assignment: to build a Jeopardy computer all by himself. He was given 500 Jeopardy clues to train his machine and one month to make it smart. His system would be known as Basement Baseline.

So on a February day in 2007, James Fan set out to program a Q-A machine from scratch. He started by drawing up an inventory of the software tools and reference documents he thought he'd need. First would be a so-called type system. This would help the computer figure out if it was looking for a person, place, animal, or thing. After all, if it didn't know what it was looking for, finding an answer was little more than a crap-shoot; generating enough "confidence" to bet on that answer would be impossible. For humans, distinguishing President George Washington from the bridge named after him isn't much of a challenge. Context makes it clear. Bridges don't deliver inaugural addresses; presidents are rarely jammed at rush hour, with half-hour delays from Jersey. What's more, when placed in sentences, people usually behave differently than roads or bridges.

But what's simple for us involved hard work for Fan's Q-A computer. It had to comb through the structure of the question, picking out the subjects, objects, and prepositions. Then it had to consult exhaustive reference lists that had been built up in the industry over decades, laying out hundreds of thousands of places, things, and actions and the web of relationships among them. These were known as "ontologies." Think of them as cheat sheets for computers. If a
finger was a subject, for example, it fell into human anatomy and was related to the hand and the thumb and to verbs such as "to point" and "to pluck." (Conversely, when "the finger" turned up as the object of the verb "to give," a sophisticated ontology might steer the computer toward the neighborhood of insults, gestures, and obscenities.)
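The "cheat sheet" idea can be made concrete with a toy example. The book doesn't show any of this machinery, so the sketch below is purely illustrative: a three-entry Python table standing in for ontologies that, in practice, catalogue hundreds of thousands of entities and relationships.

```python
# A toy ontology in the spirit described above: each term maps to a type,
# to related terms, and to verbs it typically takes. Real ontologies are
# vastly larger; this table exists only to illustrate the idea.
ONTOLOGY = {
    "finger": {"type": "body part",
               "related": ["hand", "thumb"],
               "verbs": ["point", "pluck"]},
    "George Washington": {"type": "person",
                          "related": ["president"],
                          "verbs": ["deliver", "inaugurate"]},
    "George Washington Bridge": {"type": "structure",
                                 "related": ["road", "Hudson River"],
                                 "verbs": ["cross", "jam"]},
}

def expected_type(term):
    """Look up what kind of thing a term names -- the first step in
    deciding whether a clue is asking for a person, place, or thing."""
    entry = ONTOLOGY.get(term)
    return entry["type"] if entry else "unknown"

print(expected_type("George Washington"))         # person
print(expected_type("George Washington Bridge"))  # structure
```

A type system built on a table like this is what lets a Q-A program rule out the bridge when the clue mentions an inaugural address.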

In any case, Fan needed both a type system and a knowledge base to understand questions and hunt for answers. He had neither, so he took a hacker's shortcut and used Google and Wikipedia. (While the true Jeopardy computer would have to store its knowledge in its "head," prototypes like Fan's were free to search the Web.) From time to time, Fan found, if he typed a clue into Google, it led him to a Wikipedia page - and the subject of the page turned out to be the answer. The following clue, for example, would confound even the most linguistically adept computer. In the category The Author Twitters, it read: "Czech out my short story 'A Hunger Artist'! Tweet done. Max Brod, pls burn my laptop."

A good human Jeopardy player would see past the crazy syntax, quickly recognizing the short story as one written by Franz Kafka, along with a reference to Kafka's Czech nationality and his longtime associate Max Brod. In the same way, a search engine would zero in on those helpful key words and pay scant attention to the sentence surrounding them. When Fan typed the clue into Google, the first Wikipedia page that popped up was "Franz Kafka," the correct answer. This was a primitive method. And Fan knew that a computer relying on it would botch the great majority of Jeopardy clues. It would crash and burn in a game against even ignorant humans, let alone Ken Jennings. But one or two times out of ten, it worked. For Fan, it was a start.
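Fan's shortcut amounts to a few lines of logic: send the whole clue to a search engine, and if a Wikipedia page surfaces near the top, guess that page's title. The sketch below is not Fan's code; the search step is stubbed out with canned results rather than a live query, so everything here is assumed for illustration.

```python
def wikipedia_baseline(clue, search):
    """A crude Q-A baseline in the spirit of Basement Baseline:
    feed the entire clue to a search engine and guess the title of
    the first Wikipedia page among the results."""
    for title, url in search(clue):
        if "wikipedia.org/wiki/" in url:
            # The page title doubles as the candidate answer.
            return title
    return None  # no Wikipedia hit means the baseline has no guess

# Stand-in for a live search engine: hypothetical canned results
# resembling what the Kafka clue might return.
def fake_search(clue):
    return [
        ("Franz Kafka", "https://en.wikipedia.org/wiki/Franz_Kafka"),
        ("A Hunger Artist", "https://en.wikipedia.org/wiki/A_Hunger_Artist"),
    ]

clue = ("Czech out my short story 'A Hunger Artist'! "
        "Tweet done. Max Brod, pls burn my laptop.")
print(wikipedia_baseline(clue, fake_search))  # Franz Kafka
```

Note what the baseline lacks: it returns an answer or nothing, with no measure of how likely the answer is to be right, which is exactly the confidence problem Fan never had time to solve.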

The month passed. Fan added more features to Basement Baseline. But at the end, the system was still missing vital components. Most important, it had no mechanism for gauging its level of confidence in its answers. "I didn't have time to build one," Fan said. This meant the computer didn't know what it knew. In a game, it wouldn't have any idea when to buzz. In the end, Fan blew off game strategy entirely and focused simply on building a machine that could answer Jeopardy clues.

It was on a March morning at IBM labs in Hawthorne, NY, that James Fan's Basement Baseline faced off against Big Blue's in-house question-answering system, known as Piquant. The results, from Ferrucci's perspective, were ideal. The Piquant system succeeded on only 30 percent of the clues, far below the level needed for Jeopardy. It had high confidence on
only 5 percent of them, and of those it got only 47 percent right. Fan's Basement Baseline fared almost as well by a number of measures but was still woefully short of what was needed. Fan had proved that a hacker's concoction fell far short of Jeopardy standards - which was a relief. But by nearly matching the company's state-of-the-art in Q-A technology, he highlighted its inadequacies.

The Jeopardy challenge, it was clear, would require another program, another technology platform, and a far bolder approach. The job, Ferrucci said, called for "the most sophisticated intelligence architecture the world has ever seen." He proceeded to tell his bosses that he would lead a team to assemble a Jeopardy machine—provided that they gave him the resources to build a big one.


The top of Chapter 11
February 14, 2011


Since many people have read the e-book minus the last chapter, I'm posting the beginning of the chapter leading up to the point where Watson faces off against Ken Jennings and Brad Rutter on the Jeopardy stage.


Chapter Eleven: The Match

David Ferrucci had driven the same stretch hundreds of times. It was the route from his suburban home to IBM’s Yorktown labs, or a bit farther to Hawthorne. For fifteen or twenty minutes along the Taconic Parkway each morning and evening, he went over his seemingly endless to-do list. How could his team boost Watson’s fact-checking in Final Jeopardy? Could any fix ensure that the machine's bizarre speech defect would never return? Was the pun-detection algorithm performing up to par? There were always more details to focus on, plenty to fuel both perfectionism and paranoia--and Ferrucci had a healthy measure of both.

But this January morning was different. As he drove past frozen fields and forests, the pine trees heavy with fresh snow, all of the to-do lists were history. After four years, his team’s work was over. Within hours, Watson alone would be facing Ken Jennings and Brad Rutter, with Ferrucci and the machine’s other human trainers reduced to spectators. Ferrucci felt his eyes well up. “My whole team would be judged by this one game,” he said later. “That’s what killed me.”

The day before, at a jam-packed press conference, IBM had unveiled Watson to the world. The event took place on a glittering new Jeopardy set mounted over the previous two weeks by an army of nearly 100 workers. It resembled the set in Culver City: the same jumbo game board to the left, the contestants' lecterns to the right, with Alex Trebek's podium in the middle. In front was a long table for Jeopardy officials, where Harry Friedman would sit, Rocky Schmidt to his side, followed by a line of writers and judges, all of them equipped with monitors, phones, and a pile of old-fashioned reference books. All of the pieces were in place. But this East Coast version was plastered with IBM branding. The shimmering blue wall bore the company’s historic slogan, Think, in a number of languages. Stretched across the shiny black floor was a logo that looked at first like Batman’s emblem. But closer study revealed the planet Earth, with each of the continents bulging, as if painted by Fernando Botero. This was Chubby Planet, the symbol of IBM’s Smarter Planet campaign, and the model for Watson’s avatar. In the negotiations with Jeopardy over the past two years, IBM had lost out time and again on promotional guarantees. It had seemed that Harry Friedman and his team held all the cards. But now that the match was assured, and on Big Blue's home turf, not a single branding opportunity would be squandered.

The highlight of the press event came when Jennings and Rutter strode across the stage for a five-minute, 15-clue demonstration. In this test run, Watson had held its own. In fact, it had ended the session ahead of Jennings, $4,400 to $3,400. Rutter trailed with $1,200. Within hours, online headlines proclaimed that Watson had vanquished the humans. It was as if the game had already been won.

If only this were true. The demo match featured just a handful of clues and included no Final Jeopardy--Watson’s Achilles' heel. What’s more, after the press emptied the auditorium that afternoon, Watson and the human champs went on to finish that game and play another round--"loosening their thumbs," in the language of Jeopardy. In these games Ferrucci saw a potential problem: Ken Jennings. It was clear, he said, that Jennings had prepped heavily for the match. He had a sense of Watson's vulnerabilities and an aggressive betting strategy specially honed for the machine. Brad Rutter was another matter altogether. Starting out, Ferrucci’s team had been more concerned about Rutter than Jennings. His speed on the buzzer was the stuff of legend. Yet he appeared relaxed, almost too relaxed, as if he could barely be bothered to buzz. Was he saving his best stuff for the match?

In the first of the two practice games, Jennings landed on all three Daily Doubles. Each time he bet nearly everything he had. This was the same strategy Greg Lindsay had followed to great effect in three sparring games 10 months earlier. The rationale was simple. Even with its mechanical finger slowing it down by a few milliseconds, Watson was lightning fast on the buzzer. The machine was likely to win more than its share of the regular Jeopardy clues. So the best chance for the humans was to pump up their winnings on the four clues that hinged on betting, not buzzing. Those were the three Daily Doubles hiding behind certain clues, and the Final Jeopardy. Thanks to his aggressive betting, Jennings ended the first full practice game with some $50,000, a length ahead of Watson, which scored $39,000. Jennings was fired up. When he clinched the game, he pointed to the computer and exclaimed, “Game over!” Rutter finished a distant third, with about $10,000. In the second game, Jennings and Watson were neck and neck to the end, when Watson edged ahead in Final Jeopardy. Again, Rutter coasted to third place. Ferrucci said that he and his team left the practice rounds thinking, “Ken’s really good--but what’s going on with Brad?”

When Ferrucci pulled into the Yorktown labs the morning of the match, the site had been transformed for the event. The visitors’ parking lot was cordoned off for VIPs. Security guards posted at the doors checked every person entering the building, matching their names against a list. And in the vast lobby, usually manned by one lonely guard, IBM’s luminaries and privileged guests circled around tables piled with brunch fare. Ferrucci made his way to Watson’s old practice studio, now refashioned as an exhibition room. There he gave a half-hour talk about the supercomputer to a gathering of IBM clients, including J.P. Morgan, American Express, and the pharmaceutical giant Merck and Co. Ferrucci recalled the distant days when a far stupider Watson responded to a clue about a famous French bacteriologist by saying: “What is ‘How Tasty Was My Little Frenchman’?” (That was the title of a 1971 Brazilian comedy about cannibals in the Amazon.)

His next stop, the make-up room, revealed his true state of mind. The make-up artist was a woman originally from Italy, like much of Ferrucci's family. As she began to work on his face she showered him with warmth and concern--acting "motherly." This rekindled his powerful feelings about his team and the end of their journey, and before he knew it, tears were streaming down his face. The more the woman comforted him, the worse it got. Ferrucci finally staunched the flow and got the pancake on his face. But he knew he was a mess. He hunted down Scott Brooks, the light-hearted press officer. Maybe some jokes, he thought, “would take the lump out of my throat.” Brooks laughed and kidded his colleague.

This irritated the testy Ferrucci and, to his relief, knocked him out of his fragile mood. He joined his team for one last lunch, all of them seated at a long table in the cafeteria. As they were finishing, just a few minutes before 1 p.m., a roaring engine interrupted conversations in the cafeteria. It was IBM’s Chairman Sam Palmisano landing in his helicopter. The hour had come. Ferrucci walked down the sunlit corridor to the auditorium.


Ken Jennings woke up that Friday morning in the Crowne Plaza in White Plains. He’d slept well, much better than he usually did before big Jeopardy matches. Jennings had reason to feel confident. He had destroyed Watson in one of the practice rounds. Afterwards, he said, Watson’s developers told him that the game had featured a couple of “train wrecks”--categories where Watson appeared disoriented. Children’s literature was one. For Jennings, train wrecks signaled the machine’s vulnerability. With a few of them in the big match, he could stand up tall for humans, and perhaps extend his legend from Jeopardy to the broader realm of knowledge. “Given the right board,” he said, “Watson is beatable.” The stakes were considerable. While IBM would give all of Watson’s winnings to charity, a human winner would earn a half-million-dollar prize, with another half million to give to the charity of his choice. Second and third place were worth $150,000 and $100,000, with equal amounts for the players’ charities.

A little after 11, a car service stopped by the hotel, picked up Jennings and his wife, Mindy, and drove them 13 miles north to IBM’s Yorktown laboratory. Jennings carried three changes of clothes, so that he could dress differently for each session, simulating three different days. As soon as he stepped out of the car, Jeopardy officials whisked him past the crush of people in the lobby and toward the staircase. Jeopardy had cleared out a couple of offices in IBM’s Human Resources department, and Jennings was given one as a dressing room.

On short visits to the East Coast, Brad Rutter liked to sleep late, so that he stayed in sync with West Coast time. But the morning of the match, he found himself awake at 7, which meant he faced four and a half hours before the car came by. Rutter was at the Ritz Carlton in White Plains, about a half mile from Jennings. He breakfasted, showered, and then killed time until 11:30. Unlike Jennings, Rutter had grounds for serious concern. In the practice rounds, he had been uncharacteristically slow. The computer had an exquisite sense of timing, and Jennings seemed to hold his own. Rutter, who had never lost a Jeopardy game in his life, was facing a flame-out unless he could get to the buzzer fast.

Shortly after Rutter arrived at IBM, he and Jennings played one last practice round with Watson. To Rutter’s delight, his buzzer thumb started to regain the old magic. He beat both Jennings and the machine. Now, in the three practice matches, each of the three players had registered a win. But Jennings and Rutter noticed something strange about Watson. Its game strategy, Jennings said, “seemed naive.” Just like beginning Jeopardy players, Watson started with the easy low-dollar clues and moved straight down the board. Why wasn’t it hunting for Daily Doubles? In the Blu-ray discs given to them in November, Jennings and Rutter had seen Watson skip around among the high-dollar clues, hunting for the single Daily Double on the first Jeopardy board and the two in Double Jeopardy. Landing Daily Doubles was vital. It gave a player the means to build a big lead. Equally important, once the Daily Doubles were off the board, the leader was hard to catch. But in the practice rounds, Watson didn’t appear to have this strategy in mind.

The two players were led to a tiny entry hall behind the auditorium. As the event commenced, shortly after one p.m., they waited. They listened as IBM introduced Watson to its customers. “You know how they call timeouts before a guy kicks a field goal?” Jennings said. “We were joking that they were doing the same thing to us. Icing us.” Through the door they heard speeches by John Kelly, the chief of IBM Research, and Sam Palmisano. Harry Friedman, who decades earlier had earned $5 a joke as a writer for Hollywood Squares, delivered one of his own. “I’ve lived in Hollywood for a long time,” he told the crowd. “So I know something about Artificial Intelligence.” When Ferrucci was called onto the stage, the crowd rose for a standing ovation. “I already cried in make-up,” he said. “Let’s not repeat that.”

Finally, it was time for Jeopardy. Jennings and Rutter were summoned to the stage. They walked down the narrow aisle of the auditorium, Jennings leading in a business suit and yellow tie, the taller, loose-gaited Rutter following him, his collar unbuttoned. They settled at their lecterns, Jennings on the far side, Rutter closer to the crowd. Between them, its circular black screen dancing with jagged colorful lines, sat Watson.

The show began with its familiar music. A fill-in for the legendary announcer Johnny Gilbert (who hadn’t made the trip from Culver City) introduced the contestants and Alex Trebek. But even then, Jennings and Rutter had to wait while an IBM video told the story of the Watson project. In a second video, Trebek talked to Ferrucci about the machinery behind the bionic player--now up to 2,880 processing cores. Then Trebek gave viewers a tutorial on Watson’s answer panel. This would reveal the statistical confidence that the computer had in each of its top responses. It was a window into Watson’s thinking.

Trebek, in fact, had been a late convert to the answer panel. Like the rest of the Jeopardy team, he was loath to stray from the show’s time-honored formulas. People knew what to expect from the game: the precise movements of the cameras, the familiar music, voices and categories. Wouldn’t the intrusion of an electronic answer panel distract them, and ultimately make the game less enjoyable to watch? He raised that concern on a visit to IBM in November. But the prospect of playing the game without Watson’s answer panel horrified Ferrucci. Millions of viewers, he believed, would simply conclude that the machine had been fed all the answers. They wouldn’t appreciate what Watson had gone through to arrive at the correct response. So while Trebek was eating lunch that day, Ferrucci carried out an experiment. He had his technicians take down the answer panel. When the afternoon sessions began, it only took one game for Trebek to ask for the answer panel back. Later, he said, watching Watson without its analysis was “boring as hell.”

A hush settled over the auditorium. Finally, it was time to play. Ferrucci, sitting between David Gondek and Eric Brown, laced his hands tightly and made a steeple with his index fingers. He watched as Trebek, with a wave of his arm, revealed the six categories for the first round of Jeopardy....


