Stephen Baker



Posterous DOS attack. Someone should write the story  posted on August 12, 2010

News

I have a blog on Posterous, a very interesting and agile Web service. Just now I received an apologetic email from the company's CEO, Sachin Agarwal. He says that over the last six days, Posterous has been victimized by powerful Denial of Service Attacks. When the first one hit, last Wednesday, the Posterous team raced to move to new data centers. Another one hit on Friday. It was a crazy six days for Posterous. A tech leader named Vince, Agarwal writes, "worked like a mad man until he passed out on his desk."

Briefly, in 2002/03, I was acting info tech editor at BusinessWeek. If this were a Thursday night back then, I'd be preparing for Friday's story meeting, in which I'd propose a 4-column narrative on the Posterous attacks. Here's a very innovative and popular start-up, I'd argue, whose very existence was threatened by these attacks. (I still don't know where they came from.) It would make a great story. It would give us insights into the dangers surrounding us and a look at a start-up battling them.

But why, another editor would surely ask, should we care about Posterous? (Of course, if the editor in chief appeared to be interested in my story, that question might remain unasked. These meetings were exercises in Kremlinology.) In any case, I'd have to make a case for Posterous. Still, I hope someone somewhere is considering writing a story about it. I'm far too busy with my book to report the story. But it's one I'd like to read.
***
You might be wondering about my blog on Posterous. I started a BusinessWeek alumni network on Ning. When Ning demanded payment for what started out as a free site, I migrated all of the content to Posterous. I look at it not as an active blog, but as a private archive of BusinessWeek history.


Goldblatt exhibit: Eyes on South Africa  posted on August 10, 2010

General


We finally got to the Jewish Museum in New York to see the David Goldblatt exhibit. He's a South African who photographed people in his country, white, black and "colored" alike, making their best efforts to live normal lives under apartheid, which created the most abnormal of circumstances. It's a wonderful exhibit if you get the chance.


Jaron Lanier critiques IBM's Jeopardy challenge  posted on August 9, 2010

Jeopardy book

Jaron Lanier, the technologist and author who worries that we're getting carried away with our machines, includes Watson, IBM's Jeopardy-playing computer, in his latest broadside at the tech industry. Unfortunately, he misinterprets the question-answering technology. In a long New York Times op-ed, he writes:

...I.B.M. scientists recently unveiled a “question answering” machine that is designed to play the TV quiz show “Jeopardy.” Suppose I.B.M. had dispensed with the theatrics, declared it had done Google one better and come up with a new phrase-based search engine. This framing of exactly the same technology would have gained I.B.M.’s team as much (deserved) recognition as the claim of an artificial intelligence, but would also have educated the public about how such a technology might actually be used most effectively.

The challenge for a Jeopardy-playing computer is not simply to carry out searches based on phrases. Google and other search engines already do that, at least for simply phrased queries. Far beyond pointing toward Web pages, Watson must generate specific answers, each with its own confidence ranking. This enables the computer to calculate if it can risk a bet on the clue. It's a far more difficult challenge than the one Lanier portrays.
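The link between confidence and betting can be shown with a toy sketch. This is not IBM's method: the candidate answers, their scores, and the 0.5 threshold are all hypothetical values chosen for illustration. The only idea taken from the paragraph above is that each candidate answer carries a confidence, and the machine answers only when its best candidate is confident enough.

```python
from typing import Optional

# Hypothetical cutoff: below this confidence, the player stays silent.
BUZZ_THRESHOLD = 0.5


def choose_answer(candidates: dict[str, float],
                  threshold: float = BUZZ_THRESHOLD) -> Optional[str]:
    """Return the highest-confidence candidate, or None if it is too risky."""
    if not candidates:
        return None
    # Pick the (answer, confidence) pair with the largest confidence.
    best, confidence = max(candidates.items(), key=lambda item: item[1])
    return best if confidence >= threshold else None


# Made-up candidates and scores, for illustration only.
confident = {"candidate A": 0.82, "candidate B": 0.11}
unsure = {"candidate A": 0.31, "candidate B": 0.29}

print(choose_answer(confident))
print(choose_answer(unsure))
```

With the first set of scores the sketch answers; with the second it declines, since no candidate clears the threshold.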

That said, he raises interesting questions about the marketing of machine "intelligence." His point, which he elaborated upon in his book, You Are Not a Gadget, is that we're too often ceding our decision-making to machines and "swarms" of online communities. By using Amazon recommendations or software to compose a harmony line, Lanier says we're abandoning the human brain--by far the most sophisticated known work of circuitry in the universe. Instead, we're delegating this work to far simpler algorithms.

The tech industry promotes this, he says, by branding technologies as "intelligent" and comparing some of them to the brain. It often anthropomorphizes. This is where Watson comes in. Long before the Jeopardy challenge, IBM had a team of researchers working on Question-Answer technology. By channeling this research toward Jeopardy, the company was (and is) clearly looking for a branding opportunity. And by giving Watson a human name and voice, it anthropomorphizes the machine.

Is this a good thing? Well, putting a computer into a match against humans imposes a series of constraints that push researchers very hard. They must prepare the computer for long and confusing clues, and they have to design the system to come up with an answer it can bet on within three to five seconds. This advances the technology. (Whether or not this pays off commercially is still open to question.)

But let's assume that Watson and its kin race to produce ever more sophisticated answers for us in coming years. Are we going to accept their responses as "truth" and our own judgments as something less than that? I don't think so. Watson is at its most fascinating and entertaining when it makes mistakes. It is when the machine is struggling or clueless that you most appreciate the astounding complexity of our language and the intricate web of connections in our minds.






Jeopardy challenges: knowing what to look for  posted on August 8, 2010

Jeopardy book

I've written the first six chapters of the IBM-Jeopardy book, and my editor at Houghton Mifflin is having her way with them now. Since we're on a forced-march schedule, I'll be revising this first half of the book over the next three months while researching and writing the second half.

Meantime, I thought I'd throw in occasional blog posts on some of the challenges that IBM's Jeopardy-playing computer, Watson, faces as it plays the game. One of the big issues is figuring out from the question what it's supposed to be looking for. For this, it hunts for what researchers call a "lexical answer type," or LAT, in the Jeopardy clue.

Take this clue from the July 29 game. Under the category, "Who killed me, Shakespeare?" it reads:
"Banquo--this guy who sent the hitmen, though he also got his own hands bloody." What the computer should be looking for here is "this guy." That's the LAT. It's not that easy, because initially it appears as though "this guy" might be Banquo, and not the guy who sent the hitmen. But Watson has lots of training on finding the LATs, and the word "this" is a very significant pointer. (The answer, by the way, is Macbeth.)
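To make the "this" pointer concrete, here is a toy sketch. It is emphatically not how Watson works. It is a single hypothetical heuristic (the function name and the regular expression are my own) that treats the word following "this" or "these" as a candidate LAT.

```python
import re


def find_lat_candidates(clue: str) -> list[str]:
    """Return the words that directly follow 'this' or 'these' in a clue."""
    # Toy heuristic: "this <word>" or "these <word>" marks a possible LAT.
    pattern = r"\b(?:this|these)\s+([A-Za-z]+)"
    return re.findall(pattern, clue, flags=re.IGNORECASE)


shakespeare_clue = ("Banquo--this guy who sent the hitmen, "
                    "though he also got his own hands bloody.")
print(find_lat_candidates(shakespeare_clue))  # ['guy']
```

A rule this crude finds "guy" but says nothing about whether the guy is Banquo or the man who sent the hitmen, and it returns nothing at all for a clue that never uses "this." Closing that gap is where the real work lies.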

Some are much harder. In the category "Hip Hop," Jeopardy players in February grappled with this clue: "Not surprisingly, his father Heraclides & his grandfather were both physicians & his mother was a midwife." The word is never mentioned, but the LAT we're looking for is both a son and a grandson. Often, when Watson sees "father," it knows to look for a father. But in this case, the father is given, which means it must look for a son. I can't go into all the details of how Watson does this here. (Some cannot be disclosed until after Watson's showdown with human champions next winter.) My point is that even to figure out what it should be looking for requires an immense amount of work. It involves scores of different algorithms carrying out hunts and building up a host of statistical probabilities.

Compared to us, Watson is a wasteful question-answerer. We know things. It knows nothing, and often must carry out exhaustive hunts even to find a LAT. I might add that in this "Hip Hop" category, Watson will no doubt be dedicating billions of computing cycles to the analysis of words and phrases associated with Hip Hop music. As it turns out, the category has nothing to do with it. The answers to the five clues are:  Hiphuggers, Hop Scotch, a hopper, the Bunny Hop, and, as referenced above, Hippocrates.


The New York Mosque  posted on August 8, 2010

News

I don't usually write about politics or religion, but the controversy around the proposed mosque near Ground Zero in New York has been obsessing me lately.

Some of the protesters against the Mosque refer to it as a "victory" mosque, meaning that certain Muslims will celebrate it as the site of the "victory" on 9/11/01. Since there are some 2 billion Muslims on earth, that's probably a safe bet. But consider this: Thousands of U.S. Muslims are serving in our armed forces in Iraq, Afghanistan and elsewhere. Do those people consider 9/11 a "victory?" And are the Mosque critics being fair to these soldiers by grouping them with the people they're risking their lives, on our behalf, to fight?

A second point: We are extremely fortunate in this country to live in ethnic and religious peace. There is prejudice, injustice, and anger, of course, and incidents of violence. But considering that we're an enormously diverse nation of more than 300 million people, I think the level of peace and (relative) harmony here is nothing short of remarkable. Among the Muslim populations in the UK and France, there is a much greater sense of grievance and alienation. This has fomented violent uprisings in the Paris suburbs and has fed violent extremism in the UK. We have been spared these problems, in part because we don't have the same colonial histories in Muslim countries (though we're gaining them now), and because our own Muslim population is multi-ethnic and integrated into American society.

The worst thing we could do is communicate to American Muslims that they are a distrusted and detested minority. It would bring us closer to religion-based conflict in this country. And yet that's the message in the protests against the mosque in New York.


When it comes to tracking customers, few match the Wall Street Journal  posted on August 4, 2010

Privacy

Advertisers who track user behavior online always put in this qualifier: It's anonymous. In other words, they track a Web surfer who seems interested in new cars or romantic movies, but not the specific person. In its latest in a series on data tracking, the Wall Street Journal today reports that this anonymity is "in name only." New technologies can come close to zeroing in on the person with just a smattering of data. Peter Eckersley, staff scientist at the Electronic Frontier Foundation, a privacy advocacy group, says that 33 bits of information about a person is enough.
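Why 33? Presumably because each bit of information cuts the pool of candidates in half, and 33 halvings are enough to go from everyone on earth to one person. A quick check of the arithmetic, assuming a world population of roughly 6.9 billion in 2010 (my round figure, not the Journal's):

```python
import math

# Assumption: roughly 6.9 billion people in 2010 (a round figure, not from the article).
WORLD_POPULATION = 6_900_000_000


def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one member of a population."""
    return math.log2(population)


bits_needed = bits_to_identify(WORLD_POPULATION)
print(f"{bits_needed:.1f} bits")  # 32.7 bits

# 33 bits distinguish 2**33 (about 8.6 billion) possibilities,
# which is more than the assumed population.
print(2**33 > WORLD_POPULATION)  # True
```

By the same logic, a fact that is true of half the population is worth about one bit, and a fact true of one person in a thousand is worth about ten.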

Yet the Wall Street Journal, a vigorous customer tracker itself, doesn't have to go to all that trouble. A reader, Mark Naples, pointed out in an email that the Journal, one of few media outlets with a pay-wall, collects personally identifiable info online and has the ability to marry it with the behavior data scooped up with cookies.

Another commenter on this site, Michael Sandora, details the same points on his blog, Indigestion.  The difference between most data trackers and the Wall Street Journal, he writes, is this:

To a data tracker, I am a cookie number interested in bluegrass music, jam-bands, Star Wars, Indiana Jones, and reading about digital media.

To the Wall Street Journal’s subscription service, I am Michael Sandora, email address: msandora@thisnotmyemail.com, credit card number ####-####-####-####, bluegrass listener, Star Wars fan, and digital media follower.

What's more, the Journal's privacy policy, dating from 2008, reserves the right to share this valuable trove with "other select companies to send you promotional materials about their products and services (that is, unless you've told us not to do so...)"

I subscribe to the Journal, use their site, and really don't have problems with their blending my behavioral data with the personally identifiable stuff. I don't mind targeted advertising. And as someone who has lived off of advertising in media my entire career, I want journalism to find a functional business model for the Internet age. But if the Journal is going to write a series on data privacy, they should pay more than passing attention to the practices of their own company.



Privacy loses every time  posted on August 2, 2010

Privacy

Monday morning, and before I've finished my first cup of coffee, I see two stories about the fall of privacy. First, the United Arab Emirates is shutting down Blackberry data services in their corner of the Arabian Peninsula because they can't eavesdrop on the heavily encrypted messages. Next, I see in the Wall Street Journal (behind firewall) that the advertising side of Microsoft, in 2008, fought back a plan that would have thwarted cookies (as a default setting) in the Internet Explorer 8.0 browser. How could Microsoft sell ads, they argued, with a browser that keeps advertisers from learning about the Web-surfing patterns of their potential customers?

Both the UAE and Microsoft have reasons to do what they're doing. The UAE is an oasis of relative freedom in a region that's short of it. People of all nationalities work in Abu Dhabi and Dubai. I was there last March. You meet Filipinos, Indians, Kenyans, Europeans, Moroccans. It's a regular UN. No place would be easier for Al Qaeda to do banking, organizing, bombing. You can even drive to the UAE from Yemen (though Google Maps, for one reason or another, isn't able to give me the directions). I'm sure this move by the government angers many in the country (not least the Blackberry subscribers), but there's a defensible national security argument for it. It's at least as solid as the reasoning behind the 2001 Patriot Act in the U.S.

Microsoft also had its reasons not to interfere with cookies. It had to do with the profits in its online business, which struggles mightily against Google, among others. Given the choice between contracts from paying advertisers and appreciation of privacy-loving and non-paying Web surfers, they went with the bucks.

And that's my point. Privacy almost always loses. People say they care about it, but most of us are really like the UAE and Microsoft. Given a choice between the promise of security and privacy, we usually opt for security. (We march like sheep through the scanners at the airport, letting them ogle and grope us, and we even tolerate it when they snap, NO JOKES!)

At the same time, most of us drop our privacy concerns in a snap to save $5 at the supermarket, with a customer loyalty card, or five minutes at a toll booth. What's more, if we really cared deeply about privacy on the Internet, more of us would ditch Web mail and enable private browsing on our computers (and go to the trouble of typing a lot more passwords). And we'd heave the biggest surveillance machines, our cell phones, into the nearest gutter. I, for one, choose not to.

What's this all mean? We have hand-me-down notions of privacy that don't really fit our modern machines, networks and lives. In coming years, we'll see that some invasions of privacy (like cookies, in my opinion) are largely abstract. But we'll find others that are all too real. (I fear them in areas of police and medical surveillance.) For now, though, privacy loses, just about every time, to economics and promises of safety.



WSJ: Advertiser tracking on the rise  posted on July 31, 2010

Datamining

The Wall Street Journal publishes a report today (behind firewall) on cookies, and the growth of consumer-tracking on major Web sites. For the report, they analyzed big Web sites, including their own, and found that many dropped more than 100 cookies into visitors' computers. (The Journal dumps 60 cookies, slightly below the 64-cookie average on the 50 largest sites.) The only big site that doesn't track visitors is Wikipedia.org.

As a reader (and former editor) I found the Journal story maddeningly vague. It says that cookies are on the rise, but doesn't give any historical context. It mentions data-analysis companies that are doing highly detailed work, but doesn't name them. And while it states what type of analysis they could do with this detailed data, it doesn't give examples of how it's being used. To wit:

"Some tracking files can record a person's keystrokes online and then transmit the text to a data-gathering company that analyzes it for content, tone, and clues to a person's social connections..... Data-gathering companies [can] build personal profiles that could include age, gender, race, zip code, income, marital status, and health concerns, along with recent purchases and favorite TV shows and movies."

Why not name a few of these companies, and, while they're at it, ask advertisers how such detailed profiles are being used? Also, note the use of the word "could" in the last sentence. Is there evidence that these unnamed companies are actually building these profiles? We don't know.

I dealt with these issues often while researching The Numerati. The problem here, as in much of the data economy, is the gap between the astonishingly rich trove of data and the undeveloped business model for it. Most companies simply don't know how to put the data to use. How do you deal with millions of detailed consumer profiles when you only have four or ten or 20 different types of ad campaigns? You ignore most of the details and put the people into enormous buckets. (Credit-card companies are a notable exception. They can create thousands of different offers and test them against different groups. But they've been at this since long before the age of cookies.)

Eventually advertisers will learn to make use of this information, if a privacy uprising doesn't shut cookies down. But for now much of this detail we're communicating with our clicks and keystrokes is piling up in data centers, largely ignored.


Ask.com tries different question-answering  posted on July 27, 2010

Jeopardy book

One of the common (and mistaken) assumptions about IBM's Jeopardy-playing computer, Watson, is that it has a database of answers to Jeopardy clues, and that it's just a matter of finding the right one. For Jeopardy, which has a staff of writers coming up with puzzlers, such a database would be impossible. Consider this clue from earlier this month: Under the category "Jonah's Druthers," it reads:

"Aboard ship in a storm, the men "cast" these items of chance; Jonah's came up, but he'd rather it didn't." (I think I would have used "hadn't" for that last verb.) The answer, which isn't that hard for lots of humans, is "lots." But can you imagine a database waiting with an answer for that clue? No, Watson has to do loads of hunting, syntactical analysis and statistical work in three to five seconds to come up with answers.

But according to the NY Times, Ask.com is returning to its question-answering AskJeeves roots with a new Q/A service. This one, unlike Watson, will index some 500 million questions and answers. Most of these, I'm assuming, will be simple fact answers to simply-phrased questions, what Watson's builders call "factoids." How far is it from Philadelphia to Pittsburgh? How much does a Buick LeSabre cost? Most search engines, including Google, are already providing answers to these types of questions in the search results. You can often see them without clicking.

The challenge will be to keep the answers fresh. The price changes on that Buick. Nicolas Sarkozy won't be the president of France forever. A Q/A database, to stay relevant, has to be very lively, always checking and refreshing itself.

***

We're driving back from a wonderful wedding in the suburbs of Detroit. The honeymooners are now in Paris, and we're in Clearfield, Pa., the home of Dave Morgan, founder of Tacoda and Simulmedia, and the first character I introduced in The Numerati. Looking around here for dinner last night, I can understand why he decamped to Manhattan. Though the scenery in this part of western Pa., especially at dusk on a summer evening, is gorgeous.


Confessions of a geezer at the movies  posted on July 22, 2010

General


I went to the movies last night and came to grips with a challenge I face: If I want to enjoy popular culture, and maybe even thrive in the work place, I'm going to need to achieve some level of expertise in video games. Games will provide the architecture, and increasingly, the interface, for much of what we do.

The movie was Inception, starring Leonardo DiCaprio. It involves an adventure that continues through various levels of dreams. If it had just been about dreams and alternate realities, I think I would have loved it. But this movie behaved like an action video game. Loads of shooting and explosions, lots of buildings and bodies falling, and you're wondering the whole time: What level of the dream are we in? And you learn that certain people, when they're shot, descend to a lower level of dreams.

With all the pyrotechnics and the various levels, it felt like a video game. Leonardo had to descend into a series of alternate worlds and master them in order to reach his prize. As David Denby writes in the New Yorker, it had a thing or two in common with the Greek myth about Orpheus, who has to descend into the underworld to retrieve Eurydice. That would probably make a good video game too.






©2010 Stephen Baker Media, All rights reserved.     Site by Infinet Design



















