Stephen Baker



About-face: Can we data mine the government?  posted on January 6, 2009

Datamining

At practically every Numerati event, people bring up fears that the government will mine the data of our lives. It's certainly something to keep in mind. But in this excellent article (with lots of good links) about computational journalism, John Mecklin points to the other possibility: Journalists and other citizens increasingly will be able to mine the behavioral data of people in power.

He speculates that journalists with the skills of Numerati will be in a position to expose the secrets, the chicanery, the double-dealing. Instead of burrowing through paper documents at City Hall or drinking late into the night with sources, they'll be fine-tuning algorithms to ferret out crooked patterns in massive data sets.

In a sense, it's the journalistic equivalent of what a computer scientist told me about medicine: The next Jonas Salk will be a mathematician, not a doctor. The same goes for at least a certain contingent of the next generation of prize-winning reporters. This trend also fits with the current economics of journalism: It's cheaper, in most cases, to data mine than to wear out shoe leather, as they used to say, on expense accounts around the world.

What kind of data are we talking about? From Mecklin:

Hamilton offers a theoretical example, taking off from EveryBlock, the set of Web sites masterminded by Adrian Holovaty, one of the true pioneers of database journalism and a former innovation editor at washingtonpost.com. If you live in one of the 11 American cities EveryBlock covers, you now can enter your address, and the site gives you civic information (think building permits, police reports and so on), news reports, blog items and other Web-based information, such as consumer reviews and photos, all connected to your immediate geographic neighborhood. In the not-too-distant future, Hamilton suggests, an algorithm could take information from EveryBlock and other database inputs and actually write articles personalized to your neighborhood and your interests, giving you, for example, a story about crime in your neighborhood this week and whether it has increased or decreased in relation to a month or a year ago.
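To make Hamilton's idea a little more concrete, here is a minimal sketch in Python of how a program might turn incident counts into a sentence of neighborhood news. The function names, the neighborhood label and the counts are all hypothetical, and nothing here reflects how EveryBlock actually stores or serves its data.

```python
def describe_change(now: int, then: int, label: str) -> str:
    """Describe how this week's count compares with an earlier count."""
    if now == then:
        return f"unchanged from {label}"
    if then == 0:
        # No percentage is possible against a zero baseline.
        return f"up from zero {label}"
    pct = abs(now - then) / then * 100
    direction = "up" if now > then else "down"
    return f"{direction} {pct:.0f}% from {label}"


def crime_story(neighborhood: str, this_week: int,
                month_ago: int, year_ago: int) -> str:
    """Assemble a one-sentence story from three weekly incident counts."""
    return (
        f"{neighborhood} saw {this_week} reported incidents this week, "
        f"{describe_change(this_week, month_ago, 'a month ago')} and "
        f"{describe_change(this_week, year_ago, 'a year ago')}."
    )


# Made-up numbers, purely for illustration.
print(crime_story("Your block", 12, 16, 10))
```

A real system would need far more than this (data cleaning, context, editorial judgment), but the core step of comparing this week with a month or a year ago is simple arithmetic.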


Feds to mine blogs for terrorists... Ya think?  posted on January 5, 2009

Datamining

Sometimes stories make their way into the news a year, or even a decade, too late. USA Today reported in December that government investigators "may soon" be mining blogs and social networks for terrorism clues. Now I can't say that people at the NSA have specifically assured me that they are mining blogs. (They were short on specifics.) But the idea that they wouldn't have been doing this analysis for years is almost beyond belief.

In any case, there is useful info in the story, including this:

"There is a lot of IED (improvised explosive device) information generated by terrorists everywhere — websites, forums, people telling you where to buy fertilizer and how to plant IEDs," said Hsinchun Chen, director of the University of Arizona's Artificial Intelligence Lab. Chen's "Dark Web" research project has found 500,000,000 terrorist pages and postings, including tens of thousands that discuss IEDs.


Reading list for Barack  posted on January 3, 2009

Marketing the book

Nice to see my book turn up on a "reading list for Barack" at the School Library Journal site. You might be interested in other books on the list. One of them, Gary Marcus's Kluge, was edited by my editor, Amanda Cook at Houghton Mifflin. Another Amanda product from a few years ago is Linked, Albert Laszlo Barabasi's excellent book on the new science of networks.

While I'm rooting around, here's an interview I did a while back with Bob Edwards, formerly of NPR, now with Sirius satellite radio. It's playing this weekend, paired with Bob's talk with cowboy poet Wally McRae.

How friends' friends can affect moods  posted on January 3, 2009

Tribes

One of my Twitter buddies sent me the link to this New Scientist story on how things like happiness work their way through social networks. It's not just your friends' happiness that influences you, but also that of your friends' friends. So if you're unhappy, or thinking about taking up smoking, or even concerned that your child might be diagnosed with autism, look around and see what your friends' friends are up to.
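For readers who want to see what "friends' friends" means in data terms, here is a small Python sketch. It assumes the network is stored as a dictionary mapping each person to a set of friends; the names and the "happy" labels are invented, and this is not the method used in the research the article describes.

```python
def friends_of_friends(network: dict, person: str) -> set:
    """Return people two steps away: friends of friends who are not
    already direct friends (and not the person themselves)."""
    direct = network.get(person, set())
    second = set()
    for friend in direct:
        second |= network.get(friend, set())
    return second - direct - {person}


def happy_share(people: set, happy: set) -> float:
    """Fraction of a group that appears in the 'happy' set."""
    if not people:
        return 0.0
    return len(people & happy) / len(people)


# Hypothetical four-person chain: ann - bob - cat - dan
network = {
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": {"bob", "dan"},
    "dan": {"cat"},
}
second_degree = friends_of_friends(network, "ann")
print(second_degree, happy_share(second_degree, {"cat", "dan"}))
```

Here ann has never met cat, yet cat is the person whose mood, by the article's account, could still reach her through bob.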

Researchers are going to benefit from lots of new data about our social connections in the next decade or so. Facebook and MySpace are gold mines, of course. But there are also companies such as Sense Networks that are starting to track our physical movements, and to sort us into new behavioral tribes.


Not really a social network or a web. It's just a leaf I found last summer. But you get the idea...

Soon, researchers like those mentioned in the article will be able to study the contagion patterns of obesity, happiness, suicide, political philosophy. And of course the government will be especially keen to understand the migration across social networks of terrorist sympathies.

When I was researching The Numerati, I talked to scientists at the Naval Postgraduate School in Monterey, Calif., who are attempting to model the diffusion of certain ideas in places like Iraq. (Or, more specifically, in Iraq.) Let's say the United States does something good in a town, like building a medical clinic. How does positive news spread to other towns? When I was there, the studies were based on simulations using agent-based modeling. In the future, they'll be able to replace many of these simulations with reality: our behavior.
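As a rough illustration of what a simulation of this kind can look like, here is a toy Python sketch in which news passes between linked towns with some probability at each step. The town names, the links and the probability are all assumptions made up for the example; it is not the Naval Postgraduate School's model, which I have not seen in code form.

```python
import random


def spread_news(links: dict, origin: str, share_prob: float,
                steps: int, seed: int = 0):
    """Simulate news spreading from one town to its neighbors.

    Each step, every informed town gets one chance to pass the news to
    each uninformed neighbor, succeeding with probability share_prob.
    Returns the final set of informed towns and the count after each step.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    informed = {origin}
    history = [len(informed)]
    for _ in range(steps):
        newly_informed = set()
        for town in sorted(informed):  # sorted for a repeatable order
            for neighbor in links.get(town, []):
                if neighbor not in informed and rng.random() < share_prob:
                    newly_informed.add(neighbor)
        informed |= newly_informed
        history.append(len(informed))
    return informed, history


# Hypothetical chain of four towns; the clinic is built in "a".
links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(spread_news(links, "a", share_prob=0.5, steps=5))
```

Replacing the made-up links and probabilities with observed behavior is the step the researchers were anticipating.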


Yahoo "squeezing every drop" from users' data  posted on January 2, 2009

Datamining


What if you see an Internet ad for a local pizza joint and you don't click on it? Does that mean it's irrelevant to you? That you would never click on it? Would another ad with other words, perhaps delivered at another time, have led you to click?

Yahoo researchers describe (briefly) how they are working to draw such conclusions from sparse and irregular data.  (ex David Kravets at a Wired blog)

"These questions may seem simple enough, but coming up with the perfect answer is a problem of staggering proportions. Why? Because the data is often high dimensional, too sparse or too noisy to make intelligent decisions.

For instance, there are billions of interactions that go on between web pages and advertisements, but the vast majority of interactions happen so infrequently. This makes it very difficult to learn from them."

If I click a pizza ad and you click one (I know, I know, you never click ads. Everyone says that, including me...) what do those two clicks have in common? Researchers at places like Yahoo and Google have to take into account the time of day, the other pages and ads we've clicked, the geography of the pizza joints, the wording and design of the ads.


These are all variables. Trying to draw predictions from them is a thorny challenge. The good news: When they get it wrong, no one objects much to an irrelevant or boring ad. We all insist we never click on them anyway.
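One common way to cope with sparse click data is to pull each ad's click rate toward an overall average until enough impressions accumulate. The Python sketch below shows that idea, often called additive smoothing. The prior rate and weight are arbitrary assumptions chosen for the example, and I am not claiming this is what Yahoo's researchers do.

```python
def smoothed_ctr(clicks: int, impressions: int,
                 prior_ctr: float = 0.01, prior_weight: float = 100) -> float:
    """Estimate click-through rate, shrinking toward a prior rate.

    prior_weight acts like that many imaginary impressions at prior_ctr,
    so an ad shown only a few times is not judged on those few alone.
    """
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)


# One click in two showings: the raw rate is 50%, the smoothed rate is far lower.
print(smoothed_ctr(1, 2))
# Plenty of data: the estimate sits close to the observed rate of 5%.
print(smoothed_ctr(500, 10_000))
```

The design choice is a trade: rare ad and page pairings get cautious estimates, at the cost of being slow to recognize an ad that really is unusually good.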



New sensors in footballs and gloves  posted on December 31, 2008

Science


See that ball? See that hand? If Carnegie Mellon researchers are successful, that catch will soon be represented on a computer graph as an intersection of two lines, one describing the movement of the player's glove, the second the spiraling trajectory of the football. (ex Flowing Data)

This is the kind of data that is going to be pouring into our world in short order. The first applications will be to automate, or at least double-check, the judgment of humans. Was that ball in his hand when his foot came down in bounds? Currently, frustrated football fans wait through long minutes of advertising (or fast-forward their TiVos) while experts scrutinize the videos.

But analytics will follow in short order. What are the most efficient trajectories? Are there ones that are caught more often? Intercepted? (This photo, incidentally, is of an interception.) How do you teach quarterbacks to optimize their passing? How else can sensors change sports? Something to think about while watching bowl games the next few days...


Solstice: a tribute to sunlight and photosynthesis  posted on December 30, 2008

General


Winter solstice has passed. And since the sun in this hemisphere is making a rebound, I thought I'd link to Caetano Veloso's Luz do Sol, the only song I know about photosynthesis. It was brought to mind by this lovely op-ed by Oliver Morton.


Will work for free  posted on December 29, 2008

News

It's no secret that volunteer workers create immense value, and the Internet gives them all kinds of new opportunities. Open-source software, Wikipedia and even the Obama campaign are examples everyone knows. Now the trick for all kinds of companies is to figure out ways to engage volunteers, and to devise new incentives, currencies, even HR departments for this new and plentiful source of labor.

I wrote a story about the free labor economy for BW, and it's up online today.


The future of measurement  posted on December 29, 2008

Science

Social abacus, which I found through a link from Matthew Hurst, looks like a good Numerati blog to follow. Here are its four major predictions for 2009:

  • We will substantially advance our understanding of individuals and the meaningful connections they have.
  • We will identify methods to tap what people are *really* thinking, feeling, and paying attention to, meanwhile gaining insight on what a measurement is truly capturing.
  • We will determine how to measure the value of social interactions and attach financial value, whether we’re monetizing attention or a new medium.
  • We will build better tools to manage-- analyze and visualize-- massive volumes of data, primarily tapping the evolving social graph.

And Kate helpfully adds a list of the researchers involved in these efforts. Looks like a great resource:

Neal Burns, University of Texas, Austin Center for Brand Research
Walter Carl, Chat Threads
Maury Giles, GSD&M Idea City
Sam Gosling, University of Texas, Austin: Department of Psychology
Seth Grimes, Alta Plana
Matthew Hurst, Microsoft Live Labs
Paul Janowitz, Sentient Services
Shawn Kung, Aster Data
Roddy Lindsay, Facebook
Jamie Pennebaker, University of Texas, Austin Department of Psychology
Martha Russell, Media X at Stanford University
Miles Sims, Small World Labs
Marc Smith, Telligent
Daniel Tunkelang, Endeca


Is the world moving too fast to write books?  posted on December 29, 2008

General

Kevin Methany, a software developer in St. Paul, Minn., crosses a bridge nearly every day that's been under construction since 2002. In this essay, Methany compares bridge-building to big projects in the information world. His thesis: Bridges can take forever, because the river and the cement don't change much. A software project begun the same year, by contrast, would have to keep adjusting to changes in the Web ecosystem, from Facebook to the iPhone.

Writing a book is a little like building a bridge. It takes planning, prep work, meetings, and lots of fine-tuning. This process works fine for a biography of Andrew Jackson. But at least two big trends in the world are making me rethink the current publishing model for my kind of books.

First, the industry's in trouble (and my publisher has announced a 'soft freeze' on new acquisitions). So there's a market problem. Second, lots of readers (especially of tech-themed books) are doing a lot more reading on the Web. This is contributing to the industry problem. But it also means that timelier "publishing" in smaller chunks might be needed to reach these people. Right now, I would say, the great majority of them nibble at the byproducts of a book like mine--the blog posts, reviews, perhaps an occasional podcast--without reading the book. Nothing wrong with that. I'm happy people are participating at any level.

But it leaves me with a question. How do I and others write (and fund) our next books, if you want to call them that, for this coming age? It's on my mind every single day. (PS: found this very good article in CNET about self publishing.)





©2009 Stephen Baker Media, All rights reserved.     Site by Infinet Design






Jan. 7, 6:30pm - Book Party -- MediaBistro - Butterfield 8 -- 5 E. 38th. St. New York, NY [more info]




In the future, human motivations won't be sifted only by the psychologists and ministers, but increasingly by the "nu...
- Ars Technica - Nate Anderson

"The implications may sound a bit Big Brother-esque, but Baker believes we have as much to benefit as to risk."...
- Fortune Magazine - Jessi Hempel

Math appeal: How number-crunchers have you pegged ... Back in school, solving math problems could feel like playing detect...
- Las Vegas Business Press - Matthew Crowley




List of favorite non-fiction books
- December 18, 2008


Early results of behavioral ad campaign
- November 4, 2008


Launching Numerati behavioral campaign: Will deliver 8 million targeted ads
- September 5, 2008


The Worker: Excerpted as BusinessWeek cover story, Aug 28, 2008
- August 28, 2008


Message for math and business readers
- August 27, 2008


We are going to target you with behavioral ads--and blog about it
- August 20, 2008


Steve Baker answers questions about The Numerati
- August 1, 2008


My Math Career
- July 20, 2008


A few more math covers
- July 19, 2008