Steve Baker: Mike, I think it's safe to assume that billions of people produce negligible data, and that others are prolific. That's the digital divide. Averages, in this world, are deceptive. So that just means that the kind of people most likely to read these words produce not just four floors of library books, but maybe 20, or 100, i.e., enormous amounts. Yes, to answer your question, all kinds of messages, including American Idol votes, would count. But those votes, while high in number, are very thin in data...  May. 23, 2010, 10:21am

Mike: I am not sure how valid these stats are, but this website ( lists the total number of internet users at 1,802,330,457 and this website ( lists the number of cell subscriptions at around 4.1 billion. I am not sure how best to verify these numbers. Even using these as estimates would change your calculations. I would think it would be beneficial to at least attempt to remove those individuals who wouldn't be counted. Also, I assume the number you are quoting would include text messages? Would this include text messages that were votes for shows like 'Dancing With The Stars' and 'American Idol'?  May. 20, 2010, 10:46am

Bradd Libby: "The figure is 1,200 exabytes per year. So each person generates, on average, 157 megabytes times 1,200? right?" Yes, 1,200 times 157 megabytes. My mistake.  May. 19, 2010, 1:24pm

steve baker: Tony, you might consider 99.99% of the data to be useless crap. But if a computer is studying it, it learns, for example, that "crap" is a word often preceded by "useless." And by judging the context of thousands of occurrences of that combination, it can begin to figure out patterns and meaning. In other words, our crap is its training set.  May. 19, 2010, 1:14pm

steve baker: Bradd, thanks for the comments. The figure is 1,200 exabytes per year. So each person generates, on average, 157 megabytes times 1,200? right? That comes to 188k megabytes, or 188 gigabytes. (You're certainly right about the distortion caused by averaging.)  May. 19, 2010, 1:11pm
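
The scaling in the comment above can be checked with a minimal Python sketch. It takes the 157 megabytes per exabyte and the 1,200 exabytes per year straight from the thread; the variable names and the conversion of 1,000 megabytes per gigabyte are assumptions made here for illustration.

```python
# Figures quoted in the thread.
MB_PER_PERSON_PER_EXABYTE = 157   # per-person share of one exabyte, in megabytes
EXABYTES_PER_YEAR = 1_200         # total data generated per year

# Assumption: 1 gigabyte = 1,000 megabytes, which matches the rounding
# used in the comment (188k megabytes read as 188 gigabytes).
MB_PER_GB = 1_000

mb_per_person_per_year = MB_PER_PERSON_PER_EXABYTE * EXABYTES_PER_YEAR
gb_per_person_per_year = mb_per_person_per_year / MB_PER_GB

print(f"{mb_per_person_per_year:,} MB per person per year")
print(f"about {gb_per_person_per_year:.0f} GB per person per year")
```

With 1,024 megabytes per gigabyte instead, the result would come out a few gigabytes lower, so the 188 figure is best read as a round number.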

tony: yep, loads and loads of data and 99.99% of it is useless crap.  May. 19, 2010, 11:31am

Bradd Libby: Oops, left out the end of my comment: However, this average can be misleading. Some people generate many, many times more digital information than others. Probably a majority generate nothing and the rest likely follow a power-law distribution. So, the average can be affected strongly simply by the fraction of people with access to a digital device. P.S. Stephen, The Numerati was excellent. I'm looking forward to the new book...  May. 19, 2010, 11:22am

Bradd Libby: 1 exabyte annually / 7 billion people = 157 megabytes per person per year. Human DNA encodes about 6 billion bits, or about 700 megabytes. So, each year we generate about 21% of our own DNA in new information. A text-only version of "the" Bible (depends on which version) runs about 1 megabyte. So, alternatively, each day we write about 1/2 of a new Bible.  May. 19, 2010, 11:16am
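
For anyone who wants to retrace the per-person arithmetic in this comment, here is a rough Python sketch. It assumes binary units (1 exabyte = 2**60 bytes, 1 megabyte = 2**20 bytes), which is the convention that appears to give 157 megabytes; with decimal units the same division gives roughly 143. The variable names are made up for illustration, and the results are approximations that land near, not exactly on, the rounded figures in the comment.

```python
# Inputs taken from the comment.
POPULATION = 7_000_000_000      # people
DNA_BITS = 6_000_000_000        # bits encoded by human DNA
BIBLE_MB = 1                    # text-only Bible, in megabytes

# Assumption: binary units, since they reproduce the 157 MB figure.
EXABYTE = 2**60                 # bytes
MEGABYTE = 2**20                # bytes

# One exabyte per year shared equally among everyone.
mb_per_person_per_year = EXABYTE / POPULATION / MEGABYTE

# The same division with decimal units, for comparison.
decimal_mb_per_person_per_year = 10**18 / POPULATION / 10**6

# Size of the genome in megabytes, and the yearly share relative to it.
dna_mb = DNA_BITS / 8 / MEGABYTE
dna_fraction_per_year = mb_per_person_per_year / dna_mb

# Daily output measured in Bibles.
bibles_per_day = mb_per_person_per_year / 365 / BIBLE_MB

print(f"{mb_per_person_per_year:.0f} MB per person per year (binary units)")
print(f"{decimal_mb_per_person_per_year:.0f} MB per person per year (decimal units)")
print(f"{dna_fraction_per_year:.0%} of a genome per year")
print(f"{bibles_per_day:.2f} Bibles per day")
```

These are per-exabyte figures; the thread later corrects the total to 1,200 exabytes, which multiplies each result by 1,200.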