Saturday 14 April 2007

The visual part of the cloud

A recent conversation brought up the question of how many images can be found out there in the cloud of the Internet. The actual question was about "unique significant images". I was caught off-guard by this topic, and realized that my attempts at coming up with a likely overall figure led me to wild assumptions which in hindsight seem rather ridiculous. But the question persists in my mind and I wonder how best to tackle such an assessment.

Three definitions:
- "unique": well, meaning just that, a unique image - so I take a photo of my house, which is unique, and share it via email with 10 people, and put it on my blog and on Flickr. The image then exists 12 times in the cloud, but for the sake of the figure we are looking for it only counts as 1.
- "significant": I needed this one explained to me - e.g. structural images used for designing a page (borders of a box etc.) are not significant, and neither are ads, page titles etc.
- "cloud": means all of the Internet, hidden and visible web, email accounts etc. - not just what search engine crawlers would be able to see and index.
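The "unique" rule can be sketched in a few lines of Python. This is only an illustration under a strong assumption of my own: that two copies count as the same image when their bytes are identical, which is checked here with a SHA-256 hash. The function name and the placeholder byte strings are made up for the example, and a resized or re-compressed copy would wrongly count as a new image under this approach.

```python
import hashlib


def count_unique_images(copies):
    """Count distinct images in a list of byte strings.

    Assumption: byte-identical copies are the same image. Resized or
    re-encoded copies would be counted separately by this approach.
    """
    digests = {hashlib.sha256(data).hexdigest() for data in copies}
    return len(digests)


# Placeholder bytes standing in for real image data (hypothetical).
house_photo = b"photo of my house"
other_photo = b"some other photo"

# 10 email copies + 1 blog copy + 1 Flickr copy = 12 copies in the cloud.
copies_in_cloud = [house_photo] * 12

print(count_unique_images(copies_in_cloud))                  # 1
print(count_unique_images(copies_in_cloud + [other_photo]))  # 2
```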

So where do images reside on the Internet?
- on newspaper websites
- on blogs
- on photo sharing sites such as Flickr
- in the image search server caches of search engines
- in email accounts
- on FTP servers
- on other websites (from personals to porn to company websites and anything else)

Let's work with these. I think there are about 2 billion websites out there, a website being defined as any number of pages that make up a common site. Many of these pages are likely to be generated dynamically, so that a page can load with different images each time it is accessed, or on different days (e.g. newspaper homepages). So the relevant quantity here would be the number of images residing in the database of the site.

Furthermore, there are currently about 750 million Internet users out there.

One way of assessing the overall quantity of "unique significant images" would be to work with the users. Let's assume that, on average, each of these users has shared 5 unique images in one way or another over their Internet-usage lifetime so far. I think that is a conservative assumption, given the proliferation of camera phones and digital cameras and the popularity of sharing images via email, messenger and photo sharing sites.

This alone would result in about 3.75 billion "unique significant images", with a growth curve that is probably more exponential than linear in nature.
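For what it's worth, the arithmetic behind that figure is just users times images per user. A minimal sketch in Python, using the two guesses from this post as inputs (the function and variable names are only for illustration):

```python
def estimate_unique_images(users, images_per_user):
    """Back-of-envelope estimate: every user shares the same average number of unique images."""
    return users * images_per_user


internet_users = 750_000_000  # rough figure quoted in this post
images_per_user = 5           # assumed average, a guess rather than a measurement

total = estimate_unique_images(internet_users, images_per_user)
print(f"{total / 1e9:.2f} billion")  # 3.75 billion
```

Changing either guess moves the total in direct proportion, e.g. 50 images per user would already give 37.5 billion.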

Alternatively, one could drill down into each of the categories above of places where images reside in the cloud. But that opens a whole barrel of problems: e.g. how does one estimate the number of newspaper sites out there, and the number of images in the databases of those newspaper sites? In terms of blogs, I read a recent figure that there are 80 million blogs out there - but how many images per blog? The list goes on and on, for each type of site. Porn alone is going to account for a massive number of images - but the overall quantity would be impossible to guess without deeper insight into the number of porn sites out there and the average number of images per porn site.

My point is that I find it impossible to come up with a verifiable estimate. Is it 5 billion images? 50 billion? 100 billion? These figures seem too large, but then - who knows?
