Sunday, January 04, 2009

Internet Archiving - who owns my data?


(Update below)

Something has been bothering me about the internet (or, more precisely, the collection of websites I use on the internet).
As a user, I now have a broad range of options for publishing my content (blogs, images, video). But I do feel paranoid about certain aspects of the current internet.

Let me describe my situation. I write comments in several online discussion forums (such as rediff.com). I write blog posts on DailyKos.com and blogspot.com. I upload videos to youtube.com. I also comment on other people's blogs, whatever platform they use. I save my bookmarks in del.icio.us.
Now, I value all this content I put on the web. In a debate, for example, you may come up with a new way of looking at something, an effective reply to a "talking point", or a key piece of data that shuts everybody up. I am not talking about other people's content; I am specifically talking about content I myself have put on the web across different websites. This is the age of User Generated Content, and my content is distributed across many websites.
There are a couple of problems that I face with this distributed content:
1. How can I aggregate all my content and get updates when someone replies to me or links to me? This is a problem that RSS solves. I will not elaborate on this here.
2. How can I collect and present a catalog of my opinions from all these different websites? Let us say that in the near future I seek admission to Harvard. Is there some way I can provide Harvard with a collection of all my valuable content, so that it counts toward my credentials? Looking further ahead, we can expect a new generation to start building their identity online in their teens, leaving a trail of their work and impressions (in whatever format) across the internet as they grow older. How can someone maintain this digital trail and leverage it?
This problem existed long before the internet. For example, if you wanted to collect the complete works of Einstein, you had to visit every university he was associated with and search its libraries and archives. Some of these archives are now digitized, but nobody has come up with an easy mechanism for packaging up one's life's work. In the pre-internet world it was close to impossible to have a centralized collection of all of one's life's work.
But this problem is solvable in the digital age: the Facebook, Orkut and MySpace generation will have internet access most of the time. Fifty years from now, a biographer or anthologist may be able to determine a person's life's work from the digital world alone.

So, what prevents me from getting a digital collection of my own work distributed across the internet, right now?
I can, of course, prepare a set of links to my rediff.com comments, my DailyKos posts, my blogspot posts and my YouTube videos. But that is all I can do: the websites reserve the right to invalidate these links at any time. In fact, twenty years from now, many of them may have switched off their servers and gone home.
As an internet contributor, the core problem I face is this: the rich data that makes up "my" internet is not owned by me. It is owned by at least fifteen different websites. The same problem applies to everyone who uses the internet.
When I write an article on DailyKos, I want the article, along with its comments (which provide context), to be available for posterity. But I have no control over when they may "retire" the article or when Markos shuts the site down.
In principle, this is no different from the problem faced by earlier generations: if you were a newspaper columnist, you clipped your column, perhaps photocopied it, and kept it at home. That was all you could do.
I think we in the internet age should demand more, though, because more is possible now. For example, printing out a webpage with my article is not good enough, because someone could be commenting on that article this very minute, and I don't want to lose that context.
With their myriad ways of annotating, commenting on and extending our content, the websites of the internet have made my content richer, more contextual and more centralized than it could be in the pre-internet world. YouTube, blogspot and rediff have all made my contributions richer, and that makes it all the more important that I be able to catalog and archive that content.

The tension here is between two poles. The websites have enabled me to contribute and reach a broad audience, and for the survival of their business they want me to keep coming back to their pages, so the data stays in different forms on different websites. I, on the other hand, would like to extract my data (in some format) and keep it in a set of archives so that it remains available for posterity. I am worried that all my valuable contributions will be gone twenty years from now.

I may sound paranoid, but I do care about the longevity of my thoughts, and I think everyone who contributes to the internet does.

We should not let the losses of past centuries repeat themselves. There was no centralized publishing then, so much work of enormous value disappeared within a short time because the medium (parchment or paper) perished. In this age almost everyone with internet access can publish their opinions and share their knowledge, and it is all searchable. We should leverage the advantages of the digital medium, come up with a way to extract our data (even for a fee), and create standards for extracting User Generated Content.
We also need a standard archiving solution that is not tied to any particular website. For a fee, I should be able to store my data on different servers that are part of the meta-internet.
By the way, check out this browser add-in: http://www.iterasi.com. It provides a way to extract any webpage for personal storage. But the storage is still owned by iterasi.com; I think we need an internet archiving project that is more community-driven.


Update I
Refer to what happened to Soapblox in this article.
The content of several blogs running on the Soapblox platform was almost wiped out by hackers. Several years' worth of work could have been lost. This is why we need an open source extraction and archiving system.

Comments:

Anonymous said...

Good point you make. All "our" content is owned by the content provider. And as you pointed out, maybe a small fee for archiving our content would help.

In that case, have you thought about developing your own website? At least your blogs will remain your property :)
