Corrente regular twig has a post (All Your Informations Are Belong To Us!) that links to a WSJ article, 'Scrapers' Dig Deep for Data on Web, which does a decent job of reporting on the screen scraping industry. There's the usual hyperventilation in the comment threads at both sites, with good points all about, but most of all there seems to be significant confusion about who is getting what, and by what means.
I work for an organization that helps government agencies collect, store, combine, concatenate and use massive amounts of data. On the one hand, massive amounts of data are collected every day. On the other hand, there's not a hell of a lot the government can do with it. Why? Resource constraints. There aren't enough people and dollars to crunch the numbers, and even fewer to make sense of the information coming out the other end. On the third hand (yeah, I've got more hands than Kali), the data that does get analyzed is hedged around with very strict rules about who can read it, where it can go, and so forth. Audit trails usually record who accessed the information, so it's hard to leak anything without the leak being traced back to the leaker. For the most part, the paranoid fantasies about what the government is doing with your information are laughably overblown. Add to that the ego-deflating fact that you just aren't that interesting. No one in power gives a fuck who you are or what you say. The government gets bad-mouthed by bigger names than you.
As for the paranoiacs who see the gummint coming after them for taxes, ppphhhht. The IRS will probably find out that you tried to hide your gambling winnings and the cash-only rental unit in your garage, and sadly will pursue you with more zeal than it will the Merry Banksters and their billions of dollars in public loot. But municipalities these days are so strapped for cash that they can't afford to collect outstanding debts below a certain threshold. The City of San Diego, for example, can't afford to track down parking violators to collect their fines unless the amounts are really big.
The data collectors to worry about are the ones with a profit to be made, namely the insurance and credit industries. There, weeding out marginal customers (or, in credit's case, targeting them) can make fractions add up rapidly. HR departments are probably next, trying to avoid hiring liabilities, but they're more likely to rely on your credit score than on your Facebook posts. To them, lots of bills means an inattentive or irresponsible employee, especially if accompanied by a pattern of regular but short-tenure employment. I've been tapped to review resumes and conduct interviews in every one of my last six jobs, and most people don't make the cut because they give shit-awful interviews. They never make it to the background check stage. But back to the data collectors. The insurance and credit industries are the folks with the deep pockets to buy, crunch and commodify your information. Their sole goal is to reduce you to an algorithm that determines your position on a scorecard of likely customers. I consider this the most pernicious form of data collection.
There are also the marketeers who want to sell you shit and try to figure out your buying habits so they can sell you more of what you already buy and/or things similar to it. The holy grail for this crew is a universal web cookie that will track your every move and purchase. They rip through your email on the major online providers, they stuff Flash cookies on your system, and they can run instant auctions, based on your cookies and the page you are on, to price, sell and push out targeted advertising. I admit to a certain respectful awe at the instantaneous nature of their markets.
Then there are the criminals who infest your system with worms and turn your machines into zombies. They want to log your keystrokes and clean out your bank account, or use your processing power for brute-force attacks on database servers, or both. (News flash - using non-Microsoft products will not make you safer.) Criminal botnets will keep running as long as someone is dumb enough to click a too-good-to-be-true email link.
The fundamental problem is that the US does not recognize and protect a citizen's right to privacy. The practical problem is that people are a bunch of yahoos when it comes to protecting their own privacy. Facebook participants don't seem to understand that they are the merchandise. Information about who you are and what you do can make someone else money. As I've said for years, Google's entire business model is to get information about you and sell it to someone else as often as possible. It's not just online, though. Anything electronic can be used - your credit cards, your grocery rewards cards, your checks, your campaign contributions.
My biggest "privacy" breach at the moment is Huffington Post offering up my campaign contribution history to anyone who types in my name. Guess what? Arianna has just guaranteed that I will never again make a campaign contribution. My next biggest problem (because I know how to prevent online data gathering - it pays to be in the industry) is my credit card use. The credit card companies sell my purchase records and coordinate ad campaigns with large retailers. For example, AMEX just sent me an offer of a $10 rebate if I buy 4 Fresh & Easy branded products and spend $50 before the end of the month, which means they searched for all card holders within X miles of the new F&E and created a dynamic offer. Yes, I took the offer, since I could buy four products for less than $10. Yup, I'm one of the yahoos - and so are you.
If someone wanted to, they could connect my blogger persona with my real-life one, and I doubt that connection would cost me more than some embarrassment (and it would probably earn me some high-fives). The political writing is not what makes me vulnerable. The data that makes me vulnerable is my credit and health information. Credit is inextricable from any financial life above the poverty level. I can't move to all-cash and frankly wouldn't want to. My health specifics aren't listed on any web site, and there's not much I can do about insurance records.
To the degree I have a digital privacy strategy, it is to stay away from Facebook and other social networking sites and carefully limit where I leave digital footprints when I'm online. That's more to deter the marketeers than anything.
My biggest solace is knowing, because I work on the IT side of things, how fragile digital information is. I know how easily it is lost or accidentally deleted, how bad the storage media are, how incompatible information schemas can be, how quickly formats become unreadable, how eagerly companies try to make past records systems obsolete, and that as energy becomes more dear, data will get dumped.
It's fun to scare ourselves with fantasies of Big Brother, but the fears ring a little hollow when blared from Facebook and Blogger.