By Peter Wayner
Thinking about the bits of data you leave behind is a one-way ticket to paranoia. Your browser? Full of cookies. Your cellphone? A beacon broadcasting your location at every moment. Search engines track your every curiosity. Email services archive way too much. Those are just the obvious places we’re aware of. Who knows what’s going on inside those routers?
The truth is, worrying about the trail of digital footprints and digital dustballs filled with our digital DNA is not just for raving paranoids. Sure, some leaks, like the subtle variations in power consumed by our computers, are exploitable only by teams of geniuses with big budgets, but many of the simpler ones are already being abused by identity thieves, blackmail artists, spammers, or worse.
Sad news stories are changing how we work on the Web. Only a fool logs into their bank’s website from a coffee shop Wi-Fi hub without using the best possible encryption. Anyone selling a computer on eBay will scrub the hard disk to remove all personal information. There are dozens of sound, preventative practices that we’re slowly learning, and many aren’t just smart precautions for individuals, but for anyone hoping to run a shipshape business. Sensitive data, corporate trade secrets, confidential business communications — if you don’t worry about these bits escaping, you may lose your job.
Learning how best to cover tracks online is fast becoming a business imperative. It’s more than recognizing that intelligent traffic encryption means not having to worry as much about securing routers, or that meaningful client-based encryption can build a translucent database that simplifies database management and security. Good privacy techniques for individuals create more secure environments, as a single weak link can be fatal. Learning how to cover the tracks we leave online is a prudent tool for defending us all.
Each of the following techniques for protecting personal information can help reduce the risk of exposing at least some of the bytes flowing over the Internet. They aren’t perfect. Unanticipated cracks always arise, even when all of these techniques are used together. Still, they’re like deadbolt locks, car alarms, and other security measures: tools that provide enough protection to encourage the bad guys to go elsewhere.
Online privacy technique No. 1: Cookie management
The search engines and advertising companies that track our moves online argue they have our best interests at heart. While not boring us with the wrong ads may be a noble goal, that doesn’t mean the relentless tracking of our online activities won’t be used for the wrong reasons by insiders or websites with less esteemed ideals.
The standard mechanism for online tracking is to store cookies in your browser. Every time you return to a website, your browser silently sends the cookies back to the server, which then links you with your previous visits. These little bits of personalized information stick around for a long time unless you program your browser to delete them.
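The round trip described above can be sketched with Python's standard `http.cookies` module. The cookie name `visitor_id`, its value, and the one-year lifetime are made up for illustration; real trackers pick their own names and formats.

```python
from http.cookies import SimpleCookie

# Header a (hypothetical) server sends on the first visit.
set_cookie_header = "visitor_id=abc123; Max-Age=31536000; Path=/"

# The browser parses the header and stores the cookie.
jar = SimpleCookie()
jar.load(set_cookie_header)


def cookie_header(stored: SimpleCookie) -> str:
    """Build the Cookie header a browser sends back on later requests."""
    return "; ".join(f"{name}={morsel.value}" for name, morsel in stored.items())


# On every later visit the same identifier goes back to the server,
# which is what lets the site link this visit to the earlier ones.
returned = cookie_header(jar)

# How long the cookie sticks around unless something deletes it.
lifetime_days = int(jar["visitor_id"]["max-age"]) // 86400
```

Deleting the cookie amounts to emptying the jar: the next request carries no identifier, and the server has to issue a fresh one.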
Most browsers have adequate tools for paging through cookies, reading their values, and deleting specific cookies. Cleaning these out from time to time can be helpful, although the ad companies have grown quite good at putting out new cookies and linking the new results with the old. Close ‘n Forget, a Firefox extension, deletes all cookies when you close the tab associated with a site.
Standard cookies are just the beginning. Some ad companies have worked hard on burrowing deeper into the operating system. The Firefox extension BetterPrivacy, for example, will nab the “supercookies” stored by the Flash plug-in. The standard browser interface doesn’t know that these supercookies are there, and you can delete them only with an extension like this or by working directly with the Flash plug-in.
There are still other tricks for sticking information in a local computer. Ghostery, another Firefox extension, watches the data coming from a website, flags some of the most common techniques (like installing single-pixel images), and lets you reverse the effects.
Online privacy technique No. 2: Tor
One of the simplest ways to track your machine is through your IP address, the number the Internet uses like a phone number so that your requests for data can find their way back to your machine. IP addresses can change on some systems, but they’re often fairly static, allowing malware to track your usage.
One well-known tool for avoiding this type of tracking is called Tor, an acronym for “The Onion Router.” The project, which grew out of onion routing research at the U.S. Naval Research Laboratory, creates a self-healing, encrypted supernetwork on top of the Internet. When your machine starts up a connection, the Tor network plots a path through N different intermediate nodes in the Tor subnet. Your requests for Web pages follow this path through the N nodes. The requests are encrypted N times, and each node along the path strips off one layer of encryption, like peeling an onion, with each hop through the network.
The last machine in the path then submits your request as if it were its own. When the answer comes back, the last machine acting as a proxy encrypts the Web page N times and sends it back through the same path to you. Each machine in the chain only knows the node before it and the node after it. Everything else is an encrypted mystery. This mystery protects you and the machine at the other end. You don’t know the machine and the machine doesn’t know you, but everyone along the chain just trusts the Tor network.
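The layering idea can be illustrated with a toy sketch. This is not Tor's actual cryptography: the XOR "cipher" below is a stand-in built from SHA-256, the three node keys are hypothetical, and real onion routing negotiates keys with each node and uses vetted ciphers. XOR layers also happen to commute, so this toy does not enforce the order of the hops the way a real design does.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer (XOR with the keystream undoes itself)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(request: bytes, node_keys: list) -> bytes:
    """Client side: encrypt N times, the last node's layer going on first."""
    for key in reversed(node_keys):
        request = xor_layer(key, request)
    return request


def relay(onion: bytes, node_keys: list) -> list:
    """Each node, in path order, strips its own layer and forwards the rest."""
    forwarded = []
    for key in node_keys:
        onion = xor_layer(key, onion)
        forwarded.append(onion)
    return forwarded


# N = 3 hypothetical nodes, each sharing one key with the client.
node_keys = [b"node-1-key", b"node-2-key", b"node-3-key"]
request = b"GET /index.html"
hops = relay(wrap(request, node_keys), node_keys)
```

Only the final entry in `hops` equals the original request; what the first and second nodes forward is still scrambled, which is why they learn nothing about the content.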
While the machine acting as your proxy at the other end of the path may not know you, it could still track the actions of the user. It may not know who you are, but it will know what data you’re sending out onto the Web. Your requests for Web pages are completely decrypted by the time they get to the other end of the path because the final machine in the chain must be able to act as your proxy. Each of the N layers was stripped away until they’re all gone. Your requests and the answers they bring are easy to read as they come by. For this reason, you might consider adding more encryption if you’re using Tor to access personal information like email.
There are a number of ways to use Tor that range in complexity from compiling the code yourself to downloading a tool. One popular option is downloading the Torbutton Bundle, a modified version of Firefox with a plug-in that makes it possible to turn Tor on or off while using the browser; with it, using Tor is as simple as browsing the Web. If you need to access the Internet independently from Firefox, you may be able to get the proxy to work on its own.
Online privacy technique No. 3: SSL
One of the easiest mechanisms for protecting your content is the encrypted SSL connection. If you’re interacting with a website with the prefix “https,” the information you’re exchanging is probably being encrypted with sophisticated algorithms. Many of the better email providers like Gmail will now encourage you to use an HTTPS connection for your privacy by switching your browser over to the more secure level if at all possible.
An SSL connection, if set up correctly, scrambles the data you post to a website and the data you get back. If you’re reading or sending email, the SSL connection will hide your bits from prying eyes hiding in any of the computers or routers between you and the website. If you’re going through a public Wi-Fi site, it makes sense to use SSL to stop the site or anyone using it from reading the bits you’re sending back and forth.
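What “set up correctly” means can be sketched with Python's standard `ssl` module, which despite its name negotiates TLS, the successor to SSL. The helper names `make_context` and `tls_version` are mine, and the timeout is an arbitrary choice. The important part is that the default context checks the server's certificate and hostname before any data is exchanged.

```python
import socket
import ssl
from typing import Optional


def make_context() -> ssl.SSLContext:
    """Default client context: verifies the certificate chain and the hostname."""
    return ssl.create_default_context()


def tls_version(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Open a verified encrypted connection and report the negotiated protocol."""
    context = make_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # The handshake happens here; a bad certificate raises an ssl.SSLError
        # instead of silently falling back to an unprotected connection.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling `tls_version("example.com")` needs network access, so it is left out of the sketch. Turning off the checks in the context would still encrypt the traffic, but you would no longer know who is on the other end.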
SSL protects the information only as it travels between your computer and the distant website; it doesn’t control what the website does with it. If you’re reading your email with your Web browser, the SSL encryption will keep any router between your computer and the email website from reading it, but it won’t stop anyone with access to the mail at the destination from reading it after it arrives. That’s how your free Web email service can read your email to tailor the ads you’ll see while protecting it from anyone else. The Web email service sees your email in the clear.
There are a number of complicated techniques for subverting SSL connections, such as poisoning the certificate authentication process, but most of them are beyond the average eavesdropper. If you’re using a local coffee shop’s Wi-Fi, SSL will probably stop the guy in the back room from reading what you’re doing, but it may not block the most determined attacker.
Online privacy technique No. 4: Encrypted messages
While Tor will hide your IP address and SSL will protect your bits from the prying eyes of network bots, only encrypted mail can protect your message until it arrives. The encryption algorithm scrambles the message, and it’s bundled as a string of what looks like random characters. This package travels directly to the recipient, who should be the only one who has the password for decrypting it.
Encryption software is more complicated to use and far less straightforward than SSL. Both sides must be running compatible software, and both must be ready to create the right keys and share them. The technology is not too complicated, but it requires much more active work.
There’s also a wide range in quality of encryption packages. Some are simpler to use, which often makes for more weaknesses, and only the best can resist a more determined adversary. Unfortunately, cryptography is a rapidly evolving discipline that requires a deep knowledge of mathematics. Understanding the domain and making a decision about security can require a doctorate and years of experience. Despite the problems and limitations, even the worst programs are often strong enough to resist the average eavesdropper — like someone abusing the system admin’s power to read email.
Online privacy technique No. 5: Translucent databases
The typical website or database is a one-stop target for information thieves because all the information is stored in the clear. The traditional solution is to use strong passwords to create a wall or fortress around this data, but once anyone gets past the wall, the data is easy to access.
Another technique is to only store encrypted data and ensure all the encryption is done at the client before it is shipped across the Internet. Sites like these can often provide most of the same services as traditional websites or databases while offering much better guarantees against information leakage.
A number of techniques for applying this solution are described in my book “Translucent Databases.” Many databases offer other encryption tools that can provide some or all of the benefits, and it’s easy to add other encryption to the Web clients.
In the best examples, the encryption is used to obscure only the sensitive data, leaving the rest in the clear. This makes it possible to use the nonpersonal information for statistical analysis and data-mining algorithms.
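Here is a minimal sketch of the idea, under some loud assumptions. It scrambles the identifying fields with a one-way key-derivation function (`hashlib.pbkdf2_hmac`) rather than reversible encryption, which is only one possible way to do it. The field names, the salt, the iteration count, and the sample rows are all hypothetical.

```python
import hashlib
from statistics import mean


def opaque_id(name: str, passphrase: str, salt: bytes = b"site-wide-salt") -> str:
    """Client side: turn identifying fields into a token the server cannot reverse."""
    secret = f"{name}|{passphrase}".encode()
    return hashlib.pbkdf2_hmac("sha256", secret, salt, 100_000).hex()


# What the server stores: no names, only tokens plus nonpersonal fields.
database = [
    {"who": opaque_id("Alice Smith", "correct horse"), "zip3": "212", "purchase": 40.0},
    {"who": opaque_id("Bob Jones", "battery staple"), "zip3": "212", "purchase": 60.0},
]


def lookup(name: str, passphrase: str) -> list:
    """Only someone who knows the name and passphrase can find their own rows."""
    token = opaque_id(name, passphrase)
    return [row for row in database if row["who"] == token]


# The fields left in the clear still support statistics and data mining.
average_purchase = mean(row["purchase"] for row in database)
```

A thief who copies this table gets purchase totals and partial ZIP codes but no names, while Alice can still retrieve her own record by recomputing her token.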
Online privacy technique No. 6: Steganography
One of the most elusive and beguiling techniques is steganography, a term generally applied to the process of hiding a message so that it can’t be found. Traditional encryption locks the data in a safe; steganography makes the safe disappear. To be more accurate, it disguises the safe to look like something innocuous, such as a houseplant or a cat.
The most common solutions involve changing some small part of the file in a way it won’t be noticed. A single bit of a message, for instance, can be hidden in a single pixel by arranging the parity of the red and green components. If they’re both even or both odd, then the pixel carries the message of 0. If one is even and one is odd, then it’s a 1. To be more concrete, imagine a pixel with red, green and blue values of 128, 129, and 255. The red value is even, but the green value is odd, meaning the pixel is carrying the message of 1.
A short, one-bit message can be hidden by taking a file, agreeing upon a pixel, and making a small change in either the red or green value so that the pixel carries the right message. A one-bit change will be tiny and almost certainly not visible to the human eye, but a computer algorithm looking in the right place will be able to find it.
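The parity rule for a single pixel fits in a few lines. This is a sketch of the scheme described above; the function names are mine, and the choice to nudge green rather than red is arbitrary.

```python
def pixel_bit(pixel: tuple) -> int:
    """Read the hidden bit: 0 if red and green share parity, 1 if they differ."""
    red, green, _blue = pixel
    return (red + green) % 2


def set_pixel_bit(pixel: tuple, bit: int) -> tuple:
    """Return a pixel carrying `bit`, changing green by at most one unit."""
    red, green, blue = pixel
    if pixel_bit(pixel) == bit:
        return pixel  # already carries the right message
    # Step green by one, moving down instead of up at the top of the 0..255 range.
    green = green - 1 if green == 255 else green + 1
    return (red, green, blue)


example = (128, 129, 255)            # the pixel from the text
carried = pixel_bit(example)         # even red, odd green
flipped = set_pixel_bit(example, 0)  # green moves from 129 to 130
```

Both sides only need to agree on which pixel to look at; the blue value never changes.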
Paul Revere needed to send only one bit, but you may need to send more. If this technique is repeated long enough, any amount of data can be hidden. An image with 12 megapixels can store a message with 12Mb, or 1.5MB, without changing any pixel by more than one unit of red or green. Judicious use of compression can improve this dramatically. A large message like this article can be snuck into the corners of an average photo floating around the Internet.
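Repeating the trick pixel by pixel gives the capacity arithmetic above: one bit per pixel, eight pixels per byte. The sketch below assumes the image is simply a list of (red, green, blue) tuples and that the receiver already knows the message length; a real tool would need to read an image file and store the length somewhere.

```python
def capacity_bytes(pixel_count: int) -> int:
    """One hidden bit per pixel, eight bits per byte."""
    return pixel_count // 8


def hide(pixels: list, message: bytes) -> list:
    """Embed the message bits, most significant bit first, one per pixel."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this image")
    out = list(pixels)
    for i, bit in enumerate(bits):
        red, green, blue = out[i]
        if (red + green) % 2 != bit:
            # Change green by one unit, staying inside 0..255.
            green = green - 1 if green == 255 else green + 1
        out[i] = (red, green, blue)
    return out


def reveal(pixels: list, length: int) -> bytes:
    """Read back `length` bytes from the red/green parity of the first pixels."""
    bits = [(red + green) % 2 for red, green, _blue in pixels[: length * 8]]
    return bytes(
        sum(bit << (7 - k) for k, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


twelve_megapixels = capacity_bytes(12_000_000)  # 1,500,000 bytes, i.e. 1.5MB
cover = [(200, 100, 50)] * 16                   # a tiny stand-in "image"
stego = hide(cover, b"Hi")
recovered = reveal(stego, 2)
```

No channel in `stego` differs from `cover` by more than one unit, which is what keeps the change invisible to a casual viewer.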
Tweaking pixels is just one of the ways that messages can be inserted in different locations. There are dozens of methods to apply this approach — for example, replacing words with synonyms or artfully inserting slight typographical mistakes into an article. Is that a misspelling or a secret message? All rely on inserting small, unnoticeable changes.
Steganography is not perfect or guaranteed to avoid detection. While the subtle changes to values like the red and green component may not be visible to the naked eye, clever algorithms can sometimes find the message. A number of statistical approaches can flag files with hidden messages by looking for patterns left behind by sloppy changes. The glare off of glass or chrome in a picture is usually stuffed with pixels filled with the maximum amount of red, green, and blue. If a significant number of these are just one unit less than the maximum, there’s a good chance that a steganographic algorithm made changes.
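The glare test can be sketched as a simple count. This is a rough heuristic written for illustration, not a real steganalysis tool; the function name is mine, and what ratio should count as suspicious is left open because it depends on the camera and the image.

```python
def near_max_ratio(pixels: list, top: int = 255) -> float:
    """Among red/green values at `top` or `top - 1`, the fraction sitting at `top - 1`."""
    at_max = just_below = 0
    for red, green, _blue in pixels:
        for value in (red, green):
            if value == top:
                at_max += 1
            elif value == top - 1:
                just_below += 1
    total = at_max + just_below
    return just_below / total if total else 0.0


# Untouched glare: every channel saturated.
glare = [(255, 255, 255)] * 100
# The same patch after half the pixels had green nudged down by one unit.
tampered = [(255, 254, 255)] * 50 + [(255, 255, 255)] * 50

clean_ratio = near_max_ratio(glare)
tampered_ratio = near_max_ratio(tampered)
```

A saturated highlight that suddenly shows many values one step below the maximum is the kind of pattern a statistical detector looks for.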
These detection algorithms also have limits, and there are a number of sophisticated approaches for making the hidden messages harder to find. The scientists working on detection are playing a cat-and-mouse game with the scientists looking for better ways to hide the data.
For anyone seeking more on this, my book “Disappearing Cryptography” explores various solutions in depth, and my iPad App How to Hide Online provides interactive illustrations for trying the algorithms.