How to stay anonymous on the internet?

When you take an onion in your hand and remove one layer from it, there will be another layer underneath, and another layer underneath. Granted, each successive layer of a real onion gets smaller and smaller, but the onion we’ll talk about in this episode is different. Each successive layer looks identical and is the same size, and in the middle, under all those layers that only a special key will remove, is a message. How many layers do I have to take off to read the message? Only the author of the message knows.

Onion routing was first implemented in the 1990s at the United States Naval Research Laboratory. Its authors, Paul Syverson, Michael G. Reed, and David Goldschlag, aimed to develop a network protocol that would provide strong protection for U.S. intelligence communications. The project was further developed by the Defense Advanced Research Projects Agency (DARPA for short), and in 1998 the protocol was patented by the US Navy. In 2002, computer scientists Roger Dingledine and Nick Mathewson joined Paul Syverson and built on the existing technology to create the best-known implementation of onion routing, first called The Onion Routing project and then Tor. Some time later, the Naval Research Laboratory released the Tor source code under a free license, and in 2006 Dingledine and Mathewson, along with five others, founded the non-profit organization The Tor Project. Today, Tor (short for The Onion Router) is available for free together with its source code. There is also the Tor Browser, which uses the Tor protocol and provides some additional functionality to make it even more difficult to track a user on the Internet.

Tor is not the only onion routing protocol, but it is the most popular. In this episode of “IT In Simple Words” I will limit myself to Tor. The topic is as interesting for technical reasons as it is controversial because of how Tor can be used. It is a powerful tool, which in the wrong hands can do some damage. I will return to this topic in the second part of this episode.

Let’s assume that I want to connect to google.com. Normally, this communication will pass through a number of devices, and both sides will know quite a bit about each other. Google will know who my internet provider is, and my internet provider will know that I use Google services. My ISP is unlikely to know how I use those services. The whole process of connecting to google.com will be fast, will take the shortest route and will be secured with encryption thanks to the certificate I get from google.com. However, if I decide to use the Tor protocol, things will be different. My ISP will not know that I’m using google.com, and Google will not know who I am or who my ISP is, unless I explicitly sign in to their service. Additionally, even if somewhere along the path between my laptop and google.com there is some nosy administrator or hacker, all they will know is that someone is using Tor on that particular connection. Sound unbelievable? Let’s explain the mechanism behind it.

Tor is not only the protocol itself, but also an extensive community. Anyone can become part of this community and share their connection. And this is exactly what happens. Users from all over the world set aside a piece of their resources to expand the Tor network. In this way, their computers become the so-called “nodes” of the network. Nodes scattered around the world are a major part of Tor’s strength, because the path a connection takes through these nodes can change from moment to moment, making it very difficult to track down users of the network.
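To make the idea of an ever-changing path a bit more concrete, here is a minimal sketch in Python. The relay names and the list itself are made up for illustration, and the uniform random choice is my simplification: the real Tor client applies additional rules when it picks nodes, for example it takes relay bandwidth into account.

```python
import random

# Hypothetical relay list: names and countries are invented for illustration.
RELAYS = [
    ("relay-1", "Italy"),
    ("relay-2", "USA"),
    ("relay-3", "South Africa"),
    ("relay-4", "Germany"),
    ("relay-5", "Canada"),
    ("relay-6", "Poland"),
]


def pick_path(relays, hops=3):
    """Return `hops` distinct relays in the order traffic would pass them.

    Simplification: every relay is equally likely to be chosen.
    """
    return random.sample(relays, hops)


first_path = pick_path(RELAYS)
second_path = pick_path(RELAYS)

print("first connection: ", [country for _, country in first_path])
print("second connection:", [country for _, country in second_path])
```

Run it a few times and the two lines will usually differ, which is the whole point: nobody watching one path can count on the next connection taking the same one.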

Sometimes, when I try to connect to google.com, the connection will go through Italy, the USA and South Africa before being directed to Google’s server room. However, I may decide to create a new connection, and this time it will be relayed by nodes in, say, Germany, Canada and the USA before finally reaching google.com. Such a connection through additional nodes is called a “circuit” or “chain” in Tor terminology.

The nodes in a circuit don’t work like regular routers that just forward packets to the next address. When I make a new connection to the Tor network, cryptographic keys are automatically generated, and I use them for encryption. By default Tor uses 3 intermediate nodes, so let’s assume that’s also the case for me. Since I know that 3 intermediate nodes will be involved in the connection, I use three keys and encrypt my message to google.com three times, one key after another. I send my message to the first node. This node knows only one of my keys, so it can remove only one layer of encryption. In the jargon, we say that it peels off the first layer of ciphertext. It looks at the result and finds that it cannot do anything with it because it is still encrypted. However, it can route it to the next node. The next node performs the same operation. Since it knows the key to the second layer, it removes that layer and again sees that the message is of no use to it, because it is still encrypted. So it passes the message on to the next node. The third node receives the message and decrypts it with the third key. This time the message is in the clear and reads “connect me to google.com”. The node makes the connection, and when it receives a reply, it encrypts it with its key and forwards it to the Tor node from which it received the message earlier. That node encrypts the message with its own key, and the whole process simply happens in reverse. Eventually, when I get the reply encrypted with three keys, I will be able to read it, because only I have all of them.

Throughout this process, each node knows only part of the circuit. The first node knows who I am and where it should pass my message, but it has no idea what is in the message. Even after it decrypts it with the key it has, there are still two layers of ciphertext left. The last node, the one that routes the connection to google.com, knows what page is being visited, but it doesn’t know by whom. It only knows from which node it received the request. Because of this multiple encryption and the creation of circuits that sometimes span the whole world, tracking down a particular user on the Internet is an extremely difficult task.

One could, of course, try to eavesdrop on the traffic leaving my laptop and on the traffic between the last node and google.com. This would be a daunting task, since nodes are selected at random, but it is theoretically possible to correlate the two connections and link them together. However, you have to remember that Tor nodes don’t serve only us. There’s a lot of network traffic between them, and all anyone can see is encrypted messages. It’s hard to tell which packet came from us because it gets lost in a crowd of millions of packets per second. In addition, an observer never knows what stage a message is at. It may have only one layer of encryption left, or it may have more than a dozen, if the user so wishes. In practice, tracking down someone using Tor is possible, but extremely difficult. All this anonymity is not entirely free, however. Because connections are routed through multiple random Tor nodes, they are much slower, and the speed decreases as the number of nodes in the circuit increases.
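For those who prefer to see the layers rather than imagine them, the three-key encryption described above can be sketched in a few lines of Python. Please treat it as a toy: the cipher here is a simple XOR with a keystream derived from SHA-256, which I chose only so the example runs with the standard library. It is not the cryptography Tor actually uses, and the function names and keys are my own assumptions.

```python
import hashlib
import os


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter. NOT Tor's real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer. XOR is its own inverse, so one function does both."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message: bytes, keys: list) -> bytes:
    """Client side: add the layer for the last node first, for the first node last."""
    data = message
    for key in reversed(keys):
        data = xor_layer(key, data)
    return data


# One hypothetical key per node in a three-node circuit.
keys = [os.urandom(32) for _ in range(3)]

request = b"connect me to google.com"
onion = wrap(request, keys)

# Each node peels off exactly one layer with the only key it knows.
after_first = xor_layer(keys[0], onion)          # still two layers left
after_second = xor_layer(keys[1], after_first)   # still one layer left
after_third = xor_layer(keys[2], after_second)   # readable by the last node

# The reply travels back: the last node adds its layer first, the first node last.
reply = b"hello from google.com"
returned = reply
for key in reversed(keys):
    returned = xor_layer(key, returned)

# Only the client holds all three keys, so only the client can read the reply.
client_view = returned
for key in keys:
    client_view = xor_layer(key, client_view)

print(after_first == request, after_second == request, after_third == request)
print(client_view)
```

The first two nodes end up with bytes that mean nothing to them, and only the third sees the request in the clear. In this toy version the order of the layers does not actually matter, because XOR layers commute. A real onion-routing design also tells each node where to send the message next, which I have left out to keep the sketch short.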

However, there is also a dark side of onion routing that is hard to ignore. The Tor protocol provides anonymity not only for ordinary users, but also for servers. Since everyone on the Tor network connects via several intermediary nodes, they remain unknown. A server can use a similar mechanism: just as I connected to google.com in a way that Google doesn’t know who I am, Tor can also let me connect to a server that neither I nor Google will be able to locate. What’s more, even when I connect to this server, it won’t know who I am. This is what is commonly called the Dark Web. The proper name is different, however. A service that is available only within the Tor network should simply be called a “hidden service”. Hidden services are where illegal content is hosted, illegal goods are traded, and so on. However, using hidden services is not in itself illegal. Anyone can decide to host their own HTTP server as a hidden service.

The process of connecting to a hidden service is quite complicated, and its details are far beyond the scope of this podcast. So let’s limit ourselves to the two most relevant facts about such a connection. The first is that in order to use a hidden service, we need a specific address that we type into the URL bar. Just as we usually type in addresses with the suffix “com”, “pl” or another, hidden services use addresses with the suffix “onion”. Such addresses look different from normal addresses available through DNS, because they are strings of characters derived from the public keys of hidden services. If you enter an address with the suffix “onion” into the URL bar of a normal browser, you will get an error, because DNS does not know this top-level domain. Only the Tor Browser will be able to make a valid connection. Such addresses are not published anywhere and are also not available in search engines. The second fact is that once the circuit is set up, there are at least 6 Tor nodes between us and the server. This is due to the implementation of the protocol. Both the client and the server hide behind at least three nodes, each responsible for a successive layer of encryption. In a connection to a hidden service the two sets of nodes add up, so that neither the server nor the client knows anything about the other side.
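To show what “derived from the public key” can look like in practice, here is a small Python sketch. It assumes the layout of the current, version 3 onion addresses as I understand it from Tor’s specification: the 32-byte public key, a 2-byte checksum and a version byte, all encoded in base32. The random bytes below only stand in for a real Ed25519 public key, so the resulting address does not belong to any actual service.

```python
import base64
import hashlib
import os


def onion_address(public_key: bytes) -> str:
    """Build a version-3-style .onion address from a 32-byte public key.

    Assumed layout: base32(public_key + checksum + version) + ".onion",
    where checksum is the first 2 bytes of
    SHA3-256(".onion checksum" + public_key + version).
    """
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte public key")
    version = b"\x03"
    checksum = hashlib.sha3_256(b".onion checksum" + public_key + version).digest()[:2]
    encoded = base64.b32encode(public_key + checksum + version).decode("ascii")
    return encoded.lower() + ".onion"


# Stand-in for a real key: random bytes, for illustration only.
fake_public_key = os.urandom(32)
address = onion_address(fake_public_key)
print(address)
```

The result is a 56-character string followed by “.onion”, which is why these addresses look like random noise compared to something like google.com. The same key always gives the same address, so the address itself vouches for the key of the service behind it.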

As you can see, onion routing is quite controversial. On the one hand, this technology helps to protect the privacy of Internet users, but on the other hand, it leaves a lot of room for abuse. Every once in a while we hear media reports about the CIA shutting down some site where drugs, weapons or other illegal goods were being traded. Almost every time, this causes a stir and revives the debate about whether Tor should be outlawed. Everyone is entitled to their own opinion on the subject, but I will quote the words of one of Tor’s creators. In 2017, the aforementioned Roger Dingledine spoke at a conference in Berlin. The theme of the conference was “Will Freedom Survive the Digital Age”. Among other things, Roger talked about Tor’s hidden services, and I’ll paraphrase an excerpt from his talk: “We recently tried to verify what percentage of traffic through Tor is actually related to hidden services. It turned out to be 2-3%, which means about 97-98% of users use Tor to visit regular sites like Twitter, Google, Facebook and only a few percent visit hidden services. So the next time you see a drawing of an iceberg in a BBC article where they scare you that you only know 4% of the Internet and the other 96% is the Dark web, think about what the purpose of that was.”