How does the internet work?
As developers we all should have a more or less clear idea of how the internet works, so here is my attempt to explore this topic and try to explain it in a simple yet accurate way.
The internet originated in the 1960s as an experiment by the US Department of Defense to design a communication system that might survive a nuclear attack. The original idea was a node-based system that exchanged information in the form of small pieces or “packets”. Each packet could travel along any available path between nodes, so the network could keep working even if some nodes or links were lost.
This multiplicity of connections, together with the fact that every node can reach any other node in the network, underpins two of the most important principles of the internet: redundancy and fault tolerance. The design also scales easily, since new infrastructure can be built and connected without interrupting communication.
The internet is an incredibly large number of independently operated, interconnected networks. The system is fully distributed, in the sense that there’s no central authority that decides where packets are routed or how networks are built. Those are business decisions made by network operators. Operators are motivated to ensure that their network can reach all the other networks, because the purpose of the internet is for each device and network to be able to connect with any other device and network around the world.
Devices connect to internet service providers (ISPs), which are in turn connected to other networks around the world, to which billions of other devices are connected.
The internet exchanges binary information in the form of bits, each bit representing one of two possible values (on/off, 1/0). Bits can be thought of as the “atoms” of information. These bits can be sent from a source to a requesting device in different ways.
One way is to use copper cables that carry electrical pulses. Their drawback is that the signal weakens when it travels over long distances. This is where fiber optic cables come in. Fiber optic cables are made of thin strands of glass that carry pulses of light, which stays inside the strand because it is reflected internally. Light signals in fiber weaken far less over distance than electrical signals in copper and can carry much more data, which makes fiber a good option for transporting data across huge distances, like across an ocean.
Another way is to use wireless connections, which encode binary information as radio waves that the receiving device then decodes back into bits. Wireless is of course really convenient, but the signals can’t travel long distances, so a wireless device normally talks to a nearby router that is connected to a wired network and links the device to the rest of the network.
Routers are special machines on the internet that act as packet managers and are responsible for the smooth traffic of packets through the network. Routers keep track of the different connections available for packets to travel through, and select the cheapest one for each packet. Having multiple connections for packets to travel through is what makes the internet fault tolerant and reliable.
Other methods are being explored at the moment, like laser links between satellites or radio connections from balloons or drones, but today’s internet still relies heavily on wired connections.
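
To make the idea of a router picking the cheapest connection more concrete, here is a minimal sketch in Python. It assumes a tiny made-up network of four routers (A, B, C, D) with invented link costs, and it uses Dijkstra's shortest-path algorithm as a stand-in for the route selection; real routers rely on dedicated routing protocols and much richer cost metrics. The function name `cheapest_path` is hypothetical.

```python
import heapq


def cheapest_path(links, source, target):
    """Return (total_cost, path) for the cheapest route, or None if unreachable."""
    # Build an adjacency map; every link is treated as bidirectional.
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, source, [source])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical network: router names and link costs are invented.
links = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 2}
before_failure = cheapest_path(links, "A", "D")
print(before_failure)  # the cheapest route goes through B

# Simulate a broken link: packets can still reach D through C.
del links[("B", "D")]
after_failure = cheapest_path(links, "A", "D")
print(after_failure)
```

The second call is the interesting one: the route through B is gone, yet a path still exists, which is the redundancy that makes the network fault tolerant.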
Two important concepts when talking about networks are bandwidth and latency. Bandwidth is the amount of data a connection can carry per unit of time, usually measured in bits per second. Latency is the time it takes one bit to travel from one end of a connection to the other.
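
A rough way to see how the two interact is to estimate how long a transfer takes. The sketch below assumes a deliberately simplified model (one-way latency plus the time needed to push all the bits through the link) and ignores protocol overhead, congestion and retransmissions. The function name and the example numbers are made up for illustration.

```python
def transfer_time_seconds(size_bytes, bandwidth_bps, latency_s):
    """Simplified estimate: one-way latency plus time to push every bit onto the link."""
    size_bits = size_bytes * 8
    return latency_s + size_bits / bandwidth_bps


# A 5 MB file over a 100 Mbit/s link with 20 ms of latency.
large = transfer_time_seconds(5_000_000, 100_000_000, 0.020)
print(f"{large:.3f} s")

# A 1 KB message over the same link: here latency dominates the total.
small = transfer_time_seconds(1_000, 100_000_000, 0.020)
print(f"{small:.5f} s")
```

For the large file most of the time goes into bandwidth, while for the small message almost all of it is latency, which is why both numbers matter.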
This gigantic network works smoothly thanks to a design and architecture philosophy expressed in a set of protocols, which are well-known rules and standards that machines use to communicate with one another. Among the most important of these protocols are IP, TCP, HTTP and DNS.
One of the most important protocols is the Internet Protocol, or IP. Every device connected to the internet has an address that identifies it within the network. This address is called an IP address. Each time a computer sends information to another, it addresses it to the IP address of the destination machine and also includes its own IP address, so that the destination knows where to send its response.
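
The following sketch illustrates the source and destination idea using Python's standard `ipaddress` module. The `Packet` class and the `make_reply` helper are hypothetical and far simpler than a real IP header, and both addresses come from ranges reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Packet:
    """Toy packet: a real IP header carries many more fields."""
    source: ipaddress.IPv4Address
    destination: ipaddress.IPv4Address
    payload: bytes


def make_reply(request, payload):
    # The reply simply swaps the two addresses of the request.
    return Packet(source=request.destination, destination=request.source, payload=payload)


client = ipaddress.ip_address("192.0.2.10")    # documentation-only address
server = ipaddress.ip_address("198.51.100.7")  # documentation-only address

request = Packet(source=client, destination=server, payload=b"ping")
reply = make_reply(request, b"pong")
print(reply.destination)  # the reply goes back to the client's address

# An IPv4 address is just 32 bits; the dotted form is for humans.
client_bits = f"{int(client):032b}"
print(client_bits)
```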
When the amount of information to be sent is large, it is divided into pieces or “packets”. Packets don’t necessarily travel along the same “roads” or network connections, and might arrive at the destination at different times or out of order. The receiving machine is responsible for putting the received packets back in order to rebuild the original piece of information.
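
As an illustration, splitting and reassembling can be sketched in a few lines of Python. This is a toy model: the function names are made up, each packet is just a `(sequence number, chunk)` pair, and the shuffle stands in for packets taking different routes.

```python
import random


def split_into_packets(data, size):
    """Cut data into numbered chunks of at most `size` bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def reassemble(packets):
    """Sort by sequence number and glue the chunks back together."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"Packets may arrive in any order, but the message survives."
packets = split_into_packets(message, 8)

random.shuffle(packets)  # simulate out-of-order arrival
rebuilt = reassemble(packets)
print(rebuilt == message)
```

The sequence number is what makes reordering possible: without it, the receiver would have no way to tell which chunk goes where.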
So, imagine you make a request to a website, and for some reason only half the packets of the response reach you. How does your computer know whether the information it received is complete? This is where another important protocol comes in, the Transmission Control Protocol or TCP. Following this protocol, the receiving machine informs the sending machine of the packets it has received. If all the packets of the requested piece of information arrived correctly, the receiving machine acknowledges the delivery. If not, the sending machine resends the missing packets. This ensures that information arrives complete and lets each machine know when something wasn’t sent or received correctly.
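
The acknowledge-and-resend loop can be sketched as a small simulation. This is only a toy model under strong assumptions: real TCP numbers bytes rather than whole packets and relies on timers, cumulative acknowledgements and sliding windows. The names `make_lossy_channel` and `send_reliably` are invented for this example, and the "network" is a function that loses chosen packets exactly once.

```python
def make_lossy_channel(drop_once):
    """Return a deliver() function that loses each listed packet exactly once."""
    already_dropped = set()

    def deliver(seq):
        if seq in drop_once and seq not in already_dropped:
            already_dropped.add(seq)
            return False  # packet lost in transit
        return True       # packet arrived

    return deliver


def send_reliably(packets, deliver, max_rounds=10):
    """Resend until every packet is acknowledged; return (received, rounds)."""
    received = {}
    pending = set(packets)
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        for seq in sorted(pending):
            if deliver(seq):
                received[seq] = packets[seq]
        # The receiver acknowledges what it has; the sender keeps only the rest.
        pending = set(packets) - set(received)
    return received, rounds


packets = {0: b"How ", 1: b"does ", 2: b"the ", 3: b"internet ", 4: b"work?"}
channel = make_lossy_channel(drop_once={1, 3})
received, rounds = send_reliably(packets, channel)

message = b"".join(received[seq] for seq in sorted(received))
print(message, "delivered in", rounds, "rounds")
```

Packets 1 and 3 are lost in the first round, so only those two are sent again in the second round, and the receiver ends up with the complete message.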
Another important protocol that dictates how computers communicate with each other is HTTP, or Hypertext Transfer Protocol. Machines use this protocol to indicate what kind of information they are sending and what kind of response they expect. HTTP defines different types of requests (methods such as GET and POST) and responses (identified by status codes such as 200 or 404), and every message has to declare which type it is.
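
HTTP/1.1 messages are plain text, so their shape is easy to show. The sketch below builds a request by hand and parses a canned response without touching the network. The helper names `build_request` and `parse_response` are hypothetical, the response text is invented, and in real code an HTTP client library would handle all of this.

```python
def build_request(method, host, path="/"):
    """Assemble a minimal HTTP/1.1 request as text."""
    lines = [
        f"{method} {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {host}",
        "Accept: text/html",          # the kind of response we would like
        "Connection: close",
        "",                           # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw response into status code, reason, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, status_code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status_code), reason, headers, body


request_text = build_request("GET", "www.example.com")
print(request_text)

# Invented response, written the way a server might answer.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 14\r\n"
    "\r\n"
    "<h1>Hello</h1>"
)
status, reason, headers, body = parse_response(raw_response)
print(status, reason, headers["content-type"], body)
```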
Another important protocol is the Domain Name System, or DNS. DNS associates names, like www.example.com, with IP addresses. So when we look for a website, we make a request to a DNS server that is in charge of finding the IP address linked to that domain name. Our computer then uses that IP address to request information (the website) from the corresponding machine. To balance the load of requests these servers receive, DNS servers are organized in a distributed hierarchy. The namespace is split into zones, and different servers are responsible for different zones, such as the top-level domains (.com, .net, .org, etc.).
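
The hierarchy can be pictured as a chain of lookups, each one narrowing down who knows the answer. The sketch below is a toy model with in-memory dictionaries: every server name and address in it is invented, and a real resolver also deals with caching, record types and the actual DNS wire protocol.

```python
# All server names and addresses below are invented for illustration.
ROOT = {"com": "tld-com", "org": "tld-org"}           # root: who handles each top-level domain
TLD_SERVERS = {
    "tld-com": {"example.com": "ns-example"},         # .com: who handles each domain
    "tld-org": {},
}
AUTHORITATIVE = {
    "ns-example": {"www.example.com": "192.0.2.10"},  # final answer (documentation-only address)
}


def resolve(name):
    """Walk root -> top-level domain -> authoritative server; return an IP or None."""
    name = name.rstrip(".")
    labels = name.split(".")
    tld = labels[-1]
    domain = ".".join(labels[-2:])

    tld_server = ROOT.get(tld)
    if tld_server is None:
        return None
    authoritative = TLD_SERVERS[tld_server].get(domain)
    if authoritative is None:
        return None
    return AUTHORITATIVE[authoritative].get(name)


print(resolve("www.example.com"))
print(resolve("www.example.org"))  # nobody in this toy hierarchy knows the name
```

In real programs the operating system's resolver does this work for you, for example through `socket.getaddrinfo` in Python.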
We could say that the internet is a system composed of different actors:
- Clients (any machine that requires information from the network)
- Servers (machines that have a special connection to the network and provide information through it)
- ISPs, or internet service providers (companies that maintain and provide network connections to clients and servers)
- Routers (special machines that act as packet managers and decide how packets travel through the network)
Clients and servers exchange binary information through copper or fiber optic cables, or through wireless connections that are maintained by ISPs and managed by routers.
The system works smoothly thanks to the set of protocols used by every machine connected to the network, which are IP, TCP, HTTP and DNS. These protocols are the backbone of the internet design, which ensures its main principles: redundancy, fault tolerance and scalability.