The Application Layer

Overview

The first four layers we discussed are really the 'core' of the Internet. However, when most people think of the Internet, ARP and TCP congestion control aren't the first things that come to mind. Instead, they usually think of things like browsing the Web, sending emails, instant messaging, and sharing files. This is because the layered architecture of the Internet is doing its job: people only think about the top layer and don't worry about what goes on below. However, now that you understand how the infrastructure of the Internet works, you're in a much better place to understand the essence of some of its most popular applications.

The Hypertext Transfer Protocol

The World Wide Web is perhaps the most popular Internet application today. However, it is remarkably simple. A web page is just a document written in the Hypertext Markup Language (HTML). People view the page using a program called a Web Browser, which interprets the HTML document to create a layout of text and images. The 'hyper' part of the name comes from the fact that HTML documents can refer to other documents using Links, or references to other files on the Web.

To retrieve these HTML documents and the files they refer to, a web browser uses the Hypertext Transfer Protocol (HTTP). HTTP is a simple protocol for communication between a web browser and a Web Server, a program on a remote computer designed to distribute web pages. The browser opens a TCP connection to the server, and then transmits a request that begins with the word "GET" followed by the name of the file it wants. If the file exists, the web server sends the bytes of the file back through the TCP connection to the browser. In this way, a browser can first load the main HTML document, then the images and other files it references, and finally display the page.
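The exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not how a real browser is built: the helper names build_get_request and fetch are hypothetical, and the sketch assumes plain HTTP/1.1 on port 80, with the Host and Connection headers added because HTTP/1.1 servers expect them.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",  # the word GET, then the name of the file wanted
        f"Host: {host}",         # required by HTTP/1.1
        "Connection: close",     # ask the server to close the connection when done
        "",                      # blank line marks the end of the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Open a TCP connection, send the GET request, and read the whole reply."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch with a host name and a path returns the raw bytes the server sent back: a status line and headers, followed by the contents of the file. A browser would repeat this for each image or other file that the HTML document references.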

The Domain Name System

When a browser or another program wants to connect to another computer on the Internet, it usually doesn't know the computer's IP address from the start. This is largely because programs rely on people to tell them where to connect. People have trouble remembering numbers, and would much prefer to remember Domain Names like www.google.com. So how does www.google.com get translated into the IP address of Google's web server? This is accomplished through the Domain Name System (DNS). DNS is a distributed network of servers that keeps track of which IP addresses correspond to which names. The top of its naming hierarchy is coordinated by ICANN, while the servers themselves are run by many different organizations. All a computer needs to remember is the IP address of a DNS server, and it can then ask that server for the IP address corresponding to any domain name.
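From a program's point of view, this lookup is usually a single call into the operating system, which in turn asks the configured DNS server. The sketch below uses Python's standard socket.getaddrinfo; the wrapper name resolve is a hypothetical helper introduced only for illustration.

```python
import socket
from typing import List


def resolve(name: str) -> List[str]:
    """Return the IP addresses the system resolver reports for a name."""
    # getaddrinfo hands the name to the operating system's resolver, which
    # queries DNS when needed. Restricting to TCP avoids duplicate entries
    # for each socket type.
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element of the socket address is the IP string
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Passing a domain name such as www.google.com returns whatever addresses the resolver reports at that moment, which may include both IPv4 and IPv6 addresses and can differ between networks. Passing a string that is already an IP address simply returns it unchanged.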

Peer to Peer Protocols

An important class of Internet applications that has received much attention in recent years is known as Peer to Peer (P2P) applications. When one computer wants to send a large file to another computer over the Internet, it can just open a TCP connection and send the file. However, if a computer needs to send a file to a thousand computers on the Internet, that computer quickly becomes overloaded. Traditionally, companies that needed to distribute a lot of data would purchase a large number of servers and a high-capacity connection to the Internet. However, in recent years researchers and companies realized that they could significantly lessen the load on the distributor by having the receivers send data to each other.

In the popular BitTorrent Protocol, files are broken into chunks that the computers downloading the file can share among themselves. If 100 computers are downloading a file with 100 chunks from a single source, the source can send one chunk to each computer and task each of them with distributing that chunk to the other 99 downloaders. In this way, the distribution of large files becomes possible without a massive server farm or expensive Internet connection.
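The savings in the 100-computer scenario can be checked with a small simulation. This is a simplified model of the example in the text only, not of the real BitTorrent protocol, which also involves peer discovery and more careful choices about which chunks to exchange. The function names are hypothetical, and the model assumes every transfer succeeds.

```python
def direct_upload_cost(num_peers: int, num_chunks: int) -> int:
    """Chunks the source must upload if it sends the whole file to every peer."""
    return num_peers * num_chunks


def swarm_distribution(num_peers: int, num_chunks: int):
    """Simulate the scenario in the text: seed one chunk per peer, then share."""
    holdings = [set() for _ in range(num_peers)]  # chunks each peer has
    peer_uploads = [0] * num_peers                # chunks each peer sent out
    source_uploads = 0

    # Phase 1: the source sends each chunk to exactly one peer.
    for chunk in range(num_chunks):
        owner = chunk % num_peers
        holdings[owner].add(chunk)
        source_uploads += 1

    # Phase 2: the peer that received a chunk forwards it to every other peer.
    for chunk in range(num_chunks):
        owner = chunk % num_peers
        for other in range(num_peers):
            if other != owner:
                holdings[other].add(chunk)
                peer_uploads[owner] += 1

    return source_uploads, peer_uploads, holdings


source_uploads, peer_uploads, holdings = swarm_distribution(100, 100)
print(direct_upload_cost(100, 100))  # source uploads when sending everything itself
print(source_uploads)                # source uploads when peers share chunks
print(max(peer_uploads))             # heaviest load on any single peer
```

With 100 peers and 100 chunks, the source uploads 10,000 chunks if it serves everyone directly, but only 100 when the downloaders share. Each downloader uploads 99 chunks, so the total work is spread evenly instead of falling on one machine.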