An Introduction To The Geography Of The Internet
While we commonly think of the internet as a 24-7 virtual world where time and place don't matter, the internet is comprised of a vast collection of both physical and social infrastructure that is located in very specific places.
This tutorial introduces some concepts and techniques that can be used to understand the "what is where" of the internet.
What Is The Internet?
A digital data network is a set of digital devices with electronic interconnections that allow them to communicate with each other.
The internet is a global network of networks that was originally developed as a US Defense Department project in the late 1960s to connect large mainframe computers. However, it has grown to provide interconnection of, and access to, digital devices like smartphones, cars, trucks, industrial equipment, infrastructure control centers, remote sensors, and even home appliances (the internet of things).
The internet has become fundamentally important to contemporary life and commerce. Practically all communications and entertainment content now passes through the internet, including e-mail, text messages, telephone calls, and streaming audio/video. In addition, a vast array of non-public data such as financial transactions, scientific information, logistics coordination, etc. passes through the internet, invisibly supporting life in the industrialized world.
Identification on the Internet
A domain name identifies a domain of control on the internet.
The top-level domain (TLD) defines major groups of domain names. .com has traditionally been used for commercial entities, while .org is used for non-profits and .edu for educational institutions. There is now a wide collection of TLDs available (such as .tv, .biz, .xxx, etc.), making it difficult to use a TLD to reliably identify what a domain is used for.
Just to the left of the TLD is the domain, which defines a specific domain within the top-level domain. The TLD and domain used together are a unique identifier.
Within a domain there can be one or more subdomains. These are commonly used to clearly separate services available from a domain, such as mail, the library, web apps, etc.
TLDs can also be Country Code Top-Level Domains that associate domain names with specific countries. In the diagram above, the .bw TLD on gov.bw indicates that this website is controlled by an entity in Botswana, in this case the central government. You can view a list of all Country Code Top-Level Domains at https://icannwiki.org/Country_code_top-level_domain.
Google uses TLDs to distinguish between versions of their search engine targeted at specific countries. For example, google.co.uk focuses on the United Kingdom, while google.co.bw focuses on Botswana.
As with non-geographic TLDs, country code TLDs are not used consistently, and servers are not necessarily located in the country indicated by a TLD.
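The naming scheme described above can be illustrated with a short Python sketch. It simply splits a name on its periods and treats the last label as the TLD and the one before it as the domain. The function name split_domain_name is made up for this illustration, and the two-letter test relies on the convention that two-letter TLDs are reserved for country codes.

```python
def split_domain_name(name):
    """Split a domain name into TLD, domain, and subdomains (naive, label-based)."""
    labels = name.lower().strip(".").split(".")
    tld = labels[-1]
    return {
        "tld": tld,
        "domain": labels[-2] if len(labels) > 1 else None,
        "subdomains": labels[:-2],
        # Two-letter TLDs are reserved for country codes.
        "is_country_code": len(tld) == 2 and tld.isalpha(),
    }


for name in ("gov.bw", "google.co.uk", "michaelminn.com"):
    print(name, split_domain_name(name))
```

Note that this simple model reports co as the domain for google.co.uk. In practice some country code TLDs, such as .uk, register names one level further down, so a purely positional split like this is only a rough guide.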
Domain names are bought and sold. Although the business of registering domain names is handled by a wide variety of private companies like GoDaddy and Verisign, the global registration of domain names is coordinated by the Internet Assigned Numbers Authority (IANA), which is a department of the NGO Internet Corporation for Assigned Names and Numbers (ICANN).
When an individual or business registers a domain name, they are required to provide identifying information for a WhoIs database that is accessible to the public. Companies that register domains usually provide web access to WhoIs information so you can determine if a domain you want to register is already owned.
You can get WhoIs data directly from IANA at http://www.iana.org/whois. For example, the listing for gov.bw shows the owner of the domain as an agency of the Botswanan government, located in Gaborone, the capital of the country:
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object

refer: whois.nic.net.bw

domain: BW

organisation: Botswana Communications Regulatory Authority (BOCRA)
address: Plot 206/207 Independence Avenue
address: Private Bag 00495
address: Gaborone
address: Botswana

contact: administrative
name: Snr Engineer ccTLD
organisation: Botswana Communications Regulatory Authority (BOCRA)
address: Plot 206/207 Independence Avenue
address: Private Bag 00495
address: Gaborone
address: Botswana
phone: +267 368 5557
fax-no: +267 395 7976
e-mail: email@example.com

contact: technical
name: Snr Engineer ccTLD
organisation: Botswana Communications Regulatory Authority (BOCRA)
address: Plot 206/207 Independence Avenue
address: Private Bag 00495
address: Gaborone
address: Botswana
phone: +267 368 5557
fax-no: +267 395 7976
e-mail: firstname.lastname@example.org

nserver: DNS1.NIC.NET.BW 126.96.36.199 2c0f:ff00:1:3:0:0:0:226
nserver: DNS2.NIC.NET.BW 188.8.131.52 2c0f:ff00:1:5:0:0:0:218
nserver: MASTER.BTC.NET.BW 184.108.40.206 2c0f:ff00:0:6:0:0:0:3 2c0f:ff00:0:6:0:0:0:5
nserver: NS-BW.AFRINIC.NET 220.127.116.11 2001:43f8:120:0:0:0:0:72
nserver: PCH.NIC.NET.BW 2001:500:14:6070:ad:0:0:1 18.104.22.168
ds-rdata: 6919 8 2 2fd5bd844725991e9a7708c3b1134b05a3d2ea216d9e239f71caeea35e4cb928
ds-rdata: 6919 8 1 5100f53e64928adc9ef0fbdfca6299dfb7081edf
ds-rdata: 18880 8 1 a948aff07700c9f18ad356c5159b64cb65a0c487
ds-rdata: 18880 8 2 56b561d20ee04927d24d8a7591c58a22a42e0a18202b4deed03caa5b66d4dd42

whois: whois.nic.net.bw

status: ACTIVE
remarks: Registration information: https://registry.nic.net.bw

created: 1993-03-19
changed: 2017-07-06
source: IANA
Note that some domain name owners, notably when the domain exists for personal use, pay their domain registrars extra fees to preserve their privacy and list the registrar as the owner in the WhoIs information list. For example, the website for the late guitarist Chris Mello (chrismello.com) lists the domain owner as the registrar rather than the executor of his estate:
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object

refer: whois.verisign-grs.com

domain: COM

organisation: VeriSign Global Registry Services
address: 12061 Bluemont Way
address: Reston Virginia 20190
address: United States

contact: administrative
name: Registry Customer Service
organisation: VeriSign Global Registry Services
address: 12061 Bluemont Way
address: Reston Virginia 20190
address: United States
phone: +1 703 925-6999
fax-no: +1 703 948 3978
e-mail: email@example.com
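WhoIs listings like the ones above can also be requested and read by a program. The following Python sketch is one possible approach, not the method used by any particular WhoIs web page: query_whois sends a name to a WhoIs server over TCP port 43 (the function names and the default server whois.iana.org are choices made for this illustration), and parse_whois collects the "key: value" lines of a reply. The sample text is a few lines excerpted from the gov.bw listing, so the example runs without a network connection.

```python
import socket


def query_whois(name, server="whois.iana.org", port=43, timeout=10):
    """Send one WhoIs query over TCP and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(name.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'key: value' lines into a dict of lists, skipping % comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# A few lines excerpted from the listing above (no network needed).
SAMPLE = """\
% IANA WHOIS server
refer: whois.nic.net.bw
domain: BW
organisation: Botswana Communications Regulatory Authority (BOCRA)
address: Gaborone
address: Botswana
status: ACTIVE
created: 1993-03-19
"""

record = parse_whois(SAMPLE)
print(record["organisation"][0])
print(record["address"])
```

With a network connection, parse_whois(query_whois("gov.bw")) would apply the same parsing to a live reply, although the exact fields returned depend on the server.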
Uniform Resource Locators
A domain name can be used to create a Uniform Resource Locator (URL) that uniquely identifies a resource (such as a web page) on the internet. For example:
- Scheme: Specifies the web protocol used to access the resource. Almost always http:// (hypertext transfer protocol) or https:// (secure HTTP)
- Domain Name: The domain where the resource is located
- Subdomain: The server or collection of servers in the domain where the resource is located
- Path: Where the resource is located on the server(s)
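The parts listed above can be pulled out of a URL with Python's standard urllib.parse module. This is a minimal sketch: the URL is a made-up example on the reserved example.com domain, the function name url_parts is hypothetical, and the split of the host into subdomain and domain assumes the domain is simply the last two labels of the host name.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Break a URL into the scheme, subdomain, domain name, and path."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "scheme": parts.scheme,
        "subdomain": ".".join(labels[:-2]),  # assumption: everything before the last two labels
        "domain": ".".join(labels[-2:]),     # assumption: domain plus TLD
        "path": parts.path,
    }


print(url_parts("https://www.example.com/maps/index.html"))
```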
Internet Protocol (IP) Addresses
While humans identify resources on the internet using domain names and URLs, the internet itself identifies servers using numeric internet protocol (IP) addresses.
These addresses are commonly written as four decimal numbers (each from 0 to 255, representing one 8-bit byte) separated by periods. For example, this is the IP address of the web server for michaelminn.com:
Domain names are converted to IP addresses using the Domain Name System (DNS), which involves a complex hierarchy of nameservers that keep track of which IP address is associated with which domain.
Large web sites, such as Google, have multiple servers around the world that handle requests, and which one you are actually accessing may vary based on your location in the world.
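A program can ask the operating system to perform the same DNS lookup. The sketch below uses Python's standard socket.getaddrinfo call. The function name lookup_addresses is made up for this illustration, and the result depends on your resolver, your location, and the time of the query.

```python
import socket


def lookup_addresses(name):
    """Return the sorted, de-duplicated IP addresses the local resolver reports for a name."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry ends with a socket address whose first element is the IP address.
    return sorted({info[4][0] for info in infos})


# Example (requires a network connection):
#     print(lookup_addresses("michaelminn.com"))
```

For a large site, running this from different places may return different addresses, which is one way to observe the multiple-server arrangement described above.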
You can identify the IP address associated with a domain name using the ping command, which is often used by technicians to see if a computer is successfully connected to the internet or to see if a web server is running.
On a Windows machine, open the cmd terminal and type: ping <domain_name>. Press Control-C to stop the ping listing:
On a Macintosh computer, open the terminal app and type: ping <domain_name>. Press Control-C to stop the ping listing:
In both examples above, we verify that the server for the michaelminn.com domain has an IP address of 22.214.171.124.
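Because each of the four numbers in an address stands for one byte, the whole address is really a single 32-bit number. The following Python sketch shows the arithmetic, using the private home-router address 192.168.1.1 as an example. The function name dotted_quad_to_int is made up for this illustration, and the standard ipaddress module is used only to cross-check the result.

```python
import ipaddress


def dotted_quad_to_int(address):
    """Pack the four bytes of a dotted address into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a four-byte dotted address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift left one byte, then add the next byte
    return value


addr = "192.168.1.1"
print(dotted_quad_to_int(addr))
print(int(ipaddress.IPv4Address(addr)))  # the standard library gives the same number
```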
IP Address Blocks
Blocks of IP addresses are assigned to ISPs. You can use an IP WhoIs tool like https://www.ultratools.com/tools/ipWhoisLookup to find the ISP associated with an IP address.
For example, searching the IP address of 126.96.36.199 for michaelminn.com, we get this listing indicating the IP address is in a block of addresses allocated to the web hosting provider DreamHost.com, which has offices in Brea, CA, and which controls a block of IP addresses from 188.8.131.52 to 184.108.40.206.
Source: whois.arin.net
IP Address: 220.127.116.11
Name: DREAMHOST-BLK10
Handle: NET-173-236-128-0-1
Registration Date: 3/30/10
Range: 18.104.22.168-22.214.171.124
Org: New Dream Network, LLC
Org Handle: NDN
Address: 417 Associated Rd. PMB #257
City: Brea
State/Province: CA
Postal Code: 92821
Country: UNITED STATES
Note that this does not necessarily indicate where the server is located. ISPs often have multiple data centers, and the registration information is a contact address which will usually be a corporate headquarters rather than the location of the server farm(s).
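Once you know the first and last address of a block, checking whether another address falls inside it is simple arithmetic. This Python sketch uses the standard ipaddress module. The function names are made up for this illustration, and the range shown is a placeholder reserved for documentation (192.0.2.0 to 192.0.2.255), not the DreamHost block from the listing.

```python
import ipaddress


def block_from_range(first, last):
    """Return the network block(s) that exactly cover the inclusive range first..last."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))


def in_range(address, first, last):
    """True if address lies between first and last, inclusive."""
    return (ipaddress.ip_address(first)
            <= ipaddress.ip_address(address)
            <= ipaddress.ip_address(last))


FIRST, LAST = "192.0.2.0", "192.0.2.255"  # placeholder documentation range
print(block_from_range(FIRST, LAST))
print(in_range("192.0.2.77", FIRST, LAST))
print(in_range("198.51.100.5", FIRST, LAST))
```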
Connecting Clients and Servers On The Internet
The Client-Server Model
The internet is largely based on the client-server model where clients communicate through the internet to servers, which then either serve information (such as web pages or streaming video), or serve as an intermediary between clients (such as two cellphones).
- Clients are devices like desktop computers, laptop computers and mobile phones
- Servers are large computers or groups of computers that respond to clients
While a server can be just an ordinary desktop computer sitting under a desk and running a server operating system like Linux or Windows Server, servers are often centralized in vast collections called server farms. The buildings housing server farms are often football-field-sized warehouses that are tightly secured, have backup power systems, and are staffed by small armies of maintenance technicians who ensure reliable service.
Clients and servers connect to the internet through Internet Service Providers (ISPs).
ISPs have their own networks, and data that moves on those networks is directed to appropriate clients and servers using routers that keep track of where specific IP addresses are located on a network.
When a connection is being made between a client and server that are not on the same network, routers also connect local ISP networks to high-speed, high-volume backbone networks that transfer data between local networks.
For example, this is the basic process of how a client (like a desktop computer or smartphone) gets a web page from a web server:
- A physical device is connected to an ISP through a modem, an access point when on a Wi-Fi network, or through a cell tower when using a smartphone and the cellular telephone network
- A person on that client types a URL or clicks on a link in a web browser
- The domain name is separated from the URL
- A message is sent to a domain name server, which returns an IP address to the client (the internet itself only knows numeric IP addresses)
- The client sends a request to that IP address via a router
- Routers pass the request to higher-level routers until a router is found that knows the network that contains the IP address
- The request is routed to the server
- The server responds to the request
- Response packet(s) make their way back to the client through the routers
- The packets of information are received by client, reassembled and provided to the client's application (such as a web browser) for use
- Repeat as necessary
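The steps above can be sketched in Python using only the standard library. This is a simplified illustration rather than how a real web browser works: it handles only plain http:// URLs, the function names build_request and fetch are made up for this example, and the routing between networks happens out of sight inside the operating system and the internet itself.

```python
import socket
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, request_bytes) for a plain-HTTP GET of a URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch handles plain http:// URLs only")
    host = parts.hostname  # the domain name, separated from the URL
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host}\r\n"
               "Connection: close\r\n\r\n")
    return host, port, request.encode("ascii")


def fetch(url, timeout=10):
    """Look up the server, send the request, and reassemble the reply."""
    host, port, request = build_request(url)
    address = socket.gethostbyname(host)  # DNS lookup: name -> IP address
    with socket.create_connection((address, port), timeout=timeout) as conn:
        conn.sendall(request)  # routers carry the request to the server
        chunks = []
        while True:
            data = conn.recv(4096)  # the response arrives in pieces
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)  # the reassembled response


# Building the request needs no network connection:
print(build_request("http://michaelminn.com/")[2].decode("ascii"))
```

Calling fetch("http://michaelminn.com/") would carry out the network steps as well, returning the raw response headers and page content as bytes.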
Examining Network Paths Using Traceroute
The connection between a client and a server through the maze of internet routers can be examined using the traceroute utility.
In the cmd terminal on a Windows PC (see ping above) you can type tracert <domain_name>
In the terminal app on a Mac (see ping above) you can type traceroute <domain_name>
On mobile devices, there are apps like inettools, iptools and traceroute that allow you to see similar listings.
The following is an example of a traceroute listing to michaelminn.com through a home internet connection:
traceroute to michaelminn.com (126.96.36.199), 30 hops max, 60 byte packets
1 FIOS_Quantum_Gateway.fios-router.home (192.168.1.1) 0.413 ms 0.512 ms 0.622 ms
2 lo0-100.NYCMNY-VFTTP-402.verizon-gni.net (188.8.131.52) 8.487 ms 8.531 ms 8.583 ms
3 B3402.NYCMNY-LCR-21.verizon-gni.net (184.108.40.206) 11.419 ms 11.475 ms
4 B3402.NYCMNY-LCR-22.verizon-gni.net (220.127.116.11) 13.603 ms
5 0.et-10-3-0.BR2.NYC4.ALTER.NET (18.104.22.168) 22.855 ms 22.862 ms 22.915 ms
6 22.214.171.124 (126.96.36.199) 15.840 ms 9.046 ms 8.963 ms
7 ae16.cs2.lga5.us.zip.zayo.com (188.8.131.52) 24.351 ms 21.768 ms 21.744 ms
8 ae4.cs2.dca2.us.eth.zayo.com (184.108.40.206) 18.532 ms 15.967 ms 18.889 ms
9 ae27.cr2.dca2.us.zip.zayo.com (220.127.116.11) 16.719 ms 15.530 ms 16.601 ms
10 ae15.er5.iad10.us.zip.zayo.com (18.104.22.168) 18.737 ms 15.222 ms 15.195 ms
11 22.214.171.124.t00867-03.above.net (126.96.36.199) 19.714 ms 17.447 ms 17.441 ms
12 ip-208-113-156-4.dreamhost.com (188.8.131.52) 21.032 ms 21.707 ms 20.965 ms
13 ip-208-113-156-73.dreamhost.com (184.108.40.206) 19.445 ms
14 apache2-hok.halfback.dreamhost.com (220.127.116.11) 19.462 ms 20.941 ms
Walking through the steps:
- Step 1: The FIOS Gateway is the home router
- Steps 2-4: Verizon is the home ISP
- Steps 5-11: alter.net, zayo.com, and above.net are additional ISP backbone networks and the ISP for the hosting provider
- Steps 12-13: dreamhost.com is the network for the hosting provider
- Step 14: apache2-hok.halfback.dreamhost.com is the server where michaelminn.com is hosted
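The same kind of walk-through can be roughed out automatically by grouping consecutive hops that share the last two labels of their host names. The Python sketch below is a heuristic written for this illustration (the function names are made up), and the sample lines are hop numbers and host names excerpted from the listing above.

```python
import re

# A hop line starts with the hop number followed by a host name (or bare IP address).
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)")


def network_of(hostname):
    """Guess the network from the last two labels of a host name."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2 or labels[-1].isdigit():
        return "(no host name)"  # a bare IP address or an unanswered hop
    return ".".join(labels[-2:])


def summarize_hops(lines):
    """Group consecutive hops that appear to belong to the same network."""
    groups = []
    for line in lines:
        match = HOP_RE.match(line)
        if not match:
            continue
        hop, network = int(match.group(1)), network_of(match.group(2))
        if groups and groups[-1][0] == network:
            groups[-1][1].append(hop)
        else:
            groups.append((network, [hop]))
    return groups


SAMPLE = [
    "1 FIOS_Quantum_Gateway.fios-router.home",
    "2 lo0-100.NYCMNY-VFTTP-402.verizon-gni.net",
    "3 B3402.NYCMNY-LCR-21.verizon-gni.net",
    "4 B3402.NYCMNY-LCR-22.verizon-gni.net",
    "5 0.et-10-3-0.BR2.NYC4.ALTER.NET",
    "7 ae16.cs2.lga5.us.zip.zayo.com",
    "8 ae4.cs2.dca2.us.eth.zayo.com",
    "12 ip-208-113-156-4.dreamhost.com",
    "13 ip-208-113-156-73.dreamhost.com",
    "14 apache2-hok.halfback.dreamhost.com",
]

for network, hops in summarize_hops(SAMPLE):
    print(network, hops)
```

The two-label rule is only a rough guide. It works for names ending in .com or .net, but it would misread names under country code TLDs that use an extra level, such as btc.net.bw.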
The path to international servers can be more complex and interesting. For example, this is the traceroute to gov.bw, the Botswanan government's website:
1 FIOS_Quantum_Gateway.fios-router.home (192.168.1.1) 0.378 ms 0.500 ms 0.617 ms
2 lo0-100.NYCMNY-VFTTP-402.verizon-gni.net (18.104.22.168) 9.465 ms 9.488 ms 10.193 ms
3 B3402.NYCMNY-LCR-21.verizon-gni.net (22.214.171.124) 14.006 ms 14.058 ms 14.116 ms
4 * * *
5 0.et-5-1-0.BR2.NYC4.ALTER.NET (126.96.36.199) 14.105 ms 0.et-10-3-0.BR2.NYC4.ALTER.NET (188.8.131.52) 14.193 ms 16.140 ms
6 184.108.40.206 (220.127.116.11) 18.252 ms 11.344 ms 12.095 ms
7 be2057.ccr42.jfk02.atlas.cogentco.com (18.104.22.168) 12.788 ms be2056.ccr41.jfk02.atlas.cogentco.com (22.214.171.124) 11.112 ms be2057.ccr42.jfk02.atlas.cogentco.com (126.96.36.199) 11.101 ms
8 be2490.ccr42.lon13.atlas.cogentco.com (188.8.131.52) 85.622 ms 76.448 ms 78.687 ms
9 be2871.ccr21.lon01.atlas.cogentco.com (184.108.40.206) 78.701 ms be2870.ccr22.lon01.atlas.cogentco.com (220.127.116.11) 83.813 ms be2868.ccr21.lon01.atlas.cogentco.com (18.104.22.168) 78.573 ms
10 te0-0-2-2.rcr11.b015592-1.lon01.atlas.cogentco.com (22.214.171.124) 82.942 ms 83.244 ms 83.228 ms
11 126.96.36.199 (188.8.131.52) 82.401 ms 81.693 ms 82.069 ms
12 184.108.40.206 (220.127.116.11) 264.450 ms 267.330 ms 262.130 ms
13 18.104.22.168 (22.214.171.124) 262.187 ms 262.082 ms 260.020 ms
14 gbe-msu1-pr2-lnk2custr5.btc.net.bw (126.96.36.199) 259.781 ms 261.915 ms 267.307 ms
15 gbe-dit.btc.net.bw (188.8.131.52) 261.807 ms 261.725 ms 264.190 ms
Steps 1-3 are the home router and local ISP (Verizon) as in the previous example. The asterisks at step 4 mean that the router at that hop did not reply to traceroute's probes.
Steps 5-10 are the backbone ISPs alter.net and cogentco.com.
At step 11, traceroute shows only an IP address because no host name is available for it. Performing an IP WhoIs lookup on that IP address (184.108.40.206), we see this is an undersea cable connection to Africa:
inetnum: 220.127.116.11 - 18.104.22.168
netname: UnderSeaCables
descr: Point to Point Links to London through Undersea Cables Eassy and WACS
country: BW
admin-c: MK44-AFRINIC
tech-c: TM25-AFRINIC
status: ASSIGNED PA
mnt-by: BOFINET-MNT
source: AFRINIC # Filtered
parent: 22.214.171.124 - 126.96.36.199

person: Mpho KOOLESE
address: Gaborone
address: BW
phone: +267 392 3856
nic-hdl: MK44-AFRINIC
mnt-by: GENERATED-LWKEYV7AP6LKXDKOYBRBKA7LAPGJDCX9-MNT
source: AFRINIC # Filtered
Then for the final steps we see routers in the .bw TLD, meaning these probably belong to an ISP in Botswana.
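The response times in a traceroute listing also hint at geography, because a sudden increase between two hops often means the path has crossed a long physical distance. The Python sketch below is a rough heuristic written for this illustration (the function name largest_jumps is made up). The times are copied from steps 6-15 of the gov.bw listing above, and the fastest time at each hop is used to reduce the effect of momentary congestion.

```python
# Round-trip times in milliseconds, copied from steps 6-15 of the gov.bw listing.
RTTS_MS = {
    6: [18.252, 11.344, 12.095],
    7: [12.788, 11.112, 11.101],
    8: [85.622, 76.448, 78.687],
    9: [78.701, 83.813, 78.573],
    10: [82.942, 83.244, 83.228],
    11: [82.401, 81.693, 82.069],
    12: [264.450, 267.330, 262.130],
    13: [262.187, 262.082, 260.020],
    14: [259.781, 261.915, 267.307],
    15: [261.807, 261.725, 264.190],
}


def largest_jumps(rtts, count=2):
    """Return the biggest increases in fastest round-trip time between consecutive hops."""
    hops = sorted(rtts)
    best = {hop: min(rtts[hop]) for hop in hops}
    jumps = [(best[b] - best[a], a, b) for a, b in zip(hops, hops[1:])]
    return sorted(jumps, reverse=True)[:count]


for increase, a, b in largest_jumps(RTTS_MS):
    print(f"steps {a} -> {b}: +{increase:.1f} ms")
```

The two largest increases fall between steps 11 and 12, which matches the undersea cable link identified above, and between steps 7 and 8, where the router names change from jfk02 to lon13, suggesting a crossing from New York to London. Response times are affected by congestion and router load as well as distance, so this is suggestive rather than conclusive.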
The Digital Divide
The level of access to the internet is not uniform around the world. The internet is largely comprised of infrastructure owned by corporations, and, as such, this favors wealthy urban areas where revenues from customers justify the significant investment needed to build the infrastructure.
A variety of metrics can be used to assess the level of internet access in countries. The International Telecommunication Union (ITU) collects data from sources of varying reliability to estimate the percent of individuals that use the internet within countries. As might be expected, wealthy western countries tend to have higher levels of internet use.
An example comparison by numbers:
- The leader in internet use is Iceland, with 98% of the population having used the internet in the last three months
- The United States is surprisingly far back in the pack at 74%
- The developing country of Botswana has only 28%
Access to broadband (high-speed) internet is another way to assess the internet capabilities of a country. The ITU also publishes estimates of fixed broadband subscriptions (per 100 people), and the patterns are similar to internet use.
Comparing GDP per capita (a measure of wealth) to broadband subscriptions on an X/Y scatter chart, the relationship between wealth and internet access is clear.
An example comparison by numbers:
- The leader in broadband is Gibraltar, with 53 subscriptions per 100 people
- The United States is again in the middle of the pack at 31 subscriptions per 100 people
- Botswana is at an early stage of deployment, with only 1.8 subscriptions per 100 people
Knowledge is power. The internet has thrived on free communication. Accordingly, authoritarian regimes consistently attempt to censor internet content and monitor internet use in order to detect and suppress opposition to their rule.
The non-governmental organization Freedom House works to defend human rights and promote democratic change, with a focus on political rights and civil liberties. Freedom House regularly publishes a wide variety of research on freedom in various facets of life in countries around the world.
Freedom House's 2016 Freedom on the Net report documented a continuing global decline in internet freedom, with 2/3 of all internet users living in countries where criticism of the government is subject to censorship, and 1/4 of internet users living in places where people have been arrested for sharing content on Facebook.
The Freedom on the Net report assigns scores on a scale of 0 (good) to 100 (bad) to countries, and also groups them into categories of free, partly free, or not free.
For example, comparing by numbers:
- United States: 18 = free
- Zimbabwe: 56 = partly free
- China: 88 = not free
The web page for the report linked above contains links to reports describing conditions in individual countries.
The Digital Dark Ages
In the developed world, we capture and store almost everything that can be stored: security video, electronic communications, smartphone photos of events momentous and trivial.
Almost none of that data will survive us.
The internet is a communications medium, not a permanent storage medium. Although storage becomes cheaper every year, technology changes every year. Data must be migrated from old storage media and file formats, or it is lost to physical degradation or technological obsolescence.
Data in The Cloud never has a permanent physical home. The Cloud is a performance and requires constant flows of capital and resources to stay in operation. Changes in the economics of The Cloud will necessitate loss of some of that data. Which data will be lost to time?
Contrast the impermanence of the digital with papyrus text from 2500 BC or clay tablets from as far back as 3300 BC.
While security camera video from an ATM where there has been no criminal activity may not be something that should outlive us, your grandchildren may want to see some of those thousands of baby pictures that you took of your son in the first year of his life. You should plan accordingly.