Suresh Kumar Pakalapati's Linux Administration: The domain name to IP resolution process

Tuesday, July 12, 2011

The domain name to IP resolution process

ROOT NAME SERVERS
Resolving a domain name to an IP address is a surprisingly complex process, considering how little time it takes. We type the name of a website into the browser, it gets converted to an IP address, and there we have it! All of this happens in a fraction of a second. This document is intended for anyone who would like to know what goes on in that split second. The explanation would be incomplete without a brief introduction to the ROOT NAME SERVERS, which sit at the top of the DNS tree.

The Operators

Root servers are operated by twelve organisations, often referred to as the “root server operators”. They are:

A – VeriSign Global Registry Services

B – Information Sciences Institute

C – Cogent Communications

D – University of Maryland

E – NASA Ames Research Center

F – Internet Systems Consortium, Inc.

G – U.S. DOD Network Information Center

H – U.S. Army Research Lab

I – Autonomica/NORDUnet

J – VeriSign Global Registry Services

K – RIPE NCC

L – ICANN

M – WIDE Project

The letters A-M represent the 13 numeric IPv4 addresses at which the service is provided. These 13 IPs correspond to servers at about 130 physical locations in about 53 countries around the globe. So when we speak of 13 root servers, we really mean 13 IP addresses: more than 150 individual machines answer on them worldwide. What makes these servers special is a file called ‘root.zone’, which lists the names and numeric IP addresses of the authoritative DNS servers for all top-level domains (TLDs) such as ORG, COM, NL and AU. The IPs of the root servers are known to every name server in the world through a special zone file that is distributed with all DNS server software.

Below is the general structure of the DNS and the position of the root servers:

The black spots denote authoritative NS (name servers) for the domains beside them.

The Purpose

Each domain name in the Internet name space has at least one authoritative NS associated with it (the accepted convention is to have at least two). This name server is the one that resolves the domain, and the domains under it (sub-domains, add-on and parked domains), to an IP. It does so by holding a “Zone File” for the domain. The Zone File contains an entry called the A Record, which maps the domain name to an IP (a much more detailed explanation of this resolution process will be given in PART II of this document). So when we type a domain into the web browser, the answer ultimately has to come from the authoritative name server for that domain. But how does our query reach the correct name server? This is where the Root Servers come into play. However, the root servers are not contacted every time the IP of a domain is queried: a mechanism called “Caching” stores the NS of a domain once it has been queried for the first time.

You can find the locations of the root servers on this site:

http://public-root.com/root-server-locations.htm

It is a good one!

DOMAIN NAMES

Before I go on to the query process, I would like to give a short briefing on Domain Names:

The Domain Name System uses a tree (or hierarchical) name structure. At the top of the tree is the root node followed by the Top-Level Domains (TLDs), then the Second-Level Domains (SLD) and any number of lower levels, each separated with a dot. TLDs are split into two types:

1. Generic Top-Level Domains (gTLD): For example, .com, .edu, .net, .org, .mil, etc.

2. Country Code Top-Level Domains (ccTLD): For example, .us, .ca, .tv, .uk, etc.

For instance, abc.com is actually a combination of an SLD name and a TLD name. It is written from left to right, with the lowest level in the hierarchy on the left and the highest level on the right:

sld.tld

The term Second-Level Domain is technically precise in that it defines nodes at the second level within the domain name hierarchy. There are also Third-Level Domains, which are especially relevant with ccTLDs (for example, abc.co.uk). Any name that is to exist in the Internet’s name space immediately to the left of a gTLD or ccTLD (i.e. an SLD) must be bought from an Accredited Registrar, or simply a Registrar.
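To make the hierarchy concrete, here is a small bash sketch that splits a name into its labels and reads them from the right, as the DNS does. It is only an illustration: describe_name is a hypothetical helper, and it deliberately ignores registry suffixes such as co.uk, where the registrable part is a third-level name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the TLD, the SLD and whatever lies to their left.
# Illustration only; it knows nothing about suffixes such as co.uk.
describe_name() {
    local name=${1%.}                    # drop an optional trailing root dot
    local -a labels
    IFS=. read -r -a labels <<< "$name"  # split on dots: www abc com
    local n=${#labels[@]}

    echo "TLD=${labels[n-1]}"            # rightmost label: highest level
    if (( n >= 2 )); then
        echo "SLD=${labels[n-2]}"        # the part bought from a registrar
    fi
    if (( n >= 3 )); then
        echo "LOWER=${name%.*.*}"        # chosen freely by the domain owner
    fi
}

describe_name www.abc.com
# TLD=com
# SLD=abc
# LOWER=www
```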

So what is www.abc.com?

From the above, we can see that the domain name www.abc.com is built up from www and abc.com. The abc.com part was delegated from a gTLD registrar, which in turn was delegated from ICANN. The owner of the domain chose the www part, since they are now the delegated authority for the abc.com domain name: they own everything to the left of the delegated domain name, in this case abc.com *. The leftmost part, www in this case, is called a host name. Keep in mind that web sites use the host name www (for World Wide Web) only by convention; a web site can just as well be named efg.abc.com. Few people may think of typing that into their web browser, but that does not invalidate the name! Every computer that is connected to the Internet or an internal network and is accessed using a name server has a host name.

Consider some examples:

www.abc.com – a company web server

ftp.abc.com – a company file transfer protocol (FTP) server

pc17.abc.com – a normal PC

A host name must be unique within the delegated domain name, but can be anything the owner of abc.com wants. For example, the owner could make ftp.abc.com the web server. No protocol requires web server names to start with www; it is only a convention that makes it easy to tell that a name starting with www is the web server and one starting with ftp is the FTP server.

One more thing to note is that q1.abc.com and q2.abc.com need not be two different hosts (separate machines). They can also be two different sub-domains of abc.com on the same host!
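Both points (host names are arbitrary, and several names can share one machine) are easiest to see in a zone file. The bash sketch below keeps a hypothetical fragment of the abc.com zone in a here-document and looks names up with awk; the addresses come from the 192.0.2.0/24 documentation range, and lookup_a is a made-up helper name.

```bash
#!/usr/bin/env bash
# Hypothetical fragment of the abc.com zone (documentation addresses).
zone=$(cat <<'EOF'
$ORIGIN abc.com.
www   IN  A  192.0.2.10
ftp   IN  A  192.0.2.20
pc17  IN  A  192.0.2.17
q1    IN  A  192.0.2.30
q2    IN  A  192.0.2.30
EOF
)

# Print the A record(s) for a host label within abc.com.
lookup_a() {
    awk -v host="$1" '$1 == host && $3 == "A" { print $4 }' <<< "$zone"
}

lookup_a q1    # 192.0.2.30
lookup_a q2    # 192.0.2.30: a second name for the same machine
```

Nothing in the zone says which name is the web server; that is decided only by what software runs at each address.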

To summarize: the owner can delegate, in any way they want, anything to the left of the domain name they own (or were delegated). The delegated owner is also responsible for administering this delegation.

* For the .name TLD, the registry handling it, VeriSign Inc., places some restrictions on the registration and ownership of third-level domain names. You can find further details at:

http://www.verisign.com/domain-name-services/domain-information-center/name-faq/index.html

THE PROCESS

When we type a domain name into the browser, the ‘Resolver’ program (the client side of the DNS is called a DNS Resolver) is responsible for initiating and sequencing the queries that ultimately lead to a full resolution (translation) of the resource sought, i.e. the translation of a domain name into an IP address. To do this, it consults a NS for the IP. On Linux machines, the IP of the NS the resolver should consult is given in the /etc/resolv.conf file.
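You can see which name servers the resolver will consult by reading /etc/resolv.conf. A minimal bash sketch (list_nameservers is a hypothetical helper name) that prints them in the order listed, run here against a sample file rather than the real one. Note that the glibc resolver uses at most the first three entries.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "nameserver" entries of a resolv.conf file,
# in the order the resolver tries them. Comment lines (# or ;) are skipped
# because their first field is not "nameserver".
list_nameservers() {
    local conf=${1:-/etc/resolv.conf}
    awk '$1 == "nameserver" { print $2 }' "$conf"
}

# Usage with a sample file instead of the real /etc/resolv.conf:
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
# sample resolver configuration
search abc.com
nameserver 192.0.2.53
nameserver 192.0.2.54
EOF
list_nameservers "$sample"
```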

Suppose we want to resolve www.abc.com, the web server of the domain abc.com (a separate host machine with a website hosted on it). Suppose also that the domain has ftp.abc.com as its FTP server, along with many other hosts. When the resolver’s query reaches our NS, there are 3 possibilities to consider:

(i) the information about the particular domain is already cached

Suppose another machine using the same NS as ours has already queried for www.abc.com and successfully resolved it to an IP. Since ours is a caching NS, it will have cached (stored) both the IP of www.abc.com and the IP of the NS that held the ‘A’ record for www.abc.com (this is called the Authoritative NS for the domain abc.com). So the next time any machine queries this NS for www.abc.com, it takes the IP directly from the cache (until the cached record's time to live, or TTL, expires), and the browser can display the website. Caching therefore reduces the load on other DNS servers to a great extent, since such queries do not go beyond the caching NS.
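The effect of case (i) can be sketched as a toy cache in bash (version 4 or later, for associative arrays). Here resolve_upstream is a hypothetical stand-in for the full lookup described in cases (ii) and (iii), and the sketch leaves out TTL expiry, which a real caching server must handle.

```bash
#!/usr/bin/env bash
# Toy caching name server. Requires bash 4+ (associative arrays).
declare -A cache            # name -> IP address
upstream_queries=0          # how many times we had to go beyond the cache

# Hypothetical stand-in for asking the root, TLD and authoritative servers.
# It sets REPLY instead of printing, so the counter survives (no subshell).
resolve_upstream() {
    upstream_queries=$((upstream_queries + 1))
    REPLY=192.0.2.10
}

cached_lookup() {
    local name=$1
    if [[ -z ${cache[$name]+set} ]]; then   # not cached yet
        resolve_upstream "$name"
        cache[$name]=$REPLY
    fi
    echo "${cache[$name]}"
}

cached_lookup www.abc.com     # first query: goes upstream
cached_lookup www.abc.com     # second query: answered from the cache
echo "upstream queries: $upstream_queries"   # upstream queries: 1
```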

(ii) no ‘A’ record information in the cache

If the caching server does not find the answer to a query in its cache, it has to find another DNS server that does have the answer. In our example it will look for a server that has answers for all names that end in ‘abc.com’. In DNS terminology such a server is said to be “Authoritative” for the “domain” ‘abc.com’ (as I mentioned earlier).

In many cases our caching server already knows the address of the authoritative server for ‘abc.com’. If someone using the same caching server has recently surfed to ‘ftp.abc.com’, the caching server had to find the authoritative server for ‘abc.com’ at that time and, being a caching server, naturally cached its address. So it will contact this NS directly and get the A record (IP) for www.abc.com.

(iii) the NS cache is completely empty

This is the situation when the NS has just been set up and its cache is completely empty. Consequently, it neither knows the answer to your query nor knows where the authoritative servers for ‘abc.com’ are. However, it does know that questions about ‘abc.com’ can be asked of an authoritative server for ‘com’. Roughly stated, the DNS resolution rule is: “In case authoritative servers for a name are not known, strip off the leftmost part of the name including the first dot and send the original query to an authoritative server for that name”.
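The stripping rule is easy to express in code. In this bash sketch (bash 4+), known_ns is a hypothetical stand-in for the part of the cache that holds authoritative server lists. The function walks from the full name towards the root and returns the closest zone whose servers are already known, or "." (the root) when nothing relevant is cached.

```bash
#!/usr/bin/env bash
# Toy model of the stripping rule (bash 4+). known_ns holds the zones whose
# authoritative servers are already cached; the data is hypothetical.
declare -A known_ns
known_ns[com]=a.gtld-servers.net

# Print the closest zone, walking from the full name towards the root,
# whose servers are known; "." (the root) if none is cached.
closest_known_zone() {
    local name=${1%.}
    while [[ -n $name ]]; do
        if [[ -n ${known_ns[$name]+set} ]]; then
            echo "$name"
            return
        fi
        if [[ $name == *.* ]]; then
            name=${name#*.}     # strip the leftmost label and the first dot
        else
            name=""             # no dot left: only the root remains
        fi
    done
    echo "."
}

closest_known_zone www.abc.com    # com
closest_known_zone www.abc.org    # .
```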

One main point to note: in our example, an authoritative server for ‘com’ does not know the answer to a query about ‘www.abc.com’, because the ‘abc.com’ servers hold that information; but it does know which servers are authoritative for ‘abc.com’. So instead of an answer to the query, the ‘com’ server replies with the list of authoritative servers for ‘abc.com’, which is called a referral in DNS terminology. The authoritative servers for ‘abc.com’ will then give the IP for ‘www.abc.com’ or ‘ftp.abc.com’. In addition, our caching server will cache both the answer and the list of authoritative servers for ‘abc.com’ for further use.

But hold on, we assumed the cache was empty in the first place, so how does our caching server know where the authoritative servers for ‘com’ are? In other words what happens once we have stripped off all parts of a domain name and still do not know where to go for an answer?

For this case there is a special set of authoritative servers, the DNS root servers or simply ‘Root Servers’. They know the addresses of all authoritative servers for names that do not have a dot in them, the Top Level Domains (TLDs) such as ‘org’, ‘com’, ‘ch’, ‘uk’.

Root servers are the only DNS servers that have to be found without any other information being cached. To solve this, every server in the Internet’s name space acting as a NS has a pre-configured list of numeric addresses for all the root servers. This list is shipped with the NS software (BIND etc.). When starting up, a caching server sends queries for the current list of root servers to each of these addresses in turn until it obtains an answer. Once it has obtained the current list, it knows where to send queries for names without dots.

So here is what happens:

Suppose a caching server that has just started receives a query for the address of ‘www.abc.com’. After starting, the server obtained a list of root servers and their addresses. When the query arrives, it finds neither the answer for ‘www.abc.com’ in its cache, nor the address of an authoritative server for ‘abc.com’, nor the address of an authoritative server for ‘com’. Having no other choice, it asks a root server for the address of ‘www.abc.com’. The root servers are authoritative for the root zone, i.e. they hold the list of authoritative NS for every TLD. So when our query for ‘www.abc.com’ reaches a root server, it strips off the part it holds no data for, ‘www.abc’. The remaining part of the name is ‘com’, a TLD it knows about, so it answers with a referral containing the list of all authoritative servers for the ‘.com’ TLD. These ‘.com’ servers, in turn, know the NS for every SLD under ‘.com’. Our caching server then sends its query for ‘www.abc.com’ (note that it always sends the full name) to one of them; they strip off ‘www’ and reply with another referral, this time listing all the authoritative servers for ‘abc.com’. Sending the query to one of those finally yields the answer (the IP of www.abc.com). All of this typically happens in less than a second.
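The whole walk-through can be simulated offline. The bash sketch below (bash 4+) models each server as a lookup table: a server either answers with an A record or refers the resolver to the servers of a child zone, and the resolver always sends the full name, starting at the root. The server names com-ns and abc-ns and the address 192.0.2.10 are hypothetical placeholders, not real DNS data; a real resolver picks a root server from its hints and talks to it over the network.

```bash
#!/usr/bin/env bash
# Toy iterative resolver (bash 4+). The "servers" are in-memory tables with
# hypothetical names; a real resolver sends queries over the network.
declare -A referral     # "server,zone" -> server to ask next
declare -A a_record     # "server,name" -> IPv4 address
referral[root,com]=com-ns              # the root delegates com
referral[com-ns,abc.com]=abc-ns        # com delegates abc.com
a_record[abc-ns,www.abc.com]=192.0.2.10

# Ask one server about the FULL name; print "ANSWER ip", "REFERRAL server"
# or "NXDOMAIN". The server strips labels from the left until it reaches a
# zone it has delegated (the closest delegation).
ask() {
    local server=$1 name=$2 zone=$2
    if [[ -n ${a_record[$server,$name]+set} ]]; then
        echo "ANSWER ${a_record[$server,$name]}"
        return
    fi
    while [[ -n $zone ]]; do
        if [[ -n ${referral[$server,$zone]+set} ]]; then
            echo "REFERRAL ${referral[$server,$zone]}"
            return
        fi
        if [[ $zone == *.* ]]; then zone=${zone#*.}; else zone=""; fi
    done
    echo "NXDOMAIN"
}

# Start at the root and follow referrals until an answer arrives.
resolve() {
    local name=$1 server=root kind value hop
    for hop in 1 2 3 4 5 6 7 8; do            # guard against referral loops
        read -r kind value < <(ask "$server" "$name")
        echo "hop $hop: asked $server -> $kind $value" >&2
        case $kind in
            ANSWER)   echo "$value"; return 0 ;;
            REFERRAL) server=$value ;;
            *)        return 1 ;;
        esac
    done
    return 1
}

resolve www.abc.com
```

Running it prints 192.0.2.10 on standard output, with the three hops (root, com, abc.com) traced on standard error. A real caching server would also store each referral, which is what lets later queries skip the root.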

From here on, the caching server can answer the same query again and again from its cache without asking another server. It can also send any query for ‘ftp.abc.com’ or ‘something.abc.com’ directly to an ‘abc.com’ server, and send any question about another name ending in ‘.com’ directly to a server authoritative for ‘.com’. Only when the next query ends in something other than ‘.com’ does it have to ask a root server again. The cache quickly fills with lists of authoritative servers for all popular domains, especially all popular TLDs, and usually our caching server will not have to query for this information again for several days. This design ensures that only a tiny fraction of all queries has to be processed by the root servers or by the authoritative servers for TLDs.

Below is a pictorial representation of the above:

So this is the name to IP conversion process. I hope it has given you a basic understanding.

Note: when a query goes to any NS, including the root servers, the complete name is sent. For example, when we query the root servers for the authoritative NS of the ‘com’ TLD, the resolver does not send just ‘com’ in its query; it sends the complete domain name for which it needs the IP, the FQDN (Fully Qualified Domain Name). A FQDN spells out every level of the name up to the root (formally it ends with a dot, as in www.abc.com.), so for our web server the FQDN is www.abc.com, not just abc.com. It is then the duty of each NS to strip off the part of the name for which it is not authoritative and answer (or refer) for the part for which it is.

As I have mentioned in the above paragraphs, every NS on the Internet knows the IPs of the root servers. This information is provided in a file that comes with the name server software package. The file is called named.root or named.ca (it varies), and it is known as the Root Hint file. It holds the names of the root servers and the corresponding IPs at which they should be contacted. If the NS package is BIND, this file is usually located in /var/named as named.ca or named.root. I have attached below a screenshot of part of the file.

Here you can see the name of each root server on the left-hand side and its IP on the right. The excerpt I have put here shows the servers up to D only; the list continues up to M (M.ROOT-SERVERS.NET.).
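The same name-and-address pairing can be pulled out of a hints file with awk. Below is a bash sketch run against three sample lines in the named.ca layout. The addresses are A.ROOT-SERVERS.NET's published IPv4 and IPv6 addresses, and root_hint_addresses is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print "name address" pairs from a root hints file (named.ca / named.root).
# Comment lines start with ";" and blank lines are skipped. Matching on the
# next-to-last field works whether or not a line carries the "IN" class.
root_hint_addresses() {
    awk 'NF >= 3 && $1 !~ /^;/ && ($(NF-1) == "A" || $(NF-1) == "AAAA") { print $1, $NF }' "$1"
}

hints=$(mktemp)
trap 'rm -f "$hints"' EXIT
cat > "$hints" <<'EOF'
; part of a root hints file
.                        3600000      NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
A.ROOT-SERVERS.NET.      3600000      AAAA  2001:503:ba3e::2:30
EOF

root_hint_addresses "$hints"
# A.ROOT-SERVERS.NET. 198.41.0.4
# A.ROOT-SERVERS.NET. 2001:503:ba3e::2:30
```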

The IPs of these root servers do not change frequently, but they do change once in a while. Because of this, people tend not to bother updating the file, so anyone running a busy NS is advised to refresh it. You can easily fetch a current copy with the dig utility, naming the output after the file your server uses:

dig @a.root-servers.net . ns > root.hints

dig @a.root-servers.net . ns > named.ca

You can easily set up a crontab entry to update the file once a month.
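Here is one way to make that monthly update safer, as a bash sketch. It writes dig's output to a temporary file and only replaces the hints file if the new copy lists all 13 root servers, so a failed run cannot leave named without hints. The target path /var/named/named.ca, the script location, the --run switch and the schedule are assumptions. named reads the hints file when it loads it (for example at startup), so restart it afterwards for the new copy to take effect.

```bash
#!/usr/bin/env bash
# Sketch of a safer root hints refresh. Paths, schedule and the
# "all 13 root servers present" check are assumptions, not a standard.

# Succeed only if FILE has NS records naming all of a..m.root-servers.net.
hints_look_valid() {
    local n
    n=$(awk 'toupper($4) == "NS" { print tolower($5) }' "$1" \
        | grep -E '^[a-m]\.root-servers\.net\.$' | sort -u | wc -l)
    (( n == 13 ))
}

# Fetch a fresh copy into a temporary file; replace TARGET only if it is valid.
update_root_hints() {
    local target=${1:-/var/named/named.ca} tmp
    tmp=$(mktemp) || return 1
    if dig @a.root-servers.net . ns > "$tmp" && hints_look_valid "$tmp"; then
        chmod 644 "$tmp"            # mktemp creates files readable by owner only
        mv "$tmp" "$target"
    else
        rm -f "$tmp"                # keep the old file if anything went wrong
        return 1
    fi
}

# Only act when called as "update-root-hints --run [TARGET]", e.g. from cron:
#   0 3 1 * * /usr/local/sbin/update-root-hints --run /var/named/named.ca
if [[ ${1:-} == --run ]]; then
    update_root_hints "${2:-}"
fi
```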

What makes Root Servers so special compared with other servers?

The key file that makes root servers so special is ‘root.zone’. It is held by all the root servers from A to M. You can download this file and view it too! Look at the top of the root hints screenshot above: two FTP servers are mentioned there, FTP.INTERNIC.NET and RS.INTERNIC.NET. Just do an ‘anonymous’ FTP login to one of these servers and fetch the file from the specified directory. I have attached a screenshot of part of it below.

In the two screenshots, we can observe the names of the Authoritative NS for the gTLD ‘.com’ and the ccTLD ‘.in’. These Authoritative NS for ‘.com’ and ‘.in’ will, in turn, have the IPs of the Authoritative NS for the domains (second level or third level) under them. Similarly, every existing ccTLD and gTLD has an entry for its NS in this file.

Now you might wonder: only the names of the authoritative servers are mentioned here, so where do we get their IPs? You need not worry. The IPs (known as glue records) are listed in the same file, after the NS entries for all the TLDs. To make things clear, I have put a screen print below:

The above is an entry in the same file (root.zone). A.GTLD-SERVERS.NET. is an Authoritative NS for the .com TLD, and its IPv4 and IPv6 addresses are listed. Similarly, there is an IP entry for every authoritative name server of every TLD.
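If you download root.zone, the NS lines for a TLD and the matching glue (A and AAAA) lines can be pulled out with a couple of awk one-liners. The bash sketch below runs against a few sample lines written in the root.zone layout. tld_servers and glue_addresses are hypothetical helper names, the TTLs are illustrative, and the a.gtld-servers.net addresses match its published values.

```bash
#!/usr/bin/env bash
# Sample lines in root.zone layout: owner, TTL, class, type, data.
zonefile=$(mktemp)
trap 'rm -f "$zonefile"' EXIT
cat > "$zonefile" <<'EOF'
com.                  172800  IN  NS    a.gtld-servers.net.
com.                  172800  IN  NS    b.gtld-servers.net.
a.gtld-servers.net.   172800  IN  A     192.5.6.30
a.gtld-servers.net.   172800  IN  AAAA  2001:503:a83e::2:30
EOF

# Names of the authoritative servers for a TLD (owner "<tld>.", type NS).
tld_servers() {
    awk -v owner="$1." 'tolower($1) == tolower(owner) && $4 == "NS" { print $5 }' "$2"
}

# Glue addresses (A and AAAA records) for one server name, any letter case.
glue_addresses() {
    awk -v owner="$1" 'tolower($1) == tolower(owner) && ($4 == "A" || $4 == "AAAA") { print $4, $5 }' "$2"
}

tld_servers com "$zonefile"
# a.gtld-servers.net.
# b.gtld-servers.net.
glue_addresses A.GTLD-SERVERS.NET. "$zonefile"
# A 192.5.6.30
# AAAA 2001:503:a83e::2:30
```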

Please note: I recommend comparing the above information with PART III of this article. It will help you understand it clearly.

Okay. Then who updates this root.zone file?

In 2004, ICANN took over responsibility for the maintenance of the root servers' TLD master file, the file that lists the authoritative servers for each TLD. Distribution of this file to each of the operational root servers is carried out using secure transactions. To further increase security, the server providing the root updates is only accessible from the operational root servers; it is not a publicly visible server.