Wednesday, September 14, 2011

Tracing your way - ping, traceroute & mtr


“...The means can be likened to a seed, the end to a tree, and there is just the same inviolable connection between the means and the end as there is between the seed and the tree. They say: “Means are, after all, just means.” I would say: “Means are, after all, everything.” As the means, so the end... If we take care of the means, we are bound to reach the end sooner or later...”
- Mahatma Gandhi
There is a lot of information scattered across the Internet about the utilities used for network troubleshooting, yet few articles detail what happens behind the scenes when these utilities run. To give a complete and precise idea of how tracing and troubleshooting work in a network, this article analyzes the internal working of three network utilities and looks at what makes each of them unique.

Before starting with these utilities, let us have a look at one of the most important protocols used by them: ICMP.

ICMP - Internet Control Message Protocol

Internet Control Message Protocol (ICMP), defined in RFC 792, is part of the Internet protocol suite. It is used for sending error messages such as “destination could not be reached” or “time to live exceeded”, as well as query messages such as echo. An ICMP packet consists of an IP header followed by the ICMP message. The first 32 bits of the ICMP message contain the Type of the ICMP packet (8 bits), the Code (8 bits) and the Checksum (16 bits). The Data field that follows contains the payload; its size varies with the type of the ICMP message.
The Type of the ICMP packet indicates its function, such as Destination Unreachable, Time Exceeded or Echo. The Code is a subtype which gives further details related to the parent Type, such as Net Unreachable or Host Unreachable. The commonly seen Types and Codes are listed below:
Type  Name                     Code
0     Echo Reply               0   No Code
3     Destination Unreachable  0   Net Unreachable
                               1   Host Unreachable
                               2   Protocol Unreachable
                               3   Port Unreachable
                               4   Fragmentation Needed and Don’t Fragment was Set
                               5   Source Route Failed
                               6   Destination Network Unknown
                               7   Destination Host Unknown
                               8   Source Host Isolated
                               9   Communication with Destination Network is Administratively Prohibited
                               10  Communication with Destination Host is Administratively Prohibited
                               11  Destination Network Unreachable for Type of Service
                               12  Destination Host Unreachable for Type of Service
                               13  Communication Administratively Prohibited
4     Source Quench            0   No Code
5     Redirect                 0   Redirect Datagram for the Network (or subnet)
                               1   Redirect Datagram for the Host
                               2   Redirect Datagram for the Type of Service and Network
                               3   Redirect Datagram for the Type of Service and Host
8     Echo                     0   No Code
11    Time Exceeded            0   Time to Live exceeded in Transit
                               1   Fragment Reassembly Time Exceeded
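To make the header layout concrete, here is a minimal Python sketch that reads the first 32 bits of an ICMP message and looks up a name for the Type and Code. The function names and the ICMP_NAMES dictionary are made up for this illustration, and the dictionary holds only a handful of entries from the table above.

```python
import struct

# A few (type, code) pairs from the table above; not a complete list.
ICMP_NAMES = {
    (0, 0): "Echo Reply",
    (3, 0): "Destination Unreachable: Net Unreachable",
    (3, 1): "Destination Unreachable: Host Unreachable",
    (3, 3): "Destination Unreachable: Port Unreachable",
    (8, 0): "Echo",
    (11, 0): "Time Exceeded: Time to Live exceeded in Transit",
    (11, 1): "Time Exceeded: Fragment Reassembly Time Exceeded",
}


def parse_icmp_header(message: bytes):
    """Return (type, code, checksum) from the first 32 bits of an ICMP message."""
    if len(message) < 4:
        raise ValueError("an ICMP message needs at least 4 bytes")
    # "!" means network byte order: two unsigned bytes, then one 16-bit value.
    return struct.unpack("!BBH", message[:4])


def describe_icmp(message: bytes) -> str:
    """Give a readable name for the message, falling back to the raw numbers."""
    icmp_type, code, _checksum = parse_icmp_header(message)
    return ICMP_NAMES.get((icmp_type, code), f"Type {icmp_type} Code {code}")
```

The message passed in is the ICMP part only, that is, the packet with its IP header already removed.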
The Checksum field in the ICMP message is used by the receiving host to verify the integrity of the incoming ICMP packet. The Checksum is the 16-bit one’s complement of the one’s complement sum of the complete ICMP message, from the Type field to the end of the Data field, taken as a sequence of 16-bit words. The Checksum field itself is treated as zero while the value is calculated.
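The checksum calculation can be sketched in a few lines of Python. This is an illustration of the one’s complement sum described above, not code taken from any of the utilities; the function name is my own.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # big-endian 16-bit word
    while total >> 16:
        # fold any carry out of the top 16 bits back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can run the same function over the whole message including the transmitted checksum; for an undamaged message the result is 0.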
The utilities being discussed below use different ICMP messages for communication. Let us look into each of them in detail.

ping

The ping utility is named after the sound of the sonar used to locate objects. It is the basic tool for testing connectivity between two machines running TCP/IP.
The ping utility sends out ICMP Type 8 Code 0 echo request packets. Every packet’s sequence number is increased by 1, but all of them carry the same identifier. If the other host is reachable and answers, an ICMP Type 0 Code 0 echo reply packet carrying the same identifier and sequence number is received. A judgment on whether the connection is reliable can be made by checking whether all the packets come back, and in sequence. Besides the Type, Code and Checksum fields, an echo message carries a 16-bit Identifier, a 16-bit Sequence Number and the data.
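As a rough sketch of what ping puts on the wire, the following Python code builds an echo request and checks whether a reply matches it. The helper names are hypothetical, and the code only constructs and inspects the bytes; actually sending them would need a raw socket and suitable privileges.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP Type 8 Code 0 message with a valid checksum."""
    # First pass: checksum field set to zero, as required for the calculation.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


def is_matching_reply(reply: bytes, identifier: int, sequence: int) -> bool:
    """True if `reply` is a Type 0 Code 0 message for the given request."""
    if len(reply) < 8:
        return False
    icmp_type, code, _checksum, ident, seq = struct.unpack("!BBHHH", reply[:8])
    return (icmp_type, code, ident, seq) == (0, 0, identifier, sequence)
```

With a 56-byte payload, build_echo_request returns 64 bytes: 8 bytes of ICMP header plus the data.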
The following is a ping session which I did to google.com from my console.
chacko@server:~$ ping google.com
PING google.com (63.233.167.99): 56 data bytes
64 bytes from 63.233.167.99: icmp_seq=0 ttl=247 time=56.9 ms
64 bytes from 63.233.167.99: icmp_seq=1 ttl=247 time=57.2 ms
64 bytes from 63.233.167.99: icmp_seq=2 ttl=247 time=57.0 ms
64 bytes from 63.233.167.99: icmp_seq=3 ttl=247 time=56.8 ms
64 bytes from 63.233.167.99: icmp_seq=4 ttl=247 time=57.0 ms
64 bytes from 63.233.167.99: icmp_seq=5 ttl=247 time=56.9 ms
64 bytes from 63.233.167.99: icmp_seq=6 ttl=247 time=56.6 ms
64 bytes from 63.233.167.99: icmp_seq=7 ttl=247 time=56.7 ms
64 bytes from 63.233.167.99: icmp_seq=8 ttl=247 time=56.5 ms
64 bytes from 63.233.167.99: icmp_seq=9 ttl=247 time=57.0 ms
--- google.com ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 56.5/56.8/57.2 ms

Fields in the ping output:

The ping utility resolved the hostname google.com to the IP address 63.233.167.99. The next field in the output shows the number of data bytes to be sent, which is 56. Combined with the 8 bytes of the ICMP header, this makes the 64 bytes shown at the beginning of each reply line. The sequence number of each request is shown in the icmp_seq field, which is incremented by 1 for every request. The “ttl” or “Time To Live” field of the Internet Protocol (IP) header specifies how many more hops a packet can travel before it is discarded.
The ping program shows the TTL value of the reply packet as it arrives from the remote host. The remote system sets its own initial TTL in the reply (common initial values are 255, 128, 64 or 60), and every router on the way back decreases it by one. The value seen above is therefore the remote host’s initial value minus the number of hops on the return path; a ttl of 247 would be consistent with an initial value of 255 and 8 hops. The time shown in milliseconds is the round-trip time (RTT), also called round-trip delay: the time required for the echo request to reach the target and for the echo reply to return to the sender.
At the end of the output, statistics are displayed: the number of packets transmitted and received, the packet loss percentage, and the minimum, average and maximum RTT.
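The fields described above can be pulled out of the reply lines with a short Python sketch. The function names are made up for this example, and the hop estimate rests on the assumption that the remote host started from one of a few common initial TTL values, so it is a guess rather than a measurement.

```python
import re

# Matches the tail of a reply line such as
# "64 bytes from 63.233.167.99: icmp_seq=0 ttl=247 time=56.9 ms"
REPLY = re.compile(r"icmp_seq=(\d+) ttl=(\d+) time=([\d.]+) ms")


def summarize(lines, transmitted):
    """Compute loss percentage and min/avg/max RTT from ping reply lines."""
    rtts = []
    for line in lines:
        match = REPLY.search(line)
        if match:
            rtts.append(float(match.group(3)))
    summary = {
        "received": len(rtts),
        "loss_percent": 100.0 * (transmitted - len(rtts)) / transmitted,
    }
    if rtts:
        summary.update(min=min(rtts), avg=sum(rtts) / len(rtts), max=max(rtts))
    return summary


def hops_from_ttl(observed_ttl, initial_candidates=(64, 128, 255)):
    """Guess the return-path hop count, assuming a common initial TTL."""
    initial = min(c for c in initial_candidates if c >= observed_ttl)
    return initial - observed_ttl
```

For the session above, hops_from_ttl(247) gives 8, on the assumption that the remote host sent its replies with an initial TTL of 255.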

traceroute

traceroute (tracert in Windows) prints the route that packets take through a TCP/IP network on their way to a destination. Note that the Windows tracert sends ICMP echo requests instead of the UDP packets described below.
The command traceroute hostname sends three UDP packets with a TTL value of 1. When the packets arrive at the closest router, the router decreases the TTL value by one, making it 0. A router that finds the TTL of a packet at 0 discards the packet and responds with an ICMP “time exceeded” packet (Type 11 Code 0, “time to live exceeded in transit”). The traceroute utility notes the IP address of the router that sent back the three ICMP packets and the time taken for each reply to arrive. It then sends out three more UDP packets, this time with a TTL value of 2.
Since these packets have a TTL value of 2, they pass the first router and expire at the router that is 2 hops away from the sender, which returns ICMP time exceeded packets in turn. The traceroute utility notes these replies and sends out three more UDP packets with a TTL value of 3. This process continues until the probes either reach the final destination or have gone through the default maximum of 30 hops. Since the datagrams try to access an invalid port at the destination host, the destination returns an ICMP Port Unreachable message (Type 3 Code 3). This event signals to the traceroute program that it is finished.
The outgoing packets from traceroute are sent towards the destination using UDP at very high port numbers, typically in the range of 32,768 and higher. Hardly anyone runs UDP services on those ports, so when a packet finally reaches the destination, the host answers with Port Unreachable and traceroute knows that the destination has been reached.
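The loop described above can be sketched in Python as follows. This is a simplified illustration and not how the real traceroute is implemented: it does not check that a received ICMP packet really belongs to the probe just sent, the function names are my own, and the base port 33434 is an assumption about a conventional starting port. The raw ICMP socket normally requires root privileges.

```python
import socket
import time

BASE_PORT = 33434  # assumed starting port; unlikely to have a UDP listener


def classify_icmp(icmp_type, icmp_code):
    """Map an ICMP reply to its meaning for a traceroute run."""
    if (icmp_type, icmp_code) == (11, 0):
        return "hop"            # Time to Live exceeded in Transit
    if (icmp_type, icmp_code) == (3, 3):
        return "destination"    # Port Unreachable
    return "other"


def icmp_type_code(ip_packet):
    """Return (type, code) of the ICMP message inside a raw IPv4 packet."""
    header_len = (ip_packet[0] & 0x0F) * 4   # IHL field counts 32-bit words
    return ip_packet[header_len], ip_packet[header_len + 1]


def traceroute(host, max_hops=30, probes=3, timeout=2.0):
    """Print one line per hop; stop at the destination or after max_hops."""
    dest = socket.gethostbyname(host)
    recv_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv_sock.settimeout(timeout)
    try:
        for ttl in range(1, max_hops + 1):
            send_sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            fields, reached = [], False
            for _ in range(probes):
                started = time.monotonic()
                send_sock.sendto(b"", (dest, BASE_PORT + ttl))
                try:
                    packet, (addr, _port) = recv_sock.recvfrom(512)
                except socket.timeout:
                    fields.append("*")       # no reply within the timeout
                    continue
                rtt_ms = (time.monotonic() - started) * 1000
                fields.append(f"{addr}  {rtt_ms:.3f} ms")
                if classify_icmp(*icmp_type_code(packet)) == "destination":
                    reached = True
            print(f"{ttl:2d}  " + "  ".join(fields))
            if reached:
                break
    finally:
        send_sock.close()
        recv_sock.close()
```

Calling traceroute("google.com") as root would print one line per TTL value, with a * for every probe that got no reply within the timeout.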
Sometimes, we see timeouts in the traceroute output. For example:
16  ge-0-0-0-p130.msr2.dcn.yahoo.com (216.115.108.13) 283.043 ms 285.184 ms
    ge-0-0-0-p120.msr1.dcn.yahoo.com (216.115.108.9)  283.043 ms 283.853 ms
17  ge7-1.bas1-m.dcn.yahoo.com (216.109.120.205)  279.207 ms   279.348 ms
    ge10-2.bas2-m.dcn.yahoo.com (216.109.120.249)  290.857 ms
18  * * *
19  * * *
This means that no reply was received from those hops before the probes timed out. This can happen for a variety of reasons and doesn’t necessarily mean that a host is down. The host at that hop, or the destination itself, might be receiving the packets but not sending back a reply. The next host, or the network connecting to it, may be down. There may also be a routing issue on the way back (the return route need not be the same as the forward route). Finally, some ISPs set policies in their firewalls and routers, as security measures, that block these ICMP reply packets.
Let me paste a traceroute session which I did to google.com from my console.
chacko@server:~$ traceroute google.com
traceroute: Warning: google.com has multiple addresses; using 63.238.197.99
traceroute to google.com (63.238.197.99), 30 hops max, 38 byte packets
 1  * * *
 2  xxxxxxxxxxxxxxxxxxxx  (208.94.33.1)  0.671 ms  0.644 ms  0.767 ms
 3  415.ge-5-2-1.mpr2.sfo3.us.above.net (63.123.129.43)  0.997 ms  1.044 ms  0.809 ms
 4  so-3-3-0.mpr4.sjc2.us.above.net (63.123.30.213)  2.132 ms  2.218 ms  2.231 ms
 5  so-6-0-0.mpr1.lax9.us.above.net (63.123.23.206)  10.766 ms  10.583 ms  10.782 ms
 6  so-3-0-0.mpr2.lax9.us.above.net (63.123.31.102)  11.091 ms  10.913 ms  10.628 ms
 7  so-4-1-0.mpr1.iah1.us.above.net (63.123.29.106)  40.584 ms  40.605 ms  49.805 ms
 8  so-0-0-0.mpr2.iah1.us.above.net (63.123.31.62)  45.071 ms  40.432 ms  40.598 ms
 9  so-5-1-0.mpr1.atl6.us.above.net (63.123.29.61)  53.759 ms  53.239 ms  53.203 ms
10  63.123.229.173.google.com (63.123.229.173)  55.633 ms  53.367 ms  53.653 ms
11  63.233.174.86 (63.233.174.86)  54.031 ms 64.233.174.84 (63.233.174.84)  53.647 ms  53.526 ms
12  70.12.236.173 (70.12.236.173)  71.348 ms  54.304 ms  54.479 ms
13  215.233.49.223 (215.233.49.223)  55.993 ms  56.158 ms  57.354 ms
14  jc-in-f99.google.com (64.233.187.99)  54.666 ms  54.179 ms  54.517 ms

Fields in the traceroute output:

Since google.com has multiple IP addresses pointing to it, some versions of traceroute show the warning message seen in the above output. The output shows the maximum number of hops traceroute will attempt, which is 30 in this case, and that 38-byte packets are used. The first hop in this output timed out. In the subsequent hops, the 3 fields at the end of each hop denote the RTT of the 3 packets sent with that TTL. In the 11th hop, we can see that the 2nd packet was answered by a different IP. This is most likely because of load balancing at that hop, which sends successive packets through different systems.
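A hop line like the ones above can be split into its fields with a small Python sketch. The parser below is my own illustration and assumes the layout shown in this article (hop number, then hostname, address in parentheses and RTTs in ms); other traceroute versions may print things differently.

```python
import re

# One token is an address in parentheses, an RTT followed by "ms", or a "*".
TOKEN = re.compile(
    r"\((\d{1,3}(?:\.\d{1,3}){3})\)"      # (a.b.c.d)
    r"|(?<!\S)(\d+(?:\.\d+)?)\s*ms\b"     # 54.031 ms
    r"|(\*)"                              # timed-out probe
)


def parse_hop(line):
    """Return (hop_number, [(address, rtt_ms), ...]) for one output line.

    A timed-out probe is reported as (None, None). An RTT belongs to the
    most recent address seen on the line, which is how a hop answered by
    more than one router is handled.
    """
    parts = line.split(None, 1)
    hop = int(parts[0])
    rest = parts[1] if len(parts) > 1 else ""
    probes, current = [], None
    for match in TOKEN.finditer(rest):
        addr, rtt, star = match.groups()
        if addr:
            current = addr
        elif rtt:
            probes.append((current, float(rtt)))
        elif star:
            probes.append((None, None))
    return hop, probes
```

For hop 11 in the session above, this yields one probe answered by 63.233.174.86 and two answered by 63.233.174.84.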

mtr (My Traceroute)

mtr combines the functionality of the ‘traceroute’ and ‘ping’ utilities. When mtr starts running, it investigates the network connection between the host on which it runs and a user-specified destination host. After determining the address of each network hop between these machines, it sends out a sequence of ICMP ECHO requests to each one to check the quality of the link. mtr relies on the ICMP Time Exceeded (Type 11) packets returned by routers, and on the ICMP Echo Reply packets returned when the probes reach the destination host. Running statistics about each machine are printed out while the process runs.
The real advantage of mtr over ping or traceroute is that it shows, in real time, exactly where on the route to the destination host the packet loss is happening. It shows the loss percentage at each hop, which can give us valuable information on which specific provider is having a network issue. Also, since mtr uses ICMP ECHO requests, it gets through routers that block UDP packets. So mtr may work where traceroute does not.
The following is the mtr output to yahoo.com which I did from my local console.
My traceroute  [v0.69]
machine.hostname.com (0.0.0.0)(tos=0x0 psize=64 bitpattern=0x00)  Wed Jan 17 12:24:50 2007
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                 Packets      Pings
 Host                                            Loss%  Snt   Last    Avg   Best   Wrst  StDev
 1. 192.168.1.254                                 0.0%   23    0.2    0.2    0.2    0.5    0.1
 2. illekm-static-203.197.145.137.vsnl.net.in     0.0%   23    2.3    5.9    2.3   26.4    5.4
 3. 203.200.149.148                               0.0%   23    5.5    4.6    2.4   30.6    5.9
 4. 59.163.16.146.static.vsnl.net.in              0.0%   23  239.4  242.4  238.8  262.3    5.6
 5. 219.64.229.1.mpls-vpn-ny.static.vsnl.net.in   0.0%   23  270.4  278.1  269.9  316.6   11.7
 6. ge-2-0-9.p558.pat1.dce.yahoo.com              0.0%   23  268.2  279.8  267.7  353.5   20.2
    ge-0-0-9.p815.pat2.dce.yahoo.com
    ge-0-0-9.p815.pat2.dce.yahoo.com
    ge-2-0-8.p426.pat1.dce.yahoo.com
    ge-0-0-8.p170.pat2.dce.yahoo.com
    ge-0-0-9.p815.pat2.dce.yahoo.com
 7. ge-1-0-0-p111.msr2.dcn.yahoo.com              0.0%   22  272.6  277.7  272.1  299.5    7.8
    ge-0-0-0-p110.msr2.dcn.yahoo.com
    ge-0-0-0-p110.msr2.dcn.yahoo.com
    ge-0-0-0-p111.msr2.dcn.yahoo.com
    ge-0-0-0-p100.msr1.dcn.yahoo.com
    ge-0-0-0-p111.msr2.dcn.yahoo.com
    ge-0-0-0-p100.msr1.dcn.yahoo.com
 8. ge9-3.bas2-m.dcn.yahoo.com                    0.0%   22  271.6  276.4  271.6  288.5    5.8
    ge10-2.bas1-m.dcn.yahoo.com
    ge6-1.bas1-m.dcn.yahoo.com
    ge7-1.bas2-m.dcn.yahoo.com
    ge5-2.bas1-m.dcn.yahoo.com
    ge3-1.bas2-m.dcn.yahoo.com
    ge10-2.bas2-m.dcn.yahoo.com
 9. w2.rc.vip.dcn.yahoo.com                       0.0%   22  278.9  278.2  271.6  323.9   11.1
Since the mtr output is dynamic, it is difficult to copy the output from the console. To get around this, either run mtr with the --report option, or just type “p” while mtr is running and the output will pause. Note that you should have root privileges to run mtr.
These three utilities are good enough to get basic information about the host, the network and reachability. There are a lot of other tools with specific features, which can be used for advanced data collection and troubleshooting. The following are some of them.
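The per-hop columns in the mtr output above (Loss%, Snt, Last, Avg, Best, Wrst, StDev) can be reproduced from a list of RTT samples with a short Python sketch. This is my own illustration, not mtr’s code; in particular, using the population standard deviation for StDev is an assumption.

```python
import math


def hop_statistics(samples):
    """Summarize the probes sent to one hop.

    `samples` holds one entry per probe: the RTT in milliseconds, or None
    if no reply came back.
    """
    sent = len(samples)
    replies = [s for s in samples if s is not None]
    stats = {
        "Loss%": 100.0 * (sent - len(replies)) / sent if sent else 0.0,
        "Snt": sent,
    }
    if replies:
        avg = sum(replies) / len(replies)
        # Population variance (assumption; see the note above).
        variance = sum((r - avg) ** 2 for r in replies) / len(replies)
        stats.update(
            Last=replies[-1],
            Avg=avg,
            Best=min(replies),
            Wrst=max(replies),
            StDev=math.sqrt(variance),
        )
    return stats
```

For example, hop_statistics([10.0, None, 20.0, 30.0]) reports a loss of 25% over 4 probes, with an average of 20.0 ms.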

Tools:

For remote traceroute - http://www.traceroute.org/

References

TCP/IP Illustrated - Volume 1 - The protocols : W. Richard Stevens
man pages of ping, traceroute & mtr