Suresh Kumar Pakalapati's Linux Administration: 08/12/10

Thursday, August 12, 2010

Troubleshooting Squid Reverse Proxy Server

Reverse Proxy Implementation

Step1 : Check whether Squid is running

#ps -ef | grep squid

This command should show:

Five dnsserver helper processes

Two squid daemon processes (squid -D)

One unlinkd process

If all of the processes mentioned above are running, your Squid server is up.

ps -ef | grep squid

root 31617 1 0 15:06 ? 00:00:00 /opt/squid/sbin/squid -D

squid 31619 31617 0 15:06 ? 00:00:00 (squid) -D

squid 31623 31619 0 15:06 ? 00:00:00 (dnsserver)

squid 31624 31619 0 15:06 ? 00:00:00 (dnsserver)

squid 31625 31619 0 15:06 ? 00:00:00 (dnsserver)

squid 31626 31619 0 15:06 ? 00:00:00 (dnsserver)

squid 31627 31619 0 15:06 ? 00:00:00 (dnsserver)

squid 31628 31619 0 15:06 ? 00:00:00 (unlinkd)
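
The check above can be scripted instead of counted by eye. This is only a minimal sketch: the function name squid_process_summary is made up for illustration, and it assumes the command column looks like the listing above ((squid), (dnsserver), (unlinkd)).

```shell
#!/bin/sh
# Summarize Squid-related processes from `ps -ef` output read on stdin.
# Assumption: process names appear as in the listing above.
squid_process_summary() {
    awk '
        /\(dnsserver\)/          { dns++ }
        /\(unlinkd\)/            { unlinkd++ }
        /\(squid\)|sbin\/squid/  { squid++ }
        END {
            printf "squid=%d dnsserver=%d unlinkd=%d\n", squid, dns, unlinkd
        }'
}

# Typical use on the proxy host:
#   ps -ef | squid_process_summary
```

For the listing shown above this prints squid=2 dnsserver=5 unlinkd=1. The number of dnsserver helpers depends on your squid.conf, so treat five as the value for this setup rather than a rule.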

Step2 : Check whether the backend server is reachable from your reverse proxy

#links web425.example.co.in

Step3 : Check the system logs for any suspicious activity.

#tail -f /var/log/messages

Step4 : Check the Squid access, cache and store logs for errors.

#tail -f /opt/squid/var/logs/access.log

#tail -f /opt/squid/var/logs/cache.log

#tail -f /opt/squid/var/logs/store.log
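
When access.log is busy, a summary is often easier to read than a live tail. The sketch below assumes the log uses Squid's default native format, where the fourth field is the result code (for example TCP_MISS/200). The function name access_log_summary is hypothetical.

```shell
#!/bin/sh
# Count requests per result code (field 4, e.g. TCP_MISS/200) in a
# Squid access.log. Assumption: the default native log format is in use.
# Reads the files given as arguments, or stdin when none are given.
access_log_summary() {
    awk 'NF >= 4 { count[$4]++ }
         END { for (code in count) print count[code], code }' "$@" | sort -rn
}

# Typical use:
#   access_log_summary /opt/squid/var/logs/access.log
```

A large share of 5xx codes or TCP_MISS/503 lines usually points at the backend rather than at Squid itself.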

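Step5 : Check the Squid configuration with the following commands. The -k parse option parses squid.conf and reports syntax errors; -k check verifies that a running Squid process can be signalled.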

/opt/squid/sbin/squid -k check
/opt/squid/sbin/squid -k parse
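
If you run this check from a script (for example before a restart), it helps to act on the exit status. A minimal sketch, assuming squid exits non-zero when -k parse hits a fatal configuration error; the function name squid_config_ok is made up for illustration.

```shell
#!/bin/sh
# Parse the configuration and return non-zero if Squid reports a fatal error.
# Argument: path to the squid binary (defaults to the install path used here).
squid_config_ok() {
    squid_bin=${1:-/opt/squid/sbin/squid}
    if "$squid_bin" -k parse >/dev/null 2>&1; then
        echo "squid.conf: parse OK"
    else
        echo "squid.conf: parse FAILED, run '$squid_bin -k parse' to see why" >&2
        return 1
    fi
}

# Typical use:
#   squid_config_ok && /opt/squid/sbin/squid -k reconfigure
```

Warnings that do not stop the parse will still pass this check, so read the full -k parse output after any larger change.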

Which Reverse Proxy Is Good?

Recently I was assigned to work on reverse proxies, so I brainstormed the proxies I already knew, such as Apache and Squid, and then Googled to find which other good reverse proxies are available. There are many open source reverse proxies. The most popular are:
1. Apache (this has a lot of disadvantages)
2. Squid
3. Pound
Which reverse proxy is good? There are many good reverse proxies; some of them are described below. Please comment with your favorite reverse proxy servers.
Let us have a look at some proxies.
1. Lighttpd (pronounced "lighty" or "Light-TPD") is a web server designed to be secure, fast, standards-compliant and flexible while being optimized for speed-critical environments. Its low memory footprint (compared to other web servers), light CPU load and speed goals make lighttpd suitable for servers that are suffering load problems, or for serving static media separately from dynamic content. It is run by many companies with high request volumes. Lighttpd is used by some of the biggest websites, including meebo. Wikimedia runs Lighttpd servers, as does SourceForge. Three of the most famous torrent listing websites, the Pirate Bay, Mininova and ISOHunt, which have more than 1,000 hits per second, also use Lighttpd.
Some Features:
1. Load-balancing FastCGI, SCGI and HTTP proxy support
2. Chroot support
3. select()-/poll()-/epoll()-based web server
4. Support for more efficient event notification schemes like kqueue and epoll
5. Conditional rewrites (mod_rewrite)
6. SSL and TLS support, via OpenSSL
7. Authentication against an LDAP server
8. RRDtool statistics
9. Rule-based downloading with possibility of a script handling only authentication
10. Server Side Includes support
11. Flexible virtual hosting
12. Modules support
13. Cache Meta Language (currently being replaced by mod_magnet) using the Lua programming language
14. Minimal WebDAV support
15. Servlet (AJP) support (in versions 1.5.x and up)
16. HTTP compression using mod_compress and the newer mod_deflate (1.5.x)
17. Light-weight (less than 1 MB)
18. Single-process design with only several threads. No processes or threads started per connection.

2. Nginx (pronounced "engine X") is a lightweight, high-performance web server/reverse proxy and e-mail (IMAP/POP3) proxy. It can serve 500 million requests per day. Reportedly, nginx running as a reverse proxy serves tens of millions of HTTP requests per day (that’s a few hundred per second) on a *single server*. At peak load it uses about 15MB RAM and 10% CPU. Under the same kind of load, Apache falls over (after using 1000 or so processes and god knows how much RAM), Pound falls over (too many threads, and 400MB+ of RAM for all the thread stacks), and lighty *leaks* more than 20MB per hour (and uses more CPU, but not significantly more). Nginx is used by wordpress.com for high performance.
Some Features:
1. Handling of static files, index files and auto-indexing
2. Reverse proxy with caching
3. Load balancing
4. Fault tolerance
5. SSL support
6. FastCGI support, with caching.
7. Name- and IP-based virtual servers
8. FLV streaming
9. MP4 streaming, using the MP4 streaming module
10. Web page access authentication
11. SMTP, POP3 and IMAP proxy
12. STARTTLS support
13. SSL support
3. Pound is a lightweight open source reverse proxy program suitable for use as a web server load balancing solution. Developed by an IT security company, it has a strong emphasis on security. Using regular expression matching on the requested URLs, Pound can pass different kinds of requests to different backend server groups.
Some Features:
1. Detects when a backend server fails or recovers, and bases its load balancing decisions on this information: if a backend server fails, it will not receive requests until it recovers
2. Decrypts https requests to http ones
3. Rejects incorrect requests
4. Can be used in a chroot environment
5. Has no special requirements concerning which web server software or browser to use
6. Supports virtual hosts
4. Varnish is an HTTP accelerator designed for content-heavy dynamic web sites; it works well for static content too. Varnish supports load balancing using both a round-robin and a random director, both with a per-backend weighting. Basic health-checking of back ends is also available.

5. Perlbal is a Perl-based reverse proxy load balancer and web server. The program is usually used by large web sites to distribute the load over a number of servers. Perlbal also features a so-called "re-proxy" mechanism.
References:
http://www.ruby-forum.com/topic/96361
http://en.wikipedia.org/wiki/Reverse_proxy

Configuration of SQUID Reverse Proxy

Before installing and configuring SQUID as a reverse proxy, I want to make the following points.

Don’t install SQUID from distribution packages such as rpm on Red Hat or apt-get/deb on Debian.

Download the source package from the official SQUID site, then compile and install it according to your needs.

For SQUID to run properly, change the ownership of the installation folder to the squid user.

By default SQUID does not create the cache directory in the installation directory, so we have to create it manually, owned by the squid user, and then run squid -z, which creates the swap directories.

Don’t worry about all these points. I will explain them as we configure SQUID.

So let’s start implementing SQUID on RHEL5/CentOS5

Step1 : Remove the squid package if it was installed by default through rpm/deb packages.

#rpm -e squid

Step2 : Download the latest SQUID package from the official SQUID site to a temp directory

#mkdir /temp

#cd /temp

#wget http://www.squid-cache.org/Versions/v2/2.6/squid-2.6.STABLE23.tar.gz

Step3 : Uncompress the downloaded tar.gz package.

#tar xvfz squid-2.6.STABLE23.tar.gz

Step4 : Prepare the uncompressed package for installation. If you are new to installing source packages, have a look at this post.

#cd squid-2.6.STABLE23

#./configure --prefix=/opt/squid --enable-ssl --disable-internal-dns

Let me explain the options used for the compilation.

a. --prefix=/opt/squid tells the build to install all squid-related files under /opt/squid. If you don’t specify this option, squid is installed under /usr/local/squid by default.

b. --enable-ssl enables SSL support in the squid server.

c. --disable-internal-dns is the most confusing option of all. It turns off squid’s built-in DNS client, which reads its name servers from /etc/resolv.conf, and makes squid start external dnsserver helper processes instead. These helpers resolve names through the system resolver, so entries in the /etc/hosts file are honored.

Step5 : Install the SQUID package now.

#make

#make check

#make install

Step6 : Once SQUID is installed successfully, we have to create the cache/swap folder in /opt/squid/var/logs/cache/

#/opt/squid/sbin/squid -z

Step7 : Configure Squid

Step(7a) : Open the squid.conf file and set the http_port entry. Search for http_port in squid.conf and set it as shown below.

Note : It’s good admin practice to back up any file before modifying it, so copy squid.conf to a safe location and then edit the squid.conf in /opt/squid/etc/

#vi /opt/squid/etc/squid.conf

http_port 10.77.225.20:80 accel vhost

Let me explain the above line.

http_port is the option that specifies the address and port on which your squid server listens for incoming requests.

10.77.225.20 is the IP address of the squid machine. This must be the address that clients connect to (on an Internet-facing proxy, a public IP address).

:80 is the port on which squid listens.

accel enables accelerator (reverse proxy) mode, and vhost uses the Host header for virtual domain support. vhost implies accel.

Step(7b) : Specify backend server details as follows

cache_peer 10.88.26.12 parent 80 0 no-query originserver name=server_1 login=PASS

acl sites_server_1 dstdomain web425.example.co.in

cache_peer_access server_1 allow sites_server_1

Let me explain what the above three lines mean.

The first line uses cache_peer to define the backend server: its IP address (10.88.26.12), its type (parent), the backend web server port (80) and the ICP port (0, unused here because no-query disables ICP queries).

originserver tells squid to treat this peer as an origin web server rather than another cache, and name=server_1 gives the peer a name that other directives can refer to.

login=PASS tells squid to pass the client’s HTTP authentication credentials through to the backend.

The second line defines an acl (access control list) named sites_server_1 that matches requests for the backend site’s domain, in this case web425.example.co.in.

The third line allows requests matching sites_server_1 to be forwarded to the peer server_1.

Note : The above three lines only control which requests are forwarded to the backend peer. We have not yet given http access for this site.
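
When several backend sites sit behind the same proxy, the same three lines repeat with a different peer name, IP address and domain. The helper below is only a sketch that prints such a block for pasting into squid.conf; the function name print_backend_block is made up, and port 80 and login=PASS are simply copied from the example above.

```shell
#!/bin/sh
# Print the three squid.conf lines for one backend site.
# Arguments: peer name, backend IP address, site domain (all caller-supplied).
print_backend_block() {
    name=$1 ip=$2 domain=$3
    cat <<EOF
cache_peer $ip parent 80 0 no-query originserver name=$name login=PASS
acl sites_$name dstdomain $domain
cache_peer_access $name allow sites_$name
EOF
}

# Reproduces the block used in this post:
#   print_backend_block server_1 10.88.26.12 web425.example.co.in
```

A second site would be added with something like print_backend_block server_2 10.88.26.13 web426.example.co.in, where the name, address and domain are hypothetical values. Each site still needs its own http_access rule.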

Step(7c) : Giving http access to backend site

acl http_accl_host1 dst web425.example.co.in

http_access allow http_accl_host1

The first line defines an acl that matches the backend server’s address, and the second line allows http access for requests that match it.

Step8 : Check the squid config file for syntax errors by using the following commands

#/opt/squid/sbin/squid -k check
#/opt/squid/sbin/squid -k parse

If your system didn’t throw any error then proceed to the next step; otherwise debug the errors, or write a comment on this post and I will respond.

Step9 : Now create the cache and swap directories

#mkdir /opt/squid/var/logs/cache

#/opt/squid/sbin/squid -z

Just a clipped output for reference…

[root@ser1 ~]# /opt/squid/sbin/squid -z

2009/12/28 19:27:57| Creating Swap Directories

[root@ser1 ~]# tail -f /opt/squid/var/logs/cache.log

Memory usage for squid via mallinfo():

Total space in arena : 2516 KB

Ordinary blocks : 2454 KB 11 blks

Small blocks : 0 KB 6 blks

Holding blocks : 236 KB 1 blks

Free Small blocks : 0 KB

Free Ordinary blocks : 61 KB

Total in use : 2690 KB 98%

Total free : 61 KB 2%

2009/12/28 15:12:16| Squid Cache (Version 2.6.STABLE23): Exiting normally.

Step10 : Working on DNS related stuff.

Step(10a) : Add the backend servers to the /etc/hosts file

10.88.26.12 web425.example.co.in web425

Step(10b) : Remove any entries from the /etc/resolv.conf file to disable queries to the DNS server.

The step below is an important step in configuring a reverse proxy.

Step(10c) : Add entries for the backend sites in your DNS servers, so that anyone accessing a site from outside your network is directed to your reverse proxy server, which serves the backend web content.

So in DNS the web425.example.co.in entry should point to your reverse proxy server’s IP address.
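
Because the proxy and the outside world must resolve the same name to different addresses, it is worth checking what the proxy's hosts file says. A minimal sketch: the function name hosts_lookup is made up, and it only reads a hosts-format file, it does not query DNS.

```shell
#!/bin/sh
# Print the IP address that a hosts-format file maps a name to.
# Arguments: hostname, optional file (defaults to /etc/hosts).
hosts_lookup() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # stop at a trailing comment
                if ($i == name) { print $1; exit }
            }
        }
    ' "${2:-/etc/hosts}"
}

# On the proxy this should print the backend address (10.88.26.12 here):
#   hosts_lookup web425.example.co.in
```

From a machine outside your network, the same name should resolve to the reverse proxy address instead; a tool such as getent hosts or dig can confirm that.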

Step11 : Change the ownership of /opt/squid to the squid user

#chown -R squid:squid /opt/squid

Step12 : Starting Squid reverse proxy

#/opt/squid/sbin/squid -D

-D is the option that disables squid’s initial DNS tests at startup.

Heartbeat Clustering

I learnt Heartbeat clustering a long time back, around March 2008, but until now I had never implemented it on production servers. This is my first attempt, and I was successful in implementing a two-node fail-over cluster. Clustering is a very complex and advanced topic which I cannot cover in one post. In this post I will give you some basics of clustering, the advantages of clustering and the configuration of a simple fail-over cluster.
Let’s start.
What is a cluster anyway?
Ans : A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability (source: www.wikipedia.org).
Cluster terminology.
Node : One of the systems/computers which participates with other systems to form a cluster.
Heartbeat : A pulse-like signal sent from every node at regular intervals in a UDP packet, so that each node learns whether the other nodes are available. It is a door-knocking activity, similar to pinging a system.
Floating IP or Virtual IP : The IP address assigned to the cluster, through which users access the services. Client requests arrive at this IP, and clients do not know the actual back-end IP addresses of the nodes. The virtual IP hides the effect of individual nodes going down.

Master node : The node on which the services run most of the time in a high availability cluster.
Slave node : The node which takes over in a high availability cluster when the master node is down. It takes over the role of serving users when it stops receiving heartbeat pulses from the master, and automatically gives back control when the master is up and running again (when auto_failback is on).
Types of Clusters:
Clusters can be divided into two main types
1. High availability :
These clusters are configured where there should be no downtime. If one node in the cluster goes down, a second node takes over serving users without interrupting the service, with a target availability of five nines, i.e. 99.999%.

2. Load balancing :
These clusters are configured where there are high loads from users. The advantage of load balancing is that users do not see delays in their requests, because the load is shared by two or more nodes in the cluster instead of a single system.

Advantages of Cluster :
1. Reduced Cost : It is cheaper to buy 10 ordinary servers and cluster them than to buy a single high-end server such as a blade server, and together the cluster can do more work than the single high-end machine.
2. Processing Power
3. Scalability
4. Availability
Configuration file details :
Three main configuration files :
· /etc/ha.d/authkeys
· /etc/ha.d/ha.cf
· /etc/ha.d/haresources
Some other configuration files/folders to know :
/etc/ha.d/resource.d : The files in this directory are very important; they are the scripts used to start/stop/restart the services run by this Heartbeat cluster.
Before configuring the Heartbeat cluster, note the points below.
Note1 : The contents of the ha.cf file are the same on all the nodes in a cluster, except for the ucast and bcast directives.
Note2 : The contents of the authkeys and haresources files are exact replicas on all the nodes in a cluster.
Note3 : A cluster is used to provide a service with high availability/high performance; that service may be a web server, reverse proxy or a database.
Test scenario setup :
1. The cluster configuration which I am going to show is a two-node cluster with failover capability for a Squid reverse proxy.
2. For the Squid reverse proxy configuration, see the "Configuration of SQUID Reverse Proxy" post.
3. Node details are as follows
3. Node details are as follows

Node1 :
IP address (eth0) : 10.77.225.21
Subnet mask (eth0) : 255.0.0.0
Default gateway (eth0) : 10.0.0.1
IP address (eth1) : 192.168.0.1 (used to send heartbeat signals to the other node)
Subnet mask (eth1) : 255.255.255.0
Default gateway (eth1) : None (leave the default gateway blank for this interface).

Node2 :
IP address (eth0) : 10.77.225.22
Subnet mask (eth0) : 255.0.0.0
Default gateway (eth0) : 10.0.0.1
IP address (eth1) : 192.168.0.2 (used to send heartbeat signals to the other node)
Subnet mask (eth1) : 255.255.255.0
Default gateway (eth1) : None (leave the default gateway blank for this interface).
4. Floating IP address : 10.77.225.20

Let’s start the configuration of the Heartbeat cluster. For better understanding, every step in this Heartbeat cluster configuration is divided into two parts:
1. (configurations on node1)
2. (configurations on node2)

Step1 : Install the following packages in the same order which is shown
Step1(a) : Install the following packages on node1
#rpm -ivh heartbeat-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-ldirectord-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-pils-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-stonith-2.1.2-2.i386.rpm
Step1(b) : Install the following packages on node2
#rpm -ivh heartbeat-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-ldirectord-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-pils-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-stonith-2.1.2-2.i386.rpm

Step2 : By default the main configuration files (ha.cf, haresources and authkeys) are not present in the /etc/ha.d/ folder; we have to copy these three files from /usr/share/doc/heartbeat-2.1.2 to /etc/ha.d/
Step2(a) : Copy main configuration files from /usr/share/doc/heartbeat-2.1.2 to /etc/ha.d/ on node 1
#cp /usr/share/doc/heartbeat-2.1.2/ha.cf /etc/ha.d/
#cp /usr/share/doc/heartbeat-2.1.2/haresources /etc/ha.d/
#cp /usr/share/doc/heartbeat-2.1.2/authkeys /etc/ha.d/

Step2(b) : Copy main configuration files from /usr/share/doc/heartbeat-2.1.2 to /etc/ha.d/ on node 2
#cp /usr/share/doc/heartbeat-2.1.2/ha.cf /etc/ha.d/
#cp /usr/share/doc/heartbeat-2.1.2/haresources /etc/ha.d/
#cp /usr/share/doc/heartbeat-2.1.2/authkeys /etc/ha.d/

Step3 : Edit ha.cf file
#vi /etc/ha.d/ha.cf
Step3(a) : Edit ha.cf file as follows on node1
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 25
warntime 10
initdead 50
udpport 694
bcast eth1
ucast eth1 192.168.0.2
auto_failback on
node rp1.linuxnix.com
node rp2.linuxnix.com
Step3(b) : Edit ha.cf file as follows on node2
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 25
warntime 10
initdead 50
udpport 694
bcast eth1
ucast eth1 192.168.0.1
auto_failback on
node rp1.linuxnix.com
node rp2.linuxnix.com

Let me explain each entry in detail:
debugfile : The file where detailed debug information for your heartbeat cluster is stored, which is very useful for any kind of troubleshooting.
logfile : The file where general logging of the heartbeat cluster takes place.
logfacility : The syslog facility used for heartbeat log messages (local0 in this example). Set it to none to disable logging through syslog.
keepalive : The time interval between heartbeat packets, which the nodes use to check the availability of other nodes. In this example I set it to two seconds (keepalive 2).

deadtime : A node is declared dead if the other node has not received a heartbeat from it for this many seconds (25 in this example).
warntime : Time in seconds before issuing a "late heartbeat" warning in the logs.
initdead : With some configurations, the network takes some time to start working after a reboot. This is a separate "deadtime" to handle that case. It should be at least twice the normal deadtime.

udpport : The port used by heartbeat to send heartbeat packets/signals to other nodes to check availability (in this example I used the default port, 694).

bcast : Specifies the device/interface on which to broadcast the heartbeat packets.
ucast : Specifies the device/interface on which to unicast the heartbeat packets, followed by the IP address of the peer node that should receive them.
auto_failback : This option determines whether a resource will automatically fail back to its "primary" node, or remain on whatever node is serving it until that node fails or an administrator intervenes. In my example I have set it to on, which means that when the failed node comes back online, control is given back to it automatically. Let me put it this way. I have two nodes, node1 and node2. Node1 is a high-end machine and node2 only serves temporarily when node1 goes down. Suppose node1 goes down: node2 takes control and serves the service, and it checks periodically whether node1 has started. Once it finds that node1 is up, control is given back to node1.
node : Specifies the nodes participating in the cluster. In my cluster only two nodes are participating (rp1 and rp2), so I list just those entries. If more nodes participate in your implementation, list all of them.
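
The timing directives depend on each other, so a quick sanity check before starting heartbeat can save a confusing failover later. The sketch below checks the rule stated above (initdead at least twice deadtime) plus one assumption of mine, that warntime should be lower than deadtime. The function name check_ha_timings is made up, and it assumes the values are whole seconds.

```shell
#!/bin/sh
# Sanity-check the timing directives in an ha.cf file.
# Argument: path to ha.cf (defaults to /etc/ha.d/ha.cf).
# Assumption: values are plain seconds, without an "ms" suffix.
check_ha_timings() {
    awk '
        $1 == "deadtime" { dead = $2 }
        $1 == "warntime" { warn = $2 }
        $1 == "initdead" { init = $2 }
        END {
            if (dead == "" || warn == "" || init == "") {
                print "missing deadtime, warntime or initdead"; exit 1
            }
            if (warn + 0 >= dead + 0) {
                print "warntime should be lower than deadtime"; exit 1
            }
            if (init + 0 < 2 * dead) {
                print "initdead should be at least 2 x deadtime"; exit 1
            }
            print "timings OK"
        }' "${1:-/etc/ha.d/ha.cf}"
}

# Typical use on each node:
#   check_ha_timings /etc/ha.d/ha.cf
```

With the values used in this post (deadtime 25, warntime 10, initdead 50) the check prints timings OK.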

Step4 : Edit haresources file
#vi /etc/ha.d/haresources
Step4(a) : Add the entry below as the last line of this file on node1
rp1.linuxnix.com 10.77.225.20 squid
Step4(b) : Add the same entry as the last line of this file on node2
rp1.linuxnix.com 10.77.225.20 squid
Explanation of each entry :
rp1.linuxnix.com is the main node in the cluster.
10.77.225.20 is the floating IP address of this cluster.

squid : This is the service offered by the cluster. Note that this is the name of a script file located in /etc/ha.d/resource.d/.
Note : By default the squid script file will not be there in that folder; I created it according to my squid configuration.

What does this script file actually contain?
Ans : It is just a start/stop/restart script for the particular service, so that the heartbeat cluster can take care of starting/stopping/restarting the service (here it is squid).
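
The script itself is not shown here. As a rough sketch of what such a resource script could look like, assuming the /opt/squid install path and the -D start option used in the Squid post, something like the following would do. The function name squid_resource and the exact status wording are assumptions; check the resource script conventions of your Heartbeat version before using it.

```shell
#!/bin/sh
# Sketch of /etc/ha.d/resource.d/squid (hypothetical; adjust to your install).
# Heartbeat calls the script with start or stop; status is used for checks.
SQUID_BIN=${SQUID_BIN:-/opt/squid/sbin/squid}

squid_resource() {
    case "$1" in
        start)   "$SQUID_BIN" -D ;;
        stop)    "$SQUID_BIN" -k shutdown ;;
        status)  # -k check signals the running process without stopping it
                 if "$SQUID_BIN" -k check >/dev/null 2>&1; then
                     echo "running"
                 else
                     echo "stopped"
                 fi ;;
        restart) squid_resource stop; sleep 5; squid_resource start ;;
        *)       echo "Usage: squid {start|stop|status|restart}" >&2
                 return 1 ;;
    esac
}

# As a standalone script, the last line would be:
#   squid_resource "$1"
```

Remember to make the file executable and to copy the same script to both nodes, since either node may have to start the service.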

Step5 : Edit authkeys file. The authkeys configuration file contains information for Heartbeat to use when authenticating cluster members. It must not be readable or writeable by anyone other than root, so change the permissions of the file to 600 on both nodes.

Two kinds of lines are required in the authkeys file:
A line which says which key to use in signing outgoing packets.
One or more lines defining how incoming packets might be signed.
Step5(a) : Edit authkeys file on node1
#vi /etc/ha.d/authkeys
auth 2
#1 crc
2 sha1 HI!
#3 md5 Hello!
Now save and exit the file
Step5(b) : Edit authkeys file on node2
#vi /etc/ha.d/authkeys
auth 2
#1 crc
2 sha1 HI!
#3 md5 Hello!
Now save and exit the file
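
A short fixed string such as HI! is fine for a test setup, but on a real network a random key is safer. The sketch below writes an authkeys file with a random sha1 key and 600 permissions. The function name make_authkeys is made up, and it uses key index 1 instead of 2, which works as long as the auth line points at the same index.

```shell
#!/bin/sh
# Write an authkeys file with a random sha1 key and owner-only permissions.
# Argument: output path (defaults to /etc/ha.d/authkeys).
make_authkeys() {
    out=${1:-/etc/ha.d/authkeys}
    # 32 random bytes, printed as 64 hex characters
    key=$(dd if=/dev/urandom bs=32 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    ( umask 077
      printf 'auth 1\n1 sha1 %s\n' "$key" > "$out" )
    chmod 600 "$out"
}

# Typical use on node1, then copy the file unchanged to node2:
#   make_authkeys /etc/ha.d/authkeys
```

Generate the file once and copy it to the other node; running the function separately on each node would give them different keys and they would reject each other's packets.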

Step6 : Edit /etc/hosts file to give entries of hostnames for the nodes

Step6(a) : Edit /etc/hosts file on node1 as below

10.77.225.21 rp1.linuxnix.com rp1

10.77.225.22 rp2.linuxnix.com rp2

Step6(b) : Edit /etc/hosts file on node2 as below

10.77.225.21 rp1.linuxnix.com rp1

10.77.225.22 rp2.linuxnix.com rp2

Step7 : Start Heartbeat cluster
Step7(a) : Start heartbeat cluster on node1
#service heartbeat start
Step7(b) : Start heartbeat cluster on node2
#service heartbeat start

Checking your Heartbeat cluster:
If your heartbeat cluster is running fine, a virtual Ethernet interface (eth0:0) carrying the floating IP 10.77.225.20 is created on node1.
Clipped output of my first node
# ifconfig

eth0 Link encap:Ethernet HWaddr 00:02:A5:4C:AF:8E
inet addr:10.77.225.21 Bcast:10.77.231.255 Mask:255.255.248.0
inet6 addr: fe80::202:a5ff:fe4c:af8e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5714248 errors:0 dropped:0 overruns:0 frame:0
TX packets:19796 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1533278899 (1.4 GiB) TX bytes:4275200 (4.0 MiB)
Base address:0x5000 Memory:f7fe0000-f8000000
eth0:0 Link encap:Ethernet HWaddr 00:02:A5:4C:AF:8E
inet addr:10.77.225.20 Bcast:10.77.231.255 Mask:255.255.248.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x5000 Memory:f7fe0000-f8000000
eth1 Link encap:Ethernet HWaddr 00:02:A5:4C:AF:8F
inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::202:a5ff:fe4c:af8f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:145979 errors:0 dropped:0 overruns:0 frame:0
TX packets:103753 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:38966724 (37.1 MiB) TX bytes:27640765 (26.3 MiB)
Base address:0x5040 Memory:f7f60000-f7f80000
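
To see which node currently holds the floating IP without reading the whole listing, the output can be filtered. A minimal sketch: the function name has_address is made up, and it assumes a grep that supports the -w option (GNU and BSD grep both do).

```shell
#!/bin/sh
# Succeed if the given IP address appears in the interface listing on stdin.
# -F treats the dots literally, -w avoids matching 10.77.225.201 for .20.
has_address() {
    grep -qwF -- "$1"
}

# Typical use on each node:
#   ifconfig | has_address 10.77.225.20 && echo "floating IP is on this node"
```

During a failover test the message should appear on node2 shortly after node1 is stopped, and move back to node1 when it returns, since auto_failback is on.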
Try accessing the site through the floating IP in your browser to check whether Squid is working fine. Please follow the coming posts on how to troubleshoot a heartbeat cluster.