Suresh Kumar Pakalapati's Linux Administration: TroubleShoot


Thursday, January 26, 2012

Troubleshooting Using dmesg Command in Unix and Linux

During the boot process, the kernel is loaded into memory and takes control of the entire system.

As the system boots, the kernel prints a number of messages to the screen describing the hardware devices it detects.

These messages are stored in the kernel ring buffer, a fixed-size buffer in which new messages overwrite the oldest ones once it is full. After the system has booted, you can view them with the dmesg command.

1. View the Boot Messages

By running the dmesg command, you can view the hardware detected during the boot process and its configuration details. dmesg displays a lot of useful information. Browse through it line by line and try to understand what each message means. Once you are familiar with the kind of messages it displays, you will find it helpful for troubleshooting when you encounter an issue.

# dmesg | more
Bluetooth: L2CAP ver 2.8
eth0: no IPv6 routers present
bnx2: eth0 NIC Copper Link is Down
usb 1-5.2: USB disconnect, address 5
bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex

As we discussed earlier, you can also view hardware information using dmidecode.
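When troubleshooting, the lines that usually matter are the ones reporting errors or links going down. The sketch below filters dmesg output for a few such keywords. The function name dmesg_problems and the keyword list are my own assumptions, not anything dmesg defines, so adjust them to your hardware.

```shell
# dmesg_problems (hypothetical helper): print the dmesg lines that look
# like trouble. The keyword list is an assumption; tune it for your system.
dmesg_problems() {
    # -i: case-insensitive, -E: extended regex so "|" means "or"
    grep -i -E 'error|fail|warn|timeout|link is down'
}

# Usage on a live system (reading dmesg may require root):
#   dmesg | dmesg_problems
```

For the sample output above, only the "bnx2: eth0 NIC Copper Link is Down" line would be printed.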

2. View Available System Memory

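You can also see how much memory the kernel found at boot in the dmesg messages, as shown below. (This is the memory available at boot, not the current free memory; use the free command for that.)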

# dmesg | grep Memory
Memory: 57703772k/60817408k available (2011k kernel code, 1004928k reserved, 915k data, 208k init)
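The numbers in that line are in kilobytes: 57703772k available out of 60817408k, or roughly 55 GiB out of 58 GiB. As a sketch, the hypothetical helper below pulls the two values out of the Memory: line and prints them in MiB. It assumes the "A k/B k" layout shown above; newer kernels may print a capital K and prefix the line with a timestamp, so both are tolerated.

```shell
# mem_from_dmesg (hypothetical helper): convert the
# "Memory: <avail>k/<total>k available" boot line into MiB.
mem_from_dmesg() {
    awk '{
        # Find the "Memory:" field wherever it is (a timestamp may precede it).
        for (i = 1; i < NF; i++) {
            if ($i == "Memory:") {
                # The next field looks like "57703772k/60817408k".
                split($(i + 1), parts, "/")
                avail = parts[1]; total = parts[2]
                sub(/[kK]$/, "", avail); sub(/[kK]$/, "", total)
                # %d truncates the kilobyte-to-MiB division.
                printf "available=%d MiB total=%d MiB\n", avail / 1024, total / 1024
                exit
            }
        }
    }'
}

# Usage:  dmesg | mem_from_dmesg
```

For the sample line above, it prints available=56351 MiB total=59392 MiB.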

3. View Ethernet Link Status (UP/DOWN)

In the example below, dmesg shows that the eth0 link came up during boot.

# dmesg  | grep eth
eth0: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem 96000000, IRQ 169, node addr e4:1f:13:62:ff:58
eth1: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem 98000000, IRQ 114, node addr e4:1f:13:62:ff:5a
eth0: Link up
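If a link flaps, dmesg keeps every Up and Down message, as in the bnx2 lines from section 1, so the last message per interface is the one that reflects its current state. The sketch below reports only that latest state. It assumes the traditional ethN interface names used in this post and the "Link up" / "Link is Up" / "Link is Down" wording shown above; drivers word these messages differently, so treat the patterns and the helper name link_state as assumptions.

```shell
# link_state (hypothetical helper): print the most recent up/down state
# seen in dmesg output for each ethN interface.
link_state() {
    awk '
        {
            # Only consider lines that mention an ethN interface.
            if (!match($0, /eth[0-9]+/)) next
            iface = substr($0, RSTART, RLENGTH)
            # Later messages overwrite earlier ones, so the last one wins.
            if ($0 ~ /[Ll]ink (is )?[Uu]p/) state[iface] = "up"
            else if ($0 ~ /[Ll]ink (is )?[Dd]own/) state[iface] = "down"
        }
        END { for (iface in state) print iface, state[iface] }
    ' | sort
}

# Usage:  dmesg | link_state
```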

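4. Check and Change the dmesg Buffer Size (/boot/config- file)

Linux allows you to change the default size of the dmesg buffer. The CONFIG_LOG_BUF_SHIFT parameter in the /boot/config-2.6.18-194.el5 file (or a similar file on your system) shows the size the running kernel was built with. Editing this file by itself changes nothing, because it is only a record of the kernel's build configuration. To change the size, either rebuild the kernel with a new CONFIG_LOG_BUF_SHIFT value or pass the log_buf_len= parameter (for example log_buf_len=1M) on the kernel command line at boot.

The value below is a power-of-2 exponent, so the buffer size in this example is 2^18 = 262144 bytes (256 KB). You can choose a buffer size based on your needs (SUSE / Red Hat).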

#  grep CONFIG_LOG_BUF_SHIFT  /boot/config-`uname -r`
CONFIG_LOG_BUF_SHIFT=18
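Because the value is an exponent, the byte count is 2 raised to CONFIG_LOG_BUF_SHIFT. The hypothetical helper below does that arithmetic for a given config file using a shell left shift (1 << N equals 2^N); the function name is my own, not a standard tool.

```shell
# log_buf_bytes (hypothetical helper): read CONFIG_LOG_BUF_SHIFT=N from a
# kernel config file and print the buffer size in bytes (2^N).
log_buf_bytes() {
    shift_val=$(sed -n 's/^CONFIG_LOG_BUF_SHIFT=\([0-9][0-9]*\)$/\1/p' "$1")
    # Fail if the option is missing from the file.
    [ -n "$shift_val" ] || return 1
    echo $((1 << shift_val))
}

# Usage:  log_buf_bytes /boot/config-$(uname -r)
```

With CONFIG_LOG_BUF_SHIFT=18, as above, it prints 262144.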

5. Clear Messages in dmesg Buffer

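Sometimes you might want to clear the dmesg messages before your next reboot. Running dmesg -c as root prints the current contents of the buffer and then clears it, so a following dmesg shows nothing until new messages arrive. On newer util-linux versions, dmesg -C clears the buffer without printing it.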

# dmesg -c

# dmesg

6. dmesg timestamp: Date and Time of Each Boot Message in dmesg

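By default, dmesg does not show the date and time of each message. On kernels that timestamp their messages, dmesg shows only the number of seconds since boot, such as [    0.014681] in the example below. However, the wall-clock date and time of each boot message are recorded in the /var/log/kern.log file, as shown below.

For this, the klogd service must be enabled and configured to log kernel messages to /var/log/kern.log. In the example, kern.log.1 is the previous, rotated copy of that log in /var/log.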

# dmesg | grep "L2 cache"
[    0.014681] CPU: L2 cache: 2048K

# grep "L2 cache" kern.log.1
Oct 18 23:55:40 ubuntu kernel: [    0.014681] CPU: L2 cache: 2048K
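If kern.log is not available, you can approximate the wall-clock time yourself: the number in brackets is seconds since boot, so adding it to the boot time gives a Unix timestamp. The sketch below does that. It is only an approximation (the offset drifts if the machine has been suspended), the function name dmesg_epoch is hypothetical, and newer util-linux versions of dmesg offer a -T option that performs a similar conversion for you.

```shell
# dmesg_epoch (hypothetical helper): replace the "[ secs.usecs]" prefix of
# each dmesg line with an approximate Unix timestamp, given the boot time
# (seconds since the epoch) as $1. Lines without a prefix are skipped.
dmesg_epoch() {
    awk -v boot="$1" '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            stamp = substr($0, 2, RLENGTH - 2)   # text between the brackets
            rest = substr($0, RLENGTH + 2)       # message after "] "
            print int(boot + stamp), rest
        }
    '
}

# Usage on a live system: boot time = now minus uptime (first field of /proc/uptime).
#   boot=$(( $(date +%s) - $(cut -d. -f1 /proc/uptime) ))
#   dmesg | dmesg_epoch "$boot"
```

Each resulting number can then be formatted with GNU date, for example date -d @1319000140.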

Tuesday, August 23, 2011

Troubleshooting Incidents related to Server Unavailability

Troubleshooting Chart for Server Unavailability

Friday, July 15, 2011

Login as a root from GUI on Fedora 15

Fedora 15 has been released, and I am posting this solution for root lovers who like to log in as root. The Fedora 15 root login procedure is almost the same as the Fedora 14 one.

WARNING: It is not at all a good idea to log in as root from the GUI. It is DANGEROUS. But if you want to know how to log in as root from the GUI, follow the instructions below.
In Fedora 15 you cannot log in as root from the GUI. By default, only normal users are allowed to log in from the GUI.

I managed to log in as root from the GUI on Fedora 15. Follow these steps and you will be able to do the same.

To log in as root from the GUI in Fedora 15, you have to edit some files located in /etc/pam.d/.

Open a terminal from Applications -> System Tools -> Terminal.

Now log in as root from the terminal.

Step 1: [itsolutions@ask4itsolutions.com]$ su - root
Password:

Step 2: Go to the /etc/pam.d/ directory.

[root@ask4itsolutions]# cd /etc/pam.d/

First, take a backup of the gdm file:

cp gdm gdm.bkp (always take a backup; if anything goes wrong, you can restore the original file from it)

Step 3: Open the gdm file in your favourite editor. I am using vi.

[root@ask4itsolutions pam.d]# vi gdm

Find this line in the gdm file and comment it out (or remove it):
auth required pam_succeed_if.so user != root quiet

Step 4: Save the file and exit. (In Fedora 10, steps 1 to 4 are enough to log in as root from the GUI, but in Fedora 15 you need to edit one more file; otherwise you still cannot log in as root, even after editing the gdm file.)

Step 5: The additional file you need to edit is gdm-password. First, take a backup of it:

cp gdm-password gdm-password.bkp (always take a backup; if anything goes wrong, you can restore the original file from it)

Then open the gdm-password file in your favourite editor. Again, I am using vi.

[root@ask4itsolutions.com pam.d]# vi gdm-password

Find this line in the gdm-password file and comment it out (or remove it):
auth required pam_succeed_if.so user != root quiet

Step 6: Save the file and exit. Now log out and log in as the root user. You should now be able to log in as root from the GUI in Fedora 15.
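If you need to make this change on several machines, the edits in steps 3 and 5 can be scripted. The sketch below is a hypothetical helper, not part of Fedora: it keeps a .bkp copy as recommended above, then comments out the pam_succeed_if line with sed. It assumes the line looks like the one shown above (the amount of whitespace may vary), so check the file afterwards. The same warning applies: this enables a root GUI login.

```shell
# comment_out_root_block (hypothetical helper): back up a PAM file to
# FILE.bkp, then comment out the line that refuses root logins.
comment_out_root_block() {
    pam_file=$1
    # Keep the first backup; do not overwrite it on a second run.
    [ -e "$pam_file.bkp" ] || cp -p "$pam_file" "$pam_file.bkp" || return 1
    pam_tmp=$(mktemp) || return 1
    # Prefix the matching line with "#"; all other lines pass through unchanged.
    sed -E 's/^(auth[[:space:]]+required[[:space:]]+pam_succeed_if\.so[[:space:]]+user[[:space:]]*!=[[:space:]]*root[[:space:]]+quiet)/#\1/' \
        "$pam_file" > "$pam_tmp" && cat "$pam_tmp" > "$pam_file"
    rm -f "$pam_tmp"
}

# Usage, as root:
#   comment_out_root_block /etc/pam.d/gdm
#   comment_out_root_block /etc/pam.d/gdm-password
```

Writing the result back with cat, rather than moving a new file into place, keeps the original file's owner and permissions.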

Thursday, August 12, 2010

How To Kill Defunct Or Zombie Process

A "defunct" process is also known as a "zombie" process. A zombie is a dead process that remains on your system even though it has finished executing. In short, it is a dead process. Its memory has already been freed, but its entry (PID and exit status) stays in the process table. Having many defunct processes fills up the process table and can use up the available PIDs, which eventually prevents new processes from starting, so we have to get rid of defunct processes to keep the system stable.
Why are defunct processes created? Ans: Whenever a process ends, the memory used by that process is freed, but its entry stays in the process table until the parent process collects the child's exit status with a wait() call. Because of programming errors or bugs, some parents never do this, and their children remain in the process table. In other words, zombies are created when there is no proper communication between the parent process and the child process.
Some FAQs
1. How do I find a defunct process?
Ans: Grep for "defunct" in the ps -ef output:
#ps -ef | grep defunct
2. How can I kill a defunct process?
Ans: You can try the kill command:
#kill defunct-pid
3. Still not able to kill it?
Ans: kill -9 is the usual next attempt:
#kill -9 defunct-pid
However, a zombie has already exited, so no signal sent to it, not even SIGKILL, can remove it. It disappears only when its parent collects its exit status, which is why the next step targets the parent.
4. Still having an issue killing it?
Ans: Then kill its parent process. The zombie is then adopted by init, which reaps it:
#kill parent-id-of-defunct-pid
Then, if the parent is still running:
#kill -9 parent-id-of-defunct-pid
5. Still have a defunct process?
Ans: If defunct processes still remain, the last and final solution is to reboot your machine.

6. What is an orphan process?
Ans: An orphan process is a process that keeps running after its parent process has terminated. On Linux, orphans are adopted by init (PID 1), which collects their exit status when they finish.
7. What is the difference between orphan and defunct processes?
Ans: A defunct process is a dead process in which no execution is happening, whereas an orphan process is a live process that is still executing but no longer has its original parent process.
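Instead of grepping for the word defunct, you can ask ps for the process state directly: zombies have a STAT value starting with Z. The sketch below lists each zombie with its parent PID, which is the process you need to deal with in question 4. The helper name list_zombies is hypothetical; the ps options it relies on (-e, and -o with pid=, ppid=, stat=, comm=, where the trailing = suppresses the header) are standard procps options.

```shell
# list_zombies (hypothetical helper): read "pid ppid stat comm" lines and
# print "PID PPID COMMAND" for every process in state Z (zombie).
list_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage on a live system:
#   ps -eo pid=,ppid=,stat=,comm= | list_zombies
```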

I have a system that creates defunct processes every day, and I cannot sit and kill these processes manually on a daily basis.

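How do I get rid of this problem?
Ans: Write a shell script that greps for defunct processes and deals with them as described above (since zombies cannot be killed directly, that means acting on their parent processes), and schedule the script in crontab.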
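A sketch of such a cron script follows. Because a zombie cannot be killed directly, it sends SIGCHLD to each zombie's parent, which may prompt a parent whose SIGCHLD handler calls wait() to reap its children; a parent that ignores the signal still needs the manual steps from the FAQ. The script name reap-zombies.sh, its path, the DRY_RUN variable, and the --run flag are all hypothetical choices for this example.

```shell
#!/bin/sh
# reap-zombies.sh (hypothetical): ask the parents of zombie processes to
# reap them by sending SIGCHLD. Set DRY_RUN=1 to only print the commands.

signal_zombie_parents() {
    # stdin: "pid ppid stat" lines, as printed by: ps -eo pid=,ppid=,stat=
    # Print each parent of a zombie once, then signal it.
    awk '$3 ~ /^Z/ && !seen[$2]++ { print $2 }' | while read -r ppid; do
        # Skip init (PID 1) and PID 0: init reaps its own children.
        [ "$ppid" -gt 1 ] || continue
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "kill -s CHLD $ppid"
        else
            kill -s CHLD "$ppid" 2>/dev/null
        fi
    done
}

# Only act when invoked with --run (for example, from cron).
if [ "${1:-}" = "--run" ]; then
    ps -eo pid=,ppid=,stat= | signal_zombie_parents
fi
```

A crontab entry such as `*/30 * * * * /usr/local/sbin/reap-zombies.sh --run` (path hypothetical) would run it every 30 minutes. Try it with DRY_RUN=1 first to see which parents it would signal.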