Wednesday, September 14, 2011

Understanding EXT3 File System and Data Recovery using ext3grep


Most Linux servers today run the ext3 file system. We have come across many incidents where files were lost to a simple rm -rf command, run either accidentally or sometimes even intentionally by others. There is a common misconception that such files cannot be retrieved on ext3. In fact, it is often possible to retrieve the data.
On further investigation, I found a number of tools that can help recover deleted data from an ext3 file system. Magicrescue, ext3grep, TestDisk, ddrescue, PhotoRec and ext3undel are a few examples.
I was looking for a tool that can recover all file types and is designed specifically for the ext3 file system. One of the best tools I found, and the one I would like to take you through, is ext3grep.

Why ext3grep

This tool provides a good set of command-line features and also gives a good insight into the ext3 file system. Before proceeding further, let's look at the merits and demerits of this tool.
Merits:
  • Almost any file format can be recovered.
  • You can recover your files based on the date of deletion.
  • Even large files can be retrieved using this tool.
De-Merits:
  • The tool works only on an unmounted file system.
  • The machine may crash during recovery when a huge amount of data is being recovered.
 

Points to be ensured before the Recovery Process

  • Make sure that the affected partition is not the root (system) partition, because the partition must not be mounted during recovery.
  • Until you can unmount it, remount the partition in read-only mode so that the i-nodes and data blocks do not get overwritten with new data, which would make the files harder or impossible to recover.
  • Install ext3grep on a different partition, and make sure that this partition has enough free space to hold the recovered files.
 

Download and Install ext3grep in your File system

Download the source code from: http://ext3grep.googlecode.com/files/ext3grep-0.9.0.tar.gz or check it out through svn access. Follow the steps below for the installation from svn:
mkdir ext3grep
svn checkout http://ext3grep.googlecode.com/svn/trunk/ ext3grep
cd ext3grep
./configure --prefix=/opt/ext3grep      # Make sure that it does not get installed in
                                        # the affected partition
make
make install

The Basics of the ext3 File system:

Let’s take a look at the basics of the ext3 file system that ext3grep relies on. Ext3 is an ext2 file system with journaling added. Journaling means keeping a log of transactions so that, in case of a crash, the file system can be brought back to a consistent earlier state. All transaction information is passed to the journaling block device layer (JBD), which is independent of the ext3 file system.
The ext3 partition consists of a set of groups which are created during disk formatting. Each group consists of a super block, a group descriptor, a block bitmap, an i-node bitmap, an i-node table and data blocks. A simple layout can be specified as follows:
       ,---------+---------+---------+---------+---------+---------,
       | Super   | FS      | Block   | Inode   | Inode   | Data    |
       | block   | desc.   | bitmap  | bitmap  | table   | blocks  |
       `---------+---------+---------+---------+---------+---------'
You can get the total number of groups in the particular partition using the following command:
./ext3grep  /dev/hda2 --superblock | grep 'Number of groups'
Number of groups: 24
Each group consists of a set of fixed size blocks which could be of 4096, 2048 or 1024 bytes in size.
Some of the basic terms associated with the ext3 file system are:
Superblock:
The superblock is a header that tells the kernel about the layout of the file system. It contains the block size, the block count and several other such details. The first superblock is the one that is used when the file system is mounted.
To get information related to the blocks per group, use the command:
/opt/ext3grep/bin/ext3grep  /dev/hda2 --superblock | grep 'blocks per group'
Number of blocks per group: 32768
To get the block size details from the superblock, use the command:
/opt/ext3grep/bin/ext3grep /dev/hda5 --superblock|grep size
Block size: 4096
Fragment size: 4096
You can get a complete list of the superblock details using the command:
/opt/ext3grep/bin/ext3grep /dev/hda5 --superblock
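The superblock values shown above are enough to work out how large a group is and how many groups a file system has. The following is only a rough sketch of that arithmetic: the function names are my own, the total block count is a made-up example value, and the group count assumes a 4096-byte block size, where the first data block is block 0.

```shell
# Sketch only: derive group size and group count from superblock values.

group_size_bytes() {
    # $1 = blocks per group, $2 = block size in bytes
    echo $(( $1 * $2 ))
}

group_count() {
    # $1 = total blocks in the file system (hypothetical input)
    # $2 = blocks per group
    # Round up: a partial last group still counts as a group.
    echo $(( ($1 + $2 - 1) / $2 ))
}

group_size_bytes 32768 4096   # prints 134217728 (128 MiB per group)
group_count 786432 32768      # prints 24 for this made-up block count
```

With 32768 blocks of 4096 bytes each, one group therefore covers 128 MiB of the partition.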
Group Descriptor:
The next block is the group descriptor, which stores information about each group. Each group descriptor holds pointers to the i-node table and to the allocation bitmaps for the i-nodes and data blocks.
Allocation Bitmap:
An allocation bitmap is a list of bits that records which blocks and which i-nodes are in use, so that space for files can be allocated efficiently.
I-nodes:
Each file is associated with one i-node, which holds various information (metadata) about the file. The data of the file is not stored in the i-node itself; instead, the i-node points to the locations of the data blocks on the disk.
I-nodes are stored in the i-node tables. The command df -i gives the total number of i-nodes in the partition, and the command ls -i filename gives the i-node number of the respective file.
df -i | grep /dev/hda5
Filesystem  Inodes          IUsed   IFree           IUse% Mounted on
/dev/hda5    18233952   33671   18200281    1%      /
-------------------------------------------------
ll -i ext3grep
inode no   permission owner  group     size in bytes     date         filename
6350788 -rwxr-xr-x 1 root      root       2765388       Oct  5  23:49 ext3grep
Directories:
In the ext3 file system, each directory is a file. The directory uses an i-node, and the data blocks of this i-node hold the contents of the directory. Each directory has a list of directory entries, and each entry associates one file name with one i-node number. You can get the i-node number of a directory using the command:
ll -id bin
6350787 drwxr-xr-x 2 root root 4096 Oct  5 23:49 bin
Superblock Recovery:
Sometimes the primary superblock gets corrupted and the layout information of the file system is lost. In this case we can recover by using one of the backup superblocks.
 First, list the backup superblocks (the -h option of dumpe2fs prints only the superblock fields, so it is left out here):
dumpe2fs /dev/hda5 | grep -i superblock
Primary superblock at 0, Group descriptors at 1-5
 Backup superblock at 32768, Group descriptors at 32769-32773
 Backup superblock at 98304, Group descriptors at 98305-98309
 Backup superblock at 163840, Group descriptors at 163841-163845
 Backup superblock at 229376, Group descriptors at 229377-229381
 Backup superblock at 294912, Group descriptors at 294913-294917
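 Next, find the position of the backup superblock in the units that mount expects.
The sb= option of mount counts in 1 KB units, while the block size of ext3 will usually be 4096 bytes unless defined manually during file system creation. The block number therefore has to be multiplied by 4 (block size / 1024).
position = backup superblock block number * 4
32768 * 4 = 131072
Now, mount the file system using the alternate superblock.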
mount  -o sb=131072 /dev/hda5 /main
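The conversion above can be wrapped in a small helper so that it also works for other block sizes. This is just a sketch; sb_option is a name I made up for illustration and is not part of mount or ext3grep.

```shell
# Sketch only: convert a backup superblock location, given in file system
# blocks (as printed by dumpe2fs), to the 1 KB units used by mount -o sb=.

sb_option() {
    # $1 = backup superblock block number
    # $2 = file system block size in bytes (1024, 2048 or 4096)
    echo $(( $1 * ($2 / 1024) ))
}

sb_option 32768 4096   # prints 131072, the value used in the mount command
sb_option 98304 4096   # prints 393216, for the next backup in the list
```

If the first backup superblock is damaged as well, the same conversion can be applied to the next location in the dumpe2fs list.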
 
 

Some important commands for the partition

To find the group to which a particular i-node belongs:
The number of i-nodes per group can be read from the ext3grep --superblock output. The group number is then given by the following integer division:
group = (inode_number - 1) / inodes_per_group
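As a quick sketch, the formula can be evaluated with shell integer arithmetic. The function names are my own, and the value of 16000 i-nodes per group is only an assumed example; use the figure reported for your own partition.

```shell
# Sketch only: locate an i-node within the group layout.
# 16000 i-nodes per group is an assumed example value.

inode_group() {
    # $1 = i-node number (numbering starts at 1), $2 = i-nodes per group
    echo $(( ($1 - 1) / $2 ))
}

inode_index() {
    # Position of the i-node inside its group's i-node table (starts at 0)
    echo $(( ($1 - 1) % $2 ))
}

inode_group 272 16000      # prints 0
inode_group 112001 16000   # prints 7
inode_index 112001 16000   # prints 0 (first i-node of group 7)
```

The subtraction of 1 is needed because i-node numbers start at 1, while groups and table positions are counted from 0.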
To find the block to which the i-node belongs, use the command:
/opt/ext3grep/bin/ext3grep  /dev/hda2  --inode-to-block 272
Inode 272 resides in block 191 at offset 0x780.
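To see where a result like this comes from, the calculation can be reproduced by hand. The sketch below is not how ext3grep is implemented; it only repeats the arithmetic. It assumes 128-byte i-nodes and 4096-byte blocks, and the i-node table start (block 183) and the 16000 i-nodes per group are values I picked so that the example matches the output above. On a real partition these come from the superblock and the group descriptor.

```shell
# Sketch only: compute the block and byte offset that hold a given i-node.
# All numeric inputs in the example call are assumptions (see the text above).

inode_location() (
    # $1 = i-node number          $2 = i-nodes per group
    # $3 = i-node size in bytes   $4 = block size in bytes
    # $5 = first block of the i-node table of the i-node's group
    index=$(( ($1 - 1) % $2 ))      # position inside the group's table
    byte=$(( index * $3 ))          # byte offset from the start of the table
    printf '%d 0x%x\n' $(( $5 + byte / $4 )) $(( byte % $4 ))
)

inode_location 272 16000 128 4096 183   # prints: 191 0x780
```

I-node 272 is entry 271 of the table, 271 * 128 = 34688 bytes into it, which is 8 full blocks plus 1920 (0x780) bytes.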
To find the journal i-node of the drive:
/opt/ext3grep/bin/ext3grep  /dev/hda2  --superblock  | grep 'Inode number of journal file'
Inode number of journal file: 8

The Recovery Process

The first step in the recovery process is to list the file names found on the partition. You can use the command:
/opt/ext3grep/bin/ext3grep  /dev/hda2  --dump-names
Before working on the recovery process make sure that you have unmounted the partition.
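The unmount requirement can be checked in a script before ext3grep is started. Here is a minimal sketch, assuming a Linux system where /proc/mounts lists the mounted devices; is_mounted is a helper name of my own. Note that the kernel may list a device under a different name (for example /dev/root), so a negative answer is only a hint.

```shell
# Sketch only: check whether a device appears in the mount table.

is_mounted() {
    # $1 = device name, $2 = mount table to read (defaults to /proc/mounts)
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

if is_mounted /dev/hda2; then
    echo "/dev/hda2 is still mounted; unmount it before running ext3grep" >&2
fi
```

The second argument exists only so that the function can be tried out against a saved copy of the mount table.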
To Recover all files:
The following command recovers all the files into a new directory named RESTORED_FILES, which is created in the current working directory. The current working directory should be on a different drive or partition from the one being recovered.
/opt/ext3grep/bin/ext3grep  /dev/hda2  --restore-all
After this, you will have a copy of all the files in the directory RESTORED_FILES.
To Recover a Single File:
If you want to recover a single file, first find the i-node of the directory that contained the file. For example, suppose I accidentally deleted a file named backup.sql which was in /home2. First I need to find the i-node of /home2:
ll -id /home2/
2 drwxr-xr-x  5 root root 4096 Aug 27 09:21 /home2/
Here the first entry '2' is the i-node of /home2. Now I can use ext3grep to list the contents of /home2.
/opt/ext3grep/bin/ext3grep  /dev/hda2  --ls --inode 2
The first block of the directory is 683. Inode 2 is directory “”.
Directory block 683:
          .-- File type in dir_entry (r=regular file, d=directory, l=symlink)
          |          .-- D: Deleted ; R: Reallocated
Index Next |  I-node   | Deletion time                        Mode        File name
==========+==========+----------------data-from-inode------+-----------+=========
   0    1 d       2                                         drwxr-xr-x  .
   1    2 d       2                                         drwxr-xr-x  ..
   2    3 d      11                                         drwx------  lost+found
   3    4 d  144001                                         drwxr-xr-x  testfol
   4    6 r      13                                         rrw-r--r--  aba.txt
   5    6 d  112001  D 1219344156 Thu Aug 21 14:42:36 2008  drwxr-xr-x  db
   6  end d  176001                                         drwxr-xr-x  log
   7  end r      12  D 1219843315 Wed Aug 27 09:21:55 2008  rrw-r--r--  backup.sql
Here, we see that the file backup.sql is marked as deleted (D). I can recover it with ext3grep using either of two methods.
 Recovery using the file name:
You can recover the file by giving ext3grep the path of the file relative to the root of the partition. In my case /home2 is a separate partition, so the path of the file is simply backup.sql, since it was in the root directory of that partition.
umount /home2
/opt/ext3grep/bin/ext3grep  /dev/hda2  --restore-file backup.sql
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from
1217936328 = Tue Aug  5 07:38:48 2008
Number of descriptors in journal: 1315; min / max sequence numbers: 203 / 680
Loading hda2.ext3grep.stage2... done
Restoring backup.sql
Verify that the file has been recovered to the directory “RESTORED_FILES”:
ll -d RESTORED_FILES/backup.sql
-rw-r--r--  1 root root 49152 Dec 26  2006 RESTORED_FILES/backup.sql
 Recovery using the i-node information:
You can also recover the file using its i-node number. The i-node number can be obtained from the directory listing, using the command:
/opt/ext3grep/bin/ext3grep  /dev/hda2  --ls --inode 2
------------------------------------
           7  end r      12  D 1219843315 Wed Aug 27 09:21:55 2008  rrw-r--r--  backup.sql
Here the i-node number is 12 and you can restore the file by issuing the following command:
/opt/ext3grep/bin/ext3grep  /dev/hda2  --restore-inode 12
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from
1217936328 = Tue Aug  5 07:38:48 2008
Number of descriptors in journal: 1315; min / max sequence numbers: 203 / 680
Restoring inode.12

 mv RESTORED_FILES/inode.12 backup.sql

 ll -h backup.sql
-rw-r--r--  1 root root 48K Dec 26  2006 backup.sql
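When a directory listing is long, the deleted entries can be filtered out of the --ls output instead of being read by eye. This is a small sketch that relies only on the column layout shown in the listing above (a D followed by the deletion timestamp); deleted_entries is a name of my own, and file names containing spaces would need extra handling.

```shell
# Sketch only: print "i-node deletion_time name" for every deleted entry
# found in `ext3grep --ls` output read from standard input.

deleted_entries() {
    # Field 4 is the i-node, field 5 the D marker, field 6 the Unix time,
    # and the last field is the file name.
    awk '$5 == "D" && $6 ~ /^[0-9]+$/ { print $4, $6, $NF }'
}

# Two lines taken from the listing above:
deleted_entries <<'EOF'
   4    6 r      13                                         rrw-r--r--  aba.txt
   7  end r      12  D 1219843315 Wed Aug 27 09:21:55 2008  rrw-r--r--  backup.sql
EOF
# prints: 12 1219843315 backup.sql
```

On a real partition the function would be fed directly, for example: /opt/ext3grep/bin/ext3grep /dev/hda2 --ls --inode 2 | deleted_entries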
To Recover files based on time:
Sometimes there can be a conflict where ext3grep detects a lot of old deleted files that have the same name. In this case you have to use the --after option and provide a Unix time stamp, so that only files deleted on or after that time are processed. The Unix time stamp can be obtained from the following link: http://www.onlineconversion.com/unix_time.htm.
For example, if I would like to recover all the files that were deleted after Wed Aug 27 05:20:00 2008, the command used should be as follows:
/opt/ext3grep/bin/ext3grep  /dev/hda2  --restore-all --after=1219828800
Only show/process deleted entries if they are deleted on or after Wed Aug 27 05:20:00 2008.
Number of groups: 23
Minimum / maximum journal block: 689 / 17091
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from
1217936328 = Tue Aug  5 07:38:48 2008
Number of descriptors in journal: 1315; min / max sequence numbers: 203 / 680
Writing output to directory RESTORED_FILES/
Loading hda2.ext3grep.stage2... done
Restoring aba.txt
Restoring backup.sql
You can also use the --before option to restore only the files that were deleted before that time.
/opt/ext3grep/bin/ext3grep  /dev/hda2  --restore-all --before=1219828800
You can recover files deleted between two dates by combining both of the above options. For example, in order to recover files deleted between 12 December 2007 and 9 December 2008, I need to use a command as follows:
/opt/ext3grep/bin/ext3grep  /dev/hda2  --restore-all  --after=1197417600 --before=1228780800
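The time stamps for --after and --before can also be produced on the command line instead of through a web page. The sketch below assumes GNU date (for the -d option) and treats the input as UTC; to_epoch is a helper name of my own. The value 1219828800 used earlier is 09:20:00 UTC, which matches the 05:20:00 printed by ext3grep if the machine's local time was four hours behind UTC.

```shell
# Sketch only: convert a UTC date to a Unix time stamp (requires GNU date).

to_epoch() {
    # $1 = date or date and time, interpreted as UTC
    date -u -d "$1" +%s
}

to_epoch '2008-08-27 09:20:00'   # prints 1219828800
to_epoch '2007-12-12'            # prints 1197417600
to_epoch '2008-12-09'            # prints 1228780800
```

The last two values are the ones passed to --after and --before in the command above.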
To List the Hard Links:
Recovering files can cause a lot of hard link related issues. To find the files that are hard linked, you can use the command:
/opt/ext3grep/bin/ext3grep  /dev/hda2 --show-hardlinks
After this, remove the hard linked files that are unwanted duplicates.
To List the Deleted Files:
You can use the following command to list the deleted files:
/opt/ext3grep/bin/ext3grep  /dev/hda2 --deleted

Conclusion

ext3grep is a simple tool that can help anyone who has accidentally deleted a file on an ext3 file system, only to realize later that they needed it. Even though this tool does not provide a complete set of features, it can be used to get a good picture of your file system and to find a way to recover lost data.
