Suresh Kumar Pakalapati's Linux Administration: 07/12/11

Tuesday, July 12, 2011

PostgreSQL for the sage – Must know basics for the system administrators

PostgreSQL or Postgres is an object-relational database management system (ORDBMS). Unlike MySQL, PostgreSQL is not controlled by any single company, it is a community developed project. It is a advanced version of the ‘Ingres’ Database project (which is how the project gets the name post-ingres or postgres ).

Postgres is one of the best open-source database alternative which is fully object oriented and transactions compliant. It has stored procedures, multiple views and a huge set of datatypes. Some of the other notable features are as follows.

Objects and Inheritance

Database consists of objects and the database administrators can design custom or user-defined objects for the tables. Inheritance is another feature. Tables can be set to inherit their characteristics from a “parent” table.

Functions

Functions can be used in Postgres. These can be written in the postgres’ own procedural language called ‘PL/pgSQL’ which resembles Oracle’s procedural language ‘PL/SQL’ or any other common scripting languages which support posgtres’ procedural language like PL/Perl, plPHP, PL/Python, PL/Ruby etc. Run the following in the psql client to determine if functions is enabled:

SELECT true FROM pg_catalog.pg_language WHERE lanname = 'plpgsql';

To create user-defined functions we use the CREATE OR REPLACE FUNCTION command. Example:

CREATE OR REPLACE FUNCTION fib (

fib_for integer

) RETURNS integer AS $$

BEGIN

IF fib_for < 2 THEN

RETURN fib_for;

END IF;

RETURN fib(fib_for - 2) + fib(fib_for - 1);

END;

$$ LANGUAGE plpgsql;

Indexes

An index is like a summary of a certain portion of the table. It is an optimization technique which increases speed of accessing records from a database. PostgreSQL supports indexes like Btree, hash etc. User-defined index methods can also be created. Indexes are created on tables with respect to a particular field (based on which there are a number of queries). As an example for a table:

CREATE TABLE name (

id integer,

fname varchar

lname varchar

);

To create an index on table name with respective to the field id (as there are many queries on this table requesting for firstname or lastname from the id provided), we use the index:

CREATE INDEX name_id_index ON name (id);

Triggers

Triggers are events or functions run upon the action of certain SQL statements which modify data in some records. Depending on the kind of modification we can have multiple triggers in a database. Postgres supports multiple triggers written in PL/PgSQL or it’s scripting counterparts like PL/Python. The trigger function must be defined before the trigger can be created. The trigger function must be declared as a function taking no arguments and returning type trigger. CREATE TRIGGER command is used to declare triggers.

Concurrency

PostgreSQL ensures concurrency with the help of MVCC (Multi-Version Concurrency Control), which gives the database user a “snapshot” of the database, allowing changes to be made without being visible to other users until a transaction is committed.

PostgreSQL’s MVCC keeps all of the versions of the data together in the same partition in the same table. By identifying which rows were added by which transactions, which rows were deleted by which transactions, and which transactions have actually committed, it becomes a straightforward check to see which rows are visible for which transactions.

Inorder to accomplish this, Rows of a table are stored in PostgreSQL as a tuple. Two fields of each tuple are xmin and xmax. Xmin is the transaction ID of the transaction that created the tuple. Xmax is the transaction ID of the transaction that deleted it (if any).

Along with the tuples in each table, a record of each transaction and its current state (in progress, committed, aborted) is kept in a universal transaction log.

When data in a table is selected, only those rows that are created and not destroyed are seen. That is, each row’s xmin is observed. If the xmin is a transaction that is in progress or aborted, then the row is invisible. If the xmin is a transaction that has committed, then the xmax is observed. If the xmax is a transaction that is in progress or aborted and not the current transaction, or if there is no xmax at all, then the row is seen. Otherwise, the row is considered as already deleted.

Insertions are straightforward. The transaction that inserts the tuple simply creates it with the xmax blank and the xmin set to its transaction ID. Deletions are also straightforward. The tuple’s xmax is set to the current transaction. Updates are no more than a concurrent insert and delete.

Views

A view is a table which does not exist in the database. It is a virtual table created from fields in various tables and is joined together based on some criteria. Views can be used in place of tables and will accomplish the task same as that of a table. The CREATE VIEW statement is used to accomplish this eg:

CREATE VIEW best_sellers AS

SELECT * FROM publishers WHERE demand LIKE 'high';

Foreign Keys

The primary key used in one table which is used to refer to the records in a second table is called the foreign key of the second table.

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);
CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer REFERENCES products (product_no),
    quantity integer
);

Here product_no is the foreign key in the second table created. The foreign key field may have values which are repeated unlike primary keys.

Files Users and Configuration

The main configuration file of Postgres is postgresql.conf. This can be located in the ‘data’ directory. It may be present either in /var/lib (/var/lib/pgsql/data/postgresql.conf) or /usr/local (/usr/local/pgsql/data/postgresql.conf). Temporary changes to the configurations can be made using postmaster command.

The init script that starts the postgres service is /etc/init.d/postgresql . It runs a number of child processes concurrently. The postgres server process is postmaster. These processes and files associated with PosgreSQL are owned by the user/group postgres. The default port used for database connections is 5432

The user postgres is the PostgreSQL database superuser. We can create a number of super users for the database (this accomplished by the create role command ), however, the default super user is postgres. The postgres user has the privilege to access all the databases and files in the server (Unless the user root is created in postgres as a superuser).

Client Authentication is controlled by the file pg_hba.conf in the data directory, e.g., /var/lib/pgsql/data/pg_hba.conf. (HBA stands for host-based authentication.)

Each record specifies a connection type, a client IP address range (if relevant for the connection type), a database name or names, and the authentication method to be used for connections matching these parameters.A record is typically in one of two forms:

local database authentication-method [ authentication-option ]

host database IP-address IP-mask authentication-method [ authentication-option ]

local : This record pertains to connection attempts over Unix domain sockets.

host : This record pertains to connection attempts over TCP/IP networks.

database : Specifies the database that this record applies to. The value all specifies that it applies to all databases, while the value sameuser identifies the database with the same name as the connecting user.

authentication methods

trust: The connection is allowed unconditionally.

reject: The connection is rejected unconditionally.

password: The client is required to supply a password which is required to match the database password that was set up for the user.

md5: Like the password method, but the password is sent over the wire encrypted using a simple challenge-response protocol.

ident: This method uses the “Identification Protocol” as described in RFC 1413. It may be used to authenticate TCP/IP or Unix domain socket connections, but its reccomended use is for local connections only and not remote connections.

Front-ends

The minimalistic front-end for PostgreSQL is the psql command-line. It can be used to enter SQL queries directly, or execute them from a file. phpPgAdmin is a web-portal used for PostgreSQL administration written in PHP and based on the popular phpMyAdmin. Likewise pgAdmin is a graphical front-end administration tool for PostgreSQL, which has support on multiple platforms. The latest stable version of the same is pgAdmin III.

Some administration related commands

Command to login to psql database mydb as user myuser:

psql -d mydb -U myuser

Command to login to psql database mydb as user myuser on a different host myhost:

psql -h myhost -d mydb -U myuser

If the port the server runs is different we use -p [port number] . Upon entering the psql shell the prompt will show the database name currently being used. In the above example it will show

mydb=> (if logged in as an ordinary user )

mydb=# (if logged in as a super user like postgres)

Create a PostgreSQL user

There are two ways to create a postgres database user. The only user initially allowed to create users is postgres. So one has to switch to this user before creating other users with varying privileges.

1. Creating the user in the shell prompt, with createuser command.

switch to the postgres user with:

su - postgres

createuser tom

Shall the new role be a superuser? (y/n) n

Shall the new role be allowed to create databases? (y/n) y

Shall the new role be allowed to create more new roles? (y/n) n

2. Creating the user in the PSQL prompt, with CREATE USER command.

switch to the postgres user with:

su - postgres

create user mary with password 'marypass';

Creating and deleting a PostgreSQL Database

There are two way to create databases.

1. Creating database in the PSQL prompt, with createuser command.

CREATE DATABASE db1 WITH OWNER tom;

2. Creating database in the shell prompt, with createdb command.

createdb db2 -O mary

To delete an entire database from within the psql prompt do :

DROP DATABASE db1;

Determining execution time of a query

Turn on timing with

\timing

Now execute the qery:

SELECT * from db1.employees ;

Time: 0.065 ms

Calculate postgreSQL database size in disk

SELECT pg_database_size('db1');

to get the values in human readable format

SELECT pg_size_pretty(pg_database_size('db1'));

to calculate postgreSQL table size in disk

SELECT pg_size_pretty(pg_total_relation_size(‘big_table’));

Slash commands used in psql

To list all slash commands and thier purpose. Login to psql and issue to the command \? . Some of the most commonly used slash commands are the following:

List databases	\l
System tables	\dS
Types	\dT
Functions	\df
Operators	\do
Aggregates	\da
Users	\du
Quit from psql	\q
Connect to different database db2	\c db2
Describe Table/index/view/sequence	\d

The below can be used with a specific table/index/view name for description of the specific table/index/view

Tables	\dt
Indexes	\di
Sequences	\ds
Views	\dv

Useful Bash commands

Bash command to list all the postgresql databases:

psql -l #This can be run as a unix user who is also a super user in postgresql

Indirect bash command to list all the postgresl users:

psql -c '\du' #-c is used to run an internal or sql command in psql shell

Backing up and restoring databases

To dump the database to an sql file use the bash command:

pg_dump mydb > db.out

To restore a database from an sql backup file (via bash)

psql -d newdb -f backupdb.out

or

psql -f backupdb.out newdb

(here the database newdb must be already created and the file backupdb.out must be present in the current directory)

To take the backup of all the Postgres databases in the server:

pg_dumpall > /var/lib/pgsql/backups/dumpall.sql

(Only possible with the postgres or the database superuser )

Resetting database user’s password

To change the password for a database user (say ‘thomas’):

ALTER USER thomas WITH PASSWORD 'newpassword';

This same command can be used to reset the password for the postgresql super user postgres, but in this case, you will have to enable password less login for postgres user by adding the following line to the top of the file pg_hba.conf in the data directory of postgres. Once the password is reset this line can be removed:

local all postgres trust

Next we issue the same command but for the user postgres

ALTER USER postgres WITH PASSWORD 'newpassword';

To create a super user via bash with multiple roles

createuser -sPE mysuperuser

Instead of this we can also use the below psql shell command:

CREATE ROLE mysuperuser2 WITH SUPERUSER CREATEDB CREATEROLE LOGIN ENCRYPTED PASSWORD 'mysuperpass2';

Physical database files in postgres

The files in data/base are named by the oid (Object Identifier) of the database record in

pg_database, like this:

cd /var/lib/pgsql/data/base

ls -l

total 33

drwx------ 22 postgres postgres 4096 Jul 23 20:06 ./

drwx------ 11 postgres postgres 4096 Aug  1 05:59 ../

drwx------  2 postgres postgres 4096 Jun 20 09:32 1/

drwx------  2 postgres postgres 4096 Mar  3 13:36 10792/

drwx------  2 postgres postgres 4096 Jun 20 15:09 10793/

drwx------  2 postgres postgres 4096 May 27 01:40 16497/

drwx------  2 postgres postgres 4096 May 27 01:40 16589/

drwx------  2 postgres postgres 4096 Jun 20 10:28 16702/

drwx------  2 postgres postgres 4096 May 27 01:40 16764/

drwx------  2 postgres postgres 4096 May 27 01:40 16785/

drwx------  2 postgres postgres 4096 Aug  1 04:37 16786/

drwx------  2 postgres postgres 4096 Aug  1 04:36 19992/

drwx------  2 postgres postgres 4096 May 27 01:40 19997/

To obtain the oid, execute the following command in psql prompt

postgres=# select oid,datname from pg_database order by oid;

   oid  |         datname

---------+--------------------------

1 | template1

10792 | template0

10793 | postgres

16497 | gadgetwi_Unable

16589 | vimusicc_filehost

16702 | personea_altissimo

16764 | shopping_businessfinance

16785 | ansonyi_wp2

16786 | ansonyi_wp

19992 | globook_PostgreSQL

Duplicity – secure incremental backup

Overview:
Duplicity is a tool to create GPG-encrypted (this way you can store your backups at remote servers without having to worry about who has access to your data) incremental backups to remote servers. Its a quite handy and secure method.
Installation
The steps to install duplicity is as follows
wget http://code.launchpad.net/duplicity/trunk/0.6.02/+download/duplicity-0.6.02.tar.gz
tar -xvf duplicity-0.6.02.tar.gz
cd duplicity-0.6.02.tar.gz
python setup.py install
If you come across any librsync.so errors . You can resolve it by using the following steps
wget http://sourceforge.net/projects/librsync/files/librsync/0.9.7/librsync-0.9.7.tar.gz/download
tar -xzvf librsync-0.9.7.tar.gz
cd librsync-0.9.7
./configure
make
make install
Now we got duplicity installed
Create a GPG key
In order to be able to encrypt your backups, you have to create a GPG key. Open a second shell and run the following command (this generates some “randomness” on your system, which will be useful to create a secure key). Kill the command with CTRL+C when you are done with key generation.
while /bin/true; do cat /var/log/messages > ~/temp.txt; sleep 1; done;
On your other shell, create your GPG key. Be sure to use a secure passphrase and to copy/write down the key ID which is displayed at the end of the generation process (we’ll need it for ftplicity). Also, make sure to backup the key to a secure location outside your server. As all your backups will be encrypted, they will be worthless if your server crashes and you lose the key.
gpg –gen-key
Default options should be fine. This will create your key in ~/.gnupg/. Once its done you can verify the existence of your key using the command
gpg –list-keys
The next step is to prepare an off-site location to receive the backup files.
The software supports different protocols like FTP,RSYNC,SCpP.
I am restricting myself with SCP here
Simple unEncrypted Backup over SCP
Setup ssh keys on the backup server allowing root to seamlessly login to the backup server.
duplicity /home/me scp://uname@other.host/usr/backup

If the above command is run repeatedly, the first session will be a full backup, and subsequent ones will be incremental.
The full option can be used to force a full backup. The next command also excludes the /tmp directory.
duplicity full –exclude /tmp /home/me scp://uname@other.host/usr/backup
Basic restore command—restore the /home/me directory backed up with scp above to directory restored_dir:

duplicity scp://uid@other.host//usr/backup restored_dir

To enable verbose mode use the option -v
Specify verbosity level (0 is total silent, 4 is the default, and 9 is noisiest)
The command would look like

duplicity -v5 /home/me scp://uid@other.host/some_dir

Encrypted Backup over SCP

Here we use the GPG key generated earlier
The format would be look like this

duplicity
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --include=/boot \
    --include=/etc \
    --include=/home \
    --include=/root \
    --include=/var/lib/mysql \
    --exclude=/** \
    ${SOURCE} ${DEST}

Needless to say the include and exclude options are for specifying the backup criteria.

duplicity –encrypt-key=”FFF7730B” –sign-key=”FFF7730B” -v5 /home/me scp://uid@other.host/some_dir

you will be asked for a GnuPG passphrase. You can type in any password you like; this has to be done everytime you run duplicity. The backup will be encrypted with the help of GnuPG. Permissions and ownerships will be preserved in the backup.

To avoid this issue , you can simply set the passphrase as environment variable using the command

export PASSPHRASE=gpgpassphrase

Backup Format & Explanation

Once it is executed , you can see the backup in the server and it would look like the following way

duplicity-full-signatures.2005-11-27T01:00:01-05:00.sigtar.gpg
duplicity-full.2005-11-27T01:00:01-05:00.manifest.gpg
duplicity-full.2005-11-27T01:00:01-05:00.vol1.difftar.gpg
duplicity-full.2005-11-27T01:00:01-05:00.vol2.difftar.gpg

The signatures file contains, signatures of each file that is backed up so that Duplicity can figure out which part of a file has changed. With that information it can upload only the missing part to complete a new backup set.

The manifest file contains a listing of all the files in the backup set and a SHA1 hash of each file, probably so Duplicity can tell very quickly whether a file has been changed or not since the last backup.

The volume files (vol1 and vol2) contain the actual file data. It appears that Duplicity volumes are at most 5MB. That’s helpful during restores so the entire backup set does’t not need to be downloaded to retrieve a single file. Duplicity will only download the volume containing that file.

Common Options:

Depending on the parameters and order of the parameters in the duplicity command, different functions can be performed. For example, an archive can be verified to see if a complete backup was made and what files, if any, have changed since the last backup.

duplicity verify [options] source_url target_directory
duplicity verify -v4 scp://user@bakuphost/etc /etc

Listfiles

It’s sometimes handy to check which files are in the latest backup set.

duplicity list-current-files [options] target_url
The command would look like
duplicity list-current-files –archive-dir /root/test/ scp://user@backupserver/some_dir

Restore

The main purpose of backup is to restore data which has been lost. The following is the common format for restoring the data from the latest backup

duplicity scp://uid@other.host/some_dir /home/me

Duplicity enters restore mode because the URL comes before the local directory. If we wanted to restore just the file “Mail/article” in /home/me as it was three days ago into /home/me/restored_file:

duplicity -t 3D –file-to-restore Mail/article scp://uid@other.host/some_dir /home/me/restored_file

The following command compares the files we backed up, so see what has changed since then:

duplicity verify scp://uid@other.host/some_dir /home/me

The following command can be used to retrieve a single file from backup

duplicity –encrypt-key “” –sign-key “” –file-to-restore home/sburke/file.txt scp://user@server.com/backup/ /var/tmp/file.txt

The path to the file that is to be restored is relative to the directory on which the backup set is based. So in the command above, home/sburke/file.txt plus the directory on which we based our backup (/backup) equals /backup/home/sburke/file.txt/. It would not work to put /backup/home/sburke/file.txt as the source path because the backup will not recognize /backup as a valid path. The last portion in the above command is the location where the file will be restored.

To delete old backups, we can use the following command

 duplicity --full --remove-older-than 1Y /media/data/backup scp://uid@server/personal

To automate the tasks, you can write a shell script

MySQL Storage Engines – an overview, their limitations and an attempt for comparison

Most of us are using Mysql database and majority don’t know how to choose the data base engines, what are the different types of storage engines available in mysql and how they differ from each other. In this article let me give you a brief idea about the Storage Engines and what are the limitations and where to use these various storage engines.

One of the greatest things about MySQL, other than being free, widely supported and fast, is the flexibility of choosing different storage engines for different tables. These storage engines act as handlers for different table types. Thus MySQL storage engines include both those that handle transaction-safe tables and those that handle non-transaction-safe tables along with many others. MySQL does this through their Pluggable Storage Engine Architecture.

MySQL Storage Engines Overview

To determine which storage engines your server supports, we use the SHOW ENGINES statement. The value in the Support column indicates whether an engine can be used. A value of YES, NO, or DEFAULT indicates that an engine is available, not available, or available and currently set as the default storage engine.

mysql> SHOW ENGINES\G
*************************** 1. row ***************************
Engine: MyISAM
Support: DEFAULT
Comment: Default engine as of MySQL 3.23 with great performance
*************************** 2. row ***************************
Engine: MEMORY
Support: YES
Comment: Hash based, stored in memory, useful for temporary tables
*************************** 3. row ***************************
Engine: InnoDB
Support: YES
Comment: Supports transactions, row-level locking, and foreign keys
*************************** 4. row ***************************
Engine: BerkeleyDB
Support: NO
Comment: Supports transactions and page-level locking
*************************** 5. row ***************************
Engine: BLACKHOLE
Support: YES
Comment: /dev/null storage engine (anything you write to it disappears)
*************************** 6. row ***************************
Engine: EXAMPLE
Support: YES
Comment: Example storage engine
*************************** 7. row ***************************
Engine: ARCHIVE
Support: YES
Comment: Archive storage engine
*************************** 8. row ***************************
Engine: CSV
Support: YES
Comment: CSV storage engine
*************************** 9. row ***************************
Engine: ndbcluster
Support: NO
Comment: Clustered, fault-tolerant, memory-based tables
*************************** 10. row ***************************
Engine: FEDERATED
Support: YES
Comment: Federated MySQL storage engine
*************************** 11. row ***************************
Engine: MRG_MYISAM
Support: YES
Comment: Collection of identical MyISAM tables
*************************** 12. row ***************************
Engine: ISAM
Support: NO
Comment: Obsolete storage engine

The listing shows the full list of available database engines.

There are a number of ways you can specify the storage engine to use. The simplest method, if you have prefer a particular engine type that fits most of your database needs then you can set the default engine type within the MySQL configuration file using the following commands(using the option storage_engine or when starting the database server (by supplying the –default-storage-engine or –default-table-type options on the command line).

More flexibility is offered by allowing you to specify the default storage engine to be used MySQL, the most obvious is to specify the engine type when creating the table:

CREATE TABLE mytable (id int, title char(20)) ENGINE = INNODB

You can also alter the storage engine used in an existing table:

ALTER TABLE mytable ENGINE = MyISAM

However, you should be careful when altering table types in this way, as making a modification to a type that does not support the same indexes, field types or sizes may mean that you lose data. If you specify a storage engine that doesn’t exist in the current database then a table of type MyISAM (the default) is created instead.

Before taking any decision about which engine we need to choose, first we need to think about the different core functionality provided by each engine that allow us to differentiate between them. We can divide up the core functionality into four areas; the supported field and data types, locking types, indexing and transactions. Some engines have unique functionality that can also drive your decision.

Field and Data Types

Although all of the engines support the common data types, i.e., integers, reals and character based storage, not all engines support other field types, particularly the BLOB (binary large object) or TEXT types. Other engines may support only limited character widths and data sizes.

These limitations, while directly affecting the information you store may also have a related effect to the types of searches you perform, or the indexes you create on that information. In turn, these differences can affect the performance and functionality of your application as you may have to make decisions about functionality based on the storage engine choice you make for the type of data you are storing.

Locking

Locking within database engines defines how access and updates to information are controlled. When an object in the database is locked for updating, other processes cannot modify (or in some cases read) the data until the update has completed.

Locking not only affects how many different applications can update the information in the database, it can also affect queries on that data. The reason for this is that the queries may be accessing data that may be being altered or updated. In general, such delays are minimal. The bulk of the locking mechanism is devoted to preventing multiple processes updating the same data. Since both additions (INSERT statements) and alterations (UPDATE statements) to the data require locking, you can imagine that multiple applications using the ame database can have a significant impact.

Locks are supported by different storage engines at different object levels, and these levels affect the concurrency of access to the information. Three different levels are supported, table locking, block locking and row locking. Table locking is most commonly supported and is the locking provided in MyISAM. It locks an entire table during an update. This will limit the number of applications that are updating a specific table to just one, and this can affect heavily used multi-user databases because it introduces delays into the update process.

Page level locking is used by the Berkeley DB storage engine and locks data according to the page (8Kb) of information that is being uploaded. When performing updates across a range of locations within the database, the locking is not a problem, but because adding rows involves locking the final 8Kb of the data structure, adding large numbers of rows, particularly of small data, can be a problem.

Row level locking provides the best concurrency; only individual rows within a table are locked, which means that many applications can be updating different rows of the same table without causing a lock situation. Only the InnoDB storage engine supports row level locking.

Indexing

Indexing can dramatically increase the performance when searching and recovering data from the database. Different storage engines provide different indexing techniques and some may be better suited for the type of data you are storing.
Some storage engines simply do not support indexing at all either because they use the indexing of the underlying tables (in the MERGE engine for example) or because the data storage method does not allow indexing (FEDERATED or BLACKHOLE engines).

Transactions

Transactions provide data reliability during the update or insert of information by enabling you to add data to the database, but only to commit that data when other conditions and stages in the application execution have completed successfully. For example, when transferring information from one account to another you would use transactions to ensure that both the debit from one account and the credit to the other completed successfully. If either process failed, you could cancel the transaction and the changes would be lost. If the process completed,then we would confirm it by committing the changes.

Storage Engines:

MyISAM

The MyISAM engine is the default engine in most MySQL installations and is a derivative of the original ISAM engine type supported in the early versions of the MySQL system. The engine provides the best combination of performance and functionality, although it lacks transaction capabilities (use the InnoDB or BDB engines) and uses table-level locking.

Unless you need transactions, there are few databases and applications that cannot effectively be stored using the MyISAM engine. However, very high-performance applications where there are large numbers of data inserts/updates compared to the number of reads can cause performance problem for the MyISAM engine. It was originally designed with the idea that more than 90% of the database access to a MyISAM table would be reads, rather than writes.

With table-level locking, a database with a high number of row inserts or updates becomes a performance bottleneck as the table is locked while data is added. Luckily this limitation also works well within the restrictions of a non-transaction database.

Limitations

1. Crash recovery can be a time-consuming process owing to MyISAM’s lack of a transaction log. Expect to have to perform such a recovery during your peak-usage period. (If it can happen, it almost certainly will.)
2. MyISAM does not support or enforce foreign key constraints.
3. All UPDATE queries to the same table are serialized — that is to say they carried out one at a time — and block all other queries, including SELECTs, from executing. This effect is pronounced on most busy multi-user applications.
4. MyISAM supports concurrent INSERTs only in certain cases.
5. Maximum of 64 indexes per row,

When to use MyISAM

1. Your application demands full-text search capabilities. Rather than push all your data into MyISAM tables to gain full-text searching, it may be viable to split your dataset into data that must be indexed for full-text searching — and stored using MyISAM — and data that should be stored using a transactional engine, such as InnoDB. A scheduled background job may then asynchronously update your MyISAM full-text indexes and provide links from the InnoDB data as appropriate. This is a common example of how to gain the best from all worlds.
2. Your application is effectively single-user — there are very few concurrent queries hitting the MySQL server.
3. You are performing limited testing or development where performance is not under scrutiny.

MERGE

The MERGE storage engine, also known as the MRG_MyISAM engine, is a collection of identical MyISAM tables that can be used as one. You can then execute queries that return the results from multiple tables as if they were just one table. Each table merged must have the same table definition.

The MERGE table is particularly effective if you are logging data directly or indirectly into a MySQL database and create an individual table per day, week or month and want to be able to produce aggregate queries from multiple tables. There are limitations to this however, you can only merge MyISAM tables and the identical table definition restriction is strictly enforced. Although this seems like a major issue, if you had used one of the other table types (for example InnoDB) then the merge probably wouldn’t be required.

Limitations

1. You can use only identical MyISAM tables for a MERGE table.
2. You cannot use a number of MyISAM features in MERGE tables. For example, you cannot create FULLTEXT indexes on MERGE tables. (You can, of course, create FULLTEXT indexes on the underlying MyISAM tables, but you cannot search the MERGE table with a full-text search.)
3. If the MERGE table is non-temporary, all underlying MyISAM tables must be non-temporary, too. If the MERGE table is temporary, the MyISAM tables can be any mix of temporary and non-temporary.
4. MERGE tables use more file descriptors. If 10 clients are using a MERGE table that maps to 10 tables, the server uses (10 × 10) + 10 file descriptors. (10 data file descriptors for each of the 10 clients, and 10 index file descriptors shared among the clients.)
5. Key reads are slower.Key reads are slower. When you read a key, the MERGE storage engine needs to issue a read on all underlying tables to check which one most closely matches the given key. To read the next key, the MERGE storage engine needs to search the read buffers to find the next key. Only when one key buffer is used up does the storage engine need to read the next key block.

When to Use MERGE

1. Easily manage a set of log tables. For example, you can put data from different months into separate tables, compress some of them with myisampack, and then create a MERGE table to use them as one.
2. Obtain more speed. You can split a big read-only table based on some criteria, and then put individual tables on different disks. A MERGE table on this could be much faster than using the big table.
3. Perform more efficient searches. If you know exactly what you are looking for, you can search in just one of the split tables for some queries and use a MERGE table for others. You can even have many different MERGE tables that use overlapping sets of tables.
4. Perform more efficient repairs. It is easier to repair individual tables that are mapped to a MERGE table than to repair a single large table.
5. Instantly map many tables as one. A MERGE table need not maintain an index of its own because it uses the indexes of the individual tables. As a result, MERGE table collections are very fast to create or remap. (Note that you must still specify the index definitions when you create a MERGE table, even though no indexes are created.
6. If you have a set of tables from which you create a large table on demand, you should instead create a MERGE table on them on demand. This is much faster and saves a lot of disk space.
7. Exceed the file size limit for the operating system. Each MyISAM table is bound by this limit, but a collection of MyISAM tables is not.
8. You can create an alias or synonym for a MyISAM table by defining a MERGE table that maps to that single table. There should be no really notable performance impact from doing this (only a couple of indirect calls and memcpy() calls for each read).

MEMORY

The MEMORY storage engine (previously known as the HEAP storage engine) stores all data in memory; once the MySQL server has been shut down any information stored in a MEMORY database will have been lost. However, the format of the individual tables is kept and this enables you to create temporary tables that can be used to store information for quick access without having to recreate the tables each time the database server is started.

Long term use of the MEMORY storage engine is not generally a good idea, because the data could so easily be lost. However, providing you have the RAM to support the databases you are working on, use of MEMORY based tables is an efficient way of running complex queries on large data sets and benefitting from the performance gains.

The best way to use MEMORY tables is to use a SELECT statement to select a larger data set from your original, disk-based, tables and then sub-analyse that information for the specific elements you want.

FEDERATED

The FEDERATED storage engine (added in MySQL 5.03) enables you to access data from remote MySQL database (other databases may be supported in the future) as if it were a local database. In effect, the MySQL server acts as a proxy to the remote server, using the MySQL client access library to connect to the remote host, execute queries and then reformat the data into the localized format.

In essence, it is a way for a server, rather than a client, to access a remote database and can be an effective way of combining data from multiple hosts or of copying specific data from remote databases into local tables without the use of data exports and imports.

ARCHIVE

The ARCHIVE storage engine supports only the INSERT and SELECT statements, but does support most of the MySQL field types. Information stored in an ARCHIVE storage engine table is compressed and cannot be modified and so ARCHIVE tables are perfect for storing log data (which you don’t want to be able to change) or information that is no longer in active use (for example, old invoicing or sales data).

While the information is stored very efficient, care should be taken when accessing data stored in the ARCHIVE tables. Because the information is compressed, selects have to read the entire table, and that also means decompressing the information. This can obviously increase the time taken to perform complex searches and retrievals. If you are performing a large number of queries on the information in these tables it may be easier to temporarily copy your data to another, uncompressed, data type such as MyISAM.

CSV

The CSV storage engine stores data not in a binary format, but in the form a CSV (Command Separated Values) file. Because of this, there are limitations to the data stored. It is not an efficient method for storing large volumes of data, or larger data types like BLOB, although such types are supported. There is also no indexing. However, because the data is stored in the CSV format it is exceedingly portable; these CSV files generated can easily be imported into many different software packages, including Excel, OpenOffice and database systems like Access or FileMaker.

In general, the CSV engine is impractical as a general database engine. It is, however, probably the most effective and easiest method for data exchange. What makes it so convenient is that we can use SELECT and INSERT statements to create the database, which in turn means that we can easily produce CSV files based on queries of other data.

With some careful work, the CSV storage engine can also be used as an effective way of getting information into MySQL. Here, you can create the tables first, shutdown the MySQL server, copy over CSV files that you have exported from Excel, Access or another database, and you can then import the data and copy it over to MyISAM or InnoDB tables.

BLACKHOLE

The Blackhole storage engine accepts but does not store data and retrievals always return an empty set. The functionality can be used in distributed database design where data is automatically replicated, but not stored locally. Although you can create tables and indexes, all SQL statements that would add or update information to the database are executed without actually writing any data. The database structure is retained, however, and you can create any indexes on the (non-existent) information that you want.

Although this seems like a futile exercise, it does allow you to test out database structures and play with table definitions without actually creating any data. Even more useful, however, is that SQL statements on BLACKHOLE databases are written to the binary log, and therefore are replicated to slave databases.

You can use this functionality to update one or more slaves directly without writing any local data. There are a number of potential uses for this functionality.

ISAM

The ISAM storage engine was the original engine type available with versions of MySQL up until MySQL 3.23, when the MyISAM storage engine was introduced. ISAM has a number of different limitations that make it impractical as a database engine. These include the storage format, which is native to the platform (and therefore not portable between systems), a maximum table size of just 4GB and limited text searching facilities. Indexes are also more limited. Since MyISAM is supported on the same platforms as ISAM, and provides better compatibility, portability and performance.

ISAM is included for backwards compatibility, you certainly shouldn’t use ISAM for new databases, use MyISAM instead.

Berkeley DB (BDB)

The Berkeley DB (or BDB) engine is based on the technology provided by the Berkeley DB storage system developed by SleepyCat software. BDB is a hash based storage mechanism, and the keys to the hash values are stored very efficiently. This makes the recovery of information–especially when accessed directly using a unique key incredibly quick, and by far the quickest of the available database types. Recovering full records is even quicker if you the data is short enough to be stored with the unique key (i.e., under 1024 bytes long). BDB is also one of only two types of storage engine that support transactions.

BDB is, however, limited in other ways. Although it uses page locking, locking only 8192 bytes of a table, rathter than the entire table, during an update this can cause problems if you are performing a large number of updates in the same page (for example, inserting many rows). There is unfortunately no way round this. Sequential data access–for example a large quantity of rows matching non-indexed data–can be a lot slower because the data needs to be scanned row by row.

Recovery of information with BDB tables can also be a problem. Data in BDB is stored in a combination of the key index, the data file and binary data logs. A loss of data in any of these sections, even just one of the data logs, can make the data in the database totally unrecoverable.

Where BDB shines therefore is in locations where you can access specific blocks of data by a unique key that does not frequently change.

InnoDB

The InnoDB Engine is provided by Innobase Oy and supports all of the database functionality (and more) of MyISAM engine and also adds full transaction capabilities (with full ACID (Atomicity, Consistency, Isolation, and Durability) compliance) and row level locking of data.

The key to the InnoDB system is a database, caching and indexing structure where both indexes and data are cached in memory as well as being stored on disk. This enables very fast recovery, and works even on very large data sets. By supporting row level locking, you can add data to an InnoDB table without the engine locking the table with each insert and this speeds up both the recovery and storage of information in the database.

As with MyISAM, there are few data types that cannot effectively be stored in an InnoDB database. In fact, there are no significant reasons why you shouldn’t always use an InnoDB database. The management overhead for InnoDB is slightly more onerous, and getting the optimization right for the sizes of in-memory and on disk caches and database files can be complex at first. However, it also means that you get more flexibility over these values and once set, the performance benefits can easily outweigh the initial time spent. Alternatively, you can let MySQL manage this automatically for you.

If you are willing (and able) to configure the InnoDB settings for your server, then I would recommend that you spend the time to optimize your server configuration and then use the InnoDB engine as the default.

Limitations
1. Queries that result in large scans of the tablespace are often slower when using InnoDB.
2. Consistency is only maintained if the underlying operating system and hardware can guarantee buffer flushes. This limitation is inherent in all transactional database management systems.
3. InnoDB tables consume a greater amount of space on-disk than their MyISAM equivalents. This is now largely irrelevant given the ubiquity of large (multiple hundreds of gigabytes) hard disk drives.

When to use InnoDB

1. You are developing an application that requires ACID compliance. At the very least, your application demands the storage layer support the notion of transactions.
2. You require expedient crash recovery. Almost all production sites fall into this category, however MyISAM table recovery times will obviously vary from one usage pattern to the next. To estimate an accurate figure for your environment, try running myisamchk over a many-gigabyte table from your application’s backups on hardware similar to what you have in production. While recovery times of MyISAM tables increase with growth of the table, InnoDB table recovery times remain largely constant throughout the life of the table.
3. Your web site or application is mostly multi-user. The database is having to deal with frequent UPDATEs to a single table and you would like to make better use of your multi-processing hardware.

NDB Cluster

Another well-known storage engine. It allows one to cluster tables. That means you have multiple masters, which all can do inserts, updates and deletes on the same table. NDB has row-level locking, but not full multi-version concurrency control. Due to some architectural limitations of how the MySQL server executes joins, they perform rather poorly with NDB storage engine. For that reason, it is best used for single table primary key lookups. So, you would not likely port your entire web-facing database to NDB Cluster for example.

Summary

As you may have been able to conclude from the above summary of the different storage engines available, there are few reasons not to use either the MyISAM or InnoDB engine types. MyISAM will do in most situations, but if you have a high number of updates or inserts compared to your searches and selects then you will get better performance out of the InnoDB engine. To get the best performance out of InnoDB you need to tweak the parameters for your server, otherwise there is no reason not to use it.
The MERGE engine is an exceedingly effective way of querying data from multiple, identically defined, tables. The MEMORY engine is the best way to perform a large number of complex queries on data that would be inefficient to search on a disk based engine. The CSV engine is a great way to export data that could be used in other applications. BDB is excellent for data that has a unique key that is frequently accessed. The following table provides an overview of some storage engines provided with MySQL:

Recompile PHP for Litespeed webserver

LSAPI is LiteSpeed’s open-source API between external applications and LiteSpeed Web Server. This how-to is for compiling and installing PHP + LSAPI on Linux, OS X, FreeBSD, Solaris, and so on. LiteSpeed comes with PHP 4.4.x compiled with LSAPI. To change the LightSpeed php to the latest stable one, we need to recompile the php with the LSAPI for LiteSpeed.

I am trying to provide some info for doing this task.

Instructions:

a) Download the required php from php.net

wget http://in2.php.net/get/php-5.2.11.tar.bz2/from/this/mirror

tar -xzvf php-5.2.11.tar.bz2

cd php-5.2.11/sapi

b) Download and the latest LSAPI for PHP from http://www.litespeedtech.com/ into the “sapi” folder of php source:

wget http://www.litespeedtech.com/packages/lsapi/php-litespeed-4.10.tgz

tar -xvf php-litespeed-4.10.tgz

c) Change directory to root PHP source directory and run commands:

cd ..

touch ac*

./buldconf –force

d) Configure/Compiling

If you have an apache compilation already , remove the part option “–with-apxs” and use ” –with-litespeed” . Also you may need to change the “prefix” option too . So a basic configure command would be like this

./configure ‘–prefix=/lsphp5′ ‘–with-litespeed’ ‘with-mysql’

make

make install

Note: You must compile PCRE support inorder for the default auto-index php script to work correctly

Post Install Configurations

1. Replace the existing lsphp binary with the new one.

Change the directory to the current installation of “lsws” ( it varies on different machines)

cd /usr/local/lsws/fcgi-bin

mv lsphp lsphp.old

cp /php-5.2.11/sapi/litespeed/php lsphp-5.2.11

ln -sf lsphp5 lsphp-5.2.11

Check Installation

/usr/local/lsws/fcgi-bin/lsphp5 -v

It should return something like:

PHP 5.2.11 (litespeed) (built: Sep26 2008 14:09:09)
Copyright (c) 1997-2004 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies

Notice litespeed in parenthesis. This means that the PHP binary has litespeed (LSAPI) support builtin.

2. php.ini

The php.ini file will be located at /usr/local/lsws/php/php.ini

If we want to use the old PHP.ini just copy it here.

# cd /usr/local/lsws/php
# mv php.ini php.ini.old
# cp /usr/local/ZEND/etc/php.ini .

3. Restart Litespeed Webserver

Finally restart LSWS and use our new PHP binary.

/usr/local/lsws/bin/lswsctrl restart

How to recompile Kernel?

Kernel Recompilation

Compiling custom kernel has its own advantages and disadvantages. It helps to optimize the kernel to your environment (hardware and usage patterns). I shall try to guide you through Kernel recompilation process.

Step 1:

Download the kernel source

cd /usr/local/src

wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-x.y.z.tar.bz2

Note: Replace x.y.z with actual version number.

Step 2:

Extract the source file

tar -xjvf linux-x.y.z.tar.bz2

Step 3:

Patching the Kernel

If you are requested to apply any patches , follow these steps

a) Move the downloaded kernel patch to the /usr/local/src directory.

b) Extract the patch file

c) Patch the kernel source using the extracted patch file

cd /usr/local/src/linux-x.y.z

patch -p1 < patchfile-2.2.x

Now the Kernel Source is patched against known vulnerabilities.

Step 4:

Configuration

If you are trying to upgrade the Kernel of already running server , it is always better use the existing configuration. To do this follow these steps

#uname -a
Linux Server1 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

# cd /boot

There you can see different configuration files as given below

#ls

config-2.6.18-128.el5 initrd-2.6.18-128.el5.img message
config-2.6.18-164.el5 initrd-2.6.18-164.el5.img quota.user*

identify the configuration file corresponding to the version of OS installed , In our case it is config-2.6.18-164.el5 . We are copying this file to the downloaded kernel source to use it during configuration.

#cp -p config-2.6.18-164.el5 /usr/local/src/linux-x.y.z/.config

# make clean

# make mrproper

# make menuconfig

You have to select different options as per your need. If you intended to use the existing configuration ,specify the path to the file ( .config in this case) by selecting the option

“Load an Alternative configuration file”

Step 5: Compilation

Compile the Kernel using the following commands

Compile to create a compressed kernel image

# make

Compile kernel modules:

# make modules

Install kernel modules

# make modules_install

Step 6: Install Kernel

If the above steps completed without any errors , now its the time to Install the new Kernel

# make install

It will install three files into /boot directory as well as modification to your kernel grub configuration file:

System.map-x.y.z

config-x.y.z

vmlinuz-x.y.z

Step 7: Create the Initrd image

Type the following command :

# cd /boot

# mkinitrd -o initrd.img-x.y.x x.y.z

initrd images contains device driver which needed to load rest of the operating system later on. Not all computer requires it, but it is better to create one

Step 8: Boot Loader Modification

Mofdify the boot loader to boot the new OS as default . Check the documentaions corresponding to your boot loader

Step 9: The last step

execute the following command

#reboot

Wait a few minutes and once it is up , you can see that the new Kernel is loaded