Tuesday, July 12, 2011

PostgreSQL for the sage – Must know basics for the system administrators

PostgreSQL, or Postgres, is an object-relational database management system (ORDBMS). Unlike MySQL, PostgreSQL is not controlled by any single company; it is a community-developed project. It is the successor to the ‘Ingres’ database project, which is how it got the name post-Ingres, or Postgres.
Postgres is one of the best open-source databases available. It is object-relational and transactional (ACID-compliant), and it supports stored procedures, views and a large set of data types. Some of its other notable features are as follows.
Objects and Inheritance
A database consists of objects, and database administrators can define custom (user-defined) objects for use in tables. Inheritance is another feature: a table can be set to inherit its characteristics from a “parent” table.
Functions
Functions can be used in Postgres. They can be written in Postgres’ own procedural language, ‘PL/pgSQL’, which resembles Oracle’s procedural language ‘PL/SQL’, or in a common scripting language for which a Postgres procedural language exists, such as PL/Perl, plPHP, PL/Python or PL/Ruby. Run the following in the psql client to check whether PL/pgSQL is installed in the current database:
SELECT true FROM pg_catalog.pg_language WHERE lanname = 'plpgsql'; 
To create user-defined functions we use the CREATE OR REPLACE FUNCTION command. Example:
CREATE OR REPLACE FUNCTION fib (
    fib_for integer
) RETURNS integer AS $$
BEGIN
    IF fib_for < 2 THEN
        RETURN fib_for;
    END IF;
    RETURN fib(fib_for - 2) + fib(fib_for - 1);
END;
$$ LANGUAGE plpgsql;
Indexes
An index is a separate structure, built from one or more fields of a table, that lets the database find matching rows without scanning the whole table. It is an optimization technique which speeds up access to records. PostgreSQL supports index types such as B-tree and hash, and user-defined index methods can also be created. An index is created on a table for a particular field, usually one that many queries filter on. As an example, take the table:
CREATE TABLE name (
    id integer,
    fname varchar,
    lname varchar
);
To create an index on the table name for the field id (because many queries on this table look up the first name or last name for a given id), we use:
CREATE INDEX name_id_index ON name (id);
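The effect of the index can be observed by asking the database for its query plan before and after creating it. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in because it needs no running server, the sample rows are made up, and EXPLAIN QUERY PLAN is SQLite's command (PostgreSQL's equivalent is EXPLAIN, and its planner may still choose a sequential scan on a small table).

```python
import sqlite3

# SQLite in-memory database used as a stand-in for a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name (id integer, fname varchar, lname varchar)")
conn.executemany(
    "INSERT INTO name VALUES (?, ?, ?)",
    [(i, f"first{i}", f"last{i}") for i in range(1000)],  # made-up sample rows
)


def query_plan(connection):
    """Return SQLite's plan description for a lookup by id."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT fname, lname FROM name WHERE id = 42"
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet: the table is scanned
conn.execute("CREATE INDEX name_id_index ON name (id)")
plan_after = query_plan(conn)   # the plan now mentions name_id_index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so only the presence of the index name in the second plan should be relied on.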
Triggers
Triggers are functions that run automatically when certain SQL statements modify data in a table. Depending on the kind of modification, we can have multiple triggers in a database. Postgres supports triggers written in PL/pgSQL or its scripting counterparts such as PL/Python. The trigger function must be defined before the trigger can be created, and it must be declared as a function taking no arguments and returning type trigger. The CREATE TRIGGER command is used to declare triggers.
Concurrency
PostgreSQL ensures concurrency with the help of MVCC (Multi-Version Concurrency Control), which gives the database user a “snapshot” of the database, allowing changes to be made without being visible to other users until a transaction is committed.
PostgreSQL’s MVCC keeps all of the versions of the data together in the same partition in the same table. By identifying which rows were added by which transactions, which rows were deleted by which transactions, and which transactions have actually committed, it becomes a straightforward check to see which rows are visible for which transactions.
To accomplish this, each row of a table is stored in PostgreSQL as a tuple. Two fields of each tuple are xmin and xmax. xmin is the ID of the transaction that created the tuple; xmax is the ID of the transaction that deleted it (if any).
Along with the tuples in each table, a record of each transaction and its current state (in progress, committed, aborted) is kept in a universal transaction log.
When data in a table is selected, only those rows that are created and not destroyed are seen. That is, each row’s xmin is observed. If the xmin is a transaction that is in progress or aborted, then the row is invisible. If the xmin is a transaction that has committed, then the xmax is observed. If the xmax is a transaction that is in progress or aborted and not the current transaction, or if there is no xmax at all, then the row is seen. Otherwise, the row is considered as already deleted.
Insertions are straightforward. The transaction that inserts the tuple simply creates it with the xmax blank and the xmin set to its transaction ID. Deletions are also straightforward. The tuple’s xmax is set to the current transaction. Updates are no more than a concurrent insert and delete.
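The visibility rules above can be sketched in a few lines of Python. This is a simplified model of the description in this article, not PostgreSQL's actual implementation: the RowVersion class, the txn_log dictionary and the transaction IDs are all hypothetical, and rows created by the current transaction itself are not treated specially.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Transaction states, as listed in the text.
IN_PROGRESS, COMMITTED, ABORTED = "in progress", "committed", "aborted"


@dataclass
class RowVersion:
    """One stored version of a row (hypothetical model, not the on-disk layout)."""
    data: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that deleted it, if any


def is_visible(row: RowVersion, txn_log: Dict[int, str], current_xid: int) -> bool:
    """Apply the xmin/xmax checks described above."""
    # Creator still in progress or aborted: the row is invisible.
    if txn_log[row.xmin] != COMMITTED:
        return False
    # Never deleted: the row is seen.
    if row.xmax is None:
        return True
    # Deleted by the current transaction: already gone from its point of view.
    if row.xmax == current_xid:
        return False
    # Deleter in progress or aborted: still seen. Deleter committed: deleted.
    return txn_log[row.xmax] != COMMITTED


# Transaction 101 updated a row: it deleted the old version and inserted a new one.
txn_log = {100: COMMITTED, 101: COMMITTED, 102: IN_PROGRESS, 103: ABORTED, 104: IN_PROGRESS}
rows = [
    RowVersion("old price", xmin=100, xmax=101),
    RowVersion("new price", xmin=101),
    RowVersion("pending insert", xmin=102),
    RowVersion("being deleted", xmin=100, xmax=102),
    RowVersion("rolled back delete", xmin=100, xmax=103),
]
visible = [r.data for r in rows if is_visible(r, txn_log, current_xid=104)]
print(visible)
```

From the point of view of transaction 104, the update made by transaction 101 shows only its new version, while the uncommitted work of 102 and the aborted work of 103 have no effect on what is seen.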
Views
A view is a virtual table: it stores no data of its own but is defined by a query over fields from one or more tables, joined or filtered on some criteria. A view can be queried in place of a table. The CREATE VIEW statement is used to create one, e.g.:
CREATE VIEW best_sellers AS
    SELECT * FROM publishers WHERE demand LIKE 'high';
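Because a view stores no rows, its contents always follow the underlying table. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for a PostgreSQL server (the CREATE VIEW statement is the one shown above). The columns of publishers and the sample rows are assumptions made for the example.

```python
import sqlite3

# SQLite in-memory database used as a stand-in for a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table layout and rows, for illustration only.
    CREATE TABLE publishers (name TEXT, demand TEXT);
    INSERT INTO publishers VALUES ('Acme Press', 'high');
    INSERT INTO publishers VALUES ('Birch Books', 'low');

    CREATE VIEW best_sellers AS
        SELECT * FROM publishers WHERE demand LIKE 'high';
""")


def best_seller_names(connection):
    """Query the view exactly as if it were a table."""
    rows = connection.execute("SELECT name FROM best_sellers ORDER BY name")
    return [name for (name,) in rows]


before = best_seller_names(conn)
# Change the base table; the view itself is not touched.
conn.execute("UPDATE publishers SET demand = 'high' WHERE name = 'Birch Books'")
after = best_seller_names(conn)

print(before)
print(after)
```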
Foreign Keys
A foreign key is a field in one table that refers to the primary key of another table; every value in it must match an existing record in the referenced table.
CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);
CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer REFERENCES products (product_no),
    quantity integer
);
Here product_no is the foreign key in the second table, orders. Unlike a primary key, a foreign key field may contain repeated values.
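The behaviour of the two tables above can be tried out without a server. The following sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The PRAGMA line is specific to SQLite, which does not enforce foreign keys unless asked to, and the inserted rows and the try_insert_order helper are made up for the example.

```python
import sqlite3

# SQLite in-memory database used as a stand-in for a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
    CREATE TABLE products (
        product_no integer PRIMARY KEY,
        name text,
        price numeric
    );
    CREATE TABLE orders (
        order_id integer PRIMARY KEY,
        product_no integer REFERENCES products (product_no),
        quantity integer
    );
    -- Made-up sample data: two orders share the same product_no.
    INSERT INTO products VALUES (1, 'widget', 9.99);
    INSERT INTO orders VALUES (100, 1, 5);
    INSERT INTO orders VALUES (101, 1, 2);
""")


def try_insert_order(order_id, product_no, quantity):
    """Return True if the order was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (order_id, product_no, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False


rejected = try_insert_order(102, 42, 1)  # product 42 does not exist
accepted = try_insert_order(103, 1, 1)   # product 1 exists, repeated value is fine

print(rejected, accepted)
```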
Files, Users and Configuration
The main configuration file of Postgres is postgresql.conf, located in the ‘data’ directory. Depending on the installation, this may be under /var/lib (/var/lib/pgsql/data/postgresql.conf) or /usr/local (/usr/local/pgsql/data/postgresql.conf). Temporary changes to the configuration can be made by passing options to the postmaster command.
The init script that starts the postgres service is /etc/init.d/postgresql. The postgres server process is postmaster, which runs a number of child processes concurrently. These processes and the files associated with PostgreSQL are owned by the user/group postgres. The default port used for database connections is 5432.
The user postgres is the PostgreSQL database superuser. We can create any number of additional superusers for the database (with the CREATE ROLE command); however, the default superuser is postgres. The postgres user has the privilege to access all the databases and files in the server. Other users, root for example, get the same access only if they are created in postgres as superusers.
Client Authentication is controlled by the file pg_hba.conf in the data directory, e.g., /var/lib/pgsql/data/pg_hba.conf. (HBA stands for host-based authentication.)
Each record specifies a connection type, a client IP address range (if relevant for the connection type), a database name or names, a user name or names, and the authentication method to be used for connections matching these parameters. A record is typically in one of two forms:
local database user authentication-method [ authentication-option ]
host database user IP-address IP-mask authentication-method [ authentication-option ]
local : This record pertains to connection attempts over Unix domain sockets.
host : This record pertains to connection attempts over TCP/IP networks.
database : Specifies the database that this record applies to. The value all specifies that it applies to all databases, while the value sameuser identifies the database with the same name as the connecting user.
user : Specifies the database user that this record applies to. The value all specifies that it applies to all users.
authentication methods
trust: The connection is allowed unconditionally.
reject: The connection is rejected unconditionally.
password: The client is required to supply a password which is required to match the database password that was set up for the user.
md5: Like the password method, but the password is sent over the wire as an MD5 hash, using a simple challenge-response protocol, rather than in clear text.
ident: This method uses the “Identification Protocol” as described in RFC 1413. It may be used to authenticate TCP/IP or Unix domain socket connections, but its recommended use is for local connections only and not remote connections.
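PostgreSQL checks the records in pg_hba.conf from top to bottom and uses the first one that matches the connection; later records are not consulted. The Python sketch below models that lookup in a simplified way. The sample records, the parse_hba and pick_method helpers, and the choice of CIDR notation (address/prefix) in place of separate IP-address and IP-mask fields are assumptions made for the example, and many real pg_hba.conf features are left out.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical pg_hba.conf content, using CIDR notation for addresses.
SAMPLE_HBA = """\
# type  database  user      address        method
local   all       postgres                 trust
host    mydb      myuser    127.0.0.1/32   md5
host    all       all       10.0.0.0/8     reject
host    all       all       0.0.0.0/0      md5
"""


def parse_hba(text: str) -> List[Dict[str, Optional[str]]]:
    """Split each non-comment line into its fields."""
    records = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue
        if fields[0] == "local":
            conn_type, database, user, method = fields[:4]
            address = None  # local records carry no address
        else:
            conn_type, database, user, address, method = fields[:5]
        records.append({"type": conn_type, "database": database,
                        "user": user, "address": address, "method": method})
    return records


def pick_method(records, conn_type, database, user, client_ip=None):
    """Return the method of the first matching record, or None if none matches."""
    for rec in records:
        if rec["type"] != conn_type:
            continue
        db_ok = (rec["database"] in ("all", database)
                 or (rec["database"] == "sameuser" and database == user))
        if not db_ok or rec["user"] not in ("all", user):
            continue
        if rec["address"] is not None:
            network = ipaddress.ip_network(rec["address"])
            if ipaddress.ip_address(client_ip) not in network:
                continue
        return rec["method"]
    return None  # no record matched: the connection is refused


records = parse_hba(SAMPLE_HBA)
print(pick_method(records, "local", "mydb", "postgres"))
print(pick_method(records, "host", "mydb", "myuser", "127.0.0.1"))
print(pick_method(records, "host", "mydb", "myuser", "10.1.2.3"))
```

In this sample, a client at 10.1.2.3 is rejected by the third record even though the fourth would also match, which is why the order of records in the file matters.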
Front-ends
The minimalistic front-end for PostgreSQL is the psql command-line client. It can be used to enter SQL queries directly, or to execute them from a file. phpPgAdmin is a web-based administration tool for PostgreSQL, written in PHP and based on the popular phpMyAdmin. Likewise, pgAdmin is a graphical administration front-end for PostgreSQL that runs on multiple platforms; its latest stable version is pgAdmin III.
Some administration related commands
Command to log in to the database mydb as user myuser:
psql -d mydb -U myuser
Command to log in to the database mydb as user myuser on a different host myhost:
psql -h myhost -d mydb -U myuser
If the server runs on a different port, we use -p [port number]. Upon entering the psql shell, the prompt shows the name of the database currently in use. In the above example it will show:
mydb=> (if logged in as an ordinary user )
mydb=# (if logged in as a super user like postgres)
Create a PostgreSQL user
There are two ways to create a postgres database user. The only user initially allowed to create users is postgres. So one has to switch to this user before creating other users with varying privileges.
1. Creating the user in the shell prompt, with createuser command.
switch to the postgres user with:
su - postgres
createuser tom
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) y
Shall the new role be allowed to create more new roles? (y/n) n
2. Creating the user in the PSQL prompt, with CREATE USER command.
switch to the postgres user, start psql, and run the command at its prompt:
su - postgres
psql
CREATE USER mary WITH PASSWORD 'marypass';
Creating and deleting a PostgreSQL Database
There are two ways to create databases.
1. Creating database in the PSQL prompt, with CREATE DATABASE command.
CREATE DATABASE db1 WITH OWNER tom;
2. Creating database in the shell prompt, with createdb command.
createdb -O mary db2
To delete an entire database from within the psql prompt, run:
DROP DATABASE db1;
Determining execution time of a query
Turn on timing with
\timing
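Now execute the query while connected to the database db1 (a name written as db1.employees would be read as a table in a schema called db1, not in the database db1):
SELECT * FROM employees;
Time: 0.065 ms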
Calculate PostgreSQL database size on disk
SELECT pg_database_size('db1');
To get the value in human-readable format:
SELECT pg_size_pretty(pg_database_size('db1'));
To calculate the size of a PostgreSQL table on disk:
SELECT pg_size_pretty(pg_total_relation_size('big_table'));
Slash commands used in psql
To list all slash commands and their purpose, log in to psql and issue the command \? . Some of the most commonly used slash commands are the following:
List databases: \l
System tables: \dS
Types: \dT
Functions: \df
Operators: \do
Aggregates: \da
Users: \du
Quit from psql: \q
Connect to a different database db2: \c db2
Describe table/index/view/sequence: \d
The commands below can also be used with a specific table/index/view name to describe just that table/index/view:
Tables: \dt
Indexes: \di
Sequences: \ds
Views: \dv
Useful Bash commands
Bash command to list all the postgresql databases:
psql -l #This can be run as a unix user who is also a super user in postgresql
Bash command to list all the PostgreSQL users, by running a psql slash command from the shell:
psql -c '\du' #-c is used to run an internal or sql command in psql shell
Backing up and restoring databases
To dump the database to an SQL file, use the bash command:
pg_dump mydb > backupdb.out
To restore a database from an SQL backup file (via bash):
psql -d newdb -f backupdb.out
or
psql -f backupdb.out newdb
(here the database newdb must be already created and the file backupdb.out must be present in the current directory)
To take the backup of all the Postgres databases in the server:
pg_dumpall > /var/lib/pgsql/backups/dumpall.sql
(Only possible with the postgres or the database superuser )
Resetting database user’s password
To change the password for a database user (say ‘thomas’):
ALTER USER thomas WITH PASSWORD 'newpassword';
The same command can be used to reset the password of the PostgreSQL superuser postgres. In that case, first enable passwordless login for the postgres user by adding the following line to the top of the file pg_hba.conf in the data directory and reloading the server so that the change takes effect. Once the password is reset, this line can be removed:
local all postgres trust
Next we issue the same command but for the user postgres
ALTER USER postgres WITH PASSWORD 'newpassword';
To create a superuser via bash with multiple privileges:
createuser -sPE mysuperuser
Instead of this we can also use the below psql shell command:
CREATE ROLE mysuperuser2 WITH SUPERUSER CREATEDB CREATEROLE LOGIN ENCRYPTED PASSWORD 'mysuperpass2';
Physical database files in postgres
The directories in data/base are named by the oid (Object Identifier) of the database record in pg_database, like this:
cd /var/lib/pgsql/data/base
ls -l
total 33
drwx------ 22 postgres postgres 4096 Jul 23 20:06 ./
drwx------ 11 postgres postgres 4096 Aug  1 05:59 ../
drwx------  2 postgres postgres 4096 Jun 20 09:32 1/
drwx------  2 postgres postgres 4096 Mar  3 13:36 10792/
drwx------  2 postgres postgres 4096 Jun 20 15:09 10793/
drwx------  2 postgres postgres 4096 May 27 01:40 16497/
drwx------  2 postgres postgres 4096 May 27 01:40 16589/
drwx------  2 postgres postgres 4096 Jun 20 10:28 16702/
drwx------  2 postgres postgres 4096 May 27 01:40 16764/
drwx------  2 postgres postgres 4096 May 27 01:40 16785/
drwx------  2 postgres postgres 4096 Aug  1 04:37 16786/
drwx------  2 postgres postgres 4096 Aug  1 04:36 19992/
drwx------  2 postgres postgres 4096 May 27 01:40 19997/
To obtain the oid, execute the following command in the psql prompt:
postgres=# select oid,datname from pg_database order by oid;
  oid  |         datname
-------+--------------------------
     1 | template1
 10792 | template0
 10793 | postgres
 16497 | gadgetwi_Unable
 16589 | vimusicc_filehost
 16702 | personea_altissimo
 16764 | shopping_businessfinance
 16785 | ansonyi_wp2
 16786 | ansonyi_wp
 19992 | globook_PostgreSQL