Tuesday, July 12, 2011

MySQL Storage Engines – an overview, their limitations and an attempt for comparison

Most of us are using Mysql database and majority don’t know how to choose the data base engines, what are the different types of storage engines available in mysql and how they differ from each other. In this article let me give you a brief idea about the Storage Engines and what are the limitations and where to use these various storage engines.
One of the greatest things about MySQL, other than being free, widely supported and fast, is the flexibility of choosing different storage engines for different tables. These storage engines act as handlers for different table types. Thus MySQL storage engines include both those that handle transaction-safe tables and those that handle non-transaction-safe tables along with many others. MySQL does this through their Pluggable Storage Engine Architecture.
MySQL Storage Engines Overview
To determine which storage engines your server supports, we use the SHOW ENGINES statement. The value in the Support column indicates whether an engine can be used. A value of YES, NO, or DEFAULT indicates that an engine is available, not available, or available and currently set as the default storage engine.
*************************** 1. row ***************************
Engine: MyISAM
Support: DEFAULT
Comment: Default engine as of MySQL 3.23 with great performance
*************************** 2. row ***************************
Engine: MEMORY
Support: YES
Comment: Hash based, stored in memory, useful for temporary tables
*************************** 3. row ***************************
Engine: InnoDB
Support: YES
Comment: Supports transactions, row-level locking, and foreign keys
*************************** 4. row ***************************
Engine: BerkeleyDB
Support: NO
Comment: Supports transactions and page-level locking
*************************** 5. row ***************************
Support: YES
Comment: /dev/null storage engine (anything you write to it disappears)
*************************** 6. row ***************************
Support: YES
Comment: Example storage engine
*************************** 7. row ***************************
Support: YES
Comment: Archive storage engine
*************************** 8. row ***************************
Engine: CSV
Support: YES
Comment: CSV storage engine
*************************** 9. row ***************************
Engine: ndbcluster
Support: NO
Comment: Clustered, fault-tolerant, memory-based tables
*************************** 10. row ***************************
Support: YES
Comment: Federated MySQL storage engine
*************************** 11. row ***************************
Support: YES
Comment: Collection of identical MyISAM tables
*************************** 12. row ***************************
Engine: ISAM
Support: NO
Comment: Obsolete storage engine
The listing shows the full list of available database engines.
There are a number of ways you can specify the storage engine to use. The simplest method, if you have prefer a particular engine type that fits most of your database needs then you can set the default engine type within the MySQL configuration file using the following commands(using the option storage_engine or when starting the database server (by supplying the –default-storage-engine or –default-table-type options on the command line).
More flexibility is offered by allowing you to specify the default storage engine to be used MySQL, the most obvious is to specify the engine type when creating the table:
CREATE TABLE mytable (id int, title char(20)) ENGINE = INNODB
You can also alter the storage engine used in an existing table:
However, you should be careful when altering table types in this way, as making a modification to a type that does not support the same indexes, field types or sizes may mean that you lose data. If you specify a storage engine that doesn’t exist in the current database then a table of type MyISAM (the default) is created instead.
Before taking any decision about which engine we need to choose, first we need to think about the different core functionality provided by each engine that allow us to differentiate between them. We can divide up the core functionality into four areas; the supported field and data types, locking types, indexing and transactions. Some engines have unique functionality that can also drive your decision.
Field and Data Types
Although all of the engines support the common data types, i.e., integers, reals and character based storage, not all engines support other field types, particularly the BLOB (binary large object) or TEXT types. Other engines may support only limited character widths and data sizes.
These limitations, while directly affecting the information you store may also have a related effect to the types of searches you perform, or the indexes you create on that information. In turn, these differences can affect the performance and functionality of your application as you may have to make decisions about functionality based on the storage engine choice you make for the type of data you are storing.
Locking within database engines defines how access and updates to information are controlled. When an object in the database is locked for updating, other processes cannot modify (or in some cases read) the data until the update has completed.
Locking not only affects how many different applications can update the information in the database, it can also affect queries on that data. The reason for this is that the queries may be accessing data that may be being altered or updated. In general, such delays are minimal. The bulk of the locking mechanism is devoted to preventing multiple processes updating the same data. Since both additions (INSERT statements) and alterations (UPDATE statements) to the data require locking, you can imagine that multiple applications using the ame database can have a significant impact.
Locks are supported by different storage engines at different object levels, and these levels affect the concurrency of access to the information. Three different levels are supported, table locking, block locking and row locking. Table locking is most commonly supported and is the locking provided in MyISAM. It locks an entire table during an update. This will limit the number of applications that are updating a specific table to just one, and this can affect heavily used multi-user databases because it introduces delays into the update process.
Page level locking is used by the Berkeley DB storage engine and locks data according to the page (8Kb) of information that is being uploaded. When performing updates across a range of locations within the database, the locking is not a problem, but because adding rows involves locking the final 8Kb of the data structure, adding large numbers of rows, particularly of small data, can be a problem.
Row level locking provides the best concurrency; only individual rows within a table are locked, which means that many applications can be updating different rows of the same table without causing a lock situation. Only the InnoDB storage engine supports row level locking.
Indexing can dramatically increase the performance when searching and recovering data from the database. Different storage engines provide different indexing techniques and some may be better suited for the type of data you are storing.
Some storage engines simply do not support indexing at all either because they use the indexing of the underlying tables (in the MERGE engine for example) or because the data storage method does not allow indexing (FEDERATED or BLACKHOLE engines).
Transactions provide data reliability during the update or insert of information by enabling you to add data to the database, but only to commit that data when other conditions and stages in the application execution have completed successfully. For example, when transferring information from one account to another you would use transactions to ensure that both the debit from one account and the credit to the other completed successfully. If either process failed, you could cancel the transaction and the changes would be lost. If the process completed,then we would confirm it by committing the changes.
Storage Engines:
The MyISAM engine is the default engine in most MySQL installations and is a derivative of the original ISAM engine type supported in the early versions of the MySQL system. The engine provides the best combination of performance and functionality, although it lacks transaction capabilities (use the InnoDB or BDB engines) and uses table-level locking.
Unless you need transactions, there are few databases and applications that cannot effectively be stored using the MyISAM engine. However, very high-performance applications where there are large numbers of data inserts/updates compared to the number of reads can cause performance problem for the MyISAM engine. It was originally designed with the idea that more than 90% of the database access to a MyISAM table would be reads, rather than writes.
With table-level locking, a database with a high number of row inserts or updates becomes a performance bottleneck as the table is locked while data is added. Luckily this limitation also works well within the restrictions of a non-transaction database.
1. Crash recovery can be a time-consuming process owing to MyISAM’s lack of a transaction log. Expect to have to perform such a recovery during your peak-usage period. (If it can happen, it almost certainly will.)
2. MyISAM does not support or enforce foreign key constraints.
3. All UPDATE queries to the same table are serialized — that is to say they carried out one at a time — and block all other queries, including SELECTs, from executing. This effect is pronounced on most busy multi-user applications.
4. MyISAM supports concurrent INSERTs only in certain cases.
5. Maximum of 64 indexes per row,
When to use MyISAM
1. Your application demands full-text search capabilities. Rather than push all your data into MyISAM tables to gain full-text searching, it may be viable to split your dataset into data that must be indexed for full-text searching — and stored using MyISAM — and data that should be stored using a transactional engine, such as InnoDB. A scheduled background job may then asynchronously update your MyISAM full-text indexes and provide links from the InnoDB data as appropriate. This is a common example of how to gain the best from all worlds.
2. Your application is effectively single-user — there are very few concurrent queries hitting the MySQL server.
3. You are performing limited testing or development where performance is not under scrutiny.
The MERGE storage engine, also known as the MRG_MyISAM engine, is a collection of identical MyISAM tables that can be used as one. You can then execute queries that return the results from multiple tables as if they were just one table. Each table merged must have the same table definition.
The MERGE table is particularly effective if you are logging data directly or indirectly into a MySQL database and create an individual table per day, week or month and want to be able to produce aggregate queries from multiple tables. There are limitations to this however, you can only merge MyISAM tables and the identical table definition restriction is strictly enforced. Although this seems like a major issue, if you had used one of the other table types (for example InnoDB) then the merge probably wouldn’t be required.
1. You can use only identical MyISAM tables for a MERGE table.
2. You cannot use a number of MyISAM features in MERGE tables. For example, you cannot create FULLTEXT indexes on MERGE tables. (You can, of course, create FULLTEXT indexes on the underlying MyISAM tables, but you cannot search the MERGE table with a full-text search.)
3. If the MERGE table is non-temporary, all underlying MyISAM tables must be non-temporary, too. If the MERGE table is temporary, the MyISAM tables can be any mix of temporary and non-temporary.
4. MERGE tables use more file descriptors. If 10 clients are using a MERGE table that maps to 10 tables, the server uses (10 × 10) + 10 file descriptors. (10 data file descriptors for each of the 10 clients, and 10 index file descriptors shared among the clients.)
5. Key reads are slower.Key reads are slower. When you read a key, the MERGE storage engine needs to issue a read on all underlying tables to check which one most closely matches the given key. To read the next key, the MERGE storage engine needs to search the read buffers to find the next key. Only when one key buffer is used up does the storage engine need to read the next key block.

When to Use MERGE
1. Easily manage a set of log tables. For example, you can put data from different months into separate tables, compress some of them with myisampack, and then create a MERGE table to use them as one.
2. Obtain more speed. You can split a big read-only table based on some criteria, and then put individual tables on different disks. A MERGE table on this could be much faster than using the big table.
3. Perform more efficient searches. If you know exactly what you are looking for, you can search in just one of the split tables for some queries and use a MERGE table for others. You can even have many different MERGE tables that use overlapping sets of tables.
4. Perform more efficient repairs. It is easier to repair individual tables that are mapped to a MERGE table than to repair a single large table.
5. Instantly map many tables as one. A MERGE table need not maintain an index of its own because it uses the indexes of the individual tables. As a result, MERGE table collections are very fast to create or remap. (Note that you must still specify the index definitions when you create a MERGE table, even though no indexes are created.
6. If you have a set of tables from which you create a large table on demand, you should instead create a MERGE table on them on demand. This is much faster and saves a lot of disk space.
7. Exceed the file size limit for the operating system. Each MyISAM table is bound by this limit, but a collection of MyISAM tables is not.
8. You can create an alias or synonym for a MyISAM table by defining a MERGE table that maps to that single table. There should be no really notable performance impact from doing this (only a couple of indirect calls and memcpy() calls for each read).
The MEMORY storage engine (previously known as the HEAP storage engine) stores all data in memory; once the MySQL server has been shut down any information stored in a MEMORY database will have been lost. However, the format of the individual tables is kept and this enables you to create temporary tables that can be used to store information for quick access without having to recreate the tables each time the database server is started.
Long term use of the MEMORY storage engine is not generally a good idea, because the data could so easily be lost. However, providing you have the RAM to support the databases you are working on, use of MEMORY based tables is an efficient way of running complex queries on large data sets and benefitting from the performance gains.
The best way to use MEMORY tables is to use a SELECT statement to select a larger data set from your original, disk-based, tables and then sub-analyse that information for the specific elements you want.
The FEDERATED storage engine (added in MySQL 5.03) enables you to access data from remote MySQL database (other databases may be supported in the future) as if it were a local database. In effect, the MySQL server acts as a proxy to the remote server, using the MySQL client access library to connect to the remote host, execute queries and then reformat the data into the localized format.
In essence, it is a way for a server, rather than a client, to access a remote database and can be an effective way of combining data from multiple hosts or of copying specific data from remote databases into local tables without the use of data exports and imports.
The ARCHIVE storage engine supports only the INSERT and SELECT statements, but does support most of the MySQL field types. Information stored in an ARCHIVE storage engine table is compressed and cannot be modified and so ARCHIVE tables are perfect for storing log data (which you don’t want to be able to change) or information that is no longer in active use (for example, old invoicing or sales data).
While the information is stored very efficient, care should be taken when accessing data stored in the ARCHIVE tables. Because the information is compressed, selects have to read the entire table, and that also means decompressing the information. This can obviously increase the time taken to perform complex searches and retrievals. If you are performing a large number of queries on the information in these tables it may be easier to temporarily copy your data to another, uncompressed, data type such as MyISAM.
The CSV storage engine stores data not in a binary format, but in the form a CSV (Command Separated Values) file. Because of this, there are limitations to the data stored. It is not an efficient method for storing large volumes of data, or larger data types like BLOB, although such types are supported. There is also no indexing. However, because the data is stored in the CSV format it is exceedingly portable; these CSV files generated can easily be imported into many different software packages, including Excel, OpenOffice and database systems like Access or FileMaker.
In general, the CSV engine is impractical as a general database engine. It is, however, probably the most effective and easiest method for data exchange. What makes it so convenient is that we can use SELECT and INSERT statements to create the database, which in turn means that we can easily produce CSV files based on queries of other data.
With some careful work, the CSV storage engine can also be used as an effective way of getting information into MySQL. Here, you can create the tables first, shutdown the MySQL server, copy over CSV files that you have exported from Excel, Access or another database, and you can then import the data and copy it over to MyISAM or InnoDB tables.
The Blackhole storage engine accepts but does not store data and retrievals always return an empty set. The functionality can be used in distributed database design where data is automatically replicated, but not stored locally. Although you can create tables and indexes, all SQL statements that would add or update information to the database are executed without actually writing any data. The database structure is retained, however, and you can create any indexes on the (non-existent) information that you want.
Although this seems like a futile exercise, it does allow you to test out database structures and play with table definitions without actually creating any data. Even more useful, however, is that SQL statements on BLACKHOLE databases are written to the binary log, and therefore are replicated to slave databases.
You can use this functionality to update one or more slaves directly without writing any local data. There are a number of potential uses for this functionality.
The ISAM storage engine was the original engine type available with versions of MySQL up until MySQL 3.23, when the MyISAM storage engine was introduced. ISAM has a number of different limitations that make it impractical as a database engine. These include the storage format, which is native to the platform (and therefore not portable between systems), a maximum table size of just 4GB and limited text searching facilities. Indexes are also more limited. Since MyISAM is supported on the same platforms as ISAM, and provides better compatibility, portability and performance.
ISAM is included for backwards compatibility, you certainly shouldn’t use ISAM for new databases, use MyISAM instead.
Berkeley DB (BDB)
The Berkeley DB (or BDB) engine is based on the technology provided by the Berkeley DB storage system developed by SleepyCat software. BDB is a hash based storage mechanism, and the keys to the hash values are stored very efficiently. This makes the recovery of information–especially when accessed directly using a unique key incredibly quick, and by far the quickest of the available database types. Recovering full records is even quicker if you the data is short enough to be stored with the unique key (i.e., under 1024 bytes long). BDB is also one of only two types of storage engine that support transactions.
BDB is, however, limited in other ways. Although it uses page locking, locking only 8192 bytes of a table, rathter than the entire table, during an update this can cause problems if you are performing a large number of updates in the same page (for example, inserting many rows). There is unfortunately no way round this. Sequential data access–for example a large quantity of rows matching non-indexed data–can be a lot slower because the data needs to be scanned row by row.
Recovery of information with BDB tables can also be a problem. Data in BDB is stored in a combination of the key index, the data file and binary data logs. A loss of data in any of these sections, even just one of the data logs, can make the data in the database totally unrecoverable.
Where BDB shines therefore is in locations where you can access specific blocks of data by a unique key that does not frequently change.
The InnoDB Engine is provided by Innobase Oy and supports all of the database functionality (and more) of MyISAM engine and also adds full transaction capabilities (with full ACID (Atomicity, Consistency, Isolation, and Durability) compliance) and row level locking of data.
The key to the InnoDB system is a database, caching and indexing structure where both indexes and data are cached in memory as well as being stored on disk. This enables very fast recovery, and works even on very large data sets. By supporting row level locking, you can add data to an InnoDB table without the engine locking the table with each insert and this speeds up both the recovery and storage of information in the database.
As with MyISAM, there are few data types that cannot effectively be stored in an InnoDB database. In fact, there are no significant reasons why you shouldn’t always use an InnoDB database. The management overhead for InnoDB is slightly more onerous, and getting the optimization right for the sizes of in-memory and on disk caches and database files can be complex at first. However, it also means that you get more flexibility over these values and once set, the performance benefits can easily outweigh the initial time spent. Alternatively, you can let MySQL manage this automatically for you.
If you are willing (and able) to configure the InnoDB settings for your server, then I would recommend that you spend the time to optimize your server configuration and then use the InnoDB engine as the default.
1. Queries that result in large scans of the tablespace are often slower when using InnoDB.
2. Consistency is only maintained if the underlying operating system and hardware can guarantee buffer flushes. This limitation is inherent in all transactional database management systems.
3. InnoDB tables consume a greater amount of space on-disk than their MyISAM equivalents. This is now largely irrelevant given the ubiquity of large (multiple hundreds of gigabytes) hard disk drives.
When to use InnoDB
1. You are developing an application that requires ACID compliance. At the very least, your application demands the storage layer support the notion of transactions.
2. You require expedient crash recovery. Almost all production sites fall into this category, however MyISAM table recovery times will obviously vary from one usage pattern to the next. To estimate an accurate figure for your environment, try running myisamchk over a many-gigabyte table from your application’s backups on hardware similar to what you have in production. While recovery times of MyISAM tables increase with growth of the table, InnoDB table recovery times remain largely constant throughout the life of the table.
3. Your web site or application is mostly multi-user. The database is having to deal with frequent UPDATEs to a single table and you would like to make better use of your multi-processing hardware.
NDB Cluster
Another well-known storage engine. It allows one to cluster tables. That means you have multiple masters, which all can do inserts, updates and deletes on the same table. NDB has row-level locking, but not full multi-version concurrency control. Due to some architectural limitations of how the MySQL server executes joins, they perform rather poorly with NDB storage engine. For that reason, it is best used for single table primary key lookups. So, you would not likely port your entire web-facing database to NDB Cluster for example.
As you may have been able to conclude from the above summary of the different storage engines available, there are few reasons not to use either the MyISAM or InnoDB engine types. MyISAM will do in most situations, but if you have a high number of updates or inserts compared to your searches and selects then you will get better performance out of the InnoDB engine. To get the best performance out of InnoDB you need to tweak the parameters for your server, otherwise there is no reason not to use it.
The MERGE engine is an exceedingly effective way of querying data from multiple, identically defined, tables. The MEMORY engine is the best way to perform a large number of complex queries on data that would be inefficient to search on a disk based engine. The CSV engine is a great way to export data that could be used in other applications. BDB is excellent for data that has a unique key that is frequently accessed. The following table provides an overview of some storage engines provided with MySQL:
storage engine features