Download Storage/Database Engine

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Database wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Concurrency control wikipedia , lookup

Clusterpoint wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Engine. A program that performs a core or coordinating function for other programs, or has a special-purpose
function.
A database engine (or storage engine) is the underlying software component that a database management system
(DBMS) uses to create, read, update and delete (CRUD) data from a database. Most database management
systems include their own application programming interface (API) that allows the user to interact with their
underlying engine without going through the user interface of the DBMS.
The term "database engine" is frequently used interchangeably with "database server" or "database management
system".
Many of the modern DBMS support multiple storage engines within the same database. For example, MySQL
supports InnoDB as well as MyISAM.
Some storage engines are transactional.
Name
License Transactional
Aria
GPL
No
BlitzDB GPL
No
Falcon
GPL
Yes
InnoDB GPL
Yes
MyISAM GPL
No
InfiniDB CPL
No
TokuDB GPL
Yes
XtraDB GPL
Yes
Additional engine types include:


Embedded database engines
In-memory database engines
An embedded database system is a database management system (DBMS) which is tightly integrated with an application software that
requires access to stored data, such that the database system is "hidden" from the application’s end-user and requires little or no ongoing
maintenance. It is actually a broad technology category that includes database systems with differing application programming interfaces (SQL
as well as proprietary, native APIs); database architectures (client-server and in-process); storage modes (on-disk, in-memory and combined);
database models (relational, object-oriented, entity–attribute–value model and network/CODASYL); and target markets. The term embedded
database can be confusing because only a small subset of embedded database products is used in real-time embedded systems such as
telecommunications switches and consumer electronics devices.
An in-memory database (IMDB; also main memory database system or MMDB or memory resident database) is a database
management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems
which employ a disk storage mechanism. Main memory databases are faster than disk-optimized databases since the internal optimization
algorithms are simpler and execute fewer CPU instructions. Accessing data in memory eliminates seek time when querying the data, which
provides faster and more predictable performance than disk.
In applications where response time is critical, such as telecommunications network equipment and mobile advertising networks, main memory
databases are often used. IMDBs have gained a lot of traction, especially in the data analytics space, starting mid-2000s mainly due to cheaper
RAM.
With the introduction of NVDIMM technology, in-memory databases will now be able to run at full speed and maintain data in the event of power
failure. A Non-Volatile DIMM (NVDIMM), is a computer memory DRAM DIMM module that retains data even when electrical power is removed
either from an unexpected power loss, system crash or from a normal system shutdown. NVDIMMs are primarily used to improve application
performance,[1] data security, system crash recovery time and to enhance SSD endurance and reliability. NVDIMM’s have evolved from the BBU
(Battery Backed Up) DIMM which used a backup battery to sustain power to the volatile DIMM for up to 72 hours; now, to a complete subsystem
that includes multiple memory technologies and uses supercapacitors as a power source. The BBU DIMM was originally designed for RAID HBAs
(Host Bus Adaptors) or systems in order to preserve the contents of the volatile DRAM memory used for cache. However, in computer based
applications, batteries have several maintenance and environmental concerns because of their limited life span and hazardous disposal.
Furthermore, batteries contain heavy metals which violate RoHS compliance. The NVDIMM were conceived to eliminate the problems associated
with batteries in BBU DRAM memory modules. As of November 2012, most NVDIMMs use NAND FLASH to preserve data,[7] and have moved
beyond RAID applications into fast storage appliances or in-memory processing for the data center and Cloud Computing
InnoDB. A transaction safe storage engine developed by Innobase Oy (an Oracle Company now). It the default
storage engine for MySQL as of MySQL 5.5. It provides the standard ACID-compliant transaction features, along
with foreign key support (Declarative Referential Integrity). It is included as standard in most binaries distributed by
MySQL AB, the exception being some OEM versions. InnoDB became a product of Oracle Corporation after its
acquisition of Innobase Oy in October 2005.[1] The software is dual licensed; it is distributed under the GNU
General Public License, but can also be licensed to parties wishing to combine InnoDB in proprietary software.
MariaDB and Percona Server use a fork of InnoDB called XtraDB by default. XtraDB is mantained by Percona.
Oracle InnoDB's changes are regularly imported into XtraDB, and some bug fixes and extra features are added.
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions
are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.
For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting
one account and crediting another, is a single transaction. The chosen initials refer to the acid test.

In database systems, atomicity (or atomicness; from Greek a-tomos, undividable) is one of the ACID
transaction properties. In an atomic transaction, a series of database operations either all occur, or nothing
occurs. A guarantee of atomicity prevents updates to the database occurring only partially, which can cause
greater problems than rejecting the whole series outright. In other words, atomicity means indivisibility and
irreducibility.
 In database systems, a consistent transaction is one that does not violate any integrity constraints during its
execution. If a transaction leaves the database in an illegal state, it is aborted and an error is reported.
Consistency is one of the ACID properties that ensures that any changes to values in an instance are
consistent with changes to other values in the same instance. A consistency constraint is a predicate on data
which serves as a precondition, post-condition, and transformation condition on any transaction. The
database management system (DBMS) assumes that the consistency holds for each transaction in instances.
On the other hand, ensuring this property of the transaction is the responsibility of the user.
 In database systems, isolation is a property that defines how/when the changes made by one operation
become visible to other concurrent operations. Isolation is one of the ACID (Atomicity, Consistency, Isolation,
Durability) properties.
 In database systems, durability is the ACID property which guarantees that transactions that have committed
will survive permanently. For example, if a flight booking reports that a seat has successfully been booked,
then the seat will remain booked even if the system crashes. Durability can be achieved by flushing the
transaction's log records to non-volatile storage before acknowledging commitment.
(Source: WikiPedia)
Collate (Transact-SQL). A clause that can be applied to a database definition or a column definition to define the
collation, or to a character string expression to apply a collation cast.
What is the best collation to use for MySQL (with PHP)?
The main difference is sorting accuracy (when comparing characters in the language) and performance. The only
special one is utf8_bin which is for comparing characters in binary format. utf8_general_ci is somewhat faster than
utf8_unicode_ci, but less accurate (for sorting). The specific language utf8 encoding (such as utf8_swedish_ci)
contain additional language rules that make them the most accurate to sort for those languages. Most of the time I
use utf8_unicode_ci (I prefer accuracy to small performance improvements), unless I have a good reason to prefer a
specific language. You can read more on specific unicode character sets on the MySQL manual http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
(Source: stackoverflow.com)
Collate (Compare Information)
Database Character Set and Collation (Source: www.informit.com)
Every database has a database character set and a database collation, which may not be null. The CREATE
DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and
collation:
CREATE DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]
ALTER DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]
Example:
CREATE DATABASE db_name
DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;
MySQL chooses the database character set and database collation thus:
If both CHARACTER SET X and COLLATE Y were specified, then character set X and collation Y.
If CHARACTER SET X was specified without COLLATE, then character set X and its default collation.
Otherwise, the server character set and server collation.
MySQL's CREATE DATABASE ... DEFAULT CHARACTER SET ... syntax is analogous to the standard SQL CREATE
SCHEMA ... CHARACTER SET ... syntax. Because of this, it is possible to create databases with different character sets
and collations on the same MySQL server.
The database character set and collation are used as default values if the table character set and collation are not
specified in CREATE TABLE statements. They have no other purpose.
The character set and collation for the default database are available as the values of the character_set_database
and collation_database system variables. The server sets these variables whenever the default database changes. If
there is no default database, the variables have the same value as the corresponding server-level variables,
character_set_server and collation_server.
Table Character Set and Collation
Every table has a table character set and a table collation, which may not be null. The CREATE TABLE and ALTER
TABLE statements have optional clauses for specifying the table character set and collation:
CREATE TABLE tbl_name (column_list)
[DEFAULT CHARACTER SET charset_name [COLLATE collation_name]]
ALTER TABLE tbl_name
[DEFAULT CHARACTER SET charset_name] [COLLATE collation_name]
Example:
CREATE TABLE t1 ( ... )
DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
MySQL chooses the table character set and collation thus:
If both CHARACTER SET X and COLLATE Y were specified, then character set X and collation Y.
If CHARACTER SET X was specified without COLLATE, then character set X and its default collation.
Otherwise, the database character set and collation.
The table character set and collation are used as default values if the column character set and collation are not
specified in individual column definitions. The table character set and collation are MySQL extensions; there are no
such things in standard SQL.
Column Character Set and Collation
Every "character" column (that is, a column of type CHAR, VARCHAR, or TEXT) has a column character set and a
column collation, which may not be null. Column definition syntax has optional clauses for specifying the column
character set and collation:
col_name {CHAR | VARCHAR | TEXT} (col_length)
[CHARACTER SET charset_name [COLLATE collation_name]]
Example:
CREATE TABLE Table1
(
column1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_german1_ci
);
MySQL chooses the column character set and collation thus:
If both CHARACTER SET X and COLLATE Y were specified, then character set X and collation Y.
If CHARACTER SET X was specified without COLLATE, then character set X and its default collation.
Otherwise, the table character set and collation.
The CHARACTER SET and COLLATE clauses are standard SQL.
Examples of Character Set and Collation Assignment
The following examples show how MySQL determines default character set and collation values.
Example 1: Table + Column Definition
CREATE TABLE t1
(
c1 CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci
) DEFAULT CHARACTER SET latin2 COLLATE latin2_bin;
Here we have a column with a latin1 character set and a latin1_german1_ci collation. The definition is explicit, so
that's straightforward. Notice that there's no problem storing a latin1 column in a latin2 table.
Example 2: Table + Column Definition
CREATE TABLE t1
(
c1 CHAR(10) CHARACTER SET latin1
) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
This time we have a column with a latin1 character set and a default collation. Now, although it might seem natural,
the default collation is not taken from the table level. Instead, because the default collation for latin1 is always
latin1_swedish_ci, column c1 will have a collation of latin1_swedish_ci (not latin1_danish_ci).
Example 3: Table + Column Definition
CREATE TABLE t1
(
c1 CHAR(10)
) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
We have a column with a default character set and a default collation. In this circumstance, MySQL looks up to the
table level for inspiration in determining the column character set and collation. So, the character set for column c1 is
latin1 and its collation is latin1_danish_ci.
Example 4: Database + Table + Column Definition
CREATE DATABASE d1
DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci;
USE d1;
CREATE TABLE t1
(
c1 CHAR(10)
);
We create a column without specifying its character set and collation. We're also not specifying a character set and
a collation at the table level. In this circumstance, MySQL looks up to the database level for inspiration. (The
database's settings become the table's settings, and thereafter become the column's setting.) So, the character set for
column c1 is latin2 and its collation is latin2_czech_ci.
Connection Character Sets and Collations
Several character set and collation system variables relate to a client's interaction with the server. Some of these
have already been mentioned in earlier sections:
The server character set and collation are available as the values of the character_set_server and collation_server
variables.
The character set and collation of the default database are available as the values of the character_set_database
and collation_database variables.
Additional character set and collation variables are involved in handling traffic for the connection between a client
and the server. Every client has connection-related character set and collation variables.
Consider what a "connection" is: It's what you make when you connect to the server. The client sends SQL statements,
such as queries, over the connection to the server. The server sends responses, such as result sets, over the connection
back to the client. This leads to several questions about character set and collation handling for client connections,
each of which can be answered in terms of system variables:
What character set is the query in when it leaves the client?
The server takes the character_set_client variable to be the character set in which queries are sent by the client.
What character set should the server translate a query to after receiving it?
For this, character_set_connection and collation_connection are used by the server. It converts queries sent by the
client from character_set_client to character_set_ connection (except for string literals that have an introducer such as
_latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with
column values, it does not matter because columns have a higher collation precedence.
What character set should the server translate to before shipping result sets or error messages back to the client?
The character_set_results variable indicates the character set in which the server returns query results to the client.
This includes result data such as column values, and result metadata such as column names.
You can fine-tune the settings for these variables, or you can depend on the defaults (in which case, you can skip this
section).
There are two statements that affect the connection character sets:
SET NAMES 'charset_name'
SET CHARACTER SET charset_name
SET NAMES indicates what is in the SQL statements that the client sends. Thus, SET NAMES 'cp1251' tells the server
"future incoming messages from this client will be in character set cp1251." It also specifies the character set for
results that the server sends back to the client. (For example, it indicates what character set column values will have if
you use a SELECT statement.)
A SET NAMES 'x' statement is equivalent to these three statements:
mysql> SET character_set_client = x;
mysql> SET character_set_results = x;
mysql> SET character_set_connection = x;
Setting character_set_connection to x also sets collation_connection to the default collation for x.
SET CHARACTER SET is similar but sets the connection character set and collation to be those of the default
database. A SET CHARACTER SET x statement is equivalent to these three statements:
mysql> SET character_set_client = x;
mysql> SET character_set_results = x;
mysql> SET collation_connection = @@collation_database;
When a client connects, it sends to the server the name of the character set that it wants to use. The server sets the
character_set_client, character_set_results, and character_set_connection variables to that character set. (In effect,
the server performs a SET NAMES operation using the character set.)
With the mysql client, it is not necessary to execute SET NAMES every time you start up if you want to use a
character set different from the default. You can add the --default-character-set option setting to your mysql
statement line, or in your option file. For example, the following option file setting changes the three character set
variables set to koi8r each time you run mysql:
[mysql]
default-character-set=koi8r
Example: Suppose that column1 is defined as CHAR(5) CHARACTER SET latin2. If you do not say SET NAMES or SET
CHARACTER SET, then for SELECT column1 FROM t, the server will send back all the values for column1 using the
character set that the client specified when it connected. On the other hand, if you say SET NAMES 'latin1' or SET
CHARACTER SET latin1, then just before sending results back, the server will convert the latin2 values to latin1.
Conversion may be lossy if there are characters that are not in both character sets.
If you do not want the server to perform any conversion, set character_set_results to NULL:
mysql> SET character_set_results = NULL;
Character String Literal Character Set and Collation
Every character string literal has a character set and a collation, which may not be null.
A character string literal may have an optional character set introducer and COLLATE clause:
[_charset_name]'string' [COLLATE collation_name]
Examples:
SELECT 'string';
SELECT _latin1'string';
SELECT _latin1'string' COLLATE latin1_danish_ci;
For the simple statement SELECT 'string', the string has the character set and collation defined by the
character_set_connection and collation_connection system variables.
The _charset_name expression is formally called an introducer. It tells the parser, "the string that is about to follow is
in character set X." Because this has confused people in the past, we emphasize that an introducer does not cause
any conversion, it is strictly a signal that does not change the string's value. An introducer is also legal before
standard hex literal and numeric hex literal notation (x'literal' and 0xnnnn), and before ? (parameter substitution
when using prepared statements within a programming language interface).
Examples:
SELECT _latin1 x'AABBCC';
SELECT _latin1 0xAABBCC;
SELECT _latin1 ?;
MySQL determines a literal's character set and collation thus:
If both _X and COLLATE Y were specified, then character set X and collation Y
If _X is specified but COLLATE is not specified, then character set X and its default collation
Otherwise, the character set and collation given by the character_set_connection and collation_connection system
variables
Examples:
A string with latin1 character set and latin1_german1_ci collation:
SELECT _latin1'Müller' COLLATE latin1_german1_ci;
A string with latin1 character set and its default collation (that is, latin1_swedish_ci):
SELECT _latin1'Müller';
A string with the connection default character set and collation:
SELECT 'Müller';
Character set introducers and the COLLATE clause are implemented according to standard SQL specifications.
Using COLLATE in SQL Statements
With the COLLATE clause, you can override whatever the default collation is for a comparison. COLLATE may be
used in various parts of SQL statements. Here are some examples:
With ORDER BY:
SELECT k
FROM t1
ORDER BY k COLLATE latin1_german2_ci;
With AS:
SELECT k COLLATE latin1_german2_ci AS k1
FROM t1
ORDER BY k1;
With GROUP BY:
SELECT k
FROM t1
GROUP BY k COLLATE latin1_german2_ci;
With aggregate functions:
SELECT MAX(k COLLATE latin1_german2_ci)
FROM t1;
With DISTINCT:
SELECT DISTINCT k COLLATE latin1_german2_ci
FROM t1;
With WHERE:
SELECT *
FROM t1
WHERE _latin1 'Müller' COLLATE latin1_german2_ci = k;
With HAVING:
SELECT k
FROM t1
GROUP BY k
HAVING k = _latin1 'Müller' COLLATE latin1_german2_ci;
COLLATE Clause Precedence
The COLLATE clause has high precedence (higher than ||), so the following two expressions are equivalent:
x || y COLLATE z
x || (y COLLATE z)
3.3.10 BINARY Operator
The BINARY operator is a shorthand for a COLLATE clause. BINARY 'x' is equivalent to 'x' COLLATE y, where y is the
name of the binary collation for the character set of 'x'. Every character set has a binary collation. For example, the
binary collation for the latin1 character set is latin1_bin, so if the column a is of character set latin1, the following
two statements have the same effect:
SELECT * FROM t1 ORDER BY BINARY a;
SELECT * FROM t1 ORDER BY a COLLATE latin1_bin;
Some Special Cases Where the Collation Determination Is Tricky
In the great majority of queries, it is obvious what collation MySQL uses to resolve a comparison operation. For
example, in the following cases, it should be clear that the collation will be "the column collation of column x":
SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;
However, when multiple operands are involved, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should this query use the collation of the column x, or of the string literal 'Y'?
Standard SQL resolves such questions using what used to be called "coercibility" rules. The essence is: Because x and
'Y' both have collations, whose collation takes precedence? It's complex, but the following rules take care of most
situations:
An explicit COLLATE clause has a coercibility of 0. (Not coercible at all.)
A concatenation of two strings with different collations has a coercibility of 1.
A column's collation has a coercibility of 2.
A literal's collation has a coercibility of 3.
Those rules resolve ambiguities thus:
Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then it is an error if the collations aren't the same.
Examples:
column1 = 'A'
Use collation of column1
column1 = 'A' COLLATE x
Use collation of 'A'
column1 COLLATE x = 'A' COLLATE y
Error
The COERCIBILITY() function can be used to determine the coercibility of a string expression:
mysql> SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);
-> 0
mysql> SELECT COERCIBILITY('A');
-> 3
See Section 5.8.3, "Information Functions."
Collations Must Be for the Right Character Set
Recall that each character set has one or more collations, and each collation is associated with one and only one
character set. Therefore, the following statement causes an error message because the latin2_bin collation is not
legal with the latin1 character set:
mysql> SELECT _latin1 'x' COLLATE latin2_bin;
ERROR 1251: COLLATION 'latin2_bin' is not valid
for CHARACTER SET 'latin1'
In some cases, expressions that worked before MySQL 4.1 fail as of MySQL 4.1 if you do not take character set
and collation into account. For example, before 4.1, this statement works as is:
mysql> SELECT SUBSTRING_INDEX(USER(),'@',1);
+-------------------------------+
| SUBSTRING_INDEX(USER(),'@',1) |
+-------------------------------+
| root
|
+-------------------------------+
After an upgrade to MySQL 4.1, the statement fails:
mysql> SELECT SUBSTRING_INDEX(USER(),'@',1);
ERROR 1267 (HY000): Illegal mix of collations
(utf8_general_ci,IMPLICIT) and (latin1_swedish_ci,COERCIBLE)
for operation 'substr_index'
The reason this occurs is that usernames are stored using UTF8 (see Section 3.6, "UTF8 for Metadata"). As a result,
the USER() function and the literal string '@' have different character sets (and thus different collations):
mysql> SELECT COLLATION(USER()), COLLATION('@');
+-------------------+-------------------+
| COLLATION(USER()) | COLLATION('@')
|
+-------------------+-------------------+
| utf8_general_ci | latin1_swedish_ci |
+-------------------+-------------------+
One way to deal with this is to tell MySQL to interpret the literal string as utf8:
mysql> SELECT SUBSTRING_INDEX(USER(),_utf8'@',1);
+------------------------------------+
| SUBSTRING_INDEX(USER(),_utf8'@',1) |
+------------------------------------+
| root
|
+------------------------------------+
Another way is to change the connection character set and collation to utf8. You can do that with SET NAMES 'utf8'
or by setting the character_set_connection and collation_connection system variables directly.
An Example of the Effect of Collation
Suppose that column X in table T has these latin1 column values:
Muffler
Müller
MX Systems
MySQL
And suppose that the column values are retrieved using the following statement:
SELECT X FROM T ORDER BY X COLLATE collation_name;
The resulting order of the values for different collations is shown in this table:
latin1_swedish_ci
latin1_german1_ci
latin1_german2_ci
Muffler
Muffler
Müller
MX Systems
Müller
Muffler
Müller
MX Systems
MX Systems
MySQL
MySQL
MySQL
The table is an example that shows what the effect would be if we used different collations in an ORDER BY clause.
The character that causes the different sort orders in this example is the U with two dots over it, which the Germans
call U-umlaut, but we'll call it U-dieresis.
The first column shows the result of the SELECT using the Swedish/Finnish collating rule, which says that U-dieresis
sorts with Y.
The second column shows the result of the SELECT using the German DIN-1 rule, which says that U-dieresis sorts with
U.
The third column shows the result of the SELECT using the German DIN-2 rule, which says that U-dieresis sorts with UE.
Three different collations, three different results. That's what MySQL is here to handle. By using the appropriate
collation, you can choose the sort order you want.