Dictionary Definition
database n : an organized body of related information
User Contributed Dictionary
English
Noun
- A collection of (usually) organized information in a regular structure, usually but not necessarily in a machine-readable format accessed by a computer.
- I have a database of all my contacts in my Filofax.
- A software program for storing, retrieving and manipulating a database (sense 1).
- Which database do you use, MySQL or Access?
- A combination of senses 1 and 2.
Translations
collection of information
- Croatian: baza podataka
- Czech: databáze
- Finnish: tietokanta
- German: Datenbank
- Greek: βάση δεδομένων (vási dedoménon)
- Hebrew: בסיס נתונים
- Japanese: データベース
- Portuguese: banco de dados, base de dados
- Russian: база данных (báza dánnykh)
- Sindhi:
- Volapük: nünodem
software program
- Croatian: baza podataka
- Czech: databáze
- Finnish: tietokantaohjelma
- German: Datenbank
- Greek: βάση δεδομένων (vási dedoménon)
- Hebrew: בסיס נתונים
- Japanese: データベース
- Portuguese: base de dados
- Russian: база данных (báza dánnykh)
- Arabic: قاعدة البيانات (plural: قواعد البيانات)
- Dutch: database, databank
- Esperanto: datumbazo
- French: base de données (official term), database (deprecated term)
- Italian: database
- Polish: baza danych
- Spanish: base de datos
- Swedish: databas
Dutch
Noun
database
- database
French
Noun
database
- database
Italian
Noun
database
- database
Extensive Definition
A database is a structured collection of records or data. A computer database relies upon software to organize the storage of data. The software models the database structure in what are known as database models. The model in most common use today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships (see below for an explanation of the various database models).
Database management systems (DBMS) are the software used to organize and maintain the database. These are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.
History
The earliest known use of the term data base was in November 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base. Database as a single word became common in Europe in the early 1970s, and by the end of the decade it was being used in major American newspapers. (The abbreviation DB, however, survives.)
The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity. Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell, later adopted by IBM as the cornerstone of its IMS product. While IMS, along with the CODASYL-based IDMS, was among the big, high-visibility databases developed in the 1960s, several others were also born in that decade, some of which have a significant installed base today. Two worth mentioning are PICK and MUMPS: the former was originally developed as an operating system with an embedded database, and the latter as a programming language and database for the development of healthcare systems.
The relational model was proposed by E. F. Codd in 1970. He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms. For a long while, however, the relational model remained of academic interest only. While CODASYL (network model) products such as IDMS and hierarchical model products such as IMS were conceived as practical engineering solutions that took account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time. Among the first implementations were Michael Stonebraker's Ingres at Berkeley and the System R project at IBM. Both of these were research prototypes, announced during 1976. The first commercial products, Oracle and DB2, did not appear until around 1980. The first successful database product for microcomputers was dBASE for the CP/M and PC-DOS/MS-DOS operating systems.
During the 1980s, research activity focused on distributed database systems and database machines. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.
In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software repositories), and multimedia data. Some of these ideas were adopted by the relational vendors, who integrated new features into their products as a result. The 1990s also saw the spread of open-source databases, such as PostgreSQL and MySQL.
In the 2000s, the fashionable area for innovation is the XML database. As with object databases, this has spawned a new collection of start-up companies, but at the same time the key ideas are being integrated into the established relational products. XML databases aim to remove the traditional divide between documents and data, allowing all of an organization's information resources to be held in one place, whether they are highly structured or not.
Database models
Various techniques are used to model data structure. Most database systems are built around one particular data model, although it is increasingly common for products to offer support for more than one model. For any one logical model, various physical implementations may be possible, and most products will offer the user some level of control in tuning the physical implementation, since the choices that are made have a significant effect on performance. Here are three examples:
Hierarchical model
In a hierarchical model, data is organized into an inverted tree-like structure, with multiple downward links from each node to describe the nesting and a sort field to keep the records in a particular order within each same-level list. This structure arranges the various data elements in a hierarchy and helps to establish logical relationships among data elements of multiple files. Each unit in the model is a record, also known as a node. In such a model, each record on one level can be related to multiple records on the next lower level. A record that has subsidiary records is called a parent, and the subsidiary records are called children. Data elements in this model are well suited to one-to-many relationships with other data elements in the database.
This model is advantageous when the data elements are inherently hierarchical. The disadvantage is that in order to prepare the database it becomes necessary to identify the requisite groups of files that are to be logically integrated. Hence, a hierarchical data model may not always be flexible enough to accommodate the dynamic needs of an organisation.
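To make the parent/child structure concrete, here is a minimal in-memory sketch, assuming a hypothetical `Record` class and example data (a company with departments and staff) that are not part of any real hierarchical DBMS. Each record has exactly one parent, keeps its children ordered by a sort field, and is found by descending from the root.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One node of a hierarchical database: a single parent, ordered children."""
    name: str
    sort_key: int = 0
    children: list["Record"] = field(default_factory=list)

    def add_child(self, child: "Record") -> "Record":
        # Keep each same-level list ordered by its sort field.
        self.children.append(child)
        self.children.sort(key=lambda r: r.sort_key)
        return child


def path_to(root: Record, target: str) -> Optional[list[str]]:
    """Return the record names from the root down to `target`, or None.

    Every record is reached by descending from its parent, so a lookup
    walks the tree from the top."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub = path_to(child, target)
        if sub is not None:
            return [root.name] + sub
    return None


# Hypothetical example data.
company = Record("Company")
sales = company.add_child(Record("Sales", sort_key=2))
engineering = company.add_child(Record("Engineering", sort_key=1))
engineering.add_child(Record("Alice"))
sales.add_child(Record("Bob"))

print([c.name for c in company.children])  # ['Engineering', 'Sales']
print(path_to(company, "Alice"))           # ['Company', 'Engineering', 'Alice']
```

Because a record such as "Alice" can only hang under one parent, a second, independent grouping of the same record would require duplicating it, which is the inflexibility the paragraph above describes.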
Network model
The network model tends to store records with links to other records. Each record in the database can have multiple parents, so the relationships among data elements can be many-to-many. Associations are tracked via "pointers", which can be node numbers or disk addresses. Most network databases also include some form of hierarchical model. Databases can be translated from the hierarchical model to the network model and vice versa. The main difference between the two is that in a network model a child can have a number of parents, whereas in a hierarchical model a child can have only one parent.
The network model offers an advantage over the hierarchical model in that it promotes greater flexibility and data accessibility, since records at a lower level can be accessed without accessing the records above them. This model is more efficient than the hierarchical model, easier to understand, and can be applied to many real-world problems that require routine transactions. Its disadvantages are that designing and developing a network database is a complex process; it has to be refined frequently; it requires that the relationships among all the records be defined before development starts, and changes often demand major programming efforts; and operation and maintenance of the network model are expensive and time-consuming.
Examples of database engines that have network model capabilities are RDM Embedded and RDM Server.
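The multiple-parent idea can be sketched with record IDs standing in for the "node numbers" the text mentions. This is an illustrative in-memory model only, loosely echoing CODASYL's owner/member terminology; the course/student data and the `link` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records keyed by id; the ids play the role of node numbers.
records = {
    1: "Course: Databases",
    2: "Course: Networks",
    10: "Student: Ana",
    11: "Student: Ben",
}
members = defaultdict(list)  # owner id -> member ids (downward pointers)
owners = defaultdict(list)   # member id -> owner ids (upward pointers)


def link(owner: int, member: int) -> None:
    """Store the pointer in both directions so either side can be navigated."""
    members[owner].append(member)
    owners[member].append(owner)


link(1, 10)
link(1, 11)
link(2, 10)  # Ana now has two parents: a many-to-many relationship

# A lower-level record is fetched directly by its id, and its parents are
# found by following its pointers, with no walk down from a root.
ana_courses = [records[o] for o in owners[10]]
print(ana_courses)                          # ['Course: Databases', 'Course: Networks']
print([records[m] for m in members[1]])     # ['Student: Ana', 'Student: Ben']
```

The pointers must be maintained on every change, which hints at why the text describes network databases as costly to design and maintain.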
Relational model
The basic data structure of the relational model is a table, where information about a particular entity (say, an employee) is represented in columns and rows. The columns enumerate the various attributes of an entity (e.g. employee_name, address, phone_number). Rows (also called records) represent instances of an entity (e.g. specific employees).
The "relation" in "relational database" comes from the mathematical notion of relations from the field of set theory. A relation is a set of tuples, so rows are sometimes called tuples. All tables in a relational database adhere to three basic rules.
- The ordering of columns is immaterial
- Identical rows are not allowed in a table
- Each row has a single value for each of its columns (each attribute value in a tuple is atomic).
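The set-theoretic definition behind these rules can be sketched directly in Python: a relation as a set of tuples, where each tuple is itself a set of (attribute, value) pairs. This is a sketch of the mathematical notion only, not of how any particular DBMS stores rows; the `row` helper and the employee data are hypothetical.

```python
def row(**attrs: str) -> frozenset:
    """A tuple as a set of (attribute, value) pairs, so column order is irrelevant."""
    return frozenset(attrs.items())


employees = {
    row(employee_name="Ann Lee", address="12 High St", phone_number="555-0100"),
    row(employee_name="Raj Patel", address="3 Mill Rd", phone_number="555-0101"),
    # The same values with the columns listed in a different order:
    # this is the same tuple, so the set keeps only one copy.
    row(phone_number="555-0100", address="12 High St", employee_name="Ann Lee"),
}
print(len(employees))  # 2
```

Column order is immaterial because attributes are identified by name, identical rows collapse because a set holds each element once, and each value here is a single atomic string.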
If the same value occurs in two different records (from the same table or different tables), it can imply a relationship between those records. Relationships between records are often categorized by their cardinality: one-to-one (1:1), one-to-many (1:M) or many-to-many (M:M).
Tables can have a designated column or set of columns that act as a "key" to select rows from that table with the same or similar key values. A "primary key" is a key that has a unique value for each row in the table. Keys are commonly used to join or combine data from two or more tables. For example, an employee table may contain a column named address which contains a value that matches the key of an address table. Keys are also critical in the creation of indexes, which facilitate fast retrieval of data from large tables. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.
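The employee/address example above can be tried with Python's standard `sqlite3` module. The table and column names follow the text; the sample rows and the index name are hypothetical. The sketch shows a primary key rejecting a duplicate value, an index on a key column, and a join on matching key values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,   -- unique value for each row
        street     TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_name TEXT PRIMARY KEY,
        -- declared link to address; SQLite enforces it only when
        -- PRAGMA foreign_keys = ON is set
        address       INTEGER REFERENCES address(address_id)
    );
    INSERT INTO address VALUES (1, '12 High St'), (2, '3 Mill Rd');
    INSERT INTO employee VALUES ('Ann Lee', 1), ('Raj Patel', 2), ('Li Wei', 1);
    -- An index on the key column speeds up lookups and joins on it.
    CREATE INDEX idx_employee_address ON employee(address);
""")

# Join: combine rows whose employee.address matches address.address_id.
rows = conn.execute("""
    SELECT e.employee_name, a.street
    FROM employee AS e JOIN address AS a ON e.address = a.address_id
    ORDER BY e.employee_name
""").fetchall()
print(rows)  # [('Ann Lee', '12 High St'), ('Li Wei', '12 High St'), ('Raj Patel', '3 Mill Rd')]

# The primary key rejects a second row with an existing key value.
duplicate_rejected = False
try:
    conn.execute("INSERT INTO address VALUES (1, 'Duplicate St')")
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```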
Relational operations
Users (or programs) request data from a relational database by sending it a query written in a special language, usually a dialect of SQL. Although SQL was originally intended for end users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web applications, such as Wikipedia, perform SQL queries when generating pages.
In response to a query, the database returns a result set, which is the list of rows constituting the answer. The simplest query just returns all the rows from a table, but more often the rows are filtered in some way to return just the answer wanted. Often, data from multiple tables are combined into one by doing a join. There are a number of relational operations in addition to join.
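A small sketch of result sets, again using Python's standard `sqlite3` module with a hypothetical employee table: the first query returns every row, the second filters rows (selection) and keeps only one column (projection).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_name TEXT, city TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann Lee", "Leeds", 41000), ("Raj Patel", "York", 38000), ("Li Wei", "Leeds", 45000)],
)

# Simplest query: every row of the table is in the result set.
everyone = conn.execute("SELECT * FROM employee").fetchall()

# More typical: filter the rows and keep only the columns needed,
# so the result set holds just the answer wanted.
leeds = conn.execute(
    "SELECT employee_name FROM employee WHERE city = ? ORDER BY employee_name",
    ("Leeds",),
).fetchall()

print(len(everyone), leeds)  # 3 [('Ann Lee',), ('Li Wei',)]
```

The `?` placeholder passes the filter value separately from the SQL text, which is how programs typically embed queries rather than pasting values into the string.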
Normal forms
Relations are classified based upon the types of anomalies to which they are vulnerable. A database in first normal form is vulnerable to all types of anomalies, while a database in domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature: the lowest level is first normal form, and a database cannot meet the requirements of a higher normal form without first meeting all the requirements of the lower ones.
Database Management Systems
Relational database management systems
An RDBMS implements the features of the relational model outlined above. In this context, Date's Information Principle states: "The entire information content of the database is represented in one and only one way, namely as explicit values in column positions (attributes) and rows in relations (tuples)." Therefore, there are no explicit pointers between related tables.
Post-relational database models
Several products have been identified as post-relational because their data model incorporates relations but is not constrained by the Information Principle, which requires that all information be represented by data values in relations. Products using a post-relational data model typically employ a model that actually predates the relational model. These might be described as a directed graph with trees on the nodes.
Object database models
In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
A variety of approaches have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages cannot find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.
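Python's standard `shelve` module gives a small taste of the "application programming end" approach: ordinary objects are stored under keys in their program representation, with no mapping to rows and columns. It is a persistence mechanism, not an object database, and the part records below are hypothetical. Note that finding objects by content has to be hand-written as a scan, which is the missing query capability the text mentions.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "objects")

    with shelve.open(path) as store:
        # Objects are saved under a key as-is (shelve pickles them).
        store["part-1"] = {"name": "bolt", "weight_g": 5, "suppliers": ["Acme"]}
        store["part-2"] = {"name": "gear", "weight_g": 120, "suppliers": ["Acme", "Zenith"]}

    # Reopen later: the objects come back as the same Python structures.
    with shelve.open(path) as store:
        gear = store["part-2"]
        # No query language: a content-based lookup is a full scan by hand.
        heavy = sorted(k for k in store if store[k]["weight_g"] > 100)

print(gear["suppliers"], heavy)  # ['Acme', 'Zenith'] ['part-2']
```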
DBMS internals
Storage and physical database design
Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages. The most commonly used are B+ trees and ISAM.
Other important design choices relate to the clustering of data by category (such as grouping data by month or location), creating pre-computed views known as materialized views, and partitioning data by range or hash. Memory management and storage topology can also be important design choices for database designers. Whereas normalization is used to reduce storage requirements and improve the extensibility of the database, denormalization is often used to reduce join complexity and query execution time.
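The two partitioning schemes named above can be sketched as functions that map a row to a partition. The order data and the partition count of 4 are hypothetical; `zlib.crc32` is used rather than Python's built-in `hash()` because string hashes are randomized between runs and would not give stable placement.

```python
import zlib
from datetime import date

# Hypothetical orders: (order_id, order_date).
orders = [
    ("A-17", date(2008, 1, 14)),
    ("B-02", date(2008, 1, 30)),
    ("C-33", date(2008, 2, 3)),
    ("D-41", date(2008, 3, 9)),
]


def range_partition(order_date: date) -> str:
    # Range partitioning by month: a query for one month reads one partition.
    return f"{order_date.year}-{order_date.month:02d}"


def hash_partition(order_id: str, partitions: int = 4) -> int:
    # Hash partitioning: spreads rows across partitions regardless of value order.
    return zlib.crc32(order_id.encode("utf-8")) % partitions


by_month = {}
for order_id, order_date in orders:
    by_month.setdefault(range_partition(order_date), []).append(order_id)

by_hash = {}
for order_id, _ in orders:
    by_hash.setdefault(hash_partition(order_id), []).append(order_id)

print(by_month)  # {'2008-01': ['A-17', 'B-02'], '2008-02': ['C-33'], '2008-03': ['D-41']}
print(by_hash)
```

Range partitioning suits queries over contiguous values such as dates; hash partitioning spreads load more evenly but scatters a date range across every partition.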
Indexing
All of these databases can take advantage of indexing to increase their speed, and this technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of a particular table column, with pointers to the row associated with each value. An index allows a set of table rows matching some criterion to be located quickly. Indexes are typically stored in the same kinds of data structure mentioned above (such as B-trees and hashes), or in others such as linked lists. Usually, the database designer chooses a specific technique to suit the particular kind of index required.
Relational DBMSs have the advantage that indexes can be created or dropped without changing the existing applications that use them. The database chooses between many different strategies based on which one it estimates will run the fastest. In other words, indexes are transparent to the application or end user querying the database: while they affect performance, any SQL statement computes the same result with or without an index.
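Index transparency can be observed with SQLite through Python's `sqlite3` module: the same query returns the same rows before and after an index is created, while SQLite's `EXPLAIN QUERY PLAN` shows the strategy changing from a table scan to an index search. The table, data and index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(f"person{i}", f"city{i % 50}") for i in range(1000)],
)

QUERY = "SELECT name FROM people WHERE city = ? ORDER BY name"


def plan(sql: str) -> str:
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the description.
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql, ("city7",)))


before_rows = conn.execute(QUERY, ("city7",)).fetchall()
before_plan = plan(QUERY)  # a full scan of the table

conn.execute("CREATE INDEX idx_people_city ON people(city)")
after_rows = conn.execute(QUERY, ("city7",)).fetchall()
after_plan = plan(QUERY)   # a search using idx_people_city

# Same answer either way; only the execution strategy changed.
print(before_rows == after_rows)  # True
print(before_plan)
print(after_plan)
```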
The RDBMS produces a plan for executing the query, generated by estimating the costs of the different algorithms available and selecting the cheapest. Some of the key algorithms for joins are nested loop join, sort-merge join and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
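Two of the join algorithms named above can be sketched over plain Python lists; the employee and department data are hypothetical. Both return the same rows, but the nested loop join compares every pair of rows, while the hash join builds a hash table on one input and probes it once per row of the other.

```python
from collections import defaultdict

employees = [("Ann", 10), ("Raj", 20), ("Li", 10), ("Zoe", 30)]  # (name, dept_id)
departments = [(10, "Sales"), (20, "Research")]                   # (dept_id, dept_name)


def nested_loop_join(left, right):
    # Compare every left row with every right row: O(len(left) * len(right)).
    return [(name, dname) for name, d1 in left for d2, dname in right if d1 == d2]


def hash_join(left, right):
    # Build phase: hash one input on the join key.
    table = defaultdict(list)
    for dept_id, dname in right:
        table[dept_id].append(dname)
    # Probe phase: one hash lookup per left row, roughly O(len(left) + len(right)).
    return [(name, dname) for name, dept_id in left for dname in table.get(dept_id, [])]


print(nested_loop_join(employees, departments))
# [('Ann', 'Sales'), ('Raj', 'Research'), ('Li', 'Sales')]
print(hash_join(employees, departments) == nested_loop_join(employees, departments))  # True
```

A real optimizer weighs these costs against table sizes and available indexes; for very small inputs the nested loop can be the cheaper choice.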
An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It therefore depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)
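This trade-off can be seen in a toy index: a sorted list of (value, row id) pairs over a hypothetical people table, maintained with Python's standard `bisect` module. Every insert pays extra to keep the index sorted, and in return a lookup uses binary search instead of scanning every row.

```python
import bisect

rows = []       # the "table": (name, age); list position serves as the row id
age_index = []  # the index: sorted (age, row_id) pairs


def insert(name: str, age: int) -> None:
    rows.append((name, age))
    # Write cost: the index must be kept sorted on every insert.
    bisect.insort(age_index, (age, len(rows) - 1))


def find_by_age(age: int) -> list[str]:
    # Read benefit: binary search locates the matching range of the index
    # instead of examining every row of the table.
    lo = bisect.bisect_left(age_index, (age, -1))
    hi = bisect.bisect_right(age_index, (age, len(rows)))
    return [rows[row_id][0] for _, row_id in age_index[lo:hi]]


for name, age in [("Ann", 34), ("Raj", 29), ("Li", 34), ("Zoe", 41)]:
    insert(name, age)

print(find_by_age(34))  # ['Ann', 'Li']
```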
A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.
Transactions and concurrency
In addition to their data model, most practical databases ("transactional databases") attempt to enforce database transactions. Ideally, the database software should enforce the ACID rules, summarized here:
- Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
- Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot place the data in a contradictory state.
- Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
- Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.
In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.
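Atomicity and consistency can be demonstrated with SQLite through Python's `sqlite3` module. In this sketch the account table, the `transfer` helper and the balances are hypothetical; a CHECK constraint stands in for the database's consistency rules. Using the connection as a context manager commits on success and rolls back if an exception escapes, so a failed transfer leaves neither update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            # Violates the CHECK constraint if src would go negative.
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


print(transfer("alice", "bob", 30))   # True
print(transfer("bob", "alice", 500))  # False: the credit to alice is rolled back too
print(dict(conn.execute("SELECT name, balance FROM account")))  # {'alice': 70, 'bob': 50}
```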
Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
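One common concurrency-control technique is locking. The sketch below uses Python threads as stand-ins for concurrent transactions and one `threading.Lock` per hypothetical "row". Holding the row's lock across the read-modify-write prevents the lost-update problem; it illustrates only that one property, not full serializability or recovery.

```python
import threading

balance = {"acct-1": 0}                                 # hypothetical shared "row"
row_locks = {key: threading.Lock() for key in balance}  # one lock per row


def deposit_many(key: str, times: int) -> None:
    for _ in range(times):
        # Read and write happen while holding the row's lock, so two
        # concurrent updates cannot interleave and overwrite each other.
        with row_locks[key]:
            current = balance[key]
            balance[key] = current + 1


threads = [threading.Thread(target=deposit_many, args=("acct-1", 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance["acct-1"])  # 40000
```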
Replication
Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve the performance or availability of the whole database system. Common replication concepts include:
- Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves.
- Quorum: The result of read and write requests is calculated by querying a "majority" of replicas.
- Multimaster: Two or more replicas sync each other via a transaction identifier.
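The quorum idea can be sketched with three in-memory replicas, assuming hypothetical settings N = 3, W = 2 and R = 2 (a majority for both). Each write stores a version number on W replicas; because R + W > N, every read quorum overlaps the latest write quorum, and the highest version wins. Replica failure is simulated with an `up` flag; a real system would also learn the current version from a quorum rather than from all live replicas, as this simplification does.

```python
from dataclasses import dataclass
from typing import Optional

N, W, R = 3, 2, 2  # hypothetical: majority quorums, R + W > N


@dataclass
class Replica:
    version: int = 0
    value: Optional[str] = None
    up: bool = True


replicas = [Replica() for _ in range(N)]


def write(value: str) -> bool:
    live = [r for r in replicas if r.up]
    if len(live) < W:
        return False  # cannot assemble a write quorum
    new_version = max(r.version for r in live) + 1  # simplification, see above
    for r in live[:W]:  # the write lands on exactly W replicas
        r.version, r.value = new_version, value
    return True


def read() -> Optional[str]:
    live = [r for r in replicas if r.up]
    if len(live) < R:
        return None  # cannot assemble a read quorum
    # At least one of the R replicas consulted saw the latest write;
    # the highest version number identifies it.
    return max(live[:R], key=lambda r: r.version).value


write("v1")             # stored on replicas 0 and 1
replicas[0].up = False  # replica 0 fails
write("v2")             # stored on replicas 1 and 2
replicas[0].up = True   # replica 0 returns, still holding "v1"

print([r.value for r in replicas])  # ['v1', 'v2', 'v2']
print(read())                       # v2 (read quorum = replicas 0 and 1)
```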
Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.
Security
Database security denotes the system, processes, and procedures that protect a database from unintended activity.
In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. United Kingdom-based organizations holding personal data in electronic format (databases, for example) are required to register with the Information Commissioner. (reference: http://www.ico.gov.uk/)
Locking
Locking is the act of putting a lock (access restriction) on the part of a database that is being modified at a given instant. Such locks can be applied at the row level, or at other levels such as an entire table. This helps maintain the integrity of the data by ensuring that only one user at a time can modify the data. Databases can also be locked for other reasons, such as access restrictions for given levels of user. Databases are also locked for routine maintenance, which prevents changes being made during the maintenance. See IBM for more detail.
Architecture
Depending on the intended use, there are a number of database architectures in use. Many databases use a combination of strategies. Online transaction processing (OLTP) systems often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications like Google's BigTable, or bibliographic database (library catalogue) systems, may use a column-oriented datastore architecture.
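The difference between the two layouts can be sketched with plain Python structures holding the same hypothetical three-column table. In the row layout each record is stored together, which suits OLTP-style access to whole records; in the column layout each attribute is stored together, which suits aggregating one attribute over many records.

```python
# The same table laid out two ways (hypothetical data).
rows = [  # row-oriented: each record stored together
    ("A-17", "Leeds", 120),
    ("B-02", "York", 75),
    ("C-33", "Leeds", 300),
]
columns = {  # column-oriented: each attribute stored together
    "order_id": ["A-17", "B-02", "C-33"],
    "city": ["Leeds", "York", "Leeds"],
    "amount": [120, 75, 300],
}

# OLTP-style access: one whole record, contiguous in the row layout.
print(rows[1])  # ('B-02', 'York', 75)

# Analytic access: aggregate one attribute; the column layout reads only
# the "amount" values and skips the other columns entirely.
print(sum(columns["amount"]))  # 495

# Rebuilding a record from the column layout needs one lookup per column.
print(tuple(columns[c][1] for c in ("order_id", "city", "amount")))  # ('B-02', 'York', 75)
```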
Document-oriented and XML databases, knowledge bases, frame databases and RDF stores (also known as triple stores) may also use a combination of these architectures in their implementation.
Finally, it should be noted that not all databases have or need a database schema (so-called schema-less databases).
Applications of databases
Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common application programming interface (API) to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.
For example, a suppliers database contains data relating to suppliers, such as:
- supplier name
- supplier code
- supplier address
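JDBC and ODBC belong to Java and C; in Python the comparable common interface is the DB-API (PEP 249), which the standard `sqlite3` module implements. The sketch below stores the supplier fields listed above; the table name, the `find_supplier` helper and the sample rows are hypothetical. The helper uses only generic DB-API calls, though placeholder syntax (`?` here) differs between drivers.

```python
import sqlite3


def find_supplier(conn, code: str):
    # Only generic DB-API calls (cursor, execute, fetchone) are used here.
    cur = conn.cursor()
    cur.execute(
        "SELECT supplier_name, supplier_address FROM supplier WHERE supplier_code = ?",
        (code,),
    )
    return cur.fetchone()  # one row as a tuple, or None if no match


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE supplier (
    supplier_code    TEXT PRIMARY KEY,
    supplier_name    TEXT NOT NULL,
    supplier_address TEXT
)""")
conn.executemany("INSERT INTO supplier VALUES (?, ?, ?)", [
    ("S001", "Acme Fasteners", "1 Dock Rd, Hull"),
    ("S002", "Zenith Gears", "9 Forge Ln, Sheffield"),
])

print(find_supplier(conn, "S002"))  # ('Zenith Gears', '9 Forge Ln, Sheffield')
print(find_supplier(conn, "S999"))  # None
```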
Schools also often use databases to keep records of their students and their grades.
Database as Cultural Form
Although originally a computer technology, the database, according to media theorist Lev Manovich, is becoming a new cultural form in its own right and a genre of new media. A cultural form is one of many ways that people represent the world; art and literature are examples. As contemporary culture is gradually computerized, Manovich argues, traditional cultural forms are being replaced with new ones that derive from the computer. He calls this transcoding. The database is the computer age’s key form of cultural expression, as narrative was to the modern age via cinema. In this analysis, he is using "database" metaphorically. See also database cinema.
Katherine Hayles has argued, in response, that narrative and database are not in opposition but rather are natural symbionts.
Links to DBMS products
- 4D
- ADABAS
- Alpha Five
- Apache Derby (Java, also known as IBM Cloudscape and Sun Java DB)
- Berkeley DB
- CouchDB
- CSQL
- Datawasp
- dBASE
- FileMaker
- Firebird (database server)
- H2 (Java)
- HSQLDB (Java)
- IBM DB2
- IBM IMS (Information Management System)
- IBM UniVerse
- Informix
- Ingres
- InterBase
- InterSystems Caché
- MaxDB (formerly SAP DB)
- Microsoft Access
- Microsoft SQL Server
- Model 204
- MySQL
- Nomad
- Objectivity/DB
- OpenLink Virtuoso
- OpenOffice.org Base
- Oracle Database
- Paradox (database)
- Polyhedra DBMS
- PostgreSQL
- Progress 4GL
- RDM Embedded
- ScimoreDB
- SQLite
- Superbase
- Sybase
- Teradata
- Vertica
- Visual FoxPro
References
- Connolly, Thomas, and Carolyn Begg. Database Systems. New York: Harlow, 2002.
- Date, C. J. An Introduction to Database Systems, Eighth Edition, Addison Wesley, 2003.
- Galindo, J., Urrutia, A., Piattini, M., Fuzzy Databases: Modeling, Design and Implementation (FSQL guide). Idea Group Publishing Hershey, USA, 2006.
- Galindo, J., Ed. Handbook on Fuzzy Information Processing in Databases. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008.
- Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
- Kroenke, David M. Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130-144.
- Kroenke, David M., and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
- Lightstone, S., T. Teorey, and T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0-12-369389-6.
- Shih, J. "Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?", white paper, 2007.
- Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
- Tukey, John W. Exploratory Data Analysis. Reading, MA: Addison Wesley, 1977.
External links
- comp.databases.theory (Database Theory Discussion Group)
- Web page about FSQL: References and links about FSQL
- Increase Database Performance
- A wiki about Oracle, in Spanish
- The EM-DAT International Disaster Database
- The CE-DAT Complex Emergency Database
database in Afrikaans: Databasis
database in Arabic: قاعدة بيانات
database in Azerbaijani: Verilənlər bazası
database in Belarusian: База дадзеных
database in Belarusian (Tarashkevitsa): База дадзеных
database in Bosnian: Baza podataka
database in Breton: Stlennvon
database in Bulgarian: База данни
database in Catalan: Base de dades
database in Czech: Databáze
database in Danish: Database
database in German: Datenbank
database in Estonian: Andmebaas
database in Modern Greek (1453-): Βάση δεδομένων
database in Spanish: Base de datos
database in Esperanto: Datumbazo
database in Basque: Datu-base
database in Persian: دادگان
database in French: Base de données
database in Irish: Bunachar sonraí
database in Galician: Base de datos
database in Korean: 데이터베이스
database in Hindi: डेटाबेस
database in Croatian: Baza podataka
database in Indonesian: Basis data
database in Interlingua (International Auxiliary Language Association): Base de datos
database in Icelandic: Gagnagrunnur
database in Italian: Database
database in Hebrew: בסיס נתונים
database in Georgian: მონაცემთა ბაზა
database in Latvian: Datu bāze
database in Lithuanian: Duomenų bazė
database in Hungarian: Adatbázis
database in Malayalam: ഡാറ്റാബേസ്
database in Malay (macrolanguage): Pangkalan data
database in Dutch: Database
database in Japanese: データベース
database in Norwegian: Database
database in Uzbek: Ma'lumotlar Bazasi
database in Polish: Baza danych
database in Portuguese: Banco de dados
database in Romanian: Bază de date
database in Russian: База данных
database in Albanian: Baza e të dhënave
database in Sinhala: දත්ත සමුදාය
database in Simple English: Database
database in Slovak: Databáza
database in Slovenian: Podatkovna baza
database in Serbian: База података
database in Finnish: Tietokanta
database in Swedish: Databas
database in Tagalog: Database
database in Tamil: தரவுத்தளம்
database in Thai: ฐานข้อมูล
database in Vietnamese: Cơ sở dữ liệu
database in Turkish: Veri tabanı
database in Ukrainian: База даних
database in Chinese: 数据库