Database

Databases - Design and ManagementWhat is a database?

A database stores electronic data in an organized and accessible manner. The size and scope of databases can vary wildly, from a small database used by an individual to file phone numbers, to an extremely large enterprise database that stores several terabytes of information accessed by tens of thousands of individuals. A database's tables and rows hold data in much the same way as a file cabinet or an address book, but they allow users more flexibility in what kinds of searches they can perform on the data as well as how they want to view the data

What is a DBMS?

A Database Management System (DBMS) is a software system that is used both to create databases and manage the information stored within them. The architecture of the DBMS will frequently determine or limit the possible uses of the databases it creates. Some DBMS's work best for creating single-user databases, while others can build databases that accommodate multiple users in larger corporate environments.

What is a DBA?

A Database Administrator (DBA) is crucial to any organization that has mission-critical information stored in its databases. DBAs are responsible for the design and administration of databases, and for ensuring that the highest level of data integrity is maintained. Database maintenance includes performance tuning, monitoring the logs for errors, performing backup and recovery procedures as well as tasks associated with the data stored in the database, which include data transfer, data replication and data cleansing.

Database Architectures - RDBMS, OODBMS, and ORDBMS

A database consists of one or more tables, each containing data stored as individual records. Different database architectures determine how the tables and records are organized or related to one another. The first database architectures - Hierarchical and Networked - have largely been superseded by the Relational, Object-Oriented and Object-Relational architectures. The Relational Data Model, developed by Todd Codd in 1969, allows multiple tables to be related to one another within a database.

For example, one customer's information could be recorded in separate tables such as "Personal Information", "Marketing Efforts", and "Service Requests". The information stored in these tables will then relate back to the customer's main record. A relational database management system (RDBMS) also offers flexibility in terms of how the customer's data can be viewed.

To access the information stored in relational databases, users can either build queries using the Structured Query Language (SQL), or they can utilize a user interface that translates their requests into SQL and displays the results. While the American National Standards Institute (ANSI) approved an early version of SQL as a standard, many RDBMS's also use customized, proprietary forms of the language. The Object-Oriented database model emerged in the mid-1980s due to the dissatisfaction of some database users with the structural limitations of RDBMS's. The Object-Oriented model defines each piece of data and its associated processes as an individual object. According to the basic tenets of this model, all information about an object is stored in one place instead of being stored across multiple tables, as is done in the relational model.

An Object-Oriented Database Management System (OODBMS) also integrates more easily with applications that have been written with an Object-Oriented programming language such as C++ or Java. Despite the advantages of the Object-Oriented approach, no standard model for the construction of an OODBMS yet exists. For this reason, at least in part, RDBMS's still dominate the database market. One effort to combine the best parts of the RDBMS and OODBMS is the Object-Relational Database Management System (ORDBMS). This model allows developers to incorporate the best parts of an RDBMS and an ODBMS. An ORDBMS works with objects like an ODBMS, but also allows SQL-based querying like an RDBMS. Market Leaders As of June 2001, the Oracle Corporation controlled 33.8% of the database market, IBM held 30.1% and Microsoft had 14.9%.

While Oracle once dominated the market, the allegiance has been shifting as major enterprise application vendors such as SAP, PeopleSoft and Siebel Systems have pushed their primary application development efforts to IBM's DB2 database.

IBM's recently completed acquisition of Informix has consolidated the number of players in the database market. It remains to be seen what impact the various open source DBMS's will have on the market, and the extent to which they will erode the market share of the current leaders. The Future of Databases Databases play an important role in both data management and data storage in today's Information Age. The high value placed on information-gathering by companies as well as individuals requires efficient methods of storing and accessing information. Database architectures and products will need to be even more highly scalable to accommodate and support this increasing production of data in the future.

According to Michael Lesk, the huge amount of data would take several billion gigabytes or several thousand petabytes to store. Today, the importance and impact of databases is unquestioned, as government organisations, academic institutions and business entities create and maintain extensive databases containing all kinds of information ranging from natural-language text documents, statistical tables, financial data, and multimedia objects to data of a scientific and technical nature. Many databases are composed of metadata, which means the records hold data about data such as information about the size and character of another database rather than primary source content such as a person's name and address. Database technologies, including architecture and access methods, are rapidly developing to keep pace with this demand for information management mechanisms.

Database designers and managers face many challenges that reflect the complexity of the burgeoning information environment. Database technologies must handle massive amounts of data, extract useful information from these repositories, and have the ability to reflect relationships between data maintained in different databases. In addition, the architecture and system must provide integrity, recovery, concurrency and security

To answer these challenges, the three fundamental database models, hierarchical, network and relational, have served as a foundation for developing more powerful and flexible data models, such as the extended-relational and object-relational models. Well defined architecture and data schema assure efficient, logical data storage which increases database capacity and extends the capabilities of query languages and other access methods. In addition, data mining creates useful information by identifying related data within the vast stores. Researchers now wrestle with the complexities of relational issues and interoperability. Researchers are now able to use meta data more efficiently to improved data dissemination. Researchers are also able to use federated strategies for distributed databases.

What is a Database

The collection of records kept for a common purpose is known as a database. That is, a database is a collection of related data. By data, we mean known facts that can be recorded and that have implicit meaning.

An entity is a collection of distinguishable real-world objects with common properties. Examples include an employee or a project that is described in the database.

An attribute is a data item that describes a property of an entity or a relationship. For example, the employee's name or salary.

Given an ordered list of m entities, E1, E2,..., Em (Where the same entity may occur more than once in the list), a relationship R defines a rule of correspondence between the instances of these entities. Specifically, R represents a set of m-tuples, a subset of the Cartesian product of entity instances E1 * E2 * ... * Em.

That is, a relationship among two or more entities an interaction among the entities; for example, a works-on relationship between an employee and a project.

What is a Database Management System

Modern database technology provides the means to store, manage and access large amounts of information. A database management system (DBMS) is a program product for keeping computerized records about an enterprise. In other words, it is a collection of programs that enables users to create and maintain a database. A DBMS may deal with more than one database at a time.

A number of characteristics distinguish the database approach from the traditional approach of programming with files. In traditional file processing, each user defines and implements the files needed for a specific application as part of programming the application. For example, one user in the grade reporting office may keep a file on students and their grades. Programs to print a student's transcript and to enter new grades into the file are implemented. A second user in the accounting office may keep track of students' fees and their payments. Although both users are interested in data about students, each user maintains separates files and programs to manipulate these files, because each requires some data not available from the other user's files. The redundancy in defining and storing data results in wasted storage space and in redundant efforts to maintain common data up-to-date. In the database approach, a single repository of data is maintained that is defined once and then is accessed by various users.

The characteristic that allows program-data independence and program-operation independence is called data abstraction. One fundamental characteristic of the database approach is that it provides some level of data abstraction by hiding details of data storage that are not needed by most database users.

A data model - a collection of concepts that can be used to describe the structure of a database - provides the necessary means to achieve this much needed feature of data abstraction. By structure of a database we mean the data types, relationships and constraints that should hold on the data. Most data models also include a set of basic operations for specifying retrievals and updates on the database.

In any data model, it is important to distinguish between the description of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently.

A multiuser DBMS must allow multiple users to access the database at the same time. This is essential if data for multiple applications is to be integrated and maintained in a single database. The DBMS must include concurrency control software to ensure that several users trying to update the same data do so in a controlled manner so that the result of the updates is correct. These types of applications are called on-line transaction processing (OLTP).

System analysts determine the requirements of end users, especially naive end users, and develop specifications for canned transactions that meet these requirements.

Application programmers implement these specifications as programs; then they test, debug, document, and maintain these canned transactions.

Advantages of having a DBMS

Controlling redundancy
Restricting unauthorized access
Providing persistent storage for program objects and data structures.
Permitting inferencing and actions using rules
Permitting multiple user interfaces
Representing complex relationships among data.
Enforcing Integrity Constraints
Providing backup and recovery

Data warehouses and on-line analytical processing (OLAP) systems are used in many companies to extract and analyze useful information from very large databases for decision making.

The Three-Schema DBMS Architecture

The goal of the three-schema architecture is to separate the user applications and the physical database. In this architecture, schemas can be defined at the following three levels.

The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.
The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints.
The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group.

The DBMS Languages

Once the design of a database is completed and a DBMS is chosen to implement the database base, the first order of the day is to specify conceptual and internal schemas for the database and any mappings between the two. In many DBMSs where no string separation levels is maintained, one language, called the data definition language (DDL), is used by the DBA and by database designers to define both schemas. The DBMS will have a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and to store the schema description in the DBMS catalog.

In DBMS where a clear separation is maintained between the conceptual and internal levels, the Data Definition Language (DDL) is used to specify the conceptual schema only. Another language, the storage definition language (SDL), is used to specify the internal schema. The mappings between the two schemas may be specified in either one of these languages. For a true three-schema architecture, we would need a third language, the view definition language (VDL), to specify user views and their mappings to the conceptual schema, but in most DBMS the DDL is used to define both conceptual and external schemas.

Once the database schemas are compiled and the database is populated with data, users must have some means to manipulate the database. Typical manipulations include retrieval, insertion , deletion and modification of the data. The DBMS provides a data manipulation language (DML) for these purposes.

In current DBMSs, the preceding types of languages are usually not considered distinct languages; rather, a comprehensive integrated language is used that includes constructs for conceptual schema definition, view definition, and data manipulation. Storage definition is typically kept separate, since it is used for defining physical storage structures to fine-tune the performance of a database system. A typical example of a comprehensive database language is the SQL relational database language which represents a combination of DDL, VDL, and DML, as well as statements of constraint specification and schema evolution.

Many DBMSs have forms specification language for form-based interfaces.