The construction of Object-Oriented Database Management Systems (OODBMSs) started in the mid-1980s at the level of prototype building, and the first commercial systems appeared at the beginning of the 1990s. The interest in developing such systems stems from the need to overcome the modeling deficiencies of their predecessors, the relational database management systems (RDBMSs). OODBMSs were intended for applications that have to handle large and complex data, such as Computer Aided Engineering, Computer Aided Design, and Office Information Systems.
The area of OODBMSs is characterized by three things. First, it lacks a common data model, although many proposals can be found in the literature. This is a general problem of all object-oriented systems, not only of database management systems. Since the data model determines the database language of the system, which in turn determines the implementation of the system, the differences between systems with different data models can be large and substantial. The second characteristic is a common theoretical framework.
Although there is no standard object-oriented model, most object-oriented database systems that are operational or under development today share a set of fundamental object-oriented concepts. Therefore the implementation issues that arise in OODBMSs because of these concepts are universal. The third characteristic is experimental activity. Plenty of prototypes have been implemented, and some of them became commercial products. Applications have a real need to handle very complex data, which is why the interest in building such systems is so strong.
Although there is no consensus on what an OODBMS is and which features differentiate it from other systems, a lot of effort has gone into reaching an agreement on the formal characteristics that can serve as the specification requirements for building such a system. These characteristics should also be the features that one checks in order to find out whether a system is really an OODBMS. The features of an OODBMS can be divided as follows:
* mandatory features: the features that a system must have in order to deserve the title OODBMS.
* optional features: the features that make a system better than another that lacks them, provided that both have all the mandatory features.
* open choices: features that the designer of a system can choose whether and how to implement. They represent the degrees of freedom left to the system designers.
An OODBMS should be a database management system and at the same time an object-oriented system. The first characteristic translates into the following features: persistence, concurrency, recovery, secondary storage management, and ad hoc query mechanisms.
The second characteristic translates into the following: composite objects, object identity, encapsulation, inheritance, overriding and late binding, extensibility, and computational completeness of the database language used. Composite objects can be built recursively from simpler ones by applying constructors to them. These simpler objects can be integers, characters, strings, booleans, and in general objects of the types that all programming languages possess. There are various constructors, such as list, set, bag, array, and tuple.
The minimal set of constructors that a system must have is: set (to represent unordered collections of real world objects), list (to represent ordered collections of real world objects), tuple (to represent properties of real world objects). A system that supports composite objects and therefore constructors for their building, should also support operators for the retrieval, insertion, and deletion of their component objects. That means that the database language should be extended in a way that these operators will be included.
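The three minimal constructors and the component operators described above can be sketched in an ordinary programming language. The following is only an illustration under stated assumptions: the classes Point, Polygon, and Drawing are hypothetical, and Python's built-in list and set stand in for the constructors that a database language would offer.

```python
from dataclasses import dataclass, field

# All class names here are hypothetical; they only illustrate the constructors.

@dataclass(frozen=True)
class Point:
    """Tuple constructor: a fixed group of named properties."""
    x: int
    y: int

@dataclass
class Polygon:
    """Tuple with a list component: the order of the vertices matters."""
    name: str
    vertices: list = field(default_factory=list)

@dataclass
class Drawing:
    """Tuple with a list of composite components and a set of tags."""
    title: str
    shapes: list = field(default_factory=list)
    tags: set = field(default_factory=set)

triangle = Polygon("triangle", [Point(0, 0), Point(4, 0), Point(0, 3)])
drawing = Drawing("site plan", shapes=[triangle], tags={"draft", "2d"})

# Retrieval of a component through the composite object.
first_vertex = drawing.shapes[0].vertices[0]

# Insertion of components.
triangle.vertices.append(Point(4, 3))
drawing.tags.add("reviewed")

# Deletion of components.
triangle.vertices.remove(Point(0, 3))
drawing.tags.discard("draft")
```

The composite is built recursively: a Drawing contains Polygons, which in turn contain Points.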
The identity of an object is what makes it different from all the other objects. This allows the objects to be independent of their values. Therefore the notion of identical objects is introduced: two objects are equal if they have the same values, but are identical if they have the same object identity. The fact that each object possesses an identity facilitates the handling of composite objects since it makes the common use of objects possible and it protects the consistency of the database.
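The difference between equal and identical objects can be illustrated with a small sketch. It assumes that Python's in-memory object identity (the `is` operator) may stand in for the object identifier that a database would maintain; the classes Engine and Car are hypothetical.

```python
class Engine:
    """Hypothetical component class."""
    def __init__(self, power_kw):
        self.power_kw = power_kw

    def __eq__(self, other):
        # Equality compares values only.
        return isinstance(other, Engine) and self.power_kw == other.power_kw

class Car:
    """Hypothetical composite class that references an Engine."""
    def __init__(self, engine):
        self.engine = engine

a = Engine(90)
b = Engine(90)

equal = (a == b)        # same values
identical = (a is b)    # same identity? No: two distinct objects.

# Two composite objects share one component instead of holding copies of it.
car1 = Car(a)
car2 = Car(a)
a.power_kw = 110        # a single update, visible through both composites
```

Because car1 and car2 reference the same engine rather than two copies of it, there is only one place to update.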
If a component object is changed, this change affects all the composite objects that reference it. Because of object identity there is no need for replicas, and that is how the consistency of the database is protected. The mechanism of encapsulation allows the internal state of objects to be hidden. The internal state of an object is not open to direct access; it can only be accessed through the object's methods. Objects that have this property are called encapsulated objects. There are several types of encapsulation, including full, write, and partial.
Using full encapsulation, all operations on objects are done via message sending and method execution. In write encapsulation, the internal state of the objects is visible only for reading operations. Partial encapsulation allows direct access for reading and writing to only a part of the internal state (public and private parts). The use of the same message for different methods that belong to different classes can facilitate the design of the database as well as of the applications that access it.
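The encapsulation styles described above can be approximated in one small class. This is a sketch under assumptions: the Account class is hypothetical, and Python does not enforce privacy, so the leading underscore marks the private part by convention only.

```python
class Account:
    """Hypothetical class mixing the encapsulation styles for illustration."""

    def __init__(self, owner, balance):
        self.owner = owner        # public part: direct read and write (partial encapsulation)
        self._balance = balance   # private part, by convention only

    @property
    def balance(self):
        # Write encapsulation: the state can be read directly, not assigned.
        return self._balance

    def deposit(self, amount):
        # Full encapsulation style: the state changes only through a method.
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._balance += amount

acct = Account("ann", 100)
acct.deposit(50)
acct.owner = "bob"                # allowed: public attribute

try:
    acct.balance = 0              # rejected: the property has no setter
    direct_write = "allowed"
except AttributeError:
    direct_write = "rejected"
```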
In general, since the internal structure of an object is not visible to other objects, we can assign the same message to methods with the same functionality even if their implementations differ. This is called overloading of the message. Since a message can correspond to more than one method, the code of the method that has to be executed can only be determined at run time. That means that only while an application is executing can it be determined whether the message sent is applicable to the object.
If the message is not applicable, the application ends with a run-time error. The fact that the piece of code to be executed is bound at run time is called late binding. Class hierarchies are based on the principle of inheritance, which is considered one of the most basic principles of object-oriented systems. Inheritance is an antisymmetric, transitive, binary relationship that can exist between two classes A and B, in which A is called a subclass of B and B is called a superclass of A.
The relationship has many common characteristics with the ancestor/descendant relationship since a class has direct and indirect subclasses as an ancestor has direct and indirect descendants. In general a superclass can have one or more direct subclasses, although the number of direct superclasses that a subclass can have is not the same for all the models. In fact, in all the models, all the classes have at least one superclass but there are some models that do not allow classes to have more than one. These are called single inheritance models and the rest multiple inheritance models.
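Message overloading, late binding, and direct versus indirect subclasses can be seen together in a short sketch. The class names are hypothetical, and Python's dynamic method lookup is used here as one possible realization of late binding.

```python
class Shape:
    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError

    def describe(self):
        # Which area() runs is decided at run time from the receiver's class.
        return f"{self.name} with area {self.area()}"

class Rectangle(Shape):                 # direct subclass of Shape
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height

class Triangle(Shape):                  # same message "area", different method
    def __init__(self, base, height):
        super().__init__("triangle")
        self.base, self.height = base, height

    def area(self):
        return 0.5 * self.base * self.height

class Square(Rectangle):                # indirect subclass of Shape
    def __init__(self, side):
        super().__init__(side, side)
        self.name = "square"

class Note:                             # unrelated class: no "area" method
    pass

def send_area(obj):
    return obj.area()

areas = [send_area(s) for s in (Rectangle(2, 3), Triangle(4, 3), Square(4))]

try:
    send_area(Note())
    outcome = "no error"
except AttributeError:
    outcome = "run-time error"          # the message is not applicable
```

Square inherits area() unchanged from Rectangle, while Rectangle and Triangle each supply their own method for the same message.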
According to the concept of inheritance, subclasses can inherit methods and attributes from their superclasses. That means that inheritance is the mechanism that allows new software modules to be generated from existing software modules. There are four kinds of inheritance that have slightly different semantics:
* Substitution inheritance: if class A is a subclass of class B, then any object of class B can be substituted by an object of class A. That means that the set of messages that constitute the interface of class A is a superset of the set of messages of class B.
* Inclusion inheritance: if class A is a subclass of class B, then objects of A and B have the same internal structure, although they do not necessarily share the same methods and messages. This kind of inheritance corresponds to the notion of classification.
* Constraint inheritance: if class A is a subclass of class B, then any object of class A has the same internal structure as any object of class B but also satisfies a certain condition. For example, if “child” is a subclass of the class “person” and they share the attribute “age”, then any instance of the class “child” must satisfy the condition that its age is less than 10.
* Specialization inheritance: if class A is a subclass of class B, then the set of instances of A is a subset of the set of instances of B.
One of the necessary constituents of a DBMS is the data definition and manipulation language (DDML), also called the database language. This language allows persistent data to be created, updated, deleted, or retrieved. The database languages used by the RDBMSs were based on the relational calculus or the relational algebra and hence, although mathematically founded, were not computationally complete.
An OODBMS should have a computationally complete database language for several reasons: the language can be used to write the methods of the classes; for applications written in the same language there is no need to transform data structures or map data (the impedance mismatch); and programmers do not need to learn another language if they choose to write their applications in this language. The designers of the existing OODBMSs preferred to use some of the most popular programming languages (C++, Smalltalk, Common Lisp, etc.) as database languages rather than create their own. In order to do this, however, they had to extend the semantics of the chosen language in certain ways so that persistent data could be handled. Besides, if the chosen language was not an object-oriented one, its semantics had to be extended further so that the object-oriented concepts could be included. Each database system comes with a set of predefined types (integer, real, char, string). This set should be extensible, i.e. the user should be able to define his/her own types and treat them in the same way as the predefined ones.
In other words, user types and system types should have the same status, although the system itself may support them differently. Persistence is one of the most basic features of a DBMS (or at least the most evident one) and hence of an OODBMS. It is the ability of the programmer to have his/her data survive the execution of a process so that he/she can eventually reuse it in another process. For an object-oriented system there is an additional requirement, which stems from the extensibility requirement: any object must be able to become persistent independently of its type.
Secondary storage management is one of the most important DBMS features. It should include a set of mechanisms that improve the performance of the system, such as indexing, clustering, access path selection, data buffering, and caching. The database designer should be able to choose whether or not to activate these mechanisms, whereas for application programmers their use should be transparent and should require no special maintenance effort. The database management system should be able to support many users.
That means that it must provide special mechanisms for concurrent access to the data and for arbitration in case of conflicts. Such mechanisms have already been provided by the RDBMSs and hence should also be provided by the OODBMSs. A basic requirement for a database system is that in case of a hardware or software failure, the system should be able to bring itself back to the most recent coherent state of the data. This feature has to do with concurrency control and transaction management, but it also requires extra mechanisms that have already been explored and studied for the RDBMSs.
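The recovery requirement can be illustrated with a deliberately simplified sketch. It assumes a shadow-copy scheme held entirely in memory; TinyStore and its methods are hypothetical, and a real DBMS would rely on logging to stable storage instead.

```python
import copy

class TinyStore:
    """Hypothetical in-memory store; not how a real DBMS persists data."""

    def __init__(self):
        self._committed = {}    # the most recent coherent state
        self._working = None    # private copy used by the open transaction

    def begin(self):
        self._working = copy.deepcopy(self._committed)

    def put(self, key, value):
        self._working[key] = value

    def commit(self):
        # The working copy becomes the new coherent state in one step.
        self._committed = self._working
        self._working = None

    def recover(self):
        # After a failure, uncommitted work is simply discarded.
        self._working = None

    def get(self, key):
        return self._committed.get(key)

store = TinyStore()
store.begin()
store.put("a", 1)
store.commit()

store.begin()
store.put("a", 2)
store.put("b", 3)
store.recover()                 # simulated failure before commit
```

After the simulated failure the store still holds the state of the last commit, and none of the uncommitted changes are visible.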
A DBMS should provide its users with a simple interactive way of making ad hoc queries and receiving answers. For this purpose, the OODBMS can provide a special query language as the RDBMSs did, a specially extended programming language, or some graphical tools (browsers, forms, etc.). Whatever is provided should satisfy the following: it should be high-level, so that queries are simple and easily understood by humans; it should be efficient; and it should be application independent.
In this section I will analyze the optional features that an OODBMS should have. Some of them have to do with the object-oriented nature of the system, and others with the handling of persistent data. Multiple inheritance allows the creation of a new class from more than one other class. There is no general agreement on whether a system must support multiple inheritance. It is true, however, that systems that support this feature make application design easier, since multiple inheritance is a more powerful tool than single inheritance.
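A minimal sketch of multiple inheritance follows. The classes are hypothetical, and the way the clash on describe() is resolved is Python's own rule (its method resolution order), not something the object-oriented model prescribes; other systems make different choices.

```python
class Persistent:
    def describe(self):
        return "persistent object"

    def save(self):
        return f"saving {type(self).__name__}"

class Drawable:
    def describe(self):
        return "drawable object"

    def draw(self):
        return f"drawing {type(self).__name__}"

class Diagram(Persistent, Drawable):
    """New class created from two existing classes."""
    pass

d = Diagram()
saved = d.save()          # inherited from Persistent
drawn = d.draw()          # inherited from Drawable
clash = d.describe()      # defined in both superclasses

# Python picks the first superclass in the declared order.
lookup_order = [cls.__name__ for cls in Diagram.__mro__]
```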
On the other hand, the support of multiple inheritance raises many problems that have to do with conflicts among the attributes and methods inherited from more than one class. The degree of type checking performed at compile time should be as great as possible. The optimal situation is one where a program that was accepted by the compiler cannot produce any run-time errors. It is desirable for a system to be distributed, although that is independent of the fact that it is an object-oriented system.
Concurrency control is one of the mandatory features of a DBMS, but the current systems are intended to be used for handling very large data such as images, sound, and text, and consequently they should provide special transaction mechanisms that allow such data to be handled efficiently. The RDBMSs do not support such handling, and therefore the object-oriented technology had to enhance the classical transaction framework with long and nested transactions. Most applications evolve, and they do not reach a stable state until a long time after their initial implementation.
For this reason the following should be possible: old data should not be overwritten by new data but should be kept and coexist as older versions of the same object, not as independent objects; and in case of schema changes, the data that correspond to previous schemas should not be thrown away but should evolve following the schema evolution.
There is, finally, a set of features for which designers can choose among different implementations that are not equivalent but have certain advantages and disadvantages. There are plenty of programming models (C++, Lisp, Smalltalk, etc.), but none of them should be considered better than the others. The designers choose the programming model of their system according to the kind of applications that the system is going to serve. The choice of programming style is open as well; the one that best suits the applications should be chosen. The representation system is the set of types or classes provided by the system together with the set of constructors that can be applied to these classes. As long as the system supports extensibility and composite objects, there is no restriction on which members the representation system should contain.
There are systems that support the highest degree of uniformity, which means that everything in the system, including classes, methods, and messages, is treated as an object. Uniformity has consequences at the level of the implementation of the system and at the level of application programming and the user interface as well. Although uniformity is a nice feature and simplifies the implementation of the system, it can sometimes confuse users, since in reality there is no absolute uniformity.
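As an illustration of uniformity, Python itself treats instances, classes, and methods as objects. The sketch below uses a hypothetical Part class and shows only how this looks in one language; it says nothing about any particular OODBMS.

```python
class Part:
    """Hypothetical class used only for this illustration."""
    def weight(self):
        return 1

part = Part()

# An instance, a class, and a method can all be stored in a collection
# and inspected like any other value.
things = [part, Part, Part.weight]
all_objects = all(isinstance(t, object) for t in things)
kinds = [type(t).__name__ for t in things]

# A class can be passed around as a value and used later to create instances.
factory = Part
made = factory()
```

The uniformity has limits even here: statements and operators, for example, are not objects in Python.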
The design of relational database systems and the mechanisms that they use have been mathematically founded. Most of them are the result of long periods of research that led to the successful solution of the most important problems that occurred in these systems. The object-oriented database systems, since they are fairly new, do not have a very sound theoretical solution for many of the issues that arise from their implementations. Here are some of the problems introduced by the new approach:
* The object-oriented model contains some concepts whose semantics are still under discussion.
* There is no standard data model, and consequently there is no standard methodology for designing an object-oriented schema. For the relational systems, by contrast, the ER diagram is widely accepted.
* The query language of the relational systems was based on the mathematical theory of the relational algebra and the relational calculus. There is nothing similar for the OODBMS. A lot of effort has gone into defining an object-oriented algebra, since it is clear that the relational algebra is inadequate to support the object-oriented model.
* The traditional indexing and locking techniques have to be extended before they can be used for object-oriented databases.
* Composite objects cause a lot of trouble, and their handling is still an open research issue.
* The class hierarchies created can be so complex that the schemas are difficult to handle.
The object-oriented systems are very successful in areas where their predecessors failed:
* The design of the schema can be done in a very direct way, since the object-oriented model is very close to the real-world model. By contrast, relational design, which is based on the normal forms of the relational model, is much more awkward.
* The maintenance of the database is much easier because of the schema evolution facilities and the modular design allowed by the object-oriented model.
* The identity concept, which gives each object one internal pointer throughout its life, protects the consistency of the database and helps in modeling similar real-world entities. In the relational systems, this identifier was inevitably user provided.
* The database is used not only for storing data but also for storing pieces of code (methods) that run on the data. Consequently, a whole application can be stored and executed with the help of the OODBMS, which also supports its maintenance.
* The inheritance concept makes code easily reusable.
* The expensive join operations of the relational systems have been replaced by the composite object notion, which, combined with the clustering mechanism, can improve the performance of composite object retrieval.
There are many applications that have used relational systems very successfully for many years, and they do not need to change. However, there are a number of other applications, especially in the engineering fields, that are poorly served by relational systems, mainly with respect to modeling. For these kinds of applications, the object-oriented approach seems quite appropriate in spite of the problems that still have to be solved.