The Historical way to Object Orientation (technological view)
This chapter gives a survey through the ideas of software engineering which lead to object orientation. Because of the mass of existing material this lecture can not be complete. But it follwos one line of ideas to illustrate some views and obstructions on the way of OO.
The result is that a lot of basic ideas (esp. the abstract data type) of object orientation had been in the minds of the theoreticers very early. But it needed a long time until these mostly intuitive ideas formed a concrete theory. A learning process which the traditional programmers now have to follow.
The history of software engineering started at the end of the 1960's with the development of the structured programming method.
At that time the developers faced more and more the problem that they had to maintain code written by other developers which was nearly impossible to understand. So the people looked for an answer to the following questions:
The last question might be important only for theoretical computer scientist. But this question about proofing correctness led to new ideas and methods in the whole history of software engineering and is therefore the most important one.
Jackson tried to solve the first problem by a bit strange method. His idea was that if you have a method by which every programmer writes the same code then every programmer can understand this code. He developed his method mainly from the work of Dijkstra.
Dijkstra gave the first stage of answers for these questions:
Dijkstra started by asking: 'Which Methods for Understanding do we have?.'
He listed the following:
The next step in his thoughts was to find out how a program can be devided into 'understandable' subparts during the design step.
The enumerative reasoning is very limited for a human. Jackson told in his courses on structured programming that a subroutine should not be larger than one output page. Even selections are sometimes difficult to understand by enumeration. Everybody worked with old code (10 years and older) will agree that it is hard work to check the result of an IF-THEN-ELSE group with more than 2 boolean expressions. The limit for understanding programs by emumeration lies approximately by 50 statements with max. 3 select dimensions.
The most practical hints in structured programming were given on the base of enumeration - the poorest tool we have.
The hints given to get an ideal functional decomposition were very vague.
The main questions answered during the decomposition process is: How do I get B from A ?
The typical examples were the realisation of mathematical algorithms on the computer. The example from Dijkstra:
DCL table P fill table P with 1000 primes --> Program A print table P --> Program B
On the other hand Dijkstra had in his mind a view which look very much like the nowadays class libraries in object oriented programming. His vision was:
The definition of the virtual machine computing prime numbers was not too far away from a class definition:
| COMPFIRST machine name | |
| begin; | |
| draw:(build, print) | algorithm draw = build + print |
| var: image | used data structure |
| instr: build(image), print(image) | used instructions |
| end; | |
Parallel to Dijkstra's ideas about the functional decomposition of programs Hoare thought about the structuring of data.
Hoare had the Idea that the programmer should not only be able to define own operations in form of subroutines but also own data types. A type in the sense of Hoare is like a class in a strongly typed language. A type has the following characteristica:
type colour = (blue,red,white,yellow);
For describing actions done with types Hoare introduced the 'with' construct.
with sv do S;The block S may use internal names of the type sv.
with today do
case m of
{ Sept: April: June: Nov:
if day > 30 then goto invalid,
Feb:
if day > ( if (y /4)*4 = y then 29 else 28)
then goto invalid,
else do nothing };
Unfortunaltely Hoare thought of this 'with' statement only as a pseudo code statement for design purposes. At that time storage was rare and path length of code was optimized to be short as possible even if the programs were no more readable. Hoares idea was to put the 'with' construct as a comment before such an highly optimized and nonunderstandable piece of code.
Hoare even allowed a strong type of subtyping.
type figure = ( position: POINT; rect:R, tr: T, cir: C)
For every subtype exist different methods without operation name overloading which must be called in a sort of selection group.
with afigure do
{ R: s_for_R,
T: s_for_T,
C: s_for_C };
These Ideas of Hoare lead to user defined types in PASCAL without subtyping and WITH statement. The view for a need for language constructs helping structuring and decomposition of code was overlaid by the everyday problem writing code fitting into the rare storage and still working fast enough on slow processors.
Here the first time was stated that data and operations should be grouped together. Only the practical tools provided were still poor. They stated:
"Data and operations seem to be so closely connected in our mind, that it takes elements of both kinds to make up any concept useful for understanding computing processes."
Hoare and Dahl gave the following rules for decomposition of a program:
A program should be decomposed s.t.:
For realisation of the second rule no practical help is given. The only tools for structuring programs at this time were blocks and procedures.
Parallel to all these theories Nygaard in Oslo had to solve a practical problem. He did simulations of real world things. He found that ALGOL60 the language he used lacked some constructs. So he decided in the late 1960's to extend ALGOL and constructed SIMULA which is called the first object oriented language -- even if one paradigm - the encapsulation - is not fully realisized in SIMULA.
SIMULA does not provide a clear distinction between inner and outer view of an object.
One thing Nygaard did not like in ALGOL was that all local data of a block vanished after its processing. So he constructed classes as ALGOL blocks whose block instances survive its call.
The terminology used in object oriented programming is the same introduced by Nygaard:
SIMULA is strongly typed. Because the generated objects do not vanish automatically SIMULA needs a garbage collector.
This is example contains a class histogramm with methods tabulate(y) and frequency(i). Every object is initialisized automatically when the method new is executed by the code between the last method and the 'end of histogram' statement. The input parameters ( array X and integer n) for this initialisation are specified just after the class name.
class histogram(X,n); array X; integer n; begin integer N, integer array T[0:n]; procedure tabulate(y); real y; begin integer i; i:=0; while ( if i<n then y<x[i+1] else false) do i:= i+1; T[i] := T[i]+1; N:=N+1; end of tabulate; procedure frequency(i); integer i; frequency := T[i]/N; integer i; for i:=0 step 1 until n do T[i]:=0; N:=0; end of histogram
Example for class usage:
weights :- new histogram(B,12) weights.tabulate(w)The variable weights contains the histgram generated from the array B after new is executed. Next tabulate is executed on weights.
SIMULA does not provide encapsulation. Internal class attributes can be accessed from outside: X.A points to A in object X or calls procedure A of class X.
A Subclass B of a class A is defined by the statement:
A class B
In a subclass only adding of attributes or procedures are allowed.
At this stage was a big gap beetween the theoretical based ideas of decomposition and data typing and the tools provided and the ideas of SIMULA. Further theoretical work lead to abstract data types.
Hoare asked an old question again: How can the correctness of a program be proofed. The base here are not understandable program parts but the data representation.
In his mind seem to be mostly solution of mathematical problems. His Idea was that the user has a abstract data representation of his program. The program itself can be written in terms of the modelled abstract space and its (mathematical) operations. The abstract space the program is formulated in consists of:
The realisation of this abstract space - the data representation - consists of:
The proof of the correctness of a program then can be done in two steps
The second part of the proof he considered to be trivial. Perhaps he thought this could always be done by common mathematical methods. At least he focused on the first step only.
Interesting is that in his examples he used SIMULA 67 notion. Without mentioning or using any features like subtyping. The example we will follow now through the rest of this article is the set of integers.
class smallinset begin integer m; integer array A[1:100]; procedure insert(i); integer i; begin integer j; for j:= 1 step 1 until m do if A[j]=i then goto end insert; m:=m+1; A[m]:=i; end insert;
procedure remove[i]; integer i; begin integer j,k; for j:= 1 step 1 until m do if A[j]=i then begin for k:=j+1 step 1 until m do A[k-1] := A[k]; m:=m-1; goto end remove; end; end remove: end remove; Boolean procedure has(i); integer i; begin integer j; has := false; for j:= 1 step 1 until m do if A[j] = i then begin has:=true; goto end contains; end; end contains: end has; m:=0; end smallintset
It has to be proofed now that this implementation represents the mathematical set of intergers. This implementation has at least one precondition: The number of elements in the set may never be greater than 100. So the first step in the proof is:
Note:
For defining axioms, pre- and post conditions Hoare uses his implementation of the datatype. He don't mention what has to done if you have a second Implementation of smallintset. For proofing correctness of the second implementation you have to
The proof that the data representation is correct is then done by:
Hoare give no hints how to check that all axioms, pre- and postconditions are complete and correct.
Now we have checked that the data representation is correct. Are there any rules we have to look at when using this data representation? Yes, there is. We are only allowed to use the predefined interface. Otherwise by uncontrolled access from outside some behaviour will be added to the data representation which is not predefined and checked.
Local variables of a class are only changed inside the class.
==> Therefore a strong encapsulation of the class data is necessary. The only thing SIMULA did not provide.
His theoretical conclusions fit exactly into what object oriented programming will provide later. Astonishing is therefore Hoare's hint to implement m the number of elements as public variable which destroys the encapsulation totaly.
The next step is to think about an data modeling independent from any implementation. This lead to the theory of abstract data types.
The problems the people hoped to solve were:
spec Set is uses Bool, Int sorts Set operations: empty : --> Set, insert: Set x Int --> Set, is_empty: Set --> Bool, is_elem: Set x Int --> Bool Axioms: s: Set, x,y: Int is_empty(empty) = true, is_empty(insert(s,x)) = false, is_elem(empty,x) = false x=y ==> is_elem(insert(s,x),y) = true ^(x=y) ==> is_elem(insert(s,x),y) = is_elem(s,y) endspec
If we look at the example above the following questions come up:
Note:
In these axioms there is none from which we conclude that true ^= false. So theoretically we could take a one element set for Bool which surely would contradict to our intuitively idea of a set. Therefore the axiom true ^= false should be added.
An other Idea solving this problem is restrict the type Bool to an abstract datatype which contains the axiom true ^= false. But this solution was not handled at this stage of theory. The aim here is to define everything what is necessary in the datatype, even axioms for used data types. Remind: Any decision about the usage of a type is a decision of implementation and should be made at a later point. So by defining axioms for used datatypes you can decide later which set you want to use for Bool.
There are no axioms necessary for Int. So our specification for Set really is a specification for every Set. Every decision about Int is a refinement towards implementation.
In their paper Goguen, Thatcher and Wagner proofed that the specification of an abstract datatype is equivalent to the specification of a signature. A signature consists of the names of used data types and the names of operations. Additionaly a set E of Equations written in terms of the specification might be added. These are Axioms which define a equivalence relation on the signature.
You can say the signature defines the language which will be used when working with the data type.
Algebras over a signature can be viewed as implementtations of a datatype (not necessarily on a computer -- they might only be examples).
By choosing for every used data type in the signature a set and for each operation an algorithm we get an algebra A for the specification of Set.
Here I will give a rough repetition of definitions:
A S - Signature Sigma over a Set S is a Set of operator names Sigma(w,s), where w is a string of elements of S. If w is an empty string sigma in Sigma(w,s) is a constant in s.
When defining a datatype S will be the set of used types {Bool, Int, Set in our example}.
A specification D = ( Sigma, E) consists of a signature Sigma and a Set of axioms E.
A D-algebra is used as a semantic for the datatype.
A Sigma algebra A consists of a family of sets A(s) for all s in S
an for every sigma in Sigma(w,s) there exist a function
sigma(A): A(t1) x A(t2) x ...---> A(s). ti out of w.
Let D = ( Sigma, E) be a specification. If for a Sigma algebra A all Axioms in E hold, A is called a D-Algebra.
In the article fron Goguen, Thatcherand Wagner the set of axioms E are equations. But in general theory and in praxis E contains also inequalties and logical expressions.
Every Implementation of a datatype is a D- or sigma algebra if D or sigma is the specification of the data type.
The set of D- or Sigma algebras (D-alg or Sigma-alg) form a category. D-alg =Sigma-alg/E
Let A and B be D- or Sigma Algebras. A morphism H: A --> B consists of a family of morphism h(s): A(s) --> B(s) s.t. sigma(B) * H = h(s) * sigma(A)
A morphism H: A --> B is called Isomorhism, if there exists a morphism G: B --> A s.t. H*G = ID(B) and G*H = ID(A). Then the D- or Sigma algebras A and B are called isomorphic.
The isomorhism class of an abstract datatype is unique for every Specification.
The initial Object of an D-algebra is used as a standard semantic for working with the specification D.
The initial Object of an Sigma algebra is constructed out of all possible Terms in this algebra. So intuitively an abstract datatype of an Specifcation consists of all expressions in the language defined by this specification.
Under special conditions there exists also a final object in the category of D-algebras ( Only special forms of Axioms are allowed).
A object B (also called classifying space) in a category is called final object when for every object A in the category there exist exactly one morphism f(A): A --> B.
This morphism is called classifying map for A in the category. If a final object B exists its isomorhism class is unique in the category.
For Sigma Algebras or specifications which axioms consists of equations only this final Object is trivial and therefore not interesting. But in more general Specifications or special notions for databases the final object becomes interesting too. In Specifications wich contain inequalities the so called final semantic is used for describing errorhandling.
What happens if there exist different implementations of the same datatype? Let us imagine an implementation B of Set in old Babylonia:
H: A->B
If there exists a map G which transforms B back to A then the Algebras A and B are called isomorhic. Isomorphic Algebras differ mainly in the implementations (algorithms) of their operations.
Note:
You can think of an Implementation C of Set, where C(Set) consists only of an infinite array without an extra variable which contains the number of elements in the set. The last element in the array is always a special character '*' for 'undefined element', which must of course not be element of C(Int).
The algebras A and C are isomorphic.
The target is now to find a standard algebra which can be used as a semantic for our data type to be independent from any implementation. In this abstract programs could be written without knowing the internal details of the datatype Fortunatlely such a standard algebra exists. It is formed out of all possible expressions of the specification.
The abstract datatypes form the theoretic base of object orientation. Important paradigmen ( encapsulation, user- and implementor view, minimal interfaces) are already formulated. SIMULA provides already partial practical help for realizing these paradigmen. In Object Orientated Programming all these ideas come together and are extended by inheritence.
Parallel to the development of the theory of abstract data types the developement of languages went on:
Beside the possibility of encapsulating classes and their methods they provide helps for Reuse and hierarchical Data structuring: