Algorithms and Data Structures in an Object-Oriented Framework (“ADSOOF”)

Algorithms and Data Structures in an Object-Oriented Framework: Conclusion

In these concluding notes, we go through the topics of the module again, giving a summary of the main points of each. This module should have given you a deeper understanding of programming, with an emphasis on the object-oriented style of programming. It should help you move away from thinking about programming just in terms of the features of the programming language you are using, towards thinking of it first in terms of what you want to do to solve some problem or represent some scenario, and then in terms of how you would do that in the programming language you are using. When writing a program you structure it by “inventing your own instructions”, that is by writing methods, and by “inventing your own types”, that is by writing classes. In most cases you will decide on the type and instructions you need first by thinking about the scenario you are representing, then you program on two levels: one (the application) makes use of your own invented types and instructions, the other (the implementation) is the code that provides these types and instructions. In more complex code, parts of the code may itself be further broken down into application and implementation parts.

Good choice of algorithm and data structure can make big differences to the efficiency of a program, and we looked at how it is possible for one problem to have several alternative algorithms that can solve it, or one abstract data type to have several alternative data structures that can represent it.

When writing code to solve some problem or represent some structure, it is best to write it in a way so that code can be re-used in different circumstances. We saw how inheritance helps us write more general code, and generic typing is another Java feature which can be used to write code useable with a variety of different actual types. We also saw how to make use of code from the Java library: the object-oriented programming style and the various generalisation mechanisms encouarge re-use of code which already exists rather than writing your own from the beginning.

Using Objects

This module started by looking at using objects. We can create new objects by using the constructor for them. A variable “refers to” an object: we use this terminology to remind us that the variable and the object are not the same thing. A variable may be changed to stop referring to one object and start referring to another. It is possible for two or more variables to refer to the same object (called aliasing). When we call a method on a variable it may change what is inside the object the variable refers to, or it may return a value based on that object, or it may do both. A class in Java defines a type of object by saying what methods objects of that type have, and what is inside them to make them work. But we do not need to know what is inside a class to use objects of that class, all we need to know is the public methods and constructors for that class. If a class has no public methods which change what is inside objects of its type, objects of its type are said to be immutable.

When a Java program is executing, it executes inside an environment, which means the variables it has access to. When a new method call is made, the Java execution switches to a new environment consisting of new variables named after the parameters of the method call and any further variables declared in that method call (plus, if it is a non-static method, the variables of the object the call is made on). When a method call finishes, Java execution goes back to the environment it was in previously. So there may be many variables of the same name, which are separate because they are in separate environments or separate objects. Aliasing means separate variables may refer to the same object, that is how an object may get changed in one method call, and the change stays on after that method call has finished.

When we make a method call, the types of the arguments to the method call must match the types of the parameters of the method header. If there is a return value, it must match the type of the variable the result of the method call is assigned to, or the type of the parameter the method call matches if it is an argument to a further method call. A non-static method call must be made on a reference to an object of the type the method comes from. In this way, types in a programming language like Java help tell us how to fit methods and variables together to make more methods and classes.

In some cases, calling a Java method causes an exception to be thrown rather than the method terminating normally. An exception may be caught if the method call is inside the try part of a Java try...catch statement with a matching catch part, otherwise the method call in which the method call occurred itself halts and throws the same exception. Exceptions are used to deal with circumstances where methods are used in an abnormal way. It is possible to write your own methods which throw exceptions, and to define your own exception types.

Using Arrays

An array is a collection of values all of the same type. You can treat an array as a single object, referring to it through a variable whose type is that of its component values plus the [] symbol. But you can also treat the components of the array as if they are themselves variables, so if variable a refers to an array, then a[i] refers to one of the components of the array, where i is an integer value. Since a may change the array it is referring to, and i may be a variable or an expression containing variables and integer variables can change their values, the component a[i] accesses may change as the code is executed. It is important to be aware that when an array object is created its size is fixed. But as with any other object, the array and a variable referring to it are not the same thing, so two variables can refer to the same array object, and a variable may be changed so that it stops referring to one array object and starts referring to another.

Java's Built-in Objects

The class String is provided by Java to represent text. A String object is an indexable list of characters. If str refers to a String object, then str.charAt(i) returns the character at position i in the string referred to by str. It is important to be aware that charAt is a method call, so this is not the same as a[i] with arrays which can be treated as a variable. You can change what is in an array by assigning a value to a[i] as in a[i]=n, but you cannot assign a value to str.charAt(i) as you cannot assign a value to a method call. In fact, a String in Java is immutable: there is nothing you can do to change a String object. Java provides plenty of methods you can call on String objects, but all of them are constructive, meaning that if they are defined as making changes on the String object they are called on, they actually create and return a new String object representing the changes, they do not change the object they are called on.

A Java arrayList, which in Java is given by the library type ArrayList, is in many ways like an array, it is an indexable collection of objects of the same type. However, unlike arrays, it can change in size, and Java has a range of methods that can be called on an arrayList object, some of which change its size, some of which change its contents. Unlike arrays, there are no special symbols for accessing individual components of an arrayList object, they can only be accessed by using the appropriate method calls. So if a refers to an arrayList, then a.get(i) returns the component at position i in the arrayList referred to by a, and a.set(i,n) changes the value of that component to n. In fact arrays are the only collection structure in Java which has special symbols for accessing parts of it.

ArrayLists in Java introduce the idea of generic types. If Thing is a Java type, then ArrayList<Thing> is the type “arrayList of Thing”. Note what occurs between < and > to make a full arrayList type in Java has to be a type, so it is not at all the same thing as what occurs between [ and ] to make a variable referring to an individual component from a variable referring to an array: that has to be an integer value.

Lisp Lists and Recursion

The type LispList was introduced in this module mainly to help introduce the topic of recursion. It is not a standard part of the Java language, though it is standard in some other programming languages, and its name comes from the historic programming language called Lisp which is based around using this sort of structure (its name comes from “list processing”). Like ArrayList, it is a generic type, so to give a full type it has to be combined with the type of its components using the < and > symbols. But like String objects, LispList objects are immutable, so there are no operations and no way of writing methods which change LispList objects directly, only operations and methods to create new ones.

If ls refers to a LispList object, then ls.head() returns the first item in the list (as with charAt with Strings, it is a method call, so you cannot assign to it). The call ls.tail() returns a new list which contains all items in the list referred to by ls except the first one. The call ls.cons(n) returns a new list which is like the one referred to by ls except that n is added to the front, but it is important to remember that it does not change the actual object which ls refers to. The call ls.isEmpty() returns true if ls refers to an empty list (one containing 0 items), and false otherwise.

The type LispList is a recursive type, because an object of that type, unless it is the empty list, contains its head item and its tail which is itself a LispList object. As a consequence, a good way of tackling writing methods to manipulate LispList objects is to have a method make a call to the same method but with the tail of its argument list as the argument to this recursive call. Then look at how the result of the recursive call relates to what is required from the whole call. Writing recursive methods requires the use of induction as a way of thinking so you are confident your code will work correctly - you assume the recursive call works, then you show that if this assumption is true, the main call must work, then you show that the base case (the case when there isn't a recursive call) works, and from this the method must work under all circumstances.

The recursive way of thinking was used to develop several sorting algorithms on Lisp lists. As Lisp lists are immutable, a sorting algorithm applied to one actually creates a new LispList object whose contents are the same as those of the LispList object passed to it, except they are in sorted order. It does not change the actual LispList object.

Implementing Objects

It was a deliberate aspect of the module not to introduce the implementation of objects until nearly half way through. So up to this point we were only using definitions of objects provided for us, we were not writing our own classes to use to produce objects. This was to emphasise that you can easily use someone else's code without being aware of what that code is, you only need to know the public methods it provides are. This is a general way to deal with the complexity of computer code: we find a way of breaking it into parts and deal with each part in isolation. Good computer code is written to enable us to do this - different parts interact with each other in ways that are easily seen and defined. We avoid any use of techniques which would cause one piece of code to interact in some unexpected way (meaning not obvious to someone who looks at the code) with another piece. Key to this is the idea that when we write a class to give the behaviour of some type of object, we clearly specify how objects of that class behave in terms of the public methods they provide. Then the coding process can be split into two, writing the code which uses objects of a particular class and writing the code which implements them. The specification is the link: so long as the class which implements the objects is written so that they always perform exactly to the specification, a class which uses the objects need only rely on the specification, it has no reliance on what is inside the implementing class. So long as everyone agrees on the specifications, programmers can write their own code without having to worry that it will be affected by some other programmer changing some other code.

This idea was illustrated by considering an implementation of arrayLists. In practice, we wouldn't actually have to do this as we can always use the code for the class ArrayList as provided in Java's library. But if this didn't exist, we could write our own code. The approach that has to be taken is to consider what goes inside an object to make it work so that when its public methods are called the object appears to behave in the way it has been defined to behave and is expected to behave by those writing the code which uses it.

A simple way of implementing an arrayList object is to have an array inside it which exactly matches in size and content the arrayList it is implementing. This means that a method whose call is meant to change the size of the arrayList it is called on has to be implemented by code which creates an entirely new array and replaces the array inside the object with this new array. A way of avoiding this continuous creation of new arrays is to use an array and count technique, in which inside an object representing an arrayList is an array and an integer representing the portion of that array which is currently counted as holding the data for the arrayList it represents. Then operations which change the size of the arrayList can actually be implemented by code which changes the value of the count integer but does not create a completely new array.

Sorting and Efficiency

We had looked previously at sorting algorithms using Lisp lists where we had to use the head and tail methods to break down the lists and the cons method to build up new ones. When sorting arrays or arrayLists, however, we can access individual items by their index, and this enables us to consider other algorithms or to write the algorithms we have already developed in a different way. Since array and arrayLists, unlike Lisp Lists, are mutable, it also enables us to consider sorting in place, meaning instead of creating a new structure to hold the sorted collection, we move the items around in position in the original sorted collection until they are in sorted order.

We considered several different sorting algorithms, but we also noted that the Java library provides built-in code to sort collections. This raises the issue of what order a collection of objects would be sorted in. This may seem obvious with numerical data, but more complex data items could have several ways of being ordered. If use Java's built-in sorting methods to sort an array or arrayList of String objects, we will find they are sorted in alphabetical order. We can call the method compareTo on a String object with another String object as its argument, and get a result telling us whether one comes before or after the other in alphabetical order, or they are equal. In fact the compareTo method is used generally in Java's library sorting code, and in other code in Java's library. Any class of objects which has a compareTo method defined in it will be sorted according to the order that method gives by Java's built-in sorting code. This is known as the natural order of the class of objects.

We also looked at the efficiency of various sorting algorithms. Experiments with them showed that some sorting algorithms always seemed to work faster than other, particularly with large collections being sorted. No matter what the circumstances, it was always the same algorithms which were the fast ones or the slow ones. Finding which algorithms are the ones that will run faster can be done by analysing the algorithms rather than just coding them, running them and finding out by trial and error. If we try to estimate the number of times one item is compared with another when sorting a collection of N items, we will find with some algorithms this will be proportional to N2 and with other proportional to N multiplied by the logarithm of N. If N is large, N*log(N) is much smaller than N2, which is why those algorithms judged to run in time proportional to N*log(N) perform much more quickly than those judged to run in time proportional to N2. We say the algorithms are “of order N*log(N)” or “of order N2”, and this is often abbreviated to O(N log N) or O(N2). Hence this estimate of efficiency is known as big-O notation.

Logarithmic factors in the order of efficiency of an algorithm are introduced when the algorithm involves repeatedly dividing the problem size in some way. This is because if you repeatedly divide some value by half, the number of times you can do that until you reach one is the logarithm to the base 2 of that number.

Inheritance

The section on inheritance was the first of two sections which explored in detail Java's type system. An important aspect of this is that it enables us to write generalised code which can be used in a variety of circumstances.

It is possible in Java to define a class of objects in terms of how it changes another class. It may add new methods and data items, and it may also replace the code for existing methods which is termed overriding. An important aspect of Java and other object-oriented languages is that a method which is written to take arguments of one type can be used to take arguments which are actually of a type which extends that type. When this happens, the type which the variable which refers to the object was declared as is called its apparent type, while the type of the constructor which was used to create the actual object is called its actual type. When a method is called on a variable where the type of the object it refers to is one which extends the type of the variable and the extension overrides the code of that method, the code used in the method call is the code from the extended type (that is, the “actual type”) rather than the variable type (that is the “apparent type”). This principle is known as dynamic binding.

Interface Types and Generics

We may want to write generalised code which works for a variety of different types of object. The only thing the different types have in common is that they define their own version of the methods which our generalised code calls on them. This is what we saw with code for the sorting algorithms sorting by natural order: all we needed was that the class for the object included the compareTo method. Java gives a way of writing such code, we use an interface type. This is a type which only gives the names and argument types of its methods, it does not give code for them. Other types implement this type by giving code to its methods. If we write some general code which makes use of the interface type, the dynamic binding principle means this code may be used for objects of any of the types which implement the interface, and when used it will use the code for their own versions of the methods when the method calls are made.

If you want to write general code for a collection of objects, you have to use the idea of a type variable. If T is a type variable for a method, then we can write code which works on a parameter given as of type ArrayList<T>, and it will work for a matching argument of an arrayList containing any type of object. However, what we can do with them is limited, since T could be set to any type, the only methods we can call on objects we only know to be of type T are those few methods which can be called on any object (they are defined in the most general type, Object). If we want to be more specific, we use the idea of a bounded type variable. Then if the declaration of the type variable T is <T extends SomeType>, the method which uses objects of type T can call on them any method which is given in type SomeType. It could be that SomeType is an interface type set up to bring together various methods which we might want to use in some generalised code.

Java's code library provides the interface Comparable which just contains the method header for compareTo. This can be used to write generalised code which works for any class of objects so long as those objects have a natural order defined by their code for this compareTo method.

Linked Lists

We moved back to considering implementation when we considered the idea of linked list data structures. It is important to distinguish between a linked list and a Lisp list, although the two are closely related. Lisp List is an abstract data type, that is, we view it entirely through the public methods which can be called on objects of that type. The concept of "linked list" is a data structure, which means it is viewed in terms of the actual variables and objects which make it up. So the contents of an object which is of an abstract data type are a data structure; the data structure implements the abstract data type. The linked list data structure is the natural way of implementing the Lisp list abstract data type. It consists of an object which contains a data item and a link to an object of the same type. This closely corresponds to the head and tail of a Lisp list. But it would be possible to implement the class LispList in some other way, for example, using arrays, and it would still be a correct implementation so long as the methods head, tail, cons, isEmpty and empty worked correctly.

More complex data structures can be built which contain objects linked together by reference to other objects. One example is the doubly linked list, in which the links between objects run such that a chain can be followed forwards and backwards in a list. This can be used as an alternative way of implementing what we have called an arrayList. In fact, the Java library provides an interface type called List which gives most of the methods of ArrayList. The class ArrayList is declared as implementing it, but the class LinkedList is another Java library class which also implements it. The class ArrayList implements the List interface using an array, the class LinkedList implements the List interface using a doubly-linked list. Java provides these two different implementations in order to give a choice. One is more efficient in some circumstances, the other is more efficient in other cicumstances. Code written in terms of the general interface type List will work with objects of either implementation type.

Java's Collections Framework

We finished the module by considering in more detail other classes and interfaces provided by Java to give built-in code to manage various forms of collections.

The Set interface gives a range of methods for managing a collection of objects considered as a set. Like the List interface, it is implemented by two classes, in this case HashSet and TreeSet. The List interface gives the general idea of items stored in positions in a collection, whereas the Set interface gives methods which correspond to a mathematical set. A set has no concept of its members being stored in particular positions, and an item is either a member of a set, or it is not, there is no concept of it occurring a multiple number of times.

We can go through the contents of an object of type List (that is, actual type ArrayList or LinkedList or some other class which implements List) by calling get(i) on it for i set to each of the integers from 0 up to but not including the size of the object. We can't do this for an object of type Set because it has no concept of its members being at particular positions. But the Java library provides a general way to go through the items of any collection, including a Set, which is to produce an object of type Iterator from the collection (a call to the method iterator() is defined to do this). Each time the method next() is called on an Iterator object, it will return the next item from the collection it came from, until all of them have been returned. In fact Java has a special control structure, called the for each loop which automatically creates and uses an iterator to go through a collection (it also works for arrays). An iterator over a TreeSet object returns the items in it in their natural order, there is no special order they are returned in for a HashSet object.

Java's library provides an interface called Comparator which describes a general type of objects whose only role is to take two objects and compare them. An object of type Comparator can be passed as an extra argument to Java's sort methods, it then results in those sort methods doing their sorting in the order provided by the Comparator object rather than their natural order. A Comparator object can also be used to produce a TreeSet object which stores its contents in some order other than their natural order.

The Map interface in Java's library (again, implemented by two different classes, HashMap and TreeMap) provides a collection of objects which is indexed by keyword rather than by position as with List. Actually the key type can be any type, but it is most often String. Unlike a List object, adding or deleting an element from a Map object does not change the index of the other elements.

There are some methods in Java's code library which return a view of a collection. Examples are an unmodifiable view (the collection cannot be changed if accessed via this reference), a view of only a portion of the collection, and a view in a different format (for example, as an array rather than a Set). A view differs from an alias because it is not the same type, but it is not a copy because it has the same underlying data so changing that data in the original changes the view and vice versa.

Final Note

This module has covered a lot of material, if all you do is read about it, it might seem hard to take in. But this is a practical module, it has been presented in terms of code you can run and experiment with yourself. That is the best way to learn this material, write your own code or modify code which has already been written, and see how what is described in theory here in these notes works in practice. As with many practical skills, the more you do it, the easier it becomes, and what seems complicated when just described using words seems much easier and comes naturally when you actually make practical use of it.


Matthew Huntbach

Last modified: 7 July 2019