Algorithms and Data Structures in an Object-Oriented Framework (“ADSOOF”)
Algorithms and Data Structures in an Object-Oriented Framework: Conclusion
In these concluding notes, we go through the topics of the module again, giving a summary of the main points of each. This module should have given you a deeper understanding of programming, with an emphasis on the object-oriented style of programming. It should help you move away from thinking about programming just in terms of the features of the programming language you are using, towards thinking of it first in terms of what you want to do to solve some problem or represent some scenario, and then in terms of how you would do that in the programming language you are using. When writing a program you structure it by “inventing your own instructions”, that is by writing methods, and by “inventing your own types”, that is by writing classes. In most cases you will decide on the type and instructions you need first by thinking about the scenario you are representing, then you program on two levels: one (the application) makes use of your own invented types and instructions, the other (the implementation) is the code that provides these types and instructions. In more complex code, parts of the code may itself be further broken down into application and implementation parts.
Good choice of algorithm and data structure can make big differences to the efficiency of a program, and we looked at how it is possible for one problem to have several alternative algorithms that can solve it, or one abstract data type to have several alternative data structures that can represent it.
When writing code to solve some problem or represent some structure, it is best to write it in a way so that code can be re-used in different circumstances. We saw how inheritance helps us write more general code, and generic typing is another Java feature which can be used to write code useable with a variety of different actual types. We also saw how to make use of code from the Java library: the object-oriented programming style and the various generalisation mechanisms encouarge re-use of code which already exists rather than writing your own from the beginning.
This module started by looking at using objects. We can create new objects by using the constructor for them. A variable “refers to” an object: we use this terminology to remind us that the variable and the object are not the same thing. A variable may be changed to stop referring to one object and start referring to another. It is possible for two or more variables to refer to the same object (called aliasing). When we call a method on a variable it may change what is inside the object the variable refers to, or it may return a value based on that object, or it may do both. A class in Java defines a type of object by saying what methods objects of that type have, and what is inside them to make them work. But we do not need to know what is inside a class to use objects of that class, all we need to know is the public methods and constructors for that class. If a class has no public methods which change what is inside objects of its type, objects of its type are said to be immutable.
When a Java program is executing, it executes inside an environment, which means the variables it has access to. When a new method call is made, the Java execution switches to a new environment consisting of new variables named after the parameters of the method call and any further variables declared in that method call (plus, if it is a non-static method, the variables of the object the call is made on). When a method call finishes, Java execution goes back to the environment it was in previously. So there may be many variables of the same name, which are separate because they are in separate environments or separate objects. Aliasing means separate variables may refer to the same object, that is how an object may get changed in one method call, and the change stays on after that method call has finished.
When we make a method call, the types of the arguments to the method call must match the types of the parameters of the method header. If there is a return value, it must match the type of the variable the result of the method call is assigned to, or the type of the parameter the method call matches if it is an argument to a further method call. A non-static method call must be made on a reference to an object of the type the method comes from. In this way, types in a programming language like Java help tell us how to fit methods and variables together to make more methods and classes.
In some cases, calling a Java method causes an exception to be thrown
rather than the method terminating normally. An exception may be caught if
the method call is inside the try
part of a Java
try
...catch
statement with a matching
catch
part, otherwise the method call in which the method
call occurred itself halts and throws the same exception. Exceptions are
used to deal with circumstances where methods are used in an abnormal way.
It is possible to write your own methods which throw exceptions, and to define
your own exception types.
An array is a collection of values all of the same type. You can treat an array as
a single object, referring to it through a variable whose type is that of its
component values plus the []
symbol. But you can also treat the
components of the array as if they are themselves variables, so if variable
a
refers to an array, then a[i]
refers to one of
the components of the array, where i
is an integer value. Since
a
may change the array it is referring to, and i
may be
a variable or an expression containing variables and integer variables can change
their values, the component a[i]
accesses may change as the code
is executed. It is important to be aware that when an array object is created
its size is fixed. But as with any other object, the array and a variable referring
to it are not the same thing, so two variables can refer to the same array object,
and a variable may be changed so that it stops referring to one array object and
starts referring to another.
The class String
is provided by Java to represent text. A String
object is an indexable list of characters. If str
refers to a String
object, then str.charAt(i)
returns the character at position
i
in the string referred to by str
. It is important to
be aware that charAt
is a method call, so this is not the same as
a[i]
with arrays which can be treated as a variable. You can
change what is in an array by assigning a value to a[i]
as
in a[i]=n
, but you cannot assign a value to
str.charAt(i)
as you cannot assign a value to a method call. In fact, a
String
in Java is immutable: there is nothing you can do to change
a String
object. Java provides plenty of methods you can call on
String
objects, but all of them are constructive, meaning
that if they are defined as making changes on the String
object they
are called on, they actually create and return a new String
object
representing the changes, they do not change the object they are called on.
A Java arrayList, which in Java is given by the library type
ArrayList
, is in many ways like an array, it is an indexable
collection of objects of the same type. However, unlike arrays, it can change in
size, and Java has a range of methods that can be called on an arrayList object,
some of which change its size, some of which change its contents. Unlike arrays,
there are no special symbols for accessing individual components of an arrayList
object, they can only be accessed by using the appropriate method calls. So if
a
refers to an arrayList, then a.get(i)
returns
the component at position i
in the arrayList referred to by
a
, and a.set(i,n)
changes the value of that
component to n
. In fact arrays are the only collection
structure in Java which has special symbols for accessing parts of it.
ArrayLists in Java introduce the idea of generic types. If Thing
is
a Java type, then ArrayList<Thing>
is the type “arrayList of
Thing
”. Note what occurs between <
and
>
to make a full arrayList type in Java has to be a type,
so it is not at all the same thing as what occurs between [
and
]
to make a variable referring to an individual component from a variable
referring to an array: that has to be an integer value.
The type LispList
was introduced in this module mainly to
help introduce the topic of recursion. It is not a standard part of the Java
language, though it is standard in some other programming languages, and its name
comes from the historic programming language called Lisp which is based around
using this sort of structure (its name comes from “list processing”).
Like ArrayList
, it is a generic type, so to give a full type it has to be
combined with the type of its components using the <
and
>
symbols. But like String
objects,
LispList
objects are immutable, so there are no operations and no way
of writing methods which change LispList
objects directly, only
operations and methods to create new ones.
If ls
refers to a LispList
object, then ls.head()
returns the first item in the list (as with charAt
with String
s,
it is a method call, so you cannot assign to it). The call ls.tail()
returns a new list which contains all items in the list referred to by
ls
except the first one. The call ls.cons(n)
returns a new list which is like the one referred to by ls
except that n
is added to the front, but it is important to remember
that it does not change the actual object which ls
refers to.
The call ls.isEmpty()
returns true
if ls
refers
to an empty list (one containing 0 items), and false
otherwise.
The type LispList
is a recursive type, because an object of that type,
unless it is the empty list, contains its head item and its tail which is itself a
LispList
object. As a consequence, a good way of tackling writing
methods to manipulate LispList
objects is to have a method make a
call to the same method but with the tail of its argument list as the argument to
this recursive call. Then look at how the result of the recursive call
relates to what is required from the whole call. Writing recursive methods requires
the use of induction as a way of thinking so you are confident your code
will work correctly - you assume the recursive call works, then you show
that if this assumption is true, the main call must work, then you show that the
base case (the case when there isn't a recursive call) works, and from this
the method must work under all circumstances.
The recursive way of thinking was used to develop several sorting algorithms
on Lisp lists. As Lisp lists are immutable, a sorting algorithm applied to one
actually creates a new LispList
object whose contents are the same as
those of the LispList
object passed to it, except they are in sorted
order. It does not change the actual LispList
object.
It was a deliberate aspect of the module not to introduce the implementation of objects until nearly half way through. So up to this point we were only using definitions of objects provided for us, we were not writing our own classes to use to produce objects. This was to emphasise that you can easily use someone else's code without being aware of what that code is, you only need to know the public methods it provides are. This is a general way to deal with the complexity of computer code: we find a way of breaking it into parts and deal with each part in isolation. Good computer code is written to enable us to do this - different parts interact with each other in ways that are easily seen and defined. We avoid any use of techniques which would cause one piece of code to interact in some unexpected way (meaning not obvious to someone who looks at the code) with another piece. Key to this is the idea that when we write a class to give the behaviour of some type of object, we clearly specify how objects of that class behave in terms of the public methods they provide. Then the coding process can be split into two, writing the code which uses objects of a particular class and writing the code which implements them. The specification is the link: so long as the class which implements the objects is written so that they always perform exactly to the specification, a class which uses the objects need only rely on the specification, it has no reliance on what is inside the implementing class. So long as everyone agrees on the specifications, programmers can write their own code without having to worry that it will be affected by some other programmer changing some other code.
This idea was illustrated by considering an implementation of arrayLists. In practice,
we wouldn't actually have to do this as we can always use the code for the class
ArrayList
as provided in Java's library. But if this didn't exist,
we could write our own code. The approach that has to be taken is to consider what
goes inside an object to make it work so that when its public methods are called the
object appears to behave in the way it has been defined to behave and is expected
to behave by those writing the code which uses it.
A simple way of implementing an arrayList object is to have an array inside it which exactly matches in size and content the arrayList it is implementing. This means that a method whose call is meant to change the size of the arrayList it is called on has to be implemented by code which creates an entirely new array and replaces the array inside the object with this new array. A way of avoiding this continuous creation of new arrays is to use an array and count technique, in which inside an object representing an arrayList is an array and an integer representing the portion of that array which is currently counted as holding the data for the arrayList it represents. Then operations which change the size of the arrayList can actually be implemented by code which changes the value of the count integer but does not create a completely new array.
We had looked previously at sorting algorithms using Lisp lists where we had to use
the head
and tail
methods to break down the lists and
the cons
method to build up new ones. When sorting arrays or arrayLists,
however, we can access individual items by their index, and this enables us to
consider other algorithms or to write the algorithms we have already developed in
a different way. Since array and arrayLists, unlike Lisp Lists, are mutable, it
also enables us to consider sorting in place, meaning instead of creating
a new structure to hold the sorted collection, we move the items around in
position in the original sorted collection until they are in sorted order.
We considered several different sorting algorithms, but we also noted that the
Java library provides built-in code to sort collections. This raises the issue of
what order a collection of objects would be sorted in. This may seem obvious with
numerical data, but more complex data items could have several ways of being ordered.
If use Java's built-in sorting methods to sort an array or arrayList of
String
objects, we will find they are sorted in alphabetical order.
We can call the method compareTo
on a String
object with
another String
object as its argument, and get a result telling us
whether one comes before or after the other in alphabetical order, or they are equal.
In fact the compareTo
method is used generally in Java's library sorting
code, and in other code in Java's library. Any class of objects which has
a compareTo
method defined in it will be sorted according to the order
that method gives by Java's built-in sorting code. This is known as the
natural order of the class of objects.
We also looked at the efficiency of various sorting algorithms. Experiments with them showed that some sorting algorithms always seemed to work faster than other, particularly with large collections being sorted. No matter what the circumstances, it was always the same algorithms which were the fast ones or the slow ones. Finding which algorithms are the ones that will run faster can be done by analysing the algorithms rather than just coding them, running them and finding out by trial and error. If we try to estimate the number of times one item is compared with another when sorting a collection of N items, we will find with some algorithms this will be proportional to N2 and with other proportional to N multiplied by the logarithm of N. If N is large, N*log(N) is much smaller than N2, which is why those algorithms judged to run in time proportional to N*log(N) perform much more quickly than those judged to run in time proportional to N2. We say the algorithms are “of order N*log(N)” or “of order N2”, and this is often abbreviated to O(N log N) or O(N2). Hence this estimate of efficiency is known as big-O notation.
Logarithmic factors in the order of efficiency of an algorithm are introduced when the algorithm involves repeatedly dividing the problem size in some way. This is because if you repeatedly divide some value by half, the number of times you can do that until you reach one is the logarithm to the base 2 of that number.
The section on inheritance was the first of two sections which explored in detail Java's type system. An important aspect of this is that it enables us to write generalised code which can be used in a variety of circumstances.
It is possible in Java to define a class of objects in terms of how it changes another class. It may add new methods and data items, and it may also replace the code for existing methods which is termed overriding. An important aspect of Java and other object-oriented languages is that a method which is written to take arguments of one type can be used to take arguments which are actually of a type which extends that type. When this happens, the type which the variable which refers to the object was declared as is called its apparent type, while the type of the constructor which was used to create the actual object is called its actual type. When a method is called on a variable where the type of the object it refers to is one which extends the type of the variable and the extension overrides the code of that method, the code used in the method call is the code from the extended type (that is, the “actual type”) rather than the variable type (that is the “apparent type”). This principle is known as dynamic binding.
We may want to write generalised code which works for a variety of different types of object.
The only thing the different types have in common is that they define their own version
of the methods which our generalised code calls on them. This is what we saw with
code for the sorting algorithms sorting by natural order: all we needed was that
the class for the object included the compareTo
method. Java gives
a way of writing such code, we use an interface type. This is a type which
only gives the names and argument types of its methods, it does not give code for them.
Other types implement this type by giving code to its methods. If we write some
general code which makes use of the interface type, the dynamic binding principle
means this code may be used for objects of any of the types which implement the
interface, and when used it will use the code for their own versions of the methods
when the method calls are made.
If you want to write general code for a collection of objects, you have to use
the idea of a type variable. If T
is a type variable for a method,
then we can write code which works on a parameter given as of type
ArrayList<T>
, and it will work for a matching argument of
an arrayList containing any type of object. However, what we can do with them is
limited, since T
could be set to any type, the only methods we can
call on objects we only know to be of type T
are those few methods
which can be called on any object (they are defined in the most general type,
Object
). If we want to be more specific, we use the idea of a
bounded type variable. Then if the declaration of the type variable
T
is <T extends SomeType>
, the
method which uses objects of type T
can call on them any method which is
given in type SomeType
. It could be that SomeType
is an
interface type set up to bring together various methods which we might want to use
in some generalised code.
Java's code library provides the interface Comparable
which just contains
the method header for compareTo
. This can be used to write generalised code
which works for any class of objects so long as those objects have a natural order
defined by their code for this compareTo
method.
We moved back to considering implementation when we considered the idea of
linked list data structures. It is important to distinguish between a linked
list and a Lisp list, although the two are closely related. Lisp List is
an abstract data type, that is, we view it entirely through the public
methods which can be called on objects of that type. The concept of "linked
list" is a data structure, which means it is viewed in terms of the actual variables
and objects which make it up. So the contents of an object which is of an abstract
data type are a data structure; the data structure implements the abstract data type.
The linked list data structure is the natural way of implementing the Lisp list
abstract data type. It consists of an object which contains a data item and a link
to an object of the same type. This closely corresponds to the head and tail of a
Lisp list. But it would be possible to implement the class LispList
in some other way, for example, using arrays, and it would still be a correct
implementation so long as the methods head
, tail
,
cons
, isEmpty
and empty
worked correctly.
More complex data structures can be built which contain objects linked together
by reference to other objects. One example is the doubly linked list, in
which the links between objects run such that a chain can be followed forwards
and backwards in a list. This can be used as an alternative way of implementing
what we have called an arrayList. In fact, the Java library provides an interface
type called List
which gives most of the methods of ArrayList
.
The class ArrayList
is declared as implementing it, but the class
LinkedList
is another Java library class which also implements it.
The class ArrayList
implements the List
interface using
an array, the class LinkedList
implements the List
interface using
a doubly-linked list. Java provides these two different implementations in order to
give a choice. One is more efficient in some circumstances, the other is more efficient
in other cicumstances. Code written in terms of the general interface type
List
will work with objects of either implementation type.
We finished the module by considering in more detail other classes and interfaces provided by Java to give built-in code to manage various forms of collections.
The Set
interface gives a range of methods for managing a collection
of objects considered as a set. Like the List
interface, it is
implemented by two classes, in this case HashSet
and
TreeSet
. The List
interface gives the general
idea of items stored in positions in a collection, whereas the Set
interface gives methods which correspond to a mathematical set. A set has no
concept of its members being stored in particular positions, and an item is
either a member of a set, or it is not, there is no concept of it occurring a
multiple number of times.
We can go through the contents of an object of type List
(that is,
actual type ArrayList
or LinkedList
or some other
class which implements List
) by calling get(i)
on
it for i
set to each of the integers from 0 up to but not including
the size of the object. We can't do this for an object of type Set
because it has no concept of its members being at particular positions. But
the Java library provides a general way to go through the items of any
collection, including a Set
, which is to produce an object of type
Iterator
from the collection
(a call to the method iterator()
is defined to do this). Each time
the method next()
is called on an Iterator
object, it
will return the next item from the collection it came from, until all of them have
been returned. In fact Java has a special control structure, called the
for each loop which automatically creates and uses an iterator to
go through a collection (it also works for arrays). An iterator over a
TreeSet
object returns the items in it in their natural order,
there is no special order they are returned in for a HashSet
object.
Java's library provides an interface called Comparator
which describes a general type of objects whose only role is to take two
objects and compare them. An object of type Comparator
can be
passed as an extra argument to Java's sort methods, it then results in those
sort methods doing their sorting in the order provided by the
Comparator
object rather than their natural order. A
Comparator
object can also be used to produce a
TreeSet
object which stores its contents in some order
other than their natural order.
The Map
interface in Java's library (again, implemented by
two different classes, HashMap
and TreeMap
)
provides a collection of objects which is indexed by keyword rather than
by position as with List
. Actually the key type can be
any type, but it is most often String
. Unlike a List
object, adding or deleting an element from a Map
object
does not change the index of the other elements.
There are some methods in Java's code library which return a
view of a collection. Examples are an unmodifiable view
(the collection cannot be changed if accessed via this reference),
a view of only a portion of the collection, and a view in a different
format (for example, as an array rather than a Set
). A
view differs from an alias because it is not the same type, but it is not
a copy because it has the same underlying data so changing that data in
the original changes the view and vice versa.
This module has covered a lot of material, if all you do is read about it, it might seem hard to take in. But this is a practical module, it has been presented in terms of code you can run and experiment with yourself. That is the best way to learn this material, write your own code or modify code which has already been written, and see how what is described in theory here in these notes works in practice. As with many practical skills, the more you do it, the easier it becomes, and what seems complicated when just described using words seems much easier and comes naturally when you actually make practical use of it.
Last modified: 7 July 2019