This page will be used for a brief summary of the lectures given each
week.
A fuller set of notes is available from the link
Local Notes and Resources.
Week 1: 10/11 January
I was away for the first week of the semester, but in my absence a
lab exercise I had prepared was handed out and talked through by Jon
Rowson. The exercise was intended to get you to think through some of the
issues around algorithms by working out an algorithm for a moderately
complex problem. The problem was that you were given a sequence of
integers, some positive and some negative, and you had to find a starting
point and a finishing point in the sequence such that adding all the
numbers between those points gave a greater total sum than you could
get for any other choice of starting and finishing points.
It ought to be fairly obvious that one way of solving this problem is to try every possible combination of starting and finishing points, and find the one which gives the largest sum. Translating this algorithm into runnable Java code is a good exercise of your programming abilities. However, a program obtained by implementing this algorithm will run very slowly once it is tried on large sequences of numbers. The problem is not the Java code, but the algorithm itself. Careful consideration of this problem can lead to a much more efficient algorithm which could be coded up in Java.
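The brute-force approach described above can be sketched in Java as follows. This is only the inefficient "try every combination" algorithm; the efficient solution is deliberately not shown here, since the notes say it may be set as an assessed exercise. The class and method names are my own, not from the course handout.

```java
// Naive maximum-sum subsequence: try every pair of start/finish points
// and keep the largest sum seen. This is the slow algorithm the notes
// describe, not the efficient one.
public class MaxSubsequence {
    public static int maxSum(int[] a) {
        int best = a[0];   // assumes the sequence is non-empty
        for (int start = 0; start < a.length; start++) {
            int sum = 0;
            for (int finish = start; finish < a.length; finish++) {
                sum += a[finish];            // sum of a[start..finish]
                if (sum > best) best = sum;
            }
        }
        return best;
    }
}
```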
Week 2: 17/18 January
The first lecture introduced the course. Algorithms are ways of solving
problems. We can express them in English, but English like any natural
language is imprecise,
and what one person understands by a phrase may not be understood in
the same
way by another person. To be more precise, we could use some sort of
formal
description language. We could also express algorithms in terms of a
computer programming language like Java. This has the advantage of
being
precise, but in order to make the code runnable it may be necessary to
add
a lot of extra support material which isn't really part of the
algorithm.
So it is important to distinguish between the abstract concept of an
algorithm, and a particular piece of code used to implement it. In this
course, however, we will concentrate on algorithms which are expressed
in
Java code. There is a big advantage in being able to experiment with an
algorithm by actually running it in the form of a computer program, and
it
also helps improve our programming skills.
Data structures are ways of storing collections of data. Since most algorithms work on collections of data, algorithms and data structures are generally closely linked as topics. So far, the only data structure you have seen is the array, which is built into Java and most other programming languages. Other data structures can be provided in Java through Java classes which implement them. Although the Java code libraries have several examples of data structure classes already written for you, these will not be covered in the course. Since it is a course on understanding the concepts through simple examples, and not a course in the use of Java (although it does use Java), the emphasis will be on writing your own code to implement common data structures.
We noted that when considering an algorithm there are two different
approaches
we can take. One of these is the iterative approach in which we
repeatedly make changes to some structure until we have reached the
solution. The other is the recursive approach, in which we go
about solving a problem by seeing if we can express it in terms of a
smaller version of the same problem. We illustrated these approaches by
looking at an iterative and a recursive algorithm for solving the
problem of evaluating n to the power of m, along with Java methods that
implement the algorithms.
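The two approaches to the powers problem can be sketched as below (a minimal version; the class name is my own, and this is not necessarily the exact code from the course directory):

```java
// Two ways of computing n to the power of m, for m >= 0.
public class Power {
    // Iterative: repeatedly multiply an accumulator by n.
    public static long powerIter(long n, int m) {
        long result = 1;
        for (int i = 0; i < m; i++)
            result = result * n;
        return result;
    }

    // Recursive: n^m is n * n^(m-1), with n^0 = 1 as the base case,
    // expressing the problem in terms of a smaller version of itself.
    public static long powerRec(long n, int m) {
        if (m == 0)
            return 1;
        return n * powerRec(n, m - 1);
    }
}
```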
We looked briefly at ways of proving algorithms to be correct. With
iterative algorithms, we can use the loop invariant approach.
This means finding an expression, the loop invariant, which is true
when the loop starts,
and remains true at the end of the execution of the loop's body. It
must therefore remain true when the loop terminates, at which point the
loop test must be false. From the combination of the loop test being
false and the loop invariant being true, we can show the problem is
solved
when the loop terminates. With recursive algorithms, we can use proof
by
induction.
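As an illustration of the loop invariant idea (my own example, not one from the lectures), here is a loop for summing an array with its invariant written as comments:

```java
// Loop-invariant illustration: summing an array.
// Invariant: each time the loop test is evaluated,
//     sum == a[0] + ... + a[i-1]
// It holds initially (i == 0, sum == 0, the empty sum), and the body
// preserves it. When the loop test is false, i == a.length, so the
// invariant tells us sum is the total of the whole array.
public class SumInvariant {
    public static int sum(int[] a) {
        int sum = 0;
        int i = 0;
        while (i < a.length) {
            // invariant holds here: sum == a[0] + ... + a[i-1]
            sum = sum + a[i];
            i = i + 1;
            // invariant restored for the new value of i
        }
        // loop test false and invariant true => sum is the total
        return sum;
    }
}
```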
We also considered briefly the idea of constructive and destructive operations on data
structures (on arrays, since so far that is the only data structure
that has been covered). If a data structure is passed as an argument to
a method, the method is destructive if it works by actually changing
the data structure. It is constructive if it works by creating a new
data structure which reflects the changes desired but leaves the actual
data structure passed as an argument unchanged.
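The distinction can be shown with a small example (my own, on arrays since that is the only data structure covered so far):

```java
// Destructive versus constructive doubling of every item in an array.
public class Doubling {
    // Destructive: actually changes the array passed as an argument.
    public static void doubleInPlace(int[] a) {
        for (int i = 0; i < a.length; i++)
            a[i] = a[i] * 2;
    }

    // Constructive: leaves the argument unchanged and returns a new
    // array reflecting the desired change.
    public static int[] doubled(int[] a) {
        int[] b = new int[a.length];
        for (int i = 0; i < a.length; i++)
            b[i] = a[i] * 2;
        return b;
    }
}
```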
Week 3: 24/25 January
An important topic this week was insights that can be used to develop
more efficient algorithms. The first example looked at was evaluating
n to the power of m. Last week we considered
algorithms which involved repeated multiplications by n.
However, if we compute n^(m/2), all we need do is
multiply that number by itself to get n^m (or,
if m is an odd number, multiply it by itself and then
by n, with m/2 rounded down to the
nearest integer). We showed that, using this insight, the total number
of multiplications needed to evaluate n^m is at most
twice the logarithm to base two of m, whereas with
last week's algorithms m multiplications are necessary. If
m is a large number, this makes a huge difference.
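A sketch of this insight in Java (class name is my own):

```java
// Power by repeated squaring: n^m = (n^(m/2))^2, with one extra
// multiplication by n when m is odd. Each recursive call halves m,
// so at most 2 * log2(m) multiplications are performed.
public class FastPower {
    public static long power(long n, int m) {
        if (m == 0)
            return 1;
        long half = power(n, m / 2);   // m/2 rounds down in Java
        if (m % 2 == 0)
            return half * half;
        else
            return half * half * n;    // odd m needs one more factor of n
    }
}
```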
We looked at two algorithms for sorting an array. One was selection sort and the other was merge sort. Selection sort works by finding the smallest item and putting it in the first position, the second smallest item and putting it in the second position, and so on. Merge sort works by dividing the array into two parts, sorting each part recursively, and then taking items from the sorted parts to put into the final array, each time taking the lowest item from either of the two sorted arrays which has not yet been taken.
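The two sorting algorithms can be sketched as follows (compact versions written for illustration; the course's own code in the sort directory may differ in detail):

```java
import java.util.Arrays;

public class Sorts {
    // Selection sort: for each position, find the smallest remaining
    // item and swap it into place. Destructive: sorts a in place.
    public static void selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++)
                if (a[j] < a[min]) min = j;
            int tmp = a[i]; a[i] = a[min]; a[min] = tmp;
        }
    }

    // Merge sort: sort each half recursively, then repeatedly take the
    // lowest not-yet-taken item from the two sorted halves.
    public static int[] mergeSort(int[] a) {
        if (a.length <= 1) return a;
        int[] left = mergeSort(Arrays.copyOfRange(a, 0, a.length / 2));
        int[] right = mergeSort(Arrays.copyOfRange(a, a.length / 2, a.length));
        int[] result = new int[a.length];
        int i = 0, j = 0;
        for (int k = 0; k < result.length; k++) {
            if (j == right.length || (i < left.length && left[i] <= right[j]))
                result[k] = left[i++];   // lowest remaining item is in left
            else
                result[k] = right[j++];  // lowest remaining item is in right
        }
        return result;
    }
}
```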
We introduced the big-O notation as a way of describing the efficiency of algorithms. The idea is that if we have a formula giving the number of significant operations taken by an algorithm in terms of the size of the data, we can ignore all but the largest term of the formula, and we can ignore constant multiplying factors. We gave an informal argument that selection sort is O(N^2) and merge sort is O(N log N). This means that merge sort is more efficient: if we try to sort a large array using the two algorithms, we will find merge sort always does it much quicker. The important thing is that it is the algorithm that makes the difference, not the details of exactly how it's programmed or even the computer it's run on.
We looked at the solutions to the problem given as an exercise in week 1. A simple solution is just to find the sum of every possible subsequence and choose the one which is the biggest. This, however, is not an efficient way of solving the problem. An efficient solution involves the following insight. (Some notes removed here, as the same problem may be set as an assessed exercise in modules currently being taught.)
Please note that at this point we have covered up to and including the
section "Comparing Sorting Algorithms" in the
Local Notes and Resources. You can
find the code for the "powers" example in the directory
http://www.dcs.qmul.ac.uk/~mmh/ADS/code/powers/.
Please remember the important thing for this course is the code in
the method power
and not the supporting code in the
method
main
which is only given to provide a framework which lets
you run the power
method. You can find code for various
variations on merge sort and selection sort in the directory
http://www.dcs.qmul.ac.uk/~mmh/ADS/code/sort/.
Again, please concentrate on the code for the sort
method
and not on the subsidiary code in the main
and other
methods which is just provided to enable you to run the sort
method with some data.
We noted again the value of separating the code that solves a problem from the
main method, or methods that print some message. This makes our code far
more flexible, since we can re-use our method whenever required just by
calling it,
and we have made a clear distinction between solving the problem and
printing
a message telling us the solution.
Searching an unordered array means looking at each item in turn until we either find the one we want or we have searched all the items and can conclude the one we want isn't there. This is referred to as linear search. It is O(N) since on average it will take from N/2 to N (depending on the probability of the item being in the array) checks to say whether the item is in the array.
If the items in the array are put in order, we can use the more efficient binary search. This means looking at the middle item. If it's the item we're looking for, we can halt. Otherwise, if the item we're looking for is less than the middle item, we know we can restrict the search to the first half of the array; otherwise we can restrict it to the second half. We first considered a recursive method which actually copied half of the original array into a new half-sized array and called itself recursively with that array as an argument. However, this copying is inefficient. A more efficient method keeps the bounds of the part of the array being searched as arguments, and passes the original array into the recursive call along with more restricted bounds. Since the arguments giving these bounds are not part of the original problem definition, they should not appear as arguments to the public method that solves the problem. This is achieved by making the public method call a private method with the bounds as extra arguments. Note that binary search is O(log N), since it looks at an array item and then cuts the array in half, so the maximum number of items it will look at is the number of times an array can be divided in half. We noted that a slight improvement could be made to the merge sort method given previously by adopting the technique of passing the whole array and bounds arguments rather than a new half-sized array.
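A sketch of binary search in the style described, with the public method calling a private method that carries the bounds (class and method names are my own):

```java
// Binary search on a sorted array. The bounds lo and hi are not part
// of the original problem, so they appear only on the private method;
// the public method supplies the initial bounds.
public class Search {
    public static boolean binarySearch(int[] a, int item) {
        return search(a, item, 0, a.length - 1);
    }

    private static boolean search(int[] a, int item, int lo, int hi) {
        if (lo > hi)
            return false;                 // empty range: item not present
        int mid = (lo + hi) / 2;
        if (a[mid] == item)
            return true;
        else if (item < a[mid])
            return search(a, item, lo, mid - 1);  // restrict to first half
        else
            return search(a, item, mid + 1, hi);  // restrict to second half
    }
}
```

Note that the same array object is passed into every recursive call; only the bounds change, so no copying takes place.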
We looked at the concept of environments. When a static method is executed, it is executed in its own environment, which is a collection of variables and their values. The environment consists of the variables which are the parameters to the method plus any local variables (plus also, though we did not consider them, class variables). When a method call is made, a new environment is created, with the variables which are the parameters set by assigning the corresponding arguments in the method call to them. The old environment is returned to when execution of the method finishes. Executing a method in an environment means that when a variable name is used, it is taken to mean the variable in that environment. Recursion creates a whole stack of environments, as each recursive call creates its own environment. Note there is not necessarily a link between variables of the same name in different environments. Assigning a value to a variable in the environment of the current method call does not change the value of a variable of the same name in previous environments which are waiting to be returned to.
We noted that array assignment gave aliasing. So if a
and b
both refer to arrays, b=a
causes b
to refer to the array that a
refers to. Then any change
to
b[i]
for any i
will also change the value of
a[i]
since they are two names for the same thing. This is
why when arrays are passed as parameters to methods, any changes made
to
them remain in place after the method finishes (enabling destructive
methods to work). Although the method is executed in its own
environment
with its own variable referring to the array, the assignment involved
in parameter
passing causes the array to be shared, not copied.
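A tiny example of the aliasing effect (my own illustration):

```java
// Array assignment shares the array rather than copying it.
public class Alias {
    public static int demo() {
        int[] a = {1, 2, 3};
        int[] b = a;      // b now refers to the same array as a
        b[0] = 99;        // so this changes a[0] as well
        return a[0];
    }
}
```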
We looked at a phenomenon known as tail recursion. This is where a method returns the result of a recursive call as its own return value. The consequence is that there is no code left to execute after the recursive call. Since there is no code to execute, there is actually no point in saving the environment to be used in code after the recursive call. This means that tail recursion can be converted to iteration, with a single environment replacing the multiple environments of recursion, and assigning a new value to a variable replacing the way recursion creates a new variable of the same name with a possibly different value.
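A sketch of the conversion (my own example, using an accumulator version of the powers method, since the plain recursive version is not tail recursive):

```java
// Tail recursion and the iteration it converts to.
public class TailRec {
    // Tail recursive: the recursive call's result is returned directly,
    // so nothing remains to be done in this environment afterwards.
    public static long powerTail(long n, int m, long acc) {
        if (m == 0)
            return acc;
        return powerTail(n, m - 1, acc * n);
    }

    // The iterative equivalent: one environment, with assignment to
    // acc and m replacing the creation of new environments.
    public static long powerLoop(long n, int m) {
        long acc = 1;
        while (m != 0) {
            acc = acc * n;
            m = m - 1;
        }
        return acc;
    }
}
```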
We then moved onto a different topic: how we represent collections of data. Suppose we want to represent a set, which is a collection where no element occurs more than once and the order the elements are stored in doesn't matter. Suppose we want the set to change so we can add and delete items destructively. One technique we could use is the array and count technique. This means having an array of the maximum size we would ever need, and a variable holding a count; if the variable stores N, then the first N cells of the array count as the data in the set. Adding a new item to the set involves putting it in the cell indexed by the count and increasing the count by one. Deleting an item from the set means decreasing the count by one, then putting the item in the cell now indexed by the count into the cell where the deleted item was. We can do this because the order of items in the array doesn't matter. For simplicity, we will agree attempting to delete an item that isn't there or adding an item which is already there has no effect.
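A sketch of the array and count technique as a class (illustrative names, not the course's IntSet1 code):

```java
// A destructive set of integers using the array and count technique:
// the first count cells of array hold the set's data.
public class ArraySet {
    private int[] array = new int[100];  // the maximum size we would need
    private int count = 0;

    public boolean member(int item) {
        for (int i = 0; i < count; i++)
            if (array[i] == item) return true;
        return false;
    }

    public void add(int item) {
        if (!member(item))               // adding a present item: no effect
            array[count++] = item;
    }

    public void delete(int item) {
        for (int i = 0; i < count; i++)
            if (array[i] == item) {
                count--;
                array[i] = array[count]; // move last item into the gap:
                return;                  // order doesn't matter in a set
            }
    }
}
```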
We considered a "set maintenance program" which used this representation. We first wrote the program all as one method. This made it long and complex, as it was difficult to tell which parts of the program were about manipulating the set, and which parts were about providing the interaction with the program user.
So next we broke the program up and put all the code that was about manipulating the set into separate methods named according to their function which took the array and count representing the set as their arguments. This gave us a program which was easier to understand, with the main method restricted to providing the user interface.
Finally we considered putting the code for sets into a separate class. Objects of this class had an array and count within them. This provided better structuring. Now the array and count representing a set were clearly brought together in one object. The user interface code interacted with that object just by calling its methods. It did not even have to refer to the array and count. The effect was as if the set type was built into Java.
Code for the "all in one method" set maintenance program can be found
in the file
UseIntSets1.java
in the directory http://www.dcs.qmul.ac.uk/~mmh/ADS/code/sets/
.
Code for the "all in one class"
set maintenance program can be found in the file
UseIntSets2.java
in the same directory. The front end for the version we saw with one
file for objects
representing sets can be found in the file
UseIntSets3.java
in this directory, with the actual code for the set objects in the file
IntSet1.java.
Note that the code for the user interface is not important, once again this is only given to provide a framework for you to run the more relevant code, which is that which directly implements sets. It is certainly not meant to be an example of a good user interface, rather it is designed to be as simple as possible so as not to detract attention from the main issue. The directory given above contains a large number of files all giving variations on the basic theme of implementing a set of integers. We will be going through various implementations in the next few weeks. A general guide to all the files in the sets directory can be found here.
Note that the code in UseIntSets3 cannot access the array
and count fields within
this object, as they are declared as private. It can only access the
object through
its public methods, which are add
, delete
, member
and print
and its constructor. So far as the person
writing the code for
UseIntSets3
is concerned, IntSet1
might as
well be a
built-in part of Java with the operations add
, delete
,
member
and print
and its constructor provided.
The important thing is that we have broken up our set maintenance program into two parts which have only a limited way of interacting. It helps us with the design and construction of programs if we can break them into logically separate parts, perhaps with different programmers working on the different parts. If there is only a limited way of interacting, through the public methods of a class which are defined to work in a particular way, there is less chance of unexpected errors in the whole program. Programmers writing the code for an ADT need only be concerned that they give objects of that type the correct behaviour, they need not otherwise be concerned with how the objects are used. Programmers using the code for an ADT can assume it behaves correctly, they need not be concerned with what goes on inside objects of that type.
Note that it is important to distinguish between an object and the type of that object, where in Java the type is given by its class. Do not talk or write of "an abstract data type" when you really mean "an object of that abstract data type" - they are two separate things. One way of thinking about this is that classes are "recipes" or "blueprints" for objects. You would be sure to distinguish, for example, between a cake and a recipe for a cake. You could not eat a recipe for a cake. You could change a cake, and you could change a recipe for a cake. But changing a recipe for a cake has a different effect than changing an individual cake!
Note also the distinction between an abstract data type,
and the code which implements it. Different code may
implement the same abstract data type if the way an object
of that type behaves, in the view of the code which uses it,
does not change. Perhaps you could think of this as altering
the recipe in a way that doesn't change the appearance or
taste of the cake created from it. With sets, we noted that
we could change our representation from the original "array and
count" form to an "array and count" form where the items in the
array are stored in order. This has the advantage that the more
efficient binary search can be used in implementing the
member
method, but the add
and
delete
methods are less efficient as they require
more work in order to keep the array in order. There is no
change to the way objects of the type behave in terms of the
sets they are taken as representing, but a change in time
performance of the operations could be observed (though only
with sets of large size would there be a difference significant
enough to be observed by an ordinary user).
In our examples, the class IntSet2.java
represents a change to our original set ADT, IntSet1
,
because we replace the method
print
by the method toString
, and these two
methods
have different effects. The method print
actually causes
a text
representation of the object to be printed, while the method toString
only returns a text representation which could be printed or used
otherwise as
desired. This is more flexible, so in future we will always provide
objects with
a toString
method returning a text representation, rather
than a
print
method which causes actual output from the program
but returns
nothing to be manipulated in the program. The class
IntSet3.java
represents the same ADT as IntSet2
, but uses an ordered
array
implementation.
We saw a very different implementation of the set ADT, using an
array of booleans.
The code for this is in
IntSet4.java.
Here a set of integers is represented by an array, where the cell
indexed by i
is set to true
if i
is in the set, and to false
otherwise. This has the advantage that it is very efficient - the
operations add
,
delete
and member
can all be done in just
one step. The disadvantage
is that we can only use this representation if we can be sure that
items in the set will always
be in the range of 0 to one less than the size of the array.
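The "array of booleans" technique can be sketched as follows (illustrative names, not the course's IntSet4 code):

```java
// A set of integers in the range 0..size-1, where cell i of the array
// is true exactly when i is in the set.
public class BoolSet {
    private boolean[] in;

    public BoolSet(int size) {
        in = new boolean[size];   // all cells start false: the empty set
    }

    public void add(int item)       { in[item] = true;  }  // one step
    public void delete(int item)    { in[item] = false; }  // one step
    public boolean member(int item) { return in[item];  }  // one step
}
```

Each operation is a single array access, which is why this representation is so efficient when the range restriction is acceptable.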
In IntSet5.java a representation of sets is given where a set of integers is represented by an array of the same size as the set. Then to add an item to or delete an item from a set, a new array is constructed and the array field within the object is made to refer to this new array.
We noted that object assignment in Java gives rise to aliasing.
If set1
and set2
are two references referring to set objects,
executing the statement set2=set1
causes set2
to refer to the object that
set1
refers to, while set1
still refers to
the same object. So if, after executing this statement, we call
add
or delete
on set2
then the
set that set1
refers to will change because it is the same object. To avoid such
unexpected side effects
(which could easily be missed in a large program which created
aliases), one technique would
be to add a method copy
to the abstract data type. So,
for example,
set2=set1.copy()
causes set2
to refer to a
set which has the
same data as set1
but is a different object. A copy
method
needs to be inside the class for a set so it can access the data inside
set objects. A copy
of the data (in our case the array) is created, and this is put into a new
object. In
order to create a new object with this copy inside it, we have to
introduce a new constructor
which takes the data structure inside an object as its argument and sets
the field of the
object created to this data structure. We make this constructor private
so
that it can only be used by the copy
method. This is done
in
IntSet6.java.
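The copy technique can be sketched as below (my own names and a simplified internal representation, not the course's IntSet6 code):

```java
// A set whose array field can be copied into a new object via a
// private constructor, which only the copy method can use.
public class CopySet {
    private int[] array;

    public CopySet() { array = new int[0]; }

    private CopySet(int[] a) { array = a; }   // private: used only by copy

    public boolean member(int item) {
        for (int x : array)
            if (x == item) return true;
        return false;
    }

    public void add(int item) {               // destructive add
        if (member(item)) return;
        int[] a = new int[array.length + 1];
        for (int i = 0; i < array.length; i++) a[i] = array[i];
        a[array.length] = item;
        array = a;
    }

    public CopySet copy() {
        int[] a = new int[array.length];      // duplicate the data...
        for (int i = 0; i < array.length; i++)
            a[i] = array[i];
        return new CopySet(a);                // ...and wrap it in a new object
    }
}
```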
Note
the class IntSet6
shows a simple example of inheritance.
Rather than
write a completely new class, we make IntSet6
extend IntSet5
.
This
means that IntSet6
objects have the attributes of IntSet5
objects without the code for them having to be written in the class IntSet6
.
Note,
however, that in order for the field array
declared in IntSet5
to be accessible in code written in IntSet6
, it needs to
be declared as
protected
rather than private
.
We followed from this by considering a constructive
implementation of
sets. This generalises from the idea of constructive and destructive
methods we saw in week 2. Up till now, the set classes we have seen
have been destructive,
because the methods add
and delete
when
called on them
actually cause them to change (and hence destroy the data they were
holding and
replace it by the updated data). In a constructive implementation,
these methods
would return new objects, leaving the original object unchanged. So,
for example,
set2=set1.add(5)
would cause set2
to refer
to an object
representing the same set that set1
refers to but with
the addition of 5
; it would not however change the object
that set1
refers to. The technique used for copying in IntSet6
can
be used
to make constructive versions of the add
and delete
methods, this is done in
IntSet7.java.
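A constructive add in this style can be sketched as follows (illustrative names, not the course's IntSet7 code):

```java
// A constructive set: add returns a new object representing the
// updated set and leaves this object unchanged.
public class ConsSet {
    private int[] array;

    public ConsSet() { array = new int[0]; }

    private ConsSet(int[] a) { array = a; }   // private constructor

    public boolean member(int item) {
        for (int x : array)
            if (x == item) return true;
        return false;
    }

    public ConsSet add(int item) {
        if (member(item))
            return this;              // no change needed; safe if immutable
        int[] a = new int[array.length + 1];
        for (int i = 0; i < array.length; i++)
            a[i] = array[i];
        a[array.length] = item;
        return new ConsSet(a);        // new object; this one is unchanged
    }
}
```

So `set2 = set1.add(5)` makes set2 refer to a new object, while the object set1 refers to keeps its original data.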
If all the public methods of a class of objects work constructively, objects of that class are referred to as immutable, meaning they can't be changed because there is no way of changing them. This avoids the problem of unexpected side-effects caused by aliasing since there can't be any if there are no destructive operations. Objects which are not immutable are referred to as mutable.
We noted the use of this
in methods, it means
"the object this
method is called on". So, for example, when set2=set1.add(5)
is executed, this
in the code for add
is
taken as
referring to set1
. If the object returned is the same as
the
object the method is called on, for example in set2=set1.add(5)
when the set represented by set1
already contains 5
,
we can return this
rather than create a new object if the
object is immutable. Although this results in aliasing, that is not a
problem if there are no destructive methods.
Finally we looked at axioms. These complete the concept of an Abstract Data Type. Axioms are rules which the methods of an ADT must obey if the code is to be a correct implementation. They can be expressed in terms of a relationship between the results of different calls of constructive methods on objects of the ADT.
As an example, one axiom for sets is that
s.add(i).member(j) == s.member(j) if i != j
This means that for any set
s
and integers i
and j
, the result of calling member(j)
on
the result of calling add(i)
on s
is the
same as the result of calling member(j)
on s
,
providing i
and j
are not equal. If i
and j
are equal, we can't say that
s.add(i).member(j) == s.member(j)
is
always true, since i
is always a member of
s.add(i)
; regardless of whether it is a member of s
.
Otherwise, however, adding i
to s
does not
cause a
change to whether j
is a member of s
.
If we have sufficient well-chosen axioms, we can fully define an Abstract Data Type. Any code which causes the methods to behave as asked for in the axioms is a correct implementation of the Abstract Data Type defined by the axioms.
Notes covering this week's material can be found in the local notes and resources section on representing sets using arrays, part 1, part 2 and part 3. We returned to sets to consider a version of the abstract data type which contained union and intersection operators, making it closer to the concept of a set you may have come across in courses on discrete mathematics. Adding axioms for the union and intersection operations to the original axioms we had for the concept of "set" was easy. Implementing the operators as methods was a little more tricky.
The union of set s1
with set s2
consists
of all the
items in s1
plus all the items in s2
which
are not
members of s1
. The intersection of set s1
with set s2
consists of all the items in s1
which are also
members of s2
. We need a way of going through all the
items in a set in order to achieve this. We have not provided a public
method which
enables us to do this, but if the code for methods union
and
intersection
is provided in the set class, when we call
the
method as s2.union(s1)
we have access in the code to the
array in the data structure for s2
, so we can go through
that.
Note that we considered the operations only with a constructive
implementation,
so s2.union(s1)
is a value, not a statement; it only makes
sense when something is done with the value (assigned to a variable,
passed as an argument, or used for another method call), and the value is
a new
object. Calling s2.union(s1)
causes no change to s1
or
s2
(the same applies to intersection
).
Our initial implementation (code in
IntSet8.java)
turned out to be rather inefficient. Each time we
called the method add
to add a new item to the set being
constructed, we created a whole new set which existed only until the
next
add
was called. A more efficient implementation needed to
go through the arrays of both sets s1
and s2
when
s2.union(s1)
or s2.intersection(s1)
is
called,
creating a new array and only then creating a new set object with this
array
within it. This was easily done because a feature of Java and similar
object oriented languages is that if an object of a particular class is
passed as an argument to a method in that same class, the code for that
method has access not only to the private fields of the object it is
called
on, but also to the private fields of the object passed as an argument.
Suppose, for example, the method union
is inside class Set
,
and Set
has a private
field called array
,
and
the header to the method is:
public Set union(Set s)
Then when
s2.union(s1)
is called, the variable array
refers to the array
field of the object referred to by s2
,
but also the array
field of the object referred to by s1
is
referred to as s.array
. It is probably a good idea to use
this.array
rather than just array
in this
case, to make
it a little clearer which array
is being referred to; it
means the same
thing as just array
. An implementation done in this way
is found in
IntSet9.java.
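A sketch of union in this efficient style (illustrative names and a simplified class, not the course's IntSet9 code): the new array is built first, and only then wrapped in a single new object; the code can read s.array directly even though the field is private, because the argument is of the same class.

```java
// Constructive union built the efficient way: one new array, then one
// new object, rather than a chain of intermediate sets.
public class USet {
    private int[] array;

    public USet() { array = new int[0]; }

    private USet(int[] a) { array = a; }

    public boolean member(int item) {
        for (int x : array)
            if (x == item) return true;
        return false;
    }

    public USet add(int item) {
        if (member(item)) return this;
        int[] a = new int[array.length + 1];
        for (int i = 0; i < array.length; i++) a[i] = array[i];
        a[array.length] = item;
        return new USet(a);
    }

    public USet union(USet s) {
        int extra = 0;
        for (int x : s.array)            // s's private field is accessible here
            if (!this.member(x)) extra++;
        int[] a = new int[this.array.length + extra];
        for (int i = 0; i < this.array.length; i++)
            a[i] = this.array[i];
        int k = this.array.length;
        for (int x : s.array)
            if (!this.member(x)) a[k++] = x;
        return new USet(a);              // one new object, created at the end
    }
}
```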
To demonstrate the general principle, we also looked at a constructive implementation of sets of integers including union and intersection using the "array of booleans" technique we considered in week 5. Since the set operations are constructive, they need to work not by changing the array of booleans as we saw before, but by creating a new array of booleans representing the new set we wish to create, then incorporating this in a new object using a private constructor which takes an array of booleans as its argument. The code for this can be seen in IntSet10.java.
The code for IntSet8
, IntSet9
and IntSet10
contains
an additional method, fromString
, which converts a string
representation of
a set to an actual set object. You need not be concerned with how this
method works;
it is not an essential part of the set class, but is there because it
helps us run
examples of the use of these sets.
Apart from looking at sets and sequences, we went through the
exercises set in
week
2,
which asked you to write various array manipulation methods. In the
case of the
shuffle
operation, it was noted that the trick to a
successful
implementation was to use a loop in which the array index was increased
by 2
rather than the usual 1 each time round. In the case of the filter
method, we had the problem that the size of the final array was not
known at the
start. We noted that a solution in which the array returned was of the
same
size as the initial array but with 0 in the place of the filtered-out
numbers was
not what was asked for. One solution was to go through the
initial array
counting how many items were to be filtered out, then create an array
of the
correct size, then go through again this time checking and copying the
non-filtered items into the new array. Another solution, which had the
advantage of checking each number only once to see if it was to be
filtered out,
was to create a temporary array of the size of the initial array to hold
the
items which are not filtered out, and then to copy them into an array
of the
appropriate size and return that array. A solution using recursion, in
which the
final array was created in the base case and items put in the array on
return
from recursion, was also discussed.
We looked at quick sort, a simple version of which could be
created using
modified versions of the code for filter
and append
.
The idea is to divide an array into two parts, one consisting of all
the elements
from the array less than its first element, the other consisting of all
the elements
of the array greater than or equal to the first element (but not
including the first
element itself). Then sort each of these parts using recursion, and
join the
sorted two parts together with the original first element in the
middle. The base
case is when an array is 0 or 1 in length, it is then returned
unchanged. Note that the more efficient version of quick sort
which you saw in
week
11
of your procedural programming course works by moving elements around
in the initial
array rather than creating two new arrays, so it works destructively,
the code is perhaps
less clear than
ours
in order to gain this efficiency.
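Our constructive version can be sketched as follows (my own class, using filter and append helpers in the spirit described, not the week-11 in-place code):

```java
// Constructive quick sort: split around the first element, sort the
// parts recursively, and join them with that element in the middle.
public class Quick {
    public static int[] sort(int[] a) {
        if (a.length <= 1)
            return a;                           // base case: return unchanged
        int pivot = a[0];
        int[] less = filter(a, pivot, true);    // elements less than pivot
        int[] rest = filter(a, pivot, false);   // elements >= pivot, a[0] excluded
        return append(sort(less), pivot, sort(rest));
    }

    // Keep elements (excluding a[0] itself) that are below the pivot
    // when below is true, or >= the pivot when below is false.
    private static int[] filter(int[] a, int pivot, boolean below) {
        int n = 0;
        for (int i = 1; i < a.length; i++)
            if ((a[i] < pivot) == below) n++;
        int[] r = new int[n];
        int k = 0;
        for (int i = 1; i < a.length; i++)
            if ((a[i] < pivot) == below) r[k++] = a[i];
        return r;
    }

    // Join left part, pivot and right part into one new array.
    private static int[] append(int[] left, int pivot, int[] right) {
        int[] r = new int[left.length + 1 + right.length];
        for (int i = 0; i < left.length; i++) r[i] = left[i];
        r[left.length] = pivot;
        for (int i = 0; i < right.length; i++) r[left.length + 1 + i] = right[i];
        return r;
    }
}
```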
Quick sort is similar to the merge sort we saw in week 3, in both
cases an array is
divided into two, the two parts sorted recursively and the sorted parts
joined together.
Using a similar argument to that used for merge sort, quick sort can be
categorised as
O(N log N). Although we did not discuss it, unlike merge sort, quick
sort has a bad worst
case since the two parts are not necessarily the same size, the worst
case being when
one part is always of length 0, which would happen if quick sort were
applied to an
already sorted array. A point of contrast we noted between the two is
that quick sort does the comparisons, which are the main work in
sorting, before the
recursion, while the joining of the two sorted parts is simple. In
merge sort
the separation of the two parts is simple, but the main work involving
the
comparisons is done after the recursion when the two parts are
merged together.
Week 7: 21/22 February
The Monday lecture period was given over to the first term-time test
for the course,
while the Tuesday morning lecture was given over to discussing
solutions to the
lab exercises for Week
3
which are available
here.
In the Tuesday afternoon lecture we started the important topic of linked lists. A linked list involves the simplest sort of recursive data structure, one built up out of "cell" objects which have two fields. One field holds data; the other is of the same type as the cell. So a cell refers to another cell, that cell refers to a further cell, and so on, creating a list. If we consider the data in a cell to be the first item in a list, the data in the cell it links to as the next item in the list, and so on, the structure given by a cell object and the further cell objects it links to can be considered in the abstract as just a list of data items.
The list finishes in a cell whose cell field is set to the value null
.
This is built into the Java language, and any variable of any object
type can be set to the value
null
, indicating that it currently refers to no actual
object.
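A cell class of the kind described can be sketched in a few lines of Java. The class and field names here are illustrative, not necessarily those used in the course's code.

```java
// One cell of a linked list: a data field plus a link of the same cell type.
class Cell {
    int first;   // the data item held in this cell
    Cell next;   // the rest of the list, or null at the end

    Cell(int first, Cell next) {
        this.first = first;
        this.next = next;
    }
}
```

A list holding 1, 2, 3 is then built as `new Cell(1, new Cell(2, new Cell(3, null)))`, with the final cell's next field set to null to mark the end.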
The two fields in our cell objects were not declared as private
,
and the class
had no methods except for a constructor to create a new cell object
with a given piece
of data to store and a given next cell (or null
) to link
to. So the data in a
list may be changed by any code which has access to the list. Also
links may be changed and
made to refer to other cells. The aliasing effect of object assignment
means a cell in a
linked list may be referred to in several different ways, perhaps
directly through a
variable which refers to it (or points to it, as we tend to say
in this context),
and indirectly through a variable which refers to another cell whose
field links to the
cell in question.
This is shown diagrammatically by cells and pointers diagrams.
In a cells and pointers
diagram, each cell is shown by a rectangle divided into two, one part
containing the data item,
the other part containing an arrow which leads out and points to the
rectangle representing the
cell it links to. Link fields which are set to null
are
represented either by
an arrow which points to an "electrical earth" symbol, or by a line
drawn diagonally across the part of the rectangle representing the
link. Variables which refer to cells in the situation
are shown by a small box labelled with the variable's name containing
an arrow which leads out and
points to the rectangle representing the cell the variable points to.
The aliasing effect means unrestricted use of linked lists can lead
to confusing code and
unexpected side effects as a cell referred to through one variable is
changed and this changes
the linked list of which it is a part referred to by another variable.
Therefore linked lists
are best used as a data structure referred to inside another class
which implements an abstract
data type. This means the linked lists can only be changed by the
methods of that class so, as
long as they are carefully designed to work properly together, no
unexpected side-effects can occur.
If linked lists can only be referred to within another class, we can
show that by declaring
the class for cells inside that class. This is referred to as a
nested class.
Like methods and fields, nested classes may or may not be declared as static
.
A static
nested class, like a static method, is self-contained, meaning it is
not attached to any
particular object of the class it is in.
In IntSet11
we
saw an implementation of the abstract data type "set" we have
considered in previous weeks,
but here using a list data structure rather than an array or array and
count as used
previously. We noted that a common operation on linked lists involves
setting a pointer
variable to refer to the first cell in the list, then in a loop
repeatedly updating it to
refer to the following cell. This may be used, for example, to check
for the presence of a particular data item in a linked list, with the
loop exited either if the pointer becomes
equal to null
or the pointer points to a cell which holds
that item as its data.
If the loop exits with the pointer equal to null
, we know
we have checked
the entire list and the item has been found not to be present. An item
may be deleted from
a linked list by stopping the pointer moving down the list at the stage
where it points to
the cell before the one holding the item. Then that cell's link
field is changed to be
equal to the link field of the cell holding the item, which has the
effect of cutting that
cell out of the linked list.
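The pointer-walking operations just described can be sketched as follows. This is an illustrative version, assuming a simple Cell class as above rather than the exact code of IntSet11.

```java
class LinkedListOps {
    static class Cell {
        int first; Cell next;
        Cell(int first, Cell next) { this.first = first; this.next = next; }
    }

    // Walk down the list, exiting when we fall off the end (null) or
    // reach a cell holding the item we are looking for.
    static boolean member(Cell list, int item) {
        Cell ptr = list;
        while (ptr != null && ptr.first != item)
            ptr = ptr.next;
        return ptr != null;   // null means we checked the whole list
    }

    // Cut out the first cell holding item; returns the (possibly new) list.
    static Cell delete(Cell list, int item) {
        if (list == null) return null;
        if (list.first == item) return list.next;   // first cell: special case
        Cell ptr = list;
        // stop at the cell *before* the one holding the item
        while (ptr.next != null && ptr.next.first != item)
            ptr = ptr.next;
        if (ptr.next != null)
            ptr.next = ptr.next.next;   // bypass the cell, cutting it out
        return list;
    }
}
```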
Our implementation of the abstract data type "set" using linked
lists had set objects with a single
private field of the type of cell objects. This field could be accessed
directly by the methods
to implement the set operations, with these methods being called on
individual set objects and hence
accessing and manipulating the linked list within the set object the
method is called on.
Week 8: 28 February / 1 March
We continued to look at linked list implementations of the ADT "set"
and at operations on linked lists to provide the set operations. As
with arrays,
the linked list data structure used to represent sets may be ordered or
unordered. With arrays, ordering had a major benefit because we could
use
binary search to implement the set membership test. However binary
search
relies on the fact that any cell in an array can be accessed in one
step given
its index. Cells in a linked list can only be accessed by starting at
the first
cell and following the links. So testing for membership when a set is
implemented
by an ordered linked list is still O(N), since on average we have to go
through
N/2 cells for the test when N is the size of the set. We can halt at
the point where the data in the
cell is
greater than the data being searched for, rather than going right
through the list
if it is not present, but this is not a major efficiency improvement.
When a new
item is added to an ordered linked list, it has to be added in the
right place. This is achieved by setting a pointer to the cell before
the place where the new item is to be
inserted, and setting the link field of the cell that pointer points to
to a new
cell whose data field holds the item being inserted and whose link
field holds the
old value of the link field. Note that to insert an item into a linked
data
structure it is necessary to do it by altering a pointer in a cell
which is already
linked into the structure. Inserting an item at the front of the linked
list needs to be treated as a special case however.
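Insertion into an ordered linked list, including the special case at the front, can be sketched as follows. This is an illustrative version under the same assumed Cell class, not the course's exact code.

```java
class OrderedListInsert {
    static class Cell {
        int first; Cell next;
        Cell(int first, Cell next) { this.first = first; this.next = next; }
    }

    static Cell insert(Cell list, int item) {
        // insertion at the front is a special case: there is no cell before it
        if (list == null || item < list.first)
            return new Cell(item, list);
        Cell ptr = list;
        // stop at the cell before the place where the item belongs
        while (ptr.next != null && ptr.next.first < item)
            ptr = ptr.next;
        // splice in a new cell, whose link holds the old value of ptr.next
        ptr.next = new Cell(item, ptr.next);
        return list;
    }
}
```

The change is made by altering the next field of a cell already linked into the structure, as described above.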
As linked lists are a recursive data structure, it can be convenient
to write code
operating on them using recursive methods. The recursive methods we saw
needed to take
a linked list as an argument. We had a public non-static method, which
being non-static had access to the linked list field of the set object
it was called on, passing that reference to a private static method
which was the actual recursive method where the list being manipulated
was given as an argument.
As we saw with sets implemented using arrays, sets implemented using
linked lists had
just the same problem with aliasing when the operations were
destructive. If
s1
and s2
are variables of a set type, then
executing
s2=s1
causes s2
to be an alias of s1
,
and
any adding or deleting of an element from the set represented by s1
will
also happen to the set represented by s2
and vice-versa,
as they
are represented
by the same object. Again, we need a separate copy operation so we can
execute
s2=s1.copy()
which causes s2
to represent a
set which initially
stores the same elements as the set represented by s1
but
is independent
of it so destructive changes to one won't affect the other.
We saw three different ways of copying linked lists. The first, the
"pointer at back"
technique, creates an initial cell copying the data from the
initial cell of the
original linked list, and then keeps a pointer to that cell and a pointer
to the original
linked list. New cells are added at the back of
the new linked list via the first pointer,
with both pointers then being moved down, so the new-list pointer
is always
on the new last cell.
The next technique used recursion. If a linked list is null
then its
copy is null
. Otherwise its copy is a new cell whose data
item is the data
item from the first cell of the linked list being copied, and which
links to a linked list which is a copy
of the linked list the first cell of the original linked list refers
to. This simple recursive
definition translates directly to a recursive method. The third
technique was
stack and reverse. If we start with a new linked list which is
initially null
and run a pointer down the original linked list adding cells to the
front of new linked list which
copy the data from the old linked list, when the pointer reaches the
end of the original linked list,
the new linked list will hold a copy - but with the data in reverse
order. If the order matters
(which it doesn't for sets) we can use the same process to create a
third linked list which reverses
this reversed list to give the exact copy. Variations on these three
techniques can be used in
general in any case where we want to construct a new linked list which
represents some change to
an original linked list. We also saw that if we used recursion, the new
linked list could share
cells from the original linked list when there was no change to the
remainder of the list.
Code to extend the implementation of sets using linked list in
IntSet11
to add a copy method is found in
IntSet11a,
IntSet11b
and
IntSet11c,
using the three techniques given above. Code to implement sets using
linked lists
destructively, with the operations implemented using recursion can be
found in
IntSet14a.
Code to implement sets using linked lists with constructive operations
can be found in
IntSet17
(here recursion is used for the delete
method) and
IntSet17a
(with iteration used for the delete
method).
Notes on using linked lists to implement sets can be found in the local notes and resources under the heading Sets using linked lists , with further notes in the following sections on recursion with linked structures and constructive recursion with linked structures.
We started looking at the linked tree data structure. Like linked lists, this consists of cells which link to further cells through references. With linked lists, each cell has a reference to one further cell; with trees, each cell has references to two further cells. Strictly, this is a "binary tree", since one could have trees whose cells have references to more than two further cells. We refer to the initial cell of a tree as its "root", and the two cells to which it links, plus the further cells to which they link, as its "branches", with the two branches of a binary tree being termed "left" and "right".
A tree can be yet another data structure to represent sets if it has
the
property of being an ordered binary tree. A binary tree is
ordered if
every item in its left branch is less than the item at its root, every
item in
its right branch is greater than the item at its root, and the two
branches are
themselves ordered. The empty tree, represented just by null
,
is
also ordered. To test whether an item is a member of a set represented
by an
ordered binary tree: if the tree is empty, the item is not a member.
Otherwise, we first test
whether the root item is the item being searched for; if so, the item is
a
member. Otherwise, if the item is less than the root we can restrict
search to
the left branch since we know if it is present it must be in that
branch, if the
item is greater than the root we can restrict search to the right
branch. This is
similar to binary search, where each step in the algorithm cuts the
problem size
into half. Note, however, that there is no guarantee that a binary tree
will
have equal numbers of elements in its left and right branch. There are
techniques to ensure trees are balanced in this way, but we will not
have
time in this course to cover them.
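The membership test on an ordered binary tree can be sketched as follows. The class and field names are illustrative, not necessarily those of the course's code.

```java
class TreeMember {
    static class TreeCell {
        int item; TreeCell left, right;
        TreeCell(int item, TreeCell left, TreeCell right) {
            this.item = item; this.left = left; this.right = right;
        }
    }

    static boolean member(TreeCell tree, int item) {
        if (tree == null) return false;        // empty tree: not a member
        if (item == tree.item) return true;    // found at the root
        if (item < tree.item)
            return member(tree.left, item);    // can only be in the left branch
        return member(tree.right, item);       // can only be in the right branch
    }
}
```

Each recursive call discards one whole branch, which is what gives the binary-search-like behaviour on a balanced tree.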
To add an item destructively to an ordered binary tree, a pointer is
moved left or right down the tree, as appropriate for the ordering,
until it reaches a cell whose relevant branch field is set to
null
. The field in the cell which contains null
is
then set to refer to a new cell which contains the item as its root and
a null
left and right branch. A similar process takes
place to delete
an item which is stored in a leaf cell (i.e. one with null
for both
branches): the pointer is moved to the left or right as appropriate
until it points
to the cell whose left or right branch is the leaf to be removed, and
that pointer to
the left or right branch is set to null
. If the item to
be deleted
is in a cell one of whose branches is null
then the
pointer to that
cell from the cell above it can be set to the other branch to remove
the cell.
The important thing in all these cases is that to change a
linked
structure the change must be done through a pointer to a cell which is
linked into
the structure by changing one of that cell's fields.
The difficult case with deleting an item from an ordered binary tree
is when the item
is in a cell neither of whose branches is null
. The cell
cannot just be
removed to leave a "hole". What can be done is that the item to be
removed is replaced
by the largest item in its left branch and the cell holding that item
is removed. This
keeps the property that everything in the left branch is less than the
root item and
everything in the right branch is greater than the root item. The
largest item in the
left branch must be the one obtained by going to the left branch and
moving down the right
branches until the next right branch is null
, the cell
then reached is
either a leaf or has a null
right branch, so can be
removed as described previously.
We looked at tree traversal which we saw when we considered how we could get a textual representation of the set represented by a tree. Tree traversal is any case where we need to go through every item in a tree. It involves a recursive algorithm with three parts: processing the root, going through the left branch recursively, and going through the right branch recursively. If the left branch is processed, then the root, then the right branch, this is referred to as in-order traversal and has the result that the items are processed in the ordering (e.g. numerical ascending with our tree of integers) used to order the binary tree. Processing the root, then the left branch then the right branch is referred to as pre-order traversal, while processing the left branch, the right branch and finally the root is known as post-order traversal.
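The three traversal orders can be sketched by collecting items into a string, so the orders can be compared directly. This is illustrative code, not the course's exact method for producing a textual representation.

```java
class Traversal {
    static class TreeCell {
        int item; TreeCell left, right;
        TreeCell(int item, TreeCell left, TreeCell right) {
            this.item = item; this.left = left; this.right = right;
        }
    }

    // left branch, then root, then right branch: items come out in order
    static String inOrder(TreeCell t) {
        if (t == null) return "";
        return inOrder(t.left) + t.item + " " + inOrder(t.right);
    }

    // root first, then the two branches
    static String preOrder(TreeCell t) {
        if (t == null) return "";
        return t.item + " " + preOrder(t.left) + preOrder(t.right);
    }

    // both branches first, root last
    static String postOrder(TreeCell t) {
        if (t == null) return "";
        return postOrder(t.left) + postOrder(t.right) + t.item + " ";
    }
}
```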
An implementation of sets using an ordered binary tree with the set
operations implemented destructively can be found in
IntSet16.
Note this implementation also includes, for demonstration purposes, a
method
that will print the tree structure of the underlying representation.
The
printing is done with the root on the left and the branches growing
towards
the right. The set maintenance program front-end in
UseIntSets15a
causes its set to be represented by IntSet16
, and
includes an extra
command t
which prints the tree representation of the
set.
We looked at algorithms for adding and deleting items from an ordered binary tree constructively. If the required change is to the left branch, a new tree reflecting the change can be obtained by making a new tree reflecting the change in the left branch, and then returning a new tree cell whose left branch is this new tree, whose root is the same root as the original tree, and whose right branch is the same right branch as the original tree. The same applies if the change is to the right branch. If only the root is to be changed, the new tree returned consists of a new cell whose root item is the new root, but whose left and right branches are the left and right branches of the original tree. The result is that the new tree shares some cells with the old tree, but only where the shared cells are not in the part of the tree that has been changed. Since all operations are constructive there is no danger that the shared cells could lead to unexpected side effects. The class IntSet18 shows sets implemented by ordered binary trees using constructive versions of the set operations.
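Constructive addition to an ordered binary tree, with sharing of the unchanged branch, can be sketched as follows. This is illustrative code, not the code of IntSet18.

```java
class ConstructiveTree {
    static class TreeCell {
        int item; TreeCell left, right;
        TreeCell(int item, TreeCell left, TreeCell right) {
            this.item = item; this.left = left; this.right = right;
        }
    }

    // Return a new tree containing item; the original tree is unchanged.
    static TreeCell add(TreeCell t, int item) {
        if (t == null)
            return new TreeCell(item, null, null);
        if (item < t.item)   // change is in the left branch; share the right
            return new TreeCell(t.item, add(t.left, item), t.right);
        if (item > t.item)   // change is in the right branch; share the left
            return new TreeCell(t.item, t.left, add(t.right, item));
        return t;            // item already present: nothing to change
    }
}
```

Note that no field of any existing cell is ever assigned to, which is why the sharing is safe.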
Notes on using trees to implement sets can be found in the local notes and resources under the heading Sets using trees, with further notes in the following sections on recursion with linked structures and constructive recursion with linked structures.
We next looked at the abstract data type List
.
This is a simple example of an ordered collection, with the only
operations
on it being head
which returns the first item of the
collection,
tail
which returns a new collection consisting of every
item
except the first item (in the same order), cons
which
takes an
item as an argument and returns a new list whose first item is the
argument
to the call to cons
and whose other items are the items
of the
list the call was made on (in the same order), isEmpty
which tests
whether the list it is called on is the empty list, and empty
,
a static
method returning a new List
object representing the empty
list.
We noted the difference between this abstract data type List
and the
linked list data structure we have seen previously. Whereas a linked
list may be
manipulated directly through the fields of its cells, a List
object
can only be manipulated through its operations which are provided as
Java methods.
It is possible to change a linked list destructively by assigning
values to
the fields of its cells, but there is no way of changing a List
object destructively, the abstract data type forces constructive
programming.
The abstract data type List
is most obviously implemented
by a linked
list, so a List
object contains a field referring to a
linked list inside
it. However, to show that it is a distinct thing from a linked list, we
saw an
alternative implementation using an array and count. An abstract data
type is defined
only by the way its public methods work, so what is inside it to make
them work is
not part of the definition.
Code for linked lists can be found in the
lists
subdirectory of the code directory. The class
List
gives an implementation of the abstract data type List
using
linked lists, while the class
ArrayList
gives an implementation using an array and count. A simple front-end
which reads two
lists and appends them to form a new list can be found in this
directory in class
UseLists1.
Notes on the abstract data type List
as we have defined
it in
this course can be found in the
local
notes and resources
under the heading
Lists.
In the Tuesday lectures we covered two abstract data types: stacks and queues.
A stack is a destructive version of the List
abstract
data type we saw last week.
It is a collection of data ordered by the time the items were added to
the collection. The operations
are to return the first item of the collection, take off the first
item, and add a new first item (plus
an emptiness test and a constructor to produce a new empty collection).
Returning the first item
is exactly equivalent to returning the head of a List
object. Taking off the first
item is like the tail
operation, except instead of
returning a new object and leaving
the object it is called on unchanged, it actually changes the object it
is called on to make it
represent a collection with the same items as before but missing its
first item. The operation
of adding a new item is like cons
with List
objects, except that it
changes the object it is called on to make it represent what is
represented by the new List
object returned by the cons
operation. These operations
on stacks are given the names
top
, pop
and push
. To get a
feel for the
abstract data type, we can think of the items in it piled vertically
into a stack, like a stack
of books or some other flat object (hence the name). Then we can only
look at the top item of the stack, pop off the top item, or push a new
item onto the top.
In order to generalise our concept of abstract data types, we
considered stacks whose
components were of type Object
whereas up to now
we have considered
various collections of integers. The type Object
is
Java's "most general
type". A variable of type Object
can hold a reference to
an object of
any type. Note that values of Java's primitive types, such as numbers
and characters, are the only things
which are not objects, so a variable of type Object
cannot be assigned a value of type int
, for example. Java
gets round this by having some special built-in
classes called wrapper classes, which are the object equivalent
of non-object
data. For example, the class Integer
is the
wrapper class for int
,
it has a constructor for making a new Integer
object to
represent a particular
int
value, and a method that may be called to return the
int
value
a particular Integer
object represents.
A stack may be implemented by an array and count data structure, or by a linked list data structure.
Stacks are common in Computer Science. They may be used to represent a list of tasks which have to be done in order. If the current task being done breaks into a list of subtasks, these may be pushed onto the stack. This then ensures they are done in order, with the following task left waiting until they are all done. The same applies if the subtask itself breaks down into further subtasks. We saw how this could be used to implement problem solving that would otherwise require recursion. We looked at the example of traversing a tree in-order. This breaks down into the task of traversing the left branch, processing the root, and traversing the right branch. This could be achieved by having a stack which can contain both trees and integers. The whole tree is initially pushed onto the stack. Then a loop is executed until the stack becomes empty. The loop consists of taking the first item from the stack. If it is an integer, it is processed. If it is a tree, its right branch and its root integer and its left branch are pushed onto the stack (leaving the left branch at the top of the stack). A branch is not put on the stack if it is empty.
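The stack-based in-order traversal just described can be sketched as follows, using java.util.ArrayDeque as a stand-in for the course's own Stack class. Pushing the right branch, then the root, then the left branch leaves the left branch on top of the stack, so it is dealt with first.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackTraversal {
    static class TreeCell {
        int item; TreeCell left, right;
        TreeCell(int item, TreeCell left, TreeCell right) {
            this.item = item; this.left = left; this.right = right;
        }
    }

    static String inOrder(TreeCell tree) {
        StringBuilder out = new StringBuilder();
        Deque<Object> stack = new ArrayDeque<>();   // holds trees and Integers
        if (tree != null) stack.push(tree);
        while (!stack.isEmpty()) {
            Object top = stack.pop();
            if (top instanceof Integer) {
                out.append(top).append(' ');        // an item: process it
            } else {
                TreeCell t = (TreeCell) top;
                if (t.right != null) stack.push(t.right);  // dealt with last
                stack.push(t.item);                        // root (autoboxed)
                if (t.left != null) stack.push(t.left);    // dealt with first
            }
        }
        return out.toString();
    }
}
```

Empty branches are simply not pushed, matching the algorithm described above.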
Stacks are sometimes referred to as LIFO collections, standing for "last in, first out", meaning that the delete operation removes whichever item still in the collection was the last to be added. Queues are sometimes referred to as FIFO ("first in, first out") collections, meaning that the delete operation removes whichever item still in the collection was the first to be added. This leads to the behaviour we are familiar with in queues of people waiting to be served: people join the back of the queue when they arrive, and the next person to be served when a server is free is the person at the front of the queue.
A queue can be implemented by a linked list, with the items being
stored in linked
list cells in the same order we think of the queue as having. Then
deleting the first
item from the queue is easy (set the linked list field of the queue
object
to the next
field of the first linked list cell), but
adding a new
item to the back requires traversing the whole list to reach
the back cell, making the operation O(N).
The efficiency can easily be improved by having the queue object have
two linked
list fields, one to refer to the first cell in the linked list, the
other to refer
to the last cell. Then adding a new item to the queue can be done by
changing the
last cell's next
link to refer to a new cell which holds
the item and
whose next
link is null
. This can be done
in one step
through the pointer to the last cell which is then changed to refer to
the new last cell.
An alternative implementation of queues uses an array and two array indexes, one to index the cell holding the first item in the queue, the other to index the cell after the one holding the last item. If the two indexes are the same, it represents the empty queue. To represent the change in the queue caused by the front item leaving, increase the front index by one; the data in the cell it used to refer to is then no longer considered part of the queue. To add a new item to the back, put it in the cell indexed by the back index, and increase the back index by one. With stacks implemented by arrays, there is a single index which goes up and down as items are added and deleted. Here the two indexes can only go up, so they will eventually reach the end of the array. To prevent this causing a problem, when an index reaches the end of the array, the next time it is supposed to be increased it is set to 0 instead. Then if the front index is greater than the back index, the data treated as being in the queue is that starting at the cell indexed by the front index, going up to the end of the array, and then continuing at the start of the array up to the cell before the one indexed by the back index. Now, as with our array implementation of stacks, we only hit a problem if the number of items in the collection exceeds the size of the array.
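The wrap-around behaviour of the two indexes can be sketched as follows. This is an illustrative version with a small fixed capacity and no check for the queue overfilling the array; the method names are assumptions, not those of the course's ArrayQueue.

```java
class ArrayQueueSketch {
    private int[] array = new int[4];   // small capacity for illustration
    private int front = 0;              // indexes the cell holding the first item
    private int back = 0;               // indexes the cell after the last item

    boolean isEmpty() { return front == back; }

    // Add an item at the back of the queue.
    void join(int item) {
        array[back] = item;
        back = (back + 1) % array.length;   // wrap round to 0 at the end
    }

    // Remove and return the item at the front of the queue.
    int leave() {
        int item = array[front];
        front = (front + 1) % array.length; // wrap round to 0 at the end
        return item;
    }
}
```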
We can use a queue in a similar way to the stack above to traverse trees. The algorithm is to start with a queue which consists just of the tree being traversed. Then we iteratively remove the first tree from the queue, process its root, and add its left and right branches to the queue (as it is a queue they can only be added to the back), omitting any branch which is empty. The result is breadth-first traversal. This means the root item is processed first, then the roots of its two branches, then all the items at depth 2 in the tree, then all the items at depth 3, and so on. We could not achieve this using simple recursion, as it involves jumping between branches in the tree rather than processing one branch as a whole before moving on to the other.
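Breadth-first traversal with a queue can be sketched as follows, again using java.util.ArrayDeque as a stand-in for the course's Queue class; this is illustrative code, not the code of IntSets21.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BreadthFirst {
    static class TreeCell {
        int item; TreeCell left, right;
        TreeCell(int item, TreeCell left, TreeCell right) {
            this.item = item; this.left = left; this.right = right;
        }
    }

    static String traverse(TreeCell tree) {
        StringBuilder out = new StringBuilder();
        Deque<TreeCell> queue = new ArrayDeque<>();
        if (tree != null) queue.addLast(tree);
        while (!queue.isEmpty()) {
            TreeCell t = queue.removeFirst();   // take the tree at the front
            out.append(t.item).append(' ');     // process its root
            if (t.left != null) queue.addLast(t.left);    // branches join
            if (t.right != null) queue.addLast(t.right);  // at the back
        }
        return out.toString();
    }
}
```

Because branches join at the back rather than the top, whole levels of the tree are processed before the next level is reached.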
We looked at two ways of using linked structures to implement the
abstract data
type sequence
which we considered in week 6.
One was to use doubly-linked lists. These are lists formed of
cells each of which
has two references to further cells, with the cells linked so each has
a link leading
both forward and backward in the list. The insertion point is indicated
by a pointer
to one of the cells, and adding and deleting items from the sequence
involves
linking or deleting cells from the doubly-linked structure. The other
data structure which was shown to implement sequences
was two ordinary linked lists. One represents the items that
would be encountered
in moving backwards in the sequence from the insertion point, the other
the items that
would be encountered in moving forwards. Then adding or deleting an
item involves adding or
deleting the first item from one of these lists, and moving backwards
or forwards is
represented by taking an item from the front of one and putting it on
the front of the other.
Notes on the type Object
and on using stacks and queues can be found in the local notes and
resources under the headings Object
and Using stacks and queues.
The code can be found in the lists
subdirectory of the code directory. Here
Stack
and
Queue
are
the implementations using linked lists, while
ArrayStack
and
ArrayQueue
are
the implementations using arrays.
UseIntSets20
is the
set maintenance program front end which uses a stack to implement an undo
and
redo
command. The class
IntSets21
is an implementation of sets using ordered binary trees which includes
an extra method
which displays the items in the set in the order in which they are
stored breadth-first
in the tree.
Notes on linked data structures used to implement sequences can be
found in the
Editor example notes.
Week 11: 21/22 March
We covered the Table abstract data type, which is also known as
a Map.
The Java
Collections Framework, a library of classes for abstract data types
which is
provided with the Java language, uses the name Map
for its
class which
provides this sort of data collection. Note that in this course we look
at simplified
versions of classes that are already provided in Java as part of its
extensive library of built-in classes, and see how they
might be
implemented using the basic Java concepts. As a programmer and computer
scientist, it
is important you are familiar with this, but in practical circumstances
where you
are writing Java programs it is better to use Java classes that are
provided with
the language, so long as they do what you require, rather than write
your own.
The idea of a table or map is that it is a collection of data where individual data items can be accessed by a key. A key is a field of a data item, typically but not necessarily a string, which is known to be unique for each individual item of data. For example, we know that every motor vehicle registered in the UK has a unique registration number, so if we had a table of data representing records of cars, we could use the car registration number as the key. National Insurance numbers in the UK or Social Security Numbers in the USA are frequently used as keys for personal records as there is meant to be a separate one for each individual (see here for some discussion on this issue).
The name "table" comes from the idea that our data could be stored in a table with a separate column for each field of the data, with the items stored in key order, so given a key you could do search to find the rest of the data associated with that key. The name "map" comes from the idea that the data structure represents a mathematical mapping function from the domain of keys to the domain of data items.
In the previous week, in order to make our collections as general as
possible we made them
collections of references of Java's most general type, Object
.
This week we
considered an abstract class which defined only those aspects
of data items with keys
we need to build tables, we gave this class the name
KeyObject
.
We can define our tables as collections of objects of this class, and
then use classes which
extend it when we make actual use of the table class. All that is
required are methods
which enable us to compare objects using their keys (in order to test
for equality, or
to place objects in an ordering), and a method which enables us to
get the key of a particular object. The comparison methods are given
completely, the
method to get a key is specified only in terms of its signature - this
method has to be
filled in as appropriate by the classes which extend the abstract
class.
We defined a class with the name
Table
as a Java interface. An interface is like an abstract class
in which no code is
given for any method: all the methods are given only as signatures, and
it is
necessary for classes which implement the interface to give the actual
code. This
fits in closely with the idea of abstract data types, as developed in
this course,
defined only by the operations on them.
A table object is very similar to the set objects we have considered
extensively
previously. It contains a collection of data which has no concept of
the data
being stored in any particular position and no concept of a data item
occurring
multiple number of times. There are operations to add and delete data
items. With a table, however, the operation to delete a data item
takes only the
item's key as its argument. It changes the table to delete the item
with the
given key. With sets we defined the delete operations so that an
attempt to
delete an item which wasn't there had no effect. With tables we give
the delete
operation a boolean
return value, which is set to false
if an attempt is made to delete an item which isn't in the collection,
and to
true
otherwise. The member
operation of sets
is replaced
by a retrieve
operation, which takes a key as its
argument and returns
the data item in the collection which has that key. If no data item in
the
collection has the given key, it returns the value null
,
which can
be done since any variable of an object type can be set to null
.
In order to display the content of a table we gave a method print
which
took a PrintStream
as its argument, this can be connected
to a file
or to the screen output. PrintStream
is defined in the java.io
package, so an import
statement is needed in the
interface file to allow
the reference to it.
As tables resemble sets they can be implemented using the data
structures and
algorithms we used to implement sets. For example,
Table1
shows an implementation using the array-and-count technique.
Since the collection
contains KeyObject
s rather than the int
values we used
previously, we use the equalKey
method from the KeyObject
class where previously we used ==
. In those
implementations, such
as an ordered binary tree, which depend on an ordering of the data
items, we use the
lessThan
method from KeyObject
where we used
<
with int
values. An implementation of
the interface Table
which uses an unordered linked list
can be found in
Table3
,
an implementation which uses an ordered binary tree can be found in
Table4
.
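As an illustration of the array-and-count technique, here is a minimal sketch of a table along these lines. It is not the actual Table1 code: keys and items are plain Strings rather than KeyObjects, and the method names are assumptions.

```java
// A sketch of a table implemented with the array-and-count technique:
// the items currently in the table occupy positions 0 to count-1 of
// the arrays, in no particular order.
public class SimpleTable {
    private String[] keys = new String[100];   // keys of stored items
    private String[] items = new String[100];  // the items themselves
    private int count = 0;                     // number of items stored

    // Add an item with the given key (no duplicate-key check, for brevity).
    public void add(String key, String item) {
        keys[count] = key;
        items[count] = item;
        count++;
    }

    // Delete the item with the given key, returning false if absent.
    public boolean delete(String key) {
        for (int i = 0; i < count; i++) {
            if (keys[i].equals(key)) {
                count--;
                keys[i] = keys[count];   // fill the gap with the last item
                items[i] = items[count];
                return true;
            }
        }
        return false;
    }

    // Retrieve the item with the given key, or null if absent.
    public String retrieve(String key) {
        for (int i = 0; i < count; i++) {
            if (keys[i].equals(key)) return items[i];
        }
        return null;
    }
}
```

Note how delete returns a boolean and retrieve returns null for a missing key, matching the interface described above.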
In order to demonstrate practical use of tables, we gave a class
Student
which extends KeyObject
and a front-end in
StudentDatabase1
which provides a simple student database maintenance program.
For simplicity we used a student's
surname as the key, although this is not really a good idea as several
people may have the
same surname. The constructor for objects of type Student
took a
BufferedReader
as its argument, enabling Student
objects to
be constructed directly from file or screen input. The date of birth
and marks for a
student were defined by nested classes, with a public method addMark
in class Student
which changes the student record to add
a new mark. The
class Mark
inside class Student
gives its
own addMark
method used to change the Mark
object which forms part of
a Student
object.
We saw implementations of class Table
using arrays,
linked lists
and trees, similar to the previous implementations of sets. However,
the most
efficient implementation of a set of integers was to use an array of
booleans,
with a[i]
set to true
if i
is
in the
set and to false
otherwise. The advantage of this
implementation was
that it was very fast in terms of time, since given the value i
we
could access a[i]
, and hence find whether i
was in the
set it represented, in one step. This raises the question of whether
there's
an equivalent implementation for tables.
One possibility would be to have an array so large that there's a separate cell for every possible key. We can convert keys which contain a mixture of numeric and alphabetic characters to integers by, for example, treating them as numbers written in base 36, with the ith letter of the alphabet (counting from zero) representing the base-36 digit i+10. However, it can easily be seen that for many practical examples there would be far too many possible numbers to be accommodated. Instead we use an array of some manageable length n, convert the key to an integer, and have a function which takes this integer and returns an integer in the range 0 to n-1. This function is known as a hash function. An item with a particular key is then stored in the cell of the array indexed by the hash function applied to the integer equivalent of the key. A table implemented in this way is known as a hash table.
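The base-36 conversion and the hash function can be sketched as follows. This is illustrative: the method names are invented, and letters are taken case-insensitively, with 'a' representing the digit 10.

```java
public class HashDemo {
    // Convert an alphanumeric key to an integer by treating it as a
    // number written in base 36: decimal digits keep their value, and
    // the letters a to z represent the digits 10 to 35.
    static long keyToInt(String key) {
        long value = 0;
        for (char c : key.toLowerCase().toCharArray()) {
            int digit = Character.isDigit(c) ? c - '0' : c - 'a' + 10;
            value = value * 36 + digit;
        }
        return value;
    }

    // A simple hash function: map the integer equivalent of the key
    // into the range 0 to n-1 by taking the remainder on division by n.
    static int hash(String key, int n) {
        return (int) (keyToInt(key) % n);
    }
}
```

So "10" converts to 36, and with an array of length 10 the key "z" (integer equivalent 35) would hash to index 5.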
The problem with this is that we cannot guarantee that two different keys won't give the same index when the hash function is applied to their integer equivalents. When this happens it is known as a collision. A good hash function will ensure that, as far as possible, a key is equally likely to hash to any array index, so the data is distributed evenly across the array and the chances of a collision are minimised. However, it is still necessary to deal with collisions when they do occur.
One technique used is referred to as linear probing. Here, if the cell an item should be stored in is occupied, it is stored in the next free cell. When searching for an item, the cell it should be in is looked at first; if that cell is occupied by an item with another key, the next cell is looked at, and the same principle applies. Note that if an item is deleted, a special marker needs to be put in its place, in case items that hashed to that cell were displaced to later cells; the marker ensures that a later search doesn't stop at the cell where the deleted item was.
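A minimal sketch of linear probing, including the special deleted marker, might look like this. It is not the course's Table6 code: items are plain String keys, Java's built-in hashCode stands in for the key-to-integer conversion, and no check is made for the table becoming full.

```java
// A sketch of a hash table with linear probing over String keys.
public class ProbeTable {
    private static final String DELETED = "__deleted__"; // special marker
    private String[] cells;

    public ProbeTable(int n) { cells = new String[n]; }

    // Mask off the sign bit so the index is always in range 0 to n-1.
    private int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % cells.length;
    }

    public void add(String key) {
        int i = hash(key);
        // If the home cell is occupied, move on to the next free (or
        // deleted) cell, wrapping round at the end of the array.
        while (cells[i] != null && !cells[i].equals(DELETED)) {
            i = (i + 1) % cells.length;
        }
        cells[i] = key;
    }

    public boolean contains(String key) {
        int i = hash(key);
        // Keep searching past occupied and deleted cells; the deleted
        // marker stops the search from ending too early.
        while (cells[i] != null) {
            if (cells[i].equals(key)) return true;
            i = (i + 1) % cells.length;
        }
        return false;
    }

    // Delete by overwriting the cell with the marker, not with null.
    public boolean delete(String key) {
        int i = hash(key);
        while (cells[i] != null) {
            if (cells[i].equals(key)) { cells[i] = DELETED; return true; }
            i = (i + 1) % cells.length;
        }
        return false;
    }
}
```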
Another technique is known as separate chaining. Here each cell in the hash table array refers to a linked list of data items. So long as collisions are rare, the linked lists will not be long, so there will still be few steps required to find an item in the table given its key.
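Separate chaining can be sketched in the same spirit, with java.util.LinkedList standing in for the hand-built linked lists of the lecture code.

```java
import java.util.LinkedList;

// A sketch of a hash table with separate chaining: each cell of the
// array refers to a list of the items that hash to that index.
public class ChainTable {
    private LinkedList<String>[] cells;

    @SuppressWarnings("unchecked")
    public ChainTable(int n) {
        cells = new LinkedList[n];
        for (int i = 0; i < n; i++) cells[i] = new LinkedList<>();
    }

    // Mask off the sign bit so the index is always in range 0 to n-1.
    private int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % cells.length;
    }

    public void add(String key) { cells[hash(key)].add(key); }

    // So long as collisions are rare the lists stay short, so the
    // search through one list takes few steps.
    public boolean contains(String key) {
        return cells[hash(key)].contains(key);
    }
}
```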
An implementation of the interface Table
which uses a
hash table
with linear probing can be found in
Table6
,
an implementation which uses a hash table with separate chaining can be
found in
Table7
.
Further notes on the topics covered this week can be found in the
local
notes and resources
under the headings
Tables,
student database part 1 and
student
database part 2.
Week 12: 4/5 April
The time for the Monday lecture was given over to the second term-time
test.
The first Tuesday lecture was on the final topic for this course,
which is enumerations.
In our definition of the abstract data type Table
last
week, we suggested a method
which prints the contents of the table in textual form, but we said
that it was too specific. What we want is a way of going through the
contents of a table in a general way so that
we can do with them whatever we like. It might be writing a textual
representation to a file,
but it might be anything else. For example, in the
lab
exercise we had this week, we wanted to go through the contents of
a table used to
store a database of student records in order to find the records of all
students whose mark
fell below some pass mark. There is no way of doing this in the
definition of Table
we gave, apart from printing the contents of the table to a file and
then re-reading the records,
which is inconvenient and inefficient.
An enumeration is an object which is obtained from a collection
object and enables us to go through the contents of the collection
object one at a time. Java has an interface
which defines such an object provided as part of the java.util
package
which comes with the Java language. It is defined through just two
methods:
nextElement
and hasMoreElements
. Each time
the method
nextElement
is called on an object of type Enumeration
,
a new element from the collection it was created from is returned,
until all the
elements from that collection have been returned (attempting to call nextElement
after that will result in an exception being thrown). The method hasMoreElements
returns a boolean
value, true
if there are
some remaining
elements that have not yet been returned by a call to nextElement
,
and
false
otherwise.
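The standard pattern for stepping through an Enumeration looks like this; java.util.Vector, whose elements method returns an Enumeration, serves as the example collection here.

```java
import java.util.Enumeration;
import java.util.Vector;

public class EnumDemo {
    // Step through an enumeration with the standard while loop,
    // joining the elements returned into one space-separated string.
    static String joinAll(Enumeration<String> e) {
        StringBuilder sb = new StringBuilder();
        while (e.hasMoreElements()) {     // any elements left?
            sb.append(e.nextElement());   // fetch the next one
            sb.append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Vector<String> names = new Vector<>();
        names.add("Smith");
        names.add("Jones");
        // elements() returns an Enumeration over the vector's contents.
        System.out.println(joinAll(names.elements()));
    }
}
```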
Note this is our first (and only) example in this course of the
common technique
in Java of implementing a built-in interface. The advantage is that
this provides us
with a standard way of referring to a particular type of object. We can
write
our own classes which implement the Enumeration
interface, but we
can write code which uses objects of these classes as if they were of
type
Enumeration
. Note also that in recent versions of Java,
the type
Enumeration
has been superseded by a similar type called Iterator
.
This means that Java programmers are encouraged to use Iterator
rather than Enumeration
. In general, however, once a
feature has been
added to a programming language it can't be taken away in later
versions as that
would cause programs written assuming the old version to stop working.
So the
definition of Enumeration
is still in Java, and we use it
for the
purpose of illustration as it is slightly simpler than the definition
of
Iterator
.
We gave a Java interface for tables, ETable,
which included a method getEnumeration
which returned an
enumeration of a table. An enumeration for a particular implementation
of a table is given by
an inner class declared inside the class of the table. As the
inner class is not static
, an object of this class has
access
to the data fields of the object it is created inside.
We looked at a simple example, in
ETable1,
which shows a table implemented with an array and count which provides
an enumeration.
The enumeration object created by calling getEnumeration
on a
particular ETable1
object has access to the array and
count of that
ETable1
object, and has its own index variable which gives
the
position in the array of the next object from the table that will be
returned
by a nextElement
call. When that call is made, the object
is returned and the index increased by one. The hasMoreElements
test involves comparing the index variable against the count variable;
when the index reaches the value of the count variable, every item of
data stored in the table has been returned.
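The array-and-count enumeration described above might be sketched like this. It is an illustration rather than the actual ETable1 code: items are plain Objects and the class names are invented.

```java
import java.util.Enumeration;

// A sketch of an array-and-count table that provides an enumeration
// through a non-static inner class.
public class ArrayTable {
    private Object[] items = new Object[100];
    private int count = 0;

    public void add(Object item) { items[count++] = item; }

    public Enumeration<Object> getEnumeration() {
        return new TableEnumeration();
    }

    // Because this inner class is not static, each TableEnumeration
    // object has access to the items array and count of the ArrayTable
    // object it was created from.
    private class TableEnumeration implements Enumeration<Object> {
        private int index = 0;  // position of the next item to return

        public boolean hasMoreElements() { return index < count; }

        public Object nextElement() { return items[index++]; }
    }
}
```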
Enumeration objects for other implementations of tables may be more
complex.
In ETable3
we had an implementation of a table using an ordered binary tree. The
enumeration
needs to go through the items of the ordered binary tree one at a time,
each
call of nextElement
returning the next one. To accomplish
this,
the technique of using a stack to traverse a tree, as covered in
week 10, is used. In order to build
the stack,
the inner class for the enumeration needs itself to have an inner class
for the
cells of the linked list of the stack. This shows that it is possible
to have
layers of nesting of classes in Java - here a class inside a class
inside a class.
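The stack-based traversal behind such an enumeration can be sketched as follows, with java.util.Stack standing in for the hand-built linked-list stack of the lecture code, and plain int values in the tree rather than keyed objects.

```java
import java.util.Enumeration;
import java.util.Stack;

public class TreeEnum {
    static class Node {
        int value;
        Node left, right;
        Node(int v) { value = v; }
    }

    // Returns an enumeration that yields the tree's values in
    // ascending order, assuming the tree is an ordered binary tree.
    static Enumeration<Integer> inOrder(Node root) {
        Stack<Node> stack = new Stack<>();
        // Push the leftmost path from the root; the top of the stack
        // is always the next node to return.
        for (Node n = root; n != null; n = n.left) stack.push(n);
        return new Enumeration<Integer>() {
            public boolean hasMoreElements() { return !stack.empty(); }

            public Integer nextElement() {
                Node n = stack.pop();
                // After returning a node, push the leftmost path of
                // its right subtree so its successors come next.
                for (Node m = n.right; m != null; m = m.left) stack.push(m);
                return n.value;
            }
        };
    }
}
```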
In ETable4 we have a table which can provide an enumeration, with the table implemented as a hash table with linear probing, while in ETable5 we have the same but with the hash table using separate chaining. The separate chaining example shows a class with two nested classes: a static one to form the cells that are chained together, and a non-static one to construct the enumeration. We noted that as the whole point of hash tables is that the data is scattered across the array seemingly at random, an enumeration which goes through the data in the hash table in the order of the array will seem to go through the items in no particular order. We noted therefore that if we have a case where we want a table but often have to go through its items in the order of their keys, it would be better to implement the table using an ordered binary tree, as this enables us easily to produce an enumeration which goes through the items in order.
In the final lecture, we went through the questions in the second test. We covered again the crucial idea of an abstract data type, since it was obvious from looking through the test papers that many students are failing because they have not grasped this fundamental concept.
The idea of an abstract data type is that it is defined only
by the
operations possible on it, which in Java is done by a class where the
only operations possible on it are to call its public methods. Objects
of the class will have a private data structure in them, and there is
a correspondence between particular values of this data structure and
the
abstract concept they are used to represent. The methods of the
class manipulate the private data structure to change it or construct
new objects representing a change to the abstract concept given by the
method. But only the methods in the class access and manipulate
its private
data structure. Question 2 of the test was a
good example of this - it was about the abstract data type List
,
and not about linked lists. An object of the type List
may well have
a linked list data structure inside it, but List
objects
can only
be manipulated by the public methods of the class, which the question
gave as
head
, tail
, cons
, isEmpty
and empty
. If you are asked to write code to do things
with
objects of type List
, you should not make reference to
the type
Cell
or to first
and next
fields, as
these are aspects of a possible implementation of the type rather than
the public operations
it provides. This was covered in the week 9 lectures and the
week
10 lab exercise.
Question 5 of the test was about the abstract data type sequence
,
which is
a list with an insertion point. This means that a class has to be
provided which gives
the sequence operations, which are add
, delete
,
backwards
and forwards
. Inside the class there is a data structure,
and there is a
correspondence between configurations of the data structure and the
abstract idea of a
particular sequence value. Many answers to this question showed little
understanding
of this concept; for example, many people suggested "tree" as a
possible implementation of sequence, which it is not, as there is no
correspondence between a particular value of a tree and a particular
value of a sequence.