Algorithms and Data Structures 2005: Lecture summary

Week 1
Week 2
Week 3
Week 4
Week 5
Week 6
Week 7
Week 8
Week 9
Week 10
Week 11
Week 12

This page will be used for a brief summary of the lectures given each week. A fuller set of notes is available from the link Local Notes and Resources.

Week 1: 10/11 January

I was away for the first week of the semester, but in my absence a lab exercise I had prepared was handed out and talked through by Jon Rowson. The exercise was intended to get you to think through some of the issues around algorithms by working out an algorithm for a moderately complex problem. The problem was this: given a sequence of integers, some positive and some negative, find a starting point and a finishing point in the sequence such that adding all the numbers between those points gives a greater total sum than for any other choice of starting and finishing points.

It ought to be fairly obvious that one way of solving this problem is to try every possible combination of starting and finishing points, and find the one which gives the largest sum. Translating this algorithm into runnable Java code is a good exercise of your programming abilities.  However, a program obtained by implementing this algorithm will run very slowly once it is tried on large sequences of numbers.  The problem is not the Java code, but the algorithm itself. Careful consideration of this problem can lead to a much more efficient algorithm which could be coded up in Java.
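The "try every combination" algorithm described above can be sketched in Java as follows. This is my own illustrative sketch, not the code from the handout, and the class and method names are invented for the example:

```java
// Brute-force solution to the week 1 exercise: try every pair of starting
// and finishing points and keep the pair giving the largest sum.
public class MaxSubsequence {
    // Returns the largest sum obtainable from any run a[start..finish].
    public static int maxSum(int[] a) {
        int best = a[0];
        for (int start = 0; start < a.length; start++) {
            for (int finish = start; finish < a.length; finish++) {
                int sum = 0;
                for (int k = start; k <= finish; k++)
                    sum += a[k];        // add up a[start..finish]
                if (sum > best)
                    best = sum;         // remember the largest total so far
            }
        }
        return best;
    }
}
```

The three nested loops are the reason this runs so slowly on large sequences: the amount of work grows with the cube of the length of the sequence.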

Week 2: 17/18 January

The first lecture introduced the course. Algorithms are ways of solving problems. We can express them in English, but English like any natural language is imprecise, and what one person understands by a phrase may not be understood in the same way by another person. To be more precise, we could use some sort of formal description language. We could also express algorithms in terms of a computer programming language like Java. This has the advantage of being precise, but in order to make the code runnable it may be necessary to add a lot of extra support material which isn't really part of the algorithm. So it is important to distinguish between the abstract concept of an algorithm, and a particular piece of code used to implement it. In this course, however, we will concentrate on algorithms which are expressed in Java code. There is a big advantage in being able to experiment with an algorithm by actually running it in the form of a computer program, and it also helps improve our programming skills.

Data structures are ways of storing collections of data. Since most algorithms work on collections of data, algorithms and data structures are generally closely linked as topics. So far, the only data structure you have seen is the array, which is built into Java and most other programming languages. Other data structures can be provided in Java through Java classes which implement them. Although the Java code libraries have several examples of data structure classes already written for you, these will not be covered in the course. Since it is a course on understanding the concepts through simple examples, and not a course in the use of Java (although it does use Java) the emphasis will be on writing your own code to implement common data structures.

We noted that when considering an algorithm there are two different approaches we can take. One of these is the iterative approach in which we repeatedly make changes to some structure until we have reached the solution. The other is the recursive approach, in which we go about solving a problem by seeing if we can express it in terms of a smaller version of the same problem. We illustrated these approaches by looking at an iterative and a recursive algorithm for solving the problem of evaluating n to the power of m, along with Java methods that implement the algorithms.
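As an illustration of the two approaches (a sketch in the spirit of the lecture, not the course's own code), the iterative and recursive power algorithms might look like this in Java:

```java
public class Power {
    // Iterative approach: repeatedly multiply an accumulator by n.
    public static int powerIter(int n, int m) {
        int result = 1;
        for (int count = 0; count < m; count++)
            result *= n;   // loop invariant: result == n to the power count
        return result;
    }

    // Recursive approach: n to the power m is n times n to the power m-1,
    // with n to the power 0 being 1 as the base case.
    public static int powerRec(int n, int m) {
        if (m == 0)
            return 1;
        return n * powerRec(n, m - 1);
    }
}
```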

We looked briefly at ways of proving algorithms to be correct. With iterative algorithms, we can use the loop invariant approach. This means finding an expression, the loop invariant, which is true when the loop starts, and remains true at the end of the execution of the loop's body. It must therefore remain true when the loop terminates, at which point the loop test must be false. From the combination of the loop test being false and the loop invariant being true, we can show the problem is solved when the loop terminates. With recursive algorithms, we can use proof by induction.

We also considered briefly the idea of constructive and destructive operations on data structures (on arrays, since so far that is the only data structure that has been covered). If a data structure is passed as an argument to a method, the method is destructive if it works by actually changing the data structure. It is constructive if it works by creating a new data structure which reflects the changes desired but leaves the actual data structure passed as an argument unchanged.
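The distinction can be shown with a small pair of array methods (my own example, using negation of the elements as the change being made):

```java
public class ArrayOps {
    // Destructive: actually changes the array passed as the argument.
    public static void negateDestructive(int[] a) {
        for (int i = 0; i < a.length; i++)
            a[i] = -a[i];
    }

    // Constructive: leaves the argument unchanged and returns a new array
    // reflecting the desired change.
    public static int[] negateConstructive(int[] a) {
        int[] b = new int[a.length];
        for (int i = 0; i < a.length; i++)
            b[i] = -a[i];
        return b;
    }
}
```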

Week 3: 24/25 January

An important topic this week was insights that can be used to develop more efficient algorithms. The first example looked at was evaluating n to the power of m. Last week we considered algorithms which involved repeated multiplications by n. However, if we compute n^(m/2), all we need do is multiply that number by itself to get n^m (or, if m is an odd number, multiply it by itself and then by n, with m/2 being rounded down to the nearest integer). We showed that using this insight, the total number of multiplications needed to evaluate n^m would be at most twice the logarithm to base two of m, whereas with last week's algorithms m multiplications are necessary. If m is a large number, this makes a huge difference.
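The insight described above can be sketched as a recursive Java method (an illustrative sketch, not the course's power code; long is used to give room for large results):

```java
public class FastPower {
    // Divide-and-conquer power: compute n^(m/2), multiply it by itself,
    // and multiply once more by n when m is odd.
    public static long power(long n, int m) {
        if (m == 0)
            return 1;                   // base case: n^0 == 1
        long half = power(n, m / 2);    // integer division rounds m/2 down
        if (m % 2 == 0)
            return half * half;
        else
            return half * half * n;     // odd m: one extra multiplication
    }
}
```

Each recursive call halves m, so the number of multiplications is proportional to the logarithm of m rather than to m itself.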

We looked at two algorithms for sorting an array. One was selection sort and the other was merge sort. Selection sort works by finding the smallest item and putting it in the first position, the second smallest item and putting it in the second position, and so on. Merge sort works by dividing the array into two parts, sorting each part recursively, and then taking items from the sorted parts to put into the final array, each time taking the lowest item from either of the two sorted arrays which has not yet been taken.
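Selection sort, being the shorter of the two, can be sketched as follows (my own destructive version, not the file from the course directory):

```java
public class SelectionSort {
    // On each pass, find the smallest remaining item and swap it into
    // the next position: the first pass fills position 0, and so on.
    public static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int smallest = i;
            for (int j = i + 1; j < a.length; j++)
                if (a[j] < a[smallest])
                    smallest = j;       // index of the smallest item in a[i..]
            int temp = a[i];            // swap it into position i
            a[i] = a[smallest];
            a[smallest] = temp;
        }
    }
}
```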

We introduced the big-O notation as a way of describing the efficiency of algorithms. The idea is that if we have a formula giving the number of significant operations taken by an algorithm in terms of the size of the data, we can ignore all but the largest term of the formula, and we can ignore constant multiplying factors. We gave an informal argument that showed selection sort is O(N^2) and merge sort is O(N log N). This means that merge sort is more efficient: if we try to sort a large array using the two algorithms, we will find merge sort always does it much quicker. The important thing is that it is the algorithm that makes the difference, not the details of exactly how it's programmed or even the computer it's run on.

We looked at the solutions to the problem given as an exercise in week 1. A simple solution is just to find the sum of every possible subsequence and choose the one which is the biggest. This, however, is not an efficient way of solving the problem. An efficient solution involves the following insight. (Some notes removed here, as the same problem may be set as an assessed exercise in modules currently being taught.)

Please note that at this point we have covered up to and including the section "Comparing Sorting Algorithms" in the Local Notes and Resources. You can find the code for the "powers" example in the directory http://www.dcs.qmul.ac.uk/~mmh/ADS/code/powers/ or click here. Please remember the important thing for this course is the code in the method power and not the supporting code in the method main which is only given to provide a framework which lets you run the power method. You can find code for various variations on merge sort and selection sort in the directory http://www.dcs.qmul.ac.uk/~mmh/ADS/code/sort/, or click here. Again, please concentrate on the code for the sort method and not on the subsidiary code in the main and other methods which is just provided to enable you to run the sort method with some data.

Week 4: 31 January / 1 February

We started by looking at the problem of searching. This is simpler than the problem of sorting we considered last week. Searching just takes an item and a collection of items and tells us whether the item is in the collection. If the collection is indexable like an array (which is the only sort of collection you have seen so far), a variant on searching is to find the position of the item in the collection. If we are just looking for presence, we should have a method which takes a collection and an item and returns a boolean. If we are looking for position, we should have a method that takes a collection and an item and returns an integer. The method returning the position should return a special value, one which can't otherwise be returned, to indicate that the item being searched for is not present; -1 is a good choice since array indexes start at 0 and go upwards. Note that it's very important to express our problem in terms of methods that take values and return values, NOT code fragments in the main method, or methods that print some message. This makes our code far more flexible, since we can re-use our method whenever required just by calling it, and we have made a clear distinction between solving the problem and printing a message telling us the solution.

Searching an unordered array means looking at each item in turn until we either find the one we want or we have searched all the items and can conclude the one we want isn't there. This is referred to as linear search. It is O(N), since on average it will take between N/2 and N checks (depending on the probability of the item being in the array) to say whether the item is in the array.
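Both variants of linear search, presence and position, can be sketched briefly (my own illustrative methods, following the conventions described above):

```java
public class LinearSearch {
    // Presence version: takes an array and an item, returns a boolean.
    public static boolean member(int[] a, int item) {
        for (int i = 0; i < a.length; i++)
            if (a[i] == item)
                return true;
        return false;
    }

    // Position version: returns the index of the item, or the special
    // value -1 when it is absent, which no real index can take since
    // array indexes start at 0.
    public static int position(int[] a, int item) {
        for (int i = 0; i < a.length; i++)
            if (a[i] == item)
                return i;
        return -1;
    }
}
```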

If the items in the array are put in order, we can use the more efficient binary search. This means looking at the middle item. If it's the item we're looking for, we can halt. Otherwise, if the item we're looking for is less than the middle item we know we can restrict the search to the first half of the array; otherwise we know we can restrict it to the second half. We first considered a recursive method which actually copied half of the original array into a new half-sized array and called itself recursively with that array as an argument. However, this copying is inefficient. A more efficient method keeps the bounds of the part of the array being searched as arguments, and passes the original array into the recursive call along with more restricted bounds. Since the arguments giving these bounds are not part of the original problem definition, they should not appear as arguments to the public method that solves the problem. This is achieved by making the public method call a private method with the bounds as extra arguments. Note that binary search is O(log N), since it looks at an array item and then cuts the array in half, so the maximum number of items it will look at is the number of times the array can be divided in half. We noted that a slight improvement could be made to the merge sort method given previously by adopting the technique of passing the whole array and bounds arguments rather than a new half-sized array.
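The public/private arrangement described above might be sketched like this (an illustrative version, not the course's file; it assumes the array is sorted in ascending order):

```java
public class BinarySearch {
    // The public method matches the problem definition: array and item only.
    // The bounds are a detail of the technique, so they live in a
    // private helper method.
    public static boolean member(int[] a, int item) {
        return search(a, item, 0, a.length - 1);
    }

    private static boolean search(int[] a, int item, int lo, int hi) {
        if (lo > hi)
            return false;               // empty range: item not present
        int mid = (lo + hi) / 2;
        if (a[mid] == item)
            return true;
        else if (item < a[mid])
            return search(a, item, lo, mid - 1);   // search first half
        else
            return search(a, item, mid + 1, hi);   // search second half
    }
}
```

Note that the same array object is passed into every recursive call; only the bounds change, so no copying takes place.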

We looked at the concept of environments. When a static method is executed, it is executed in its own environment, which is a collection of variables and their values. The environment consists of the variables which are the parameters to the method plus any local variables (plus also, though we did not consider them, class variables). When a method call is made, a new environment is created with the variables which are the parameters set by assigning the corresponding arguments in the method call to them. The old environment is returned to when execution of the method finishes. Executing a method in an environment means that when a variable name is used, it is taken to mean the variable in that environment. Recursion creates a whole stack of environments, as each recursive call creates its own environment. Note there is not necessarily a link between variables of the same name in different environments. Assigning a value to a variable in the environment of the current method call does not change the value of a variable of the same name in previous environments which are waiting to be returned to.

We noted that array assignment gave aliasing. So if a and b both refer to arrays, b=a causes b to refer to the array that a refers to. Then any change to b[i] for any i will also change the value of a[i] since they are two names for the same thing. This is why when arrays are passed as parameters to methods, any changes made to them remain in place after the method finishes (enabling destructive methods to work). Although the method is executed in its own environment with its own variable referring to the array, the assignment involved in parameter passing causes the array to be shared, not copied.

We looked at a phenomenon known as tail recursion. This is where a method returns the result of a recursive call as its own return value. The consequence is that there is no code left to execute after the recursive call. Since there is no code to execute, there is actually no point in saving the environment to be used in code after the recursive call. This means that tail recursion can be converted to iteration, with a single environment replacing the multiple environments of recursion, and assigning a new value to a variable replacing the way recursion creates a new variable of the same name with a possibly different value.
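The conversion can be illustrated with a tail-recursive power method using an accumulator (my own example, chosen because the week 2 power example is familiar):

```java
public class TailPower {
    // Tail-recursive: the result of the recursive call is returned
    // directly, so nothing remains to be done in this environment
    // after the call.
    public static int powerAcc(int n, int m, int acc) {
        if (m == 0)
            return acc;
        return powerAcc(n, m - 1, acc * n);
    }

    // The same computation converted to iteration: reassigning m and acc
    // in a single environment replaces creating a new environment with
    // new variables of the same names on each recursive call.
    public static int powerLoop(int n, int m) {
        int acc = 1;
        while (m != 0) {
            acc = acc * n;
            m = m - 1;
        }
        return acc;
    }
}
```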

We then moved on to a different topic: how we represent collections of data. Suppose we want to represent a set, which is a collection where no element occurs more than once and the order the elements are stored in doesn't matter. Suppose we want the set to change, so we can add and delete items destructively. One technique we could use is the array and count technique. This means having an array of the maximum size we would ever need, and a variable holding a count: if the variable stores N, then the first N cells of the array count as the data in the set. Adding a new item to the set involves putting it in the cell indexed by the count and increasing the count by one. Deleting an item from the set means decreasing the count by one, then putting the item in the cell indexed by count into the cell where the deleted item was. We can do this because the order of items in the array doesn't matter. For simplicity, we will agree that attempting to delete an item that isn't there, or adding an item which is already there, has no effect.
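The array and count technique can be sketched as a small destructive class (my own sketch in the spirit of the course's IntSet files, not a copy of them):

```java
public class IntSetSketch {
    private int[] array;   // cells 0..count-1 hold the set's elements
    private int count;

    public IntSetSketch(int maxSize) {
        array = new int[maxSize];   // maximum size we would ever need
        count = 0;                  // the set starts empty
    }

    public boolean member(int item) {
        for (int i = 0; i < count; i++)
            if (array[i] == item)
                return true;
        return false;
    }

    // Add: put the item in the cell indexed by count, increase count.
    // Adding an item already present has no effect.
    public void add(int item) {
        if (!member(item)) {
            array[count] = item;
            count++;
        }
    }

    // Delete: decrease count, then move the item from the cell indexed
    // by count into the cell where the deleted item was. Deleting an
    // absent item has no effect.
    public void delete(int item) {
        for (int i = 0; i < count; i++)
            if (array[i] == item) {
                count--;
                array[i] = array[count];
                return;
            }
    }
}
```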

We considered a "set maintenance program" which used this representation. We first wrote the program all as one method. This made it long and complex, as it was difficult to tell which parts of the program were about manipulating the set, and which parts were about providing the interaction with the program user.

So next we broke the program up and put all the code that was about manipulating the set into separate methods named according to their function which took the array and count representing the set as their arguments. This gave us a program which was easier to understand, with the main method restricted to providing the user interface.

Finally we considered putting the code for sets into a separate class. Objects of this class had an array and count within them. This provided better structuring. Now the array and count representing a set were clearly brought together in one object. The user interface code interacted with that object just by calling its methods. It did not even have to refer to the array and count. The effect was as if the set type was built into Java.

Code for the "all in one method" set maintenance program can be found in the file UseIntSets1.java in the directory http://www.dcs.qmul.ac.uk/~mmh/ADS/code/sets/. Code for the "all in one class" set maintenance program can be found in the file UseIntSets2.java in the same directory. The front end for the version we saw with one file for objects representing sets can be found in the file UseIntSets3.java in this directory, with the actual code for the set objects in the file IntSet1.java.

Note that the code for the user interface is not important; once again, it is only given to provide a framework for you to run the more relevant code, which is that which directly implements sets. It is certainly not meant to be an example of a good user interface; rather, it is designed to be as simple as possible so as not to distract attention from the main issue. The directory given above contains a large number of files, all giving variations on the basic theme of implementing a set of integers. We will be going through various implementations in the next few weeks. A general guide to all the files in the sets directory can be found here.

Week 5: 7/8 February

We continued on the theme of using a class to represent sets. This led to the idea of an Abstract Data Type. An abstract data type (often abbreviated to ADT) is a type of object which is seen only in terms of the operations possible on it. It is not seen in terms of the component parts of objects of that type. This is precisely how the user interface file UseIntSets3.java sees the objects of the class IntSet1 which it manipulates. It cannot access the array and count fields within such an object, as they are declared as private. It can only access the object through its public methods add, delete, member and print, and its constructor. So far as the person writing the code for UseIntSets3 is concerned, IntSet1 might as well be a built-in part of Java with the operations add, delete, member and print and its constructor provided.

The important thing is that we have broken up our set maintenance program into two parts which have only a limited way of interacting. It helps us with the design and construction of programs if we can break them into logically separate parts, perhaps with different programmers working on the different parts. If there is only a limited way of interacting, through the public methods of a class which are defined to work in a particular way, there is less chance of unexpected errors in the whole program. Programmers writing the code for an ADT need only be concerned that they give objects of that type the correct behaviour; they need not otherwise be concerned with how the objects are used. Programmers using the code for an ADT can assume it behaves correctly; they need not be concerned with what goes on inside objects of that type.

Note that it is important to distinguish between an object and the type of that object, where in Java the type is given by its class. Do not talk or write of "an abstract data type" when you really mean "an object of that abstract data type" - they are two separate things. One way of thinking about this is that classes are "recipes" or "blueprints" for objects. You would be sure to distinguish, for example, between a cake and a recipe for a cake. You could not eat a recipe for a cake. You could change a cake, and you could change a recipe for a cake. But changing a recipe for a cake has a different effect than changing an individual cake!

Note also the distinction between an abstract data type, and the code which implements it. Different code may implement the same abstract data type if the way an object of that type behaves, in the view of the code which uses it, does not change. Perhaps you could think of this as altering the recipe in a way that doesn't change the appearance or taste of the cake created from it. With sets, we noted that we could change our representation from the original "array and count" form to an "array and count" form where the items in the array are stored in order. This has the advantage that the more efficient binary search can be used in implementing the member method, but the add and delete methods are less efficient as they require more work in order to keep the array in order. There is no change to the way objects of the type behave in terms of the sets they are taken as representing, but a change in time performance of the operations could be observed (though only with sets of large size would there be a difference significant enough to be observed by an ordinary user).

In our examples, the class IntSet2.java represents a change to our original set ADT, IntSet1, because we replace the method print by the method toString, and these two methods have different effects. The method print actually causes a text representation of the object to be printed, while the method toString only returns a text representation, which could be printed or used otherwise as desired. This is more flexible, so in future we will always provide objects with a toString method returning a text representation, rather than a print method which causes actual output from the program but returns nothing to be manipulated in the program. The class IntSet3.java represents the same ADT as IntSet2, but uses an ordered array implementation.

We saw a very different implementation of the set ADT, using an array of booleans. The code for this is in IntSet4.java. Here a set of integers is represented by an array, where the cell indexed by i is set to true if i is in the set, and to false otherwise. This has the advantage that it is very efficient - the operations add, delete and member can all be done in just one step. The disadvantage is that we can only use this representation if we can be sure that items in the set will always be in the range of 0 to one less than the size of the array.
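The essentials of the boolean-array representation can be sketched like this (an illustrative sketch in the spirit of IntSet4, not the actual file):

```java
public class BoolSetSketch {
    // present[i] is true exactly when i is in the set.
    private boolean[] present;

    // Only items in the range 0 to range-1 can ever be stored.
    public BoolSetSketch(int range) {
        present = new boolean[range];   // Java initialises all cells to false
    }

    // Each operation is just one step: no searching is needed.
    public void add(int item)       { present[item] = true;  }
    public void delete(int item)    { present[item] = false; }
    public boolean member(int item) { return present[item];  }
}
```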

In IntSet5.java a representation of sets is given where a set of integers is represented by an array of the same size as the set. Then to add an item to or delete an item from a set, a new array is constructed and the array field within the object is made to refer to this new array.

We noted that object assignment in Java gives rise to aliasing. If set1 and set2 are two references referring to set objects, executing the statement set2=set1 causes set2 to refer to the object that set1 refers to, while set1 still refers to the same object. So if, after executing this statement, we call add or delete on set2, then the set that set1 refers to will change, because it is the same object. To avoid such unexpected side effects (which could easily be missed in a large program which created aliases), one technique would be to add a method copy to the abstract data type. So, for example, set2=set1.copy() causes set2 to refer to a set which has the same data as set1 but is a different object. A copy method needs to be inside the class for a set so it can access the data inside set objects. A copy of the data (in our case the array) is created, and this is put into a new object. In order to create a new object with this copy inside it, we have to introduce a new constructor which takes the data structure inside an object as its argument and sets the field of the object created to this data structure. We make this constructor private so that it can only be used by the copy method. This is done in IntSet6.java.

Note the class IntSet6 shows a simple example of inheritance. Rather than write a completely new class, we make IntSet6 extend IntSet5. This means that IntSet6 objects have the attributes of IntSet5 objects without the code for them having to be written in the class IntSet6. Note, however, that in order for the field array declared in IntSet5 to be accessible in code written in IntSet6, it needs to be declared as protected rather than private.
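The copy method and private constructor pattern can be sketched in a single class (my own sketch modelled on the IntSet5/IntSet6 idea, where the array is exactly the size of the set; the names are invented for the example):

```java
public class CopyableSet {
    private int[] array;   // the array is exactly the size of the set

    public CopyableSet() {
        array = new int[0];            // the set starts empty
    }

    // Private constructor: wraps an existing array in a new object.
    // Only code inside this class (such as copy) can use it.
    private CopyableSet(int[] contents) {
        array = contents;
    }

    // copy: duplicate the internal array, then build a new object round
    // the duplicate, so the copy shares no data with the original.
    public CopyableSet copy() {
        int[] newArray = new int[array.length];
        for (int i = 0; i < array.length; i++)
            newArray[i] = array[i];
        return new CopyableSet(newArray);
    }

    // Destructive add: this object's field is made to refer to a new,
    // larger array. Adding an item already present has no effect.
    public void add(int item) {
        if (member(item))
            return;
        int[] newArray = new int[array.length + 1];
        for (int i = 0; i < array.length; i++)
            newArray[i] = array[i];
        newArray[array.length] = item;
        array = newArray;
    }

    public boolean member(int item) {
        for (int i = 0; i < array.length; i++)
            if (array[i] == item)
                return true;
        return false;
    }
}
```

After set2=set1.copy(), changing set2 has no effect on set1, because they are now two different objects.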

We followed from this by considering a constructive implementation of sets. This generalises from the idea of constructive and destructive methods we saw in week 2. Up till now, the set classes we have seen have been destructive, because the methods add and delete when called on them actually cause them to change (and hence destroy the data they were holding and replace it by the updated data). In a constructive implementation, these methods would return new objects, leaving the original object unchanged. So, for example, set2=set1.add(5) would cause set2 to refer to an object representing the same set that set1 refers to but with the addition of 5; it would not, however, change the object that set1 refers to. The technique used for copying in IntSet6 can be used to make constructive versions of the add and delete methods; this is done in IntSet7.java.

If all the public methods of a class of objects work constructively, objects of that class are referred to as immutable, meaning they can't be changed because there is no way of changing them. This avoids the problem of unexpected side-effects caused by aliasing since there can't be any if there are no destructive operations. Objects which are not immutable are referred to as mutable.

We noted the use of this in methods: it means "the object this method is called on". So, for example, when set2=set1.add(5) is executed, this in the code for add is taken as referring to set1. If the object returned is the same as the object the method is called on, for example in set2=set1.add(5) when the set represented by set1 already contains 5, we can return this rather than create a new object if the object is immutable. Although this results in aliasing, that is not a problem if there are no destructive methods.

Finally we looked at axioms. These complete the concept of an Abstract Data Type. Axioms are rules which the methods of an ADT must obey if the code is to be a correct implementation. They can be expressed in terms of a relationship between the results of different calls of constructive methods on objects of the ADT.

As an example, one axiom for sets is that

s.add(i).member(j) == s.member(j) if i!=j
This means that for any set s and integers i and j, the result of calling member(j) on the result of calling add(i) on s is the same as the result of calling member(j) on s, provided i and j are not equal. If i and j are equal, we can't say that s.add(i).member(j) == s.member(j) is always true, since i is always a member of s.add(i), regardless of whether it is a member of s. Otherwise, however, adding i to s does not cause a change to whether j is a member of s.

If we have sufficient well-chosen axioms, we can fully define an Abstract Data Type. Any code which causes the methods to behave as asked for in the axioms is a correct implementation of the Abstract Data Type defined by the axioms.

Notes covering this week's material can be found in the local notes and resources section on representing arrays using sets part 1, part 2 and part 3.

Week 6: 14/15 February

In order to demonstrate the general principles of abstract data types, we considered another one, the sequence, or "list with insertion point". Unlike sets, this ADT does have the concept of the elements of the collection being arranged in an order. Like sets, items may be added to the collection, but they are added in the position given by the insertion point. There is no operation to delete a particular item; rather, there is a delete operation which takes no argument and deletes whatever item is currently at the insertion point. There are also operations to move the insertion point backwards and forwards. We considered a destructive version (i.e. the operations change the representation within an object), with the sequence being implemented by a data structure consisting of an array and count as we used with sets, plus an extra integer field giving the position in the array which is the current insertion point.

We returned to sets to consider a version of the abstract data type which contained union and intersection operators, making it closer to the concept of a set you may have come across in courses on discrete mathematics. Adding axioms for the union and intersection operations to the original axioms we had for the concept of "set" was easy. Implementing the operators as methods was a little more tricky.

The union of set s1 with set s2 consists of all the items in s1 plus all the items in s2 which are not members of s1. The intersection of set s1 with set s2 consists of all the items in s1 which are also members of s2. We need a way of going through all the items in a set in order to achieve this. We have not provided a public method which enables us to do this, but if the code for the methods union and intersection is provided in the set class, when we call the method as s2.union(s1) we have access in the code to the array in the data structure for s2, so we can go through that. Note that we considered the operations only with a constructive implementation, so s2.union(s1) is an expression, not a statement; it only makes sense when something is done with the value (assigned to a variable, passed as an argument, or used for another method call on it), and the value is a new object. Calling s2.union(s1) causes no change to s1 or s2 (the same applies to intersection).

Our initial implementation (code in IntSet8.java) turned out to be rather inefficient. Each time we called the method add to add a new item to the set being constructed, we created a whole new set which existed only until the next add was called. A more efficient implementation needed to go through the arrays of both sets s1 and s2 when s2.union(s1) or s2.intersection(s1) is called, creating a new array and only then creating a new set object with this array within it. This was easily done because a feature of Java and similar object oriented languages is that if an object of a particular class is passed as an argument to a method in that same class, the code for that method has access not only to the private fields of the object it is called on, but also to the private fields of the object passed as an argument. Suppose, for example, the method union is inside class Set, and Set has a private field called array, and the header to the method is:

public Set union(Set s)
Then when s2.union(s1) is called, the variable array refers to the array field of the object referred to by s2, while the array field of the object referred to by s1 can be referred to as s.array. It is probably a good idea to use this.array rather than just array in this case, to make it a little clearer which array is being referred to; it means the same thing as plain array. An implementation done in this way is found in IntSet9.java.
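A sketch of the efficient approach, in which one new array is built and then wrapped in one new object, might look like this (an illustrative class of my own, not IntSet9 itself; the constructive add is included only so the example can be run):

```java
public class USet {
    private int[] array;   // the array is exactly the size of the set

    public USet() { array = new int[0]; }

    // Private constructor: wraps a ready-made array in a new object.
    private USet(int[] contents) { array = contents; }

    public boolean member(int item) {
        for (int i = 0; i < array.length; i++)
            if (array[i] == item)
                return true;
        return false;
    }

    // Constructive union: because s is also a USet, this method may read
    // s.array directly, even though array is private.
    public USet union(USet s) {
        int extra = 0;
        for (int i = 0; i < s.array.length; i++)
            if (!this.member(s.array[i]))
                extra++;                       // count items only in s
        int[] newArray = new int[this.array.length + extra];
        for (int i = 0; i < this.array.length; i++)
            newArray[i] = this.array[i];       // copy this set's items
        int next = this.array.length;
        for (int i = 0; i < s.array.length; i++)
            if (!this.member(s.array[i]))
                newArray[next++] = s.array[i]; // then the items only in s
        return new USet(newArray);             // one new object at the end
    }

    // Constructive add, used here only to build sets for the example.
    public USet add(int item) {
        if (member(item))
            return this;
        int[] newArray = new int[array.length + 1];
        for (int i = 0; i < array.length; i++)
            newArray[i] = array[i];
        newArray[array.length] = item;
        return new USet(newArray);
    }
}
```

Note that union builds the new array in full before any new set object is created, avoiding the wasteful intermediate objects of the first implementation.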

To demonstrate the general principle, we also looked at a constructive implementation of sets of integers including union and intersection using the "array of booleans" technique we considered in week 5. Since the set operations are constructive, they need to work not by changing the array of booleans as we saw before, but by creating a new array of booleans representing the new set we wish to create, then incorporating this in a new object using a private constructor which takes an array of booleans as its argument. The code for this can be seen in IntSet10.java.

The code for IntSet8, IntSet9 and IntSet10 contains an additional method, fromString, which converts a string representation of a set to an actual set object. You need not be concerned with how this method works; it is not an essential part of the set class, but is there because it helps us run examples of the use of these sets.

Apart from looking at sets and sequences, we went through the exercises set in week 2, which asked you to write various array manipulation methods. In the case of the shuffle operation, it was noted that the trick to a successful implementation was to use a loop in which the array index was increased by 2 rather than the usual 1 each time round. In the case of the filter method, we had the problem that the size of the final array was not known at the start. We noted that a solution in which the array returned was of the same size as the initial array but with 0 in the place of the filtered-out numbers was not what was asked for. One solution was to go through the initial array counting how many items were to be filtered out, then create an array of the correct size, then go through again, this time checking and copying the non-filtered items into the new array. Another solution, which had the advantage of checking each number only once to see if it was to be filtered out, was to create a temporary array of the size of the initial array to hold the items which are not filtered out, and then to copy them into an array of the appropriate size and return that array. A solution using recursion, in which the final array was created in the base case and items put in the array on return from recursion, was also discussed.
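The first, two-pass solution to the filter problem can be sketched as follows. This is my own sketch; the filtering test used here (keep the positive numbers) is an assumption for illustration, and the exercise's actual test may have been different:

```java
public class Filter {
    // Two-pass filter: first count the survivors, then copy them into
    // an array of exactly the right size.
    public static int[] filterPositive(int[] a) {
        int kept = 0;
        for (int i = 0; i < a.length; i++)
            if (a[i] > 0)
                kept++;                 // first pass: count the survivors
        int[] result = new int[kept];
        int next = 0;
        for (int i = 0; i < a.length; i++)
            if (a[i] > 0)
                result[next++] = a[i];  // second pass: copy them across
        return result;
    }
}
```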

We looked at quick sort, a simple version of which could be created using modified versions of the code for filter and append. The idea is to divide an array into two parts, one consisting of all the elements from the array less than its first element, the other consisting of all the elements of the array greater than or equal to the first element (but not including the first element itself). Then sort each of these parts using recursion, and join the sorted two parts together with the original first element in the middle. The base case is when an array is 0 or 1 in length, it is then returned unchanged. Note that the more efficient version of quick sort (code here) which you saw in week 11 of your procedural programming course works by moving elements around in the initial array rather than creating two new arrays, so it works destructively; the code is perhaps less clear than ours in order to gain this efficiency.
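A minimal sketch of this simple constructive quick sort (helper names are ours, not the course code's):

```java
import java.util.Arrays;

// Constructive quick sort: split on the first element, sort the two
// parts recursively, then append them around the first element.
public class QuickSort {
    static int[] sort(int[] a) {
        if (a.length <= 1) return a;               // base case: 0 or 1 elements
        int pivot = a[0];
        int[] rest = Arrays.copyOfRange(a, 1, a.length);
        int[] small = part(rest, pivot, true);     // elements < pivot
        int[] large = part(rest, pivot, false);    // elements >= pivot
        return append(sort(small), pivot, sort(large));
    }

    // Keep the elements of a that are < pivot (less true) or >= pivot.
    static int[] part(int[] a, int pivot, boolean less) {
        int count = 0;
        for (int x : a) if ((x < pivot) == less) count++;
        int[] r = new int[count];
        int pos = 0;
        for (int x : a) if ((x < pivot) == less) r[pos++] = x;
        return r;
    }

    // Join sorted left part, pivot, sorted right part into one new array.
    static int[] append(int[] left, int pivot, int[] right) {
        int[] r = new int[left.length + 1 + right.length];
        System.arraycopy(left, 0, r, 0, left.length);
        r[left.length] = pivot;
        System.arraycopy(right, 0, r, left.length + 1, right.length);
        return r;
    }
}
```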

Quick sort is similar to the merge sort we saw in week 3: in both cases an array is divided into two, the two parts sorted recursively and the sorted parts joined together. Using a similar argument to that used for merge sort, quick sort can be categorised as O(N log N). Although we did not discuss it, unlike merge sort, quick sort has a bad worst case since the two parts are not necessarily the same size, the worst case being when one part is always of length 0, which would happen if quick sort were applied to an already sorted array. A point of contrast we noted between the two is that in quick sort the comparisons, which are the main work in sorting, are done before the recursion, while the joining of the two sorted parts is simple. In merge sort the separation of the two parts is simple, but the main work involving the comparisons is done after the recursion when the two parts are merged together.

Week 7: 21/22 February

The Monday lecture period was given over to the first term-time test for the course, while the Tuesday morning lecture was given over to discussing solutions to the lab exercises for Week 3 which are available here.

In the Tuesday afternoon lecture we started the important topic of linked lists. A linked list involves the simplest sort of recursive data structure, one built up out of "cell" objects which have two fields. One field holds data, the other is of the same type as the cell. So a cell refers to another cell, that cell refers to a further cell, and so on, creating a list. If we consider the data in a cell to be the first item in a list, the data in the cell it links to to be the next item in the list, and so on, then the structure given by a cell object and the further cell objects it links to can be considered in the abstract as just a list of data items.

The list finishes in a cell whose cell field is set to the value null. This is built into the Java language, and any variable of any object type can be set to the value null, indicating that it currently refers to no actual object.
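A cell class of the kind described might look like this (the field names are illustrative, not necessarily those used in the course code):

```java
// A linked-list cell: one field for the data, one linking to the next
// cell (null marks the end of the list), and a constructor.
public class Cell {
    int first;   // the data item held in this cell
    Cell rest;   // the next cell in the list, or null at the end

    Cell(int first, Cell rest) {
        this.first = first;
        this.rest = rest;
    }
}
```

For example, `new Cell(1, new Cell(2, null))` builds a two-cell list representing the list of items 1, 2.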

The two fields in our cell objects were not declared as private, and the class had no methods except for a constructor to create a new cell object with a given piece of data to store and a given next cell (or null) to link to. So the data in a list may be changed by any code which has access to the list. Also links may be changed and made to refer to other cells. The aliasing effect of object assignment means a cell in a linked list may be referred to in several different ways, perhaps directly through a variable which refers to it (or points to it, as we tend to say in this context), and indirectly through a variable which refers to another cell whose field links to the cell in question.

This is shown diagrammatically by cells and pointers diagrams. In a cells and pointers diagram, each cell is shown by a rectangle divided into two, one part containing the data item, the other part containing an arrow which leads out and points to the rectangle representing the cell it links to. Link fields which are set to null are represented either by an arrow which points to an "electrical earth" symbol, or by a line drawn diagonally across the part of the rectangle representing the link. Variables which refer to cells in the situation are shown by a small box labelled with the variable's name containing an arrow which leads out and points to the rectangle representing the cell the variable points to.

The aliasing effect means unrestricted use of linked lists can lead to confusing code and unexpected side effects as a cell referred to through one variable is changed and this changes the linked list of which it is a part referred to by another variable. Therefore linked lists are best used as a data structure referred to inside another class which implements an abstract data type. This means the linked lists can only be changed by the methods of that class so, as long as they are carefully designed to work properly together, no unexpected side-effects can occur. If linked lists can only be referred to within another class, we can show that by declaring the class for cells inside that class. This is referred to as a nested class. Like methods and fields, nested classes may or may not be declared as static. A static nested class, like a static method, is self-contained, meaning it is not attached to any particular object of the class it is in.

In IntSet11 we saw an implementation of the abstract data type "set" we have considered in previous weeks, but here using a list data structure rather than an array or array and count as used previously. We noted that a common operation on linked lists involves setting a pointer variable to refer to the first cell in the list, then in a loop updating it to refer to the following cell. This may be used, for example, to check for the presence of a particular data item in a linked list, with the loop exited either if the pointer becomes equal to null or the pointer points to a cell which holds that item as its data. If the loop exits with the pointer equal to null, we know we have checked the entire list and found the item not to be present. An item may be deleted from a linked list by stopping the pointer moving down the list at the stage where it points to the cell before the one holding the item. That cell's linking field is then changed to set it equal to the linking field of the cell holding the item, having the effect of cutting that cell out of the linked list.
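The two loop patterns can be sketched like this (a sketch only; cell field names are ours, and the deletion method assumes the item is not in the first cell, since the front-of-list case needs separate handling):

```java
// Sketches of the pointer-down-the-list patterns: membership test,
// and deletion by cutting a cell out of the list.
public class ListOps {
    static class Cell {
        int first; Cell rest;
        Cell(int first, Cell rest) { this.first = first; this.rest = rest; }
    }

    // Run a pointer down the list; exit when it falls off the end (null)
    // or reaches a cell holding the item.
    static boolean member(Cell list, int n) {
        Cell ptr = list;
        while (ptr != null && ptr.first != n)
            ptr = ptr.rest;
        return ptr != null;
    }

    // Delete the first occurrence of n, assuming n is not in the first
    // cell: stop the pointer on the cell *before* the one holding n,
    // then bypass that cell.
    static void delete(Cell list, int n) {
        Cell ptr = list;
        while (ptr.rest != null && ptr.rest.first != n)
            ptr = ptr.rest;
        if (ptr.rest != null)
            ptr.rest = ptr.rest.rest;   // cut the cell out of the list
    }
}
```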

Our implementation of the abstract data type "set" using linked lists had set objects with a single private field of the type of cell objects. This field could be accessed directly by the methods to implement the set operations, with these methods being called on individual set objects and hence accessing and manipulating the linked list within the set object the method is called on.

Week 8: 28 February / 1 March

We continued to look at linked list implementations of the ADT "set" and at operations on linked lists to provide the set operations. As with arrays, the linked list data structure used to represent sets may be ordered or unordered. With arrays, ordering had a major benefit because we could use binary search to implement the set membership test. However binary search relies on the fact that any cell in an array can be accessed in one step given its index. Cells in a linked list can only be accessed by starting at the first cell and following the links. So testing for membership when a set is implemented by an ordered linked list is still O(N), since on average we have to go through N/2 cells for the test when N is the size of the set. We can halt at the point where the data in the cell is greater than the data being searched for, rather than going right through the list if it is not present, but this is not a major efficiency improvement. When a new item is added to an ordered linked list, it has to be added in the right place. This is achieved by setting a pointer to the cell before the place where the new item is to be inserted, and setting the link field of the cell that pointer points to to a new cell whose data field holds the item being inserted and whose link field holds the old value of the link field. Note that to insert an item into a linked data structure it is necessary to do it by altering a pointer in a cell which is already linked into the structure. Inserting an item at the front of the linked list needs to be treated as a special case however.
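Insertion into an ordered linked list (with no duplicates, as for sets) might be sketched as follows; the names are ours, and the front-of-list special case is handled first:

```java
// Sketch of ordered insertion: move a pointer to the cell before the
// insertion point, then splice in a new cell through its link field.
public class OrderedInsert {
    static class Cell {
        int first; Cell rest;
        Cell(int first, Cell rest) { this.first = first; this.rest = rest; }
    }

    // Insert n into the ordered list, returning the (possibly new) front.
    static Cell insert(Cell list, int n) {
        // Special case: insertion at the front (empty list, or n smallest).
        if (list == null || n < list.first)
            return new Cell(n, list);
        if (list.first == n) return list;   // already present: no change
        Cell ptr = list;
        // Stop on the cell before the place where n belongs.
        while (ptr.rest != null && ptr.rest.first < n)
            ptr = ptr.rest;
        if (ptr.rest == null || ptr.rest.first > n)
            ptr.rest = new Cell(n, ptr.rest);  // splice in the new cell
        return list;
    }
}
```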

As linked lists are a recursive data structure, it can be convenient to write code operating on them using recursive methods. The recursive methods we saw needed to take a linked list as an argument. We had a public non-static method, which being non-static had access to the linked list field of the set object it was called on, passing that reference to a private static method which was the actual recursive method where the list being manipulated was given as an argument.

As we saw with sets implemented using arrays, sets implemented using linked lists had just the same problem with aliasing when the operations were destructive. If s1 and s2 are variables of a set type, then executing s2=s1 causes s2 to be an alias of s1, and any adding or deleting of an element from the set represented by s1 will also happen to the set represented by s2 and vice-versa, as they are represented by the same object. Again, we need a separate copy operation so we can execute s2=s1.copy() which causes s2 to represent a set which initially stores the same elements as the set represented by s1 but is independent of it so destructive changes to one won't affect the other.

We saw three different ways of copying linked lists. One, the pointer at back technique, created an initial cell which copied the data from the initial cell of the original linked list, and then had a pointer to that cell and a pointer to the original linked list, with new cells being added via the pointer to the back of the new linked list, and both pointers then being moved down (the new list pointer then being on the new last cell). The next technique used recursion. If a linked list is null then its copy is null. Otherwise its copy is a new cell whose data item is the data item from the first cell of the linked list being copied, and which links to a linked list which is a copy of the linked list the first cell of the original linked list refers to. This simple recursive definition translates directly to a recursive method. The third technique was stack and reverse. If we start with a new linked list which is initially null and run a pointer down the original linked list adding cells to the front of the new linked list which copy the data from the old linked list, when the pointer reaches the end of the original linked list, the new linked list will hold a copy - but with the data in reverse order. If the order matters (which it doesn't for sets) we can use the same process to create a third linked list which reverses this reversed list to give the exact copy. Variations on these three techniques can be used in general in any case where we want to construct a new linked list which represents some change to an original linked list. We also saw that if we used recursion, the new linked list could share cells from the original linked list when there was no change to the remainder of the list.
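The recursive technique translates almost word for word into code (a sketch; field names are ours):

```java
// Recursive copy of a linked list: the copy of null is null; otherwise
// a new cell holding the same data, linking to a copy of the rest.
public class CopyList {
    static class Cell {
        int first; Cell rest;
        Cell(int first, Cell rest) { this.first = first; this.rest = rest; }
    }

    static Cell copy(Cell list) {
        if (list == null) return null;                 // base case
        return new Cell(list.first, copy(list.rest));  // copy rest of list
    }
}
```

The copy shares no cells with the original, so destructive changes to one cannot affect the other.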

Code to extend the implementation of sets using linked list in IntSet11 to add a copy method is found in IntSet11a, IntSet11b and IntSet11c, using the three techniques given above. Code to implement sets using linked lists destructively, with the operations implemented using recursion can be found in IntSet14a. Code to implement sets using linked lists with constructive operations can be found in IntSet17 (here recursion is used for the delete method) and IntSet17a (with iteration used for the delete method).

Notes on using linked lists to implement sets can be found in the local notes and resources under the heading Sets using linked lists , with further notes in the following sections on recursion with linked structures and constructive recursion with linked structures.

We started looking at the linked tree data structure. Like linked lists, this consists of cells which link through having references to further cells. With linked lists, each cell has a reference to one further cell; with trees, each cell has references to two further cells. Strictly, this is a "binary tree", since one could have trees with cells which have references to more than two further cells. We refer to the initial cell of a tree as its "root", and the two cells to which it links plus the further cells to which they link as its "branches", with the two branches of a binary tree being termed "left" and "right".

A tree can be yet another data structure to represent sets if it has the property of being an ordered binary tree. A binary tree is ordered if every item in its left branch is less than the item at its root, every item in its right branch is greater than the item at its root, and the two branches are themselves ordered. The empty tree, represented just by null, is also ordered. To test whether an item is a member of a set represented by an ordered binary tree: if the tree is empty, the item is not a member. Otherwise, we first test whether the root item is the item being searched for, if so the item is a member. Otherwise, if the item is less than the root we can restrict search to the left branch since we know if it is present it must be in that branch, if the item is greater than the root we can restrict search to the right branch. This is similar to binary search, where each step in the algorithm cuts the problem size in half. Note, however, that there is no guarantee that a binary tree will have equal numbers of elements in its left and right branch. There are techniques to ensure trees are balanced in this way, but we will not have time in this course to cover them.
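The membership test can be sketched recursively like this (class and field names are illustrative):

```java
// Membership test on an ordered binary tree: compare with the root
// item and descend left or right, so each step discards one branch.
public class TreeMember {
    static class Tree {
        int item; Tree left, right;
        Tree(int item, Tree left, Tree right) {
            this.item = item; this.left = left; this.right = right;
        }
    }

    static boolean member(Tree t, int n) {
        if (t == null) return false;         // empty tree: not a member
        if (n == t.item) return true;        // found at the root
        if (n < t.item) return member(t.left, n);
        return member(t.right, n);
    }
}
```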

Week 9: 7/8 March

We continued looking at ordered binary trees used to implement sets. We looked at how linked binary trees could be altered destructively to represent adding and deleting items from a set. The technique used is to set a pointer to the root of the tree then move it down the left or right branch depending on whether the item to be added or deleted is less than or greater than the root. To add an item we do that until the pointer either points to a cell which already contains the item, in which case we do nothing (to keep the set property that adding an item which is already there causes no change), or it points to a cell where the next position it would move to, to the left or to the right as appropriate, has the value null. The field in the cell which contains null is then set to refer to a new cell which contains the item as its root and a null left and right branch. A similar process takes place to delete an item which is stored in a leaf cell (i.e. one with null for both branches): the pointer is moved to the left or right as appropriate until it points to the cell whose left or right branch is the leaf to be removed, and that pointer to the left or right branch is set to null. If the item to be deleted is in a cell one of whose branches is null then the pointer to that cell from the cell above it can be set to the other branch to remove the cell. The important thing in all these cases is that to change a linked structure the change must be done through a pointer to a cell which is linked into the structure by changing one of that cell's fields.
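Destructive addition might be sketched like this (names are ours; the empty-tree case would be the caller's special case, since there is then no cell whose field can be changed):

```java
// Sketch of destructive addition to an ordered binary tree: move a
// pointer down the tree, then attach a new cell by changing a field
// of a cell already linked into the structure.
public class TreeInsert {
    static class Tree {
        int item; Tree left, right;
        Tree(int item) { this.item = item; }   // leaf: both branches null
    }

    // Insert n into the non-empty tree t.
    static void insert(Tree t, int n) {
        Tree ptr = t;
        while (true) {
            if (n == ptr.item) return;           // already present: no change
            if (n < ptr.item) {
                if (ptr.left == null) { ptr.left = new Tree(n); return; }
                ptr = ptr.left;                  // move down the left branch
            } else {
                if (ptr.right == null) { ptr.right = new Tree(n); return; }
                ptr = ptr.right;                 // move down the right branch
            }
        }
    }
}
```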

The difficult case with deleting an item from an ordered binary tree is when the item is in a cell neither of whose branches is null. The cell cannot just be removed to leave a "hole". What can be done is that the item to be removed is replaced by the largest item in its left branch and the cell holding that item is removed. This keeps the property that everything in the left branch is less than the root item and everything in the right branch is greater than the root item. The largest item in the left branch must be the one obtained by going to the left branch and moving down the right branches until the next right branch is null, the cell then reached is either a leaf or has a null right branch, so can be removed as described previously.

We looked at tree traversal, which arose when we considered how we could get a textual representation of the set represented by a tree. Tree traversal is any case where we need to go through every item in a tree. It involves a recursive algorithm with three parts: processing the root, going through the left branch recursively, and going through the right branch recursively. If the left branch is processed, then the root, then the right branch, this is referred to as in-order traversal and has the result that the items are processed in the ordering (e.g. numerical ascending with our tree of integers) used to order the binary tree. Processing the root, then the left branch then the right branch is referred to as pre-order traversal, while processing the left branch, the right branch and finally the root is known as post-order traversal.
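Two of the traversal orders can be sketched like this; "processing" an item here just means appending it to a StringBuilder (the cell class and names are illustrative):

```java
// In-order and pre-order traversal of a binary tree, building up a
// string of the items as the "processing" step.
public class Traverse {
    static class Tree {
        int item; Tree left, right;
        Tree(int item, Tree left, Tree right) {
            this.item = item; this.left = left; this.right = right;
        }
    }

    // In-order: left branch, root, right branch - for an ordered binary
    // tree the items come out in ascending order.
    static void inOrder(Tree t, StringBuilder out) {
        if (t == null) return;
        inOrder(t.left, out);
        out.append(t.item).append(' ');
        inOrder(t.right, out);
    }

    // Pre-order: root first, then the two branches.
    static void preOrder(Tree t, StringBuilder out) {
        if (t == null) return;
        out.append(t.item).append(' ');
        preOrder(t.left, out);
        preOrder(t.right, out);
    }
}
```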

An implementation of sets using an ordered binary tree with the set operations implemented destructively can be found in IntSet16. Note this implementation also includes, for demonstration purposes, a method that will print the tree structure of the underlying representation. The printing is done with the root on the left and the branches growing towards the right. The set maintenance program front-end in UseIntSets15a causes its set to be represented by IntSet16, and includes an extra command t which prints the tree representation of the set.

We looked at algorithms for adding and deleting items from an ordered binary tree constructively. If the required change is to the left branch, a new tree reflecting the change can be obtained by making a new tree reflecting the change in the left branch, and then returning a new tree cell whose left branch is this new tree, whose root is the same root as the original tree, and whose right branch is the same right branch as the original tree. The same applies if the change is to the right branch. If the root only is to be changed, the new tree returned consists of a new cell whose root item is the new root, but whose left and right branches are the left and right branches of the original tree. The result is that the new tree shares some cells with the old tree but only where the shared cells are not in the part of the tree that has been changed. Since all operations are constructive there is no danger that the shared cells could lead to unexpected side effects. The class IntSet18 shows sets implemented by ordered binary trees using constructive version of the set operations.
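Constructive addition can be sketched recursively as follows (names are ours, not IntSet18's); note how only the cells down the path of the change are new, while the untouched branch of each cell is shared with the original tree:

```java
// Sketch of constructive addition to an ordered binary tree: new cells
// are built only along the path of the change, sharing the other branch.
public class TreeAdd {
    static class Tree {
        final int item; final Tree left, right;
        Tree(int item, Tree left, Tree right) {
            this.item = item; this.left = left; this.right = right;
        }
    }

    static Tree add(Tree t, int n) {
        if (t == null) return new Tree(n, null, null);
        if (n == t.item) return t;                          // no change needed
        if (n < t.item)
            return new Tree(t.item, add(t.left, n), t.right);  // share right
        return new Tree(t.item, t.left, add(t.right, n));      // share left
    }
}
```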

Notes on using trees to implement sets can be found in the local notes and resources under the heading Sets using trees , with further notes in the following sections on recursion with linked structures and constructive recursion with linked structures.

We next looked at the abstract data type List. This is a simple example of an ordered collection, with the only operations on it being head which returns the first item of the collection, tail which returns a new collection consisting of every item except the first item (in the same order), cons which takes an item as an argument and returns a new list whose first item is the argument to the call to cons and whose other items are the items of the list the call was made on (in the same order), isEmpty which tests whether the list it is called on is the empty list, and empty, a static method returning a new List object representing the empty list.
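The operations might be sketched as a small constructive class along these lines (this is our illustration, not the course's List class; the internal representation shown is a linked list of cells):

```java
// Sketch of the abstract data type List: head, tail, cons, isEmpty and
// empty, all constructive - no operation changes an existing object.
public class SimpleList {
    private static class Cell {
        final Object first; final Cell rest;
        Cell(Object first, Cell rest) { this.first = first; this.rest = rest; }
    }

    private final Cell cells;   // the linked list hidden inside the object
    private SimpleList(Cell cells) { this.cells = cells; }

    // A new object representing the empty list.
    public static SimpleList empty() { return new SimpleList(null); }

    public boolean isEmpty() { return cells == null; }

    // The first item of the list.
    public Object head() { return cells.first; }

    // A new list of every item except the first, in the same order.
    public SimpleList tail() { return new SimpleList(cells.rest); }

    // A new list whose first item is the argument, followed by the
    // items of the list cons is called on, in the same order.
    public SimpleList cons(Object item) {
        return new SimpleList(new Cell(item, cells));
    }
}
```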

We noted the difference between this abstract data type List and the linked list data structure we have seen previously. Whereas a linked list may be manipulated directly through the fields of its cells, a List object can only be manipulated through its operations which are provided as Java methods. It is possible to change a linked list destructively by assigning values to the fields of its cells, but there is no way of changing a List object destructively, the abstract data type forces constructive programming. The abstract data type List is most obviously implemented by a linked list, so a List object contains a field referring to a linked list inside it. However, to show that it is a distinct thing from a linked list, we saw an alternative implementation using an array and count. An abstract data type is defined only by the way its public methods work, so what is inside it to make them work is not part of the definition.

Code for linked lists can be found in the lists subdirectory of the code directory. The class List gives an implementation of the abstract data type List using linked lists, while the class ArrayList gives an implementation using an array and count. A simple front-end which reads two lists and appends them to form a new list can be found in this directory in class UseLists1.

Notes on the abstract data type List as we have defined it in this course can be found in the local notes and resources under the heading Lists.

Week 10: 14/15 March

In the Monday lecture we went through the questions on the first test, covering both the answers to them and common reasons why people did badly on them.

In the Tuesday lectures we covered two abstract data types, stacks and queues.

A stack is a destructive version of the List abstract data type we saw last week. It is a collection of data ordered by the time the items were added to the collection. The operations are to return the first item of the collection, take off the first item, and add a new first item (plus an emptiness test and a constructor to produce a new empty collection). Returning the first item is exactly equivalent to returning the head of a List object. Taking off the first item is like the tail operation, except instead of returning a new object and leaving the object it is called on unchanged, it actually changes the object it is called on to make it represent a collection with the same items as before but missing its first item. The operation of adding a new item is like cons with List objects, except that it changes the object it is called on to make it represent what is represented by the new List object returned by the cons operation. These operations on stacks are given the names top, pop and push. To get a feel for the abstract data type, we can think of the items in it piled vertically into a stack, like a stack of books or some other flat object (hence the name). Then we can only look at the top item of the stack, pop off the top item, or push a new item onto the top.

In order to generalise our concept of abstract data types, we considered stacks whose components were of type Object whereas up to now we have considered various collections of integers. The type Object is Java's "most general type". A variable of type Object can hold a reference to an object of any type. Note that values of Java's primitive types (numbers, characters and booleans) are the only things which are not objects, so a variable of type Object cannot be assigned a value of type int, for example. Java gets round this by having some special built-in classes called wrapper classes, which are the object equivalent of non-object data. For example, the class Integer is the wrapper class for int, it has a constructor for making a new Integer object to represent a particular int value, and a method (intValue) that may be called to return the int value a particular Integer object represents.

A stack may be implemented by an array and count data structure, or by a linked list data structure.

Stacks are common in Computer Science. They may be used to represent a list of tasks which have to be done in order. If the current task being done breaks into a list of subtasks, these may be pushed onto the stack. This then ensures they are done in order, with the following task left waiting until they are all done. The same applies if the subtask itself breaks down into further subtasks. We saw how this could be used to implement problem solving that would otherwise require recursion. We looked at the example of traversing a tree in-order. This breaks down into the task of traversing the left branch, processing the root, and traversing the right branch. This could be achieved by having a stack which can contain both trees and integers. The whole tree is initially pushed onto the stack. Then a loop is executed until the stack becomes empty. The loop consists of taking the first item from the stack. If it is an integer, it is processed. If it is a tree, its right branch and its root integer and its left branch are pushed onto the stack (leaving the left branch at the top of the stack). A branch is not put on the stack if it is empty.
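The stack-based in-order traversal might be sketched as follows; here the standard library Deque stands in for the course's own Stack class, and Integer wrapper objects let trees and integers share one stack of Objects:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// In-order traversal of a binary tree using an explicit stack of
// Objects (trees and Integers) instead of recursion.
public class StackTraverse {
    static class Tree {
        int item; Tree left, right;
        Tree(int item, Tree left, Tree right) {
            this.item = item; this.left = left; this.right = right;
        }
    }

    static String inOrder(Tree t) {
        StringBuilder out = new StringBuilder();
        Deque<Object> stack = new ArrayDeque<>();
        if (t != null) stack.push(t);          // whole tree pushed initially
        while (!stack.isEmpty()) {
            Object top = stack.pop();
            if (top instanceof Integer) {
                out.append(top).append(' ');   // an integer: process it
            } else {
                Tree tree = (Tree) top;
                // Push right branch, then root, then left branch, so the
                // left branch ends up on top and is dealt with first.
                if (tree.right != null) stack.push(tree.right);
                stack.push(Integer.valueOf(tree.item));
                if (tree.left != null) stack.push(tree.left);
            }
        }
        return out.toString();
    }
}
```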

Stacks are sometimes referred to as LIFO collections, standing for "last in, first out" meaning that the delete operation removes whatever item which is still in the collection was the last to be added. Queues are sometimes referred to as FIFO collections, meaning that the delete operation removes whatever item which is still in the collection was first to be added. This leads to the behaviour we are familiar with in queues to be served where people join the back of the queue when they arrive, and the next person to be served when a server is free is the person at the front of the queue.

A queue can be implemented by a linked list, with the items being stored in linked list cells in the same order we think of the queue as having. Then deleting the first item from the queue is easy (set the linked list field of the queue object to the next field of the first linked list cell), but adding a new item to the back requires traversing the whole list to reach the back cell, making the operation O(N). The efficiency can easily be improved by having the queue object have two linked list fields, one to refer to the first cell in the linked list, the other to refer to the last cell. Then adding a new item to the queue can be done by changing the last cell's next link to refer to a new cell which holds the item and whose next link is null. This can be done in one step through the pointer to the last cell which is then changed to refer to the new last cell.
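The two-pointer linked-list queue can be sketched like this (names are ours, not those of the course's Queue class):

```java
// Sketch of a linked-list queue with a front pointer for O(1) removal
// and a back pointer for O(1) addition.
public class LinkedQueue {
    private static class Cell {
        Object first; Cell rest;
        Cell(Object first) { this.first = first; }
    }

    private Cell front, back;   // both null when the queue is empty

    public boolean isEmpty() { return front == null; }

    // Add at the back in one step through the back pointer.
    public void add(Object item) {
        Cell cell = new Cell(item);
        if (back == null) front = cell;   // empty queue: cell is also front
        else back.rest = cell;
        back = cell;
    }

    // Remove from the front by moving the front pointer on one cell.
    public Object remove() {
        Object item = front.first;
        front = front.rest;
        if (front == null) back = null;   // queue has become empty
        return item;
    }
}
```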

An alternative implementation of queues uses an array and two array indexes, one to index the cell holding the first item in the queue, the other to index the cell after the one holding the last item. If the two indexes are the same, it represents the empty queue. To represent the change in the queue to cause the front item to leave, increase the front index by one, the data in the cell it used to refer to is then no longer considered part of the queue. To add a new item to the back, put it in the cell indexed by the back index, and increase the back index by one. With stacks implemented by arrays, there is a single index which goes up and down as items are added and deleted. Here the two indexes can only go up, so will eventually reach the end of the array. In order to prevent this causing a problem, if an index reaches the end of the array, next time it is supposed to be increased it is set to 0 instead. Then if the front index is greater than the back index, the data treated as being in the queue is that starting at the cell indexed by the front index, going up to the end of the array, and then continuing at the start of the array up to the cell before the one indexed by the back index. Now, as with our array implementation of stacks, we only hit a problem if the number of items in the collection exceeds the size of the array.
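The wrap-around array queue might be sketched as follows (a sketch only; we also keep a count of the items, since once an index has wrapped round, equal front and back indexes would otherwise be ambiguous between an empty and a completely full queue):

```java
// Sketch of a circular ("wrap-around") array queue: each index is
// reset to 0 when it passes the end of the fixed-size array.
public class ArrayQueue {
    private final Object[] data;
    private int front = 0;   // index of the cell holding the first item
    private int back = 0;    // index of the cell after the last item
    private int count = 0;   // number of items currently in the queue

    public ArrayQueue(int capacity) { data = new Object[capacity]; }

    public boolean isEmpty() { return count == 0; }

    public void add(Object item) {
        data[back] = item;
        back = (back + 1) % data.length;   // wrap the back index round to 0
        count++;
    }

    public Object remove() {
        Object item = data[front];
        front = (front + 1) % data.length; // wrap the front index round to 0
        count--;
        return item;
    }
}
```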

We can use a queue in a similar way that we used a stack above to traverse trees. The algorithm is to start with a queue which consists of the tree being traversed. Then we iteratively remove the first item from the queue of trees, process its root, and add its left and right branches to the queue (as it is a queue they can only be added to the back). The result is that we have breadth-first traversal. This means the root item is processed, then the roots of its two branches, then all the items at depth 2 in the tree, then all the items at depth 3 and so on. We could not achieve this using simple recursion, as it involves jumping between branches in the tree rather than processing one as a whole before moving on to the other.
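A sketch of this breadth-first traversal, with the standard library ArrayDeque standing in for the course's Queue class:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Breadth-first traversal of a binary tree using a queue of trees:
// remove a tree, process its root, add its branches to the back.
public class BreadthFirst {
    static class Tree {
        int item; Tree left, right;
        Tree(int item, Tree left, Tree right) {
            this.item = item; this.left = left; this.right = right;
        }
    }

    static String traverse(Tree t) {
        StringBuilder out = new StringBuilder();
        Queue<Tree> queue = new ArrayDeque<>();
        if (t != null) queue.add(t);           // start with the whole tree
        while (!queue.isEmpty()) {
            Tree tree = queue.remove();        // take the first tree
            out.append(tree.item).append(' '); // process its root
            if (tree.left != null) queue.add(tree.left);   // branches go
            if (tree.right != null) queue.add(tree.right); // to the back
        }
        return out.toString();
    }
}
```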

We looked at two ways of using linked structures to implement the abstract data type sequence which we considered in week 6. One was to use doubly-linked lists. These are lists formed of cells each of which has two references to further cells, with the cells linked so each has a link leading both forward and backward in the list. The insertion point is indicated by a pointer to one of the cells, and adding and deleting items from the sequence involves linking or deleting cells from the doubly-linked structure. The other data structure which was shown to implement sequences was two ordinary linked lists. One represents the items that would be encountered in moving backwards in the sequence from the insertion point, the other the items that would be encountered in moving forwards. Then adding or deleting an item involves adding or deleting the first item from one of these lists, and moving backwards or forwards is represented by taking an item from the front of one and putting it on the front of the other.

The material covered this week is given in detail in the two sets of notes Stacks and the type Object and Using stacks and queues. The code can be found in the lists subdirectory of the code directory. Here Stack and Queue are the implementations using linked lists, while ArrayStack and ArrayQueue are the implementations using arrays. UseIntSets20 is the set maintenance program front end which uses a stack to implement an undo and redo command. The class IntSets21 is an implementation of sets using ordered binary trees which includes an extra method which displays the items in the set in the order in which they are stored breadth-first in the tree.

Notes on linked data structures used to implement sequences can be found in the Editor example notes.

Week 11: 21/22 March

We covered the Table abstract data type, which is also known as a Map. The Java Collections Framework, a library of classes for abstract data types which is provided with the Java language, uses the name Map for its class which provides this sort of data collection. Note that in this course we look at simplified versions of classes that are already provided in Java as part of its extensive library of built-in classes, and see how they might be implemented using the basic Java concepts. As a programmer and computer scientist, it is important you are familiar with this, but in practical circumstances where you are writing Java programs it is better to use Java classes that are provided with the language, so long as they do what you require, rather than write your own.

The idea of a table or map is that it is a collection of data where individual data items can be accessed by a key. A key is a field of a data item, typically but not necessarily a string, which is known to be unique for each individual item of data. For example, we know that every motor vehicle registered in the UK has a unique registration number, so if we had a table of data representing records of cars, we could use the car registration number as the key. National Insurance numbers in the UK or Social Security Numbers in the USA are frequently used as keys for personal records as there is meant to be a separate one for each individual (see here for some discussion on this issue).

The name "table" comes from the idea that our data could be stored in a table with a separate column for each field of the data, with the items stored in key order, so given a key you could do a search to find the rest of the data associated with that key. The name "map" comes from the idea that the data structure represents a mathematical mapping function from the domain of keys to the domain of data items.

In the previous week, in order to make our collections as general as possible, we made them collections of references of Java's most general type, Object. This week we considered an abstract class which defined only those aspects of data items with keys that we need to build tables; we gave this class the name KeyObject. We can define our tables as collections of objects of this class, and then use classes which extend it when we make actual use of the table class. All that is required are methods which enable us to compare objects using their keys (in order to test for equality, or to place objects in an ordering), and a method which enables us to get the key of a particular object. The comparison methods are given completely; the method to get a key is specified only by its signature, and has to be filled in as appropriate by the classes which extend the abstract class.
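
Such an abstract class might be sketched as follows. The names KeyObject, equalKey, lessThan and getKey follow the lecture text, but this is only an illustration; the actual course code may differ in detail, and the Car subclass here is a hypothetical example, not part of the course material.

```java
// Sketch of an abstract class for keyed data items.
abstract class KeyObject {
    // Specified only by its signature: each concrete subclass
    // must say what its key is.
    public abstract String getKey();

    // The comparison methods are given completely, in terms of getKey.
    public boolean equalKey(KeyObject other) {
        return getKey().equals(other.getKey());
    }

    public boolean lessThan(KeyObject other) {
        return getKey().compareTo(other.getKey()) < 0;
    }
}

// A hypothetical subclass, just to show how getKey is filled in,
// using the car registration example from above.
class Car extends KeyObject {
    private String registration;
    Car(String registration) { this.registration = registration; }
    public String getKey() { return registration; }
}
```

Classes which extend KeyObject inherit the comparison methods for free; they only have to supply getKey.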

We defined a type with the name Table as a Java interface. An interface is an abstract class in which no code is given for any method: they are all given only as signatures, and it is necessary for classes which implement the interface to give the actual code. This fits in closely with the idea of abstract data types, as developed in this course, defined only by the operations on them.

A table object is very similar to the set objects we have considered extensively previously. It contains a collection of data with no concept of the data being stored in any particular position, and no concept of a data item occurring multiple times. There are operations to add and delete data items. With a table, however, the operation to delete a data item takes only the item's key as its argument, and changes the table to delete the item with the given key. With sets we defined the delete operation so that an attempt to delete an item which wasn't there had no effect; with tables we give the delete operation a boolean return value, which is set to false if an attempt is made to delete an item which isn't in the collection, and to true otherwise. The member operation of sets is replaced by a retrieve operation, which takes a key as its argument and returns the data item in the collection which has that key. If no data item in the collection has the given key, it returns the value null, which is possible since any variable of an object type can be set to null.

In order to display the content of a table we gave a method print which took a PrintStream as its argument; this can be connected to a file or to the screen output. PrintStream is defined in the java.io package, so an import statement is needed in the interface file to allow the reference to it.

As tables resemble sets they can be implemented using the data structures and algorithms we used to implement sets. For example, Table1 shows an implementation using the array-and-count technique. Since the collection contains KeyObjects rather than the int values we used previously, we use the equalKey method from the KeyObject class where previously we used ==. In those implementations, such as an ordered binary tree, which depend on an ordering of the data items, we use the lessThan method from KeyObject where we used < with int values. An implementation of the interface Table which uses an unordered linked list can be found in Table3; an implementation which uses an ordered binary tree can be found in Table4.
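
As an illustration of the array-and-count idea (this is not the actual Table1 code), here is a reduced version which stores bare string keys; where the course code calls equalKey on KeyObject items, this sketch simply calls equals on strings, but the delete-returns-boolean and retrieve-returns-null semantics are the same.

```java
// A reduced array-and-count table storing bare string keys.
class MiniTable {
    private String[] keys = new String[100];
    private int count = 0;

    // Add a key; assumes the array is not full and the key not present.
    void add(String key) { keys[count++] = key; }

    // Linear search, comparing with equals rather than ==.
    private int find(String key) {
        for (int i = 0; i < count; i++)
            if (keys[i].equals(key)) return i;
        return -1;
    }

    // Returns the stored key, or null if absent, as retrieve does.
    String retrieve(String key) {
        int i = find(key);
        return i < 0 ? null : keys[i];
    }

    // Returns false on an attempt to delete an absent key.
    boolean delete(String key) {
        int i = find(key);
        if (i < 0) return false;
        keys[i] = keys[--count]; // move the last key into the gap
        return true;
    }
}
```

Note that deletion does not shift the whole array along: since a table has no concept of position, moving the last item into the gap is enough.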

In order to demonstrate practical use of tables, we gave a class Student which extends KeyObject, and a front-end in StudentDatabase1 which provides a simple student database maintenance program. For simplicity we used a student's surname as the key, although this is not really a good idea as several people may have the same surname. The constructor for objects of type Student took a BufferedReader as its argument, enabling Student objects to be constructed directly from file or screen input. The date of birth and marks for a student were defined by nested classes, with a public method addMark in class Student which changes the student record to add a new mark. The class Mark inside class Student gives its own addMark method used to change the Mark object which forms part of a Student object.

We saw implementations of class Table using arrays, linked lists and trees, similar to the previous implementations of sets. However, the most efficient implementation of a set of integers was to use an array of booleans, with a[i] set to true if i is in the set and to false otherwise. The advantage of this implementation was that it was very fast in terms of time, since given the value i we could access a[i], and hence find whether i was in the set it represented, in one step. This raises the question of whether there's an equivalent implementation for tables.
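
The boolean-array set recalled here can be sketched in a few lines (assuming the set only ever holds integers in the range 0 to n-1; the class name is invented for this sketch):

```java
// A set of integers in the range 0..n-1, represented by an
// array of booleans: a[i] is true exactly when i is in the set.
class BooleanArraySet {
    private boolean[] a;

    BooleanArraySet(int n) { a = new boolean[n]; }

    void add(int i) { a[i] = true; }

    void remove(int i) { a[i] = false; }

    // One-step membership test: just index the array.
    boolean member(int i) { return a[i]; }
}
```

The membership test is a single array access, which is what makes this implementation so fast; the question for tables is whether a key can be turned into an array index in a similarly direct way.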

One possibility would be to have an array so large that there is a separate cell for every possible key. We can convert keys which contain a mixture of numeric and alphabetic characters to integers by, for example, treating them as numbers written in base 36, with the letters A to Z representing the base 36 digits 10 to 35. However, it can easily be seen that for many practical examples there would be far too many possible keys to be accommodated this way. Instead, we use an array of some manageable length n, convert the key to an integer, and have a function which takes this integer and returns an integer in the range 0 to n-1. This function is known as a hash function. An item with a particular key is then stored in the cell of the array indexed by the hash function applied to the integer equivalent of the key. A table implemented in this way is known as a hash table.
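
The base-36 conversion and a simple modulus hash function might be sketched as follows (the names toInteger and hash are illustrative, not from the course code, and keys are assumed to contain only digits and upper-case letters):

```java
class Hashing {
    // Treat the key as a number written in base 36:
    // '0'-'9' are the digits 0-9, 'A'-'Z' the digits 10-35.
    // Note: a long overflows for keys of more than about 12
    // characters; real hash functions fold the value as they go.
    static long toInteger(String key) {
        long value = 0;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            int digit = (c >= '0' && c <= '9') ? c - '0' : c - 'A' + 10;
            value = value * 36 + digit;
        }
        return value;
    }

    // Map the integer form of the key into an array index 0..n-1.
    static int hash(String key, int n) {
        return (int) (toInteger(key) % n);
    }
}
```

Whatever the key, the result of hash is always a valid index into an array of length n.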

The problem with this is that we cannot guarantee that two different keys won't give the same index when the hash function is applied to their integer equivalent. When this happens it is known as a collision. A good hash function will ensure that there is as far as possible an equal chance for any array index that a key will hash into it, so the data is distributed evenly across the array and the chances of a collision are minimised. However, it is still necessary to deal with collisions when they do occur.

One technique used is referred to as linear probing. Here, if the cell an item should be stored in is occupied, it is stored in the next free cell. When searching for an item, the cell it should be in is looked at first, but if that cell is occupied by an item with another key, the next cell is looked at, and the same principle applies. Note that if an item is deleted, a special marker needs to be put in its place, in case items that hashed to its cell were displaced to later cells: the marker ensures that a later search does not stop at the cell where the deleted item was.
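
The probing and deletion-marker logic can be sketched like this, again over bare string keys rather than the full KeyObject items the course's Table6 uses; the class and marker names are invented for the sketch, and for simplicity it assumes the array never becomes completely full.

```java
// Linear probing over an array of string keys, with a DELETED
// marker so that searches do not stop at removed cells.
class ProbeTable {
    private static final String DELETED = "<deleted>"; // special marker
    private String[] cells;

    ProbeTable(int n) { cells = new String[n]; }

    private int hash(String key) {
        return Math.abs(key.hashCode()) % cells.length;
    }

    // Store the key in its hash cell, or the next free cell after it.
    void add(String key) {
        int i = hash(key);
        while (cells[i] != null && cells[i] != DELETED)
            i = (i + 1) % cells.length; // assumes the table is not full
        cells[i] = key;
    }

    // Search from the hash cell onwards, skipping occupied and
    // deleted cells, stopping only at a genuinely empty cell.
    boolean member(String key) {
        int i = hash(key);
        while (cells[i] != null) {
            if (cells[i] != DELETED && cells[i].equals(key)) return true;
            i = (i + 1) % cells.length;
        }
        return false;
    }

    // Replace the deleted key with the marker, not with null.
    boolean delete(String key) {
        int i = hash(key);
        while (cells[i] != null) {
            if (cells[i] != DELETED && cells[i].equals(key)) {
                cells[i] = DELETED;
                return true;
            }
            i = (i + 1) % cells.length;
        }
        return false;
    }
}
```

If delete set the cell to null instead of DELETED, a later search for a displaced key could stop too early and wrongly report the key absent.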

Another technique is known as separate chaining. Here each cell in the hash table array refers to a linked list of data items. So long as collisions are rare, the linked lists will not be long, so there will still be few steps required to find an item in the table given its key.
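
A minimal sketch of separate chaining follows (again over bare string keys, with invented names): each array cell holds the head of a linked list of the keys that hash to that cell.

```java
class ChainTable {
    // Cells of the per-bucket linked lists.
    private static class Cell {
        String key;
        Cell next;
        Cell(String key, Cell next) { this.key = key; this.next = next; }
    }

    private Cell[] buckets;

    ChainTable(int n) { buckets = new Cell[n]; }

    private int hash(String key) {
        return Math.abs(key.hashCode()) % buckets.length;
    }

    // Prepend the new key to the chain for its bucket.
    void add(String key) {
        int i = hash(key);
        buckets[i] = new Cell(key, buckets[i]);
    }

    // Search only the chain for the key's own bucket.
    boolean member(String key) {
        for (Cell c = buckets[hash(key)]; c != null; c = c.next)
            if (c.key.equals(key)) return true;
        return false;
    }
}
```

A search never looks beyond the one chain the key hashes to, so as long as the chains stay short, lookup stays fast.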

An implementation of the interface Table which uses a hash table with linear probing can be found in Table6; an implementation which uses a hash table with separate chaining can be found in Table7.

Further notes on the topics covered this week can be found in the local notes and resources under the headings Tables, student database part 1 and student database part 2.

Week 12: 4/5 April

The time for the Monday lecture was given over to the second term-time test.

The first Tuesday lecture was on the final topic for this course, which is enumerations. In our definition of the abstract data type Table last week, we suggested a method which prints the contents of the table in textual form, but we said that it was too specific. What we want is a way of going through the contents of a table in a general way so that we can do with them whatever we like. It might be writing a textual representation to a file, but it might be anything else. For example, in the lab exercise we had this week, we wanted to go through the contents of a table used to store a database of student records in order to find the records of all students whose mark fell below some pass mark. There is no way of doing this in the definition of Table we gave, apart from printing the contents of the table to a file and then re-reading the records, which is inconvenient and inefficient.

An enumeration is an object which is obtained from a collection object and enables us to go through the contents of the collection object one at a time. Java has an interface, Enumeration, which defines such objects; it is provided as part of the java.util package which comes with the Java language. It is defined through just two methods: nextElement and hasMoreElements. Each time the method nextElement is called on an object of type Enumeration, a new element from the collection it was created from is returned, until all the elements from that collection have been returned (attempting to call nextElement after that will result in an exception being thrown). The method hasMoreElements returns a boolean value, true if there are some remaining elements that have not yet been returned by a call to nextElement, and false otherwise.
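
For example, java.util.Vector (a built-in collection class) hands out its contents as an Enumeration through its elements method, and the standard loop pattern for consuming any Enumeration looks like this:

```java
import java.util.Enumeration;
import java.util.Vector;

class EnumDemo {
    // Count the elements an Enumeration delivers, using only the
    // two methods of the interface.
    static int countElements(Enumeration e) {
        int n = 0;
        while (e.hasMoreElements()) {
            Object item = e.nextElement(); // next item from the collection
            n++;
        }
        return n;
    }
}
```

The same loop works no matter which class the Enumeration object actually comes from, which is the whole point of the interface.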

Note this is our first (and only, in this course) example of the common technique in Java of implementing a built-in interface. The advantage is that this provides us with a standard way of referring to a particular type of object. We can write our own classes which implement the Enumeration interface, and then write code which uses objects of these classes as if they were of type Enumeration. Note also that in recent versions of Java, the type Enumeration has been superseded by a similar type called Iterator. This means that Java programmers are encouraged to use Iterator rather than Enumeration. In general, however, once a feature has been added to a programming language it can't be taken away in later versions, as that would cause programs written assuming the old version to stop working. So the definition of Enumeration is still in Java, and we use it for the purpose of illustration as it is slightly simpler than the definition of Iterator.

We gave a Java interface for tables, ETable, which included a method getEnumeration which returned an enumeration of a table. An enumeration for a particular implementation of a table is given by an inner class declared inside the class of the table. As the inner class is not static, an object of this class has access to the data fields of the class the object is created in.

We looked at a simple example, in ETable1, which shows a table implemented with an array and count which provides an enumeration. The enumeration object created by calling getEnumeration on a particular ETable1 object has access to the array and count of that ETable1 object, and has its own index variable which gives the position in the array of the next object from the table to be returned by a nextElement call. When that call is made, the object is returned and the index increased by one. The hasMoreElements test involves comparing the index variable against the count variable: when the index reaches the value of the count variable, every item of data stored in the table has been returned.
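
The structure might be sketched as follows, with the data simplified to strings rather than keyed items (the class names here are invented; the real ETable1 differs in detail):

```java
import java.util.Enumeration;
import java.util.NoSuchElementException;

// An array-and-count collection whose inner (non-static) class
// implements java.util.Enumeration.
class ArrayStore {
    private String[] items = new String[100];
    private int count = 0;

    void add(String s) { items[count++] = s; }

    Enumeration getEnumeration() { return new TableEnumeration(); }

    // Non-static, so each TableEnumeration object can see the
    // items array and count of the ArrayStore it was created in.
    private class TableEnumeration implements Enumeration {
        private int index = 0; // position of the next item to return

        public boolean hasMoreElements() { return index < count; }

        public Object nextElement() {
            if (index >= count) throw new NoSuchElementException();
            return items[index++];
        }
    }
}
```

Each call to getEnumeration produces a fresh enumeration object with its own index, so the same store can be enumerated several times independently.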

Enumeration objects for other implementations of tables may be more complex. In ETable3 we had an implementation of a table using an ordered binary tree. The enumeration needs to go through the items of the ordered binary tree one at a time, each call of nextElement returning the next one. To accomplish this, the technique of using a stack to traverse a tree, as covered in week 10, is used. In order to build the stack, the inner class for the enumeration needs itself to have an inner class for the cells of the linked list of the stack. This shows that it is possible to have layers of nesting of classes in Java - here a class inside a class inside a class.
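
A sketch of the idea follows, with the tree storing bare strings rather than keyed items; the stack is hand-built from linked cells, as in the lectures, and the class names are invented for this illustration rather than taken from ETable3.

```java
import java.util.Enumeration;
import java.util.NoSuchElementException;

class TreeStore {
    private static class Node {
        String value;
        Node left, right;
        Node(String v) { value = v; }
    }

    private Node root;

    void add(String v) { root = insert(root, v); }

    private Node insert(Node t, String v) {
        if (t == null) return new Node(v);
        if (v.compareTo(t.value) < 0) t.left = insert(t.left, v);
        else t.right = insert(t.right, v);
        return t;
    }

    Enumeration getEnumeration() { return new TreeEnumeration(); }

    // The enumeration keeps an explicit stack of nodes still to visit.
    private class TreeEnumeration implements Enumeration {
        // A class inside a class inside a class: the stack cells.
        private class StackCell {
            Node node;
            StackCell next;
            StackCell(Node n, StackCell s) { node = n; next = s; }
        }

        private StackCell top;

        TreeEnumeration() { pushLeftSpine(root); }

        // Push a node and all its left descendants, so the smallest
        // remaining item is always on top of the stack.
        private void pushLeftSpine(Node t) {
            for (; t != null; t = t.left) top = new StackCell(t, top);
        }

        public boolean hasMoreElements() { return top != null; }

        public Object nextElement() {
            if (top == null) throw new NoSuchElementException();
            Node n = top.node;
            top = top.next;
            pushLeftSpine(n.right); // the right subtree comes next
            return n.value;
        }
    }
}
```

Because the traversal is in-order, the enumeration delivers the items in ascending key order, which is the point made below about preferring an ordered binary tree when ordered enumeration matters.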

In ETable4 we have a table which can provide an enumeration, with the table implemented as a hash table with linear probing, while in ETable5 we have something similar, but with the hash table using separate chaining. The separate chaining example shows a class with two nested subclasses: a static one to form the cells that are chained together, and a non-static one to construct the enumeration. We noted that as the whole point of hash tables is that the data is scattered in the array seemingly at random, an enumeration which goes through the data in the hash table in the order of the array will seem to go through the items in no particular order. We noted therefore that if we have a case where we want a table but often have to go through the items in it in the order of their key, then it would be better to implement the table using an ordered binary tree, as this enables us easily to produce an enumeration which goes through the items in order.

In the final lecture, we went through the questions in the second test. We covered again the crucial idea of an abstract data type, since it was obvious from looking through the test papers that many students are failing because they have not grasped this fundamental concept.

The idea of an abstract data type is that it is defined only by the operations possible on it, which in Java is done by a class where the only operations possible on it are to call its public methods. Objects of the class will have a private data structure in them, and there is a correspondence between particular values of this data structure and the abstract concept they are used to represent. The methods of the class manipulate the private data structure to change it or construct new objects representing a change to the abstract concept given by the method. But only the methods in the class access and manipulate its private data structure. Question 2 of the test was a good example of this - it was about the abstract data type List, and not about linked lists. An object of the type List may well have a linked list data structure inside it, but List objects can only be manipulated by the public methods of the class, which the question gave as head, tail, cons, isEmpty and empty. If you are asked to write code to do things with objects of type List, you should not make reference to the type Cell or to first and next fields, as these are aspects of a possible implementation of the type rather than the public operations it provides. This was covered in the week 9 lectures and the week 10 lab exercise.

Question 5 of the test was about the abstract data type sequence, which is a list with an insertion point. This means that a class has to be provided which gives the sequence operations, which are add, delete, backwards and forwards. Inside the class there is a data structure, and there is a correspondence between configurations of the data structure and the abstract idea of a particular sequence value. Many answers to this question showed little understanding of this concept; for example, many people suggested "tree" as a possible implementation of sequence, which it is not, as there is no correspondence between a particular value of a tree and a particular value of a sequence.


Matthew Huntbach
Last modified: 16 February 2011