The List Abstract Data Structure

Linked cells data structures

In the student database part 2 example we saw objects which stored a list of students. The objects could be accessed and modified only by the public methods in their class: insert, retrieve, delete and list. Outside the code which implemented the database, it did not matter whether the database was implemented by a linked list or by an array, it was not even possible to tell.

An Abstract Data Type is a type which is defined in terms of the operations possible on it, and not in terms of how it is represented in the computer. Providing we make the internal representation of an object private, it will always be an abstract data type - we can change the internal representation without having to change anything that uses it. When programming in any language, keeping to the principal of abstract data types is recommended, but it can be up to the programmer to ensure it. In an object-oriented language like Java, with its ability to define object components as private, the idea of abstract data types comes so naturally that it is hardly necessary to have to think about it.

In this section of the notes, we shall use the term abstract data structure in the place of abstract data type. The reason for this is that we are interested in the actual structure used to store data. The most simple abstract data structure is the List. A list is defined by the methods possible on it, which are head, tail, cons and isempty. A list is an ordered linear structure. Its head is the first element in line, its tail is the list consisting of all elements except the first. The cons of an element to a list is the list whose head is that element, and whose tail is the original list. A list may be empty, in which case it is an error to try to find its head or tail, but an element may be consed to it to get a new list. The method isempty tests whether a list is empty like this or not.

So, for example, we may write a list [a,b,c,d,e]. As previously, this is just an annotation we choose to use, it is not supported by the Java system. If the list L is [a,b,c,d,e], then the head of L is a, and the tail of L is [b,c,d,e]. If we cons x to L we get a new list [x,a,b,c,d,e]. The list L is obviously not empty.

As we did with the student database, we shall give the methods which define a list by a Java interface. Here it is:

interface List
{
 public boolean isempty();
 public Object head();
 public List tail();
 public List cons(Object obj);
}
Like the type Database in the student database example, here the list stores objects of type Object. As we noted back in the dates example: part 3, the type Object is a general type which is a supertype of all objects. So this declares a list as capable of storing objects of any type.

The code for this and the other classes given in this web page can be found in directory:

/import/teaching/BSc/1st/ItP/lists

It is important to distinguish between an abstract list, which is simply an object on which you can do the head, tail, cons and isempty operations on, and a linked list which is one way of implementing such an object. Here is a class LinkedList which implements the interface List using the linked list method we saw in the student database part 2 example:

class LinkedList implements List
{
 private Cell myList;

 LinkedList()
 {
  myList=null;
 }

 private LinkedList(Cell list)
 {
  myList=list;
 }

 public boolean isempty()
 {
  return myList==null;
 }

 public Object head()
 {
  return myList.first;
 }

 public List tail()
 {
  return new LinkedList(myList.rest);
 }

 public List cons(Object obj)
 {
  return new LinkedList(new Cell(obj,myList));
 }

 public static List empty()
 {
  return new LinkedList(null);
 }

 private static class Cell
 {
  Object first;
  Cell rest;

  Cell(Object h,Cell t)
  {
   first=h;
   rest=t;
  }
 }

}

As you can see, the class LinkedList has its own inner class called Cell which actually gives the linked structure. The class LinkedList has methods defined which implement each of the List methods, plus an additional method empty which is used to return a new empty list. The methods are not static, so they are called in an object oriented way. So, for example, if L represents [a,b,c,d] then L.head() represents a, L.tail() represents [b,c,d], and L.cons(x) represents [x,a,b,c,d] (where x represents x). In functional or procedural languages, you would find these written head(L), tail(L) and cons(x,L).

The inner class Cell is identical to the inner class StudentList in the student database examples, except that each cell stores data of type Object rather than of type Student, and the data and next fields have been named first and rest rather than head and tail in order to prevent confusion with the head and tail methods of class LinkedList.

Now we have an implementation for lists, we can get a runnable program by having a front-end class (i.e. one with a main method to provide a user interface) which shows some use of lists:

     1  import java.io.*;
     2  
     3  class UseLists0
     4  {
     5  
     6   public static void main(String [] args) throws IOException
     7   {
     8    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
     9    List list1,list2;
    10    String str1,str2;
    11    System.out.print("Enter list 1: ");
    12    str1=in.readLine();
    13    list1=stringToList(str1);
    14    System.out.print("Enter list 2: ");
    15    str2=in.readLine();
    16    list2=stringToList(str2);
    17    System.out.print("\nList 1 is: ");
    18    listPrint(list1);
    19    System.out.print("\nList 2 is: ");
    20    listPrint(list2);
    21    System.out.print("\n\nAppending list 1 to list 2 is: ");
    22    listPrint(append(list1,list2));
    23    System.out.print("\nMerging list 1 with list 2 is: ");
    24    listPrint(merge(list1,list2));
    25    System.out.println();
    26    
    27   }
    28
    29   private static List empty()
    30   {
    31    return LinkedList.empty();
    32   }
    33
    34   public static List stringToList(String str)
    35   {
    36    List list = empty();
    37    for(int i=str.length()-1;i>=0;i--)
    38       list=list.cons(new Character(str.charAt(i)));
    39    return list;
    40   }
    41
    42   public static void listPrint(List list)
    43   {
    44    Character Ch;
    45    if(!list.isempty())
    46       {
    47        Ch=(Character)(list.head());
    48        System.out.print(Ch.charValue());
    49        listPrint(list.tail());
    50       }
    51   }
    52
    53   public static List append(List l1,List l2)
    54   {
    55    if(l1.isempty())
    56       return l2;
    57    else
    58       {
    59        List a=append(l1.tail(),l2);
    60        return a.cons(l1.head());
    61       }
    62   }
    63
    64   public static List merge(List l1,List l2)
    65   {
    66    if(l1.isempty())
    67       return l2;
    68    else if(l2.isempty())
    69       return l1;
    70    else
    71       {
    72        List m=merge(l1.tail(),l2.tail());
    73        return m.cons(l2.head()).cons(l1.head());
    74       }
    75   }
    76
    77  }
Here the lists are both input and output as a string of characters, so the list [a,b,c,d], where a represents the actual character a for example, is read and written as just abcd. The only actual reference to the implementation of list using linked lists is on line 31 where the method empty which returns a new empty list has to show what sort of list implementation is used. This is the only thing that ties in the class UseLists0 with the class LinkedList. The class UseLists0 gives two examples of methods which use lists, append on lines 53-62, and merge on lines 64-75. Both use recursion, and since a list is a recursive data structure (i.e. a list is defined in terms of a list: it is either an empty list or a structure consisting of an object and a list), it is natural to write programs for it using recursion. The method append joins two lists (or strictly, since it is constructive it takes two lists and creates a new one consisting of the two input lists joined together), and the method merge creates a new list from two lists by taking in turn one element from each.

Wrapper Classes

The class UseLists0 shows an example of the use of a wrapper class, also known as an envelope class. On lines 38, 44 and 47 you will see the word Character. Character is a wrapper class for the primitive type char. Each primitive type has a corresponding wrapper class, for int it is Integer, for example. The reason for this is that although something of type Object can store an object of any type, primitive type values are not objects. A wrapper class object is an object which stores the corresponding primitive value, but being an object can be stored in something of type Object. The value new Character(ch) where ch is of type char gives an object which stores the value of ch. On line 38 it is used to convert the char values in a string to Character values to be stored in a list. If an object Ch is of type Character, then Ch.charValue is the equivalent char and this is used on line 48 to convert the Characters from the list to chars for printing.

On line 47 note the use of the type conversion (Character). This is needed because if L is a List, then its head, L.head() is of the general type Object. The type conversion turns it back into the more specific type Character.

Different implementations of the same abstract data structure

Linked lists are generally the most sensible way to implement the abstract data type list, but to show it is not necessary to implement them in this way, and to illustrate the general principle that the implementation of an abstract data type does not affect its behaviour, we give an alternative implementation here. Here is the linked list representation of the abstract list [a,b,c,d,e]:

It is easy to see that the head and tail methods in class LinkedLists give the output of a and the representation of [b,c,d,e] on this.

Here is an alternative implementation of lists using arrays rather than linked lists:

     1  class ArrayList implements List
     2  {
     3   private Object[] myArray;
     4
     5   ArrayList()
     6   {
     7    myArray = new Object[0];
     8   }
     9
    10   private ArrayList(Object[] array)
    11   {
    12    myArray=array;
    13   }
    14
    15   public boolean isempty()
    16   {
    17    return myArray.length==0;
    18   }
    19
    20   public Object head()
    21   {
    22    return myArray[0];
    23   }
    24
    25   public List tail()
    26   {
    27    Object[] array = new Object[myArray.length-1];
    28    for(int i=1;i<myArray.length;i++)
    29       array[i-1]=myArray[i];
    30    return new ArrayList(array);
    31   }
    32
    33   public List cons(Object obj)
    34   {
    35    Object[] array = new Object[myArray.length+1];
    36    array[0]=obj;
    37    for(int i=0;i<myArray.length;i++)
    38       array[i+1]=myArray[i];
    39    return new ArrayList(array);
    40   }
    41
    42   public static List empty()
    43   {
    44    return new ArrayList();
    45   }
    46
    47 }
To use this implementation in the place of the linked list implementation in UseLists0, all that is needed is for line 31 in UseLists0.java to be changed from return LinkedList.empty(); to return ArrayList.empty();. When the program is run it will behave exactly the same as when it was run with lists implemented by linked lists, the only difference being that it might take up different amounts of memory or take a different time to execute (on a small example like this that would not be noticeable).

In the array implementation a list [a,b,c,d,e] is actually represented as an array, declared as a private field with name myArray on line 3, where myArray[0] stores a, myArray[1] stores b, myArray[2] stores c and so on. The length of the array is the same as the length of the list. As the code for tail on lines 25-31, and for cons on lines 33-40 show, this implementation is more time consuming and complex, since these methods require the creation of an entirely new array, and the copying of the information from the initial array into this new array.

Further list examples

It would be better if lists could be read and written in the conventional manner with square brackets to open and close them, and the separate elements separated by commas. We can ensure lists are output in this way by adding a toString method to the LinkedList (or ArrayList) class. Here is such a method (which makes use of a private method that also has to be added to the class):
 public String toString()
 {
  if(isempty())
     return "[]";
  else
     return "["+head()+restToString(tail());
 }

 private static String restToString(List L)
 {
  if(L.isempty())
     return "]";
  else
     return ","+L.head()+restToString(L.tail());
 }
Reading in a list written in this way is rather more tricky, particularly if we do not know what type the elements of the list are. Here is a class which gives one solution:
     1  import java.io.*;
     2  import java.util.*;
     3
     4  class ListRead
     5  {
     6
     7   public static List readCharList(BufferedReader in) 
     8   throws CharListFormatException,IOException
     9   {
    10    StringTokenizer chars;
    11    String token,str=in.readLine().trim();
    12    List l1,l2;
    13    if(str.charAt(0)!='['||str.charAt(str.length()-1)!=']')
    14       throw new CharListFormatException();
    15    str=str.substring(1,str.length()-1);
    16    chars = new StringTokenizer(str,","); 
    17    l1=LinkedList.empty();
    18    while(chars.hasMoreTokens())
    19       {
    20        token=chars.nextToken();
    21        if(token.length()!=1)
    22           throw new CharListFormatException();
    23        l1=l1.cons(new Character(token.charAt(0)));
    24       }
    25    l2=LinkedList.empty();
    26    while(!l1.isempty())
    27       {
    28        l2=l2.cons(l1.head());
    29        l1=l1.tail();
    30       }
    31    return l2;
    32   }
    33
    34   public static List readIntList(BufferedReader in) 
    35   throws IntListFormatException,IOException
    36   {
    37    StringTokenizer ints;
    38    String token,str=in.readLine().trim();
    39    List l1,l2;
    40    int n;
    41    if(str.charAt(0)!='['||str.charAt(str.length()-1)!=']')
    42       throw new IntListFormatException();
    43    str=str.substring(1,str.length()-1);
    44    ints = new StringTokenizer(str,","); 
    45    l1=LinkedList.empty();
    46    while(ints.hasMoreTokens())
    47       {
    48        token=ints.nextToken();
    49        try {
    50             n=Integer.parseInt(token);
    51            }
    52        catch(NumberFormatException e)
    53            {
    54             throw new IntListFormatException();
    55            }
    56        l1=l1.cons(new Integer(n));
    57       }
    58    l2=LinkedList.empty();
    59    while(!l1.isempty())
    60       {
    61        l2=l2.cons(l1.head());
    62        l1=l1.tail();
    63       }
    64    return l2;
    65   }
    66   
    67 } 
The problem of reading in a list is simplified by having a method in this class to read lists of only characters, and another to read lists of only integers. The methods throw an exception if they read something which is not a correct list of characters in the one case and integers in the other, so the following classes are needed to give the exceptions:
class CharListFormatException extends Exception
{
}

class IntListFormatException extends Exception
{
}
The way the methods work is to read in a whole line of text as a string, trimming off any blank characters at either end (done on lines 11 and 38 above). Then a check is made to see that the first and last characters of the trimmed string are the opening and closing brackets (done on lines 13 and 41). If not, a format exception is thrown, otherwise the substring which removes the brackets is created (on lines 15 and 43). A string tokenizer using a comma as a separator is used to break the string into its component integers or characters. A format error is thrown if when checking the component parts of the character array one is a string with length more than one (i.e. not a single character) on line 21. If when checking the component parts of an array of integers, one will not parse to an integer, the NumberFormatException thrown when the parse fails is caught (on line 52) and an IntListFormatException is thrown.

Here is a program uses the method for reading lists of characters:

import java.io.*;

class UseLists1
{

 public static void main(String [] args) throws IOException
 {
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  List list1,list2;
  int n;
  for(;;) try
     {
      System.out.print("Enter list of characters 1 : ");
      list1=ListRead.readCharList(in);
      break;
     }
  catch(CharListFormatException e)
     {
      System.out.println("Not a valid list of characters");
     }
  for(;;) try
     {
      System.out.print("Enter list of characters 2: ");
      list2=ListRead.readCharList(in);
      break;
     }
  catch(CharListFormatException e)
     {
      System.out.println("Not a valid list of characters");
     }
  System.out.print("\nList 1 is: "+list1);
  System.out.print("\nList 2 is: "+list2);
  System.out.print("\n\nAppending list 1 to list 2 is: ");
  System.out.println(ListOps.append(list1,list2));
  System.out.print("\nMerging list 1 with list 2 is: ");
  System.out.println(ListOps.merge(list1,list2));
  System.out.println();

  for(;;) try
     {
      System.out.print("Enter an integer: ");
      n=Integer.parseInt(in.readLine());
      break;
     }
  catch(NumberFormatException e)
     {
      System.out.println("Not a valid number");
     }
  System.out.print("The first "+n+" elements of list 1 are: ");
  System.out.println(ListOps.take(n,list1));
  System.out.print("\nList 2 less its first "+n+" elements is: ");
  System.out.println(ListOps.drop(n,list2));
  System.out.println();
 }

}
Note this program puts the operations which use lists in a separate class called ListOps. Here is that class:
class ListOps
{

 private static List empty()
 // Returns a new empty list
 {
  return LinkedList.empty();
 }

 public static List append(List l1,List l2)
 // Joins its two input lists together
 {
  if(l1.isempty())
     return l2;
  else
     {
      List a=append(l1.tail(),l2);
      return a.cons(l1.head());
     }
 }

 public static List merge(List l1,List l2)
 // Merges its two input lists, taking an object from each
 // one alternatively
 {
  if(l1.isempty())
     return l2;
  else if(l2.isempty())
     return l1;
  else
     {
      List m=merge(l1.tail(),l2.tail());
      return m.cons(l2.head()).cons(l1.head());
     }
 }

 public static List take(int n,List L)
 // Gives the list consisting of the first n elements of L
 {
  if(n==0||L.isempty())
     return empty();
  else
     return take(n-1,L.tail()).cons(L.head());
 }

 public static List drop(int n,List L)
 // Gives the list consisting of the elements of L less its
 // first n elements
 {
  if(n==0||L.isempty())
     return L;
  else    
     return drop(n-1,L.tail());
 }

}
It makes sense to build up a library of methods which operate on lists like this. Note that since these methods deal with lists of Objects, they may be used with lists whose elements are of any type. In fact, it is possible to build up lists whose elements are of mixed type. The following program demonstrates this:
import java.io.*;

class UseLists2
{

 public static void main(String [] args) throws IOException
 {
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  List list1=null,list2=null;
  for(;;) try 
     {
      System.out.print("Enter list of characters: ");
      list1=ListRead.readCharList(in);
      break;
     }
  catch(CharListFormatException e)
     {
      System.out.println("Not a valid list of characters");
     }
  for(;;) try 
     {
      System.out.print("Enter list of integers: ");
      list2=ListRead.readIntList(in);
      break;
     }
  catch(IntListFormatException e)
     {
      System.out.println("Not a valid list of integers");
     }
  System.out.print("\nList 1 is: "+list1);
  System.out.println("\nList 2 is: "+list2);
  System.out.print("\nAppending list 1 to list 2 is: ");
  System.out.println(ListOps.append(list1,list2));
  System.out.print("Merging list 1 with list 2 is: ");
  System.out.println(ListOps.merge(list1,list2));
 }

}
Although it reads in a separate list of characters and a list of integers, it uses append and merge to construct lists which contain a mixture of characters and integers.

Writing recursive methods to build lists

The ListOps class given above shows four methods: append, merge, take and drop each of which builds a new list. It is important to note that lists as defined here are an immutable data type, so it is not possible to get a list object to change the data it stores using its own methods. A variable of type List may change the data it refers to only by assigning a new value of type List to it. So any methods which operate on lists have to be constructive. We may talk of, for example, append joining two lists together, but it does not actually do this, it creates a new list which as the value of the two argument lists to append joined together.

Writing such a recursive method generally involves thinking about the problem by considering what happens when you apply the recursive method to the tail of the input list. If a logical relation can be given which links the desired output with the output of the recursive operation, a method can be built up. It often helps to start with an example and then generalise. For example, suppose we want to append [a,b,c] to [x,y,z]. The result should be [a,b,c,x,y,z]. Thinking about it recursively, if we append [b,c] (the tail of the first input) to [x,y,z] (the second input), we get [b,c,x,y,z]. This is almost the output we want, in fact all we need do is append the head of the first input to it to get it. This gives us the recursive part of the append method, which just generalises from this example:

      List a=append(l1.tail(),l2);
      return a.cons(l1.head());
We could in fact express this as one statement:
      return append(l1.tail(),l2).cons(l1.head());
There's no right or wrong way of doing this - you should use whatever seems clearest to you when deciding whether or not to break up some long value into several assignment statements forming partial values. Note that if we had started by considering what is the result of applying the recursive operation on the tails of both inputs (i.e. [b,c] and [y,z]) the result would have been [b,c,y,z], and since there is no obvious link between this and the desired output we would know to think again about the recursive step. In some other methods on two lists, merge is an example, the correct recursive step does take the tails of both inputs.

Having got the recursive part right, we need to consider the base case - what happens when we can't take the tail of the first input because it is empty. Obviously appending an empty list to a list L gives the list L as output. With this we can write our complete append method, as given above.

Now let us consider in a similar way, how to write the method for take. Suppose the arguments to take are 3 and [k,l,m,n,o,p,q]. Then the result should be [k,l,m]. If we try the recursive call with the same integer argument but the tail of the list argument i.e. 3 and [l,m,n,o,p,q] we get [l,m,n] which doesn't quite help us. Let us consider again, this time the recursive call with arguments the integer argument less one and the tail of the list argument, i.e. 2 and [k,l,m,n,o,p,q]. This time the result of the recursive call is [l,m], which is close to the [k,l,m] we want, so this looks like the recursive call we should use. Again the final result comes from appending the head of the input list to the result of the recursive call. We could do this in two steps (as before, generalising from the specific example to the general case):

List t=take(n-1,L.tail());
return t.cons(L.head());
or one step as was the case when the method was given above:
return take(n-1,L.tail()).cons(L.head());
As always we need to consider a base case. In fact there are two base cases. Since we both reduce the length of the list by one in the recursive call and reduce the integer by one, we hit the base case when either we have reached the point where the empty list is the list input or where 0 is the integer input. It is obvious that the list consisting of the first 0 elements of a list must be the empty list. We might consider that it's an error to try and take the first n elements from an empty list when n is greater than 0, but we might decide that it's logical to say that the result should also be the empty list, which is what we do in the example code.

Because recursive methods on lists are based on thinking about logical relations between method calls, they are often the best way to write list methods - it's easy to do once you've got the hang of it, and what the code does is very clear (again, once you've got the hang of using recursion). It is often possible to use iteration however. Beware that a loop which takes elements of a list and conses them to something else causes them to be reversed. So the following does not append two lists:

public static list append1(List l1,List l2)
{
 while(!l1.isempty())
    {
     l2=l2.cons(l1.head());
     l1=l1.tail();
    }
 return l2;
}
If you tried this with arguments [a,b,c] and [x,y,z], you would get [c,b,a,x,y,z]. The correct iterative code for append is:
public static list append2(List l1,List l2)
{
 List l3=empty();
 while(!l1.isempty()
    {
     l3=l3.cons(l1.head());
     l1=l1.tail();
    }
 while(!l3.isempty())
    {
     l2=l2.cons(l3.head());
     l3=l3.tail();
    }
 return l2;
}
where the elements of the first input list are first taken off and stacked on a temporary list. Note that though in these cases there are lines like
l1=l1.tail();
these do not change the values of the variables originally passed as parameters. If we call append2(m1,m2), l1 is initially a local name within append2 for m1, but after the first l1=l1.tail(), m1 remains a variable referring to its original data, but l1 starts referring to the tail of that data.
Matthew Huntbach

These notes were produced as part of the course Introduction to Programming as it was given in the Department of Computer Science at Queen Mary, University of London during the academic years 1998-2001.

Last modified: 25 Jan 1999