Lisp Lists

Lisp Lists

In this section we introduce another generic collection type, this time called LispList. This type is not provided as a standard part of Java, so to use it you will have to copy the file LispList.class from the location ~mmh/DCS128/lispLists/LispList.class, or just download it from the link. As a generic type, it should be thought of as having type LispList<E>, meaning that a full type needs to be made by combining LispList with some base type in the same way as we combined ArrayList with base types. So LispList<Integer> is the type "Lisp list of integers", LispList<String> is the type "Lisp list of strings" and so on. The code in the file LispList.class makes use of code from another file LispList$Cell.class, so you will need to copy this from the same directory to thesame directory as the file LispList.class, or download it from the link.

The name "Lisp list" comes from the programming language Lisp, which is one of the oldest programming languages in existence, having first been introduced in the 1950s. Lisp was the first example of a functional programming language, a more modern example of such a language is Haskell, although Lisp is still widely used. The name "Lisp" comes from "list processor", with lists being the fundamental data structure of the language. That is why we use the term "Lisp List" to name this data structure here.

The reason for considering programming with Lisp lists is that it is a particularly simple data structure, yet a very powerful one. Because of this, some fundamental aspects of algorithms and data structures are best illustrated by considering them in terms of Lisp lists. Some universities, for example Oxford, Cambridge and Imperial College in the UK and MIT in the States, use a functional language as the first language for teaching programming, believing it the best way to put across important issues which get buried in the detail of more widely used languages such as Java.

A Lisp list is a data structure which consists of an ordered collection of items, in that way it is like the arrayList we considered previously. However, arrayLists are mutable, that is there are operations on them which change their content, and are considered as an enhanced form of array, that is programming with them is centred round accessing their components using an integer index. Lisp lists are immutable and are not primarily manipulated using indexes. Only the following operations are provided:

Conventionally, we write lists enclosed with the square brackets [ and ], with the items in the list in order separated by commas. So, for example, [30,15,7,12] is the way we would write a list of type LispList<Integer> whose head is 30 and whose tail is [15,7,12]. If ls is a variable of type LispList<Integer> which refers to an object whose value we write as [30,15,7,12], then ls.head() will return 30 and ls.tail() will return a reference to an object whose value is written as [15,7,12]. The call ls.tail() does not change the object ls refers to, it remains with the value represented as [30,15,7,12], but the assignment ls1=ls.tail() would cause the variable ls1 to start referring to a new object which can be represented as [15,7,12]. Also the call ls.tail() could be passed directly as an argument to a method, or used directly to have a further method called on it. So, for example, ls2=ls.tail().tail() would cause ls2 to refer to a new object whose value can be written [7,12]. Similarly, ls.cons(8) does not change the object ls refers to, it continues to be one represented by [30,15,7,12], but ls3=ls.cons(8) would cause the variable ls3 to start referring to a new object whose value can be represented by [8,30,15,7,12]. The empty list is represented by [], an attempt to call the methods head or tail on a variable referring to an object representing the empty list would cause an exception to be thrown, but the cons operation can be called on such a variable.

Note that in Lisp the operations we call head and tail are called car and cdr respectively, this is for historical reasons connected with the original implementation of the language. In some data structures textbooks, the term "tail" is used to mean the last element of a list. It is unfortunate that these two different uses of "tail" have arisen with lists, but in these notes we will only ever use the term "tail" when applied to a list to mean "all but the first element". So Lisp lists are broken down using the operations head and tail, and assembled using the operation cons. We also need a way of getting an object representing the empty list, which in Java is done by a static method called empty. In general, when you call a static method from another class, you attach the call to the class name, but here we have the additional aspect of calling a generic static method. To create a new object of type LispList<Integer> representing an empty list, you use the call LispList.<Integer>empty(). The <Integer> after the dot which attaches the method call to the class name LispList gives a value to the type variable of this method, in this case setting it to Integer. But you can omit it if the value of the type variable can be worked out otherwise, for example if the value being returned by the call to the method empty is being assigned to a variable of type LispList<Integer> we know the type variable must be Integer so the call can be just LispList.empty().

To maintain the simplicity of the LispList generic type, there are no additional methods provided except for a toString method so that a suitable text representation of Lisp Lists is given, and an equals method to enable them to be compared for equality. There are no public constructors at all. Any other operations must be provided by writing static methods as required. The methods cons, tail and empty are the only way to construct new LispList objects. You can find code for Lisp lists in the directory ~mmh/DCS128/code/lispLists. To use the type LispList, you will need to copy two .class files into your directory, LispList.class and LispList$Cell.class. The second of these is a class which is only used internally by objects of type LispList.

Iterative methods on Lisp Lists

If the variable ls refers to an object of type LispList<T>, then ls.head() gives the first item in the list, ls.tail().head() gives the second item in the list, ls.tail().tail().head() gives the third item in the list, and so on. This is because the tail of a list is the list of every item except the first one, so the head of the tail must be the second item, and so on. A common way of dealing with lisp lists is to have a variable which is first set to a list, then set to the tail of the list, then to the tail of the tail, then to the tail of the tail of the tail, and so on. If the variable is set to its own tail, it will refer to the portion of the list with one less item at the front than it used to, and keep resetting it this way means it goes through the whole list with the head of the object referenced by the variable being each item in the original list in turn. When the variable refers to the empty list, it has gone through the whole list.

For example, suppose ls1 is of type LispList<Integer> and we want to add together all the integers in the list. The following code fragment will do this:

int sumSoFar=0;
for(LispList<Integer> ls2=ls1; !ls2.isEmpty(); ls2=ls2.tail())
   {
    int n = ls2.head();
    sumSoFar = sumSoFar+n;
   }
Note that, strictly, ls2.head() is of type Integer, not int, but assigning it to a variable of type int will convert it to the int equivalent, using the principle of "unboxing" we described previously.

After this code fragment has been executed, the variable count holds the value obtained by adding together all the integers in the list referred to by ls1, and ls1 continues to refer to the whole list. Suppose ls1 refers to the list [8,30,15,7,12]. Then the first time round the loop, ls2 also refers to [8,30,15,7,12], and l2.head() returns 8. The update statement in the for loop causes ls2 to be reset to refer to [30,15,7,12] on the second time round the loop, with l2.head() returning 30. The next time round the loop, ls2 refers to [15,7,12] and ls2.head() returns 15 and so on. This could be put into a static method:

static int sum(LispList<Integer> ls)
{
 int sumSoFar=0;
 for(; !ls.isEmpty(); ls=ls.tail())
    {
     int n = ls.head();
     sumSoFar = sumSoFar+n;
    }
 return sumSoFar;
}

Code for reading a Lisp list of integers and adding the integers in it can be found in the files UseLispLists1.java and UseLispLists2.java. The second of these shows the operation done through the static method given above. It demonstrates that, for the reasons we showed previously, although ls=ls.tail() is executed within the method, this does not change what the list variable used as an argument to the method refers to. The variable ls inside the method is a local variable, which is initially set to the object passed as the argument to the method, but if it is changed to refer to another object, the variable that was used as the argument to the method call stays referring to its original object. So a call to sum(ls) will return the sum of integers in the Lisp list referred to by ls, but will not change what ls refers to.

The code given in the file also includes a method parseIntLispList which takes a string and gives the Lisp list of integers it represents using the text representation given above. This uses some of the methods provided in the Java library we discussed previously.

The method sum here is not generic because it relies on the base type of the Lisp list it handles being Integer. However, we can consider generic methods for operations which do not rely on the base type. We have seen that for arrayLists there is a method provided by Java which gives the position of an item in an arrayList. Here is a static method which provides a similar operation for Lisp lists:

 public static <T> int indexOf(LispList<T> ls,T item)
 {
  for(int pos=0; !ls.isEmpty(); ls=ls.tail(),pos++)
     if(ls.head().equals(item))
        return pos;
  return -1;
 }
This gives the position of any item in a Lisp list of items of any type, so long as the type of the item and the base type of the Lisp list are the same. It uses the Java convention that counting positions starts at position 0 and returning -1 if the item does not occur in the list. It uses the technique we used previously of a for loop, where the exit test for the for loop is when the whole of the structure has been gone through and the item not found, and where there is a return statement in the for loop which is executed to return the position (and hence exit the for loop and the whole method) when it is found. This is a case of a for-loop with two update statements - ls=ls.tail() and pos++, changing what the list variable ls refers to and the value of pos giving the position in the original list of the current head of ls. It is possible to have more than one update statement in a for loop, if so the update statements are separated by commas.

An alternative way of programming the loop would be to have it exit either if the list becomes the empty list, or if the head of the list is the item whose position we are trying to find. Then -1 is returned if the list is the empty list, and the value of the position variable if it is not as that means the loop has been exited because the position of the item has been found. Then the body of the loop is empty, all it does is update and test. Here is the code for this:

 public static <T> int indexOf(LispList<T> ls,T item)
 {
  int pos=0;
  for(; !ls.isEmpty()&&!ls.head().equals(item); ls=ls.tail(),pos++) {}
  if(ls.isEmpty())
     return -1;
  else
     return pos;
 }
The files UseLispLists3.java and UseLispLists4.java give these two versions of the method, and support code to test them with lists of integers and lists of strings. A separate method parseStringLispList is needed to convert a string representing a list of strings into the actual Lisp list object. The text representation is the conventional one we used above, starting with [ and ending with ] with the strings separated by commas, spaces before and after the commas are not treated as part of the strings in the list, otherwise all characters are treated as characters in the strings.

Now let us consider the operation of changing all occurrences of one item to another, as we have already considered with arrays and arrayLists. Since Lisp lists are immutable, we can only do this constructively. A first attempt at such a change method might look rather like the one we used previously for a constructive change in arrayLists, going through the items of the argument collection, adding them to a new collection which is initially empty, but if the item is the one to be changed, adding the one it is to be changed to instead. With arrayLists, the cons method is the only way to construct a new list which represents adding an item to an existing list. This gives the following code:

 public static <T> LispList<T> change(LispList<T> ls1,T m,T n)
 {
  LispList<T> ls2 = LispList.empty();
  for(; !ls1.isEmpty(); ls1=ls1.tail())
     {
      T h = ls1.head();
      if(h.equals(m))
         ls2 = ls2.cons(n);
      else
         ls2 = ls2.cons(h);
     }
  return ls2;
 }
You will find this code and supporting code to demonstrate it in the file UseLispLists5.java. If you run this, you will find it makes the change as required, but returns the list in reverse order. If ls is set to represent the list written [2,12,4,17,21,4,9,10,4] and m is set to 4 and n is set to 50, then the call change(ls,m,n) will return a reference to a Lisp List object with textual representation [50,10,9,50,21,17,50,12,2].

The reason for this is that we can only take items from the front of a list and join them to the front of a list. If you repeatedly take items from the front of one list and join them to the front of another, you will end up reversing the list. If we want the list in the correct order, a simple way to do it would be to use the samer technique again to reverse the reversed list, so this would make the method:

 public static <T> LispList<T> change(LispList<T> ls1,T m,T n)
 {
  LispList<T> ls2 = LispList.empty();
  for(; !ls1.isEmpty(); ls1=ls1.tail())
     {
      T h = ls1.head();
      if(h.equals(m))
         ls2 = ls2.cons(n);
      else
         ls2 = ls2.cons(h);
     }
  LispList<T> ls3 = LispList.empty();
  for(; !ls2.isEmpty(); ls2=ls2.tail())
     ls3 = ls3.cons(ls2.head());
  return ls3;
 }
This code and supporting code to demonstrate it can be found in the file UseLispLists6.java. Note that ls3 = ls3.cons(ls2.head()) means the same as
   {
    T h = ls2.head();
    ks3 = ls3.cons(h);
   }
You can express this using a local variable to hold the value returned by ls2.head() then making that variable the argument to the call to cons, or you can make the call ls2.head() directly the argument to the call to cons. It makes no difference to the way the code executes when run in Java, deciding how to write it is just a matter of what you feel looks clearer.

Methods which are centred around the use of a loop are known as iterative. An iterative algorithm is a way of solving a problem which is best explained in terms of making repeated changes to some data until the desired condition is met. In the above example, the data being changed in each loop consist of two variables referring to Lisp list objects. Each time round the loop, one of them is changed to refer to a shorter list, the other is changed to refer to a longer list. We know the loop will only be gone through a certain number of times, because the loop test is that the variable which keeps being set to a shorter list refers to a non-empty list - eventually it will refer to an empty list and the loop woll be exited. Then the variable which kept being set to a longer list holds the desired result. No actual object is changed, since Lisp list objects are immutable meaning they can't be changed. But new objects are created and the variable change their values because they are set to refer to these new objects.

The technique used here for solving Lisp list problems iteratively can be called stack and reverse. We could imagine our lists as a stack of blocks, one on top of each other, with the head of the list being the top block. We can take the top block off a stack, and we can put a new block on top of a stack. But we can't take a block from the middle of a stack or put one in the middle. So our problems have to be solved by taking blocks off one stack, investigating them, and putting them on top of another. We are then likely to want to go through the process of taking from one stack and putting onto another again, in order to reverse the order.

Recursive methods on Lisp lists

The alternative to iteration when thinking about problem solving is recursion. A recursive approach to problem solving uses the consideration that we can often solve a problem by solving a smaller version of the same problem then adapting the solution, or sometimes using more than one solutions to smaller versions of the problem. For example, suppose I need to make a journey from town A to town B, and I don't know a direct route. I could pick another town, let's call it C, where I know a direct route from A to C and C is closer to B than it is to A. Then I have reduced the problem of finding a route from A to B to one of finding a route from B to C and putting the route from A to C I know on the front of it. Finding a route from B to C is done in the same way, by reducing the problem to finding a route to B from a town closer to B, and eventally the problem will be reduced to one where there is a clear direct route and it does not have to be broken down further. An alternative way of solving the problem is to pick a town midway between A and B, again we'll call it C. Now the problem has been reduced to two similar problems: finding a route from A to C and finding a route from C to B. Eventually, if we solve those two problems in the same way, we'll get down to routes which are short enough for a direct route to be found.

Recursive approaches to solving Lisp list manipulation problems often work well because Lisp lists are a recursive data type. A Lisp list is either the empty list, or it consists of two parts its head and its tail, and the tail is itself a Lisp list. So a Lisp list is defined in terms of a Lisp list. Suppose we want to change all occurrences of m to n in a Lisp list, ls. If we change all occurrences of m to n in the tail of ls (remember this involves creating a new Lisp list object representing the change, and not destructively changing the tail of ls1) we almost have the desired Lisp list. All we need to do is put n on the front of it if m was on the front of the original list, or put the front item, of the original list at the front of the new list if m was not at the front of the original list. We also need to consider that if we wish to change all occurrences of m to n in the Lisp list ls, then if ls is the empty list, then the answer is the empty list. This gives us the following code which performs the same change operation as above:

 public static <T> LispList<T> change(LispList<T> ls,T m,T n)
 {
  if(ls.isEmpty())
    return LispList.empty();
  else
     {
      LispList<T> t = change(ls.tail(),m,n);
      T h = ls.head();
      if(h.equals(m))
         return t.cons(n);
      else
         return t.cons(ls.head());
     }
 }
Again, there is no right or wrong way to express a solution to this problem. You may find the iterative solution easier to understand, or you may find the recursive solution easier to understand. Recursive solutions often have a simpler structure, but require a bit more intuitive thinking to appreciate. A good computer scientist should be able to understand and devise recursive and iterative solutions to problems. If recursion seems a rather strange way to go about tackling a problem, you will need to practice with it until you find it natural.

Let us consider recursive version of the other operations we have discussed on Lisp lists. The sum of the all the elements in the empty list is clearly 0. The sum of all the elements in any other list is equal to the first element added to the sum of the elements in the rest of the list. I hope this seems intuitively obvious, and yet it is very close to a program to give the sum. The "sum of all the elements in the rest of the list" is a recursive call to the sum operation taking as its argument the tail of the original argument. So this gives the following code:

static int sum(LispList<Integer> ls)
{
 if(ls.isEmpty())
    return 0;
 else
    return ls.head()+sum(ls.tail());
}
A recursive method must always have a case when it doesn't make a recursive call. With Lisp lists this is often, but not always, when the argument list is the empty list. The case when no recursive call is made is known as the base case. Any recursive call must always have arguments which are closer to a base case than the original arguments, since we want a succession of recursive calls eventually to come to an end in the base case. So with Lisp lists this generally (but not always) means the recursive call takes as an argument the tail of an original argument.

If we consider the operation indexOf we considered previously, there are two base cases. One is when the list is empty, then the item cannot occur in it so it must return -1. The other is when the item is the first item in the list, then it must return 0. Otherwise, the index of an item in a list is one more than its index in its tail. For example, the position of 80 in [20,90,70,50,80,60] is 4 (remember with the usual Java convention of starting counting at 0), this is one greater than the index of 80 in [90,70,50,80,60]. But this relationship is not quite right. The index of 40 in [20,90,70,50,80,60] is -1 (meaning it doesn't occur), but the index of 40 in [90,70,50,80,60] is also -1. So the index of an item in a Lisp list is -1 if the list is empty, 0 is it is the first element, -1 if the index in the tail is -1, otherwise it is one greater than its index in the tail. This translates to the following code:

 public static <T> int indexOf(LispList<T> ls,T item)
 { 
  if(ls.isEmpty())
     return -1;
  else if(ls.head().equals(item))
     return 0;
  else
     {
      int index = indexOf(ls.tail(),item);
      if(index==-1)
         return -1;
      else
         return index+1;
     }
 } 
You can see from this that when programming using recursion, it often makes more sense to think in terms of logical relationships between the result of a call and the result of a recursive call than it does to think of it in terms of one-by-one steps as with iteration.

The file UseLispLists2a.java is the same as the file UseLispLists2.java and the file UseLispLists3a.java is the same as the file UseLispLists3.java except that they use recursion as given above, rather than iteration. You will see that when you run the code using the recursive method the result is exactly the same as when the code is run using the iterative method. This is correct - the important thing is not how the code is programmed, but whether it gives the right result. There may be more than one way of programming an operation to give a particular result, but someone who uses a program to get a particular result is only concerned that the result is correct.

Actually, as we shall see later, there is also the issue of efficiency. Different ways of programming an operation may give the same result in terms of what the method for the operation does or returns, but one way may take much longer than another. Someone who uses a program will be concerned that it operates as quickly as possible.


Matthew Huntbach

Last modified: 13 September 2005