Implementing ArrayLists using Linked Lists

Array-like operations on a linked list structure

When we considered implementing Lisp lists, we first considered an implementation using arrays, then considered an implementation using linked lists. The linked list implementation turned out to be more efficient, saving space because it made use of shared cells, and saving time because it did not involve whole arrays being copied. Part of the reason the linked list implementation turned out to be more efficient is that there is a close correspondence between Lisp lists and linked lists.

We have seen previously that arrayLists may be implemented using arrays. They may also be implemented with linked lists. The array implementation is generally the most efficient since there is a close correspondence between arrays and arrayLists. However, it gives us chance for more practice with the linked list data structure if we consider how it may be used to implement an arrayList abstract data type.

Like our implementation of Lisp lists with linked lists, our implementation of arrayLists will have a single private variable in an object which is set to refer to a linked list structure. It will be made up of the same sort of Cell objects as we used with for Lisp lists, with a nested class to define them. However, as arrayLists are a destructive or mutable type of object, the operations on it work by changing the linked list structure referred to inside the object, rather than constructing a new one and putting it inside a new object. If we call the new type MyArrayList5, it will have the following structure:

class MyArrayList5 <E>
{
 private Cell<E> myList;

 public MyArrayList5()
 {
  myList=null;
 }

 ...

 private static class Cell <T>
 {
  T first;
  Cell<T> next;

  Cell(T h,Cell<T> t)
  {
   first=h;
   next=t;
  }
 }
}
The recursive variable in the nested class is called next here rather than rest (which is what we used when we had otherwise the same class for implementing Lisp lists) as that better captures the idea that it represents a link in the linked list, but changing the name of the variable does not change the way the class operates. This time there is a public constructor which is the equivalent of the zero-argument constructor in Java's ArrayList class provided in the java.util package, it returns an object representing an empty arrayList. The data structure representing the empty arrayList is just null. The class has a type variable E allowing it be used to build arrayLists of any base type.

Following this, we need methods representing the basic operations on an arrayList. These will change the linked list data structure. Let us consider an example. Suppose we want the arrayList object representing the arrayList we can represent textually as [5,8,3,4]. We can use the same representation we used for the linked list [5,8,3,4], so if a is the name of the object referring to it, diagrammatically this can be shown as:

Now suppose we want to represent the effect of calling a.add(7). This needs to change the structure to:

Unlike cons with Lisp lists, the new item is added at the back rather than the front of the list.

To achieve this, what we need is a variable which is an alias to the last cell (that is the cell whose next variable is null). The reason we need to add the new cell through an alias to an existing cell in the linked list is that this causes the new cell to be linked into the linked list. If the alias variable is called ptr, then executing ptr.next = new Cell(n,null) will add a new cell to the end of the list whose first variable is set to the value in n (let us assume that is 7) and whose next variable is set to null. So we need to reach the following situation:

To reach this situation, we start with ptr at the front of the list:

which is achieved by executing ptr=myList. Then executing ptr=ptr.next will cause ptr to stop referring to what it is referring to and start referring to what ptr.next refers to, that is, to the next cell in the linked list:

Each time we execute ptr=ptr.next we "move" ptr one link down the linked list. If we continue doing this until ptr refers to a cell whose next variable is set to null, we will move it to the last cell. This leads to the following code for add:

 public void add(E item)
 {
  if(myList==null)
     myList = new Cell<E>(item,null);
  else
     {
      Cell<E> ptr=myList;
      for(; ptr.next!=null; ptr=ptr.next) {}
      ptr.next = new Cell<E>(item,null);
     }
 }
Note that special code is needed for the case where myList is equal to null, that is adding an item to an empty arrayList.

It is because the variables first and next are not declared as private in the class Cell that they can have their values changed directly by code in the class MyArrayList5. However, no code outside MyArrayList5 can change their values directly because the variable myList which refers to the linked list structure is declared as private. In fact since the type Cell is declared as private, it isn't even possible to have variables of that type outside the class MyArrayList5.

The variable ptr is referred to as a "pointer" because we think of it as something that can be made to point to various cells in the structure. The technique of running a pointer down a linked list using a loop with update ptr=ptr.next is common for operations on linked list. We can see it also in the code which implements the size operation:

 public int size()
 {
  int count=0; 
  for(Cell <E> ptr=myList; ptr!=null; ptr=ptr.next, count++) {}
  return count;
 }
Here, each time the pointer is moved one link down, a count is increased by one. As a consequence, when the pointer has gone right through the list, the count will give the total number of items in the list. Note the difference between the loop condition uses previously ptr.next!=null and the loop condition used now, ptr!=null. In the first case, as soon as the pointer reached the last cell, execution exits the loop. In the second case, the loop is obeyed one more time with the pointer pointing to the last cell. So in the first case when the loop exits the pointer is left pointing to the last cell in the linked list, while in the second case it is left set to null. In our code for size the pointer ptr is declared inside the for loop, which means it can't be used outside it. In our code for add however, we need to use ptr after the loop has finished, so it is declared before the loop.

Now we need code for the get and set operations, which are the direct equivalent of using array indexing. In this case we need a loop which ends with ptr pointing to the posth cell in the linked list. So we have the same pattern of ptr running down the linked list, with a count kept of the number of links gone over. When this count is equal to pos the pointer points to the posth cell. If ptr becomes equal to null before the count becomes equal to pos it indicates there aren't enough cells in the linked list, so an IndexOutOfBoundsException is thrown. When ptr points to the posth cell, executing return ptr.first returns the data item in that cell, while executing ptr.first=item causes it to be changed to refer to the object referred to by item. So this gives the following code:

 public E get(int pos)
 {
  Cell<E> ptr=myList;
  for(int count=0; count<pos&&ptr!=null; ptr=ptr.next, count++) {}
  if(ptr==null)
     throw new IndexOutOfBoundsException();
  return ptr.first;
 }

 public void set(int pos,E item)
 {
  Cell<E> ptr=myList;
  for(int count=0; count<pos&&ptr!=null; ptr=ptr.next, count++) {}
  if(ptr==null)
     throw new IndexOutOfBoundsException();
  ptr.first=item;
 }
The disadvantage of this representation can be seen by the way these operations have to go through the cells in the list to reach a cell indexed by a particular value. In terms of the big-O notation, this is O(N). When using an array as representation, accessing an array component indexed by a particular value is done in one step, with no dependency on the size of the array, so it is O(1).

Adding and deleting cells from a linked list

ArrayLists have an operation add which takes a single argument and adds it to the end of the existing contents of the arrayList. They also have an operation add which takes two arguments, the first an integer, and adds the item given by the second argument to the list of items in the arrayList at the position given by the integer argument. This is not the same as set, for which we gave code above, as that changes the value of the item indexed by the integer. Instead, nothing is changed, instead all following items are moved up one place.

Adding a new cell into a linked list can be done through a pointer to the previous cell. If ptr points to a cell in a linked list, then executing ptr.next = new Cell(item,ptr.next) will cause ptr.next to refer to a new Cell object whose first variable is set to item and whose next variable is set to what ptr.next used to refer to. So if the situation is:

and item holds 7, the result of executing ptr.next = new Cell(item,ptr.next) will be:

Executing ptr.next = ptr.next.next causes ptr.next to start referring to what the next variable of the cell it originally referred to refers to. From the above original position, that would lead to:

But since an object disappears once nothing refers to it, that can be considered as:

So this serves to cut the cell after the cell that ptr is referring to out from the list.

In both cases here, since cells are added into and taken out of the list, the following cells automatically change their position in the list, so there is no need to move them up or down as there was in the array implementation. So a linked list implementation may be preferred if the main operations that are to be done on it are to add and remove items from particular positions, rather than just to access or change items at particular positions.

The techniques described here lead to the code for these operations:

 public void add(int pos,E item)
 {
  if(pos==0)
     myList = new Cell<E>(item,myList);
  else
     {
      Cell<E> ptr=myList;
      for(int count=1; ptr!=null&&count<pos; ptr=ptr.next,count++) {}
      if(ptr==null)
         throw new IndexOutOfBoundsException();
      ptr.next = new Cell<E>(item,ptr.next);
     }
 }

 public E remove(int pos)
 {
  E item;
  if(myList==null)
     throw new IndexOutOfBoundsException();
  else if(pos==0)
     {
      item=myList.first;
      myList=myList.next;
      return item;
     }
  else
     {
      Cell<E> ptr=myList;
      for(int count=1; ptr.next!=null&&count<pos; ptr=ptr.next,count++) {}
      if(ptr.next==null)
         throw new IndexOutOfBoundsException();
      item=ptr.next.first;
      ptr.next=ptr.next.next;
      return item;
     }
 }
Separate code is needed to deal with the case where the item being added or deleted is the first item in the list. There also needs to be code to exit from the loop moving the pointer down and throw a IndexOutOfBoundsException if the pointer reaches the end of the linked list before the count of its position reaches the required value. The method remove has as its return value a reference to the item that was removed, so this needs to be saved in a local variable called item before the cell containing it is cut out from the list.

The other remove operation takes as its argument the item to be removed and causes it to be removed from the collection. It returns a boolean value, false if the item does not appear in the collection and so cannot be removed, true otherwise. Again, the technique that has to be used is to get a pointer, ptr, to point to the cell before the one that is to be removed and then execute ptr.next = ptr.next.next. The test that the next cell to the one pointed to by ptr contains an item equal to that referenced by the variable item is item.equals(ptr.next.first). Before testing this we have to test that ptr.next is not equal to null, so we know there is a next cell to test. This leads to the following code:

 public boolean remove(E item)
 {
  if(myList==null)
     return false;
  else if(item.equals(myList.first))
     {
      myList=myList.next;
      return true;
     }
  else
     {
      Cell<E> ptr=myList;
      for(; ptr.next!=null&&!item.equals(ptr.next.first); ptr=ptr.next) {}
      if(ptr.next==null)
         return false;
      else
         {
          ptr.next = ptr.next.next;
          return true;
         }
     }
 }

The equals method

Any object can have the method equals called on it with another object as the argument. We used equals in the above code for remove with an argument not of type int. If a class does not have its own equals method, it will use the default one, which is that obj1.equals(obj2) only returns true when obj1 and obj2 are aliases of the same object. This may not be the behaviour we require. For example, in the situation illustrated diagrammatically below:

a1.equals(a2) will evaluate to true, but a1.equals(a3) will evaluate to false. If we want equals to return true in situations other than aliasing, we need to write our own equals method in classes where objects are going to be tested using equals. For example, we could have an arrayList of Lisp lists of integers. This type would be written ArrayList<LispList<Integer>>. Suppose a refers to an arrayList of this type, and ls refers to a list object representing [9,4,6]. We would want a.remove(ls) to change the arrayList referred to by a so that if a list [9,4,6] is in it, it will be removed, even if it is not an alias of ls. In order to achieve this, it would be necessary to add the following method to the class LispList:

 public boolean equals(Object other)
 {
  LispList<E> otherList = (LispList<E>) other;
  if(this.isEmpty())
     return otherList.isEmpty();
  else
     return this.head().equals(otherList.head()) &&
            this.tail().equals(otherList.tail());
 }
Note that due to Java's definition of equals, the parameter has to be of type Object and then cast to type LispList<E>. The code should be fairly intuitive: two lists are equal either if they are both empty, or if their heads are equal and their tails are equal. This intuitive definition translates directly to a recursive method. If you prefer to think iteratively, however, the following would work just as well:
 public boolean equals(Object other)
 {
  LispList<E> list1 = this;
  LispList<E> list2 = (LispList<E>) other;
  while(!list1.isEmpty()&&!list2.isEmpty()&&list1.head().equals(list2.head()))
      {
       list1 = list1.tail();
       list2 = list2.tail();
      }
  return list1.isEmpty()&&list2.isEmpty();
 }
The variables list1 and list2 are initially set to the two lists and then repeatedly set to their tails until either one is empty, or their heads are not equal. If this process ends because one is empty and the other is not, or because the two lists do not have equal heads, the initial two lists were not equal. If when it ends because the first list is empty, the second is two it must be that all the previous matching head items were equal, and the lengths must be the same, so the initial two lists must be equal.

However, a more efficient version of equals for Lisp lists would note that as the code is inside the class LispList, we can program it directly in terms of the linked list data structure in side this class, rather than in terms of the public operations of the LispList class. Doing it this way gives the code:

 public boolean equals(Object other)
 {
  if(!(other instanceof LispList))
     return false;
  Cell<E> ptr1 = this.myList;
  Cell<E> ptr2 = ((LispListE) other).myList;
  for(;ptr1!=null&&ptr2!=null; ptr1=ptr1.rest,ptr2=ptr2.rest)
     if(!ptr1.first.equals(ptr2.first))
        return false;
  return (ptr1==null&&ptr2==null);
 }
Note that a disadvantage of writing methods for extra operations in a way which uses the internal data structure is that if the internal data structure changes, the methods have to be changed. This last version of equals for Lisp lists would have to be completely rewritten if we changed to an array-and-count representation for Lisp lists, whereas the previous code could be used unchanged since nothing in it relies on knowing the internal data structure of a LispList object.

Java's own code for ArrayList as provided in its library implements remove and other operations by using the equals method found in the base type, so remembering to provide code for an equals is necessary for any class of objects you intend to put in arrayLists if you intend to use methods like remove on them.

The methods indexOf and lastIndexOf also rely on the use of the equals method. Here is code for them for the class with arrayLists implemented by linked lists:

 public int indexOf(E item)
 {
  int count=0;
  Cell<E> ptr=myList;
  for(; ptr!=null&&!item.equals(ptr.first); ptr=ptr.next, count++) {}
  if(ptr==null)
     return -1;
  else
     return count;
 }

 public int lastIndexOf(E item)
 {
  int pos=-1;
  Cell<E> ptr=myList;
  for(int count=0; ptr!=null; ptr=ptr.next, count++)
      if(item.equals(ptr.first))
         pos=count;
  return pos;
 }
Both involve running a pointer down the linked list and keeping a count of the number of links moved down. In the case of indexOf, the loop terminates when an item equal to the item whose position is being searched for is found, and the count of the number of links moved down is returned as its position. If it terminates with the pointer equal to null, that indicates it has gone right through the linked list and not found the item, so -1 is returned. In the case of lastIndexOf, the pointer must move down the whole list. If it finds an item equal to the one being searched for, a variable keeping the highest position so far of such an item is updated to the new position.

You can find the complete code for MyArrayList5 in the directory ~mmh/DCS128/code/linkedlists in the file MyArrayList5.java. Also in that directory is a file UseMyArrayLists5.java which can be used to test the class MyArrayList5.

In the file UseArrayListsLispLists.java you will find code which prompts you to type a list of list of integers (for example [[2,3,8],[1,12,9,7],[20,6,4],[3,7]]) which is stored in an arrayList of Lisp lists of integers. It then asks you to type in a single list, for example [20,6,4], and it will give you the position of that list in the arrayList. In this case, you would expect it to give 2. You will find, however, that unless you have added the method equals, given above, to the class LispList it will tell you that the list is not found in any position. This indicates how Java's ArrayList relies on equals being implemented in the class of components of an arrayList. Note that the code in UseArrayListsLispLists.java uses an aspect of Java's library we have not covered ("regular expressions") in order to divide a string up to make the list of lists.


Matthew Huntbach

Last modified: 20 March 2006