Java's Collections Framework

Java Lists

We noted previously that Java's library provides us with two types, ArrayList and LinkedList which seem to behave in the same way. Actually, this is not quite correct because the type LinkedList has some extra operations which the type ArrayList does not have. You can find the complete documentation for the class ArrayList here and the complete documentation for the class LinkedList here. Note that correctly these types should be described as ArrayList<E> and LinkedList<E> to indicate they are types with a type parameter, to get the full type you must give the type of the objects which are held in the ArrayList or LinkedList object.

If you look at the complete Java documentation, you will see the extra operations which LinkedList has and ArrayList does not. In particular, there are some concerned with accessing the first and last items of the list. This reflects the fact that LinkedList is implemented using a doubly-linked list, so these operations are particularly suited to this implementation. But also, ArrayList has some operations which LinkedList does not have. One is removeRange which takes two integers as its arguments and changes the object so that all elements starting with the one indexed by the first argument and going up to but not including the one indexed by the second argument are removed. The others reflect the underlying implementation of ArrayList using the array and count technique. The method trimToSize changes the array in the object so that it is of the exact size of the object with no spare capacity. The method ensureCapacity takes an integer argument and changes the array in the object so that it is of the size of the integer argument if it was previously smaller. Using these operations would only make sense if you had a particular concern about efficiency.

You will note that the Java documentation refers to the types ArrayList<E> and LinkedList<E> as being subtypes of other types, and implementing various interface types. The reason for this is to allow generalised code to be written. The various classes and interfaces of which these are part are referred to as the Java Collections Framework. We will cover the most important aspects of this in this and the following section of notes, but if you are interested you can find a full guide here. For our purpose now, the main thing to note is that both ArrayList<E> and LinkedList<E> implement the interface List<E>. You can find the full documentation for this type here. This means that if you write a method whose parameter type is List, you can use that same method for arguments that are of type ArrayList or LinkedList. This is the best thing to do unless the code in the method really needs to use methods which are only found in ArrayList or LinkedList. To make your code as general as possible you should always use the type List rather than the types ArrayList or LinkedList except when you are actually creating objects using new. As List is an interface type, you cannot create an object whose actual type is List, you can only make a variable of type List refer to an object whose actual type is one of those which implements List. Apart from ArrayList and LinkedList, another type which implements List is Vector, which you may see referred to in some older Java textbooks. You can find the full reference to this type here, but the intention of Java's supplier is that though it is still made available for old code, the type ArrayList is a replacement for it.

The type List provides all the operations needed to manage a collection of items stored in a way so that each has a position in the collection. As a simple example, here is one program:

import java.util.*;

class BiggestInt
{
 public static void main(String[] args) throws Exception
 {
  Scanner input = new Scanner(System.in);
  System.out.println("Enter some integers (all on one line):");
  String line = input.nextLine();
  String[] words = line.split(" ");
  List<Integer> a = new ArrayList<Integer>();
  for(int i=0; i<words.length; i++)
     a.add(Integer.parseInt(words[i]));
  int pos = ListOps.posBiggest(a);
  System.out.print("The biggest integer is "+a.get(pos));
  System.out.println(" at position "+pos);
 }
}

which creates an ArrayList of integers and here is another one:

import java.util.*;

class LastAlphabetically
{
 public static void main(String[] args) throws Exception
 {
  Scanner input = new Scanner(System.in);
  System.out.println("Enter some words (all on one line):");
  String line = input.nextLine();
  String[] words = line.split(" ");
  List<String> a = new LinkedList<String>();
  for(int i=0; i<words.length; i++)
     a.add(words[i]);
  int pos = ListOps.posBiggest(a);
  System.out.print("The last word alphabetically is \""+a.get(pos));
  System.out.println("\" at position "+pos);
 }
}

which creates a LinkedList of strings. In both these cases, the variable a is of a List type, but the actual object created can't be of this type as List is only an interface. Both programs make use of a static method in the class ListOps (we have just made this up, it is not built into Java) which is written:

 public static <T extends Comparable<? super T>> int posBiggest(List<T> ls)
 {
  int pos = 0;
  for(int i=1; i<ls.size(); i++)
     if(ls.get(i).compareTo(ls.get(pos))>0)
        pos=i;
  return pos;
 }

So this method can be used to find the position of the "biggest" item in any collection, provided the collection is of a type which implements the List interface. Note it also uses the bounded generic concept we saw previously to indicate that the elements of the collection must be of a type which has its own sense of ordering, and the "biggest" object in the collection is as given by this sense of ordering. So the same method can be used to find the highest number from an ArrayList of integers, and the last alphabetically from a LinkedList of strings.

You can find all the code examples given here in the directory

~mmh/DCS128/code/collections

Java Collections, Sets and Iterators

The Java type Collection<E> is an even more general type than the type List<E>. You can find its full documentation here. This is an interface which contains a method add, which adds an item to a collection, a method remove which removes an item from a collection, and a method contains which tests whether an item is in a collection. But it contains no methods which refer to the position of an item in the collection. The interface List extends the interface Collection by adding those methods which are concerned with items having a position in a collection. So if we want to write code which depends only on an item being either in or not in a collection, and makes no reference to its position, we can use the more general type Collection.

The type Set (documented here) is an alternative extension of Collection which does not have any concept of items being in a position in the collection, but which enforces the rule that an item may be in a set only once at most. So if s is of type Set<T> and t is of type t, then if s.contains(t) returns true (meaning t is already in s) then s.add(t) will have no effect on s. Like List, the type Set is itself an interface which is implemented by two actual classes, HashSet and TreeSet (documented here and here) which implement the underlying concept in two completely different ways. For the purposes of this course, you need not be concerned with how those two classes work underneath, but you would be able to find details in any good book on algorithms and data structures: one uses hash tables, the other uses trees. Using the "big-O" notation we considered previously, accessing a particular data item is O(1) with HashSet, but O(log N) with TreeSet. This suggests that HashSet should be used rather than TreeSet, unless we particularly need to use the extra operations and conditions which TreeSet provides, as HashSet gives more efficient access.

Here are two simple programs which use these two classes:

import java.util.*;

class HashSetTest
{
 public static void main(String[] args) throws Exception
 {
  Scanner input = new Scanner(System.in);
  System.out.println("Enter some words (all on one line):");
  String line = input.nextLine();
  String[] words = line.split(" ");
  Collection<String> a = new HashSet<String>();
  for(int i=0; i<words.length; i++)
     a.add(words[i]);
  System.out.println("The words are: "+a);
 }
}

using HashSet and

import java.util.*;

class TreeSetTest
{
 public static void main(String[] args) throws Exception
 {
  Scanner input = new Scanner(System.in);
  System.out.println("Enter some words (all on one line):");
  String line = input.nextLine();
  String[] words = line.split(" ");
  Collection<String> a = new TreeSet<String>();
  for(int i=0; i<words.length; i++)
     a.add(words[i]);
  System.out.println("The words are: "+a);
 }
}

using TreeSet. All these programs do is read some words in, store them in a collection of strings, and print out the text representation of the collection. You will see if you run this code that in both cases any duplicated words are stored only once. In the TreeSet implementation, you will see the text representation puts the words in alphabetical ordering, while in the HashSet representation, the words are jumbled up in an ordering which is neither alphabetical nor the ordering in which they were added.

Now suppose you wanted to go through all the items in a TreeSet or HashSet object to find which is the largest according to their own sense of ordering (last alphabetically if the items are strings). The problem is that since sets have no sense of items being in a particular position, there is no method get which takes an integer argument i and returns the ith element of the set. So we could not use a for-loop in a way similar to that we used in generalised code for List objects above. The Java collections framework gives us a way round this using the concept of an iterator.

The method iterator is provided in the Collection interface, it is documented here. So for any object whose actual type is a subtype of Collection<E>, if a is a reference to the object, a.iterator() returns an object of type Iterator<E>. So this can be used for objects of type HashSet, TreeSet, ArrayList or LinkedList. If it refers to an object of type Iterator<E> and was produced by the call it=a.iterator() then each time it.next() is called it will return a different item from the collection referenced by a until all the items in that collection have been returned exactly once. You can also call a.hasNext() which will return a boolean value, true if there are some items in a which have not yet been returned by a call of a.next() and false otherwise. Attempting to call a.next() when a.hasNext() would return false causes an exception of type NoSuchElementException to be thrown. Using this idea of an iterator, the following loop:

for(Iterator<E> it=a.iterator(); it.hasNext(); )
    {
     E element = it.next();
     ...
    }

will cause whatever code is given by ... to be executed for the variable element having each of the value from the collection referenced by the variable a. The type written as E can be any type. This pattern is so common that Java 5 introduced a variation on the for-loop, called the for-each loop which represents it. The code

for(E element: a)
   {
    ...
   }

will behave exactly the same as the previous for loop. All that is required is that a (or whatever variable name is used there) is such that the call a.iterator() will return a value of type Iterator<E>. Any variable name can be used in the place of element and then ... will be executed with that variable set in turn to each of the elements from the collection referenced by a. Note, this also works if a is of type E[], that is an ordinary array.

Here, for example, is code which reads some words, stores them in a TreeSet and prints them out using a for-each loop:

import java.util.*;

class TreeSetPrint2
{
 public static void main(String[] args) throws Exception
 {
  Scanner input = new Scanner(System.in);
  System.out.println("Enter some words (all on one line):");
  String line = input.nextLine();
  String[] words = line.split(" ");
  Collection<String> a = new TreeSet<String>();
  for(int i=0; i<words.length; i++)
     a.add(words[i]);
  int count=1;
  System.out.println("The words (duplicates deleted) are: ");
  for(String str: a)
      {
       System.out.println(count+": "+str);
       count++;
      }
 }
}

This will cause the words to be printed out in alphabetic order. If TreeSet is replaced by HashSet they will not be printed in alphabetic order. So the difference between a TreeSet and a HashSet is that it is possible, by using an iterator, to access the elements in the TreeSet in their natural order. A HashSet, however, has more efficient access to data, but this is something that would only be noticeable with a large amount of data. Note also that TreeSet has some extra methods over HashSet, for example headSet takes as its argument an item from the type of object stored in the set and returns the set of all objects from the original set less than that item according to its natural ordering.

As another example of the use of an iterator, here is a static method which returns the biggest item (according to the items' own natural ordering) from any collection of items:

 public static <T extends Comparable<? super T>> T biggest(Collection<T> coll)
 {
  Iterator<T> it = coll.iterator();
  T biggest = it.next();
  while(it.hasNext())
     {
      T nextItem = it.next();
      if(nextItem.compareTo(biggest)>0)
         biggest = nextItem;
     }
  return biggest;
 }

Suppose this method is in a class called CollectionOps, then a call CollectionOps.biggest(a) will return the biggest item from the collection referenced by a, where that collection could be an ArrayList, a LinkedList, a TreeSet or a HashSet. Note that an iterator for a List object (that is, either ArrayList or LinkedList) will go through the items in the list in the order of their index. List objects also have a method listIterator which produces an object which is like an iterator but can move through the items in the collection in both directions. A call to listIterator on an object of type List<E> produces an object of type ListIterator<E>. This is like an object of type Iterator<E> except that as well as having methods next and hasNext for moving forward through the collection and testing whether the end of the collection has been reached, it also has methods previous and hasPrevious for moving backwards through the collection and testing whether the start of the collection has been reached.

`Comparator` objects

The method biggest above will always return the "biggest" item from a collection according to the natural ordering of the base type of the collection. As we noted previously, however, we may want to consider orderings other than the natural ordering of some type. For example, we may want to order strings not by their alphabetic ordering, but by their length. We can generalise methods which rely on ordering objects by using a comparator object as an argument. The Java system provides an interface type for such objects, Comparator<T>, documented here. You can write your own Comparator classes, all you have to do is write a class which extends Comparator (possibly specifying a particular base class), and provides the method compare which takes two objects of the base class and returns an integer. The integer returned should be negative if the first object is considered less than the second object in the ordering the comparator gives, it should be positive if the first object is considered greater than the second in that ordering, and it should be 0 if the two objects are considered equal in that ordering. As an example, here is a class for a comparator object which orders strings according to their length:

import java.util.Comparator;

class LengthComparer implements Comparator<String>
{
 public int compare(String str1,String str2)
 {
  return str1.length()-str2.length();
 }
}

Note you have to specify that the method compare is public and you need the import statement to make the reference to Comparator refer to the type from the java.util package. You will see that if comp refers to an object of type LengthComparer, then comp.compare(s1,s2) where s1 and s2 refer to strings will return a negative number if s1 is shorter than s2, a positive number if s2 is shorter than s1 and 0 if they are of equal length.

Now here is a version of the method biggest which takes a comparator object as an argument:

 public static <T> T biggest(Collection<T> coll,Comparator<T> comp)
 {
  Iterator<T> it = coll.iterator();
  T biggest = it.next();
  while(it.hasNext())
     {
      T nextItem = it.next();
      if(comp.compare(nextItem,biggest)>0)
         biggest = nextItem;
     }
  return biggest;
 }

You will see that where, in the previous version of the method biggest, we had nextItem.compareTo(biggest)>0 to compare the item referred to by nextItem with the item referred to by biggest we now have comp.compare(nextItem,biggest)>0. Since we no longer rely on the items being in a class which has the method compareTo in it, we no longer have to specify that the type parameter T extends Comparable<? super T>. If this method biggest is in a class called CollectionOps then the following code provides a framework for testing it with a LengthComparer argument turning it into a method which returns the longest string from a collection of strings:

import java.util.*;

class Longest
{
 public static void main(String[] args) throws Exception
 {
  Scanner input = new Scanner(System.in);
  System.out.println("Enter some words (all on one line):");
  String line = input.nextLine();
  String[] words = line.split(" ");
  List<String> a = new ArrayList<String>();
  for(int i=0; i<words.length; i++)
     a.add(words[i]);
  Comparator<String> comparer = new LengthComparer();
  String longest = CollectionOps.biggest(a,comparer);
  System.out.println("The longest word is \""+longest+"\"");
 }
}

Another comparator object we could use orders strings by their Scrabble scores (as discussed previously). So the method biggest with its second argument set to such a comparator will take a collection of strings as its first argument and return the string from that collection with the highest Scrabble score. Here is the front-end program which does this:

import java.util.*;

class BestScrabble1
{
 public static void main(String[] args) throws Exception
 {
  Scanner input = new Scanner(System.in);
  System.out.println("Enter some words (all on one line):");
  String line = input.nextLine();
  String[] words = line.split(" ");
  List<String> a = new ArrayList<String>();
  for(int i=0; i<words.length; i++)
     a.add(words[i]);
  Comparator<String> comparer = new Scrabble();
  String best = CollectionOps.biggest(a,comparer);
  System.out.print("The word with the highest Scrabble score is \""+best);
  System.out.println("\" with score "+Scrabble.score(best));
 }
}

Here is the class Scrabble which provides the necessary comparator:

import java.util.Comparator;

class Scrabble implements Comparator<String>
{

 public static final int[] scores = {1,3,3,2,1,4,2,4,1,8,5,1,3,1,1,3,10,1,1,1,1,4,4,8,4,10};

 public static int score(String str)
 {
  String str1 = str.toUpperCase();
  int score = 0;
  for(int i=0; i<str1.length(); i++)
     score+=scores[str1.charAt(i)-'A'];
  return score;
 }

 public int compare(String str1,String str2)
 {
  return score(str1)-score(str2);
 }
}

This class also has a static method score in it, which takes a string and returns its Scrabble score. This shows that just because a class implements some interface provided by Java does not mean it can't also be used to provide some additional methods.

Comparator objects may also be used to vary the behaviour of TreeSet objects. As we saw above, a TreeSet object stores a collection of data (with no duplicates) and has the property that if we go through the data using an iterator, it will be given in its natural ordering. But the class TreeSet<E> has, besides its zero-argument constructor which gives this sort of behaviour, another constructor which takes an argument of type Comparator<E>. If a TreeSet is constructed using this constructor, when its data is gone through using an iterator, the items will be given in the order determined by the comparator object given as the argument to the constructor. The following shows some code which demonstrates using a TreeSet constructed using a comparator of type LengthCompararer as given above:

import java.util.*;

class TreeSetPrint3
{
 public static void main(String[] args) throws Exception
 {
  Scanner input = new Scanner(System.in);
  System.out.println("Enter some words (all on one line):");
  String line = input.nextLine();
  String[] words = line.split(" ");
  Collection<String> a = new TreeSet<String>(new LengthComparer());
  for(int i=0; i<words.length; i++)
     a.add(words[i]);
  int count=1;
  System.out.println("The words in order of length with duplicates deleted are: ");
  for(String str: a)
      {
       System.out.println(count+": "+str);
       count++;
      }
 }
}

It is meant to take some words, store them in a TreeSet ordered by a LengthComparer object, and print them out. If you run it, however, you will see it doesn't behave as you might expect. It doesn't print out every word that was typed in. To see why this, consider that a TreeSet object does not store duplicates. Two objects referred to by obj1 and obj2 are considered equal if obj1.compareTo(obj2) returns 0. But if the TreeSet object is constructed using a comparator referred to by comp it regards obj1 and obj2 as equal if comp.compare(obj1,obj2) returns 0. So if obj1 is in the set, then attempting to add obj2 to the set won't change it, because it is considered that obj2 is just a duplicate of obj1. As we defined LengthComparer, two strings are considered equal if they are of the same length, regardless of the letters in them. So the TreeSet created by new TreeSet<String>(new LengthComparer()) will be such that once a string of a particular length is added to it, no other string of the same length may be added to it, attempting to add one will just cause it to stay the same without indicating an "error". If we want a TreeSet which stores words in order of their length, but stores different words of the same length, we need a modified comparator object which does not regard different words of the same length as equal. Here is code for such an object:

import java.util.Comparator;

class LengthComparer1 implements Comparator
{
 public int compare(String str1,String str2)
 {
  if(str1.length()==str2.length())
     return str1.compareTo(str2);
  else
     return str1.length()-str2.length();
 }
}

This causes strings to be ordered by their length, but if their lengths are equal they are then ordered alphabetically. So creating a TreeSet using new TreeSet<String>(new LengthComparer1()) will give us what we want.

We noted previously that Collections.sort(a) would cause the elements in an arrayList referenced by a to be sorted in their natural ordering (with the arrayList being changed destructively). Note Collections (with a final s) in the java.util package is a separate class from Collection (without a final s). You can find the documentation for the class Collections here. It contains a number of static methods (hence they are called attached to the name Collections and not to an object reference) which deal with collections. Apart from the single argument sort discussed previously, there is a double argument sort which takes a Java List object (so an object of type ArrayList or LinkedList) and a comparator, and sorts the list in the ordering given by the comparator. Many of the other operations in the class Collections are ones we could write ourselves, but it makes sense to use the code already provided as part of the Java system, both because that code can be guaranteed to be correct and it helps to standardise the names used for these common operations. For example, the two methods max are the exact equivalents of the methods we have called biggest here. One takes a collection and returns the biggest element of the collection according to its natural ordering, the other takes a collection and a comparator and returns the biggest element of the collection according to the ordering given by the comparator. This means that in the program above we could replace the line:

String best = CollectionOps.biggest(a,comparer);

String best = Collections.max(a,comparer);

and it would work the same, but this time does not rely on us providing the method biggest in our own class called

CollectionOps


Matthew Huntbach

Last modified: 17 November 2005