Algorithms and Data Structures in an Object-Oriented Framework (“ADSOOF”)

Sorting Arrays and ArrayLists: Part 1

We are looking at sorting algorithms here, but it is important to realise that we are doing this in order to gain a general understanding of principles of algorithms, and how they can be implemented in code, and not because when writing code to build applications you would need to write sorting code. As we will see later, sorting code is provided in Java's standard code library, and there is no good reason for writing your own rather than using what is provided. However, using Java's built-in sort code may require an understanding of generalisation features, which we will cover later, either the Comparable<T> or the Comparator<T> interface types.

Sorting is a good example to use to illustrate general principles of algorithms. What the algorithm has to achieve is something we all understand, as it is a common task in real world situations. It also illustrates the concept of generalisation, as the general concept of ordering relies on there being a comparison operator between two elements, and that differs according to the type of element, and also there may be more than one way of comparing two elements of the same type (for example, comparing strings by alphabetic order and comparing strings by length). Sorting also demonstrates that there may be more than one algorithm to solve a problem, and different algorithms may have different efficiencies.

The algorithms in this section are all inefficient sorting algorithms. So even if you did have to write your own sorting code, they would not be good algorithms to use. In the second set of notes in this section we look at more efficient sorting algorithms, and in the third set of notes we look at formal analysis of efficiency.

Constructive Insertion sort

We have previously looked at sorting algorithms using Lisp lists. Now let us consider sorting algorithms using arrays and arrayLists. The first sorting algorithm we considered was insertion sort. The idea of this is that we start off with a new collection which is empty at the start, and we go through the elements of the collection which is to be sorted, inserting each one of them into this new collection. We need to insert them in a way which keeps the new collection in sorted order.

This is easily done with arrayLists, since an arrayList can change in size. You will remember that the class ArrayList defines two methods called add. One adds the argument it takes to the end of the arrayList it is called on, but the other specifies a position it is to go in. If we assume an arrayList is sorted, and we want to insert a new element and keep it sorted, we can go through the elements in the arrayList one at a time until we find the first one which is not less than the one being inserted. Then we insert the new element at that position, which will cause the element already at that position and all the ones following it to be moved up one place. This gives the following code for insertion sort on an arrayList of integers:

 public static ArrayList<Integer> sort(ArrayList<Integer> a)
 {
  ArrayList<Integer> b = new ArrayList<Integer>();
  for(int i=0; i<a.size(); i++)
     insert(a.get(i),b);
  return b;
 }

 private static void insert(Integer n,ArrayList<Integer> a)
 {
  int i;
  for(i=0; i<a.size()&&a.get(i).compareTo(n)<0; i++) {}
  a.add(i,n);
 }

Remember that the element type for ArrayList (as for any generic type) must be an object type, so we use Integer rather than int. We use the compareTo method to compare Integer values. So if m and n are expressions of type Integer (they could just be variables, but they could instead be method calls whose return type is Integer) then m.compareTo(n) gives a negative value of type int if m represents an integer value less than that represented by n, a positive int value if m represents a value greater than n, and 0 if m and n represent the same integer value.

Note that the loop in the method insert has to have a conjunction of two tests. It has to check that the integer indexed by the loop variable i is less than the integer n to be inserted, which is done by a.get(i).compareTo(n)<0. It also has to check that the loop index is less than the size of the arrayList. If the loop index is equal to the size of the arrayList, it means we have gone through the whole arrayList and even the last element is less than the element to be inserted, so we insert the new element at the end. In that case, i<a.size() is false and the test a.get(i).compareTo(n)<0 is never tried because of the “short circuit” effect of the && operator. That is why the i<a.size() test has to come first, because if i has reached the value of a.size() then calling a.get(i) will throw an IndexOutOfBoundsException. When writing code you always need to consider if there are any special cases which need careful dealing with, such as this case where inserting an element in an ordered collection happens to mean inserting it after all other elements in the collection because all other elements are smaller.

The loop in the method insert has an empty body, indicated by {} following the loop header. Remember that a loop body is surrounded by { and }. So { and } surrounding nothing means the loop body contains nothing, all the loop work is done by the test and update in the loop header. The method call a.add(i,n) occurs after the loop, it is not inside the loop. The fact that it not indented helps the human reader of the code see it is not inside the loop, though it is the occurrence of the previous {} which tells the compiler that this method call is not in the loop body. The variable i is declared before the loop rather than in the initialisation part of the loop to enable it to be accessed in the a.add(i,n) after the loop.

The { and } of a loop can be omitted if the loop body consists of a single statement, and the method sort show an example of this. In that case, the single statement was a single method call, but it could be a compound statement such as an if-statement or another loop. In that case, it is still a single statement, even if it contains multiple statements inside its own body.

There are three files in the code folder for this section which have exactly the same code for the sorting algorithm, but which differ in the support code to demonstrate it. In InsertSort1.java, a human user is prompted to type in numbers to be sorted. In InsertSort2.java, the human user is only asked for the number of numbers and the highest possible number. Then random numbers are generated within this range, but the numbers are printed before and after sorting. In InsertSort3.java, the numbers are generated randomly as before, but they are not printed; they are still sorted but only the time taken to sort them is printed. This enables you to test the code with your own numbers, then test it with so many numbers that it would be tedious to type that many yourself but they are still displayed so you can see the code works, then to test it with so many numbers that it would be impractical to display them.

The issue of efficiency is something we will deal with later. In practice, computer programs often have to deal with large amounts of data, much more than you could type by hand or display on one screen. Differences of efficiency between different algorithms become much more apparent when they are run with large amounts of data. So it's important to test code on large amounts of data.

Note that while testing code by running it with data and observing the results is an important part of software development, we can also analyse algorithms mathematically to prove they work correctly and to find out their efficiency. Although the emphasis in this course is on programming algorithms, the scientific study of algorithms is much more concerned with mathematical analysis of them than on coding issues.

Helper methods

We could have written the code for constructive insertion sort of an arrayList of integers in the following way:

 public static ArrayList<Integer> sort(ArrayList<Integer> a)
 {
  ArrayList<Integer> b = new ArrayList<Integer>();
  for(int i=0; i<a.size(); i++)
     {
      int n = a.get(i);
      int j;
      for(j=0; j<b.size()&&b.get(j).compareTo(n)<0; j++) {}
      b.add(j,n);
     }
  return b;
 }

Here, instead of a separate method insert for inserting an integer into position in a sorted arrayList, the code which does the same thing is incorporated into the sort method. It is the same algorithm, just expressed slightly differently. For the human reader of the code this may make it a little harder to understand. If there is a part of the code which does what you can think of as a separate job, it often makes sense to put it into a separate helper method as in the previous code. If you are asked to write “a method” to do some task, that does not mean literally one method: splitting it up into a method and one or more helper methods is always acceptable. Note, however, that helper methods should be declared as private, although in some cases you may find that an existing public method can be used as a helper method. Adding public methods which weren't asked for could enable other code to interact with a class in a way which wasn't intended, in some cases (although not this one) leading to problems in security or corruption of data.

It is a matter of taste when to introduce helper methods. Giving the helper method a meaningful name and definition means when thinking about the code means we can separate it off rather than have to think of it in terms of one big chunk. In the first version of the code, it should be easy to see that insertion sort involves repeatedly doing an insert operation, this is less easy to see when the code is just a loop within a loop. The situation would obviously get worse for more complex structure of loops and conditionals. So code broken into several methods with simple structures is usually better than code which has a complicated structure. It may also be the case that a helper method can be used again when the same task is required in another public method. This makes code shorter and easier to modify, as if a task is put into a separate method which is called several times then if it needs to be modified that can be done once, rather than have to modify the code for the task everywhere it occurs if it was not made into a helper method. Excessive use of helper methods may make code more complicated to follow, however, if they are used for tasks so trivial that it would be shorter to use the actual code in the method rather than a method call, and also when a helper methods doesn't do an operation which is easily considered as a single coherent task.

Code for the version without the helper method can be found in InsertSort1a.java. In InsertSort3a.java the same code is used, as with InsertSort3.java, to enable testing it with large amounts of data. Sometimes the use of helper methods is criticised as leading to inefficiency, as the execution mechanism has to deal with the additional work of setting up the new environments for the extra method calls. However, good compilers can overcome issues like this. In fact, when I ran the code, the version with helper methods ran faster than the version which did the sorting all in one method. In general, it's the algorithm used and not minor issues of exactly how it's expressed which is the main thing affecting efficiency. So you should always code it in the way which makes it most understandable for humans viewing the code.

Insertion sort in place

Destructive sorting of a collection involves moving the elements from their original locations to new locations in the collection rather than constructing a new sorted collection. Another name for this destructive sorting is sorting in place. Destructive insertion sort works like constructive insertion sort by going through the collection one element at a time and inserting it in place. However, instead of inserting the element into a new collection, it works by having part of the collection holding sorted data, and the rest unsorted. As each element is gone through in the unsorted data it is inserted into the sorted data with the portion of the data which is sorted growing until it becomes all of it.

The method for insertion sort in place of an arrayList of integers is:

 public static void sort(ArrayList<Integer> a)
 {
  for(int i=1; i<a.size(); i++)
     insert(a.get(i),a,i);
 }

At the start of each execution of the loop body, the portion of the arrayList up to but not including that component indexed by i is sorted. The call of insert(a.get(i),a,i) in the loop body takes the component indexed by i and inserts it into the sorted part. It does this by moving up one place all those elements in the sorted part which are greater than the element indexed by i, then putting the element which was indexed by i in its place. This makes the code for insert:

 private static void insert(Integer n,ArrayList<Integer> a,int i)
 {
  for(; i>0&&a.get(i-1).compareTo(n)>0; i--)
     a.set(i,a.get(i-1));
  a.set(i,n);
 }

This is best illustrated using an example shown by a diagram. Suppose we want to sort the following arrayList:

When the loop body in the sort method has been executed five times, leaving i with the value 6, the arrayList will be:

The bar through it indicates the division between the sorted part and the unsorted part. The method insert is executed in an environment where n is set to a.get(i), which is 28, and there is a local variable i set to the value of i in the environment for sort, which is 6, and a refers to the same arrayList. Diagrammatically, the situation is:

After the loop in the insert method has been executed, the elements in the sorted part of the arrayList which are greater than 28 will have neen "moved up" one place, making the situation:

with the local variable i in the environment for the call to insert having its value changed to 3. Then, executing a.set(i,n) sets the arrayList to:

where the bar again indicates the division between a sorted and unsorted part, and you can see it has moved up one place. When execution returns to the environment for the sort method call, the contents of the arrayList referred to by a will have been changed in this way. The loop on the sort call finishes when its i variable has as its value the size of the arrayList, indicating that now all of it has been sorted.

Since this algorithm for sorting an arrayList in place does not change the size of the arrayList at any time, the same algorithm can be used to sort an array in place. The code for this is:

 public static void sort(Integer[] a)
 {
  for(int i=1; i<a.length; i++)
     insert(a[i],a,i);
 }

 private static void insert(Integer n,Integer[] a,int i)
 {
  for(; i>0&&a[i-1].compareTo(n)>0; i--)
     a[i]=a[i-1];
  a[i]=n;
 }

Full code for sorting an arrayList in place can be found in the files InsertSort4.java and InsertSort5.java, in the first case with randomly generated numbers which are displayed, in the second with randomly generated numbers and only timing figures, no display. Similarly, full code and supporting code for demonstrating sorting in place of an array rather than an arrayList can be found in InsertSort6.java and InsertSort7.java.

Selection Sort

Selection sort is an algorithm which is quite similar to insertion sort in place. As with insertion sort in place, the array is divided into a portion which is unsorted and a portion which is sorted. But with insertion sort in place, the sorted portion of the array consists of the first element from the original array, then the first and second element from the original array, then the first and second and third element from the original array, and so on. With selection sort, the sorted portion of the array consists of the lowest element from the original array, then the lowest and second lowest element from the original array, then the lowest and second lowest and third lowest element from the original array and so on.

The algorithm for selection sort works by first finding the lowest element in the original array and swapping that in position with the element in the first indexed component of the array. Then the lowest element in the remainder of the array (i.e. the second lowest element in the whole array) is found and swapped in position with the element in the second indexed component of the array. Then the lowest element in the portion of the array starting at the third indexed position is found and swapped with the element at the third indexed position, and so on. Here is the code to do that:

 public static void sort(Integer[] a)
 {
  for(int i=0; i<a.length-1; i++)
     {
      int pos = findMinPos(a,i);
      swap(a,i,pos);
     }
 }
 
 private static int findMinPos(Integer[] a,int pos)
 {
  for(int i=pos+1; i<a.length; i++)
     if(a[i].compareTo(a[pos])<0)
        pos=i;
  return pos;
 }

 private static void swap(Integer[] a,int pos1,int pos2)
 {
  Integer temp = a[pos1];
  a[pos1]=a[pos2];
  a[pos2]=temp;
 }

Here the method findMinPos with arguments a and pos returns the position of the smallest element in the portion of array a starting with the component indexed by pos. The method swap with arguments a, pos1 and pos2 changes the array a destructively by swapping round the element at position pos1 with the element at position pos2.

Code for selection sort of arrays of integers can be found in the files SelSort1.java, SelSort3.java and SelSort5.java. In the first case there is supporting code for the numbers to be sorter to be typed in, in the second they are generated randomly and displayed, in the third they are generated randomly and only timing figures are given. In SelSort2.java, SelSort4.java and SelSort6.java selection sort is similarly applied to an array of strings. You will also need the file WordGen.java, which randomly generates pronounceable words, to run the programs in SelSort4.java and SelSort6.java. The code for selection sort of an array of strings is:

 public static void sort(String[] a)
 {
  for(int i=0; i<a.length-1; i++)
     {
      int pos = findMinPos(a,i);
      swap(a,i,pos);
     }
 }
 
 private static int findMinPos(String[] a,int pos)
 {
  for(int i=pos+1; i<a.length; i++)
     if(a[i].compareTo(a[pos])<0)
        pos=i;
  return pos;
 }

 private static void swap(String[] a,int pos1,int pos2)
 {
  String temp = a[pos1];
  a[pos1]=a[pos2];
  a[pos2]=temp;
 }

As you can see, it is exactly the same as the code for selection sort of an array of integers, apart from the type String used where Integer was used before. The reason it would not be possible to write a simple generic version with a type variable replacing String and Integer is the call to compareTo in the method findMinPos. Only some types may have the method compareTo called on them, so we cannot call it on an object where we know nothing about its type, which is the case with type variables as we have seen them up till now. Later we will see a way round this, but for now we will give code for sorting algorithms in terms of sorting collections of integers.

If we use the same example as before, sorting the array which starts out as [30,25,47,99,8,12,23,63,16,20], the lowest element in the whole array is found and swapped with the first element, giving [8,25,47,99,30,12,23,63,16,20]. Then the lowest element in the rest of the array is found and swapped with the element in the second position, giving [8,12,47,99,30,25,23,63,16,20]. Following each step in the algorithm, the successive steps in the transformation of the array are:

[30,25,47,99,8,12,23,63,16,20]

[8|25,47,99,30,12,23,63,16,20]

[8,12|47,99,30,25,23,63,16,20]

[8,12,16|99,30,25,23,63,47,20]

[8,12,16,20|30,25,23,63,47,99]

[8,12,16,20,23|25,30,63,47,99]

[8,12,16,20,23,25|30,63,47,99]

[8,12,16,20,23,25,30|63,47,99]

[8,12,16,20,23,25,30,47|63,99]

[8,12,16,20,23,25,30,47,63|99]

[8,12,16,20,23,25,30,47,63,99]

with the | character used to indicate the division between the sorted portion and the unsorted portion.

Selection sort may be applied to an arrayList as well, the only difference with its application to arrays being the use of method calls rather than the special array syntax. Here is the code for selection sort on an object of type ArrayList<Integer>:

 public static void sort(ArrayList<Integer> a)
 {
  for(int i=0; i<a.size()-1; i++)
     {
      int pos = findMinPos(a,i);
      swap(a,i,pos);
     }
 }

 private static int findMinPos(ArrayList<Integer> a,int pos)
 {
  for(int i=pos+1; i<a.size(); i++)
     if(a.get(i).compareTo(a.get(pos))<0)
        pos=i;
  return pos;
 }

 private static void swap(ArrayList<Integer> a,int pos1,int pos2)
 {
  Integer temp = a.get(pos1);
  a.set(pos1,a.get(pos2));
  a.set(pos2,temp);
 }

The files SelSort7.java and SelSort8.java give code for selections sort on arrayLists of integers, together with supporting code.

Note that although it makes the code easier to understand when there is a separate method findMinPos which finds the position of the minimum integer in the array starting at a particular position, there is no need to make this a separate method. The code for this method could be incorporated into the main sort method:

 public static void sort(Integer[] a)
 {
  for(int i=0; i<a.length-1; i++)
     {
      int pos = i;
      for(int j=i+1; j<a.length; j++)
         if(a[j].compareTo(a[pos])<0)
            pos=j;
      swap(a,i,pos);
     }
 }

or, taking this forward, the code for the separate method swap could also be incorporated:

 public static void sort(Integer[] a)
 {
  for(int i=0; i<a.length-1; i++)
     {
      int pos = i;
      for(int j=i+1; j<a.length; j++)
         if(a[j].compareTo(a[pos])<0)
            pos=j;
      Integer temp=a[i];
      a[i]=a[pos];
      a[pos]=temp;
     }
 }

This doesn't change the algorithm, just the way it's expressed in Java. The files SelSort1a.java and SelSort1b.java give the versions without the separate methods. You should always regard it just as a matter of understandability whether you choose to break an algorithm into separate methods each doing part, or whether to have one method with a more complex structure.

Bubble Sort

Another sorting algorithm you may have heard of, or come across, is bubble sort. It is very similar to selection sort, and like selection sort works destructively. Again, it works by having the array at any one time divided into a sorted and an unsorted part, each time round the main loop causing the size of the sorted part to increase by one and the size of the unsorted part to decrease by one. With bubble sort, the first time round the main loop has the element in each position in the array compared with the element following it, and if the element following it is lower, the two are swapped in position. If the two are swapped, the next element looked at will be the one looked at previously since it was moved up one place in the swap. Since the largest element in the array will always be moved up as it is larger than any element that might follow it, it will end up in the last place in the array. This means the next time round the loop we needn't consider the last element at all, so repeating the process but not up to the last element means the second highest element gets moved to the second to last position in the array. The idea is that each time round the main loop, the highest element in the unsorted part of the array “bubbles up” to the highest indexed cell in the array, then it forms the lowest element in the sorted part.

The code for it is this:

 public static void sort(Integer[] a)
 {
  for(int j=0; j<a.length-1; j++)
     for(int i=0; i<a.length-1-j; i++)
       if(a[i+1].compareTo(a[i])<0)
          swap(a,i,i+1);
 }

with the code for the method swap being the same as it was for selection sort above. The code for this, and supporting code to run it can be found in BubbleSort1.java, with BubbleSort2.java showing the same algorithm used to sort an array of strings, BubbleSort3.java showing a random array of integers being sorted, BubbleSort4.java showing a random array of strings being sorted and BubbleSort5.java showing the timed sorting of a random array of integers without printing it in order to be able to test timings with a large sized array.

Matthew Huntbach

Last modified: 28 February 2019