Arrays in Java

Array syntax

You have already seen the concept of arrays in Java. So far this is the only way you have seen of storing a collection of data. The idea of a collection is that it's an entity that can be treated as one single thing, but also seen in terms of individual components.

Java has a special syntax which is used just for dealing with arrays, and for nothing else. It employs the square brackets, [ and ]. The first thing to note is that the combination [] is the way to make an array type out of any other type. Just add it to the end of the type name, and you have a new type which means "array of ..." where "..." is the type name you added it to. So int[] is the type "array of integer", String[] is the type "array of strings", Can[] is the type "array of cans", where Can refers to a type defined in the Java class Can, perhaps the class we used in the previous set of notes. You can use a type name which ends in [] in just the same way as any other type name: to declare variables, to declare arguments to methods, and to give a return type for methods. If we have a type name which ends in [] then we refer to the name which comes before the [] as the "base type" of the array type.

Arrays are a form of object. As with other objects, you need to distinguish between a variable which refers to the object and the object itself, and you need to be aware that assignment causes aliasing. For example,

int[] a,b;
declares two variables, a and b, which are of type int[], that is "array of integer". It doesn't, however, cause any array objects to be created. Suppose that later in the code, a and b have been made to refer to some objects, which must be arrays of integers. In this case, executing the assignment b=a will cause b to stop referring to the array it was referring to before the assignment and start referring to the array a was referring to before the assignment. This doesn't stop a continuing to refer to the same array it referred to before the assignment, and it doesn't stop any other variable which refers to the array b referred to before the assignment from still referring to that array after the assignment. Note that if you want to refer to the whole array object, you just use the variable name, as with a and b here. It is not necessary, and not even correct, to add [] to the variable name to indicate that the variable is of an array type. Remember that the only use of [] as a single symbol is to make an array type out of some other type.

To create a new array object you need to use the construction consisting of the word new, followed by the base type of the array object, followed by [ followed by an expression which evaluates to an integer, followed by ]. This creates a new array object whose length is the integer value the expression evaluated to. So, for example, a = new int[10]; will cause a, assuming a has already been declared to be of type int[] as above, to be set to refer to an array of integers of length 10. Note that if a had previously been set to refer to another array, it would stop referring to that array and start referring to the new array of length 10. Declaration of a variable of an array type and its initialisation to a particular array are often combined, as in:

int[] a = new int[10];
There is no restriction on the length of an array a variable of an array type may refer to, so a variable which refers to an array object of one length can be assigned to refer to another array object of another length, so long as it is of the same type. However, array objects themselves cannot change length. When an array object is created, its length is fixed at that time.
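
As a small illustration of the last point, here is a sketch in which one variable refers first to an array of length 10 and then to an array of length 3. The class name ArrayLengthDemo and the method name lengthAfterReassignment are made up for this example.

```java
class ArrayLengthDemo {
    // Hypothetical helper: makes a refer to two arrays in turn and
    // returns the length of the array it refers to at the end.
    static int lengthAfterReassignment() {
        int[] a = new int[10];   // a refers to a new array object of length 10
        a = new int[3];          // a now refers to a different array object, of length 3
        return a.length;         // neither array changed length; a simply refers elsewhere
    }

    public static void main(String[] args) {
        System.out.println(lengthAfterReassignment()); // prints 3
    }
}
```

The array of length 10 was never resized. The variable a just stopped referring to it.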

If a refers to an array object, then a[i] refers to the component of the array object indexed by i. Here i can be any expression which evaluates to an integer; it needn't be a single variable of type int. You can treat a[i] exactly like a variable of the base type of a: you can assign a value to it, for example as in a[i]=n, or you can use it as an argument to a method, or, if the base type is a numerical type, as part of an arithmetic expression, as in n=a[i]+1. What makes this flexible is that the component a[i] refers to changes as the value of i changes. Note that the length of the array which a refers to is given by a.length. You can use a.length in any expression or as a method argument which requires a value of type int, but you can't assign a value to it. If an array object is referred to by the variable a, then its component parts are referred to by a[0], a[1] and so on up to a[a.length-1], or, of course, by a followed by [ followed by an expression which evaluates to any of the values from 0 to a.length-1, followed by ].
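
The following sketch shows a[i] being used like an ordinary int variable, with a.length giving the bound of the loop. The class name ArrayIndexDemo and the method squares are invented for the illustration.

```java
class ArrayIndexDemo {
    // Hypothetical helper: returns a new array in which component i holds i*i.
    static int[] squares(int length) {
        int[] a = new int[length];
        for (int i = 0; i < a.length; i++) {
            a[i] = i * i;                     // assigning to a[i], as to any int variable
        }
        return a;
    }

    public static void main(String[] args) {
        int[] a = squares(5);
        int n = a[2] + 1;                     // a[2] used in an arithmetic expression
        System.out.println(n);                // prints 5
        System.out.println(a[a.length - 1]);  // the last component, prints 16
    }
}
```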

Because you can change the value of the content of an array by the assignment statement a[i]=expr, where a is any reference to an array, i is any expression which evaluates to an int value, and expr is any expression which evaluates to a value of the base type of a, arrays are mutable objects. This means that methods are sometimes written where arrays are passed in as arguments with the intention that executing the method will change the array, as that change will be passed on once the method has finished. It also means you have to be careful if two different variables refer to the same array to remember that changing the array through one variable will cause the array to be changed as it is viewed through the other.
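
Both effects can be seen in a short sketch. The method doubleAll is a made-up example of a method written with the intention of changing the array passed to it, and b is an alias for the array a refers to.

```java
class ArrayAliasDemo {
    // Hypothetical method: changes the array it is given by doubling every component.
    static void doubleAll(int[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = a[i] * 2;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[3];
        a[0] = 1;
        a[1] = 2;
        a[2] = 3;
        int[] b = a;                 // b now refers to the same array object as a

        doubleAll(a);                // the change made inside the method persists
        System.out.println(b[2]);    // prints 6, seen through the alias b

        b[0] = 100;                  // changing the array through b ...
        System.out.println(a[0]);    // ... is visible through a: prints 100
    }
}
```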

Searching for an item in an array

A common array operation is to find if a particular value occurs in an array. You can do this by looking at each value in the array in turn, until either you have found the one you want or you have gone through the whole array and not found it. Here is a code fragment that, given an integer in the variable n and an array of integers referred to by a, finds whether the integer n is stored in the array a:
for(int i=0; i<a.length&&a[i]!=n; i++) {}
A variable called i is set to run through the numbers 0, 1, 2 and so on in turn. Note the test condition here, i<a.length&&a[i]!=n. First you have to test that you have not reached the end of the array, which happens when i reaches the value a.length; then you have to test that the integer in the part of the array indexed by the variable i is not equal to n. In Java, when you have expr1&&expr2 where expr1 and expr2 evaluate to booleans (that is, to true or false), if expr1 evaluates to false, the joint expression is given the value false without any attempt to evaluate expr2. This is acceptable logically, since "P AND Q" viewed as a statement in logic is only TRUE if both P and Q are TRUE, so must be FALSE if P is FALSE regardless of the value of Q. But it is also essential if the value of expr1 tells us we can't evaluate expr2. If i<a.length is false then i is beyond the maximum index for referencing a component of a, so we shouldn't even try to evaluate a[i]!=n, since attempting to look at the value of a[i] will cause an error.
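
To see why the order of the two tests matters, here is a sketch with both orders. The method names safeScan and unsafeScan are invented for this example. In Java the error caused by indexing beyond the end of an array is an ArrayIndexOutOfBoundsException.

```java
class ShortCircuitDemo {
    // Bounds test first: a[i] is never evaluated once i has reached a.length.
    static int safeScan(int[] a, int n) {
        int i = 0;
        while (i < a.length && a[i] != n) {
            i++;
        }
        return i;
    }

    // Tests in the wrong order: a[i] is evaluated before i is checked.
    static int unsafeScan(int[] a, int n) {
        int i = 0;
        while (a[i] != n && i < a.length) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] a = new int[3];
        a[0] = 5;
        a[1] = 7;
        a[2] = 9;
        System.out.println(safeScan(a, 4));    // 4 is not present: prints 3
        try {
            unsafeScan(a, 4);                  // tries to evaluate a[3]
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("the wrong order looked beyond the end of the array");
        }
    }
}
```

If n is in the array, both versions give the same answer. The difference only shows when the search runs off the end.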

The {} indicates this is a for-loop without a body. All it does is update and test. We could write it as the while loop:

int i=0;
while(i<a.length&&a[i]!=n)
   {
    i++;
   }
Another way of writing it would be:
for(int i=0; i<a.length; i++) 
   {
    if(a[i]==n)
       {
        break;
       }
   }
Here I have put the opening { and closing } for both the for loop and the if statement. Note you can omit them if the code they enclose is just a single statement. That is the case here - inside the for loop is a single if statement, inside the if statement (which here does not have an else part) is a single break statement. So this could be written as:
for(int i=0; i<a.length; i++)
   if(a[i]==n)
      break;
It's a matter of taste whether you omit the brackets when they are not required because they enclose just one statement. My preference is to omit them in this case as I feel it makes the code look less cluttered, but other authors suggest they should always be used. Remember it makes no difference to how the program executes; it's just a matter of what makes the code look clearer to the human reader.

It's also a matter of taste whether you use the loop test with the two parts as given at first, or prefer a simpler loop test with an alternative exit from the loop using a break statement. A break statement has the effect of execution immediately leaving the loop it is in and starting on the statement following the loop. If there is very little code in the loop body it may make the code clearer if the two ways of exiting the loop are separated in this way. A break statement hidden in a lengthy piece of code, however, could be easily missed so on glancing at the loop header the human reader may not realise there is an alternative way of exiting the loop other than the test there becoming false. So, in general, use break with caution. Java also has a "labelled break" which enables execution to jump out of a loop within a loop, or even more layers of loops. Just very occasionally you may find this construct helps avoid what would otherwise be very convoluted code, but it's not something you should make a habit of using.
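
For reference, a labelled break looks something like the following sketch, which searches an array of arrays of integers (type int[][], formed by applying [] twice). The label search, the class name and the method isInGrid are all made up for the illustration.

```java
class LabelledBreakDemo {
    // Hypothetical helper: reports whether n occurs anywhere in an array of int arrays.
    static boolean isInGrid(int[][] grid, int n) {
        boolean found = false;
        search:                                      // a label naming the outer loop
        for (int row = 0; row < grid.length; row++) {
            for (int col = 0; col < grid[row].length; col++) {
                if (grid[row][col] == n) {
                    found = true;
                    break search;                    // leaves both loops at once
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        int[][] grid = new int[2][2];                // two rows of two integers, all 0 at first
        grid[1][0] = 54;
        System.out.println(isInGrid(grid, 54));      // prints true
        System.out.println(isInGrid(grid, 7));       // prints false
    }
}
```

An ordinary break in the inner loop would only leave the inner loop, and the outer loop would carry on with the next row.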

Now we have seen a loop which halts either when we have gone through the whole array or found the integer we are looking for, what are we going to do with it? Note that when the loop terminates, the loop index variable i either has the value a.length in which case an integer with the value equal to that in variable n hasn't been found in the array, or the variable i has the value of the index of the component of the array where the integer with value n has been found. But an immediate problem with the code we wrote is that as the integer variable i is declared in the initialisation part of the for loop, it goes out of scope after the for loop. So if we want to access it after the loop, it should be declared before it rather than in it, as in:

int i=0;
for(; i<a.length; i++)
   if(a[i]==n)
      break;
if(i==a.length)
   System.out.println("The integer "+n+" is not in the array");
else
   System.out.println("The integer "+n+" is in the array");
Of course, the System.out.println statements could be replaced by whatever it is we want to do which varies depending on whether the integer is in the array or not.

The operation of testing whether an integer is in a particular array is so common we might want to make it a separate static method. If we call the method isIn it must take an array of integers and an integer as arguments, and return a boolean. Then the above could be written just:

if(isIn(a,n))
   System.out.println("The integer "+n+" is in the array");
else
   System.out.println("The integer "+n+" is not in the array");
We might initially think of writing the method as:
public static boolean isIn(int[] a,int n)
{
 int i=0;
 for(; i<a.length; i++)
    if(a[i]==n)
       break;
 if(i<a.length)
    return true;
 else
    return false;
}
But remember that i<a.length is itself a boolean value, so we could make our code a little neater by writing it as:
public static boolean isIn(int[] a,int n)
{
 int i=0;
 for(; i<a.length; i++)
    if(a[i]==n)
       break;
 return i<a.length;
}
Whenever you find yourself with a method which returns a boolean and you are writing something of the form
 if(test)
    return true;
 else
    return false;
remember you can always write it as:
   return test;
However, another way of writing the method is:
public static boolean isIn(int[] a,int n)
{
 for(int i=0; i<a.length; i++)
    if(a[i]==n)
       return true;
 return false;
}
Remember that return in a method acts to halt execution of the method, so in this case it combines breaking out of the loop and returning the value true. The final statement return false is only executed if the loop terminates because its test is false, so the condition a[i]==n, which would have caused the method to exit earlier, never occurred. It is important to note that this final statement is outside the loop. If the brackets enclosing the loop body were put in, it would look like:
public static boolean isIn(int[] a,int n)
{
 for(int i=0; i<a.length; i++)
    {
     if(a[i]==n)
        return true;
    }
 return false;
}
which makes this clearer. Do not make the mistake of confusing this with:
public static boolean isIn(int[] a,int n)
// THIS CODE IS SILLY!
{
 for(int i=0; i<a.length; i++)
    if(a[i]==n)
       return true;
    else
       return false;
}
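where the return false is inside the loop. Here, when a[0]==n is true, the method halts and returns true but when a[0]==n is false, the method halts and returns false without checking the rest of the array, which is obviously not what we want. (In fact the Java compiler rejects this version with a "missing return statement" error, because if the array is empty the loop body is never executed and the end of the method is reached without a value being returned.)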

Now, suppose a is of type String[], and we are searching for whether a particular string, given by variable str of type String is in the array. Here is a code fragment which does this, and prints a message saying whether the string is in the array:

int i=0;
for(; i<a.length; i++)
   if(a[i].equals(str))
      break;
if(i==a.length)
   System.out.println("The string "+str+" is not in the array");
else
   System.out.println("The string "+str+" is in the array");
Or we could write a static method that tests whether a string is in an array of strings:
public static boolean isIn(String[] a,String str)
{
 for(int i=0; i<a.length; i++)
    {
     if(a[i].equals(str))
        return true;
    }
 return false;
}
You should be able to spot the similarity to the previous code. The types have to be changed, and also to test whether two integers m and n are equal we use m==n, but to test whether two strings str1 and str2 are equal, we use str1.equals(str2), so if a is an array of strings to test whether the string in the component of a indexed by i is equal to str we use a[i].equals(str).
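
The reason for using equals can be seen in the sketch below. With strings, == tests whether two references refer to the very same object, while equals tests whether the two strings contain the same characters. The method isInUsingIdentity is a deliberately wrong variant written only for this comparison.

```java
class StringEqualityDemo {
    // The method from the text: compares the contents of the strings.
    static boolean isIn(String[] a, String str) {
        for (int i = 0; i < a.length; i++) {
            if (a[i].equals(str))
                return true;
        }
        return false;
    }

    // Deliberately wrong variant: == compares references, not contents.
    static boolean isInUsingIdentity(String[] a, String str) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == str)
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] a = new String[2];
        a[0] = "apple";
        a[1] = "pear";
        String str = new String("pear");   // same characters as a[1], but a separate object
        System.out.println(isIn(a, str));               // prints true
        System.out.println(isInUsingIdentity(a, str));  // prints false
    }
}
```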

What is happening here is that the algorithm is exactly the same since it does not depend on the types in the array. The word "algorithm" means "a way of solving a problem". The problem here is finding whether a particular item is in an array, the algorithm is to look at the components in the array one at a time in the order in which they are indexed until we have either found the item or gone through the whole array.

You might wonder whether we have to write a separate isIn method for every possible type of item where we want to test whether a particular item of that type is stored in an array of items of that type. Java (since the version called Java 5, which was introduced in 2004) offers a way round this which enables us to write a generic version of isIn that can be specialised to work for objects of any type. But this is something to be discussed later.
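
Just to give a flavour of what is to come, a generic version might look roughly like the sketch below. This is an assumption about the form such a method could take, not the version these notes will go on to develop. The type parameter T stands for any object type, so the method works with String[] or Integer[] but not with an array of a primitive type such as int[].

```java
class GenericSearchDemo {
    // Sketch only: T stands for whatever object type the array holds.
    static <T> boolean isIn(T[] a, T item) {
        for (int i = 0; i < a.length; i++) {
            if (a[i].equals(item))      // equals, since the components are objects
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = new String[2];
        words[0] = "apple";
        words[1] = "pear";
        System.out.println(isIn(words, "pear"));                  // prints true

        Integer[] numbers = new Integer[2];
        numbers[0] = Integer.valueOf(3);
        numbers[1] = Integer.valueOf(54);
        System.out.println(isIn(numbers, Integer.valueOf(7)));    // prints false
    }
}
```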

In the directory ~mmh/DCS128/code/arrays you will find two files, UseArrays1.java and UseArrays2.java with a demonstration of testing for membership of arrays, one with arrays of integers, the other with arrays of strings. Supporting code is needed to read in the contents of the arrays, but at this point you need not be concerned with how this code works.

Finding a position in an array and the importance of specification

Suppose you want to find not just whether an item appears in an array, but its actual position. You could use the same code as before, but return the value of the loop index when it finds the item being searched for. Obviously, your method must now return an int rather than a boolean. Here is a version which uses the loop without a body we considered first:
public static int position(int[] a,int n)
{
 int i=0;
 for(; i<a.length&&a[i]!=n; i++) {}
 return i;
}
What happens if the integer is not in the array? Here the value returned is equal to the length of the array, but this is probably not a good way of dealing with the problem. It would be clearer that we are dealing with the special case of an integer not occurring in the array if in that case we returned a value which could not otherwise be returned. A fairly standard way of doing this is to return -1. Here is some code which does this, this time using the technique of a return statement inside the loop:
public static int position(int[] a,int n)
{
 for(int i=0; i<a.length; i++)
    {
     if(a[i]==n)
        return i;
    }
 return -1;
}
Again, remember the final return statement only gets executed if the return statement inside the loop never gets executed because at no stage is a[i]==n true.
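
Code which calls position then has to check for the special value before using the result as an index. Here is a sketch of such a caller. The method describe and the class name are made up for the example.

```java
class PositionDemo {
    // The method from the text: returns the index of n in a, or -1 if n is absent.
    static int position(int[] a, int n) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == n)
                return i;
        }
        return -1;
    }

    // Hypothetical caller: tests for -1 before treating the result as a position.
    static String describe(int[] a, int n) {
        int p = position(a, n);
        if (p == -1)
            return "The integer " + n + " is not in the array";
        else
            return "The integer " + n + " is at position " + p;
    }

    public static void main(String[] args) {
        int[] a = new int[3];
        a[0] = 5;
        a[1] = 7;
        a[2] = 9;
        System.out.println(describe(a, 7));   // prints: The integer 7 is at position 1
        System.out.println(describe(a, 4));   // prints: The integer 4 is not in the array
    }
}
```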

Suppose we decide to go through the array starting at the highest indexed component and working down:

public static int position(int[] a,int n)
{
 int i=a.length-1;
 for(; i>=0&&a[i]!=n; i--) {}
 return i;
}
In this case it happens that i will have the value -1 if n is not in the array a. But there is a subtle difference between this code and the above. What if the integer n appears more than once in the array a? In the first case what will be returned is the lowest of all the indexes of occurrences of n, in the second case the highest. When we draw diagrams of arrays we generally show the contents listed from the lowest indexed on the left to the highest indexed on the right, as in:

[Diagram not reproduced: an array of integers drawn from left to right, with 54 in the components indexed by 1 and 6.]

Here, the integer 54 occurs in the array component indexed by 1 and in the array component indexed by 6. Depending on how we wrote the code, the method position could return either 1 or 6 if its arguments were a reference to the array shown in the diagram and the integer value 54. Sometimes we actually refer to the lower indexed components of an array as being to its "left" and the higher indexed components as being to its "right". So we may say our method to return the position of an integer in an array of integers will return the position of the "leftmost" or "rightmost" occurrence. If we describe the behaviour of some code in terms of diagrams we have drawn up to help us visualise it, we should make sure the person we are describing it to also understands it in terms of the same diagram. We should remember the diagrams are not how it is "really" represented on the computer.
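
The difference can be checked with a sketch like the one below. The two searches are given different names here, leftmostPosition and rightmostPosition, purely so they can sit in one class. Apart from 54 being at indexes 1 and 6, the array contents are invented for the example. The notation with { and } is a shorthand for creating an array with the listed contents.

```java
class LeftRightDemo {
    // Searches from index 0 upwards: finds the lowest index holding n, or -1.
    static int leftmostPosition(int[] a, int n) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == n)
                return i;
        }
        return -1;
    }

    // Searches from the highest index downwards: finds the highest index holding n, or -1.
    static int rightmostPosition(int[] a, int n) {
        int i = a.length - 1;
        for (; i >= 0 && a[i] != n; i--) {}
        return i;
    }

    public static void main(String[] args) {
        int[] a = {12, 54, 7, 30, 8, 91, 54, 3};       // invented values, 54 at indexes 1 and 6
        System.out.println(leftmostPosition(a, 54));   // prints 1
        System.out.println(rightmostPosition(a, 54));  // prints 6
    }
}
```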

The issues we have encountered with this problem indicate that when we write a piece of code to solve some problem, we should be careful to make sure we cover every possible circumstance. Here we started off saying we wanted a method to return the position of an integer in an array of integers, but when we came to write the method we found out we needed to decide how to deal with the case of the integer not occurring at all in the array, and the integer occurring more than once in the array. When a large program is being written, it will often be the case that one person writes the code that uses a method and states what they want the method to do, while another person writes the code for that method. The description of what the method is meant to do is called the specification and the actual code that does it is called the implementation. If what we are told about what a method should do does not cover every possibility, we term that method underspecified. A full specification of a method to return the position of an integer in an array of integers would say what is returned when the integer does not occur in the array, and what position is returned if it occurs more than once.
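
One place a specification can be written down is in a comment attached to the method. The sketch below shows one possible full specification for position, choosing the lowest index and -1. These choices are just one reasonable option, not the only correct one.

```java
class SpecifiedPosition {
    /**
     * Returns the position of the integer n in the array a.
     * If n occurs more than once, the lowest index at which it occurs is returned.
     * If n does not occur in a (including when a has length 0), -1 is returned.
     * The array a is not changed.
     */
    static int position(int[] a, int n) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == n)
                return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = new int[3];
        a[0] = 54;
        a[1] = 1;
        a[2] = 54;
        System.out.println(position(a, 54));          // lowest index: prints 0
        System.out.println(position(a, 2));           // absent: prints -1
        System.out.println(position(new int[0], 2));  // array of length 0: prints -1
    }
}
```

With the comment in place, the person calling the method and the person writing it can both check their work against the same statement.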

The danger with underspecified methods is that the person who uses the method may just assume that in cases not specifically covered one way of dealing with them will be used, while the person writing the code may choose another way. This could lead to problems when the code for the method and the code that uses it are put together to make a complete program. In the above example it may be that the person who wanted to use the method position just assumed it would return 1 and never supposed the person writing the method would write it so that it returned 6. It is for reasons like this that in large scale programming, writing the specification for a piece of code is as important a skill as writing the code itself.

So we see here, as we saw with the drinks machines example, the importance of specification. We need to think of our methods and classes as things which are designed to do a particular job, and not just as arbitrary pieces of Java code put together. Then when in one piece of code we write a call to a Java method, we think of that call in terms of the job it is supposed to do rather than in terms of the Java code that it executes when the call is made. Quite often, thinking about the exact job we want a method or class to perform helps us to write better code. But sometimes also when we come to write the code we find there were aspects which weren't specified originally, and so the task of writing the code leads us to refine the specification.

In the directory ~mmh/DCS128/code/arrays you will find two files, UseArrays3.java and UseArrays4.java, which demonstrate finding the position of an integer in an array of integers. Note, the important issue here is the code in the method position which actually implements the algorithm to find the position. The code in the method main is just support code which enables a demonstration of a call to the method to be run. The two files search the array in different directions, UseArrays3.java from the lowest indexed component to the highest and UseArrays4.java the other way round. Remember when running the demonstrations that the positions of the components of an array start at position 0, then position 1 and so on. So the number given may be one less than you are expecting if you didn't realise this.

You will also find a file UseArrays5.java in the directory, which demonstrates finding the position of a string in an array of strings. You can see that the method which does this is identical in pattern to the method in UseArrays3.java, differing only in the name of the base type and the use of equals for equality of strings rather than == for equality of integers.


Matthew Huntbach

Last modified: 16 June 2005