String and related classes in Java

Strings in Java

You have already seen the idea of String as a type in Java. What you may not yet have grasped is that a value of type String in Java is actually an object. So values of type String are like values of type DrinksMachine which we saw previously: you can create new objects of type String using the constructors in class String, and you can call the non-static methods in class String with the calls attached to references to String objects.

However, DrinksMachine is not a built-in type of Java, to use it you had to have a copy of the file DrinksMachine.class in your directory. The class String is provided as part of Java, it is always available for you to use in any Java program you write. To use most types provided with Java you have to use an import statement (we mentioned these briefly here), but a few are regarded as so common that you don't even have to import them, String is one of these (the others are all those in the java.lang package). In fact every type in Java defines a type of object and has an associated class with it, with the exception of the numerical types, int, double and so on, the type boolean and the type char. These non-object types are known as the primitive types.

One reason you have not seen strings as objects is that, unlike any other type of object, there is a special way of representing string values, that is the representation using the quote characters enclosing the characters that make up the string. You can also make new string objects by Java's use of + to mean string joining. So there is little need to create new strings using a constructor. However, for example, there is a constructor in class String which takes an array of characters as its argument and produces a string where each character in the string is equal to the character in the same position in the array. So if a is a variable of type char[], and has been set to refer to an array object, then new String(a) will return the equivalent string.

Strings in Java (unlike, for example, the language C) are not the same thing as arrays of characters. If we have a variable a of type char[] and a variable str of type String, then for any expression i which evaluates to an integer, a[i] gives the ith character of the array referred to by a, while str.charAt(i) gives the ith character of the string referred to by str. As we saw here, we can use a[i] as a variable, so we can put it on the right hand side of assignment statement and assign a value to it. For example, a[3]='x' will change the fourth character of the array referred to by a (remembering the first character is indexed by 0) to lower-case 'x'. However, str.charAt(i) is a method call, and you cannot assign a value to a method call. The Java compiler will give a error message if you attempted to compile code with the statement str.charAt(3)='x' in it. So string objects can't be changed in the way array objects can be changed, in fact string objects are immutable: they cannot be changed at all, you can only create new ones.

A minor but perhaps rather annoying difference between arrays and strings in Java is that the length of an array a is given by a.length and the length of a string str is given by str.length(). The difference is the () at the end of str.length() which indicates this is just a method call, the zero-argument method length, called on the object str.

Methods in class `String` in Java

When dealing with the libraries provided with Java, you can find full documentation of all the methods they provide on the official web site provided by the Sun company here. The documentation for the class String is provided here. You should be aware that as this is the official documentation, it has to cover every aspect, including rarely used methods and methods which only make sense when you have become familiar with more advanced aspects of Java. It would be a good idea to get used to using this official documentation, but for the purposes of this course don't feel you have to become familiar with the vast range of classes and methods that are available. We will cover just a small number of them in this course, either because they are particularly fundamental, or because they illustrate an important point. Remember, the main purpose of this course is to introduce concepts of algorithms and data structures rather than to concentrate of detailed aspects of the Java language.

As an example, the method replace acts in a similar way to the method change we discussed previously with arrays. It takes a string and two characters and returns the result of replacing all occurrences of the first character by the second character in the string. It does this constructively rather than destructively, that is it returns a new string representing the change, it does not change the string it is called on. In fact the class String has no destructive methods, this ensures that String objects are immutable, there is no method that can be called on them to change them, there are only methods that can be called on them to return new strings representing a change.

With the constructive method change we developed for arrays of integers, if a1 and a2 were of type int[] and n1 and n2 were of type int then the statement a2=change(a1,n1,n2) causes a2 to be set to an array which represents the effect of changing all occurrences of the integer n1 to the integer n2 in the array a1. With the method replace in class String, if str1 and str2 are variables of type String and ch1 and ch2 are variables of type char, then the statement str2=str1.replace(ch1,ch2) causes str2 to be set to a string which represents the effect of changing all occurrences of the character ch1 to the character ch2 in the string str1. For example, if str1 holds a reference to the string "happy", and ch1 hold the character 'p' and ch2 holds the character 'r', then str1.replace(ch1,ch2) will return a reference to the string "harry".

The difference in construct between str1.replace(ch1,ch2) and change(a1,n1,n2) is because our change was a static method while the replace method in class String is not a static method. So the array being changed had to be passed as one of the arguments to the method change, and this method was not called attached to any object. The method replace has to be called attached to a String object, which is the object to be changed, and this is done instead of the object being passed as one of the arguments to the method call.

Other useful methods in class String which return strings based on transforming the string they are called on include:

trim returns the string with all blank characters at either end removed. So if str represents " Hello World " then the call str.trim() will return the string "Hello World".
toUpperCase returns the string with all lower case letters replaced by the equivalent upper case letters. So if str represents "Hello World" then str.toUpperCase() will return the string "HELLO WORLD".
toLowerCase returns the string with all upper case letters replaced by the equivalent lower case letters. So if str represents "Hello World" then str.toLowerCase() will return the string "hello world".
substring takes two integers as arguments and returns the portion of the string it is called on which starts with the character indexed by the first integer and goes up to but does not include the character indexed by the second integer. So if str represents "Hello world", and p1 stores 3 and p2 stores 8 then str.substring(p1,p2) will return "lo wo". A second version of substring takes just one integer argument and returns the portion of the string indexed by that integer argument up to the end of the string. So with the same value of str and with p storing 4, the call str.substring(p) would return "o world".


If these methods did not already exist in the code library provided
as part of Java, we could write our own methods to do the same
tasks. They would have to be static methods, however, since we cannot
add new methods to classes already provided by Java (in a sense we
can when we come to the object-oriented concept of inheritance,
although Java declares the class String to be
final meaning it can't be extended by inheritance anyway).
Here is an example of a static method which has the effect of changing
one character to another in a string:

static String replace(String str1,char ch1,char ch2)
// Returns the string resulting from replacing all occurrences
// of ch1 by ch2 in str.
{
 String str2="";
 for(int i=0; i<str1.length(); i++)
    if(str1.charAt(i)==ch1)
       str2=str2+ch2;
    else
       str2=str2+str1.charAt(i);
 return str2;
}


So if str references "happy" and
c1 stores 'p' and c2 stores
'r' then replace(str,c1,c2) will
return "harry".

Where methods to do a particular job already exist in the Java code
library, it makes sense to use them rather than write our own method
to do the job. Not only does it save us the time and effort, it makes
our code easier to follow by those already familiar with the Java
code library, and we can be sure that the distributors of Java have
implemented their methods in the best way possible and they will be
efficient and free of errors. In some cases in this course, however,
we will show a method that does a job where there is already a method in
the Java library to do the job. This will be done because looking
at the code for the method ourselves helps develop coding skills and
an understanding of algorithms and data structures. It should not be
regarded as good practice in general to write our own code when we
can call code that already exists in the Java library. When studying
at university, however, do not confuse
software re-use with plagiarism.

The method indexOf in class String takes
a character and returns the position of the lowest indexed occurrence of that
character in the string it is called on, or -1 if the
character does not occur in the string. So it is very similar to the
method we gave previously for 
finding the position of an integer in an array of integers. Once again,
as it is not a static method, str.indexOf(ch) gives the
position of character ch in string str whereas
with our static method for arrays of integers, position(a,n)
gave the position of integer n in array a. A
separate method lastIndexOf in class String
gives the position of the highest indexed occurrence of a character
in a string. So if str is "Hello World"
then str.indexOf('o') returns 4 and
str.lastIndexOf('o') returns 7. 


The methods startsWith and endsWith both
take string arguments and are called on strings, and return boolean
values as indicated by their names. So
str1.startsWith(str2) returns true if
str1 starts with str2 and false
otherwise, and str1.endsWith(str2) returns true if
str1 ends with str2 and false
otherwise. So if str1 is "Hello World",
str2 is "Hell" and str3 is
"rld", then str1.startsWith(str2) evaluates
to true, str1.startsWith(str3) evaluates to
false, str1.endsWith(str3) evaluates to true
and so on.



An important method in class String is compareTo,
which like startsWith and endsWith takes
another string as an argument. It is used to compare strings
alphabetically, so str1.compareTo(str2) says whether
str1 comes before str2 in standard alphabetic
ordering or not. You might suppose that compareTo would
return a boolean value, but its return type is actually int.
This is so it can be used to distinguish three possibilities from the
call str1.compareTo(str2): either str1 is before
str2 alphabetically, or str2 is before str1
alphabetically, or str1 and str2 are equal.
If str1 and str2 are equal, the call
str1.compareTo(str2) returns 0.  It returns a
negative integer if str1 is before str2
alphabetically, and a positive integer if str1 is after
str2 alphabetically. You can find the complete definition of the 
String method compareTo in the Java documentation
here. 
Many other classes also have their own compareTo method which
works similarly to the one in class String and enables
objects of their class to be considered in some sort of ordering with
each other. 


The class String also has its own version of
the method equals so that str1.equals(str2)
returns true if str1 and str2
have exactly the same characters in exactly the same order, and false
otherwise. This is not the same as str1==str2 because that only
evaluates to true if str1 and str2
refer to the same object - it evaluates to false in all other
circumstances, even if str1 and str2 refer to
different objects which happen to have the same characters in the same
order as each other.

The reverse of the constructor for class String which takes an array of strings as its argument and produces the equivalent String object is the method toCharArray. If str references a string, then str.toCharArray() returns an object of type char[], that is an array of characters of the same length as the string, where the characters in the array are the same and indexed in the same order as the characters in the string.

You can find code which demonstrates some of these methods in the directory ~mmh/DCS128/code/strings. See the files Test1.java, Test2.java, Test3.java, Test4.java, Test5.java, Test6.java and Test7.java.

Input in Java

Input in Java is rather complicated, this is due to the fact that Java is a programming language designed for "real world" usage where there is a big variety of issues which need to be taken into account. It makes things rather difficult, however, when Java is being used to teach programmming and we want a simple way to read text typed on the screen and convert it to Java values so that we can demonstrate example code. Because of this complication, many authors of Java text books have decided to introduce their own Java simple input methods rather than use those provided as an official part of the Java language. There is nothing wrong with this, Java is a language which is designed so that programmers can extend it by adding their own classes as they need them if what they want is not already provided as part of the Java code library.

However, in this course we will use the input methods provided with Java so that you are no longer reliant on non-standard code. Up till 2004, the following was recommended as the way to use standard Java to read text from a console window. You created an object of the type BufferedReader using the following construction:

BufferedReader input = new BufferedReader(new InputStreamReader(System.in));

This is a standard variable declaration combined with an initialisation. The variable is called input, but it could be called anything. It is set to a new BufferedReader object created with a constructor, that constructor takes a new inputStream object as its argument and the constructor for that takes the special value System.in as its argument. But you need not worry why this is done. All you need to know is that the object referred to by the variable input created in this way has two zero-argument methods that can be called on it, read and readLine. The first reads a single character, and the second reads a whole line of characters. So input.read() returns the next character typed, while input.readLine() returns a string consisting of all the characters which have been typed in but not yet read up to but not including the "end-of_line" character, and it will wait until the end-of-line character is typed in before returning.

The problem with this is that it only reads single characters or whole lines. If you wanted it to read an integer, you would have to write code which got the user to type the integer on a line of its own, and write code to convert the string of characters returned by readLine() into an integer. If you wanted to read several integers on one line, you would have to write code to break the string representing the whole line into smaller strings representing the integers and then code to convert them. Also the methods read and readLine may throw a checked exception of type IOException. So any method which contains a call to either of them must have throws IOException added to its header unless it uses Java's exception-catching mechanism to catch the exception.

The classes BufferedReader, InputStreamReader and IOException are all in the Java library package java.io. So they need to be imported to any file which uses them, this can be done by adding the line

import java.io.*;

at the beginning of the file. This has the effect of importing all the classes from the package java.io.

The version of Java introduced in 2004 officially known as "Java 2 Standard Platform Edition 5.0" or J2SE5.0 introduced something which really ought to have been in the earlier versions of Java, a simple reader object. Had this been done earlier on, authors of introductory programming textbooks could have used it as the standard rather than introducing their on simple reading mechanisms. We have already used this in our example code, you saw it first here. An object of type Scanner is created by a call to one of its constructors, if the argument to the call is System.in it creates a Scanner object which reads from the console window. As we saw from our example, the class Scanner is in the java.util package, but nothing else from that package is required to make it work, so importing just that class with

import java.util.Scanner;

is enough to make it available for use. We can declare a variable of type Scanner called input and set it to refer to a Scanner object which reads from the console window by:

Scanner input = new Scanner(System.in);

Then input.nextInt() reads the next text typed in up to the next blank space or end-of-line, and returns the integer that it represents. If it happens that the next text typed in does not consist of all numerical characters it throws an exception of type InputMismatchException. However, it is possible to test in advance of reading whether the next text types in represents and integer, this would be done by the call input.hasNextInt() which returns a boolean. If you wanted to read value of type double there is the similar method nextDouble to read the next text typed in, convert it to a double value and return that value, together with hasNextDouble which tests whether the next unread text can be interpreted as a double value. The method next returns the next text typed in, starting with the next non-blank character and up to but not including the following blank character, as a string. Here "blank character" means either a space or an end-of-line. The method nextLine is equivalent to the method readLine for a BufferedReader object, it returns the whole line typed as a string value. The methods of Scanner do not throw checked exceptions, so unlike BufferedReader, there is no need to add exception catching or throws annotations to the code. However, Scanner does not provide a method like BufferedReader's read which reads and returns a single character.

String Tokenizing

As we saw above, a Scanner object can split a line of text into individual words treating space characters as "word delimiters" rather than as part of the words. In fact, it is possible to use a Scanner object just to split up a string rather than to read text and split it up. This is done by using an alternative constructor for Scanner which takes as its argument a string rather than a reference to the input from the console window. Below is some simple code (in the file TestScanner1.java in the directory ~mmh/DCS128/code/strings) which shows using BufferedReader to read a line of text, then Scanner to break it into words:

import java.util.Scanner;
import java.io.*;

class TestScanner1
{
 public static void main(String[] args) throws IOException
 {
  BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
  System.out.println("Enter some words: ");
  String words = input.readLine();
  Scanner splitter = new Scanner(words);
  System.out.println("\nThe words you entered are: ");
  while(splitter.hasNext())
      System.out.println(splitter.next());
 }
}

The method hasNext on a Scanner object returns true until the point where the Scanner object has gone through all its input and reached the end.

A similar effect can be obtained by the use of the method split in class String. Calling split on an object of class String returns an object of type String[], that is an array of strings, where the strings in the array are the separate word the string it was called on breaks up into. So if the variable str refers to the string "Hello world, how are you?", the variable declaration and assignment

String[] words = str.split(" ");

causes the variable words to refer to an array of strings of length 5, where words[0] is "Hello", words[1] is "world,", words[2] is "how", words[3] is "are" and words[4] is "you?".

Here the punctuation symbols are treated as normal characters and so are treated as part of the words they are next to. The argument to the method split tells it what it must treat as word delimiters. So str.split(" ") tells it to treat the space character as spacing between words, and all other characters as normal characters. If we made it str.split(",") it would tell it to split the string up treating the comma character as the separating characters. So in the above example, it would split the string into two parts, words[0] would be "Hello world" and words[1] would be " how are you?". The full rules for how this argument to split work use a concept known as "regular expressions", which we won't go into here.

In older notes on Java, you may see reference to and use of objects of class StringTokenizer to split strings of words into individual words. The class StringTokenizer is found in the package java.util. A StringTokenizer object is created from a constructor which takes a string as its argument. Then it works like a Scanner object created with a string argument constructor, except that the method to obtain the next word is nextToken, while the method to check whether there are more words to return is hasMore Tokens. One thing StringTokenizer has which Scanner does not is the method countTokens which returns the number of words in the string. But Scanner when constructed with a string argument still has the methods nextInt and hasNextInt which enable it to be checked whether the next word represents an integer and to return its integer representation, but StringTokenizer doesn't have an equivalent. The process of splitting a string into its individual words is known as string tokenizing, but the class StringTokenizer has been superseded by the more general class Scanner and the String method split.

If you want to convert a string, perhaps obtained from a call of the method split, which represents an integer to the equivalent integer, you need to use the method parseInt which is a static method in the class Integer. Because the class Integer is in the java.lang package it does not have to be imported. If you want to call a static method from another class, the call is attached to the name of the class. So if str refers to a string object all of whose characters are numerical, then Integer.parseInt(str) returns the value of type int it represents. If ch represents the string "6315" then int n=Integer.parseInt(str) will declare the variable of type called n and set it to the number 6315. You need to be aware that "6315" is just the string of four characters '6', '3', '1' and '5', it is an entirely separate thing from the number 6315 which on the computer isn't even represented using decimal digits. If you call Integer.parseInt(str), but str is not a string which represents an integer, it will cause an exception of type NumberFormatException to be thrown. The static method parseDouble in class Integer works in a similar way to praseInt, but coverts strings that represent double values (that is, floating point numbers) to the actual double value they represent.

Wrapper Classes

The class Integer is one of the wrapper classes. It is the wrapper class for the type int. There is also the class Character which is the wrapper class for the type char. The other primitive types all have wrapper classes which are the same name as the primitive type except for an initial upper-case letter, so Double is the wrapper class for the type double.

Wrapper classes exist because in some places in Java you can only use types which are not primitive types. A variable of type Integer refers to an object which holds a value of type int inside it, so it is a different thing from a variable of type int which holds the value directly. In older versions of Java, if i is of type int, to create the Integer equivalent you would have to use the constructor, so Integer n = new Integer(i) would create a new variable of type Integer called n and set it to refer to an object which contains the value of i. The reverse conversion would use the method called intValue in class Integer, so int j = n.intValue() would declare a variable of type int called j and set it to the int value inside the object referred to by n. In Java 5.0, however, the conversion is automatic. Integer n=i has the same effect as Integer n = new Integer(i), and int j=n has the same effect as int j = n.intValue(). In general in Java 5.0, you can use references to objects of type int whenever a value of type Integer is required, and it will be converted automatically (referred to as boxing) and you can use a value of type Integer whenever a value of type int is required (referred to as unboxing).

The wrapper classes also contain a number of static methods dealing with values of the primitive type they wrap. For example, the static method toBinaryString in class Integer takes an int argument and returns a string of '1' and '0' characters representing its binary equivalent. So if variable n of type int holds the value 21 then Integer.toBinaryString(n) will return "10101". In the class Character, the static method toUpperCase takes a value of type char and returns its upper case equivalent, so if variable ch holds 'q' then Character.toUpperCase(ch) will return 'Q'. Class Character has various static methods for determining the category of characters. For example, Character.isUpperCase(ch) will return true if ch stores an upper case letter, and false otherwise.

In the directory ~mmh/DCS128/code/strings the file Test8.java demonstrates the toBinaryString method of class Integer and Test9.java demonstrates some methods from class Character.

Recursion with Strings

An alternative way of programming with strings to the way using loops we considered previously is to use recursion. We shall consider recursion in more detail when we look at its use with Lisp lists in the next section. Recursion means thinking of a solution to a problem in terms of a solution to a smaller version of the same problem. For example, above we considered the test that one string starts with another. If we had to program a static method to test whether str1 starts with str2, one way we could think of it is as follows. If str2 is the empty string, obviously str1 does start with str2. If str1 is the empty string, and we have already tested str2 and found it is not the empty string, then obviously str1 does not begin with str2. If str1 and str2 start with different characters, then obviously str1 cannot start with str2, we need consider no further. If str1 and str2 start with the same character, then str1 starts with str2 if the string consisting of everything except the first character of str1 starts with the string consisting of everything except the first character of str2. So, for example, we can tell that "woodpecker" starts with "wood" by first noting both have the same first character 'w', and then testing that "oodpecker" starts with "ood". This leads to the following code:

 public static boolean startsWith(String str1,String str2)
 // Returns true if str1 starts with str2, false otherwise
 {
  if(str2.length()==0)
     return true;
  else if(str1.length()==0)
     return false;
  else if(str1.charAt(0)!=str2.charAt(0))
     return false;
  else
     return startsWith(str1.substring(1),str2.substring(1));
 }

since str.substring(1) returns the string which is all of str except the first character. So we have the method startsWith making a call to the method startsWith, this "method calling itself" is what is referred to as "recursion". You can find this code and supporting code to run it in the file Test11.java. Although it may seem odd if you are not used to this style of programming, it works. If you want to think through why it works, remember the call of startsWith inside the code for startsWith will execute in its own environment, as we discussed previously. This will be one where there is a separate variable called str1 which holds the value of str1.substring(1) from the previous environment, and a separate variable called str2 which holds the value of str2.substring(1) from the previous environment. A new environment will be created each time "the method calls itself", but this won't go on forever because we'll eventually get to the case where either str1 is the empty string or str2 is the empty string, or the two strings have initial characters which are not equal. When a method which contains a recursive call has arguments which mean the recursive call is not made, it is known as a base case. With the method startsWith, the base cases are when either string is of length 0, and when the first character of one string is not equal to the first character of the other.

For comparison, here is the same operation performed using loops (which to contrast with "recursion" we call iteration):

 public static boolean startsWith(String str1,String str2)
 // Returns true if str1 starts with str2, false otherwise
 {
  int i;
  for(i=0 ;i<str1.length()&&i<str2.length(); i++)
      if(str1.charAt(i)!=str2.charAt(i))
         break;
  return i==str2.length();
 }

You can find this and supporting code to run it in the file Test10.java. Another way of doing it using loops is more similar to the recursive way:

 public static boolean startsWith(String str1,String str2)
 // Returns true if str1 starts with str2, false otherwise
 {
  while(str2.length()!=0&&str1.length()!=0&&str1.charAt(0)==str2.charAt(0))
      {
       str1=str1.substring(1);
       str2=str2.substring(1);
      }
  return str2.length()==0;
 }

You can find this and supporting code to run it in the file Test12.java. As you can see, it has a loop where the condition to stay in the loop is the opposite of the base case conditions in the recursive version. Instead of setting up a separate environment with new variables str1 and str2, the existing variables of these names are changed to hold the values the variables of the names would have in the recursive call. The value the method returns is the same value the base case returns in the recursive version (true if the length of str2 is 0, false otherwise).

Not all recursive code can be converted so easily to iterative code. The startsWith code is an example of what is called tail recursion, which is when the return statement has just a recursive call as its return value. A more complicated example is when something is done with the result of the recursive call to get the return value. For example, above, we saw an iterative method which took a string and two characters and returned the string resulting from changing all occurrences of the first character to the second. A recursive method which does the same thing is given below:

 public static String replace(String str,char ch1,char ch2)
 // Returns the string resulting from replacing all occurrences
 // of ch1 by ch2 in str.
 {
  if(str.equals(""))
     return "";
  else
     {
      String str1 = replace(str.substring(1),ch1,ch2);
      if(str.charAt(0)==ch1)
         return ch2+str1;
      else
         return str.charAt(0)+str1;
     }
 }

You can find this and supporting code to run it in the file Test12.java. Thinking about this code logically, if you want to change all occurrences of ch1 to ch2 in a string, if the string is empty, you just return the empty string. Otherwise, you get the result of changing all occurences of ch1 to ch2 in the string which consists of all but the first character of str. Then, if it happens the first character of the original string is ch1, you add ch2 to the front, otherwise you add the first character of the original string to the front.

The reason this can't be converted so easily into a loop is that you need to keep the old value of the string argument variable, str in this case, for use after the recursive call, so you cannot just reassign the variable and thus lose its old value. Tail recursion can convert easily into iteration, because we do not need to go back and use values from previous environments. With recursion where something is done after each recursive call, it will be done in the previous environment as it is returned to.

Matthew Huntbach

Last modified: 27 January 2006