String as a type in Java.
What you may not yet have grasped is that a value of type String
in Java is actually an object. So values of type String
are like values of type DrinksMachine which we saw
previously: you can create new
objects of type String using the constructors in class
String, and you can call the non-static methods in
class String with the calls attached to references to
String objects.
However, DrinksMachine is not a built-in type of Java:
to use it you had to have a copy of the file DrinksMachine.class
in your directory. The class String is provided as part
of Java, so it is always available for you to use in any Java program you
write. To use most types provided with Java you have to use an
import statement (we mentioned these briefly
here), but a few are regarded
as so common that you don't even have to import them. String is
one of these (the others are all those in the java.lang package).
In fact every type in Java defines a type of object and has
an associated class with it, with the exception of the numerical
types, int, double and so on, the type
boolean and the type char. These non-object
types are known as the primitive types.
One reason you have not seen strings as objects is that, unlike any other
type of object, there is a special way of representing string values,
that is the representation using the quote characters enclosing the
characters that make up the string. You can also make new string
objects by Java's use of + to mean string joining. So
there is little need to create new strings using a constructor.
However, for example, there is a constructor in class String
which takes an array of characters as its argument and produces a
string where each character in the string is equal to the character
in the same position in the array. So if a is a variable
of type char[], and has been set to refer to an array
object, then new String(a) will return the
equivalent string.
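As a small sketch of this constructor in use (the class name CharArrayToString and the method name fromChars are made up for illustration):

```java
class CharArrayToString
{
    // Builds a String from a char array using the String(char[]) constructor.
    public static String fromChars(char[] a)
    {
        return new String(a);
    }

    public static void main(String[] args)
    {
        char[] a = {'h', 'a', 'p', 'p', 'y'};
        String str = fromChars(a);
        System.out.println(str);   // prints happy
        a[0] = 'n';                // changing the array afterwards...
        System.out.println(str);   // ...does not affect the string: still prints happy
    }
}
```

The constructor copies the characters, so the string and the array are separate objects from then on.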
Strings in Java (unlike, for example, the language C) are not the same
thing as arrays of characters. If we have a variable a of
type char[] and a variable str of type
String, then for any expression i
which evaluates to an integer, a[i] gives
the ith character of the array referred to by
a, while str.charAt(i) gives the
ith character of the string referred to by
str. As we saw here,
we can use a[i] as a variable, so we can put
it on the left hand side of an assignment statement and assign a value to it.
For example, a[3]='x' will change the fourth character of the
array referred to by a (remembering the first character
is indexed by 0) to lower-case 'x'. However,
str.charAt(i) is a method call, and you cannot
assign a value to a method call. The Java compiler will give an
error message if you attempt to compile code with the statement
str.charAt(3)='x' in it. So string objects can't
be changed in the way array objects can be changed. In fact string
objects are immutable: they cannot be changed at all, you can only
create new ones.
A minor but perhaps rather annoying difference between arrays and strings
in Java is that the length of an array a is given by
a.length and the length of a string str is given by
str.length(). The difference is the () at
the end of str.length() which indicates this is just a
method call, the zero-argument method length, called on the
object str.
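The differences between arrays of characters and strings described above can be seen in the following sketch. The class name ArrayVersusString and the helper method withCharAt are invented here for illustration; withCharAt is just one possible way of getting "a string with one character changed" by building a new string.

```java
class ArrayVersusString
{
    // Returns a new string like str but with ch at position i.
    // The string str refers to is untouched, because strings are immutable.
    public static String withCharAt(String str, int i, char ch)
    {
        return str.substring(0, i) + ch + str.substring(i + 1);
    }

    public static void main(String[] args)
    {
        char[] a = {'h', 'e', 'l', 'l', 'o'};
        String str = "hello";
        a[3] = 'x';                              // allowed: a[3] is a variable
        // str.charAt(3) = 'x';                  // would not compile
        String str2 = withCharAt(str, 3, 'x');   // build a new string instead
        System.out.println(new String(a));       // prints helxo
        System.out.println(str + " " + str2);    // prints hello helxo
        System.out.println(a.length + " " + str.length());   // prints 5 5
    }
}
```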
The official documentation for the class String in Java is provided
here. You should be aware that as this is the official documentation, it
has to cover every aspect, including rarely used methods and methods
which only make sense when you have become familiar with more
advanced aspects of Java. It would be a good idea to get used to using
this official documentation, but for the purposes of this course don't
feel you have to become familiar with the vast range of classes and
methods that are available. We will cover just a small number of them in
this course, either because they are particularly fundamental, or because
they illustrate an important point. Remember, the main purpose of this
course is to introduce concepts of algorithms and data structures
rather than to concentrate on detailed aspects of the Java language.
As an example, the method replace acts in a
similar way to the method change we discussed
previously with arrays.
It takes a string and two characters and returns the result
of replacing all occurrences of the first character by the
second character in the string. It does this constructively
rather than destructively, that is it returns a new string
representing the change, it does not change the string it is called
on. In fact the class String has no destructive
methods, this ensures that String objects are
immutable, there is no method that can be called on them to
change them, there are only methods that can be called on them to
return new strings representing a change.
With the constructive method change we developed
for arrays of integers, if a1 and a2
were of type int[] and n1 and
n2 were of type int then the
statement a2=change(a1,n1,n2) causes a2
to be set to an array which represents the effect of changing
all occurrences of the integer n1 to the integer
n2 in the array a1. With the method
replace in class String, if
str1 and str2 are variables of type
String and ch1 and ch2
are variables of type char, then the statement
str2=str1.replace(ch1,ch2) causes str2
to be set to a string which represents the effect of changing
all occurrences of the character ch1 to the
character ch2 in the string str1.
For example, if str1 holds a reference to
the string "happy", and ch1 holds the character
'p' and ch2 holds the character 'r',
then str1.replace(ch1,ch2) will return a reference to
the string "harry".
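A short sketch of this example, showing that the original string is left unchanged (the class name ReplaceDemo is made up for illustration):

```java
class ReplaceDemo
{
    public static void main(String[] args)
    {
        String str1 = "happy";
        char ch1 = 'p', ch2 = 'r';
        String str2 = str1.replace(ch1, ch2);
        System.out.println(str2);   // prints harry
        System.out.println(str1);   // prints happy: str1 still refers to the original string
    }
}
```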
The difference in construct between str1.replace(ch1,ch2)
and change(a1,n1,n2) is because our change
was a static method while the replace method in class
String is not a static method. So the array being changed had
to be passed as one of the arguments to the method change,
and this method was not called attached to any object. The method
replace has to be called attached to a String
object, which is the object to be changed, and this is done instead
of the object being passed as one of the arguments to the method call.
Other useful methods in class String which return
strings based on transforming the string they are called on include:
trim returns the string with all blank characters at
either end removed. So if str represents
" Hello World " then the call str.trim()
will return the string "Hello World".
toUpperCase returns the string with all lower case
letters replaced by the equivalent upper case letters. So if str
represents "Hello World" then
str.toUpperCase() will return the string "HELLO WORLD".
toLowerCase returns the string with all upper case
letters replaced by the equivalent lower case letters. So if str
represents "Hello World" then
str.toLowerCase() will return the string "hello world".
substring takes two integers as arguments and returns
the portion of the string it is called on which starts with the character
indexed by the first integer and goes up to but does not include the
character indexed by the second integer. So if str
represents "Hello world", and p1 stores
3 and p2 stores 8 then
str.substring(p1,p2) will return "lo wo".
A second version of substring takes just one integer argument
and returns the portion of the string indexed by that integer argument
up to the end of the string. So with the same value of str
and with p storing 4, the call
str.substring(p) would return "o world".
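The examples given for these four methods can be collected into one small program. This is only a sketch; the class name TransformDemo and the variable names are made up for illustration.

```java
class TransformDemo
{
    public static void main(String[] args)
    {
        String padded = "  Hello World  ";
        System.out.println("[" + padded.trim() + "]");   // prints [Hello World]

        String str = "Hello World";
        System.out.println(str.toUpperCase());           // prints HELLO WORLD
        System.out.println(str.toLowerCase());           // prints hello world

        String str2 = "Hello world";
        System.out.println(str2.substring(3, 8));        // prints lo wo
        System.out.println(str2.substring(4));           // prints o world

        System.out.println(str);                         // prints Hello World: unchanged
    }
}
```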
(Java declares the class String to be
final, meaning it can't be extended by inheritance anyway.)
Here is an example of a static method which has the effect of changing
one character to another in a string:
static String replace(String str1,char ch1,char ch2)
// Returns the string resulting from replacing all occurrences
// of ch1 by ch2 in str1.
{
    String str2="";
    for(int i=0; i<str1.length(); i++)
        if(str1.charAt(i)==ch1)
            str2=str2+ch2;
        else
            str2=str2+str1.charAt(i);
    return str2;
}
So if str references "happy" and
c1 stores 'p' and c2 stores
'r' then replace(str,c1,c2) will
return "harry".
Where methods to do a particular job already exist in the Java code library, it makes sense to use them rather than write our own method to do the job. Not only does it save us the time and effort, it makes our code easier to follow by those already familiar with the Java code library, and we can be sure that the distributors of Java have implemented their methods in the best way possible and they will be efficient and free of errors. In some cases in this course, however, we will show a method that does a job where there is already a method in the Java library to do the job. This will be done because looking at the code for the method ourselves helps develop coding skills and an understanding of algorithms and data structures. It should not be regarded as good practice in general to write our own code when we can call code that already exists in the Java library. When studying at university, however, do not confuse software re-use with plagiarism.
The method indexOf in class String takes
a character and returns the position of the lowest indexed occurrence of that
character in the string it is called on, or -1 if the
character does not occur in the string. So it is very similar to the
method we gave previously for
finding the position of an integer in an array of integers. Once again,
as it is not a static method, str.indexOf(ch) gives the
position of character ch in string str whereas
with our static method for arrays of integers, position(a,n)
gave the position of integer n in array a. A
separate method lastIndexOf in class String
gives the position of the highest indexed occurrence of a character
in a string. So if str is "Hello World"
then str.indexOf('o') returns 4 and
str.lastIndexOf('o') returns 7.
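A minimal sketch of these calls, including the case where the character is not found (the class name IndexOfDemo is made up for illustration):

```java
class IndexOfDemo
{
    public static void main(String[] args)
    {
        String str = "Hello World";
        System.out.println(str.indexOf('o'));       // prints 4
        System.out.println(str.lastIndexOf('o'));   // prints 7
        System.out.println(str.indexOf('z'));       // prints -1: there is no 'z' in str
    }
}
```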
The methods startsWith and endsWith both
take string arguments and are called on strings, and return boolean
values as indicated by their names. So
str1.startsWith(str2) returns true if
str1 starts with str2 and false
otherwise, and str1.endsWith(str2) returns true if
str1 ends with str2 and false
otherwise. So if str1 is "Hello World",
str2 is "Hell" and str3 is
"rld", then str1.startsWith(str2) evaluates
to true, str1.startsWith(str3) evaluates to
false, str1.endsWith(str3) evaluates to true
and so on.
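The same example written out as a sketch (the class name StartsEndsDemo is made up for illustration):

```java
class StartsEndsDemo
{
    public static void main(String[] args)
    {
        String str1 = "Hello World";
        String str2 = "Hell";
        String str3 = "rld";
        System.out.println(str1.startsWith(str2));   // prints true
        System.out.println(str1.startsWith(str3));   // prints false
        System.out.println(str1.endsWith(str3));     // prints true
        System.out.println(str1.endsWith(str2));     // prints false
    }
}
```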
An important method in class String is compareTo,
which like startsWith and endsWith takes
another string as an argument. It is used to compare strings
alphabetically, so str1.compareTo(str2) says whether
str1 comes before str2 in standard alphabetic
ordering or not. You might suppose that compareTo would
return a boolean value, but its return type is actually int.
This is so it can be used to distinguish three possibilities from the
call str1.compareTo(str2): either str1 is before
str2 alphabetically, or str2 is before str1
alphabetically, or str1 and str2 are equal.
If str1 and str2 are equal, the call
str1.compareTo(str2) returns 0. It returns a
negative integer if str1 is before str2
alphabetically, and a positive integer if str1 is after
str2 alphabetically. You can find the complete definition of the
String method compareTo in the Java documentation
here.
Many other classes also have their own compareTo method which
works similarly to the one in class String and enables
objects of their class to be considered in some sort of ordering with
each other.
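The three possible outcomes of compareTo can be sketched as follows. The class name CompareDemo and the helper method describe are made up for illustration. Note that only the sign of the result should be relied on, and that the comparison is based on character codes, so an upper case letter counts as coming before any lower case letter.

```java
class CompareDemo
{
    // Describes where str1 comes relative to str2 in the ordering used by compareTo.
    public static String describe(String str1, String str2)
    {
        int c = str1.compareTo(str2);
        if (c < 0)
            return "before";
        else if (c > 0)
            return "after";
        else
            return "equal";
    }

    public static void main(String[] args)
    {
        System.out.println(describe("apple", "banana"));   // prints before
        System.out.println(describe("pear", "peach"));     // prints after
        System.out.println(describe("pear", "pear"));      // prints equal
        System.out.println(describe("Zebra", "apple"));    // prints before: 'Z' has a lower code than 'a'
    }
}
```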
The class String also has its own version of
the method equals so that str1.equals(str2)
returns true if str1 and str2
have exactly the same characters in exactly the same order, and false
otherwise. This is not the same as str1==str2 because that only
evaluates to true if str1 and str2
refer to the same object - it evaluates to false in all other
circumstances, even if str1 and str2 refer to
different objects which happen to have the same characters in the same
order as each other.
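The difference between equals and == can be seen in the following sketch (the class name EqualsDemo is made up for illustration). The constructor call new String("hello") is used here only to make sure two separate objects with the same characters are created.

```java
class EqualsDemo
{
    public static void main(String[] args)
    {
        String str1 = new String("hello");
        String str2 = new String("hello");   // a different object, same characters
        String str3 = str1;                  // another reference to the same object

        System.out.println(str1.equals(str2));   // prints true
        System.out.println(str1 == str2);        // prints false
        System.out.println(str1 == str3);        // prints true
    }
}
```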
The reverse of the constructor for class String which
takes an array of characters as its argument and produces the equivalent
String object is the method toCharArray.
If str references a string, then str.toCharArray()
returns an object of type char[], that is an array of
characters of the same length as the string, where the characters in
the array are the same and indexed in the same order as the characters in the
string.
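One use of toCharArray together with the array-of-characters constructor is to make changes in an array and then turn the result back into a string. The following is just a sketch of that idea; the class name ReverseDemo and the method reverse are made up for illustration.

```java
class ReverseDemo
{
    // Returns a new string holding the characters of str in reverse order.
    public static String reverse(String str)
    {
        char[] a = str.toCharArray();          // a copy of the characters of str
        for (int i = 0, j = a.length - 1; i < j; i++, j--)
        {
            char temp = a[i];                  // swap a[i] and a[j]
            a[i] = a[j];
            a[j] = temp;
        }
        return new String(a);                  // back to a string
    }

    public static void main(String[] args)
    {
        String str = "happy";
        System.out.println(reverse(str));   // prints yppah
        System.out.println(str);            // prints happy: str is unchanged
    }
}
```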
You can find code which demonstrates some of these methods in the directory
~mmh/DCS128/code/strings. See the files
Test1.java,
Test2.java,
Test3.java,
Test4.java,
Test5.java,
Test6.java and
Test7.java.
However, in this course we will use the input methods provided with Java
so that you are no longer reliant on non-standard code. Up till 2004,
the following was recommended as the way to use standard Java to read text
from a console window. You created an object of the type
BufferedReader using the following construction:
BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
This is a standard variable declaration combined with an initialisation. The variable is called
input, but it could be called
anything. It is set to a new BufferedReader object created
with a constructor; that constructor takes a new InputStreamReader
object as its argument, and the constructor for that takes the special
value System.in as its argument. But you need not
worry why this is done. All you need to know is that the object
referred to by the variable input created in this
way has two zero-argument methods that can be called on it,
read and readLine. The first reads a single
character, and the second reads a whole line of characters. So
input.read() returns the next character typed (as a value of type int, or -1 at the end of the input), while
input.readLine() returns a string consisting of all the
characters which have been typed in but not yet read up to but not
including the end-of-line character, and it will wait until the
end-of-line character is typed in before returning.
The problem with this is that it only reads single characters or whole
lines. If you wanted it to read an integer, you would have to write
code which got the user to type the integer on a line of its own, and write
code to convert the string of characters returned by readLine()
into an integer. If you wanted to read several integers on one line,
you would have to write code to break the string representing the
whole line into smaller strings representing the integers and then
code to convert them. Also the methods read and
readLine may throw a checked exception of type
IOException. So any method which contains a call to
either of them must have throws IOException added
to its header unless it uses Java's exception-catching mechanism to
catch the exception.
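To illustrate the extra work involved, here is a sketch of reading an integer with a BufferedReader. The class name ReadIntWithBufferedReader and the method readIntLine are made up for illustration, and the conversion uses the static method Integer.parseInt from the Java library. The sketch assumes the user types just an integer on a line of its own; it makes no attempt to deal with other input, or with the end of the input being reached.

```java
import java.io.*;

class ReadIntWithBufferedReader
{
    // Reads one line from input and converts it to an int.
    // Assumes the line holds just an integer, possibly with spaces around it.
    public static int readIntLine(BufferedReader input) throws IOException
    {
        String line = input.readLine();
        return Integer.parseInt(line.trim());
    }

    public static void main(String[] args) throws IOException
    {
        BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Enter an integer: ");
        int n = readIntLine(input);
        System.out.println("Twice that is " + (2 * n));
    }
}
```

Both methods need throws IOException in their headers because readLine may throw that checked exception.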
The classes BufferedReader, InputStreamReader
and IOException are all in the Java library package
java.io. So they need to be imported to any file which
uses them; this can be done by adding the line
import java.io.*;
at the beginning of the file. This has the effect of importing all the classes from the package
java.io.
The version of Java introduced in 2004, officially known as
"Java 2 Platform Standard Edition 5.0" or
J2SE 5.0, introduced
something which really ought to have been in the earlier versions
of Java, a simple reader object of the class Scanner. Had this been done earlier on,
authors of introductory programming textbooks could have used it
as the standard rather than introducing their own simple reading
mechanisms. We have already used this in our example code, you
saw it first here.
An object of type Scanner is created by a call to one
of its constructors; if the argument to the call is System.in
it creates a Scanner object which reads from the
console window. As we saw from our example, the class Scanner
is in the java.util package, but nothing else from that
package is required to make it work, so importing just that class with
import java.util.Scanner;
is enough to make it available for use. We can declare a variable of type
Scanner called input and set it
to refer to a Scanner object which reads from the console
window by:
Scanner input = new Scanner(System.in);
Then
input.nextInt() reads the next text typed in up to the next
blank space or end-of-line, and returns the integer that it represents.
If it happens that the next text typed in does not represent an
integer, it throws an exception of type
InputMismatchException. However, it is possible to test
in advance of reading whether the next text typed in represents an
integer; this would be done by the call input.hasNextInt()
which returns a boolean.
If you wanted to read a value of type double there is the
similar method nextDouble to read the next text typed in,
convert it to a double value and return that value, together
with hasNextDouble which tests whether the next unread
text can be interpreted as a double value. The method
next returns the next text typed in, starting with the
next non-blank character and up to but not including the following
blank character, as a string. Here "blank character" means either a
space or an end-of-line. The method nextLine is equivalent
to the method readLine for a BufferedReader
object, it returns the whole line typed as a string value. The methods
of Scanner do not throw checked exceptions, so unlike
BufferedReader, there is no need to add exception catching
or throws annotations to the code. However, Scanner
does not provide a method like BufferedReader's
read which reads and returns a single character.
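As a sketch of how hasNextInt and nextInt work together, the following method adds up integers for as long as the next piece of text represents one. The class name SumInts and the method sumInts are made up for illustration.

```java
import java.util.Scanner;

class SumInts
{
    // Adds up integers read from input, stopping at the first piece of
    // text which is not an integer, or at the end of the input.
    public static int sumInts(Scanner input)
    {
        int sum = 0;
        while (input.hasNextInt())
            sum += input.nextInt();
        return sum;
    }

    public static void main(String[] args)
    {
        Scanner input = new Scanner(System.in);
        System.out.println("Enter some integers, then any other text to finish: ");
        System.out.println("Sum: " + sumInts(input));
    }
}
```

Because hasNextInt is always checked first, nextInt is never called on text that would make it throw InputMismatchException, and no throws annotation is needed.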
A Scanner object can split a line of
text into individual words treating space characters as "word
delimiters" rather than as part of the words. In fact, it is possible to
use a Scanner object just to split up a string rather
than to read text and split it up. This is done by using an alternative
constructor for Scanner which takes as its argument
a string rather than a reference to the input from the console window.
Below is some simple code
(in the file
TestScanner1.java
in the directory
~mmh/DCS128/code/strings)
which shows using BufferedReader
to read a line of text, then Scanner to break it into words:
import java.util.Scanner;
import java.io.*;

class TestScanner1
{
    public static void main(String[] args) throws IOException
    {
        BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Enter some words: ");
        String words = input.readLine();
        Scanner splitter = new Scanner(words);
        System.out.println("\nThe words you entered are: ");
        while(splitter.hasNext())
            System.out.println(splitter.next());
    }
}
The method hasNext on a Scanner object
returns true until the point where the Scanner
object has gone through all its input and reached the end.
A similar effect can be obtained by the use of the method
split in class String. Calling
split on an object of class String returns
an object of type String[], that is an array of strings,
where the strings in the array are the separate words the string
it was called on breaks up into. So if the variable str
refers to the string "Hello world, how are you?",
the variable declaration and assignment
String[] words = str.split(" ");
causes the variable words to refer to an array of strings
of length 5, where words[0] is "Hello",
words[1] is "world,",
words[2] is "how",
words[3] is "are" and
words[4] is "you?".
Here the punctuation symbols are treated as normal characters and so
are treated as part of the words they are next to. The
argument to the method split tells it what it
must treat as word delimiters. So str.split(" ")
tells it to treat the space character as spacing between words, and
all other characters as normal characters. If we made it
str.split(",") it would tell it to split the string
up treating the comma character as the separating
characters. So in the above example, it would split the string into
two parts, words[0] would be "Hello world"
and words[1] would be " how are you?". The full rules
for how this argument to split work use a concept known
as "regular expressions", which we won't go into here.
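The two calls of split discussed above can be tried out with the following sketch (the class name SplitDemo is made up for illustration):

```java
class SplitDemo
{
    public static void main(String[] args)
    {
        String str = "Hello world, how are you?";

        String[] words = str.split(" ");       // space as the delimiter
        System.out.println(words.length);      // prints 5
        for (int i = 0; i < words.length; i++)
            System.out.println("[" + words[i] + "]");

        String[] parts = str.split(",");       // comma as the delimiter
        System.out.println(parts.length);      // prints 2
        System.out.println("[" + parts[0] + "]");   // prints [Hello world]
        System.out.println("[" + parts[1] + "]");   // prints [ how are you?]
    }
}
```

The square brackets are printed only to make any leading space visible, as in the second part of the string split on the comma.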
In older notes on Java, you may see reference to and use of objects
of class StringTokenizer to split strings of words into
individual words. The class StringTokenizer is found in
the package java.util. A StringTokenizer
object is created from a constructor which takes a string as its
argument. Then it works like a Scanner object created
with a string argument constructor, except that the method to obtain
the next word is nextToken, while the method to check
whether there are more words to return is hasMoreTokens.
One thing StringTokenizer has which Scanner
does not is the method countTokens which returns the
number of words still to be returned (so, before any call of
nextToken, the number of words in the string). On the other hand, Scanner when constructed
with a string argument still has the methods nextInt and
hasNextInt, which enable it to be checked whether
the next word represents an integer and to return its integer
representation; StringTokenizer doesn't have
an equivalent. The process of splitting a string into its individual words
is known as string tokenizing, but the class StringTokenizer
has been superseded by the more general class Scanner and
the String method split.
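For comparison, here is a sketch which splits the same string first with a StringTokenizer and then with a Scanner. The class name TokenizerDemo and the method countWords are made up for illustration.

```java
import java.util.Scanner;
import java.util.StringTokenizer;

class TokenizerDemo
{
    // Returns the number of words in str, using StringTokenizer's countTokens.
    public static int countWords(String str)
    {
        return new StringTokenizer(str).countTokens();
    }

    public static void main(String[] args)
    {
        String str = "Hello world, how are you?";
        System.out.println(countWords(str));    // prints 5

        StringTokenizer tokens = new StringTokenizer(str);
        while (tokens.hasMoreTokens())           // older style
            System.out.println(tokens.nextToken());

        Scanner splitter = new Scanner(str);
        while (splitter.hasNext())               // same words, using Scanner
            System.out.println(splitter.next());
    }
}
```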
If you want to convert a string which represents an integer, perhaps obtained from a call of the
method split, to the
equivalent integer, you need to use the method parseInt
which is a static method in the class Integer.
Because the class Integer is in the java.lang package
it does not have to be imported. If you want to call a static method
from another class, the call is attached to the name of the class.
So if str refers to a string object all of whose characters
are numerical, then Integer.parseInt(str) returns
the value of type int it represents. If str
represents the string "6315" then
int n=Integer.parseInt(str) will declare the variable
n of type int and set it to the number
6315. You need to be aware that "6315" is
just the string of four characters '6', '3',
'1' and '5', it is an entirely separate thing
from the number 6315 which on the computer isn't even
represented using decimal digits. If you call Integer.parseInt(str),
but str is not a string which represents an integer, it
will cause an exception of type NumberFormatException to be thrown.
The static method parseDouble in class Double
works in a similar way to parseInt, but converts strings
that represent double values (that is, floating point numbers)
to the actual double value they represent.
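The following sketch shows these conversions, and one possible way of dealing with a string which does not represent an integer. The class name ParseDemo and the method parseOrDefault are made up for illustration; the method uses Java's exception-catching mechanism, which is not covered in detail here.

```java
class ParseDemo
{
    // Returns the int that str represents, or fallback if str
    // does not represent an integer.
    public static int parseOrDefault(String str, int fallback)
    {
        try
        {
            return Integer.parseInt(str);
        }
        catch (NumberFormatException e)
        {
            return fallback;
        }
    }

    public static void main(String[] args)
    {
        String str = "6315";
        int n = Integer.parseInt(str);
        System.out.println(n + 1);       // prints 6316: arithmetic on the number
        System.out.println(str + 1);     // prints 63151: joining on to the string

        double d = Double.parseDouble("2.5");
        System.out.println(d * 2);       // prints 5.0

        System.out.println(parseOrDefault("63x5", -1));   // prints -1
    }
}
```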
Wrapper Classes
The class Integer is one of the wrapper classes.
It is the wrapper class for the type int. There is
also the class Character which is the wrapper class for
the type char. The other primitive types all have wrapper
classes which are the same name as the primitive type except for an initial
upper-case letter, so Double is the wrapper class for the
type double.
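The wrapper classes also contain a number of static methods dealing with values
of the primitive type they wrap. For example, the static method
toBinaryString in class Integer takes an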
int argument and returns a string of '1' and
'0' characters representing its binary equivalent. So if
variable n of type int holds the value
21 then Integer.toBinaryString(n) will return
"10101". In the class Character, the static
method toUpperCase takes a value of type char
and returns its upper case equivalent, so if variable ch
holds 'q' then Character.toUpperCase(ch) will
return 'Q'. Class Character has various static
methods for determining the category of characters. For example,
Character.isUpperCase(ch) will return true
if ch stores an upper case letter, and false
otherwise.
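A sketch of these static methods in use is given below. The class name WrapperDemo and the method countUpperCase are made up for illustration.

```java
class WrapperDemo
{
    // Returns the number of upper case letters in str.
    public static int countUpperCase(String str)
    {
        int count = 0;
        for (int i = 0; i < str.length(); i++)
            if (Character.isUpperCase(str.charAt(i)))
                count++;
        return count;
    }

    public static void main(String[] args)
    {
        int n = 21;
        System.out.println(Integer.toBinaryString(n));      // prints 10101
        char ch = 'q';
        System.out.println(Character.toUpperCase(ch));      // prints Q
        System.out.println(Character.isUpperCase(ch));      // prints false
        System.out.println(countUpperCase("Hello World"));  // prints 2
    }
}
```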
In the directory ~mmh/DCS128/code/strings the file
Test8.java
demonstrates the toBinaryString method of class
Integer and
Test9.java
demonstrates some methods from class Character.
Recursion with Strings
An alternative way of programming with strings to the way using loops
we considered previously is to use
recursion. We shall consider recursion in more detail when we
look at its use with Lisp lists in the
next section. Recursion means
thinking of a solution to a problem in terms of a solution to a
smaller version of the same problem. For example,
above we considered the test
that one string starts with another. If we had to program a static
method to test whether str1 starts with str2,
one way we could think of it is as follows. If str2 is the
empty string, obviously str1 does start with str2.
If str1 is the empty string, and we have already tested
str2 and found it is not the empty string, then obviously
str1 does not begin with str2.
If str1 and str2 start with different characters,
then obviously str1 cannot start with str2, we
need consider no further. If str1 and str2
start with the same character, then str1 starts with
str2 if the string consisting of everything except the first
character of str1 starts with the string consisting of
everything except the first character of str2. So, for
example, we can tell that "woodpecker" starts with
"wood" by first noting both have the same first character
'w', and then testing that "oodpecker" starts
with "ood". This leads to the following code:
public static boolean startsWith(String str1,String str2)
// Returns true if str1 starts with str2, false otherwise
{
    if(str2.length()==0)
        return true;
    else if(str1.length()==0)
        return false;
    else if(str1.charAt(0)!=str2.charAt(0))
        return false;
    else
        return startsWith(str1.substring(1),str2.substring(1));
}
since str.substring(1) returns the string which is all
of str except the first character. So we have the method
startsWith making a call to the method startsWith,
this "method calling itself" is what is referred to as "recursion".
You can find this code and supporting code to run it in the file
Test11.java.
Although it may seem odd if you are not used to this style of programming,
it works. If you want to think through why it works, remember the call of
startsWith inside the code for startsWith will
execute in its own environment, as we discussed
previously.
This will be one where there is a separate variable called
str1 which holds the value of str1.substring(1)
from the previous environment, and a separate variable called
str2 which holds the value of str2.substring(1)
from the previous environment. A new environment will be created each time
"the method calls itself", but this won't go on forever because we'll
eventually get to the case where either str1 is the empty
string or str2 is the empty string, or the two strings
have initial characters which are not equal. When a method which
contains a recursive call has arguments which mean the recursive call
is not made, it is known as a base case. With the method
startsWith, the base cases are when either string is
of length 0, and when the first character of one string is not
equal to the first character of the other.
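For comparison, here is the same operation performed using loops
(which to contrast with "recursion" we call iteration):
public static boolean startsWith(String str1,String str2)
// Returns true if str1 starts with str2, false otherwise
{
    int i;
    for(i=0; i<str1.length()&&i<str2.length(); i++)
        if(str1.charAt(i)!=str2.charAt(i))
            break;
    return i==str2.length();
}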
You can find this and supporting code to run it in the file
Test10.java.
Another way of doing it using loops is more similar to the
recursive way:
public static boolean startsWith(String str1,String str2)
// Returns true if str1 starts with str2, false otherwise
{
    while(str2.length()!=0&&str1.length()!=0&&str1.charAt(0)==str2.charAt(0))
    {
        str1=str1.substring(1);
        str2=str2.substring(1);
    }
    return str2.length()==0;
}
You can find this and supporting code to run it in the file
Test12.java. As
you can see, it has a loop where the condition to stay in the loop
is the opposite of the base case conditions in the recursive version.
Instead of setting up a separate environment with new variables
str1 and str2, the existing variables
of these names are changed to hold the values the variables of the
names would have in the recursive call. The value the method returns
is the same value the base case returns in the recursive version
(true if the length of str2 is 0,
false otherwise).
Not all recursive code can be converted so easily to iterative
code. The startsWith code is an example of what
is called tail recursion, which is when the
return statement has just a recursive call as its return value.
A more complicated example is when something is done with the result of
the recursive call to get the return value.
For example, above, we
saw an iterative method which took a string and two characters and
returned the string resulting from changing all occurrences of the
first character to the second. A recursive method which does the
same thing is given below:
public static String replace(String str,char ch1,char ch2)
// Returns the string resulting from replacing all occurrences
// of ch1 by ch2 in str.
{
    if(str.equals(""))
        return "";
    else
    {
        String str1 = replace(str.substring(1),ch1,ch2);
        if(str.charAt(0)==ch1)
            return ch2+str1;
        else
            return str.charAt(0)+str1;
    }
}
You can find this and supporting code to run it in the file
Test12.java.
Thinking about this code logically, if you want to change all occurrences
of ch1 to ch2 in a string, if the string is
empty, you just return the empty string. Otherwise, you get the result
of changing all occurrences of ch1 to
ch2 in the string which consists of all but the first
character of str. Then, if it happens the first character of
the original string is ch1, you add ch2
to the front, otherwise you add the first character of the original
string to the front.
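The reason this can't be converted so easily into a loop is
that you need to keep the old value of the string argument
variable, str in this case, for use after the recursive call,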
so you cannot just reassign the variable and thus lose its old value.
Tail recursion can convert easily into iteration, because we do not need
to go back and use values from previous environments. With recursion where
something is done after each recursive call, it will be done in the previous
environment as it is returned to.
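As a further sketch of recursion where something is done after the recursive call, here is a method which counts the occurrences of a character in a string, together with an iterative version for comparison. Neither method comes from the course code; the class name CountDemo and the method names count and countIterative are made up for illustration.

```java
class CountDemo
{
    // Recursive version. When the first character matches, 1 is added
    // to the result of the recursive call after that call has returned,
    // so this is not tail recursion.
    public static int count(String str, char ch)
    {
        if (str.length() == 0)
            return 0;
        else if (str.charAt(0) == ch)
            return 1 + count(str.substring(1), ch);
        else
            return count(str.substring(1), ch);
    }

    // Iterative version, with an extra variable to hold the running total.
    public static int countIterative(String str, char ch)
    {
        int total = 0;
        for (int i = 0; i < str.length(); i++)
            if (str.charAt(i) == ch)
                total++;
        return total;
    }

    public static void main(String[] args)
    {
        System.out.println(count("happy", 'p'));            // prints 2
        System.out.println(countIterative("happy", 'p'));   // prints 2
    }
}
```

The iterative version needs the extra variable total to do the job which, in the recursive version, is done by the additions waiting in each previous environment.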
Matthew Huntbach
Last modified: 27 January 2006