String
as a type in Java.
What you may not yet have grasped is that a value of type String
in Java is actually an object. So values of type String
are like values of type DrinksMachine
which we saw
previously: you can create new
objects of type String
using the constructors in class
String
, and you can call the non-static methods in
class String
with the calls attached to references to
String
objects.
However, DrinksMachine
is not a built-in type of Java,
to use it you had to have a copy of the file DrinksMachine.class
in your directory. The class String
is provided as part
of Java, it is always available for you to use in any Java program you
write. To use most types provided with Java you have to use an
import statement (we mentioned these briefly
here), but a few are regarded
as so common that you don't even have to import them, String
is
one of these (the others are all those in the java.lang
package).
In fact every type in Java defines a type of object and has
an associated class with it, with the exception of the numerical
types, int
, double
and so on, the type
boolean
and the type char
. These non-object
types are known as the primitive types.
One reason you have not seen strings as objects is that, unlike any other
type of object, there is a special way of representing string values,
that is the representation using the quote characters enclosing the
characters that make up the string. You can also make new string
objects by Java's use of +
to mean string joining. So
there is little need to create new strings using a constructor.
However, for example, there is a constructor in class String
which takes an array of characters as its argument and produces a
string where each character in the string is equal to the character
in the same position in the array. So if a
is a variable
of type char[]
, and has been set to refer to an array
object, then new String(a)
will return the
equivalent string.
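As a small sketch of the three ways of obtaining a string mentioned above (quoted literals, joining with +, and the constructor taking a char[]), consider the following. The class name StringCreationDemo and the helper fromChars are made up for this illustration; only the String constructor itself comes from the Java library.

```java
class StringCreationDemo {
    // Builds a string from a character array using the String(char[]) constructor.
    static String fromChars(char[] a) {
        return new String(a);
    }

    public static void main(String[] args) {
        char[] a = {'h', 'a', 'p', 'p', 'y'};
        String joined = "hap" + "py";      // quoted literals joined with +
        String fromArray = fromChars(a);   // constructor taking a char[]
        System.out.println(joined);
        System.out.println(fromArray);
        System.out.println(joined.equals(fromArray));
    }
}
```

Both strings contain the same characters in the same order, so the final line prints true.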
Strings in Java (unlike, for example, the language C) are not the same
thing as arrays of characters. If we have a variable a
of
type char[]
and a variable str
of type
String
, then for any expression i
which evaluates to an integer, a[i]
gives
the i
th character of the array referred to by
a
, while str.charAt(i)
gives the
i
th character of the string referred to by
str
. As we saw here,
we can use a[i]
as a variable, so we can put
it on the left hand side of an assignment statement and assign a value to it.
For example, a[3]='x'
will change the fourth character of the
array referred to by a
(remembering the first character
is indexed by 0
) to lower-case 'x'. However,
str.charAt(i)
is a method call, and you cannot
assign a value to a method call. The Java compiler will give an
error message if you attempt to compile code with the statement
str.charAt(3)='x'
in it. So string objects can't
be changed in the way array objects can be changed, in fact string
objects are immutable: they cannot be changed at all, you can only
create new ones.
A minor but perhaps rather annoying difference between arrays and strings
in Java is that the length of an array a
is given by
a.length
and the length of a string str
is given by
str.length()
. The difference is the ()
at
the end of str.length()
which indicates this is just a
method call, the zero-argument method length
, called on the
object str
.
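The contrast between arrays and strings described above can be sketched as follows. The helper withCharAt is a hypothetical name introduced only for this example: since a string cannot be changed, the nearest equivalent of a[3]='x' is to build a new string from pieces of the old one.

```java
class ArrayVersusString {
    // Returns a NEW string equal to str except that position i holds ch.
    // The string referred to by str is not changed.
    static String withCharAt(String str, int i, char ch) {
        return str.substring(0, i) + ch + str.substring(i + 1);
    }

    public static void main(String[] args) {
        char[] a = {'h', 'e', 'l', 'l', 'o'};
        String str = "hello";

        a[3] = 'x';                     // arrays can be changed in place
        // str.charAt(3) = 'x';         // would be rejected by the compiler

        String changed = withCharAt(str, 3, 'x');
        System.out.println(new String(a));   // helxo
        System.out.println(str);             // hello (unchanged)
        System.out.println(changed);         // helxo

        // length is a field of an array but a method of a string
        System.out.println(a.length + " " + str.length());   // 5 5
    }
}
```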
String
in Java
String
is provided
here. You should be aware that as this is the official documentation, it
has to cover every aspect, including rarely used methods and methods
which only make sense when you have become familiar with more
advanced aspects of Java. It would be a good idea to get used to using
this official documentation, but for the purposes of this course don't
feel you have to become familiar with the vast range of classes and
methods that are available. We will cover just a small number of them in
this course, either because they are particularly fundamental, or because
they illustrate an important point. Remember, the main purpose of this
course is to introduce concepts of algorithms and data structures
rather than to concentrate on detailed aspects of the Java language.
As an example, the method replace
acts in a
similar way to the method change
we discussed
previously with arrays.
It takes a string and two characters and returns the result
of replacing all occurrences of the first character by the
second character in the string. It does this constructively
rather than destructively, that is it returns a new string
representing the change, it does not change the string it is called
on. In fact the class String
has no destructive
methods, this ensures that String
objects are
immutable, there is no method that can be called on them to
change them, there are only methods that can be called on them to
return new strings representing a change.
With the constructive method change
we developed
for arrays of integers, if a1
and a2
were of type int[]
and n1
and
n2
were of type int
then the
statement a2=change(a1,n1,n2)
causes a2
to be set to an array which represents the effect of changing
all occurrences of the integer n1
to the integer
n2
in the array a1
. With the method
replace
in class String
, if
str1
and str2
are variables of type
String
and ch1
and ch2
are variables of type char
, then the statement
str2=str1.replace(ch1,ch2)
causes str2
to be set to a string which represents the effect of changing
all occurrences of the character ch1
to the
character ch2
in the string str1
.
For example, if str1
holds a reference to
the string "happy"
, and ch1
holds the character
'p'
and ch2
holds the character 'r'
,
then str1.replace(ch1,ch2)
will return a reference to
the string "harry"
.
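A short sketch of this example, which also checks that the original string is left untouched. The class name ReplaceDemo and the wrapper method swap are illustrative only; the work is done by the library method replace.

```java
class ReplaceDemo {
    // Thin wrapper around the library method, so the result can be inspected.
    static String swap(String str, char ch1, char ch2) {
        return str.replace(ch1, ch2);
    }

    public static void main(String[] args) {
        String str1 = "happy";
        char ch1 = 'p', ch2 = 'r';
        String str2 = swap(str1, ch1, ch2);
        System.out.println(str2);   // harry
        System.out.println(str1);   // happy: replace is constructive
    }
}
```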
The difference in construct between str1.replace(ch1,ch2)
and change(a1,n1,n2)
is because our change
was a static method while the replace
method in class
String
is not a static method. So the array being changed had
to be passed as one of the arguments to the method change
,
and this method was not called attached to any object. The method
replace
has to be called attached to a String
object, which is the object to be changed, and this is done instead
of the object being passed as one of the arguments to the method call.
Other useful methods in class String
which return
strings based on transforming the string they are called on include:
trim
returns the string with all blank characters at
either end removed. So if str
represents
" Hello World "
then the call str.trim()
will return the string "Hello World".
toUpperCase
returns the string with all lower case
letters replaced by the equivalent upper case letters. So if str
represents "Hello World"
then
str.toUpperCase()
will return the string "HELLO WORLD"
.
toLowerCase
returns the string with all upper case
letters replaced by the equivalent lower case letters. So if str
represents "Hello World"
then
str.toLowerCase()
will return the string "hello world"
.
substring
takes two integers as arguments and returns
the portion of the string it is called on which starts with the character
indexed by the first integer and goes up to but does not include the
character indexed by the second integer. So if str
represents "Hello world"
, and p1
stores
3
and p2
stores 8
then
str.substring(p1,p2)
will return "lo wo"
.
A second version of substring
takes just one integer argument
and returns the portion of the string indexed by that integer argument
up to the end of the string. So with the same value of str
and with p
storing 4
, the call
str.substring(p)
would return "o world"
.
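The examples in the list above can be collected into one small program. The class name TransformDemo and the method results are made up for the illustration; each array element is the result of one of the library calls just described.

```java
class TransformDemo {
    // Applies each method described above to its example string
    // and returns the results in the same order.
    static String[] results() {
        String str = "Hello world";
        return new String[] {
            " Hello World ".trim(),
            "Hello World".toUpperCase(),
            "Hello World".toLowerCase(),
            str.substring(3, 8),
            str.substring(4)
        };
    }

    public static void main(String[] args) {
        // Square brackets make any leading or trailing blanks visible.
        for (String s : results())
            System.out.println("[" + s + "]");
    }
}
```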
String
to be
final
meaning it can't be extended by inheritance anyway).
Here is an example of a static method which has the effect of changing
one character to another in a string:
static String replace(String str1,char ch1,char ch2)
// Returns the string resulting from replacing all occurrences
// of ch1 by ch2 in str1.
{
    String str2="";
    for(int i=0; i<str1.length(); i++)
        if(str1.charAt(i)==ch1)
            str2=str2+ch2;
        else
            str2=str2+str1.charAt(i);
    return str2;
}
So if
str
references "happy"
and
c1
stores 'p'
and c2
stores
'r'
then replace(str,c1,c2)
will
return "harry"
.
Where methods to do a particular job already exist in the Java code library, it makes sense to use them rather than write our own method to do the job. Not only does it save us time and effort, it makes our code easier to follow for those already familiar with the Java code library, and we can reasonably expect that the distributors of Java have implemented their methods carefully, so they are likely to be efficient and well tested. In some cases in this course, however, we will show a method that does a job where there is already a method in the Java library to do the job. This will be done because looking at the code for the method ourselves helps develop coding skills and an understanding of algorithms and data structures. It should not be regarded as good practice in general to write our own code when we can call code that already exists in the Java library. When studying at university, however, do not confuse software re-use with plagiarism.
The method indexOf
in class String
takes
a character and returns the position of the lowest indexed occurrence of that
character in the string it is called on, or -1
if the
character does not occur in the string. So it is very similar to the
method we gave previously for
finding the position of an integer in an array of integers. Once again,
as it is not a static method, str.indexOf(ch)
gives the
position of character ch
in string str
whereas
with our static method for arrays of integers, position(a,n)
gave the position of integer n
in array a
. A
separate method lastIndexOf
in class String
gives the position of the highest indexed occurrence of a character
in a string. So if str
is "Hello World"
then str.indexOf('o')
returns 4
and
str.lastIndexOf('o')
returns 7
.
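To make the comparison with the earlier array method concrete, here is a sketch of a hand-written static position method for strings alongside the library calls. The class name IndexOfDemo and the method position are assumptions made for this example; in practice you would simply call indexOf.

```java
class IndexOfDemo {
    // Hand-written static equivalent of str.indexOf(ch), for illustration only.
    static int position(String str, char ch) {
        for (int i = 0; i < str.length(); i++)
            if (str.charAt(i) == ch)
                return i;
        return -1;   // ch does not occur in str
    }

    public static void main(String[] args) {
        String str = "Hello World";
        System.out.println(str.indexOf('o'));       // 4
        System.out.println(str.lastIndexOf('o'));   // 7
        System.out.println(str.indexOf('z'));       // -1
        System.out.println(position(str, 'o'));     // 4
    }
}
```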
The methods startsWith
and endsWith
both
take string arguments and are called on strings, and return boolean
values as indicated by their names. So
str1.startsWith(str2)
returns true
if
str1
starts with str2
and false
otherwise, and str1.endsWith(str2)
returns true
if
str1
ends with str2
and false
otherwise. So if str1
is "Hello World"
,
str2
is "Hell"
and str3
is
"rld"
, then str1.startsWith(str2)
evaluates
to true
, str1.startsWith(str3)
evaluates to
false
, str1.endsWith(str3)
evaluates to true
and so on.
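The following sketch runs the example just given. The class name PrefixSuffixDemo and the helper framedBy (which simply combines the two tests) are hypothetical names used only here.

```java
class PrefixSuffixDemo {
    // True if str starts with prefix and also ends with suffix.
    static boolean framedBy(String str, String prefix, String suffix) {
        return str.startsWith(prefix) && str.endsWith(suffix);
    }

    public static void main(String[] args) {
        String str1 = "Hello World", str2 = "Hell", str3 = "rld";
        System.out.println(str1.startsWith(str2));      // true
        System.out.println(str1.startsWith(str3));      // false
        System.out.println(str1.endsWith(str3));        // true
        System.out.println(framedBy(str1, str2, str3)); // true
    }
}
```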
An important method in class String
is compareTo
,
which like startsWith
and endsWith
takes
another string as an argument. It is used to compare strings
alphabetically, so str1.compareTo(str2)
says whether
str1
comes before str2
in standard alphabetic
ordering or not. You might suppose that compareTo
would
return a boolean value, but its return type is actually int
.
This is so it can be used to distinguish three possibilities from the
call str1.compareTo(str2)
: either str1
is before
str2
alphabetically, or str2
is before str1
alphabetically, or str1
and str2
are equal.
If str1
and str2
are equal, the call
str1.compareTo(str2)
returns 0
. It returns a
negative integer if str1
is before str2
alphabetically, and a positive integer if str1
is after
str2
alphabetically. You can find the complete definition of the
String
method compareTo
in the Java documentation
here.
Many other classes also have their own compareTo
method which
works similarly to the one in class String
and enables
objects of their class to be considered in some sort of ordering with
each other.
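Since only the sign of the result of compareTo matters, a typical use looks like the sketch below. The class name CompareDemo and the method order are invented for the illustration.

```java
class CompareDemo {
    // Describes where str1 comes relative to str2, using only the
    // sign of the integer returned by compareTo.
    static String order(String str1, String str2) {
        int c = str1.compareTo(str2);
        if (c < 0)
            return "before";
        else if (c > 0)
            return "after";
        else
            return "equal";
    }

    public static void main(String[] args) {
        System.out.println(order("apple", "banana"));  // before
        System.out.println(order("pear", "peach"));    // after
        System.out.println(order("same", "same"));     // equal
        System.out.println(order("Zebra", "apple"));   // before
    }
}
```

The last line shows that the ordering is not quite dictionary ordering: compareTo compares character codes, and every upper case letter has a lower code than every lower case letter.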
The class String
also has its own version of
the method equals
so that str1.equals(str2)
returns true
if str1
and str2
have exactly the same characters in exactly the same order, and false
otherwise. This is not the same as str1==str2
because that only
evaluates to true
if str1
and str2
refer to the same object - it evaluates to false
in all other
circumstances, even if str1
and str2
refer to
different objects which happen to have the same characters in the same
order as each other.
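The difference between equals and == can be seen in the following sketch. The class name EqualsDemo and the helper sameObject are made up for the example; the second string is built with a constructor so that it is guaranteed to be a separate object.

```java
class EqualsDemo {
    // True only if both variables refer to the very same object.
    static boolean sameObject(String str1, String str2) {
        return str1 == str2;
    }

    public static void main(String[] args) {
        String str1 = "happy";
        String str2 = new String(new char[] {'h', 'a', 'p', 'p', 'y'});
        String str3 = str1;   // an alias: no new object is created

        System.out.println(str1.equals(str2));       // true: same characters
        System.out.println(sameObject(str1, str2));  // false: different objects
        System.out.println(sameObject(str1, str3));  // true: same object
    }
}
```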
The reverse of the constructor for class String
which
takes an array of characters as its argument and produces the equivalent
String
object is the method toCharArray
.
If str
references a string, then str.toCharArray()
returns an object of type char[]
, that is an array of
characters of the same length as the string, where the characters in
the array are the same and indexed in the same order as the characters in the
string.
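One use of toCharArray is to get a changeable copy of a string's characters, alter the copy destructively, and then construct a new string from it. The sketch below does this for character replacement; the names CharArrayDemo and replaceViaArray are assumptions made for the example.

```java
class CharArrayDemo {
    // Replaces ch1 by ch2 by way of a character array: the array is
    // changed in place, the original string is never changed.
    static String replaceViaArray(String str, char ch1, char ch2) {
        char[] a = str.toCharArray();
        for (int i = 0; i < a.length; i++)
            if (a[i] == ch1)
                a[i] = ch2;
        return new String(a);
    }

    public static void main(String[] args) {
        String str = "happy";
        System.out.println(replaceViaArray(str, 'p', 'r'));  // harry
        System.out.println(str);                             // happy
    }
}
```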
You can find code which demonstrates some of these methods in the directory
~mmh/DCS128/code/strings
. See the files
Test1.java
,
Test2.java
,
Test3.java
,
Test4.java
,
Test5.java
,
Test6.java
and
Test7.java
.
However, in this course we will use the input methods provided with Java
so that you are no longer reliant on non-standard code. Up till 2004,
the following was recommended as the way to use standard Java to read text
from a console window. You created an object of the type
BufferedReader
using the following construction:
BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
This is a standard variable declaration combined with an initialisation. The variable is called
input
, but it could be called
anything. It is set to a new BufferedReader
object created
with a constructor; that constructor takes a new InputStreamReader
object as its argument and the constructor for that takes the special
value System.in
as its argument. But you need not
worry why this is done. All you need to know is that the object
referred to by the variable input
created in this
way has two zero-argument methods that can be called on it,
read
and readLine
. The first reads a single
character, and the second reads a whole line of characters. So
input.read()
returns the next character typed (as an int character code), while
input.readLine()
returns a string consisting of all the
characters which have been typed in but not yet read up to but not
including the "end-of-line" character, and it will wait until the
end-of-line character is typed in before returning.
The problem with this is that it only reads single characters or whole
lines. If you wanted it to read an integer, you would have to write
code which got the user to type the integer on a line of its own, and write
code to convert the string of characters returned by readLine()
into an integer. If you wanted to read several integers on one line,
you would have to write code to break the string representing the
whole line into smaller strings representing the integers and then
code to convert them. Also the methods read
and
readLine
may throw a checked exception of type
IOException
. So any method which contains a call to
either of them must have throws IOException
added
to its header unless it uses Java's exception-catching mechanism to
catch the exception.
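To give an idea of the extra work involved, here is a sketch of reading several integers from one line with BufferedReader. The class name ReadIntsDemo and the method sumOfLine are hypothetical; the splitting is done by hand with indexOf, substring and trim, and the conversion uses the library method Integer.parseInt. It assumes every word on the line really does represent an integer.

```java
import java.io.*;

class ReadIntsDemo {
    // Breaks a line into words separated by spaces, converts each word
    // to an int and returns the total. An empty line gives 0.
    static int sumOfLine(String line) {
        int sum = 0;
        String rest = line.trim();
        while (rest.length() > 0) {
            int space = rest.indexOf(' ');
            String word;
            if (space == -1) {          // last word on the line
                word = rest;
                rest = "";
            } else {
                word = rest.substring(0, space);
                rest = rest.substring(space + 1).trim();
            }
            sum += Integer.parseInt(word);
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Enter some integers on one line: ");
        String line = input.readLine();
        if (line != null)               // readLine returns null at end of input
            System.out.println("Sum: " + sumOfLine(line));
    }
}
```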
The classes BufferedReader
, InputStreamReader
and IOException
are all in the Java library package
java.io
. So they need to be imported to any file which
uses them, this can be done by adding the line
import java.io.*;
at the beginning of the file. This has the effect of importing all the classes from the package
java.io
.
The version of Java introduced in 2004, officially known as
"Java 2 Platform Standard Edition 5.0" or
J2SE 5.0, introduced
something which really ought to have been in the earlier versions
of Java, a simple reader object. Had this been done earlier on,
authors of introductory programming textbooks could have used it
as the standard rather than introducing their own simple reading
mechanisms. We have already used this in our example code, you
saw it first here.
An object of type Scanner
is created by a call to one
of its constructors, if the argument to the call is System.in
it creates a Scanner
object which reads from the
console window. As we saw from our example, the class Scanner
is in the java.util
package, but nothing else from that
package is required to make it work, so importing just that class with
import java.util.Scanner;
is enough to make it available for use. We can declare a variable of type
Scanner
called input
and set it
to refer to a Scanner
object which reads from the console
window by:
Scanner input = new Scanner(System.in);
Then
input.nextInt()
reads the next text typed in up to the next
blank space or end-of-line, and returns the integer that it represents.
If it happens that the next text typed in does not consist of all
numerical characters it throws an exception of type
InputMismatchException
. However, it is possible to test
in advance of reading whether the next text typed in represents an
integer, this would be done by the call input.hasNextInt()
which returns a boolean.
If you wanted to read a value of type double
there is the
similar method nextDouble
to read the next text typed in,
convert it to a double
value and return that value, together
with hasNextDouble
which tests whether the next unread
text can be interpreted as a double
value. The method
next
returns the next text typed in, starting with the
next non-blank character and up to but not including the following
blank character, as a string. Here "blank character" means either a
space or an end-of-line. The method nextLine
is equivalent
to the method readLine
for a BufferedReader
object, it returns the whole line typed as a string value. The methods
of Scanner
do not throw checked exceptions, so unlike
BufferedReader
, there is no need to add exception catching
or throws
annotations to the code. However, Scanner
does not provide a method like BufferedReader
's
read
which reads and returns a single character.
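A common pattern with these methods is to test with hasNextInt before calling nextInt, so that InputMismatchException never arises. The sketch below adds up the integers typed in and skips anything else; the class name ScannerSumDemo and the method sumIntegers are made up for the illustration.

```java
import java.util.Scanner;

class ScannerSumDemo {
    // Adds up every integer the scanner can read, skipping over
    // any other words it meets.
    static int sumIntegers(Scanner input) {
        int sum = 0;
        while (input.hasNext()) {
            if (input.hasNextInt())
                sum += input.nextInt();   // safe, because we tested first
            else
                input.next();             // discard a word that is not an integer
        }
        return sum;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.println("Enter some integers, then end the input: ");
        System.out.println("Sum: " + sumIntegers(input));
    }
}
```

No throws clause is needed on main, in contrast to code which calls readLine on a BufferedReader.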
Scanner
object can split a line of
text into individual words treating space characters as "word
delimiters" rather than as part of the words. In fact, it is possible to
use a Scanner
object just to split up a string rather
than to read text and split it up. This is done by using an alternative
constructor for Scanner
which takes as its argument
a string rather than a reference to the input from the console window.
Below is some simple code
(in the file
TestScanner1.java
in the directory
~mmh/DCS128/code/strings
)
which shows using BufferedReader
to read a line of text, then Scanner
to break it into words:
import java.util.Scanner;
import java.io.*;

class TestScanner1
{
    public static void main(String[] args) throws IOException
    {
        BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Enter some words: ");
        String words = input.readLine();
        Scanner splitter = new Scanner(words);
        System.out.println("\nThe words you entered are: ");
        while(splitter.hasNext())
            System.out.println(splitter.next());
    }
}
The method
hasNext
on a Scanner
object
returns true
until the point where the Scanner
object has gone through all its input and reached the end.
A similar effect can be obtained by the use of the method
split
in class String
. Calling
split
on an object of class String
returns
an object of type String[]
, that is an array of strings,
where the strings in the array are the separate words the string
it was called on breaks up into. So if the variable str
refers to the string "Hello world, how are you?"
,
the variable declaration and assignment
String[] words = str.split(" ");
causes the variable
words
to refer to an array of strings
of length 5, where words[0]
is "Hello"
,
words[1]
is "world,"
,
words[2]
is "how"
,
words[3]
is "are"
and
words[4]
is "you?"
.
Here the punctuation symbols are treated as normal characters and so
are treated as part of the words they are next to. The
argument to the method split
tells it what it
must treat as word delimiters. So str.split(" ")
tells it to treat the space character as spacing between words, and
all other characters as normal characters. If we made it
str.split(",")
it would tell it to split the string
up treating the comma character as the separating
characters. So in the above example, it would split the string into
two parts, words[0]
would be "Hello world"
and words[1]
would be " how are you?"
. The full rules
for how this argument to split
works use a concept known
as "regular expressions", which we won't go into here.
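The two splits described above can be tried out with the following sketch. The class name SplitDemo and the wrapper method parts are illustrative only; note that the delimiter argument is passed straight to split and so is interpreted as a regular expression, which makes no difference for a plain space or comma.

```java
class SplitDemo {
    // Thin wrapper around the library method split.
    static String[] parts(String str, String delimiter) {
        return str.split(delimiter);
    }

    public static void main(String[] args) {
        String str = "Hello world, how are you?";

        String[] words = parts(str, " ");
        for (int i = 0; i < words.length; i++)
            System.out.println("words[" + i + "] is \"" + words[i] + "\"");

        String[] halves = parts(str, ",");
        for (int i = 0; i < halves.length; i++)
            System.out.println("halves[" + i + "] is \"" + halves[i] + "\"");
    }
}
```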
In older notes on Java, you may see reference to and use of objects
of class StringTokenizer
to split strings of words into
individual words. The class StringTokenizer
is found in
the package java.util
. A StringTokenizer
object is created from a constructor which takes a string as its
argument. Then it works like a Scanner
object created
with a string argument constructor, except that the method to obtain
the next word is nextToken
, while the method to check
whether there are more words to return is hasMoreTokens
.
One thing StringTokenizer
has which Scanner
does not is the method countTokens
which returns the
number of words in the string. On the other hand, a Scanner
constructed with a string argument still has the methods nextInt
and hasNextInt, which let you check whether
the next word represents an integer and obtain its integer
value; StringTokenizer has no
equivalent. The process of splitting a string into its individual words
is known as string tokenizing, but the class StringTokenizer
has been superseded by the more general class Scanner
and
the String
method split
.
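For readers who meet StringTokenizer in older code, the sketch below shows the same word-splitting loop written both ways. The class name TokenizerDemo and the two join methods are invented for the example; each returns the words of the string separated by '|' characters so the results are easy to compare.

```java
import java.util.Scanner;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Older style: StringTokenizer with hasMoreTokens and nextToken.
    static String joinWithTokenizer(String str) {
        StringTokenizer tokens = new StringTokenizer(str);
        String result = "";
        while (tokens.hasMoreTokens()) {
            result = result + tokens.nextToken();
            if (tokens.hasMoreTokens())
                result = result + "|";
        }
        return result;
    }

    // Newer style: a Scanner constructed from the string.
    static String joinWithScanner(String str) {
        Scanner splitter = new Scanner(str);
        String result = "";
        while (splitter.hasNext()) {
            result = result + splitter.next();
            if (splitter.hasNext())
                result = result + "|";
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "how are  you";
        System.out.println(new StringTokenizer(str).countTokens());  // 3
        System.out.println(joinWithTokenizer(str));  // how|are|you
        System.out.println(joinWithScanner(str));    // how|are|you
    }
}
```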
If you want to convert a string, perhaps obtained from a call of the
method
split
, which represents an integer to the
equivalent integer, you need to use the method parseInt
which is a static method in the class Integer
.
Because the class Integer
is in the java.lang
package
it does not have to be imported. If you want to call a static method
from another class, the call is attached to the name of the class.
So if str
refers to a string object all of whose characters
are numerical, then Integer.parseInt(str)
returns
the value of type int
it represents. If str
represents the string "6315"
then
int n=Integer.parseInt(str)
will declare the variable
n of type int
and set it to the number
6315
. You need to be aware that "6315"
is
just the string of four characters '6'
, '3'
,
'1'
and '5'
, it is an entirely separate thing
from the number 6315
which on the computer isn't even
represented using decimal digits. If you call Integer.parseInt(str)
,
but str
is not a string which represents an integer, it
will cause an exception of type NumberFormatException
to be thrown.
The static method parseDouble
in class Double
works in a similar way to parseInt
, but converts strings
that represent double
values (that is, floating point numbers)
to the actual double
value they represent.
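The difference between a string of digits and the number it represents, and one way of coping with NumberFormatException, can be sketched as follows. The class name ParseDemo and the method parseOrDefault are assumptions made for this example; the exception is caught rather than declared because it is not a checked exception.

```java
class ParseDemo {
    // Returns the int represented by str, or fallback if str does not
    // represent an integer.
    static int parseOrDefault(String str, int fallback) {
        try {
            return Integer.parseInt(str);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        String str = "6315";
        int n = Integer.parseInt(str);
        System.out.println(n + 1);       // 6316: arithmetic on the number
        System.out.println(str + 1);     // 63151: joining onto the string
        System.out.println(Double.parseDouble("3.5") * 2);   // 7.0
        System.out.println(parseOrDefault("63x5", -1));      // -1
    }
}
```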
Wrapper Classes
The class Integer
is one of the wrapper classes.
It is the wrapper class for the type int
. There is
also the class Character
which is the wrapper class for
the type char
. The other primitive types all have wrapper
classes which are the same name as the primitive type except for an initial
upper-case letter, so Double
is the wrapper class for the
type double
.
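The wrapper classes also contain a number of static methods dealing with values
of the primitive type they wrap. For example, the static method
toBinaryString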
in class Integer
takes an
int
argument and returns a string of '1'
and
'0'
characters representing its binary equivalent. So if
variable n
of type int
holds the value
21
then Integer.toBinaryString(n)
will return
"10101"
. In the class Character
, the static
method toUpperCase
takes a value of type char
and returns its upper case equivalent, so if variable ch
holds 'q'
then Character.toUpperCase(ch)
will
return 'Q'
. Class Character
has various static
methods for determining the category of characters. For example,
Character.isUpperCase(ch)
will return true
if ch
stores an upper case letter, and false
otherwise.
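As a small sketch of these static methods in use, the following capitalises the first letter of a word. The class name WrapperDemo and the method capitalise are made up for the example; toBinaryString, toUpperCase and isUpperCase are the library methods described above.

```java
class WrapperDemo {
    // Returns str with its first character converted to upper case,
    // or str itself if it is empty or already starts with an upper case letter.
    static String capitalise(String str) {
        if (str.length() == 0 || Character.isUpperCase(str.charAt(0)))
            return str;
        return Character.toUpperCase(str.charAt(0)) + str.substring(1);
    }

    public static void main(String[] args) {
        int n = 21;
        char ch = 'q';
        System.out.println(Integer.toBinaryString(n));   // 10101
        System.out.println(Character.toUpperCase(ch));   // Q
        System.out.println(Character.isUpperCase(ch));   // false
        System.out.println(capitalise("queen"));         // Queen
    }
}
```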
In the directory
~mmh/DCS128/code/strings
the file
Test8.java
demonstrates the toBinaryString
method of class
Integer
and
Test9.java
demonstrates some methods from class Character
.
Recursion with Strings
An alternative way of programming with strings to the way using loops
we considered previously is to use
recursion. We shall consider recursion in more detail when we
look at its use with Lisp lists in the
next section. Recursion means
thinking of a solution to a problem in terms of a solution to a
smaller version of the same problem. For example,
above we considered the test
that one string starts with another. If we had to program a static
method to test whether str1
starts with str2
,
one way we could think of it is as follows. If str2
is the
empty string, obviously str1
does start with str2
.
If str1
is the empty string, and we have already tested
str2
and found it is not the empty string, then obviously
str1
does not begin with str2
.
If str1
and str2
start with different characters,
then obviously str1
cannot start with str2
, we
need consider no further. If str1
and str2
start with the same character, then str1
starts with
str2
if the string consisting of everything except the first
character of str1
starts with the string consisting of
everything except the first character of str2
. So, for
example, we can tell that "woodpecker"
starts with
"wood"
by first noting both have the same first character
'w'
, and then testing that "oodpecker"
starts
with "ood"
. This leads to the following code:
public static boolean startsWith(String str1,String str2)
// Returns true if str1 starts with str2, false otherwise
{
    if(str2.length()==0)
        return true;
    else if(str1.length()==0)
        return false;
    else if(str1.charAt(0)!=str2.charAt(0))
        return false;
    else
        return startsWith(str1.substring(1),str2.substring(1));
}
since str.substring(1)
returns the string which is all
of str
except the first character. So we have the method
startsWith
making a call to the method startsWith
,
this "method calling itself" is what is referred to as "recursion".
You can find this code and supporting code to run it in the file
Test11.java.
Although it may seem odd if you are not used to this style of programming,
it works. If you want to think through why it works, remember the call of
startsWith
inside the code for startsWith
will
execute in its own environment, as we discussed
previously.
This will be one where there is a separate variable called
str1
which holds the value of str1.substring(1)
from the previous environment, and a separate variable called
str2
which holds the value of str2.substring(1)
from the previous environment. A new environment will be created each time
"the method calls itself", but this won't go on forever because we'll
eventually get to the case where either str1
is the empty
string or str2
is the empty string, or the two strings
have initial characters which are not equal. When a method which
contains a recursive call has arguments which mean the recursive call
is not made, it is known as a base case. With the method
startsWith
, the base cases are when either string is
of length 0, and when the first character of one string is not
equal to the first character of the other.
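If you would like to watch the separate environments being created, one possibility is to add a print statement and a counter to the recursive method, as in the sketch below. The class name RecursionTrace and the static variable calls are additions made purely for this illustration; the logic of startsWith is unchanged.

```java
class RecursionTrace {
    static int calls = 0;   // how many times startsWith has been entered

    static boolean startsWith(String str1, String str2) {
        calls++;
        // Show the values of str1 and str2 in this environment.
        System.out.println("startsWith(\"" + str1 + "\", \"" + str2 + "\")");
        if (str2.length() == 0)
            return true;                       // base case
        else if (str1.length() == 0)
            return false;                      // base case
        else if (str1.charAt(0) != str2.charAt(0))
            return false;                      // base case
        else
            return startsWith(str1.substring(1), str2.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(startsWith("woodpecker", "wood"));
        System.out.println("calls made: " + calls);
    }
}
```

With "woodpecker" and "wood" the method is entered five times: four calls each strip one matching character, and the fifth reaches the base case where str2 is the empty string.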
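For comparison, here is the same operation performed using loops
(which to contrast with "recursion" we call iteration):
public static boolean startsWith(String str1,String str2)
// Returns true if str1 starts with str2, false otherwise
{
    int i;
    for(i=0 ;i<str1.length()&&i<str2.length(); i++)
        if(str1.charAt(i)!=str2.charAt(i))
            break;
    return i==str2.length();
}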
You can find this and supporting code to run it in the file
Test10.java.
Another way of doing it using loops is more similar to the
recursive way:
public static boolean startsWith(String str1,String str2)
// Returns true if str1 starts with str2, false otherwise
{
    while(str2.length()!=0&&str1.length()!=0&&str1.charAt(0)==str2.charAt(0))
    {
        str1=str1.substring(1);
        str2=str2.substring(1);
    }
    return str2.length()==0;
}
You can find this and supporting code to run it in the file
Test12.java. As
you can see, it has a loop where the condition to stay in the loop
is the opposite of the base case conditions in the recursive version.
Instead of setting up a separate environment with new variables
str1
and str2
, the existing variables
of these names are changed to hold the values the variables of the
names would have in the recursive call. The value the method returns
is the same value the base case returns in the recursive version
(true
if the length of str2
is 0,
false
otherwise).
Not all recursive code can be converted so easily to iterative
code. The
startsWith
code is an example of what
is called tail recursion, which is when the
return
statement has just a recursive call as its return value.
A more complicated example is when something is done with the result of
the recursive call to get the return value.
For example, above, we
saw an iterative method which took a string and two characters and
returned the string resulting from changing all occurrences of the
first character to the second. A recursive method which does the
same thing is given below:
public static String replace(String str,char ch1,char ch2)
// Returns the string resulting from replacing all occurrences
// of ch1 by ch2 in str.
{
    if(str.equals(""))
        return "";
    else
    {
        String str1 = replace(str.substring(1),ch1,ch2);
        if(str.charAt(0)==ch1)
            return ch2+str1;
        else
            return str.charAt(0)+str1;
    }
}
You can find this and supporting code to run it in the file
Test12.java.
Thinking about this code logically, if you want to change all occurrences
of ch1
to ch2
in a string, if the string is
empty, you just return the empty string. Otherwise, you get the result
of changing all occurrences of ch1
to
ch2
in the string which consists of all but the first
character of str
. Then, if it happens the first character of
the original string is ch1
, you add ch2
to the front, otherwise you add the first character of the original
string to the front.
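The reason this can't be converted so easily into a loop is
that you need to keep the old value of the string argument
variable,
str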
in this case, for use after the recursive call,
so you cannot just reassign the variable and thus lose its old value.
Tail recursion can convert easily into iteration, because we do not need
to go back and use values from previous environments. With recursion where
something is done after each recursive call, it will be done in the previous
environment as it is returned to.
Matthew Huntbach
Last modified: 27 January 2006