CORPORATE PERFORMANCE ENGINEERING
Java Performance Optimization - Senthil Kumar N
1. Introduction
This course presents different ways to improve the performance of your Java
applications. The techniques focus on the Java language and libraries. Performance is
defined to include both speed and space: that is, how to make your programs faster while
using less memory. This course describes a variety of performance issues and gives some
hard numbers about how specific performance improvements work out. It should be
noted up front that there is no way to present totally definitive advice on performance, because
various applications have different performance characteristics and bottlenecks, and
because performance varies across different hardware, operating systems, and Java
development tools such as compilers and virtual machines. The Java programming
language is still evolving, and its performance continues to improve. The ultimate aim of
this course is to promote awareness of Java performance issues, so that you can make
appropriate design and implementation choices for specific applications.
1.1 Why Is It Slow?
There are overheads in the Java runtime system, mainly due to the virtual machine layer that
abstracts Java away from the underlying hardware. There are also overheads
from Java's dynamic nature. These overheads can cause a Java application to run
slower than a comparable application written in a lower-level language such as C. Java's
advantages, namely its platform independence, memory management, powerful
exception checking, built-in multithreading, dynamic resource loading, and security
checks, add costs in the form of an interpreter, a garbage collector, thread monitors, repeated
disk and network accesses, and extra runtime checks.
For example, dynamic (virtual) method invocation requires extra computation for every
method call, because the runtime system has to work out which of the possible methods
in the hierarchy is the actual target of the call. Most modern CPUs are optimized for
fixed call and branch targets, and they do not perform as well when a
significant percentage of calls must be computed on the fly.
The Java language features that cause these overheads may well be the features that persuaded you
to use Java in the first place. The important thing is that none of these overheads slows
an application down too much, and a good round of performance tuning
normally makes your application run as fast as you need it to run.
1.2 System Limitations and What to Tune
Three resources limit all applications:
1. CPU speed and availability.
2. System memory.
3. Disk (and network) input/output.
When tuning an application, the first step is to determine which of these is causing your
application to run too slowly. If your application is CPU-bound, you need to concentrate
your efforts on the code, looking for bottlenecks, inefficient algorithms, too many
short-lived objects, and other problems, which we will cover in this course. If your application is
hitting system-memory limits, it may be paging sections in and out of main memory. In
this case the problem may be caused by too many objects, or even just a few large objects,
being erroneously held in memory; by too many large arrays being allocated; or by the
design of the application, which may need to be reexamined to reduce its running
memory footprint. On the other hand, external data access or writing to the disk may be
slowing your application. In this case, you need to look at exactly what you are doing to
the disks that is slowing the application: first identify the operations, then determine the
problems, and finally eliminate or change them to improve the situation.
1.3 What to Measure
The main measurement is always wall-clock time. You should use this measurement to
specify all benchmarks, as it is the real-time interval that users actually perceive.
2. Overview
2.1 Scope of the Course
This course assumes basic knowledge of programming concepts in Java. It aims to
discuss the important aspects of code optimization that control the performance of a Java
application.
2.2 Performance Issues Not Covered in This Course
This course describes a set of techniques, rooted in the Java language and libraries. There
are other areas of performance mentioned only in passing.
The first of these is algorithm performance. If your application contains a fundamentally
slow algorithm, these techniques may not help you.
The other area is architecture. Sometimes poor performance is literally “built in” to an
application, making it very difficult to do any useful performance improvement by
tuning.
2.3 Environment and Tools Used in Code Examples
The examples in this course were developed against the Java Development Kit 1.2.2 from
Sun that implements the Java 2 version of the language and libraries. The JDK was run
on a Windows NT 4.0 system, a 450 MHz Pentium with 128 MB of memory.
Compilation was done with
C:\> javac prog.java
and examples were run with
C:\> java prog
2.4 How Examples Were Timed
The examples in this course are measured using a special Timer class, defined as follows:
class Timer {
    long t;

    public Timer() {
        reset();
    }

    public void reset() {
        t = System.currentTimeMillis();
    }

    public long elapsed() {
        return System.currentTimeMillis() - t;
    }

    public void print(String s) {
        System.out.println(s + ":" + elapsed());
    }
}
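As a quick illustration (this usage sketch, including the TimerDemo class and the loop it times, is not from the original notes), the Timer class can bracket any piece of code you want to measure:

```java
// Timer class reproduced from the course notes.
class Timer {
    long t;

    public Timer() {
        reset();
    }

    public void reset() {
        t = System.currentTimeMillis();
    }

    public long elapsed() {
        return System.currentTimeMillis() - t;
    }

    public void print(String s) {
        System.out.println(s + ":" + elapsed());
    }
}

// Hypothetical usage: time a loop, print the elapsed milliseconds.
public class TimerDemo {
    public static void main(String[] args) {
        Timer t = new Timer();          // timing starts at construction
        long sum = 0;
        for (int i = 0; i < 1000000; i++)
            sum += i;
        t.print("summing loop");        // prints "summing loop:<elapsed ms>"
        t.reset();                      // restart the clock for the next measurement
    }
}
```

Because the class uses System.currentTimeMillis(), its resolution is limited to the platform's clock granularity, so very short intervals may report as 0 ms.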
2.5 Performance Analysis Tools
There are a variety of Java performance analysis tools available today. One that comes
with JDK 1.2.2 is invoked by saying:
$ java -Xrunhprof prog
The default profile output gained from executing with -Xrunhprof in Java 2 is not useful
for method profiling. The default output generates object-creation statistics from the
heap as the dump. By default the dump occurs when an application terminates; you can
also trigger a dump while the program runs by typing Ctrl-Break on Win32. To get a useful method profile,
you need to modify the profiler options to specify method profiling. A typical call to
achieve this is:
$ java -Xrunhprof:cpu=times prog
with the results written into a file java.hprof.txt. This tool provides information on the
time spent in each method of the application. See Appendix A for more details about this
tool.
Another tool is:
$ javap -c prog
which is used to display Java Virtual Machine bytecodes. Using javap you can check for the
presence of particular constructs in your program, such as generated constructors.
3. Optimization Techniques:
3.1 Classes
Optimization 1: Class and Instance Initialization
When an instance of a class is created using new, initialization of the
class’s instance variables (variables unique to each instance) must be done. By contrast
class variables (those declared static and shared across instances) need only be initialized
once, conceptually at program invocation time. The difference between these types of
initialization is quite important, as the example illustrates:
/* This program describes class initialization using non-static variables */
public class cls_init_using_nonstatic
{
    static class Data {
        private int month;
        private String name;

        Data(int i, String str)
        {
            month = i;
            name = str;
        }
    }

    Data months[] = {
        new Data(1,"January"), new Data(2,"February"), new Data(3,"March"),
        new Data(4,"April"), new Data(5,"May"), new Data(6,"June")
    };

    // Main method starts here
    public static void main(String args[]) {
        final int N = Integer.parseInt(args[0]);
        Timer t = new Timer();
        cls_init_using_nonstatic x;
        for (int i = 1; i <= N; i++)
            x = new cls_init_using_nonstatic();
        t.print("Total Time with non-static");
    }
}
Fig 1.1 Program which describes the class initialization using non-static variables.
This example takes 340 ms to create 250000 objects. If we look closely at this
class, there is a potential inefficiency. The month number/name data found in months[] is
an instance variable of the class; that is, a copy of the data is found in every instance of
the class. Structuring the data in this way doesn't make sense, in that the number/name
data never changes and is the same across all class instances. So we can change the
program slightly, to turn the number/name data into a class variable, with a single copy
shared across all instances:
/* This program describes class initialization using a static variable */
public class cls_init_using_static
{
    static class Data {
        private int month;
        private String name;

        Data(int i, String str)
        {
            month = i;
            name = str;
        }
    }

    static Data months[] = {
        new Data(1,"January"), new Data(2,"February"), new Data(3,"March"),
        new Data(4,"April"), new Data(5,"May"), new Data(6,"June")
    };

    // Main method starts here
    public static void main(String args[]) {
        final int N = Integer.parseInt(args[0]);
        Timer t = new Timer();
        cls_init_using_static x;
        for (int i = 1; i <= N; i++)
            x = new cls_init_using_static();
        t.print("Total Time with Static");
    }
}
Fig 1.2 Program which describes the class initialization using static variables.
This program takes 40 ms to create 250000 objects, a speedup of about 8 to 1 over the
first approach. Moreover, it saves a lot of space per class instance as well.
The following table shows the timings in milliseconds as a function of the number of
elements.
Elements     With Non-Static   With Static
1000               30                0
10000              50               10
100000            150               30
1000000          1281               90
Optimization 2: Reuse Objects
Object reuse is important for a couple of reasons. First, the creation of an object is a
costly operation in terms of memory allocation. As you know, the garbage collector in
the Java virtual machine is responsible for memory management. In terms of object
creation, the garbage collector is responsible for allocating the amount of memory
required by an object. This means that the garbage collector must determine the amount
of space required by the object and then allocate that space. Since the Java language supports
inheritance, the memory requirement is determined by "climbing" the
inheritance tree and looking for member variables. Once the top of the tree is reached, the
appropriate space is allocated. In order for the garbage collector to manage the memory,
it needs to update the "memory table". The memory table maintains information
about the memory and the number of references to each space. Automatic memory
management comes with a price: the garbage collector needs to maintain the memory
table, and this can become costly as the number of objects the garbage collector is responsible
for grows. Performance hits may also be incurred when the garbage collector releases an
object.
Second, the creation of an object is a costly operation in terms of execution speed. Not
only does the JVM have to climb the inheritance tree to determine the appropriate amount of
memory to allocate, it also has to initialize the allocated memory. The object memory
initialization starts at the top of the inheritance tree and works down, calling the
constructor of each class in the tree, and finishing with the instantiated object's
constructor. The following code compares the time spent creating N java.lang.Object
instances that are all retained with the time spent creating N that are immediately discarded:
/* This program describes the effect of reusing objects */
import java.util.*;

public class obj_reuse
{
    private static long delta;
    static int iterations;

    public static void main(String[] args)
    {
        iterations = Integer.parseInt(args[0]);
        Timer t = new Timer();
        // Creation of different objects, all retained in an array
        Object[] tmpObj = new Object[iterations];
        for (int i = 0; i < iterations; i++) {
            tmpObj[i] = new Object();
        }
        t.print("time1");
        t.reset();
        // Temporary objects, discarded each iteration so their memory can be reused
        for (int i = 0; i < iterations; i++) {
            Object tmpObject = new Object();
        }
        t.print("time2");
    }
}
Fig 2.1 Program describing object reuse.
Creating an object one million times where each instance can be immediately reclaimed took
70 ms, whereas creating one million different objects that are all retained took 1542 ms.
As the example shows, object reuse is a good idea in almost every feasible case, from collections to
implementations of event listeners. However, reuse is especially important when you are
using objects that are associated with system resources such
as sockets, streams, and threads. The creation of an object associated with a system
resource is more costly than the creation of an object with no system resources.
The following table shows the timings in milliseconds as a function of the number of
elements.
Elements     Without Object Reuse   With Object Reuse
10000                20                     0
100000               70                    10
1000000            1542                    70
3.2 Methods
There is an intrinsic cost associated with calling Java methods. These costs involve the actual
transfer of control to the method, parameter passing, return-value passing, and establishment of
the called method's stack frame where local variables are stored. Such costs show up in
other languages as well. In this section we will look at a few performance issues with
methods.
Optimization 3: Inlining
Perhaps the most effective way to deal with method call overhead is method inlining,
either by a compiler doing it automatically, or by doing it yourself manually. Inlining is
done by expanding the inlined method's code in the code that calls the method. Consider
the example shown below.
/* Example describing inline vs. method calls */
class A_001 {
    int min(int a, int b)
    {
        return (a < b ? a : b);
    }
}

public class inline_method_opt
{
    public static void main(String args[])
    {
        final int N = Integer.parseInt(args[0]);
        int a = 3, b = 5, c;
        A_001 a1 = new A_001();
        // method call
        Timer t = new Timer();
        for (int i = 1; i <= N; i++)
            c = a1.min(a, b);
        t.print("time for method call");
        // inline
        t.reset();
        for (int i = 1; i <= N; i++)
            c = (a < b ? a : b);
        t.print("Time for inline method");
    }
}
Fig 3.1 Program describes the inline and method calls
The first case takes 50 ms and the second takes 10 ms for N = 1000000. There are several
ways that compilers can perform automatic inlining. One way is to expand the called
method inline in the caller, which improves speed at the expense of code space.
Another approach is more dynamic, where methods are inlined in a running program.
The following table shows the timings in milliseconds as a function of the number of
elements.
Elements     With Method Call   With Inline
10000              10                0
1000000            60               10
10000000          441              120
One way you can help a compiler with inlining is to declare methods as final, that is, to
declare that no subclass method overrides the method.
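As a minimal sketch of this hint (the MathUtil and FinalDemo classes are hypothetical, not from the course), a small final method is the kind of call site a compiler can safely expand inline:

```java
// MathUtil is a hypothetical class; declaring min() final tells the
// compiler that no subclass can override it, so the call target is
// fixed and the method is a safe candidate for inlining.
class MathUtil {
    static final int min(int a, int b) {
        return (a < b) ? a : b;
    }
}

public class FinalDemo {
    public static void main(String[] args) {
        // A compiler may expand this call inline as (3 < 5) ? 3 : 5.
        System.out.println(MathUtil.min(3, 5)); // prints 3
    }
}
```

Private and static methods give the compiler the same guarantee, since they also cannot be overridden.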
Optimization 4: Variable Scope Impacts Performance
Performance can be improved by using local variables; do not overuse class variables.
The following example shows the use of local variables over class variables.
// This program describes the use of variables within scope
public class scope_opt
{
    static final int N = 25000000;

    public static void loop() {
        int j = 0;
        int i;
        for (i = 0; i < N; i++)
            j = j + 1;
    }

    static int k = 0;

    public static void loop1() {
        int i;
        for (i = 0; i < N; i++)
            k = k + 1;
    }

    public static void main(String[] args)
    {
        Timer t = new Timer();
        loop();
        t.print("using local variable");
        t.reset();
        loop1();
        t.print("using non-local variable");
    }
}
The following table shows the timings in milliseconds as a function of the number of
elements.
Elements      Using Local Variable   Using Non-Local Variable
2500000              30                       40
25000000            290                      391
250000000          2864                     3936
3.3 Strings
Java provides a built-in String implementation and also provides the StringBuffer
implementation for sequences that change in length. Strings are a widely used data type in the
Java language. Java strings are represented as objects of type String and store sequences
of 16-bit Unicode characters, along with the current string length.
Optimization 5: Strings Are Immutable
Perhaps the most important point about Java strings relative to performance is that strings
are immutable, that is, they never change after creation. For example, in this sequence:
String str = "testing";
str = str + "abc";
the string "testing", once created, does not change, but a reference to the string may
change. The string reference in str originally points to "testing", but then is changed to
point to a new string formed by concatenating str and "abc". The above sequence is
implemented internally using code like:
String str = "testing";
StringBuffer tmp = new StringBuffer(str);
tmp.append("abc");
str = tmp.toString();
In other words, the two strings to be concatenated are copied to a temporary string buffer,
then copied back. Such copying is quite expensive. So a fundamental performance rule to
remember with strings is to use StringBuffer objects explicitly if you're building up a
string. String concatenation operators like + and += are fine for casual use, but quite
expensive otherwise. This program illustrates the respective costs of + and
StringBuffer.append():
// string append
public class str_app {
    public static void main(String args[]) {
        final int N = 10000;
        // using +
        Timer t = new Timer();
        String s1 = "";
        for (int i = 1; i <= N; i++)
            s1 = s1 + "*";
        t.print("append using +");
        // using StringBuffer
        t.reset();
        StringBuffer sb = new StringBuffer();
        for (int i = 1; i <= N; i++)
            sb.append("*");
        String s2 = sb.toString();
        t.print("append using StringBuffer");
    }
}
This program takes around 821 ms to run using the + operator and about 10 ms using
StringBuffer.append(), a ratio of roughly 80 to 1.
The following table shows the timings in milliseconds as a function of the number of
elements.
Elements     With +    With StringBuffer.append()
1000            30                0
10000          821               10
100000      195521               30
Optimization 6: Using == and String.equals() to Compare Strings
If you’ve programmed in languages such as C++, that support overloaded operators, you might be
used to using the == operator to compare strings. You can also use this operator in Java
programming, but it won’t necessarily give you the results you expect. In the Java language, the
== operator, when applied to references, simply compares the references themselves for equality,
and not the referenced objects. For example, if you have strings:
String s1 = "abc";
String s2 = new String("abc");
then the boolean expression:
s1 == s2
will be false, not because the string contents are unequal (here they are identical), but because the
s1 and s2 references are distinct. Conversely, if two references are equal using ==, then you can be sure that they refer to
identical objects. So if you are comparing strings, and there is a good chance of encountering
identical ones, then you can say:
if (s1 == s2 || s1.equals(s2)) ...
If the references are identical, this will short-circuit the equals() method call.
This technique illustrates a more general principle of performance: always perform a cheap test
before an expensive one, if you possibly can. In this example, == is much less expensive to
perform than equals().
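The idiom can be packaged as a small helper. This sketch (the CmpDemo class and fastEquals() method are hypothetical, not from the course) shows both paths being taken:

```java
// Hypothetical illustration of the cheap-test-first idiom: the ==
// identity check runs before the more expensive equals() content scan.
public class CmpDemo {
    static boolean fastEquals(String a, String b) {
        // Short-circuit: if the references are identical, skip equals().
        return a == b || a.equals(b);
    }

    public static void main(String[] args) {
        String s1 = "abc";
        String s2 = new String("abc"); // distinct object, identical contents
        System.out.println(fastEquals(s1, s1)); // true, via the cheap == test
        System.out.println(fastEquals(s1, s2)); // true, via equals()
    }
}
```

Note that fastEquals() assumes a is non-null, as the original `s1 == s2 || s1.equals(s2)` expression does.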
3.4 Input and Output Operations
I/O to the disk or the network is hundreds to thousands of times slower than I/O to computer
memory. Disk and network transfers are expensive activities and are two of the most likely
candidates for performance problems. Two standard optimization techniques for reducing I/O
overhead are buffering and caching. For a given amount of data, I/O mechanisms work more
efficiently if the data is transferred in a few large chunks rather than many small
ones. Buffering groups data into larger chunks, improving the efficiency of the I/O by reducing the
number of I/O operations that need to be executed.
Optimization 7: Buffering
Perhaps the most important idea in improving I/O performance is buffering: doing input
and output in large chunks of data instead of a byte or character at a time. To see what
difference this can make, consider the following program:
import java.io.*;

public class io_buf {
    public static void main(String args[]) throws IOException {
        // one read() per character
        FileInputStream fis = new FileInputStream("e:/LectureNotes.doc");
        int cnt = 0, c;
        Timer t = new Timer();
        while ((c = fis.read()) != -1)
            if (c == 'x')
                cnt++;
        t.print("read() per character");
        fis.close();
        // buffered
        fis = new FileInputStream("e:/LectureNotes.doc");
        byte buf[] = new byte[1024];
        cnt = 0;
        int n;
        t.reset();
        while ((n = fis.read(buf)) > 0)
            for (int i = 0; i < n; i++)
                if (buf[i] == 'x')
                    cnt++;
        t.print("buffered");
        fis.close();
    }
}
This program uses two different approaches to count the number of 'x' bytes in a file.
The first repeatedly calls read() on the input stream to grab individual bytes, while the
second reads 1024-byte chunks of the file, iterates through each chunk, and
counts the bytes that way.
The first approach took 2313 ms to read a 455 KB file, whereas the second approach
took 10 ms to read the same file:
Size of the file   Without Buffering   With Buffering
186 KB                   972                 0
455 KB                  2313                10
Optimization 8: BufferedReader
Suppose that you would like to count the number of text lines in a file. One way of doing
this is to say:
// comparison between FileInputStream and FileReader
import java.io.*;

public class io_opt
{
    // Method 1: using the InputStream classes
    static void io_inputStream()
    {
        try
        {
            FileInputStream fis = new FileInputStream("e:/LectureNotes.doc");
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream dis = new DataInputStream(bis);
            int cnt = 0;
            String line;
            while ((line = dis.readLine()) != null)
                cnt++;
            System.out.println("count1=" + cnt);
            bis.close();
            fis.close();
        }
        catch (Exception e) {
        }
    }

    // Method 2: using the Reader classes
    static void io_Reader()
    {
        try {
            FileReader fr = new FileReader("e:/LectureNotes.doc");
            BufferedReader br = new BufferedReader(fr);
            int cnt = 0;
            String line;
            while ((line = br.readLine()) != null)
                cnt++;
            br.close();
            fr.close();
        }
        catch (Exception e) {
        }
    }

    public static void main(String[] args)
    {
        Timer t = new Timer();
        io_inputStream();
        t.print("Using FileInputStream");
        t.reset();
        io_Reader();
        t.print("Using Reader");
    }
}
This example has two parts. The first part counts the number of lines in a
455 KB file using the FileInputStream class. This is quite slow, in part because the
DataInputStream class makes a read() call for each character. The second part runs
faster than the first and avoids a read() call per character: readLine() in this
case grabs the underlying data in large buffered chunks.
The first part took 151 ms, whereas the second part took 110 ms.
Size of the file   With BufferedInputStream   With BufferedReader
186 KB                     80                       60
455 KB                    150                      111
4. Libraries:
This section touches on some of the performance issues with using classes and methods
from the standard libraries.
Optimization 9: System.arraycopy()
System.arraycopy() is a method that supports efficient copying from one array to another.
For example, if you have two arrays vec1 and vec2, of length N, and you want to copy
from vec1 to vec2, you say:
System.arraycopy(vec1, 0, vec2, 0, N);
specifying the starting offset in each array.
It's worth asking how much System.arraycopy() improves performance over alternative
approaches for copying arrays. Here is a program that uses a copy loop,
System.arraycopy(), and Object.clone() to copy one array to another:
// copying arrays
public class lib_copy {
    public static void main(String args[]) {
        final int N = 5000000;
        int vec1[] = new int[N];
        for (int i = 0; i < N; i++)
            vec1[i] = i;
        int vec2[] = new int[N];
        // copy using loop
        Timer t = new Timer();
        for (int i = 0; i < N; i++)
            vec2[i] = vec1[i];
        t.print("loop");
        // copy using System.arraycopy()
        t.reset();
        System.arraycopy(vec1, 0, vec2, 0, N);
        t.print("System.arraycopy");
        // copy using Object.clone()
        t.reset();
        vec2 = (int[]) vec1.clone();
        t.print("clone");
    }
}
The timings for the various methods are:
Loop                231 ms
System.arraycopy    140 ms
Object.clone        180 ms
For very short arrays, use of this method may be counterproductive, because of the overhead
of actually calling the method, checking the method's parameters, and so on.
Object.clone() represents another approach to copying an array: clone()
allocates a new instance of the array and copies all the array elements. Note that
Object.clone() does a shallow copy, as does System.arraycopy(), so if the array elements
are object references, the references are copied and no copy is made of the referenced
objects.
Optimization 10: Vector vs. Arrays
Java provides the Vector class to create a list of objects in which elements can be
inserted, removed, indexed, enumerated, and so on. This allows one to easily program
applications without having to worry about initial sizing, dynamic increase in size, and
managing data structures upon insertion and deletion. On the other hand, the use of arrays is
more cumbersome and can incur higher memory consumption if oversized, but it can
deliver higher performance, as shown in the example below.
We consider a simple example to test the performance of vectors and arrays. We first
define the coordinate class to consist of the pair (x, y) as integers and provide member
functions to initialize, update, and print out the values of the coordinate. We assume for
the purpose of this discussion that all valid coordinates have each of x and y as nonnegative.
In our main program we consider a list of coordinates. Initially we keep inserting a
specified number of coordinates and then delete a number of coordinates. Finally, we
print the first few coordinates in the list. We implemented this using an array of
coordinates and using a vector of coordinates. We provide sample listing of the
coordinate class and the main program that uses an array of coordinates. A complete
listing is given in Appendix B.
Array implementation:
...
CoArray = new coordinate[num_elements];

The coordinate class (method bodies omitted here; the full listing is in Appendix B):

class coordinate {
    int x;
    int y;
    public coordinate();
    public coordinate(int x, int y);
    public void set_values(int x, int y);
    public int getxval();
    public int getyval();
    public void print_vals();
}

// create coordinates
for (i = 0; i < num_elements; i++)
    CoArray[i] = new coordinate(i, i);

// delete every third element
for (i = 0; i < num_elements/3; i++)
    CoArray[i].set_values(-1, -1);

// print the first 10 valid coordinates
while (count < 10) {
    if (CoArray[i].getxval() > 0) {
        CoArray[i].print_vals();
        count++;
    }
    i++;
}
Now consider the vector implementation of the same (again the complete listing is given
in Appendix B). For the vector implementation we assume that the capacity increment is
a command line argument.
Vector implementation:
...
CoVector = new Vector(cap_incr, cap_incr);

// create coordinates
for (i = 0; i < num_elements; i++)
    CoVector.addElement(new coordinate(i, i));

// delete every third element
count = 0;
for (i = 0; i < num_elements/3; i++) {
    j = i*3 - count;
    CoVector.removeElementAt(j);
    count++;
}

// print the first 10 valid coordinates
count = 0;
while (count < 10) {
    x = (coordinate) CoVector.elementAt(count);
    x.print_vals();
    count++;
}
Note that for the vector class performance will be sensitive to the initial size and capacity
increment. We have observed that as these parameters increase the performance
improves.
The table below provides timings in milliseconds as a function of the number of
elements. Under Vector we show timings for several capacity increments (we assume the
initial number of elements is the same as the capacity increment).
            Array                  Vector (by capacity increment)
Elements    (ms)    Inc=10    Inc=100    Inc=1,000    Inc=10,000    Inc=100,000
1,000        20        30         20           20            -             -
10,000       40       380        220          200          200             -
100,000     130     82008      24926        18557        17195         16924
As can be seen, the array implementation, though slightly more tedious to program,
significantly outperforms the vector implementation. In particular, as the number of
elements increases, the performance gap also increases dramatically. Therefore, we
strongly recommend the use of arrays instead of vectors.
Optimization 11: ArrayList
The class java.util.Vector is used to represent lists of object references, with support for
dynamic expansion of the vector, random access to vector elements, and so on. A newer
scheme is the Java collection framework, which includes a class ArrayList that can be
used in place of Vector. Some of the performance differences between Vector and
ArrayList include:
• Vector's methods are synchronized; ArrayList's are not. This means that Vector is
thread-safe, at some extra cost.
• The collection framework provides an alternative to ArrayList called LinkedList, which
offers different performance tradeoffs.
• When Vector needs to grow its internal data structure to hold more elements, the size of
the structure is doubled, whereas for ArrayList, the size is increased by 50%. So
ArrayList is more conservative in its use of space.
It’s worth using the collection framework in your applications if you possibly can,
because it’s now the "standard" way to handle collections. If you are concerned about
thread safety, one way to handle this issue is to use wrappers around objects like
ArrayList, for example:
List list = Collections.synchronizedList(new ArrayList());
This technique makes list thread-safe.
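A minimal sketch of this wrapping technique (the SyncDemo class is hypothetical; note that iteration over a synchronized wrapper still requires explicit locking, per the java.util.Collections documentation):

```java
import java.util.*;

// Hypothetical sketch: wrapping an unsynchronized ArrayList to obtain
// a thread-safe view.
public class SyncDemo {
    public static void main(String[] args) {
        List list = Collections.synchronizedList(new ArrayList());
        list.add("a");
        list.add("b");
        // Individual calls such as add() and size() are synchronized by
        // the wrapper, but iteration must still be locked manually:
        synchronized (list) {
            for (Iterator it = list.iterator(); it.hasNext(); ) {
                System.out.println(it.next());
            }
        }
    }
}
```

This gives Vector-like safety only where you need it, while unsynchronized code paths keep the cheaper ArrayList behavior.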
Collection classes like ArrayList periodically must grow their internal data structures to
accommodate new elements. This process is automatic, and normally you don't need to
worry about it. But if you have a very large array, and you know in advance that it's going
to be large, then you can speed things up a bit by calling ensureCapacity() to set the size
of the array. An example:
// ensureCapacity()
import java.util.*;

public class lib_cap {
    public static void main(String args[]) {
        final int N = 1000000;
        Object obj = new Object();
        ArrayList list = new ArrayList();
        Timer t = new Timer();
        for (int i = 1; i <= N; i++)
            list.add(obj);
        t.print("without ensurecapacity");
        list = new ArrayList();
        t.reset();
        list.ensureCapacity(N);
        for (int i = 1; i <= N; i++)
            list.add(obj);
        t.print("with ensurecapacity");
    }
}
Calling ensureCapacity() means that the ArrayList will not have to keep growing its internal
structures as list elements are added. Of course, if you call ensureCapacity() when you
don't really need it, you may end up wasting a lot of space. The first case took 681 ms,
whereas the second took 160 ms.
Elements    ArrayList   ArrayList with initial capacity
100000          80                  20
1000000        671                 180
5000000       2844                 771
Optimization 12: ArrayList vs. LinkedList
The Java collection framework provides two classes for handling lists of data items,
ArrayList and LinkedList. The first of these is conceptually like an array, the second like
a linked data structure. An ArrayList is implemented using an internal array of Object[],
while a LinkedList uses a series of internal records linked together. These two classes
have very different performance characteristics, as illustrated by a couple of examples.
The first deals with inserting new elements at position 0 in a list:
import java.util.*;

public class lib_list1 {
    public static void main(String args[]) {
        final int N = 25000;
        // ArrayList
        ArrayList al = new ArrayList();
        Timer t = new Timer();
        for (int i = 1; i <= N; i++)
            al.add(0, new Integer(i));
        t.print("arraylist");
        // LinkedList
        LinkedList ll = new LinkedList();
        t.reset();
        for (int i = 1; i <= N; i++)
            ll.add(0, new Integer(i));
        t.print("linkedlist");
    }
}
In this example the times are as follows:
ArrayList     3115 ms
LinkedList      50 ms
Inserting elements at the beginning of an ArrayList requires that all existing elements
be pushed down. But inserting at the beginning of a LinkedList is cheap, because the
elements of the structure are connected to each other via links, and it's easy to create a
new element and link it in at the head of the list.
The second example does random lookup of elements already in a structure.
import java.util.*;

public class lib_list2 {
    public static void main(String args[]) {
        final int N = 25000;
        Object o;
        // ArrayList
        ArrayList al = new ArrayList();
        for (int i = 0; i < N; i++)
            al.add(new Integer(i));
        Timer t = new Timer();
        for (int i = 0; i < N; i++)
            o = al.get(i);
        t.print("arraylist");
        // LinkedList
        LinkedList ll = new LinkedList();
        for (int i = 0; i < N; i++)
            ll.add(new Integer(i));
        t.reset();
        for (int i = 0; i < N; i++)
            o = ll.get(i);
        t.print("linkedlist");
    }
}
The running times here are:
ArrayList      3155ms
LinkedList     8492ms
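One practical consequence of these numbers: LinkedList.get(i) must walk the links from the head on every call, so the indexed lookup loop above does O(n²) work in total on a LinkedList, while an Iterator keeps its position and visits each element once. A sketch of the iterator approach (the LinkedListScan class name is made up for this illustration):

```java
import java.util.Iterator;
import java.util.LinkedList;

public class LinkedListScan {
    // Sum all elements with a single O(n) traversal; each call to
    // it.next() advances one link instead of re-walking from the head.
    static long sum(LinkedList list) {
        long total = 0;
        for (Iterator it = list.iterator(); it.hasNext(); )
            total += ((Integer) it.next()).intValue();
        return total;
    }

    public static void main(String[] args) {
        LinkedList ll = new LinkedList();
        for (int i = 0; i < 1000; i++)
            ll.add(new Integer(i));
        System.out.println(sum(ll));   // 499500
    }
}
```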
5. Compilation and Runtime Optimization
5.1 Optimizations performed when using the –O option
The only standard compile-time option that can improve performance with the
JDK compiler is the –O option. Note that –O is a common option for compilers,
and further optimizing options for other compilers often take the form –O1, –O2, etc.
You should always check your compiler’s documentation to find what other options are
available and what they do. Some compilers allow you to choose between
optimizing the compiled code for speed and minimizing its size; there is often a
tradeoff between these two aspects. The standard –O option does not currently apply
a wide variety of optimizations in the Sun JDK (up to JDK 1.2). Currently the option
makes the compiler eliminate optional tables in the .class files, such as the line number
and local variable tables; this gives a small performance improvement by making class
files smaller and therefore quicker to load. You should definitely use this option if your
class files are sent across a network. But the main performance improvement from
the –O option comes from the compiler inlining methods. When using the –O option, the
compiler considers inlining methods defined with any of the following modifiers:
final, static, or private.
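As an illustration (not compiler output), the sketch below shows the three kinds of methods that qualify; all of them can be bound at compile time because no subclass can override them. The Inlinable class and its values are invented for this example:

```java
public class Inlinable {
    private int value = 42;

    public final int getFinal()   { return value; }  // final: cannot be overridden
    private int getPrivate()      { return value; }  // private: invisible to subclasses
    public static int getStatic() { return 7; }      // static: resolved at compile time

    // Each call below is a candidate for inlining under javac -O.
    public int sum() { return getFinal() + getPrivate() + getStatic(); }

    public static void main(String[] args) {
        System.out.println(new Inlinable().sum());   // 91
    }
}
```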
Choosing simple methods to inline has a rationale behind it. The larger the
method being inlined, the more the code gets bloated with copies of the same code
inserted in many places. This has runtime costs in extra code being loaded and
extra space taken by the runtime system. A JIT VM would also have the extra cost of
having to compile more code. At some point, there is a decrease in performance from
inlining too much code. The compiler applies its methodology for selecting methods
to inline irrespective of whether the target method is in a bottleneck. A performance
tuner applying inlining works the other way around: first finding the bottlenecks,
then selectively inlining methods inside them. This latter approach can result in
good speedups, especially in loop bottlenecks, because a loop can be sped up
significantly by removing the overhead of a repeated method call.
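A sketch of this hand-inlining technique (the square() method and the class name are made-up examples): both loops compute the same result, but the second replaces the per-iteration method call with the method body.

```java
public class ManualInline {
    static int square(int x) { return x * x; }

    // Original form: one method call on every loop iteration.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += square(i);
        return total;
    }

    // Hand-inlined form: the call overhead inside the loop is removed.
    static long sumOfSquaresInlined(int n) {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += i * i;
        return total;
    }

    public static void main(String[] args) {
        // Both variants must agree; only the call overhead differs.
        System.out.println(sumOfSquares(1000) == sumOfSquaresInlined(1000));
    }
}
```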
5.2 Performance Effects From Runtime Options
Some runtime options can help your application to run faster. These include:
 Options that allow the VM to have a bigger footprint (-Xmx / -mx).
 -noverify, which eliminates the overhead of verifying classes at class-load time.
Heap
The Java Virtual Machine has a heap that is shared among all threads. The heap is the
runtime data area from which memory for all class instances and arrays is allocated.
The Java heap is created on virtual machine start-up. Heap storage for objects is
reclaimed by an automatic storage management system (typically a garbage
collector); objects are never explicitly de-allocated. The Java Virtual Machine
assumes no particular type of automatic storage management system, and the storage
management technique may be chosen according to the implementor's system
requirements. The Java heap may be of a fixed size, or may be expanded as required
by the computation and may be contracted if a larger heap becomes unnecessary. The
memory for the Java heap does not need to be contiguous.
A Java Virtual Machine implementation may provide the programmer or the user
control over the initial size of the heap, as well as, if the heap can be dynamically
expanded or contracted, control over the maximum and minimum heap size.
The following exceptional condition is associated with the Java heap:
 If a computation requires more Java heap than can be made available by the
automatic storage management system, the Java Virtual Machine throws an
OutOfMemoryError.
Sun's JDK 1.0.2 implementation of the Java Virtual Machine dynamically expands its
Java heap as required by the computation, but never contracts its heap. Its initial and
maximum sizes may be specified on virtual machine start-up using the "-ms" and
"-mx" flags, respectively.
Increasing the maximum heap size beyond the default of 16MB usually improves
performance for applications that can use the extra space. However, there is a tradeoff
in higher space-management costs to the VM and at some point there is no longer any
benefit in increasing the maximum heap size. Increasing the heap size actually causes
the garbage collection to take longer, as it needs to examine more objects and a larger
space. We have found no better method than trial and error to determine the optimal
maximum heap size for a particular application.
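When experimenting with heap sizes, the VM's current heap figures can be inspected through java.lang.Runtime (available since JDK 1.0); a minimal sketch:

```java
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory(): current size of the heap the VM has allocated.
        // freeMemory(): portion of that heap currently unused.
        System.out.println("total heap: " + rt.totalMemory() + " bytes");
        System.out.println("free heap : " + rt.freeMemory() + " bytes");
    }
}
```

Running this with different -mx settings makes the growth behavior described above directly observable.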
Appendix A
There are a variety of Java performance analysis tools available today. One that comes
with JDK 1.2.2 is invoked by saying: $ java –Xrunhprof prog
The default profile output obtained from executing with –Xrunhprof in Java 2 is not
useful for method profiling. The default output generates object-creation statistics from
the heap as the dump. By default the dump occurs when an application terminates; you
can trigger a dump earlier by typing Ctrl-Break on Win32. To get a useful method
profile, you need to modify the profiler options to specify method profiling. A typical
call to achieve this is:
$ java –Xrunhprof:cpu=times prog
with the results written into a file java.hprof.txt. This tool provides information on the
time spent in each method of the application.
For example, if you run this tool on the class string_opt.class, the following output is
written into the java.hprof.txt file.
CPU TIME (ms) BEGIN (total = 420) Tue Dec 26 20:58:49 2000
rank  self    accum    count  trace  method
  1   9.52%    9.52%     213     13  java.util.jar.Attributes.read
  2   7.14%   16.67%    7860     18  java.util.jar.Attributes$Name.isValid
  3   7.14%   23.81%     637      8  java.lang.String.toLowerCase
  4   4.76%   28.57%     637     26  java.util.jar.Attributes.putValue
  5   4.76%   33.33%    8508     16  java.lang.Character.toLowerCase
  6   4.76%   38.10%    8146      7  java.util.jar.Attributes$Name.isAlpha
  7   4.76%   42.86%       1      3  java.util.jar.Manifest.read
  8   2.38%   45.24%       2     21  java.lang.ClassLoader.initializePath
  9   2.38%   47.62%       5     30  java.lang.String.endsWith
 10   2.38%   50.00%     638     23  java.lang.String.equals
 11   2.38%   52.38%       2      4  java.util.jar.JarFile.getEntry
 12   2.38%   54.76%     637     14  java.util.jar.Attributes$Name.isValid
 13   2.38%   57.14%       1     25  sun.net.www.URLConnection.<init>
 14   2.38%   59.52%      17     10  java.lang.String.intern
 15   2.38%   61.90%     848     32  java.util.jar.Manifest.toLower
 16   2.38%   64.29%       2     24  java.io.Win32FileSystem.normalize
 17   2.38%   66.67%     213     17  java.util.AbstractMap.<init>
 18   2.38%   69.05%       4      9  java.io.Win32FileSystem.normalize
 19   2.38%   71.43%       1     31  io.ByteToCharISO8859_1.convert
 20   2.38%   73.81%       1      2  com.sun.rsajca.Provider.<init>
 21   2.38%   76.19%     212     11  java.util.jar.Manifest.parseName
 22   2.38%   78.57%       1     12  java.security.AccessController.doPrivileged
 23   2.38%   80.95%     401     28  java.lang.StringBuffer.append
 24   2.38%   83.33%     638     22  java.lang.String.<init>
 25   2.38%   85.71%     638      5  java.lang.System.arraycopy
 26   2.38%   88.10%     854     20  java.lang.System.arraycopy
 27   2.38%   90.48%     637     15  java.util.jar.Attributes$Name.hashCode
 28   2.38%   92.86%      44      6  java.security.Provider.put
 29   2.38%   95.24%       1     27  java.security.Security.<clinit>
 30   2.38%   97.62%     637     29  java.util.HashMap.put
 31   2.38%  100.00%     637     19  java.lang.String.toLowerCase
CPU TIME (ms) END
Where
Rank: Simply counts the entries in the table, starting with 1 at the top and incrementing
by 1 for each entry.
Self: This field is usually interpreted as the percentage of the total running time spent
in this method.
Accum: This field is a running additive total of the self field percentages as you go down
the table.
Count: This field indicates how many times the unique stack trace that gave rise to this
entry was sampled while the program ran.
Trace: This field shows the unique trace identifier from the second section of the profile
output that generated this entry. The trace is recorded only once in the second
section no matter how many times it is sampled; the number of times that this
trace has been sampled is listed in the count field.
Method: This field shows the method name from the top line of the stack trace referred to
from the trace field, i.e. the method that was running when the stack was sampled.
Example:
rank  self     accum     count   trace  method
  1   11.55%   11.55%    18382    545   java/lang/*.dtoa
This example shows that stack trace 545 occurred in 18,382 of the sample stack traces,
which is 11.55% of the total number of stack trace samples made. Because the samples
are taken at regular time intervals, this indicates that the method was probably executing
for about 11.55% of the application's execution time.