CORPORATE PERFORMANCE ENGINEERING
Java Performance Optimization - Senthil Kumar N
1. Introduction
This course presents different ways to improve the performance of your Java
applications. The techniques focus on the Java language and libraries. Performance is
defined to include both speed and space: that is, how to make your programs faster while
using less memory. This course describes a variety of performance issues and gives some
hard numbers about how specific performance improvements work out. It should be
noted up front that there is no way to present totally definitive advice on performance, because
various applications have different performance characteristics and bottlenecks, and
because performance varies across different hardware, operating systems, and Java
development tools such as compilers and virtual machines. The Java programming
language is still evolving, and its performance continues to improve. The ultimate aim of
this course is to promote awareness of Java performance issues, so that you can make
appropriate design and implementation choices for specific applications.
1.1 Why Is It Slow?
There are overheads in the Java runtime system, mainly due to the virtual machine layer that
abstracts Java away from the underlying hardware. There are also overheads
from Java's dynamic nature. These overheads can cause a Java application to run
slower than a comparable application written in a lower-level language such as C. Java's
advantages, namely its platform independence, memory management, powerful
exception checking, built-in multithreading, dynamic resource loading, and security
checks, add costs in the form of an interpreter, a garbage collector, thread monitors, repeated
disk and network accesses, and extra runtime checks.
For example, dynamic (virtual) method invocation requires extra computation for every
method call, because the runtime system has to work out which of the possible methods
in the hierarchy is the actual target of the call. Most modern CPUs are optimized for
fixed call and branch targets, and they do not perform as well when a
significant percentage of calls must be computed on the fly.
The Java language features that cause these overheads may well be the features that persuaded you
to use Java in the first place. The important thing is that none of these overheads slows
an application down too much, and a good round of performance tuning
normally makes your application run as fast as you need it to run.
1.2 System Limitations and What to Tune
Three resources limit all applications:
1. CPU speed and availability.
2. System memory.
3. Disk (and network) input/output.
When tuning an application, the first step is to determine which of these is causing your
application to run too slowly. If your application is CPU-bound, you need to concentrate
your efforts on the code, looking for bottlenecks, inefficient algorithms, too many
short-lived objects, and other problems, which we will cover in this course. If your application is
hitting system-memory limits, it may be paging sections in and out of main memory. In
this case the problem may be caused by too many objects, or even just a few large objects,
being erroneously held in memory; by too many large arrays being allocated; or by the
design of the application, which may need to be reexamined to reduce its running
memory footprint. On the other hand, external data access or writing to the disk may be
slowing your application. In this case, you need to look at exactly what you are doing to
the disks that is slowing the application: first identify the operations, then determine the
problems, and finally eliminate or change them to improve the situation.
1.3 What to Measure
The main measurement is always wall-clock time. You should use this measurement to
specify all benchmarks, as it is the real-time interval that users actually perceive.
2. Overview
2.1 Scope of the Course
This course assumes basic knowledge of programming concepts in Java. It aims to
discuss the important aspects of code optimization that control the performance of a Java
application.
2.2 Performance Issues Not Covered in This Course
This course describes a set of techniques, rooted in the Java language and libraries. There
are other areas of performance mentioned only in passing.
The first of these is algorithm performance. If your application contains a fundamentally
slow algorithm, these techniques may not help you.
The other area is architecture. Sometimes poor performance is literally “built in” to an
application, making it very difficult to do any useful performance improvement by
tuning.
2.3 Environment and Tools Used in Code Examples
The examples in this course were developed against the Java Development Kit 1.2.2 from
Sun that implements the Java 2 version of the language and libraries. The JDK was run
on a Windows NT 4.0 system, a 450 MHz Pentium with 128 MB of memory.
Compilation was done with
C:\> javac prog.java
and examples were run with
C:\> java prog
2.4 How Examples Were Timed
The examples in this course are measured using a special Timer class, defined as follows:
class Timer {
    long t;

    public Timer() {
        reset();
    }

    public void reset() {
        t = System.currentTimeMillis();
    }

    public long elapsed() {
        return System.currentTimeMillis() - t;
    }

    public void print(String s) {
        System.out.println(s + ":" + elapsed());
    }
}
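As a quick illustration (this usage sketch, including the TimerDemo class and the loop it times, is not from the original notes), the Timer class can bracket any piece of code you want to measure:

```java
// Timer class reproduced from the course notes.
class Timer {
    long t;

    public Timer() {
        reset();
    }

    public void reset() {
        t = System.currentTimeMillis();
    }

    public long elapsed() {
        return System.currentTimeMillis() - t;
    }

    public void print(String s) {
        System.out.println(s + ":" + elapsed());
    }
}

// Hypothetical usage: time a loop, print the elapsed milliseconds.
public class TimerDemo {
    public static void main(String[] args) {
        Timer t = new Timer();          // timing starts at construction
        long sum = 0;
        for (int i = 0; i < 1000000; i++)
            sum += i;
        t.print("summing loop");        // prints "summing loop:<elapsed ms>"
        t.reset();                      // restart the clock for the next measurement
    }
}
```

Because the class uses System.currentTimeMillis(), its resolution is limited to the platform's clock granularity, so very short intervals may report as 0 ms.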
2.5 Performance Analysis Tools
There are a variety of Java performance analysis tools available today. One that comes
with JDK 1.2.2 is invoked by saying:
$ java -Xrunhprof prog
The default profile output gained from executing with -Xrunhprof in Java 2 is not useful
for method profiling. The default output generates object-creation statistics from the
heap as the dump. By default the dump occurs when an application terminates; you can
also trigger a dump while the program runs by typing Ctrl-Break on Win32. To get a useful method profile,
you need to modify the profiler options to specify method profiling. A typical call to
achieve this is:
$ java -Xrunhprof:cpu=times prog
with the results written into a file java.hprof.txt. This tool provides information on the
time spent in each method of the application. See Appendix A for more details about this
tool.
Another tool is:
$ javap -c prog
which is used to display Java Virtual Machine bytecodes. Using javap you can check for the
presence of particular constructs in your program, such as generated constructors.
3. Optimization Techniques:
3.1 Classes
Optimization 1: Class and Instance Initialization
When an instance of a class is created using new, initialization of the
class’s instance variables (variables unique to each instance) must be done. By contrast
class variables (those declared static and shared across instances) need only be initialized
once, conceptually at program invocation time. The difference between these types of
initialization is quite important, as the example illustrates:
/* This program describes class initialization using non-static variables */
public class cls_init_using_nonstatic
{
    static class Data {
        private int month;
        private String name;

        Data(int i, String str)
        {
            month = i;
            name = str;
        }
    }

    Data months[] = {
        new Data(1,"January"), new Data(2,"February"), new Data(3,"March"),
        new Data(4,"April"), new Data(5,"May"), new Data(6,"June")
    };

    // Main method starts here
    public static void main(String args[]) {
        final int N = Integer.parseInt(args[0]);
        Timer t = new Timer();
        cls_init_using_nonstatic x;
        for (int i = 1; i <= N; i++)
            x = new cls_init_using_nonstatic();
        t.print("Total Time with non-static");
    }
}
Fig 1.1 Program which describes the class initialization using non-static variables.
This example takes 340 ms to create 250000 objects. If we look closely at this
class, there is a potential inefficiency. The month number/name data found in months[] is
an instance variable of the class; that is, a copy of the data is found in every instance of
the class. Structuring the data in this way doesn't make sense, in that the number/name
data never changes and is the same across all class instances. So we can change the
program slightly, to turn the number/name data into a class variable, with a single copy
shared across all instances:
/* This program describes class initialization using a static variable */
public class cls_init_using_static
{
    static class Data {
        private int month;
        private String name;

        Data(int i, String str)
        {
            month = i;
            name = str;
        }
    }

    static Data months[] = {
        new Data(1,"January"), new Data(2,"February"), new Data(3,"March"),
        new Data(4,"April"), new Data(5,"May"), new Data(6,"June")
    };

    // Main method starts here
    public static void main(String args[]) {
        final int N = Integer.parseInt(args[0]);
        Timer t = new Timer();
        cls_init_using_static x;
        for (int i = 1; i <= N; i++)
            x = new cls_init_using_static();
        t.print("Total Time with Static");
    }
}
Fig 1.2 Program which describes the class initialization using static variables.
This program takes 40 ms to create 250000 objects, a speedup of about 8 to 1 over the
first approach. Moreover, it saves a lot of space per class instance as well.
The following table shows the timings in milliseconds as a function of the number of
elements.
Elements     With Non-Static   With Static
1000               30                0
10000              50               10
100000            150               30
1000000          1281               90
Optimization 2: Reuse Objects
Object reuse is important for a couple of reasons. First, the creation of an object is a
costly operation in terms of memory allocation. As you know, the garbage collector in
the Java virtual machine is responsible for memory management. In terms of object
creation, the garbage collector is responsible for allocating the amount of memory
required by an object. This means that the garbage collector must determine the amount
of space required by the object and then allocate that space. Since the Java language supports
inheritance, the memory requirement is determined by "climbing" the
inheritance tree and looking for member variables. Once the top of the tree is reached, the
appropriate space is allocated. In order for the garbage collector to manage the memory,
it needs to update the "memory table". The memory table maintains information
about the memory and the number of references to each space. Automatic memory
management comes with a price: the garbage collector needs to maintain the memory
table, and this can become costly as the number of objects the garbage collector is responsible
for grows. Performance hits may also be incurred when the garbage collector releases an
object.
Second, the creation of an object is a costly operation in terms of execution speed. Not
only does the JVM have to climb the inheritance tree to determine the appropriate amount of
memory to allocate, it also has to initialize the allocated memory. The object memory
initialization starts at the top of the inheritance tree and works down, calling the
constructor of each class in the tree, and finishing with the instantiated object's
constructor. The following code compares the time spent creating N java.lang.Object
instances that are all retained with the time spent creating N that are immediately discarded:
/* This program describes the effect of reusing objects */
import java.util.*;

public class obj_reuse
{
    private static long delta;
    static int iterations;

    public static void main(String[] args)
    {
        iterations = Integer.parseInt(args[0]);
        Timer t = new Timer();
        // Creation of different objects, all retained in an array
        Object[] tmpObj = new Object[iterations];
        for (int i = 0; i < iterations; i++) {
            tmpObj[i] = new Object();
        }
        t.print("time1");
        t.reset();
        // Temporary objects, discarded each iteration so their memory can be reused
        for (int i = 0; i < iterations; i++) {
            Object tmpObject = new Object();
        }
        t.print("time2");
    }
}
Fig 2.1 Program describing object reuse.
Creating an object one million times where each instance can be immediately reclaimed took
70 ms, whereas creating one million different objects that are all retained took 1542 ms.
As the example shows, object reuse is a good idea in almost every feasible case, from collections to
implementations of event listeners. However, reuse is especially important when you are
using objects that are associated with system resources such
as sockets, streams, and threads. The creation of an object associated with a system
resource is more costly than the creation of an object with no system resources.
The following table shows the timings in milliseconds as a function of the number of
elements.
Elements     Without Object Reuse   With Object Reuse
10000                20                     0
100000               70                    10
1000000            1542                    70
3.2 Methods
There is an intrinsic cost associated with calling Java methods. These costs involve the actual
transfer of control to the method, parameter passing, return-value passing, and establishment of
the called method's stack frame where local variables are stored. Such costs show up in
other languages as well. In this section we will look at a few performance issues with
methods.
Optimization 3: Inlining
Perhaps the most effective way to deal with method call overhead is method inlining,
either by a compiler doing it automatically, or by doing it yourself manually. Inlining is
done by expanding the inlined method's code in the code that calls the method. Consider
the example shown below.
/* Example describing inline vs. method calls */
class A_001 {
    int min(int a, int b)
    {
        return (a < b ? a : b);
    }
}

public class inline_method_opt
{
    public static void main(String args[])
    {
        final int N = Integer.parseInt(args[0]);
        int a = 3, b = 5, c;
        A_001 a1 = new A_001();
        // method call
        Timer t = new Timer();
        for (int i = 1; i <= N; i++)
            c = a1.min(a, b);
        t.print("time for method call");
        // inline
        t.reset();
        for (int i = 1; i <= N; i++)
            c = (a < b ? a : b);
        t.print("Time for inline method");
    }
}
Fig 3.1 Program describes the inline and method calls
The first case takes 50 ms and the second takes 10 ms for N = 1000000. There are several
ways that compilers can perform automatic inlining. One way is to expand the called
method inline in the caller, which improves speed at the expense of code space.
Another approach is more dynamic, where methods are inlined in a running program.
The following table shows the timings in milliseconds as a function of the number of
elements.
Elements     With Method Call   With Inline
10000              10                0
1000000            60               10
10000000          441              120
One way you can help a compiler with inlining is to declare methods as final, that is, to
declare that no subclass method overrides the method.
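As a minimal sketch of this hint (the MathUtil and FinalDemo classes are hypothetical, not from the course), a small final method is the kind of call site a compiler can safely expand inline:

```java
// MathUtil is a hypothetical class; declaring min() final tells the
// compiler that no subclass can override it, so the call target is
// fixed and the method is a safe candidate for inlining.
class MathUtil {
    static final int min(int a, int b) {
        return (a < b) ? a : b;
    }
}

public class FinalDemo {
    public static void main(String[] args) {
        // A compiler may expand this call inline as (3 < 5) ? 3 : 5.
        System.out.println(MathUtil.min(3, 5)); // prints 3
    }
}
```

Private and static methods give the compiler the same guarantee, since they also cannot be overridden.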
Optimization 4: Variable Scope Impacts Performance
Performance can be improved by using local variables; do not overuse class variables.
The following example shows the use of local variables over class variables.
// This program describes the use of variables within scope
public class scope_opt
{
    static final int N = 25000000;

    public static void loop() {
        int j = 0;
        int i;
        for (i = 0; i < N; i++)
            j = j + 1;
    }

    static int k = 0;

    public static void loop1() {
        int i;
        for (i = 0; i < N; i++)
            k = k + 1;
    }

    public static void main(String[] args)
    {
        Timer t = new Timer();
        loop();
        t.print("using local variable");
        t.reset();
        loop1();
        t.print("using non-local variable");
    }
}
The following table shows the timings in milliseconds as a function of the number of
elements.
Elements      Using Local Variable   Using Non-Local Variable
2500000              30                       40
25000000            290                      391
250000000          2864                     3936
3.3 Strings
Java provides a built-in String implementation and also provides the StringBuffer
implementation for sequences that change in length. Strings are a widely used data type in the
Java language. Java strings are represented as objects of type String and store sequences
of 16-bit Unicode characters, along with the current string length.
Optimization 5: Strings Are Immutable
Perhaps the most important point about Java strings relative to performance is that strings
are immutable, that is, they never change after creation. For example, in this sequence:
String str = "testing";
str = str + "abc";
the string "testing", once created, does not change, but a reference to the string may
change. The string reference in str originally points to "testing", but then is changed to
point to a new string formed by concatenating str and "abc". The above sequence is
implemented internally using code like:
String str = "testing";
StringBuffer tmp = new StringBuffer(str);
tmp.append("abc");
str = tmp.toString();
In other words, the two strings to be concatenated are copied to a temporary string buffer,
then copied back. Such copying is quite expensive. So a fundamental performance rule to
remember with strings is to use StringBuffer objects explicitly if you're building up a
string. String concatenation operators like + and += are fine for casual use, but quite
expensive otherwise. This program illustrates the respective costs of + and
StringBuffer.append():
// string append
public class str_app {
    public static void main(String args[]) {
        final int N = 10000;
        // using +
        Timer t = new Timer();
        String s1 = "";
        for (int i = 1; i <= N; i++)
            s1 = s1 + "*";
        t.print("append using +");
        // using StringBuffer
        t.reset();
        StringBuffer sb = new StringBuffer();
        for (int i = 1; i <= N; i++)
            sb.append("*");
        String s2 = sb.toString();
        t.print("append using StringBuffer");
    }
}
This program takes around 821 ms to run using the + operator and about 10 ms using
StringBuffer.append(), a ratio of roughly 80 to 1.
The following table shows the timings in milliseconds as a function of the number of
elements.
Elements     With +    With StringBuffer.append()
1000            30                0
10000          821               10
100000      195521               30
Optimization 6: Using == and String.equals() to Compare Strings
If you’ve programmed in languages such as C++, that support overloaded operators, you might be
used to using the == operator to compare strings. You can also use this operator in Java
programming, but it won’t necessarily give you the results you expect. In the Java language, the
== operator, when applied to references, simply compares the references themselves for equality,
and not the referenced objects. For example, if you have strings:
String s1 = "abc";
String s2 = new String("abc");
then the boolean expression:
s1 == s2
will be false, not because the string contents are unequal (here they are identical), but because the
s1 and s2 references are distinct. Conversely, if two references are equal using ==, then you can be sure that they refer to
identical objects. So if you are comparing strings, and there is a good chance of encountering
identical ones, then you can say:
if (s1 == s2 || s1.equals(s2)) ...
If the references are identical, this will short-circuit the equals() method call.
This technique illustrates a more general principle of performance: always perform a cheap test
before an expensive one, if you possibly can. In this example, == is much less expensive to
perform than equals().
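The idiom can be packaged as a small helper. This sketch (the CmpDemo class and fastEquals() method are hypothetical, not from the course) shows both paths being taken:

```java
// Hypothetical illustration of the cheap-test-first idiom: the ==
// identity check runs before the more expensive equals() content scan.
public class CmpDemo {
    static boolean fastEquals(String a, String b) {
        // Short-circuit: if the references are identical, skip equals().
        return a == b || a.equals(b);
    }

    public static void main(String[] args) {
        String s1 = "abc";
        String s2 = new String("abc"); // distinct object, identical contents
        System.out.println(fastEquals(s1, s1)); // true, via the cheap == test
        System.out.println(fastEquals(s1, s2)); // true, via equals()
    }
}
```

Note that fastEquals() assumes a is non-null, as the original `s1 == s2 || s1.equals(s2)` expression does.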
3.4 Input and Output Operations
I/O to the disk or the network is hundreds to thousands of times slower than I/O to computer
memory. Disk and network transfers are expensive activities and are two of the most likely
candidates for performance problems. Two standard optimization techniques for reducing I/O
overhead are buffering and caching. For a given amount of data, I/O mechanisms work more
efficiently if the data is transferred in a few large chunks rather than many small
ones. Buffering groups data into larger chunks, improving the efficiency of the I/O by reducing the
number of I/O operations that need to be executed.
Optimization 7: Buffering
Perhaps the most important idea in improving I/O performance is buffering: doing input
and output in large chunks of data instead of a byte or character at a time. To see what
difference this can make, consider the following program:
import java.io.*;

public class io_buf {
    public static void main(String args[]) throws IOException {
        // one read() per character
        FileInputStream fis = new FileInputStream("e:/LectureNotes.doc");
        int cnt = 0, c;
        Timer t = new Timer();
        while ((c = fis.read()) != -1)
            if (c == 'x')
                cnt++;
        t.print("read() per character");
        fis.close();
        // buffered
        fis = new FileInputStream("e:/LectureNotes.doc");
        byte buf[] = new byte[1024];
        cnt = 0;
        int n;
        t.reset();
        while ((n = fis.read(buf)) > 0)
            for (int i = 0; i < n; i++)
                if (buf[i] == 'x')
                    cnt++;
        t.print("buffered");
        fis.close();
    }
}
This program uses two different approaches to count the number of 'x' bytes in a file.
The first repeatedly calls read() on the input stream to grab individual bytes, while the
second reads 1024-byte chunks of the file, iterates through each chunk, and
counts the bytes that way.
The first approach took 2313 ms to read a 455 KB file, whereas the second approach
took 10 ms to read the same file:
Size of the file   Without Buffering   With Buffering
186 KB                   972                 0
455 KB                  2313                10
Optimization 8: BufferedReader
Suppose that you would like to count the number of text lines in a file. One way of doing
this is to say:
// comparison between FileInputStream and FileReader
import java.io.*;

public class io_opt
{
    // Method 1: using the InputStream classes
    static void io_inputStream()
    {
        try
        {
            FileInputStream fis = new FileInputStream("e:/LectureNotes.doc");
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream dis = new DataInputStream(bis);
            int cnt = 0;
            String line;
            while ((line = dis.readLine()) != null)
                cnt++;
            System.out.println("count1=" + cnt);
            bis.close();
            fis.close();
        }
        catch (Exception e) {
        }
    }

    // Method 2: using the Reader classes
    static void io_Reader()
    {
        try {
            FileReader fr = new FileReader("e:/LectureNotes.doc");
            BufferedReader br = new BufferedReader(fr);
            int cnt = 0;
            String line;
            while ((line = br.readLine()) != null)
                cnt++;
            br.close();
            fr.close();
        }
        catch (Exception e) {
        }
    }

    public static void main(String[] args)
    {
        Timer t = new Timer();
        io_inputStream();
        t.print("Using FileInputStream");
        t.reset();
        io_Reader();
        t.print("Using Reader");
    }
}
This example has two parts. The first part counts the number of lines in a
455 KB file using the FileInputStream class. This is quite slow, in part because the
DataInputStream class makes a read() call for each character. The second part runs
faster than the first and avoids a read() call per character: readLine() in this
case grabs the underlying data in large buffered chunks.
The first part took 151 ms, whereas the second part took 110 ms.
Size of the file   With BufferedInputStream   With BufferedReader
186 KB                     80                       60
455 KB                    150                      111
4. Libraries:
This section touches on some of the performance issues with using classes and methods
from the standard libraries.
Optimization 9: System.arraycopy()
System.arraycopy() is a method that supports efficient copying from one array to another.
For example, if you have two arrays vec1 and vec2, of length N, and you want to copy
from vec1 to vec2, you say:
System.arraycopy(vec1, 0, vec2, 0, N);
specifying the starting offset in each array.
It's worth asking how much System.arraycopy() improves performance over alternative
approaches for copying arrays. Here is a program that uses a copy loop,
System.arraycopy(), and Object.clone() to copy one array to another:
// copying arrays
public class lib_copy {
    public static void main(String args[]) {
        final int N = 5000000;
        int vec1[] = new int[N];
        for (int i = 0; i < N; i++)
            vec1[i] = i;
        int vec2[] = new int[N];
        // copy using loop
        Timer t = new Timer();
        for (int i = 0; i < N; i++)
            vec2[i] = vec1[i];
        t.print("loop");
        // copy using System.arraycopy()
        t.reset();
        System.arraycopy(vec1, 0, vec2, 0, N);
        t.print("System.arraycopy");
        // copy using Object.clone()
        t.reset();
        vec2 = (int[]) vec1.clone();
        t.print("clone");
    }
}
The timings for the various methods are:
Loop                231 ms
System.arraycopy    140 ms
Object.clone        180 ms
For very short arrays, use of this method may be counterproductive, because of the overhead
of actually calling the method, checking the method's parameters, and so on.
Object.clone() represents another approach to copying an array: clone()
allocates a new instance of the array and copies all the array elements. Note that
Object.clone() does a shallow copy, as does System.arraycopy(), so if the array elements
are object references, the references are copied and no copy is made of the referenced
objects.
Optimization 10: Vector vs. Arrays
Java provides the Vector class to create a list of objects in which elements can be
inserted, removed, indexed, enumerated, and so on. This allows one to easily program
applications without having to worry about initial sizing, dynamic increase in size, and
managing data structures upon insertion and deletion. On the other hand, the use of arrays is
more cumbersome and can incur higher memory consumption if oversized, but it can
deliver higher performance, as shown in the example below.
We consider a simple example to test the performance of vectors and arrays. We first
define the coordinate class to consist of the pair (x, y) as integers and provide member
functions to initialize, update, and print out the values of the coordinate. We assume for
the purpose of this discussion that all valid coordinates have each of x and y as nonnegative.
In our main program we consider a list of coordinates. Initially we keep inserting a
specified number of coordinates and then delete a number of coordinates. Finally, we
print the first few coordinates in the list. We implemented this using an array of
coordinates and using a vector of coordinates. We provide sample listing of the
coordinate class and the main program that uses an array of coordinates. A complete
listing is given in Appendix B.
Array implementation:
...
CoArray = new coordinate[num_elements];

The coordinate class (method bodies omitted here; the full listing is in Appendix B):

class coordinate {
    int x;
    int y;
    public coordinate();
    public coordinate(int x, int y);
    public void set_values(int x, int y);
    public int getxval();
    public int getyval();
    public void print_vals();
}

// create coordinates
for (i = 0; i < num_elements; i++)
    CoArray[i] = new coordinate(i, i);

// delete every third element
for (i = 0; i < num_elements/3; i++)
    CoArray[i].set_values(-1, -1);

// print the first 10 valid coordinates
while (count < 10) {
    if (CoArray[i].getxval() > 0) {
        CoArray[i].print_vals();
        count++;
    }
    i++;
}
Now consider the vector implementation of the same (again the complete listing is given
in Appendix B). For the vector implementation we assume that the capacity increment is
a command line argument.
Vector implementation:
...
CoVector = new Vector(cap_incr, cap_incr);

// create coordinates
for (i = 0; i < num_elements; i++)
    CoVector.addElement(new coordinate(i, i));

// delete every third element
count = 0;
for (i = 0; i < num_elements/3; i++) {
    j = i*3 - count;
    CoVector.removeElementAt(j);
    count++;
}

// print the first 10 valid coordinates
count = 0;
while (count < 10) {
    x = (coordinate) CoVector.elementAt(count);
    x.print_vals();
    count++;
}
Note that for the vector class performance will be sensitive to the initial size and capacity
increment. We have observed that as these parameters increase the performance
improves.
The table below provides timings in milliseconds as a function of the number of
elements. Under Vector we show timings for several capacity increments (we assume the
initial number of elements is the same as the capacity increment).
            Array                  Vector (by capacity increment)
Elements    (ms)    Inc=10    Inc=100    Inc=1,000    Inc=10,000    Inc=100,000
1,000        20        30         20           20            -             -
10,000       40       380        220          200          200             -
100,000     130     82008      24926        18557        17195         16924
As can be seen, the array implementation, though slightly more tedious to program,
significantly outperforms the vector implementation. In particular, as the number of
elements increases, the performance gap also increases dramatically. Therefore, we
strongly recommend the use of arrays instead of vectors.
Optimization 11: ArrayList
The class java.util.Vector is used to represent lists of object references, with support for
dynamic expansion of the vector, random access to vector elements, and so on. A newer
scheme is the Java collection framework, which includes a class ArrayList that can be
used in place of Vector. Some of the performance differences between Vector and
ArrayList include:
• Vector's methods are synchronized; ArrayList's are not. This means that Vector is
thread-safe, at some extra cost.
• The collection framework provides an alternative to ArrayList called LinkedList, which
offers different performance tradeoffs.
• When Vector needs to grow its internal data structure to hold more elements, the size of
the structure is doubled, whereas for ArrayList, the size is increased by 50%. So
ArrayList is more conservative in its use of space.
It’s worth using the collection framework in your applications if you possibly can,
because it’s now the "standard" way to handle collections. If you are concerned about
thread safety, one way to handle this issue is to use wrappers around objects like
ArrayList, for example:
List list = Collections.synchronizedList(new ArrayList());
This technique makes list thread-safe.
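A minimal sketch of this wrapping technique (the SyncDemo class is hypothetical; note that iteration over a synchronized wrapper still requires explicit locking, per the java.util.Collections documentation):

```java
import java.util.*;

// Hypothetical sketch: wrapping an unsynchronized ArrayList to obtain
// a thread-safe view.
public class SyncDemo {
    public static void main(String[] args) {
        List list = Collections.synchronizedList(new ArrayList());
        list.add("a");
        list.add("b");
        // Individual calls such as add() and size() are synchronized by
        // the wrapper, but iteration must still be locked manually:
        synchronized (list) {
            for (Iterator it = list.iterator(); it.hasNext(); ) {
                System.out.println(it.next());
            }
        }
    }
}
```

This gives Vector-like safety only where you need it, while unsynchronized code paths keep the cheaper ArrayList behavior.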
Collection classes like ArrayList periodically must grow their internal data structures to
accommodate new elements. This process is automatic, and normally you don't need to
worry about it. But if you have a very large array, and you know in advance that it's going
to be large, then you can speed things up a bit by calling ensureCapacity() to set the size
of the array. An example:
// ensureCapacity()
import java.util.*;

public class lib_cap {
    public static void main(String args[]) {
        final int N = 1000000;
        Object obj = new Object();
        ArrayList list = new ArrayList();
        Timer t = new Timer();
        for (int i = 1; i <= N; i++)
            list.add(obj);
        t.print("without ensurecapacity");
        list = new ArrayList();
        t.reset();
        list.ensureCapacity(N);
        for (int i = 1; i <= N; i++)
            list.add(obj);
        t.print("with ensurecapacity");
    }
}
Calling ensureCapacity() means that the ArrayList will not have to keep growing its internal
structures as list elements are added. Of course, if you call ensureCapacity() when you
don't really need it, you may end up wasting a lot of space. The first case took 681 ms,
whereas the second took 160 ms.
Elements    ArrayList   ArrayList with initial capacity
100000          80                  20
1000000        671                 180
5000000       2844                 771
Optimization 12: ArrayList vs. LinkedList
The Java collection framework provides two classes for handling lists of data items,
ArrayList and LinkedList. The first of these is conceptually like an array, the second like
a linked data structure. An ArrayList is implemented using an internal array of Object[],
while a LinkedList uses a series of internal records linked together. These two classes
have very different performance characteristics, as illustrated by a couple of examples.
The first deals with inserting new elements at position 0 in a list:
import java.util.*;

public class lib_list1 {
    public static void main(String args[]) {
        final int N = 25000;
        // ArrayList
        ArrayList al = new ArrayList();
        Timer t = new Timer();
        for (int i = 1; i <= N; i++)
            al.add(0, new Integer(i));
        t.print("arraylist");
        // LinkedList
        LinkedList ll = new LinkedList();
        t.reset();
        for (int i = 1; i <= N; i++)
            ll.add(0, new Integer(i));
        t.print("linkedlist");
    }
}
In this example the times are as follows:
ArrayList     3115 ms
LinkedList      50 ms
Inserting elements at the beginning of an ArrayList requires that all existing elements
be pushed down. But inserting at the beginning of a LinkedList is cheap, because the
elements of the structure are connected to each other via links, and it's easy to create a
new element and link it in at the head of the list.
The second example does random lookup of elements already in a structure.
import java.util.*;

public class lib_list2 {
    public static void main(String args[]) {
        final int N = 25000;
        Object o;
        // ArrayList
        ArrayList al = new ArrayList();
        for (int i = 0; i < N; i++)
            al.add(new Integer(i));
        Timer t = new Timer();
        for (int i = 0; i < N; i++)
            o = al.get(i);
        t.print("arraylist");
        // LinkedList
        LinkedList ll = new LinkedList();
        for (int i = 0; i < N; i++)
            ll.add(new Integer(i));
        t.reset();
        for (int i = 0; i < N; i++)
            o = ll.get(i);
        t.print("linkedlist");
    }
}
The running times here are:
ArrayList      3155ms
LinkedList     8492ms
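One practical consequence of these numbers: LinkedList.get(i) must walk the links from the head on every call, so the indexed lookup loop above does O(n²) work in total on a LinkedList, while an Iterator keeps its position and visits each element once. A sketch of the iterator approach (the LinkedListScan class name is made up for this illustration):

```java
import java.util.Iterator;
import java.util.LinkedList;

public class LinkedListScan {
    // Sum all elements with a single O(n) traversal; each call to
    // it.next() advances one link instead of re-walking from the head.
    static long sum(LinkedList list) {
        long total = 0;
        for (Iterator it = list.iterator(); it.hasNext(); )
            total += ((Integer) it.next()).intValue();
        return total;
    }

    public static void main(String[] args) {
        LinkedList ll = new LinkedList();
        for (int i = 0; i < 1000; i++)
            ll.add(new Integer(i));
        System.out.println(sum(ll));   // 499500
    }
}
```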
5. Compilation and Runtime Optimization
5.1 Optimizations performed when using the –O option
The only standard compile-time option that can improve performance with the
JDK compiler is the –O option. Note that –O is a common option for compilers,
and further optimizing options for other compilers often take the form –O1, –O2, etc.
You should always check your compiler’s documentation to find what other options are
available and what they do. Some compilers allow you to choose between
optimizing the compiled code for speed and minimizing its size; there is often a
tradeoff between these two aspects. The standard –O option does not currently apply
a wide variety of optimizations in the Sun JDK (up to JDK 1.2). Currently the option
makes the compiler eliminate optional tables in the .class files, such as the line number
and local variable tables; this gives a small performance improvement by making class
files smaller and therefore quicker to load. You should definitely use this option if your
class files are sent across a network. But the main performance improvement from
the –O option comes from the compiler inlining methods. When using the –O option, the
compiler considers inlining methods defined with any of the following modifiers:
final, static, or private.
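As an illustration (not compiler output), the sketch below shows the three kinds of methods that qualify; all of them can be bound at compile time because no subclass can override them. The Inlinable class and its values are invented for this example:

```java
public class Inlinable {
    private int value = 42;

    public final int getFinal()   { return value; }  // final: cannot be overridden
    private int getPrivate()      { return value; }  // private: invisible to subclasses
    public static int getStatic() { return 7; }      // static: resolved at compile time

    // Each call below is a candidate for inlining under javac -O.
    public int sum() { return getFinal() + getPrivate() + getStatic(); }

    public static void main(String[] args) {
        System.out.println(new Inlinable().sum());   // 91
    }
}
```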
Choosing simple methods to inline has a rationale behind it. The larger the
method being inlined, the more the code gets bloated with copies of the same code
inserted in many places. This has runtime costs in extra code being loaded and
extra space taken by the runtime system. A JIT VM would also have the extra cost of
having to compile more code. At some point, there is a decrease in performance from
inlining too much code. The compiler applies its methodology for selecting methods
to inline irrespective of whether the target method is in a bottleneck. A performance
tuner applying inlining works the other way around: first finding the bottlenecks,
then selectively inlining methods inside them. This latter approach can result in
good speedups, especially in loop bottlenecks, because a loop can be sped up
significantly by removing the overhead of a repeated method call.
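A sketch of this hand-inlining technique (the square() method and the class name are made-up examples): both loops compute the same result, but the second replaces the per-iteration method call with the method body.

```java
public class ManualInline {
    static int square(int x) { return x * x; }

    // Original form: one method call on every loop iteration.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += square(i);
        return total;
    }

    // Hand-inlined form: the call overhead inside the loop is removed.
    static long sumOfSquaresInlined(int n) {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += i * i;
        return total;
    }

    public static void main(String[] args) {
        // Both variants must agree; only the call overhead differs.
        System.out.println(sumOfSquares(1000) == sumOfSquaresInlined(1000));
    }
}
```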
5.2 Performance Effects From Runtime Options
Some runtime options can help your application to run faster. These include:
 Options that allow the VM to have a bigger footprint (-Xmx / -mx).
 -noverify, which eliminates the overhead of verifying classes at class-load time.
Heap
The Java Virtual Machine has a heap that is shared among all threads. The heap is the
runtime data area from which memory for all class instances and arrays is allocated.
The Java heap is created on virtual machine start-up. Heap storage for objects is
reclaimed by an automatic storage management system (typically a garbage
collector); objects are never explicitly de-allocated. The Java Virtual Machine
assumes no particular type of automatic storage management system, and the storage
management technique may be chosen according to the implementor's system
requirements. The Java heap may be of a fixed size, or may be expanded as required
by the computation and may be contracted if a larger heap becomes unnecessary. The
memory for the Java heap does not need to be contiguous.
A Java Virtual Machine implementation may provide the programmer or the user
control over the initial size of the heap, as well as, if the heap can be dynamically
expanded or contracted, control over the maximum and minimum heap size.
The following exceptional condition is associated with the Java heap:
 If a computation requires more Java heap than can be made available by the
automatic storage management system, the Java Virtual Machine throws an
OutOfMemoryError.
Sun's JDK 1.0.2 implementation of the Java Virtual Machine dynamically expands its
Java heap as required by the computation, but never contracts its heap. Its initial and
maximum sizes may be specified on virtual machine start-up using the "-ms" and
"-mx" flags, respectively.
Increasing the maximum heap size beyond the default of 16MB usually improves
performance for applications that can use the extra space. However, there is a tradeoff
in higher space-management costs to the VM and at some point there is no longer any
benefit in increasing the maximum heap size. Increasing the heap size actually causes
the garbage collection to take longer, as it needs to examine more objects and a larger
space. We have found no better method than trial and error to determine the optimal
maximum heap size for a particular application.
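When experimenting with heap sizes, the VM's current heap figures can be inspected through java.lang.Runtime (available since JDK 1.0); a minimal sketch:

```java
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory(): current size of the heap the VM has allocated.
        // freeMemory(): portion of that heap currently unused.
        System.out.println("total heap: " + rt.totalMemory() + " bytes");
        System.out.println("free heap : " + rt.freeMemory() + " bytes");
    }
}
```

Running this with different -mx settings makes the growth behavior described above directly observable.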
Appendix A
There are a variety of Java performance analysis tools available today. One that comes
with JDK 1.2.2 is invoked by saying: $ java –Xrunhprof prog
The default profile output obtained from executing with –Xrunhprof in Java 2 is not
useful for method profiling. The default output generates object-creation statistics from
the heap as the dump. By default the dump occurs when an application terminates; you
can trigger a dump earlier by typing Ctrl-Break on Win32. To get a useful method
profile, you need to modify the profiler options to specify method profiling. A typical
call to achieve this is:
$ java –Xrunhprof:cpu=times prog
with the results written into a file java.hprof.txt. This tool provides information on the
time spent in each method of the application.
For example, if you run this tool on the class string_opt.class, the following output is
written into the java.hprof.txt file.
CPU TIME (ms) BEGIN (total = 420) Tue Dec 26 20:58:49 2000
rank  self    accum    count  trace  method
  1   9.52%    9.52%     213     13  java.util.jar.Attributes.read
  2   7.14%   16.67%    7860     18  java.util.jar.Attributes$Name.isValid
  3   7.14%   23.81%     637      8  java.lang.String.toLowerCase
  4   4.76%   28.57%     637     26  java.util.jar.Attributes.putValue
  5   4.76%   33.33%    8508     16  java.lang.Character.toLowerCase
  6   4.76%   38.10%    8146      7  java.util.jar.Attributes$Name.isAlpha
  7   4.76%   42.86%       1      3  java.util.jar.Manifest.read
  8   2.38%   45.24%       2     21  java.lang.ClassLoader.initializePath
  9   2.38%   47.62%       5     30  java.lang.String.endsWith
 10   2.38%   50.00%     638     23  java.lang.String.equals
 11   2.38%   52.38%       2      4  java.util.jar.JarFile.getEntry
 12   2.38%   54.76%     637     14  java.util.jar.Attributes$Name.isValid
 13   2.38%   57.14%       1     25  sun.net.www.URLConnection.<init>
 14   2.38%   59.52%      17     10  java.lang.String.intern
 15   2.38%   61.90%     848     32  java.util.jar.Manifest.toLower
 16   2.38%   64.29%       2     24  java.io.Win32FileSystem.normalize
 17   2.38%   66.67%     213     17  java.util.AbstractMap.<init>
 18   2.38%   69.05%       4      9  java.io.Win32FileSystem.normalize
 19   2.38%   71.43%       1     31  io.ByteToCharISO8859_1.convert
 20   2.38%   73.81%       1      2  com.sun.rsajca.Provider.<init>
 21   2.38%   76.19%     212     11  java.util.jar.Manifest.parseName
 22   2.38%   78.57%       1     12  java.security.AccessController.doPrivileged
 23   2.38%   80.95%     401     28  java.lang.StringBuffer.append
 24   2.38%   83.33%     638     22  java.lang.String.<init>
 25   2.38%   85.71%     638      5  java.lang.System.arraycopy
 26   2.38%   88.10%     854     20  java.lang.System.arraycopy
 27   2.38%   90.48%     637     15  java.util.jar.Attributes$Name.hashCode
 28   2.38%   92.86%      44      6  java.security.Provider.put
 29   2.38%   95.24%       1     27  java.security.Security.<clinit>
 30   2.38%   97.62%     637     29  java.util.HashMap.put
 31   2.38%  100.00%     637     19  java.lang.String.toLowerCase
CPU TIME (ms) END
Where
Rank: Simply counts the entries in the table, starting with 1 at the top and incrementing
by 1 for each entry.
Self: This field is usually interpreted as the percentage of the total running time spent
in this method.
Accum: This field is a running additive total of the self field percentages as you go down
the table.
Count: This field indicates how many times the unique stack trace that gave rise to this
entry was sampled while the program ran.
Trace: This field shows the unique trace identifier from the second section of the profile
output that generated this entry. The trace is recorded only once in the second
section no matter how many times it is sampled; the number of times that this
trace has been sampled is listed in the count field.
Method: This field shows the method name from the top line of the stack trace referred to
from the trace field, i.e. the method that was running when the stack was sampled.
Example:
rank  self     accum     count   trace  method
  1   11.55%   11.55%    18382    545   java/lang/*.dtoa
This example shows that stack trace 545 occurred in 18,382 of the sample stack traces,
which is 11.55% of the total number of stack trace samples made. Because the samples
are taken at regular time intervals, this indicates that the method was probably executing
for about 11.55% of the application's execution time.