Saturday, May 19, 2012

Top 20 Programming Tips for Performance Improvement (Java/C#)


Many architects and developers are concerned about the performance of their applications, yet writing reliable, high-performance code is quite achievable. This article presents twenty short tips to keep in mind when designing and writing code in a managed language such as Java or C#. The twenty points are not ordered by priority.
Note 
This article was originally written for the Java language, but all of the advice presented here applies to C# as well, since both share the same school of thought (a managed memory model).
1. Avoid producing a lot of garbage
The Java/C# memory-management model is quite simple: memory is allocated by the program, and when it is no longer referenced, it is garbage collected. You do not have to manage these details yourself. In theory, the garbage collector runs only when system load is low and is not supposed to affect the application. In practice, a heavily used application can become very slow, and garbage-collection runs do have an impact.

Always try to avoid generating garbage, because this will reduce the need to run a garbage collection. Just think about three things:
  1. Reuse existing objects
  2. Avoid creating unnecessary objects
  3. Create object pools (use an object pool only when object creation is very expensive, as for threads or database connections) [Refer Comment Below]
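As a minimal sketch of points 1 and 2, the following loop creates one StringBuilder and reuses it across iterations instead of allocating a fresh object per item. The class and method names here are illustrative, not from the article.

```java
public class ReuseDemo {
    public static String[] label(int[] values) {
        String[] out = new String[values.length];
        StringBuilder sb = new StringBuilder();   // created once, reused
        for (int i = 0; i < values.length; i++) {
            sb.setLength(0);                      // reset state instead of allocating
            sb.append("value-").append(values[i]);
            out[i] = sb.toString();
        }
        return out;
    }
}
```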
2. Avoid using finalizers when possible
Whenever an object finalizer needs to be executed, it gets placed into a finalization queue. On most systems, this queue will be run only when a first level garbage collection fails. First level garbage collection only fails when room for a new object cannot be found in the heap or the overall free heap space drops below 25%. This behavior causes all the resources associated with that object to be retained until that time, which can be many garbage collection cycles later. This time delay between queuing a finalizer and actually running the finalizer can be considerable, causing potential shortages of system resources. Not using a finalizer on the object will avoid this delay.

3. Recycle and/ or cache objects
The cost of object creation and garbage collection is quite high. If existing objects can be reused, then savings in memory and runtime performance can be realized. Look for opportunities in writing code where existing objects can be reused and recycled instead of creating new ones. Cache objects in frequently used methods so that they will persist between calls, but make sure that the cached object’s state is set properly before subsequent use. For example, if you cached a date object, then prior to using it again, ensure that it is set to the proper date. Some objects are not as straightforward as a date object, so use care.
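The date example above might be sketched like this: a Date instance is cached in the object and its state is reset before each use, so a frequently called method does not allocate a new Date every time. The class name is illustrative.

```java
import java.util.Date;

public class DateCache {
    private final Date cached = new Date();   // created once, reused between calls

    public Date asDate(long millis) {
        cached.setTime(millis);               // reset the state before each use
        return cached;                        // shared object: callers must not retain it
    }
}
```

Note the caveat from the text in code form: because the object is shared, a caller that needs to keep the value must copy it.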
4. Variables
In contrast to a compiled language such as C++, application performance in Java is noticeably affected by what types of variables are accessed and how they are accessed. For example, while stack variables are directly addressable (and may even be placed in registers), instance variables typically require an extra level of indirection to be accessed.

This implies the potential value of data location shifting, changing the storage location of data based on the access patterns. For example, a data-intensive operation would benefit from first copying instance variables into stack variables, operating on the stack variables, and, finally, copying the stack variables back to the permanent instance variables.
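A small sketch of this data-location shifting, assuming an illustrative class with one instance field: the field is copied to a stack variable before the loop and written back once afterwards.

```java
public class LocalCopy {
    private int total;      // instance variable: accessed through 'this'

    public void addAll(int[] data) {
        int t = total;      // copy to a stack variable before the loop
        for (int i = 0; i < data.length; i++) {
            t += data[i];   // all loop work uses the stack variable
        }
        total = t;          // single write back to the instance variable
    }

    public int getTotal() { return total; }
}
```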

This technique is particularly useful when a method variable is accessed repeatedly within a loop. For example, the common loop construct:

for (int i = 0; ++ i <= limit; )
can be improved by 25 percent (5 percent with a JIT compiler) by rewriting it as:

for (int i = limit; -- i >= 0; )
to reduce the number of accesses to the limit variable.

5. Reduce class hierarchy
In general, object creation gets slower as the class hierarchy grows deeper. In addition, a deeper class hierarchy can cause longer applet load times, because additional classes must be transferred across the network. When possible, avoid specializing for minor variations that could otherwise be represented by a state variable. However, do this with care, as it might sacrifice the object-oriented design of the application.
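For example, a minor variation such as "filled vs. outline" can be a state variable rather than a pair of subclasses. The Circle example is illustrative, not from the article.

```java
public class Circle {
    private final double radius;
    private final boolean filled;   // state flag instead of a FilledCircle subclass

    public Circle(double radius, boolean filled) {
        this.radius = radius;
        this.filled = filled;
    }

    public String render() {
        return (filled ? "filled" : "outline") + " circle r=" + radius;
    }
}
```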

6. Avoid explicitly calling the Garbage Collector
Invoking garbage collection when responsiveness is expected (e.g. when processing a mouse-button event) can slow the program down at exactly the moment the user expects fast processing. In most circumstances, invoking System.gc() explicitly is not needed. Invoking System.gc() does not run the garbage collector at once; it merely suggests that the garbage collector run the next time the system has time for it.

7. Avoid synchronization
The JVM uses class locks while resolving class references. Class references are resolved on an as-used basis (rather than at class-load time), so it is not easy to predict when the JVM will want a class lock. Therefore, do not synchronize on a class.

An alternate approach would be to create a special purpose lock object instead. For example:

private static final Object lock = new Object();

However, synchronization activity can consume significant system resources and affect Java performance. This may be true even if there is only one thread being executed, due to constant checking of lock availability. As a result, synchronization should be reduced as much as possible.
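Putting the lock object from above to use might look like the following sketch (class and field names are illustrative): the lock is private, so no outside code can interfere with the class's locking policy.

```java
public class Counter {
    private static final Object lock = new Object();   // special-purpose lock object
    private static int count;

    public static int increment() {
        synchronized (lock) {       // never synchronized (Counter.class)
            return ++count;
        }
    }
}
```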

8. Use Lazy Evaluation
Any large application or general purpose framework is likely to have a relatively significant amount of static data around behind the scenes. One way to initialize such data is to use static class initialization blocks. This mechanism is simple and supported by the Java language. However, it can increase an application’s load time, since all such initialization is done before the application starts, even if it is not needed until hours later (or perhaps never needed during a particular run of the application.)

Lazy evaluation is a reliable and much used mechanism for deferring initialization of static data until it is actually needed. So any data never used is never created, and any data not actually needed to bootstrap a subsystem does not place a startup time burden on its client applications.

In object oriented languages, lazy evaluation is quite easy to implement. You merely make your data private and provide a public getter method (something you would want to do anyway.) Since any access to the data is through the getter, it can allocate the object upon demand. This does imply the need for some synchronization, i.e. you must make sure that two threads don’t simultaneously try to create the static object. This is easily done through a synchronized code block that does the actual object creation.

The minimum synchronization overhead can be obtained by first checking the static reference for null and only entering the synchronized block if it is still null. A subsequent check is still required after entering the block to ensure that another thread has not already created the object. But once the object actually gets created, the only performance overhead imposed is a null-reference check on each subsequent entry to the getter.

To do the synchronization, the class’ class member is used since it’s a static object that is already available to us. This avoids a bootstrapping issue of having to synchronize in order to create an object in order to synchronize the creation of an object, etc...
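The pattern described above is known as double-checked locking. A minimal sketch follows; note that under the Java 5+ memory model the field must be declared volatile for this idiom to be safe, a detail worth adding to the description above.

```java
public class Config {
    private static volatile Config instance;   // volatile is required on Java 5+

    private Config() { }

    public static Config getInstance() {
        if (instance == null) {                 // cheap unsynchronized check
            synchronized (Config.class) {       // the class object serves as the lock
                if (instance == null) {         // re-check inside the lock
                    instance = new Config();
                }
            }
        }
        return instance;
    }
}
```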

Another aspect of lazy evaluation involves complex, usually tabbed, GUI objects. Where possible, only immediately create the components that are initially visible to the user then create the others either in the background or upon their coming into view. This strategy will increase the perceived performance of the application by displaying the relevant information as soon as possible, and will often save on memory overhead if the user never even accesses the other components.

9. Optimized classes
Well written Java applications can suffer performance degradation when inappropriate classes are used. For instance, standard Java class libraries rely on general purpose interfaces to hide underlying implementation details. While modular, generic implementations may limit overall library performance.

An example of this is the interaction between the ByteArrayOutputStream and DataOutputStream classes. ByteArrayOutputStream encapsulates writing bytes into a byte array. DataOutputStream serializes high-level Java built-in types (int, long, float, String, etc.) to an object exposing the OutputStream interface. However, because it must support a variety of output media, including files, TCP/IP connections, and memory buffers, the OutputStream interface only allows one byte of data to be output at a time (via a synchronized method). When a DataOutputStream is attached to a ByteArrayOutputStream, performance is sub-optimal because data is produced one byte at a time (through the OutputStream interface). To output each byte, the DataOutputStream incurs the overhead of calling a synchronized method, checking for array overflow, copying the output byte into the byte array, and incrementing the output byte counter.

To achieve optimal performance, an optimized class could be implemented that merges the functionality of the ByteArrayOutputStream and DataOutputStream classes. To place data into a byte array, the optimized class simply copies the data into the buffer, checking the array bounds and incrementing the byte counter only once for each data item output. For applications that do not require synchronized access to the stream, the methods in the optimized class can be left unsynchronized to further boost performance.
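A simplified sketch of such a merged class (illustrative only, not a drop-in replacement): it writes an int straight into its own byte array, performing one bounds check per value rather than one per byte, with no synchronization.

```java
public class FastByteWriter {
    private byte[] buf = new byte[64];
    private int count;

    public void writeInt(int v) {
        if (count + 4 > buf.length) {                    // one bounds check per int
            byte[] bigger = new byte[buf.length * 2];
            System.arraycopy(buf, 0, bigger, 0, count);
            buf = bigger;
        }
        buf[count++] = (byte) (v >>> 24);                // big-endian order,
        buf[count++] = (byte) (v >>> 16);                // matching DataOutputStream
        buf[count++] = (byte) (v >>> 8);
        buf[count++] = (byte) v;
    }

    public int size() { return count; }

    public byte[] toByteArray() {
        byte[] copy = new byte[count];
        System.arraycopy(buf, 0, copy, 0, count);
        return copy;
    }
}
```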

10. Use StringBuffer/StringBuilder
Since strings are immutable, any change to a string will create at least one more string object. This degrades performance and unnecessarily creates objects that will eventually need to be garbage collected. StringBuffers, however, are modifiable and can be used to avoid creating temporary String objects.

StringBuilder is a more recent API that offers the same interface as StringBuffer without internal synchronization; it can be used for the same purpose in single-threaded code [See comment below].
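A minimal sketch of the single-threaded case (the method name is illustrative): the builder grows in place and only one String is created at the end, instead of one temporary String per concatenation.

```java
public class JoinDemo {
    public static String join(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(parts[i]);   // no temporary String per iteration
        }
        return sb.toString();      // one String created at the end
    }
}
```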

11. Explicitly close resources
When using a FileInputStream, do not rely on the object finalizer to close the file for you; explicitly close the file when you are done. The problem with letting the finalizer close the file is that there can be a long delay before finalizers actually run. This keeps the operating-system file handle in use until the finalizer runs, potentially exhausting the pool of operating-system file handles. Explicitly closing the file releases the handle immediately.
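A sketch of deterministic closing (class and method names are illustrative): since Java 7, try-with-resources closes the stream when the block exits, even if an exception is thrown, rather than whenever a finalizer happens to run.

```java
import java.io.FileInputStream;
import java.io.IOException;

public class ReadFirstByte {
    public static int firstByte(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            return in.read();      // in.close() runs as soon as the block exits
        }
    }

    // Illustrative convenience wrapper so failures surface as -1.
    public static int firstByteOrMinusOne(String path) {
        try {
            return firstByte(path);
        } catch (IOException e) {
            return -1;
        }
    }
}
```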

12. Limit number of threads

Every Java thread created requires dedicated memory for its native stack frame. On many systems, the size of this native stack frame is controlled by the -ss Java command-line parameter and is the same for every Java thread created. The default stack size on some platforms is as much as 32 kilobytes; for an application with 20 threads this represents 32 KB * 20, or 640 KB. Limiting the number of threads in the application helps reduce its memory requirements. It may also improve performance by giving more CPU time to threads doing real work.

13. JDBC Data Access
When using the ResultSet.getXXX methods in JDBC, make sure the getter you call matches the type in which the data is stored in the database. For example, if a column holds integers, use the ResultSet.getInt() method to read it from the result set.

14. Using JNI
When using JNI, if you need to copy a section of a primitive array from Java into C, use the Get<PrimitiveType>ArrayRegion calls instead of the Get<PrimitiveType>ArrayElements calls.

Group native operations to reduce the number of JNI calls.

Consider consolidating transactions or operations to minimize the number of JNI calls needed to accomplish a task. Reducing the number of times the JNI overhead needs to be paid will improve performance.

15. Avoid excessive writing to the Java console

Writing information to the Java console takes time and resources and should not be done unless necessary. Although helpful for debugging, writing to the console usually involves a great deal of string manipulation, text formatting, and output, which are typically slow operations. Even when the console is not displayed, performance can be adversely affected.

16. Data Types
Primitive types are faster than classes encapsulating types. Avoid the costs of object creation and manipulation by using primitive types for variables when prudent. Memory can be reduced and variable access times can be improved.

In the following example, the second declaration is smaller and quicker:

class Currency {
    public double amount;
}

double currency_amount;

Unlike C++, casting in Java is not done at compile time. Since there is a cost at run-time, avoid unnecessary recasting of variables.

Use int instead of long when possible on 32-bit systems.

long is a 64-bit data type, while int is 32-bit. 32-bit operations execute faster than 64-bit operations on 32-bit systems: Example 1 (using int) took about half the time of Example 2 (using long). It is worth noting that Example 2 will run faster on systems with 64-bit addressing.
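The two examples referred to above are not shown in the original; a plausible reconstruction is the same loop counted once with int and once with long (the method names and loop bounds are illustrative).

```java
public class IntVsLong {
    // Example 1: 32-bit arithmetic throughout
    public static int sumInt(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) total += i;
        return total;
    }

    // Example 2: 64-bit arithmetic, slower on 32-bit systems
    public static long sumLong(long n) {
        long total = 0;
        for (long i = 0; i < n; i++) total += i;
        return total;
    }
}
```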

Use static final when creating constants. When data is invariant, declare it as static and final. By reducing the number of times variables need to be initialized and giving better optimization information to the JVM, performance can be improved.

17. When possible, declare methods as final
Declare methods as final whenever possible. Final methods can be handled better by the JVM, leading to improved performance. In this example, the second method will execute faster than the first one.

void doThing() {
    for (int i = 0; i < 100; ++i) {
        doSomething();
    }
}

final void doThingFinal() {
    for (int i = 0; i < 100; ++i) {
        doSomething();
    }
}

18. Arrays are faster than Vectors
By avoiding vectors when arrays will suffice, application performance can be improved. If you use a vector, remember that it is faster to add or delete items from the end of the vector.
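To illustrate the second point, appending at the end of a Vector is cheap, while inserting at the front shifts every existing element (the class name is illustrative):

```java
import java.util.Vector;

public class VectorDemo {
    public static Vector<String> buildAtEnd(String[] items) {
        Vector<String> v = new Vector<>();
        for (String s : items) {
            v.add(s);              // append at the end: no shifting
            // v.add(0, s);        // front insert: shifts all existing elements
        }
        return v;
    }
}
```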

19. Do not overuse instance variables
Performance can be improved by using local variables. The code in Example 1 will execute faster than the code in Example 2.

Example 1:

public void loop() {
    int j = 0;
    for (int i = 0; i < 250000; i++) {
        j = j + 1;
    }
}

Example 2:

int i;
public void loop() {
    int j = 0;
    for (i = 0; i < 250000; i++) {
        j = j + 1;
    }
}

20. Know when to use immutable objects
There are many valid reasons to use immutable objects. The main disadvantage of immutable objects can be performance, so it is important to know when and where to use them.

The Java API provides two very similar classes, one immutable and one mutable. String was designed to be immutable, and thus simpler and safer. StringBuffer was designed to be mutable, with tremendous advantages in performance. For example, look at the following code, where one uses string concatenation and the other uses the StringBuffer append method. (The doSomething() method is overloaded to take characters as either String or StringBuffer.)

Case 1

for (int i = 0; i < source.length; ++i) {
    doSomething(i + ": " + source[i]);
}

Case 2

StringBuffer temp = new StringBuffer();
for (int i = 0; i < source.length; ++i) {
    temp.setLength(0); // clear previous contents
    temp.append(i).append(": ").append(source[i]);
    doSomething(temp);
}

Behind your back, the compiler will optimize Case 1, but it still involves the creation of two objects per iteration. Case 2, on the other hand, avoids any object creation. Because of this, Case 2 can be over 1000% faster than Case 1 (depending on your JIT, of course).


  

7 comments:

  1. I disagree with (1).
    Object pools rather worsen performance. This was valid when memory allocation and deallocation were expensive. Object pools add extra load on the garbage collector: because pooled objects are never de-allocated early, they are slowly promoted from the young generation to the survivor and tenured generations, which may ultimately result in a full GC. It is a well-known fact that collecting a lot of dead objects is much faster than keeping objects live and clearing them later. Also, synchronizing an object pool is a very tough job.

    ReplyDelete
    Replies
    1. Hi Purijatin, thanks for your comment.

      You are right in some contexts; here is some more information:

      1. An object pool should only be used when object creation is very expensive.
      2. Examples of such scenarios are thread pools and database connection pools.
      3. However, it is true that implementing a pool takes a lot of effort; my advice is to avoid custom implementations whenever possible and to use the built-in pools provided by frameworks (e.g. thread pools).

      Hope this clarifies.

      Delete
  2. @10:

    Of course it is better to prefer StringBuffer over String for mutable strings.
    But nowadays if you have a single threaded environment it is even better to use StringBuilder instead (no synchronization).

    http://littletutorials.com/2008/07/16/stringbuffer-vs-stringbuilder-performance-comparison/
    http://javapractices.com/topic/TopicAction.do;jsessionid=97AD2964E6A6B3E28CBFE78B1098F7E6?Id=4

    Kind regards,

    Tim

    ReplyDelete
    Replies
    1. Yes, you are correct.
      In fact StringBuilder offers the same API as StringBuffer, just without the synchronization.

      Delete
  4. Number 19 is correct but I want to add another example.
    If the variable in example 2 is declared locally, then the algorithm will be faster than example 1.
    Maybe in this loop it is not obvious but for larger loops it is a lot faster.
    Example 3:

    public void loop() {
    int j = 0;
    int i;
    for (i = 0; i<250000;i++){
    j = j + 1;
    }
    }

    ReplyDelete