
In the next few posts, I will write down some tips I have learned while coding. These are all summaries from my practice on LeetCode. I know there are many LeetCode answers online, but that is just a question-to-answer model. If we only read a question and then answer it, we never grow quickly. We need to understand why someone else's algorithm is faster than ours, what the differences are, and how we can apply their good ideas to our own code in the future. So, let's start!

Example-1:

We all know that a Map (key-value pairs) is often a good data structure for speeding up an algorithm, because a Map's get runs in O(1) with a well-designed hash function. But if we don't use it well, it can still lose this advantage.

Better Version:

For a small volume of data this advantage may not be obvious, but for large data the second version is much better than the first.

The improvement is easy to understand. The first version calls addAll to insert a whole list into another one. At the physical level this has to handle a lot of work, such as allocating space. In fact, ArrayList.addAll is already somewhat optimized: it allocates enough space to add all the elements in one step, rather than reallocating multiple times when adding many elements. The second version avoids this work entirely by starting from an empty list (new ArrayList<>()), which sidesteps the space-allocation issue.
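The original code listings were not preserved in this archive, so here is only a minimal sketch of the general pattern; the method names are hypothetical, not the original LeetCode solutions:

```java
import java.util.ArrayList;
import java.util.List;

public class AddAllDemo {
    // First version (hypothetical): every step copies the whole accumulated
    // list into a fresh one via addAll, so building n elements costs O(n^2).
    static List<Integer> buildWithAddAll(int n) {
        List<Integer> acc = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Integer> next = new ArrayList<>();
            next.addAll(acc);   // copies i elements on every iteration
            next.add(i);
            acc = next;
        }
        return acc;
    }

    // Second version (hypothetical): start from one empty list and append
    // in place, so building n elements is O(n) amortized.
    static List<Integer> buildInPlace(int n) {
        List<Integer> acc = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            acc.add(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Both versions produce the same list; only the cost differs.
        System.out.println(buildWithAddAll(5).equals(buildInPlace(5))); // true
    }
}
```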

Example-2:

The question asks us to design a set data structure that supports all operations in average O(1) time.

The great difference between the two versions is that we introduce a Map to store each value's index; in the remove function, we swap the target with the last element of the list and then delete the last element.

The reason for this improvement is to reach the O(1) goal. ArrayList's indexOf and remove methods work against this target.
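Since the original listings are not preserved, here is a minimal sketch of this design, which is the standard approach to LeetCode's "Insert Delete GetRandom O(1)"; the class and method names are mine:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class RandomizedSet {
    private final List<Integer> values = new ArrayList<>();
    private final Map<Integer, Integer> indexOf = new HashMap<>(); // value -> position in list
    private final Random rnd = new Random();

    public boolean insert(int val) {
        if (indexOf.containsKey(val)) return false;
        indexOf.put(val, values.size());
        values.add(val);                  // append at the end: O(1)
        return true;
    }

    public boolean remove(int val) {
        Integer pos = indexOf.get(val);
        if (pos == null) return false;
        int last = values.get(values.size() - 1);
        values.set(pos, last);            // overwrite the hole with the last element
        indexOf.put(last, pos);
        values.remove(values.size() - 1); // remove from the end: O(1), no arraycopy
        indexOf.remove(val);
        return true;
    }

    public int getRandom() {
        return values.get(rnd.nextInt(values.size()));
    }

    public static void main(String[] args) {
        RandomizedSet s = new RandomizedSet();
        System.out.println(s.insert(1));  // true
        System.out.println(s.insert(1));  // false (already present)
        System.out.println(s.insert(2));  // true
        System.out.println(s.remove(1));  // true
        System.out.println(s.getRandom()); // 2 (the only element left)
    }
}
```

The swap-with-last trick is the key: it turns an O(n) removal from the middle of the ArrayList into an O(1) removal from the end.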

remove(int)
This method removes a single element at the given position. After that, all elements to the right are shifted one position to the left via System.arraycopy, so this method has O(n) complexity. A call that removes the last element does not invoke System.arraycopy, so that call is O(1).

remove(Object)
This method removes the first occurrence of a given element from the list. It iterates over all list elements, so it has O(n) complexity. It accesses all array elements in any case: it either reads them while looking for the requested element, or moves them one position to the left via System.arraycopy after the requested element is found.
Never call remove(Object) when you already know the position of your element.
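A related pitfall with Integer lists is that the two overloads are easy to confuse; this small sketch shows the difference:

```java
import java.util.ArrayList;
import java.util.List;

public class RemoveOverloads {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(10, 20, 30));
        list.remove(1);                    // remove(int): deletes INDEX 1
        System.out.println(list);          // [10, 30]
        list.remove(Integer.valueOf(30));  // remove(Object): scans for VALUE 30
        System.out.println(list);          // [10]
    }
}
```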

contains(Object), indexOf(Object)
The first method checks whether a given object is present in the list (it is defined as indexOf(elem) >= 0). The second method finds the position of a given element in the list. Both have O(n) complexity because they scan the internal array from the beginning to find the element.

Summary of List:

add elements to the end of the list

remove elements from the end

avoid contains, indexOf and remove(Object) methods

avoid the removeAll and retainAll methods even more

use the subList(int, int).clear() idiom to quickly clear a part of the list
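The last idiom deserves a quick illustration: clearing a subList view removes a whole range with a single internal arraycopy, instead of many O(n) single-element removals.

```java
import java.util.ArrayList;
import java.util.List;

public class SubListClear {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) list.add(i);   // [0..9]
        // Remove elements 0..4 in one shot: one arraycopy plus a size update,
        // instead of five separate O(n) remove(0) calls.
        list.subList(0, 5).clear();
        System.out.println(list); // [5, 6, 7, 8, 9]
    }
}
```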

Last week, we dealt with a lot of performance-related issues. But we didn't dig in and check why the problems happened. So I decided to write a post explaining JVM memory management. I know there are many blogs talking about it; it is not a new topic, but I will write it down from my own understanding and show how I use some commands to verify this knowledge.

1. Check Memory Usage

Before we go on to understand what garbage collection is, what the young generation is, and so on, let's first look at our application's memory-usage status. You can use htop to see every thread's memory usage. For a Java/Scala application you have more choices; for example, jstat -gc <PID> 1000 prints the GC statistics below every second.

S0U/S1U: the current usage of the Survivor0 and Survivor1 areas in KB. Notice that one of the survivor areas is always empty. (See "Young Generation" for the reason.)

EC/EU: the current size and usage of the Eden space in KB. Note that EU keeps increasing, and as soon as it reaches EC, a Minor GC is triggered and EU drops.

OC/OU: the current size and current usage of Old generation in KB.

PC/PU: the current size and current usage of Perm Gen in KB.

YGC/YGCT: YGC is the number of GC events that occurred in the young generation; YGCT is the accumulated time (in seconds) of young-generation GC operations. Notice that both increase in the same row where the EU value drops because of a minor GC. (See "Young Generation" for the reason.)

FGC/FGCT: FGC is the number of Full GC events; FGCT is the accumulated time of Full GC operations. Notice that Full GC time is very high compared to young-generation GC time.

GCT: the total accumulated time of GC operations. Notice that it is the sum of the YGCT and FGCT values.

3. How to set JVM parameters

Here we explain why the S0C, S1C, EC and OC values look the way they do above. There are several VM switches that set these values:

-Xms: For setting the initial heap size when JVM starts

-Xmx: For setting the maximum heap size

-Xmn: For setting the size of the Young Generation, rest of the space goes for Old Generation

-XX:PermSize: For setting the initial size of the Permanent Generation memory (note the switch is -XX:PermSize, not -XX:PermGen)

-XX:MaxPermSize: For setting the maximum size of the Perm Gen

-XX:SurvivorRatio: For providing the ratio of Eden space to Survivor space. For example, if the Young Generation size is 10m and the VM switch is -XX:SurvivorRatio=2, then 5m is reserved for Eden and 2.5m for each of the two Survivor spaces. The default value is 8

-XX:NewRatio: For providing a ratio of old/new generation sizes. The default value is 2
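A quick way to see the effect of the heap switches from inside the JVM is the Runtime API; this small sketch prints the values that -Xms and -Xmx control (the exact numbers depend on the flags you launch it with):

```java
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // maxMemory() reflects -Xmx; totalMemory() is the currently
        // committed heap, which starts near -Xms and can grow up to -Xmx.
        System.out.println("max   heap: " + rt.maxMemory() / mb + " MB");
        System.out.println("total heap: " + rt.totalMemory() / mb + " MB");
        System.out.println("free  heap: " + rt.freeMemory() / mb + " MB");
    }
}
```

Running it twice, for example with -Xmx256m and then -Xmx1g, shows the max value change accordingly.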

4. JVM Memory Usage

Memory is used primarily in the heap; outside the heap, memory is also consumed in Metaspace and in the stacks.

(1) Java Heap

The heap is where your class instances (objects) are stored, including their instance variables. When discussing Java memory and optimization we most often discuss the heap, because we have the most control over it and it is where garbage collection and GC optimizations take place. Heap size is controlled by the -Xms and -Xmx JVM flags.

(2) Java Stack

Each thread has its own call stack. The stack stores primitive local variables and object references, along with the call stack (method invocations) itself. The stack is cleaned up as stack frames go out of scope, so no GC is performed there. The -Xss JVM option controls how much memory is allocated for each thread's stack.
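A simple way to feel the stack limit is to recurse until the JVM throws StackOverflowError; this sketch counts roughly how many frames fit (the count varies with -Xss and frame size):

```java
public class StackDepth {
    static int depth = 0;

    // Each call adds one frame to this thread's stack until it overflows.
    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("overflow after ~" + depth + " frames");
        }
    }
}
```

Running it with a larger -Xss (e.g. -Xss8m) lets noticeably more frames fit before the overflow.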

(3) Metaspace

Metaspace stores the class definitions of your objects. Its initial size is controlled by -XX:MetaspaceSize, and its upper bound by -XX:MaxMetaspaceSize.

(4) Additional JVM overhead

In addition to the above areas, some memory is consumed by the JVM itself. This holds the C libraries for the JVM and some C memory-allocation overhead needed to run the memory pools above. This type of memory can be affected by tuning glibc memory behavior.

5. JVM Memory Model

By now we know the status of our application, but we still don't know what Eden or Survivor is. Here we talk about how the JVM organizes memory, and then we will better understand how to optimize it. While reading this part, I suggest going back to parts 2 and 3 to map each concept onto the real data output.

There are five JVM memory areas:

Eden

S0

S1

Old Memory

Perm

Eden + S0 + S1 = Young Generation (-Xmn)

Eden + S0 + S1 + Old Memory = JVM Heap (-Xms, -Xmx)

JVM heap memory is physically divided into two parts: the Young Generation and the Old Generation.

(1) Young Generation

The young generation is where all new objects are created. When the young generation fills up, garbage collection is performed; this collection is called a Minor GC. The Young Generation is divided into three parts: Eden memory and two Survivor memory spaces.

Most newly created objects are located in the Eden space. All new allocation happens in Eden, and it only costs a pointer bump.

When Eden fills up with objects, a Minor GC is performed: a stop-the-world copying collection moves all surviving objects into one of the survivor spaces. Dead objects cost nothing to collect.

A Minor GC also checks the objects in the occupied survivor space and moves them to the other survivor space. So at any time, one of the survivor spaces is always empty.

Objects that survive many cycles of GC are moved to the old generation. Usually this is done by setting a threshold (-XX:MaxTenuringThreshold) on the age of young-generation objects before they become eligible for promotion to the old generation.

Since the young generation keeps short-lived objects, a Minor GC is very fast and the application is hardly affected by it.

(2) Old Generation

The old generation contains objects that are long-lived and have survived many rounds of Minor GC. Usually garbage collection is performed in the old generation when it is full. Old-generation garbage collection is called a Major GC and usually takes longer.

A Major GC takes longer because it checks all live objects. Major GCs should be minimized, because they make your application unresponsive for the duration of the collection.

6. Garbage Collection

All garbage collections are "stop the world" events, because all application threads are stopped until the operation completes.

One of the best features of the Java programming language is automatic garbage collection. There are several JVM switches to choose the garbage-collection strategy for the application (I will not explain each): Serial GC (-XX:+UseSerialGC), Parallel GC (-XX:+UseParallelGC), Parallel Old GC (-XX:+UseParallelOldGC), the Concurrent Mark Sweep (CMS) collector (-XX:+UseConcMarkSweepGC), and the G1 garbage collector (-XX:+UseG1GC).

7. How to optimize JVM parameters

We have talked about a lot, and it may look as if the JVM's automatic garbage collection means we don't need to do anything. In fact, there is still some tuning we can do.

(1) java.lang.OutOfMemoryError: PermGen

increase the Perm Gen memory space using -XX:PermSize and -XX:MaxPermSize

(2) a lot of Full GC operations

increase the old-generation memory space.

(3) java.lang.StackOverflowError

increase stack size by -Xss

(4) Good Practices

set the minimum -Xms and maximum -Xmx heap sizes to the same value

-Xmn value should be lower than the -Xmx value.

the old generation size is the value of -Xmx minus -Xmn. Generally, you don't want Eden to be too big, or the GC will take a long time to look through it for space that can be reclaimed.

keep the Eden size between one fourth and one third of the maximum heap size. The old generation must be larger than the new generation.

To summarize, there is no universal fix for everything. When we meet a problem, we need to use tools to find the root cause, dig into it, and then fix it.

It is a long road. When you finish one feature, that is just the start; the rest is making it better and better, and you will need to care more about performance. Here I write down several things I have experienced.

Deployment method

Don't deploy to production with start or run. That is simply wrong. You need to use:

activator clean; activator dist

to package your project, and then unzip the generated archive, which is under

#your_project_path/target/universal/

Then cd into that folder; under /bin/ you will find a script you can run. My usual JVM parameters look like this (note that my project uses Nginx as the web server and New Relic to collect application-performance info; you need to adjust the JVM parameters for your own project):

Third-party Tool

Command Tool

When you really hit a problem, New Relic only tells you that your app is not behaving normally; that is not enough to solve it. The remaining work is to dig out the real cause. Here are some useful commands that will help you a lot.

htop This gives more detailed info than New Relic. You can perform some actions on your application and monitor it with htop to see whether those actions cause performance changes.

jcmd This lists all JVM applications and gives you each PID. Alternatively, use jps -l.

jcmd <PID> help This lists the available commands and tells you how to get the info you want.

The following commands are available:

VM.native_memory

VM.commercial_features

GC.rotate_log

ManagementAgent.stop

ManagementAgent.start_local

ManagementAgent.start

Thread.print

GC.class_histogram

GC.heap_dump

GC.run_finalization

GC.run

VM.uptime

VM.flags

VM.system_properties

VM.command_line

VM.version

GC.class_stats: prints statistics for loaded classes, including the name of the ClassLoader

jcmd <PID> ### Here ### is one of the commands from the list above. I care about CPU usage, so I normally use Thread.print to check which threads are running.

jstack <PID> This directly outputs the running threads. You can also write the output to a local file with jstack <PID> > #YourFile

Problems which I met

Once you have found the root cause, the last thing is to solve it.

My problem today was that I use Akka for some tasks, but each time the scheduler finished a task it did not shut down its ActorSystem. The more tasks it ran, the more threads it opened; eventually my CPU usage reached 100% and the project stopped responding. (All of this was diagnosed with the commands above.) I used those commands to find which actors were running, and now I shut them down when they finish. My project's CPU usage is only about 1.0%.

Everything seemed perfect; I even thought my code would never hit a performance problem again. But nothing is perfect, so I have come back to update this post with more problems I met. Today's problem appeared when I migrated the application from an old single-CPU server to a new 8-core server. On the old single-CPU server, my application's CPU usage was only about 0.5% every day; on the new high-spec machine it was almost 400% (with 8 cores, that means half of the CPU was used up). Using the methods above, I found that the root cause was HashMap.
The easy fix is to change the HashMap to a ConcurrentHashMap. After this modification, CPU usage went back to normal, about 0.5%.

ConcurrentHashMap solves HashMap's 100%-CPU threading issue. HashMap does not support multiple threads; for details, see http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6423457. ConcurrentHashMap's concurrency level is the number of segments, 16 by default, which means up to 16 threads can operate on the map at the same time. This is a great benefit of ConcurrentHashMap over Hashtable, which locks the whole table.
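A minimal sketch of the fix (names are mine, not the project's code): several threads write into one map at once. With a plain HashMap this kind of concurrent resizing can corrupt the bucket table, which is the spinning-thread behavior behind bug 6423457, while ConcurrentHashMap stays safe:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeCounter {
    // Fill the map from several threads at once, each thread writing a
    // disjoint key range, and return the final size.
    static int fillConcurrently(Map<Integer, Integer> map, int threads, int perThread)
            throws InterruptedException {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) map.put(offset + i, i);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();   // wait for all writers
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Keys are disjoint per thread, so the final size is exactly 8 * 10000.
        System.out.println(fillConcurrently(new ConcurrentHashMap<>(), 8, 10_000)); // 80000
    }
}
```

Swapping in a plain HashMap here may appear to work on a lucky run, but it is undefined behavior; only ConcurrentHashMap guarantees the result.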

My performance issues continued, and this time the root cause was hard to find. I tried to read more to understand things better, but it is really hard. This time, my memory kept increasing, triggered by some actions whose source was hard to locate. To be honest, I found some hints in Thread.print, which mentioned "ForkJoinPool" many times. So I went back to my code to check which threads would be created by a ForkJoinPool and how a ForkJoinPool manages memory, and I read the following words. In fact, my old code used the Scala standard library's execution context, which caused the increasing-memory problem. When I changed it to play.api.libs, things got better.

In Play 2.3.x and prior, play.core.Execution.Implicits.internalContext is a ForkJoinPool with fixed constraints on size, used internally by Play. You should never use it for your application code. From the docs:
Play Internal Thread Pool - This is used internally by Play. No application code should ever be executed by a thread in this thread pool, and no blocking should ever be done in this thread pool. Its size can be configured by setting internal-threadpool-size in application.conf, and it defaults to the number of available processors.
Instead, you would use play.api.libs.concurrent.Execution.Implicits.defaultContext, which uses an ActorSystem.
In 2.4.x, they both use the same ActorSystem. This means that Akka will distribute work among its own pool of threads, but in a way that is invisible to you (other than configuration). Several Akka actors can share the same thread.
scala.concurrent.ExecutionContext.Implicits.global is an ExecutionContext defined in the Scala standard library. It is a special ForkJoinPool that uses the blocking method to handle potentially blocking code and spawn new threads in the pool. You really shouldn't use this in a Play application, as Play will have no control over it. It also has the potential to spawn a lot of threads and use a ton of memory, if you're not careful.

To be honest, there is no general method for optimizing performance. The more experience you have, the fewer errors you create. When you meet a performance issue, don't be afraid; focus on finding the root cause step by step. Only when you know the cause can you find the right solution. Before that, any guess, or anyone else's fix, is useless.