Analyzing issues caused by OOM Killer

This article explains how the Linux kernel can invoke the OOM (out-of-memory) killer to abruptly kill a process, which might be the Java or Node process of your application, and what steps we can take to resolve such issues.

When is the OOM killer invoked by the kernel?

On Linux-based systems, every process has an OOM score. You can find the OOM score for a process as shown below.

-bash-4.1$ ps -ef | grep /bin/java | grep -v grep | awk '{print $2}' | xargs -i cat /proc/{}/oom_score
881
-bash-4.1$ ps -ef | grep 'splunkd pid' | grep -v grep | awk '{print $2}' | xargs -i cat /proc/{}/oom_score
0
-bash-4.1$ ps -ef | grep /usr/sbin/sshd | grep -v grep | awk '{print $2}' | xargs -i cat /proc/{}/oom_score
0

As you can see, the /bin/java process has an OOM score of 881, whereas splunkd and sshd have a score of 0.

If splunkd asks for more memory and there is not enough virtual memory available, the kernel invokes the OOM killer on the process with the highest OOM score, in this case /bin/java. Here /bin/java is the main backend application, and splunkd is the Splunk forwarder that forwards the logs of the /bin/java process.

kernel invoking oom killer on a process with high oom score

Where to see if the OOM killer killed your process?

Whenever the OOM killer is invoked, you can check the system logs at /var/log/messages; you will see logs similar to the ones below.

Oct  8 11:12:20 app-44276360-12-70800876 kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Oct  8 11:12:20 app-44276360-12-70800876 kernel: Out of memory: Kill process 27364 (java) score 933 or sacrifice child

If you are using a Java service wrapper like https://wrapper.tanukisoftware.com/doc/english/download.jsp, then you might see logs like the below in the wrapper logs.

STATUS | wrapper  | 2017/10/08 11:12:20 | JVM received a signal SIGKILL (9).
STATUS | wrapper  | 2017/10/08 11:12:20 | JVM process is gone.
ERROR  | wrapper  | 2017/10/08 11:12:20 | JVM exited unexpectedly.
STATUS | wrapper  | 2017/10/08 11:12:24 | Launching a JVM...

Note: If /bin/java asks for more memory and no virtual memory is available, the OOM killer kills the /bin/java process. The issue here is that we will not be able to find any memory leaks happening at the JVM level if the process is killed by the OOM killer, since the JVM never gets a chance to fail with its own OutOfMemoryError or produce a heap dump.

How to resolve this?

Adjust oom score

One way to resolve this is to assign a lower OOM score to the application process.

For example —

I used the Java program below, which requests memory in a loop.

DemoLimit.java
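The gist embedded in the original post is not reproduced here; a minimal sketch of such a program could look like the following. The 1 MB chunk size and the optional iteration limit are my own additions (the limit makes the program safe to run in a bounded way), not details from the original.

```java
import java.util.ArrayList;
import java.util.List;

public class DemoLimit {
    static final int CHUNK = 1024 * 1024; // 1 MB per allocation

    // Allocate 1 MB chunks and retain references so nothing is garbage
    // collected; with no finite limit the loop runs until the heap
    // (-Xmx) is exhausted or the kernel's OOM killer steps in first.
    static int allocate(long limitMb) {
        List<byte[]> hoard = new ArrayList<>();
        while (hoard.size() < limitMb) {
            hoard.add(new byte[CHUNK]);
        }
        return hoard.size();
    }

    public static void main(String[] args) {
        long limit = args.length > 0 ? Long.parseLong(args[0]) : Long.MAX_VALUE;
        System.out.println("held " + allocate(limit) + " MB");
    }
}
```

Run with no argument (as in the experiments below) to grow until something kills it, or with a number, e.g. `java DemoLimit 100`, to stop after 100 MB.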

Similarly, I used a C program which also requests memory in a loop.

demo3.c

Before adjusting the OOM score for the Java process: when the demo3 process (with the same OOM score) asked for memory, the OOM killer killed the Java process.

➜  ~ java -Xmx4g DemoLimit > /dev/null &
[1] 19668

Now, if we look at the above process 19668, below are its OOM scores.

➜  ~ more /proc/19668/oom_adj
0
➜  ~ more /proc/19668/oom_score_adj
0

Now run the C program

➜  ~ gcc demo3.c -o demo3.out
➜ ~ ./demo3.out > /dev/null &
[2] 19713

After some time, the OOM killer killed pid 19668, the Java process.

➜  ~
[1]  + 19668 killed     java -Xmx4g DemoLimit > /dev/null

If you check /var/log/messages, you can see that demo3.out invoked the OOM killer and the kernel killed the Java process.

Oct 17 22:54:52 kernel: [202078.985459] demo3.out invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0, order=0, oom_score_adj=0
Oct 17 22:54:52 kernel: [202078.985460] demo3.out cpuset=/ mems_allowed=0
…..
Oct 17 22:54:52 kernel: [202078.985642] Out of memory: Kill process 19668 (java) score 779 or sacrifice child
Oct 17 22:54:52 kernel: [202078.985653] Killed process 19668 (java) total-vm:4919756kB, anon-rss:3146852kB, file-rss:0kB, shmem-rss:0kB

After adjusting the OOM score for the Java process: when the demo3 process asked for memory, the OOM killer killed demo3.

➜  ~ java -Xmx4g DemoLimit > /dev/null &
[1] 19793
➜  ~
[1]  + 19793 suspended (tty input)  java -Xmx4g DemoLimit > /dev/null

I adjusted the score for the above Java process 19793 as below.

➜  ~ sudo echo -17 > /proc/19793/oom_adj
➜  ~ sudo echo -1000 > /proc/19793/oom_score_adj
➜  ~ more /proc/19793/oom_adj
-17
➜  ~ more /proc/19793/oom_score_adj
-1000

Note that in `sudo echo -1000 > /proc/19793/oom_score_adj` the redirection is performed by your shell, not by sudo, so this form only succeeds from a privileged shell; from an unprivileged one, use `echo -1000 | sudo tee /proc/19793/oom_score_adj` instead.

Now run the demo3 program.

➜  ~ ./demo3.out > /dev/null &
[2] 19834

After some time you will see that pid 19834 got killed by the OOM killer.

➜  ~
[2]    19834 killed     ./demo3.out > /dev/null

If you check /var/log/messages, you can see that when demo3 triggered the overcommit, the demo3 process was selected to be killed instead of Java.

Oct 17 23:04:31 kernel: [202657.851735] demo3.out invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0, order=0, oom_score_adj=0
Oct 17 23:04:31 kernel: [202657.851736] demo3.out cpuset=/ mems_allowed=0
……
Oct 17 23:04:31 kernel: [202657.851954] Out of memory: Kill process 19834 (demo3.out) score 93 or sacrifice child
Oct 17 23:04:31 kernel: [202657.851958] Killed process 19834 (demo3.out) total-vm:64397816kB, anon-rss:250256kB, file-rss:0kB, shmem-rss:0kB

From a system-impact perspective, we need to make sure OOM scores are assigned based on priority. For example, if sshd's oom_adj is -17, we can set the Java process's oom_adj to -16 and Splunk's to -15, so that under memory pressure the least critical process is sacrificed first. (oom_adj is the legacy knob, ranging from -17 to 15, with -17 meaning never kill; modern kernels expose oom_score_adj, which ranges from -1000 to 1000.)
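As a sketch, a hypothetical startup script could apply such a priority ladder. The process names and values below simply mirror the example in the text and are illustrative, not a recommendation:

```shell
# Assign oom_adj by priority: the lower the value, the less likely
# the OOM killer is to pick the process. Names/values are illustrative.
for pair in "sshd:-17" "java:-16" "splunkd:-15"; do
  name=${pair%%:*}    # part before the colon: process name
  score=${pair##*:}   # part after the colon: oom_adj value
  for pid in $(pgrep -x "$name"); do
    echo "$score" | sudo tee "/proc/$pid/oom_adj" > /dev/null
  done
done
```

Scores set this way are per-PID and do not survive a process restart, so in practice this belongs in the service's init/systemd configuration (e.g. systemd's OOMScoreAdjust=).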

Can we adjust the overcommit behavior of the kernel?

The Linux kernel provides a setting, /proc/sys/vm/overcommit_memory, which is 0 by default. When it is 0, the kernel overcommits heuristically and, when memory runs out, kills the process with the highest OOM score. When it is 1, it always overcommits. When it is 2, commits beyond swap + x% of physical RAM (where x is /proc/sys/vm/overcommit_ratio) are refused: no process is killed, but memory allocations fail with errors. You can refer to the kernel overcommit documentation: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
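On a live system these modes can be inspected and switched with sysctl (root required for the writes); a quick sketch:

```shell
# Inspect the current overcommit policy (0 by default) and ratio
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio

# Switch to strict accounting (mode 2) with an 80% ratio
sudo sysctl -w vm.overcommit_memory=2
sudo sysctl -w vm.overcommit_ratio=80

# Persist the setting across reboots
echo 'vm.overcommit_memory = 2' | sudo tee -a /etc/sysctl.conf
```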

Let's see it in action with different settings and figure out which would be the best option.

Below is the default behavior —

➜  ~ more /proc/sys/vm/overcommit_memory
0
➜  ~ more /proc/sys/vm/overcommit_ratio
50

Out of the 3.2 GB available, observe below that with -Xmx from 1 to 3 GB, Java died on its own with a java.lang.OutOfMemoryError. With -Xmx at 4 GB, the kernel was not able to back the allocation and killed the Java process.

➜  ~ java -Xmx1g Demo
java.lang.OutOfMemoryError: Java heap space
	at Demo.main(Demo.java:6)
➜  ~ java -Xmx2g Demo
java.lang.OutOfMemoryError: Java heap space
	at Demo.main(Demo.java:6)
➜  ~ java -Xmx3g Demo
java.lang.OutOfMemoryError: Java heap space
	at Demo.main(Demo.java:6)
➜  ~ java -Xmx4g Demo
[1]    8956 killed     java -Xmx4g Demo

Then I changed the default overcommit settings as below.

➜  ~ more /proc/sys/vm/overcommit_memory
2
➜  ~ more /proc/sys/vm/overcommit_ratio
80

Java was not able to allocate more than 800m of heap.

➜  ~ java -Xmx800m Demo
java.lang.OutOfMemoryError: Java heap space
	at Demo.main(Demo.java:6)
➜  ~ java -Xmx900m Demo
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
➜  ~ java -Xmx1g Demo
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.

The above settings did not help. How about adding swap space?

Follow the steps below to create swap space.

## Become root
-bash-4.1$ sudo bash
## Check that you don't have any swap file created before
# swapon -s
Filename				Type		Size	Used	Priority
## Check that fallocate is present
# which fallocate
/usr/bin/fallocate
## Create the swap file. I kept 2G here
# fallocate -l 2G /swapfile
## Verify the swap file got created
# ls -lh /swapfile
-rw-r--r-- 1 root root 2.0G Oct 17 18:29 /swapfile
## Permissions should be 600
# chmod 600 /swapfile
## Make sure the permissions got applied
# ls -lh /swapfile
-rw------- 1 root root 2.0G Oct 17 18:29 /swapfile
## Make sure we have mkswap
# which mkswap
/sbin/mkswap
## Format the swap file
# mkswap /swapfile
mkswap: /swapfile: warning: don't erase bootbits sectors
on whole disk. Use -f to force.
Setting up swapspace version 1, size = 2097148 KiB
no label, UUID=e6b2850d-47dc-4b0c-8832-c1ea9f177a07
## Enable swap
# swapon /swapfile
# swapon -s
Filename				Type		Size	Used	Priority
/swapfile				file		2097144	0	-1

Now that we have swap space, let's try the settings below.

➜  ~ free -m
              total        used        free      shared  buff/cache   available
Mem:           3949         471        3342          18         134        3284
Swap:          1897           0        1897

If we go with the overcommit settings below along with the above swap space:

➜  ~ more /proc/sys/vm/overcommit_memory
2
➜  ~ more /proc/sys/vm/overcommit_ratio
80

With 3.2 GB of RAM + 1.8 GB of swap, a 3 GB heap is still not allocatable for the JVM under these strict settings.

➜  ~ java -Xmx1g Demo
java.lang.OutOfMemoryError: Java heap space
	at Demo.main(Demo.java:6)
➜  ~ java -Xmx2g Demo
java.lang.OutOfMemoryError: Java heap space
	at Demo.main(Demo.java:6)
➜  ~ java -Xmx3g Demo
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.

How about going with the default overcommit settings plus the swap space?

➜  ~ more /proc/sys/vm/overcommit_memory
0
➜  ~ more /proc/sys/vm/overcommit_ratio
50

With 3.2 GB of RAM + 1.8 GB of swap (about 5 GB in total), regardless of what is defined in -Xmx, Java died on its own with its own OutOfMemoryError.

➜  ~ java -Xmx1g Demo
java.lang.OutOfMemoryError: Java heap space
	at Demo.main(Demo.java:6)
➜  ~ java -Xmx4g Demo
java.lang.OutOfMemoryError: Java heap space
	at Demo.main(Demo.java:6)
➜  ~ java -Xmx8g Demo
java.lang.OutOfMemoryError: Java heap space
	at Demo.main(Demo.java:6)
➜  ~ java -Xmx100g Demo
java.lang.OutOfMemoryError: Java heap space
	at Demo.main(Demo.java:6)

With the default overcommit settings and swap space in place, we avoided the OOM killer for the Java process. Now, if Java crashes, by using a parameter like -XX:+HeapDumpOnOutOfMemoryError it will dump the heap, which can be used for further analysis.
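For example, a launch command along these lines leaves a .hprof file behind when the JVM itself runs out of heap (the dump path here is just an illustration):

```shell
# -XX:+HeapDumpOnOutOfMemoryError writes a heap dump when the JVM
# throws OutOfMemoryError; -XX:HeapDumpPath chooses where it lands.
java -Xmx1g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/tmp/demo.hprof \
     Demo
```

The resulting dump can then be opened in a tool such as Eclipse MAT or jhat to look for the leak.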

Hope this gave you some idea of how to debug when an application is abruptly killed by the kernel invoking the OOM killer.

Software Engineer, Learner, Developer, Prokopton