Message Boards

LR 6.2 CE GA6 not responding after 50 hours with no errors logs

Vishal Panchal, modified 7 years ago.

LR 6.2 CE GA6 not responding after 50 hours with no errors logs

Expert Posts: 289 Join Date: 12-5-20 Recent Posts
Dear All,

I am running JBoss EAP 7.2 with Liferay 6.2 GA6 on a 64-bit HP Unix machine. The server runs fine for about 50 hours and then suddenly becomes unresponsive. I checked the JBoss server and Liferay logs but found nothing in them.

After analyzing the GC logs, I came to the conclusion that my Old Generation is filling up. While the server is up, GC does not free much memory.

Once the Old Gen memory is full, the server starts running Full GC repeatedly. That results in long-running GC threads, and the server becomes completely unresponsive.
I also checked that the database was running normally when the server became unresponsive. I am still analyzing heap dumps and will share the results soon.
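
For anyone who wants to watch the same symptom from inside the JVM rather than from the GC log, here is a minimal sketch using the standard java.lang.management API. It is not part of the original setup. The class name OldGenWatcher and the substring match on pool names are assumptions for illustration; pool names differ between collectors and JVM vendors, so check what your JVM actually reports.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints how full the old generation pool is.
class OldGenWatcher {

    // Assumption: old gen pools are named like "CMS Old Gen", "PS Old Gen",
    // "G1 Old Gen" or "Tenured Gen". Adjust the match for your JVM.
    static boolean looksLikeOldGen(String poolName) {
        String n = poolName.toLowerCase();
        return n.contains("old gen") || n.contains("tenured");
    }

    // Percentage of max in use; returns -1 when the pool has no defined max.
    static double percentUsed(long used, long max) {
        return max <= 0 ? -1.0 : 100.0 * used / max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !looksLikeOldGen(pool.getName())) {
                continue;
            }
            MemoryUsage now = pool.getUsage();
            if (now != null) {
                System.out.printf("%s: %.1f%% used now%n",
                        pool.getName(), percentUsed(now.getUsed(), now.getMax()));
            }
            // Usage right after the most recent collection of this pool.
            // May be null if the JVM does not support it.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                System.out.printf("%s: %.1f%% still used after last GC%n",
                        pool.getName(), percentUsed(afterGc.getUsed(), afterGc.getMax()));
            }
        }
    }
}
```

If the "after last GC" figure keeps creeping towards 100%, that matches the pattern described above: collections run but reclaim very little.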

Following the link below, I applied -XX:+UseCompressedOops, but that did not solve the problem; in fact, the behavior did not change at all.
http://stackoverflow.com/questions/7513185/what-are-reservedcodecachesize-and-initialcodecachesize

Below is the JVM memory tuning I currently have in place.

JAVA_OPTS="-server -XX:-RewriteBytecodes -d64 -Xss2m -Xms4096m -Xmx8192m -XX:PermSize=500m -XX:MaxPermSize=1024m -XX:ReservedCodeCacheSize=512m -Dfile.encoding=UTF-8 -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djboss.server.default.config=standalone.xml -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:NewSize=700m -XX:MaxNewSize=700m -XX:NewRatio=4 -XX:SurvivorRatio=8 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:ParallelGCThreads=20 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+UseCompressedOops -XX:+DisableExplicitGC"

Can anyone point me to any further changes, or to what the exact issue is?

Find GC log analysis and heap status attached.
PFA :
GC analysis.JPG
heap status.JPG

Edit :
HeapDump Result-Size.JPG : Sorted by Size
HeapDump Result.JPG : Sorted by Instances[%]
Old mem generation activity.JPG


Thanks in advance,
Vishal
Olaf Kock, modified 7 years ago.

RE: LR 6.2 CE GA6 not responding after 50 hours with no errors logs (Answer)

Liferay Legend Posts: 6396 Join Date: 08-9-23 Recent Posts
Vishal Panchal:
After analyzing the GC logs, I came to the conclusion that my Old Generation is filling up. While the server is up, GC does not free much memory.


Hard to say - you either have a memory leak somewhere (it might be in Liferay code, but that's unlikely) or have configured the cache so that it is allowed to grow bigger than your main memory.

A quick way to check what happens is to just raise the Old Gen memory allowance and see whether the memory consumption rises indefinitely or just settles at a slightly higher value than you currently allow. If 1M more memory solves your problem once and for all, it might be cache size or just adaptation to the services you offer. If twice the memory doesn't fix it (but just delays it), it might be a memory leak.
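
The "rises indefinitely vs. settles" check can be made a bit more concrete. A rough sketch, assuming you have noted the Old Gen occupancy (in MB) after each Full GC from the GC log: fit a straight line through the later half of the samples and see whether it is still climbing. The class name OldGenTrend, the sample numbers and the tolerance of 5 MB per sample are all made up for illustration, and this is only a heuristic, not a leak detector.

```java
import java.util.Arrays;

// Hypothetical helper: does post-GC old gen occupancy keep growing, or level off?
class OldGenTrend {

    // Least-squares slope of the samples against their index (MB per sample).
    static double slope(double[] samples) {
        int n = samples.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double s : samples) {
            meanY += s;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (samples[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    // Looks only at the second half of the series: a warmed-up cache flattens
    // out late, while a leak keeps climbing. Needs at least three samples.
    static boolean stillGrowing(double[] samples, double tolerancePerSample) {
        double[] tail = Arrays.copyOfRange(samples, samples.length / 2, samples.length);
        return slope(tail) > tolerancePerSample;
    }

    public static void main(String[] args) {
        // Made-up numbers, not taken from the attached GC logs.
        double[] steadyClimb = {1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500};
        double[] levelsOff = {1000, 2000, 2600, 2900, 3000, 3010, 3005, 3008};
        System.out.println("steadyClimb still growing: " + stillGrowing(steadyClimb, 5.0));
        System.out.println("levelsOff still growing: " + stillGrowing(levelsOff, 5.0));
    }
}
```

Run the same comparison again after raising the heap: if the curve still climbs at the larger size, that points towards a leak rather than a cache that just needed more room.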
Vishal Panchal, modified 7 years ago.

RE: LR 6.2 CE GA6 not responding after 50 hours with no errors logs

Expert Posts: 289 Join Date: 12-5-20 Recent Posts
Dear Olaf,

Thank you very much for your quick response and for suggesting ways to narrow down the issue.

I will run a few tests on my end along the lines you suggested and will come back with the results.

Thanks Again,
Vishal Panchal
Vishal Panchal, modified 7 years ago.

RE: LR 6.2 CE GA6 not responding after 50 hours with no errors logs

Expert Posts: 289 Join Date: 12-5-20 Recent Posts
Dear Olaf,

I have tried a couple more things.

When I increased the heap from 8g to 12g, the server stayed up for a few more hours, and then I faced the same issue again.

I have looked into possible memory leaks. We mostly use Web Content Display, Document Library and Asset Publisher, and have barely 3-4 custom portlets with very little functionality. I checked with the Eclipse Memory Analyzer Tool and JVisualVM and did not find any memory leak. From the heap dump I also found that java.util.HashMap$Entry has the highest instance count and occupied heap size. We do not use HashMap objects in our custom portlets.

With that in mind, I disabled caching and reduced a few Map generation parameters, as shown in the attached portal-ext.properties.txt file.

After these changes the problem still persists.

The strange thing is that there are no errors in the log file, and the database is also working fine when the server becomes unresponsive.
Our data folder is 15 GB in size and mostly contains .pdf files.

I would be grateful if anyone could suggest further pointers or suspects.

Thanks in advance!
- Vishal Panchal
Olaf Kock, modified 7 years ago.

RE: LR 6.2 CE GA6 not responding after 50 hours with no errors logs

Liferay Legend Posts: 6396 Join Date: 08-9-23 Recent Posts
Vishal Panchal:
I have looked into possible memory leaks. We mostly use Web Content Display, Document Library and Asset Publisher, and have barely 3-4 custom portlets with very little functionality. I checked with the Eclipse Memory Analyzer Tool and JVisualVM and did not find any memory leak. From the heap dump I also found that java.util.HashMap$Entry has the highest instance count and occupied heap size. We do not use HashMap objects in our custom portlets.


Even if you don't use HashMap directly, you might store data in the session (which will end up in a HashMap). It's inherently hard to debug this kind of problem even when sitting in front of the machine, and particularly so when just answering a forum post. The possibility that you have a memory leak is quite high - and if you're using just basic Liferay functionality, there's a good chance that the memory leak is occurring in your custom portlets - if only because they haven't been widely tested (otherwise many Liferay users would have reported basic-usecase memory leaks already).
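
To illustrate the kind of pattern worth looking for in custom code (this is a generic sketch, not taken from the portlets discussed here): a map that is only ever added to, whether it is a static field or an attribute kept in a long-lived session, grows until the heap is full. One common way to cap it is an access-ordered LinkedHashMap that evicts its least recently used entry. The class name BoundedCache and the limit of 100 entries are assumptions for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-capped LRU map built on java.util.LinkedHashMap.
// Not thread-safe on its own; wrap it with Collections.synchronizedMap
// if several request threads share one instance.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after every insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        Map<String, byte[]> unbounded = new HashMap<String, byte[]>();
        Map<String, byte[]> bounded = new BoundedCache<String, byte[]>(100);
        for (int i = 0; i < 10000; i++) {
            String key = "user-" + i;
            unbounded.put(key, new byte[1024]); // never removed: grows with every new key
            bounded.put(key, new byte[1024]);   // oldest entries are evicted past the cap
        }
        System.out.println("unbounded entries: " + unbounded.size());
        System.out.println("bounded entries: " + bounded.size());
    }
}
```

In a heap dump, the unbounded variant shows up exactly as a large number of HashMap$Entry instances, even though the custom code may never mention HashMap$Entry itself.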

Is your custom code available somewhere that you could point to?
Vishal Panchal, modified 7 years ago.

RE: LR 6.2 CE GA6 not responding after 50 hours with no errors logs

Expert Posts: 289 Join Date: 12-5-20 Recent Posts
Dear Olaf,

Thanks again for the further pointers.

I have good news: Liferay has been stable for almost 7 days now. It is up and running without a single restart.

I changed the maxclients and max threads configuration on the web and application servers. I also doubled the DB connection pool size (it was too low given our high number of threads). I disabled the schedulers and a few increment counters in the portal-ext.properties file. Full GC has happened only once, right after server start.

Many thanks for your support.
- Vishal Panchal
Marco Castro, modified 4 years ago.

RE: LR 6.2 CE GA6 not responding after 50 hours with no errors logs

New Member Post: 1 Join Date: 18-2-26 Recent Posts
Hi Vishal,

I know this post is a couple of years old, but I am running into the same issue with our server. I was hoping you could share the settings you ended up using that solved your issue. I would like to compare them to what we have on our side to get some clues about what we need to change to fix our issue.

Thank you,
Marco