Analyzing GC Logs

Understanding GC activity in a high-performance system is critical. This article describes some ways to understand GC activity via GC logs.

Here is a good article that describes G1GC. Here is another good article on G1GC.

Here is a good article on how to read GC logs, along with the JVM flags that enable each feature and its corresponding log lines.

A JVM setting related to G1GC that I have found useful is to set the minimum and maximum heap sizes to the same value. This is done because the JVM sizes G1GC regions at startup based on the minimum heap size.
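For example, the heap pinning can be expressed as below (the 48g value matches the heap shown in the logs later in this article; the jar name is illustrative):

```shell
# Pin min and max heap to the same value so G1 sizes its regions
# once at startup instead of resizing the heap later.
java -XX:+UseG1GC -Xms48g -Xmx48g -jar app.jar
```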

I have experimented with all the G1GC tuning parameters in a JVM application running on Oracle JDK 8 that allocates a lot of memory per event and frees it very quickly (within a second or less of allocation), so it heavily uses eden space. What I found is that it is best to let the JVM control the sizing of the various memory spaces and tune only via ‘-XX:MaxGCPauseMillis’, setting it to a reasonably low number that does not trigger overly frequent GC events. The JVM automatically sizes the memory regions based on whether the max GC pause target is being met, and since it resizes them dynamically based on memory usage, this results in better overall execution times.
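A minimal flag set along those lines might look like the following (the 200 ms pause target and jar name are illustrative, not tested recommendations):

```shell
# Let G1 size eden/survivor/old itself; steer it only with a pause goal.
java -XX:+UseG1GC -Xms48g -Xmx48g -XX:MaxGCPauseMillis=200 -jar app.jar
```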

Another idea I have found useful in practice is to prevent memory pressure from building up on any node. This can be done by designing a service to stop pulling in more work from a queue, unregistering itself when the node’s resource utilization is high, and by structuring work processing as a sequence of tasks where rate limits can be applied to reduce memory pressure when load builds up. See the article on rate limiters that I wrote, in the system design menu above.
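A minimal sketch of the back-off idea, using the standard `Runtime` API; the class name and the 85% watermark are hypothetical, not tuned values:

```java
// Sketch: pause pulling work from the queue while heap utilization is high.
public class BackpressureConsumer {
    // Hypothetical threshold; a real service would tune this per workload.
    static final double HEAP_HIGH_WATERMARK = 0.85;

    static boolean underMemoryPressure() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory() > HEAP_HIGH_WATERMARK;
    }

    // The work loop consults this before polling the queue; when it returns
    // false, the consumer unregisters (or simply skips the poll) until
    // pressure subsides.
    static boolean shouldPoll() {
        return !underMemoryPressure();
    }

    public static void main(String[] args) {
        System.out.println(shouldPoll() ? "polling" : "paused");
    }
}
```

In practice the check would gate the queue-consumer registration rather than every poll, so the node drops out of the work rotation entirely while it digests its backlog.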

A key design idea is to design objects with memory pressure in mind; I have discussed this thought in more detail in the article on memory management.

GC logs over a long time span are fairly complex to analyze visually. You can use an application such as GCViewer to analyze these logs – see here.

Below are some G1GC-related logs and parameters that I have found useful in tuning application memory.

‘-XX:+PrintGCDateStamps’ prints the wall-clock timestamp of each GC log line. It adds a little overhead in the JVM but is worth enabling, since the physical timestamps let us relate GC events to slowness in application performance.
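For reference, this flag and the other JDK 8 logging flags discussed in this article can be combined on one command line (the log path and jar name are illustrative):

```shell
# GC logging flags referenced throughout this article.
java -XX:+UseG1GC \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -XX:+PrintTenuringDistribution \
     -XX:+PrintAdaptiveSizePolicy \
     -XX:+PrintGCApplicationStoppedTime \
     -XX:+PrintGCApplicationConcurrentTime \
     -Xloggc:/var/log/app/gc.log \
     -jar app.jar
```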

2020-07-11T12:31:53.747-0700: 14407.414: [GC pause (G1 Evacuation Pause) (mixed)

The line above indicates a GC pause with a mixed collection, in which memory is reclaimed from both the young and old generations in the same pause.

Desired survivor size 167772160 bytes, new threshold 15 (max 15)

The above indicates that the target survivor space size is 160 MB (167772160 bytes). The tenuring threshold is printed as 15; this output is enabled via the flag ‘-XX:+PrintTenuringDistribution’. If the tenuring threshold had dropped to 1, it would have indicated that survivor space wasn’t being used and objects were being promoted directly from young to old space, which we don’t want.
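Both the per-age distribution lines and the threshold cap are controllable via flags (15 is the JDK 8 maximum for G1; the jar name is illustrative):

```shell
# Print per-age survivor occupancy and cap the tenuring threshold.
java -XX:+UseG1GC -XX:+PrintTenuringDistribution -XX:MaxTenuringThreshold=15 -jar app.jar
```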

  
– age 1: 357016 bytes, 357016 total

The line above indicates that roughly 357 KB of objects have survived one young GC event.

14407.415: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 465731, predicted base time: 183.32 ms, remaining time: 816.68 ms, target pause time: 1000.00 ms]
14407.415: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 133 regions, survivors: 20 regions, predicted young region time: 53.79 ms]
14407.451: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: old CSet region num reached max, old: 308 regions, max: 308 regions]
14407.451: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 133 regions, survivors: 20 regions, old: 308 regions, predicted pause time: 488.84 ms, target pause time: 1000.00 ms]
14407.864: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 518 regions, reclaimable: 3398476672 bytes (6.59 %), threshold: 5.00 %]
, 0.4501931 secs]

‘-XX:+PrintAdaptiveSizePolicy’ adds the G1Ergonomics lines to the GC logs. They report the predicted pause time and the number of eden, survivor, and old regions chosen for the collection set, and finally indicate that mixed GCs will continue because the reclaimable space in old regions is above the 5% threshold.

[Parallel Time: 334.6 ms, GC Workers: 165]

The line above indicates that the parallel phase of the pause, run by 165 worker threads, took 334 ms of stop-the-world time.
[GC Worker Start (ms): Min: 14407453.1, Avg: 14407455.7, Max: 14407458.5, Diff: 5.4]

This is the spread, in milliseconds, between the start of the first and the last parallel GC thread. The difference should ideally be very small.
[Ext Root Scanning (ms): Min: 0.1, Avg: 2.0, Max: 17.1, Diff: 17.0, Sum: 323.9]
[Update RS (ms): Min: 198.7, Avg: 206.1, Max: 215.8, Diff: 17.0, Sum: 34004.5]
[Processed Buffers: Min: 8, Avg: 12.0, Max: 27, Diff: 19, Sum: 1985]
[Scan RS (ms): Min: 52.3, Avg: 62.0, Max: 65.7, Diff: 13.4, Sum: 10234.9]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 2.6, Diff: 2.6, Sum: 35.3]

[Object Copy (ms): Min: 51.6, Avg: 56.4, Max: 62.3, Diff: 10.8, Sum: 9301.0]

The lines above report the time spent scanning external roots, updating and scanning remembered sets, scanning code roots for references into the collection set, and copying surviving objects out of it.

[Termination (ms): Min: 0.1, Avg: 0.9, Max: 1.5, Diff: 1.4, Sum: 143.8]
[Termination Attempts: Min: 1, Avg: 52.6, Max: 67, Diff: 66, Sum: 8684]
[GC Worker Other (ms): Min: 0.1, Avg: 1.8, Max: 4.0, Diff: 3.9, Sum: 294.2]
[GC Worker Total (ms): Min: 324.9, Avg: 329.3, Max: 334.0, Diff: 9.2, Sum: 54337.7]
[GC Worker End (ms): Min: 14407783.3, Avg: 14407785.1, Max: 14407787.3, Diff: 3.9]

The lines above report per-worker termination, totals, and end timestamps.
[Code Root Fixup: 0.2 ms]
[Code Root Purge: 0.1 ms]
[Clear CT: 8.0 ms]
[Other: 107.2 ms]
[Choose CSet: 36.4 ms]
[Ref Proc: 7.8 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 25.3 ms]
[Humongous Register: 0.3 ms]
[Humongous Reclaim: 0.1 ms]

The two lines above show the time spent registering and reclaiming humongous objects.
[Free CSet: 33.7 ms]
[Eden: 2128.0M(2128.0M)->0.0B(2208.0M) Survivors: 320.0M->240.0M Heap: 22.1G(48.0G)->16.4G(48.0G)]

The line above shows the change in usage of the various memory regions once the GC finished. The key points are that eden was fully cleared, survivor space shrank, and overall heap usage dropped from 22.1 GB to 16.4 GB.

[Times: user=56.10 sys=0.12, real=0.45 secs]

The user time above is CPU time summed across all threads/CPUs, sys is the time spent in the kernel, and real is the wall-clock time the application waited for GC. Real is the number to pay the closest attention to. The large user time simply reflects the many CPU threads available on our Solaris T5 boxes.
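As a sanity check, (user + sys) divided by real approximates how many CPUs were busy during the stop-the-world pause. For the pause above:

```java
// Rough effective parallelism for the [Times: ...] line above.
public class GcTimes {
    public static void main(String[] args) {
        double user = 56.10, sys = 0.12, real = 0.45;
        // CPU seconds consumed per wall-clock second of pause.
        double parallelism = (user + sys) / real;
        System.out.printf("effective parallelism ~ %.0f threads%n", parallelism);
    }
}
```

That comes out to roughly 125, which is plausible given the 165 GC workers reported for this pause were not all fully busy the whole time.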

The GC log below is the same kind of event as the one above, except that it indicates mixed GCs will stop because the reclaimable space in the old regions is no longer above the threshold.

2020-07-11T13:01:46.675-0700: 16200.254: [GC pause (G1 Evacuation Pause) (mixed)
Desired survivor size 167772160 bytes, new threshold 15 (max 15)
– age 1: 638784 bytes, 638784 total
– age 2: 239464 bytes, 878248 total
16200.254: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 419747, predicted base time: 206.68 ms, remaining time: 793.32 ms, target pause time: 1000.00 ms]
16200.254: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 138 regions, survivors: 15 regions, predicted young region time: 37.22 ms]
16200.268: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 92 regions, max: 308 regions, reclaimable: 2576759240 bytes (5.00 %), threshold: 5.00 %]
16200.268: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 138 regions, survivors: 15 regions, old: 92 regions, predicted pause time: 362.50 ms, target pause time: 1000.00 ms]
16200.545: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 426 regions, reclaimable: 2576759240 bytes (5.00 %), threshold: 5.00 %]
, 0.2923730 secs]
[Parallel Time: 219.6 ms, GC Workers: 165]
[GC Worker Start (ms): Min: 16200269.9, Avg: 16200272.5, Max: 16200275.2, Diff: 5.3]
[Ext Root Scanning (ms): Min: 0.1, Avg: 2.2, Max: 14.0, Diff: 14.0, Sum: 355.9]
[Update RS (ms): Min: 98.1, Avg: 106.3, Max: 116.9, Diff: 18.8, Sum: 17538.5]
[Processed Buffers: Min: 6, Avg: 11.0, Max: 32, Diff: 26, Sum: 1808]
[Scan RS (ms): Min: 55.9, Avg: 66.4, Max: 71.7, Diff: 15.7, Sum: 10958.8]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 2.8, Diff: 2.8, Sum: 23.4]
[Object Copy (ms): Min: 34.4, Avg: 36.5, Max: 40.3, Diff: 5.9, Sum: 6024.3]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 13.0]
[Termination Attempts: Min: 1, Avg: 2.0, Max: 5, Diff: 4, Sum: 327]
[GC Worker Other (ms): Min: 0.1, Avg: 1.7, Max: 3.9, Diff: 3.9, Sum: 284.3]
[GC Worker Total (ms): Min: 209.1, Avg: 213.3, Max: 217.4, Diff: 8.3, Sum: 35198.4]
[GC Worker End (ms): Min: 16200484.1, Avg: 16200485.8, Max: 16200488.0, Diff: 3.9]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.1 ms]
[Clear CT: 8.5 ms]
[Other: 64.1 ms]
[Choose CSet: 14.3 ms]
[Ref Proc: 9.7 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 19.0 ms]
[Humongous Register: 0.3 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 17.6 ms]
[Eden: 2208.0M(2208.0M)->0.0B(22.2G) Survivors: 240.0M->64.0M Heap: 18.5G(48.0G)->15.3G(48.0G)]
[Times: user=35.91 sys=0.20, real=0.29 secs]

Below is a subsequent GC event that collects only the young generation.

2020-07-11T14:34:48.543-0700: 21781.844: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 1501560832 bytes, new threshold 15 (max 15)
– age 1: 1013104 bytes, 1013104 total
– age 2: 24704 bytes, 1037808 total
– age 3: 239464 bytes, 1277272 total
21781.845: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 426744, predicted base time: 198.63 ms, remaining time: 801.37 ms, target pause time: 1000.00 ms]
21781.845: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1424 regions, survivors: 4 regions, predicted young region time: 453.84 ms]
21781.845: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1424 regions, survivors: 4 regions, old: 0 regions, predicted pause time: 652.47 ms, target pause time: 1000.00 ms]
, 0.3078034 secs]
[Parallel Time: 232.8 ms, GC Workers: 165]
[GC Worker Start (ms): Min: 21781846.9, Avg: 21781849.7, Max: 21781852.7, Diff: 5.8]
[Ext Root Scanning (ms): Min: 0.1, Avg: 2.5, Max: 18.1, Diff: 18.0, Sum: 414.8]
[Update RS (ms): Min: 17.7, Avg: 29.9, Max: 33.9, Diff: 16.3, Sum: 4925.6]
[Processed Buffers: Min: 6, Avg: 11.4, Max: 20, Diff: 14, Sum: 1881]
[Scan RS (ms): Min: 10.3, Avg: 17.0, Max: 19.4, Diff: 9.1, Sum: 2806.7]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.7, Diff: 0.7, Sum: 5.3]
[Object Copy (ms): Min: 172.3, Avg: 172.9, Max: 175.3, Diff: 3.0, Sum: 28531.3]
[Termination (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 13.9]
[Termination Attempts: Min: 1, Avg: 4.2, Max: 9, Diff: 8, Sum: 692]
[GC Worker Other (ms): Min: 0.1, Avg: 2.7, Max: 5.7, Diff: 5.6, Sum: 448.9]
[GC Worker Total (ms): Min: 220.0, Avg: 225.1, Max: 229.3, Diff: 9.3, Sum: 37146.6]
[GC Worker End (ms): Min: 21782072.2, Avg: 21782074.8, Max: 21782077.8, Diff: 5.6]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.1 ms]
[Clear CT: 8.5 ms]
[Other: 66.3 ms]
[Choose CSet: 0.1 ms]
[Ref Proc: 27.5 ms]
[Ref Enq: 0.3 ms]
[Redirty Cards: 6.8 ms]
[Humongous Register: 0.3 ms]
[Humongous Reclaim: 0.2 ms]
[Free CSet: 26.5 ms]
[Eden: 22.2G(22.2G)->0.0B(20.4G) Survivors: 64.0M->2208.0M Heap: 37.8G(48.0G)->17.7G(48.0G)]

[Times: user=37.07 sys=0.10, real=0.31 secs]

2020-07-11T15:29:56.058-0700: 25089.194: Total time for which application threads were stopped: 0.0092741 seconds, Stopping threads took: 0.0055342 seconds

The line above reports application stopped times and is enabled via ‘-XX:+PrintGCApplicationStoppedTime’.

2020-07-11T15:29:56.090-0700: 25089.226: Application time: 0.0318118 seconds

The above is enabled via ‘-XX:+PrintGCApplicationConcurrentTime’. It shows how long the application ran without being stopped.

Here is an article from Uber that describes tricks for JVM tuning:

https://eng.uber.com/jvm-tuning-garbage-collection/