
Frequently Asked Questions about Garbage Collection
in the HotSpot™ Java™ Virtual Machine

This document describes the behavior of the Java™ HotSpot™ virtual machine. This behavior is not part of the VM specification, however, and is subject to change in future releases. Moreover, the behavior described here is generic behavior and will not apply to the execution of all Java applications.

  1. How is the generational collector implemented in HotSpot™?

The default collector in HotSpot has two generations: the young generation and the tenured generation. Most allocations are done in the young generation. The young generation is optimized for objects that have a short lifetime relative to the interval between collections. Objects that survive several collections in the young generation are moved to the tenured generation. The young generation is typically smaller and is collected more often. The tenured generation is typically larger and collected less often.

The young generation collector is a copying collector. The young generation is divided into three spaces: eden-space, to-space, and from-space. Allocations are done from eden-space and from-space. When those are full, a young generation collection is done. The expectation is that most of the objects are garbage and any surviving objects can be copied to to-space. If there are more surviving objects than can fit into to-space, the remaining objects are copied into the tenured generation. There is an option to collect the young generation in parallel.
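The aging-and-promotion rule described above can be sketched in a few lines of Java. This is illustrative only, not HotSpot source; the threshold constant and names are hypothetical.

```java
// Illustrative sketch (not HotSpot code): models how an object's GC "age"
// determines whether a minor collection keeps it in a survivor space or
// promotes it to the tenured generation. The threshold value is hypothetical.
public class PromotionSketch {
    static final int TENURING_THRESHOLD = 3; // hypothetical: promote after 3 survivals

    // Returns where a live object ends up after one minor collection.
    static String afterMinorGC(int age) {
        int newAge = age + 1;
        return newAge >= TENURING_THRESHOLD ? "tenured" : "survivor";
    }

    public static void main(String[] args) {
        System.out.println("age 0 -> " + afterMinorGC(0)); // stays in survivor space
        System.out.println("age 2 -> " + afterMinorGC(2)); // promoted to tenured
    }
}
```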

The tenured generation is collected with a mark-sweep-compact collection. There is an option to collect the tenured generation concurrently.

  1. What is the relevance of -XX:MaxNewSize? Where will the difference between -XX:NewSize and -XX:MaxNewSize grow, eden or the survivor spaces?

The young generation is set by a policy that bounds the size from below by NewSize and bounds it from above by MaxNewSize. As the young generation grows from NewSize to MaxNewSize, both eden and the survivor spaces grow.

  1. Are all eden-space objects moved into the survivor space so that after a minor gc, eden-space is empty?

Yes. If all the live objects in eden do not fit into a survivor space, the remaining live objects are promoted into the old generation.

  1. When using -XX:TargetSurvivorRatio=90 will this leave ten percent of to-space for objects to be moved from eden?

No. It means that a tenuring threshold is chosen so that, based on the ages of the objects scavenged in the last minor collection, nearly 90% of the survivor space should be used. The actual amount scavenged from either the survivor space or eden may be considerably more or less.

TargetSurvivorRatio does not usually make a big difference.
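A rough sketch of the threshold selection described above, as a hypothetical simplification (not HotSpot code): choose the smallest age at which the cumulative bytes of survivors from the last scavenge would exceed the target fraction of a survivor space.

```java
// Hypothetical sketch of tenuring-threshold selection. bytesByAge[i] holds the
// bytes of objects of age i that survived the last scavenge; objects at or
// above the returned threshold would be promoted at the next collection.
public class TenuringSketch {
    static int chooseThreshold(long[] bytesByAge, long survivorSize, int targetRatio) {
        long target = survivorSize * targetRatio / 100; // e.g. 90% of survivor space
        long total = 0;
        for (int age = 0; age < bytesByAge.length; age++) {
            total += bytesByAge[age];
            if (total > target) return age + 1; // older objects get promoted
        }
        return bytesByAge.length; // everything fits: use the maximum threshold
    }
}
```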
  1. If objects in eden-space require more space than is available in the to-survivor space, will eden-space objects have precedence over from-survivor space objects? How does the age of from-survivor space objects affect promotion?

There is no distinction here between what comes from eden and what comes from the from-survivor space. After a minor collection completes, both eden and the from-survivor space are empty. If the to-survivor space fills up, any remaining objects are promoted directly into the old generation regardless of their age or origin.


  1. Between NewSize and NewRatio which option takes precedence?
In JDK 1.4.1 and later, neither takes strict precedence. The maximum of NewSize and the size calculated using NewRatio is used, bounded above by MaxNewSize. The formula is

min(MaxNewSize, max(NewSize, heap/(NewRatio+1)))
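The same formula written out in code (the method name is invented for illustration; the parameter names mirror the command-line options, with sizes in bytes):

```java
// Young generation size as bounded by NewSize, MaxNewSize, and NewRatio,
// per the formula min(MaxNewSize, max(NewSize, heap/(NewRatio+1))).
public class YoungGenSize {
    static long youngGenSize(long newSize, long maxNewSize, long heap, long newRatio) {
        return Math.min(maxNewSize, Math.max(newSize, heap / (newRatio + 1)));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // 512 MB heap, NewRatio=8: the NewRatio term (heap/9) wins.
        System.out.println(youngGenSize(4 * mb, 64 * mb, 512 * mb, 8));
        // 1024 MB heap, NewRatio=2: heap/3 exceeds MaxNewSize, so MaxNewSize caps it.
        System.out.println(youngGenSize(4 * mb, 64 * mb, 1024 * mb, 2));
    }
}
```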

  1. How should the permanent generation be sized?
The permanent generation is used to hold reflective data of the VM itself, such as class objects and method objects. These reflective objects are allocated directly into the permanent generation, and it is sized independently from the other generations. Generally, sizing of this generation can be ignored because the default size is adequate. However, programs that load many classes may need a larger permanent generation.

  1. How can I tell if the permanent generation is filling up?

Starting in 1.4.2, -XX:+PrintGCDetails will print information about all parts of the heap collected at each garbage collection. For a full collection,

[Full GC [Tenured: 30437K->33739K(280576K), 0.7050569 secs] 106231K->33739K(362112K), [Perm : 2919K->2919K(16384K)], 0.7052334 secs]

This example shows that little was collected in the permanent generation (it went from 2919K used before the collection to 2919K used after the collection) and that the current size of the permanent generation is 16384K.
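If you need to watch permanent generation occupancy programmatically, a small log-scraping helper along these lines can pull the numbers out of such a line (the class and method names are invented for illustration, and the log format shown in this FAQ may differ between releases):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the permanent-generation figures from a
// -XX:+PrintGCDetails full-collection line of the form shown above.
public class PermParser {
    private static final Pattern PERM =
        Pattern.compile("\\[Perm\\s*:\\s*(\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    // Returns {usedBefore, usedAfter, capacity} in kilobytes, or null if absent.
    static long[] parsePerm(String gcLine) {
        Matcher m = PERM.matcher(gcLine);
        if (!m.find()) return null;
        return new long[] { Long.parseLong(m.group(1)),
                            Long.parseLong(m.group(2)),
                            Long.parseLong(m.group(3)) };
    }
}
```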

  1. How can I increase the permanent generation size?

Use the command line option -XX:MaxPermSize=<desired size>

  1. How do I know what classes are being loaded or unloaded?

Use the command line options -XX:+TraceClassLoading and -XX:+TraceClassUnloading.

  1. What is the best size for the young generation?

The young generation should be sized large enough so that short-lived objects have a chance to die before the next young generation collection. This is a tradeoff since a larger young generation will allow more time for objects to die but may also take longer to collect. Experiment with the size of the young generation to optimize the young generation collection time or the application throughput.


  1. What should I do if my application has mid- or long-lived objects?
Objects that survive a young generation collection have a copying cost (part of the algorithm for a young generation collection is to copy any objects that survive). Mid- or long-lived objects may be copied multiple times. Use the -XX option MaxTenuringThreshold to determine the copying costs. Use -XX:MaxTenuringThreshold=0 to move an object that survives a young generation collection immediately to the tenured generation. If that improves the performance of the application, the copying of long-lived objects is significant. Note that the throughput collector does not use the MaxTenuringThreshold parameter.

  1. When is a garbage collection started?
In the default garbage collector a generation is collected when it is full (i.e., when no further allocations can be done from that generation). This is also true of the throughput collector. The concurrent low pause collector starts a collection when the occupancy of the tenured generation reaches a specified value (by default 68%). The incremental low pause collector collects a portion of the tenured generation during each young generation collection. A collection can also be started explicitly by the application.

  1. What type of collection does a System.gc() do?
An explicit request to do a garbage collection does a full collection (both young generation and tenured generation). A full collection is always done with the application paused for the duration of the collection.

  1. What is the Concurrent Mark Sweep (CMS) collector?
The Concurrent Mark Sweep (CMS) collector (also referred to as the concurrent low pause collector) collects the tenured generation. It attempts to minimize the pauses due to garbage collection by doing most of the garbage collection work concurrently with the application threads.

  1. Why is fragmentation a potential problem for the concurrent low pause collector?
Normally the concurrent low pause collector does not copy or compact the live objects; a garbage collection is done without moving them. If fragmentation becomes a problem, allocate a larger heap. In 1.4.2, if fragmentation in the tenured generation becomes a problem, a compaction of the tenured generation will be done, although not concurrently. In 1.4.1 that compaction will occur only if the UseCMSCompactAtFullCollection option is turned on:

-XX:+UseCMSCompactAtFullCollection

  1. What are the phases of the concurrent low pause collector?

There are six phases involved in the collection:

Phase 1 (Initial Checkpoint) involves stopping all the Java threads, marking all the objects directly reachable from the roots, and restarting the Java threads.

Phase 2 (Concurrent Marking) starts scanning from marked objects and transitively marks all objects reachable from the roots. The mutators are executing during the concurrent phases 2, 3, and 5 below and any objects allocated in the CMS generation during these phases (including promoted objects) are immediately marked as live.

Phase 3 (Concurrent Precleaning): during the concurrent marking phase mutators may be modifying objects, and any object that has been modified since the start of that phase (and which was not subsequently scanned during it) must be rescanned. Concurrent precleaning scans these concurrently modified objects. Due to continuing mutator activity, the scanning for modified cards may be done multiple times.

Phase 4 (Final Checkpoint) is a stop-the-world phase. With mutators stopped the final marking is done by scanning objects reachable from the roots and by scanning any modified objects. Note that after this phase there may be objects that have been marked but are no longer live. Such objects will survive the current collection but will be collected on the next collection.

Phase 5 (Concurrent Sweep) collects dead objects. The collection of a dead object adds the space for the object to a free list for later allocation. Coalescing of dead objects may occur at this point. Note that live objects are not moved.

Phase 6 (Resetting) clears data structures in preparation for the next collection.

  1. Does the VM allocate large int arrays for its own use?

One place the JVM does allocate big int[]'s is when it fills up various fragmented parts of memory to make things look whole for the garbage collector. E.g., the unused parts of each thread-local allocation buffer before a GC, or all of the young generation when running with JVMPI object allocation events.

  1. Can I see how much of a thread allocation buffer is being left unused?

There's a flag, -XX:+PrintTLAB, that will trace all the operations on TLAB's. In particular, it prints lines like

reset TLAB: thread: 0x0002d7d0 size: 8KB unused: 76B Total fragmentation 0.004499

each time a TLAB is retired and its unused tail is filled with an int[]. In this case, the trailing 76B are left unused.

This is an example of a TLAB that has filled up; a new one will be allocated. The amount of waste here is relatively small. More waste can occur in preparation for a garbage collection, as in TLAB output that shows up just before a garbage collection, like

reset TLAB: thread: 0x0002d840 size: 8KB unused: 7276B Total fragmentation 0.004580
[Full GC 10424K->591K(15688K), 0.1222677 secs]

where 7276 bytes of the 8192 byte TLAB are being filled as unused space. (The "Total fragmentation" is a cumulative accounting of the fragmentation caused by TLAB's.) TLAB's resize by default on SPARC -server, or if you use the -XX:+ResizeTLAB flag, so you may well get large TLAB's if you are running that JVM. Note that we aren't "wasting" the space for the fillers right before collections, as the collection will recover the space the filler objects occupy.
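The per-TLAB waste in the line above is simple arithmetic (the helper name is invented for illustration):

```java
// Fraction of a TLAB left unused when it is retired, from the PrintTLAB
// "size" and "unused" fields.
public class TlabWaste {
    static double unusedFraction(long unusedBytes, long tlabBytes) {
        return (double) unusedBytes / tlabBytes;
    }
}
```

For the pre-collection trace above, 7276 of 8192 bytes are unused, i.e. roughly 89% of that TLAB, versus under 1% for the ordinary retirement shown earlier.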

  1. Does the default of NewRatio change with the compiler?

On SPARCs, -XX:NewRatio defaults to 8 with -client and 2 with -server, so the ratio of the young generation to the old generation is 1:8 (the young generation is 1/9th of the heap) with -client, and 1:2 (1/3rd of the heap) with -server.

  1. What is the Parallel Garbage collector (-XX:+UseParallelGC)?

The new parallel garbage collector is similar to the young generation collector in the default garbage collector but uses multiple threads to do the collection. By default, on a host with N CPUs the parallel garbage collector uses N garbage collector threads. The number of threads can be controlled with a command line option (see below). On a host with a single CPU the default garbage collector is used even if the parallel garbage collector has been requested. On a host with two CPUs the parallel garbage collector generally performs as well as the default garbage collector; a reduction in young generation pause times can be expected on hosts with more than two CPUs.

This new parallel garbage collector can be enabled with the command line product flag -XX:+UseParallelGC. The number of garbage collector threads can be controlled with the ParallelGCThreads option (-XX:ParallelGCThreads=<desired number>). This collector cannot be used with the concurrent low pause collector.
  1. What is the Parallel Young Generation collector (-XX:+UseParNewGC)?

The parallel young generation collector is similar to the parallel garbage collector (-XX:+UseParallelGC) in intent and differs in implementation. Most of the above description for the parallel garbage collector (-XX:+UseParallelGC) therefore applies equally for the parallel young generation collector. Unlike the parallel garbage collector (-XX:+UseParallelGC) this parallel young generation collector can be used with the concurrent low pause collector that collects the tenured generation.
  1. Which parallel collector should I use?

Although similar in intent, the collectors differ in some details of implementation that make the parallel garbage collector better for some applications while the parallel young generation collector is better for others. Both should be tried to determine which is better suited to a specific application.

In addition the parallel young generation collector (-XX:+UseParNewGC) is integrated with the concurrent low pause collector whereas the parallel garbage collector (-XX:+UseParallelGC) is not. There are some costs associated with this integration which are borne even when the concurrent low pause collector is not used. Conversely the parallel garbage collector (-XX:+UseParallelGC) can be used with adaptive sizing (-XX:+UseAdaptiveSizePolicy) whereas the parallel young generation collector (-XX:+UseParNewGC) cannot.

  1. Why is the startup with the concurrent low pause (CMS) collector slow?

With CMS (-XX:+UseConcMarkSweepGC) you sometimes need to set the minimum and maximum heap size to the same value (or at least set a large minimum value) because CMS sometimes spends time early on growing its heap. This may also be true of the perm generation. Try a larger perm generation size using the options -XX:PermSize=<initial size> -XX:MaxPermSize=<maximum size>.

  1. What young generation collector is used with concurrent low pause collector?
By default the low pause collector uses the default, single threaded young generation copying collector. If you specify -XX:+UseParNewGC, a parallel version of the copying collector will be used.

  1. Why does the low pause collector sometimes do more collections than the default collector?
If you are not seeing major collections with the default collector but are seeing many major collections with the concurrent low pause collector, you are probably seeing some type of fragmentation problem. Try using a larger heap with the concurrent low pause collector.

  1. Are there other external sources for garbage collection documentation?

  1. With the concurrent low pause collector, what is a minimum value for NewRatio?

A minimum value of 4 is advisable.

  1. Do objects ever get allocated directly into the old generation?

In 1.4.1 there are two situations where allocation may occur directly into the old generation.

  1. If an allocation fails in the young generation and the object is a large array that does not contain any references to objects, it can be allocated directly into the old generation. In some select instances this strategy was intended to avoid a collection of the young generation by allocating from the old generation.

  2. There is a flag (available in 1.4.2 and later), -XX:PretenureSizeThreshold=<byte size>, that can be set to limit the size of allocations in the young generation. Any allocation larger than this will not be attempted in the young generation and so will be allocated out of the old generation.

The threshold size for case 1) is 64k words. The default value for PretenureSizeThreshold is 0, which means that an allocation of any size can be attempted in the young generation.

In 1.4.2, for case 1) the 64k word threshold continues to apply to the incremental collector (-Xincgc). For the default collector and the concurrent collector (-XX:+UseConcMarkSweepGC) the threshold value has been changed so that an attempt to allocate into the old generation only occurs if the size of the allocation is larger than the entire young generation (the available space when it is empty). It was observed that there were cases where the 1.4.1 strategy for the default and concurrent collectors led to full collections only (no young generation collections were being done). We deemed that bad enough to raise the threshold.
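The PretenureSizeThreshold placement rule above can be summarized as a sketch (illustrative only; HotSpot's actual checks are release-dependent as described, and the names are invented):

```java
// Sketch of the allocation-placement rule for PretenureSizeThreshold.
// A threshold of 0 means "no limit": any size may be allocated in the
// young generation.
public class PretenureSketch {
    static String placement(long allocBytes, long pretenureThresholdBytes) {
        if (pretenureThresholdBytes > 0 && allocBytes > pretenureThresholdBytes) {
            return "old";   // bypass the young generation entirely
        }
        return "young";     // normal allocation path
    }
}
```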

  1. Should I increase the size of the permanent generation in the client vm?

This will always be a judgment call. In general, increasing the size of a generation (and this applies not just to the permanent generation) can reduce the incidence of a wide variety of problems. However, it may cause other processes to page excessively, garbage collect, or throw out-of-memory exceptions.

There are two failure modes to consider.

When raising MaxPermSize, it is possible that previously well-behaved programs that used to garbage collect to recover the permanent generation space will die by endless paging. For the permanent generation this usually only happens with heavy interning of temporary strings.

The other failure mode is that address space must be reserved for the permanent generation, which reduces what is available for the rest of the heap (the maximum -Xmx may then be too large). This will cause programs configured to use all available space to fail at initialization.

Permanent generation defaults in recent VM's.

release              v1.3.1_06   v1.4.1_01   v1.4.2
Client PermSize      1M          4M          4M
Server PermSize      1M          4M          16M
Client MaxPermSize   32M         64M         64M
Server MaxPermSize   64M         64M         64M


  1. Should I pool objects to help GC? Should I call System.gc() periodically?

The answer to these is No!

Pooling objects will cause them to live longer than necessary. We strongly advise against object pools.

Don't call System.gc(). The system will make the determination of when it's appropriate to do garbage collection and generally has the information necessary to do a much better job of initiating a garbage collection. If you are having problems with the garbage collection (pause times or frequency), consider adjusting the size of the generations.

  1. What determines when softly referenced objects are flushed?

Starting with Java HotSpot VM implementations in J2SE 1.3.1, softly reachable objects will remain alive for some amount of time after the last time they were referenced. The default value is one second of lifetime per free megabyte in the heap. This value can be adjusted using the -XX:SoftRefLRUPolicyMSPerMB flag, which accepts integer values representing milliseconds per MB of free memory. For example, to change the value from one second to 2.5 seconds, use this flag:

-XX:SoftRefLRUPolicyMSPerMB=2500

The Java HotSpot Server VM uses the maximum possible heap size (as set by the -Xmx option) to calculate free space remaining.

The Java HotSpot Client VM uses the current heap size to calculate the free space.

This means that the general tendency is for the Server VM to grow the heap rather than flush soft references, and -Xmx therefore has a significant effect on when soft references are garbage collected.

On the other hand, the Client VM will have a greater tendency to flush soft references rather than grow the heap.

The behavior described above is true for the current (J2SE 1.3.1 and J2SE 1.4.x) versions of the Java HotSpot VMs. Note that the -XX:SoftRefLRUPolicyMSPerMB flag is not guaranteed to be present in any given release.

Prior to version 1.3.1, the Java HotSpot VMs cleared soft references whenever they were found.
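The policy arithmetic above is straightforward: a softly reachable object becomes eligible for clearing once it has gone unreferenced longer than roughly (free megabytes) × SoftRefLRUPolicyMSPerMB milliseconds (the helper name is invented for illustration):

```java
// Estimated keep-alive time for a softly reachable object under the
// SoftRefLRUPolicyMSPerMB policy: msPerMb milliseconds of lifetime per
// free megabyte of heap (as computed by the Server or Client VM).
public class SoftRefPolicy {
    static long keepAliveMillis(long freeMegabytes, long msPerMb) {
        return freeMegabytes * msPerMb;
    }
}
```

With the default of 1000 ms/MB and 100 MB free, an untouched soft referent survives about 100 seconds; raising the flag to 2500 as in the example stretches that to about 250 seconds.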

  1. I'm getting lots of full garbage collections (GC's) when I turn on -verbose:gc. The GC's happen at regular intervals. My application never calls System.gc(). I've tuned the heap and it makes no difference. What's going on?

If you're using RMI (remote method invocation), you could be running into distributed garbage collection (GC). Also, some applications add explicit GC's thinking that it will make the application faster. Luckily, you can disable these with an option available in versions 1.3 and 1.4. Try -XX:+DisableExplicitGC along with -verbose:gc and see if this helps.

  1. The concurrent low pause collector seems to be doing full collections much of the time. How can the concurrent collection be sped up?

The concurrent collection generally cannot be sped up but it can be started earlier.
A concurrent collection starts running when the percentage of allocated space in the old generation crosses a threshold. This threshold is calculated based on general experience with the concurrent collector. If full collections are occurring, the concurrent collections may need to be started earlier. The command line flag CMSInitiatingOccupancyFraction can be used to set the level at which the collection is started. Its default value is approximately 68%. The command line to adjust the value is
-XX:CMSInitiatingOccupancyFraction=<percent>
The concurrent collector also keeps statistics on the promotion rate into the old generation for the application and makes a prediction on when to start a concurrent collection based on that promotion rate and the available free space in the old generation. Whereas the use of CMSInitiatingOccupancyFraction must be conservative to avoid full collections over the life of the application, the start of a concurrent collection based on the anticipated promotion adapts to the changing requirements of the application.

The statistics used to calculate the promotion rate are based on the recent concurrent collections. The promotion rate is not calculated until at least one concurrent collection has completed, so at least the first concurrent collection has to be initiated because the occupancy has reached CMSInitiatingOccupancyFraction. Setting CMSInitiatingOccupancyFraction to 100 would not cause only the anticipated promotion to be used to start a concurrent collection; rather, it would cause only non-concurrent collections to occur, since a concurrent collection would not start until it was already too late. To eliminate the use of anticipated promotion to start a concurrent collection, set UseCMSInitiatingOccupancyOnly to true:
-XX:+UseCMSInitiatingOccupancyOnly
  1. Sometimes the concurrent low pause collector is about to finish the last part of a concurrent collection when a full collection starts. The full collection looks like it does the whole collection again. Can that happen?
A full collection by default uses a different collection algorithm (a compaction) than the concurrent collection, so all the work done by a concurrent collection in progress is lost. An adjustment can be made so that a full collection will instead complete the concurrent collection, albeit not concurrently. A full collection normally does a compaction because the inability to finish a collection concurrently is often the sign of a fragmentation problem. In many cases a compaction is needed, but it can be delayed for one full collection by setting the value of CMSFullGCsBeforeCompaction to 1.
-XX:CMSFullGCsBeforeCompaction=1
With this value when a full collection starts it will complete the concurrent collection in progress. If another full collection occurs before a normal concurrent collection has completed, a compaction will be done.
  1. With the concurrent low pause collector how can I tell how much floating garbage is left?
Because the application threads and the GC thread run concurrently, an object that is live at the beginning of a collection, and which the GC thread has marked as live, may die by the end of the collection. Such objects are referred to as floating garbage. The amount of floating garbage can be inferred if a full compacting collection occurs immediately following a concurrent collection: any reduction in the heap size is due to floating garbage.

  1. The parallel collectors seem to use as many garbage collector (GC) threads as there are processors on the machine. How can I ask for more or fewer GC threads?
The number of GC threads is controlled with the option
-XX:ParallelGCThreads=<number_of_GC_threads>
  1. Why does fragmentation occur with the concurrent low pause collector?
The concurrent low pause collector normally does not move objects during a garbage collection. Fragmentation occurs when live objects are interspersed with the free space left as the result of the collection. The exception is when a non-concurrent, full collection occurs. In that case the application is stopped during the collection, the live objects are compacted to one end of the generation, and all the free space resides in a single contiguous piece.
  1. What options should be used with the throughput collector?
The correct options to use depend on your application. Here are a few typical uses, but none of these may be best for your application.
Server application running alone on a large multi-processor server with 4 GB of physical memory.
    #java -server -XX:+AggressiveHeap
Two application instances running on a large multi-processor server with 4 GB of physical memory. Each Java application instance is allocated a part of total system memory by an explicit specification of the maximum and minimum heap sizes.
    #java -server -XX:+AggressiveHeap -Xms1024m -Xmx1024m
Example without using the AggressiveHeap flag:
    #java -server -XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xms1024m -Xmx1024m
  1. What options should I use with the concurrent low pause collector?
The correct options to use depend on your application. Here are a few typical uses, but none of these may be best for your application.
Server application running on a uniprocessor system with 1 GB of physical memory.
    #java -Xmx512m -Xms512m -XX:MaxNewSize=24m -XX:NewSize=24m -XX:+UseConcMarkSweepGC
Server application running on a multiprocessor system with 1 GB of physical memory, using the parallel minor collection option.
    #java -Xmx512m -Xms512m -XX:MaxNewSize=24m -XX:NewSize=24m -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC
  1. What are the default settings for the concurrent low pause collector?
The default heap size for the concurrent low pause collector is the same as for the default collector. The other parameters are set as described below. These settings have been shown to work well for an application that has mostly very short lived data plus some data that is very long lived. Some of the options require a computation, which is enclosed in angle brackets (<>); two depend on the number of CPUs on the machine (#cpus).
# enable the concurrent low pause collector
-XX:+UseConcMarkSweepGC

# use parallel threads
-XX:+UseParNewGC
-XX:ParallelGCThreads=<#cpus < 8 ? #cpus : 3 + ((5 * #cpus) / 8) >
-XX:+CMSParallelRemarkEnabled

# size young generation for short pauses
-XX:NewSize=4m
-XX:MaxNewSize=< 4m * ParallelGCThreads >

# promote all live young generation objects
-XX:MaxTenuringThreshold=0
-XX:SurvivorRatio=1024

It is also recommended that a heap size be used that is 20-30% larger than that which would be used with the default collector.
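The ParallelGCThreads computation from the settings above, written out in code (the method name is invented for illustration): for fewer than 8 CPUs, use one thread per CPU; otherwise use 3 plus 5/8 of the CPU count.

```java
// ParallelGCThreads value per the expression above:
// #cpus < 8 ? #cpus : 3 + ((5 * #cpus) / 8)
public class GcThreads {
    static int parallelGcThreads(int cpus) {
        return cpus < 8 ? cpus : 3 + (5 * cpus) / 8;
    }

    public static void main(String[] args) {
        System.out.println(parallelGcThreads(4));  // one thread per CPU
        System.out.println(parallelGcThreads(16)); // scaled back on big machines
    }
}
```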

  1. What options should I use with the incremental low pause collector?
The correct options to use depend on your application. Here are a few typical uses, but none of these may be best for your application.
Server application with 1GB of physical memory.
    #java -server -Xincgc -XX:NewSize=64m -XX:MaxNewSize=64m -Xms512m -Xmx512m
The same application if full collections are occurring, which indicates that the tenured generation is not being collected incrementally fast enough.
    #java -server -Xincgc -XX:NewSize=24m -XX:MaxNewSize=24m -Xms512m -Xmx512m
    Draft version: February 6, 2003
    Copyright © 2003 Sun Microsystems, Inc. All Rights Reserved.
