Tuesday, February 8, 2011

-XX:+UseNUMA

I believe this option has been available since at least HotSpot 17 (we are now on HotSpot 19 with JDK 1.6.0_23). It is meant to improve garbage collection performance on multicore (NUMA) architectures, which according to Sun-Oracle adds up to an overall improvement of between 30 and 40% on SPEC JBB 2005.

But what is NUMA? Well, go look it up on Wikipedia: http://es.wikipedia.org/wiki/NUMA.

A summary of what this option is good for, in combination with -XX:+UseParallelGC (or -XX:+UseParallelOldGC), can be found at http://download.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#numa:


NUMA Collector Enhancements

The Parallel Scavenger garbage collector has been extended to take advantage of the machines with NUMA (Non Uniform Memory Access) architecture. Most modern computers are based on NUMA architecture, in which it takes a different amount of time to access different parts of memory. Typically, every processor in the system has a local memory that provides low access latency and high bandwidth, and remote memory that is considerably slower to access.

In the Java HotSpot Virtual Machine, the NUMA-aware allocator has been implemented to take advantage of such systems and provide automatic memory placement optimizations for Java applications. The allocator controls the eden space of the young generation of the heap, where most of the new objects are created. The allocator divides the space into regions each of which is placed in the memory of a specific node. The allocator relies on a hypothesis that a thread that allocates the object will be the most likely to use the object. To ensure the fastest access to the new object, the allocator places it in the region local to the allocating thread. The regions can be dynamically resized to reflect the allocation rate of the application threads running on different nodes. That makes it possible to increase performance even of single-threaded applications. In addition, "from" and "to" survivor spaces of the young generation, the old generation, and the permanent generation have page interleaving turned on for them. This ensures that all threads have equal access latencies to these spaces on average.

The NUMA-aware allocator is available on the Solaris™ operating system starting in Solaris 9 12/02 and on the Linux operating system starting in Linux kernel 2.6.19 and glibc 2.6.1.

The NUMA-aware allocator can be turned on with the -XX:+UseNUMA flag in conjunction with the selection of the Parallel Scavenger garbage collector. The Parallel Scavenger garbage collector is the default for a server-class machine. The Parallel Scavenger garbage collector can also be turned on explicitly by specifying the -XX:+UseParallelGC option.

NUMA Performance Metrics

When evaluated against the SPEC JBB 2005 benchmark on an 8-chip Opteron machine, NUMA-aware systems showed the following performance increases:

  • 32 bit – About 30 percent increase in performance with NUMA-aware allocator
  • 64 bit – About 40 percent increase in performance with NUMA-aware allocator
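To see whether the flags mentioned in the quoted text are actually active in a running JVM, something like the following sketch could help. It assumes a HotSpot JVM on Java 7 or later (on JDK 6 the bean would have to be obtained differently), and the class name NumaFlagCheck and the helper flagValue are made up for this illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class NumaFlagCheck {

    /**
     * Returns the current value of a HotSpot flag as a string,
     * or "unknown" if this JVM does not expose the flag.
     * (Hypothetical helper, not part of any JDK API.)
     */
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // getVMOption throws IllegalArgumentException for flags the JVM does not know
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseNUMA       = " + flagValue("UseNUMA"));
        System.out.println("UseParallelGC = " + flagValue("UseParallelGC"));
    }
}
```

Running it as `java -XX:+UseParallelGC -XX:+UseNUMA NumaFlagCheck` should show what the JVM really ended up with, which is handy on systems where NUMA support may be disabled at startup.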

Too bad it is not supported on kernel 2.6.18 (the one my production systems run)!

Without having run any tests, I am convinced that in server applications (which do follow the typical pattern of tons of objects created and consumed only by the same thread) the gain will be noticeable in almost every scenario, or at the very least the option will be harmless. In some situations, however, the cure will probably be worse than the disease...
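For anyone who wants to try this out, here is a minimal sketch of the "created and consumed by the same thread" pattern. It is not a rigorous benchmark (no warm-up, and the JIT may optimize part of the work away), and the names ThreadLocalAllocDemo, churn and runAll are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadLocalAllocDemo {

    /** Allocates many short-lived buffers that only the calling thread ever touches. */
    static long churn(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] buf = new byte[256];          // allocated in eden by this thread
            buf[i % buf.length] = (byte) i;      // written by this thread
            sum += buf[i % buf.length];          // read back by this thread
        }
        return sum;
    }

    /** Runs churn() on the given number of threads and adds up the results. */
    static long runAll(int threads, final int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Long> task = () -> churn(iterations);
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        long start = System.nanoTime();
        long result = runAll(threads, 50_000_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum=" + result + " elapsed=" + millis + " ms");
    }
}
```

The idea would be to run it once with `java -XX:+UseParallelGC ThreadLocalAllocDemo` and once with `java -XX:+UseParallelGC -XX:+UseNUMA ThreadLocalAllocDemo` on a NUMA machine and compare the elapsed times. Any difference on a toy like this says little about a real server application, so measuring the real workload is still the only thing that counts.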