Performance tuning
max-spread-checks
By default, haproxy tries to spread the start of health checks across the
smallest health check interval of all the servers in a farm. The principle is
to avoid hammering services running on the same server. But when using large
check intervals (10 seconds or more), the last servers in the farm take some
time before starting to be tested, which can be a problem. This parameter is
used to enforce an upper bound on the delay between the first and the last
check, even if the servers' check intervals are larger. Servers running with
shorter intervals still have their own intervals respected.
maxconn
Sets the maximum per-process number of concurrent connections. It
is equivalent to the command-line argument "-n". Proxies will stop accepting
connections when this limit is reached. The "ulimit-n" parameter is
automatically adjusted according to this value. See also "ulimit-n". Note:
the "select" poller cannot reliably use more than 1024 file descriptors on
some platforms. If your platform only supports select and reports "select
FAILED" on startup, you need to reduce maxconn until it works (slightly
below 500 in general). If this value is not set, it will default to the value
set in DEFAULT_MAXCONN at build time (reported in haproxy -vv) if no memory
limit is enforced, or will be computed based on the memory limit, the buffer
size, memory allocated to compression, SSL cache size, and use or not of SSL
and the associated maxsslconn (which can also be automatic).
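As a minimal sketch of the setting described above, the fragment below sets a
process-wide limit in the global section. The value 20000 is an arbitrary
illustration, not a recommendation; it has to be sized against the memory
actually available.

```haproxy
global
    # Illustrative value only: accept at most 20000 concurrent connections
    # per process. "ulimit-n" is then adjusted automatically.
    maxconn 20000
```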
maxconnrate
Sets the maximum per-process number of connections per second.
Proxies will stop accepting connections when this limit is reached. It can be
used to limit the global capacity regardless of each frontend capacity. It is
important to note that this can only be used as a service protection measure,
as there will not necessarily be a fair share between frontends when the
limit is reached, so it's a good idea to also limit each frontend to some
value close to its expected share. Also, lowering tune.maxaccept can improve
fairness.
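The advice above (a global cap plus a per-frontend share) could be sketched as
follows. The frontend and backend names, the bind addresses and all numbers
are hypothetical; "rate-limit sessions" is used here as the per-frontend
limit, on the assumption that a sessions-per-second cap is close enough to
the intended share.

```haproxy
global
    maxconnrate 1000          # process-wide cap, in connections per second
    tune.maxaccept 16         # smaller accept batches, for better fairness

frontend fe_web               # hypothetical frontend
    bind :80
    rate-limit sessions 700   # roughly its expected share of the global cap
    default_backend be_web    # hypothetical backend, not shown

frontend fe_api               # hypothetical frontend
    bind :8080
    rate-limit sessions 300
    default_backend be_api    # hypothetical backend, not shown
```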
maxcomprate
Sets the maximum per-process input compression rate, in kilobytes
per second. For each session, if the maximum is reached, the compression
level will be decreased during the session. If the maximum is reached at the
beginning of a session, the session will not compress at all. If the maximum
is not reached, the compression level will be increased up to
tune.comp.maxlevel. A value of zero means there is no limit; this is the
default value.
maxcompcpuusage
Sets the maximum CPU usage HAProxy can reach before stopping the compression
for new requests or decreasing the compression level of current requests.
It works like 'maxcomprate' but measures CPU usage instead of incoming data
bandwidth. The value is expressed in percent of the CPU used by haproxy. In
case of multiple processes (nbproc > 1), each process manages its individual
usage. A value of 100 disables the limit. The default value is 100. Setting
a lower value will prevent the compression work from slowing the whole
process down and from introducing high latencies.
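A possible combination of the two compression limits is sketched below. Both
values are arbitrary illustrations chosen for this example.

```haproxy
global
    # Throttle compression once 1024 kB/s of input is being compressed.
    maxcomprate 1024
    # Also back off once haproxy uses more than 80% of the CPU.
    maxcompcpuusage 80
```

Whichever limit is hit first lowers the compression level of running sessions
and disables compression for new ones, as described above.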
maxpipes
Sets the maximum per-process number of pipes. Currently, pipes
are only used by kernel-based TCP splicing. Since a pipe contains two file
descriptors, the "ulimit-n" value will be increased accordingly. The default
value is maxconn/4, which seems to be more than enough for most heavy usages.
The splice code dynamically allocates and releases pipes, and can fall back
to standard copy, so setting this value too low may only impact performance.
maxsessrate
Sets the maximum per-process number of sessions per second.
Proxies will stop accepting connections when this limit is reached. It can be
used to limit the global capacity regardless of each frontend capacity. It is
important to note that this can only be used as a service protection measure,
as there will not necessarily be a fair share between frontends when the
limit is reached, so it's a good idea to also limit each frontend to some
value close to its expected share. Also, lowering tune.maxaccept can improve
fairness.
maxsslconn
Sets the maximum per-process number of concurrent SSL connections.
By default there is no SSL-specific limit, which means that the
global maxconn setting will apply to all connections. Setting this limit
avoids having OpenSSL use too much memory and crash when malloc returns NULL
(since it unfortunately does not reliably check for such conditions). Note
that the limit applies both to incoming and outgoing connections, so one
connection which is deciphered then ciphered accounts for 2 SSL connections.
If this value is not set, but a memory limit is enforced, this value will be
automatically computed based on the memory limit, maxconn, the buffer size,
memory allocated to compression, SSL cache size, and use of SSL in either
frontends, backends or both. If neither maxconn nor maxsslconn are specified
when there is a memory limit, haproxy will automatically adjust these values
so that 100% of the connections can be made over SSL with no risk, and will
consider the sides where it is enabled (frontend, backend, both).
maxsslrate
Sets the maximum per-process number of SSL sessions per second.
SSL listeners will stop accepting connections when this limit is reached. It
can be used to limit the global SSL CPU usage regardless of each frontend
capacity. It is important to note that this can only be used as a service
protection measure, as there will not necessarily be a fair share between
frontends when the limit is reached, so it's a good idea to also limit each
frontend to some value close to its expected share. It is also important to
note that the sessions are accounted before they enter the SSL stack and not
after, which also protects the stack against bad handshakes. Also, lowering
tune.maxaccept can improve fairness.
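The SSL limits can be combined with the global connection limit, for instance
as in the sketch below. All three numbers are illustrative assumptions.

```haproxy
global
    maxconn 20000       # all connections, SSL or not
    maxsslconn 8000     # concurrent SSL connections, both sides counted
    maxsslrate 500      # new SSL sessions per second
```

Since a connection that is deciphered and then ciphered again counts twice,
8000 here would cover about 4000 connections when SSL is enabled on both the
frontend and the backend side.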
maxzlibmem
Sets the maximum amount of RAM in megabytes per process usable by zlib.
When the maximum amount is reached, future sessions will not compress as long
as RAM is unavailable. When set to 0, there is no limit.
The default value is 0. The value is available in bytes on the UNIX socket
with "show info" on the line "MaxZlibMemUsage"; the memory used by zlib is
reported as "ZlibMemUsage", also in bytes.
noepoll
Disables the use of the "epoll" event polling system on Linux. It is
equivalent to the command-line argument "-de". The next polling system
used will generally be "poll". See also "nopoll".
nokqueue
Disables the use of the "kqueue" event polling system on BSD. It is
equivalent to the command-line argument "-dk". The next polling system
used will generally be "poll". See also "nopoll".
nopoll
Disables the use of the "poll" event polling system. It is equivalent to the
command-line argument "-dp". The next polling system used will be "select".
It should never be needed to disable "poll" since it's available on all
platforms supported by HAProxy. See also "nokqueue" and "noepoll".
nosplice
Disables the use of kernel TCP splicing between sockets on Linux. It is
equivalent to the command-line argument "-dS". Data will then be copied
using conventional and more portable recv/send calls. Kernel TCP splicing is
limited to some very recent instances of kernel 2.6. Most versions between
2.6.25 and 2.6.28 are buggy and will forward corrupted data, so they must not
be used. This option makes it easier to globally disable kernel splicing in
case of doubt. See also "option splice-auto", "option splice-request" and
"option splice-response".
nogetaddrinfo
Disables the use of getaddrinfo(3) for name resolving. It is equivalent to
the command line argument "-dG". Deprecated gethostbyname(3) will be used.
noreuseport
Disables the use of SO_REUSEPORT - see socket(7). It is equivalent to the
command line argument "-dR".
spread-checks <0..50, in percent>
Sometimes it is desirable to avoid sending agent and health checks to
servers at exact intervals, for instance when many logical servers are
located on the same physical server. With the help of this parameter, it
becomes possible to add some randomness in the check interval between 0
and +/- 50%. A value between 2 and 5 seems to show good results. The
default value remains at 0.
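A short sketch of check spreading follows. The "spread-checks" value is taken
from the range suggested above; the "max-spread-checks" value is assumed to be
a delay in milliseconds, which the text in this section does not state.

```haproxy
global
    # Add up to +/- 5% of randomness to each check interval.
    spread-checks 5
    # Assumption: value in milliseconds. Start all checks within 5 seconds
    # even when server check intervals are larger.
    max-spread-checks 5000
```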
tune.buffers.limit
Sets a hard limit on the number of buffers which may be allocated per process.
The default value is zero which means unlimited. The minimum non-zero value
will always be greater than "tune.buffers.reserve" and should ideally always
be about twice as large. Forcing this value can be particularly useful to
limit the amount of memory a process may take, while retaining a sane
behaviour. When this limit is reached, sessions which need a buffer wait for
another one to be released by another session. Since buffers are dynamically
allocated and released, the waiting time is very short and not perceptible
provided that limits remain reasonable. In fact sometimes reducing the limit
may even increase performance by increasing the CPU cache's efficiency. Tests
have shown good results on average HTTP traffic with a limit of 1/10 of the
expected global maxconn setting, which also significantly reduces memory
usage. The memory savings come from the fact that a number of connections
will not allocate 2*tune.bufsize. It is best not to touch this value unless
advised to do so by an haproxy core developer.
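For illustration only, the 1/10 ratio mentioned above would translate into
something like the fragment below. The maxconn value is an arbitrary
assumption, and the warning about not touching this setting without advice
still applies.

```haproxy
global
    maxconn 20000
    # About 1/10 of maxconn, and well above tune.buffers.reserve.
    tune.buffers.limit 2000
```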
tune.buffers.reserve
Sets the number of buffers which are pre-allocated and reserved for use only
during memory shortage conditions resulting in failed memory allocations. The
minimum value is 2 and is also the default. There is no reason a user would
want to change this value, it's mostly aimed at haproxy core developers.
tune.bufsize
Sets the buffer size to this size (in bytes). Lower values allow more
sessions to coexist in the same amount of RAM, and higher values allow some
applications with very large cookies to work. The default value is 16384 and
can be changed at build time. It is strongly recommended not to change this
from the default value, as very low values will break some services such as
statistics, and values larger than default size will increase memory usage,
possibly causing the system to run out of memory. At least the global maxconn
parameter should be decreased by the same factor as this one is increased.
If an HTTP request is larger than (tune.bufsize - tune.maxrewrite), haproxy will
return an HTTP 400 (Bad Request) error. Similarly, if an HTTP response is larger
than this size, haproxy will return HTTP 502 (Bad Gateway).
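The relation between the buffer size, the rewrite reserve and maxconn can be
sketched as below. The starting point of 20000 connections is a hypothetical
figure used only to show the scaling rule.

```haproxy
global
    # Twice the 16384 default, e.g. for very large cookies.
    tune.bufsize 32768
    # Space reserved for header rewriting.
    tune.maxrewrite 1024
    # Halved from a hypothetical 20000, since bufsize was doubled.
    maxconn 10000
```

With these values the largest accepted HTTP request is
32768 - 1024 = 31744 bytes; anything larger gets a 400 response.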
tune.chksize
Sets the check buffer size to this size (in bytes). Higher values may help
find string or regex patterns in very large pages, though doing so may imply
more memory and CPU usage. The default value is 16384 and can be changed at
build time. It is not recommended to change this value, but to use better
checks whenever possible.
tune.comp.maxlevel
Sets the maximum compression level. The compression level affects CPU
usage during compression. Each session using compression initializes the
compression algorithm with this value. The default value is 1.
tune.http.cookielen
Sets the maximum length of captured cookies. This is the maximum value that
the "capture cookie xxx len yyy" will be allowed to take, and any higher value
will automatically be truncated to this one. It is important not to set too
high a value because all cookie captures still allocate this size whatever
their configured value (they share the same pool). This value applies per
request and per response, so the memory allocated is twice this value per
connection. When not specified, the limit is set to 63 characters. It is
recommended not to change this value.
tune.http.maxhdr
Sets the maximum number of headers in a request. When a request comes with a
number of headers greater than this value (including the first line), it is
rejected with a "400 Bad Request" status code. Similarly, too large responses
are blocked with "502 Bad Gateway". The default value is 101, which is enough
for all usages, considering that the widely deployed Apache server uses the
same limit. It can be useful to push this limit further to temporarily allow
a buggy application to work until it gets fixed. Keep in mind that each
new header consumes 32 bits of memory for each session, so don't push this
limit too high.
tune.idletimer
Sets the duration after which haproxy will consider that an empty buffer is
probably associated with an idle stream. This is used to optimally adjust
some packet sizes while forwarding large and small data alternately. The
decision to use splice() or to send large buffers in SSL is modulated by this
parameter. The value is in milliseconds between 0 and 65535. A value of zero
means that haproxy will not try to detect idle streams. The default is 1000,
which seems to correctly detect end user pauses (e.g. reading a page before
clicking). There should be no reason to change this value. Please check
tune.ssl.maxrecord below.
tune.lua.forced-yield
This directive forces the Lua engine to yield each time the configured number
of instructions has been executed. This permits interrupting a long script and
allows the HAProxy scheduler to process other tasks like accepting connections
or forwarding traffic. The default value is 10000 instructions. If HAProxy
often executes some Lua code but more reactivity is required, this value can
be lowered. If the Lua code is quite long and its result is absolutely
required to process the data, the value can be increased.
tune.lua.maxmem
Sets the maximum amount of RAM in megabytes per process usable by Lua. By
default it is zero which means unlimited. It is important to set a limit to
ensure that a bug in a script will not result in the system running out of
memory.
tune.lua.session-timeout
This is the execution timeout for the Lua sessions. This is useful for
preventing infinite loops or spending too much time in Lua. This timeout
counts only the pure Lua runtime. If the Lua code sleeps, the sleep time is
not taken into account. The default timeout is 4s.
tune.lua.task-timeout
The purpose is the same as "tune.lua.session-timeout", but this timeout is
dedicated to tasks. By default, this timeout isn't set because a task may
remain alive for the whole lifetime of HAProxy, for example a task used to
check servers.
tune.lua.service-timeout
This is the execution timeout for the Lua services. This is useful for
preventing infinite loops or spending too much time in Lua. This timeout
counts only the pure Lua runtime. If the Lua code sleeps, the sleep time is
not taken into account. The default timeout is 4s.
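The Lua tunables above might be combined as in this sketch. All values are
illustrative assumptions, and the timeouts are written with an "s" suffix on
the assumption that they accept the same notation as the "4s" default quoted
above.

```haproxy
global
    # Cap Lua memory at 64 MB per process instead of the unlimited default.
    tune.lua.maxmem 64
    # Yield twice as often as the default of 10000 instructions.
    tune.lua.forced-yield 5000
    # Abort sessions and services spending more than 2s of pure Lua runtime.
    tune.lua.session-timeout 2s
    tune.lua.service-timeout 2s
```

"tune.lua.task-timeout" is deliberately left unset here so that long-lived
tasks are not interrupted.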
tune.maxaccept
Sets the maximum number of consecutive connections a process may accept in a
row before switching to other work. In single process mode, higher numbers
give better performance at high connection rates. However in multi-process
modes, keeping a bit of fairness between processes generally improves
performance. This value applies individually to each listener, so
that the number of processes a listener is bound to is taken into account.
This value defaults to 64. In multi-process mode, it is divided by twice
the number of processes the listener is bound to. Setting this value to -1
completely disables the limitation. It should normally not be needed to tweak
this value.
tune.maxpollevents
Sets the maximum number of events that can be processed at once in a call to
the polling system. The default value is adapted to the operating system. It
has been noticed that reducing it below 200 tends to slightly decrease
latency at the expense of network bandwidth, and increasing it above 200
tends to trade latency for slightly increased bandwidth.
tune.maxrewrite
Sets the reserved buffer space to this size in bytes. The reserved space is
used for header rewriting or appending. The first reads on sockets will never
fill more than bufsize-maxrewrite. Historically it has defaulted to half of
bufsize, though that does not make much sense since there are rarely large
numbers of headers to add. Setting it too high prevents processing of large
requests or responses. Setting it too low prevents addition of new headers
to already large requests or to POST requests. It is generally wise to set it
to about 1024. It is automatically readjusted to half of bufsize if it is
larger than that. This means you don't have to worry about it when changing
bufsize.
tune.pattern.cache-size
Sets the size of the pattern lookup cache, in entries. This is an LRU
cache which remembers previous lookups and their results. It is used by ACLs
and maps on slow pattern lookups, namely the ones using the "sub", "reg",
"dir", "dom", "end", "bin" match methods as well as the case-insensitive
strings. It applies to pattern expressions which means that it will be able
to memorize the result of a lookup among all the patterns specified on a
configuration line (including all those loaded from files). It automatically
invalidates entries which are updated using HTTP actions or on the CLI. The
default cache size is set to 10000 entries, which limits its footprint to
about 5 MB on 32-bit systems and 8 MB on 64-bit systems. There is a very low
risk of collision in this cache, which is in the order of the size of the
cache divided by 2^64. Typically, at 10000 requests per second with the
default cache size of 10000 entries, there's 1% chance that a brute force
attack could cause a single collision after 60 years, or 0.1% after 6 years.
This is considered much lower than the risk of a memory corruption caused by
aging components. If this is not acceptable, the cache can be disabled by
setting this parameter to 0.
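If the collision risk described above is not acceptable, disabling the cache
is a one-line change, sketched here:

```haproxy
global
    # 0 disables the pattern lookup cache entirely.
    tune.pattern.cache-size 0
```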
tune.pipesize
Sets the kernel pipe buffer size to this size (in bytes). By default, pipes
are the default size for the system. But sometimes when using TCP splicing,
it can improve performance to increase pipe sizes, especially if it is
suspected that pipes are not filled and that many calls to splice() are
performed. This has an impact on the kernel's memory footprint, so this must
not be changed if impacts are not understood.
tune.rcvbuf.client
tune.rcvbuf.server
Forces the kernel socket receive buffer size on the client or the server side
to the specified value in bytes. This value applies to all TCP/HTTP frontends
and backends. It should normally never be set, and the default size (0) lets
the kernel autotune this value depending on the amount of available memory.
However it can sometimes help to set it to very low values (eg: 4096) in
order to save kernel memory by preventing it from buffering too large amounts
of received data. Lower values will significantly increase CPU usage though.
tune.recv_enough
Haproxy uses some hints to detect that a short read indicates the end of the
socket buffers. One of them is that a read returns more than a threshold
number of bytes, which defaults to 10136 (7 segments of 1448 each). This default value
may be changed by this setting to better deal with workloads involving lots
of short messages such as telnet or SSH sessions.
tune.sndbuf.client
tune.sndbuf.server
Forces the kernel socket send buffer size on the client or the server side to
the specified value in bytes. This value applies to all TCP/HTTP frontends
and backends. It should normally never be set, and the default size (0) lets
the kernel autotune this value depending on the amount of available memory.
However it can sometimes help to set it to very low values (e.g. 4096) in
order to save kernel memory by preventing it from buffering too large amounts
of outgoing data. Lower values will significantly increase CPU usage though.
Another use case is to prevent write timeouts with extremely slow clients due
to the kernel waiting for a large part of the buffer to be read before
notifying haproxy again.
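A sketch of the memory-saving case mentioned above, applied to the client
side only. The 4096 figure comes from the text; whether it suits a given
workload is an assumption to verify, since it raises CPU usage.

```haproxy
global
    # Small kernel buffers on client-facing sockets to save kernel memory.
    tune.rcvbuf.client 4096
    tune.sndbuf.client 4096
    # Server-side buffers are left unset so the kernel keeps autotuning them.
```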
tune.ssl.cachesize
Sets the size of the global SSL session cache, in a number of blocks. A block
is large enough to contain an encoded session without peer certificate.
An encoded session with peer certificate is stored in multiple blocks
depending on the size of the peer certificate. A block uses approximately
200 bytes of memory. The default value may be forced at build time, otherwise
defaults to 20000. When the cache is full, the most idle entries are purged
and reassigned. Higher values reduce the occurrence of such a purge, hence
the number of CPU-intensive SSL handshakes by ensuring that all users keep
their session as long as possible. All entries are pre-allocated upon startup
and are shared between all processes if "nbproc" is greater than 1. Setting
this value to 0 disables the SSL session cache.
tune.ssl.force-private-cache
This boolean disables SSL session cache sharing between all processes. It
should normally not be used since it will force many renegotiations due to
clients hitting a random process. But it may be required on some operating
systems where none of the SSL cache synchronization methods can be used. In
this case, adding a first layer of hash-based load balancing before the SSL
layer might limit the impact of the lack of session sharing.
tune.ssl.lifetime
Sets how long a cached SSL session may remain valid. This time is expressed
in seconds and defaults to 300 (5 min). It is important to understand that it
does not guarantee that sessions will last that long, because if the cache is
full, the longest idle sessions will be purged despite their configured
lifetime. The real usefulness of this setting is to prevent sessions from
being used for too long.
tune.ssl.maxrecord
Sets the maximum number of bytes passed to SSL_write() at a time. The default
value of 0 means there is no limit. Over SSL/TLS, the client can decipher the
data only once it has received a full record. With large records, it means
that clients might have to download up to 16kB of data before starting to
process them. Limiting the value can improve page load times on browsers
located over high latency or low bandwidth networks. It is suggested to find
optimal values which fit into 1 or 2 TCP segments (generally 1448 bytes over
Ethernet with TCP timestamps enabled, or 1460 when timestamps are disabled),
keeping in mind that SSL/TLS add some overhead. Typical values of 1419 and
2859 gave good results during tests. Use "strace -e trace=write" to find the
best value. Haproxy will automatically switch to this setting after an idle
stream has been detected (see tune.idletimer above).
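Using one of the values quoted above, a configuration aimed at fitting each
record into a single TCP segment might look like this sketch. The right
number depends on the actual path MTU and cipher overhead, so treat 1419 as a
starting point to be checked with strace rather than a fixed answer.

```haproxy
global
    # Keep SSL records within about one 1448-byte TCP segment.
    tune.ssl.maxrecord 1419
    # Default idle detection delay, in milliseconds, shown for clarity.
    tune.idletimer 1000
```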
tune.ssl.default-dh-param
Sets the maximum size of the Diffie-Hellman parameters used for generating
the ephemeral/temporary Diffie-Hellman key in case of DHE key exchange. The
final size will try to match the size of the server's RSA (or DSA) key (e.g.
a 2048-bit temporary DH key for a 2048-bit RSA key), but will not exceed
this maximum value. The default value is 1024. Only 1024 or higher values are
allowed. Higher values will increase the CPU load, and values greater than
1024 bits are not supported by Java 7 and earlier clients. This value is not
used if static Diffie-Hellman parameters are supplied either directly
in the certificate file or by using the ssl-dh-param-file parameter.
tune.ssl.ssl-ctx-cache-size
Sets the size of the cache used to store generated certificates, in
entries. This is an LRU cache. Because generating an SSL certificate
dynamically is expensive, generated certificates are cached. The default cache
size is set to 1000 entries.
tune.vars.global-max-size
tune.vars.proc-max-size
tune.vars.reqres-max-size
tune.vars.sess-max-size
tune.vars.txn-max-size
These five tunables help to manage the maximum amount of memory used by the
variables system. "global" limits the overall amount of memory available for
all scopes. "proc" limits the memory for the process scope, "sess" limits the
memory for the session scope, "txn" for the transaction scope, and "reqres"
limits the memory for each request or response processing.
Memory accounting is hierarchical, meaning more coarse grained limits include
the finer grained ones: "proc" includes "sess", "sess" includes "txn", and
"txn" includes "reqres".
For example, when "tune.vars.sess-max-size" is limited to 100,
"tune.vars.txn-max-size" and "tune.vars.reqres-max-size" cannot exceed
100 either. If we create a variable "txn.var" that contains 100 bytes,
all available space is consumed.
Notice that exceeding the limits at runtime will not result in an error
message, but values might be cut off or corrupted. So make sure to accurately
plan for the amount of space needed to store all your variables.
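The hierarchy described above can be sketched with limits that shrink from
the coarsest scope to the finest. The sizes are arbitrary examples and are
assumed to be expressed in bytes, as the 100-byte example above suggests.

```haproxy
global
    tune.vars.global-max-size 1048576   # all scopes together
    tune.vars.proc-max-size   65536     # process scope, includes "sess"
    tune.vars.sess-max-size   8192      # session scope, includes "txn"
    tune.vars.txn-max-size    4096      # transaction scope, includes "reqres"
    tune.vars.reqres-max-size 2048      # one request or response
```

Keeping each limit no larger than the one enclosing it avoids configuring a
fine-grained limit that could never be reached.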
tune.zlib.memlevel
Sets the memLevel parameter in zlib initialization for each session. It
defines how much memory should be allocated for the internal compression
state. A value of 1 uses minimum memory but is slow and reduces compression
ratio, a value of 9 uses maximum memory for optimal speed. Can be a value
between 1 and 9. The default value is 8.
tune.zlib.windowsize
Sets the window size (the size of the history buffer) as a parameter of the
zlib initialization for each session. Larger values of this parameter result
in better compression at the expense of memory usage. Can be a value between
8 and 15. The default value is 15.
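As a rough sketch of a low-memory compression setup, the fragment below
lowers both zlib parameters and caps zlib memory with "maxzlibmem". The
values are illustrative. The per-stream estimate in the comments uses the
formula given in zlib's own zconf.h,
(1 << (windowBits + 2)) + (1 << (memLevel + 9)) bytes, which ignores a few
kilobytes of small objects and any overhead added by haproxy itself.

```haproxy
global
    # Defaults (memlevel 8, windowsize 15): 131072 + 131072 = 262144 bytes
    # per compressing stream, by zlib's estimate.
    # With the values below: 16384 + 16384 = 32768 bytes per stream.
    tune.zlib.memlevel 5
    tune.zlib.windowsize 12
    # Cap total zlib memory at 64 MB per process.
    maxzlibmem 64
```

The trade-off is a lower compression ratio and speed, as described for both
parameters above.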