HTTP request
First, let's consider this HTTP request :
Line Contents
number
1 GET /serv/login.php?lang=en
&
profile=2 HTTP/1.1
2 Host: www.mydomain.com
3 User-agent: my small browser
4 Accept: image/jpeg, image/gif
5 Accept: image/png
1.2.1. The Request line
Line 1 is the "request line". It is always composed of 3 fields :
- a METHOD : GET
- a URI : /serv/login.php?lang=en&profile=2
- a version tag : HTTP/1.1
All of them are delimited by what the standard calls LWS (linear white spaces),
which are commonly spaces, but can also be tabs or line feeds/carriage returns
followed by spaces/tabs. The method itself cannot contain any colon (':') and
is limited to alphabetic letters. All those various combinations make it
desirable that HAProxy performs the splitting itself rather than leaving it to
the user to write a complex or inaccurate regular expression.
The URI itself can have several forms :
A "relative URI" :
/serv/login.php?lang=en&profile=2
It is a complete URL without the host part. This is generally what is
received by servers, reverse proxies and transparent proxies.An "absolute URI", also called a "URL" :
http://192.168.0.12:8080/serv/login.php?lang=en&profile=2
It is composed of a "scheme" (the protocol name followed by '://'), a host
name or address, optionally a colon (':') followed by a port number, then
a relative URI beginning at the first slash ('/') after the address part.
This is generally what proxies receive, but a server supporting HTTP/1.1
must accept this form too.a star ('*') : this form is only accepted in association with the OPTIONS
method and is not relayable. It is used to inquiry a next hop's
capabilities.an address:port combination : 192.168.0.12:80
This is used with the CONNECT method, which is used to establish TCP
tunnels through HTTP proxies, generally for HTTPS, but sometimes for
other protocols too.
In a relative URI, two sub-parts are identified. The part before the question
mark is called the "path". It is typically the relative path to static objects
on the server. The part after the question mark is called the "query string".
It is mostly used with GET requests sent to dynamic scripts and is very
specific to the language, framework or application in use.
1.2.2. The request headers
The headers start at the second line. They are composed of a name at the
beginning of the line, immediately followed by a colon (':'). Traditionally,
an LWS is added after the colon but that's not required. Then come the values.
Multiple identical headers may be folded into one single line, delimiting the
values with commas, provided that their order is respected. This is commonly
encountered in the "Cookie:" field. A header may span over multiple lines if
the subsequent lines begin with an LWS. In the example in 1.2, lines 4 and 5
define a total of 3 values for the "Accept:" header.
Contrary to a common mis-conception, header names are not case-sensitive, and
their values are not either if they refer to other header names (such as the
"Connection:" header).
The end of the headers is indicated by the first empty line. People often say
that it's a double line feed, which is not exact, even if a double line feed
is one valid form of empty line.
Fortunately, HAProxy takes care of all these complex combinations when indexing
headers, checking values and counting them, so there is no reason to worry
about the way they could be written, but it is important not to accuse an
application of being buggy if it does unusual, valid things.
Important note:
As suggested by RFC2616, HAProxy normalizes headers by replacing line breaks
in the middle of headers by LWS in order to join multi-line headers. This
is necessary for proper analysis and helps less capable HTTP parsers to work
correctly and not to be fooled by such complex constructs.