Member "haproxy-2.0.8/doc/internals/body-parsing.txt" (23 Oct 2019, 8524 Bytes) of package /linux/misc/haproxy-2.0.8.tar.gz:


2014/04/16 - Pointer assignments during processing of the HTTP body

In HAProxy, a struct http_msg is a descriptor for an HTTP message, which stores
the state of an HTTP parser at any given instant, relative to a buffer which
contains part of the message being inspected.

Currently, an http_msg holds a few pointers and offsets to some important
locations in a message depending on the state the parser is in. Some of these
pointers and offsets may move when data are inserted into or removed from the
buffer, others won't move.

An important point is that the state of the parser only reflects what the
parser is reading, and not at all what is being done with the message (e.g.
forwarding).

For an HTTP message <msg> and a buffer <buf>, we have the following elements
to work with :


Buffer :
--------

buf.size : the allocated size of the buffer. A message cannot be larger than
           this size. In general, a message will even be smaller because the
           size is almost always reduced by global.maxrewrite bytes.

buf.data : memory area containing the part of the message being worked on. This
           area is exactly <buf.size> bytes long. It should be seen as a sliding
           window over the message, but in terms of implementation, it's closer
           to a wrapping window. For ease of processing, new messages (requests
           or responses) are aligned to the beginning of the buffer so that they
           never wrap and common string processing functions can be used.

buf.p    : memory pointer (char *) to the beginning of the buffer as the parser
           understands it. It commonly refers to the first character of an HTTP
           request or response, but during forwarding, it can point to other
           locations. This pointer always points to a location in <buf.data>.

buf.i    : number of bytes after <buf.p> that are available in the buffer. If
           <buf.p + buf.i> exceeds <buf.data + buf.size>, then the pending data
           wrap at the end of the buffer and continue at <buf.data>.

buf.o    : number of bytes already processed before <buf.p> that are pending
           for departure. These bytes may leave at any instant once a connection
           is established. They may wrap before <buf.data>, in which case they
           start before <buf.data + buf.size>.

It's common to call the part between buf.p and buf.p+buf.i the input buffer, and
the part between buf.p-buf.o and buf.p the output buffer. This design permits
efficient forwarding without copies. As a result, forwarding one byte from the
input buffer to the output buffer only consists of :
        - incrementing buf.p
        - incrementing buf.o
        - decrementing buf.i

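The three steps above can be sketched in C as follows. This is only an
illustration: the struct layout and the buf_forward() helper are hypothetical
stand-ins built from the field descriptions in this section, not HAProxy's
actual definitions. The wrapping of buf.p at the end of the storage area is
made explicit here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer modelled on the fields described above. */
struct buf {
    char  *data;  /* start of the storage area */
    size_t size;  /* allocated size of <data> */
    char  *p;     /* beginning of the input part, always inside <data> */
    size_t i;     /* input bytes available starting at <p> */
    size_t o;     /* output bytes pending for departure before <p> */
};

/* Move <n> bytes from the input part to the output part without copying. */
static void buf_forward(struct buf *b, size_t n)
{
    size_t ofs;

    assert(n <= b->i);                    /* cannot forward more than is present */
    ofs = (size_t)(b->p - b->data) + n;   /* new position of <p> in the storage */
    b->p = b->data + (ofs % b->size);     /* wrap at the end of the storage */
    b->o += n;
    b->i -= n;
}
```

Forwarding one byte is then buf_forward(&b, 1). No payload byte is touched,
only the three counters change.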

Message :
---------
Unless stated otherwise, all values are relative to <buf.p>, and are always
between 0 and <buf.i>. These values are relative offsets and they do not need
to take wrapping into account, they are used as if the buffer were an
infinite-length sliding window. The buffer management functions handle the
wrapping automatically.

msg.next : points to the next byte to inspect. This offset is automatically
           adjusted when inserting/removing some headers. In data states, it is
           automatically adjusted to the number of bytes already inspected.

msg.sov  : start of value. First character of the header's value in the header
           states, start of the body in the data states. Strictly positive
           values indicate that headers were not forwarded yet (<buf.p> is
           before the start of the body), and zero or negative values are seen
           after headers are forwarded (<buf.p> is at or past the start of the
           body). The value stops changing when data start to leave the buffer
           (in order to avoid integer overflows). So the maximum possible range
           is -<buf.size> to +<buf.size>. This offset is automatically adjusted
           when inserting or removing some headers. It is useful to rewind the
           request buffer to the beginning of the body at any phase. The
           response buffer does not really use it since it is immediately
           forwarded to the client.

msg.sol  : start of line. Points to the beginning of the current header line
           while parsing headers. It is cleared to zero in the BODY state,
           and contains exactly the number of bytes comprising the preceding
           chunk size in the DATA state (which can be zero), so that the sum of
           msg.sov + msg.sol always points to the beginning of data for all
           states starting with DATA. For chunked encoded messages, this sum
           always corresponds to the beginning of the current chunk of data as
           it appears in the buffer, or to be more precise, it corresponds to
           the first of the remaining bytes of chunked data to be inspected. In
           TRAILERS state, it contains the length of the last parsed part of
           the trailer headers.

msg.eoh  : end of headers. Points to the CRLF (or LF) preceding the body and
           marking the end of headers. It is where new headers are appended.
           This offset is automatically adjusted when inserting/removing some
           headers. It always contains the size of the headers excluding the
           trailing CRLF even after headers have been forwarded.

msg.eol  : end of line. Points to the CRLF or LF of the current header line
           being inspected during the various header states. In data states, it
           holds the trailing CRLF length (1 or 2) so that msg.eoh + msg.eol
           always equals the exact header length. It is not affected during data
           states nor by forwarding.

The beginning of the message headers can always be found this way even after
headers or data have been forwarded, provided that everything is still present
in the buffer :

            headers = buf.p + msg->sov - msg->eoh - msg->eol

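The formula above can be sketched in C as follows, with the wrapping of the
relative offset handled explicitly. The struct definitions and the helper
names buf_ptr() and msg_headers_start() are assumptions made for this
illustration; HAProxy's own buffer management functions are not reproduced
here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the descriptors discussed above. */
struct buf { char *data; size_t size; char *p; };
struct msg { int sov, eoh, eol; };

/* Turn a signed offset relative to <buf.p> into a pointer inside <buf.data>,
 * wrapping in either direction.
 */
static char *buf_ptr(const struct buf *b, long ofs)
{
    long size = (long)b->size;
    long pos  = ((long)(b->p - b->data) + ofs) % size;

    if (pos < 0)
        pos += size;   /* went before <data>: wrap to the end of the storage */
    return b->data + pos;
}

/* headers = buf.p + msg->sov - msg->eoh - msg->eol */
static char *msg_headers_start(const struct buf *b, const struct msg *m)
{
    return buf_ptr(b, (long)m->sov - m->eoh - m->eol);
}
```

For example, with 22 bytes of headers (eoh = 20, eol = 2), the result is the
same whether buf.p still points to the first header byte (sov = 22) or has
already advanced 5 bytes into the body (sov = -5).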

Message length :
----------------
msg.chunk_len : number of bytes of the current chunk or total message body
                remaining to be inspected after msg.next. It is automatically
                incremented when parsing a chunk size, and decremented as data
                are forwarded.

msg.body_len  : total message body length, for logging. Equals Content-Length
                when used, otherwise is the sum of all correctly parsed chunks.

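The accounting of these two counters in chunked mode can be sketched as
follows. The struct and both helper names are hypothetical and only mirror
the rules stated above: a parsed chunk size increases both counters, and
forwarding data only decreases chunk_len.

```c
#include <assert.h>

/* Hypothetical holder for the two length counters described above. */
struct msg_len {
    unsigned long long chunk_len;  /* bytes of the current chunk still to come */
    unsigned long long body_len;   /* total body length seen so far, for logs */
};

/* A chunk size line was parsed and announced <size> bytes of data. */
static void msg_chunk_parsed(struct msg_len *m, unsigned long long size)
{
    m->chunk_len += size;
    m->body_len  += size;
}

/* <n> bytes of body data were forwarded. */
static void msg_data_forwarded(struct msg_len *m, unsigned long long n)
{
    assert(n <= m->chunk_len);  /* cannot forward more than was announced */
    m->chunk_len -= n;
}
```

After a 100-byte chunk is parsed and fully forwarded, then a 50-byte chunk is
parsed and 20 bytes of it are forwarded, chunk_len is 30 and body_len is 150.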

Message state :
---------------
msg.msg_state contains the current parser state, one of HTTP_MSG_*. The state
indicates what byte is expected at msg->next.

HTTP_MSG_BODY       : all headers have been parsed, parsing of body has not
                      started yet.

HTTP_MSG_100_SENT   : parsing of body has started. If a 100-Continue was needed
                      it has already been sent.

HTTP_MSG_DATA       : some bytes remain for either the whole body when the
                      message size is determined by Content-Length, or for
                      the current chunk in chunked-encoded mode.

HTTP_MSG_CHUNK_CRLF : msg->next points to the CRLF after the current data chunk.

HTTP_MSG_TRAILERS   : msg->next points to the beginning of a possibly empty
                      trailer line after the final empty chunk.

HTTP_MSG_DONE       : all the Content-Length data has been inspected, or the
                      final CRLF after trailers has been reached.

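As an illustration of how the DATA state relates to its neighbours, here is a
minimal sketch. The enum only lists the states described above, its numeric
values are arbitrary and do not match HAProxy's real enumeration, and
state_after_data() is a hypothetical helper derived from the descriptions,
not a function of the actual parser.

```c
#include <assert.h>

/* Body-related states listed above; the values are arbitrary in this sketch. */
enum msg_state {
    HTTP_MSG_BODY,
    HTTP_MSG_100_SENT,
    HTTP_MSG_DATA,
    HTTP_MSG_CHUNK_CRLF,
    HTTP_MSG_TRAILERS,
    HTTP_MSG_DONE
};

/* State expected once some data have been inspected in HTTP_MSG_DATA:
 *  - bytes remaining              : stay in DATA
 *  - chunk exhausted (chunked)    : the chunk's trailing CRLF comes next
 *  - body exhausted (non-chunked) : the message is complete
 */
static enum msg_state state_after_data(unsigned long long chunk_len, int chunked)
{
    assert(chunked == 0 || chunked == 1);
    if (chunk_len > 0)
        return HTTP_MSG_DATA;
    return chunked ? HTTP_MSG_CHUNK_CRLF : HTTP_MSG_DONE;
}
```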

Message forwarding :
--------------------
Forwarding part of a message consists of advancing buf.p up to the point where
it points to the byte following the last one to be forwarded. This can be done
inline if enough bytes are present in the buffer, or in multiple steps if more
buffers need to be forwarded (possibly including splicing). Thus by definition,
after a block has been scheduled for forwarding, msg->next and msg->sov must be
reset.

The communication channel between the producer and the consumer holds a counter
of extra bytes remaining to be forwarded directly without consulting analysers,
after buf.p. This counter is called to_forward. It commonly holds the part of
the advertised chunk length or content-length that does not fit in the buffer.
For example, if 2000 bytes are to be forwarded, and 10 bytes are present after
buf.p as reported by buf.i, then both buf.o and buf.p will advance by 10, buf.i
will be reset, and to_forward will be set to 1990 so that in total, 2000 bytes
will be forwarded. At the end of the forwarding, buf.p will point to the first
byte to be inspected after the 2000 forwarded bytes.
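
The 2000-byte example can be sketched as follows. The struct chn and the
chn_forward() helper are hypothetical simplifications written for this note:
buf.p is represented by its offset inside the storage area, and only the
counters discussed in this section are kept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel: a buffer reduced to its counters plus to_forward. */
struct chn {
    size_t size;                    /* buf.size */
    size_t p_ofs;                   /* offset of buf.p inside buf.data */
    size_t i;                       /* buf.i */
    size_t o;                       /* buf.o */
    unsigned long long to_forward;  /* bytes to forward without analysis */
};

/* Schedule <bytes> bytes for forwarding. Whatever is already present in the
 * input part moves to the output part at once, the rest is left to the
 * to_forward counter.
 */
static void chn_forward(struct chn *c, unsigned long long bytes)
{
    size_t now = (c->i < bytes) ? c->i : (size_t)bytes;

    c->p_ofs = (c->p_ofs + now) % c->size;  /* buf.p advances, with wrapping */
    c->o += now;
    c->i -= now;
    c->to_forward += bytes - now;
}
```

With buf.i = 10, calling chn_forward(&c, 2000) advances buf.p and buf.o by 10,
leaves buf.i at 0 and sets to_forward to 1990.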