"Fossies" - the Fresh Open Source Software Archive

Member "git-2.23.0.windows.1/Documentation/technical/pack-format.txt" (16 Aug 2019, 11944 Bytes) of package /windows/misc/git-2.23.0.windows.1.zip:


Git pack format
===============

== pack-*.pack files have the following format:

   - A header appears at the beginning and consists of the following:

     4-byte signature:
         The signature is: {'P', 'A', 'C', 'K'}

     4-byte version number (network byte order):
         Git currently accepts version number 2 or 3 but
         generates version 2 only.

     4-byte number of objects contained in the pack (network byte order)

     Observation: we cannot have more than 4G versions ;-) and
     more than 4G objects in a pack.

   - The header is followed by a number of object entries, each of
     which looks like this:

     (undeltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     compressed data

     (deltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     20-byte base object name if OBJ_REF_DELTA or a negative relative
         offset from the delta object's position in the pack if this
         is an OBJ_OFS_DELTA object
     compressed delta data

     Observation: the length of each object is encoded in a variable
     length format and is not constrained to 32-bit or anything.

   - The trailer records a 20-byte SHA-1 checksum of all of the above.
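
As a rough illustration of the layout above, the following Python
sketch reads the 12-byte header and checks the trailer. The names
read_pack_header, trailer_ok and empty_pack are made up for this
example and are not part of Git; the sketch also assumes the whole
pack is already held in memory.

```python
import hashlib
import struct

# Hypothetical helper names; this is not Git's own API.

def read_pack_header(pack: bytes):
    """Return (version, object_count) after checking the signature."""
    # ">" selects network byte order: 4-byte signature, two 4-byte integers.
    signature, version, count = struct.unpack(">4sII", pack[:12])
    if signature != b"PACK":
        raise ValueError("not a pack file")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return version, count

def trailer_ok(pack: bytes) -> bool:
    """Compare the 20-byte trailer with the SHA-1 of everything before it."""
    return hashlib.sha1(pack[:-20]).digest() == pack[-20:]

# An empty version 2 pack: just the header followed by its checksum.
body = struct.pack(">4sII", b"PACK", 2, 0)
empty_pack = body + hashlib.sha1(body).digest()
```

Calling read_pack_header(empty_pack) on this 32-byte example yields
version 2 with zero objects, and trailer_ok(empty_pack) holds.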

=== Object types

Valid object types are:

- OBJ_COMMIT (1)
- OBJ_TREE (2)
- OBJ_BLOB (3)
- OBJ_TAG (4)
- OBJ_OFS_DELTA (6)
- OBJ_REF_DELTA (7)

Type 5 is reserved for future expansion. Type 0 is invalid.

=== Deltified representation

Conceptually there are only four object types: commit, tree, tag and
blob. However, to save space, an object could be stored as a "delta" of
another "base" object. These representations are assigned new types
ofs-delta and ref-delta, which are only valid in a pack file.

Both ofs-delta and ref-delta store the "delta" to be applied to
another object (called 'base object') to reconstruct the object. The
difference between them is that ref-delta directly encodes the 20-byte
base object name. If the base object is in the same pack, ofs-delta
encodes the offset of the base object in the pack instead.

The base object could also be deltified if it's in the same pack.
Ref-delta can also refer to an object outside the pack (i.e. the
so-called "thin pack"). When stored on disk however, the pack should
be self-contained to avoid cyclic dependency.

The delta data is a sequence of instructions to reconstruct an object
from the base object. If the base object is deltified, it must be
converted to canonical form first. Each instruction appends more and
more data to the target object until it's complete. There are two
supported instructions so far: one for copying a byte range from the
source object and one for inserting new data embedded in the
instruction itself.

Each instruction has variable length. Instruction type is determined
by bit seven (the most significant bit) of the first octet. The
following diagrams follow the convention in RFC 1951 (Deflate
compressed data format).

==== Instruction to copy from base object

  +----------+---------+---------+---------+---------+-------+-------+-------+
  | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
  +----------+---------+---------+---------+---------+-------+-------+-------+

This is the instruction format to copy a byte range from the source
object. It encodes the offset to copy from and the number of bytes to
copy. Offset and size are in little-endian order.

All offset and size bytes are optional. This is to reduce the
instruction size when encoding small offsets or sizes. The low seven
bits (bits zero through six) of the first octet determine which of the
next seven octets are present. If bit zero is set, offset1 is present.
If bit one is set, offset2 is present, and so on.

Note that a more compact instruction does not change offset and size
encoding. For example, if only offset2 is omitted like below, offset3
still contains bits 16-23. It does not become offset2 holding
bits 8-15, even though it sits right next to offset1.

  +----------+---------+---------+
  | 10000101 | offset1 | offset3 |
  +----------+---------+---------+

In its most compact form, this instruction only takes up one byte
(0x80) with both offset and size omitted, in which case both default
to zero. There is one exception: a size of zero is automatically
converted to 0x10000.
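
The bit-to-octet mapping can be sketched in Python as follows. The
function name decode_copy and its return shape are assumptions made
for this example only; it decodes a single copy instruction and
reports where the next instruction would start.

```python
def decode_copy(delta: bytes, pos: int = 0):
    """Decode one copy instruction; return (offset, size, next_pos).

    Hypothetical helper for illustration, not part of Git.
    """
    cmd = delta[pos]
    if not cmd & 0x80:
        raise ValueError("not a copy instruction")
    pos += 1
    offset = size = 0
    for i in range(4):            # bits 0..3 select offset1..offset4
        if cmd & (1 << i):
            # Each present octet keeps its own bit position (little-endian).
            offset |= delta[pos] << (8 * i)
            pos += 1
    for i in range(3):            # bits 4..6 select size1..size3
        if cmd & (0x10 << i):
            size |= delta[pos] << (8 * i)
            pos += 1
    if size == 0:                 # size zero means 0x10000
        size = 0x10000
    return offset, size, pos
```

For the 10000101 example above with offset1 = 0x34 and offset3 = 0x12,
decode_copy(bytes([0x85, 0x34, 0x12])) gives offset 0x120034 and, since
no size octet is present, size 0x10000.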

==== Instruction to add new data

  +----------+============+
  | 0xxxxxxx |    data    |
  +----------+============+

This is the instruction to construct the target object without the
base object. The data that follows is appended to the target object.
The low seven bits of the first octet determine the size of the data
in bytes. The size must be non-zero.

==== Reserved instruction

  +----------+============
  | 00000000 |
  +----------+============

This is the instruction reserved for future expansion.
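
Putting the three instruction forms together, a minimal sketch of
applying an instruction stream to a base object might look like this.
It covers only the instructions described in this section; the name
apply_instructions is hypothetical, and the bounds check on copies is
a choice made for the example rather than documented behaviour.

```python
def apply_instructions(base: bytes, instructions: bytes) -> bytes:
    """Rebuild a target object from `base` (hypothetical helper)."""
    out = bytearray()
    pos = 0
    while pos < len(instructions):
        cmd = instructions[pos]
        pos += 1
        if cmd & 0x80:
            # Copy: bits 0..3 flag offset octets, bits 4..6 flag size octets.
            offset = size = 0
            for i in range(7):
                if cmd & (1 << i):
                    octet = instructions[pos]
                    pos += 1
                    if i < 4:
                        offset |= octet << (8 * i)
                    else:
                        size |= octet << (8 * (i - 4))
            size = size or 0x10000
            if offset + size > len(base):
                raise ValueError("copy reaches past the end of the base")
            out += base[offset:offset + size]
        elif cmd:
            # Insert: the low seven bits give the number of literal bytes.
            if pos + cmd > len(instructions):
                raise ValueError("truncated insert data")
            out += instructions[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("reserved instruction 0x00")
    return bytes(out)

# Copy "hello", insert ", big", then copy " world" from the base.
example = bytes([0x90, 5]) + bytes([5]) + b", big" + bytes([0x91, 5, 6])
```

With the base b"hello world", apply_instructions(b"hello world", example)
produces b"hello, big world".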

== Original (version 1) pack-*.idx files have the following format:

  - The header consists of 256 4-byte network byte order
    integers.  N-th entry of this table records the number of
    objects in the corresponding pack, the first byte of whose
    object name is less than or equal to N.  This is called the
    'first-level fan-out' table.

  - The header is followed by sorted 24-byte entries, one entry
    per object in the pack.  Each entry is:

    4-byte network byte order integer, recording where the
    object is stored in the packfile as the offset from the
    beginning.

    20-byte object name.

  - The file is concluded with a trailer:

    A copy of the 20-byte SHA-1 checksum at the end of
    corresponding packfile.

    20-byte SHA-1-checksum of all of the above.
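
To show how the fan-out table narrows a search, here is a small
Python sketch of a version 1 lookup. The function name idx_v1_lookup
is an assumption for this example, and the index is assumed to be
fully loaded into memory.

```python
import struct

def idx_v1_lookup(idx: bytes, name: bytes):
    """Return the pack offset for a 20-byte object name, or None.

    Hypothetical helper for illustration, not part of Git.
    """
    # 256 network byte order integers: the first-level fan-out table.
    fanout = struct.unpack(">256I", idx[:1024])
    first = name[0]
    # Entries whose first byte equals `first` occupy [lo, hi).
    lo = fanout[first - 1] if first else 0
    hi = fanout[first]
    while lo < hi:
        mid = (lo + hi) // 2
        entry = 1024 + 24 * mid          # 24-byte entries follow the header
        candidate = idx[entry + 4:entry + 24]
        if candidate == name:
            offset, = struct.unpack(">I", idx[entry:entry + 4])
            return offset
        if candidate < name:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Because fanout[N] counts every object whose first byte is at most N,
the difference between two neighbouring fan-out entries is exactly the
number of entries the binary search has to consider.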

Pack Idx file:

        --  +--------------------------------+
fanout      | fanout[0] = 2 (for example)    |-.
table       +--------------------------------+ |
            | fanout[1]                      | |
            +--------------------------------+ |
            | fanout[2]                      | |
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
            | fanout[255] = total objects    |---.
        --  +--------------------------------+ | |
main        | offset                         | | |
index       | object name 00XXXXXXXXXXXXXXXX | | |
table       +--------------------------------+ | |
            | offset                         | | |
            | object name 00XXXXXXXXXXXXXXXX | | |
            +--------------------------------+<+ |
          .-| offset                         |   |
          | | object name 01XXXXXXXXXXXXXXXX |   |
          | +--------------------------------+   |
          | | offset                         |   |
          | | object name 01XXXXXXXXXXXXXXXX |   |
          | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   |
          | | offset                         |   |
          | | object name FFXXXXXXXXXXXXXXXX |   |
        --| +--------------------------------+<--+
trailer   | | packfile checksum              |
          | +--------------------------------+
          | | idxfile checksum               |
          | +--------------------------------+
          .-------.
                  |
Pack file entry: <+

     packed object header:
        1-byte size extension bit (MSB)
               type (next 3 bits)
               size0 (lower 4 bits)
        n-byte sizeN (as long as MSB is set, each 7 bits)
                size0..sizeN form 4+7+7+..+7 bit integer, size0
                is the least significant part, and sizeN is the
                most significant part.
     packed object data:
        If it is not DELTA, then deflated bytes (the size above
                is the size before compression).
        If it is REF_DELTA, then
          20-byte base object name SHA-1 (the size above is the
                size of the delta data that follows).
          delta data, deflated.
        If it is OFS_DELTA, then
          n-byte offset (see below) interpreted as a negative
                offset from the type-byte of the header of the
                ofs-delta entry (the size above is the size of
                the delta data that follows).
          delta data, deflated.

     offset encoding:
          n bytes with MSB set in all but the last one.
          The offset is then the number constructed by
          concatenating the lower 7 bits of each byte, and
          for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
          to the result.
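
The two variable-length encodings above differ in a subtle way, which
the following Python sketch tries to make concrete. Both function
names are invented for this example. The size in the entry header is
little-endian by 7-bit group, whereas the ofs-delta offset is
big-endian by group with an extra 1 added before each shift, which is
what produces the 2^7 + 2^14 + ... term.

```python
def read_entry_header(buf: bytes, pos: int = 0):
    """Return (type, size, next_pos) for a packed object header.

    Hypothetical helper for illustration, not part of Git.
    """
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07    # 3 bits after the extension bit
    size = byte & 0x0F               # size0: the low 4 bits
    shift = 4
    while byte & 0x80:               # MSB set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos

def read_ofs_delta_offset(buf: bytes, pos: int = 0):
    """Return (offset, next_pos) for an OFS_DELTA base offset."""
    byte = buf[pos]
    pos += 1
    value = byte & 0x7F
    while byte & 0x80:               # MSB set in all but the last byte
        byte = buf[pos]
        pos += 1
        # Adding 1 before shifting yields the 2^7 + 2^14 + ... bias.
        value = ((value + 1) << 7) | (byte & 0x7F)
    return value, pos
```

The returned offset is subtracted from the position of the ofs-delta
entry's own header to find the base object. For instance, the two
bytes 0x80 0x00 decode to 128, the smallest value that needs two bytes.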

== Version 2 pack-*.idx files support packs larger than 4 GiB, and
   have some other reorganizations.  They have the format:

  - A 4-byte magic number '\377tOc' which is an unreasonable
    fanout[0] value.

  - A 4-byte version number (= 2)

  - A 256-entry fan-out table just like v1.

  - A table of sorted 20-byte SHA-1 object names.  These are
    packed together without offset values to reduce the cache
    footprint of the binary search for a specific object name.

  - A table of 4-byte CRC32 values of the packed object data.
    This is new in v2 so compressed data can be copied directly
    from pack to pack during repacking without undetected
    data corruption.

  - A table of 4-byte offset values (in network byte order).
    These are usually 31-bit pack file offsets, but large
    offsets are encoded as an index into the next table with
    the msbit set.

  - A table of 8-byte offset entries (empty for pack files less
    than 2 GiB).  Pack files are organized with heavily used
    objects toward the front, so most object references should
    not need to refer to this table.

  - The same trailer as a v1 pack index file:

    A copy of the 20-byte SHA-1 checksum at the end of
    corresponding packfile.

    20-byte SHA-1-checksum of all of the above.
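
Since the v2 tables have fixed-width rows, their positions follow
from the object count in fanout[255]. The Python sketch below resolves
the offset for the object at a given position in the sorted name
table. The name idx_v2_offset is hypothetical, and the sketch assumes
the 8-byte entries are stored in network byte order like the 4-byte
ones, which the text above does not state explicitly.

```python
import struct

def idx_v2_offset(idx: bytes, position: int) -> int:
    """Return the pack offset of the object at `position` (hypothetical)."""
    magic, version = struct.unpack(">4sI", idx[:8])
    if magic != b"\377tOc" or version != 2:
        raise ValueError("not a version 2 pack index")
    fanout = struct.unpack(">256I", idx[8:8 + 1024])
    total = fanout[255]                  # total number of objects
    names_at = 8 + 1024                  # 20-byte names
    crc_at = names_at + 20 * total       # 4-byte CRC32 values
    small_at = crc_at + 4 * total        # 4-byte offsets
    large_at = small_at + 4 * total      # 8-byte offsets
    value, = struct.unpack_from(">I", idx, small_at + 4 * position)
    if value & 0x80000000:
        # msbit set: the low 31 bits index the 8-byte table.
        # Assumption: 8-byte entries are big-endian as well.
        row = value & 0x7FFFFFFF
        value, = struct.unpack_from(">Q", idx, large_at + 8 * row)
    return value
```

Keeping names, CRCs and offsets in separate tables means the same
position number indexes all three, so a name found by binary search
maps directly to its CRC32 and offset rows.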

== multi-pack-index (MIDX) files have the following format:

The multi-pack-index files refer to multiple pack-files and loose objects.

In order to allow extensions that add extra data to the MIDX, we organize
the body into "chunks" and provide a lookup table at the beginning of the
body. The header includes certain length values, such as the number of packs,
the number of base MIDX files, hash lengths and types.

All 4-byte numbers are in network order.

HEADER:

        4-byte signature:
            The signature is: {'M', 'I', 'D', 'X'}

        1-byte version number:
            Git only writes or recognizes version 1.

        1-byte Object Id Version
            Git only writes or recognizes version 1 (SHA1).

        1-byte number of "chunks"

        1-byte number of base multi-pack-index files:
            This value is currently always zero.

        4-byte number of pack files
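
The header fields add up to 12 bytes, which a single struct format
can unpack. The sketch below is illustrative only; MidxHeader and
read_midx_header are names chosen for this example.

```python
import struct
from typing import NamedTuple

class MidxHeader(NamedTuple):
    """Hypothetical container for the fields listed above."""
    version: int
    oid_version: int
    num_chunks: int
    num_base_files: int
    num_packs: int

def read_midx_header(data: bytes) -> MidxHeader:
    # 4-byte signature, four 1-byte fields, one 4-byte network-order count.
    signature, *fields = struct.unpack(">4sBBBBI", data[:12])
    if signature != b"MIDX":
        raise ValueError("not a multi-pack-index file")
    header = MidxHeader(*fields)
    if header.version != 1 or header.oid_version != 1:
        raise ValueError("unrecognized MIDX or object id version")
    return header
```

For example, a header written with four chunks and three pack files
unpacks to MidxHeader(1, 1, 4, 0, 3).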

CHUNK LOOKUP:

        (C + 1) * 12 bytes providing the chunk offsets, where C is the
        number of chunks given in the header:
            The first 4 bytes describe the chunk id. Value 0 is a
            terminating label.
            The other 8 bytes provide the offset in the current file at
            which the chunk starts.
            (Chunks are provided in file-order, so you can infer the length
            using the next chunk position if necessary.)

        The remaining data in the body is described one chunk at a time, and
        these chunks may be given in any order. Chunks are required unless
        otherwise specified.
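
A reader might turn the lookup table into a map from chunk id to byte
range along these lines. The name read_chunk_table is hypothetical.
The sketch assumes the table starts right after the 12-byte header,
that 8-byte offsets are big-endian, and that the offset stored in the
terminating row marks the end of the last chunk; none of these is
spelled out above.

```python
import struct

def read_chunk_table(data: bytes, num_chunks: int, table_at: int = 12):
    """Return {chunk_id: (start, end)} (hypothetical helper)."""
    rows = [
        # Each row: 4-byte chunk id followed by an 8-byte file offset.
        struct.unpack_from(">4sQ", data, table_at + 12 * i)
        for i in range(num_chunks + 1)
    ]
    if rows[-1][0] != b"\0\0\0\0":
        raise ValueError("chunk table lacks its terminating label")
    # Rows are in file order, so each chunk ends where the next row starts.
    # Assumption: the terminating row's offset ends the final chunk.
    return {
        rows[i][0]: (rows[i][1], rows[i + 1][1])
        for i in range(num_chunks)
    }
```

With two chunks, the table itself takes 36 bytes, so the first chunk
can start no earlier than byte 48 under the assumptions above.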

CHUNK DATA:

        Packfile Names (ID: {'P', 'N', 'A', 'M'})
            Stores the packfile names as concatenated, null-terminated strings.
            Packfiles must be listed in lexicographic order for fast lookups by
            name. This is the only chunk not guaranteed to be a multiple of four
            bytes in length, so it should be the last chunk for alignment reasons.

        OID Fanout (ID: {'O', 'I', 'D', 'F'})
            The ith entry, F[i], stores the number of OIDs with first
            byte at most i. Thus F[255] stores the total
            number of objects.

        OID Lookup (ID: {'O', 'I', 'D', 'L'})
            The OIDs for all objects in the MIDX are stored in lexicographic
            order in this chunk.

        Object Offsets (ID: {'O', 'O', 'F', 'F'})
            Stores two 4-byte values for every object.
            1: The pack-int-id for the pack storing this object.
            2: The offset within the pack.
                If all offsets are less than 2^31, then the large offset chunk
                will not exist and offsets are stored as in IDX v1.
                If there is at least one offset value larger than 2^32-1, then
                the large offset chunk must exist. If the large offset chunk
                exists and the 31st bit is on, then removing that bit reveals
                the row in the large offsets chunk containing the 8-byte
                offset of this object.

        [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
            8-byte offsets into large packfiles.
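
The interplay between the Object Offsets and Object Large Offsets
chunks can be sketched as below. The function name
midx_object_location is made up for this example. The sketch reads
"the 31st bit" as the most significant bit of the 4-byte value (mask
0x80000000) and assumes the 8-byte offsets are big-endian; both are
interpretations rather than statements from the text.

```python
import struct

def midx_object_location(ooff: bytes, loff: bytes, position: int):
    """Return (pack_int_id, offset) for the object at `position`.

    `ooff` and `loff` are the raw OOFF and LOFF chunk contents; pass
    b"" for `loff` when the optional chunk is absent. Hypothetical helper.
    """
    # Two 4-byte network-order values per object.
    pack_int_id, offset = struct.unpack_from(">II", ooff, 8 * position)
    # Assumption: "the 31st bit" is the top bit of the 4-byte value.
    if loff and offset & 0x80000000:
        row = offset & 0x7FFFFFFF
        # Assumption: 8-byte offsets are big-endian like the 4-byte ones.
        offset, = struct.unpack_from(">Q", loff, 8 * row)
    return pack_int_id, offset
```

The flag bit is only interpreted when a large offset chunk exists,
mirroring the condition in the description above.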

TRAILER:

        20-byte SHA1-checksum of the above contents.