"Fossies" - the Fresh Open Source Software Archive

Member "git-2.23.0.windows.1/Documentation/technical/index-format.txt" (16 Aug 2019, 12140 Bytes) of package /windows/misc/git-2.23.0.windows.1.zip:


Git index format
================

== The Git index file has the following format

  All binary numbers are in network byte order. Version 2 is described
  here unless stated otherwise.

   - A 12-byte header consisting of

     4-byte signature:
       The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")

     4-byte version number:
       The currently supported versions are 2, 3 and 4.

     32-bit number of index entries.

   - A number of sorted index entries (see below).

   - Extensions

     Extensions are identified by signature. Optional extensions can
     be ignored if Git does not understand them.

     Git currently supports cached tree and resolve undo extensions.

     4-byte extension signature. If the first byte is 'A'..'Z' the
     extension is optional and can be ignored.

     32-bit size of the extension

     Extension data

   - 160-bit SHA-1 over the content of the index file before this
     checksum.
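
  As a rough illustration of the layout above, the following Python
  sketch validates the 12-byte header and the trailing checksum, then
  walks the extension area. The function names are hypothetical (they
  are not part of Git), and the caller is assumed to already know the
  offset at which the index entries end.

```python
import hashlib
import struct

HEADER = struct.Struct(">4sII")  # signature, version, entry count (network order)
CHECKSUM_LEN = 20                # 160-bit SHA-1 trailer


def parse_index_header(data: bytes):
    """Validate signature, version and trailing SHA-1; return (version, count)."""
    if len(data) < HEADER.size + CHECKSUM_LEN:
        raise ValueError("too short to be an index file")
    signature, version, entry_count = HEADER.unpack_from(data, 0)
    if signature != b"DIRC":
        raise ValueError("bad signature")
    if version not in (2, 3, 4):
        raise ValueError(f"unsupported version {version}")
    if hashlib.sha1(data[:-CHECKSUM_LEN]).digest() != data[-CHECKSUM_LEN:]:
        raise ValueError("checksum mismatch")
    return version, entry_count


def iter_extensions(data: bytes, offset: int):
    """Yield (signature, optional, payload) for each extension.

    `offset` must point just past the last index entry (assumed known).
    """
    end = len(data) - CHECKSUM_LEN
    while offset + 8 <= end:
        signature, size = struct.unpack_from(">4sI", data, offset)
        payload = data[offset + 8:offset + 8 + size]
        # An extension is optional when its first byte is 'A'..'Z'.
        optional = b"A" <= signature[:1] <= b"Z"
        yield signature, optional, payload
        offset += 8 + size


def build_index(version: int, extensions=()):
    """Build an entry-less index file, for demonstration only."""
    body = HEADER.pack(b"DIRC", version, 0)
    for signature, payload in extensions:
        body += signature + struct.pack(">I", len(payload)) + payload
    return body + hashlib.sha1(body).digest()
```

  With no index entries the extensions start right after the header, so
  iter_extensions(build_index(2, [(b"TREE", b"abc")]), 12) yields a
  single optional "TREE" extension.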

== Index entry

  Index entries are sorted in ascending order on the name field,
  interpreted as a string of unsigned bytes (i.e. memcmp() order, no
  localization, no special casing of directory separator '/'). Entries
  with the same name are sorted by their stage field.
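
  The ordering can be illustrated with a small Python sketch. Python
  compares bytes objects as sequences of unsigned bytes, which is
  assumed here to match the memcmp() order described above; the
  (name, stage) tuples are a made-up representation, not Git's.

```python
def index_sort_key(entry):
    """Sort key for a hypothetical (name, stage) tuple."""
    name, stage = entry
    return (name, stage)


entries = [(b"a/b", 0), (b"a.txt", 0), (b"a", 2), (b"a", 1), (b"a-b", 0)]
ordered = sorted(entries, key=index_sort_key)
# '-' (0x2d) < '.' (0x2e) < '/' (0x2f), so "a/b" sorts after "a.txt":
# [(b"a", 1), (b"a", 2), (b"a-b", 0), (b"a.txt", 0), (b"a/b", 0)]
```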

  32-bit ctime seconds, the last time a file's metadata changed
    this is stat(2) data

  32-bit ctime nanosecond fractions
    this is stat(2) data

  32-bit mtime seconds, the last time a file's data changed
    this is stat(2) data

  32-bit mtime nanosecond fractions
    this is stat(2) data

  32-bit dev
    this is stat(2) data

  32-bit ino
    this is stat(2) data

  32-bit mode, split into (high to low bits)

    4-bit object type
      valid values in binary are 1000 (regular file), 1010 (symbolic link)
      and 1110 (gitlink)

    3-bit unused

    9-bit unix permission. Only 0755 and 0644 are valid for regular files.
    Symbolic links and gitlinks have value 0 in this field.

  32-bit uid
    this is stat(2) data

  32-bit gid
    this is stat(2) data

  32-bit file size
    This is the on-disk size from stat(2), truncated to 32 bits.

  160-bit SHA-1 for the represented object

  A 16-bit 'flags' field split into (high to low bits)

    1-bit assume-valid flag

    1-bit extended flag (must be zero in version 2)

    2-bit stage (during merge)

    12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
    is stored in this field.

  (Version 3 or later) A 16-bit field, only applicable if the
  "extended flag" above is 1, split into (high to low bits).

    1-bit reserved for future use

    1-bit skip-worktree flag (used by sparse checkout)

    1-bit intent-to-add flag (used by "git add -N")

    13-bit unused, must be zero

  Entry path name (variable length) relative to top level directory
    (without leading slash). '/' is used as path separator. The special
    path components ".", ".." and ".git" (without quotes) are disallowed.
    Trailing slash is also disallowed.

    The exact encoding is undefined, but the '.' and '/' characters
    are encoded in 7-bit ASCII and the encoding cannot contain a NUL
    byte (in other words, this is a UNIX pathname).

  (Version 4) In version 4, the entry path name is prefix-compressed
    relative to the path name for the previous entry (the very first
    entry is encoded as if the path name for the previous entry is an
    empty string).  At the beginning of an entry, an integer N in the
    variable width encoding (the same encoding as the offset is encoded
    for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
    by a NUL-terminated string S.  Removing N bytes from the end of the
    path name for the previous entry, and replacing it with the string S
    yields the path name for this entry.

  1-8 NUL bytes as necessary to pad the entry to a multiple of eight bytes
  while keeping the name NUL-terminated.
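
  The version 2 layout above can be sketched in Python as follows. This
  is an illustration under stated assumptions rather than Git's own
  parser: parse_entry_v2 and the dictionary keys are hypothetical names,
  and only the non-extended (version 2) form is handled.

```python
import struct

# ctime s/ns, mtime s/ns, dev, ino, mode, uid, gid, size: ten 32-bit fields,
# then the 20-byte SHA-1 and the 16-bit flags field: 62 bytes in total.
ENTRY_FIXED = struct.Struct(">10I20sH")


def parse_entry_v2(data: bytes, offset: int = 0):
    """Parse one version 2 entry; return (entry dict, offset of next entry)."""
    (ctime_s, ctime_ns, mtime_s, mtime_ns, dev, ino, mode,
     uid, gid, size, sha1, flags) = ENTRY_FIXED.unpack_from(data, offset)
    if flags & 0x4000:
        raise ValueError("extended flag must be zero in version 2")

    name_start = offset + ENTRY_FIXED.size
    name_len = flags & 0x0FFF
    if name_len < 0xFFF:
        name_end = name_start + name_len
    else:
        # Length did not fit in 12 bits: scan for the terminating NUL.
        name_end = data.index(b"\0", name_start)
    name = data[name_start:name_end]

    # 1-8 NUL bytes pad the whole entry to a multiple of eight bytes.
    entry_len = (ENTRY_FIXED.size + len(name) + 8) & ~7

    entry = {
        "ctime": (ctime_s, ctime_ns),
        "mtime": (mtime_s, mtime_ns),
        "dev": dev, "ino": ino, "uid": uid, "gid": gid, "size": size,
        "object_type": (mode >> 12) & 0xF,   # 0b1000, 0b1010 or 0b1110
        "permissions": mode & 0o777,         # low 9 bits
        "sha1": sha1,
        "assume_valid": bool(flags & 0x8000),
        "stage": (flags >> 12) & 0x3,
        "name": name,
    }
    return entry, offset + entry_len
```

  For a six-byte name such as "README" the fixed part plus name takes 68
  bytes, so four NUL bytes of padding bring the entry to 72 bytes.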

  (Version 4) In version 4, the padding after the pathname does not
  exist.
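
  A minimal Python sketch of the version 4 path decoding follows. The
  variable width decoder is written to match the OFS_DELTA offset
  encoding as it is commonly described (seven bits per byte, most
  significant group first, with one added before each continuation
  shift); treat that detail as an assumption and check it against
  pack-format.txt. Both function names are hypothetical.

```python
def decode_varint(data: bytes, offset: int):
    """Decode one variable width integer; return (value, next offset)."""
    byte = data[offset]
    offset += 1
    value = byte & 0x7F
    while byte & 0x80:
        value += 1
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
    return value, offset


def decode_path_v4(data: bytes, offset: int, previous: bytes):
    """Rebuild a prefix-compressed path name from the previous entry's name."""
    strip, offset = decode_varint(data, offset)
    if strip > len(previous):
        raise ValueError("cannot strip more bytes than the previous name has")
    end = data.index(b"\0", offset)
    suffix = data[offset:end]
    # Drop N bytes from the end of the previous name, then append S.
    return previous[:len(previous) - strip] + suffix, end + 1
```

  For example, with a previous name of "src/main.c", the bytes 0x06
  followed by "util.c" and a NUL decode to "src/util.c".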

  Interpretation of index entries in split index mode is completely
  different. See below for details.

== Extensions

=== Cached tree

  Cached tree extension contains pre-computed hashes for trees that can
  be derived from the index. It helps speed up tree object generation
  from the index for a new commit.

  When a path is updated in the index, the path must be invalidated and
  removed from the tree cache.

  The signature for this extension is { 'T', 'R', 'E', 'E' }.

  A series of entries fills the entire extension, each of which
  consists of:

  - NUL-terminated path component (relative to its parent directory);

  - ASCII decimal number of entries in the index that is covered by the
    tree this entry represents (entry_count);

  - A space (ASCII 32);

  - ASCII decimal number that represents the number of subtrees this
    tree has;

  - A newline (ASCII 10); and

  - 160-bit object name for the object that would result from writing
    this span of index as a tree.

  An entry can be in an invalidated state and is represented by having
  a negative number in the entry_count field. In this case, there is no
  object name and the next entry starts immediately after the newline.
  When writing an invalid entry, -1 should always be used as entry_count.

  The entries are written out in the top-down, depth-first order.  The
  first entry represents the root level of the repository, followed by the
  first subtree--let's call this A--of the root level (with its name
  relative to the root level), followed by the first subtree of A (with
  its name relative to A), ...
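
  Because the entries are stored depth-first, a recursive reader is a
  natural fit. The Python sketch below is one possible way to decode the
  extension payload; parse_tree_cache and the dictionary keys are
  hypothetical, and the root entry is assumed to carry an empty path
  component.

```python
def parse_tree_cache(data: bytes, offset: int = 0):
    """Parse one cached-tree entry and its subtrees; return (node, next offset)."""
    end = data.index(b"\0", offset)
    name = data[offset:end]
    offset = end + 1

    newline = data.index(b"\n", offset)
    count_text, subtree_text = data[offset:newline].split(b" ")
    entry_count = int(count_text)
    subtree_count = int(subtree_text)
    offset = newline + 1

    object_name = None
    if entry_count >= 0:
        # Valid entries are followed by a 160-bit object name.
        object_name = data[offset:offset + 20]
        offset += 20

    children = []
    for _ in range(subtree_count):
        child, offset = parse_tree_cache(data, offset)
        children.append(child)

    node = {
        "name": name,
        "entry_count": entry_count,
        "object_name": object_name,
        "children": children,
    }
    return node, offset
```

  An invalidated subtree shows up with entry_count of -1 and
  object_name set to None, while its own subtrees (if any) are still
  read in order.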

=== Resolve undo

  A conflict is represented in the index as a set of higher stage entries.
  When a conflict is resolved (e.g. with "git add path"), these higher
  stage entries will be removed and a stage-0 entry with proper resolution
  is added.

  When these higher stage entries are removed, they are saved in the
  resolve undo extension, so that conflicts can be recreated (e.g. with
  "git checkout -m"), in case users want to redo a conflict resolution
  from scratch.

  The signature for this extension is { 'R', 'E', 'U', 'C' }.

  A series of entries fills the entire extension, each of which
  consists of:

  - NUL-terminated pathname the entry describes (relative to the root of
    the repository, i.e. full pathname);

  - Three NUL-terminated ASCII octal numbers, entry mode of entries in
    stage 1 to 3 (a missing stage is represented by "0" in this field);
    and

  - At most three 160-bit object names of the entry in stages from 1 to 3
    (nothing is written for a missing stage).
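
  The entry layout above could be read with a loop like the following
  Python sketch. The function name and the returned structure are
  hypothetical; a missing stage is reported as mode 0 with no object
  name.

```python
def parse_resolve_undo(data: bytes):
    """Return a list of (path, [(mode, object_name or None) for stages 1..3])."""
    entries = []
    offset = 0
    while offset < len(data):
        end = data.index(b"\0", offset)
        path = data[offset:end]
        offset = end + 1

        modes = []
        for _ in range(3):
            end = data.index(b"\0", offset)
            modes.append(int(data[offset:end], 8))  # ASCII octal
            offset = end + 1

        stages = []
        for mode in modes:
            if mode == 0:
                # Missing stage: no object name was written.
                stages.append((0, None))
            else:
                stages.append((mode, data[offset:offset + 20]))
                offset += 20
        entries.append((path, stages))
    return entries
```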

=== Split index

  In split index mode, the majority of index entries could be stored
  in a separate file. This extension records the changes to be made on
  top of that to produce the final index.

  The signature for this extension is { 'l', 'i', 'n', 'k' }.

  The extension consists of:

  - 160-bit SHA-1 of the shared index file. The shared index file path
    is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
    index does not require a shared index file.

  - An ewah-encoded delete bitmap. Each bit represents an entry in the
    shared index. If a bit is set, its corresponding entry in the
    shared index will be removed from the final index.  Note that
    because a delete operation changes index entry positions, while the
    original positions are still needed in the replace phase, it's best
    to just mark entries for removal, then do a mass deletion after
    replacement.

  - An ewah-encoded replace bitmap. Each bit represents an entry in
    the shared index. If a bit is set, its corresponding entry in the
    shared index will be replaced with an entry in this index
    file. All replaced entries are stored in sorted order in this
    index. The first "1" bit in the replace bitmap corresponds to the
    first index entry, the second "1" bit to the second entry and so
    on. Replaced entries may have empty path names to save space.

  The remaining index entries after replaced ones will be added to the
  final index. These added entries are also sorted by entry name then
  stage.
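
  The merge procedure can be sketched in Python as below. To keep the
  example short, the two bitmaps are assumed to be already decoded from
  EWAH into lists of booleans, entries are modelled as hypothetical
  (name, stage, payload) tuples, and a replacement with an empty name is
  assumed to inherit the name of the shared entry it replaces.

```python
def apply_split_index(shared, delete_bits, replace_bits, own_entries):
    """Combine a shared index with the entries of the split index file."""
    if not (len(shared) == len(delete_bits) == len(replace_bits)):
        raise ValueError("bitmaps must cover every shared entry")
    replaced_total = sum(1 for bit in replace_bits if bit)
    if replaced_total > len(own_entries):
        raise ValueError("not enough replacement entries")

    # The k-th set bit in the replace bitmap pairs with the k-th own entry.
    replacements = iter(own_entries[:replaced_total])
    merged = []
    for entry, deleted, replaced in zip(shared, delete_bits, replace_bits):
        if replaced:
            name, stage, payload = next(replacements)
            entry = (name or entry[0], stage, payload)
        # Deletion is applied after replacement so positions stay stable.
        if not deleted:
            merged.append(entry)

    # Entries after the replaced ones are additions to the final index.
    merged.extend(own_entries[replaced_total:])
    merged.sort(key=lambda e: (e[0], e[1]))
    return merged
```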

== Untracked cache

  Untracked cache saves the untracked file list and necessary data to
  verify the cache. The signature for this extension is { 'U', 'N',
  'T', 'R' }.

  The extension starts with

  - A sequence of NUL-terminated strings, preceded by the size of the
    sequence in variable width encoding. Each string describes the
    environment where the cache can be used.

  - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
    ctime field until "file size".

  - Stat data of core.excludesfile

  - 32-bit dir_flags (see struct dir_struct)

  - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
    does not exist.

  - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
    not exist.

  - NUL-terminated string of per-dir exclude file name. This is usually
    ".gitignore".

  - The number of following directory blocks, variable width
    encoding. If this number is zero, the extension ends here with a
    following NUL.

  - A number of directory blocks in depth-first-search order, each
    consisting of

    - The number of untracked entries, variable width encoding.

    - The number of sub-directory blocks, variable width encoding.

    - The directory name terminated by NUL.

    - A number of untracked file/dir names terminated by NUL.

The remaining data of each directory block is grouped by type:

  - An ewah bitmap, the n-th bit marks whether the n-th directory has
    valid untracked cache entries.

  - An ewah bitmap, the n-th bit records "check-only" bit of
    read_directory_recursive() for the n-th directory.

  - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
    is valid for the n-th directory and exists in the next data.

  - An array of stat data. The n-th data corresponds with the n-th
    "one" bit in the previous ewah bitmap.

  - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
    in the previous ewah bitmap.

  - One NUL.

== File System Monitor cache

  The file system monitor cache tracks files for which the core.fsmonitor
  hook has told us about changes.  The signature for this extension is
  { 'F', 'S', 'M', 'N' }.

  The extension starts with

  - 32-bit version number: the currently supported version is 1.

  - 64-bit time: the extension data reflects all changes through the given
    time which is stored as the nanoseconds elapsed since midnight,
    January 1, 1970.

  - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.

  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
    is not CE_FSMONITOR_VALID.
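
  A short Python sketch of reading the fixed fields follows. It assumes
  the bitmap size is a byte count for the EWAH data that follows, and it
  leaves that data undecoded; parse_fsmonitor is a hypothetical name.

```python
import struct

FSMN_HEADER = struct.Struct(">IQI")  # version, time in ns, bitmap size


def parse_fsmonitor(data: bytes):
    """Return (timestamp_ns, raw_ewah_bitmap) from an FSMN payload."""
    version, timestamp_ns, bitmap_size = FSMN_HEADER.unpack_from(data, 0)
    if version != 1:
        raise ValueError(f"unsupported FSMN version {version}")
    start = FSMN_HEADER.size
    bitmap = data[start:start + bitmap_size]
    if len(bitmap) != bitmap_size:
        raise ValueError("truncated bitmap")
    return timestamp_ns, bitmap
```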

== End of Index Entry

  The End of Index Entry (EOIE) is used to locate the end of the variable
  length index entries and the beginning of the extensions. Code can take
  advantage of this to quickly locate the index extensions without having
  to parse through all of the index entries.

  Because it must be able to be loaded before the variable length cache
  entries and other index extensions, this extension must be written last.
  The signature for this extension is { 'E', 'O', 'I', 'E' }.

  The extension consists of:

  - 32-bit offset to the end of the index entries

  - 160-bit SHA-1 over the extension types and their sizes (but not
    their contents).  E.g. if we have "TREE" extension that is N-bytes
    long, "REUC" extension that is M-bytes long, followed by "EOIE",
    then the hash would be:

    SHA-1("TREE" + <binary representation of N> +
          "REUC" + <binary representation of M>)
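
  The hash computation can be sketched in Python as follows. The
  "binary representation" of each size is assumed here to be the same
  32-bit network byte order value that is stored in the extension
  header; the function names are hypothetical.

```python
import hashlib
import struct


def eoie_hash(extensions):
    """SHA-1 over (signature, size) pairs of the extensions before EOIE."""
    digest = hashlib.sha1()
    for signature, size in extensions:
        digest.update(signature)
        digest.update(struct.pack(">I", size))  # assumed 32-bit, big-endian
    return digest.digest()


def build_eoie_payload(entries_end_offset: int, extensions):
    """Return the 24-byte EOIE payload: 32-bit offset followed by the hash."""
    return struct.pack(">I", entries_end_offset) + eoie_hash(extensions)
```

  A reader could recompute eoie_hash() over the extension headers it
  finds after the recorded offset and compare the result with the stored
  hash before trusting the offset.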

== Index Entry Offset Table

  The Index Entry Offset Table (IEOT) is used to help address the CPU
  cost of loading the index by enabling multi-threading the process of
  converting cache entries from the on-disk format to the in-memory format.
  The signature for this extension is { 'I', 'E', 'O', 'T' }.

  The extension consists of:

  - 32-bit version (currently 1)

  - A number of index offset entries each consisting of:

    - 32-bit offset from the beginning of the file to the first cache entry
      in this block of entries.

    - 32-bit count of cache entries in this block
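
  Reading the table is straightforward; the Python sketch below assumes
  the offset entries fill the rest of the extension payload after the
  version field. parse_ieot is a hypothetical name.

```python
import struct


def parse_ieot(data: bytes):
    """Return a list of (file_offset, entry_count) blocks from an IEOT payload."""
    (version,) = struct.unpack_from(">I", data, 0)
    if version != 1:
        raise ValueError(f"unsupported IEOT version {version}")
    body = data[4:]
    if len(body) % 8:
        raise ValueError("IEOT payload is not a whole number of entries")
    return [struct.unpack_from(">II", body, pos)
            for pos in range(0, len(body), 8)]
```

  Each returned block could then be handed to a separate worker, which
  starts decoding at file_offset and stops after entry_count entries.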