Member "howto/recover-corrupted-blob-object.txt" (15 Dec 2018, 5510 Bytes) of package /linux/misc/git-htmldocs-2.20.1.tar.xz:


Date: Fri, 9 Nov 2007 08:28:38 -0800 (PST)
From: Linus Torvalds <torvalds@linux-foundation.org>
Subject: corrupt object on git-gc
Abstract: Some tricks to reconstruct blob objects in order to fix
 a corrupted repository.
Content-type: text/asciidoc

How to recover a corrupted blob object
======================================

-----------------------------------------------------------
On Fri, 9 Nov 2007, Yossi Leybovich wrote:
>
> Did not help still the repository look for this object?
> Any one know how can I track this object and understand which file is it
-----------------------------------------------------------

So exactly *because* the SHA-1 hash is cryptographically secure, the hash
itself doesn't actually tell you anything. In order to fix a corrupt
object you basically have to find the "original source" for it.
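
To make that concrete, here is a minimal sketch of how a blob's name is
derived, assuming the repository uses the default SHA-1 object format: the
name is the SHA-1 of a short header ("blob <size>" plus a NUL byte)
followed by the raw file contents. The helper name `blob_id` is made up for
illustration, and `sha1sum` is assumed to be installed. The hash can
confirm that a candidate file is the right one, but it cannot be run
backwards to produce the contents.

```shell
# blob_id FILE: print the object name Git would give FILE as a blob.
# (hypothetical helper; assumes the SHA-1 object format and sha1sum(1))
blob_id() {
    size=$(wc -c < "$1" | tr -d ' ')
    # Header "blob <size>" and a NUL byte, then the raw file contents.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}
```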

The easiest way to do that is almost always to have backups, and find the
same object somewhere else. Backups really are a good idea, and Git makes
it pretty easy (if nothing else, just clone the repository somewhere else,
and make sure that you do *not* use a hard-linked clone, and preferably
not the same disk/machine).
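
As a rough sketch of such a backup, the helper below wraps a clone that is
guaranteed not to share object files with the original. The function name
and both paths are placeholders; `--mirror` and `--no-hardlinks` are
standard `git clone` options.

```shell
# backup_repo SRC DEST: make a full, independent copy of the repository SRC.
# (hypothetical helper; SRC and DEST are placeholders)
backup_repo() {
    # --no-hardlinks forces real copies of the object files even when
    # SRC and DEST live on the same filesystem; --mirror copies all refs.
    git clone --mirror --no-hardlinks "$1" "$2"
}

# Usage, ideally with DEST on another disk or machine:
#   backup_repo /path/to/repo /path/to/backup.git
```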

But since you don't seem to have backups right now, the good news is that
especially with a single blob being corrupt, these things *are* somewhat
debuggable.

First off, move the corrupt object away, and *save* it. The most common
cause of corruption so far has been memory corruption, but even so, there
are people who would be interested in seeing the corruption. It's
basically impossible to judge the corruption until we can also see the
original object, so right now the corrupt object is useless, but it's very
interesting for the future, in the hope that you can re-create a
non-corrupt version.

-----------------------------------------------------------
So:

> ib]$ mv .git/objects/4b/9458b3786228369c63936db65827de3cc06200 ../
-----------------------------------------------------------

This is the right thing to do, although it's usually best to save it under
its full SHA-1 name (you just dropped the "4b" from the result ;).
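
One way to keep the full name is to rebuild it from the directory and file
parts while moving. This is only a sketch: `save_corrupt_object` is a
made-up helper, and it assumes the object is a loose object under
`objects/` in the given Git directory.

```shell
# save_corrupt_object SHA DESTDIR [GITDIR]
# Move the loose object SHA out of the repository and keep it under its
# full 40-character name. (hypothetical helper; GITDIR defaults to .git)
save_corrupt_object() {
    sha=$1 dest=$2 gitdir=${3:-.git}
    rest=${sha#??}          # everything after the first two characters
    dir=${sha%"$rest"}      # the first two characters, e.g. "4b"
    mv "$gitdir/objects/$dir/$rest" "$dest/$sha"
}

# Usage:
#   save_corrupt_object 4b9458b3786228369c63936db65827de3cc06200 ..
```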

Let's see what that tells us:

-----------------------------------------------------------
> ib]$ git-fsck --full
> broken link from    tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8
>              to    blob 4b9458b3786228369c63936db65827de3cc06200
> missing blob 4b9458b3786228369c63936db65827de3cc06200
-----------------------------------------------------------

Ok, I removed the "dangling commit" messages, because they are just
messages about the fact that you probably have rebased etc, so they're not
at all interesting. But what remains is still very useful. In particular,
we now know which tree points to it!
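
If fsck prints a lot of output, the interesting pairs can be picked out
mechanically. The filter below is a sketch that assumes the two-line
"broken link from ... / to ..." layout shown above; `broken_links` is a
made-up name.

```shell
# broken_links: read `git fsck --full` output on stdin and print one
# "<referring-object> <missing-object>" pair per broken link.
# (hypothetical helper; relies on the two-line layout quoted above)
broken_links() {
    awk '
        /^broken link from/ { src = $5; next }
        $1 == "to" && src != "" { print src, $3; src = "" }
    '
}

# Usage, inside the damaged repository:
#   git fsck --full 2>&1 | broken_links
```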

Now you can do

	git ls-tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8

which will show something like

	100644 blob 8d14531846b95bfa3564b58ccfb7913a034323b8    .gitignore
	100644 blob ebf9bf84da0aab5ed944264a5db2a65fe3a3e883    .mailmap
	100644 blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c    COPYING
	100644 blob ee909f2cc49e54f0799a4739d24c4cb9151ae453    CREDITS
	040000 tree 0f5f709c17ad89e72bdbbef6ea221c69807009f6    Documentation
	100644 blob 1570d248ad9237e4fa6e4d079336b9da62d9ba32    Kbuild
	100644 blob 1c7c229a092665b11cd46a25dbd40feeb31661d9    MAINTAINERS
	...

and you should now have a line that looks like

	100644 blob 4b9458b3786228369c63936db65827de3cc06200	my-magic-file

in the output. This already tells you a *lot*: it tells you what file the
corrupt blob came from!
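
For a big tree you don't have to find that line by eye. A small filter
along these lines would do it; `path_for_blob` is a made-up name, and the
sketch relies on ls-tree separating the file name from the rest of the
line with a TAB.

```shell
# path_for_blob SHA: read `git ls-tree` output on stdin and print the name
# of every entry whose object is SHA. (hypothetical helper)
path_for_blob() {
    # Each line is "<mode> <type> <object>", a TAB, then the file name.
    awk -F '\t' -v sha="$1" '{
        split($1, meta, " ")
        if (meta[3] == sha) print $2
    }'
}

# Usage:
#   git ls-tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8 |
#       path_for_blob 4b9458b3786228369c63936db65827de3cc06200
```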

Now, it doesn't tell you quite enough, though: it doesn't tell what
*version* of the file didn't get correctly written! You might be really
lucky, and it may be the version that you already have checked out in your
working tree, in which case fixing this problem is really simple: just do

	git hash-object -w my-magic-file

again, and if it outputs the missing SHA-1 (4b945..) you're now all done!
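
If you would rather not write anything until you know the file matches,
the check can be done first, since `git hash-object` without `-w` only
prints the name. A sketch, with `restore_if_match` as a made-up helper
meant to be run inside the repository:

```shell
# restore_if_match FILE SHA: write FILE into the object database only if
# it hashes to SHA. (hypothetical helper; run inside the repository)
restore_if_match() {
    # Without -w, hash-object prints the object name and writes nothing.
    if [ "$(git hash-object "$1")" = "$2" ]; then
        git hash-object -w "$1" >/dev/null
    else
        echo "$1 is not the missing version" >&2
        return 1
    fi
}

# Usage:
#   restore_if_match my-magic-file 4b9458b3786228369c63936db65827de3cc06200
```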

But that's the really lucky case, so let's assume that it was some older
version that was broken. How do you tell which version it was?

The easiest way to do it is to do

	git log --raw --all --full-history -- subdirectory/my-magic-file

and that will show you the whole log for that file (please realize that
the tree you had may not be the top-level tree, so you need to figure out
which subdirectory it was in on your own), and because you're asking for
raw output, you'll now get something like

	commit abc
	Author:
	Date:
	  ..
	:100644 100644 4b9458b... newsha... M	subdirectory/my-magic-file


	commit xyz
	Author:
	Date:

	  ..
	:100644 100644 oldsha... 4b9458b... M	subdirectory/my-magic-file

and this actually tells you what the *previous* and *subsequent* versions
of that file were! So now you can look at those ("oldsha" and "newsha"
respectively), and hopefully you have done commits often, and can
re-create the missing my-magic-file version by looking at those older and
newer versions!
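
Picking the two neighbours out of a long log can also be scripted. The
filter below is a sketch: `neighbours` is a made-up name, and it expects
the log to be produced with `--no-abbrev` so that the raw lines carry full
object names that can be compared exactly.

```shell
# neighbours SHA: read `git log --raw --no-abbrev` output on stdin and
# print the blob on the other side of every change that involves SHA.
# (hypothetical helper)
neighbours() {
    awk -v sha="$1" '
        /^:/ {
            # Raw lines look like ":<mode> <mode> <old> <new> <status>",
            # a TAB, then the path.
            path = $0; sub(/^[^\t]*\t/, "", path)
            if ($3 == sha) print "next", $4, path
            if ($4 == sha) print "prev", $3, path
        }
    '
}

# Usage:
#   git log --raw --no-abbrev --all --full-history -- \
#       subdirectory/my-magic-file |
#       neighbours 4b9458b3786228369c63936db65827de3cc06200
# Each reported version can then be dumped with
#   git cat-file -p <sha> > some-file
```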

If you can do that, you can now recreate the missing object with

	git hash-object -w <recreated-file>

and your repository is good again!
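
To double-check that, something like the following could be run
afterwards. It is a sketch only: `blob_recovered` is a made-up helper, and
it assumes fsck reports the problem with the "missing blob" wording quoted
earlier.

```shell
# blob_recovered SHA: succeed only if SHA is readable as a blob again and
# fsck no longer reports it as missing. (hypothetical helper)
blob_recovered() {
    [ "$(git cat-file -t "$1" 2>/dev/null)" = "blob" ] || return 1
    ! git fsck --full 2>&1 | grep -q "missing blob $1"
}

# Usage:
#   blob_recovered 4b9458b3786228369c63936db65827de3cc06200 && echo fixed
```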

(Btw, you could have ignored the fsck, and started with doing a

	git log --raw --all

and just looked for the SHA-1 of the missing object (4b9458b..) in that
whole thing. It's up to you - Git does *have* a lot of information, it is
just missing one particular blob version.)

Trying to recreate trees and especially commits is *much* harder. So you
were lucky that it's a blob. It's quite possible that you can recreate the
thing.

			Linus
  144 			Linus