Gerrit Code Review - Unpack error Missing unknown (Continued)
Several months ago I blogged about an issue I encountered administering the Gerrit Code Review system, version 2.3. Sadly, we had a recurrence today. This time, we had to take a different approach to solve it.
The problem manifested in much the same way. All of our Gerrit users began seeing the following error message when pushing to Gerrit:
fatal: Unpack error, check server log
remote: Resolving deltas: 100% (4/4)
error: unpack failed: error Missing unknown 98aab6fe33971a17b1bfb5a5288070e14f166b79
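The only useful detail in that output is the object ID at the end of the last line. As a small sketch (the function name and the regular expression are my own, not anything Gerrit provides), this is one way to pull the ID out of captured push output so it can be searched for in the database or in a developer's checkout:

```python
import re
from typing import List

# Matches the object ID in lines such as
# "error: unpack failed: error Missing unknown <40 hex characters>".
MISSING_OBJECT = re.compile(r"Missing unknown ([0-9a-f]{40})")


def missing_object_ids(push_output: str) -> List[str]:
    """Return the distinct object IDs reported as missing, in order of appearance."""
    seen: List[str] = []
    for sha in MISSING_OBJECT.findall(push_output):
        if sha not in seen:
            seen.append(sha)
    return seen


output = """fatal: Unpack error, check server log
remote: Resolving deltas: 100% (4/4)
error: unpack failed: error Missing unknown 98aab6fe33971a17b1bfb5a5288070e14f166b79"""

print(missing_object_ids(output))
```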
The difference from last time is that this Gerrit installation is not the "owner" of the repository. We use gitosis to manage all of our repositories and keep them replicated across several servers via post-receive hooks. This setup works great for our use case, until it comes to Gerrit. We do not require that all code be reviewed, so not all code gets merged through Gerrit, and Gerrit's copy can fall behind the real repository. To fit Gerrit into our system configuration, we host one of our replicated repositories on the same server that hosts Gerrit. That repository has a special post-receive hook that pushes with --mirror and --force to the checkout that Gerrit uses. This keeps Gerrit up to date with merges happening outside its control.
At some point this afternoon a commit related to an unmerged patch set disappeared from the Gerrit repository. The result was the error message pasted above for anyone trying to push new patch sets into Gerrit.
In this situation, we had no backups of the Gerrit repository. Digging into the Gerrit database, I found a reference to the commit in the patch_sets table. The change_id led me to the commit's author. After SSH-ing to his machine and inspecting his git checkout, I was able to track down the missing object. Using git cat-file -t 98aab6fe33971a17b1bfb5a5288070e14f166b79, I identified the object as a commit. When I asked him about it, he confirmed it was the very commit that he had in code review.
I deleted the record from patch_sets with this statement: DELETE FROM patch_sets WHERE revision = '98aab6fe33971a17b1bfb5a5288070e14f166b79' LIMIT 1;
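Editing Gerrit's tables by hand is risky, and not every SQL dialect accepts LIMIT on a DELETE. The sketch below shows a more cautious version of the same step: look at what matches first, and only delete when exactly one row does. It uses an in-memory SQLite database purely as a stand-in so the example runs anywhere. The three-column schema is a simplification, the change_id value 101 and the first revision are made-up placeholders, and the helper name is my own.

```python
import sqlite3


def delete_patch_set_by_revision(conn, revision):
    """Delete the patch_sets row for `revision` only if exactly one row matches.

    Returns the (change_id, patch_set_id) pair of the deleted row.
    """
    cur = conn.cursor()
    # "?" placeholders are SQLite's style; other drivers may use "%s".
    cur.execute(
        "SELECT change_id, patch_set_id FROM patch_sets WHERE revision = ?",
        (revision,),
    )
    rows = cur.fetchall()
    if len(rows) != 1:
        raise RuntimeError(f"expected exactly 1 matching row, found {len(rows)}")
    cur.execute("DELETE FROM patch_sets WHERE revision = ?", (revision,))
    conn.commit()
    return rows[0]


# Stand-in database with a simplified, hypothetical schema and placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patch_sets (change_id INTEGER, patch_set_id INTEGER, revision TEXT)"
)
conn.executemany(
    "INSERT INTO patch_sets VALUES (?, ?, ?)",
    [
        (101, 1, "1111111111111111111111111111111111111111"),
        (101, 2, "98aab6fe33971a17b1bfb5a5288070e14f166b79"),
    ],
)
conn.commit()

deleted = delete_patch_set_by_revision(
    conn, "98aab6fe33971a17b1bfb5a5288070e14f166b79"
)
print(deleted)
```

Recording the returned change_id and patch_set_id before moving on would also have made the later cleanup easier to reason about.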
We had him push to Gerrit again. The push was successful and I observed the record was recreated in the patch_sets table. At this point all other pushes from other users began functioning as expected.
Our problems weren't over yet. He was unable to view his review through the Web UI; instead, Gerrit showed "Internal Server Error". The following error was showing up in the error_log:
[2013-04-12 19:16:01,526] WARN / : Error in changeDetail
java.lang.NullPointerException
at com.google.gerrit.server.project.ChangeControl.isPatchVisible(ChangeControl.java:177)
We identified that the patch_set_id numbers were out of order in the patch_sets table for his change_id. We updated them to be in order, but then the change detail page loaded with no content. So far, we've been unable to fix that problem. Instead, we worked around it by generating a new Change-Id and pushing that as a new review into Gerrit, which worked as expected.
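For anyone checking their own database for the same symptom, here is a rough sketch of the kind of ordering check we did by eye. It assumes that "in order" means the patch_set_id values for one change run 1..N without gaps and increase with creation time; that is my interpretation, not a documented Gerrit invariant. The function name, the (patch_set_id, created_on) row shape, and the sample timestamps are all illustrative.

```python
def ids_in_creation_order(rows):
    """Check patch set numbering for a single change.

    `rows` is an iterable of (patch_set_id, created_on) pairs. Returns True
    when the IDs, sorted by creation time, are exactly 1, 2, ..., N.
    """
    by_time = sorted(rows, key=lambda row: row[1])
    ids = [patch_set_id for patch_set_id, _ in by_time]
    return ids == list(range(1, len(ids) + 1))


# Illustrative rows only; timestamps are ISO-style strings so they sort correctly.
healthy = [(1, "2013-04-10 09:00:00"), (2, "2013-04-11 14:30:00")]
swapped = [(2, "2013-04-10 09:00:00"), (1, "2013-04-11 14:30:00")]
gapped = [(1, "2013-04-10 09:00:00"), (3, "2013-04-11 14:30:00")]

print(ids_in_creation_order(healthy))
print(ids_in_creation_order(swapped))
print(ids_in_creation_order(gapped))
```

As our experience shows, getting this check to pass was not enough to make the change detail page work again, so treat it as a diagnostic only.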
So, two questions remain unanswered:
- How did we get into this situation?
- How can we fix the problem review?
UPDATE April 15th, 2013
We have tracked the problem with the missing patch set commit object down to a race condition between Gerrit writing the commit to the repository and our external gitosis post-receive hooks pushing an independent commit at the same time. My assumption is that internally JGit is not handling lock files correctly, or is getting confused with file descriptors. Read more at https://www.google.com/search?q=jgit+gerrit+broken
We've fixed this by removing the --mirror flag from our post-receive hook and instead specifying just refs/heads/master. This should reduce the probability of race conditions on lock files.
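To make the change concrete, here is a sketch of the two push commands expressed as argument lists. Our real hook is not written this way; the function name and the target path are placeholders, and keeping --force in the new command is an assumption based on only --mirror having been removed. Note that --mirror covers every ref under refs/, while the explicit refspec touches only master.

```python
def push_command(target, mirror):
    """Build the git push argument list a post-receive hook might run."""
    if mirror:
        # Old behaviour: mirror every ref under refs/ to the target.
        return ["git", "push", "--mirror", "--force", target]
    # New behaviour: update only the master branch on the target.
    return ["git", "push", "--force", target, "refs/heads/master"]


# Placeholder path for the repository that Gerrit serves.
GERRIT_REPO = "/path/to/gerrit/git/project.git"

old_cmd = push_command(GERRIT_REPO, mirror=True)
new_cmd = push_command(GERRIT_REPO, mirror=False)
print(" ".join(old_cmd))
print(" ".join(new_cmd))
```

A hook could hand either list to subprocess.run(cmd, check=True). Building the command as a list rather than a shell string avoids quoting problems if the target path ever contains spaces.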
After much fiddling, I was unable to restore UI access to the change review in question. I surrendered to Gerrit and abandoned the change over the SSH interface.