Scan fails on a Git blobless clone

To speed up the scan of our monorepo, I tried using a Git blobless clone. The problem is that, although Git supports downloading blobs on demand, JGit does not seem to support it.

For performance reasons, it would be nice if SonarCloud supported blobless clones.

  • ALM used: GitHub
  • CI system used: Azure DevOps, GitHub Actions
  • Scanner command used when applicable: sonarsource/sonarcloud-github-action@master
  • Languages of the repository: TypeScript
  • Private repo
  • Error observed:
    java.lang.IllegalStateException: org.eclipse.jgit.errors.MissingObjectException: Missing blob xxxxxxxxxxxxxxx
  • Steps to reproduce:
    • git clone --filter=blob:none
  • Potential workaround: clone without a filter
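To check whether a CI checkout is actually a blobless clone, and whether it is missing objects locally, a minimal shell sketch follows. It is not part of any Sonar tooling: the function names are hypothetical and it assumes the remote is called origin. It relies on git clone --filter recording remote.origin.promisor=true in the repository config, and on git rev-list --missing=print, which lists absent objects with a "?" prefix without downloading them.

```shell
# Hypothetical helpers (not Sonar tooling); assumes the remote is named "origin".

# Succeeds if the checkout was made with `git clone --filter=...`,
# which records remote.<name>.promisor=true in .git/config.
is_partial_clone() {
  [ "$(git config --bool --get "remote.${1:-origin}.promisor" 2>/dev/null)" = "true" ]
}

# Prints how many objects reachable from any ref are absent locally.
# --missing=print marks them with a leading "?" and does not fetch them.
count_missing_objects() {
  git rev-list --objects --all --missing=print | grep -c '^?' || true
}
```

In a CI step, something like `if is_partial_clone; then echo "missing objects: $(count_missing_objects)"; fi` before the scan makes the cause of a MissingObjectException easier to spot.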

Hey @eboureau

My (perhaps mistaken) understanding is that we now only fall back to JGit instead of using native Git (I'm still not sure how a blobless clone would affect results). Can you share a wider range of logs that show where the exception occurred?

Thanks!

Hi,

Here is the full stack trace. My assumption is that JGit does not download missing blobs on demand the way Git does.

java.lang.IllegalStateException: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: org.eclipse.jgit.errors.MissingObjectException: Missing blob d2b03953e8486d56337daaf1ca452dd1ab6499c5
	at org.sonar.scm.git.blame.FileBlamer.waitForTasks(FileBlamer.java:238)
	at org.sonar.scm.git.blame.FileBlamer.blameWithFileDiffs(FileBlamer.java:213)
	at org.sonar.scm.git.blame.FileBlamer.blameParent(FileBlamer.java:133)
	at org.sonar.scm.git.blame.BlameGenerator.process(BlameGenerator.java:149)
	at org.sonar.scm.git.blame.BlameGenerator.generateBlame(BlameGenerator.java:127)
	at org.sonar.scm.git.blame.RepositoryBlameCommand.call(RepositoryBlameCommand.java:123)
	at org.sonar.scm.git.CompositeBlameCommand.blameWithFilesGitCommand(CompositeBlameCommand.java:187)
	at org.sonar.scm.git.CompositeBlameCommand.blame(CompositeBlameCommand.java:84)
	at org.sonar.scanner.scm.ScmPublisher.publish(ScmPublisher.java:76)
	at org.sonar.scanner.scan.ProjectScanContainer.doAfterStart(ProjectScanContainer.java:152)
	at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:123)
	at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:109)
	at org.sonar.scanner.bootstrap.ScannerContainer.doAfterStart(ScannerContainer.java:399)
	at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:123)
	at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:109)
	at org.sonar.scanner.bootstrap.GlobalContainer.doAfterStart(GlobalContainer.java:127)
	at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:123)
	at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:109)
	at org.sonar.batch.bootstrapper.Batch.doExecute(Batch.java:57)
	at org.sonar.batch.bootstrapper.Batch.execute(Batch.java:51)
	at org.sonarsource.scanner.api.internal.batch.BatchIsolatedLauncher.execute(BatchIsolatedLauncher.java:46)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.sonarsource.scanner.api.internal.IsolatedLauncherProxy.invoke(IsolatedLauncherProxy.java:60)
	at jdk.proxy1/jdk.proxy1.$Proxy0.execute(Unknown Source)
	at org.sonarsource.scanner.api.EmbeddedScanner.doExecute(EmbeddedScanner.java:189)
	at org.sonarsource.scanner.api.EmbeddedScanner.execute(EmbeddedScanner.java:138)
	at org.sonarsource.scanner.cli.Main.execute(Main.java:126)
	at org.sonarsource.scanner.cli.Main.execute(Main.java:81)
	at org.sonarsource.scanner.cli.Main.main(Main.java:62)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: org.eclipse.jgit.errors.MissingObjectException: Missing blob d2b03953e8486d56337daaf1ca452dd1ab6499c5
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.sonar.scm.git.blame.FileBlamer.waitForTasks(FileBlamer.java:232)
	... 31 more
Caused by: java.lang.IllegalStateException: org.eclipse.jgit.errors.MissingObjectException: Missing blob d2b03953e8486d56337daaf1ca452dd1ab6499c5
	at org.sonar.scm.git.blame.BlobReader.loadText(BlobReader.java:63)
	at org.sonar.scm.git.blame.FileBlamer.splitBlameWithParent(FileBlamer.java:258)
	at org.sonar.scm.git.blame.FileBlamer.lambda$blameWithFileDiffs$0(FileBlamer.java:202)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing blob d2b03953e8486d56337daaf1ca452dd1ab6499c5
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:145)
	at org.sonar.scm.git.blame.BlobReader.loadText(BlobReader.java:69)
	at org.sonar.scm.git.blame.BlobReader.loadText(BlobReader.java:60)

Being able to see the part of the logs just before this would help a lot.

Here are the logs leading up to the error:

INFO: Enabled taint analysis rules: S2083, S6350, S2076, S6096, S5146, S5696, S5883, S6105, S5147, S5144, S5334, S3649, S5131, S2631, S6287
INFO: Load type hierarchy and UCFGs: Starting
INFO: Load type hierarchy: Starting
INFO: Reading type hierarchy from: /github/workspace/.scannerwork/ucfg2/js
INFO: Read 0 type definitions
INFO: Load type hierarchy: Time spent was 00:00:00.000
INFO: Load UCFGs: Starting
INFO: Load UCFGs: Time spent was 00:00:00.000
INFO: Load type hierarchy and UCFGs: Time spent was 00:00:00.001
INFO: No UCFGs have been included for analysis.
INFO: js security sensor: Time spent was 00:00:00.002
INFO: Sensor JsSecuritySensor [security] (done) | time=5ms
INFO: ------------- Run sensors on project
INFO: Sensor Zero Coverage Sensor
INFO: Sensor Zero Coverage Sensor (done) | time=6ms
INFO: SCM Publisher SCM provider for this project is: git
INFO: SCM Publisher 101 source files to be analyzed
INFO: SCM Publisher 0/101 source files have been analyzed (done) | time=1397ms
INFO: ------------------------------------------------------------------------
INFO: EXECUTION FAILURE
INFO: ------------------------------------------------------------------------
INFO: Total time: 2:36.099s
INFO: Final Memory: 18M/67M
INFO: ------------------------------------------------------------------------
ERROR: Error during SonarScanner execution
java.lang.IllegalStateException: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: org.eclipse.jgit.errors.MissingObjectException: Missing blob d2b03953e8486d56337daaf1ca452dd1ab6499c5

Thanks!

I have a feeling that if we turn on DEBUG logging (sonar-scanner -X), we’ll see this:

Native git blame failed. Falling back to jgit:

That would mean that even native Git isn’t able to retrieve the required blame information when using a blobless clone.
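Checking a saved debug log for that fallback only needs a grep. A minimal sketch, assuming the scanner output was captured to a file (e.g. sonar-scanner -X > scan.log 2>&1); the function name is hypothetical, and the message text is the one quoted above, which may differ between scanner versions.

```shell
# Hypothetical helper: succeeds if the debug log shows the native-to-JGit fallback.
# The message text is taken from this thread and may vary by scanner version.
fell_back_to_jgit() {
  grep -q -F "Native git blame failed" "$1"
}
```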

However, oddly enough, I tried an analysis of the following repo:

git clone --filter=blob:none https://github.com/SonarSource/sonar-scanning-examples

and had no problem executing a scan, including blame data. Am I missing a step?

I think the issue only occurs when the branches have diverged, meaning that computing the diff requires an old blob. This may be why it is not that easy to reproduce.
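To illustrate why older history matters, here is a self-contained sketch that builds a throwaway repository with two versions of one file, makes a blobless clone of it over file://, and counts the objects the clone lacks. The checkout only downloads blobs for HEAD, so earlier versions of the file stay missing, which is exactly what a blame walking back through history trips over. The function name and file contents are made up for illustration; uploadpack.allowFilter and uploadpack.allowAnySHA1InWant are set so the local "server" accepts filtered and fetch-by-ID requests.

```shell
# Hypothetical demo; file names and contents are made up.
# Prints how many objects a fresh blobless clone is missing.
demo_blobless_history() {
  work=$(mktemp -d)
  git init -q "$work/src"
  (
    cd "$work/src" || exit 1
    echo "v1" > app.ts
    git add app.ts
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "v1"
    echo "v2" > app.ts
    git -c user.name=demo -c user.email=demo@example.com commit -q -am "v2"
    # Let this local "server" honour --filter and fetch-by-object-ID requests.
    git config uploadpack.allowFilter true
    git config uploadpack.allowAnySHA1InWant true
  )
  # file:// is required: --filter is ignored for plain local-path clones.
  git clone -q --filter=blob:none "file://$work/src" "$work/clone"
  # Checkout downloaded the blob for HEAD (v2) only; v1's blob stays missing.
  git -C "$work/clone" rev-list --objects --all --missing=print | grep -c '^?' || true
  rm -rf "$work"
}
```

Running demo_blobless_history should print a non-zero count; the missing objects are older blobs, here the first version of app.ts.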

Is there any solution for this type of issue, @Colin? We’re seeing something similar with some .NET scans on SonarCloud.


Hi folks,

I have been able to reproduce the issue. JGit (the library we use to collect blame) doesn’t support on-demand loading of blobs. FYI, I created a feature request, but I don’t think this is our best solution anyway.
The SonarCloud scanner needs to blame all files in order to properly date and assign issues.
The native Git CLI is able to load blobs on demand, but at the cost of a huge performance penalty. That might be OK when blaming only a few files, but when many files are involved, I am not sure the time saved during the clone will make up for it.

So I suggest you disable partial cloning in your CI for the task running the SonarCloud analysis.

Alternatively, you can force the SonarCloud scanner to use the native Git CLI, but again, I am pretty sure performance will be bad. To do that, you should pass the property sonar.scm.use.blame.algorithm=GIT_NATIVE_BLAME to the scanner.
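If you want the slower native blame only when it is actually needed, a small wrapper can add the property above just for partial clones. This is a hypothetical helper, assuming the remote is named origin and that the scanner is invoked as sonar-scanner, which accepts -Dkey=value properties on the command line.

```shell
# Hypothetical helper; assumes the remote is named "origin".
# Prints extra scanner arguments: native Git blame for partial clones, nothing otherwise.
sonar_scm_args() {
  if [ "$(git config --bool --get remote.origin.promisor 2>/dev/null)" = "true" ]; then
    # Native Git can fetch missing blobs on demand (slowly); JGit cannot.
    printf '%s\n' "-Dsonar.scm.use.blame.algorithm=GIT_NATIVE_BLAME"
  fi
}
```

Usage: sonar-scanner $(sonar_scm_args). The substitution is left unquoted on purpose so that a full clone adds no argument at all.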


Awesome, thank you for the help @Julien_HENRY - we’re going to give that a try!

Thanks a lot, I’ll give it a try.

We have this problem on CircleCI. Our current workaround is to add a step before calling the Sonar plugin:

        git config remote.origin.partialclonefilter ""
        git config remote.origin.promisor ""
        git fetch --refetch

If you’re running into problems with this on CircleCI specifically, they can disable blobless clones for your org/projects. However, this has to be requested by email to one of their employees. You can learn more here: [Product Update] Speeding up code checkout - #10 by labsquared - Build Environment - CircleCI Discuss

You might be able to do it in a single command (tested with Git 2.43.2):

git fetch --no-filter --refetch

I would like SonarScanner not to require git blame at all; this doesn’t fit well with common CI practices.

If git blame is only needed to date and assign issues, could that functionality be made optional?