Preprocessing files is taking more than 1 hour

Must-share information (formatted with Markdown):

  • which versions are you using (SonarQube, Scanner, Plugin, and any relevant extension) - SonarQube 10.6.0.92116 Community Edition, SonarScanner CLI 6.1.0.4477
  • how is SonarQube deployed: zip, Docker, Helm - Helm
  • what are you trying to achieve - Incremental scans for Gosu Plugin
  • what have you tried so far to achieve this - Disabled SCM Sensor


09:02:56.842 INFO  Scanner configuration file: C:\Users\sbaradia\Downloads\sonar-scanner-6.1.0.4477-windows-x64\bin\..\conf\sonar-scanner.properties
09:02:56.851 INFO  Project root configuration file: NONE
09:02:56.870 INFO  SonarScanner CLI 6.1.0.4477
09:02:56.873 INFO  Java 17.0.11 Eclipse Adoptium (64-bit)
09:02:56.874 INFO  Windows 10 10.0 amd64
09:02:56.899 INFO  User cache: C:\Users\sbaradia\.sonar\cache
09:02:57.660 INFO  JRE provisioning: os[windows], arch[amd64]
09:02:58.226 INFO  Communicating with SonarQube Server 10.6.0.92116
09:02:59.036 INFO  Starting SonarScanner Engine...
09:02:59.037 INFO  Java 17.0.11 Eclipse Adoptium (64-bit)
09:03:00.214 INFO  Load global settings
09:03:00.338 INFO  Load global settings (done) | time=123ms
09:03:00.344 INFO  Server id: 147B411E-AZHC-J_JOkzgHrx9qbkd
09:03:00.361 INFO  Loading required plugins
09:03:00.362 INFO  Load plugins index
09:03:00.377 INFO  Load plugins index (done) | time=15ms
09:03:00.382 INFO  Load/download plugins
09:03:00.474 INFO  Load/download plugins (done) | time=97ms
09:03:00.876 INFO  Process project properties
09:03:00.887 INFO  Process project properties (done) | time=11ms
09:03:01.690 INFO  Project key: GPC
09:03:01.692 INFO  Base dir: C:\GitRepo\Guidewire\policycenter
09:03:01.697 INFO  Working dir: C:\GitRepo\Guidewire\policycenter\.scannerwork
09:03:01.704 INFO  Load project settings for component key: 'GPC'
09:03:01.729 INFO  Load project settings for component key: 'GPC' (done) | time=25ms
09:03:01.771 INFO  Load quality profiles
09:03:01.874 INFO  Load quality profiles (done) | time=104ms
09:03:01.909 INFO  Load active rules
09:03:09.455 INFO  Load active rules (done) | time=7545ms
09:03:09.461 INFO  Load analysis cache
09:03:09.479 INFO  Load analysis cache (2.0 kB) | time=19ms
09:03:09.526 WARN  The property 'sonar.login' is deprecated and will be removed in the future. Please use the 'sonar.token' property instead when passing a token.
09:03:09.548 INFO  Preprocessing files...
09:03:19.553 INFO
09:03:29.564 INFO
09:03:39.575 INFO
09:03:49.587 INFO
09:03:59.601 INFO
09:04:09.607 INFO
09:04:19.614 INFO
09:04:29.617 INFO
09:04:39.631 INFO
09:04:49.635 INFO
09:04:59.647 INFO
09:05:09.653 INFO
09:05:19.662 INFO
09:05:29.666 INFO
09:05:39.679 INFO
09:05:49.683 INFO
09:05:59.692 INFO
09:06:09.701 INFO
09:06:19.704 INFO
09:06:29.711 INFO
.....
....
10:03:09.446 INFO  1 language detected in 1 preprocessed file
10:03:09.447 INFO  515492 files ignored because of inclusion/exclusion patterns
10:03:09.451 INFO  Loading plugins for detected languages
10:03:09.451 INFO  Load/download plugins
10:03:09.452 INFO  Load/download plugins (done) | time=0ms
10:03:09.510 INFO  Load project repositories
10:03:09.539 INFO  Load project repositories (done) | time=28ms
10:03:09.555 INFO  Indexing files...
10:03:09.555 INFO  Project configuration:
10:03:09.556 INFO    Included sources: modules/configuration/gsrc/com/wrberkley/document/PrintUtilServiceImpl.gs
10:03:09.566 INFO  1 file indexed
10:03:09.567 INFO  Quality profile for gosu: Sonar way
10:03:09.567 INFO  ------------- Run sensors on module GPC
10:03:09.621 INFO  Load metrics repository
10:03:09.645 INFO  Load metrics repository (done) | time=25ms
10:03:10.136 INFO  Sensor JaCoCo XML Report Importer [jacoco]
10:03:10.137 INFO  'sonar.coverage.jacoco.xmlReportPaths' is not defined. Using default locations: target/site/jacoco/jacoco.xml,target/site/jacoco-it/jacoco.xml,build/reports/jacoco/test/jacocoTestReport.xml
10:03:10.140 INFO  No report imported, no coverage information will be imported by JaCoCo XML Report Importer
10:03:10.141 INFO  Sensor JaCoCo XML Report Importer [jacoco] (done) | time=3ms
10:03:10.141 INFO  Sensor Gosu Sensor [communitygosu]
10:03:10.141 INFO  isCacheEnabled(): true
10:03:10.143 INFO  1 source file to be analyzed
10:03:10.451 INFO  Reflections took 79 ms to scan 1 urls, producing 4 keys and 58 values
10:03:10.607 INFO  Reflections took 11 ms to scan 1 urls, producing 2 keys and 6 values
10:03:11.631 INFO  1/1 source file has been analyzed
10:03:11.631 INFO  Sensor Gosu Sensor [communitygosu] (done) | time=1493ms
10:03:11.635 INFO  Sensor Java Config Sensor [iac]
10:03:11.639 INFO  0 source files to be analyzed
10:03:11.651 INFO  0/0 source files have been analyzed
10:03:11.651 INFO  Sensor Java Config Sensor [iac] (done) | time=19ms
10:03:11.652 INFO  Sensor IaC Docker Sensor [iac]
10:03:11.653 INFO  0 source files to be analyzed
10:03:11.747 INFO  0/0 source files have been analyzed
10:03:11.747 INFO  Sensor IaC Docker Sensor [iac] (done) | time=95ms
10:03:11.751 INFO  Sensor TextAndSecretsSensor [text]
10:03:11.751 INFO  Available processors: 16
10:03:11.752 INFO  Using 16 threads for analysis.
10:03:12.296 INFO  The property "sonar.tests" is not set. To improve the analysis accuracy, we categorize a file as a test file if any of the following is true:
  * The filename starts with "test"
  * The filename contains "test." or "tests."
  * Any directory in the file path is named: "doc", "docs", "test" or "tests"
  * Any directory in the file path has a name ending in "test" or "tests"

10:03:13.567 INFO  Using git CLI to retrieve untracked files
10:03:20.424 INFO  Analyzing language associated files and files included via "sonar.text.inclusions" that are tracked by git
10:03:20.435 INFO  1 source file to be analyzed
10:03:20.460 INFO  1/1 source file has been analyzed
10:03:20.460 INFO  Sensor TextAndSecretsSensor [text] (done) | time=8714ms
10:03:20.468 INFO  ------------- Run sensors on project
10:03:20.506 INFO  Sensor Zero Coverage Sensor
10:03:20.515 INFO  Sensor Zero Coverage Sensor (done) | time=9ms
10:03:20.516 INFO  SCM Publisher is disabled
10:03:20.520 INFO  CPD Executor Calculating CPD for 1 file
10:03:20.532 INFO  CPD Executor CPD calculation finished (done) | time=11ms
10:03:21.139 INFO  Analysis report generated in 97ms, dir size=208.9 kB
10:03:21.208 INFO  Analysis report compressed in 66ms, zip size=24.2 kB
10:03:21.262 INFO  Analysis report uploaded in 54ms
10:03:21.268 INFO  ANALYSIS SUCCESSFUL, you can find the results at: http://localhost:9000/dashboard?id=GPC
10:03:21.268 INFO  Note that you will be able to access the updated dashboard once the server has processed the submitted analysis report
10:03:21.270 INFO  More about the report processing at http://localhost:9000/api/ce/task?id=b62f74e1-88f8-4282-934f-3f80062e6059
10:03:21.302 INFO  Analysis total time: 1:00:20.733 s
10:03:21.306 INFO  SonarScanner Engine completed successfully
10:03:21.584 INFO  EXECUTION SUCCESS
10:03:21.585 INFO  Total time: 1:00:24.747s

The "Preprocessing files..." step keeps running for almost 1 hour, and it takes the same amount of time on every scan.

Hi,

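Thanks for the log. This is, IMO, the interesting part:

10:03:09.446 INFO  1 language detected in 1 preprocessed file
10:03:09.447 INFO  515492 files ignored because of inclusion/exclusion patterns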

Is it intended that you only analyze one file? If so, you’ll save a lot of cycles and time if you just narrow the sonar.sources value to that file. Otherwise, the scanner iterates over every file provided by sonar.sources (if that is a folder, it walks the contents of the folder), and for each file the inclusion/exclusion patterns are evaluated to decide whether the file is indexed or not.

So you’re spending that hour to consider and then exclude half a million files.
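To make the cost concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not the scanner's actual implementation: the `preprocess` function, the directory layout, and the file names are all hypothetical, and it only shows that every file under the source root is visited before a pattern can exclude it.

```python
import fnmatch
import os
import tempfile


def preprocess(source_root: str, inclusion: str) -> tuple[int, int]:
    """Walk every file under source_root and test it against one inclusion pattern.

    Returns (files_visited, files_indexed). Toy model only: the real scanner
    does more work per file, which is why half a million files add up.
    """
    visited = indexed = 0
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            visited += 1  # the file is considered even if it ends up excluded
            rel = os.path.relpath(os.path.join(dirpath, name), source_root)
            if fnmatch.fnmatch(rel.replace(os.sep, "/"), inclusion):
                indexed += 1
    return visited, indexed


def demo() -> tuple[tuple[int, int], tuple[int, int]]:
    """Compare a broad source root with a narrowed one on a throwaway tree."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: one file of interest among 200 unrelated files.
        target_dir = os.path.join(root, "gsrc", "doc")
        os.makedirs(target_dir)
        for i in range(200):
            open(os.path.join(root, f"other_{i}.txt"), "w").close()
        open(os.path.join(target_dir, "PrintUtil.gs"), "w").close()

        broad = preprocess(root, "gsrc/doc/PrintUtil.gs")  # whole tree as sources
        narrow = preprocess(target_dir, "PrintUtil.gs")    # narrowed sources
        return broad, narrow


if __name__ == "__main__":
    broad, narrow = demo()
    print("broad  (visited, indexed):", broad)
    print("narrow (visited, indexed):", narrow)
```

Both runs index the same single file, but the broad run has to visit all 201 files first, while the narrowed run visits only one.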

 
HTH,
Ann

I’m working on extending the Gosu community plugin to perform incremental scans for PR analysis. Currently, the only method I’ve found to accomplish this, due to limited documentation, is by using a git diff and utilizing the diff files as inclusions. I’m looking for a better approach or any helpful documentation on how a custom plugin should implement readcache and writecache.

Hi,

The initial complaint was that analysis pre-processing takes >1h. I think that’s unrelated to whether it’s branches or PRs, Java or Gosu.

If you want a shorter pre-processing time, then narrow down your sonar.sources definition.

 
HTH,
Ann

So, to clarify, will the pre-processing time be longer due to the larger number of files in the sources folder, regardless of any analysis cache implementation? Will it also take the same amount of time for consecutive scans?

Hi,

This has nothing to do with analysis caching. Now, I think it’s possible that in the context of an actual PR analysis, an initial SCM scan may narrow the list of files to consider, but I don’t have a PR analysis log in front of me to double-check that.

 
Ann

Here are the logs for PR analysis that I am triggering from my .bat file.

C:\GitRepo>runsonar.bat
14:37:09.480 INFO Scanner configuration file: C:\Users\sbaradia\Downloads\sonar-scanner-6.1.0.4477-windows-x64\bin\..\conf\sonar-scanner.properties
14:37:09.487 INFO Project root configuration file: NONE
14:37:09.500 INFO SonarScanner CLI 6.1.0.4477
14:37:09.503 INFO Java 17.0.11 Eclipse Adoptium (64-bit)
14:37:09.503 INFO Windows 10 10.0 amd64
14:37:09.523 INFO User cache: C:\Users\sbaradia\.sonar\cache
14:37:10.331 INFO JRE provisioning: os[windows], arch[amd64]
14:37:11.054 INFO Communicating with SonarQube Server 10.6.0.92116
14:37:11.808 INFO Starting SonarScanner Engine...
14:37:11.809 INFO Java 17.0.11 Eclipse Adoptium (64-bit)
14:37:12.741 INFO Load global settings
14:37:12.875 INFO Load global settings (done) | time=135ms
14:37:12.881 INFO Server id: EA8D9556-AZFrJSAjQnr8L7-yvvgW
14:37:12.893 INFO Loading required plugins
14:37:12.893 INFO Load plugins index
14:37:12.930 INFO Load plugins index (done) | time=36ms
14:37:12.931 INFO Load/download plugins
14:37:13.044 INFO Load/download plugins (done) | time=114ms
14:37:13.145 INFO Loaded core extensions: developer-scanner
14:37:13.396 INFO Process project properties
14:37:13.408 INFO Process project properties (done) | time=11ms
14:37:13.822 INFO Project key: GPC
14:37:13.822 INFO Base dir: C:\GitRepo\Guidewire\policycenter
14:37:13.826 INFO Working dir: C:\GitRepo\Guidewire\policycenter\.scannerwork
14:37:13.833 INFO Load project settings for component key: 'GPC'
14:37:13.954 INFO Load project settings for component key: 'GPC' (done) | time=121ms
14:37:13.978 INFO Load project branches
14:37:14.124 INFO Load project branches (done) | time=146ms
14:37:14.124 INFO Load branch configuration
14:37:14.128 INFO Found manual configuration of branch/PR analysis. Skipping automatic configuration.
14:37:14.129 INFO Load branch configuration (done) | time=2ms
14:37:14.143 INFO Load quality profiles
14:37:14.330 INFO Load quality profiles (done) | time=186ms
14:37:14.367 INFO Load active rules
14:37:20.578 INFO Load active rules (done) | time=6210ms
14:37:20.584 INFO Load analysis cache
14:37:20.655 INFO Load analysis cache (404) | time=72ms
14:37:20.704 INFO Pull request 9576 for merge into master from feature/SQ_POC
14:37:20.705 WARN The property 'sonar.login' is deprecated and will be removed in the future. Please use the 'sonar.token' property instead when passing a token.
14:37:20.732 INFO Preprocessing files...
14:37:30.746 INFO
14:37:40.752 INFO

Hi,

I don’t see that initial narrowing in this log.

 
Ann

And how does that function? Is it the responsibility of a custom plugin? If so, is there any documentation that I can refer to?

Hi,

No. It’s the responsibility of the Scanner. You can’t change how this works. Just narrow sonar.sources.

 
Ann

Hi Ann,

Thank you for your response. I’m still confused about how to narrow down sonar.sources. Could you provide an example to help me understand better? Do you mean running git diff and assigning its output to sonar.sources?

Hi,

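Let’s go back to the log:

10:03:09.446 INFO  1 language detected in 1 preprocessed file
10:03:09.447 INFO  515492 files ignored because of inclusion/exclusion patterns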

There are 515,493 files in the directory. Of those 515,493 files, all but one are excluded from analysis based on the inclusion/exclusion patterns you’ve set.

So… get the path to that one remaining file, and set it as the value of sonar.sources.
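For the git diff approach mentioned earlier in the thread, a wrapper script could build the sonar.sources value from the changed files instead of passing them as inclusions. The following is only a sketch under several assumptions: the base ref `origin/master`, the `.gs` suffix filter, and the helper names `changed_files` and `sources_argument` are illustrative, not something the scanner or the Gosu plugin provides.

```python
import subprocess


def changed_files(base: str = "origin/master") -> list[str]:
    """List files added, copied, modified or renamed between base and HEAD.

    Assumption: the script runs from the repository root and the base ref
    exists locally. Deleted files are left out so no missing path is passed on.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def sources_argument(paths: list[str], suffixes: tuple[str, ...] = (".gs",)) -> str:
    """Build a -Dsonar.sources=... argument from the paths worth analyzing.

    Assumption: only files with the given suffixes are of interest; adjust
    the tuple to whatever the plugin actually analyzes.
    """
    kept = [p for p in paths if p.endswith(suffixes)]
    if not kept:
        raise ValueError("no matching files changed; nothing to analyze")
    return "-Dsonar.sources=" + ",".join(kept)


if __name__ == "__main__":
    # Print the argument so a .bat or shell wrapper can append it to the
    # sonar-scanner command line.
    print(sources_argument(changed_files()))
```

With the single file from the log above, `sources_argument` would return `-Dsonar.sources=modules/configuration/gsrc/com/wrberkley/document/PrintUtilServiceImpl.gs`, so the scanner only has that one path to preprocess.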

 
HTH,
Ann

Perfect, that makes sense. Thank you so much. Just out of curiosity, is this a workaround or is this how it should be used in general?

Hi,

It’s always best to narrow sonar.sources as much as makes sense.

Let’s say you’ve got .ignore.java files scattered throughout your project. That’s a great time to use an exclusion.

Or if you only want to analyze the few .php files scattered among .ts files in the directory - inclusion! (Although that’s a bad example because I don’t know why you would want to omit the .ts files, but anyway…)

So you can use inclusions and exclusions to fine tune what’s included in analysis. But in general, if you can avoid the need to fine tune, then you should.
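As an illustration of how such patterns pick files, here is a small Python approximation of the wildcard matching used for inclusions and exclusions (`**` for any number of directories, `*` for any characters within one path segment, `?` for a single character). It is a sketch for reasoning about patterns, not the scanner's own matcher, and the function names and example paths are made up.

```python
import re


def ant_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard path pattern into a compiled regular expression."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including path separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")       # exactly one character, not a separator
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches(pattern: str, path: str) -> bool:
    """Return True if the forward-slash path matches the whole pattern."""
    return ant_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    # Exclusion example: .ignore.java files scattered through the project.
    print(matches("**/*.ignore.java", "src/main/Foo.ignore.java"))
    # Inclusion example: the few .php files among .ts files.
    print(matches("**/*.php", "web/admin/index.php"))
    print(matches("**/*.php", "web/admin/app.ts"))
```

Even with a pattern that matches very few files, every file under sonar.sources still has to be tested against it, which is why narrowing sonar.sources comes first and patterns are for fine tuning.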

 
HTH,
Ann

Thank you very much, @ganncamp! I will implement your suggestion and reach out if I have more questions.
