We are running a fairly large project consisting of Angular & PHP on the frontend and Java/PLSQL on the backend, with a CI server running pull request builds for every feature/bug branch.
We recently set up a backend Gradle build cache server, configured our builds to utilize it fully, and saw our build times decrease significantly.
E.g. before, a clean build took around 50 minutes including the Sonar analysis (around 12 min). After the backend cache was enabled, a clean build (without any changes) takes 3 minutes plus the 12 minutes of Sonar. Of course, during normal activity changes occur in the code as well, so the average build time is around 20 minutes including Sonar.
But it occurred to me that if we could get such a decrease in the builds with the Gradle cache, it should be possible to achieve a similar decrease in the Sonar analysis as well. In a pull request we seldom change all parts at once; often it's just a couple of lines changed to fix a bug somewhere in the system. E.g. the output of the Sonar analysis would be the same for most of the code.
Shouldn't it be possible to implement a similar build cache, to start with per sensor? That way a change in the Java files would only require the Java sensors to re-run, whereas the other sensors could just grab the outputs they need from the cache. I like the build cache concept of hashing all the source files used and tying that hash to a specific output. (Since you have rules etc. that could change between runs even though no source code has changed, you could perhaps generate a specific settings file together with the sources to build up the hash that determines when something from the cache can be used.) And by having a backend server hosting the cache, it would be available to all build agents regardless of which server/container they are running in.
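To illustrate the key-building idea above, here is a minimal, hypothetical sketch (not Sonar's actual implementation): it hashes every source file's path and content together with a settings file that stands in for the active rule configuration, so a rule change invalidates the key even when no source code has changed. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SensorCacheKey {

    // Hash all source files (path + content) plus a settings file capturing
    // the active rules, producing a key a cache lookup could use.
    static String cacheKey(Path sourceDir, Path settingsFile) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        List<Path> inputs;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            // Sort for a deterministic key regardless of traversal order.
            inputs = new ArrayList<>(
                walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList()));
        }
        inputs.add(settingsFile); // rule/settings changes must also invalidate the key
        for (Path p : inputs) {
            digest.update(p.toString().getBytes(StandardCharsets.UTF_8));
            digest.update(Files.readAllBytes(p));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("backend-java");
        Files.writeString(dir.resolve("Foo.java"), "class Foo {}");
        Path settings = Files.createTempFile("sonar-settings", ".txt");
        Files.writeString(settings, "rules=v1");

        String key1 = cacheKey(dir, settings);
        String key2 = cacheKey(dir, settings);   // same inputs -> same key
        Files.writeString(settings, "rules=v2"); // rule change only, no code change
        String key3 = cacheKey(dir, settings);
        System.out.println(key1.equals(key2));   // true
        System.out.println(key1.equals(key3));   // false
    }
}
```

The same key could then be looked up against a shared backend cache server, exactly as the Gradle remote build cache does.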
Executing the sensors in parallel would also be desirable, although I think most of the gain would come from a cache.
E.g. in our project we run on average 100 Sonar analyses a day, each taking 12 minutes. I believe a cache would drop the analysis to maybe 4 minutes on average, which would save roughly 100 × 8 = 800 minutes of build-agent time per day.
Are you on Community Edition or a commercial edition? Because as of 9.9 there's a caching mechanism in place to speed up PR analysis. (I believe we're looking at extending that to full analysis, but I don't remember the details.)
Hi,
We are on a commercial edition and we have seen some good improvements in the PR analysis as well. But it's a different need/approach, I think: the PR analysis cache makes it possible to cache some parts of an analysis even when there have been changes within that sensor's area.
What I am suggesting is caching a sensor's complete run when no changes at all have been made to its inputs. A typical Sonar execution for us runs about 10 different sensors, but only one or two of them actually have source code changes in the pull request. A push with a minor change during the pull request review triggers another full run. And later on, when the release/master branch is built again (with the exact same source code), the Sonar scan runs once more. E.g. a lot of time could probably be saved.
Thanks for the followup. To be sure I understand, what you’re after is skipping sensors that don’t need to re-run because the limited code changes mean they don’t apply?
We looked last year at being able to skip sensors that aren’t relevant to the project, but in the end we found it was less effort to just allow each sensor to start up and say “nothing to do here”.
No, the sensors are still useful, but their output has already been produced by a previous run and could be reused instead of being generated again, which is costly. Maybe I didn't explain it well enough.
Suppose we have a source repository containing the following parts/directories:
frontend-angular
frontend-jsp
backend-java
backend-plsql
On the build server the master branch is built for the first time, and the Sonar analysis of the whole project runs, taking 12 min in total:
frontend-angular: 3min
frontend-jsp: 3 min
backend-java: 3 min
backend-plsql: 3 min
If you had a cache, the output of every sensor is now cached (with a hash of all the source/input files as the key).
Later on, a developer creates a bug fix branch to fix a problem in backend-java and opens a pull request, which triggers a new build on the build server, including a Sonar run.
This time, however, the results from 3 sensors can be fetched from the cache (since no source code changed there, the sensor output would be the same):
frontend-angular: FROM-CACHE
frontend-jsp: FROM-CACHE
backend-java: 3 min
backend-plsql: FROM-CACHE
The developer fixes a bug during the pull request review, pushes the changed code, and a new Sonar run is executed:
frontend-angular: FROM-CACHE
frontend-jsp: FROM-CACHE
backend-java: 3 min
backend-plsql: FROM-CACHE
And when the pull request is merged to the master branch, a new Sonar run is done, and this time everything can be fetched from the cache:
frontend-angular: FROM-CACHE
frontend-jsp: FROM-CACHE
backend-java: FROM-CACHE
backend-plsql: FROM-CACHE
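The per-module lookup flow above can be sketched as follows. This is a toy simulation under my own assumptions (the class, the in-memory map standing in for a remote cache server, and the hash strings are all hypothetical): each module's sensor run is keyed by its input hash, a hit returns the stored output, and only modules whose hash changed actually re-run.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SensorRunSimulation {

    // Stand-in for a shared backend cache server: key -> sensor output.
    static final Map<String, String> remoteCache = new HashMap<>();

    static String runOrFetch(String module, String inputHash) {
        String key = module + ":" + inputHash;
        String cached = remoteCache.get(key);
        if (cached != null) {
            System.out.println(module + ": FROM-CACHE");
            return cached;
        }
        System.out.println(module + ": 3 min (sensor executed)");
        String output = "report-for-" + key; // stand-in for the real sensor report
        remoteCache.put(key, output);
        return output;
    }

    public static void main(String[] args) {
        // First master build: every sensor runs and populates the cache.
        for (String m : List.of("frontend-angular", "frontend-jsp",
                                "backend-java", "backend-plsql"))
            runOrFetch(m, "v1");

        // Bug-fix PR: only backend-java's inputs changed, so only it re-runs.
        runOrFetch("frontend-angular", "v1");
        runOrFetch("frontend-jsp", "v1");
        runOrFetch("backend-java", "v2");
        runOrFetch("backend-plsql", "v1");
    }
}
```

After the merge to master, all four lookups would hit the cache, matching the final table above.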
E.g. in this example we would have saved 10 × 3 min of execution time by utilizing a cache. Depending on your branching strategy, the savings could be even greater. For example, we have both specific release branches and a master branch, so a build with the exact same source code can happen on 3 or more branches:
E.g. bugfix branch → release v1 → release v2 → master branch.