Existing issues reported on short-lived branches: develop a plugin to avoid this?

For context, my team is using SonarQube 7.7 Enterprise Edition.

The native quality gate metrics in SonarQube (coverage, duplicated lines, maintainability, reliability, and security) all have options for assessment on only “new code,” which my team applies to our quality gate to flag changes on short-lived branches that would introduce issues if merged upstream. Our CI pipeline generates a variety of issue reports (e.g. Fortify, Bandit, and Dependency-Check) that SonarQube ingests fine, but many of them are assessed against the entire code base rather than just the new code.
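For reference, we hand these reports to the scanner with the usual import properties, roughly along these lines (the exact property names and paths depend on the analyzer/plugin versions in use, so treat this as a sketch):

    # analysis properties passed to the scanner
    sonar.python.bandit.reportPaths=reports/bandit.json
    sonar.externalIssuesReportPaths=reports/generic-issues.json
    # Fortify and Dependency-Check results come in through their plugins' own report-path properties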

When our developers work on a feature branch and review findings in SonarQube, they’re currently seeing dependency warnings and security issues from Bandit/Fortify that they did not introduce in their short-lived feature branch, but that were pre-existing in the application (and have not yet been dealt with). This results in the quality gate for their feature branch failing, when it really should pass because the new code they are proposing does not introduce the noted security/dependency issues.

We’re looking for a way to let SonarQube ingest all the reports it currently supports, but to have it only flag issues on short-lived branches if they were introduced in the new code on that branch. Since SonarQube has knowledge about the branch and the merge target, it knows which lines are new and can do the cross-check to see whether an issue flagged by, for example, Fortify on a particular line is something that was newly-introduced in the developer’s short-lived feature branch.

Using the Fortify plugin as an example, we currently ingest our low/medium/high/critical issues into SonarQube and have set metrics for each issue level in our quality gate. These metrics, however, do not apply to new code, and as such cannot fail the quality gate of a short-lived branch. Once a developer’s short-lived feature branch has been merged into a long-lived branch, the metrics are applied and the quality gate can then fail based on the number of reported Fortify issues. We would like to catch newly introduced issues before they’re merged, while the developer is still working on the feature branch’s functionality.

Based on a quick skim of both the Fortify plugin and the SonarQube source, it looks like the “… on New Code” Metric instances returned from Metric.Builder() must be handled specially by the SonarQube branch analysis code…?

Is it possible to somehow register a metric with SonarQube such that it behaves the way the “…on New Code” native metrics do, where a failure in the metric will show up as a failed quality gate on the short-lived branch’s dashboard?

Hi,

I started to move this thread into the Plugin Development category, but it’s not clear to me that that’s the core issue. If I understand correctly, you run external analyzers on Short-Lived Branches (SLBs), and instead of only new external issues being reported, you’re seeing all external issues reported. So let’s get that sorted out first, and then if you still feel the need to develop a new metric there ought to be less to deal with.

My first question is whether or not you analyze every branch – short- and long-lived – with these external analyzers and whether you pass those issue reports in to every analysis. Because what should be happening is that each SLB suppresses reporting of issues that exist in its sonar.branch.target. If that’s not happening, that’s where we need to start.
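For reference, that just means each branch analysis is passed its name and, for a short-lived branch, its merge target, along the lines of:

    sonar-scanner \
      -Dsonar.branch.name=feature/my-feature \
      -Dsonar.branch.target=master

(The branch and target values here are placeholders; use whatever your pipeline checks out and targets.)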

 
Ann

Hi Ann, thanks very much for the response. We are indeed analyzing every branch and ingesting the issue reports for every analysis. I just went back and ensured that master had been run through our pipeline for one project, and then ran a feature branch through the pipeline (which targeted master) and saw the behavior you described: no issues were reported on the feature branch and it passed its quality gate (while master did not, as expected). So, part of this was a misunderstanding on my part about how the metrics are collected and presented; thanks for clarifying that!

To explain how we end up in this situation (since it might help expose a path forward for better integration for our team), we’re in the process of onboarding all of our projects into a new CI pipeline (not having used SonarQube in the past). Our typical procedure involves taking a legacy project (structured with master, snapshot, and feature branches), creating a feature branch with the necessary structure changes and pipeline files (e.g. a Jenkinsfile), running that through the pipeline to ensure compatibility, and then submitting a PR for the manager of the application to review and approve its merge into the snapshot and/or master. This means that we frequently have not scanned a long-lived branch prior to our short-lived branches moving through the pipeline, which results in the scenario we’re discussing where there are no issues to cross-reference in the sonar.branch.target branch.

With our process, would your suggestion be to disregard any findings/metrics shown in the “pipeline-onboarding” feature branch, merge it in after a successful run through our pipeline, and then ensure that the branch being used as sonar.branch.target gets run through the pipeline to have metrics for comparison? Some of this might just be noise coming from the onboarding process.

As a follow-on to the plugin development thought, I think we would still like the ability to call out specific metrics that get applied to the quality gate on a short-lived branch. Two examples come to mind here:

With the Fortify plugin, issues are reported as vulnerabilities with varying severity. We have policies in place that require things like “zero Critical Fortify issues” and “no more than 5 Medium Fortify issues.” In this case, Fortify issues are specifically called out as a requirement for deployment, separate from any other issue reports we may ingest (e.g. Bandit) or issues that SonarQube may flag on its own. This means that the native vulnerability score “… on New Code” metric doesn’t work well for capturing our Fortify requirement, because that score includes a variety of other information. We would like the ability to split out the Fortify requirement to specifically look at, for example, how many critical issues were introduced in the feature branch, and to fail the quality gate if that number is higher than zero. The Fortify plugin allows for this, but only on long-lived branches, because the metric it creates isn’t set up to work on new code. Our team is happy to write some code to create additional “… on New Code” metrics in the Fortify plugin, if possible.

Secondly, after our pipeline finishes a build, we would like to push the build state from Jenkins to SonarQube and reference it as part of a quality gate metric (we’re planning to write a small plugin to consume this information). We’re trying to consolidate all of our metrics and reporting into SonarQube so that developers don’t have to look at other tools (e.g. Jenkins) unless there is a problem with the pipeline (i.e. something broke). We have set our pipeline up to continue running through as many stages as it can when it finds issues; where some pipelines would fail the build outright if a certain stage (e.g. code coverage) fails, we instead mark the build unstable, continue to the next stage, and push all successful reports up to SonarQube. This gives us a single point of review for everything that the pipeline found, but without the ability to review our build state as a metric on new code, I don’t see a way to fail the quality gate on a short-lived branch if Jenkins flagged the build as unstable.

With some more clarity on the first discussion point about cross-referencing issues between short-lived and target long-lived branches, I think we can resolve some of our issues around quality gates. Is there a way to address the second point using the currently available Metric APIs?

Hi,

There’s a lot in your response, and I understand that it’s probably all intertwined in your mind. In general though it’s best to have one question per thread. Nonetheless, I’ll try to address your questions here to get you going.

Yes, that would be exactly my suggestion. :slight_smile:

What I understand from this is that it’s okay to introduce new critical vulnerabilities (Security Rating on New Code = D) that don’t come from Fortify. I’m going to challenge you on that. A critical vulnerability is a critical vulnerability regardless of the source. If you don’t think that non-Fortify vulnerabilities can really be “Critical”, then you need to downgrade those rules in your profile, or remove them. Another way to approach this is to challenge your mapping of Fortify rules/issues into SonarQube. If you can’t release with a “Critical” issue from Fortify, then… isn’t it really a “Blocker” (Security Rating on New Code = E)?

So… this gets a bit tangled. A lot of people want to mark the Jenkins build failed if the project fails its Quality Gate. You want to fail the Quality Gate if the Jenkins build fails. :laughing: Things have been set up for the former, but not the latter. Further, you cannot “update” a completed analysis. So to make this work, you’d need to make analysis the very last step, and then find a way to pass the cumulative pipeline status in to analysis. This could probably be accomplished using web services to update a custom measure, but! But you really shouldn’t be running analysis if, for example, compilation failed. Several languages require a successful compilation before analysis can take place, and all language analyzers proceed on the assumption that they’re dealing with compilable, runnable code. Really, I’d advise you to reconsider your expectations here.
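That said, if you do go down that road, the rough shape (from memory, so double-check the Web API documentation on your instance) would be to define a custom metric once under Administration, then have the last pipeline step push the cumulative status right before analysis, e.g.:

    # hypothetical metric key 'pipeline_status'; parameter names from memory
    curl -u $SONAR_TOKEN: -X POST \
      "$SONAR_HOST/api/custom_measures/create?projectKey=my-project&metricKey=pipeline_status&value=UNSTABLE"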

 
HTH,
Ann

Thanks Ann; some of this helps, but raises additional questions/thoughts.

The suggestion to disregard reported metrics on short-lived branches until a target long-lived branch has been analyzed works for some of our projects, but not all. My group supports a very wide variety of projects ranging from full production-scale applications down to small one-off research tests. In the former case it’s reasonable to expect that master and snapshot branches have been scanned, but the latter not so much. Some of the one-off research projects don’t have seasoned developers working on them, so their SCM branch structure isn’t always the best. It’s common that master will periodically atrophy while what was originally a short-lived branch functionally becomes a long-lived branch as a particular research concept is fleshed out.

As a result, it would be optimal if SonarQube didn’t require an analysis on the target long-lived branch in order to de-dupe reported metrics. All of the generated reports indicate a line number where the issue was found, and SonarQube knows which lines are new because it does a diff between the branch and the merge target. If SonarQube simply reviewed all reported issues and only raised them to the user on short-lived branches if the corresponding line number was identified as being new, the developer/pipeline iteration cycle on short-lived branches would be a lot cleaner. There would be no more oddness around “first-time issue imports” between short-lived and merge target branches, and the implementation of quality gates on short-lived branches would seem to be easier.

On the topic of Fortify issues, it’s less about whether our team considers other metrics valid or invalid for a given vulnerability score and more about the inability to specifically call out issues reported from individual tools. Based on my understanding of how the scores are calculated (issues per lines of code), it’s possible for a single issue to blend in with a sea of others and not change the rating. If an application has a vulnerability rating of B because it has 20 high priority issues reported by bandit and 1 high priority issue reported by Fortify, our team needs the ability to fail the quality gate because of the single Fortify issue. Since the points contributed by the Fortify issue are blended/averaged in with the other issues, we can’t do this. In many cases these policies are not directly set by our teams, so it’s not a matter of educating people that bandit issues are equally as important as Fortify ones. We just have specific metrics that require certain actions based on certain tools, and would like to develop plugins that allow us to expose these metrics specifically on new code.

Lastly, while I understand your point on failing builds based on the failure of code compilation, that’s not quite the angle we’re approaching this from. Certainly if the literal build of the application fails, the pipeline should fail as well. SonarQube is indeed the last stage in our CI pipeline, save for artifact deployment; it gathers all metrics from the tools exercised in previous stages. We deal with applications that don’t get compiled (e.g. Python-based) as well as those that do (e.g. Java). Since we have a wide variety of project maturity levels in my group, it’s not uncommon for a project to skip some of the stages required for deployment (for example, running unit tests on an approved version of Python: 3.6 may be allowed, but 2.7 is not without manual authorization). In the “approved Python version” case, our pipeline allows the project to exercise itself in its desired version (say 2.7) using tox, but will flag the build as unstable if that version is not on the approved list. We would like to communicate this to SonarQube so that deployment approvers can review everything in one dashboard.

So, I think all of this comes back to a desire to have a plugin report findings only on code that is considered new. The native metrics for “… on New Code” are able to do this; is the API for this accessible to plugin developers?

Hopefully this gives some more clarity to the issues we’re hitting; thanks for fielding multiple questions in one post!

Hi,

I appreciate your position on issues in short-lived branches. However, if the target branch has not been analyzed, then there’s no way to “de-dupe” anything because there are no duplicates. Yes, we could do magic with dates, but that’s just not how it’s currently implemented.

Regarding Quality Gates, I think you misunderstand how the ratings work. For Maintainability, the rating is based on “technical debt” (estimated remediation time) of open Code Smells versus the size of the code base. For Security and Reliability the rating is set by the severity of the worst issue. So a single Blocker issue in a code base of 2M lines gives you an E.

Regarding plugin development, the APIs are public. Feel free to look at any of the open source analyzers to see how they implement New Code metrics.

 
Good luck,
Ann

Thanks Ann.

I wasn’t suggesting that any calculations be done based on dates of code changes, and apologies for using the term “de-dupe”; that didn’t clearly communicate my intent. When you specify a merge target for a branch, SonarQube appears to do a diff between the short-lived branch and the target to determine what code is new. This is separate from an analysis being run on the target branch. For example, in the case where you scan a short-lived branch before scanning master, SonarQube correctly reports code coverage metrics on the new code because it has the diff information. There wouldn’t be any magic involved; it would just be a matter of correlating reported issues with line numbers and checking whether those line numbers appear in the diff. By the definition of reporting issues on new code, this is what should logically be happening, regardless of whether the merge target has been previously analyzed.

You are 100% correct that I misunderstood how security and reliability ratings were calculated. Thanks very much for correcting me on that; I just went back and re-read the page on metric definitions. I think we could work with this, despite not explicitly calling out issues from individual tools in our quality gate, though it would be ideal if there were a way to create an instance of a Metric that operates on new code.

To your point about reviewing the open source analyzers, I previously reviewed the source for the sonarqube project and looked at the instances of Metric that it creates here:

The only difference between the new-code metrics and the non-new-code metrics is the call to setDeleteHistoricalData(true). I modified our Fortify plugin to set this for its metrics, and the results were not reflected in the quality gate for our short-lived branches, while the native metrics were. As far as I know, the branch-handling functionality in our paid version of SonarQube is not open source, so I’m assuming that the native metrics are somehow registered differently when it comes to quality gates and their application to short-lived branches (please correct me if I’m wrong and missed the implementation).
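For concreteness, here’s roughly what the modified declaration looks like (the metric key and name below are invented for illustration; ours differ), modeled on how CoreMetrics declares its new_* metrics:

    import java.util.Arrays;
    import java.util.List;
    import org.sonar.api.measures.Metric;
    import org.sonar.api.measures.Metrics;

    // Hypothetical metric, modeled on the core "new_*" declarations.
    public class FortifyNewCodeMetrics implements Metrics {

      public static final Metric<Integer> NEW_FORTIFY_CRITICAL =
          new Metric.Builder("new_fortify_critical_issues",
                             "Fortify Critical Issues on New Code",
                             Metric.ValueType.INT)
              .setDescription("Critical Fortify issues raised on new code")
              .setDirection(Metric.DIRECTION_WORST)
              .setQualitative(true)
              .setDomain("Security")
              .setBestValue(0.0)
              .setDeleteHistoricalData(true) // the only flag that differs from the non-new metrics
              .create();

      @Override
      public List<Metric> getMetrics() {
        return Arrays.<Metric>asList(NEW_FORTIFY_CRITICAL);
      }
    }

The metric registers and displays fine; it just never gets the new-code treatment in the short-lived branch quality gate.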

Could you help me understand what I’m missing about creating a metric that operates on new code?

Hi,

Nope. We use SCM dates to determine what code is “new”, then remove from the set of issues reported on the PR / SLB the issues already present in the target branch. I believe there is some date-based logic to handle non-reporting of issues that have been removed from the target since the SLB was created, but I don’t remember the specific details.

This really deserves a new thread.

 
Ann