Reducing LOC: Devs Scanning Code That Is Not Theirs And How I Can Find It

Hello Sonar Community,

This is my first post ever here so please go easy on me! I am a “Cloud Tools Admin” and I am doing my best to manage our global SonarQube instance. My questions are regarding freeing up lines of code to preserve our license for as long as possible.

TLDR Question: What are some common directories or folder structures that should not be scanned that I can search for? Example: Python “site-packages”. I am looking for any code that our developers have NOT written themselves and are needlessly scanning. I can use the /api/components/tree API to search projects for these directories, but my problem is that, other than “site-packages”, I don’t know what else to search for.

Background / How SonarQube Is Deployed:

  • SonarQube Enterprise version 9.9.4 LTS, hosted in AWS as a Docker container on an EC2 instance with an RDS PostgreSQL backend database.
  • We are a diverse organization with hundreds of software engineers using various languages (Java, Python, C#, C++, CSS, etc.).
  • We have a little over 3700 projects. Licensed for 50 million LOC and currently at about 47 million LOC used up.

What I am trying to achieve: I want to free up as many LOC as possible to preserve our license and not reach the limit too quickly.

Problem: Many of our developers scan code that they should not scan. They often scan code that they did not write themselves, and this taxes our SonarQube license, which is based on how many lines of code we scan. Our license is currently set for 50 million lines, and we are fast approaching that limit at approximately 47 million lines.

What I have tried so far:

  1. I’ve sent mass communications urging developers to use sonar.exclusions and related settings to “narrow the focus” and only scan code that they write themselves. This only works so well when one person is telling hundreds of people what to do in mass emails.

  2. I identify old projects that have not been scanned in a long time by calling the api/projects/search API across all 3700+ projects with the “analyzedBefore” parameter set to a date 1 year ago, which returns any projects that have not been scanned in over 1 year. Then I send a mass communication to ask whether these projects can be deleted. I have already done this a couple of times, and its effectiveness is running out because most of the remaining projects are newer.

  3. One day, when I was manually checking the “Code” tab in a few projects, I noticed that several of the Python projects had scanned “site-packages”, a folder of third-party modules that the devs did not write themselves. Using the api/components/tree API with the “qualifiers” and “q” parameters, I was able to find all projects that had scanned site-packages and flagged them as needing to be fixed.
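
For anyone who wants to repeat items 2 and 3, here is a rough Python sketch of how the two requests can be built. It only constructs the URLs and reads a response that has already been parsed from JSON; it does not send anything. The host name, the function names, and the page size are placeholders I made up, and I am assuming that directories come back in a “components” list with “qualifier” and “path” fields, so check that against your own instance. Authentication is left out.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "https://sonarqube.example.com"  # hypothetical host, replace with your own


def stale_projects_url(today, days=365, page=1, page_size=500):
    """Build an api/projects/search URL for projects last analyzed before the cutoff."""
    cutoff = today - timedelta(days=days)
    params = {"analyzedBefore": cutoff.isoformat(), "p": page, "ps": page_size}
    return f"{BASE_URL}/api/projects/search?{urlencode(params)}"


def directory_search_url(project_key, name, page=1, page_size=500):
    """Build an api/components/tree URL that searches one project for directories matching name."""
    params = {
        "component": project_key,
        "qualifiers": "DIR",  # restrict the search to directories
        "q": name,
        "p": page,
        "ps": page_size,
    }
    return f"{BASE_URL}/api/components/tree?{urlencode(params)}"


def matching_directories(response):
    """Return directory paths from a parsed response (response shape is an assumption)."""
    return [
        component["path"]
        for component in response.get("components", [])
        if component.get("qualifier") == "DIR"
    ]
```

Running directory_search_url once per project key returned by the first call, with a different directory name each time, is the loop I use to flag projects.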

I am unfortunately NOT a jack of all programming languages, and I am simply not aware of all the folders I could search for that may contain third-party code that our devs didn’t write! Please help me identify other folders, or tell me what else I can do to free up lines of code.

Thank you for your time. I appreciate any help you can provide.


Hi Patrick,

I don’t have a lot of advice for you. On the one hand, it may just be time to increase your license. On the other, you have 3 million lines left. In the scope of 50m, that’s only 6%, but it is counted in millions.

That said, you may want to take a look at Administration → Analysis Scope → Global Source File Exclusions. With that setting, you can exclude site-packages for every project at once. Maybe that will help.
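
As a rough illustration of what could go into that setting, here is a small Python sketch that turns a list of directory names into exclusion patterns and checks whether a given file path would fall under one of those directories. Only site-packages comes from your post; node_modules (npm) and vendor (used by Go and by PHP Composer) are examples I added, and the list and function names are hypothetical. Treat it as a starting point to adapt, not a complete list.

```python
from pathlib import PurePosixPath

# Only "site-packages" is from this thread; the other names are illustrative additions.
CANDIDATE_DIRS = ["site-packages", "node_modules", "vendor"]


def exclusion_patterns(dir_names):
    """Turn directory names into path patterns that match everything beneath them."""
    return [f"**/{name}/**" for name in dir_names]


def is_third_party(path, dir_names):
    """Return True when any directory segment of path (not the file name) is a candidate."""
    directories = PurePosixPath(path).parts[:-1]
    return any(part in dir_names for part in directories)


if __name__ == "__main__":
    for pattern in exclusion_patterns(CANDIDATE_DIRS):
        print(pattern)
```

The is_third_party check is handy for dry runs: feed it file paths pulled from a project and see what a pattern would remove before you change the global setting.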

Beyond that, you might look at charge-backs. Take a look at the api/projects/license_usage API to see where your license LOC are going, and perhaps use that information to let managers know what their % of the license cost is. If you’re allowed, organizationally, to do that, I suspect it will handle the problem for you.
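
If it helps, a charge-back summary could be sketched like this in Python. I am assuming each entry in the API response carries a “projectKey” and a “linesOfCode” field, and the mapping from project key to team is something you would have to maintain yourself; the function name and the “unassigned” bucket are hypothetical.

```python
from collections import defaultdict


def loc_share_by_team(projects, team_of, license_limit):
    """Sum lines of code per team and express each total as a percentage of the license.

    projects: list of dicts with "projectKey" and "linesOfCode" (assumed response fields).
    team_of: dict mapping project key to team name (maintained by you, hypothetical).
    """
    totals = defaultdict(int)
    for project in projects:
        team = team_of.get(project["projectKey"], "unassigned")
        totals[team] += project["linesOfCode"]
    # Largest consumers first, so the report leads with the biggest shares.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {team: round(100 * loc / license_limit, 2) for team, loc in ranked}
```

With a 50 million LOC license, a team whose projects add up to 7.5 million lines would show up as 15.0, which is an easy number to put in front of a manager.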