Strange lines of code count and other questions

SonarQube Developer edition 10.1

Scanner 4.8.0.2856, later on 5.0.1.3006

C++

Using build-wrapper to build, sonar-scanner.bat to analyze.

Hi,

I’m a SQ beginner, evaluating SQ for C++ projects in my company.

While attempting to analyze a few C++ .vcxproj projects, I hit our license maximum of 500K lines, and the analysis of course failed.

Here’s some more information with questions that came along the way.

I built only 3 projects out of 50 with the build-wrapper command line:

build-wrapper-win-x86-64.exe --out-dir SonarQube "path-to-MSBuild.exe" /t:Rebuild path-to-solution.sln

This produced the two build-wrapper files, log and json. Then:

Scenario 1 (good):

I ran sonar-scanner.bat, passing it the explicit 3 folders of the project (using -D"sonar.sources=PRJ\proj1,PRJ\proj2,PRJ\proj3"), where PRJ is the top directory.

That produced a useful report, and the “lines of code” reported was 9.3k.

Scenario 2 (bad):

As a test in preparation for a bigger build, without running a new build, I ran the scanner command again, this time passing it the top project directory, i.e. -D"sonar.sources=PRJ".

This time, in the command line output, the analysis was reported as successful (“ANALYSIS SUCCESSFUL, you can find the results at: …”), however the report webpage showed the error “Server-wide lines of code total to exceed your 500000 limit”.

Scenario 3 (bad):

I hid most of the projects in the PRJ folder by moving them outside altogether. That left the 3 original projects plus 3 additional ones.

Ran the previous bat command again, and still reached the 500K limit, although my own count came to about 40K lines, counting the lines in all *.* files, i.e. even in non-source files.

I made sure the line count wasn’t accumulating by removing the extra 3 folders, leaving the original 3; this time the report succeeded again, and the line count was 9.3k.
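As a sanity check on such numbers, a small helper like the following does a raw line count over a tree (this is a hypothetical script written for this post, not anything SonarQube provides; note that SonarQube’s “lines of code” metric counts only lines containing at least one non-whitespace, non-comment character, so a raw count like this will normally be higher):

```python
from pathlib import Path

def count_lines(root, patterns=("*",)):
    """Raw line count over files under `root` matching any of `patterns`.

    Over-counts relative to SonarQube's LOC metric, which skips blank
    and comment-only lines.
    """
    total = 0
    seen = set()
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if path.is_file() and path not in seen:
                seen.add(path)
                # errors="replace" keeps binary files from aborting the walk
                with path.open(encoding="utf-8", errors="replace") as f:
                    total += sum(1 for _ in f)
    return total

# e.g. count_lines("PRJ")                    # everything, like *.*
# e.g. count_lines("PRJ", ("*.cpp", "*.h"))  # C++ sources/headers only
```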

I ran the batch again with the -X switch to produce detailed debug output, but there was no meaningful data revealing why the line count was so large.

There were suspicious lines in one of the previous command-line outputs:

Sensor CSS Rules [JavaScript]

1 source file to be analyzed

1/1 source file has been analyzed

Hit the cache for 0 out of 0

Miss the cache for 0 out of 0

Sensor CSS Rules [JavaScript] (done) | time=40953ms

Now, we don’t use CSS in our projects, and there isn’t such a file in the PRJ folder.

Questions:

  1. Why was the batch output reporting a successful analysis when actually it had failed?
  2. For scenario 2, why was the scanner counting and/or analyzing code that was not built (meaning, it did not appear in the generated json dump)?
  3. For scenario 3, how did the scanner count over 500k lines where there are actually under 40k?
  4. What is the “ghost” CSS file that was allegedly analyzed, and how come it took ~41 seconds to analyze it?

The problems described are currently stopping me from further trying out the product.

I’d be happy to supply any missing info.

Thanks!

Ok, I found the main problem. There were more lines of code, used by other projects, that I wasn’t aware of. So the line count is good!

That leaves only questions 1, 2 and 4 above.


Hi,

Welcome to the community!

Analysis is two distinct processes. On the build agent, issues are raised and low-level metrics (e.g. LOC per file) are calculated. Then the analysis report is sent to the server, where it’s queued and then processed in turn. In that processing, issues are saved to the database, along with the low-level metrics. Additionally, aggregate metrics are calculated: e.g., LOC per file is added up for LOC per directory, parent directory… and eventually project. And that’s the point at which the analysis process as a whole can understand that analyzed LOC > license LOC.

Do you have other languages in your project? For C, C++ and Objective-C I would expect analysis to be limited to the context of your build-wrapper output. But other languages, such as JavaScript and TypeScript, are analyzed automatically, and without the need for compilation. You told analysis to analyze everything under PRJ, and so… it did.

That’s a bit difficult to know without more context. I feel like I should be asking you that. :smiley:

Again, this is difficult to know. And at the same time I can say there’s a fixed cost for each analyzer to spin up. CSS analysis uses Node.js, so that’s part of the spin-up too.

 
HTH,
Ann

Thanks Ann for your detailed reply!

Regarding “reporting successful when failed” - understood.

We don’t have any other languages in the project, only C/C++. And the question again is: why was the scanner counting lines of code for files that were not compiled (thus draining the line quota)? Note that it only added the line count to the project; it didn’t perform an analysis on them. Only the files that were compiled are shown in the Issues tab.

Regarding the ghost CSS: you are suggesting that without limiting the file types (e.g. with sonar.inclusions), the CSS scanner fires up, which could make sense. Here are some experiments I made:

  • Specified a lower folder (e.g. sonar.sources=PRJ/proj1). CSS scanning didn’t show.
  • Specified the top folder again (PRJ), and added inclusions for *.cpp and *.h files only. Again, CSS scanning didn’t show.
  • Added inclusions for *.css and *.js, and CSS scanning still didn’t show.
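For reference, the inclusions from these experiments can also be kept in sonar-project.properties instead of the command line; a sketch under the assumption of the PRJ layout described above (patterns are illustrative):

```properties
# sonar-project.properties (sketch)
sonar.sources=PRJ
# restrict indexing to C++ sources and headers only,
# so non-C++ sensors have nothing to pick up
sonar.inclusions=**/*.cpp,**/*.h
```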

In addition, 40 seconds for CSS seems like a very long time for scanning nothing; it is longer than the C++ code scanning!

I understand that “that’s a bit difficult to know without more context.” - is there any log I could share, or any other piece of information that would be helpful?

Thanks,
Gil.

Hi Gil,

We try to keep it to one topic per thread. Otherwise it can get messy, fast. Can you create a new thread for the question of counting LOC for uncompiled files?

Actually, I’m telling you that every sensor gets called on in every analysis. If there’s something for a sensor to do, it does it. If not, it quits.

I really don’t have much to go on here. Can you share the locations of the CSS files in your project? Maybe share the analysis logs where it was analyzed and where it was expected but not analyzed?

The analysis / scanner log is what’s output from the analysis command. Hopefully, the log you provide - redacted as necessary - will include that command as well.

This guide will help you find them.

 
Ann

Thanks Ann,

I posted a new issue regarding LOC counting.

As I stated in the original post above, we don’t have any CSS in our project, nor anywhere under the top-level folder, not even a single file. The only source files are C++.

I also posted the relevant section from the output, here it is again:

Sensor CSS Rules [JavaScript]
1 source file to be analyzed
1/1 source file has been analyzed
Hit the cache for 0 out of 0
Miss the cache for 0 out of 0
Sensor CSS Rules [JavaScript] (done) | time=40953ms

The name of the CSS is not mentioned, not even in the detailed (-X switch) output.
And again, this didn’t show up when specifying a subfolder.

Sharing the full build log would be challenging as it is located on a secured network, however if necessary I’ll make the effort and transfer it for sharing here.

Thanks.

Hi,

Are there any other file types?

This suggests to me that the CSS is found in other file types. Per the docs:

  • CSS, SCSS, Less, also ‘style’ inside PHP, HTML and VueJS files

 
Ann

After some investigation I narrowed down the problem to this:
Having an .inc file under the scanned directory (even an empty file) causes this problem.
I couldn’t find anything relevant in the docs about that.
Any idea?

Thanks!

EDIT: examining the scanner debug output I just noticed the line:
“Declared patterns of language PHP were converted to sonar.lang.patterns.php : **/*.php **/*.php3 **/*.php4 **/*.php5 **/*.phtml **/*.inc”

Notice the “.inc”.

Could that be the reason why my empty .inc file fired up the CSS scanner and kept it busy for 40 seconds?
All this when the .inc file is not even included in my wrapper-supervised build.

Hi Gil,

Yes, I think you’ve found the cause. And as I said, there’s overhead in starting up the language analyzers that use Node.js, so I think that’s the 40s you’re seeing.

Remember, each sensor (analyzer) is going to be invoked at each analysis. If there’s something for it to do, it spins up and does it. The C-family languages are an exception in that they need both files and build-wrapper data/compilation database to meet the criteria of “something to do”. For most language sensors/analyzers the “something” translates simply to “Are there relevant files in the project?” And for most people that’s a good thing.

But if you really want to limit this behavior, the inclusions you specified in one of your tests are probably the best way to go.
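The same effect can also be had the other way around, by excluding just the offending files rather than enumerating what to include; a sketch (the pattern is illustrative):

```properties
# keep indexing everything else, but drop the .inc files
# so the PHP/CSS sensors find nothing to do
sonar.exclusions=**/*.inc
```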

 
HTH,
Ann

Thanks a lot Ann, things are clear now.
