sonar.sourceEncoding

avl · July 17, 2023, 4:13pm

I’m using SonarScanner 4.4.0.2170, and the scanner’s properties file contains “sonar.sourceEncoding=ISO-8859-1”.

Yet, in the scanner log I find that it still reads some sources as UTF-8 and then complains that they aren’t UTF-8:

For example (the first line just shows that other files are correctly treated as ISO-8859-1):

17:17:05.960 DEBUG: 'xxx/Foo.java' generated metadata with charset 'ISO-8859-1'
17:17:05.964 WARN: Invalid character encountered in file /abs/zzz/Bar.java at line 17 for encoding UTF-8. Please fix file content or configure the encoding to be used using property 'sonar.sourceEncoding'.
17:17:05.965 DEBUG: 'zzz/Bar.java' generated metadata with charset 'UTF-8'

Did I mention that “sonar.sourceEncoding” is set to “ISO-8859-1”?

What other magic might kick in to override sourceEncoding for just some sources?

PS: Line 17 is just a comment that contains a few “ï¿½” (hex: ef bf bd) sequences. Read as UTF-8, those bytes are the replacement character U+FFFD, which stands in for a broken character. Read as ISO-8859-1, they are three valid characters, so the file should be accepted, at least when I explicitly tell the scanner to treat it as ISO-8859-1.
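
To see what those three bytes mean under each encoding, here is a minimal Python sketch. The variable names are only illustrative.

```python
# The byte sequence found on line 17 of Bar.java.
raw = bytes.fromhex("efbfbd")

# As ISO-8859-1, every byte maps to exactly one character, so decoding cannot fail.
as_latin1 = raw.decode("iso-8859-1")

# As UTF-8, the same three bytes form a single code point:
# U+FFFD REPLACEMENT CHARACTER.
as_utf8 = raw.decode("utf-8")

print(len(as_latin1), [f"U+{ord(c):04X}" for c in as_latin1])
print(len(as_utf8), [f"U+{ord(c):04X}" for c in as_utf8])
```

The bytes are valid in both encodings, so the file content alone cannot tell a reader which encoding was intended.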

ganncamp · July 18, 2023, 6:17pm

Hi,

First, can you upgrade the scanner to the latest version, 4.8, and see if this is still replicable?

If it is, could you add -Dsonar.scanner.dumpToFile=[path to file] to the analysis command line of one of the projects where you’re seeing this behavior? Then we can see the encoding value analysis is actually getting. If it’s what you expect (ISO-8859-1) then we can pursue this as a scanner bug. And if it’s not, we’ll (you’ll) need to track down where the override is coming from.
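
Once the dump exists, the encoding-related entries can be pulled out with a short script. This is a sketch that assumes the dump is a plain Java-style properties file with simple `key=value` lines. The path `scanner-dump.properties` and both function names are hypothetical.

```python
from pathlib import Path


def read_properties(path):
    """Parse simple key=value lines, skipping blank lines and comments.

    Assumption: no line continuations or escaped separators in the dump.
    """
    props = {}
    # Java properties files are conventionally written as ISO-8859-1.
    for line in Path(path).read_text(encoding="iso-8859-1").splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def encoding_settings(props):
    """Return every property whose name mentions 'encoding'."""
    return {k: v for k, v in props.items() if "encoding" in k.lower()}


# Example usage (hypothetical path):
# print(encoding_settings(read_properties("scanner-dump.properties")))
```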

Ann

avl · July 18, 2023, 6:35pm

I’ll be back to it tomorrow…

In the meantime (I don’t know your timezone), it might be worth trying to reproduce it directly in your labs, by creating a plain java file, with a // ï¿½ comment containing the mentioned sequence, and setting the sourceEncoding to iso-8859-1.
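
A reproduction project along those lines could be generated like this. It is only a sketch: the project key `encoding-repro`, the directory layout, and the function name `write_repro` are made up for illustration, and a real Java analysis may need further properties (for example `sonar.java.binaries`).

```python
from pathlib import Path


def write_repro(project_dir):
    """Create a tiny project whose only source file contains the bytes ef bf bd."""
    project = Path(project_dir)
    src = project / "src"
    src.mkdir(parents=True, exist_ok=True)

    # Written as raw bytes so that no editor or codec can "fix" the sequence.
    java_source = (
        b"public class Bar {\n"
        b"    // \xef\xbf\xbd comment containing the reported byte sequence\n"
        b"}\n"
    )
    java_file = src / "Bar.java"
    java_file.write_bytes(java_source)

    # Hypothetical minimal scanner configuration for the test project.
    properties = (
        "sonar.projectKey=encoding-repro\n"
        "sonar.sources=src\n"
        "sonar.sourceEncoding=ISO-8859-1\n"
    )
    (project / "sonar-project.properties").write_text(properties, encoding="iso-8859-1")
    return java_file


# Example usage: write_repro("encoding-repro"), then run the scanner in that directory.
```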

avl · July 19, 2023, 4:33pm

Back with some more data…

I upgraded to scanner 4.8.0.2856.
I still get the same “WARN” for the same file and line.
With sonar.scanner.dumpToFile, I got a dump of all system properties, and it contained all the charset-related properties as I had configured them.
As a test, I changed ISO-8859-1 to ISO-8859-15 (the two are very similar). Most of the Java files were then identified as ISO-8859-15, but others were still identified as UTF-8 (probably based on their content).
What I didn’t mention before: quite a few Java files are identified as UTF-8, and maybe they even are. The problem is that if the scanner identifies a file as UTF-8 against the given property, it shouldn’t then complain that the file isn’t UTF-8.

ganncamp · July 19, 2023, 5:46pm

Hi,

Thanks for the followup. I’ve flagged this for more expert eyes.

Ann

eric.giffon · July 24, 2023, 2:38pm

Hi Andreas,

Thank you for raising this issue.
The scanner first tries to detect the encoding automatically and falls back to the one specified by the property only if detection fails. There seems to be a bug in the detection method: it assumes the encoding is UTF-8 when this character sequence is in the file.
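
The detect-then-fall-back order described above can be illustrated with a small Python sketch. This is not the scanner’s actual code. The function `detect_charset` and its rules (BOM check, strict UTF-8 validation, treating pure ASCII as the configured charset) are assumptions chosen only to show why a file containing ef bf bd never reaches the fallback.

```python
import codecs


def detect_charset(data: bytes, configured: str) -> str:
    """Illustrative detection order; the real scanner logic may differ."""
    # A UTF-8 byte order mark is taken as a definite answer.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    # Assumption: pure ASCII gives no evidence, so the configured charset is used.
    if data.isascii():
        return configured
    # Non-ASCII bytes that validate as UTF-8 are taken to be UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Detection failed: fall back to the configured property.
        return configured
    return "UTF-8"


latin1_file = b"// caf\xe9\n"         # e9 alone is not valid UTF-8
suspect_file = b"// \xef\xbf\xbd\n"   # valid UTF-8, decodes to U+FFFD

print(detect_charset(latin1_file, "ISO-8859-1"))
print(detect_charset(suspect_file, "ISO-8859-1"))
```

Under these assumed rules the second file is labelled UTF-8 even though the property says otherwise, because its bytes happen to form a well-formed UTF-8 sequence.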

I created the ticket SONAR-20012 to track the bug, but for now I would recommend removing these characters from your source files.
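
To locate the affected files, a scan for the raw byte sequence is enough. The following is a minimal sketch. The function name `find_replacement_bytes` and the default `*.java` pattern are illustrative choices, not part of any Sonar tooling.

```python
from pathlib import Path

# UTF-8 encoding of U+FFFD, the sequence reported in this thread.
REPLACEMENT_BYTES = b"\xef\xbf\xbd"


def find_replacement_bytes(root, pattern="*.java"):
    """Return (path, line_number) pairs for every line containing ef bf bd."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        # Work on raw bytes so the result does not depend on any assumed encoding.
        lines = path.read_bytes().split(b"\n")
        for number, line in enumerate(lines, start=1):
            if REPLACEMENT_BYTES in line:
                hits.append((path, number))
    return hits


# Example usage:
# for path, number in find_replacement_bytes("src"):
#     print(f"{path}:{number}")
```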

Cheers,
Eric
