How to ignore the REPLACEMENT CHARACTER with sourceEncoding?

johng42 · October 12, 2020, 9:17pm

Must-share information (formatted with Markdown):

SonarQube 8.3
Error-free scan of Java source files

Say gang,

I have a couple of files in my repository that use the replacement character (U+FFFD) in strings.

Example (I realize that everyone may see something different here):

String s = "[\"abc�\"]";

These are mostly used in tests that validate that our strings can be truncated as expected, and whatnot.

This leads to problems with our scanner:

…pathToMyUnitTest.java at line 177 for encoding UTF-8. Please fix file content or configure the encoding to be used using property ‘sonar.sourceEncoding’.

These errors show up in the scanner output. While they seem ignorable, if I understand correctly, they mean that this file won't be scanned at all.

This is somewhat related to the topic "Invalid character warning UTF-8".

I tried setting sonar.sourceEncoding to UTF-8, and as expected that made no difference.
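Why the encoding setting does not help can be illustrated with plain JDK decoding. This is a minimal sketch under an assumption: that the scanner decodes the file as UTF-8 and then treats any U+FFFD in the decoded text as an encoding problem, which matches the wording of the warning but is not confirmed in this thread. It shows that a correctly encoded U+FFFD (bytes EF BF BD) and a genuinely malformed byte (0xFF) decode to the same text. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDecoding {

    // Decodes bytes as UTF-8. String(byte[], Charset) always replaces
    // malformed input with the charset's replacement, U+FFFD for UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "abc" followed by a deliberate U+FFFD, correctly encoded as EF BF BD.
        byte[] wellEncoded = {'a', 'b', 'c', (byte) 0xEF, (byte) 0xBF, (byte) 0xBD};
        // "abc" followed by 0xFF, a byte that can never appear in valid UTF-8.
        byte[] malformed = {'a', 'b', 'c', (byte) 0xFF};

        String fromLiteral = decodeUtf8(wellEncoded);
        String fromMalformed = decodeUtf8(malformed);

        // Both decode to "abc" plus U+FFFD, so a check on the decoded text
        // cannot tell the intentional character from an encoding error.
        System.out.println(fromLiteral.equals(fromMalformed)); // true
    }
}
```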

The only workaround I have found is to convert the string to something like:

String specialWithReplacementChar = "[\"abc" + "\uFFFD" + "\"]";

Which works, but makes the intent of the test harder to understand.

Has anyone come up with a better practice that allows the replacement character (�) to be written directly in the string, so that the intent is clearer, while still letting the scanner process the file?

Thanks!

Antoine · October 13, 2020, 9:05am

Hello @johng42,

Interesting way of doing it. As you know, the replacement character (U+FFFD) is what decoders substitute for byte sequences that are invalid in the expected charset, so the scanner treats one found in a source file as a sign of an encoding problem.
Indeed, having it in a source file on purpose causes the whole file to be skipped.

I'd say the solution you implemented is valid: the source file stays well encoded, and the string only contains the replacement character at execution time. You could simply explain the intention in a comment.
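As one possible way to keep the escape-based workaround readable, the escape can be given a name and the intent stated in a comment. This is a sketch only; the class name ReplacementCharFixtures and both constant names are hypothetical, not from the original project. Because the file contains only the ASCII escape, the raw bytes the scanner reads never include U+FFFD.

```java
public class ReplacementCharFixtures {

    // U+FFFD REPLACEMENT CHARACTER, written as an escape so the .java file
    // itself stays pure ASCII and contains no literal U+FFFD bytes.
    static final String REPLACEMENT_CHAR = "\uFFFD";

    // Intent: a JSON-like array whose single element ends with U+FFFD,
    // used to check that truncation handles the replacement character.
    static final String SPECIAL_WITH_REPLACEMENT_CHAR =
            "[\"abc" + REPLACEMENT_CHAR + "\"]";

    public static void main(String[] args) {
        // The escape yields exactly one char with code point 0xFFFD.
        System.out.println(REPLACEMENT_CHAR.length());                          // 1
        System.out.println(Integer.toHexString(REPLACEMENT_CHAR.codePointAt(0))); // fffd
        // '[', '"', 'a', 'b', 'c', U+FFFD, '"', ']'
        System.out.println(SPECIAL_WITH_REPLACEMENT_CHAR.length());             // 8
    }
}
```

Naming the constant lets each test read as "abc followed by the replacement character" rather than as a bare hex escape.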

On our side, we also sometimes have to test badly encoded strings in our software, but our approach is different. We isolate them in separate files that do not contain anything else relevant, so we can ignore the encoding warnings on those files. I'm also fairly sure we exclude them from the analysis so that no warnings appear at all.

Would it be something more suitable in your case? You could read the string from a bad_encoding.txt file, or apply your test upon this file directly, things like that…
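Reading the test string from a separate fixture could look roughly like the sketch below. It assumes a file named bad_encoding.txt (the name comes from the reply above; its location and contents here are hypothetical) holding raw bytes that are not valid UTF-8. To stay self-contained, the sketch writes such a file to a temporary directory first; in a real project it would presumably live with the test resources and be excluded from analysis. Files.readString is documented to throw an IOException when it reads a malformed byte sequence, so the sketch decodes with new String(bytes, UTF_8), which substitutes U+FFFD instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BadEncodingFixture {

    // Reads a fixture whose bytes are deliberately not valid UTF-8.
    // new String(byte[], Charset) replaces each malformed sequence with U+FFFD
    // rather than throwing, unlike Files.readString(path).
    static String readFixture(Path path) throws IOException {
        byte[] raw = Files.readAllBytes(path);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical fixture content: "abc" followed by the invalid byte 0xFF.
        Path dir = Files.createTempDirectory("fixtures");
        Path fixture = dir.resolve("bad_encoding.txt");
        Files.write(fixture, new byte[] {'a', 'b', 'c', (byte) 0xFF});

        String text = readFixture(fixture);
        // The .java source stays pure ASCII; U+FFFD only exists at run time.
        System.out.println(text.length());                             // 4
        System.out.println(Integer.toHexString(text.codePointAt(3))); // fffd

        // Clean up the temporary fixture.
        Files.delete(fixture);
        Files.delete(dir);
    }
}
```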

Cheers

johng42 · October 13, 2020, 2:17pm

Thanks for the confirmation. I’ll see what the team wants to do to move forward here.

It would be great if Sonar could ignore this one character, though. That would make the tests a lot easier to read.

Antoine · October 13, 2020, 2:21pm

Well, I understand why you ask in your context, though you'll agree that it's impossible to ignore a wrong encoding, whatever it is. It would be like ignoring one syntax error in source code because it's more convenient; no one can do that.

Cheers

system · October 20, 2020, 2:21pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
