How to ignore the REPLACEMENT CHARACTER with sourceEncoding?

Must-share information (formatted with Markdown):

  • SonarQube 8.3
  • Goal: an error-free scan of Java source files

Say gang,

I have a couple of files in my repository that use the replacement character in strings.

Example (and I realize that everyone may see something different here)

String s = "[\"abc�\"]";

These are mostly used as tests for validating our strings can be truncated as expected and whatnot.

This leads to problems with our scanner:

…pathToMyUnitTest.java at line 177 for encoding UTF-8. Please fix file content or configure the encoding to be used using property ‘sonar.sourceEncoding’.

These errors show up in the scanner output. While they seem ignorable, if I understand correctly, they mean this file won’t be scanned at all.

This is sort of related to the topic “Invalid character warning UTF-8”.

I tried setting sonar.sourceEncoding to UTF-8 and, as expected, that made no difference.

The only workaround I have found is to convert the string to something like:

String specialWithReplacementChar = "[\"abc" + "\uFFFD" + "\"]";

Which works, but makes the intent of the test harder to understand.
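One way to win back some readability is to give the escape a name. This is only a minimal sketch: the class name, the `REPLACEMENT_CHAR` constant, and the `sample()` helper are made up for illustration and are not part of our actual tests.

```java
public class ReplacementCharExample {
    // Hypothetical constant: names U+FFFD so the source file itself
    // contains only ASCII and stays valid for the scanner.
    static final char REPLACEMENT_CHAR = '\uFFFD';

    // Builds the same test string as the workaround above.
    static String sample() {
        return "[\"abc" + REPLACEMENT_CHAR + "\"]";
    }

    public static void main(String[] args) {
        String s = sample();
        // Position of the replacement character inside the string.
        System.out.println(s.indexOf(REPLACEMENT_CHAR));
    }
}
```

The string still has to be split around the constant, but the name at least states what the test is about.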

Has anyone come up with a better best practice that allows the <?> character to be used in the string so that the intent is more clear but also allows the scanner to work with the file?

Thanks!

Hello @johng42,

Interesting way of doing it. As you know, the replacement character (U+FFFD) is the marker a decoder substitutes for bytes it cannot decode, so it signals invalid input whatever the charset, by definition.
Indeed, having it on purpose in a source file causes the whole file to be skipped.

I’d say the solution you implemented is valid: the file stays correctly encoded, and the replacement character only appears in the string at execution time. You could simply explain the intention in a comment :slight_smile:

On our side, we also sometimes have to test badly encoded strings in our software, but our approach is different. We isolate them in independent files that do not contain anything relevant, so we can ignore encoding warnings on these files. And I’m pretty sure we also properly exclude them from the analysis so that no warnings are raised.

Would that be more suitable in your case? You could read the string from a bad_encoding.txt file, or run your test against this file directly, things like that…
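To illustrate the idea, here is a rough sketch in Java. The class name, the `decodeLenient` helper, and the fixture bytes are assumptions for the example, and a temporary file stands in for a real bad_encoding.txt under the test resources. It relies on `new String(bytes, charset)` replacing malformed input with U+FFFD, which is documented JDK behaviour.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BadEncodingFixture {
    // Decodes raw bytes as UTF-8. Malformed sequences are replaced
    // with U+FFFD instead of raising an exception.
    static String decodeLenient(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for bad_encoding.txt: the byte 0xFF is never valid in UTF-8.
        Path fixture = Files.createTempFile("bad_encoding", ".txt");
        Files.write(fixture, new byte[] {'[', '"', 'a', 'b', 'c', (byte) 0xFF, '"', ']'});

        // Read bytes rather than text, so the decoding step stays explicit.
        String s = decodeLenient(Files.readAllBytes(fixture));
        System.out.println(s.indexOf('\uFFFD')); // prints 5

        Files.delete(fixture);
    }
}
```

With this layout the Java sources contain no replacement character at all, and only the fixture file would need to be excluded from the analysis.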

Cheers

Thanks for the confirmation. I’ll see what the team wants to do to move forward here.

It would be great if Sonar could ignore this one character, though. That would make the tests a lot easier to read :slight_smile:


Well, I understand why you ask, given your context :slight_smile: , though you’ll agree that it’s impossible to ignore a wrong encoding, whatever it is. It would be like ignoring one syntax error in source code because it’s more convenient; no one can do that.

Cheers
