Use StandardCharsets instead of charset names

java

(Jens Bannmann) #1

Rule description & motivation
JDK 7 introduced the class java.nio.charset.StandardCharsets. It provides constants for the six charsets that every implementation of the Java platform is required to support:

  • ISO_8859_1
  • US_ASCII
  • UTF_16
  • UTF_16BE
  • UTF_16LE
  • UTF_8

Many methods in the JDK which deal with charsets are overloaded to accept either a String charsetName or a Charset. Before the advent of StandardCharsets, developers used one of the following approaches, each with their own drawbacks:

  • pass a string like "UTF-8"
    • drawback: have to catch/throw an UnsupportedEncodingException that can never actually happen on a conforming Java platform
  • use Guava’s Charsets class
    • drawback: needlessly introduces/reinforces a library dependency
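
To make the boilerplate drawback concrete, here is a minimal sketch contrasting the two overloads of String.getBytes. The class and method names (EncodingBoilerplate, encodeByName, encodeByConstant) are made up for illustration and are not part of any library.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example class; the names are for illustration only.
public class EncodingBoilerplate {

    // Passing the charset by name forces handling of a checked exception.
    static byte[] encodeByName(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset, so this branch should be unreachable.
            throw new AssertionError(e);
        }
    }

    // Passing a Charset constant declares no checked exception.
    static byte[] encodeByConstant(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] byName = encodeByName("caf\u00e9");
        byte[] byConstant = encodeByConstant("caf\u00e9");
        // Both overloads should produce the same bytes.
        System.out.println(Arrays.equals(byName, byConstant));
    }
}
```

Both methods return the same bytes; the only difference is the try/catch that the name-based overload requires.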

I think that SonarJava, when analyzing a code base that targets JDK 7 or above, should detect two things: when any alias of the six standard charsets is passed to a JDK method that would equally accept a Charset instance, and when any of Guava’s Charsets constants is used. The rule should tell developers to replace the reference with the corresponding StandardCharsets constant.
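
One way to decide whether a string literal names a standard charset is sketched below. This is only an illustration of the idea, not how SonarJava implements or would implement the rule; StandardCharsetLookup and standardCharsetFor are hypothetical names. It relies on Charset.forName, which resolves aliases case-insensitively, so the set of recognized aliases depends on the JDK running the check. A real analyzer would more likely use a fixed alias table.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper; the names are for illustration only.
public class StandardCharsetLookup {

    // The six charsets exposed as constants by StandardCharsets.
    private static final List<Charset> STANDARD = Arrays.asList(
            StandardCharsets.ISO_8859_1,
            StandardCharsets.US_ASCII,
            StandardCharsets.UTF_16,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE,
            StandardCharsets.UTF_8);

    // Returns the matching StandardCharsets constant if the given name
    // (canonical name or alias) resolves to one of the six standard charsets.
    static Optional<Charset> standardCharsetFor(String name) {
        try {
            Charset resolved = Charset.forName(name);
            return STANDARD.stream().filter(resolved::equals).findFirst();
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Syntactically invalid or unknown name: nothing to suggest.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(standardCharsetFor("UTF8").isPresent());
        System.out.println(standardCharsetFor("x-does-not-exist").isPresent());
    }
}
```

With this approach, a literal such as "UTF8" or "utf-8" would map to StandardCharsets.UTF_8, while a name that resolves to a non-standard charset, or to none at all, would yield an empty result and no issue.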

Impact to keep this code as it is
Using names to refer to standard charsets adds useless boilerplate code for an impossible UnsupportedEncodingException. Using Guava’s Charsets class instead of StandardCharsets needlessly introduces/reinforces a library dependency.

Notes
This rule would be similar to S1943 (“Classes and methods that rely on the default system encoding should not be used”) in that it checks for a whole lot of methods.

Noncompliant Code

someString.getBytes("UTF-8");
someString.getBytes(Charsets.UTF_8);

Compliant Code

someString.getBytes(StandardCharsets.UTF_8);

References

Type
Code Smell

Tags
clumsy


(Alexandre Gigleux) #2

Hello,

Thanks for this rule suggestion. I think it makes sense, so I created a rule specification based on your work: https://jira.sonarsource.com/browse/RSPEC-4719 and its linked implementation ticket: https://jira.sonarsource.com/browse/SONARJAVA-2830

Regards


(Jens Bannmann) #3

Great!

The rule spec looks good; I only found a small mistake: the “Message” field says “Replace XXX with Charsets.XXX”, but it should of course mention “StandardCharsets” instead. This would be pretty obvious to the developer implementing the rule, though.


(Alexandre Gigleux) #4

Thanks for the review. Fixed.