Special Unicode characters give "Malformed input or input contains unmappable characters" in VS Code

Hi

We are using the SonarQube for IDE plugin inside a devcontainer (running Ubuntu:22.04) in vscode. We use the plugin for static analysis and to connect to our Enterprise edition of SonarQube Server.

  • Operating system: Ubuntu 22.04 (devcontainer)
  • SonarQube for VS Code plugin version: 4.17.0
  • Programming language you’re coding in: C/C++
  • Is connected mode used: Yes, but the error also occurs without connected mode, so it appears unrelated to the connected mode state.

If a folder or file name contains non-ASCII characters (e.g. “folderæøå”, which includes Danish characters) and we place a “test.cpp” inside it, the SonarQube for IDE plugin throws the following error:

[Error - 06:19:06.456] [sonarlint : SonarLint Analysis Executor] No file to analyze
[Error - 06:19:53.522] [org.eclipse.lsp4j.jsonrpc.json.StreamMessageProducer : Server message reader] Malformed input or input contains unmappable characters: folder������/test.cpp
[Error - 06:19:53.523] [org.eclipse.lsp4j.jsonrpc.json.StreamMessageProducer : Server message reader] java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: folder������/test.cpp
	at java.base/sun.nio.fs.UnixPath.encode(Unknown Source)
	at java.base/sun.nio.fs.UnixPath.<init>(Unknown Source)
	at java.base/sun.nio.fs.UnixFileSystem.getPath(Unknown Source)
	at java.base/java.nio.file.Path.of(Unknown Source)
	at org.sonarsource.sonarlint.core.rpc.protocol.adapter.PathTypeAdapter.read(PathTypeAdapter.java:48)
	at org.sonarsource.sonarlint.core.rpc.protocol.adapter.PathTypeAdapter.read(PathTypeAdapter.java:30)
	at com.google.gson.internal.bind.TypeAdapters$34$1.read(TypeAdapters.java:1007)
	at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.readIntoField(ReflectiveTypeAdapterFactory.java:212)
	at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$FieldReflectionAdapter.readField(ReflectiveTypeAdapterFactory.java:433)
	at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:393)
	at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
	at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:82)
	at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:61)
	at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.readIntoField(ReflectiveTypeAdapterFactory.java:212)
	at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$FieldReflectionAdapter.readField(ReflectiveTypeAdapterFactory.java:433)
	at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:393)
	at com.google.gson.Gson.fromJson(Gson.java:1227)
	at com.google.gson.Gson.fromJson(Gson.java:1186)
	at org.eclipse.lsp4j.jsonrpc.json.adapters.MessageTypeAdapter.fromJson(MessageTypeAdapter.java:344)
	at org.eclipse.lsp4j.jsonrpc.json.adapters.MessageTypeAdapter.parseParams(MessageTypeAdapter.java:264)
	at org.eclipse.lsp4j.jsonrpc.json.adapters.MessageTypeAdapter.read(MessageTypeAdapter.java:120)
	at org.eclipse.lsp4j.jsonrpc.json.adapters.MessageTypeAdapter.read(MessageTypeAdapter.java:56)
	at com.google.gson.Gson.fromJson(Gson.java:1227)
	at com.google.gson.Gson.fromJson(Gson.java:1186)
	at org.eclipse.lsp4j.jsonrpc.json.MessageJsonHandler.parseMessage(MessageJsonHandler.java:119)
	at org.eclipse.lsp4j.jsonrpc.json.MessageJsonHandler.parseMessage(MessageJsonHandler.java:114)
	at org.eclipse.lsp4j.jsonrpc.json.StreamMessageProducer.handleMessage(StreamMessageProducer.java:193)
	at org.eclipse.lsp4j.jsonrpc.json.StreamMessageProducer.listen(StreamMessageProducer.java:94)
	at org.eclipse.lsp4j.jsonrpc.json.ConcurrentMessageProcessor.run(ConcurrentMessageProcessor.java:113)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

I have tried all sorts of fixes. For example, I’ve tried changing the locale settings and setting environment variables (export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8").

I have even created a small Java test application that opens the above test.cpp file. The application was built and executed with the portable Java environment that ships with SonarQube for IDE in .vscode-server.

What I did

  1. Created the following HelloUTF8.java app
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class HelloUTF8 {
    public static void main(String[] args) {
        System.out.println("Hello, UTF-8 world! 🌍");
        System.out.println("Danish Char æøå ÆØÅ");
        System.out.println("Default Charset: " + java.nio.charset.Charset.defaultCharset());
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));

        Path filePath = Paths.get("folderæøå/test.cpp");

        try {
            List<String> lines = Files.readAllLines(filePath, StandardCharsets.UTF_8);
            System.out.println("\nContents of file:");
            lines.forEach(System.out::println);
        } catch (IOException e) {
            System.err.println("Failed to read the file: " + e.getMessage());
        }
    }
}

A test.cpp was placed in folderæøå/test.cpp with the following content

int main()
{
    printf("Hello, World! æøåÆØÅ\n");
}
  2. Next we compiled it using the bundled javac compiler
~/.vscode-server/extensions/sonarsource.sonarlint-vscode-4.20.0-linux-x64/jre/21.0.6-linux-x86_64.tar/bin/javac HelloUTF8.java
  3. Finally, we executed it with the bundled Java runtime
~/.vscode-server/extensions/sonarsource.sonarlint-vscode-4.20.0-linux-x64/jre/21.0.6-linux-x86_64.tar/bin/java HelloUTF8

And the output is as expected

Hello, UTF-8 world! 🌍
Default Charset: UTF-8
file.encoding: UTF-8

Contents of file:
int main()
{
    printf("Hello, World! æøåÆØÅ\n");
}

From this, I conclude that the problem is not related to the environment or to the bundled javac and java runtime, but rather to SonarQube for IDE itself.

Can you explain why we cannot use Unicode characters in our file and folder paths in our setup?

Hello @ccasn, thanks for reporting this to us.

I reproduced your use case on Ubuntu and did not notice any issue; the file path appears encoded as folder%C3%A6%C3%B8%C3%A5/test.cpp.
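
For reference, the encoded form above is the percent-encoding of the UTF-8 bytes of “æøå”. The sketch below is only an illustration of that mapping: the class and method names are hypothetical, and URLEncoder performs form-style encoding, which matches URI path encoding for these characters but not for every input (a space becomes “+”, for example).

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SonarQube for IDE.
public class PercentEncodingDemo {

    // Percent-encodes a single path segment using its UTF-8 bytes.
    static String encodeSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8);
    }

    // Reverses encodeSegment, interpreting the escaped bytes as UTF-8.
    static String decodeSegment(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "folderæøå" written with Unicode escapes so the source file
        // compiles the same way regardless of the compiler's encoding.
        String folder = "folder\u00e6\u00f8\u00e5";
        String encoded = encodeSegment(folder);
        System.out.println(encoded); // folder%C3%A6%C3%B8%C3%A5
        System.out.println(decodeSegment(encoded).equals(folder)); // true
    }
}
```

Each of the three Danish letters occupies two bytes in UTF-8, which is why each one expands to two %XX escapes.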

Could you share the output of locale from the VS Code terminal? Sharing the full SonarQube for IDE logs could also help us investigate.
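
A small probe along the following lines might also help narrow things down. It is a sketch under assumptions, not a confirmed diagnosis: the stack trace fails in UnixPath.encode, which converts the path string to bytes using the JVM's native path encoding (the sun.jnu.encoding property, derived from the process locale) rather than file.encoding. The assumption here is that the language server process may not inherit the same locale as the VS Code terminal. The class name EncodingProbe and its helpers are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

// Hypothetical diagnostic, not part of SonarQube for IDE.
public class EncodingProbe {

    // Returns the named system property, or "(unset)" if it is absent.
    static String prop(String name) {
        return System.getProperty(name, "(unset)");
    }

    // Tries to build a Path from the string; returns "ok" or the failure message.
    static String tryPath(String raw) {
        try {
            Path.of(raw);
            return "ok";
        } catch (InvalidPathException e) {
            return "InvalidPathException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("file.encoding    = " + prop("file.encoding"));
        System.out.println("sun.jnu.encoding = " + prop("sun.jnu.encoding"));
        System.out.println("native.encoding  = " + prop("native.encoding"));
        System.out.println("defaultCharset   = " + Charset.defaultCharset());
        System.out.println("LANG             = " + System.getenv("LANG"));
        System.out.println("LC_ALL           = " + System.getenv("LC_ALL"));
        // "folderæøå/test.cpp", written with Unicode escapes.
        System.out.println("Path.of          -> "
                + tryPath("folder\u00e6\u00f8\u00e5/test.cpp"));
    }
}
```

Running it once from the terminal as-is and once with the locale forced to POSIX (for example, LC_ALL=C java EncodingProbe) should show whether Path.of only fails when the process locale is not UTF-8. If so, the environment the language server is started with would be the next thing to check.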

Thanks!

Sure, here is the locale from the vscode terminal:

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Attached is the full log from the vscode “Output” → “SonarQube for IDE” view:

log.txt (33.2 KB)

Hi Nicolas

Do you have any news regarding the logs and locale I sent you last week?

Hey @ccasn, I have not yet had time to fully investigate the case. I’m not yet able to reproduce the problem, and I may need a bit more time to set up the right environment. We are actively looking at it and will keep you updated whenever we make progress.

Hi Nicolas

Completely fine. I was just wondering whether my message had gotten through (as you responded very fast last time 🙂).

Would it help you if I supplied a sample devcontainer.json? Then it is 100% identical to our setup?

Have you worked with devcontainers in vscode (with WSL) before?

“Would it help you if I supplied a sample devcontainer.json? Then it is 100% identical to our setup?”

I’d appreciate it!

Since I’m more familiar with IntelliJ, I quickly set up a project using a dev container there, and it doesn’t handle non-ASCII characters at all. IntelliJ won’t allow me to create folders or files with such characters — even copy-pasting the project structure didn’t work. This is probably because the locale is set to POSIX in my case.

Using the exact same dev container environment would definitely help. I also encountered this GH discussion, maybe it could be relevant in our case.

Hi Nicolas

Here is a link to a small devcontainer setup in vscode that reproduces the problem:

I’ve described the build steps and how to provoke the “SonarQube for IDE” error message.

I look forward to hearing whether you can reproduce the error.

BR,
Carsten

We were able to reproduce the issue thanks to your reproducer.

I created this bug ticket and we will try to tackle it for a future release.

Hi Nicolas

Glad the devcontainer could help in the investigation.
Hopefully it is easy to solve.