Handling files with Japanese names causes an error

I am using Bitbucket Pipelines to run the SonarScanner for .NET. Data is sent to SonarCloud. The problem is that I get an error if a file name in the Git diff contains Japanese characters. I have tried adjusting the locale, but it doesn’t help. Is there any way to fix this?

Shell commands run in Bitbucket Pipelines:

export PATH="$PATH:/root/.dotnet/tools"
export LANG=ja_JP.UTF-8
apt-get install --yes openjdk-11-jdk
dotnet tool install --global dotnet-sonarscanner
java -version -XshowSettings:locale
dotnet sonarscanner begin /k:xxxx /o:xxxx /d:"sonar.host.url=https://sonarcloud.io" /d:"sonar.login=${SONAR_TOKEN}" /d:"sonar.dotnet.excludeTestProjects=true" /v:"${BITBUCKET_COMMIT}"

log:

java -version -XshowSettings:locale
Picked up JAVA_TOOL_OPTIONS: -Duser.language=ja -Duser.country=JP -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
openjdk version "11.0.18" 2023-01-17
OpenJDK Runtime Environment (build 11.0.18+10-post-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Debian-1deb11u1, mixed mode, sharing)

INFO: Load project repositories
INFO: Load project repositories (done) | time=1661ms
INFO: SCM collecting changed files in the branch
INFO: ------------------------------------------------------------------------
INFO: EXECUTION FAILURE
INFO: ------------------------------------------------------------------------
INFO: Total time: 43.837s
INFO: Final Memory: 23M/87M
INFO: ------------------------------------------------------------------------
ERROR: Error during SonarScanner execution
java.lang.IllegalStateException: Unable to load component class org.sonar.scanner.scan.filesystem.ProjectFileIndexer
	at org.sonar.core.platform.ComponentContainer$ExtendedDefaultPicoContainer.getComponent(ComponentContainer.java:52)
	at org.picocontainer.DefaultPicoContainer.getComponent(DefaultPicoContainer.java:678)
	at org.sonar.core.platform.ComponentContainer.getComponentByType(ComponentContainer.java:273)
	at org.sonar.scanner.scan.ProjectScanContainer.doAfterStart(ProjectScanContainer.java:414)

Hi @uda ,
Are you still experiencing this issue? I was wondering if you could give me some additional information. You mentioned that the issue is caused by the git diff containing Japanese characters. Is this because Japanese characters were added to the content of the files and are therefore recorded in the diff? Or is it related to the localization settings in use, which cause information to be added to the diff?

Would you be able to share some content from this file? You can also send me this privately if you prefer.

Thanks for the reply. This issue is ongoing.
We have not been able to resolve it, so we are removing Japanese file names from the repository as a workaround. We have no other way to deal with it.
Until that is done, SonarCloud cannot be used for the project.

The commit retrieved from Bitbucket is as follows. The error seems to occur when a file name in the diff contains Japanese characters.

From 4318bdd56ef591b18d7be17975b5aba6e187524d Mon Sep 17 00:00:00 2001
From: **************************
Date: Thu, 29 Jun 2023 15:51:56 +0630
Subject: [PATCH] ****************************
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

—Content-Type: text/plain; charset=UTF-8
Omitted.
… .207\345\231\250\345\205\267\351\253\230\343\201\225.txt" | 4 ++--
Omitted.
13 files changed, 34 insertions(+), 16 deletions(-)

Omitted
diff --git "a/*/\342\227\207\345\231\250\345\205\267\351\253\230\343\201\225.txt" "b//\342\227\207\345\231\250\345\205\267\351\253\230\343\201\225.txt"
index 6c6b4313f..2ef385f9a 100644
--- "a/src/building/TestLibComputation/Data/Lighting/CompareExcel/\342\227\207\345\231\250\345\205\267\351\253\230\343\201\225.txt"
+++ "b/src/building/TestLibComputation/Data/Lighting/CompareExcel/\342\227\207\345\231\250\345\205\267\351\253\230\343\201\225.txt"
@@ -1,5 +1,5 @@
v
Omit (but unlike the file name, the contents are in English) **
\No newline at end of file
Omitted
** No newline at end of file
omitted**
2.41.0

I can send you the file unabbreviated.
I would appreciate it if you would allow me to send it to you personally.
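
For readers following along: the \342\227\207... sequences in the patch are how git quotes non-ASCII bytes in path names by default (the core.quotePath setting). The sketch below is one way to turn such a quoted name back into readable text. decode_git_path is a hypothetical helper written for this note, not part of git or any Sonar tool, and it only handles the three-digit octal escapes seen here.

```shell
#!/bin/sh
# Hypothetical helper: turn git's octal escapes (\NNN) back into raw bytes.
# Pass the path without its surrounding double quotes. printf's %b expects
# octal escapes in the form \0NNN, so sed inserts the extra 0 first.
decode_git_path() {
  escaped=$(printf '%s' "$1" | sed 's/\\\([0-7][0-7][0-7]\)/\\0\1/g')
  printf '%b\n' "$escaped"
}

# File name taken from the patch above.
decode_git_path '\342\227\207\345\231\250\345\205\267\351\253\230\343\201\225.txt'
```

On a UTF-8 terminal this should print ◇器具高さ.txt. Alternatively, running git with -c core.quotePath=false makes it print such paths unescaped in the first place.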

Hi @uda,

I have been trying to recreate the failure you describe, both locally and within Bitbucket, but I haven’t been able to reproduce it. I have tried creating files with Japanese file names and Japanese content, and all of them pass analysis on our side. I can see from the analysis you sent me that there are both Japanese file names and Japanese string content, so I have tried a combination of both, but so far nothing has failed.

I have sent you a message to discuss further investigation.

Thanks for investigating.
I will try to create a minimal repository to reproduce.
Please give me some time to get back to you when I am ready.

No problem, let me know once you have something and I’ll take a look.

We have prepared a repository to reproduce this issue, and we hope you will check it.
You can see the error on the pipeline screen after adding a file with a Japanese name.

https://bitbucket.org/youworks/sonarcloud_jpname/src/master/

Hi @uda ,

This is great. I will take a look in our logs today and see if we can do something about this for you. Either way, I’ll come back and let you know my findings.

Hi @uda ,

An update for you: I was able to recreate the issue you are experiencing, and I am currently investigating to see what is possible here. I’ll try to update you again soon.

Shane

Thank you for your investigation. Glad to hear that you were able to reproduce it.

Hi @uda,

I have been looking further into this issue with a colleague, and we have tried many combinations of character sets to find a working setup for you. It should be possible to specify the character set for the SonarCloud analysis by adding /d:sonar.sourceEncoding=UTF-8 to the parameters, and potentially also by setting the Java options with export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8 -Dsun.jnu.encoding=UTF8". With these settings I can see the file being processed, but the process still throws an error:

Caused by: java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /opt/atlassian/pipelines/agent/build/日本語ファイル名.txt

Unfortunately, as these failures are still occurring, you will need to continue renaming the files for now. I believe there is an issue lower down in the code that is potentially not handling these other character sets, but this will require some deeper investigation.
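
When chasing an InvalidPathException like the one above, it can help to check which encodings the JVM actually ends up with and whether the locale named in LANG exists in the image. The following is only a sketch: filter_encoding_props and has_locale are hypothetical helper names, and it assumes a glibc-based image where locale -a lists UTF-8 locales in the form ja_JP.utf8.

```shell
#!/bin/sh
# Hypothetical helper: keep only the encoding-related lines from the
# property dump that `java -XshowSettings:properties -version` prints.
filter_encoding_props() {
  grep -E '(file|sun\.jnu)\.encoding'
}

# Hypothetical helper: succeed if the named locale appears in `locale -a`
# style output read from stdin. Case and hyphens are ignored so that
# "ja_JP.UTF-8" matches the "ja_JP.utf8" spelling that glibc prints.
has_locale() {
  wanted=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-')
  tr '[:upper:]' '[:lower:]' | tr -d '-' | grep -qxF "$wanted"
}

# Usage inside the pipeline (commented out so the helpers can be sourced alone):
#   java -XshowSettings:properties -version 2>&1 | filter_encoding_props
#   locale -a | has_locale "$LANG" || echo "locale $LANG is not installed" >&2
```

If the second check reports that the locale is missing, the LANG setting may not be taking effect, which would be worth ruling out before looking deeper into the scanner.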

Thank you for your ongoing support and investigation into the software issue. We appreciate your efforts in resolving the issue.

@shane.findley

Please allow me to share the results of our additional internal investigation.
The .NET 6.0 container image does not include the Japanese locale. We also found that the way we were specifying the source code encoding to SonarScanner for .NET was incorrect. After correcting both issues, the error was resolved. Specifically, we made the following changes:

  1. Install the locales and set LANG
apt-get update
apt-get install --yes locales-all
export LANG="en_US.UTF-8"
  2. Pass /d:"sonar.sourceEncoding=UTF-8" as an argument to the dotnet sonarscanner command
dotnet sonarscanner begin .... /d:"sonar.sourceEncoding=UTF-8"

See: youworks / sonarcloud_jpname / Compare — Bitbucket
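
Put together with the commands from the original post, the two changes might look like the sketch below. It assumes a Debian-based image where apt-get runs as root. is_utf8_locale and begin_analysis are hypothetical names introduced for this example, and the xxxx values are the placeholders from the original command.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Sonar tool): succeed only when the
# given locale name ends in a UTF-8 codeset, e.g. "en_US.UTF-8".
is_utf8_locale() {
  case "$1" in
    *.UTF-8|*.utf8|*.UTF8|*.utf-8) return 0 ;;
    *) return 1 ;;
  esac
}

# Sketch of the pipeline step: install the locales, set LANG, then start
# the scanner with the source encoding passed explicitly.
begin_analysis() {
  apt-get update
  apt-get install --yes locales-all
  export LANG="en_US.UTF-8"
  is_utf8_locale "$LANG" || { echo "LANG is not UTF-8: $LANG" >&2; return 1; }
  dotnet sonarscanner begin /k:xxxx /o:xxxx \
    /d:"sonar.host.url=https://sonarcloud.io" \
    /d:"sonar.login=${SONAR_TOKEN}" \
    /d:"sonar.dotnet.excludeTestProjects=true" \
    /d:"sonar.sourceEncoding=UTF-8" \
    /v:"${BITBUCKET_COMMIT}"
}
```

Calling begin_analysis from the pipeline script runs the steps in order. The guard on LANG is optional and is there only to fail early if the variable is later changed to a non-UTF-8 value.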

Hi @uda,

I am very glad to hear that you managed to find the source of the issue and thank you very much for sharing your findings with us. I will arrange for this information to be added to our troubleshooting documentation as I can imagine that other users could easily experience a similar situation.
