PHP: emojis in file names crash the scanner

Must-share information (formatted with Markdown):

  • Community Build v26.5.0.122743
  • how is SonarQube deployed: Docker

what are you trying to achieve?

I wanted to scan a PHP project in which some file names contain an emoji. These are Livewire components, which may use the emoji by default to mark a file as a Livewire component. Please don't argue about whether that is good or bad; that's a different topic and I have no influence on it. You can read more about it here: Quickstart | Laravel Livewire

When the scanner hits the file I get the following error:

ERROR Error during SonarScanner Engine execution
java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: tests/Fixtures/blade/views/components/���volt-button.blade.php
	at java.base/sun.nio.fs.UnixPath.encode(Unknown Source)
	at java.base/sun.nio.fs.UnixPath.<init>(Unknown Source)
	at java.base/sun.nio.fs.UnixFileSystem.getPath(Unknown Source)
	at java.base/java.nio.file.Path.resolve(Unknown Source)
	at org.sonar.scm.git.IncludedFilesRepository.collectFiles(IncludedFilesRepository.java:72)
	at org.sonar.scm.git.IncludedFilesRepository.collectFilesIterative(IncludedFilesRepository.java:79)
	at org.sonar.scm.git.IncludedFilesRepository.indexFiles(IncludedFilesRepository.java:51)
	at org.sonar.scm.git.IncludedFilesRepository.<init>(IncludedFilesRepository.java:41)
	at org.sonar.scm.git.GitIgnoreCommand.init(GitIgnoreCommand.java:37)
	at org.sonar.scanner.scan.filesystem.ProjectFilePreprocessor.execute(ProjectFilePreprocessor.java:105)
	at org.sonar.scanner.bootstrap.SpringScannerContainer.doAfterStart(SpringScannerContainer.java:349)
	at org.sonar.core.platform.SpringComponentContainer.startComponents(SpringComponentContainer.java:227)
	at org.sonar.core.platform.SpringComponentContainer.execute(SpringComponentContainer.java:206)
	at org.sonar.scanner.bootstrap.SpringGlobalContainer.doAfterStart(SpringGlobalContainer.java:146)
	at org.sonar.core.platform.SpringComponentContainer.startComponents(SpringComponentContainer.java:227)
	at org.sonar.core.platform.SpringComponentContainer.execute(SpringComponentContainer.java:206)
	at org.sonar.scanner.bootstrap.ScannerMain.runScannerEngine(ScannerMain.java:157)
	at org.sonar.scanner.bootstrap.ScannerMain.run(ScannerMain.java:72)
	at org.sonar.scanner.bootstrap.ScannerMain.main(ScannerMain.java:56)

Clearly it is an incompatibility between the scanner and the file name.

Do you plan to fix this issue?

I have seen that a similar issue was reported in the past, but it got no response.

Edit:
Found a solution which worked for me:
I set LANG and LC_ALL to C.UTF-8 in the environment variables and added `-Dsonar.test.exclusions=tests/Fixtures/** -Dsonar.exclusions=tests/Fixtures/**` to my pipeline.

Hello J.Renfordt and welcome to the community!

When file names contain non-ASCII characters, our docs do indeed recommend setting LC_ALL and LANG to a UTF-8 locale. From your post, I gather that you have set these, but that you have also excluded the affected file(s) with sonar.test.exclusions (by the way, if the affected file is in a test module, then sonar.exclusions is irrelevant here). I encourage you to try running the analysis with the environment variables set to UTF-8 but without the exclusions. It should work fine, and that way you can benefit from analyzing those files too.
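
A minimal sketch of such a run, in case it helps. It assumes the analysis is started with the `sonar-scanner` CLI from the project root and that the `C.UTF-8` locale exists in your pipeline image; both are assumptions on my side, so adapt the invocation to however your pipeline actually launches the scanner.

```shell
#!/bin/sh
# Assumption: the pipeline image provides the C.UTF-8 locale.
# The JVM derives its file-name encoding from the process locale, so a
# UTF-8 locale lets it map the emoji in the path instead of failing.
export LANG=C.UTF-8
export LC_ALL=C.UTF-8

# Optional sanity check: print the character map of the active locale
# when the `locale` tool is available in the image.
if command -v locale >/dev/null 2>&1; then
  echo "charmap: $(locale charmap 2>/dev/null)"
fi

# Assumption: the scanner is invoked as `sonar-scanner` from the project root.
# Note that the tests/Fixtures/** exclusions are intentionally NOT passed,
# so the files with emoji in their names are analysed as well.
if command -v sonar-scanner >/dev/null 2>&1; then
  sonar-scanner
else
  echo "sonar-scanner not found on PATH; skipping analysis" >&2
fi
```

If the scanner still fails with the same InvalidPathException, the charmap line is a quick way to see whether the UTF-8 locale was really picked up by the process that starts the scanner.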