SonarLint VS2022 .cpp System.ArgumentOutOfRangeException

I am using SonarLint in the following environment:
  • Visual Studio 2022 (Version 17.8.5)
  • SonarLint for Visual Studio 2022 (Version 7.6.0.83111)

When analyzing a .cpp file encoded in Shift_JIS, SonarLint logged the following error:

[CFamily] Note: the following CFamily rules are not available in SonarLint: cpp:S5536, c:S5536, cpp:S4830, c:S4830, cpp:S5527, c:S5527
[CLangAnalyzer] Analyzing C:\wk\devsecos\sonareval\monosSekkei\Monos.Client.D.Design\Source File\Madoban\Information\CShiyoMdbnDtlSetup.cpp
[CLangAnalyzer] Failed to analyze C:\wk\devsecos\sonareval\monosSekkei\Monos.Client.D.Design\Source File\Madoban\Information\CShiyoMdbnDtlSetup.cpp: System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. 
Parametername:end
   at Microsoft.VisualStudio.Text.SnapshotSpan..ctor(SnapshotPoint start, SnapshotPoint end)
   at SonarLint.VisualStudio.IssueVisualization.Editor.IssueSpanCalculator.CalculateSpan(ITextRange range, ITextSnapshot currentSnapshot)
   at SonarLint.VisualStudio.IssueVisualization.Models.AnalysisIssueVisualizationConverter.Convert(IAnalysisIssueBase issue, ITextSnapshot textSnapshot)
   at System.Linq.Enumerable.WhereSelectArrayIterator`2.MoveNext()
   at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
   at System.Linq.Enumerable.ToArray[TSource](IEnumerable`1 source)
   at SonarLint.VisualStudio.Integration.Vsix.AccumulatingIssueConsumer.Accept(String path, IEnumerable`1 issues)
   at SonarLint.VisualStudio.CFamily.SubProcess.MessageHandler.HandleMessage(Message message)
   at SonarLint.VisualStudio.CFamily.SubProcess.Protocol.Read(BinaryReader reader, Action`1 handleIssue)
   at SonarLint.VisualStudio.CFamily.Analysis.CLangAnalyzer.<>c__DisplayClass17_0.<ExecuteSubProcess>b__1(StreamReader reader)
   at SonarLint.VisualStudio.CFamily.SubProcess.ProcessRunner.Execute(ProcessRunnerArguments runnerArgs)
   at SonarLint.VisualStudio.CFamily.Analysis.CLangAnalyzer.ExecuteSubProcess(Action`1 handleMessage, IRequest request, IProcessRunner runner, ILogger logger, CancellationToken cancellationToken, IFileSystem fileSystem)
   at SonarLint.VisualStudio.CFamily.Analysis.CLangAnalyzer.CallSubProcess(Action`1 handleMessage, IRequest request, ISonarLintSettings settings, ILogger logger, CancellationToken cancellationToken)
   at SonarLint.VisualStudio.CFamily.Analysis.CLangAnalyzer.RunAnalysis(IRequest request, IIssueConsumer consumer, IAnalysisStatusNotifier statusNotifier, CancellationToken cancellationToken)
Refreshing PCH file for C:\wk\devsecos\sonareval\monosSekkei\Monos.Client.D.Design\Source File\Madoban\Information\CShiyoMdbnDtlSetup.cpp. PCH file location: C:\Users\user001\AppData\Local\Temp\SLVS\PCH\7cbf8838-645a-49dd-9d44-060b8b432358\PCH.preamble
ERROR: LLVM ERROR: IO failure on output stream: broken pipe
  • Only some Shift_JIS-encoded .cpp files fail. Most Shift_JIS-encoded .cpp files are analyzed without errors.
  • If a failing file is converted to UTF-8 and saved, the error no longer occurs.

How can I work around this problem without converting the file encoding from Shift_JIS to UTF-8?

Hey there.

Can you provide an example file where analysis is not working?

As I understand it, clang (on which the analyzer is based) can handle source code in many encodings as if it were UTF-8, as long as non-ASCII characters appear only in comments and string literals.
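To illustrate why this usually works, and why it can still fail, here is a minimal sketch of how a lexer might find the end of a string literal. It is a toy model, not clang's actual lexer, and the function name `find_literal_end` is hypothetical. It assumes the lexer passes over bytes of 0x80 and above untouched and reacts only to ASCII bytes such as the backslash and the double quote. The Shift_JIS character あ (0x82 0xA0) contains no such byte, so the literal closes normally; 表 (0x95 0x5C) ends in 0x5C, which the model reads as a backslash that escapes the closing quote.

```cpp
#include <cstddef>
#include <string>

// Toy model of a lexer scanning a string literal (NOT clang's real lexer).
// `pos` is the index just after the opening '"'. Returns the index of the
// closing '"', or std::string::npos if the literal is left unterminated.
// Bytes >= 0x80 are passed over untouched; only ASCII bytes are significant.
std::size_t find_literal_end(const std::string& src, std::size_t pos) {
    while (pos < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[pos]);
        if (c == '\\') {
            pos += 2;  // a backslash escapes whichever byte follows it
        } else if (c == '"') {
            return pos;  // closing quote found
        } else if (c == '\n') {
            return std::string::npos;  // newline inside the literal: unterminated
        } else {
            ++pos;  // ordinary byte, including any byte >= 0x80
        }
    }
    return std::string::npos;  // ran out of input
}
```

For example, `find_literal_end("\x82\xA0\";\n", 0)` returns 2 (the closing quote), while `find_literal_end("\x95\x5C\";\n", 0)` returns `std::string::npos`, because the 0x5C byte swallows the quote and the scan reaches the newline first.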

Thank you for your reply.

Thank you for looking into this.
I would like to send you the original, unsanitized source file (.cpp), since the problem depends on its character encoding.

Please agree that the source files will not be shared externally or used for any other purpose.
If you agree, please let me know where to send them so that I can DM Colin.

Hey there.

I’ll ping a member of the related team so they can open a DM with you directly.

You can read our policy about sharing files with us here.

Hello.

Thank you for agreeing to keep the source code private.
I would like to communicate with you via private message or e-mail about sending you the source code.
How can I do that?

Hi @toshihiro_hayashi, and thanks for sharing the problem with us,

Unfortunately, the C++ analyzer doesn’t support any encodings other than UTF-8 at the moment, since it is the only supported encoding in clang (the frontend we use to parse C++ files).

We always assume that the C++ file we are parsing is encoded in UTF-8. In Shift_JIS, for example, the byte 0x5C encodes the yen sign ¥ and also appears as the second byte of some double-byte characters (such as ソ, encoded as 0x83 0x5C), whereas in UTF-8 the same byte is the backslash \ character. Even when such characters appear only in string literals or comments, the byte can be misinterpreted as the start of an escape sequence or, at the end of a // comment, as a line continuation, preventing the parser from finding the end of the string literal or comment. See this comment for an example that demonstrates this. The reproducer you shared with me had a similar problem with a string literal, and the resulting parsing errors later led to a crash.
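To check whether a particular Shift_JIS file is likely to be affected, one could scan its raw bytes for double-byte characters whose second byte is 0x5C. The sketch below is a hypothetical helper, not part of SonarLint; the names `find_5c_trail_bytes` and `read_file_bytes` are illustrative, and it assumes the file is Shift_JIS/CP932 with lead bytes in the ranges 0x81 to 0x9F and 0xE0 to 0xFC. Standalone 0x5C bytes (the ¥ case) are deliberately not reported; only the trail-byte case is flagged, because there a Shift_JIS-aware compiler sees a single character while a UTF-8-assuming parser sees a backslash. This might also explain why only some Shift_JIS files fail.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Position of a flagged character: 1-based line, 1-based byte column.
struct Hit {
    std::size_t line;
    std::size_t column;
};

// Assumed lead-byte ranges of Shift_JIS / CP932 double-byte characters.
bool is_sjis_lead(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Reports every double-byte character whose trail byte is 0x5C, i.e. the
// byte a UTF-8-assuming parser would read as a backslash.
std::vector<Hit> find_5c_trail_bytes(const std::string& bytes) {
    std::vector<Hit> hits;
    std::size_t line = 1, column = 1;
    for (std::size_t i = 0; i < bytes.size(); ++i, ++column) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b == '\n') {
            ++line;
            column = 0;  // the loop increment brings it back to 1
        } else if (is_sjis_lead(b) && i + 1 < bytes.size()) {
            if (static_cast<unsigned char>(bytes[i + 1]) == 0x5C) {
                hits.push_back({line, column});
            }
            ++i;  // skip the trail byte so it is never treated as a lead byte
            ++column;
        }
    }
    return hits;
}

// Reads a file's raw bytes without any character decoding.
std::string read_file_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For example, `find_5c_trail_bytes(read_file_bytes("path/to/file.cpp"))` (placeholder path) would list the line and byte column of each such character, which narrows down where the parser is likely to go wrong.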

At the moment, we don’t have any plans to address this on our side, but I’ve created a ticket to record interest: [CPP-5066] - Jira.

I hope this clarifies things, and thanks again for sharing the problem with us,

Best regards,
Michael