Sonarscanner wrongly identifies a file as utf-8 when it is set as utf 16

Instance Information:

  • We are using SonarQube Server version v2025.4.2 (112048) and SonarScanner for msbuild 10.2.0.117568
  • SonarQube is deployed using zip
  • PR analysis

Hello, for one of our pr’s which adds few xml files as assets for unit tests in tf-16 encoding is getting flagged as utf-8 and causing format error when uploading to our database PostgreSQL. Please find the error log below which happens during background task:

Error Details
org.sonar.ce.task.projectanalysis.component.VisitException: Visit of Component {key=********:SourceCode/Standards/PCIe/PCIeProduct.UnitTests/Assets/BackChannelOptimization_M8047C.xml,uuid=43e9ac91-d191-435a-935f-9c80155390e4,type=FILE} failed
	at org.sonar.ce.task.projectanalysis.component.VisitException.rethrowOrWrap(VisitException.java:44)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visit(DepthTraversalTypeAwareCrawler.java:41)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visitChildren(DepthTraversalTypeAwareCrawler.java:95)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visitImpl(DepthTraversalTypeAwareCrawler.java:54)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visit(DepthTraversalTypeAwareCrawler.java:39)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visitChildren(DepthTraversalTypeAwareCrawler.java:95)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visitImpl(DepthTraversalTypeAwareCrawler.java:54)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visit(DepthTraversalTypeAwareCrawler.java:39)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visitChildren(DepthTraversalTypeAwareCrawler.java:95)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visitImpl(DepthTraversalTypeAwareCrawler.java:54)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visit(DepthTraversalTypeAwareCrawler.java:39)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visitChildren(DepthTraversalTypeAwareCrawler.java:95)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visitImpl(DepthTraversalTypeAwareCrawler.java:54)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visit(DepthTraversalTypeAwareCrawler.java:39)
	at org.sonar.ce.task.projectanalysis.source.PersistFileSourcesStep.execute(PersistFileSourcesStep.java:73)
	at org.sonar.ce.task.step.ComputationStepExecutor.executeStep(ComputationStepExecutor.java:90)
	at org.sonar.ce.task.step.ComputationStepExecutor.executeSteps(ComputationStepExecutor.java:81)
	at org.sonar.ce.task.step.ComputationStepExecutor.execute(ComputationStepExecutor.java:68)
	at org.sonar.ce.task.projectanalysis.taskprocessor.ReportTaskProcessor.process(ReportTaskProcessor.java:75)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.executeTask(CeWorkerImpl.java:212)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.run(CeWorkerImpl.java:194)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.findAndProcessTask(CeWorkerImpl.java:160)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$TrackRunningState.get(CeWorkerImpl.java:135)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:87)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:53)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:128)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:80)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:842)
Caused by: java.lang.IllegalStateException: Cannot persist sources of BitifEye_ValiFrame_AYJn6DSFD-p1i8Lrcyyi:SourceCode/Standards/PCIe/PCIeProduct.UnitTests/Assets/BackChannelOptimization_M8047C.xml
	at org.sonar.ce.task.projectanalysis.source.PersistFileSourcesStep$FileSourceVisitor.visitFile(PersistFileSourcesStep.java:99)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visitNode(DepthTraversalTypeAwareCrawler.java:76)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visitImpl(DepthTraversalTypeAwareCrawler.java:51)
	at org.sonar.ce.task.projectanalysis.component.DepthTraversalTypeAwareCrawler.visit(DepthTraversalTypeAwareCrawler.java:39)
	... 32 more
Caused by: org.apache.ibatis.exceptions.PersistenceException: 
### Error updating database.  Cause: org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
  Where: unnamed portal parameter $12
### The error may exist in org.sonar.db.source.FileSourceMapper
### The error may involve org.sonar.db.source.FileSourceMapper.insert-Inline
### The error occurred while setting parameters
### SQL: insert into file_sources     (       uuid,       project_uuid,       file_uuid,       created_at,       updated_at,       binary_data,       line_hashes,       line_hashes_version,       line_count,       data_hash,       src_hash,       revision     )     values     (       ?,       ?,       ?,       ?,       ?,       ?,       ?,       ?,       ?,       ?,       ?,       ?     )
### Cause: org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
  Where: unnamed portal parameter $12
	at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:199)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.insert(DefaultSqlSession.java:184)
	at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:62)
	at org.apache.ibatis.binding.MapperProxy$PlainMethodInvoker.invoke(MapperProxy.java:141)
	at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:86)
	at jdk.proxy2/jdk.proxy2.$Proxy77.insert(Unknown Source)
	at org.sonar.db.source.FileSourceDao.insert(FileSourceDao.java:97)
	at org.sonar.ce.task.projectanalysis.source.PersistFileSourcesStep$FileSourceVisitor.persistSource(PersistFileSourcesStep.java:126)
	at org.sonar.ce.task.projectanalysis.source.PersistFileSourcesStep$FileSourceVisitor.visitFile(PersistFileSourcesStep.java:97)
	... 35 more
Caused by: org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
  Where: unnamed portal parameter $12
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2734)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2421)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:372)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:518)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:435)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:196)
	at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:182)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java)
	at org.apache.ibatis.executor.statement.PreparedStatementHandler.update(PreparedStatementHandler.java:48)
	at org.apache.ibatis.executor.statement.RoutingStatementHandler.update(RoutingStatementHandler.java:75)
	at org.apache.ibatis.executor.ReuseExecutor.doUpdate(ReuseExecutor.java:52)
	at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:117)
	at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:76)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:197)
	... 43 more

I have checked at the scanner logs on Jenkins and i didn’t find any errors there. Please let me know what can I do to sort these errors.

Hello, can someone have a look at this issue and suggest how to resolve this?

Hi,

Is all of your codebase in tf-16? Do you declare a sonar.sourceEncoding value? And if so, what is it?

 
Thx,
Ann

Hello,

Thank you for the reply. No most of our codebase is in utf-8 but some of xml files have utf-16 encoding. We never had any issues before but suddenly changing few of the existing ones triggered this. We do not specify anything regarding encoding for sonar.sourceEncoding.

Hi,

Have you always had a mixed-encoding code base, or is it only with the addition of these few files that you’ve introduced encoding variety?

 
Ann

We always had a mixed encoding base. But as said earlier most of it is utf-8 just some xml config/data files are in utf-16

Hi,

Thanks for the details. This has been flagged for the team. Hopefully they’ll be along soon.

 
Ann

Hello @sk2312,

Thank you for the report.
The root cause of the issue is still unclear, but we created a ticket to prevent the analysis from failing in this situation.
It should be included in the next SonarQube Server release.

In the meantime, if that is an option for you, changing the encoding of all the analyzed files to UTF-8 should solve the issue.

Hello Eric,

Thanks for the information. Please correct me if my understanding is wrong. With next version if something like this comes up the file will still fail to upload but the error will be suppressed ?

Hi,
No the file will be analyzed, but it may miss SCM information, which is used mostly for issue auto-assignment.