The right way to scan other files next to Java

Hi there,

I’m evaluating writing custom rules in a SonarQube plugin. I followed the tutorial and understand how to create rules for Java files, which works pretty well.
However, I also need to read files that sit next to the Java files. For example, I want to read:

  • Manifest.MF to check certain conventions regarding OSGi
  • XML Files, because the dependency injection framework uses them (OSGi DS)
  • *.feature files: we work with Cucumber together with Java, so I want to add custom rules on them

What is the best practice and best means to implement a custom scanner on those files? I’m aware that I might have to parse and interpret the contents of the files myself, but this is what I would expect.


I would like to get some advice from the community on this topic as well.

The core of the issue is that plugins typically define a “Language”, which maps a language id to a set of file extensions. The language can then be used in rule definitions, quality profiles, etc.

The problem, however, is that only one plugin can claim a given file extension. So if a plugin intends to check, say, .xml files of a certain kind, this is likely to conflict with an existing plugin (SonarXML, for example).

While looking for workarounds, we have so far settled on defining a Language that has an empty list of file extensions, but I’m not certain that this is the best approach.

I would appreciate it if someone could comment on this.


Hey @SuperOok, @ak1394 ,

Thank you for your patience, and thanks @ak1394 for the ping, or I would have missed it. I’m not sure I will be able to answer all your questions, but I might help to clarify stuff a bit. Let me try.

Happy to see that it works for you! As part of the team maintaining it, I truly appreciate the feedback, and I’ll share it with my teammates!

Unfortunately, I’m not aware of any other plugin currently reading .MF files, so you might just have to write your own parser.
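For the parsing part, the JDK already ships a manifest reader in java.util.jar.Manifest, so a hand-written parser may not be needed. Below is a minimal sketch of a convention check built on it. The class name, the method names, and the list of required headers are hypothetical and only serve as an illustration; the actual conventions would be your own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class ManifestConventionCheck {

    // Hypothetical convention: every bundle must declare these OSGi headers.
    static final List<String> REQUIRED_HEADERS =
            List.of("Bundle-SymbolicName", "Bundle-Version");

    // Reads the manifest from a stream and returns the required headers
    // that are absent from its main section.
    static List<String> missingHeaders(InputStream manifestStream) throws IOException {
        Manifest manifest = new Manifest(manifestStream);
        Attributes mainAttributes = manifest.getMainAttributes();
        List<String> missing = new ArrayList<>();
        for (String header : REQUIRED_HEADERS) {
            if (mainAttributes.getValue(header) == null) {
                missing.add(header);
            }
        }
        return missing;
    }

    // Convenience overload for manifest content already held as text.
    // The text should end with a newline, as manifest files normally do.
    static List<String> missingHeaders(String manifestText) {
        byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            return missingHeaders(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Inside a plugin, the stream would come from whatever file the sensor selected, and each missing header would become an issue on that file.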

Again, not sure there is any plugin out there already parsing these files. So you will probably have to write your parser as well.

On this, if the language is not covered yet, it’s indeed simpler to start your own custom plugin for SonarQube. You will then be able to define the language, but only if it’s needed.

If a language (let’s say XML) is already defined by another plugin, the language definition is indeed claimed, but you can still perfectly query files of this language in your sensor through the filesystem API of SonarQube. You can also decide to simply query files by extensions (or following any predicate you see fit), and do the parsing on your side!
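To illustrate the "select by extension and parse on your side" idea for the Cucumber case, here is a minimal standalone sketch. It deliberately uses plain java.nio instead of the SonarQube filesystem API, so it can be run outside a scanner; in a real sensor the file selection would go through the scanner's file predicates instead. The class name, the method names, and the "every scenario must be tagged" rule are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FeatureFileCheck {

    // Selects files purely by extension, without any language declaration.
    static boolean isFeatureFile(Path path) {
        return path.getFileName().toString().endsWith(".feature");
    }

    // Collects all *.feature files below the given source root.
    static List<Path> featureFiles(Path sourceRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(sourceRoot)) {
            return paths.filter(Files::isRegularFile)
                    .filter(FeatureFileCheck::isFeatureFile)
                    .collect(Collectors.toList());
        }
    }

    // Hypothetical rule: each Scenario must be directly preceded by a tag line (@...).
    // Returns the 1-based line numbers of scenarios that violate it.
    static List<Integer> untaggedScenarioLines(List<String> lines) {
        List<Integer> violations = new ArrayList<>();
        boolean tagSeen = false;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.startsWith("@")) {
                tagSeen = true;
            } else if (line.startsWith("Scenario")) {
                if (!tagSeen) {
                    violations.add(i + 1);
                }
                tagSeen = false;
            } else if (!line.isEmpty() && !line.startsWith("#")) {
                // Any other content line breaks the "tag directly before scenario" link.
                tagSeen = false;
            }
        }
        return violations;
    }
}
```

The line-based check is intentionally naive; a real rule on Gherkin files would likely want a proper parser.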

There is not necessarily a conflict here. If you take the example of XML again, SonarSource’s XML analyzer defines the language (and does the highlighting), but on top of this, the Java analyzer still queries, parses, and analyzes XML files, even though it does not declare the language!

To see more about this, I would recommend having a look at the XmlFileSensor of the Java analyzer, which shows how the rules are executed.

For the parsing of XML files, the Java analyzer relies on a SonarSource open source library that provides all the useful methods to parse XML and execute rules written in Java (mostly based on XPath): SonarSource Analyzer XML Parsing Commons.
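To give an idea of what an XPath-based rule on an OSGi DS descriptor could look like, here is a minimal sketch. Note that it uses only the JDK's built-in DOM and XPath support rather than the SonarSource library mentioned above, so it shows the general technique, not that library's API. The class name, the method name, and the rule itself (a component must declare an implementation element) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DsComponentCheck {

    // Matches <component> elements (in any namespace) that have no
    // <implementation> child element.
    static final String MISSING_IMPLEMENTATION =
            "//*[local-name()='component'][not(*[local-name()='implementation'])]";

    // Returns the "name" attribute of every component violating the rule.
    static List<String> componentsWithoutImplementation(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations to avoid external entity processing.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(MISSING_IMPLEMENTATION, document, XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot analyze XML content", e);
        }
    }
}
```

Using local-name() keeps the expression independent of the namespace prefix and version used in the descriptor.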

You probably don’t need to define a language at all. The only trick would be to be sure that you publish files that you analyze, using the SonarQube API to do so (check this line from SonarSource Java Analyzer’s XmlFileSensor). This would allow SonarQube to show your findings, even if the language is not declared.

From what I remember, however, you might have to set up your analysis/SonarQube instance to also scan/import “unknown” files (meaning those not assigned to a language by any plugin).

Hope this helps,

Thanks a lot for the feedback Michael!

Just as a sanity check, does defining a language with no extensions raise any flags?

I’d prefer this approach, because it removes the dependency on another plugin that defines, say, the XML language, and it lets us define a default Quality Profile (so that users won’t have to create one manually, as there can be only one default quality profile per language).

With regard to scanning “unknown” files, I think it’s done automatically in more recent versions of SonarQube (see the update to Ann’s answer): java - How to use two different plugins on one type of file in SonarQube - Stack Overflow

Hi folks,

Here is some additional information that may be useful:

  • if you only need to read the content of files like MANIFEST.MF or XML files for your analyzer “understanding”, one possibility is to use regular
  • having MANIFEST.MF or XML files considered as InputFile by the SonarQube Scanner is only important if you intend to report issues/metrics/… on those files.
  • all files that are part of sources (sonar.sources) are considered as InputFile. We removed the need to declare a language a long time ago.
  • files without language are not automatically published to the server, until at least one issue/measure/… is saved on them by a Sensor (or when using SensorContext::markForPublishing())

Thanks for the info, Julien!

I think that, out of all the options, declaring a language with no extensions and then using predicates().matchesPathPattern(…) works best for us. We raise issues on the files, and declaring a language allows us to have a default Quality Profile, which makes it easier to install the plugin.