Security check for sensitive data stored as plain text in xml files

Hi all!

I am wondering whether it would be a good idea to add sonar check for xml (or any other properties file) that would ensure that no credentials/api tokens (secrets in general) are stored in our source control as plain text. What do you think about checks like that?

I could also try to contribute some sample checks like that but I wonder what’s the best approach here. Should I contribute to https://github.com/SonarSource/sonar-xml or there is a way to handle it in more generic way to cover more than just xml config files. I am new in sonar plugin development world and would appreciate some advice.

Thanks!

Hello @bartek.wesolowski

Could you provide some example(s) so that we better understand which pattern do you want to catch?
We already have a related rule RSPEC-2068 “Credentials should not be hard-coded”. But for your case I would expect to have a rule with Security Hotspot type.

Hello @Lena, thanks for your answer.

The rule you linked is exactly what I am thinking about but in context of properties files not executable code. I would like to catch patterns like password, basicAuthPass, apiToken, apiKey (and probably many more) that are sometimes present in project configuration files (xml, json, yaml). Security Hotspot sounds like a good idea for issues like that.

Do you think we could implement something like that in Sonar XML Plugin and plugins for other languages? Maybe it could be done in scope of RSPEC-2068 ?

Thanks for your help.
Cheers, Bartek

Hi @bartek.wesolowski ,

Thank you for your suggestion.
Detecting hardcoded sensitive data for XML, JSON, YAML files makes perfect sense. It can indeed be implemented as rule RSPEC-2068 for SonarXML. We don’t have anything yet for JSON and YAML files.

I would however be very careful with the patterns raising issues. Even when passwords seem hardcoded they might in fact not. While searching in Github I could see things like:

<lb:database ... user="${db.user}" password="${db.password}"/>

Password and secret handling often involves some configuration. You could have a “passwordController” or a “passwordValidator”. Thus matching any “password” substring would raise many false positives.

Security Hotspot rules are not Vulnerability rules with more False Positives. They were created because it is sometime not possible to decide if the code is vulnerable or not and should be reviewed by a human. In the case of Security Hotspot rules, a false positive is code which does not need to be reviewed.

It is perfectly possible to know if a password is hardcoded, the algorithm is usually not complex. The problem is to support all existing configuration formats with a high precision, i.e. knowing where the password is in the file and when it is hardcoded.

I suggest another approach. A first easy step would be to use the best patterns from gitleaks, or equivalent tools. This will detect known credentials formats.

Next, if we want to detect basic hardcoded passwords such as “admin” we would need to know the exact XPATHs used to store password for every possible framework. This is a huge effort and needs to be thought through. We could simply count on the community to provide such XPATHs for popular frameworks such as Spring. There might already be such a list somewhere. I didn’t check.

What do you think of this solution?

Cheers,

Nicolas

Hi @Nicolas_Harraudeau

Thanks for your input. I think that including patterns from gitleaks would be a good idea. Seems like issues found by one of this patterns could be raised as Vulnerabilities. It seems that if any file matches the pattern then probability of potential leak is quite high. Maybe we could create a ticket for that to and introduce it in checks for all supported languages ? Or maybe we could create/extend a plugin that scans pure content of a file without a context of it’s language (if that’s possible).

This approach is great but does not cover cases where password is just a random string which is also the true in many cases. As you mentioned gathering all XPATHs for all popular frameworks would take a lot of time, and should be maintained quite often as number of frameworks grows.

I was thinking about creating a rule that will test all key-value pairs present in config files (like xml or JSON attribures). I would raise an issue only when key matches the the secret qualifier (lets say it would be a set of regex expressions that could indicate that this property might be a secret) and a value does not match placeholder regex (like ${…}, #{…}, {{…}}) - we could look for all major templating engines and ignore placeholders they use. I am aware that this still can raise some false positives, and in rare cases could ignore real secret leak but to be honest I do not see any better solution that could be implemented easily.

What do you think about approach like that?

Hi @Nicolas_Harraudeau, @Lena

Do you think we could include security check like this in the scope of RSPEC-2068 or create similar task for configuration files? Currently I am developing my own custom plugin to handle that issue but I think I would be better to contribute directly to language plugins. I would be happy to help with implementing that.

Thanks for your help:)
Cheers,
Bartek

Hi @bartek.wesolowski,

Thank you for the offer.
We can for sure extend RSPEC-2068 to find patterns from gitleaks, both for XML and programming languages.
The best way to start would be a pull request on sonar-xml.

Some patterns would need to be reworked or removed. I would for example restrict “Password in URL” to raise only when the string starts with “http” or “https”. Looking at gitleaks users’ feedback on false positives is probably the first thing to do.

If you want to submit a PR I can create a ticket on SonarXML beforehand so that it has some context and specifications.

We won’t however include any code detecting generic key-value pairs such as “password” as it would either raise too many false positive, or the rule would be too complex to configure. SonarXML already provides the possibility to write XPath-based custom rules. It is a good enough solution for this use case.

Cheers,

Nicolas