python:S6353 has potentially misleading consequences

The S6353 rule for Python claims that [0-9] is equivalent to \d in the regex engine. That is not precisely true and could cause issues if applied as a blanket statement: for str patterns, \d is Unicode-aware by default and matches any Unicode decimal digit, so it is not the same as [0-9].

For an example where this could bite you, consider the following:

 >>> import re

 >>> re.findall('[0-9]+', '123 ٦٩ 456')
 ['123', '456']

 >>> re.findall(r'\d+', '123 ٦٩ 456')
 ['123', '٦٩', '456']

As you can see, the results are different. In fact, I would generally argue against using \d and in favor of using [0-9] in most cases in Python for this reason (even though it’s slightly more verbose). Typically, you’d only want to match the ASCII digits 0-9, and not also other Unicode digits such as the Arabic-Indic digits ٦٩ in this example.
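
For completeness, here is a small sketch of the cases where \d does behave like [0-9]. As far as I know, that happens when the re.ASCII flag is passed or when the pattern is a bytes pattern. The variable names are just for illustration, and the sample text is the same one used above.

```python
import re

text = '123 ٦٩ 456'

# Default str pattern: \d matches any Unicode decimal digit (category Nd).
unicode_digits = re.findall(r'\d+', text)

# With re.ASCII, \d is restricted to the ASCII digits 0-9.
ascii_digits = re.findall(r'\d+', text, flags=re.ASCII)

# Bytes patterns only ever match ASCII digits with \d.
bytes_digits = re.findall(rb'\d+', text.encode('utf-8'))

print(unicode_digits)
print(ascii_digits)
print(bytes_digits)
```

So replacing [0-9] with \d only preserves behavior if one of those conditions holds; otherwise the character class is the safer choice when you mean ASCII digits.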

Hello @chris.redwine,

Thank you for reporting this issue.

After discussing this internally, we believe the rule still makes sense in most cases. I understand this depends on the developer’s intent and we may revisit this in the future.

That said, this is an important clarification that needs to be visible in the rule, so that it is not applied as a blanket statement in contexts where [0-9] is what’s wanted.

I created the following ticket to make sure the rule description clarifies the difference between \d and [0-9].

Hope that makes sense.

Cheers,
Guillaume
