5 New Rules for Clean Code with the Pandas Library

Hello, Data Scientists & Pythonistas,

We’re happy to announce the following 5 rules to help you write Clean Code with the popular Pandas library in Python:

  • S6734 “inplace=True” should not be used when modifying a Pandas DataFrame
  • S6735 Using the required parameters for “pandas.merge” or “pandas.join”
  • S6740 Using the required parameters for “pandas.read_csv” or “pandas.read_table”
  • S6741 The “pandas.DataFrame.to_numpy()” method should be preferred to the “pandas.DataFrame.values” attribute
  • S6742 The “pandas.pipe” method should be preferred over long chains of instructions
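To make the list above more concrete, here is a minimal sketch of pandas code written the way these rules appear to recommend. It assumes pandas 1.x or later. The sample data and the helper functions `keep_passing` and `add_rank` are hypothetical, and the exact parameters each rule looks for (`dtype` for S6740; `on`, `how` and `validate` for S6735) are our reading of the rule titles, not the rules' specifications.

```python
import io

import pandas as pd

# Hypothetical sample data, inlined so the sketch is self-contained.
CSV_TEXT = "id,name,score\n1,alice,90\n2,bob,75\n3,carol,82\n"

# S6740: state the column types explicitly instead of letting pandas infer them.
scores = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"id": "int64", "name": "string", "score": "int64"},
)

teams = pd.DataFrame({"id": [1, 2, 3], "team": ["red", "blue", "red"]})

# S6735: spell out the join key, the join type and the expected cardinality.
# validate="one_to_one" makes pandas raise MergeError if either side has duplicate keys.
merged = pd.merge(scores, teams, on="id", how="inner", validate="one_to_one")

# S6734: reassign the result instead of calling drop(..., inplace=True).
merged = merged.drop(columns=["name"])

# S6741: call to_numpy() rather than reading the .values attribute.
matrix = merged[["id", "score"]].to_numpy()


def keep_passing(df: pd.DataFrame, threshold: int) -> pd.DataFrame:
    """Hypothetical step: keep rows whose score reaches the threshold."""
    return df[df["score"] >= threshold]


def add_rank(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: rank rows by descending score (1 = best)."""
    return df.assign(rank=df["score"].rank(ascending=False).astype("int64"))


# S6742: name each transformation and chain them with DataFrame.pipe.
report = merged.pipe(keep_passing, threshold=80).pipe(add_rank)
print(report)
```

Passing `validate="one_to_one"` turns a silent row duplication into a `pandas.errors.MergeError`, which is the kind of bug that explicit merge parameters help catch early.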

These rules will be available in SonarCloud shortly and in SonarQube 10.3. SonarLint users will also get quick fixes for rules S6741 and S6735 in the next SonarLint release.

We welcome your feedback on these rules. Do take a look at what’s coming up for Python in SonarLint, SonarQube and SonarCloud.

Jean


Hi Jean,

I’ve tried to create an issue in the issue tracker, but it seems to be closed down.

There is a false positive in S6742 when using pyspark.sql.DataFrame:

    from pyspark.sql import DataFrame
    from pyspark.sql import functions as F

    df = create_spark_df()  # signature: def create_spark_df() -> DataFrame
    df2 = (
        df.where(
            ...
        )
        .withColumn(
            "name",
            ...
        )
        .where(...)
        .withColumn("count", F.lit(1))
        .transform(
            lambda df: ...
        )
        .transform(lambda df: ...)
        .withColumn(...)
    )

This is caused by a simple comparison with the name DataFrame at this line of the rule's implementation.
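For readers wondering why a name comparison misfires, here is a hypothetical, pure-Python illustration of the pitfall; it is not the analyzer's actual code. `make_class`, `is_dataframe_by_name` and `is_pandas_dataframe` are invented names, and stand-in classes replace the real pandas and PySpark types so the sketch runs without either library installed. Checking only the simple name `DataFrame` also matches a PySpark DataFrame, while checking the fully qualified name does not.

```python
def make_class(module: str) -> type:
    """Create a stand-in class named DataFrame that claims to live in `module`."""
    return type("DataFrame", (), {"__module__": module})


# Hypothetical stand-ins mirroring where the real classes are defined.
PandasDataFrame = make_class("pandas.core.frame")
SparkDataFrame = make_class("pyspark.sql.dataframe")


def is_dataframe_by_name(obj: object) -> bool:
    """Naive check: any class whose simple name is DataFrame matches."""
    return type(obj).__name__ == "DataFrame"


def is_pandas_dataframe(obj: object) -> bool:
    """Stricter check: compare the fully qualified class name."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}" == "pandas.core.frame.DataFrame"


spark_df = SparkDataFrame()
print(is_dataframe_by_name(spark_df))  # True: the PySpark object is wrongly treated as pandas
print(is_pandas_dataframe(spark_df))   # False: the qualified name tells them apart
```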

Hello @MaicoTimmerman

Big thanks for the report.
We have created a ticket to fix the issue.

Thanks,
Maksim Grebeniuk