Write High-Quality PySpark Python Code with SonarQube

Hello data engineers,

We heard from many of you that you’d like SonarQube to help you avoid pitfalls when working with PySpark. We’re happy to share that SonarQube can now detect performance, maintainability, and correctness issues in your PySpark code, in both Python and Jupyter Notebook files. The following rules are available on SonarQube Cloud and will be available in the next release of SonarQube Server (Developer Edition and above) and SonarQube IDE. A short illustrative sketch of each rule follows the list.

  • S7181: PySpark Window functions should always specify a frame
  • S7182: The “subset” argument should be provided when using PySpark DataFrame “dropDuplicates” method
  • S7187: PySpark Pandas DataFrame columns should not use a reserved name
  • S7189: PySpark DataFrames used multiple times should be cached or persisted
  • S7191: PySpark withColumns should be preferred over withColumn when multiple columns are specified
  • S7192: The “how” parameter should be specified when joining two PySpark DataFrames
  • S7195: PySpark lit(None) should be used when populating empty columns
  • S7196: Complex logic provided to PySpark “withColumn”, “filter” and “when” methods should be refactored into separate expressions
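
To make these pitfalls concrete, here are minimal sketches of what each rule targets; the data, column names, and thresholds are invented for illustration. For S7181, an ordered window without an explicit frame defaults to `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which often yields surprising results for functions like `last`:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("a", 1, 10), ("a", 2, 20), ("a", 3, 30)], ["grp", "ts", "val"]
)

# Noncompliant: with orderBy but no explicit frame, the default frame ends
# at the current row, so last("val") returns the current row's value
# rather than the group's last value.
w = Window.partitionBy("grp").orderBy("ts")
df.withColumn("last_val", F.last("val").over(w)).show()

# Compliant: state the frame explicitly so the intent is unambiguous.
w = (
    Window.partitionBy("grp")
    .orderBy("ts")
    .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
)
df.withColumn("last_val", F.last("val").over(w)).show()
```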
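
For S7182, calling `dropDuplicates` without `subset` deduplicates on every column, which may silently keep rows you consider duplicates:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "2024-01-01"), (1, "2024-01-02"), (2, "2024-01-01")], ["id", "seen"]
)

# Noncompliant: rows are compared on all columns, so both rows for id=1
# survive even if "id" alone should define a duplicate.
deduped = df.dropDuplicates()

# Compliant: name the columns that define a duplicate.
deduped = df.dropDuplicates(subset=["id"])
```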
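
For S7187, pandas-on-Spark reserves some column names for its own bookkeeping; the exact list is in the rule description, and `__index_level_0__` below is an assumed example of such an internal name:

```python
import pyspark.pandas as ps

# Noncompliant (assumed example): "__index_level_0__" is a name
# pandas-on-Spark uses internally to track the index, so shadowing it
# invites clashes with the library's own columns.
psdf = ps.DataFrame({"__index_level_0__": [1, 2, 3]})

# Compliant: pick a plain, descriptive column name instead.
psdf = ps.DataFrame({"row_id": [1, 2, 3]})
```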
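
For S7189, a DataFrame that feeds several actions is recomputed from scratch for each one unless you cache or persist it:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)

# Compliant: cache a DataFrame reused by several actions, and release it
# when done. Without cache(), both actions below would rebuild the full
# lineage of "df" independently.
df.cache()
total = df.count()
per_bucket = df.groupBy("bucket").count().collect()
df.unpersist()
```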
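
For S7191, adding several columns one `withColumn` at a time grows the query plan node by node; `withColumns` (available since Spark 3.3) does it in one step:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2)], ["a", "b"])

# Noncompliant: every withColumn call adds a new projection to the plan.
df2 = df.withColumn("sum", F.col("a") + F.col("b")).withColumn(
    "diff", F.col("a") - F.col("b")
)

# Compliant (Spark 3.3+): add all columns in a single withColumns call.
df2 = df.withColumns(
    {"sum": F.col("a") + F.col("b"), "diff": F.col("a") - F.col("b")}
)
```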
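
For S7192, an unspecified join type defaults to `"inner"`, and a reader cannot tell whether dropping unmatched rows was intentional:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
users = spark.createDataFrame([(1, "Ada"), (2, "Grace")], ["id", "name"])
orders = spark.createDataFrame([(1, 99.0)], ["id", "amount"])

# Noncompliant: defaults to an inner join, silently dropping users
# without orders.
joined = users.join(orders, on="id")

# Compliant: spell out the join type, even when it is "inner".
joined = users.join(orders, on="id", how="left")
```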
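
For S7195, placeholder values such as an empty string are real values, not missing ones, and they break `isNull` and `na` handling downstream:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (2,)], ["id"])

# Noncompliant: "" is a real value, so isNull() will not flag it.
df2 = df.withColumn("comment", F.lit(""))

# Compliant: use lit(None), casting it so the column gets a proper type.
df2 = df.withColumn("comment", F.lit(None).cast("string"))
```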
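
Finally, for S7196, naming intermediate expressions keeps `withColumn`, `filter`, and `when` calls readable (the business conditions here are invented):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(25, "FR", 120.0)], ["age", "country", "spend"])

# Noncompliant: the whole condition is crammed into one when() call.
df2 = df.withColumn(
    "segment",
    F.when(
        (F.col("age") >= 18)
        & (F.col("country") == "FR")
        & (F.col("spend") > 100),
        "premium",
    ).otherwise("standard"),
)

# Compliant: name the intermediate expressions, then combine them.
is_adult = F.col("age") >= 18
is_domestic = F.col("country") == "FR"
is_big_spender = F.col("spend") > 100

df2 = df.withColumn(
    "segment",
    F.when(is_adult & is_domestic & is_big_spender, "premium").otherwise(
        "standard"
    ),
)
```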

We welcome your feedback on these rules.

Jean