Language
Python
Rule
pythonenterprise:S7181
Version
Enterprise Edition v2026.1 (119033)
Description
When using Spark window functions, you should usually specify a window frame to indicate the rows over which the window function applies.
However, for the row_number function, only one frame is allowed, namely (unboundedPreceding, currentRow).
There is no benefit to specifying this default explicitly, because there is no other frame for which computing row_number makes sense: you have to go over each row in the defined order (within the partition) to give each row a number.
Therefore, I believe this rule should make an exception for the row_number function and not require an explicit window frame in that case.
Documentation: pyspark.sql.functions.row_number — PySpark 4.1.1 documentation
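To illustrate why no other frame makes sense, here is a minimal pure-Python sketch (not PySpark's actual implementation, and the function and parameter names here are made up for illustration) of what row_number computes: every row in the partition, in the defined order, up to and including the current row, is counted — which is exactly the (unboundedPreceding, currentRow) frame.

```python
from itertools import groupby
from operator import itemgetter

def row_number(rows, partition_key, order_key, descending=False):
    """Assign a 1-based sequential number to each row within its partition,
    following the given ordering -- a sketch of the semantics of
    row_number() OVER (PARTITION BY ... ORDER BY ...)."""
    result = []
    # groupby requires its input to be sorted by the grouping key
    rows = sorted(rows, key=partition_key)
    for _, group in groupby(rows, key=partition_key):
        ordered = sorted(group, key=order_key, reverse=descending)
        # the number of row i is the count of rows from the start of the
        # partition through row i -- i.e. the frame is necessarily
        # (unboundedPreceding, currentRow)
        for n, row in enumerate(ordered, start=1):
            result.append((row, n))
    return result

rows = [{"dept": "a", "id": 0}, {"dept": "a", "id": 1}, {"dept": "a", "id": 2}]
numbered = row_number(rows, itemgetter("dept"), itemgetter("id"), descending=True)
# under descending order, id=2 comes first and gets row number 1
```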
Example
from pyspark.sql import SparkSession
from pyspark.sql import functions as f
from pyspark.sql import Window
spark = SparkSession.builder.appName("Example").getOrCreate()
df = spark.range(3)
window = Window.orderBy(df.id.desc()).rowsBetween(Window.unboundedPreceding, Window.currentRow)
df.withColumn("desc_order", f.row_number().over(window)).show()
If the code is changed to any other frame, for example .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing), it results in an exception:
pyspark.sql.utils.AnalysisException: Window Frame specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$()) must match the required frame specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())
So the correct code simply leaves out the frame, since there is only one option:
window = Window.orderBy(df.id.desc())
df.withColumn("desc_order", f.row_number().over(window)).show()