False positive: row_number in PySpark does not need an explicit window frame

Language

Python

Rule

pythonenterprise:S7181

Version

Enterprise Edition v2026.1 (119033)

Description

When using Spark window functions, you should usually specify a window frame to indicate which rows the window applies to.
However, for the row_number function, only one frame is allowed, namely (unboundedPreceding, currentRow).
There is no benefit to specifying this default explicitly, because no other frame makes sense for computing row_number: the rows have to be visited in the defined order (within the partition) to give each one a number.
Therefore, I believe that this rule should make an exception for the row_number function, and not require an explicit window frame in that case.

Documentation: pyspark.sql.functions.row_number — PySpark 4.1.1 documentation

Example

from pyspark.sql import SparkSession
from pyspark.sql import functions as f
from pyspark.sql import Window

spark = SparkSession.builder.appName("Example").getOrCreate()
df = spark.range(3)
# rowsBetween(unboundedPreceding, currentRow) is the only frame row_number accepts
window = Window.orderBy(df.id.desc()).rowsBetween(Window.unboundedPreceding, Window.currentRow)
df.withColumn("desc_order", f.row_number().over(window)).show()

If the code is changed to use any other frame, for example .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing), it results in an exception:

pyspark.sql.utils.AnalysisException: Window Frame specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$()) must match the required frame specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())

So the correct code would simply leave out the frame, since there is only one option:

window = Window.orderBy(df.id.desc())

df.withColumn("desc_order", f.row_number().over(window)).show()

Hello Thomas,

Thank you for bringing this to our attention. We have created a corresponding ticket to track the issue.

Best regards,
Marc