[C#] Data loss rules in EF-core

Overview

Hello, we are a group of developers that wish to extend the SonarC# plugin. The rules that concern the risk of data loss in migrations (see below).
Therefore, most of the rules are focused on the MigrationBuilder, and the various methods that can be envoked on it.

Most rules mentioned in this post are currently being developed and, therefore, this post will be updated to represent their current state.

In the following list, we list all the rules, information about each rule, and reasoning behind each rule. This segment is intended to aid the moderators, and the rules’, types and severities are potentially subject to change.

Rules

DropColumn

Language: C#

Reason: 
Dropping a column can lose important data, and should be verified to be intentional.

Tags: ef-core

Type: Bug
Severity: Critical

Display name: 
MigrationBuilder should not unintentionally call "DropColumn"

Description: DropColumn can cause data loss


Noncompliant code:
    migrationsbuilder.DropColumn("Email");

DropTable

Language: C#

Reason:
Dropping a table can lose important data, and should be verified to be intentional.

Tags: ef-core

Type: Bug
Severity: Critical

Display name: MigrationBuilder should not unintentionally call "DropTable"

Description: 
DropTable can cause data loss. 

Noncompliant code:
    migrationBuilder.DropTable("Users");

DropSchema

Language: C#

Reason:
Dropping a Schema can lose a lot of data, 
and should only be used for major restructuring,
therfore it must be verified to be intentional.

Tags: ef-core

Type: Bug
Severity: Blocker

Display name: "DropSchema" should not be called
  
Description: DropSchema can cause structural data loss.


Description: Make sure dropping schema is safe

Noncompliant code:
    migrationBuilder.DropSchema("Database");

DeleteData

Language: C#

Reason: 
DeleteData can lose data.
Most cases of DeleteData are intended, but should be tracked.

Tags: ef-core

Type: Code Smell
Category: Info

Display name: Track uses of "DeleteData"

Description: Track DeleteData method calls

Noncompliant code:
    migrationBuilder.DeleteData(
        "Users","Email", "www.foo.com"
    );

UpdateData

Language: C#

Reason: 
UpdateData can lose data,
if the replacing data is null or empty,
and should be tracked.

Tags: ef-core

Type: Code Smell
Category: Info

Display name: Track uses of "UpdateData"

Description: Track UpdateData method calls

Noncompliant code:
    migrationBuilder.UpdateData(
        table: "Users", keyColumn: "Id", 
        keyValue: "1", column:"Email", value: ""
    );

DropCheckConstraint

Language: C#

Reason: 
DropCheckConstraint can lose database constraints for columns, and should be tracked.

Tags: ef-core

Type: Code Smell
Category: Info

Display name: Track uses of "DropCheckConstraint"

Description: Track DropCheckConstraint method calls

Noncompliant code:
    migrationBuilder.DropCheckConstraint(
        "CheckName", "Users"
    );

DropUniqueConstraint

Language: C#

Reason:
DropUniqueConstraint can database constraints for columns, and should be tracked

tags: ef-core
    
Type: Code Smell
Category: Info

Display name: Track usage of "DropUniqueConstraint"

Description: Track DropUniqueConstraint method calls

Description: 
Noncompliant code: 
    migrationBuilder.DropUniqueConstraint(
        "UniqueEmail", "Email"
    );

DropIndex

Language: C#

Reason:
Dropping indexes unintentionally, can make a database slower.
If an index exists we assume it's needed.  

Tags: ef-core

Type: Bug
Severity: Minor

Display name: "DropIndex" should be used to increase performance

Description: DropIndex can cause performance deficiency.

Noncompliant code:
    migrationBuilder.DropIndex(
        "Email", "Users"
    );

DropForeignKey

Language: C#

Reason:
DropForeignKey is used to remove connections between tables, 
and should only be used cautiously. 

Tags: ef-core

Type: Bug
Severity: Minor

Display name: "DropForeignKey" should be used to increase performance

Description: DropForeignKey can ruin database structure

Noncompliant code:
    migrationBuilder.DropForeignKey(
        "PersonID", "Orders"
    );

Hi @Mcfishflaps and thanks for sharing!

To clarify - do you wish to suggest and add new rules for our SonarC# plugin or do you want to write your own roslyn analyzer and import the issues as 3rd party issues?

If you want to suggest new rules to our analyzer, the process would be:

  • we first need to evaluate the rule suggestions you provided
  • we will clarify and iterate on this forum together with you, and then create the Rule SPECifications in our (RSPEC database)
  • for the RSPECs that are defined in our database, we can accept pull requests
2 Likes

Hi @Andrei_Epure and thanks for the reply!

To respond to your question about whether we want to contribute to the C# plugin or not:
Yes, we want these rules to be added to the existing C# plugin, and to do that we need them in the RSPEC database, and then make a pull request.
So the rules mentioned in the post above, are suggestions for rules to add to the database.

We’ve included suggestions for types, categories, names and such to help in the RSPEC creation process.
(some mistakes were made in the original post, but it seems we are unable to edit the post at the moment)

We will do our best to try and make this iteration process as easy as possible, as we would like for our contribution to go upstream in the near future.

Hi @Mcfishflaps,

Thank you for suggesting these rules. We always appreciate contributions.

I would like to ask you a few questions to make sure that these rules match what SonarQube, SonarCloud and SonarLint are designed for.

From what I read in the “reason” parts of your initial post, calling migrationBuilder.XXX methods is not a problem per se, it just needs a more careful review. It also seems that in most case the reviewer will accept the code change. Did I understand correctly?

I did a rapid research github to see what would be the user experience of such rules. Apart from DropSchema, these methods are used a lot, even DropTable. These don’t look like bugs or maintainability issues. They are mostly used in the Down part of the migration, but they also appear in the Up method.

In our experience (SonarSourcers designing rules), raising bug or code smell issues when there is no bug or maintainability issue has a pretty bad impact on developers’ user experience. Developers end up ignoring issues. SonarQube, SonarCloud and SonarLint are not designed to audit code. Security Hotspots are the only rules asking for a review from developers, and these rules focus on security-sensitive code, which doesn’t seem to be the case here. There is no attack vector as users don’t have access to migration scripts.

Just to be clear I am not saying that auditing code is not a valid need. We often receive this request. I just mean that we can’t force this use case on every user by adding it to the core product. Importing such issues via a roslyn plugin is fine.

Could you provide code examples where we can be 80-90% sure that calling one of the methods above is unintentional?

Thanks,
Nicolas

Hi @Nicolas_Harraudeau,

Thank you for your feedback! We’ve taken up the discussion internally, so expect an update in the coming days.

Hello SonarSource community,

Thank you for your feedback :slight_smile: We would like to adjust our proposal to take your feedback on board, so that our rules will now detect bugs with a low false positive rate.

In particular, we now realise that most of our rules would previously have functioned as audit rules, meaning that our rules would have encouraged the user to manually verify the issues that they detect. However, as @Nicolas_Harraudeau pointed out, we need to pivot our rules away from audit to detecting bugs instead. Our plan is to now say with greater certainty that something is an error.

To achieve this, we propose the following changes to our rules:

  • We would like to slim down the discussion/proposal to just “MigrationBuilder.DropTable()” and “MigrationBuilder.DropColumn()”.
  • We will not detect method calls like “MigrationBuilder.DropTable()” if the call happens in the Down method of a Migration, as that is a business decision to downgrade the database, which in turn loses data, and is most likely intentional.
  • We will not detect method calls to “MigrationBuilder.DropTable()” and “MigrationBuilder.DropColumn()” if the table/column has been used in a SQL command in a previous “MigrationBuilder.Sql()” call. Our analysis below identified this as a useful suppression to reduce our false positive rate.
  • We will not detect method calls to “MigrationBuilder.DropTable()” and “MigrationBuilder.DropColumn()” if the dropped table/column was created in the same Up method. Our analysis below identified this as a useful suppression to reduce our false positive rate.

We identified these changes to our rules by searching GitHub with the following link (searches for all instances where MigrationBuilder.DropTable is called) https://github.com/search?q=migrationbuilder.droptable&type=Code

In particular, we used the search results to design our rules to detect bugs (instead of audit) with as low a false positive rate as possible.

We used a sample of the first 100 files from the search results, and looked at those first 100 files in detail.

Looking at the files, we were interested in 3 things. If the MigrationBuilder.DropTable call was in the Up or Down method, if it was intentional, and was the data in the table preserved before the drop.

Looking at our results, we found that, of the 100 files we looked at, 10 had calls to MigrationBuilder.DropTable within the Up method. In those cases, we looked at the circumstances and found these cases for its use:

By looking at this gathered data and our analysis, we came up with the changes to the rules mentioned above. These changes dramatically reduce our false positive rate from our initial proposal and transform our rules into bug detection (instead of audit).

Based on our GitHub search and the sample we analysed in detail, our rules would detect 6 files: 2 where a rename is performed incorrectly and data is lost, 2 where tables are deleted and data is lost, and 2 where a table is rebuilt in a way where the data is lost.

Hi @Mcfishflaps,

Thank you for taking my feedback into account and doing this investigation. The rules already look a lot better.

I see valuable cases here. Dropping then creating a table of the same name, as in QualityBooks, is definitely suspicious. It deserves at the very least a Code Smell recommending “alter table” instead. I am not sure yet if we can categorize these as bugs. In this case the code seems to drop data intentionally. The goal of the rule could be to educate developers by showing how to use “alter table”.

I am still a bit worried by the validation process. When I look at the link you used to get 100 files to analyze I notice two things: most projet seem to be student projects (github is full of those), there are many “Initial” migrations.

Student projects tend to be unrealistic, and I suspect that they don’t migrate any data but rather use migration scripts as a way to see how their database evolved from one version to the next.
Regarding Initial Scripts, the goal of these rules is not to detect bugs in these scripts. Migration bugs will happen in later migrations. However you are right that the rules should not raise on Initial migration scripts “Down” methods.

Could you check how your rules behave on these files (droptable) and these files (dropcolumn). This demo instance of SourceGraph indexes only popular projects as far as I know, thus there shouldn’t be any student or test project. The query I suggest here removes any file with “Init” or “Test” in its filename, then searches for “migrationbuilder.droptable” in the “Up” method.
The goal in doing this additional validation is to see if we detect more False Positives. I don’t expect to detect real bugs on big projects as these are likely to be fixed.

If there are False Positives we can go for even more specific rules. Ex: Only raise when the same table or column is first deleted then added again.