Introducing SonarSweep: Improve training data quality for coding LLMs

Sonar is excited to announce SonarSweep, a new service designed to improve the quality of the coding datasets used to train LLMs, both in pre-training and in post-training (including supervised fine-tuning and reinforcement learning).

As developers, many of us are now using AI coding tools in our daily work. They can be incredibly helpful for productivity, but we’ve also seen that the quality and security of the code they generate can be inconsistent. Sometimes it’s great, and other times it contains bugs, security vulnerabilities, or maintainability issues.

At Sonar, we’ve been looking into why this happens, and the root cause is simple: an AI model is only as good as the data it was trained on. To address this, we are building SonarSweep.

SonarSweep is engineered to systematically remediate, optimize, and secure coding datasets for model training. It proactively ensures that models learn from high-quality, secure examples at every stage, from pre-training to model alignment—an essential step toward building reliable AI coding models. In our testing, models trained on data prepared by SonarSweep produced code with up to 67% fewer security vulnerabilities and up to 42% fewer bugs than models trained on the original, un-swept data, with no loss in functional performance.
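To make the idea concrete, here is a minimal sketch of what a dataset-sweeping step might look like. This is not the SonarSweep API: the names (`Sample`, `passes_quality_gate`, `sweep_dataset`) are hypothetical, and the quality gate below is a trivial parse check standing in for real static analysis of bugs and vulnerabilities.

```python
import ast
from dataclasses import dataclass


@dataclass
class Sample:
    """One training example: a prompt paired with candidate code."""
    prompt: str
    code: str


def passes_quality_gate(code: str) -> bool:
    """Stand-in quality check: here, just 'is it valid Python?'.

    A real sweep would run full static analysis (bugs, security
    vulnerabilities, maintainability issues) instead of this parse check.
    """
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def sweep_dataset(samples: list[Sample]) -> list[Sample]:
    """Keep only samples whose code passes the quality gate.

    A production pipeline might remediate flagged samples rather
    than simply dropping them.
    """
    return [s for s in samples if passes_quality_gate(s.code)]


if __name__ == "__main__":
    raw = [
        Sample("add two numbers", "def add(a, b):\n    return a + b\n"),
        Sample("broken example", "def add(a, b)\n    return a + b\n"),  # missing colon
    ]
    clean = sweep_dataset(raw)
    print(f"kept {len(clean)} of {len(raw)} samples")
```

The filter-or-remediate choice matters in practice: dropping flagged samples shrinks the dataset, while remediating them preserves coverage at the cost of a more involved pipeline.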

Additional detail on our testing can be found in the blog post. SonarSweep is now available in early access.