Hi everyone!
Thanks to all who attended our session yesterday. Please find below the questions asked during the webinar and the resources mentioned by the speaker.
Q&A
Q: Do Sonar’s internal Machine Learning experts use SonarQube?
A: Yes, we dogfood extensively. Personally, it has found bugs in my code that would otherwise have crashed my script after training the models and before saving them to disk.
Q: How can we better control our workflow’s entropic/noisy/chaotic nature?
A: There are ways to channel the chaos or abstract it out. Look into bootstrapping your next project (repo templates, Kedro, etc.); a good project structure goes a long way. Use the linters you're comfortable with (SonarQube for IDE, Black + Ruff, etc.). You have to be intentional about it: some of your time will need to be invested in managing the chaos/noise.
Q: How does Sonar help with MLOps?
A: Deployment and cloud management are open avenues for us at the moment. But if you're writing web servers, managing secrets and credentials in your repo, or building complex monitoring pipelines, you want to make sure your code is free of code smells and to minimize potential vulnerabilities. We can help there.
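For instance, a hardcoded credential is one of the most common issues a scanner will flag in serving or monitoring code. A minimal sketch of the pattern (the URL and environment variable name are placeholders for illustration):

```python
import os

import requests

# Risky: hardcoding a token in the repo is exactly the kind of issue
# static analysis should flag before it ships.
# API_TOKEN = "sk-live-..."  # never commit this

# Safer: pull the secret from the environment (or a secrets manager),
# so it never lives in version control.
API_TOKEN = os.environ["MONITORING_API_TOKEN"]

response = requests.get(
    "https://monitoring.example.com/api/metrics",  # placeholder URL
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=10,
)
response.raise_for_status()
```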
Q: How does one effectively balance code maintainability and optimizing performance? Especially with large datasets?
A: Finding that sweet spot is key. What is clear is that prioritizing clear, readable code pays off when you need to understand what to change: it is harder to find performance bottlenecks and tweak code that is hard to understand. Premature optimization also leads to complex, hard-to-maintain code.
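One practical way to keep that balance is to write the readable version first and let a profiler tell you where the real bottleneck is before rewriting anything. A minimal sketch (the pipeline function is a stand-in for your own preprocessing code):

```python
import cProfile
import pstats

import numpy as np


def feature_pipeline(n_rows: int = 1_000_000) -> np.ndarray:
    """Stand-in for a real preprocessing step on a large dataset."""
    data = np.random.default_rng(0).normal(size=n_rows)
    # Readable first: a clear, vectorised transformation.
    return (data - data.mean()) / data.std()


if __name__ == "__main__":
    # Measure before optimizing: only rewrite the functions that actually
    # dominate the cumulative time reported here.
    profiler = cProfile.Profile()
    profiler.enable()
    feature_pipeline()
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```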
Q: Can you recommend the best tools for reproducibility of ML experiments?
A: As far as I can recall, there are no specific libraries that ensure reproducibility on their own, but there are steps you can take. For instance, pin your project's requirements: instead of adding 'torch' to your requirements file, specify exactly which versions of the libraries you're using. Also make sure that every time you generate randomness, you control the state/seed. Correct me if I'm wrong, Jean, but there is a rule in Sonar that helps with making sure you're setting the seed correctly and consistently, especially across multiple torch backends. But beyond that, I guess, it's just another thing to be intentional about.
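For illustration, a seed-setting helper along those lines might look like the sketch below (the torch calls assume PyTorch is in use; drop them if you only rely on NumPy):

```python
import os
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed every source of randomness used in the project."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy
    torch.manual_seed(seed)           # PyTorch CPU
    torch.cuda.manual_seed_all(seed)  # PyTorch, all CUDA devices
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Optional: trade some speed for deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)
```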
Another crucial factor in ensuring reproducibility is, of course, taking care of data and configuration versions. There is no single tool for this, but I highly recommend using some sort of experiment and data-version tracking system (TensorBoard or Weights & Biases for the former; DVC for the latter, or even your own S3-bucket-based solution). Ideally, you should then make sure that your code version (git hash), data version (DVC version ID), and experiment version (all your configurations) are recorded for each experiment. I find that to be good enough insurance of reproducibility for a reasonable amount of effort.
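As a sketch of that last point, recording the three versions per run can be as simple as dumping a small manifest next to the experiment outputs. The data-version field is left as a placeholder since it depends on how you version data, and the config values are made up for illustration:

```python
import json
import subprocess
from datetime import datetime, timezone


def record_run(config: dict, output_path: str = "run_manifest.json") -> None:
    """Save code version, data version and configuration for one experiment."""
    git_hash = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
    manifest = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "git_hash": git_hash,
        "data_version": "<dvc-version-id>",  # placeholder: fill from your data-versioning tool
        "config": config,
    }
    with open(output_path, "w") as f:
        json.dump(manifest, f, indent=2)


record_run({"lr": 1e-3, "batch_size": 64, "epochs": 10})
```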
Q: Do you have samples or recommendations around CI/CD pipelines for Python code bases, and especially notebooks? It's very different from traditional Java CI/CD processes.
A: This depends on your pipeline requirements and tooling integrations (e.g. Databricks notebooks via Azure DevOps). It is also an emerging field with endless options: start with established DevOps best practices and adapt them to your needs.
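As one possible building block, a CI step can execute notebooks end to end as a smoke test before merging. A minimal sketch using nbformat and nbconvert (the notebook path is a placeholder):

```python
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

# Placeholder path: point this at the notebooks your pipeline should check.
NOTEBOOK = "notebooks/training_report.ipynb"

nb = nbformat.read(NOTEBOOK, as_version=4)
executor = ExecutePreprocessor(timeout=600, kernel_name="python3")

# Raises an error if any cell fails, which in turn fails the CI job.
executor.preprocess(nb, {"metadata": {"path": "notebooks/"}})
print(f"{NOTEBOOK} executed cleanly")
```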
Q: We use Sonar Scan for Python for our ML Algos, does Sonar offer something more specific for ML/AI?
A: Yes, we do have rules specific to ML/AI, including rules for PyTorch, pandas, NumPy and other commonly used ML libraries. Have a look at Python static code analysis.
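To give a flavour of what library-specific rules look for, here is an illustrative pandas pitfall (an example of the kind of pattern such rules target, not a reference to a particular Sonar rule): chained indexing can silently operate on a copy, so the intended update never lands in the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"score": [0.2, 0.8, 0.5], "label": [0, 1, 0]})

# Problematic: chained indexing may assign to a temporary copy,
# leaving `df` itself unchanged (pandas emits SettingWithCopyWarning).
# df[df["score"] > 0.5]["label"] = 1

# Preferred: a single .loc call updates the original DataFrame.
df.loc[df["score"] > 0.5, "label"] = 1
print(df)
```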
Q: Does the Sonar platform also offer any utility code tools to fix anomalies/issues automatically using AI capabilities?
A: Yes, the commercial editions of SonarQube (Cloud and Server) include AI CodeFix, which uses an LLM to offer code-fix suggestions. Have a look here: AI CodeFix: Automatically Generate AI Code Fix Suggestions
Resources
- Machine Learning Robustness: A Primer - Houssem Ben Braiek, Foutse Khomh
- Hidden Technical Debt in Machine Learning Systems - D. Sculley et al
- Machine Learning: The High-Interest Credit Card of Technical Debt - D. Sculley et al
- What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities - Chattopadhyay et al
- A Large-Scale Study About Quality and Reproducibility of Jupyter Notebooks - Pimentel et al
- MAD landscape - https://mad.firstmark.com/