SonarQube feature request: reverse engineer the domain model from code

In any object-oriented software, understanding the domain model—entities and relationships—is essential.

Would it be feasible to implement a codebase scanner that detects entities (in a database-centric project) and builds a simple graph? In this graph, vertices would represent entities (e.g., Order, Invoice, Payment), and edges would represent relationships among them (extracted from calls or references).

This domain model view, stripped of attributes and operations, would allow Sonar users to quickly comprehend a large, undocumented codebase. A feature like this could streamline understanding and help detect maintainability issues, especially in extensive projects where technical debt and complexity often go unchecked.

Hi Jack, welcome to the Sonar Community!

Can you share a bit more about your current situation (e.g. programming language, codebase size, etc.), and also about the context in which you’d want to use a feature like this?

For example, is this for onboarding new developers? A change of team or responsibilities? For a newly acquired codebase? Something else?

Any more details about the problem you are trying to solve and the things you have tried would help!

Hi Gabriel,

The codebase is a 100,000+ LOC software system from the e-commerce domain, written in PHP without any documentation. The use case is program comprehension for perfective maintenance.

Every object-oriented software system can be understood from its conceptual model. For example, when a “/classes” directory contains 75+ classes that have relationships among each other, it is hard to build a mental model of all those classes and how they call each other. With this scanner, a single project scan would produce one picture from which that structure can be understood.

The problem of program understanding can be addressed with static code analysis, by reverse engineering the conceptual model from code. This new code scanner would first extract the classes. Then it would scan their methods to find whether they call any of the other extracted classes. From that information, a conceptual model (entities and relationships) can be created for the whole software system.
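To make the two passes concrete, here is a minimal sketch in Java. It is not the PoC linked in this thread. It assumes one PHP class per file, uses regular expressions instead of a real PHP parser, and only counts "new Foo" and "Foo::" as references. The class and method names (DomainModelSketch, extractClasses, extractEdges) are hypothetical.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical sketch: regex-based, not a real PHP parser.
class DomainModelSketch {

    // Matches "class Order" declarations; ignores interfaces, traits, namespaces.
    private static final Pattern CLASS_DECL =
            Pattern.compile("\\bclass\\s+([A-Za-z_][A-Za-z0-9_]*)");

    // Pass 1: map each declared class name to the source that declares it.
    // Assumption: one class per source file.
    static Map<String, String> extractClasses(Collection<String> sources) {
        Map<String, String> classes = new TreeMap<>();
        for (String src : sources) {
            Matcher m = CLASS_DECL.matcher(src);
            if (m.find()) {
                classes.put(m.group(1), src);
            }
        }
        return classes;
    }

    // Pass 2: for each class, add an edge to every other extracted class
    // that its source instantiates ("new Foo") or calls statically ("Foo::").
    static Map<String, Set<String>> extractEdges(Map<String, String> classes) {
        Map<String, Set<String>> edges = new TreeMap<>();
        for (Map.Entry<String, String> e : classes.entrySet()) {
            Set<String> targets = new TreeSet<>();
            for (String candidate : classes.keySet()) {
                if (candidate.equals(e.getKey())) continue; // skip self-references
                String name = Pattern.quote(candidate);
                Pattern ref = Pattern.compile(
                        "\\bnew\\s+" + name + "\\b|\\b" + name + "::");
                if (ref.matcher(e.getValue()).find()) {
                    targets.add(candidate);
                }
            }
            edges.put(e.getKey(), targets);
        }
        return edges;
    }

    public static void main(String[] args) {
        List<String> sources = List.of(
            "<?php class Order { function pay() { $p = new Payment(); Invoice::create($this); } }",
            "<?php class Invoice { static function create($o) {} }",
            "<?php class Payment { }");
        // prints {Invoice=[], Order=[Invoice, Payment], Payment=[]}
        System.out.println(extractEdges(extractClasses(sources)));
    }
}
```

A production scanner would work on a parsed syntax tree rather than on text, so that comments, strings and type hints are handled correctly.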

The idea generalizes to all object-oriented languages. I developed a quick and dirty prototype in Java that writes a DOT graph and then runs Graphviz (dot.exe) to convert the .dot file into an SVG. You can open the SVG in a web browser and search for a class name, usually with CTRL+F. You can then see which other classes that class is connected to and how many connections it has. The SVG with the conceptual model is resizable, and the text always looks sharp because it is vector graphics. Parts of this PoC are project-specific: the project I wrote it for calls “query()” to execute a SQL statement, and it scans a non-MVC e-commerce software system. Graphviz is open source and can be downloaded for free from graphviz.org.
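For readers who do not want to open the Pastebin link, the DOT and SVG step can be sketched roughly as follows. This is an illustration and not the PoC code. The names DotExportSketch, toDot and renderSvg are hypothetical, and it assumes the Graphviz "dot" executable is on the PATH.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

// Hypothetical sketch of the DOT export and the Graphviz call.
class DotExportSketch {

    // Turns an adjacency map (class -> referenced classes) into DOT text.
    static String toDot(Map<String, Set<String>> edges) {
        StringBuilder sb = new StringBuilder("digraph DomainModel {\n");
        sb.append("  node [shape=box];\n");
        for (Map.Entry<String, Set<String>> e : edges.entrySet()) {
            // Declare the node so that isolated classes still show up.
            sb.append("  \"").append(e.getKey()).append("\";\n");
            for (String target : e.getValue()) {
                sb.append("  \"").append(e.getKey())
                  .append("\" -> \"").append(target).append("\";\n");
            }
        }
        return sb.append("}\n").toString();
    }

    // Runs "dot -Tsvg <in> -o <out>"; assumes Graphviz is installed.
    static void renderSvg(Path dotFile, Path svgFile)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "dot", "-Tsvg", dotFile.toString(), "-o", svgFile.toString())
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("dot exited with a non-zero status");
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Set<String>> edges = new TreeMap<>();
        edges.put("Order", new TreeSet<>(List.of("Invoice", "Payment")));
        edges.put("Invoice", new TreeSet<>());
        edges.put("Payment", new TreeSet<>());

        String dot = toDot(edges);
        System.out.print(dot);

        // Graphviz is only invoked when an output path is passed as argument.
        if (args.length == 1) {
            Path dotFile = Files.createTempFile("domain-model", ".dot");
            Files.writeString(dotFile, dot);
            renderSvg(dotFile, Path.of(args[0]));
        }
    }
}
```

Class names are written as quoted node IDs, so they end up as text elements in the SVG, which is what makes the browser search work.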

Quick and dirty PoC: [package org.example;import java.io.*;import java.nio.charset.MalformedInpu - Pastebin.com]
A sample screenshot of a part of the SVG it generates: [Imgur: The magic of the Internet]
Searching for a class using CTRL+F works because the SVG contains text.

If this feature were available in SonarQube, a newly acquired codebase could be understood quickly. This understanding is required for QA and, more broadly, for maintenance. It can also help discover issues in program structure. For example, when nearly everything calls one class, there is a problem with coupling or cohesion. [Coupling (computer programming) - Wikipedia] Such issues currently go undetected by QA in large software systems because there are too many LOC to understand everything. It is a form of technical debt that needs to be measured and ideally remedied to reduce maintenance costs in the long term. If we had this code scanner in Sonar, a rule that flags every such class as a maintenance issue would be easy to add, and as a follow-up such classes could be refactored to lower the technical debt. I see it being useful for Agile teams with large software systems that have little or no code documentation and that use Sonar to detect maintainability issues.
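As an illustration of such a rule, the sketch below counts incoming references per class (fan-in) and flags classes that more than a given share of the other classes reference. This is not an existing Sonar rule. The names CouplingRuleSketch, fanIn and hubs and the 0.5 threshold are assumptions chosen for the example.

```java
import java.util.*;

// Hypothetical sketch of a "nearly everything calls this class" check.
class CouplingRuleSketch {

    // Counts, for every class, how many other classes reference it.
    static Map<String, Integer> fanIn(Map<String, Set<String>> edges) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : edges.entrySet()) {
            counts.putIfAbsent(e.getKey(), 0);
            for (String target : e.getValue()) {
                if (!target.equals(e.getKey())) { // ignore self-references
                    counts.merge(target, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Flags classes referenced by more than maxShare of the other classes.
    static List<String> hubs(Map<String, Set<String>> edges, double maxShare) {
        Map<String, Integer> counts = fanIn(edges);
        List<String> flagged = new ArrayList<>();
        int others = counts.size() - 1;
        if (others <= 0) {
            return flagged; // zero or one class: nothing to compare against
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if ((double) e.getValue() / others > maxShare) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> edges = new TreeMap<>();
        edges.put("Order", Set.of("Db", "Invoice"));
        edges.put("Invoice", Set.of("Db"));
        edges.put("Payment", Set.of("Db"));
        edges.put("Db", Set.<String>of());
        // Db is referenced by 3 of the 3 other classes, Invoice by 1 of 3.
        System.out.println(hubs(edges, 0.5)); // prints [Db]
    }
}
```

The threshold would need tuning per project, since some shared classes (for example a database wrapper) are referenced widely by design.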

Hi Jack,

Thank you for sharing this much detail and the things you have tried!

We are looking at how Sonar can help understand and improve architecture, which feels very related to what you mention!

I’ll reach out through private message, in case you want to discuss this further.