We have a repo with 40,000 files of PL/SQL code: packages, package bodies, procedures, functions, create table scripts, views, triggers,
type specs, type bodies, and others such as grants, alter tables, etc. So, how should I analyze this? I thought about running the scanner on batches of 500 files using the “inclusions” parameter.
Is there a better way to do this? Could this approach break the environment?
Indeed, when you deal with a large repository with no definite project structure (i.e. it is mostly a collection of standalone files with no particular way to “build” them), a situation often found with languages like PL/SQL but also COBOL, RPG, etc., it can make sense to break the files down into logical sets of a moderate number of files, so that when one file is modified, you only have to re-analyze the set that contains that file instead of your whole repository of 40,000 files.
The question is therefore: how do you split the files into sets that make sense?
The sets should contain:
- a sufficiently large number of files, so that you don’t have to scan too many groups as separate projects (too many projects is also costly in SonarQube);
- not too many files either, so that when you need to re-scan because a few files have changed, you only re-scan the sets that contain changed files.
Typically, sets should contain between 100 and 10,000 files (ideally between 500 and 5,000); it depends a bit on the total number of files you have and how many projects you are OK to have in SonarQube.
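For example, with roughly 40,000 files split into sets of about 2,000 files each, you would end up with about 20 projects in SonarQube.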
Each set will be analyzed with a different project key and different exclusion or inclusion patterns.
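For illustration only, the sonar-project.properties of one such set could look roughly like this (the project key, name and inclusion pattern below are made up; adapt them to your repository layout):

```
# Hypothetical configuration for one set ("packages"); one SonarQube project per set
sonar.projectKey=my-plsql-repo-packages
sonar.projectName=My PL/SQL Repo - Packages
sonar.sources=.
# Only the files belonging to this set are analyzed in this project
sonar.inclusions=packages/**
```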
The first and best option is to group files by “modules” (i.e. a more or less self-sufficient collection of files that have some sort of binding together).
If there is no particular grouping that makes sense, then the fallback is simply to group files into sets of 500 to 5,000 (as you suggested). Of course, the sets should be deterministic, so that at the next scan you can recreate the same sets. To define the sets, you can use the directories the files are in, or the file name (e.g. the first letter defines the set), or whatever other grouping works conveniently with your files.
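As a hedged sketch, assuming the repository happens to have top-level directories such as packages/, views/ and triggers/ (made-up names), a directory-based split could then be scanned as separate projects, reusing the exact same set definitions at every scan:

```
# Hypothetical set definitions based on top-level directories; keep the
# definitions identical from one scan to the next so the sets stay deterministic.

# Set 1: everything under packages/
sonar-scanner \
  -Dsonar.projectKey=plsql-set-packages \
  -Dsonar.sources=. \
  -Dsonar.inclusions="packages/**"

# Set 2: everything under views/
sonar-scanner \
  -Dsonar.projectKey=plsql-set-views \
  -Dsonar.sources=. \
  -Dsonar.inclusions="views/**"

# Set 3: everything under triggers/
sonar-scanner \
  -Dsonar.projectKey=plsql-set-triggers \
  -Dsonar.sources=. \
  -Dsonar.inclusions="triggers/**"
```

If some directories are too small to be a set on their own, several of them can be combined into one set with a comma-separated list of inclusion patterns, as long as every file ends up in exactly one set.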
In the end, you will get as many different projects in SonarQube as you have defined sets.
To check the issues of a given file, you need to know which project it is in.
For the record, in the Enterprise Edition there’s a feature called Applications that allows you to re-aggregate all the sets/projects together to rebuild a unified view of your full repository.
My question is about the first analysis, when I take the master branch code; only after that will we analyze, per delivery, the files that have really been modified. So, how do I do this?
There are exactly 42,900 files; 35% or more of them have around 200 lines, and I think at least 10% have between 1,000 and 3,000 lines.
What I am describing is valid both for the first scan and for all subsequent scans.
Every time you scan, you’ll have to scan the same way, with the same files in each of the sets (except for files that are removed from or added to your overall code base, in which case they must be removed from or added to one given set).
In particular, after the first scan (regardless of how you decide to group the files), you cannot re-scan one by one only the files that changed. You need to re-scan the whole sets that contain changed files, even if 99% of their files have not changed.
I will not get into the details of why, but basically each scan supersedes the previous one. If you scan a set with 100 files and the next time you scan only 1, SonarQube will consider that this one file is the whole set, not that it replaces 1 file in the set. That would be incremental analysis, which is something we don’t support… yet.
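What you can automate, though, is picking which whole sets to re-scan at each delivery. Here is a minimal sketch, assuming the directory-based sets from the hypothetical examples above, with all files living under a top-level directory, and LAST_ANALYZED_COMMIT standing in for however you track the previously scanned revision; each impacted set is always re-scanned in full:

```
# List the top-level directories touched since the last analyzed revision...
CHANGED_SETS=$(git diff --name-only "$LAST_ANALYZED_COMMIT"..HEAD | cut -d/ -f1 | sort -u)

# ...and re-scan each corresponding set entirely (never individual files).
for SET in $CHANGED_SETS; do
  sonar-scanner \
    -Dsonar.projectKey="plsql-set-${SET}" \
    -Dsonar.sources=. \
    -Dsonar.inclusions="${SET}/**"
done
```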