Combining multiple build wrapper outputs into one

I have a question about a problem I am working on right now. I am trying to figure out how to solve it in a way that will be sustainable and will work with our build process and general product architecture.

We have a large monorepo that hosts several of our product architectures. It’s a large C code base that is used to build multiple variants of our product, each one designed for a different architecture. We want to build and scan the code on PRs, which I have already verified works with the build-wrapper. But the PR model presents a problem. When a dev opens a PR with some changes, we don’t know which architecture or architectures are in play, so we need to build them all. That means multiple builds, each one producing its own build wrapper output. We can scan the appropriate parts once (it’s a single code base), but I then have the problem of multiple build wrapper outputs feeding into that one scan. How do I solve this?

I have seen some threads with a similar problem. One suggestion was to wrap all builds with a single build wrapper command. Essentially, each arch is built sequentially, one wrapper output is produced, and things work. This process would be very slow for us and I don’t think it would be acceptable. Since the wrapper output is a JSON file, is it possible to combine multiple wrapper outputs into one valid file and feed that into the scanner? This seems very doable with a simple script or something along those lines. The obvious problem is that the wrapper output format is not a guaranteed part of the product, so it could change without notice and break this. I also don’t know how the scanner would react if we have some duplicate entries, for example.

Looking for guidance and suggestions on how to approach this. Would a compilation DB help with this instead?

Hello @michalszelagsonos,

Having a single build wrapper command does not mean that this command cannot run sub-tasks in parallel, so it would not mean that everything is sequential (but it would mean that everything executes on one machine, which might not be what you want).
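To illustrate the sub-task idea, here is a minimal sketch of a driver script that starts one build per architecture as parallel child processes. The architecture names and the `make` commands in `BUILD_COMMANDS` are hypothetical placeholders, not taken from this thread; substitute whatever builds each variant.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumption: each architecture has its own build command.
# These names and paths are placeholders for illustration only.
BUILD_COMMANDS = {
    "arch_a": ["make", "-C", "build/arch_a"],
    "arch_b": ["make", "-C", "build/arch_b"],
}


def run_build(name, command):
    """Run one build as a child process and return (name, exit code)."""
    result = subprocess.run(command)
    return name, result.returncode


def build_all(commands, max_workers=None):
    """Start every build concurrently and return {name: exit_code}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_build, n, c) for n, c in commands.items()]
        return dict(future.result() for future in futures)


def main(commands=BUILD_COMMANDS):
    """Return 0 if all builds succeeded, 1 otherwise."""
    exit_codes = build_all(commands)
    failed = sorted(n for n, code in exit_codes.items() if code != 0)
    for name in failed:
        print(f"build failed: {name}")
    return 1 if failed else 0
```

Saved as, say, `build_all.py` with a final `sys.exit(main())` line, the whole script would be passed to the wrapper as the single build command, for example `build-wrapper-linux-x86-64 --out-dir bw-output python3 build_all.py`. All builds still run on the one machine that runs the wrapper.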

However, the approach you suggest will not work: If, within one analysis, we see the same source file being built several times, we only analyze it once. It would be the same if you merge together JSON files (build wrapper or compilation database).

Currently, if you want the same file to be analyzed in different configurations, it has to be part of different analyses. There are usually two ways to do that:

  • One “branch” for each configuration. This does not work well with PR decoration.
  • One project for each configuration (each project will count toward the total LOC in terms of license costs). This can work correctly with PR decoration if you can use the monorepo feature (you need to have the Enterprise Edition).

We are studying how to improve this situation. You can have a look at, and maybe vote for, this feature.

Hope that helps…

This is interesting. A few more questions:

  • if I were to run multiple builds for different architectures, could I fork each build as a separate child process and the build wrapper would detect the compiler calls from the forked process? I guess I’d like to get more clarification on the sub-task idea.
  • I am not after reanalyzing the same file multiple times under different builds. I think I am fine with the scanner analyzing the first one it finds although I am not sure what the implications of that would be as far as correctness of the scan. The problem is that the way our code is structured, when I get a code change, I don’t know which architecture I need to build so I have to build them all so that I have all the compiler calls captured that will be required to scan the code that is changing. It’s not a precise approach. So, coming back to my original question, if I were to just combine the wrapper outputs from multiple builds, with duplicates, would that work?
  1. The build wrapper monitors all child processes of the root process, even if several of them are launched in parallel (forked).

  2. Merging the build-wrapper outputs from several builds should work, but it has several drawbacks:

    • Issues in files compiled in several configurations can flicker, that is, appear and disappear from one analysis to the next. Such a file may have an issue in one config but not in the others, depending on the value of macros, the sizes of fundamental types…
    • For the same reason (only one version of the file is selected each time), caching might not work, leading to an unnecessarily long analysis.
      I cannot guarantee this for future versions, but in practice, if you can ensure that the order of the different configurations in your file remains stable, those problems will probably not appear.
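A minimal sketch of such a merge, under a loud assumption: each wrapper output is taken to be a JSON object whose per-compiler-call entries sit in a list under a `captures` key. That layout is not a documented contract and may change between versions, so check it against your own files and adjust `list_key` if needed. Inputs are processed in sorted path order so that the configurations keep a stable order from run to run.

```python
import json


def merge_wrapper_outputs(paths, list_key="captures"):
    """Concatenate the entry lists of several wrapper outputs.

    Assumption: every input is a JSON object holding a list under
    `list_key`. Other top-level fields are copied from the first file.
    Duplicate entries are kept as they are.
    """
    merged = None
    # Sort the paths so the configuration order is stable across runs.
    for path in sorted(str(p) for p in paths):
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        entries = data.get(list_key) if isinstance(data, dict) else None
        if not isinstance(entries, list):
            raise ValueError(f"{path}: no list under key {list_key!r}")
        if merged is None:
            merged = dict(data)
            merged[list_key] = list(entries)
        else:
            merged[list_key].extend(entries)
    if merged is None:
        raise ValueError("no input files given")
    return merged


def write_merged(paths, out_path, list_key="captures"):
    """Merge the inputs and write the result to out_path."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(merge_wrapper_outputs(paths, list_key), handle, indent=2)
```

Naming the per-architecture output directories consistently (so that sorting always yields the same sequence) is what keeps the order stable here; an explicit, fixed list of paths would serve the same purpose.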

Great, this is very useful, thank you!

I also found this thread from a few months ago that seems like the same problem, but it focuses on merging multiple build databases. Posting here for completeness for other folks to see.
