Sonar-go (26.3+ engine): O(call-exprs × identifiers) blowup, findReturnTypesForIdentifier linearly scans info.Uses on every call-expression

TL;DR: Since the rewritten Go engine landed in 26.3, analysis of a single Go package with a few hundred files does not complete in practical time (one CPU core stays pinned; a CI run on our ~646-file package was killed after 35+ minutes). The root cause is in sonar-go-to-slang/nodeImpl.go: findReturnTypesForIdentifier does a full linear scan of the package-wide info.Uses map for every call-expression, giving O(call-expressions × identifiers) ≈ O(n²) in package size. A CPU profile, a five-point timing curve, and the source all agree. A suggested fix (a one-time position-keyed index) is below.

Environment

  • SonarQube Community Build 26.5.0.122743; bundled Go analyzer 1.36.0.5922 — i.e. the new Go analysis engine introduced in 26.3 (“new Go analysis engine, up to 30× speedup”).
  • sonar-go-to-slang linux-amd64. Host: 6-core / 11 GB (not resource-constrained — one core saturated, RAM idle).
  • Target: one Go package in a single directory, ~646 .go files.
  • The same codebase analysed fine on our pre-26.3 build (last clean run 15 Apr 2026); it became non-analysable after the upgrade.

Symptom

Analysis never completes. sonar-go-to-slang pins a single core; the Go sensor sits on … folder analyzed, current folder: …/internal/api (N files, parsing) indefinitely. A CI run was killed after 35+ minutes with no progress.

Root cause

From the analyzer source, sonar-go-to-slang/nodeImpl.go:

func (t *SlangMapper) findReturnTypesForIdentifier(identifier *ast.Ident) []string {
	result := make([]string, 0)
	for ident, obj := range t.info.Uses {        // full scan of the package-wide Uses map …
		if ident.NamePos == identifier.NamePos {  // … to find one entry, by position
			signature, ok := obj.Type().(*types.Signature)
			if ok {
				result = make([]string, signature.Results().Len())
				for i := 0; i < signature.Results().Len(); i++ {
					result[i] = signature.Results().At(i).Origin().Type().String()
				}
			}
		}
	}
	return result
}

findReturnTypes calls this once per call-expression (via mapCallExprImpl). t.info.Uses is the type-checker’s map of every identifier-use → object, and its size grows linearly with the package. The function iterates the entire map to find a single matching entry.

Net cost: O(call-expressions × identifiers), i.e. quadratic in package size.

Suggested fix

Semantics-preserving — keep the exact NamePos matching, but build a position index once instead of scanning per call:

// built once, when t.info is populated; O(identifiers) total:
usesByPos := make(map[token.Pos]types.Object, len(t.info.Uses))
for ident, obj := range t.info.Uses {
	usesByPos[ident.NamePos] = obj   // token.Pos is unique per FileSet position → collision-free
}

// in findReturnTypesForIdentifier: O(1) instead of O(identifiers):
obj, ok := usesByPos[identifier.NamePos]

This collapses total cost from O(call-exprs × identifiers) to ≈O(identifiers) while preserving the current position-based matching exactly.

(If the *ast.Ident reaching this function is guaranteed to be the same node that was type-checked, obj := t.info.Uses[identifier] would also work directly, but the original NamePos comparison suggests pointer identity wasn’t being relied on, hence the position-keyed map above.)

Evidence

CPU profile (perf record -F 199 -g on the live converter): 80.96% self-time in main.(*SlangMapper).findReturnTypesForIdentifier. Call path: mapStmt → mapBlockStmtImpl → mapExprStmtImpl → mapExpr → mapCallExpr → mapCallExprImpl → findReturnTypes → findReturnTypesForIdentifier. Not recursive; invoked once per call-expression.

Not allocation/GC (GODEBUG=gctrace=1): heap stable at ~50–64 MB, GCs ~9 s apart after warm-up, GC share 0%. The time is pure CPU, not allocation pressure.

Time vs. measured input — first N files of the package; N=300 run uncapped to completion. “work” = call-expressions × identifiers (both counted from the AST):

N (files)  call-exprs  identifiers  work (call-exprs × idents)  Go-sensor time (s)  time ÷ work (s)
50         3,418       21,273       7.3e7                       6.0                 8.2e-8
100        6,700       49,714       3.3e8                       15.1                4.6e-8
150        10,965      79,395       8.7e8                       43.7                5.0e-8
200        19,064      126,137      2.4e9                       178                 7.4e-8
300        33,553      207,593      7.0e9                       604                 8.7e-8

Time tracks the call-exprs × identifiers product to within ~2× across a 95× range of work — consistent with O(n²) in package size (both factors grow ≈ linearly with file count). The ratio is roughly flat: the elevated N=50 value is fixed parse overhead before the quadratic term dominates, and the slight rise at N=200–300 is consistent with cache effects as the linearly-scanned map outgrows CPU cache.

Count-driven, not content-driven: a random 200-file sample took 241 s, comparable to the 178 s for the alphabetical first 200. It isn’t one pathological file or construct; it’s the file/identifier count.

Impact

The only available workaround is sonar.go.activate=false, which disables Go analysis entirely. So for any project with a large single-directory Go package, 26.3+ is a hard regression from working to effectively unsupported, not merely a slowdown, and it worsens as the package grows. The suggested fix is a few lines.

Reproduction

  1. Any single Go package (one directory) with several hundred .go files containing many call-expressions.
  2. Run analysis with the default sonar.go.* settings on 26.3+.
  3. Observe sonar-go-to-slang saturate one core and analysis time grow super-linearly with file count (table above). At ~300 files in our package it takes ~10 minutes; larger packages do not complete in practical time.

The converter is invoked as sonar-go-to-slang-linux-amd64 -module_name <module> -gc_export_data_dir <dir> with the file list passed on stdin.

Happy to provide the full file-count/time data or a sanitised reproduction package on request.