Excessive memory use (3GB+) in RegexContext.globalCache

I work on identifying and fixing memory leaks in Android Studio. For our canary users, when Android Studio is running low on memory we’ll take a heap dump, analyze it, and send back a report with some diagnostic information. Several reports over the last few months have shown large (often multi-gigabyte) structures held by org.sonarsource.kotlin.api.regex.RegexContext.globalCache (BindingContexts, various indices).

Here’s a leaktrace showing paths from GC roots to large structures in the heap:

Subtree size  #instances
3.81GB          1   ROOT: Static field: org.sonarsource.kotlin.api.regex.RegexContext.globalCache
3.81GB          1   (root): java.util.LinkedHashMap
3.72GB          1   +-table: java.util.HashMap$Node[]
3.72GB        407   | []: java.util.LinkedHashMap$Entry
3.72GB        407   | value: org.sonarsource.analyzer.commons.regex.RegexParseResult
2.31GB        245   | +-result: org.sonarsource.analyzer.commons.regex.ast.SequenceTree
2.31GB        245   | | source: org.sonarsource.kotlin.api.regex.KotlinAnalyzerRegexSource
2.31GB         63   | | textRangeTracker: org.sonarsource.kotlin.api.regex.TextRangeTracker
2.31GB         63   | | textRange: org.sonarsource.kotlin.api.regex.TextRangeTracker$Companion$of$1
2.31GB         63   | | $kotlinFileContext: org.sonarsource.kotlin.plugin.KotlinFileContext
 668MB         63   | | +-ktFile: org.jetbrains.kotlin.psi.KtFile
 656MB         63   | | | myManager: org.jetbrains.kotlin.com.intellij.psi.impl.PsiManagerImpl
 155MB         63   | | | +-myProject: org.jetbrains.kotlin.com.intellij.mock.MockProject
 147MB         63   | | | | myPicoContainer: org.jetbrains.kotlin.com.intellij.mock.MockComponentManager$1
 146MB         63   | | | | parent: org.jetbrains.kotlin.com.intellij.mock.MockComponentManager$1
 146MB         63   | | | | this$0: org.jetbrains.kotlin.com.intellij.core.CoreApplicationEnvironment$1
 146MB         63   | | | | myExtensionArea: org.jetbrains.kotlin.com.intellij.openapi.extensions.impl.ExtensionsAreaImpl
 146MB         63   | | | | extensionPoints: java.util.concurrent.ConcurrentHashMap
 146MB         63   | | | | table: java.util.concurrent.ConcurrentHashMap$Node[]
 145MB        630   | | | | []: java.util.concurrent.ConcurrentHashMap$Node
 145MB        630   | | | | val: org.jetbrains.kotlin.com.intellij.openapi.extensions.impl.InterfaceExtensionPoint
 145MB         63   | | | | pluginDescriptor: org.jetbrains.kotlin.com.intellij.ide.plugins.IdeaPluginDescriptorImpl
 145MB         63   | | | | basePath: jdk.nio.zipfs.ZipPath
 145MB         63   | | | | zfs: jdk.nio.zipfs.ZipFileSystem
 145MB         63 * | | | | cen: byte[] 
 462MB         63   | | | +-myFileIndex: org.jetbrains.kotlin.com.intellij.openapi.util.NotNullLazyValue$2
 462MB         63   | | | | myValue: org.jetbrains.kotlin.com.intellij.mock.MockFileIndexFacade
 462MB         63   | | | | myLibraryRoots: java.util.ArrayList 
 462MB         63   | | | | elementData: java.lang.Object[]
 384MB      15246   | | | | +-[]: org.jetbrains.kotlin.com.intellij.openapi.vfs.impl.jar.CoreJarVirtualFile
...
1.08GB         63   | | +-bindingContext: org.jetbrains.kotlin.resolve.BindingTraceContext$1
1.08GB         63   | | | this$0: org.jetbrains.kotlin.cli.jvm.compiler.NoScopeRecordCliBindingTrace
 895MB         63   | | | +-map: org.jetbrains.kotlin.util.slicedMap.SlicedMapImpl
 895MB         63   | | | | map: org.jetbrains.kotlin.util.slicedMap.OpenAddressLinearProbingHashTable
 895MB         63   | | | | array: java.lang.Object[]
 204MB      16441   | | | | +-[]: org.jetbrains.kotlin.com.intellij.util.keyFMap.PairElementsFMap
66.6MB       1365   | | | | | +-value1: org.jetbrains.kotlin.serialization.deserialization.descriptors.DeserializedSimpleFunctionDescriptor
46.2MB        126   | | | | | | containingDeclaration: org.jetbrains.kotlin.load.java.lazy.descriptors.LazyJavaPackageFragment
40.0MB        126   | | | | | | scope: org.jetbrains.kotlin.load.java.lazy.descriptors.JvmPackageScope
37.8MB        126   | | | | | | kotlinScopes$delegate: org.jetbrains.kotlin.storage.LockBasedStorageManager$LockBasedNotNullLazyValue
37.8MB        126   | | | | | | value: org.jetbrains.kotlin.resolve.scopes.MemberScope[]
37.8MB       3339   | | | | | | []: org.jetbrains.kotlin.serialization.deserialization.descriptors.DeserializedPackageMemberScope
26.5MB       3339   | | | | | | impl: org.jetbrains.kotlin.serialization.deserialization.descriptors.DeserializedMemberScope$OptimizedImplementation
18.2MB       3213   | | | | | | functionProtosBytes: java.util.LinkedHashMap
15.4MB       1827   | | | | | | table: java.util.HashMap$Node[]
14.5MB      42021   | | | | | | []: java.util.LinkedHashMap$Entry
8.68MB      42021 * | | | | | | value: byte[] 
62.7MB        814   | | | | | \-value2: org.jetbrains.kotlin.load.java.lazy.descriptors.LazyJavaClassDescriptor
36.4MB        756   | | | | |   +-jClass: org.jetbrains.kotlin.load.java.structure.impl.classFiles.BinaryJavaClass
7.37MB        756   | | | | |   | context: org.jetbrains.kotlin.load.java.structure.impl.classFiles.ClassifierResolutionContext
...

Unfortunately I don’t know what version of SonarQube or sonar-kotlin-plugin is being used here, though it looks like sonar-kotlin-plugin hasn’t changed in the last 10 months so it’s likely an issue in the latest version - and looking at the code bears that out:

  • the lambda argument to the TextRangeTracker constructor call captures kotlinFileContext here, which in turn holds onto the BindingContext, which will be large. There is also a different BindingContext instance associated with each KotlinFileContext (i.e., it’s not a singleton long-lived object).

  • the cache entries in RegexContext.globalCache appear to persist indefinitely, and there is no bound on the size of the cache.
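
To make the first point concrete, here is a minimal Java sketch of the capture pattern. FileContext and the two cache methods are hypothetical stand-ins for illustration, not the plugin's real classes. A lambda stored in a long-lived map keeps everything it captures reachable, so capturing the whole context pins the large data, while reading the small value first does not.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical stand-ins for illustration only; not the plugin's real classes.
final class CaptureSketch {
    /** Stands in for a large per-file analysis context. */
    static final class FileContext {
        // Pretend this is the large BindingContext reachable from the context.
        final byte[] bindingData = new byte[1024 * 1024];
        final int startOffset;

        FileContext(int startOffset) {
            this.startOffset = startOffset;
        }
    }

    // Long-lived map, analogous to a static global cache.
    static final Map<String, IntSupplier> CACHE = new HashMap<>();

    /** Leaky: the lambda captures ctx, so the cache entry keeps the whole context reachable. */
    static void cacheLeaky(String key, FileContext ctx) {
        CACHE.put(key, () -> ctx.startOffset);
    }

    /** Leaner: read the small value first, so the lambda captures only an int. */
    static void cacheLean(String key, FileContext ctx) {
        int offset = ctx.startOffset;
        CACHE.put(key, () -> offset);
    }

    public static void main(String[] args) {
        cacheLeaky("a", new FileContext(10));
        cacheLean("b", new FileContext(20));
        // Both entries answer the same question, but only "a" still pins 1 MB of data.
        System.out.println(CACHE.get("a").getAsInt() + " " + CACHE.get("b").getAsInt());
    }
}
```

Both variants return the same values; the difference is only in what stays reachable from the cache entry afterwards.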

I’d recommend trying to find a way to avoid maintaining long-lived references to BindingContexts, and possibly consider bounding the size of the cache as well.
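
As a sketch of what bounding the cache could look like (an assumption on my part, not how the plugin is implemented): since the leaktrace shows the cache is a java.util.LinkedHashMap, one option is the standard removeEldestEntry hook with access ordering, which gives a small LRU cache. The class name LruCache and the limit of 2 are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BoundedCacheSketch {
    /** LRU map that evicts the least recently used entry once maxEntries is exceeded. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            // accessOrder = true: iteration goes from least to most recently used.
            super(16, 0.75f, true);
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Called by put() after insertion; returning true drops the eldest entry.
            return size() > maxEntries;
        }
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("()*", "parse result 1");
        cache.put("(a|)*", "parse result 2");
        cache.get("()*");                     // marks "()*" as recently used
        cache.put("(b?)*", "parse result 3"); // evicts "(a|)*", the least recently used
        System.out.println(cache.keySet());   // prints [()*, (b?)*]
    }
}
```

Note that LinkedHashMap is not thread-safe, so a shared cache would still need external synchronization. Bounding the size also only caps the leak; it does not help if each retained entry still references a large BindingContext.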

It looks interesting. I would like to know more about it. Don’t hesitate to update me further. Thank you so much!

Hello @Nathan_Paige,

Thanks for the report. I’m now looking into this.

I think I have an idea where it’s coming from, but a heap dump would help me a lot. Could you please send it to me (at least the part starting from SonarLint)?

And it would also be helpful to understand which version of SonarLint is used. So if you manage to find it out, it will help a lot.

Here is the ticket to remove binding context from regex sources.

We also plan to rework the global cache.

Regards,
Margarita

Hi, thanks for looking into this. Unfortunately we don’t collect the actual heap dump (both because they’re quite large, and can contain sensitive information), only the text report generated from analyzing it. If I can reproduce the problem myself I’ll send you a heap dump.

I was able to find out that two of the reports were using SonarLint version 7.1.1.54565, and one was using 7.0.0.52289. Plugin version information for the other 6 is not available. I’m not sure how this maps to sonar-kotlin versions, but just based on timing (i.e., if a release of SonarLint uses the most recent version of sonar-kotlin) it looks like this might still be an issue after 2.8.

I think I’ve managed to reproduce the issue:

  • create a new project with a Kotlin file
  • declare a regex with a repeated pattern that can match the empty string, e.g. val regex = Regex("()*")
  • add and remove the * repeatedly. This triggers EmptyStringRepetitionCheck. Each time, a new entry is added to RegexContext.globalCache.

The heap dump is too large to upload here - how would you like me to send it to you?

Thanks for the reproducer @Nathan_Paige
I managed to reproduce the problem on our side. Will take it from here and update you with the results of my investigation.

Best,
Margarita

Thanks for your patience. I found the root cause. SonarLint keeps its custom classloaders alive because it needs them for longer than a single analysis. On the analyzer side, we assumed the classloaders would be collected by GC after each analysis, which is not the case. Together with the misuse of a static cache, this caused a memory leak.
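
A rough Java illustration of this failure mode, using hypothetical class names rather than the real analyzer code or the actual fix: a static field lives as long as its declaring class, and the class lives as long as its classloader. If the classloader survives several analyses, a static cache keeps growing, while a cache owned by a per-analysis object becomes unreachable together with that object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names for illustration; not the analyzer's real classes.
final class CacheScopeSketch {
    /** Static cache: stays alive as long as the defining class and its classloader do. */
    static final class StaticCacheAnalyzer {
        static final Map<String, Object> GLOBAL_CACHE = new HashMap<>();

        void analyze(String fileName, String regex) {
            GLOBAL_CACHE.put(fileName + ":" + regex, new Object());
        }
    }

    /** Instance cache: becomes unreachable together with the analysis that owns it. */
    static final class ScopedCacheAnalyzer {
        private final Map<String, Object> cache = new HashMap<>();

        void analyze(String fileName, String regex) {
            cache.put(fileName + ":" + regex, new Object());
        }

        int cacheSize() {
            return cache.size();
        }
    }

    public static void main(String[] args) {
        // Simulate three analyses running inside the same long-lived classloader.
        for (int i = 0; i < 3; i++) {
            new StaticCacheAnalyzer().analyze("File" + i + ".kt", "()*");
            ScopedCacheAnalyzer scoped = new ScopedCacheAnalyzer();
            scoped.analyze("File" + i + ".kt", "()*");
            // The scoped cache starts empty each time; the static one accumulates.
            System.out.println("scoped=" + scoped.cacheSize()
                + " static=" + StaticCacheAnalyzer.GLOBAL_CACHE.size());
        }
    }
}
```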

For now, the leak should be fixed by this ticket

Thanks a lot for your help!

Best,
Margarita