Diagnose performance difference between two environments

A final update, which should conclude this issue for us: the difference really does come down to disk and network performance (latency and/or throughput). The culprits on our side were:

  • Database in a different network from the application
  • (Helm) Using Azure Fileshare for persistent storage
  • (VM) Slow disks for temporary storage and Elasticsearch storage
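
For anyone wanting to check the same things in their own environments, here is a rough sketch of how the two could be compared. It is not what we ran, just a minimal illustration: the function names are made up for this post, and the directory, host and port you pass in are whatever applies to your setup. It times small synchronous writes in a directory (to compare storage backends) and TCP connects to a host (to compare the network path to the database).

```python
import os
import socket
import statistics
import tempfile
import time


def measure_fsync_latency(directory, writes=50, block_size=4096):
    """Return (median_ms, max_ms) for small fsync'd writes in `directory`."""
    payload = os.urandom(block_size)
    samples = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to the storage backend
            samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return statistics.median(samples), max(samples)


def measure_tcp_connect_latency(host, port, attempts=10, timeout=5.0):
    """Return (median_ms, max_ms) for opening a TCP connection to host:port."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)


if __name__ == "__main__":
    # Placeholder target: replace with the mount point you want to test.
    median_ms, max_ms = measure_fsync_latency(tempfile.gettempdir())
    print(f"fsync write: median {median_ms:.2f} ms, max {max_ms:.2f} ms")
```

Run it once per environment against the storage mount and the database host and compare the numbers. It only measures small-write and connect latency, not throughput, so treat it as a first indication rather than a benchmark.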

Going forward, we are switching to a dedicated VM with fast storage. Azure ephemeral disks are a good fit for this, and with a VM we don’t have to keep chasing probe timeouts as our database grows, as we would with a Helm deployment.