Remove schema name assumptions when checking usage of UTF-8

SonarQube does a check when trying to determine whether or not all columns are UTF-8 in PostgresCharsetHandler.java. It assumes it’s data lives in the public schema in PostgreSQL (which by itself is probably the wrong schema to use). This breaks when the PostGIS extension is installed in PostgreSQL because it has a table in the public schema which uses C collation.

Using a schema per tenant is quite common in PostgreSQL because in PostgreSQL a schema is the equivalent to what MySQL would call a “database”. If you truly create a “new database” in PostgreSQL you just create a totally separate PostgreSQL instance. Which needs it’s own high-availability, own monitoring infra, own backup infra, etc. In MySQL the equivalent would be a totally new MySQL installation. Hence the common use of a schema per tenant in the PostgreSQL world.

EDIT: To clarify the “probably the wrong schema to use” comment: The public schema in PostgreSQL is, as the name implies, by default readable and writable by every user in PostgreSQL. This means for example that a user used by let’s say a PostgreSQL Prometheus Exporter also has access to all (SonarQube) data in it. Which makes it a security hazard in my opinion to use the public schema for application data.

The offending code:

  private void expectUtf8Columns(Connection connection) throws SQLException {
    // Charset is defined globally and can be overridden on each column.
    // This request returns all VARCHAR columns. Charset may be empty.
    // Examples:
    // issues | key | ''
    // projects | name | utf8
    List<String[]> rows = getSqlExecutor().select(connection, "select table_name, column_name, collation_name " +
      "from information_schema.columns " +
      "where table_schema='public' " +
      "and udt_name='varchar' " +
      "order by table_name, column_name", new SqlExecutor.StringsConverter(3 /* columns returned by SELECT */));
    Set<String> errors = new LinkedHashSet<>();
    for (String[] row : rows) {
      if (!isBlank(row[2]) && !containsIgnoreCase(row[2], UTF8)) {
        errors.add(format("%s.%s", row[0], row[1]));
      }
    }

    if (!errors.isEmpty()) {
      throw MessageException.of(format("Database columns [%s] must have UTF8 charset.", Joiner.on(", ").join(errors)));
    }
  }

Hi,

Welcome to the community!

It’s not clear to me what you’re asking for. You want SonarQube’s behavior changed w/r/t how it handles using a Postgres database?

 
Ann

Hi!

Yes, this is indeed about a request for change at how SonarQube deals with Postgres.

SonarQube assumes it’s installed in the public schema of PostgreSQL. This is not always the case. If it is not the whole UTF-8 check does nothing. In my case it does something worse because it finds a view which is part of PostGIS (so totally unrelated to SonarQube) which has a C collation. The DB itself is of course UTF-8 by default. However PostGIS just creates this single column in a single view with a C collation (for whatever reason) in the public schema.

On a related note (though not really part of this issue) installing SonarQube in the public schema could be considered bad practice since the public schema is read-write for all users in PostgreSQL.

1 Like