Columnstore index

The columnstore index provides a more performant way to retrieve data from large tables in queries that use a subset of all the columns in the table.

The columnstore index caches the data of the related table in a compressed column-oriented format, which improves data locality for similar values. A query that uses the columnstore index benefits from more efficient I/O and CPU usage, which leads to shorter runtimes.

A multicolumn columnstore index can be used with query conditions that involve any subset of the index columns. Like a BRIN index and unlike a B-tree index, the columnstore index effectiveness is the same regardless of which index columns the query conditions use. EPAS limits all multicolumn indexes to a maximum of 32 columns, see Multicolumn Indexes. If your analytical workload uses more columns, you might consider creating multiple columnstore indexes per table. For most use cases, a single columnstore index per table is enough.

Use the columnstore index to cover the most commonly accessed columns. If columns are only used for archiving or rarely accessed, leave them out to save storage and to speedup the cache generation.

A columnstore index scan combines data from the columnstore cache and data from the original table that is not yet in the cache. Data from the original table is read with the same algorithm used by the Accelerated sequential scan.

The cache of the columnstore index is kept up to date using background workers whilst maintaining transactional consistency. Those background processes are identified as columnstore update processes. The background processes go through all databases one by one, updating any columnstore indexes that exist, until all are up-to-date again. Thereafter the updater sleeps for the specified period of time columnstore_updater_sleep_time after which it starts again.

If you want to update your columnstore indexes at a specific timepoint, e.g. because of an ETL workload, execute SELECT swarm64da.columnstore_update_full() to fully update the cache of all your columnstore indexes. During the full update execution, the columnstore indexes might have a reduced performance until the update is finished or interrupted. The operation is blocking, similar to VACUUM, and will wait until the indexes are available for updating.

Like other indexes, the columnstore index has full crash-safety and replication support. This allows you to have the indexes maintained on the primary node and have updates propagated to all (read-only) replicas that then use these indexes immediately without any overhead.

The columnstore index is only supported when the old_snapshot_threshold is set to -1, and default_transaction_isolation is set to a value different than serializable.

The columnstore index displays in the query plan as S64 DA Columnstore Scan. The EXPLAIN ANALYZE VERBOSE command displays details about the usage of the index.

The columnstore index scan costs are determined by the swarm64da.cost_scan_page and swarm64da.cost_scan_startup settings.

Columnstore creation

A columnstore index has to be created explicitly, like any other index, for the target table and columns.

The command CREATE INDEX orders_idx ON orders USING columnstore (o_orderkey, o_custkey) WITH (compression_level=3, compression_type='zstd') creates a columnstore index called orders_idx for the table orders. The index covers the columns o_orderkey and o_custkey. In contrast with other indexes, the specific order of the columns does not matter.

The compression level is optional and by default 5. Its valid levels go from 1 to 9.

The compression type is optional and by default zstd. You can specify lz4, zstd, and uncompressed.

For the complete syntax of index creation, see CREATE INDEX.

To choose between zstandard (zstd) or LZ4 compression, use LZ4 when your system has a lot of I/O available. Otherwise, use zstd.

In most cases, you can leave the default compression level as is. When your workload is very ingest-heavy, lower the compression level. When your workload is very scan-heavy, increase the compression level.

When changing the compression levels and types, it is not necessary to recreate the index. Instead, you can use the ALTER INDEX SET (compression_level=9).

Settings

swarm64da.enable_columnstore (boolean)

Enables or disables the columnstore index scans. The default is on.

swarm64da.enable_columnstore_updater (boolean)

Enables or disables background updates for the columnstore indexes. The default is on.

swarm64da.columnstore_updater_sleep_time (integer)

The sleep time between each round of updating all columnstore indexes on all databases. Avoid long sleep times, as frequent rounds are faster and have a smaller footprint in the system. The default is 1 minute.

swarm64da.columnstore_updater_workers (integer)

The number of background workers that update the columnstore indexes. The default is 4.