The columnstore index provides a more performant way to retrieve data from large tables in queries that use a subset of all the columns in the table.
The columnstore index caches the data of the related table in a compressed column-oriented format, which improves data locality for similar values. A query that uses the columnstore index benefits from more efficient I/O and CPU usage, which leads to shorter runtimes.
A multicolumn columnstore index can be used with query conditions that involve any subset of the index columns. Like a BRIN index and unlike a B-tree index, the columnstore index effectiveness is the same regardless of which index columns the query conditions use. PostgreSQL limits all multicolumn indexes to a maximum of 32 columns, see Multicolumn Indexes. If your analytical workload uses more columns, you might consider creating multiple columnstore indexes per table. For most use cases, a single columnstore index per table is enough.
Use the columnstore index to cover the most commonly accessed columns. If columns are only used for archiving or rarely accessed, leave them out to save storage and to speedup the cache generation.
A columnstore index scan combines data from the columnstore cache and data from the original table that is not yet in the cache. Data from the original table is read with the same algorithm used by the Accelerated sequential scan.
The cache of the columnstore index is kept up to date using background workers whilst maintaining transactional consistency.
Those background processes are identified as
columnstore update processes.
The background processes go through all databases one by one, updating any columnstore indexes
that exist, until all are up-to-date again. Thereafter the updater sleeps for the specified period of
columnstore_updater_sleep_time after which it starts again.
If you want to update your columnstore indexes at a specific timepoint, e.g. because of an ETL workload, execute SELECT swarm64da.columnstore_update_full() to fully update the cache of all your columnstore indexes. During the full update execution, the columnstore indexes might have a reduced performance until the update is finished or interrupted. The operation is blocking, similar to VACUUM, and will wait until the indexes are available for updating.
Like other indexes, the columnstore index has full crash-safety and replication support. This allows you to have the indexes maintained on the primary node and have updates propagated to all (read-only) replicas that then use these indexes immediately without any overhead.
The columnstore index displays in the query plan as
S64 DA Columnstore Scan. The EXPLAIN ANALYZE VERBOSE command displays details about the usage of the index.
A columnstore index has to be created explicitly, like any other index, for the target table and columns.
The command CREATE INDEX orders_idx ON orders USING columnstore (o_orderkey, o_custkey) WITH (compression_level=3, compression_type='zstd') creates a columnstore index called
orders_idx for the table
orders. The index covers the columns
o_custkey. In contrast with other indexes, the specific order of the columns does not
The compression level is optional and by default
5. Its valid levels go from 1 to 9.
The compression type is optional and by default
zstd. You can specify
For the complete syntax of index creation, see CREATE INDEX.
To choose between zstandard (zstd) or LZ4 compression, use LZ4 when your system has a lot of I/O available. Otherwise, use zstd.
In most cases, you can leave the default compression level as is. When your workload is very ingest-heavy, lower the compression level. When your workload is very scan-heavy, increase the compression level.
When changing the compression levels and types, it is not necessary to recreate the index. Instead, you can use the ALTER INDEX SET (compression_level=9).
- swarm64da.enable_columnstore (boolean)
Enables or disables the columnstore index scans. The default is
- swarm64da.enable_columnstore_updater (boolean)
Enables or disables background updates for the columnstore indexes. The default is
- swarm64da.columnstore_updater_sleep_time (integer)
The sleep time between each round of updating all columnstore indexes on all databases. Avoid long sleep times, as frequent rounds are faster and have a smaller footprint in the system. The default is
- swarm64da.columnstore_updater_workers (integer)
The number of background workers that update the columnstore indexes. The default is