Low Cardinality in PostgreSQL

The Compression Hero That Quietly Kills Your Indexes
Low cardinality is one of those database traits that feels like it should be “good by default”. A column with very few distinct values is easy to reason about, it makes analytics simpler, it often compresses extremely well, and it usually matches the way the business talks about data: status, tier, is_active, region_code, plan, role.
And then you create an index on it… and PostgreSQL still chooses a sequential scan. That’s not PostgreSQL being stubborn. That’s PostgreSQL being honest about selectivity and I/O economics.
Why low cardinality is amazing for compression
Compression thrives on repetition. When the same values repeat across many rows, encodings such as dictionary-based representations and run-length encoding become very effective, reducing the overall storage footprint. Smaller data increases cache residency, reduces I/O pressure, and makes large scans cheaper. This is the halo: low cardinality can be a real win for data volume, memory efficiency, and scan-heavy workloads.
But compression is about bytes on disk and in memory. Index usefulness is about how many rows you’re going to fetch and how scattered those row fetches will be.
Those are related, but they are not the same fight.
Selectivity is the planner’s reality check
Selectivity is basically “what fraction of the table will match this predicate”.
If status = 'shipped' matches 90% of rows, an index is often a bad deal. Even if the index can locate the matching tuples, PostgreSQL still has to visit a huge portion of the heap. That becomes a storm of random page fetches, visibility checks, and CPU overhead. At some point it’s cheaper to just stream through the table once in physical order with a sequential scan (or a parallel sequential scan) and be done.
This is the core horn: low cardinality frequently implies low selectivity for common values. Your instinct says “index it”, but the math says “you’re still touching most of the table”.
The hidden tax of “obvious” low-cardinality indexes
A B-tree index stores an entry per row, regardless of how frequently a value occurs. So even when the planner ignores that index for your most common queries, you still pay for it on every insert and update: extra WAL, extra CPU, extra vacuum work, more memory pressure, and more places for bloat to accumulate.
Low-cardinality indexes can still be excellent in two situations. The first is when you query rare values, and those values are truly selective. The second is when you combine the low-cardinality predicate with another condition that dramatically reduces the candidate set, so the access path becomes narrow and cheap. The key is that low-cardinality access paths rarely win as standalone access paths in the common case.
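One way to spot indexes that fall on the wrong side of that trade-off is to compare their size against how often the planner actually uses them. This is a sketch against the standard statistics views; idx_scan counts are cumulative since the last stats reset, so judge them over a representative window:

```sql
-- Indexes that cost writes on every row change but are rarely read.
SELECT
    s.indexrelname,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
    s.idx_scan
FROM pg_stat_user_indexes AS s
ORDER BY s.idx_scan ASC,
         pg_relation_size(s.indexrelid) DESC;
```

A large index with a near-zero idx_scan is a strong candidate for the “pay every day, benefit once a month” pattern discussed below.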
A “before vs after” benchmark you can run, with track_io_timing
This demo is built to tell the truth with evidence: same dataset, same queries, different index strategies. We will capture not only execution time and buffer usage, but also I/O time using track_io_timing, so you can see when “random reads” are the real villain.
Run this in a disposable environment or staging. It loads 2 million rows.
Step 1: Create a skewed low-cardinality dataset
\pset pager off
\timing on
DROP SCHEMA IF EXISTS lc_bench CASCADE;
CREATE SCHEMA lc_bench;
SET search_path = lc_bench, public;
CREATE TABLE orders_demo (
    id         bigserial PRIMARY KEY,
    tenant_id  int NOT NULL,
    created_at timestamptz NOT NULL,
    status     text NOT NULL,
    payload    text
);
-- 2 million rows with skew:
-- 90% shipped, 9% paid, 1% canceled
INSERT INTO orders_demo (tenant_id, created_at, status, payload)
SELECT
    (random() * 9999)::int + 1,
    now() - (g * interval '1 second'),
    CASE                                 -- one random draw per row, so the
        WHEN r < 0.90 THEN 'shipped'     -- bands really are 90 / 9 / 1
        WHEN r < 0.99 THEN 'paid'
        ELSE 'canceled'
    END,
    md5(g::text || '-' || random()::text)
FROM (
    SELECT g, random() AS r
    FROM generate_series(1, 2000000) AS g
) AS skewed;
ANALYZE orders_demo;
SELECT status, count(*) AS rows
FROM orders_demo
GROUP BY status
ORDER BY rows DESC;
The skew is intentional. Low cardinality becomes operationally painful when one value is extremely common, because that’s where selectivity collapses.
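You can also inspect the skew the way the planner sees it. After ANALYZE, the pg_stats view exposes the most common values and their estimated frequencies, which is exactly what feeds the selectivity math:

```sql
-- The planner's statistical view of the status column.
SELECT
    attname,
    n_distinct,
    most_common_vals,
    most_common_freqs
FROM pg_stats
WHERE schemaname = 'lc_bench'
  AND tablename  = 'orders_demo'
  AND attname    = 'status';
```

With the distribution above, the frequency reported for 'shipped' should come back close to 0.90, which is why the planner prices status = 'shipped' as “most of the table”.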
Step 2: Enable and verify track_io_timing
track_io_timing makes PostgreSQL measure the time spent in read/write system calls and expose those numbers in EXPLAIN (ANALYZE, BUFFERS) output. It adds overhead, which is why you usually keep it off until you’re diagnosing I/O behavior.
First, verify its current state:
SHOW track_io_timing;
If you’re allowed to enable it for your session:
SET track_io_timing = on;
SHOW track_io_timing;
If that fails due to permissions, you need to enable it in your server configuration (or parameter group) and reload, then confirm again with SHOW track_io_timing;. Don’t skip the confirmation, because the entire “I/O time” part of this story depends on it.
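If you run your own server, a superuser can flip it cluster-wide without a restart. This is a sketch; managed services (RDS, Cloud SQL, and similar) expect you to change it through their parameter-group tooling instead:

```sql
-- Requires superuser; existing sessions pick the change up after the reload.
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();
SHOW track_io_timing;  -- always confirm before trusting any I/O numbers
```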
To keep runs comparable while you iterate, you can also reduce variability:
SET jit = off;
SET max_parallel_workers_per_gather = 0;
That isn’t about making the benchmark “faster”; it’s about making the before/after contrast easier to interpret.
Step 3: Capture results automatically into a table
Instead of copying and pasting EXPLAIN output, we’ll extract metrics from EXPLAIN (FORMAT JSON) and store them in a results table. With track_io_timing enabled, the JSON will include I/O Read Time and I/O Write Time.
DROP TABLE IF EXISTS bench_results;
CREATE TABLE bench_results (
    variant      text NOT NULL,
    query_name   text NOT NULL,
    execution_ms numeric NOT NULL,
    planning_ms  numeric NOT NULL,
    shared_hit   bigint NOT NULL,
    shared_read  bigint NOT NULL,
    io_read_ms   numeric NOT NULL,
    io_write_ms  numeric NOT NULL,
    captured_at  timestamptz NOT NULL DEFAULT now(),
    plan         jsonb NOT NULL
);
CREATE OR REPLACE FUNCTION capture_explain(variant text, query_name text, sql_text text)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    j jsonb;
    p jsonb;
BEGIN
    EXECUTE format('EXPLAIN (ANALYZE, BUFFERS, TIMING, SUMMARY, FORMAT JSON) %s', sql_text) INTO j;
    j := j->0;          -- EXPLAIN (FORMAT JSON) returns a one-element array
    p := j->'Plan';
    INSERT INTO bench_results
        (variant, query_name, execution_ms, planning_ms,
         shared_hit, shared_read, io_read_ms, io_write_ms, plan)
    VALUES (
        variant,
        query_name,
        COALESCE((j->>'Execution Time')::numeric, 0),
        COALESCE((j->>'Planning Time')::numeric, 0),
        COALESCE((p->>'Shared Hit Blocks')::bigint, 0),
        COALESCE((p->>'Shared Read Blocks')::bigint, 0),
        COALESCE((p->>'I/O Read Time')::numeric, 0),
        COALESCE((p->>'I/O Write Time')::numeric, 0),
        j
    );
END;
$$;
A quick note for rigor: these numbers are taken from the top plan node, which is typically sufficient for this workload to tell the story clearly. If you later want node-by-node aggregation across the whole plan tree, it’s doable, but it’s not necessary to demonstrate the low-cardinality effect.
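If you do later want whole-tree numbers, a recursive walk over the stored JSON is a reasonable sketch. The variant and query names here are just examples matching the captures below:

```sql
-- Sum I/O read time across every node of one captured plan,
-- following each node's "Plans" array recursively.
WITH RECURSIVE nodes(node) AS (
    SELECT plan->'Plan'
    FROM bench_results
    WHERE variant = '00_before_indexes'
      AND query_name = 'A_common_count_shipped'
    UNION ALL
    SELECT child
    FROM nodes,
         jsonb_array_elements(node->'Plans') AS child
)
SELECT sum((node->>'I/O Read Time')::numeric) AS io_read_ms_total
FROM nodes;
```

Leaf nodes have no "Plans" key, so jsonb_array_elements simply produces no rows for them and the recursion terminates on its own.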
Step 4: Define the workload we’ll compare
We’ll run three queries. One is the “common value” trap. One is the “rare value” case where a low-cardinality index can actually shine. The last is the real-world pattern that many teams actually care about: status, a recent time window, and ordering.
SELECT capture_explain(
    '00_before_indexes',
    'A_common_count_shipped',
    $$SELECT count(*) FROM lc_bench.orders_demo WHERE status = 'shipped'$$
);
SELECT capture_explain(
    '00_before_indexes',
    'B_rare_count_canceled',
    $$SELECT count(*) FROM lc_bench.orders_demo WHERE status = 'canceled'$$
);
SELECT capture_explain(
    '00_before_indexes',
    'C_shipped_recent_topN',
    $$SELECT id, tenant_id, created_at
      FROM lc_bench.orders_demo
      WHERE status = 'shipped'
        AND created_at >= now() - interval '10 minutes'
      ORDER BY created_at DESC
      LIMIT 200$$
);
If you want a cleaner signal, run the same three captures a second time under a variant like 00_before_indexes_run2, because the first execution incurs some warmup costs.
Step 5: “After 1” — the naïve index on the low-cardinality column
This is the index most people create first.
CREATE INDEX idx_orders_demo_status ON orders_demo(status);
ANALYZE orders_demo;
SELECT capture_explain(
    '01_status_index',
    'A_common_count_shipped',
    $$SELECT count(*) FROM lc_bench.orders_demo WHERE status = 'shipped'$$
);
SELECT capture_explain(
    '01_status_index',
    'B_rare_count_canceled',
    $$SELECT count(*) FROM lc_bench.orders_demo WHERE status = 'canceled'$$
);
SELECT capture_explain(
    '01_status_index',
    'C_shipped_recent_topN',
    $$SELECT id, tenant_id, created_at
      FROM lc_bench.orders_demo
      WHERE status = 'shipped'
        AND created_at >= now() - interval '10 minutes'
      ORDER BY created_at DESC
      LIMIT 200$$
);
What you will typically observe is that the rare-value query gets meaningfully faster, often via an index scan or bitmap path, while the common-value query remains a sequential scan. That isn’t a failure; it’s the planner refusing to pay random I/O overhead to fetch most of the table through an index. If your storage is cold, this is exactly where io_read_ms makes the story obvious: the “wrong” access path is the one that makes the storage subsystem do scattered work.
Step 6: “After 2” — index the question, not the label
Now you stop trying to use status by itself as the access path. You align indexes with how you actually ask for data.
CREATE INDEX idx_orders_demo_status_created_at
ON orders_demo (status, created_at DESC);
CREATE INDEX idx_orders_demo_canceled_recent
ON orders_demo (created_at DESC)
WHERE status = 'canceled';
ANALYZE orders_demo;
SELECT capture_explain(
    '02_composite_plus_partial',
    'A_common_count_shipped',
    $$SELECT count(*) FROM lc_bench.orders_demo WHERE status = 'shipped'$$
);
SELECT capture_explain(
    '02_composite_plus_partial',
    'B_rare_count_canceled',
    $$SELECT count(*) FROM lc_bench.orders_demo WHERE status = 'canceled'$$
);
SELECT capture_explain(
    '02_composite_plus_partial',
    'C_shipped_recent_topN',
    $$SELECT id, tenant_id, created_at
      FROM lc_bench.orders_demo
      WHERE status = 'shipped'
        AND created_at >= now() - interval '10 minutes'
      ORDER BY created_at DESC
      LIMIT 200$$
);
This is where the “realistic query” usually changes dramatically. Instead of searching a massive status='shipped' population and then sorting, PostgreSQL can walk a tight slice of (status, created_at) in the correct order and stop early because of the LIMIT. In many environments, you’ll see a large drop in shared_read and a very visible drop in io_read_ms, because the engine stops thrashing the heap and starts reading far fewer pages.
Meanwhile, the “common value count” query often remains a sequential scan, and you should treat that as the correct outcome. Counting nearly everything is almost always cheapest as a single streaming pass over the table.
Step 7: Print a clean “before vs after” table
SELECT
    query_name,
    variant,
    round(execution_ms, 3) AS execution_ms,
    round(planning_ms, 3)  AS planning_ms,
    shared_hit,
    shared_read,
    round(io_read_ms, 3)   AS io_read_ms,
    round(io_write_ms, 3)  AS io_write_ms
FROM bench_results
ORDER BY query_name, variant, captured_at;
How to address low-cardinality problems in real systems
The best mental shift is to stop thinking “low cardinality means index” and start thinking “low cardinality means the optimizer will be selective only sometimes”. You want to build access paths for the selective parts of the workload and avoid paying a write tax for an index that helps rarely or only helps rare values.
When the application frequently queries a common status only for recent data, a composite index that includes the time dimension is often the cleanest fix, as it restores selectivity over time. When the workload cares about rare statuses, partial indexes let you focus index maintenance on the slice that actually benefits. When the “real” filter is a tenant, user, account, or another high-cardinality attribute, that should usually lead the index, with the low-cardinality attribute following to keep scans narrow and ordered.
If you’re seeing PostgreSQL pick the “wrong” plan because it misestimates skew or correlation, improve statistics so the planner stops guessing. Raising per-column statistics targets can help with heavily skewed data, and extended statistics can help when two columns interact (for example, a tenant where one status dominates far more than the global distribution suggests). None of this changes the fundamental economics of low selectivity, but it prevents the planner from hallucinating selectivity that doesn’t exist.
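A minimal sketch of both knobs, assuming the demo table from this article and PostgreSQL 12 or newer for the mcv statistics kind:

```sql
-- Larger sample for a heavily skewed column.
ALTER TABLE orders_demo ALTER COLUMN status SET STATISTICS 1000;

-- Teach the planner how tenant_id and status interact.
CREATE STATISTICS orders_demo_tenant_status (ndistinct, mcv)
    ON tenant_id, status FROM orders_demo;

ANALYZE orders_demo;  -- the new statistics exist only after an ANALYZE
```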
Conclusion: low cardinality is a feature, not a free lunch
Low cardinality delivers real wins in compression and cache efficiency, and it often makes datasets easier to store and scan. That’s the halo.
The horn is that it seduces you into building indexes that look semantically correct but are mechanically expensive. A low-cardinality index can become the definition of “pay every day, benefit once a month”, especially when the most common value is what you query most.
The mature approach is to measure, not assume. Use EXPLAIN (ANALYZE, BUFFERS) and, when you’re diagnosing I/O behavior, enable track_io_timing intentionally and verify that it’s on. Keep in mind that track_io_timing adds overhead, so treat it like a diagnostic instrument, not a default setting. Then design indexes that match your query shape, not your schema labels: composite indexes that restore selectivity, partial indexes that target rare slices, and statistics that keep the planner honest.