GeoTIFF writer: predictor corrupts pixels with jpeg2000/lerc compression #3371

@brendancol

Describe the bug

The GeoTIFF writer applies the TIFF horizontal/floating-point differencing predictor (PREDICTOR=2 or 3) to pixel bytes before handing them to the JPEG2000 and LERC codecs, then stamps PREDICTOR=2 in the IFD. GDAL only honors the PREDICTOR tag for byte-oriented entropy codecs (deflate, lzw, zstd, lz4, packbits) and never for jpeg2000/lerc/jpeg. The result is a file whose on-disk pixel bytes are differenced but which no spec-compliant reader will un-difference, so the pixels read back corrupted everywhere except in our own reader (which symmetrically inverts the predictor).
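To make the failure mode concrete, here is a minimal sketch of horizontal differencing on a single row of uint16 samples, written with plain Python integers so the modulo-2**16 wraparound is explicit. The function names are illustrative and not taken from xrspatial, and a real TIFF predictor also has to account for samples per pixel and byte order, which this sketch ignores.

```python
MODULUS = 1 << 16  # uint16 samples wrap modulo 65536


def horizontal_difference(row):
    """Predictor=2 style encoding for one single-band row of uint16 samples."""
    out = [row[0]]  # the first sample is stored unchanged
    for prev, cur in zip(row, row[1:]):
        out.append((cur - prev) % MODULUS)  # later samples become wrapped deltas
    return out


def undo_horizontal_difference(stored):
    """Inverse step a predictor-aware reader performs: a wrapping running sum."""
    out = []
    acc = 0
    for delta in stored:
        acc = (acc + delta) % MODULUS
        out.append(acc)
    return out


row = [998, 999, 0, 1]  # values roll over at 1000, like the reproduction data
stored = horizontal_difference(row)
restored = undo_horizontal_difference(stored)

print(stored)    # [998, 1, 64537, 1]
print(restored)  # [998, 999, 0, 1]
```

A reader that skips the inverse step returns `stored` as if it were pixel data. The rollover from 999 to 0 is then read back as 64537, which is the same magnitude as the maximum error in the reproduction.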

To Reproduce

import numpy as np
import rasterio
import xarray as xr
from xrspatial.geotiff import to_geotiff

# 64x64 uint16 ramp that rolls over at 1000, so row deltas include large wraps
data = (np.arange(64*64, dtype=np.uint16).reshape(64, 64) % 1000).astype(np.uint16)
da = xr.DataArray(data, dims=['y', 'x'],
                  coords={'y': np.arange(64., 0, -1), 'x': np.arange(64.)})

# Write with LERC plus an explicit horizontal-differencing predictor
to_geotiff(da, "out.tif", compression='lerc', predictor=2,
           allow_experimental_codecs=True)

# Read back through GDAL (via rasterio), which does not undo the predictor for LERC
with rasterio.open("out.tif") as src:
    rt = src.read(1)
print(np.array_equal(rt, data))                        # False
print(np.abs(rt.astype(int) - data.astype(int)).max())  # 64537

For jpeg2000 + predictor=2, GDAL cannot open the file at all.

Expected behavior

Combining predictor 2 or 3 with jpeg2000/lerc (or jpeg) should be rejected up front with a clear ValueError, matching GDAL, which only honors PREDICTOR for deflate/lzw/zstd/lz4/packbits. A predictor is meaningless for these codecs.
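One possible shape for such an up-front check is sketched below. The function name `check_predictor_codec`, the set name and the error wording are hypothetical and not taken from the xrspatial code base. The rejected codec names are the ones listed in this issue, assumed to be spelled the way they are passed to `compression=`.

```python
# Codecs for which a TIFF predictor should be rejected, per this issue.
# The exact spellings are an assumption about the writer's `compression=` values.
NO_PREDICTOR_CODECS = frozenset({"jpeg2000", "lerc", "jpeg"})


def check_predictor_codec(predictor: int, compression: str) -> None:
    """Hypothetical gate: raise if a differencing predictor is paired with a
    codec that does not honor the PREDICTOR tag."""
    if predictor == 1:
        return  # predictor 1 means "no prediction", valid with any codec
    codec = compression.lower()
    if codec in NO_PREDICTOR_CODECS:
        raise ValueError(
            f"predictor={predictor} cannot be combined with "
            f"compression={compression!r}; use predictor=1 or a "
            f"byte-oriented codec such as deflate, lzw or zstd"
        )


check_predictor_codec(2, "deflate")  # accepted
check_predictor_codec(1, "lerc")     # accepted: no differencing requested

try:
    check_predictor_codec(2, "lerc")
except ValueError as exc:
    print(exc)
```

A denylist is used here so that the behavior for every other codec, including uncompressed writes, is left unchanged. An allowlist of the codecs that do honor PREDICTOR would be the stricter alternative.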

Root cause

normalize_predictor(predictor, dtype, compression) in xrspatial/geotiff/_encode.py:252 receives the compression tag but never uses it to force the predictor to 1 for codecs that do not support differencing. The three encode paths (_prepare_strip at _encode.py:334, the second strip path at _encode.py:484, the tiled path at _encode.py:650) all apply _apply_predictor_encode whenever predictor != 1 and compression != COMPRESSION_NONE, then pass the differenced bytes to jpeg2000_compress / lerc_compress. The GPU writer (xrspatial/geotiff/_writers/gpu.py:696) calls the same normalize_predictor, so it shares the bug. _validation.py has no codec-vs-predictor gate (only a predictor=3-requires-float check).
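The gap can be illustrated with a small stand-alone sketch. Everything here is a stand-in: the real `normalize_predictor` also takes a dtype, the real compression values are TIFF tag constants rather than strings, and the allowlist simply mirrors the codecs named in this issue. The first function restates the condition the encode paths are described as using. The second shows a codec-aware normalization that every caller, including the GPU writer, would inherit.

```python
# Stand-in codec names; the module itself uses COMPRESSION_* tag constants.
COMPRESSION_NONE = "none"
DIFFERENCING_CODECS = frozenset({"deflate", "lzw", "zstd", "lz4", "packbits"})


def applies_predictor_as_described(predictor, compression):
    """The gate described above: any compressed write with predictor != 1."""
    return predictor != 1 and compression != COMPRESSION_NONE


def normalize_predictor_sketch(predictor, compression):
    """Hypothetical codec-aware step: collapse the predictor to 1 unless the
    codec is one that honors the PREDICTOR tag."""
    if compression not in DIFFERENCING_CODECS:
        return 1
    return predictor


for codec in ("deflate", "lerc", "jpeg2000"):
    before = applies_predictor_as_described(2, codec)
    after = applies_predictor_as_described(
        normalize_predictor_sketch(2, codec), codec
    )
    print(codec, before, after)
# deflate True True
# lerc True False
# jpeg2000 True False
```

Silently forcing the predictor to 1 only keeps the encode paths from differencing the bytes. It does not replace the explicit ValueError requested under "Expected behavior", which would still belong in `_validation.py`.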

Severity

High: silent data corruption and non-interoperable files. Requires the non-default allow_experimental_codecs opt-in plus an explicit predictor= argument, so it is not hit on default writes.

Environment

  • xarray-spatial: main (2026-06-16)
  • Found by the accuracy deep-sweep on the geotiff module.

Labels

bug (Something isn't working), geotiff (GeoTIFF module), input-validation (Input validation and error messages)
