Describe the bug
The GeoTIFF writer applies the TIFF horizontal/floating-point differencing predictor (PREDICTOR=2 or 3) to pixel bytes before handing them to the JPEG2000 and LERC codecs, then stamps PREDICTOR=2 in the IFD. GDAL only honors the PREDICTOR tag for byte-oriented entropy codecs (deflate, lzw, zstd, lz4, packbits) and never for jpeg2000/lerc/jpeg. The result is a file whose on-disk pixel bytes are differenced but which no spec-compliant reader will un-difference, so the pixels read back corrupted everywhere except in our own reader (which symmetrically inverts the predictor).
To Reproduce
import numpy as np
import xarray as xr
import rasterio
from xrspatial.geotiff import to_geotiff
data = (np.arange(64*64, dtype=np.uint16).reshape(64, 64) % 1000).astype(np.uint16)
da = xr.DataArray(data, dims=['y', 'x'],
                  coords={'y': np.arange(64., 0, -1), 'x': np.arange(64.)})
to_geotiff(da, "out.tif", compression='lerc', predictor=2,
           allow_experimental_codecs=True)
with rasterio.open("out.tif") as src:
    rt = src.read(1)
print(np.array_equal(rt, data))  # False
print(np.abs(rt.astype(int) - data.astype(int)).max())  # 64537
For jpeg2000 + predictor=2, GDAL cannot open the file at all.
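The 64537 figure is consistent with a reader returning the differenced samples as if they were pixels. The sketch below is a pure-Python stand-in for horizontal differencing on 16-bit samples, applied to the same values as the reproduction. The helper names are illustrative and are not the library's functions, and it assumes the codec round-trips the differenced samples losslessly.

```python
# Illustrative only: a pure-Python stand-in for the TIFF horizontal predictor
# (PREDICTOR=2) on 16-bit samples. Not the library's implementation.

MOD = 1 << 16  # uint16 arithmetic wraps modulo 65536


def difference_row(row):
    """Encode: keep the first sample, store each later sample minus its left neighbour."""
    return [row[0]] + [(row[i] - row[i - 1]) % MOD for i in range(1, len(row))]


def undifference_row(encoded):
    """Decode: a running sum modulo 65536 restores the original samples."""
    out = [encoded[0]]
    for delta in encoded[1:]:
        out.append((out[-1] + delta) % MOD)
    return out


# Same values as the reproduction: 0..4095 modulo 1000, in 64 rows of 64.
rows = [[(r * 64 + c) % 1000 for c in range(64)] for r in range(64)]
encoded = [difference_row(row) for row in rows]

# A reader that ignores the PREDICTOR tag hands back `encoded` as the pixels.
max_error = max(
    abs(e - v)
    for enc_row, row in zip(encoded, rows)
    for e, v in zip(enc_row, row)
)

# A reader that inverts the predictor recovers the data exactly.
round_trip_ok = all(undifference_row(e) == r for e, r in zip(encoded, rows))

print(max_error)      # 64537
print(round_trip_ok)  # True
```

Wherever the data wraps from 999 to 0 inside a row, the stored difference is -999 modulo 65536, which is 64537, while the true pixel is 0. That single sample accounts for the maximum error, and the symmetric inverse explains why our own reader sees nothing wrong.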
Expected behavior
Combining predictor 2 or 3 with jpeg2000/lerc (or jpeg) should be rejected up front with a clear ValueError, matching GDAL, which only honors PREDICTOR for deflate/lzw/zstd/lz4/packbits. A predictor is meaningless for these codecs.
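A minimal sketch of such a gate follows. The function name, the constant, and the codec strings are assumptions made for illustration (only 'lerc' appears as a compression= value in the reproduction); the real check would belong wherever the writer validates its arguments.

```python
# Hypothetical sketch: names and codec strings are illustrative, not the library's API.

# Codecs named in this issue as never honoring the PREDICTOR tag.
NO_PREDICTOR_CODECS = frozenset({"jpeg2000", "lerc", "jpeg"})


def check_predictor_codec(predictor, compression):
    """Raise ValueError when a differencing predictor is paired with a codec that ignores it.

    `predictor` is assumed to be the integer TIFF predictor value (1, 2, or 3)
    and `compression` a codec name string.
    """
    if predictor == 1:
        return  # no differencing requested, nothing to check
    codec = str(compression).lower()
    if codec in NO_PREDICTOR_CODECS:
        raise ValueError(
            f"predictor={predictor} is not supported with compression={codec!r}; "
            "use predictor=1 or a deflate/lzw/zstd/lz4/packbits codec"
        )


check_predictor_codec(2, "deflate")  # accepted
check_predictor_codec(1, "lerc")     # accepted: no predictor requested
try:
    check_predictor_codec(2, "lerc")
except ValueError as exc:
    print(exc)
```

Raising, rather than silently forcing the predictor to 1, keeps the caller's request and the written file in agreement, which is the behavior asked for here.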
Root cause
normalize_predictor(predictor, dtype, compression) in xrspatial/geotiff/_encode.py:252 receives the compression tag but never uses it to force the predictor to 1 for codecs that do not support differencing. The three encode paths (_prepare_strip at _encode.py:334, the second strip path at _encode.py:484, the tiled path at _encode.py:650) all apply _apply_predictor_encode whenever predictor != 1 and compression != COMPRESSION_NONE, then pass the differenced bytes to jpeg2000_compress / lerc_compress. The GPU writer (xrspatial/geotiff/_writers/gpu.py:696) calls the same normalize_predictor, so it shares the bug. _validation.py has no codec-vs-predictor gate (only a predictor=3-requires-float check).
Severity
High: silent data corruption and non-interoperable files. Requires the non-default allow_experimental_codecs opt-in plus an explicit predictor= argument, so it is not hit on default writes.
Environment
- xarray-spatial: main (2026-06-16)
- Found by the accuracy deep-sweep on the geotiff module.