From fc58df6247d9dbc31995e57c7d65632792c24c5c Mon Sep 17 00:00:00 2001
From: Cary Phillips <cary@ilm.com>
Date: Thu, 26 Mar 2026 08:17:58 -0700
Subject: [PATCH 1/2] Fix misaligned memory access in `LossyDctDecoder_execute`
 HALF→FLOAT expansion
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After DCT decoding, `LossyDctDecoder_execute()` expands FLOAT-type channels
from their intermediate HALF (16-bit) XDR representation back to FLOAT (32-bit)
XDR in place.  The expansion was done by casting `_rows[y]` (a `uint8_t *`)
directly to `float *` and `uint16_t *`, then reading and writing through those
typed pointers.

Because row buffers are assigned by advancing a byte pointer with no alignment
padding (`outBufferEnd += chan->width * chan->bytes_per_element` in
`internal_dwa_compressor.h`), a FLOAT channel that follows a HALF channel of
odd width receives a `_rows[y]` pointer that is 2-byte aligned but not 4-byte
aligned.  Both the conversion to `float *` and the access through it are
undefined behavior under the C standard:

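- On strict-alignment targets (some ARM, RISC-V, and MIPS configurations) the
  access can fault with SIGBUS or trap into slow kernel emulation.
- On x86 the hardware tolerates the access, but it remains UB: compilers may
  assume the pointer is 4-byte aligned and, when auto-vectorizing, emit aligned
  SSE/AVX instructions (such as `movaps`) that fault on a misaligned address.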
- UBSan reports: `store to misaligned address ... for type 'float', which
  requires 4 byte alignment` at `internal_dwa_decoder.h:749`.
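
A simplified model of the row layout shows how the misalignment arises.
`sketch_channel`, `sketch_row_offsets`, and `sketch_is_float_aligned` are
hypothetical names; only the unpadded `width * bytes_per_element` bump comes
from `internal_dwa_compressor.h` as described above, and the model assumes the
buffer base itself is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified channel description: just enough to model
 * the pointer bump. Not the real OpenEXRCore struct. */
typedef struct
{
    int width;             /* pixels per row */
    int bytes_per_element; /* 2 for HALF, 4 for FLOAT */
} sketch_channel;

/* Place one row per channel back to back with no alignment padding,
 * mirroring `outBufferEnd += chan->width * chan->bytes_per_element`.
 * offsets[i] receives the byte offset of channel i's row. */
static void
sketch_row_offsets (const sketch_channel* chans, size_t n, size_t* offsets)
{
    size_t end = 0;
    for (size_t i = 0; i < n; ++i)
    {
        assert (chans[i].width >= 0);
        assert (
            chans[i].bytes_per_element == 2 ||
            chans[i].bytes_per_element == 4);
        offsets[i] = end;
        end += (size_t) chans[i].width * (size_t) chans[i].bytes_per_element;
    }
}

/* Assumes float needs 4-byte alignment, as in the UBSan report above:
 * a row at `offset` in a 4-byte-aligned buffer is float-aligned only if
 * the offset is a multiple of sizeof (float). */
static int
sketch_is_float_aligned (size_t offset)
{
    return offset % sizeof (float) == 0;
}
```

With a 3-pixel HALF channel followed by a FLOAT channel, the FLOAT row begins
at byte 6: 2-byte aligned, but not 4-byte aligned.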

Fix: replace the cast-and-dereference pattern with the `unaligned_load16` /
`memcpy` / `unaligned_store32` helpers already used throughout the rest of
OpenEXRCore (`internal_xdr.h`, `unpack.c`, `pack.c`, `internal_pxr24.c`).
These helpers use `memcpy` internally, which the C standard guarantees is safe
for unaligned addresses; optimizing compilers typically lower a fixed-size
`memcpy` to a single load or store instruction on architectures that support
unaligned access.
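
The `memcpy` idiom these helpers rely on can be sketched in isolation. The
names `sketch_load16` and `sketch_store32` are hypothetical; unlike
OpenEXRCore's `unaligned_load16` and `unaligned_store32`, they skip the
`one_to_native16` / `one_from_native32` byte-order step and work purely in
host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 2 bytes from any address in host byte order.
 * memcpy copies bytes, so `p` needs no particular alignment. */
static inline uint16_t
sketch_load16 (const uint8_t* p)
{
    uint16_t v;
    assert (p != NULL);
    memcpy (&v, p, sizeof (v));
    return v;
}

/* Hypothetical helper: write 4 bytes to any address in host byte order. */
static inline void
sketch_store32 (uint8_t* p, uint32_t v)
{
    assert (p != NULL);
    memcpy (p, &v, sizeof (v));
}
```

Because `memcpy` copies bytes instead of dereferencing a typed pointer, `p` may
sit at any address, including the 2-byte-aligned row starts described above;
with a constant size, compilers usually inline the call.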

The byte-order handling is preserved correctly:
- `unaligned_load16` reads 2 bytes via `memcpy` and applies `one_to_native16`
  (XDR → native), returning a native-endian HALF value.
- `half_to_float` converts native HALF → native float.
- `memcpy(&bits, &f, 4)` reinterprets the float's bit pattern as `uint32_t`
  without numeric conversion (a well-defined type-pun idiom in C).
- `unaligned_store32` applies `one_from_native32` (native → XDR) and writes
  4 bytes via `memcpy`, storing the result in XDR float format.
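
The conversion chain above, together with the in-place reverse loop, can be
exercised end to end in a portable sketch. All `sketch_*` names are
hypothetical, and `sketch_half_to_float` is a bit-level stand-in for
OpenEXRCore's `half_to_float`. Unlike the patch, the XDR reads and writes here
are done byte by byte (assuming XDR means OpenEXR's little-endian byte order)
instead of through `memcpy` plus `one_to_native16` / `one_from_native32`, which
keeps the sketch independent of host endianness and alignment. The
float-to-bits step uses the same `memcpy` reinterpretation as the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit little-endian (assumed XDR) value byte by byte. */
static uint16_t
sketch_load16_le (const uint8_t* p)
{
    return (uint16_t) (p[0] | (p[1] << 8));
}

/* Write a 32-bit value in little-endian (assumed XDR) order byte by byte. */
static void
sketch_store32_le (uint8_t* p, uint32_t v)
{
    p[0] = (uint8_t) v;
    p[1] = (uint8_t) (v >> 8);
    p[2] = (uint8_t) (v >> 16);
    p[3] = (uint8_t) (v >> 24);
}

/* Hypothetical stand-in for half_to_float: widen IEEE 754 binary16 bits
 * to binary32 bits (zeros, subnormals, normals, infinity, NaN). */
static uint32_t
sketch_half_bits_to_float_bits (uint16_t h)
{
    uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
    uint32_t exp  = (uint32_t) (h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t) h & 0x3FFu;

    if (exp == 0x1Fu) /* infinity or NaN: keep the payload */
        return sign | 0x7F800000u | (mant << 13);
    if (exp != 0) /* normal: rebias exponent by 127 - 15 = 112 */
        return sign | ((exp + 112u) << 23) | (mant << 13);
    if (mant == 0) /* signed zero */
        return sign;

    /* Subnormal: shift the mantissa up until the implicit bit appears,
     * lowering the exponent by one per shift. */
    exp = 113u; /* 127 - 15 + 1 */
    while ((mant & 0x400u) == 0)
    {
        mant <<= 1;
        --exp;
    }
    return sign | (exp << 23) | ((mant & 0x3FFu) << 13);
}

static float
sketch_half_to_float (uint16_t h)
{
    uint32_t bits = sketch_half_bits_to_float_bits (h);
    float    f;
    memcpy (&f, &bits, sizeof (f));
    return f;
}

/* Expand `width` HALF values at the start of `row` into `width` FLOAT
 * values in place. `row` must hold 4 * width bytes; any alignment works. */
static void
sketch_expand_half_row (uint8_t* row, int width)
{
    assert (width >= 0);
    /* Walk backwards: the FLOAT stored at slot x overwrites HALF slots 2x
     * and 2x+1. Slot 2x+1 > x always, and slot 2x > x for x > 0, so those
     * were consumed by earlier iterations; at x == 0, slot 0 is read just
     * before the store. */
    for (int x = width - 1; x >= 0; --x)
    {
        uint16_t h = sketch_load16_le (row + (size_t) x * 2);
        float    f = sketch_half_to_float (h);
        uint32_t bits;
        memcpy (&bits, &f, sizeof (bits)); /* reinterpret, no conversion */
        sketch_store32_le (row + (size_t) x * 4, bits);
    }
}
```

Running `sketch_expand_half_row` with width 2 on the bytes `00 3C 00 C0` (HALF
1.0 and -2.0) at an odd address leaves `00 00 80 3F 00 00 00 C0`, the
little-endian encodings of 1.0f and -2.0f.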

Made-with: Cursor
Signed-off-by: Cary Phillips <cary@ilm.com>
---
 src/lib/OpenEXRCore/internal_dwa_decoder.h | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

Index: openexr-3.2.2/src/lib/OpenEXRCore/internal_dwa_decoder.h
===================================================================
--- openexr-3.2.2.orig/src/lib/OpenEXRCore/internal_dwa_decoder.h
+++ openexr-3.2.2/src/lib/OpenEXRCore/internal_dwa_decoder.h
@@ -651,13 +651,22 @@ LossyDctDecoder_execute (
         /* process in place in reverse to avoid temporary buffer */
         for (int y = 0; y < d->_height; ++y)
         {
-            float*    floatXdrPtr = (float*) chanData[chan]->_rows[y];
-            uint16_t* halfXdr     = (uint16_t*) floatXdrPtr;
+            uint8_t* rowBytes = chanData[chan]->_rows[y];
 
             for (int x = d->_width - 1; x >= 0; --x)
             {
-                floatXdrPtr[x] = one_from_native_float (
-                    half_to_float (one_to_native16 (halfXdr[x])));
+                // TODO: make an unaligned_store32f that takes the float and
+                // packages up a one_from_native_float and calls memcpy
+                // instead of the two memcpy. We should look at the metrics
+                // for dwa and see if there's a performance difference to do
+                // so at some point. See:
+                // https://github.com/AcademySoftwareFoundation/openexr/pull/2324
+
+                uint16_t h = unaligned_load16 (rowBytes + x * sizeof (uint16_t));
+                float    f = half_to_float (h);
+                uint32_t bits;
+                memcpy (&bits, &f, sizeof (bits));
+                unaligned_store32 (rowBytes + x * sizeof (float), bits);
             }
         }
     }
