23 changes: 22 additions & 1 deletion configure.ac
@@ -3212,6 +3212,25 @@ case "$ENABLED_STSAFE" in
esac


# RealTek AmebaPro2 (RTL8735B) HUK crypto-callback port.
# On-target the application supplies the AmebaPro2 HAL include path. This option
# is a host compile-test of the port: it swaps the HAL headers for a shim
# (WOLFSSL_AMEBAPRO2_HOST_TEST) so the cryptocb dispatch and wiring build without
# the vendor SDK. It forces crypto callbacks on (see the cryptocb block).
# Example: "./configure --enable-amebapro2"
ENABLED_AMEBAPRO2="no"
AC_ARG_ENABLE([amebapro2],
[AS_HELP_STRING([--enable-amebapro2],
[Enable RealTek AmebaPro2 (RTL8735B) HUK crypto-callback port (host compile-test).])],
[ ENABLED_AMEBAPRO2=$enableval ],
[ ENABLED_AMEBAPRO2=no ])

if test "x$ENABLED_AMEBAPRO2" != "xno"
then
AM_CFLAGS="$AM_CFLAGS -DWOLFSSL_REALTEK_HUK -DWOLFSSL_AMEBAPRO2_HOST_TEST -DHAVE_AES_ECB"
fi


# NXP SE050
# Example: "./configure --with-se050=/home/pi/simw_top"
ENABLED_SE050="no"
@@ -10680,7 +10699,7 @@ AC_ARG_ENABLE([cryptocb-sw-test],
[ ENABLED_CRYPTOCB_SW_TEST=yes ]
)

if test "x$ENABLED_PKCS11" = "xyes" || test "x$ENABLED_WOLFTPM" = "xyes" || test "$ENABLED_CAAM" != "no"
if test "x$ENABLED_PKCS11" = "xyes" || test "x$ENABLED_WOLFTPM" = "xyes" || test "$ENABLED_CAAM" != "no" || test "x$ENABLED_AMEBAPRO2" != "xno"
then
ENABLED_CRYPTOCB=yes
fi
@@ -12429,6 +12448,7 @@ AM_CONDITIONAL([BUILD_IOTSAFE],[test "x$ENABLED_IOTSAFE" = "xyes"])
AM_CONDITIONAL([BUILD_IOTSAFE_HWRNG],[test "x$ENABLED_IOTSAFE_HWRNG" = "xyes"])
AM_CONDITIONAL([BUILD_SE050],[test "x$ENABLED_SE050" = "xyes"])
AM_CONDITIONAL([BUILD_STSAFE],[test "x$ENABLED_STSAFE" != "xno"])
AM_CONDITIONAL([BUILD_AMEBAPRO2],[test "x$ENABLED_AMEBAPRO2" != "xno"])
AM_CONDITIONAL([BUILD_TROPIC01],[test "x$ENABLED_TROPIC01" = "xyes"])
AM_CONDITIONAL([BUILD_KDF],[test "x$ENABLED_KDF" = "xyes"])
AM_CONDITIONAL([BUILD_HMAC],[test "x$ENABLED_HMAC" = "xyes"])
@@ -13008,6 +13028,7 @@ echo " * IoT-Safe: $ENABLED_IOTSAFE"
echo " * IoT-Safe HWRNG: $ENABLED_IOTSAFE_HWRNG"
echo " * NXP SE050: $ENABLED_SE050"
echo " * STMicro STSAFE: $ENABLED_STSAFE"
echo " * RealTek AmebaPro2 HUK: $ENABLED_AMEBAPRO2"
echo " * TROPIC01: $ENABLED_TROPIC01"
echo " * Maxim Integrated MAXQ10XX: $ENABLED_MAXQ10XX"
echo " * PSA: $ENABLED_PSA"
7 changes: 7 additions & 0 deletions wolfcrypt/src/include.am
@@ -105,6 +105,9 @@ EXTRA_DIST += wolfcrypt/src/port/ti/ti-aes.c \
wolfcrypt/src/port/st/README.md \
wolfcrypt/src/port/st/STM32MP13.md \
wolfcrypt/src/port/st/STM32MP25.md \
wolfcrypt/src/port/realtek/amebapro2.c \
wolfcrypt/src/port/realtek/amebapro2_shim.h \
wolfcrypt/src/port/realtek/README.md \
wolfcrypt/src/port/tropicsquare/tropic01.c \
wolfcrypt/src/port/tropicsquare/README.md \
wolfcrypt/src/port/af_alg/afalg_aes.c \
@@ -244,6 +247,10 @@ if BUILD_TROPIC01
src_libwolfssl@LIBSUFFIX@_la_SOURCES += wolfcrypt/src/port/tropicsquare/tropic01.c
endif

if BUILD_AMEBAPRO2
src_libwolfssl@LIBSUFFIX@_la_SOURCES += wolfcrypt/src/port/realtek/amebapro2.c
endif

if BUILD_PSA
src_libwolfssl@LIBSUFFIX@_la_SOURCES += wolfcrypt/src/port/psa/psa.c
src_libwolfssl@LIBSUFFIX@_la_SOURCES += wolfcrypt/src/port/psa/psa_hash.c
258 changes: 258 additions & 0 deletions wolfcrypt/src/port/realtek/README.md
@@ -0,0 +1,258 @@
# RealTek AmebaPro2 (RTL8735B) HUK Port

Binds wolfCrypt keys to the RTL8735B silicon Hardware Unique Key (HUK) through
the AmebaPro2 HAL crypto engine, via the wolfCrypt crypto-callback (CryptoCb)
framework. A 256-bit "seed" is run through the HAL HKDF key-ladder against the
HUK to land a device-bound working key in a secure key-storage slot; AES
(GCM/ECB/CBC/CTR) then runs from that slot and the working key never enters
software. It is a pure crypto-callback device and adds no wolfSSL core API or
struct fields: AES reads its seed from the standard `aes->devKey`, and ECDSA
reads a `wc_AmebaPro2_EccKey` (the HUK-wrapped scalar + seed) the caller attaches
via the standard `ecc_key->devCtx`. The STM32 DHUK port
(`wc_Stm32_DhukRegister`) uses the same device pattern.

## Hardware

RTL8735B / AmebaPro2 security blocks used by this port (from the
`Ameba-AIoT/nuwa_hal_realtek` SDK, `rtl8735b` branch, headers under
`ameba/amebapro2/source/fwlib/rtl8735b/include/`):

- HUK in OTP: `SB_OTP_HIGH_VAL_HUK1` (0x21), `HUK2` (0x22), `HUK_RMA` (0x2F).
- HKDF key-ladder in secure RAM: `hal_hkdf_hmac_sha256_secure_init`,
`hal_hkdf_extract_secure_all`, `hal_hkdf_expand_secure_all` -- derive a key
from the HUK into a secure key-storage slot without exposing it to software.
- AES secure-key ops that reference the derived slot by number:
`hal_crypto_aes_ecb_sk_init`, `hal_crypto_aes_gcm_sk_init` (key never leaves
hardware).
- The HUK-bound ECDSA sign path reuses the AES secure-key engine above to unwrap
the wrapped scalar, then signs in software. The HW ECDSA engine (`hal_ecdsa.h`)
and OTP-resident ECDSA keys (`hal_otp_ecdsa_key_*`) are follow-ons, not yet
used.
- TRNG (`hal_trng.h`); the `ameba-zephyr-pro2-platform` repo provides a Zephyr
entropy driver (`entropy_amebapro2.c`, DT `realtek,amebapro2-trng`) that feeds
wolfCrypt's `wc_GenerateSeed` via `sys_rand_get`.

## Enabling

```c
#define WOLFSSL_REALTEK_HUK /* enable the AmebaPro2 HUK device */
#define WOLF_CRYPTO_CB /* required -- HUK routes through crypto callbacks */
```

Set these in `user_settings.h`. The application/board CMake must add
the AmebaPro2 HAL include directory (e.g.
`.../fwlib/rtl8735b/include/`) to the wolfSSL library include path so this port
can include `hal_crypto.h` and `hal_hkdf.h`.

Configurable (override in `user_settings.h` before including wolfSSL):

| Macro | Default | Meaning |
|--------------------------------|---------|--------------------------------------|
| `WC_HUK_DEVID` | 809 | CryptoCb device id (STM32 DHUK is 808) |
| `WC_AMEBAPRO2_HUK_SK_IDX` | 1 | Secure-key slot holding the HUK (HUK1) |
| `WC_AMEBAPRO2_HKDF_PRK_IDX` | 3 | Intermediate HKDF PRK slot |
| `WC_AMEBAPRO2_DERIVED_WB_IDX` | 4 | Derived working-key slot (AES uses it) |
| `WC_AMEBAPRO2_HKDF_CRYPTO_SEL` | 0 | `crypto_sel` for the secure HKDF init |
| `WC_AMEBAPRO2_MAX_WRAPPED` | 96 | Max wrapped-scalar blob the ECDSA sign path unwraps |
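
A minimal sketch of overriding one of these defaults, assuming the port header guards each macro with `#ifndef` (an assumption, not confirmed here). The `#ifndef` fallbacks below only reproduce the table so the fragment stands alone; the slot value `5` and the distinct-slot check are illustrative assumptions, not documented requirements.

```c
/* user_settings.h (excerpt), illustrative only */
#define WOLFSSL_REALTEK_HUK
#define WOLF_CRYPTO_CB

/* Hypothetical override: place the derived working key in slot 5. */
#define WC_AMEBAPRO2_DERIVED_WB_IDX 5

/* Stand-ins for the port defaults listed in the table above. */
#ifndef WC_HUK_DEVID
#define WC_HUK_DEVID 809
#endif
#ifndef WC_AMEBAPRO2_HUK_SK_IDX
#define WC_AMEBAPRO2_HUK_SK_IDX 1
#endif
#ifndef WC_AMEBAPRO2_HKDF_PRK_IDX
#define WC_AMEBAPRO2_HKDF_PRK_IDX 3
#endif
#ifndef WC_AMEBAPRO2_DERIVED_WB_IDX
#define WC_AMEBAPRO2_DERIVED_WB_IDX 4
#endif

/* Assumed sanity check: the HUK, PRK and derived slots should not collide. */
#if (WC_AMEBAPRO2_HUK_SK_IDX == WC_AMEBAPRO2_HKDF_PRK_IDX) || \
    (WC_AMEBAPRO2_HUK_SK_IDX == WC_AMEBAPRO2_DERIVED_WB_IDX) || \
    (WC_AMEBAPRO2_HKDF_PRK_IDX == WC_AMEBAPRO2_DERIVED_WB_IDX)
#error "AmebaPro2 secure-key slot indices must be distinct"
#endif
```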

## API

```c
#include <wolfssl/wolfcrypt/port/realtek/amebapro2.h>

/* One-time: register the AmebaPro2 HUK crypto-callback device. */
wc_AmebaPro2_HukRegister(WC_HUK_DEVID);

/* AES / GCM: enable via devId at init, then pass the 256-bit seed as the key.
* The seed is HKDF input that diversifies the HUK -- it is NOT the AES key. */
Aes aes;
byte seed[32]; /* per-purpose derivation seed (need not be secret) */
wc_AesInit(&aes, NULL, WC_HUK_DEVID);
wc_AesGcmSetKey(&aes, seed, 32);
wc_AesGcmEncrypt(&aes, ct, pt, ptSz, iv, 12, tag, tagSz, aad, aadSz); /* full GCM */
wc_AesFree(&aes);

/* AES-ECB / AES-CBC follow the same pattern: wc_AesSetKey, then
 * wc_AesEcbEncrypt/Decrypt or wc_AesCbcEncrypt/Decrypt, with
 * devId = WC_HUK_DEVID. */

wc_AmebaPro2_HukUnRegister(WC_HUK_DEVID);
```

The seed maps to a device-bound working key as:
HUK (slot `WC_AMEBAPRO2_HUK_SK_IDX`) -> `hal_hkdf_extract_secure_all` -> PRK slot
-> `hal_hkdf_expand_secure_all` -> working key in `WC_AMEBAPRO2_DERIVED_WB_IDX`
-> `hal_crypto_aes_gcm_sk_init` / `hal_crypto_aes_ecb_sk_init`. The derive and
the AES op run under one crypto-mutex hold; the working key never enters
software. Identical seed -> identical working key (deterministic, so GMAC
verifies and AES round-trips); a wrong seed yields a different key (GCM decrypt
returns `AES_GCM_AUTH_E`).
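
The slot flow above can be pictured with a toy model. This is only a sketch: the `toy_*` names, the slot count and the mixing function are hypothetical, and the mixer is a non-cryptographic placeholder standing in for HKDF, not the HAL's behavior. It shows two properties the text relies on: callers only ever name slot indices, and the same seed always yields the same working key.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_HUK     1  /* mirrors the WC_AMEBAPRO2_HUK_SK_IDX default */
#define SLOT_PRK     3  /* mirrors the WC_AMEBAPRO2_HKDF_PRK_IDX default */
#define SLOT_DERIVED 4  /* mirrors the WC_AMEBAPRO2_DERIVED_WB_IDX default */
#define NUM_SLOTS    8  /* hypothetical slot count */

/* Toy "secure key storage": nothing outside this file reads it directly. */
static uint64_t slots[NUM_SLOTS];

/* Placeholder mixer (FNV-1a style). NOT HKDF and NOT cryptographic. */
static uint64_t toy_mix(uint64_t key, const uint8_t *data, size_t len)
{
    uint64_t h = 1469598103934665603ULL ^ key;
    size_t i;
    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Stands in for the HUK already being present in its slot. */
void toy_provision_huk(uint64_t huk)
{
    slots[SLOT_HUK] = huk;
}

/* "Extract" into the PRK slot, then "expand" into the derived slot. */
void toy_derive(const uint8_t seed[32])
{
    static const uint8_t info[] = "expand";
    slots[SLOT_PRK] = toy_mix(slots[SLOT_HUK], seed, 32);
    slots[SLOT_DERIVED] = toy_mix(slots[SLOT_PRK], info, sizeof(info) - 1);
}

/* Keyed operation that uses the derived slot; returns a tag, never the key. */
uint64_t toy_tag(const uint8_t *msg, size_t len)
{
    return toy_mix(slots[SLOT_DERIVED], msg, len);
}
```

Deriving from the same seed twice gives the same tag for a given message, while a different seed gives a different tag, which is the toy analogue of a wrong seed failing GCM authentication.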

HUK-bound ECDSA sign (Stage 3, wrapped-scalar): point the key's crypto-callback
context at a `wc_AmebaPro2_EccKey` (the scalar AES-wrapped under a HUK-derived
key, plus its 32-byte seed) -- no dedicated wolfSSL import API:

```c
#include <wolfssl/wolfcrypt/port/realtek/amebapro2.h>
wc_AmebaPro2_EccKey hk = { seed, 32, wrapped, wrappedLen, plainLen };
ecc_key key;
wc_ecc_init_ex(&key, NULL, WC_HUK_DEVID);
wc_ecc_set_curve(&key, plainLen, ECC_SECP256R1);
key.devCtx = &hk; /* borrowed; must outlive the key */
wc_ecc_sign_hash(hash, hashSz, sig, &sigSz, rng, &key);
```

At sign time the port derives the slot key from the seed, ECB-unwraps the scalar
into a short-lived buffer, signs, and scrubs it. The wrapped blob is device-bound
(it only unwraps on the silicon whose HUK produced the slot key). The scalar is
briefly in software during the sign; an OTP-resident model (`hal_ecdsa_select_prk`,
scalar never in software) and routing the sign itself through the HW ECDSA engine
(`hal_ecdsa`) are follow-ons.
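
The unwrap, use, scrub sequence can be sketched as below. This is not the port's code: the function names, callback types and error value are hypothetical, the toy callbacks exist only so the flow can be exercised, and the requirement that the wrapped length be a multiple of 16 is an assumption drawn from the ECB unwrap.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_WRAPPED 96  /* mirrors the WC_AMEBAPRO2_MAX_WRAPPED default */
#define BLK         16  /* AES block size */

/* Hypothetical callbacks: unwrap stands in for the slot-keyed ECB decrypt,
 * use stands in for the software sign. Both return 0 on success. */
typedef int (*unwrap_fn)(const uint8_t *in, uint8_t *out, size_t len);
typedef int (*use_fn)(const uint8_t *scalar, size_t len, void *ctx);

/* Zero a buffer through a volatile pointer so the stores are not elided. */
void scrub(void *p, size_t n)
{
    volatile uint8_t *v = (volatile uint8_t *)p;
    while (n--)
        *v++ = 0;
}

int with_unwrapped_scalar(const uint8_t *wrapped, size_t wrappedLen,
                          size_t plainLen, unwrap_fn unwrap, use_fn use,
                          void *ctx)
{
    uint8_t buf[MAX_WRAPPED];  /* short-lived plaintext scalar */
    int ret;

    if (wrapped == NULL || wrappedLen == 0 || wrappedLen > MAX_WRAPPED ||
        (wrappedLen % BLK) != 0 || plainLen > wrappedLen)
        return -1;

    ret = unwrap(wrapped, buf, wrappedLen);
    if (ret == 0)
        ret = use(buf, plainLen, ctx);

    scrub(buf, sizeof(buf));  /* runs on success and on failure */
    return ret;
}

/* Toy callbacks for demonstration only; XOR is NOT a key wrap. */
int toy_unwrap(const uint8_t *in, uint8_t *out, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        out[i] = (uint8_t)(in[i] ^ 0xA5);
    return 0;
}

int toy_use(const uint8_t *scalar, size_t len, void *ctx)
{
    unsigned long *sum = (unsigned long *)ctx;
    size_t i;
    for (i = 0; i < len; i++)
        *sum += scalar[i];
    return 0;
}
```

Keeping the scrub on a single exit path means the plaintext scalar is cleared whether the unwrap or the sign step fails.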

## Notes / limitations

- The HAL GCM path assumes a 96-bit (12-byte) IV (standard J0). Any other IV
length is rejected with an error rather than handled by a software fallback,
because the fallback would use the seed itself as the AES key instead of the
device-bound key.
- AES-CBC and AES-CTR chain in software over single-block
`hal_crypto_aes_ecb_sk_*` calls because the HAL exposes no CBC/CTR secure-key
variant; the key still stays in hardware. CTR maintains the wolfCrypt counter
state (`aes->reg`/`tmp`/`left`) so partial blocks continue across calls.
- The HAL crypto engine DMAs its buffers on 32-byte (cache-line) boundaries and
rejects an unaligned GCM iv/aad. The port stages key/iv/aad/tag on aligned
temporaries and bounces unaligned in/out through aligned buffers, so callers
need not align.
- Each operation derives the working key from the Aes' own `devKey` seed under
the crypto mutex (no shared port global), so concurrent `Aes` objects are
safe.
- `--enable-amebapro2` builds a host compile-test only: it swaps the HAL headers
for `amebapro2_shim.h` (sentinel stubs, no real crypto) to exercise the
crypto-callback dispatch and build wiring without the vendor SDK. All
functional validation requires RTL8735B hardware.
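
The CTR chaining described in the notes above can be sketched as follows. This is a simplified illustration, not the port's implementation: `ecb_block_fn` is a hypothetical stand-in for a single-block secure-key ECB call, the struct merely borrows the `reg`/`tmp`/`left` names, and `toy_ecb` is a placeholder that is not a cipher.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Hypothetical single-block ECB primitive (key held elsewhere, e.g. a slot). */
typedef void (*ecb_block_fn)(const uint8_t in[BLK], uint8_t out[BLK]);

typedef struct {
    uint8_t reg[BLK];  /* current counter block */
    uint8_t tmp[BLK];  /* most recent keystream block */
    size_t  left;      /* unused keystream bytes remaining in tmp */
} ctr_state;

void ctr_init(ctr_state *s, const uint8_t iv[BLK])
{
    memcpy(s->reg, iv, BLK);
    memset(s->tmp, 0, BLK);
    s->left = 0;
}

/* Increment the counter block as a big-endian integer. */
void ctr_inc(uint8_t reg[BLK])
{
    int i;
    for (i = BLK - 1; i >= 0; i--) {
        if (++reg[i] != 0)
            break;
    }
}

/* Encrypt or decrypt len bytes; leftover keystream carries to the next call.
 * Safe for in == out because each input byte is read before it is written. */
void ctr_crypt(ctr_state *s, ecb_block_fn ecb,
               const uint8_t *in, uint8_t *out, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (s->left == 0) {
            ecb(s->reg, s->tmp);  /* one single-block ECB call per 16 bytes */
            ctr_inc(s->reg);
            s->left = BLK;
        }
        out[i] = (uint8_t)(in[i] ^ s->tmp[BLK - s->left]);
        s->left--;
    }
}

/* Toy block function for demonstration only; NOT a cipher. */
void toy_ecb(const uint8_t in[BLK], uint8_t out[BLK])
{
    int i;
    for (i = 0; i < BLK; i++)
        out[i] = (uint8_t)(in[i] * 31u + 7u + (unsigned)i);
}
```

Because the unused keystream bytes are tracked in `left`, splitting a message across several calls produces the same output as one call over the whole buffer.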

## Status

Validated on RTL8735B silicon (both the RealTek FreeRTOS SDK app and a Zephyr
image): registration; AES-GCM (encrypt / deterministic tag / decrypt-verify /
round-trip / wrong-seed -> `AES_GCM_AUTH_E` / unaligned buffers / non-12-byte-IV
reject); AES-ECB; AES-CBC (incl. in-place, multi-call); AES-CTR; and HUK-bound
ECDSA (P-256) -- all pass.

- Stage 0 (skeleton, build wiring, host compile-test): done.
- Stage 1 (HUK key-ladder + full AES-GCM): done, validated on hardware.
- Stage 2 (AES-ECB / AES-CBC / AES-CTR): done, validated on hardware.
- Stage 3 (HUK-bound ECDSA sign, wrapped-scalar): done, validated on RTL8735B
(P-256 sign verifies against the original public key; tampered hash fails).
OTP-resident keys and HW-ECDSA-engine signing are follow-ons.

## Benchmarks (software crypto baseline)

`wolfcrypt_test` (full self-test, all PASS) and `wolfcrypt_benchmark` were run on
the RTL8735B EVB to validate the core library and toolchain on this target. The
figures below are **pure software wolfCrypt** -- they are NOT the HUK device
(which routes AES through the silicon engine for HUK-derived keys); they serve as
a reference baseline and to size the benefit of hardware offload.

- Target: RTL8735B "KM4" Arm Cortex-M33 (ARMv8-M Mainline, TrustZone + DSP) at
500 MHz (`CPU_CLK`); DDR at 533 MHz.
- Toolchain / build: RealTek ASDK 10.3.0 (GCC 10.3.0), SDK default `-Os`,
FreeRTOS, `WOLFCRYPT_ONLY`, `SINGLE_THREADED`, big-integer math via the generic
`WOLFSSL_SP_MATH_ALL` (portable C, no Cortex-M assembly), `BENCH_EMBEDDED`.
- Build options live with the SDK example (not in the wolfSSL tree):
`component/example/wolfcrypt_test/{user_settings.h, wolfcrypt_test.cmake,
main.c}` of the AmebaPro2 FreeRTOS SDK. The RNG is seeded from the SDK
`rtw_get_random_bytes`; `current_time()` uses `hal_read_systime_us()`.

Symmetric / hash (higher is better):

| Algorithm | Throughput |
|---------------------|------------|
| AES-128-CBC enc/dec | 9.55 / 9.67 MiB/s |
| AES-256-CBC enc/dec | 7.25 / 7.02 MiB/s |
| AES-128-GCM enc/dec | 5.35 / 5.33 MiB/s |
| AES-256-GCM enc/dec | 4.53 / 4.52 MiB/s |
| AES-128-CTR | 9.75 MiB/s |
| AES-128-ECB enc/dec | 10.42 / 10.56 MiB/s |
| AES-CCM enc/dec | 4.73 / 4.65 MiB/s |
| GMAC (4-bit table) | 13.43 MiB/s |
| AES-128-CMAC | 8.84 MiB/s |
| ChaCha20 | 24.79 MiB/s |
| ChaCha20-Poly1305 | 15.83 MiB/s |
| Poly1305 | 64.77 MiB/s |
| SHA-1 | 29.19 MiB/s |
| SHA-256 | 10.94 MiB/s |
| SHA-512 | 7.29 MiB/s |
| SHA3-256 | 6.61 MiB/s |
| HMAC-SHA256 | 10.85 MiB/s |

Public key (higher is better):

| Operation | Rate |
|-----------------------|------|
| RSA-2048 public | 214.7 ops/s |
| RSA-2048 private | 6.14 ops/s |
| RSA-2048 key gen | 0.40 ops/s |
| DH-2048 key gen/agree | 17.67 / 15.23 ops/s |
| ECDSA P-256 sign/verify | 40.03 / 29.81 ops/s |
| ECDHE P-256 agree | 40.69 ops/s |
| Curve25519 key gen/agree | 414.8 / 419.4 ops/s |
| Ed25519 sign/verify | 788.3 / 397.0 ops/s |

The tables above are the portable-C baseline. The assembly backends below raise
these by roughly 2.5x to 13.3x for public-key operations and 1.3x to 2.2x for
symmetric ciphers and hashes. Curve25519/Ed25519 already use the dedicated
`curve25519.c`/`ed25519.c` fast code.

## Optimizations (measured on RTL8735B @ 500 MHz, -Os)

Two wolfCrypt assembly backends apply to this Cortex-M33 and were validated on
hardware (both keep `wolfcrypt_test` all-PASS). Neither needs wolfSSL source
changes -- they are build-config selections plus adding the relevant asm files.

### 1. Public key -- `sp_cortexm.c` (Thumb-2/DSP single-precision)

Enable with `WOLFSSL_SP_ARM_CORTEX_M_ASM` + `WOLFSSL_HAVE_SP_RSA` +
`WOLFSSL_HAVE_SP_ECC` + `WOLFSSL_HAVE_SP_DH`, and add `wolfcrypt/src/sp_cortexm.c`
to the build (alongside the generic `sp_int.c` for sizes without an asm path).
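
As a `user_settings.h` fragment, the switches named above would look like this. The macro names come from this section; keeping `WOLFSSL_SP_MATH_ALL` from the baseline build is an assumption based on the note about retaining `sp_int.c`.

```c
/* user_settings.h (excerpt): Cortex-M single-precision assembly */
#define WOLFSSL_SP_ARM_CORTEX_M_ASM
#define WOLFSSL_HAVE_SP_RSA
#define WOLFSSL_HAVE_SP_ECC
#define WOLFSSL_HAVE_SP_DH

/* Assumed to stay enabled so sizes without an asm path use sp_int.c. */
#define WOLFSSL_SP_MATH_ALL
```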

| Operation | Generic C (ops/s) | sp_cortexm | Speedup |
|------------------------|-----------|------------|---------|
| ECC P-256 key gen | 40.7 | 541.2 ops/s | 13.3x |
| ECDSA P-256 sign | 40.0 | 427.6 ops/s | 10.7x |
| ECDSA P-256 verify | 29.8 | 292.7 ops/s | 9.8x |
| ECDHE P-256 agree | 40.7 | 318.1 ops/s | 7.8x |
| RSA-2048 public | 214.7 | 618.4 ops/s | 2.9x |
| RSA-2048 private | 6.14 | 19.0 ops/s | 3.1x |
| DH-2048 agree | 15.2 | 38.3 ops/s | 2.5x |

### 2. Symmetric -- Thumb-2 asm (`port/arm/thumb2-*-asm.S`)

Enable with `WOLFSSL_ARMASM` + `WOLFSSL_ARMASM_THUMB2` +
`WOLFSSL_ARMASM_NO_HW_CRYPTO` + `WOLFSSL_ARMASM_NO_NEON` + `WOLFSSL_ARM_ARCH=7`,
and add `thumb2-aes-asm.S`, `thumb2-sha256-asm.S`, `thumb2-sha512-asm.S`,
`thumb2-sha3-asm.S`, `thumb2-chacha-asm.S`, `thumb2-poly1305-asm.S`.
`WOLFSSL_ARMASM` is a global switch, so provide the `.S` for every covered
module. (Curve25519/Ed25519 also have Thumb-2 asm but their `ge_operations.c`
integration assumes 64-bit and was left on the C path here.)
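
The switches named above, collected as a `user_settings.h` fragment. Every macro and value is taken from this section; only the grouping and comments are added.

```c
/* user_settings.h (excerpt): Thumb-2 symmetric assembly */
#define WOLFSSL_ARMASM               /* global switch for the ARM asm backends */
#define WOLFSSL_ARMASM_THUMB2        /* select the Thumb-2 variants */
#define WOLFSSL_ARMASM_NO_HW_CRYPTO  /* no ARM crypto extensions */
#define WOLFSSL_ARMASM_NO_NEON       /* no NEON */
#define WOLFSSL_ARM_ARCH 7
```

The matching `thumb2-*-asm.S` files still have to be added to the build, as listed above.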

| Algorithm | Generic C (MiB/s) | Thumb-2 asm | Speedup |
|---------------------|-----------|-------------|---------|
| AES-128-CBC enc | 9.55 | 20.85 MiB/s | 2.2x |
| AES-128-ECB enc | 10.42 | 20.82 MiB/s | 2.0x |
| AES-128-CTR | 9.75 | 20.47 MiB/s | 2.1x |
| AES-128-GCM enc | 5.35 | 10.30 MiB/s | 1.9x |
| GMAC | 13.43 | 20.81 MiB/s | 1.5x |
| AES-128-CMAC | 8.84 | 14.67 MiB/s | 1.7x |
| ChaCha20 | 24.79 | 46.44 MiB/s | 1.9x |
| ChaCha20-Poly1305 | 15.83 | 25.38 MiB/s | 1.6x |
| SHA-256 | 10.94 | 17.83 MiB/s | 1.6x |
| SHA3-256 | 6.61 | 8.64 MiB/s | 1.3x |
| HMAC-SHA256 | 10.85 | 17.66 MiB/s | 1.6x |

### Note on hardware offload

For AES, hashing and ECDSA the RTL8735B has a dedicated crypto engine (the HAL
`hal_crypto_*` / `hal_ecdsa` blocks this HUK port already uses for HUK-derived
keys). A general (any-key) HW crypto-callback port over that engine would beat
the Thumb-2 software figures above and is the recommended production path for
symmetric throughput; the Thumb-2 asm is the portable software fallback. The
`sp_cortexm.c` PK speedup is worth taking regardless, since it needs no silicon
support.