Why Implementation Security Matters
The algorithm can be perfect, but the implementation can be broken.
Historical examples:
- RSA: Bleichenbacher's attack (padding oracle)
- AES: Cache timing attacks
- ECDSA: Lattice attacks on biased nonces
- OpenSSL Heartbleed: Memory safety
PQC is no different: implementation vulnerabilities exist.
Part 1: Side-Channel Attack Categories
Timing Attacks
The Threat: Execution time varies based on secret data, leaking information.
// VULNERABLE: Branch on secret
int cmp_secret(const uint8_t *a, const uint8_t *b, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) return 0; // Early exit leaks the position of the first difference!
    }
    return 1;
}
// SECURE: Constant-time comparison
int ct_cmp(const uint8_t *a, const uint8_t *b, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= a[i] ^ b[i]; // Accumulate all differences
    }
    return (int)(1 & (((uint32_t)diff - 1) >> 8)); // 1 if equal, 0 if not
}
PQC-Specific Timing Vulnerabilities:
| Operation | Vulnerability | Mitigation |
|---|---|---|
| NTT butterfly | Variable-time modular reduction | Barrett/Montgomery reduction |
| Rejection sampling | Loop count depends on secret | Constant-time loop bounds |
| Polynomial comparison | Early exit on difference | Full comparison always |
| Decapsulation check | Different paths for valid/invalid | Identical execution path |
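The "Barrett/Montgomery reduction" mitigation in the first row typically ends with a conditional subtraction, which must itself be branch-free. Below is a minimal sketch of that final step; the helper name csub_q and the assumptions q < 2^15 and a < 2q are illustrative, not taken from any particular library.
#include <stdint.h>
// Sketch: constant-time conditional subtraction, assuming q < 2^15 and a < 2q.
// Replaces the branch "if (a >= q) a -= q;" with arithmetic on a borrow mask.
static inline uint16_t csub_q(uint16_t a, uint16_t q) {
    uint16_t t = a - q;                              // wraps around if a < q
    uint16_t mask = (uint16_t)(-(int16_t)(t >> 15)); // all-ones exactly when the subtraction borrowed
    return t + (q & mask);                           // add q back only when a < q
}
The same mask-and-add idea is what the Barrett reduction example in Part 7 relies on for its final correction step.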
Power Analysis Attacks
Simple Power Analysis (SPA):
- Single trace reveals information
- Operation patterns visible in power consumption
Differential Power Analysis (DPA):
- Statistical analysis of many traces
- Correlate power consumption with hypothetical secret values
NTT Power Attack Example:
┌────────────────────────────────────────────────────┐
│ Power │
│ ▲ │
│ │ ┌┐ ┌┐ ┌┐ ┌─┐ ┌─┐ ┌─┐ │
│ │ ││ ││ ││ │ │ │ │ │ │ │
│ │ ───┘└──┘└──┘└──────┘ └─┘ └─┘ └───▶ Time │
│ │ │
│ │ Smaller multiplications Larger multiplications│
│ │ (coefficients near 0) (coefficients near q) │
└────────────────────────────────────────────────────┘
PQC-Specific Power Vulnerabilities:
| Operation | Vulnerability | Mitigation |
|---|---|---|
| NTT multiplication | Coefficient size visible | Masking, shuffling |
| CBD sampling | Hamming weight visible | Constant-weight implementation |
| Polynomial addition | Coefficient patterns | Randomize execution order |
| Encode/decode | Message-dependent patterns | Blinding |
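To make the "masking" mitigation concrete: first-order arithmetic masking splits every secret value into two shares whose sum modulo q equals the secret, so the power footprint of either share alone is independent of the secret. A minimal sketch, with the modulus, helper names, and two-share split chosen purely for illustration rather than taken from a real masked library:
#include <stdint.h>
#define Q 3329  // example modulus (ML-KEM); an assumption for this sketch
// Split a secret coefficient into two shares: s == (s0 + s1) mod Q.
// random_mod_q must be fresh, uniform randomness in [0, Q) for every call.
void mask_coeff(uint16_t s, uint16_t *s0, uint16_t *s1, uint16_t random_mod_q) {
    *s0 = random_mod_q;
    *s1 = (uint16_t)((s + Q - random_mod_q) % Q);
}
// Linear operations (e.g., polynomial addition) act share-by-share, never recombining.
void masked_add(uint16_t *r0, uint16_t *r1,
                uint16_t a0, uint16_t a1, uint16_t b0, uint16_t b1) {
    *r0 = (uint16_t)((a0 + b0) % Q);
    *r1 = (uint16_t)((a1 + b1) % Q);
}
Nonlinear steps (compression, CBD sampling, the hash calls) need dedicated masked gadgets and account for most of the overhead of masked implementations in practice.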
Electromagnetic Attacks
Similar to power analysis but measured via EM emissions.
- Can be more localized (target specific chip areas)
- Harder to shield against than power-line leakage
Fault Injection Attacks
The Threat: Induce computational errors to leak secrets.
Techniques:
- Voltage glitching
- Clock glitching
- Laser fault injection
- EM fault injection
ML-KEM Fault Attack Example:
1. Attacker injects fault during decapsulation
2. NTT computation produces wrong result
3. Decapsulation fails in a secret-dependent way
4. Repeated faults reveal secret key bits
Defense: Redundant computation and verification (see Part 5).
PQC-Specific Fault Vulnerabilities:
| Attack Point | Effect | Mitigation |
|---|---|---|
| NTT computation | Wrong ciphertext decryption | Double computation, compare |
| Encoding | Message corruption | Checksum verification |
| Rejection sampling | Accept invalid signature | Verify rejection conditions |
| Hash computation | Wrong challenge | Recompute and verify |
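The "Recompute and verify" mitigation can be as simple as computing a value twice and refusing to proceed on a mismatch, which forces an attacker to inject two identical faults. A minimal sketch, assuming the hash_H and ct_equal helpers used elsewhere in this chapter:
#include <stdint.h>
#include <stddef.h>
#include <string.h>
// Sketch: fault-checked hash/challenge computation (hash_H and ct_equal assumed).
int hash_checked(uint8_t out[32], const uint8_t *msg, size_t len) {
    uint8_t h1[32], h2[32];
    hash_H(h1, msg, len);  // First computation
    hash_H(h2, msg, len);  // Redundant second computation
    if (!ct_equal(h1, h2, 32)) {
        return -1;         // Mismatch: possible fault injection, fail closed
    }
    memcpy(out, h1, 32);
    return 0;
}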
Cache Timing Attacks
The Threat: Memory access patterns depend on secret data.
Cache Attack on Table Lookups:
- AES S-box: Access pattern reveals key bytes
- NTT: Twiddle factor access reveals coefficient positions
ML-KEM vulnerable operations:
- Any table-based implementation
- Variable indexing based on secrets
Mitigation Strategies:
// VULNERABLE: Secret-dependent index
uint32_t table[256];
uint32_t value = table[secret_byte]; // Cache hit/miss leaks secret_byte
// SECURE: Access entire table
uint32_t ct_table_lookup(const uint32_t *table, uint8_t index) {
    uint32_t result = 0;
    for (uint32_t i = 0; i < 256; i++) {
        // Build an all-ones mask only when i == index, without branching
        uint32_t diff = i ^ index;
        uint32_t mask = ((diff | (0u - diff)) >> 31) - 1; // 0xFFFFFFFF if diff == 0, else 0
        result |= table[i] & mask;
    }
    return result;
}
Part 2: Constant-Time Programming
The Golden Rule
Every operation's execution time must be independent of secret data.
Constant-Time Primitives
Conditional Selection:
// Select a if bit=1, b if bit=0 (bit must be 0 or 1)
uint32_t ct_select(uint32_t a, uint32_t b, uint32_t bit) {
    uint32_t mask = -(uint32_t)bit; // 0xFFFFFFFF if bit=1, 0 if bit=0
    return (a & mask) | (b & ~mask);
}
// Alternative using XOR
uint32_t ct_select_xor(uint32_t a, uint32_t b, uint32_t bit) {
    return b ^ (-(uint32_t)bit & (a ^ b));
}
Constant-Time Comparison:
// Returns 1 if equal, 0 otherwise
int ct_equal(const uint8_t *a, const uint8_t *b, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= a[i] ^ b[i];
    }
    return (int)(1 & (((uint32_t)diff - 1) >> 8));
}
// Returns 1 if a < b (unsigned); the 64-bit subtraction keeps the borrow bit explicit
int ct_less_than(uint32_t a, uint32_t b) {
    return (int)(((uint64_t)a - (uint64_t)b) >> 63);
}
Constant-Time Conditional Copy:
// Copy src to dst if condition=1, do nothing if condition=0
void ct_cmov(uint8_t *dst, const uint8_t *src, size_t len, uint8_t condition) {
    uint8_t mask = (uint8_t)(-(int)condition); // 0xFF if condition=1, 0x00 if condition=0
    for (size_t i = 0; i < len; i++) {
        dst[i] ^= mask & (src[i] ^ dst[i]);
    }
}
ML-KEM Constant-Time Implementation
Decapsulation with Implicit Rejection:
// ML-KEM decapsulation MUST be constant-time
void ml_kem_decapsulate(
    uint8_t *shared_secret,        // Output
    const uint8_t *ciphertext,     // Input
    const uint8_t *secret_key      // Input
) {
    uint8_t m_prime[32];
    uint8_t K_bar[32];
    uint8_t r[32];
    uint8_t c_prime[CIPHERTEXT_BYTES];
    // Step 1: Decrypt to get m'
    decrypt(m_prime, ciphertext, secret_key);
    // Step 2: Re-encrypt m' to get c' (using the public key stored inside the secret key)
    derive_randomness(r, m_prime);
    encrypt(c_prime, m_prime, r, secret_key);
    // Step 3: Compare c and c' (constant-time!)
    int ciphertexts_equal = ct_equal(ciphertext, c_prime, CIPHERTEXT_BYTES);
    // Step 4: Derive K_bar
    hash_G(K_bar, m_prime);
    // Step 5: Select output (constant-time!)
    // If c == c': output = KDF(K_bar || H(c))
    // If c != c': output = KDF(z || H(c)) where z is the implicit rejection key
    uint8_t implicit_reject_key[32];
    memcpy(implicit_reject_key, secret_key + SK_OFFSET_Z, 32);
    // Constant-time selection
    ct_cmov(K_bar, implicit_reject_key, 32, (uint8_t)(1 - ciphertexts_equal));
    // Final KDF
    uint8_t kdf_input[64];
    memcpy(kdf_input, K_bar, 32);
    hash_H(kdf_input + 32, ciphertext, CIPHERTEXT_BYTES);
    kdf(shared_secret, kdf_input, 64);
    // Secure cleanup
    secure_zero(m_prime, 32);
    secure_zero(K_bar, 32);
    secure_zero(r, 32);
}
Testing for Timing Leaks
Using dudect (timing leak detection):
#include "dudect.h"
// Illustrative harness: it sketches dudect's fixed-vs-random measurement methodology;
// the real dudect API is callback-based, so adapt these calls to the library's headers.
void victim_function(uint8_t *input) {
    ml_kem_decapsulate(output, input, secret_key);
}
int main(void) {
    dudect_config_t config = {
        .number_measurements = 10000,
        .chunk_size = 500,
    };
    // Class 0: fixed input (e.g., a valid ciphertext)
    // Class 1: random input
    dudect_ctx_t ctx;
    dudect_init(&ctx, &config);
    for (int i = 0; i < 10000000; i++) {
        uint8_t input[CIPHERTEXT_BYTES];
        uint8_t class = random_bit();
        if (class == 0) {
            // Fixed valid ciphertext
            memcpy(input, fixed_ciphertext, CIPHERTEXT_BYTES);
        } else {
            // Random ciphertext
            random_bytes(input, CIPHERTEXT_BYTES);
        }
        dudect_measure(&ctx, input, class);
    }
    // If the timing distributions of the two classes differ, a leak is detected
    return dudect_result(&ctx);
}
Using ctgrind (Valgrind extension):
#include "ctgrind.h"
void test_constant_time(void) {
    uint8_t secret_key[SECRET_KEY_BYTES];
    uint8_t ciphertext[CIPHERTEXT_BYTES];
    uint8_t shared_secret[32];
    // Generate random inputs
    random_bytes(secret_key, SECRET_KEY_BYTES);
    random_bytes(ciphertext, CIPHERTEXT_BYTES);
    // Mark secret data
    ct_poison(secret_key, SECRET_KEY_BYTES);
    // Run the function under test
    ml_kem_decapsulate(shared_secret, ciphertext, secret_key);
    // Check for secret-dependent branches (Valgrind will report them)
    ct_unpoison(secret_key, SECRET_KEY_BYTES);
}
Part 3: Secure Random Number Generation
Requirements for PQC
| Operation | Entropy Needed | Consequence of Failure |
|---|---|---|
| Key generation | High (256 bits) | Complete key compromise |
| Encapsulation | High (256 bits) | Shared secret predictable |
| Signing randomness | High (256 bits) | Signature forgery, key leak |
| Seed expansion | Derived from above | Depends on parent |
Platform-Specific Best Practices
Linux:
#include <sys/random.h>
int get_random_bytes(uint8_t *buf, size_t len) {
    // getrandom() is the preferred method (kernel 3.17+)
    ssize_t ret = getrandom(buf, len, 0);
    if (ret != (ssize_t)len) {
        // Handle the error - NOT optional!
        return -1;
    }
    return 0;
}
Windows:
#include <windows.h>
#include <bcrypt.h>
int get_random_bytes(uint8_t *buf, size_t len) {
    NTSTATUS status = BCryptGenRandom(
        NULL,
        buf,
        (ULONG)len,
        BCRYPT_USE_SYSTEM_PREFERRED_RNG
    );
    return BCRYPT_SUCCESS(status) ? 0 : -1;
}
Cross-Platform (OpenSSL):
#include <openssl/rand.h>
int get_random_bytes(uint8_t *buf, size_t len) {
    return RAND_bytes(buf, (int)len) == 1 ? 0 : -1;
}
Deterministic Random Bit Generator (DRBG)
For performance, use a DRBG seeded from system entropy:
// AES-256-CTR DRBG (NIST SP 800-90A)
typedef struct {
    uint8_t key[32];
    uint8_t v[16];
    uint64_t reseed_counter;
} drbg_state;
void drbg_init(drbg_state *state) {
    uint8_t seed[48];
    get_random_bytes(seed, 48); // From the system RNG
    memcpy(state->key, seed, 32);
    memcpy(state->v, seed + 32, 16);
    state->reseed_counter = 1;
}
void drbg_generate(drbg_state *state, uint8_t *out, size_t len) {
    if (state->reseed_counter > RESEED_INTERVAL) {
        drbg_reseed(state);
    }
    // Generate output using AES-256 in CTR mode
    aes256_ctr(out, len, state->key, state->v);
    // Update the internal state (SP 800-90A Update function)
    drbg_update(state, NULL, 0);
    state->reseed_counter++;
}
ML-DSA Signing: The Nonce Problem
Critical: ML-DSA uses rejection sampling. The randomness MUST be fresh each attempt.
// DANGEROUS PATTERN (if randomness is reused)
void sign_dangerous(signature *sig, const uint8_t *msg, const secret_key *sk) {
    uint8_t seed[32];
    get_random_bytes(seed, 32); // Only called once!
    while (1) {
        // Uses the same seed for every attempt - BAD!
        poly_vec y;
        sample_y(&y, seed); // Same y each time = disaster
        // ... rest of signing ...
        if (check_pass()) break;
        // Retry with the SAME y - leaks the secret!
    }
}
// CORRECT PATTERN
void sign_correct(signature *sig, const uint8_t *msg, const secret_key *sk) {
    uint64_t nonce = 0;
    while (1) {
        // Fresh randomness for each attempt
        uint8_t seed[32];
        derive_seed(seed, sk->seed, msg, nonce++);
        poly_vec y;
        sample_y(&y, seed); // Different y each time
        // ... rest of signing ...
        if (check_pass()) break;
        // Retry with a different y - safe!
    }
}
Part 4: Memory Safety
Secure Memory Handling
// Allocate secure memory (non-swappable, guard pages)
void *secure_alloc(size_t size) {
#ifdef __linux__
    // Round up to a whole page so the trailing guard page stays page-aligned
    size = (size + PAGE_SIZE - 1) & ~((size_t)PAGE_SIZE - 1);
    uint8_t *base = mmap(NULL, size + 2 * PAGE_SIZE,
                         PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;
    // Guard pages before and after the usable region
    mprotect(base, PAGE_SIZE, PROT_NONE);
    mprotect(base + PAGE_SIZE + size, PAGE_SIZE, PROT_NONE);
    // Lock in memory (prevent swapping)
    mlock(base + PAGE_SIZE, size);
    return base + PAGE_SIZE;
#else
    return malloc(size); // Fallback
#endif
}
// Secure deallocation
void secure_free(void *ptr, size_t size) {
    if (ptr == NULL) return;
    // Zero before releasing
    volatile uint8_t *p = ptr;
    for (size_t i = 0; i < size; i++) {
        p[i] = 0;
    }
#ifdef __linux__
    size = (size + PAGE_SIZE - 1) & ~((size_t)PAGE_SIZE - 1); // Same rounding as secure_alloc
    munlock(ptr, size);
    munmap((uint8_t *)ptr - PAGE_SIZE, size + 2 * PAGE_SIZE);
#else
    free(ptr);
#endif
}
Preventing Compiler Optimization of Zeroing
// Compiler might optimize away memset if data isn't used after
void insecure_cleanup(uint8_t *secret, size_t len) {
    memset(secret, 0, len); // May be removed by the optimizer!
}
// Secure zeroing methods
// Method 1: Volatile pointer
void secure_zero_volatile(void *ptr, size_t len) {
    volatile uint8_t *p = ptr;
    while (len--) *p++ = 0;
}
// Method 2: Memory barrier after memset
void secure_zero_barrier(void *ptr, size_t len) {
    memset(ptr, 0, len);
    __asm__ __volatile__("" : : "r"(ptr) : "memory");
}
// Method 3: Platform-specific
#ifdef _WIN32
#include <windows.h>
#define secure_zero(ptr, len) SecureZeroMemory(ptr, len)
#elif defined(__APPLE__)
// memset_s requires __STDC_WANT_LIB_EXT1__ defined to 1 before the first #include <string.h>
#include <string.h>
#define secure_zero(ptr, len) memset_s(ptr, len, 0, len)
#else
#define secure_zero(ptr, len) explicit_bzero(ptr, len)
#endif
Key Lifecycle Management
typedef struct {
    uint8_t data[SECRET_KEY_BYTES];
    int initialized;
    time_t created;
    int use_count;
} managed_secret_key;
// Initialize key storage with locked memory
int key_init(managed_secret_key *key) {
    if (mlock(key, sizeof(*key)) != 0) {
        return -1; // Failed to lock memory
    }
    key->initialized = 0;
    key->use_count = 0;
    return 0;
}
// Generate a new key
int key_generate(managed_secret_key *key) {
    int ret = ml_kem_keygen(NULL, key->data); // Only the secret key is kept here
    if (ret == 0) {
        key->initialized = 1;
        key->created = time(NULL);
    }
    return ret;
}
// Use the key (with logging/limiting)
int key_use(managed_secret_key *key, operation_fn op) {
    if (!key->initialized) return -1;
    if (key->use_count >= MAX_USES) return -2; // Key exhausted
    key->use_count++;
    return op(key->data);
}
// Destroy the key
void key_destroy(managed_secret_key *key) {
    secure_zero(key->data, SECRET_KEY_BYTES);
    key->initialized = 0;
    key->use_count = 0;
    munlock(key, sizeof(*key));
}
Part 5: Fault Attack Countermeasures
Double Computation
// Verify NTT by computing twice and comparing
void secure_ntt(poly *out, const poly *in) {
    poly temp1, temp2;
    // First computation
    ntt(&temp1, in);
    // Second computation
    ntt(&temp2, in);
    // Compare (constant-time)
    if (!ct_poly_equal(&temp1, &temp2)) {
        // Fault detected!
        abort(); // Or handle appropriately
    }
    poly_copy(out, &temp1);
}
Algorithmic Verification
// ML-KEM decapsulation with verification
int secure_decapsulate(uint8_t *ss, const uint8_t *ct, const secret_key *sk) {
    uint8_t m[32];
    uint8_t ct_check[CIPHERTEXT_BYTES];
    // Decrypt
    decrypt(m, ct, sk);
    // Re-encrypt to verify
    uint8_t randomness[32];
    derive_randomness(randomness, m);
    encrypt(ct_check, m, randomness, &sk->public_key);
    // Verify that the ciphertexts match
    int valid = ct_equal(ct, ct_check, CIPHERTEXT_BYTES);
    // Compute the shared secret (constant-time regardless of validity)
    uint8_t ss_valid[32], ss_invalid[32];
    kdf(ss_valid, m, 32);
    kdf(ss_invalid, sk->implicit_rejection_key, 32);
    ct_cmov(ss, ss_invalid, 32, 1);            // Default to the implicit-rejection secret
    ct_cmov(ss, ss_valid, 32, (uint8_t)valid); // Overwrite only if the ciphertext was valid
    secure_zero(m, 32);
    secure_zero(randomness, 32);
    return 0; // Always report success (timing!)
}
Signature Verification Check
// After signing, verify the signature
int secure_sign(signature *sig, const uint8_t *msg, size_t msg_len,
                const secret_key *sk) {
    int ret = ml_dsa_sign(sig, msg, msg_len, sk);
    if (ret != 0) return ret;
    // Verify our own signature
    public_key pk;
    derive_public_key(&pk, sk);
    ret = ml_dsa_verify(sig, msg, msg_len, &pk);
    if (ret != 0) {
        // Signature verification failed - possible fault attack!
        secure_zero(sig, sizeof(*sig));
        return -1;
    }
    return 0;
}
Part 6: Testing and Validation
Known Answer Tests (KAT)
// Test vectors from NIST
typedef struct {
    uint8_t seed[32];
    uint8_t pk[PUBLIC_KEY_BYTES];
    uint8_t sk[SECRET_KEY_BYTES];
    uint8_t ct[CIPHERTEXT_BYTES];
    uint8_t ss[32];
} kat_vector;
int run_kat_tests(void) {
    const kat_vector vectors[] = {
        // From the NIST PQC test vectors
        { .seed = {...}, .pk = {...}, ... },
        // More vectors...
    };
    for (size_t i = 0; i < sizeof(vectors)/sizeof(vectors[0]); i++) {
        uint8_t pk[PUBLIC_KEY_BYTES], sk[SECRET_KEY_BYTES];
        uint8_t ct[CIPHERTEXT_BYTES], ss[32];
        // Deterministic keygen from the recorded seed
        keygen_with_seed(pk, sk, vectors[i].seed);
        if (memcmp(pk, vectors[i].pk, PUBLIC_KEY_BYTES) != 0) {
            printf("KAT %zu: public key mismatch\n", i);
            return -1;
        }
        // ... verify all remaining fields (sk, ct, ss) ...
    }
    return 0;
}
Fuzzing
// AFL/libFuzzer harness for decapsulation
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < SECRET_KEY_BYTES + CIPHERTEXT_BYTES) {
        return 0; // Not enough data
    }
    uint8_t ss[32];
    const uint8_t *sk = data;
    const uint8_t *ct = data + SECRET_KEY_BYTES;
    // Must handle arbitrary input without crashing
    ml_kem_decapsulate(ss, ct, sk);
    return 0;
}
Property-Based Testing
# Using Hypothesis for Python implementation testing
import os
from hypothesis import given, strategies as st

@given(st.binary(min_size=32, max_size=32))
def test_keygen_decapsulate_roundtrip(seed):
    """Encapsulation followed by decapsulation should recover the shared secret."""
    pk, sk = ml_kem_keygen(seed)
    ct, ss_enc = ml_kem_encapsulate(pk)
    ss_dec = ml_kem_decapsulate(ct, sk)
    assert ss_enc == ss_dec

@given(st.binary(min_size=CIPHERTEXT_BYTES, max_size=CIPHERTEXT_BYTES))
def test_invalid_ciphertext_handling(random_ct):
    """Random ciphertext should not crash and should return the implicit-rejection secret."""
    _, sk = ml_kem_keygen(os.urandom(32))
    # Should not raise an exception
    ss = ml_kem_decapsulate(random_ct, sk)
    # Should still return a 32-byte value (implicit rejection)
    assert len(ss) == 32
Part 7: Common Implementation Pitfalls
Pitfall 1: Non-Constant-Time Modular Reduction
// BAD: Conditional subtraction
uint32_t mod_reduce_bad(uint32_t a, uint32_t q) {
    if (a >= q) a -= q; // Branch leaks whether a >= q!
    return a;
}
// GOOD: Barrett reduction (constant-time)
uint32_t mod_reduce_barrett(uint32_t a, uint32_t q, uint64_t barrett_const) {
    // barrett_const is precomputed as floor(2^32 / q)
    uint64_t t = ((uint64_t)a * barrett_const) >> 32;
    t = a - t * q;
    t -= q & -(t >= q); // Constant-time conditional subtraction
    return (uint32_t)t;
}
Pitfall 2: Variable-Time Polynomial Sampling
// BAD: Variable-time bit counting in CBD sampling
void cbd_bad(poly *r, const uint8_t *buf) {
    for (int i = 0; i < N; i++) {
        int a = popcount(buf[i] & 0x0F); // Generic popcount may be variable-time or table-based!
        int b = popcount(buf[i] >> 4);
        r->coeffs[i] = a - b;
    }
}
// GOOD: Constant-time bit counting
static inline int ct_popcount4(uint8_t x) {
    x = (x & 0x5) + ((x >> 1) & 0x5);
    x = (x & 0x3) + ((x >> 2) & 0x3);
    return x;
}
void cbd_good(poly *r, const uint8_t *buf) {
    for (int i = 0; i < N; i++) {
        int a = ct_popcount4(buf[i] & 0x0F);
        int b = ct_popcount4(buf[i] >> 4);
        r->coeffs[i] = a - b;
    }
}
Pitfall 3: Forgetting to Clear Sensitive Data
// BAD: Stack data not cleared
void decrypt_bad(uint8_t *plaintext, const uint8_t *ciphertext, const key *k) {
    uint8_t temp_key[32]; // On the stack
    derive_key(temp_key, k);
    decrypt_internal(plaintext, ciphertext, temp_key);
    // temp_key is still on the stack! It can be recovered from a memory dump.
}
// GOOD: Explicit cleanup
void decrypt_good(uint8_t *plaintext, const uint8_t *ciphertext, const key *k) {
    uint8_t temp_key[32];
    derive_key(temp_key, k);
    decrypt_internal(plaintext, ciphertext, temp_key);
    secure_zero(temp_key, 32); // Clear before returning
}
Pitfall 4: Using System rand()
// BAD: Predictable randomness
void keygen_bad(uint8_t *pk, uint8_t *sk) {
    uint8_t seed[32];
    srand(time(NULL)); // Predictable seed!
    for (int i = 0; i < 32; i++) {
        seed[i] = rand() % 256; // Also a poor distribution
    }
    keygen_internal(pk, sk, seed);
}
// GOOD: Cryptographic randomness
void keygen_good(uint8_t *pk, uint8_t *sk) {
    uint8_t seed[32];
    if (getrandom(seed, 32, 0) != 32) {
        abort(); // Must handle RNG failure!
    }
    keygen_internal(pk, sk, seed);
}
Part 8: Certification and Validation
FIPS 140-3 Considerations
| Requirement | PQC Implication |
|---|---|
| Algorithm validation | CAVP testing for ML-KEM/ML-DSA |
| Self-tests | Power-on KAT, conditional tests |
| Key management | Zeroization, access control |
| Physical security | Side-channel resistance at Level 3+ |
| Entropy | NIST SP 800-90B compliance |
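The "Self-tests" row can be implemented as a start-up gate: run a known-answer test once at power-on and refuse all cryptographic operations until it passes. A minimal sketch, reusing the run_kat_tests() from Part 6; the module_* names and the ml_kem_encapsulate signature are illustrative assumptions, not a FIPS-validated design.
#include <stdint.h>
#include <stdbool.h>
static bool self_test_passed = false;
// Power-on self-test gate: call once at module start-up.
int module_power_on_self_test(void) {
    self_test_passed = (run_kat_tests() == 0);
    return self_test_passed ? 0 : -1;
}
// Every public entry point fails closed until the self-test has passed.
int module_encapsulate(uint8_t *ct, uint8_t *ss, const uint8_t *pk) {
    if (!self_test_passed) return -1;
    return ml_kem_encapsulate(ct, ss, pk);
}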
Common Criteria Evaluation
For high assurance (EAL4+):
- Formal specification of algorithm
- Correspondence between spec and implementation
- Vulnerability analysis
- Penetration testing (including side-channels)
- Documentation of security functions
PQC-Specific Validation Challenges
- CAVP testing for ML-KEM/ML-DSA is only just becoming available; tooling is still maturing (as of 2025)
- Side-channel testing standards evolving
- Limited certified HSM support
- New attack surfaces to evaluate
Summary: Implementation Security Checklist
□ Constant-Time Implementation
  □ No secret-dependent branches
  □ No secret-dependent memory access
  □ Verified with timing leak detection tools
□ Secure Randomness
  □ Using cryptographic RNG
  □ Properly seeded
  □ Error handling for RNG failures
□ Memory Safety
  □ Sensitive data cleared after use
  □ Compiler optimization not removing clears
  □ Memory locking where appropriate
□ Fault Resistance
  □ Double computation for critical operations
  □ Verification after signing
  □ Safe failure modes
□ Testing
  □ Known Answer Tests pass
  □ Property-based testing
  □ Fuzzing completed
  □ Side-channel testing
□ Key Management
  □ Secure key storage
  □ Key lifecycle management
  □ Key destruction procedures