Mastering Portability: Solving Cross-Architecture Challenges in Linux Userspace and Kernel


You've just spent three hours debugging why your userspace application crashes every time it talks to your kernel module. The ioctl calls return garbage data, pointers that look correct in the application arrive corrupted, and structure fields contain completely wrong values. You check your code once, twice, ten times, and everything looks correct. These bugs are not exotic corner cases: they appear in real drivers, real production systems, and especially in embedded environments where a single kernel image must support both 32‑bit and 64‑bit applications. Welcome to the world of cross-architecture communication nightmares.

Mixing 32-bit userspace with 64-bit kernels is common in embedded systems, industrial controllers, and legacy enterprise environments. Maybe you're maintaining a 20-year-old application that customers refuse to recompile. Maybe your embedded device has memory constraints that make 32-bit binaries attractive. Maybe you're building a system that needs to support both old and new hardware. Whatever the reason, the moment your userspace and kernel operate at different bitness levels, you enter a minefield of portability issues.

Why This Problem Exists

When a 32-bit process communicates with a 64-bit kernel, you're crossing a fundamental boundary. These two environments have different assumptions about:

  • Pointer sizes: 4 bytes vs 8 bytes

  • Data alignment requirements: 32-bit vs 64-bit natural alignment

  • Structure padding: Compilers add different padding to maintain alignment

  • Address space interpretation: Virtual addresses have completely different meanings

The kernel and userspace must agree on the exact memory layout of every shared data structure. Get it wrong by even a single byte, and your data becomes corrupted. Miss a padding byte, and your kernel reads from the wrong offset. Pass a pointer thinking it will work across the boundary, and you'll crash the system.

What You'll Learn

In this blog, we'll tackle the most common portability pitfalls when your userspace and kernel speak different architectural languages. You'll learn how to detect these issues, understand why they happen, and apply proven solutions that work in production systems. Each challenge includes real code, debugging techniques, and practical fixes you can use immediately.

Let's start with the most fundamental issue: data structure alignment.


Challenge 1: Data Structure Alignment

The Problem

Imagine you define a simple structure in your userspace application to communicate sensor data to your kernel module:

struct sensor {
    uint32_t sensor_id;    // 4 bytes
    uint64_t timestamp;    // 8 bytes
    float value;           // 4 bytes
};

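You calculate the size: 4 + 8 + 4 = 16 bytes. Simple, right? When you print sizeof(struct sensor) in a 32-bit x86 userspace application, you do get 16 bytes. In your 64-bit kernel module, you get 24 bytes. Your structure grew by 50% just by crossing the architecture boundary. (The exact numbers depend on the ABI: 32-bit x86 aligns uint64_t to 4 bytes inside structures, while some other 32-bit ABIs, such as ARM EABI, align it to 8 bytes.)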

This isn't a compiler bug or cosmic rays flipping bits. This is structure padding, and it's going to corrupt your data if you don't handle it correctly.

Why Padding Happens

Modern CPUs have alignment requirements for efficient memory access. A 64-bit processor prefers (or on some architectures, requires) that 64-bit values start at addresses divisible by 8. When your structure contains a 64-bit field like timestamp, the compiler adds invisible padding bytes to ensure proper alignment.

Here's what actually happens in memory:

32-bit Userspace Layout:
Offset:  0       4       8       12      16
        ┌───────┬───────────────┬───────┐
        │ s_id  │   timestamp   │ value │
        │  4B   │      8B       │  4B   │
        └───────┴───────────────┴───────┘
Total: 16 bytes

The 32-bit compiler places timestamp at offset 4. On 32-bit x86 this is acceptable, because the ABI only expects 4-byte alignment for 64-bit integers inside a structure.

64-bit Kernel Layout:
Offset:  0       4       8               16      20      24
        ┌───────┬───────┬───────────────┬───────┬───────┐
        │ s_id  │ PAD   │   timestamp   │ value │ PAD   │
        │  4B   │  4B   │      8B       │  4B   │  4B   │
        └───────┴───────┴───────────────┴───────┴───────┘
Total: 24 bytes

The 64-bit compiler adds 4 bytes of padding after sensor_id to ensure timestamp starts at offset 8 (8-byte aligned). It also adds 4 bytes of padding at the end to make the total structure size a multiple of its largest alignment requirement (8 bytes).
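To see both layouts on a single machine, the sketch below emulates them with compiler attributes. It assumes GCC or Clang. The type name u64_align4 and the two struct names are made up for illustration, and the 4-byte alignment mirrors what the 32-bit x86 ABI does for uint64_t inside structures.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical helper type: a 64-bit integer that only needs 4-byte
 * alignment, as on 32-bit x86. Lowering alignment through a typedef
 * is a GCC/Clang extension, not standard C. */
typedef uint64_t __attribute__((aligned(4))) u64_align4;

/* Layout a 32-bit x86 compiler would produce for struct sensor. */
struct sensor_as_i386 {
    uint32_t   sensor_id;
    u64_align4 timestamp;
    float      value;
};

/* Layout a 64-bit compiler produces. _Alignas(8) forces the 8-byte
 * alignment so the result is the same whichever host builds this. */
struct sensor_as_lp64 {
    uint32_t sensor_id;
    _Alignas(8) uint64_t timestamp;
    float    value;
};

static_assert(sizeof(struct sensor_as_i386) == 16, "packed by the ABI, no holes");
static_assert(offsetof(struct sensor_as_i386, timestamp) == 4, "timestamp at 4");
static_assert(sizeof(struct sensor_as_lp64) == 24, "4-byte hole plus 4-byte tail");
static_assert(offsetof(struct sensor_as_lp64, timestamp) == 8, "timestamp at 8");
```

If the file compiles, both diagrams hold: the same three fields occupy 16 bytes under one set of alignment rules and 24 bytes under the other.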

What Goes Wrong at Runtime

When your userspace sends this structure to the kernel via an ioctl or write call, the kernel reads it with the wrong layout:

// Userspace (32-bit) writes:
// [sensor_id][timestamp_bytes_0-7][value]

// Kernel (64-bit) reads:
// [sensor_id][padding][timestamp_bytes_4-11][value]

The kernel skips 4 bytes expecting padding, then reads the timestamp from the wrong position. Your timestamp becomes corrupted, your value reads garbage, and you spend hours wondering why 0xDEADBEEF keeps appearing in your logs.
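The misread can be reproduced in plain userspace C. This is a minimal sketch that serializes the fields at the 16-byte layout's offsets and then reads the timestamp back at the offset a 24-byte layout expects. The function names and the enum constants are hypothetical, and the offsets are taken from the two diagrams above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets from the two layout diagrams. */
enum { USER_TS_OFFSET = 4, KERNEL_TS_OFFSET = 8, USER_STRUCT_SIZE = 16 };

/* Store the fields the way the 16-byte userspace layout does. */
static void write_user_layout(uint8_t buf[USER_STRUCT_SIZE], uint32_t id,
                              uint64_t timestamp, float value)
{
    memcpy(buf + 0, &id, sizeof(id));
    memcpy(buf + USER_TS_OFFSET, &timestamp, sizeof(timestamp));
    memcpy(buf + 12, &value, sizeof(value));
}

/* Read 8 bytes at the offset a reader with a given layout would use. */
static uint64_t read_timestamp_at(const uint8_t *buf, size_t offset)
{
    uint64_t ts;

    memcpy(&ts, buf + offset, sizeof(ts));
    return ts;
}
```

Reading at USER_TS_OFFSET returns the original timestamp. Reading at KERNEL_TS_OFFSET returns a value stitched together from half of the timestamp and the bits of the float.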

How to Detect This Issue

1. Compare sizeof() or offsetof() on Both Sides

Add debug prints in both userspace and kernel to check structure size and field positions:

// Userspace
#include <stdio.h>
#include <stddef.h>
printf("Userspace sizeof(struct sensor): %zu\n", sizeof(struct sensor));
printf("timestamp offset: %zu\n", offsetof(struct sensor, timestamp));
printf("value offset: %zu\n", offsetof(struct sensor, value));

// Kernel
printk(KERN_INFO "Kernel sizeof(struct sensor): %zu\n", sizeof(struct sensor));
printk(KERN_INFO "timestamp offset: %zu\n", offsetof(struct sensor, timestamp));
printk(KERN_INFO "value offset: %zu\n", offsetof(struct sensor, value));

Different sizes or offsets = alignment problem. The offsetof() macro is particularly useful because it shows you exactly where each field lives in memory, revealing hidden padding.

2. Use pahole Tool

The pahole tool (from the dwarves package) shows you exactly how the compiler laid out your structure:

# Compile your code with debug symbols
gcc -g -o myapp myapp.c

# Examine structure layout
pahole myapp

# Output shows:
struct sensor {
    uint32_t    sensor_id;     /*     0     4 */
    /* XXX 4 bytes hole, try to pack */
    uint64_t    timestamp;     /*     8     8 */
    float       value;         /*    16     4 */
    /* size: 24, cachelines: 1, members: 3 */
    /* sum members: 16, holes: 1, sum holes: 4 */
    /* padding: 4 */
    /* last cacheline: 24 bytes */
};

Solution:

Now that you can detect the problem, here are three ways to fix it, with their trade-offs:

  1. Explicit Padding (Recommended)

    Add the padding fields manually in the structure definition shared by userspace and the kernel, so that the 32-bit build has the same layout the 64-bit compiler would produce on its own:

struct sensor {
    uint32_t sensor_id;
    uint32_t padding1;     // Explicit padding
    uint64_t timestamp;
    float value;
    uint32_t padding2;     // Align to 8-byte boundary
};

Pros:

  • Crystal clear what's happening

  • No compiler magic or attributes

  • Works across all compilers

  • Easiest to review and maintain

Cons:

  • Slightly verbose

  • Must manually calculate padding

When to use: This is the safest approach for kernel/userspace interfaces. Always prefer explicit padding for production code.
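One way to keep the explicit padding honest is to lock the layout with compile-time checks in the shared header. The sketch below assumes a C11 compiler. The sensor_init() helper is hypothetical and only shows that padding fields should be zeroed before the structure is sent.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>

struct sensor {
    uint32_t sensor_id;
    uint32_t padding1;   /* explicit padding, keep zero */
    uint64_t timestamp;
    float    value;
    uint32_t padding2;   /* explicit tail padding, keep zero */
};

/* Fail the build on any ABI where the layout drifts. */
static_assert(sizeof(struct sensor) == 24, "struct sensor must be 24 bytes");
static_assert(offsetof(struct sensor, timestamp) == 8, "timestamp at offset 8");
static_assert(offsetof(struct sensor, value) == 16, "value at offset 16");

/* Hypothetical helper: clear the whole structure, padding included,
 * then fill in the real fields. */
static void sensor_init(struct sensor *s, uint32_t id, uint64_t ts, float value)
{
    memset(s, 0, sizeof(*s));
    s->sensor_id = id;
    s->timestamp = ts;
    s->value = value;
}
```

Because every byte is now a named field, the same assertions pass in a 32-bit and a 64-bit build, and a future field change that breaks the layout stops at compile time.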

  2. Packed Attribute:

    Tell the compiler to pack the structure tightly with no padding:

// Both Kernel space and userspace structure defined with packed attribute
struct sensor {
    uint32_t sensor_id;
    uint64_t timestamp;
    float value;
}__attribute__((packed));

Pros:

  • Minimal size (16 bytes instead of 24)

  • No manual padding calculation

Cons:

  • Unaligned access penalty: On some architectures (ARM, older MIPS), accessing unaligned data is slower or can cause bus errors

  • The CPU may need multiple memory accesses to read a single 64-bit field

  • The performance impact depends on the architecture and access pattern, so measure it on your target hardware

  • Not all compilers support this attribute (though GCC and Clang do)

When to use: When memory is extremely tight and you've profiled that the performance penalty is acceptable. Common in network protocol headers and flash storage formats.
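A short sketch of the packed variant, assuming GCC or Clang. The struct name sensor_packed and the accessor are illustrative. The point is that fields should be read through the structure, so the compiler knows the access may be unaligned and can generate code for it.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct sensor_packed {
    uint32_t sensor_id;
    uint64_t timestamp;   /* sits at offset 4, not 8-byte aligned */
    float    value;
} __attribute__((packed));

static_assert(sizeof(struct sensor_packed) == 16, "no padding anywhere");
static_assert(offsetof(struct sensor_packed, timestamp) == 4, "timestamp at 4");

/* Access through the struct type: the compiler emits an unaligned-safe
 * load. Avoid storing &s->timestamp in a plain uint64_t pointer, because
 * code using that pointer will assume 8-byte alignment. */
static uint64_t sensor_timestamp(const struct sensor_packed *s)
{
    return s->timestamp;
}
```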

  3. #pragma pack:

// Kernel space and userspace data structure with pragma pack directive
#pragma pack(push, 1)
struct sensor {
    uint32_t sensor_id;
    uint64_t timestamp;
    float value;
};
#pragma pack(pop)

The push, 1 sets packing to 1-byte alignment, and pop restores the previous setting.

Pros:

  • More portable than attribute

  • Can control packing level (1, 2, 4, 8 bytes)

  • Cleaner syntax for multiple structures

Cons:

  • Same unaligned access penalties as packed attribute

  • Less visible than explicit padding (hidden in pragmas)

  • Easy to forget the pop and affect other structures

When to use: When you need to pack multiple structures and want finer control over alignment, or when working with compilers that don't support GCC attributes.

Performance Impact

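Solution         | Size          | Access Speed     | Portability
-----------------|---------------|------------------|------------
Explicit Padding | Larger (24B)  | Fast (aligned)   | Excellent
Packed Attribute | Smaller (16B) | Slow (unaligned) | Good
Optimal Ordering | Smaller (16B) | Fast (aligned)   | Excellent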

For kernel interfaces, always prioritize correctness over size. The extra 8 bytes of padding is negligible compared to the cost of debugging data corruption in production.
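The "Optimal Ordering" row in the table is not spelled out above. One reading, offered here as an assumption rather than the author's definition, is to sort the fields from largest to smallest alignment so that no padding is needed at all. The struct name sensor_ordered is hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* 64-bit field first, 32-bit fields after it: every field is naturally
 * aligned and the total is already a multiple of 8. */
struct sensor_ordered {
    uint64_t timestamp;   /* offset 0  */
    uint32_t sensor_id;   /* offset 8  */
    float    value;       /* offset 12 */
};

static_assert(sizeof(struct sensor_ordered) == 16, "no holes, no tail padding");
static_assert(offsetof(struct sensor_ordered, sensor_id) == 8, "sensor_id at 8");
static_assert(offsetof(struct sensor_ordered, value) == 12, "value at 12");
```

This only works when you are free to choose the field order. An interface that has already shipped cannot be reordered without breaking existing binaries.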

Challenge 2: Pointer and Payload Transmission

The Problem

Here's a mistake that many developers make when they first write kernel interfaces. They define a structure like this:

struct payload {
    uint32_t size;
    void *data;
};

Then they write userspace code that creates this structure, fills in the pointer, and sends it to the kernel via ioctl:

// Userspace application
uint32_t sensor_data = 0xDEADBEEF;
struct payload info;

info.data = &sensor_data;  // Pointer to userspace memory
info.size = sizeof(sensor_data);

// Send to kernel
ioctl(fd, MY_IOCTL_CMD, &info);

In the kernel module, they try to dereference the pointer:

// Kernel module
struct payload info;
copy_from_user(&info, (void *)arg, sizeof(struct payload));

// Try to access the data
uint32_t value = *(uint32_t *)(info.data);  // CRASH or garbage!
printk(KERN_INFO "Value: 0x%x\n", value);

This code has two fundamental problems, and understanding them is critical for writing correct kernel interfaces.

Problem 1: Address Space Separation

The first and most important issue: userspace pointers are meaningless in kernel space. Even if both were running at the same bitness (both 64-bit or both 32-bit), this code would still fail.

Linux uses virtual memory. Every process has its own virtual address space. When userspace says "my data is at address 0x12345000", that address only has meaning within that process's address space, and the kernel has no guarantee about what is behind it. The page might not be mapped, might be swapped out, or the value might have been chosen by a malicious program to point at kernel memory. On many systems, hardware protections such as SMAP on x86 or PAN on arm64 also make a direct kernel access to user memory fault.

If the kernel tries to directly dereference a userspace pointer, one of these things happens:

  • Page fault and kernel panic - The address isn't mapped in kernel space

  • Reading wrong data - The address points to different memory in kernel space

  • Security violation - The kernel might accidentally read privileged kernel memory

This is why Linux provides copy_from_user() and copy_to_user() functions—they safely transfer data between address spaces.

Problem 2: Pointer Size Mismatch

The second issue appears specifically in cross-architecture scenarios. In your structure:

struct payload {
    uint32_t size;
    void *data;    // 4 bytes in 32-bit, 8 bytes in 64-bit
};

The pointer size differs:

  • 32-bit userspace: void * is 4 bytes → structure is 8 bytes total

  • 64-bit kernel: void * is 8 bytes → structure is 16 bytes total (with padding)

When userspace sends this structure to the kernel, the memory layouts don't match:

32-bit Userspace:
Offset:  0       4       8
        ┌───────┬───────┐
        │ size  │ data  │
        │  4B   │  4B   │
        └───────┴───────┘
Total: 8 bytes

64-bit Kernel:
Offset:  0       4       8               16
        ┌───────┬───────┬───────────────┐
        │ size  │ PAD   │     data      │
        │  4B   │  4B   │      8B       │
        └───────┴───────┴───────────────┘
Total: 16 bytes

The kernel reads garbage for the pointer value because it's reading from the wrong offset.
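The two sizes follow directly from the alignment rule. The sketch below computes them for a given pointer width and compares the raw-pointer structure with a fixed-width alternative. The names payload_raw, payload_fixed, align_up and raw_payload_size are illustrative, and the calculation assumes pointers are aligned to their own size, which holds on common Linux ABIs.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

struct payload_raw {      /* layout depends on sizeof(void *) */
    uint32_t size;
    void    *data;
};

struct payload_fixed {    /* same layout on 32-bit and 64-bit builds */
    uint32_t size;
    uint32_t padding;
    uint64_t data_ptr;
};

static_assert(sizeof(struct payload_fixed) == 16, "fixed layout is 16 bytes");

/* Round n up to the next multiple of a. */
static size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/* Size of struct payload_raw for a given pointer width, assuming the
 * pointer's alignment equals its size. */
static size_t raw_payload_size(size_t ptr_size)
{
    size_t data_offset = align_up(sizeof(uint32_t), ptr_size);

    return align_up(data_offset + ptr_size, ptr_size);
}
```

With a 4-byte pointer the function returns 8, and with an 8-byte pointer it returns 16, matching the diagrams.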

Complete Example: The Wrong Way

Let's see a complete example that demonstrates both problems:

Userspace (wrong_userspace.c):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <stdint.h>

#define IOCTL_GET_DATA _IOR('M', 0, struct payload)

struct payload {
    uint32_t size;
    void *data;
};

int main() {
    int fd;
    struct payload info;
    uint32_t sensor_data = 0xDEADBEEF;

    // Set up payload with userspace pointer
    info.data = &sensor_data;
    info.size = sizeof(sensor_data);

    printf("Userspace: data address = %p\n", info.data);
    printf("Userspace: data value = 0x%x\n", sensor_data);
    printf("Userspace: sizeof(payload) = %zu\n", sizeof(struct payload));

    fd = open("/dev/my_device", O_RDWR);
    if (fd < 0) {
        perror("Failed to open device");
        return -1;
    }

    // This will fail or produce garbage in the kernel
    if (ioctl(fd, IOCTL_GET_DATA, &info) < 0) {
        perror("ioctl failed");
        close(fd);
        return -1;
    }

    close(fd);
    return 0;
}

Kernel Module (wrong_kernel.c):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
#include <linux/device.h>

#define IOCTL_GET_DATA _IOR('M', 0, struct payload)

struct payload {
    uint32_t size;
    void *data;
};

static long device_ioctl(struct file *file, unsigned int cmd, unsigned long arg) {
    struct payload info;
    uint32_t value;

    if (cmd != IOCTL_GET_DATA)
        return -EINVAL;

    // Copy the payload structure itself
    if (copy_from_user(&info, (void *)arg, sizeof(struct payload))) {
        pr_err("Failed to copy payload from userspace\n");
        return -EFAULT;
    }

    pr_info("Kernel: sizeof(payload) = %zu\n", sizeof(struct payload));
    pr_info("Kernel: data pointer = %p\n", info.data);

    // WRONG: Trying to dereference userspace pointer directly
    value = *(uint32_t *)(info.data);  // CRASH or garbage!
    pr_info("Kernel: data value = 0x%x\n", value);

    return 0;
}

static const struct file_operations device_fops = {
    .unlocked_ioctl = device_ioctl,
};

// ... device registration code ...

What happens:

Userspace: data address = 0xbffff7a4
Userspace: data value = 0xdeadbeef
Userspace: sizeof(payload) = 8

Kernel: sizeof(payload) = 16
Kernel: data pointer = 0xf7a400000000  // Garbage! Wrong offset
Kernel: Oops: 0000 [#1] SMP          // Kernel panic

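The kernel reads the pointer from the wrong offset (due to size mismatch) and then tries to dereference it (crossing address spaces), causing a crash. Note that on a real 64-bit kernel, ioctl calls from a 32-bit process are routed to the driver's .compat_ioctl handler rather than .unlocked_ioctl, and _IOR() encodes sizeof(struct payload) into the command number, so this particular request would more likely be rejected before reaching the dereference. The output above shows what the mismatched layout does once the handler runs.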
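The size mismatch also leaks into the ioctl command number itself, because the _IOR() macro packs the size of the argument type into the value. A minimal sketch, assuming a Linux host with the kernel's userspace headers installed. The two stand-in structs and the function names are hypothetical and simply reproduce the 8-byte and 16-byte sizes on any machine.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>   /* _IOR, _IOC_SIZE */

/* Stand-ins for struct payload as each side sees it. */
struct payload_as_32bit { uint32_t size; uint32_t data; };               /*  8 bytes */
struct payload_as_64bit { uint32_t size; uint32_t pad; uint64_t data; }; /* 16 bytes */

/* Command number the 32-bit application computes. */
static unsigned long cmd_from_32bit_app(void)
{
    return _IOR('M', 0, struct payload_as_32bit);
}

/* Command number the 64-bit kernel module computes. */
static unsigned long cmd_in_64bit_kernel(void)
{
    return _IOR('M', 0, struct payload_as_64bit);
}
```

The two functions return different numbers, and _IOC_SIZE() recovers 8 from one and 16 from the other, so a handler that compares cmd against its own definition never matches the request from the 32-bit application.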

Solution:

Now let's look at three correct ways to handle data transfer between userspace and kernel.

1. Pass Data Directly (Inline Data)

For small amounts of data, embed it directly in the structure instead of using a pointer:

Corrected Structures:

#define MAX_DATA_SIZE 64

struct payload {
    uint32_t size;
    uint32_t padding;      // Explicit padding for 64-bit alignment
    uint8_t data[MAX_DATA_SIZE];  // Inline data, not pointer
};

Userspace:

struct payload info;
uint32_t sensor_data = 0xDEADBEEF;

// Copy data directly into the structure
info.size = sizeof(sensor_data);
memcpy(info.data, &sensor_data, sizeof(sensor_data));

ioctl(fd, IOCTL_GET_DATA, &info);

Kernel:

struct payload info;
uint32_t value;

if (copy_from_user(&info, (void __user *)arg, sizeof(struct payload)))
    return -EFAULT;

// Data is already in kernel space, safe to access
memcpy(&value, info.data, sizeof(value));
pr_info("Kernel: data value = 0x%x\n", value);

Pros:

  • Simple and straightforward

  • No pointer issues

  • Single copy operation

  • No size mismatch problems

Cons:

  • Limited to fixed maximum size

  • Wastes memory if actual data is smaller

  • Not suitable for large or variable-sized data

When to use: For small, fixed-size data like sensor readings, configuration values, or status information.
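On the userspace side, a small helper can enforce the size limit and clear the padding before the structure is sent. This is a sketch under the structure definition shown above. The payload_set() name and its return convention are assumptions for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

#define MAX_DATA_SIZE 64

struct payload {
    uint32_t size;
    uint32_t padding;
    uint8_t  data[MAX_DATA_SIZE];
};

static_assert(sizeof(struct payload) == 8 + MAX_DATA_SIZE, "fixed 72-byte layout");

/* Hypothetical helper: returns 0 on success, -1 if len does not fit.
 * Zeroing first keeps padding and unused data bytes deterministic. */
static int payload_set(struct payload *p, const void *src, size_t len)
{
    if (len > MAX_DATA_SIZE)
        return -1;
    memset(p, 0, sizeof(*p));
    p->size = (uint32_t)len;
    memcpy(p->data, src, len);
    return 0;
}
```

A caller would fill the structure with payload_set(&info, &sensor_data, sizeof(sensor_data)) and pass &info to ioctl() only when the helper returns 0.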

2. Use copy_from_user() with Userspace Pointer

For larger or variable-sized data, keep the pointer but use it correctly with copy_from_user():

Structures (Fixed-Size Types):

struct payload {
    uint32_t size;
    uint32_t padding;      // Explicit padding
    uint64_t data_ptr;     // Always 64-bit, stores userspace address
};

Notice we use uint64_t instead of void *. This ensures consistent size across architectures.

Userspace:

struct payload info;
uint32_t sensor_data = 0xDEADBEEF;

info.size = sizeof(sensor_data);
info.data_ptr = (uint64_t)(uintptr_t)&sensor_data;  // Store address as 64-bit value

ioctl(fd, IOCTL_GET_DATA, &info);

Kernel:

struct payload info;
uint32_t *kernel_buffer;
uint32_t value;

if (copy_from_user(&info, (void __user *)arg, sizeof(struct payload)))
    return -EFAULT;
// Validate size: large enough for the value read below, small enough to allocate
if (info.size < sizeof(*kernel_buffer) || info.size > MAX_ALLOWED_SIZE)
    return -EINVAL;
// Allocate kernel buffer
kernel_buffer = kmalloc(info.size, GFP_KERNEL);
if (!kernel_buffer)
    return -ENOMEM;
// Safely copy data from userspace to kernel space
// (u64_to_user_ptr() converts the 64-bit value back into a __user pointer)
if (copy_from_user(kernel_buffer, u64_to_user_ptr(info.data_ptr), info.size)) {
    kfree(kernel_buffer);
    return -EFAULT;
}
// Now we can safely access the data
value = *kernel_buffer;
pr_info("Kernel: data value = 0x%x\n", value);

kfree(kernel_buffer);

Pros:

  • Handles variable-sized data

  • Memory efficient

  • Industry standard approach

Cons:

  • Requires two copy operations (structure, then data)

  • More complex error handling

  • Must allocate kernel memory

When to use: For large buffers, variable-sized data, or when memory efficiency matters.
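The double cast on the userspace side is worth isolating. Going through uintptr_t first gives a zero-extended address on a 32-bit build, whereas casting a pointer straight to a wider integer is compiler-defined and GCC documents it as sign-extending. The helper names below are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

struct payload {
    uint32_t size;
    uint32_t padding;
    uint64_t data_ptr;   /* userspace address stored as a 64-bit value */
};

static_assert(sizeof(struct payload) == 16, "same size on 32-bit and 64-bit");

/* Pointer -> uintptr_t -> uint64_t: well defined and zero-extended. */
static uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Hypothetical helper: describe a buffer without leaving padding unset. */
static struct payload payload_for(const void *buf, uint32_t len)
{
    struct payload p = { .size = len, .padding = 0, .data_ptr = ptr_to_u64(buf) };

    return p;
}
```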

3. Flexible Array Member (Variable-Length Payload)

For variable-sized data where you want to send everything in a single buffer, use a flexible array member (older code often writes this as a zero-length array):

struct payload {
    uint32_t size;
    uint32_t padding;
    uint8_t data[];        // Flexible array member (C99 and later)
    // or: uint8_t data[0]; // Old GCC extension, same effect
};

This is a special C feature where the array at the end of the structure has no fixed size. You allocate the structure with extra space for the actual data.

Userspace:

uint32_t sensor_data = 0xDEADBEEF;
size_t total_size = sizeof(struct payload) + sizeof(sensor_data);
struct payload *info;

// Allocate structure + data in one block
info = malloc(total_size);
if (!info) {
    perror("malloc failed");
    return -1;
}

info->size = sizeof(sensor_data);
memcpy(info->data, &sensor_data, sizeof(sensor_data));

// Send entire buffer in one ioctl
if (ioctl(fd, IOCTL_GET_DATA, info) < 0) {
    perror("ioctl failed");
    free(info);
    return -1;
}

free(info);

Kernel:

static long device_ioctl(struct file *file, unsigned int cmd, unsigned long arg) {
    struct payload *info;
    uint32_t value;
    size_t total_size;

    // First, get just the header to know the size
    struct payload header;
    if (copy_from_user(&header, (void __user *)arg, sizeof(struct payload)))
        return -EFAULT;

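    // Validate size: the handler reads a uint32_t from the data below
    if (header.size < sizeof(value) || header.size > MAX_ALLOWED_SIZE)
        return -EINVAL;

    // Calculate total size
    total_size = sizeof(struct payload) + header.size;

    // Allocate kernel buffer for entire payload
    info = kmalloc(total_size, GFP_KERNEL);
    if (!info)
        return -ENOMEM;

    // Keep the header that was already validated and copy only the data
    // that follows it. Copying the header a second time would let userspace
    // change size between the check and the use (a double-fetch bug).
    memcpy(info, &header, sizeof(header));
    if (copy_from_user(info->data,
                       (const u8 __user *)arg + sizeof(header),
                       header.size)) {
        kfree(info);
        return -EFAULT;
    }

    // Now safely access the data
    memcpy(&value, info->data, sizeof(value));
    pr_info("Kernel: data value = 0x%x\n", value);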

    kfree(info);
    return 0;
}

How It Works:

The flexible array member data[] doesn't add to the structure's size:

sizeof(struct payload) = 8 bytes  // just size + padding

But you can allocate extra memory and access it through the array:

malloc(sizeof(struct payload) + 100);  // 8 + 100 = 108 bytes total
info->data[0] through info->data[99] are all valid

Pros:

  • Single memory allocation in userspace

  • Single contiguous buffer handed to the kernel in one ioctl call

  • Natural for variable-sized data

  • Efficient for sending buffers of different sizes

  • No separate pointer indirection

Cons:

  • Requires two-step copy in kernel (header first to get size, then full payload)

  • More complex memory management

  • Must carefully calculate total size

  • Old compilers might not support data[] syntax (use data[0] instead)

When to use: When you need variable-sized payloads and want header and data to travel in one contiguous buffer instead of through a separate pointer. Common in network protocols, file operations, and variable-length messages.

Important Note: In modern C (C99 and later), use uint8_t data[] (flexible array member). For older code, you might see uint8_t data[0] (GNU extension), which works the same way but isn't standard C. Both achieve the same result.
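The allocation arithmetic on the userspace side can be wrapped in a helper so the total size is computed in exactly one place. A minimal sketch, with payload_alloc() as a hypothetical name. The caller owns the returned block and must free() it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct payload {
    uint32_t size;
    uint32_t padding;
    uint8_t  data[];   /* flexible array member */
};

/* Hypothetical helper: build header + data in one zeroed heap block.
 * Returns NULL on allocation failure or size overflow. */
static struct payload *payload_alloc(const void *src, uint32_t len,
                                     size_t *total_out)
{
    size_t total = sizeof(struct payload) + (size_t)len;
    struct payload *p;

    if (total < sizeof(struct payload))   /* wrapped around on a 32-bit size_t */
        return NULL;
    p = calloc(1, total);
    if (!p)
        return NULL;
    p->size = len;
    if (len)
        memcpy(p->data, src, len);
    if (total_out)
        *total_out = total;
    return p;
}
```

For a 4-byte sensor value the helper reports a total of 12 bytes: the 8-byte header plus the data.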

Comparison of Solutions

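Approach                | Max Data Size    | Copies | Memory    | Complexity | Use Case
------------------------|------------------|--------|-----------|------------|---------------------------
Inline Data             | Fixed (small)    | 1      | Higher    | Low        | Sensor data, configs
copy_from_user()        | Variable (large) | 2      | Efficient | Medium     | Buffers, variable data
Flexible Array (data[]) | Variable (large) | 2*     | Efficient | Medium     | Network protocols, messages

*Requires a header copy, then a payload copy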


Best Practices for Cross-Architecture Interfaces

After understanding the challenges, here are the essential practices for building robust kernel interfaces:

1. Always Use Fixed-Size Types

// WRONG - sizes vary between ABIs
struct bad_example {
    int value;           // 32-bit on Linux, but not guaranteed by C
    long timestamp;      // 32-bit or 64-bit?
    size_t count;        // 32-bit or 64-bit?
    void *ptr;           // 4 bytes or 8 bytes?
};

// CORRECT - explicit sizes, 64-bit fields first so no implicit padding appears
struct good_example {
    uint64_t timestamp;  // Always 64-bit
    uint64_t ptr;        // Store address as 64-bit value
    uint32_t value;      // Always 32-bit
    uint32_t count;      // Always 32-bit
};

2. Add Explicit Padding

struct sensor {
    uint32_t sensor_id;
    uint32_t padding1;      // Align timestamp to 8 bytes
    uint64_t timestamp;
    float value;
    uint32_t padding2;      // Pad to 8-byte boundary
};

3. Never Pass Raw Pointers Across the Boundary

A userspace pointer must never be dereferenced directly in the kernel, and a void * field changes size between 32-bit and 64-bit builds. Always use:

  • Inline data for small payloads

  • copy_from_user() / copy_to_user() for data transfer

  • uint64_t to store addresses if you must pass them

  • Flexible arrays (data[]) for variable-length payloads

4. Validate Everything in the Kernel and Use Static Assertions

// Check size limits
if (info.size == 0 || info.size > MAX_ALLOWED_SIZE)
    return -EINVAL;

// Optional early range check; copy_from_user() performs the same check itself
if (!access_ok(user_ptr, size))
    return -EFAULT;

// Lock the ABI at compile time (struct payload here is the uint64_t data_ptr variant)
_Static_assert(sizeof(struct payload) == 16,
               "payload structure size must be 16 bytes");

_Static_assert(offsetof(struct sensor, timestamp) == 8,
               "timestamp must be at offset 8");

Debugging Cross-Architecture Issues

When things go wrong, these techniques will help you find the problem quickly.

1. Add Size Checks at Runtime

Compare structure layouts between userspace and kernel:

// Userspace
printf("Userspace: sizeof(payload)=%zu, offset(data)=%zu\n",
       sizeof(struct payload), offsetof(struct payload, data));

// Kernel
pr_info("Kernel: sizeof(payload)=%zu, offset(data)=%zu\n",
        sizeof(struct payload), offsetof(struct payload, data));

2. Cross-Compile and Test Both Architectures

Don't wait until production to discover architecture issues:

# Compile for 32-bit
gcc -m32 -o myapp32 myapp.c

# Compile for 64-bit
gcc -m64 -o myapp64 myapp.c

# Run both and compare output
./myapp32  # Check sizeof() output
./myapp64  # Should match expected values

# For kernel modules, use a cross-compilation toolchain
# (run from the kernel source tree, or add -C /path/to/kernel/source)
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- M=/path/to/module modules
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- M=/path/to/module modules

3. Dump Memory Layout

When data looks corrupted, dump the raw bytes:

void dump_payload(const struct payload *p) {
    const unsigned char *bytes = (const unsigned char *)p;
    size_t i;

    pr_info("Payload bytes:");
    for (i = 0; i < sizeof(struct payload); i++)
        pr_cont(" %02x", bytes[i]);
    pr_cont("\n");
}

Compare the hex dump from userspace and kernel to see where the mismatch occurs.
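For the userspace half of that comparison, a matching dump routine helps keep the two outputs in the same format. The sketch below formats bytes into a caller-supplied string. The hex_dump() name and its return convention are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write len bytes as "xx xx xx" into out.
 * Returns the number of characters written, or -1 if out is too small. */
static int hex_dump(const void *buf, size_t len, char *out, size_t out_size)
{
    const unsigned char *bytes = buf;
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        int n = snprintf(out + used, out_size - used, "%s%02x",
                         i ? " " : "", bytes[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;   /* output would have been truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

Calling it on the structure right before ioctl(), for example hex_dump(&info, sizeof(info), line, sizeof(line)), gives a line that can be diffed against the kernel log byte for byte.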


Conclusion

Cross-architecture communication between 32-bit userspace and 64-bit kernel comes down to two fundamental rules: structures must have identical memory layouts, and pointers cannot cross the boundary. Use fixed-size types, add explicit padding, and always transfer data through copy_from_user() or inline it directly. When things break, tools like pahole, sparse, and cross-compilation testing will quickly reveal where your assumptions diverged from reality.

The key insight: you're designing an ABI, not just an API. Every byte matters. Get it wrong, and you're debugging crashes at 3 AM. Get it right once with these practices, and your interface works reliably for years across any architecture. Think about portability from day one—your future self will thank you.

