Mastering Portability: Solving Cross-Architecture Challenges in Linux Userspace and Kernel

You've just spent three hours debugging why your userspace application crashes every time it talks to your kernel module. The ioctl calls return garbage data, pointers that seem correct in the application arrive corrupted, and structure fields contain completely wrong values. You check your code once, twice, ten times, and everything looks correct. These bugs are not exotic corner cases: they appear in real drivers, real production systems, and especially in embedded environments where a single kernel image must support both 32-bit and 64-bit applications. Welcome to the world of cross-architecture communication nightmares.
Mixing 32-bit userspace with 64-bit kernels is common in embedded systems, industrial controllers, and legacy enterprise environments. Maybe you're maintaining a 20-year-old application that customers refuse to recompile. Maybe your embedded device has memory constraints that make 32-bit binaries attractive. Maybe you're building a system that needs to support both old and new hardware. Whatever the reason, the moment your userspace and kernel operate at different bitness levels, you enter a minefield of portability issues.
Why This Problem Exists
When a 32-bit process communicates with a 64-bit kernel, you're crossing a fundamental boundary. These two environments have different assumptions about:
Pointer sizes: 4 bytes vs 8 bytes
Data alignment requirements: For example, the i386 ABI aligns 64-bit integers to 4 bytes, while x86_64 aligns them to 8
Structure padding: Compilers add different padding to satisfy those alignment rules
Pointer representation: A 32-bit address must be widened to the kernel's 64-bit pointer width before the kernel can use it
The kernel and userspace must agree on the exact memory layout of every shared data structure. Get it wrong by even a single byte, and your data becomes corrupted. Miss a padding byte, and your kernel reads from the wrong offset. Pass a pointer thinking it will work across the boundary, and you'll crash the system.
What You'll Learn
In this blog, we'll tackle the most common portability pitfalls when your userspace and kernel speak different architectural languages. You'll learn how to detect these issues, understand why they happen, and apply proven solutions that work in production systems. Each challenge includes real code, debugging techniques, and practical fixes you can use immediately.
Let's start with the most fundamental issue: data structure alignment.
Challenge 1: Data Structure Alignment
The Problem
Imagine you define a simple structure in your userspace application to communicate sensor data to your kernel module:
struct sensor {
uint32_t sensor_id; // 4 bytes
uint64_t timestamp; // 8 bytes
float value; // 4 bytes
};
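You calculate the size: 4 + 8 + 4 = 16 bytes. Simple, right? When you print sizeof(struct sensor) in your 32-bit x86 (i386) userspace application, you do get 16 bytes. In your x86_64 kernel module, however, you get 24 bytes. Your structure grew by 50% just by crossing the architecture boundary. (On 32-bit ARM, the EABI already aligns uint64_t to 8 bytes, so both sides happen to agree on 24 bytes for this particular structure; i386 versus x86_64 is the classic failing case.)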
This isn't a compiler bug or cosmic rays flipping bits. This is structure padding, and it's going to corrupt your data if you don't handle it correctly.
Why Padding Happens
Modern CPUs access memory most efficiently when values are naturally aligned, and each ABI specifies how strictly the compiler must align every type. The x86_64 ABI aligns 64-bit values to 8 bytes, while the older i386 ABI aligns them to only 4 inside structures. When your structure contains a 64-bit field like timestamp, the compiler adds invisible padding bytes to meet its ABI's requirement.
Here's what actually happens in memory:
32-bit Userspace Layout:
Offset: 0 4 12 16
┌───────┬───────────────┬───────┐
│ s_id │ timestamp │ value │
│ 4B │ 8B │ 4B │
└───────┴───────────────┴───────┘
Total: 16 bytes
The i386 compiler places timestamp at offset 4. That is legal there, because the i386 ABI expects only 4-byte alignment for 64-bit integers inside structures.
64-bit Kernel Layout:
Offset: 0 4 8 16 20 24
┌───────┬───────┬───────────────┬───────┬───────┐
│ s_id │ PAD │ timestamp │ value │ PAD │
│ 4B │ 4B │ 8B │ 4B │ 4B │
└───────┴───────┴───────────────┴───────┴───────┘
Total: 24 bytes
The 64-bit compiler adds 4 bytes of padding after sensor_id to ensure timestamp starts at offset 8 (8-byte aligned). It also adds 4 bytes of padding at the end to make the total structure size a multiple of its largest alignment requirement (8 bytes).
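To see where the 24 comes from, here is a small model of the two rules just described: each field starts at the next multiple of its alignment, and the total size is rounded up to the largest alignment in the structure. It is a simplified sketch, not how a compiler is implemented, and it assumes 4-byte alignment for uint32_t and float, with uint64_t aligned to 4 bytes (i386) or 8 bytes (x86_64). The function names are illustrative.

```c
#include <stddef.h>

/* Round off up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/*
 * Model the layout of struct sensor { uint32_t; uint64_t; float; }
 * for an ABI where uint64_t has alignment u64_align (4 on i386, 8 on x86_64).
 * Stores each field's offset in offsets[] and returns the structure size.
 */
static size_t sensor_layout(size_t u64_align, size_t offsets[3])
{
    const size_t sizes[3]  = { 4, 8, 4 };          /* sensor_id, timestamp, value */
    const size_t aligns[3] = { 4, u64_align, 4 };
    size_t off = 0, max_align = 1;

    for (int i = 0; i < 3; i++) {
        off = align_up(off, aligns[i]);   /* padding inserted before the field */
        offsets[i] = off;
        off += sizes[i];
        if (aligns[i] > max_align)
            max_align = aligns[i];
    }
    /* Tail padding: the size must be a multiple of the strictest alignment. */
    return align_up(off, max_align);
}
```

With a uint64_t alignment of 4 the model yields 16 bytes with timestamp at offset 4; with 8 it yields 24 bytes with timestamp at offset 8, matching the two diagrams.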
What Goes Wrong at Runtime
When your userspace sends this structure to the kernel via an ioctl or write call, the kernel reads it with the wrong layout:
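// Userspace (32-bit, i386) writes 16 bytes:
// bytes 0-3: sensor_id | bytes 4-11: timestamp | bytes 12-15: value
// Kernel (64-bit, x86_64) reads 24 bytes:
// bytes 0-3: sensor_id (correct)
// bytes 4-7: "padding" (really the first four bytes of timestamp, discarded)
// bytes 8-15: "timestamp" (last four bytes of timestamp + all of value)
// bytes 16-19: "value" (whatever follows the structure in user memory)
// bytes 20-23: "padding" (more memory beyond the structure)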
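The kernel treats bytes 4-7 as padding and discards them, then builds timestamp from the second half of the real timestamp plus the bytes of value. It also copies 24 bytes where userspace supplied only 16, so value comes from whatever memory follows the structure in the application. Your timestamp becomes corrupted, your value reads garbage, and you spend hours wondering why 0xDEADBEEF keeps appearing in your logs.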
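The misread can be reproduced in plain userspace C without a kernel. This sketch serializes the fields at the i386 offsets, then decodes the same bytes at the x86_64 offsets, as a 64-bit kernel would. The helper names and the zero-filled tail (standing in for whatever memory follows the real structure) are assumptions of the demo.

```c
#include <stdint.h>
#include <string.h>

/* Field offsets taken from the two layout diagrams. */
enum { I386_TS = 4, I386_VAL = 12 };
enum { X64_TS = 8, X64_VAL = 16, X64_SIZE = 24 };

struct sensor_fields {
    uint32_t sensor_id;
    uint64_t timestamp;
    float value;
};

/* Serialize the fields at i386 offsets. Bytes 16..23 are zeroed here;
 * in a real program they would be whatever follows the 16-byte struct. */
static void write_i386(const struct sensor_fields *in, unsigned char buf[X64_SIZE])
{
    memset(buf, 0, X64_SIZE);
    memcpy(buf, &in->sensor_id, 4);
    memcpy(buf + I386_TS, &in->timestamp, 8);
    memcpy(buf + I386_VAL, &in->value, 4);
}

/* Decode the same 24 bytes using x86_64 offsets, as a 64-bit kernel would. */
static void read_x86_64(const unsigned char buf[X64_SIZE], struct sensor_fields *out)
{
    memcpy(&out->sensor_id, buf, 4);
    memcpy(&out->timestamp, buf + X64_TS, 8);
    memcpy(&out->value, buf + X64_VAL, 4);
}
```

Only sensor_id survives the round trip: timestamp is spliced from two unrelated halves, and value is read from the tail beyond the 16 bytes userspace actually wrote.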
How to Detect This Issue
1. Compare sizeof() or offsetof() on Both Sides
Add debug prints in both userspace and kernel to check structure size and field positions:
// Userspace
#include <stddef.h> // offsetof()
#include <stdio.h> // printf()
printf("Userspace sizeof(struct sensor): %zu\n", sizeof(struct sensor));
printf("timestamp offset: %zu\n", offsetof(struct sensor, timestamp));
printf("value offset: %zu\n", offsetof(struct sensor, value));
// Kernel
printk(KERN_INFO "Kernel sizeof(struct sensor): %zu\n", sizeof(struct sensor));
printk(KERN_INFO "timestamp offset: %zu\n", offsetof(struct sensor, timestamp));
printk(KERN_INFO "value offset: %zu\n", offsetof(struct sensor, value));
Different sizes or offsets = alignment problem. The offsetof() macro is particularly useful because it shows you exactly where each field lives in memory, revealing hidden padding.
2. Use pahole Tool
The pahole tool (from the dwarves package) shows you exactly how the compiler laid out your structure:
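# Compile your code with debug symbols (add -m32 to inspect the 32-bit layout;
# without it, an x86_64 host shows the 64-bit layout below)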
gcc -g -o myapp myapp.c
# Examine structure layout
pahole myapp
# Output shows:
struct sensor {
uint32_t sensor_id; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
uint64_t timestamp; /* 8 8 */
float value; /* 16 4 */
/* size: 24, cachelines: 1, members: 3 */
/* sum members: 16, holes: 1, sum holes: 4 */
/* padding: 4 */
/* last cacheline: 24 bytes */
};
Solution
Now that you can detect the problem, here are three ways to fix it, with their trade-offs:
Explicit Padding (Recommended)
Add padding fields manually to the structure definition shared by userspace and the kernel, so that the 32-bit compiler produces exactly the 64-bit layout:
struct sensor {
uint32_t sensor_id;
uint32_t padding1; // Explicit padding
uint64_t timestamp;
float value;
uint32_t padding2; // Align to 8-byte boundary
};
Pros:
Crystal clear what's happening
No compiler magic or attributes
Works across all compilers
Easiest to review and maintain
Cons:
Slightly verbose
Must manually calculate padding
When to use: This is the safest approach for kernel/userspace interfaces. Always prefer explicit padding for production code.
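A sketch of the padded structure with compile-time checks added: _Static_assert (C11) turns any layout drift into a build error on whichever side is compiled with a different layout. Zeroing the structure before filling it keeps stack garbage out of the padding fields; a common convention is for the kernel to reject non-zero padding so those bytes can be reused in a later version of the interface. The sensor_init() helper is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sensor {
    uint32_t sensor_id;
    uint32_t padding1;   /* explicit: keeps timestamp at offset 8 on every ABI */
    uint64_t timestamp;
    float    value;
    uint32_t padding2;   /* explicit: keeps sizeof at 24 on every ABI */
};

/* The build fails if any ABI lays the structure out differently. */
_Static_assert(sizeof(struct sensor) == 24, "struct sensor must be 24 bytes");
_Static_assert(offsetof(struct sensor, timestamp) == 8, "timestamp must be at offset 8");
_Static_assert(offsetof(struct sensor, value) == 16, "value must be at offset 16");

/* Zero the whole structure so the padding fields never carry stale bytes. */
static void sensor_init(struct sensor *s, uint32_t id, uint64_t ts, float v)
{
    memset(s, 0, sizeof(*s));
    s->sensor_id = id;
    s->timestamp = ts;
    s->value = v;
}
```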
Packed Attribute
Tell the compiler to pack the structure tightly with no padding:
// Both Kernel space and userspace structure defined with packed attribute
struct sensor {
uint32_t sensor_id;
uint64_t timestamp;
float value;
}__attribute__((packed));
Pros:
Minimal size (16 bytes instead of 24)
No manual padding calculation
Cons:
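Unaligned access penalty: With packed, fields such as timestamp land at unaligned offsets. On architectures without efficient unaligned access (some ARM cores, older MIPS), the compiler must split each access into smaller loads and stores, which is slower
Taking the address of a packed member (&s->timestamp) and dereferencing it through an ordinary pointer can fault with a bus error on strict-alignment CPUs; GCC warns about this with -Waddress-of-packed-member
The slowdown depends on the CPU and the access pattern, so measure it rather than assume it
__attribute__((packed)) is a GCC extension: GCC and Clang support it, but not every compiler does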
When to use: When memory is extremely tight and you've profiled that the performance penalty is acceptable. Common in network protocol headers and flash storage formats.
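A short sketch of using a packed structure safely, assuming GCC or Clang. Reading a member by name is fine because the compiler knows the field may be unaligned; the risk comes from taking the member's address and dereferencing it as an ordinary pointer, which is why the second helper copies the bytes out with memcpy() instead. Both helper names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sensor_packed {
    uint32_t sensor_id;
    uint64_t timestamp;
    float value;
} __attribute__((packed));

_Static_assert(sizeof(struct sensor_packed) == 16, "no padding when packed");
_Static_assert(offsetof(struct sensor_packed, timestamp) == 4, "timestamp is unaligned");

/* Safe: member access lets the compiler emit unaligned-tolerant loads. */
static uint64_t get_timestamp(const struct sensor_packed *s)
{
    return s->timestamp;
}

/* Safe alternative when working from raw bytes: never dereference a
 * (uint64_t *) that points at an unaligned address; copy the bytes out. */
static uint64_t get_timestamp_via_bytes(const struct sensor_packed *s)
{
    uint64_t ts;

    memcpy(&ts, (const unsigned char *)s + offsetof(struct sensor_packed, timestamp),
           sizeof(ts));
    return ts;
}
```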
#pragma pack
// Kernel space and userspace data structure with pragma pack directive
#pragma pack(push, 1)
struct sensor {
uint32_t sensor_id;
uint64_t timestamp;
float value;
};
#pragma pack(pop)
The push, 1 sets packing to 1-byte alignment, and pop restores the previous setting.
Pros:
Supported by GCC, Clang, and MSVC, so more portable than the GCC-specific attribute
Can control packing level (1, 2, 4, 8 bytes)
Cleaner syntax for multiple structures
Cons:
Same unaligned access penalties as packed attribute
Less visible than explicit padding (hidden in pragmas)
Easy to forget the pop and accidentally pack every structure that follows
When to use: When you need to pack multiple structures and want finer control over alignment, or when working with compilers that don't support GCC attributes.
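The push/pop pairing matters because a #pragma pack setting stays in effect until it is popped or changed. This sketch packs one structure and then checks, at compile time, that an ordinary structure declared afterwards is back to natural alignment. It assumes a GCC-compatible compiler; the structure names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
struct sensor_wire {
    uint32_t sensor_id;
    uint64_t timestamp;
    float value;
};
#pragma pack(pop)   /* without this line, every structure below is packed too */

struct after_pop {
    uint8_t tag;
    uint32_t id;      /* natural alignment again: offset 4, not 1 */
};

_Static_assert(sizeof(struct sensor_wire) == 16, "sensor_wire packed to 16 bytes");
_Static_assert(offsetof(struct after_pop, id) == 4, "packing leaked past the pop");
```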
Performance Impact
| Solution | Size | Access Speed | Portability |
|---|---|---|---|
| Explicit Padding | Larger (24B) | Fast (aligned) | Excellent |
| Packed (attribute or #pragma pack) | Smaller (16B) | Slower on some CPUs (unaligned) | Good |
| Optimal Ordering | Smaller (16B) | Fast (aligned) | Excellent |
For kernel interfaces, always prioritize correctness over size. The extra 8 bytes of padding are negligible compared to the cost of debugging data corruption in production.
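The Optimal Ordering row in the table is not illustrated above. The idea is to declare fields from largest to smallest alignment, so every field already sits on its natural boundary and neither i386 nor x86_64 needs to insert padding. A minimal sketch for the sensor structure follows; reordering changes the binary layout, so it only fits new interfaces, not an existing ABI that deployed binaries depend on.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest fields first: no hidden padding on i386 or x86_64. */
struct sensor {
    uint64_t timestamp;  /* offset 0 */
    uint32_t sensor_id;  /* offset 8 */
    float    value;      /* offset 12 */
};

_Static_assert(sizeof(struct sensor) == 16, "16 bytes on both ABIs");
_Static_assert(offsetof(struct sensor, sensor_id) == 8, "sensor_id at offset 8");
_Static_assert(offsetof(struct sensor, value) == 12, "value at offset 12");
```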
Challenge 2: Pointer and Payload Transmission
The Problem
Here's a mistake that many developers make when they first write kernel interfaces. They define a structure like this:
struct payload {
uint32_t size;
void *data;
};
Then they write userspace code that creates this structure, fills in the pointer, and sends it to the kernel via ioctl:
// Userspace application
uint32_t sensor_data = 0xDEADBEEF;
struct payload info;
info.data = &sensor_data; // Pointer to userspace memory
info.size = sizeof(sensor_data);
// Send to kernel
ioctl(fd, MY_IOCTL_CMD, &info);
In the kernel module, they try to dereference the pointer:
// Kernel module
struct payload info;
copy_from_user(&info, (void *)arg, sizeof(struct payload));
// Try to access the data
uint32_t value = *(uint32_t *)(info.data); // CRASH or garbage!
printk(KERN_INFO "Value: 0x%x\n", value);
This code has two fundamental problems, and understanding them is critical for writing correct kernel interfaces.
Problem 1: Address Space Separation
The first and most important issue: the kernel must never dereference a userspace pointer directly. Even if both sides ran at the same bitness (both 64-bit or both 32-bit), this code would still be broken.
Linux uses virtual memory, and every process has its own virtual address space. When userspace says "my data is at address 0x12345000", that address only has meaning within that process. The kernel cannot treat it as trusted memory: the address may be invalid or unmapped, or a malicious program may point it at kernel memory. On CPUs with SMAP (x86) or PAN (ARM64), the hardware also blocks ordinary kernel accesses to user pages.
If the kernel dereferences a userspace pointer directly, one of these things happens:
Oops or kernel panic - The address is invalid or unmapped, or the access is blocked by SMAP/PAN
Reading wrong data - Used later from another context (a workqueue, an interrupt handler), the address resolves against a different process's memory, or none at all
Security violation - A crafted address can trick the kernel into reading or writing privileged kernel memory
This is why Linux provides copy_from_user() and copy_to_user(): they check that the address range belongs to userspace, handle faults gracefully, and briefly lift SMAP/PAN protection for the duration of the copy.
Problem 2: Pointer Size Mismatch
The second issue appears specifically in cross-architecture scenarios. In your structure:
struct payload {
uint32_t size;
void *data; // 4 bytes in 32-bit, 8 bytes in 64-bit
};
The pointer size differs:
32-bit userspace: void * is 4 bytes → structure is 8 bytes total
64-bit kernel: void * is 8 bytes → structure is 16 bytes total (with padding)
When userspace sends this structure to the kernel, the memory layouts don't match:
32-bit Userspace:
Offset: 0 4 8
┌───────┬───────┐
│ size │ data │
│ 4B │ 4B │
└───────┴───────┘
Total: 8 bytes
64-bit Kernel:
Offset: 0 4 8 16
┌───────┬───────┬───────────────┐
│ size │ PAD │ data │
│ 4B │ 4B │ 8B │
└───────┴───────┴───────────────┘
Total: 16 bytes
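The kernel copies 16 bytes where userspace supplied only 8, so it takes the pointer from offsets 8-15, past the end of the userspace structure, and gets whatever happens to sit there in the application's memory.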
Complete Example: The Wrong Way
Let's see a complete example that demonstrates both problems:
Userspace (wrong_userspace.c):
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <stdint.h>
#define IOCTL_GET_DATA _IOR('M', 0, struct payload)
struct payload {
uint32_t size;
void *data;
};
int main() {
int fd;
struct payload info;
uint32_t sensor_data = 0xDEADBEEF;
// Set up payload with userspace pointer
info.data = &sensor_data;
info.size = sizeof(sensor_data);
printf("Userspace: data address = %p\n", info.data);
printf("Userspace: data value = 0x%x\n", sensor_data);
printf("Userspace: sizeof(payload) = %zu\n", sizeof(struct payload));
fd = open("/dev/my_device", O_RDWR);
if (fd < 0) {
perror("Failed to open device");
return -1;
}
// This will fail or produce garbage in the kernel
if (ioctl(fd, IOCTL_GET_DATA, &info) < 0) {
perror("ioctl failed");
close(fd);
return -1;
}
close(fd);
return 0;
}
Kernel Module (wrong_kernel.c):
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
#include <linux/device.h>
#define IOCTL_GET_DATA _IOR('M', 0, struct payload)
struct payload {
uint32_t size;
void *data;
};
static long device_ioctl(struct file *file, unsigned int cmd, unsigned long arg) {
struct payload info;
uint32_t value;
if (cmd != IOCTL_GET_DATA)
return -EINVAL;
// Copy the payload structure itself
if (copy_from_user(&info, (void *)arg, sizeof(struct payload))) {
pr_err("Failed to copy payload from userspace\n");
return -EFAULT;
}
pr_info("Kernel: sizeof(payload) = %zu\n", sizeof(struct payload));
pr_info("Kernel: data pointer = %p\n", info.data);
// WRONG: Trying to dereference userspace pointer directly
value = *(uint32_t *)(info.data); // CRASH or garbage!
pr_info("Kernel: data value = 0x%x\n", value);
return 0;
}
static const struct file_operations device_fops = {
.unlocked_ioctl = device_ioctl,
};
// ... device registration code ...
What happens:
Userspace: data address = 0xbffff7a4
Userspace: data value = 0xdeadbeef
Userspace: sizeof(payload) = 8
Kernel: sizeof(payload) = 16
Kernel: data pointer = 0xf7a400000000 // Garbage! Wrong offset
Kernel: Oops: 0000 [#1] SMP // Kernel panic
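The kernel reads the pointer from the wrong offset (due to the size mismatch) and then dereferences it directly (which is never allowed for a user pointer), causing an oops. This output is illustrative: it assumes the driver also sets .compat_ioctl (for example to compat_ptr_ioctl) and matches the command regardless of size. In reality _IOR() encodes sizeof(struct payload) into the command number, so a 32-bit caller sends a different cmd value than the 64-bit driver expects, and on current kernels a 32-bit ioctl() on a driver without a .compat_ioctl handler never reaches the driver at all; it fails with ENOTTY.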
Solution
Now let's look at three correct ways to handle data transfer between userspace and kernel.
1. Pass Data Directly (Inline Data)
For small amounts of data, embed it directly in the structure instead of using a pointer:
Corrected Structures:
#define MAX_DATA_SIZE 64
struct payload {
uint32_t size;
uint32_t padding; // Explicit padding for 64-bit alignment
uint8_t data[MAX_DATA_SIZE]; // Inline data, not pointer
};
Userspace:
struct payload info;
uint32_t sensor_data = 0xDEADBEEF;
// Copy data directly into the structure
info.size = sizeof(sensor_data);
memcpy(info.data, &sensor_data, sizeof(sensor_data));
ioctl(fd, IOCTL_GET_DATA, &info);
Kernel:
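struct payload info;
uint32_t value;
if (copy_from_user(&info, (void __user *)arg, sizeof(struct payload)))
    return -EFAULT;
// Reject sizes that overflow the inline buffer or are too short for a uint32_t
if (info.size < sizeof(value) || info.size > MAX_DATA_SIZE)
    return -EINVAL;
// Data is already in kernel space, safe to access
memcpy(&value, info.data, sizeof(value));
pr_info("Kernel: data value = 0x%x\n", value);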
Pros:
Simple and straightforward
No pointer issues
Single copy operation
No size mismatch problems
Cons:
Limited to fixed maximum size
Wastes memory if actual data is smaller
Not suitable for large or variable-sized data
When to use: For small, fixed-size data like sensor readings, configuration values, or status information.
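A userspace helper for the inline approach might look like the sketch below. It rejects data that does not fit and clears the whole structure first, so unused bytes of data[] and the padding field never carry leftover stack contents into the kernel. The payload_fill() name is illustrative.

```c
#include <stdint.h>
#include <string.h>

#define MAX_DATA_SIZE 64

struct payload {
    uint32_t size;
    uint32_t padding;
    uint8_t data[MAX_DATA_SIZE];
};

/* Fill an inline payload; returns 0 on success, -1 if src does not fit. */
static int payload_fill(struct payload *p, const void *src, size_t len)
{
    if (len > MAX_DATA_SIZE)
        return -1;
    memset(p, 0, sizeof(*p));   /* no stale bytes in padding or unused data[] */
    p->size = (uint32_t)len;
    memcpy(p->data, src, len);
    return 0;
}
```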
2. Use copy_from_user() with Userspace Pointer
For larger or variable-sized data, keep the pointer but use it correctly with copy_from_user():
Structures (Fixed-Size Types):
struct payload {
uint32_t size;
uint32_t padding; // Explicit padding
uint64_t data_ptr; // Always 64-bit, stores userspace address
};
Notice we use uint64_t instead of void *. This ensures consistent size across architectures.
Userspace:
struct payload info;
uint32_t sensor_data = 0xDEADBEEF;
info.size = sizeof(sensor_data);
info.data_ptr = (uint64_t)(uintptr_t)&sensor_data; // Store address as 64-bit value
ioctl(fd, IOCTL_GET_DATA, &info);
Kernel:
struct payload info;
uint32_t *kernel_buffer;
uint32_t value;
if (copy_from_user(&info, (void __user *)arg, sizeof(struct payload)))
    return -EFAULT;
// Validate size: bounded by the driver's MAX_ALLOWED_SIZE and large enough
// for the uint32_t read below
if (info.size < sizeof(value) || info.size > MAX_ALLOWED_SIZE)
    return -EINVAL;
// Allocate kernel buffer (memdup_user() can combine this step and the next)
kernel_buffer = kmalloc(info.size, GFP_KERNEL);
if (!kernel_buffer)
    return -ENOMEM;
// Safely copy data from userspace to kernel space; u64_to_user_ptr()
// turns the 64-bit field back into a __user pointer
if (copy_from_user(kernel_buffer, u64_to_user_ptr(info.data_ptr), info.size)) {
    kfree(kernel_buffer);
    return -EFAULT;
}
// Now we can safely access the data
value = *kernel_buffer;
pr_info("Kernel: data value = 0x%x\n", value);
kfree(kernel_buffer);
Pros:
Handles variable-sized data
Memory efficient
Industry standard approach
Cons:
Requires two copy operations (structure, then data)
More complex error handling
Must allocate kernel memory
When to use: For large buffers, variable-sized data, or when memory efficiency matters.
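On the userspace side, the address conversion is easy to wrap in small helpers. This sketch pins the structure layout with _Static_assert and casts through uintptr_t in both directions; inside the kernel, the matching conversion is u64_to_user_ptr(), which yields a __user pointer for copy_from_user(). The ptr_to_u64() and u64_to_ptr() names are illustrative, not a standard API.

```c
#include <stddef.h>
#include <stdint.h>

struct payload {
    uint32_t size;
    uint32_t padding;
    uint64_t data_ptr;   /* userspace address, widened to 64 bits */
};

_Static_assert(sizeof(struct payload) == 16, "same size on 32- and 64-bit");
_Static_assert(offsetof(struct payload, data_ptr) == 8, "data_ptr at offset 8");

/* Widen a pointer to 64 bits; going through uintptr_t avoids the
 * pointer-to-integer size warning on 32-bit builds. */
static inline uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Userspace inverse, useful in tests; not for kernel code. */
static inline void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```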
3. Flexible Array Member (Variable-Length Payload)
For variable-sized data where you want to send everything in a single buffer, use a flexible array member (older code uses the GNU zero-length array data[0] for the same purpose):
struct payload {
uint32_t size;
uint32_t padding;
uint8_t data[]; // Flexible array member (C99 and later)
// or: uint8_t data[0]; // Old GCC extension, same effect
};
This is a special C feature where the array at the end of the structure has no fixed size. You allocate the structure with extra space for the actual data.
Userspace:
uint32_t sensor_data = 0xDEADBEEF;
size_t total_size = sizeof(struct payload) + sizeof(sensor_data);
struct payload *info;
// Allocate structure + data in one block
info = malloc(total_size);
if (!info) {
perror("malloc failed");
return -1;
}
info->size = sizeof(sensor_data);
memcpy(info->data, &sensor_data, sizeof(sensor_data));
// Send entire buffer in one ioctl
if (ioctl(fd, IOCTL_GET_DATA, info) < 0) {
perror("ioctl failed");
free(info);
return -1;
}
free(info);
Kernel:
static long device_ioctl(struct file *file, unsigned int cmd, unsigned long arg) {
    struct payload header;
    struct payload *info;
    uint32_t value;
    size_t total_size;
    // First, get just the header to know the size
    if (copy_from_user(&header, (void __user *)arg, sizeof(struct payload)))
        return -EFAULT;
    // Validate size against the driver's limit; it must also cover the
    // uint32_t read below
    if (header.size < sizeof(value) || header.size > MAX_ALLOWED_SIZE)
        return -EINVAL;
    // Calculate total size
    total_size = sizeof(struct payload) + header.size;
    // Allocate kernel buffer for entire payload
    info = kmalloc(total_size, GFP_KERNEL);
    if (!info)
        return -ENOMEM;
    // Copy the entire structure including data
    if (copy_from_user(info, (void __user *)arg, total_size)) {
        kfree(info);
        return -EFAULT;
    }
    // Userspace can change size between the two copies ("double fetch"),
    // so keep only the value that was validated above
    info->size = header.size;
    // Now safely access the data
    memcpy(&value, info->data, sizeof(value));
    pr_info("Kernel: data value = 0x%x\n", value);
    kfree(info);
    return 0;
}
How It Works:
The flexible array member data[] doesn't add to the structure's size:
sizeof(struct payload) = 8 bytes // just size + padding
But you can allocate extra memory and access it through the array:
malloc(sizeof(struct payload) + 100); // 8 + 100 = 108 bytes total
info->data[0] through info->data[99] are all valid
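The allocation arithmetic above can be wrapped in a helper. In this sketch, payload_new() (an illustrative name) checks that the length fits the 32-bit size field and that header plus data cannot overflow size_t, then allocates both in one calloc() block so the padding field starts out zeroed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct payload {
    uint32_t size;
    uint32_t padding;
    uint8_t data[];      /* flexible array member: adds nothing to sizeof */
};

/* Allocate header + len data bytes in one block and copy src into data[].
 * Returns NULL if len does not fit the 32-bit size field, if the total
 * would overflow size_t, or if allocation fails. */
static struct payload *payload_new(const void *src, size_t len)
{
    struct payload *p;

    if (len > UINT32_MAX || len > SIZE_MAX - sizeof(struct payload))
        return NULL;
    p = calloc(1, sizeof(struct payload) + len);  /* zeroes padding too */
    if (!p)
        return NULL;
    p->size = (uint32_t)len;
    memcpy(p->data, src, len);
    return p;
}
```

The caller passes the returned pointer to ioctl() and releases it with free() afterwards, as in the userspace example above.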
Pros:
Single memory allocation in userspace
Single ioctl() call carries header and data together
Natural for variable-sized data
Efficient for sending buffers of different sizes
No separate pointer indirection
Cons:
Requires two-step copy in kernel (header first to get size, then full payload)
More complex memory management
Must carefully calculate total size
Old compilers might not support the data[] syntax (use data[0] instead)
When to use: When you need variable-sized payloads in one self-contained buffer, without an embedded pointer and its pointer-size pitfalls. Common in network protocols, file operations, and variable-length messages.
Important Note: In modern C (C99 and later), use uint8_t data[] (flexible array member). For older code, you might see uint8_t data[0] (GNU extension), which works the same way but isn't standard C. Both achieve the same result.
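The header-then-payload checks from the kernel handler can also be prototyped and unit-tested in userspace on a plain byte buffer before moving them into a driver. In this sketch, buf_len stands in for the number of bytes actually supplied, MAX_ALLOWED_SIZE is set to a hypothetical 4096, and payload_parse() is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ALLOWED_SIZE 4096   /* hypothetical limit */

struct payload {
    uint32_t size;
    uint32_t padding;
    uint8_t data[];
};

/* Validate a received buffer of buf_len bytes. Returns a pointer to the
 * payload bytes and stores their length in *out_len, or NULL if invalid. */
static const uint8_t *payload_parse(const void *buf, size_t buf_len, uint32_t *out_len)
{
    struct payload hdr;

    if (buf_len < sizeof(hdr))
        return NULL;                      /* too short for the header */
    memcpy(&hdr, buf, sizeof(hdr));       /* copy the header once, then trust only it */
    if (hdr.size == 0 || hdr.size > MAX_ALLOWED_SIZE)
        return NULL;
    if (hdr.size > buf_len - sizeof(hdr))
        return NULL;                      /* header claims more data than was supplied */
    *out_len = hdr.size;
    return (const uint8_t *)buf + offsetof(struct payload, data);
}
```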
Comparison of Solutions
| Approach | Max Data Size | Copies | Memory | Complexity | Use Case |
|---|---|---|---|---|---|
| Inline Data | Fixed (small) | 1 | Higher | Low | Sensor data, configs |
| copy_from_user() | Variable (large) | 2 | Efficient | Medium | Buffers, variable data |
| Flexible Array (data[]) | Variable (large) | 2* | Efficient | Medium | Network protocols, messages |
*Requires header copy then full payload copy
Best Practices for Cross-Architecture Interfaces
After understanding the challenges, here are the essential practices for building robust kernel interfaces:
1. Always Use Fixed-Size Types
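// WRONG - sizes vary
struct bad_example {
int value; // Size not fixed by the C standard
long timestamp; // 4 bytes on 32-bit, 8 bytes on 64-bit Linux
size_t count; // 4 bytes on 32-bit, 8 bytes on 64-bit Linux
void *ptr; // 4 bytes or 8 bytes
};
// CORRECT - explicit sizes, 64-bit fields first so no hidden padding
// (in UAPI headers the kernel convention is __u32/__u64 from <linux/types.h>)
struct good_example {
uint64_t timestamp; // Always 64-bit, offset 0
uint64_t ptr; // Store address as 64-bit value, offset 8
uint32_t value; // Always 32-bit, offset 16
uint32_t count; // Always 32-bit, offset 20
};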
2. Add Explicit Padding
struct sensor {
uint32_t sensor_id;
uint32_t padding1; // Align timestamp to 8 bytes
uint64_t timestamp;
float value;
uint32_t padding2; // Pad to 8-byte boundary
};
3. Never Pass Raw Pointers Across the Boundary
The kernel must never dereference a userspace pointer, and native pointer types must never appear in shared structures. Instead, use:
Inline data for small payloads
copy_from_user()/copy_to_user() for data transfer
uint64_t to store addresses if you must pass them
Flexible arrays (data[]) for variable-length payloads
4. Validate Everything in the Kernel and Use Static Assertions
// Check size limits
if (info.size == 0 || info.size > MAX_ALLOWED_SIZE)
    return -EINVAL;
// copy_from_user() already calls access_ok() internally; an explicit check
// like this is only needed before the unchecked __copy_from_user() variants
if (!access_ok(user_ptr, size))
    return -EFAULT;
// Compile-time layout checks for the uint64_t data_ptr version of struct payload
_Static_assert(sizeof(struct payload) == 16,
    "payload structure size must be 16 bytes");
_Static_assert(offsetof(struct payload, data_ptr) == 8,
    "data_ptr must be at offset 8");
Debugging Cross-Architecture Issues
When things go wrong, these techniques will help you find the problem quickly.
1. Add Size Checks at Runtime
Compare structure layouts between userspace and kernel:
// Userspace
printf("Userspace: sizeof(payload)=%zu, offset(data)=%zu\n",
sizeof(struct payload), offsetof(struct payload, data));
// Kernel
pr_info("Kernel: sizeof(payload)=%zu, offset(data)=%zu\n",
sizeof(struct payload), offsetof(struct payload, data));
2. Cross-Compile and Test Both Architectures
Don't wait until production to discover architecture issues:
# Compile for 32-bit (needs gcc-multilib or an equivalent 32-bit toolchain)
gcc -m32 -o myapp32 myapp.c
# Compile for 64-bit
gcc -m64 -o myapp64 myapp.c
# Run both and compare output
./myapp32 # Prints the 32-bit sizeof()/offsetof() values
./myapp64 # Must print exactly the same values
# For kernel modules, use a cross-compilation toolchain (-C points at the kernel build tree)
make -C /path/to/kernel ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- M=/path/to/module
make -C /path/to/kernel ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- M=/path/to/module
3. Dump Memory Layout
When data looks corrupted, dump the raw bytes:
void dump_payload(const struct payload *p) {
    const unsigned char *bytes = (const unsigned char *)p;
    size_t i;
    pr_info("Payload bytes: ");
    for (i = 0; i < sizeof(struct payload); i++)
        pr_cont("%02x ", bytes[i]);
    pr_cont("\n");
}
// Alternative: print_hex_dump(KERN_INFO, "payload: ", DUMP_PREFIX_OFFSET, 16, 1, p, sizeof(*p), false);
Compare the hex dump from userspace and kernel to see where the mismatch occurs.
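For the userspace half of that comparison, a formatter like the sketch below prints the same two-digit hex bytes separated by spaces, so the two dumps can be compared side by side. The hex_dump() name and its string-buffer interface are illustrative choices.

```c
#include <stddef.h>
#include <stdio.h>

/* Format len bytes as "xx xx xx" into out (capacity out_size, NUL-terminated).
 * Stops early if out is too small. Returns the characters written, excluding NUL. */
static size_t hex_dump(const void *buf, size_t len, char *out, size_t out_size)
{
    const unsigned char *bytes = buf;
    size_t pos = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';
    /* Each entry needs at most 3 chars (" xx") plus room for the NUL. */
    for (size_t i = 0; i < len && pos + 3 < out_size; i++) {
        int n = snprintf(out + pos, out_size - pos, "%s%02x", i ? " " : "", bytes[i]);
        pos += (size_t)n;
    }
    return pos;
}
```

In the application, pass the address of the structure you hand to ioctl() and print the resulting string next to the kernel log line.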
Conclusion
Cross-architecture communication between 32-bit userspace and 64-bit kernel comes down to two fundamental rules: structures must have identical memory layouts, and pointers cannot cross the boundary. Use fixed-size types, add explicit padding, and always transfer data through copy_from_user() or inline it directly. When things break, tools like pahole, sparse, and cross-compilation testing will quickly reveal where your assumptions diverged from reality.
The key insight: you're designing an ABI, not just an API. Every byte matters. Get it wrong, and you're debugging crashes at 3 AM. Get it right once with these practices, and your interface works reliably for years across any architecture. Think about portability from day one—your future self will thank you.
Further Reading:
Linux Device Drivers, 3rd Edition - Chapter 6
Understanding the Linux Kernel - Memory Management
Did you find this helpful? Have you encountered other cross-architecture pitfalls? Share your experiences in the comments below.