The answer to my issue is in arch/arm64/kernel/process.c, copy_thread():
if (stack_start) {
    if (is_compat_thread(task_thread_info(p)))
        childregs->compat_sp = stack_start;
    /* 16-byte aligned stack mandatory on AArch64 */
    else if (stack_start & 15)
        return -EINVAL;
    else
        childregs->sp = stack_start;
}
Aha! The stack passed to clone() has to be 16-byte aligned; otherwise copy_thread() returns -EINVAL and clone() fails. With this simple fix to my code, clone() worked. Pity this was not in the documentation.
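For anyone hitting the same error, here is a minimal sketch of how the stack pointer could be aligned before it is handed to clone(). The helper names stack_is_aligned() and aligned_stack_top() are my own and purely illustrative; they are not part of the kernel or glibc. The sketch assumes the stack is an ordinary buffer and that clone() is given its top address, since stacks grow downward.

```c
#include <stddef.h>
#include <stdint.h>

/* Alignment that copy_thread() demands for a native AArch64 stack pointer. */
#define STACK_ALIGN ((uintptr_t)16)

/*
 * Mirrors the kernel test (stack_start & 15).
 * Returns nonzero when the address would pass that check.
 */
int stack_is_aligned(const void *sp)
{
    return ((uintptr_t)sp & (STACK_ALIGN - 1)) == 0;
}

/*
 * Hypothetical helper: compute the top of a stack buffer, rounded DOWN to a
 * 16-byte boundary so the result still lies inside the buffer.
 */
void *aligned_stack_top(void *base, size_t size)
{
    uintptr_t top = (uintptr_t)base + size;
    return (void *)(top & ~(STACK_ALIGN - 1));
}
```

The returned pointer would then be used as the stack argument, for example clone(fn, aligned_stack_top(buf, size), flags, arg) instead of passing buf + size directly. Rounding down costs at most 15 bytes of the buffer.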