Thursday, April 15, 2021

HelloWorld Assembler Code for x86_64, arm64 and for linux or macOS

(1) Following the previous post, this post demonstrates the assembler code for a command-line HelloWorld program that builds for x86_64 and arm64 on both Linux and macOS.
HelloWorld.S
//
// Assembler program to print "Hello World!"
// to stdout. For arm64, x86_64, linux and macOS
//
#define STDIN  0            // standard input device
#define STDOUT 1            // standard output device

#ifdef __APPLE__
#define SYS_read  0x2000003 // system call to read input macOS
#define SYS_write 0x2000004 // system call to write message macOS
#define SYS_exit  0x2000001 // system call to terminate program macOS
#define SVC_write 4         // SVC write arm64 macOS
#define SVC_exit  1         // SVC exit arm64 macOS
#endif

#ifdef __linux__
#define SYS_read  0         // system call to read input
#define SYS_write 1         // system call to write message
#define SYS_exit  60        // system call to terminate program
#define SVC_write 64        // SVC write arm64 linux
#define SVC_exit  93        // SVC exit arm64 linux
#endif

#define EXIT_OK 0           // OK exit status

.globl _start               // Provide program starting address to linker
#ifdef __APPLE__
.align 4
#endif

.text
_start:
#if defined __arm64__ || defined __ARM_ARCH_ISA_A64
    mov X0, #STDOUT                 // 1 = StdOut
#ifdef __linux__
    ldr X1, =helloworld             // string to print
    mov X8, #SVC_write              // linux write system call
#endif
#ifdef __APPLE__
    // adr X1, helloworld           // string to print
    // (adr calculates an address from the PC plus an offset, but only for local labels)
    adrp X1, helloworld@PAGE        // adrp can be used to access relative address of 4GB range
    add X1, X1, helloworld@PAGEOFF  // string to print
    mov X16, #SVC_write             // macOS write system call
#endif
    ldr X2, =len                    // length of our string
    svc #0                          // Call the kernel to output the string

    // Setup the parameters to exit the program
    // and then call the kernel to do it.
    mov X0, #0                      // Use 0 return code
#ifdef __linux__
    mov X8, #SVC_exit               // Service command code 93 terminates this program
#endif
#ifdef __APPLE__
    mov X16, #SVC_exit              // Service command code 1 terminates this program
#endif
    svc #0                          // Call the kernel to terminate the program
#endif

#if defined __x86_64__
    movq $STDOUT, %rdi
#ifdef __linux__
    movq $helloworld, %rsi          // char *
#endif
#ifdef __APPLE__
    leaq helloworld(%rip), %rsi
#endif
    movq $len, %rdx                 // length of our string
    movq $SYS_write, %rax           // write system call
    syscall

    movq $EXIT_OK, %rdi             // Use 0 return code
    movq $SYS_exit, %rax            // exit system call
    syscall
#endif

.data
helloworld: .ascii "Hello World!\n"
len = . - helloworld                // len = end - start
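For readers more comfortable with C, the two system calls in the listing above correspond roughly to a write() followed by an exit. The following is only a sketch of that correspondence, not part of the original program; the names say_hello and HELLO_LEN are made up for illustration.

```c
#include <assert.h>
#include <unistd.h>

/* Same bytes as the .ascii directive in HelloWorld.S. */
static const char helloworld[] = "Hello World!\n";

/* .ascii stores no trailing NUL, so drop the one C adds.
   This plays the role of `len = . - helloworld`. */
enum { HELLO_LEN = sizeof(helloworld) - 1 };

/* Hypothetical helper: issues the write system call through libc.
   fd plays the role of X0 / %rdi, the buffer X1 / %rsi, the length X2 / %rdx.
   Returns the number of bytes written, or -1 on error. */
static ssize_t say_hello(int fd) {
    assert(fd >= 0); /* a file descriptor is never negative */
    return write(fd, helloworld, HELLO_LEN);
}
```

Calling say_hello(STDOUT_FILENO) from main and then returning EXIT_OK (0) mirrors the write and exit system calls that the assembler version issues directly.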


(2) To compile and debug for different systems
shell scripts
# To download the above code using command line.
curl -L https://tinyurl.com/helloworld-gas | grep -A200 START_OF_HELLOWORLD.S | sed '1d' | sed -n "/END_OF_HELLOWORLD.S/q;p" | sed 's/&gt;/\>/g;s/&lt;/\</g' > HelloWorld.S

# To compile with debug symbols under linux, e.g. Win10 WSL2 or Linux or Android Termux App
clang -g -c HelloWorld.S -o HelloWorld.o ; ld HelloWorld.o -o HelloWorld

# To compile under macOS (e.g. with M1 cpu)
clang -g HelloWorld.S -o HelloWorld_x86_64 -e _start -arch x86_64
clang -g HelloWorld.S -o HelloWorld_arm64 -e _start -arch arm64


(3) To debug using lldb
shell scripts
# To start program debug
lldb HelloWorld_x86_64
# or
lldb HelloWorld_arm64

# lldb debug session for arm64 - useful commands
(lldb) breakpoint set --name _start
(lldb) breakpoint list
(lldb) run
(lldb) step
(lldb) reg read x0 x1 x2 x8 lr pc
(lldb) reg read -f t cpsr

# lldb debug session for x86_64 - useful commands
(lldb) reg read -f d rax rdi rsi rdx rflags
(lldb) reg read -f t rflags

# print the address value in the stackpointer for x86_64
(lldb) p *(int **)$sp

# hint: to search lldb command history use ctrl-r


(4) Summary of differences
4.1) For clang to run the C preprocessor on the assembler file, the filename extension must be a capital letter S (.S) on Linux. On macOS, global asm labels that are shared with C code must be prefixed with an underscore.
4.2) A64 (arm64) parameter/result registers are X0-X7. If the function has a return value, it is stored in X0.
4.3) x86_64 parameter registers for integers or pointers are %rdi, %rsi, %rdx, %rcx, %r8, %r9. If the function has a return value, it is stored in %rax.
4.4) Linux and macOS have different syscall numbers (for x86_64) and service call numbers (for arm64). They are defined at the top of this source code.
4.5) An arm64 instruction cannot hold a full absolute address as an immediate. On Linux, the code loads the address from a literal pool with ldr X1, =helloworld. On macOS, the adr instruction can be used to access nearby read-only local data (within about 1MB of the PC). For data in a non-local section such as .data (a buffer in RAM), the adrp instruction with the @PAGE and @PAGEOFF operators should be used, as demonstrated in the code.
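To illustrate point 4.4, the macOS x86_64 numbers in the listing (0x2000003, 0x2000004, 0x2000001) are the plain BSD call numbers that arm64 macOS uses (3, 4, 1) combined with a 0x2000000 prefix, which marks the Unix syscall class. The C sketch below shows that relationship only; the names MACOS_UNIX_CLASS and macos_x86_64_syscall are hypothetical and do not come from any system header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed name for the prefix seen in the SYS_* values for macOS:
   syscall class 2 (Unix/BSD) placed in bits 24 and above. */
#define MACOS_UNIX_CLASS 0x2000000u

/* Hypothetical helper: builds the value loaded into %rax on x86_64 macOS
   from the BSD call number that arm64 macOS loads into X16. */
static uint32_t macos_x86_64_syscall(uint32_t bsd_number) {
    assert(bsd_number < MACOS_UNIX_CLASS); /* must not collide with the class bits */
    return MACOS_UNIX_CLASS | bsd_number;
}
```

For example, macos_x86_64_syscall(4) gives 0x2000004, the SYS_write value used for macOS in HelloWorld.S. The Linux numbers follow no such rule: x86_64 and arm64 Linux use unrelated tables (1 versus 64 for write), which is why the source defines each one explicitly.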

