Introduction to Exploiting Format Strings: The Stack
This article describes how to exploit format string vulnerabilities to read arbitrary values from the stack and to write arbitrary values to it.
We will cover what format string vulnerabilities are, how to exploit them to read specific values from the stack, and how to use various format specifiers to write arbitrary values to the stack.
NOTE: The memory addresses shown in the commands will probably differ on your setup, so adjust the calculations accordingly.
What is a format string?
A format string is an ASCII string that mixes literal text with format specifiers, which control how the remaining arguments are rendered. It is passed to formatting functions such as printf and vprintf, which convert C data types to their string representation (scanf uses the same notation in the other direction, to parse input).
Example: here %s tells printf to treat the next argument as a pointer to a string and print it in the final output.
char *s = "Format String";
printf("%s", s);
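But what if the program passes our input as the format string and we put format specifiers in it, without the program supplying matching arguments? The format function does not change its behavior: for each specifier it still fetches the next argument, so it starts loading arbitrary values from the stack. To take advantage of this behavior, we need three basic things: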
- Placement of arguments on the stack.
- Basic understanding of x86 assembly.
- Knowledge of GNU debugger.
Sample program
We will be using the following piece of code throughout the tutorial. We also assume that memory protection mechanisms such as ASLR are disabled and the stack is executable.
Switch to root user and execute the following commands:
Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space
Compile the program with the executable stack option: gcc -z execstack -o fmt fmt.c
In this phase, we dive into the basics of exploiting the problem discussed so far. Before we start fuzzing the binary with different format specifiers, we need to understand what each specifier means.
The following screenshot is taken directly from the Linux Programmer's Manual page for the printf function.
As can be seen:
%x converts an unsigned int to unsigned hexadecimal
%u converts an unsigned int to unsigned decimal
%n stores the number of characters written so far in the integer that its argument points to
Our code passes our input to printf as the format string and supplies no further arguments. Any format specifier we include therefore makes printf fetch the next value from where its arguments would normally be and display it in the requested format. Since we are using %x in this case, it fetches whatever data happens to be on the stack and presents it to us in hexadecimal.
Before going any further, let's analyze the result in the GNU debugger. First we set a breakpoint on main with the command "break main", then we run the program with our input and format specifier: "run AAAA-%x".
We set another breakpoint, "break *0x80484d0", just before the call to printf, so that we can examine the state of the stack before the program can crash with a segmentation fault and that state is lost.
In the above screenshot we can see that our breakpoint is hit. Examining the first ten words from the top of the stack, we can see that our input sits at the position of the 8th argument from the top of the stack.
To make sure our analysis is correct, we fetch the 8th argument directly using the dollar sign (direct parameter access, for example %8$x). As we are executing these commands in a bash shell, we need to escape the dollar sign with a backslash (%8\$x). As can be seen, this lets us determine the exact location where our argument is placed.
Now we'll use the %n format specifier to write data to a precise location in memory. First let's understand what happens when we pass %n instead of %x. As expected, the program crashes with a segmentation fault. Looking at the instruction at EIP, we see that it tries to move the value in EDX to the address held in EAX. EDX holds the number of bytes written so far, which is five ("AAAA" and one hyphen), and EAX holds the value taken from the stack as the argument.
Next, in the same screenshot, we used the %u format specifier to change the decimal value written to the same location in memory. Since %u printed ten decimal digits, the total comes to 0x10, i.e. we have written 16 characters so far: five A characters, one dash, and ten digits from %u. Using the same technique, we can write the address of our shellcode to the offset of the target function.