This article is about Saving Time and Effort with IDAPython.
For those familiar with CTF competitions, time is an important constraint that must be managed effectively to score as many points as possible. Most task write-ups you’ll come across use some kind of script to automate long tasks or ones that are difficult to do manually. This saves not only time, but also effort, because the participant only needs to find out what is repeated once and automate the rest. This analogy is clarified by the challenge we will examine.
The task we are dealing with is from Nuit Du Hack Quals 2017 . It’s a reverse engineering challenge called “Matrioshka: Step 4 (I did it again)” . We’ll see that this task not only performs repetitive work that would take a reverser hours to complete, but also makes it difficult to access the code that checks whether the given input is correct or not.
Saving Time and Effort with IDAPython:
The goal of this article is to write a Python solver script that automates everything from start to finish and prints a flag at the end. 
First, we’ll start by seeing what kind of binary star we’re dealing with.
Now we need to set up a remote GDB debugger and then connect to it from IDA64.
The reason behind why we are providing to the executable 334 characters as an argument is because that is what it expects.
After checking the length, crackme allocates a memory-aligned portion of memory using memalign. It then uses mprotect to modify the site’s permissions to RWX.
The data copied into this block starts from sub_40089D and goes to but does not include sub_422C0C. A look at some of what lies between these two addresses suggests that there is probably some ongoing code decryption going on.
Next, the copy of “sub_40089D” from the allocated pages is called with the following arguments.
The return value must be 0x1 if the supplied argument represents the correct input.
So far, so good. Now let’s look at what sub_40089D does.
We can see that it iterates over each character of the supplied input and does the following:
- Read the character twice
- Store it in a local variable
- Read the characters again and add 1 to them. For example, A becomes B and so on
- Replace the original character in memory with the incremented value
- Restore it back from a local variable
This can be interpreted as junk code because all it does is modify the character and then restore it back to the state it was in before. However, this is not a spam code. This is a “guard” against detection by the flag checking algorithm using hardware breakpoints. This “protection” would be weak if it were only used once, but it is not. It is used throughout the execution of the program, dozens of times. It would take a LOT of time to achieve the symptom check algorithm manually.
Now to decipher the code. Without going into too much detail, each function does the following:
- Do the simple breakpoint anti-hardware trick we saw earlier.
- Encrypt the last three instructions of the same function (before LEAVE; RETN) using a simple XOR algorithm.
- Decrypt the next function to be called using a simple XOR algorithm.
- Call the decrypted function
On return—we don’t return until the input is checked—unscramble the last three instructions and continue execution, returning to the caller.
The same is done from 0x40089D to 0x40C305 (equivalent addresses in the allocated block). So we have about 47KB of code that does nothing but decrypt and expose (read/write) user input.
Now you might be wondering how I knew it was done from 0x40089D to 0x40C305. I certainly didn’t do it by pressing F9 and constantly hitting HW breakpoints hundreds of times. This is where IDAPython comes into play.
But before we start writing, we need to make a few assumptions:
- When debugging the binary code, I only went through 4 functions that perform all these actions. So my first assumption was that all of the following functions do the same thing except the ones that do checks on input.
- The second assumption, and the most important, concerns “protection” against HW breakpoints. I assumed that the functions that check if the input is correct are coming after all
- routines we have seen. In other words, they are called last. If these checks were somewhere in the middle, meaning we hit more unnecessary hardware breakpoints after the character check, the approach I’m taking wouldn’t work and I’d have to use more criteria.
- Hopefully all my assumptions were correct and the following script helped me find exactly where the first character of the flag is validated.
We start by creating an empty list that will contain all addresses (RIPs) where the debugger will suspend the process due to a hardware read/write breakpoint. Next, in the Prepare() function, you can see that I modify the address parameter supplied for mprotect and the RAX register in the CALL instruction (Figure 4). I do this because IDA crashes at a later stage (second script) when trying to deal with hardware breakpoints. I have no idea why that is.
When returning from the function, the debugger is still suspended on CALL RAX and we can extract the pointer to our string by dereferencing the pointer in RDX.
Subsequently, we set the hardware Read/Write breakpoint to the first character of our string. Additionally, right after that we enter a loop that keeps adding (RIP) addresses to the list. The loop continues until the process terminates (event != BREAKPOINT). Finally, we clear the hardware read/write breakpoint and print the last address where the hardware breakpoint was hit. In other words, the code that checks the validity of the given input.
This is the output we get from IDA:
So now we need to figure out what the function containing the instruction at 0x40c336 does. Additionally, we will need to set a hardware execution breakpoint at this address (software breakpoints are not considered).
Once we hit that breakpoint, all we have to do is get IDA to interpret the surrounding bytes as instructions and then see what the function does.
Below is the python equivalent of an algorithm that checks if two characters match a flag:
As you can see, the algorithm checks that every two characters are valid using two hard-coded double words, which I named dword1 and dword2. Each function checks ten characters, except the last one, which only does it for 4. In other words, each function uses the last code five times, except the last one, which uses it for 2. This means that if you want to do this manually, you will have to repeat the same task copy and paste various double words into your brute force script 167 times; This is very bad in the middle of a CTF competition.
What we need to do now is modify the python function so that it can brute force the result variable using i and j; Which are the first and second characters.
Then after that, we need to execute a second python script that will write the flag to a file:
What the script does is pretty obvious. The only part that needs to be clarified is the two offsets added to RIP and RBP:
- The first DWORD is always stored in a local variable [RBP-3Ch]
- The second DWORD is hardcoded into the instruction operand. However, it is always located at address RIP+0x54. RIP is where the debugger pauses the process after triggering a HW breakpoint.
As a result of running this script, we get a file containing the flag:
We can easily concatenate both scripts into one and run it only once under IDA .
Links & References:
: Binary file download: https://goo.gl/MhVl0g
: Full script: https://goo.gl/i4wSWl
IDAPython docs: https://www.hex-rays.com/products/ida/support/idapython_docs/