Hooking System Calls Through MSRs 2023
In this article we will learn about Hooking System Calls Through MSRs.
What is Hooking System Calls Through MSRs?
In this article, we present the details of using the sysenter instruction to call from user mode into kernel mode. On older versions of Windows operating systems, the “int 0x2e” interrupt was used instead, but newer systems use sysenter. The “int 0x2e” instruction uses the 0x2e interrupt descriptor from the interrupt descriptor table (IDT), while the system call number is passed in the eax register. The sysenter instruction, on the other hand, switches from user to kernel mode more quickly than “int 0x2e”. It uses the model-specific registers (MSRs) specified below for its operation. MSRs are control registers in an x86 machine used for debugging, program execution tracing, performance monitoring, and toggling certain CPU features [1].
We can read and write MSRs using the rdmsr and wrmsr instructions, which are privileged and must be executed in kernel mode. The following MSRs [2] are used when executing the sysenter instruction:
Target code segment: reads it from IA32_SYSENTER_CS.
Target instruction: reads it from IA32_SYSENTER_EIP.
Stack segment: calculated by adding 8 to the value in IA32_SYSENTER_CS.
Stack pointer: reads it from IA32_SYSENTER_ESP.
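The four lookups above can be sketched in plain C. This is a user-mode model for illustration only, not code from the driver: the SysenterMsrs and EntryState types and the sysenter_entry_state function are hypothetical names, and the MSR addresses 0x174 and 0x175 for IA32_SYSENTER_CS and IA32_SYSENTER_ESP come from the Intel manual rather than from this article. It assumes the documented behavior that the processor clears the two low (privilege) bits of the code selector before using it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three SYSENTER MSRs. */
typedef struct {
    uint32_t sysenter_cs;   /* IA32_SYSENTER_CS  (MSR 0x174) */
    uint32_t sysenter_esp;  /* IA32_SYSENTER_ESP (MSR 0x175) */
    uint32_t sysenter_eip;  /* IA32_SYSENTER_EIP (MSR 0x176) */
} SysenterMsrs;

/* Register state right after sysenter, as described in the list above. */
typedef struct {
    uint16_t cs;
    uint16_t ss;
    uint32_t esp;
    uint32_t eip;
} EntryState;

EntryState sysenter_entry_state(const SysenterMsrs *m)
{
    EntryState s;
    assert(m != NULL);
    s.cs  = (uint16_t)(m->sysenter_cs & 0xFFFCu); /* selector with RPL cleared */
    s.ss  = (uint16_t)(s.cs + 8);                 /* the next descriptor */
    s.esp = m->sysenter_esp;
    s.eip = m->sysenter_eip;
    return s;
}
```

For example, with IA32_SYSENTER_CS set to 0x8 the model yields CS = 0x8 and SS = 0x10.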
All MSR registers for the Intel IA-32 architecture can be found in [3], but the relevant registers are shown in the figure below. We can see which bits are used for certain purposes, allowing us to store appropriate values in them while overwriting their values.

The columns in the image above are shown as follows:
1: Hexadecimal representation of the register address.
2: Decimal representation of register address.
3: MSR architectural name and bit fields.
4: Description of MSR/bit.
5: The processor model in which the register was introduced as an architectural MSR.
When using the sysenter instruction, the CS register is filled with IA32_SYSENTER_CS, while IA32_SYSENTER_ESP is loaded into the ESP register and IA32_SYSENTER_EIP is loaded into the EIP register. In addition, the SS register is overwritten with the value IA32_SYSENTER_CS+8. Execution then continues at the instruction at CS:EIP, which begins handling the system call. Since the processor only needs to read the MSRs and write to the ESP, EIP, SS, and CS registers, switching from user to kernel mode is very fast, especially compared to the “int 0x2e” interrupt.
Environment settings
Now let’s actually hook IA32_SYSENTER_EIP, whose value is stored in MSR 0x176. To do this, we first start the Windows operating system in debug mode. We can do this by executing the following commands in Windows cmd.exe with administrator privileges. The commands below configure Windows to start in debug mode, where we will be able to debug it over the serial port.
[simple]
bcdedit /set debug on
bcdedit /set debugtype serial
bcdedit /set debugport 1
bcdedit /set baudrate 115200
bcdedit /set {bootmgr} displaybootmenu yes
bcdedit /timeout 10
[/simple]
In order to debug the Windows OS, we first need to start another Windows VM with WinDbg installed and go to File – Kernel Debugging and accept the default settings as shown below. If we didn’t use the exact same commands as above, we need to change the settings in the Kernel Debug dialog accordingly.

After pressing OK, WinDbg will listen for incoming connections on the serial port. Since we’ve set up the Windows operating system on the second VM to connect to the same serial port, we’ll be able to debug it from the WinDbg debugger running on the first VM. Additionally, we will be able to monitor the execution of the entire operating system, not just user-mode code. When debugging with IDA Pro, OllyDbg, or Immunity Debugger, we do not see the execution of kernel-mode instructions located at virtual addresses 0x80000000-0xFFFFFFFF; we skip right over them because we’re running the debugger in user mode. In this case, we specifically instructed the Windows operating system to connect to the serial port where the WinDbg debugger waits for incoming connections. Therefore, we can easily debug instructions in both user mode and kernel mode. This also allows us to trace privileged instructions like rdmsr and wrmsr that would otherwise be out of the debugger’s reach.
At this point we have effectively started the Windows operating system in debug mode and can start/stop it at will using the WinDbg debugger. First, we pause Windows execution by clicking Debug – Break in WinDbg as shown below. This effectively stops the debugged Windows OS and gives us a chance to run WinDbg commands.

Once we break into the system, we can enter WinDbg commands at the “kd>” prompt, as seen in the picture below.

Getting and Setting the MSR Register Values
When hooking the sysenter instruction, which uses MSR 0x176, we must first save the old value of MSR 0x176 (IA32_SYSENTER_EIP). We can read the contents of a model-specific register using the rdmsr instruction, which loads the 64-bit MSR specified in the ECX register into the EDX:EAX registers. The high-order 32 bits of the MSR are loaded into EDX, while the low-order 32 bits are loaded into EAX. We must execute the rdmsr instruction in privileged mode and store a valid MSR address in the ECX register; otherwise a general protection exception [4] is raised. This can be done using the GetMSRAddress function below, which takes as input the number of the MSR whose value we would like to read and returns its value.
[simple]
MSR GetMSRAddress(UINT32 reg) {
    MSR msraddr;
    UINT32 lowvalue;
    UINT32 highvalue;
    /* read the MSR selected by reg into edx:eax */
    __asm {
        push eax;
        push ecx;
        push edx;
        mov ecx, reg;
        rdmsr;
        mov lowvalue, eax;
        mov highvalue, edx;
        pop edx;
        pop ecx;
        pop eax;
    }
    msraddr.value_low = lowvalue;
    msraddr.value_high = highvalue;
    DbgPrint("Address of MSR entry %x is: %x.\r\n", reg, msraddr.value_low);
    /* store the old MSR address in global variables so we can use it later */
    oldMSRAddressL = msraddr.value_low;
    oldMSRAddressH = msraddr.value_high;
    return msraddr;
}
[/simple]
In the GetMSRAddress function, we first declare two local variables, lowvalue and highvalue, which hold the lower and upper 32 bits of the MSR value. The assembly block saves the eax, ecx, and edx registers on the stack, so we can safely overwrite them without unwanted side effects. After the push instructions, we load the number of the MSR we want to read into the ecx register. We then execute the rdmsr instruction, which reads the MSR selected by ecx and stores its value in the edx:eax registers. We then copy the values from the edx and eax registers into the local variables so that we can use them in the C++ code after the assembly block completes. Finally, we assign the 64-bit value to the MSR object and also store it in the global variables oldMSRAddressL and oldMSRAddressH so we can reuse it later. The function returns an MSR object holding the value read from the MSR.
After getting the value from the MSR, we need to overwrite it with the address of our function, which can be done using the wrmsr instruction. The wrmsr instruction writes the contents of the EDX:EAX registers to the 64-bit MSR specified in the ECX register. The high-order 32 bits are copied from EDX and the low-order 32 bits are copied from EAX. We must execute the wrmsr instruction in privileged mode and store a valid MSR address in the ECX register; otherwise a general protection exception [5] is raised. This can be done using the function below:
[simple]
void SetMSRAddress(UINT32 reg, PMSR msr) {
    UINT32 lowvalue;
    UINT32 highvalue;
    lowvalue = msr->value_low;
    highvalue = msr->value_high;
    /* write edx:eax into the MSR selected by reg */
    __asm {
        push eax;
        push ecx;
        push edx;
        mov ecx, reg;
        mov eax, lowvalue;
        mov edx, highvalue;
        wrmsr;
        pop edx;
        pop ecx;
        pop eax;
    }
    DbgPrint("Address of MSR entry %x is hooked: %x.\r\n", reg, msr->value_low);
}
[/simple]
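The EDX:EAX convention used by both functions amounts to splitting and joining a 64-bit value. A minimal user-mode sketch, with hypothetical names (MsrHalves, msr_split, msr_join) that are not part of the driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t value_low;   /* low 32 bits, the EAX half */
    uint32_t value_high;  /* high 32 bits, the EDX half */
} MsrHalves;

/* Split a 64-bit MSR value the way rdmsr reports it. */
MsrHalves msr_split(uint64_t value)
{
    MsrHalves h;
    h.value_low  = (uint32_t)(value & 0xFFFFFFFFu);
    h.value_high = (uint32_t)(value >> 32);
    return h;
}

/* Rebuild the 64-bit value the way wrmsr consumes it. */
uint64_t msr_join(const MsrHalves *h)
{
    assert(h != NULL);
    return ((uint64_t)h->value_high << 32) | h->value_low;
}
```

Splitting 0x00000000988cf110, for instance, gives value_high = 0 and value_low = 0x988cf110, which is why the driver can get away with changing only the low half on a 32-bit system.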
To display the IA32_SYSENTER_CS, IA32_SYSENTER_EIP and IA32_SYSENTER_ESP values in the WinDbg debugger, we can use the rdmsr command. We can see their values in the image below, where it is clear that IA32_SYSENTER_EIP holds the address 0x82682300.

Let’s look at the first few instructions of the function at that address, which is the KiFastCallEntry routine. In the image below we see the instructions that are executed first when a system call is made.
The function basically loads the value 0x30 into the FS register and the value 0x23 into the DS and ES registers. These are the segment registers that will be used by the KiFastCallEntry routine. Segments can be printed using the command “dg 0 f0” as seen below, where selector 0x0008 specifies a kernel-mode code segment with base address 0x00000000 and limit 0xffffffff.

We need to look at the segment selector 0x0023 with its privilege-level bits cleared (0x0023 XOR 0x3, since the requested privilege level is 0x3 in this case); so we need to look at the entry for 0x0020, which specifies a user-mode data segment with a base address of 0x00000000 and a limit of 0xffffffff. These instructions are there to ensure that the correct segments are used for reads and writes while the system call is handled.
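The selector arithmetic can be made explicit with a small user-mode sketch. The names SelectorFields, decode_selector, and selector_table_offset are hypothetical; the bit layout (index in bits 15..3, table indicator in bit 2, requested privilege level in bits 1..0) is the standard x86 segment selector format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned index;  /* descriptor index (bits 15..3) */
    unsigned ti;     /* table indicator (bit 2): 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level (bits 1..0) */
} SelectorFields;

void decode_selector(uint16_t selector, SelectorFields *out)
{
    assert(out != NULL);
    out->index = (unsigned)selector >> 3;
    out->ti    = ((unsigned)selector >> 2) & 1u;
    out->rpl   = (unsigned)selector & 3u;
}

/* Byte offset of the descriptor inside its table: the selector with
   the three low bits cleared (each descriptor is 8 bytes long). */
uint16_t selector_table_offset(uint16_t selector)
{
    return (uint16_t)(selector & 0xFFF8u);
}
```

Decoding 0x0023 gives index 4, the GDT, and privilege level 3, with a table offset of 0x0020; decoding 0x0030 gives index 6 with privilege level 0.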
Hooking the MSR
Now let’s present the actual code we used to implement the MSR hook. First, we need to introduce the new MSR data type that we added to the program. The MSR structure is shown below and has two UINT32 members, value_low and value_high. Together they represent the contents of an MSR, which is 64 bits in size; value_low holds the lower 32 bits and value_high holds the upper 32 bits.
[simple]
#pragma pack(1)
typedef struct _MSR {
    UINT32 value_low;
    UINT32 value_high;
} MSR, *PMSR;
#pragma pack()
[/simple]
We have defined MSR, which is a _MSR structure, and PMSR, which is a pointer to a _MSR structure. There is also a #pragma directive that accepts a parameter – in this case the number 1 – which defines that the members of the structure are aligned on a 1-byte boundary. Therefore, when declaring a new instance of a structure, no extra bytes are used for padding to align members on a 4-byte boundary, as is the default on a 32-bit architecture.
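To see what 1-byte packing changes, the layout can be checked in a small user-mode program. This is only a sketch: PackedMsr mirrors the structure above, while PackedMixed and DefaultMixed are hypothetical structures added for contrast. It assumes a compiler that supports #pragma pack with push and pop, as MSVC, GCC, and Clang do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint32_t value_low;
    uint32_t value_high;
} PackedMsr;

typedef struct {
    uint8_t  tag;
    uint32_t value;
} PackedMixed;      /* hypothetical, only here for contrast */
#pragma pack(pop)

typedef struct {
    uint8_t  tag;
    uint32_t value;
} DefaultMixed;     /* same members, default alignment */

/* Returns how many padding bytes the compiler added to DefaultMixed. */
size_t default_mixed_padding(void)
{
    assert(sizeof(PackedMsr) == 8);    /* two 32-bit members, no gaps */
    assert(sizeof(PackedMixed) == 5);  /* 1 + 4 bytes under pack(1) */
    return sizeof(DefaultMixed) - sizeof(PackedMixed);
}
```

A structure made of two 32-bit members occupies 8 bytes with or without the directive, since both members are already naturally aligned; packing only changes the layout when members of different sizes are mixed, as in the contrast structure.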
Next, we define two global variables, oldMSRAddressL and oldMSRAddressH, which are used to hold the old address of the overwritten MSR register. When overwriting an address in the MSR register, we need to save the old address so we can jump to it in the hooked routine. This is necessary because we need to call the actual routine that would be called if we hadn’t overridden it. If we don’t save the old address and jump to it at the end of the hook routine, then it would be as if the system calls were never called, which would definitely have unwanted side effects like the system becoming non-interactive.
[simple]
/* Global variables to store the old MSR address. */
UINT32 oldMSRAddressL = 0;
UINT32 oldMSRAddressH = 0;
[/simple]
The function that actually overwrites the pointer stored in the MSR is HookMSR, seen below. The function accepts two parameters: reg, which specifies the MSR we want to hook, and hookaddr, which specifies the address of the hook routine that will be called when the sysenter instruction is executed.
[simple]
void HookMSR(UINT32 reg, UINT32 hookaddr) {
    MSR msraddr;
    /* check if the MSR has already been hooked */
    msraddr = GetMSRAddress(reg);
    if(msraddr.value_low == hookaddr) {
        DbgPrint("MSR register %x already hooked.\r\n", reg);
    }
    else {
        DbgPrint("Hooking MSR register %x: %x --> %x.\r\n", reg, msraddr.value_low, hookaddr);
        msraddr.value_low = hookaddr;
        SetMSRAddress(reg, &msraddr);
    }
}
[/simple]
The msraddr object receives the value of the MSR specified by reg from the GetMSRAddress function. If the address stored in the selected MSR already equals hookaddr, then the MSR has already been hooked. Otherwise, the hook is installed by calling the SetMSRAddress function.
HookMSR is called inside the DriverEntry routine, which is called when the driver is loaded into the kernel. You can see the source code for the entire DriverEntry routine below, but we won’t describe it in detail. If you want details, you need to check out this article I wrote earlier. The important addition is the last line, where the HookMSR function is actually called. Note that we pass the number 0x176 as the first parameter, which identifies the IA32_SYSENTER_EIP MSR. The second parameter specifies a pointer to the hook function that will be called when sysenter is invoked.
[simple]
NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING pRegistryPath) {
    NTSTATUS NtStatus = STATUS_SUCCESS;
    unsigned int uiIndex = 0;
    PDEVICE_OBJECT pDeviceObject = NULL;
    UNICODE_STRING usDriverName, usDosDeviceName;
    DbgPrint("DriverEntry Called \r\n");
    RtlInitUnicodeString(&usDriverName, L"\\Device\\MyDriver");
    RtlInitUnicodeString(&usDosDeviceName, L"\\DosDevices\\MyDriver");
    NtStatus = IoCreateDevice(pDriverObject, 0, &usDriverName, FILE_DEVICE_UNKNOWN, FILE_DEVICE_SECURE_OPEN, FALSE, &pDeviceObject);
    if(NtStatus == STATUS_SUCCESS) {
        /* MajorFunction: a list of function pointers for entry points into the driver. */
        for(uiIndex = 0; uiIndex < IRP_MJ_MAXIMUM_FUNCTION; uiIndex++)
            pDriverObject->MajorFunction[uiIndex] = MyDriver_UnSupportedFunction;
        /* DriverUnload is required to dynamically unload the driver. */
        pDriverObject->DriverUnload = MyDriver_Unload;
        pDeviceObject->Flags |= 0;
        pDeviceObject->Flags &= (~DO_DEVICE_INITIALIZING);
        /* Create a symbolic link to the device. \DosDevices\MyDriver -> \Device\MyDriver */
        IoCreateSymbolicLink(&usDosDeviceName, &usDriverName);
        /* MSR hook */
        HookMSR(0x176, (UINT32)HookRoutine);
    }
    return NtStatus;
}
[/simple]
We should also introduce a MyDriver_Unload function that is called when the driver is unloaded from the kernel. In the function, we need to call the IoDeleteSymbolicLink and IoDeleteDevice functions, but in our case, the most important function call is again the HookMSR function call. You might be wondering why this function is called when the kernel driver is being unloaded. The answer is simple: we have to clean up after ourselves. The best way to do this is to call the existing function and pass it the old MSR address, which points to the original routine. It is imperative that we reset the MSR to the old value when unloading the driver. If we didn’t, the system would crash because it would try to call a HookRoutine function that is no longer loaded, so the pointer stored in MSR 0x176 would point to undefined memory. By resetting the pointer to the old value, we allow the system to continue making sysenter system calls without crashing.
[simple]
VOID MyDriver_Unload(PDRIVER_OBJECT DriverObject) {
    /* local variables */
    UNICODE_STRING usDosDeviceName;
    /* reset the hook */
    if(oldMSRAddressL != 0 || oldMSRAddressH != 0) {
        HookMSR(0x176, (UINT32)oldMSRAddressL);
    }
    /* delete driver */
    DbgPrint("MyDriver_Unload Called \r\n");
    RtlInitUnicodeString(&usDosDeviceName, L"\\DosDevices\\MyDriver");
    IoDeleteSymbolicLink(&usDosDeviceName);
    IoDeleteDevice(DriverObject->DeviceObject);
}
[/simple]
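The save, overwrite, and restore sequence performed on load and unload can be modeled in user mode. The sketch below is an illustration under stated assumptions, not the driver’s code: sim_msr is an ordinary array standing in for the real registers, and sim_rdmsr, sim_wrmsr, hook_install, and hook_remove are hypothetical names. Unlike the driver, it tracks the hooked state with an explicit flag and saves the original value only once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_MSR_COUNT 0x200u

static uint64_t sim_msr[SIM_MSR_COUNT]; /* stand-in for the real MSRs */
static uint64_t saved_value;            /* original value of the hooked MSR */
static int      is_hooked;

uint64_t sim_rdmsr(uint32_t reg)
{
    assert(reg < SIM_MSR_COUNT);
    return sim_msr[reg];
}

void sim_wrmsr(uint32_t reg, uint64_t value)
{
    assert(reg < SIM_MSR_COUNT);
    sim_msr[reg] = value;
}

/* Returns 1 if the hook was installed, 0 if one was already in place. */
int hook_install(uint32_t reg, uint64_t hook_address)
{
    if (is_hooked)
        return 0;
    saved_value = sim_rdmsr(reg);   /* remember the original target */
    sim_wrmsr(reg, hook_address);
    is_hooked = 1;
    return 1;
}

/* Returns 1 if the original value was restored, 0 if nothing was hooked. */
int hook_remove(uint32_t reg)
{
    if (!is_hooked)
        return 0;
    sim_wrmsr(reg, saved_value);
    is_hooked = 0;
    return 1;
}
```

Keeping the saved value separate from the read helper means a second read of an already hooked register cannot overwrite the original address.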
Finally, we also need to introduce our HookRoutine function, which will serve as the hook for MSR 0x176. The code of the function can be seen below, and we can immediately notice that the function is naked. This means that the compiler will not add any instructions to set up or tear down the function’s stack frame, such as “push ebp”, “mov ebp, esp”, etc. In the function itself, we have one __asm {} block that contains the assembler instructions. These instructions first save the following registers on the stack: eax, ecx, edx, ebx, esp, ebp, esi, edi (pushad instruction) and also the eflags register (pushfd instruction). We then execute the five assembly instructions from the beginning of the KiFastCallEntry routine that we identified earlier; they set the correct segment registers. Then we put the parameter of the DebugPrint routine on the stack (the push eax instruction) and call the DebugPrint routine, which prints a message in WinDbg using the DbgPrint function. Finally, we restore the eflags register (popfd instruction) and the registers edi, esi, ebp, esp, ebx, edx, ecx, eax (popad instruction), and jump to the original routine whose address is stored in oldMSRAddressL.
[simple]
__declspec(naked) void HookRoutine() {
    __asm {
        pushad;
        pushfd;
        mov ecx, 0x23
        push 0x30
        pop fs
        mov ds, cx
        mov es, cx
        push eax;
        call DebugPrint;
        popfd;
        popad;
        jmp oldMSRAddressL;
    }
}
[/simple]
We mentioned that the HookRoutine function calls the DebugPrint function, the code of which is shown below. We can see that the function just prints a message saying that it is inside the hook routine.
[simple]
void DebugPrint(UINT32 d) {
    DbgPrint("[*] Inside Hook routine - dispatch %d called", d);
    return;
}
[/simple]
Loading the driver
In the previous section we introduced the code used to hook MSR 0x176, and here we will see the driver in action. To do this, we must first compile the driver and transfer the mydriver.sys file to the Windows operating system. We also need to download two tools that are invaluable when testing Windows kernel drivers: DebugView and OSR Driver Loader. The OSR Driver Loader can be used to load a driver into the kernel, at which point its DriverEntry function will run. The OSR Driver Loader can be seen in the image below, where we have selected the driver to load (mydriver.sys) in the “Driver Path” field. Once we have selected the driver, we need to click on the ‘Register Service’ button and then on the ‘Start Service’ button. When the service starts, the file mydriver.sys is loaded into the kernel and the DriverEntry function is called. As a result, MSR 0x176 is hooked, and whenever the sysenter instruction is executed, our HookRoutine is called.

To see this in action, we also need to run DebugView and view the messages printed by DbgPrint. Note that both OSR Driver Loader and DebugView must be run with administrator privileges.
After loading the driver into the kernel, the following text is printed to WinDbg, clearly showing that the DriverEntry function has been called. The address of the 0x176 MSR entry is 0x8267e300 and has been replaced by 0x9caa3100, which is the address of our HookRoutine function.

Once MSR 0x176 is hooked, our HookRoutine is called every time a sysenter instruction is executed. Because the HookRoutine runs the DebugPrint function, a new line is printed in DebugView each time sysenter is run. Because the sysenter instruction is the primary way of calling from user mode into kernel mode in modern Windows operating systems, the output below appears right after the driver is loaded. Notice the scrollbar on the right: 1023 entries are generated by our HookRoutine function in a few seconds. Our HookRoutine function has just caused the system to become completely unresponsive; at this point we can’t really do anything in the debugged Windows system because it is busy executing the DbgPrint calls in the DebugPrint function.

To prevent this, we should somehow filter the messages so that DbgPrint only runs on the specific system calls we are interested in. To display every 1000th occurrence of the system call, we can limit the DbgPrint function using the code below. We have defined another global variable numActions which is decremented by 1 every time sysenter is called. When it reaches 0, which happens every 1000th occurrence, a debug message is printed and numActions is reset to 1000.
[simple]
UINT16 numActions = 1000;
void DebugPrint(UINT32 d) {
    if(numActions == 0) {
        DbgPrint("[*] Inside Hook routine - sending %d calls.\r\n", d);
        numActions = 1000;
    }
    else {
        numActions--;
    }
    return;
}
[/simple]
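The same throttling idea can be written as a small reusable counter. This is a user-mode sketch with hypothetical names (Throttle, throttle_init, throttle_should_report), not the driver’s code, and it is not synchronized for use on several processors at once. One difference from the listing above: because that listing tests for zero before decrementing, it prints on every 1001st call, while this sketch reports on exactly every period-th call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t period;  /* report once per this many calls */
    uint32_t seen;    /* calls since the last report */
} Throttle;

void throttle_init(Throttle *t, uint32_t period)
{
    assert(t != NULL && period > 0);
    t->period = period;
    t->seen = 0;
}

/* Returns 1 on every period-th call and 0 otherwise. */
int throttle_should_report(Throttle *t)
{
    assert(t != NULL);
    t->seen++;
    if (t->seen < t->period)
        return 0;
    t->seen = 0;
    return 1;
}
```

In a hook routine, the expensive print call would be made only when throttle_should_report returns 1.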
After we reload the driver, the DbgPrint function runs only once in a while, about every second. The result can be seen in the image below. At this point, the Windows operating system is perfectly usable and we can interact with it normally.

Let’s also see how the MSR was overwritten by executing the rdmsr command.
[simple]
kd> rdmsr 176
msr[176] = 00000000`988cf110
[/simple]
MSR 0x176 contains the address 0x988cf110, which points to our HookRoutine function. Below we see the instructions of this function as displayed by the u command. Note that the assembly instructions are the same as we coded them in the __asm{} block; since this is a naked function, the compiler did not add instructions for handling stack frames.

If we stop the service in OSR Driver Loader, the original value of MSR 0x176 is restored, as seen below. We can start and stop the driver any number of times because the driver is written in a way that gracefully handles hooking and unhooking the MSR.
[simple]
kd> rdmsr 176
msr[176] = 00000000`8264b300
[/simple]
Handling multiple processors
At this point we also need to talk about multiprocessor systems: when multiple processors are present in a system, each processor has its own set of MSRs, which normally hold pointers to the same routines. So when sysenter is executed on CPU 1 or CPU 2, the same routine runs. When we hook an MSR, we therefore need to do so on every processor, so that the same hook routine is called regardless of which processor executes the instruction. We can do this using one of the methods below:
Infinite Loop: we can run threads in an infinite loop that will sooner or later run the hook thread on all processors; this is because the scheduler assigns certain threads to run on the processor that is currently free.
KeSetAffinityThread: this function allows us to set the affinity mask of the executing thread, which in turn lets us make a specific thread run on a specific processor. In this article, we will use this method, which is much better than the previous one, but a little more complicated.
Hooking the MSR on all processors is quite easy, as all we have to do is create a thread that calls the HookMSR routine and wait for that thread to run on a particular processor. To do this, we will need several functions, which we will describe below.
The first function is InitializeObjectAttributes, which initializes the OBJECT_ATTRIBUTES structure passed to the function as its first parameter. You can see the syntax of the InitializeObjectAttributes function below.

The function accepts the following parameters [7]:
InitializedAttributes: specifies the OBJECT_ATTRIBUTES structure to be initialized.
ObjectName: A pointer to a unicode string that contains the name of the object for which to open a handle.
Attributes: specifies the flags to use, which can be the following:
OBJ_INHERIT: handle can be inherited by child processes.
OBJ_PERMANENT: objects will not be removed when all open handles are closed.
OBJ_EXCLUSIVE: only one handle can be opened for this object.
OBJ_CASE_INSENSITIVE: Case-insensitive name matching is used when comparing ObjectName with other object names.
OBJ_OPENIF: the routine opens the object when the object exists.
OBJ_KERNEL_HANDLE: handle can only be accessed from kernel mode.
OBJ_FORCE_ACCESS_CHECK: access checks are enforced when the handle is opened.
RootDirectory: The handle to the object’s root directory for the pathname specified in ObjectName.
SecurityDescriptor: specifies the security descriptor to be used with the object. If NULL, the default security descriptor is used.
InitializeObjectAttributes initializes the OBJECT_ATTRIBUTES structure that is used to hold the object’s properties. The following call is used in our kernel driver:
[simple]
HANDLE thread;
OBJECT_ATTRIBUTES attrs;
PKTHREAD pkthread;
LARGE_INTEGER timeout;
InitializeObjectAttributes(&attrs, NULL, 0, NULL, NULL);
[/simple]
Basically, we pass the OBJECT_ATTRIBUTES attrs parameter to the InitializeObjectAttributes routine by specifying NULL ObjectName, NULL RootDirectory, and NULL SecurityDescriptor. Because 0 is passed as the Attributes parameter, none of the specified attributes are set for this object.
Another function is PsCreateSystemThread, which creates a system thread that runs in the kernel and returns a handle to the thread. The function syntax can be seen below [8].

PsCreateSystemThread takes as input the following parameters [8]:
ThreadHandle: a pointer to the variable where the handle will be stored. Once we no longer need the handle to the thread, we need to close it using the ZwClose function.
DesiredAccess: specifies the access the thread would like to have. Normally we would have to look at [9] to see the available access masks, but since we are creating a new thread, we can also use the access masks specified in [10] which are as follows. I didn’t specifically describe what each of the access masks means, because we will use THREAD_ALL_ACCESS, which will give us all possible access rights for the thread object.
SYNCHRONIZE
THREAD_ALL_ACCESS
THREAD_DIRECT_IMPERSONATION
THREAD_GET_CONTEXT
THREAD_IMPERSONATE
THREAD_QUERY_INFORMATION
THREAD_QUERY_LIMITED_INFORMATION
THREAD_SET_CONTEXT
THREAD_SET_INFORMATION
THREAD_SET_LIMITED_INFORMATION
THREAD_SET_THREAD_TOKEN
THREAD_SUSPEND_RESUME
THREAD_TERMINATE
ObjectAttributes: points to a previously created structure specifying object attributes.
ProcessHandle: specifies an open process handle in whose address space the thread will run. Since we are programming the driver, we should enter NULL here.
ClientId: a pointer to a structure where the client identifier for the new thread will be stored. Since we are programming the driver, we should enter NULL here.
StartRoutine: the entry point that will be executed when the thread starts.
StartContext: the argument passed to the thread when the thread starts.
In our code we call PsCreateSystemThread as shown below.
[simple]
PsCreateSystemThread(&thread, THREAD_ALL_ACCESS, &attrs, NULL, NULL, (PKSTART_ROUTINE)AllCPUsInfiniteLoop, (PVOID)hookaddr);
[/simple]
The next function we need to look at is ObReferenceObjectByHandle, which validates access to an object handle. If access is granted, it returns STATUS_SUCCESS, otherwise it may return one of the following error codes: STATUS_OBJECT_TYPE_MISMATCH, STATUS_ACCESS_DENIED, or STATUS_INVALID_HANDLE [11].

The parameters passed to the ObReferenceObjectByHandle function are explained below [11]:
Handle: specifies an open handle for the object.
DesiredAccess: specifies a request to access the object, which has a value of THREAD_ALL_ACCESS, because we would like to have full access to the object.
ObjectType: a pointer to the object type, which can be NULL, in which case the operating system will not verify that the supplied object type matches the object specified by the Handle argument.
AccessMode: specifies the access mode to use for access control, which must be set to UserMode or KernelMode.
Object: a pointer to a variable that receives the body of the object. Depending on the ObjectType parameter, we can specify one of the following pointer types:

HandleInformation: since we are programming a kernel driver, we need to set this to NULL.
In our code, we use the following call to the ObReferenceObjectByHandle function by passing it an open handle for the thread. Calling the function stores a pointer to the body of the object in the pkthread argument.
[simple]
ObReferenceObjectByHandle(thread, THREAD_ALL_ACCESS, NULL, KernelMode, &pkthread, NULL);
[/simple]
Another function is KeWaitForSingleObject, which puts the current thread into a waiting state until the dispatcher object is set to a signaled state or until the wait times out. The function syntax can be seen below [12].

The parameters passed to the KeWaitForSingleObject function are as follows [12]:
Object: A pointer to an initialized dispatcher object, which can be an event, mutex, semaphore, thread, or timer.
WaitReason: specifies the reason for waiting. Since we are programming the kernel driver, we should set this to Executive.
WaitMode: specifies whether the caller waits in UserMode or KernelMode. Since we are calling from the kernel driver, we should specify KernelMode.
Alertable: Specifies whether the wait is alertable (TRUE) or not (FALSE). Since the wait is not alertable in our case, we should use FALSE.
Timeout: Specifies a pointer to a timeout value that gives the time to wait in units of 100 nanoseconds. If the value is 0, the routine returns without waiting, whereas if the pointer is NULL it waits indefinitely until the dispatcher object is signaled.
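The 100-nanosecond unit is easy to get wrong, so here is a minimal sketch of the conversion. The helper name set_relative_timeout_ms and the UNITS_PER_MS constant are hypothetical, and a plain 64-bit integer stands in for the QuadPart member of a LARGE_INTEGER. It relies on the documented convention that a negative timeout is an interval relative to the current time, while a positive one is an absolute time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNITS_PER_MS 10000  /* 1 ms = 10,000 intervals of 100 ns */

/* Stores a relative timeout of `ms` milliseconds in *quad_part.
   Negative values mean "relative to now"; 0 means "do not wait". */
void set_relative_timeout_ms(int64_t *quad_part, uint32_t ms)
{
    assert(quad_part != NULL);
    *quad_part = -((int64_t)ms * UNITS_PER_MS);
}
```

A half-second wait, for example, corresponds to the value -5000000.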
In our kernel driver, we use the following KeWaitForSingleObject function call by passing the previously created pkthread pointer to the body of the object.
[simple]
KeWaitForSingleObject(pkthread, Executive, KernelMode, FALSE, &timeout);
[/simple]
Another function is KeQueryActiveProcessors, which returns a bitmask of the currently active processors; its syntax is shown below [13]. Note that the function takes no parameters and returns a KAFFINITY value that represents the set of currently active processors. The KAFFINITY type is a type definition for ULONG_PTR, which is 32 bits wide on 32-bit Windows systems and 64 bits wide on 64-bit Windows systems; each bit in the value is set to 1 if the corresponding processor is active.
The KeGetCurrentThread routine identifies the current thread and returns a pointer to the PKTHREAD thread object. In our code, we also defined KeSetAffinityThread as a function with the stdcall calling convention. When using KeSetAffinityThread, we need to pass two parameters: a thread of type PKTHREAD and an affinity of type KAFFINITY.
[simple]
typedef NTSTATUS (__stdcall * KeSetAffinityThread)(
    PKTHREAD thread,
    KAFFINITY affinity
);
[/simple]
In the AllCPUs function, we use KeSetAffinityThread as follows. Basically, we call MmGetSystemRoutineAddress to return a pointer to the function specified by the str variable, which in our case is set to “KeSetAffinityThread”. When the routine name can be resolved, a pointer to that routine is returned. Otherwise, NULL is returned. Drivers typically use this to determine whether a routine is available in a particular version of Windows and can be used for routines exported by the kernel or HAL and not for any routine defined by the driver [14].
[simple]
KeSetAffinityThread KeSetAffinityThreadObj;
UNICODE_STRING str;
RtlInitUnicodeString(&str, L"KeSetAffinityThread");
KeSetAffinityThreadObj = (KeSetAffinityThread)MmGetSystemRoutineAddress(&str);
[/simple]
We just used the undocumented KeSetAffinityThread function, which is exported by the kernel or HAL and must be resolved dynamically. Usually, when we dynamically link to a function in a DLL, we must first call LoadLibrary to load the DLL into the virtual address space, and then we must find the function by traversing the list of exported functions. Since we are now in kernel mode, the MmGetSystemRoutineAddress function makes this easier for us. By calling MmGetSystemRoutineAddress, we get the address of the KeSetAffinityThread routine practically for free.
If we follow the AllCPUs routine in WinDbg, this translates into the following assembly instructions. The highlighted instruction is a call to the MmGetSystemRoutineAddress function; the “KeSetAffinityThread” string is stored at stack address [ebp-1c] and the value of KeSetAffinityThreadObj at stack address [ebp-14].

After that we’re calling the [ebp-14] address, which is the KeSetAffinityThreadObj object and holds a pointer to the KeSetAffinityThread kernel routine. On the picture below we can see where the KeSetAffinityThread is called.

If we place an “int 3” instruction before that call in our C++ code, the debugger will stop at that point when the driver is loaded into the kernel. This is shown in the picture below.

If we execute the u command on the current address, where the “int 3” instruction is located, we’ll see that we’re at the same code we located previously. Don’t be bothered by the virtual address being different; this is because we’ve reloaded the driver, so it was loaded at a different location in memory.

Let’s now step through the next four instructions, which set up the parameters before “KeSetAffinityThreadObj(thread, curCPU)” is called. In the picture below, we can see that the first value pushed is in edx, which corresponds to curCPU (stdcall arguments are pushed from right to left), and the second value pushed is the thread pointer.

Just before executing the call instruction, let’s take a look at what’s located at the [ebp-14h] stack address by using the dd command.
[plain]
kd> dd ebp-0x14 l1
9cb69d3c 8263aa48
[/plain]
Notice that the address 0x8263aa48 is stored at that location. This means that the KeSetAffinityThread function is actually located at the 0x8263aa48 address. Let’s now dump that function with the u command.

We have now located the KeSetAffinityThread function, which sets the affinity of a thread so that the thread runs on the selected processor. The C++ code that uses it is shown below.
Next, we enter a for loop counting from 0 to 31, where we select one processor at a time by ANDing the processors variable with a single-bit mask. Note that the processors variable contains 32 bits, where each bit can be set to 1 (processor present) or 0 (processor not present). The number of bits set in the processors variable is the same as the number of active logical processors in the system. By shifting the number 1 left by 0,1,2,…,31 bits, we effectively visit every processor in the system.
The code that goes through all processors can be seen below. The body of the loop contains several DbgPrint calls, which are there for debugging purposes so we know what’s going on. More importantly, it also contains a call to KeSetAffinityThreadObj, which makes the current thread run on a specific processor before the MSR is hooked on it.
[simple]
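for (i = 0; i < 32; i++) {
    /* Select logical CPU i; shift an unsigned value, since 1 << 31 overflows a signed int. */
    curCPU = processors & ((KAFFINITY)1 << i);
    if (curCPU != 0) {
        DbgPrint("Logical CPU 0x%x attached", curCPU);
        /* Pin this thread to the selected CPU so that HookMSR runs there. */
        KeSetAffinityThreadObj(thread, curCPU);
        DbgPrint("Thread argument: reg : %x.\r\n", thread_arg->reg);
        DbgPrint("Thread argument: hook addr: %x.\r\n", thread_arg->hookaddr);
        /* MSRs are per-processor, so the hook is written on every CPU. */
        HookMSR(thread_arg->reg, thread_arg->hookaddr);
    }
}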
[/simple]
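The bit-selection logic of the loop can be tried outside the kernel. The following is a minimal user-mode sketch, assuming a 32-bit affinity mask as in the text; split_affinity_mask is a hypothetical helper written for this illustration and is not part of the driver. It collects the single-bit masks that the loop above would pass to KeSetAffinityThreadObj.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes one single-bit mask per set bit of `processors` into `out`,
 * lowest bit first, and returns how many masks were written (0..32). */
size_t split_affinity_mask(uint32_t processors, uint32_t out[32])
{
    size_t count = 0;
    unsigned i;

    assert(out != NULL);
    for (i = 0; i < 32; i++) {
        /* Unsigned shift: shifting a signed 1 into bit 31 is undefined in C. */
        uint32_t cur = processors & (UINT32_C(1) << i);

        if (cur != 0)
            out[count++] = cur;   /* this CPU is present */
    }
    return count;
}
```

For a four-processor mask of 0xF, the helper yields the masks 0x1, 0x2, 0x4 and 0x8, one per logical processor.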
After the for loop finishes, we reset the affinity by calling KeSetAffinityThreadObj again, this time with the full processors mask. This lets the scheduler run the thread on any active processor again, as it could before the loop. Finally, we terminate the currently running thread by calling the PsTerminateSystemThread routine.
[simple]
KeSetAffinityThreadObj(thread, processors);
PsTerminateSystemThread(STATUS_SUCCESS);
[/simple]
If we compile and reload the driver, it writes the following to the WinDbg output. Notice that the DriverEntry routine was called and the thread started on logical processor 1. Then the MSR entry 0x176 was hooked: the previous value 0x8265e300 was overwritten with 0x96ef1110. Once the hooking process was over, the “Inside Hook Routine” message was printed once every 1000 calls.

Conclusion
In this article, we saw how MSR entries can be hooked to gain control over the execution of system calls. At the beginning of the article, we looked at how to hook an MSR on a single-processor system, where we didn’t have to worry about multiple processors. Later, we also wrote code that hooks the same MSR on all processors, which ensures that the same HookRoutine is called regardless of which processor handles the system call.
Note that it is trivial to determine whether an MSR has been hooked, since the pointers stored in the MSR registers normally point into the ntoskrnl.exe module. After the hooking process is complete, the MSR pointer points into MyDriver, which means it no longer points into ntoskrnl.exe. To detect a hook, we therefore need a function that reads the MSR pointers on every processor and checks that each one actually points into the ntoskrnl.exe module. To visit every processor, we can use the same KeSetAffinityThread routine that we used to hook the pointers in the first place.
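The check described above boils down to a range test per processor. The following is a minimal user-mode sketch of that test, not driver code: the module_range structure, both function names, and the idea of passing the per-CPU values in as an array are assumptions made for this illustration. In a real driver the values would come from reading the MSR on each processor, and the base and size of ntoskrnl.exe would have to be looked up at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of where a module (e.g. ntoskrnl.exe) is mapped. */
struct module_range {
    uint32_t base;
    uint32_t size;
};

/* Returns 1 when `ptr` lies inside [base, base + size), otherwise 0. */
int pointer_in_module(const struct module_range *mod, uint32_t ptr)
{
    assert(mod != NULL);
    /* Subtract before comparing so that base + size cannot overflow. */
    return ptr >= mod->base && (ptr - mod->base) < mod->size;
}

/* sysenter_eip[i] models the IA32_SYSENTER_EIP value read on logical CPU i.
 * Returns a mask with bit i set when CPU i's pointer is outside the module,
 * i.e. when that CPU looks hooked. */
uint32_t find_hooked_cpus(const struct module_range *mod,
                          const uint32_t *sysenter_eip, size_t cpu_count)
{
    uint32_t hooked = 0;
    size_t i;

    assert(sysenter_eip != NULL && cpu_count <= 32);
    for (i = 0; i < cpu_count; i++) {
        if (!pointer_in_module(mod, sysenter_eip[i]))
            hooked |= UINT32_C(1) << i;
    }
    return hooked;
}
```

With the two values from the WinDbg output, and an assumed (made-up) kernel range of base 0x82600000 and size 0x400000, the original pointer 0x8265e300 passes the test while the hooked pointer 0x96ef1110 is flagged.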
Note that once we are in kernel mode, we have access to all kernel structures, which means that a driver that has been loaded into the kernel can do a lot of damage. This is especially true on 32-bit versions of Windows 7 and earlier, where drivers don’t need to be signed in order to be loaded into the kernel. On 64-bit versions of Windows (Vista and later), drivers must be signed by a known CA in order to be loaded into the kernel, which means that attackers must first steal a valid certificate that can be used to sign the driver. But that’s just another layer of defense, and it can be bypassed: all we have to do is sign the driver with a valid, trusted certificate. How to get such a certificate is another story, but in the end we can simply order one ourselves from a certification authority.
References
[1] Model-specific register, https://en.wikipedia.org/wiki/Model-specific_register.
[2] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1, http://www.intel.com/Assets/ja_JP/PDF/manual/253668.pdf.
[3] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C: System Programming Guide, Part 3, http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf.
[4] RDMSR, http://faydoc.tripod.com/cpu/rdmsr.htm.
[5] WRMSR, http://faydoc.tripod.com/cpu/wrmsr.htm.
[6] Syscall hooking via MSRs, http://www.blizzhackers.cc/viewtopic.php?t=392361.
[7] InitializeObjectAttributes macro, http://msdn.microsoft.com/en-us/library/windows/hardware/ff547804(v=vs.85).aspx.
[8] PsCreateSystemThread routine, http://msdn.microsoft.com/en-us/library/windows/hardware/ff559932(v=vs.85).aspx.
[9] ACCESS_MASK, http://msdn.microsoft.com/en-us/library/windows/hardware/ff540466(v=vs.85).aspx.
[10] Thread Security and Access Rights, http://msdn.microsoft.com/en-us/library/windows/desktop/ms686769(v=vs.85).aspx.
[11] ObReferenceObjectByHandle routine, http://msdn.microsoft.com/en-us/library/windows/hardware/ff558679(v=vs.85).aspx.
[12] KeWaitForSingleObject routine, http://msdn.microsoft.com/en-us/library/windows/hardware/ff553350(v=vs.85).aspx.
[13] KeQueryActiveProcessors routine, http://msdn.microsoft.com/en-us/library/windows/hardware/ff553001(v=vs.85).aspx.
[14] MmGetSystemRoutineAddress routine, http://msdn.microsoft.com/en-us/library/windows/hardware/ff554563(v=vs.85).aspx.