Forensics / Document & Image Steganography

TJCTF 2026: Invisible Ink

QA210·May 2026·Polyglot · PDF-ZIP · Swirl Distortion

Challenge Overview

Invisible Ink is a multi-layered forensics challenge that tests the competitor’s ability to recognize and exploit polyglot files: files that are simultaneously valid in two different formats. The provided artifact, invisible_ink.pdf, is neither a normal PDF nor a standard ZIP archive. It is both at once, a chimera built by concatenating a complete PDF document with a complete ZIP archive. This works because PDF readers anchor on the %PDF- header and tolerate trailing bytes after the document, while ZIP readers locate the archive from a record at the very end of the file.

The challenge unfolds in three distinct phases, each building on the previous. First, you must recognize the dual nature of the file and open it both as a PDF and as a ZIP. Second, the PDF contains a hidden password embedded as white-colored text on a white background — invisible to the eye but trivially extracted by text-layer tools. Third, the ZIP contains an encrypted image with a mathematical swirl distortion applied, and the password from the PDF is needed to decrypt it. Only by reversing the swirl transformation can you read the flag written in the image.

The name “Invisible Ink” is a direct reference to the historical practice of writing secret messages in substances that are invisible under normal lighting but become visible under specific conditions (UV light, heat, chemical developer). Here, the “ink” is the white-on-white text in the PDF — present in the data, invisible in the rendering, and revealed only by the right tool.

Hint — “Invisible Ink”

The challenge title is not just thematic window dressing. It describes the exact technique used to hide the password: text rendered in the same color as its background. Just as real invisible ink requires a developer reagent to reveal, this digital invisible ink requires a text extraction tool (not a visual PDF reader) to expose the hidden content.

The Polyglot Chimera: PDF-ZIP Dual Format

Header and Trailer Analysis

The first step in any forensics challenge is to examine the raw bytes of the file. Running hexdump -C invisible_ink.pdf | head reveals the classic PDF magic number %PDF- at offset 0, confirming that the file begins as a valid PDF document. The file command corroborates this, reporting PDF document, version 1.4. However, this is only half the story.

Examining the tail of the file with hexdump -C invisible_ink.pdf | tail reveals the bytes 50 4B 05 06 — the End of Central Directory (EOCD) signature of a ZIP archive. This is the smoking gun: the file contains a complete ZIP structure at its end, meaning it can be opened by any ZIP-compatible tool.

bash

$ file invisible_ink.pdf
invisible_ink.pdf: PDF document, version 1.4

$ hexdump -C invisible_ink.pdf | head -2
00000000  25 50 44 46 2d 31 2e 34  0a ...  |%PDF-1.4.......|

$ hexdump -C invisible_ink.pdf | tail -2
0000f2a0  50 4b 05 06 00 00 00 00  01 00 01 00 ...  |PK..............|
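
The same two checks can be scripted. The following is a minimal sketch, not part of the original solve: the function name is hypothetical, and the search window assumes the standard EOCD layout (a 22-byte record plus an optional comment of up to 65535 bytes).

```python
PDF_MAGIC = b"%PDF-"
ZIP_EOCD_MAGIC = b"PK\x05\x06"

# The EOCD record is 22 bytes, optionally followed by a comment of at most
# 65535 bytes, so the signature must sit within this many bytes of the end.
MAX_EOCD_SEARCH = 22 + 65535


def looks_like_pdf_zip_polyglot(data: bytes) -> bool:
    """Return True if data starts like a PDF and ends like a ZIP."""
    starts_as_pdf = data.startswith(PDF_MAGIC)
    tail = data[-MAX_EOCD_SEARCH:]
    ends_as_zip = tail.rfind(ZIP_EOCD_MAGIC) != -1
    return starts_as_pdf and ends_as_zip


# Synthetic stand-in bytes, not the real challenge file.
sample = b"%PDF-1.4\n%%EOF\n" + ZIP_EOCD_MAGIC + bytes(18)
print(looks_like_pdf_zip_polyglot(sample))
```

For the real artifact you would pass the result of open("invisible_ink.pdf", "rb").read(). A signature match is only a hint: the bytes PK\x05\x06 can also occur by chance inside compressed data.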

Why Polyglot Files Work

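The reason this trick works lies in how PDF and ZIP parsers locate data within a file. Strictly speaking, a conforming PDF reader also starts near the end: it looks for the %%EOF marker and the startxref keyword, which gives the byte offset of the cross-reference table, and then follows object offsets measured from the start of the file. In practice, readers are lenient. Many accept extra bytes after %%EOF, searching backward for startxref or rebuilding the cross-reference table by scanning for objects, and then ignore the trailing data. Because the PDF comes first in the file, every offset in its cross-reference table remains valid.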

A ZIP parser reads from the bottom: it scans backward from the end of the file for the EOCD record, which contains the offset of the Central Directory. From there, it reads each Central Directory entry, which in turn points to that file's Local File Header. ZIP tools tolerate arbitrary data before the first Local File Header (self-extracting archives rely on the same leniency), and most compensate automatically when the recorded offsets do not account for the prepended bytes.

This asymmetry creates a sweet spot for polyglot construction: append a complete ZIP archive after a complete PDF, and both parsers happily read their respective structures while ignoring the other format’s data. The result is a file that is simultaneously a valid PDF and a valid ZIP.
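
To make the backward lookup concrete, here is a hedged sketch that parses the EOCD record by hand and infers how many foreign bytes sit in front of the archive. The helper name and the stand-in PDF bytes are assumptions for illustration, and the prefix calculation only holds for naive concatenation, where the ZIP offsets were not rewritten and no ZIP64 records are present.

```python
import io
import struct
import zipfile

# EOCD layout: signature, disk number, disk holding the central directory,
# entries on this disk, total entries, central directory size,
# central directory offset, comment length.
EOCD_FORMAT = "<4sHHHHIIH"


def parse_eocd(data: bytes) -> dict:
    """Locate the last EOCD record and decode its key fields."""
    pos = data.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("no End of Central Directory record found")
    fields = struct.unpack_from(EOCD_FORMAT, data, pos)
    total_entries, cd_size, cd_offset = fields[4], fields[5], fields[6]
    # In a plain archive the EOCD sits at cd_offset + cd_size, so any
    # surplus is data that was prepended to the archive.
    prefix_len = pos - cd_size - cd_offset
    return {
        "eocd_pos": pos,
        "entries": total_entries,
        "cd_offset": cd_offset,
        "cd_size": cd_size,
        "prefix_len": prefix_len,
    }


# Demo on synthetic data: placeholder "PDF" bytes followed by a small ZIP.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("hello.txt", b"hi")
fake_pdf = b"%PDF-1.4\n" + b"x" * 100 + b"\n%%EOF\n"
record = parse_eocd(fake_pdf + zip_buffer.getvalue())
print(record)
```

On the synthetic input, prefix_len equals the length of the placeholder PDF, which is the correction a tolerant ZIP reader applies to every recorded offset.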

Info — Polyglot Construction Methods

There are multiple ways to construct PDF-ZIP polyglots. The simplest method (used here) is naive concatenation: PDF first, ZIP second. More sophisticated methods involve embedding ZIP data inside PDF stream objects, which makes the polyglot harder to detect because the ZIP structures are interleaved with PDF structures rather than simply appended. Tools like zzag and polyglot automate the construction process for various format combinations.

Opening Both Layers

Once you recognize the polyglot nature, the exploitation is straightforward. Open the file as a PDF with any viewer or library to examine the document content. Open the same file as a ZIP with any archive tool to list and extract the embedded files. On the command line, unzip -l invisible_ink.pdf will list the ZIP contents despite the .pdf extension, because unzip locates data via the EOCD, not the file extension:

bash

$ unzip -l invisible_ink.pdf
Archive:  invisible_ink.pdf
  Length      Date    Time    Name
---------  ---------- -----   ----
   284726  2026-05-10 14:22   original_distorted.png
---------                     -------
   284726                     1 file

The ZIP layer contains a single file: original_distorted.png, which is encrypted and requires a password to extract. Finding that password is the next phase of the challenge.
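
The listing from unzip -l does not say whether an entry is encrypted. A small sketch with Python's zipfile can surface that from bit 0 of each entry's general-purpose flags. The function names and the demo archive are hypothetical, and because zipfile cannot create encrypted entries, the demo entry reports False where the real challenge file would be expected to report True.

```python
import io
import zipfile


def list_zip_layer(file_bytes: bytes) -> list:
    """Return (name, uncompressed size, encrypted?) for each ZIP entry."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(file_bytes)) as archive:
        for info in archive.infolist():
            # Bit 0 of the general-purpose flags marks an encrypted entry.
            is_encrypted = bool(info.flag_bits & 0x1)
            rows.append((info.filename, info.file_size, is_encrypted))
    return rows


def build_demo_polyglot() -> bytes:
    """Placeholder PDF bytes followed by a one-file, unencrypted ZIP."""
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w") as archive:
        archive.writestr("note.txt", b"hello")
    fake_pdf = b"%PDF-1.4\n% placeholder, not a real document\n%%EOF\n"
    return fake_pdf + zip_buffer.getvalue()


for name, size, is_encrypted in list_zip_layer(build_demo_polyglot()):
    print(f"{size:>9}  {name}  encrypted={is_encrypted}")
```

zipfile opens the data despite the leading PDF bytes for the same reason unzip does: it starts from the EOCD and adjusts for the prefix.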

White-on-White Text Extraction

PDF Layer Separation

PDF keeps what a glyph says separate from how it is painted. Text-showing operators in the page's content stream carry character codes that map to Unicode for search, copy-paste, and accessibility, while the graphics state (fill color, font size, position) decides how those glyphs appear on the page. This writeup calls the former the text layer and the latter the rendering layer. The separation is what lets PDFs be searched and indexed without OCR, even when the visual presentation is complex.

An attacker can exploit this separation by placing text in the PDF that is present in the text layer but invisible in the rendering layer. The simplest method is to set the text color to white (#FFFFFF) on a white background. The text data is stored in the PDF’s content stream, and the glyph coordinates and character codes are fully intact — but when rendered, the white glyphs blend perfectly into the white page, making them undetectable to the human eye.
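
As an illustration of what this looks like at the byte level, the sketch below builds a hypothetical Flate-compressed content stream in which the fill color is set to white before a string is shown, then recovers the string without looking at the color at all. The stream contents, the placeholder string, and the regex-based extractor are assumptions for demonstration; real extractors also handle escapes, hex strings, and font encodings.

```python
import re
import zlib

# Hypothetical page content stream (not taken from the challenge file).
RAW_STREAM = (
    b"1 1 1 rg\n"                # set the fill color to RGB white
    b"BT\n"                      # begin text object
    b"/F1 12 Tf\n"               # select font F1 at 12 pt
    b"72 720 Td\n"               # move to the text position
    b"(s3cret-passphrase) Tj\n"  # show the string
    b"ET\n"                      # end text object
)


def extract_shown_strings(compressed_stream: bytes) -> list:
    """Return simple literal strings passed to the Tj operator."""
    content = zlib.decompress(compressed_stream)
    # Matches "(...) Tj" for strings without nested parentheses or escapes.
    matches = re.findall(rb"\(([^()\\]*)\)\s*Tj", content)
    return [m.decode("latin-1") for m in matches]


print(extract_shown_strings(zlib.compress(RAW_STREAM)))
```

The color operator changes how the glyphs are painted, not whether the string operand exists in the stream, which is why extraction still succeeds.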

Extraction with PyMuPDF

Text extraction tools like PyMuPDF (the fitz library) operate on the text layer, not the rendering layer. When you call page.get_text(), the library decodes the character codes and glyph positions from the PDF content stream and returns the resulting Unicode text. In this plain-text mode it does not filter on how the text would look (fill color or background), because those are concerns of the display engine, not of data extraction.

This means that white-on-white text is just as extractable as any other text. The extraction is purely data-driven: if the characters are encoded in the content stream, they will be returned by get_text(), regardless of how they would appear on screen.

python

import fitz  # PyMuPDF

doc = fitz.open("invisible_ink.pdf")
for idx, page in enumerate(doc):
    txt = page.get_text().strip()
    if txt:
        print(f"Page {idx}: {repr(txt)}")

Running this extraction shows that the first page (index 0 in the script's zero-based output) contains normal visible text (the cover page), while the second page (index 1) contains a string that does not appear anywhere in the visual rendering: DBf8nEBgwRhZ. This is the ZIP password, hidden in plain sight using the invisible ink technique.

Extracted Password

DBf8nEBgwRhZ — found on Page 2 of the PDF, concealed as white-on-white text in the text layer. This password decrypts the ZIP layer’s contents.

Why Visual Inspection Fails

Opening the PDF in a standard viewer like Adobe Acrobat, Evince, or a browser shows nothing unusual on page 2. The text is rendered in white on a white background, so it is invisible. Viewer features that work on the text layer can still give it away: Select All usually highlights the hidden glyphs, and Find will match the string if you already know what to search for. Neither helps much when you do not know that anything is there, so the dependable approach is a programmatic tool that dumps the text layer directly, bypassing the rendering pipeline entirely.

This technique is a well-known attack vector in document forensics and has real-world implications. Malicious actors can embed hidden commands, URLs, or exfiltration channels in PDFs that pass visual inspection but contain steganographic data in their text layers. Security auditors routinely scan PDFs with text extraction tools as part of their document analysis workflow.

Warning — Beyond White-on-White

White-on-white is the simplest form of PDF text steganography, but more sophisticated variants exist: text placed outside the page boundary (negative coordinates), text with zero font size, text obscured by overlapping opaque objects, and text encoded with custom CMaps that map visible glyphs to different characters. A thorough forensic examination should check all of these vectors.
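
Some of these variants can be flagged automatically. The sketch below is a hypothetical helper that walks a dictionary shaped like the output of PyMuPDF's page.get_text("dict") (blocks, lines, and spans with text, color, size, and bbox keys, as I understand that format) and reports spans that are white, tiny, or entirely off the page. The hand-built demo dictionary stands in for a real page so that the example runs without PyMuPDF. Overlapping opaque objects and custom CMaps are not covered.

```python
WHITE = 0xFFFFFF  # span colors are assumed to be packed sRGB integers


def suspicious_spans(page_dict: dict, min_font_size: float = 1.0) -> list:
    """Return (text, reasons) for spans that are likely invisible."""
    page_w, page_h = page_dict["width"], page_dict["height"]
    findings = []
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no lines
            for span in line.get("spans", []):
                x0, y0, x1, y1 = span["bbox"]
                reasons = []
                if span.get("color") == WHITE:
                    reasons.append("white fill")
                if span.get("size", min_font_size) < min_font_size:
                    reasons.append("tiny font")
                if x1 < 0 or y1 < 0 or x0 > page_w or y0 > page_h:
                    reasons.append("off page")
                if reasons:
                    findings.append((span["text"], reasons))
    return findings


# Hand-built stand-in for page.get_text("dict") on a US Letter page.
demo_page = {
    "width": 612, "height": 792,
    "blocks": [
        {"type": 0, "lines": [{"spans": [
            {"text": "Cover page", "color": 0, "size": 12.0,
             "bbox": (72, 72, 200, 90)},
            {"text": "hidden", "color": WHITE, "size": 12.0,
             "bbox": (72, 100, 150, 115)},
            {"text": "far away", "color": 0, "size": 12.0,
             "bbox": (-500, -500, -400, -480)},
        ]}]},
        {"type": 1},  # image block, skipped
    ],
}
print(suspicious_spans(demo_page))
```

A white fill is only suspicious when the background behind it is also white, so treat the output as leads to inspect rather than a verdict.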

Swirl Distortion & Mathematical Recovery

Decrypting the ZIP Layer

With the password DBf8nEBgwRhZ in hand, extracting the ZIP layer is straightforward. The ZIP contains a single file — original_distorted.png — which is a PNG image of handwriting that has been warped by a swirl distortion. The flag is written in the original, undistorted handwriting, so recovering it requires reversing the mathematical transformation.

bash

$ unzip -P DBf8nEBgwRhZ invisible_ink.pdf
Archive:  invisible_ink.pdf
  inflating: original_distorted.png
$ file original_distorted.png
original_distorted.png: PNG image data, 1080 x 1080, 8-bit/color RGBA

The Mathematics of Swirl Distortion

Swirl distortion is a spatial transformation that operates in polar coordinates. Given a center point (cx, cy) and a radius R, each pixel at Cartesian coordinates (x, y) is first converted to polar coordinates (r, θ) relative to the center. The angle is then modified according to the formula:

θ_new = θ_old + strength × (1 − r / radius)

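The key property of this formula is that the rotation angle decreases linearly from strength at the center (r = 0) to zero at the boundary (r = radius). Pixels beyond the radius are unaffected. This creates a vortex effect where the center of the image is rotated the most, and the rotation tapers off toward the edge of the distortion region. Library implementations may use a different falloff: scikit-image's swirl, used for the reversal later in this writeup, documents an exponential decay, θ_new = θ_old + strength × exp(−r / ρ) with ρ = ln(2) × radius / 5, so its radius marks where the effect has faded to roughly a thousandth of strength rather than a hard cutoff. The inversion argument is the same for either falloff.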

The critical insight for recovery is that the swirl is undone by the same transformation with the opposite sign. The distortion changes only the angle and leaves r untouched, so the rotation applied to a point depends on a quantity that the distortion itself preserves. If the forward transformation uses a positive strength value, applying it again with the negated value subtracts exactly the angle that was added at every radius, returning each point to its original position.
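
The round trip is easy to verify on a single coordinate. This is a minimal sketch with hypothetical function names. It includes the linear falloff from the formula above and, as an alternative, an exponential falloff of the kind scikit-image documents for its swirl (to the best of my understanding of that formula). The sample point and parameters are arbitrary.

```python
import math


def linear_falloff(r, strength, radius):
    """Rotation from the formula above: strength at r = 0, zero at r >= radius."""
    return strength * max(0.0, 1.0 - r / radius)


def exponential_falloff(r, strength, radius):
    """Exponential decay variant; assumed to mirror scikit-image's swirl."""
    return strength * math.exp(-r / (math.log(2) * radius / 5))


def swirl_point(x, y, center, strength, radius, falloff=linear_falloff):
    """Rotate (x, y) about center by an angle that depends only on r."""
    cx, cy = center
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    theta = math.atan2(dy, dx) + falloff(r, strength, radius)
    return cx + r * math.cos(theta), cy + r * math.sin(theta)


center = (540.0, 540.0)
for falloff in (linear_falloff, exponential_falloff):
    warped = swirl_point(700.0, 400.0, center, 8.5, 540.0, falloff)
    restored = swirl_point(*warped, center, -8.5, 540.0, falloff)
    print(falloff.__name__, warped, restored)
```

Because r is unchanged by the forward pass, the second call computes the same rotation magnitude with the opposite sign, and the point returns to (700, 400) up to floating-point error.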

Identifying the Parameters

The challenge does not explicitly provide the swirl parameters. However, the distortion is clearly centered on the image (the center is the most warped region), and by examining the extent of the warping, we can estimate that the radius covers most of the image. For a 1080×1080 image, a radius of approximately 540 pixels (half the width) is a natural choice: the swirl then reaches the image edges, leaving only the corners essentially untouched.

The strength parameter requires more careful estimation. Too weak a reversal leaves residual swirling; too strong a reversal over-corrects and swirls in the opposite direction. Through experimentation or by examining the distortion’s characteristics (a common starting point for CTF challenges is an integer or simple decimal value), a strength of 8.5 with a radius of 540 produces a clean, undistorted image when negated.

python

from skimage.transform import swirl
import numpy as np
from PIL import Image

# Load the distorted image
warped = np.array(Image.open("original_distorted.png"))

# Reverse the swirl: negate the strength to cancel the original +8.5
recovered = swirl(
    warped,
    center=(warped.shape[1] // 2, warped.shape[0] // 2),
    strength=-8.5,   # Invert the sign to undo the distortion
    radius=540
)

Image.fromarray((recovered * 255).astype(np.uint8)).save("flag_recovered.png")
print("[+] Saved flag_recovered.png")

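The recovered image reveals the flag written in handwriting across the image. The geometric transformation is exactly invertible given the correct parameters, but the image is resampled twice (once when the swirl was applied and again when it is reversed), so expect slight blurring, especially near the tightly wound center. The result is not pixel-identical to the original, but it is easily clean enough to read the flag.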

Flag

tjctf{p0lygl0t_f1les_4r3_50_c00l}

Info — Why Swirl Is Invertible

Not all image distortions are invertible. Lossy operations like heavy JPEG compression, aggressive downscaling, or random noise injection permanently destroy information. Swirl distortion, however, is a deterministic spatial remapping: it moves image content to new positions rather than discarding it. As long as you know the transformation parameters, every point can be mapped back to its original location. The only loss comes from interpolation when the pixel grid is resampled, which softens fine detail but leaves handwriting legible. This makes swirl an ideal challenge mechanic: it’s visually dramatic but mathematically clean.
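
One practical caveat is worth checking empirically: the inverse is exact for continuous coordinates, but a real image lives on a pixel grid and has to be resampled. The sketch below is a toy, pure-Python experiment under stated assumptions (a 41×41 grid, nearest-neighbor sampling, the linear falloff from earlier, arbitrary strength). It swirls a grid of unique values, reverses the swirl, and reports how many pixels come back unchanged. It is not how scikit-image resamples, which interpolates between neighbors.

```python
import math


def swirl_grid(grid, strength, radius):
    """Nearest-neighbor swirl of a square list-of-lists image."""
    size = len(grid)
    center = (size - 1) / 2
    out = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            dx, dy = x - center, y - center
            r = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) + strength * max(0.0, 1.0 - r / radius)
            # For each output pixel, look up the source pixel it came from.
            src_x = round(center + r * math.cos(theta))
            src_y = round(center + r * math.sin(theta))
            if 0 <= src_x < size and 0 <= src_y < size:
                out[y][x] = grid[src_y][src_x]
    return out


SIZE = 41
original = [[y * SIZE + x + 1 for x in range(SIZE)] for y in range(SIZE)]
warped = swirl_grid(original, 2.0, radius=20)
restored = swirl_grid(warped, -2.0, radius=20)

matches = sum(
    restored[y][x] == original[y][x]
    for y in range(SIZE) for x in range(SIZE)
)
match_fraction = matches / SIZE ** 2
print(f"{match_fraction:.1%} of pixels survived the round trip")
```

Pixels outside the radius are never moved and always survive; inside it, rounding to the nearest source pixel can merge or drop values. For reading a handwritten flag this does not matter, but it is the reason to avoid treating the recovered image as a bit-exact copy.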

Exploit Code

The following script automates the three-phase solve once the polyglot has been identified: invisible text extraction, ZIP decryption, and swirl reversal. It uses PyMuPDF for PDF text extraction, Python’s built-in zipfile for ZIP handling, and scikit-image’s swirl function for the distortion reversal.

python

#!/usr/bin/env python3
"""
TJCTF 2026 - Invisible Ink
Polyglot PDF-ZIP forensics + white-on-white extraction + swirl reversal
Author: QA210
"""

import fitz                    # PyMuPDF
import zipfile
import io
import numpy as np
from PIL import Image
from skimage.transform import swirl


def harvest_concealed_passphrase(document_path):
    """
    Extract invisible white-on-white text from the PDF text layer.
    PDF rendering engines apply color/font context, but text extraction
    libraries return raw Unicode regardless of visual appearance.
    """
    pdf_handle = fitz.open(document_path)

    # Scan all pages for any text content (visible or invisible)
    for page_index in range(len(pdf_handle)):
        page_obj = pdf_handle[page_index]
        page_content = page_obj.get_text().strip()
        if page_content:
            print(f"[+] Page {page_index} text layer: {page_content!r}")

    # The hidden passphrase is specifically on page 2 (index 1)
    target_page = pdf_handle[1]
    concealed_text = target_page.get_text().strip()
    pdf_handle.close()

    print(f"[*] Extracted concealed passphrase: {concealed_text}")
    return concealed_text


def unlock_embedded_archive(chimera_path, passphrase):
    """
    Open the ZIP layer hidden inside the polyglot file.
    ZIP parsers locate data via EOCD from the file's tail,
    so they naturally skip the PDF content at the head.
    """
    with zipfile.ZipFile(chimera_path, 'r') as archive:
        catalog = archive.namelist()
        print(f"[*] ZIP catalog: {catalog}")

        # Identify the distorted image file
        image_entry = [entry for entry in catalog if entry.endswith('.png')][0]

        # Decrypt and read the image bytes
        with archive.open(image_entry, pwd=passphrase.encode()) as encrypted_stream:
            image_blob = encrypted_stream.read()

    return Image.open(io.BytesIO(image_blob))


def undo_vortex_warping(pil_frame, twist_magnitude=8.5, warp_extent=540):
    """
    Reverse swirl distortion by negating the twist strength.
    The forward transform rotates pixels by +magnitude;
    the inverse applies -magnitude to cancel it exactly.
    """
    pixel_array = np.array(pil_frame)
    frame_h, frame_w = pixel_array.shape[:2]

    # Apply inverse swirl: strength is negated
    restored_array = swirl(
        pixel_array,
        center=(frame_w // 2, frame_h // 2),
        strength=-twist_magnitude,    # Critical: negate to invert
        radius=warp_extent
    )

    return Image.fromarray((restored_array * 255).astype(np.uint8))


def main():
    chimera_file = "invisible_ink.pdf"

    # Phase 1: Extract the invisible password from the PDF text layer
    archive_key = harvest_concealed_passphrase(chimera_file)

    # Phase 2: Decrypt the ZIP layer and retrieve the distorted image
    warped_frame = unlock_embedded_archive(chimera_file, archive_key)

    # Phase 3: Reverse the mathematical swirl distortion
    print("[*] Reversing swirl distortion (strength -> -8.5, radius -> 540)...")
    recovered_frame = undo_vortex_warping(warped_frame, twist_magnitude=8.5, warp_extent=540)

    recovered_frame.save("flag_recovered.png")
    print("[+] Saved recovered image to flag_recovered.png")
    print("[+] Read the flag from the image!")


if __name__ == "__main__":
    main()

Running the Exploit

bash

$ python3 invisible_ink_solve.py
[+] Page 0 text layer: 'Invisible Ink Challenge'
[+] Page 1 text layer: 'DBf8nEBgwRhZ'
[*] Extracted concealed passphrase: DBf8nEBgwRhZ
[*] ZIP catalog: ['original_distorted.png']
[*] Reversing swirl distortion (strength -> -8.5, radius -> 540)...
[+] Saved recovered image to flag_recovered.png
[+] Read the flag from the image!

Forensic Toolbox Reference

Solving Invisible Ink requires familiarity with several categories of forensic tools. Understanding when and why to reach for each tool is as important as the technical solve itself. Below is a reference of the key tools and their applications in this challenge.

Tool	Purpose	Usage in This Challenge
`file`	Identify file type by magic bytes	Confirmed PDF header at offset 0
`hexdump`	Raw byte inspection	Found PK\x05\x06 (EOCD) at file tail
`unzip -l`	List ZIP contents	Enumerated files in ZIP layer
`PyMuPDF (fitz)`	PDF text extraction	Extracted invisible white-on-white text
`zipfile`	Programmatic ZIP handling	Decrypted and extracted PNG from ZIP
`scikit-image`	Image transformations	Reversed swirl distortion (strength negation)
`binwalk`	Embedded file detection	Alternative: detect ZIP signature inside PDF

Info — Alternative Detection with Binwalk

If you did not think to check the file tail manually, binwalk would have revealed the embedded ZIP automatically. Binwalk scans for known file signatures at every offset, and its output would show a ZIP archive starting at the offset where the PDF content ends. This is a useful tool for any challenge where you suspect hidden data but are not sure where to look.
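
The core idea behind that kind of scan fits in a few lines. The following is a simplified sketch, not binwalk's implementation: the signature table is a small hand-picked subset, the function name is hypothetical, and a raw byte match can be a false positive inside compressed data.

```python
SIGNATURES = {
    b"%PDF-": "PDF header",
    b"PK\x03\x04": "ZIP local file header",
    b"PK\x01\x02": "ZIP central directory entry",
    b"PK\x05\x06": "ZIP end of central directory",
    b"\x89PNG\r\n\x1a\n": "PNG header",
}


def scan_signatures(data: bytes) -> list:
    """Return sorted (offset, label) pairs for every signature match."""
    hits = []
    for magic, label in SIGNATURES.items():
        start = 0
        while (pos := data.find(magic, start)) != -1:
            hits.append((pos, label))
            start = pos + 1
    return sorted(hits)


# Synthetic stand-in bytes, not the real challenge file.
demo = b"%PDF-1.4\n" + b"A" * 7 + b"PK\x03\x04" + b"B" * 4 + b"PK\x05\x06"
for offset, label in scan_signatures(demo):
    print(f"0x{offset:08x}  {label}")
```

On the challenge file, the offset of the first ZIP local file header would mark where the PDF content ends and the appended archive begins.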
