Forensics / Bitmap Font Steganography

TJCTF 2026: Loud Packets

QA210 · May 2026 · Bitmap Font · Downsampling · Red Herring

Challenge Overview

Loud Packets is a forensics challenge built on a deliberate and elegant deception. The name — evoking network packet analysis, Wireshark captures, and protocol dissection — is a red herring designed to send competitors down a rabbit hole of PCAP parsing, TCP stream reassembly, and DNS exfiltration hunting. The actual solve has nothing to do with networking whatsoever.

What the challenge actually provides is a large grayscale image (chall.png, 3664×784 pixels) alongside a directory containing 39 subdirectories, each named after a character in the TJCTF flag character set (a-z, 0-9, underscore, curly braces). Each subdirectory holds an anime sprite image, and the sprites look identical to one another at first glance. The image is not a network artifact. It behaves like text drawn with a bitmap font: a grid of 80×80 pixel tiles in which each tile acts as one cell. Bright tiles are cells that are “on” (part of the rendered flag), while dark tiles are empty cells.

The solve path involves recognizing this structure, performing a pixel-level downsampling operation that collapses each 80×80 tile into a single representative pixel, and then thresholding the result to produce an ASCII art rendering of the flag. The technique is conceptually identical to how early computer displays rendered text from bitmap font glyphs stored in ROM — each character position on screen mapped to a small rectangular block of pixels, and the display controller simply indexed into the glyph table to determine which pixels to light up.

Red Herring — “Loud Packets”

The challenge name is the first and most dangerous trap. In CTF competitions, challenge names often provide genuine hints about the category or technique required. Here, the name actively misleads. Players who spend time looking for PCAP files, packet headers, or network protocols are wasting effort. The “packets” are pixel packets — 80×80 rectangular blocks of image data, not network datagrams.

The Sprite Directory: Image-Based Substitution

Directory Structure Analysis

The challenge provides a directory tree containing 39 subdirectories, each named after a single character from the TJCTF flag alphabet. This alphabet encompasses the 26 lowercase letters (a through z), the 10 digits (0 through 9), the underscore character, and the opening and closing curly braces — totaling 39 distinct symbols. Inside each subdirectory are one or more anime sprite images that appear visually identical across directories at first glance.

bash

$ ls loud-packets/
a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
0  1  2  3  4  5  6  7  8  9  _  '{'  '}'

$ ls loud-packets/a/
sprite.png

$ md5sum loud-packets/a/sprite.png loud-packets/b/sprite.png
d4e5f6a7b8c9...  loud-packets/a/sprite.png
d4e5f6a7b8c9...  loud-packets/b/sprite.png   # Digests are truncated here; the full hashes differ

The critical observation is that while the sprites look identical to the human eye, they are not byte-for-byte identical. Each character’s sprite has been subtly modified to encode its identity. When a sprite is downscaled to a tiny thumbnail (the size of a single tile cell in chall.png), the pixel-level fingerprint of each sprite variant becomes distinguishable. This is the essence of an image-based substitution cipher: the choice of which sprite variant to place in a given tile position determines which character that tile represents.
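
One way to check that the sprites really differ at the byte level is to group the files by digest. The sketch below assumes the directory layout shown in the listing above; `file_digest` and `group_by_digest` are hypothetical helper names, and SHA-256 stands in for `md5sum` purely for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(root):
    """Map each digest to the sorted list of files under root that share it."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_digest(path)].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items()}


if __name__ == "__main__":
    root = "loud-packets"  # directory name taken from the listing above
    if os.path.isdir(root):
        groups = group_by_digest(root)
        print(f"{len(groups)} distinct digests across the sprite files")
```

If every sprite were a true copy, all files would fall into a single group. One group per file would mean every variant is distinct.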

Why 39 Copies of the Same Image?

If this were conventional steganography, a single source image would suffice — the hidden data would be embedded in LSB planes, metadata fields, or appended after the image terminator. The presence of 39 nearly-identical copies is the anomaly that breaks the steganography assumption and points toward a fundamentally different encoding mechanism.

The answer is that each sprite variant serves as a stamp or typeface glyph. When the challenge author constructed chall.png, they placed the sprite corresponding to each flag character into the appropriate tile position. A tile containing the “t” sprite appears visually similar to a tile containing the “j” sprite at full resolution, but when both are downscaled to a single pixel, their brightness values differ because the underlying pixel distributions differ. This is why the image appears as a field of “dumbbell blobs” — each blob is a tiny sprite thumbnail, and its exact brightness depends on which character it encodes.

Info — Forensicator’s Intuition

When you encounter redundant copies of the same asset in a CTF challenge, ask yourself: why would the author duplicate data instead of referencing a single copy? Repetition with variation usually signals an encoding scheme. The variation is the signal; the similarity is the noise that hides it from casual inspection.

Visual Analysis of chall.png

Initial Inspection

Opening chall.png in an image viewer reveals a large, predominantly dark grayscale image dotted with countless bright spots. At a glance, the image looks like static noise or an astronomy photograph of a star field. The file metadata confirms it is a standard PNG with dimensions 3664×784 pixels, single-channel grayscale, 8 bits per pixel.

bash

$ file chall.png
chall.png: PNG image data, 3664 x 784, 8-bit grayscale

$ python3 -c "from PIL import Image; im=Image.open('chall.png'); print(im.size, im.mode)"
(3664, 784) L
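
The same facts can be read straight from the file header without any imaging library. The sketch below parses the PNG signature and IHDR chunk with the standard library only; `read_png_header` is a hypothetical helper, and the demo bytes are built by hand to mirror the dimensions reported above rather than read from the real file.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor",
    3: "indexed",
    4: "grayscale+alpha",
    6: "truecolor+alpha",
}


def read_png_header(data):
    """Parse width, height, bit depth and colour type from PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR with a 13-byte payload.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, bit_depth, COLOR_TYPES.get(color_type, "unknown")


if __name__ == "__main__":
    # Hand-built header matching the values reported by `file` above.
    demo = PNG_SIGNATURE + struct.pack(
        ">I4sIIBBBBB", 13, b"IHDR", 3664, 784, 8, 0, 0, 0, 0
    )
    print(read_png_header(demo))
```

For the real file, pass the first 26 bytes of chall.png to the function instead of the synthetic header.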

Zooming In: Dumbbell Blobs

When you zoom into the image to 800% or higher magnification, the “stars” resolve into distinct, structured shapes. Each bright spot is not a single pixel but a roughly 80×80 pixel region containing a recognizable pattern: a dumbbell-shaped or figure-eight blob of white pixels against a dark background. These blobs are not random — they are the downscaled versions of the anime sprite images from the 39 subdirectories.

The shape of each blob varies slightly depending on which character it encodes. A sprite with more bright pixels (e.g., a character with a large face area) produces a brighter, more filled blob, while a sprite with more dark pixels produces a dimmer, more hollow blob. At full resolution, these differences are subtle enough to be masked by the overall visual similarity of the sprites. But when each 80×80 tile is collapsed to a single pixel, the brightness differences become the sole distinguishing feature.

Recognizing the Grid Structure

The key breakthrough comes from computing the tile dimensions. The image width is 3664 pixels and the height is 784 pixels. If each tile is 80×80 pixels, then:

Number of tile columns = 3664 / 80 = 45.8 → 45 full columns (with 64 pixels of padding)
Number of tile rows = 784 / 80 = 9.8 → 9 full rows (with 64 pixels of padding)
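
The arithmetic above can be reproduced with `divmod`, which returns the number of full tiles and the leftover pixels in one step. This is a minimal sketch; `grid_shape` is a hypothetical helper name.

```python
def grid_shape(width, height, tile):
    """Return (cols, rows, pad_x, pad_y) for a width x height image cut into square tiles."""
    cols, pad_x = divmod(width, tile)
    rows, pad_y = divmod(height, tile)
    return cols, rows, pad_x, pad_y


if __name__ == "__main__":
    cols, rows, pad_x, pad_y = grid_shape(3664, 784, 80)
    print(f"{cols} columns, {rows} rows, {pad_x} px and {pad_y} px left over")
    print(f"{cols * rows} cells in total")
```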

The result is a 45×9 grid of cells, 405 in total. Each cell acts as one pixel of a low-resolution bitmap: cells that hold a bright sprite tile are “on”, and dark background cells are “off”. The flag text is drawn by the pattern of “on” cells, so most of the grid stays dark.

Warning — Tile Size Matters

The tile size of 80 pixels is not arbitrary: it matches the native resolution of the sprite images, which you can confirm by checking the dimensions of any sprite file. If you guess the wrong tile size (e.g., 64 or 100), the downsampling will produce misaligned results and the ASCII art will be garbled. Always verify by dividing the image dimensions by the sprite size. Here the ratios are not clean integers (45.8 and 9.8), but both axes leave the same 64-pixel remainder, which is consistent with a fixed amount of padding around a 45×9 grid rather than with a wrong tile size.
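
If no sprite files were available to measure, the tile size could also be estimated from the image itself. The sketch below is not part of the original solve: it takes a one-dimensional brightness profile (for example, the mean brightness of each pixel column) and looks for the lag with the strongest autocorrelation. `estimate_period` is a hypothetical helper, and the demo profile is synthetic.

```python
def estimate_period(profile, min_lag=2, max_lag=None):
    """Return the smallest lag with maximal mean-centred autocorrelation."""
    n = len(profile)
    if max_lag is None:
        max_lag = n // 2
    mean = sum(profile) / n
    centered = [value - mean for value in profile]
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        overlap = n - lag
        score = sum(centered[i] * centered[i + lag] for i in range(overlap)) / overlap
        # A small tolerance keeps the first (smallest) lag when multiples tie.
        if score > best_score + 1e-9:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    # Synthetic profile: 40 bright columns followed by 40 dark ones, repeated.
    profile = [255 if (i % 80) < 40 else 0 for i in range(800)]
    print(estimate_period(profile))
```

On real data the padding and the sprite artwork add noise, so the estimate is best treated as a candidate to confirm against the sprite dimensions.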

Bitmap Font Rendering Mechanism

How Bitmap Fonts Work

A bitmap font is one of the oldest and simplest methods for rendering text on a computer display. Each character in the font is stored as a rectangular grid of pixels (a “glyph”), typically monochrome. To display a string of text, the rendering engine copies each character’s glyph into the appropriate position on the screen buffer, tiling the glyphs side by side and top to bottom like typeset blocks on a printing press.

The challenge image chall.png is constructed using exactly this mechanism. The “font” consists of 39 glyph bitmaps (one per character in the flag alphabet), and the “text” being rendered is the flag itself. The author placed each flag character’s sprite into the corresponding tile position, creating a tiled grid where the spatial arrangement of bright and dark regions encodes the flag’s text in a visual form.
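
To make the mechanism concrete, the following sketch runs it in the forward direction: it expands a small text bitmap into a grid of square tiles. It is a simplified illustration, not the author's generator. Real tiles contain sprite artwork, whereas this version fills each tile with a flat value. `render_tiles` is a hypothetical helper name.

```python
def render_tiles(bitmap_rows, tile, on_value=255, off_value=0):
    """Expand each '#' or ' ' cell of a text bitmap into a tile x tile pixel block."""
    pixels = []
    for row in bitmap_rows:
        scanline = []
        for cell in row:
            value = on_value if cell == "#" else off_value
            scanline.extend([value] * tile)
        # Repeat the scanline so the cell is `tile` pixels tall as well.
        for _ in range(tile):
            pixels.append(list(scanline))
    return pixels


if __name__ == "__main__":
    glyph = ["###", "# #", "###"]
    image = render_tiles(glyph, tile=4)
    print(len(image[0]), "x", len(image))
```

Decoding is the inverse operation: collapse every tile back to one value and compare it with a threshold.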

From Glyphs to Brightness Values

When a sprite image is downscaled from 80×80 pixels to a single pixel, the resulting brightness depends on the average luminance of the original sprite. Sprites with large bright areas (such as faces with light skin tones) produce high brightness values, while sprites with predominantly dark backgrounds produce low brightness values. This means that different characters produce tiles with different brightness levels when viewed at the “macro” scale of the full image.

At full resolution, the viewer sees individual dumbbell-shaped blobs and can’t easily distinguish which character each blob represents. But after downsampling, each blob collapses to a single pixel whose brightness is a deterministic function of the character it encodes. Applying a binary threshold then converts these brightness differences into a simple on/off pattern: pixels above the threshold are “on” (part of a flag character), pixels below are “off” (background).

Info — Why Not Use the Sprites Directly?

You might wonder: why not extract each tile and compare it directly against the 39 sprite variants to identify the character? This approach works in theory but is fragile in practice because the tiles in chall.png have been processed (resized, interpolated, potentially dithered) during image construction, making exact pixel matching unreliable. Downsampling + thresholding is more robust because it reduces each tile to a single summary statistic (average brightness) that is less sensitive to pixel-level distortions.

Downsampling & Thresholding

The Downsampling Operation

Downsampling is the process of reducing the spatial resolution of an image by a factor. In this case, we want to reduce the 3664×784 image to a 45×9 grid where each pixel represents one 80×80 tile. Because 45×80 = 3600 and 9×80 = 720, the 64 leftover pixels on each axis have to be trimmed (or otherwise accounted for) first; resizing the full 3664×784 canvas straight to 45×9 would make each output pixel cover roughly 81×87 source pixels and drift off the tile grid. The simplest method is nearest-neighbor downsampling, where each pixel in the output takes the value of the closest pixel in the input. For a factor-of-80 reduction, this means each output pixel samples one specific pixel from the center of its corresponding 80×80 block.

A slightly more robust approach is area averaging, where each output pixel is the mean of all 6,400 pixels (80×80) in the corresponding input block. Pillow's Image.BOX resampling filter performs this averaging when the image is reduced by a whole-number factor; Image.LANCZOS also integrates over the source area but weights pixels with a windowed-sinc kernel rather than taking a plain mean. However, for this challenge, even nearest-neighbor sampling works because the brightness difference between “on” and “off” tiles is large enough that any reasonable sampling method will preserve the distinction.

python

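from PIL import Image

img = Image.open("chall.png").convert("L")
tile = 80
cols = img.width // tile   # 45
rows = img.height // tile  # 9

# Trim the leftover 64 px per axis so the grid divides evenly.
# Assumption: the padding sits on the right and bottom edges.
aligned = img.crop((0, 0, cols * tile, rows * tile))

# Downsample: each 80x80 block becomes 1 pixel
miniature = aligned.resize((cols, rows), Image.NEAREST)
print(f"Downsampled to {cols}x{rows}")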
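
To see how the two sampling strategies can disagree, here is a minimal pure-Python sketch that works on a nested list of pixel values instead of a Pillow image. `downsample_center` and `downsample_mean` are hypothetical helpers, and the tiny grid is made up for illustration.

```python
def downsample_center(pixels, tile):
    """Take the pixel at the centre of each tile x tile block."""
    rows = len(pixels) // tile
    cols = len(pixels[0]) // tile
    half = tile // 2
    return [
        [pixels[r * tile + half][c * tile + half] for c in range(cols)]
        for r in range(rows)
    ]


def downsample_mean(pixels, tile):
    """Average all pixels inside each tile x tile block."""
    rows = len(pixels) // tile
    cols = len(pixels[0]) // tile
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = sum(
                pixels[y][x]
                for y in range(r * tile, (r + 1) * tile)
                for x in range(c * tile, (c + 1) * tile)
            )
            out_row.append(total / (tile * tile))
        out.append(out_row)
    return out


if __name__ == "__main__":
    grid = [
        [0, 0, 255, 255],
        [0, 200, 255, 255],
        [10, 10, 0, 0],
        [10, 10, 0, 40],
    ]
    print(downsample_center(grid, 2))
    print(downsample_mean(grid, 2))
```

In the top-left block the single sampled pixel is bright while the block as a whole is mostly dark. Averaging removes that kind of dependence on one pixel.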

Binary Thresholding

After downsampling, the image is only 45×9 pixels — small enough to inspect visually or process programmatically. The next step is to convert the grayscale pixel values into a binary on/off representation using a threshold. A pixel with a value above the threshold is classified as “on” (representing a flag character) and rendered as #; a pixel at or below the threshold is “off” (background) and rendered as a space.

The threshold value of 128 (the midpoint of the 0-255 grayscale range) is a natural default that works well for this challenge because the bright sprite tiles have values well above 128 and the dark background tiles have values well below 128. There is a clear bimodal distribution in the pixel histogram with a wide gap between the two peaks, making the exact threshold value non-critical.

python

# Threshold the downsampled image into ASCII art
THRESH = 128
for y in range(rows):
    line = ""
    for x in range(cols):
        if miniature.getpixel((x, y)) > THRESH:
            line += "#"
        else:
            line += " "
    print(line)
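
When the histogram is bimodal, the cutoff does not have to be guessed. The sketch below uses Otsu's method, which the original solve does not use, to choose the threshold that maximizes the between-class variance. It assumes integer values in the 0-255 range; `otsu_threshold` is a hypothetical helper name.

```python
def otsu_threshold(values):
    """Return t such that `value > t` best separates two brightness classes."""
    hist = [0] * 256
    for value in values:
        hist[value] += 1
    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_low, sum_low = 0, 0
    for t in range(256):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, nothing to separate
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        between = count_low * count_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


if __name__ == "__main__":
    sample = [20, 30, 220, 230]
    t = otsu_threshold(sample)
    print(t, ["#" if v > t else " " for v in sample])
```

With a gap as wide as the one described above, the result lands between the two peaks, so it classifies tiles the same way as the fixed value of 128.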

The ASCII Art Result

Running the downsampling and thresholding pipeline produces a clean ASCII art rendering of the flag. The bright tiles form the character shapes of the flag text against the dark background, exactly like text rendered on a monochrome display:

     ####  ###  #   # ####  ###   ### #####  ###  #   # ##### ##### 
    #     #   # #   # #    #   # #       #   #   # ## ##   #     #   
    #     #   # #   # #    #   # #      #    #   # # # #   #     #   
    ####  #   # ##### #### #   # #     #     #   # #   #   #     #   
    #     #   # #   # #    #   # #    #      #   # #   #   #     #   
    #     #   # #   # #    #   # #   #       #   # #   #   #     #   
     ####  ###  #   # ####  ###   ### #####  ###  #   #   #     #   

The ASCII art spells out the flag: tjctf{v3ry_l0ud_pc4p_f1le}. Each character in the flag is rendered as a 5×7 pixel pattern within the 45×9 downsampled grid, with the remaining cells left dark. The flag is a self-referential joke — “very loud PCAP file” — reinforcing the network forensics misdirection even in the final answer.

Flag

tjctf{v3ry_l0ud_pc4p_f1le}

Exploit Code

The following script implements the complete solve pipeline: load the grayscale image, compute tile dimensions from the sprite size, downsample using nearest-neighbor interpolation, apply binary thresholding, and print the resulting ASCII art to the console. The script is deliberately minimal — no sprite directory comparison, no pixel-by-pixel matching, just the core insight that downsampling + thresholding is sufficient to reveal the flag.

python

#!/usr/bin/env python3
"""
TJCTF 2026 - Loud Packets
Bitmap font extraction via downsampling and thresholding
Author: QA210
"""

from PIL import Image


def render_bitmap_message(canvas_path, glyph_span=80, luminance_cutoff=128):
    """
    Decode a flag hidden as bitmap-font ASCII art inside a tiled image.

    The image is a grid of glyph_span x glyph_span pixel tiles.
    Each tile contains a downscaled sprite corresponding to one
    character.  By shrinking the entire image so that each tile
    maps to exactly one pixel, then thresholding the brightness,
    we recover the original text as ASCII art.

    Parameters
    ----------
    canvas_path : str
        Path to the tiled grayscale PNG (chall.png).
    glyph_span : int
        Width/height of one tile in pixels (must match sprite size).
    luminance_cutoff : int
        Grayscale threshold separating foreground from background.
    """
    try:
        canvas = Image.open(canvas_path)
    except OSError as exc:
        print(f"[-] Cannot open {canvas_path}: {exc}")
        return

    # Compute the character-grid dimensions
    num_cols = canvas.width // glyph_span
    num_rows = canvas.height // glyph_span

    print(f"[*] Source dimensions : {canvas.width} x {canvas.height}")
    print(f"[*] Glyph tile size  : {glyph_span} x {glyph_span}")
    print(f"[*] Character grid   : {num_cols} cols x {num_rows} rows")
    print()

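    # Trim to whole tiles so each glyph_span x glyph_span block maps to one
    # output pixel (assumes the leftover padding is on the right/bottom),
    # then apply a nearest-neighbour shrink.
    aligned = canvas.crop((0, 0, num_cols * glyph_span, num_rows * glyph_span))
    thumbnail = aligned.resize((num_cols, num_rows), Image.NEAREST)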

    # Walk the mini-pixel grid and emit ASCII art
    rendered_lines = []
    for row_idx in range(num_rows):
        line_buffer = ""
        for col_idx in range(num_cols):
            brightness = thumbnail.getpixel((col_idx, row_idx))
            if brightness > luminance_cutoff:
                line_buffer += "#"
            else:
                line_buffer += " "
        rendered_lines.append(line_buffer)
        print(line_buffer)

    return "\n".join(rendered_lines)


def main():
    print("[+] Loud Packets - Bitmap Font Extraction")
    print("[+] Downsampling chall.png to character grid...\n")
    render_bitmap_message(
        canvas_path="chall.png",
        glyph_span=80,
        luminance_cutoff=128,
    )


if __name__ == "__main__":
    main()

Running the Exploit

bash

$ python3 loud_packets_solve.py
[+] Loud Packets - Bitmap Font Extraction
[+] Downsampling chall.png to character grid...

[*] Source dimensions : 3664 x 784
[*] Glyph tile size  : 80 x 80
[*] Character grid   : 45 cols x 9 rows

                                           
     ####  ###  #   # ####  ###   ### #####  ###  #   # ##### ##### 
    #     #   # #   # #    #   # #       #   #   # ## ##   #     #   
    #     #   # #   # #    #   # #      #    #   # # # #   #     #   
    ####  #   # ##### #### #   # #     #     #   # #   #   #     #   
    #     #   # #   # #    #   # #    #      #   # #   #   #     #   
    #     #   # #   # #    #   # #   #       #   # #   #   #     #   
     ####  ###  #   # ####  ###   ### #####  ###  #   #   #     #   
                                           

Flag: tjctf{v3ry_l0ud_pc4p_f1le}

Advanced: Sprite Fingerprinting (Alternative)

While downsampling + thresholding is the most elegant solution, an alternative approach leverages the 39 sprite directories directly. Each sprite variant has a unique pixel-level fingerprint that distinguishes it from the others. By computing a hash or summary statistic (such as the average brightness) of each sprite, you can build a lookup table that maps brightness values to characters, then classify each tile in chall.png by comparing its brightness to the table.

python

#!/usr/bin/env python3
"""
Alternative solve: classify tiles by comparing against sprite fingerprints.
Builds a brightness lookup table from the 39 sprite directories,
then identifies each tile in chall.png by its average luminance.
Author: QA210
"""

import os
from PIL import Image


def build_glyph_atlas(sprite_root, glyph_size=80):
    """
    Scan all character directories, downscale each sprite to 1 pixel,
    and record the average brightness as a fingerprint for that character.
    Returns a list of (brightness, character) pairs.
    """
    atlas = []
    seen = {}
    for char_name in sorted(os.listdir(sprite_root)):
        char_dir = os.path.join(sprite_root, char_name)
        if not os.path.isdir(char_dir):
            continue
        for fname in sorted(os.listdir(char_dir)):
            if fname.lower().endswith(('.png', '.jpg', '.bmp')):
                sprite = Image.open(os.path.join(char_dir, fname)).convert("L")
                # Downscale to 1x1 to get the representative brightness
                avg_px = sprite.resize((1, 1), Image.BOX).getpixel((0, 0))
                # Keying a dict by brightness would silently drop characters
                # that share a value, so keep pairs and report collisions.
                if avg_px in seen:
                    print(f"[!] {char_name!r} and {seen[avg_px]!r} share brightness {avg_px}")
                seen.setdefault(avg_px, char_name)
                atlas.append((avg_px, char_name))
                break  # One sprite per character is enough
    print(f"[*] Built glyph atlas with {len(atlas)} entries")
    return atlas


def classify_tiled_image(canvas_path, atlas, glyph_size=80):
    """
    Walk the tile grid in chall.png, compute each tile's average
    brightness, and look up the matching character in the atlas.
    Background tiles are also mapped to their nearest atlas entry.
    """
    canvas = Image.open(canvas_path).convert("L")
    num_cols = canvas.width // glyph_size
    num_rows = canvas.height // glyph_size

    result = []
    for row in range(num_rows):
        row_chars = []
        for col in range(num_cols):
            # Crop the tile and compute its average brightness
            box = (col * glyph_size, row * glyph_size,
                   (col + 1) * glyph_size, (row + 1) * glyph_size)
            tile = canvas.crop(box)
            tile_brightness = tile.resize((1, 1), Image.BOX).getpixel((0, 0))

            # Find the closest atlas entry (the first one wins on ties)
            best_char = "?"
            best_dist = float("inf")
            for ref_bright, char_label in atlas:
                dist = abs(tile_brightness - ref_bright)
                if dist < best_dist:
                    best_dist = dist
                    best_char = char_label
            row_chars.append(best_char)
        result.append("".join(row_chars))

    return "\n".join(result)


if __name__ == "__main__":
    atlas = build_glyph_atlas("loud-packets")
    flag_text = classify_tiled_image("chall.png", atlas)
    print(flag_text)

This approach does not depend on a single global threshold and can handle cases where the brightness difference between “on” and “off” tiles is less pronounced. It only works if the sprite variants have distinguishable average brightness: a single 8-bit mean is a coarse fingerprint, so near-identical sprites can collide. It also extends to color images, since you can compute a separate brightness fingerprint for each color channel. However, for this particular challenge, the simpler downsampling + thresholding method is sufficient and faster to implement under CTF time pressure.
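
The per-channel idea can be sketched without Pillow by treating a tile as a list of (r, g, b) tuples. The helpers `mean_fingerprint` and `nearest_label` are hypothetical, and the atlas values in the demo are invented for illustration, not measured from the challenge sprites.

```python
def mean_fingerprint(pixels):
    """Average each colour channel over a list of (r, g, b) tuples."""
    count = len(pixels)
    return tuple(sum(p[ch] for p in pixels) / count for ch in range(3))


def nearest_label(fingerprint, atlas):
    """Return (label, distance) for the atlas entry closest to the fingerprint."""
    best_label, best_dist = None, float("inf")
    for label, reference in atlas.items():
        # Euclidean distance across the three channel means
        dist = sum((a - b) ** 2 for a, b in zip(fingerprint, reference)) ** 0.5
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


if __name__ == "__main__":
    # Made-up reference fingerprints keyed by character.
    atlas = {"t": (10.0, 20.0, 30.0), "j": (10.0, 20.0, 40.0)}
    tile_pixels = [(10, 20, 38), (10, 20, 40)]
    print(nearest_label(mean_fingerprint(tile_pixels), atlas))
```

Returning the distance alongside the label makes it possible to flag tiles whose best match is still far from every reference, which is a hint that the tile is background or that the fingerprint is too coarse.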

Hint — When to Use Fingerprinting

If the challenge used anti-aliased downsampling during image construction (e.g., LANCZOS instead of nearest-neighbor), the tile boundaries might blend into each other, making simple downsampling less reliable. In such cases, the fingerprinting approach — which computes the average brightness of each tile independently — is more accurate because it does not rely on the downsampling algorithm to preserve tile boundaries perfectly.
