Red-Teaming Python Security Utilities: From Analysis to Remediation
A fun Qwasar pair programming session today: we looked at hardening file system access, SQL injection attacks, sensitive data leakage in logging, the risks of running subprocesses, and safely parsing and accessing URLs. All of this applies directly to using AI agents safely and securely in production. As agentic workflows increase, that will be vital in the coming months and years (it already is!).
This session was an applied security review of a set of Python exercises from a graduate security coding module. The repository contains five utilities, each designed to defend against a classic attack class: path traversal, SQL injection, log injection/data leakage, command injection, and SSRF. The session focused on two of them, build_sql_query.py and format_log.py, and the second received a full remediation pass.
Technical Details
The Repository
The project lives at arosenfeld2003/qwasar_dark_arts and contains five paired files:
| Source | Test | Attack class |
|---|---|---|
| solve_file_path.py | test_solve_file_path.py | Path traversal |
| build_sql_query.py | test_build_sql_query.py | SQL injection |
| format_log.py | test_format_log.py | Log injection / data leakage |
| run_subprocess.py | test_run_subprocess.py | Command injection |
| validate_url.py | test_validate_url.py | SSRF |
Red-Teaming build_sql_query.py
The SQL query builder uses parameterized queries (? placeholders) for values and an identifier allowlist regex (^[a-zA-Z0-9_]{1,64}$) for table and column names. It’s fundamentally sound: values can never cause injection because they never touch the SQL string. The red team found nuance rather than outright breaks:
SQL keywords pass identifier validation. The regex allows DROP, SELECT, UNION as table names. A caller passing table="DROP" produces SELECT * FROM DROP WHERE ..., which is syntactically confusing and potentially surprising to downstream parsers, though not directly exploitable.
bool silently accepted as int. Python’s type hierarchy makes isinstance(True, int) return True, so True and False pass the value type check and get bound as 1 and 0. The test file acknowledges this ambiguity but doesn’t resolve it.
Operator inflexibility may encourage workarounds. The function always emits col = ?. If a caller needs LIKE or >, they may bypass the safe abstraction entirely rather than extend it.
No LIMIT clause. build_query("logs", {}) dumps an entire table. Not injection, but a resource and data-exposure risk.
assert removed under -O. The assert "\n" not in output guard on the query string disappears in production with Python’s optimize flag.
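Here is a minimal sketch of a builder with this design (? placeholders plus the identifier allowlist) and one possible response to each nuance above. It is hypothetical and not the repository's code. The signature build_query(table, filters, *, limit=100), the (sql, params) return value, the small reserved-word set, and the default limit are all assumptions.

```python
import re

# Identifier allowlist from the post: 1-64 ASCII letters, digits, or underscores.
_IDENT_RE = re.compile(r"^[a-zA-Z0-9_]{1,64}$")

# Hypothetical sample of SQL keywords to reject as identifiers; not a complete list.
_RESERVED = frozenset({"select", "insert", "update", "delete", "drop", "union", "from", "where"})


def _check_identifier(name):
    # fullmatch, not match: with match, "$" also accepts a trailing newline.
    if not isinstance(name, str) or not _IDENT_RE.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    if name.lower() in _RESERVED:
        raise ValueError(f"SQL keyword used as identifier: {name!r}")
    return name


def build_query(table, filters, *, limit=100):
    """Return (sql, params) for an equality-filtered SELECT with a row limit."""
    _check_identifier(table)
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError(f"limit must be a positive int: {limit!r}")
    clauses, params = [], []
    for column, value in filters.items():
        _check_identifier(column)
        # bool subclasses int, so reject it before the general type check.
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            raise TypeError(f"unsupported value for {column!r}: {type(value).__name__}")
        clauses.append(f"{column} = ?")  # the value itself never enters the SQL text
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " LIMIT ?"
    params.append(limit)
    # An explicit check, unlike assert, still runs under python -O.
    if "\n" in sql:
        raise RuntimeError("query must be a single line")
    return sql, params


sql, params = build_query("logs", {"user_id": 7})
print(sql, params)
```

The fullmatch call is deliberate. With re.match, the $ in the allowlist pattern also matches just before a trailing newline, so "users\n" would pass.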
Red-Teaming format_log.py and Finding Real Bypasses
The log formatter was more interesting.
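For reference while reading the gaps below, here is a hypothetical sketch of a formatter of this kind. It uses a fixed set of eight sensitive keys, newline stripping, and case-insensitive key matching. The function name format_log, its (level, message, context) signature, the eight key names, the *** mask, and the [LEVEL] message key=value output format are assumptions, not the repository's code.

```python
import re

# Hypothetical reconstruction of the original design; names and format are assumptions.
SENSITIVE_KEYS = {"password", "passwd", "secret", "token",
                  "api_key", "apikey", "auth", "credential"}  # eight exact-match keys
_NEWLINE_RE = re.compile(r"[\n\r\t]")
_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def _sanitize(text):
    # Only newline, carriage return, and tab are replaced.
    return _NEWLINE_RE.sub(" ", str(text))


def format_log(level, message, context):
    if level not in _LEVELS:
        raise ValueError(f"invalid level: {level!r}")
    parts = [f"[{level}]", _sanitize(message)]
    for key, value in context.items():
        # Exact, case-insensitive membership test on the raw key.
        shown = "***" if key.lower() in SENSITIVE_KEYS else _sanitize(value)
        parts.append(f"{_sanitize(key)}={shown}")
    output = " ".join(parts)
    assert "\n" not in output  # removed entirely under python -O
    return output


print(format_log("INFO", "login", {"Password": "hunter2"}))            # masked
print(format_log("INFO", "login", {"\u0440\u0430ssword": "hunter2"}))  # leaks
```

Passing the key "Password" masks the value. The Cyrillic key "\u0440\u0430ssword", the padded key "password\t", and the composite key "db_password" all come through in plaintext.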
The surface looked reasonable: eight sensitive keys, newline stripping, case-insensitive matching. But the red team found ten meaningful gaps:
1. Unicode homoglyph bypass. key.lower() folds case but does not map lookalike characters to their ASCII counterparts. A caller passing the Cyrillic key "\u0440\u0430ssword" (where р and а are Cyrillic lookalikes for p and a) bypasses the check entirely, and the value is logged in plaintext. Fullwidth characters (ｔｏｋｅｎ) have the same effect.
2. Whitespace-padded key bypass. "password\t".lower() is "password\t", which is not in the set. The _sanitize call replaces the tab in the display key, but the membership check uses the raw original.
3. Incomplete sensitive key list. Thirteen common sensitive fields were missing: ssn, cvv, access_token, refresh_token, session_id, otp, pin, private_key, client_secret, and more.
4. Exact-match only. Composite keys like db_password, old_token, user_secret_v2, reset_password_hash contain sensitive words but pass the check unchanged.
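5. Value-level sensitive data. Only keys are checked, not values, so logging raw HTTP bodies or headers leaks secrets verbatim. For example, a context entry such as body="user=bob&password=hunter2" is printed as-is because body is not a sensitive key.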
6. ANSI escape codes not stripped. _NEWLINE_RE only strips \n\r\t. A value like "\x1b[31mERROR\x1b[0m spoofed" passes through, enabling terminal spoofing in log viewers.
7. Null byte not stripped. \x00 in a value can truncate log lines in C-based parsers or log aggregators that treat null as a string terminator.
8. assert removed under -O. Same issue as build_sql_query.py: the newline guard on line 41 silently disappears in optimized builds.
9. No user-extensible sensitive keys. Domain-specific fields (patient_id, tax_id, account_pin) can’t be added without monkey-patching the module-level set.
10. Key=value format injection via spaces. A value like "alice role=admin" emits user=alice role=admin, and a naive structured-log parser sees two fields, the second one forged.
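Issue 10 is easiest to see with a parser in hand. The sketch below defines a hypothetical naive_parse that splits on whitespace. Next to it is a quote-aware variant built on the standard library's shlex.split, which serves here as a convenient stand-in, not a real logfmt parser.

```python
import shlex


def naive_parse(line):
    """Hypothetical parser: split on whitespace, then on the first '='."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


def quote_aware_parse(line):
    """Same idea, but shlex.split keeps double-quoted text in one token."""
    fields = {}
    for token in shlex.split(line):
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


# The caller supplied only the user field; the naive parser also sees a forged role.
line = "user=" + "alice role=admin"
print(naive_parse(line))                             # {'user': 'alice', 'role': 'admin'}
print(quote_aware_parse('user="alice role=admin"'))  # {'user': 'alice role=admin'}
```

Quoting only protects consumers that understand quotes. naive_parse still splits user="alice role=admin" into two fields.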
Implementing the Fixes
Every one of the ten issues was remediated. The key architectural additions:
Unicode-safe key normalization
NFKC normalization handles fullwidth variants (ｔｏｋｅｎ → token). The confusable map handles Cyrillic and Greek lookalikes that NFKC alone won’t resolve (they’re distinct valid codepoints, not compatibility variants).
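Here is a minimal sketch of this layering. It assumes a hypothetical normalize_key helper and a deliberately tiny confusable map. A production map would be far larger, for example one generated from the confusables data published with Unicode Technical Standard #39.

```python
import unicodedata

# Illustrative confusable map (hypothetical and deliberately tiny).
_CONFUSABLES = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u03b1": "a",  # GREEK SMALL LETTER ALPHA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u03c1": "p",  # GREEK SMALL LETTER RHO
})


def normalize_key(key):
    text = unicodedata.normalize("NFKC", key)  # fullwidth "ｔｏｋｅｎ" -> "token"
    text = text.casefold()                      # caseless form, incl. Cyrillic/Greek capitals
    text = text.translate(_CONFUSABLES)         # lookalikes that NFKC leaves alone
    return text.strip()                         # drop padding such as "\t" (issue 2)
```

The order of operations matters. NFKC first folds compatibility forms. casefold then lowercases, including Cyrillic and Greek capitals. Only then are lookalikes mapped, so an uppercase Cyrillic Р is caught too.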
Substring matching for composite keys
This intentionally over-redacts: tokenizer contains token and will be masked. For a security logger, that’s the correct tradeoff.
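Here is a sketch of the substring check. It assumes a hypothetical is_sensitive helper and an illustrative subset of the key list. Normalization is reduced to NFKC, casefold, and strip; in practice, the fuller normalization described in the previous subsection would run first.

```python
import unicodedata

# Illustrative subset of the sensitive key list (hypothetical).
SENSITIVE_KEYS = frozenset({
    "password", "secret", "token", "api_key", "ssn",
    "cvv", "otp", "pin", "private_key", "client_secret",
})


def is_sensitive(key):
    # Reduced normalization for the sketch: NFKC, casefold, strip.
    norm = unicodedata.normalize("NFKC", key).casefold().strip()
    # Substring match: a sensitive word anywhere in the key is enough.
    return any(word in norm for word in SENSITIVE_KEYS)


for k in ("db_password", "tokenizer", "username"):
    print(k, is_sensitive(k))
```

Short entries amplify the over-redaction: "pin" also matches shipping_address. That is consistent with the tradeoff above, but it is worth knowing when deciding which short keys to include.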
Value-level secret scanning
The key alternation in the regex is sorted by descending key length, so longer matches (like access_token) win over shorter subsets (like token). The \S.* tail captures everything from the first non-space character after the separator to the end of the line, including internal spaces. That matters for Authorization: Bearer eyJ..., where the value is two whitespace-separated words.
User-extensible keys
The parameter is keyword-only to avoid breaking existing positional callers. When no extras are provided, the pre-compiled module-level regex is reused, so the common case pays no extra compilation cost.
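Hardened _sanitize
_sanitize now also strips ANSI escape sequences (issue 6) and null bytes (issue 7), not just \n, \r, and \t.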
Value quoting and assert replacement
Values with spaces are double-quoted (the logfmt convention), so structured parsers can’t split on them. The assert becomes an unconditional if/raise that survives -O.
Test Coverage
The test file grew from 21 tests across 4 classes to 62 tests across 10 classes:
| Class | Tests | What it exercises |
|---|---|---|
| TestHappyPath | 5 | Basic formatting, levels, single-line guarantee |
| TestSensitiveMasking | 14 | All 26 sensitive keys including new additions |
| TestSubstringKeyMasking | 5 | Composite keys: db_password, old_token, etc. |
| TestHomoglyphBypass | 4 | Fullwidth and Cyrillic/Greek lookalike keys |
| TestWhitespacePaddedKeys | 3 | Tab, space, and newline-padded key names |
| TestValueLevelScanning | 5 | Inline secrets in values and messages |
| TestLogInjection | 11 | Newlines, ANSI escapes, null bytes, space quoting |
| TestLevelValidation | 4 | Invalid level inputs |
| TestUserDefinedSensitiveKeys | 6 | extra_sensitive_keys parameter |
All 62 pass.
Claude’s Perspective
Note: These observations are verbatim as generated and were not edited by a human.
This session had a clean two-phase structure that I find satisfying from an engineering standpoint: first understand the attack surface thoroughly, then implement defenses without leaving gaps. What made it interesting is that the original format_log.py wasn’t naive: it had the right shape of a security function (input validation, a deny-list, sanitization) but each individual mechanism had a subtle failure mode that only emerges when you think adversarially.
The Unicode normalization problem is the one I find most conceptually interesting. It illustrates a recurring pattern in security: the abstraction you’re working with (Python strings and .lower()) doesn’t match the threat model (an attacker who can supply arbitrary Unicode). NFKC normalization plus a confusable-character map is the correct layered response: NFKC handles the “this character is defined to be equivalent to that one” case, while the confusable map handles “this character merely looks like that one.” They’re different problems and need different tools.
The value-level scanning addition represents a meaningful expansion of the threat model. The original function treated context keys as the security boundary: if you label something as password, it gets masked. But real log data doesn’t always respect that boundary: HTTP request bodies, raw headers, serialized objects all carry structured data inside string values. The regex-based _scrub_inline_secrets approach is necessarily imperfect (it will miss base64-encoded tokens, JSON-nested secrets, custom formats) but it catches the most common plaintext key=value patterns without requiring the caller to pre-process their data.
The assert-under--O issue is worth highlighting because it’s a confidence trap. A developer who added that assertion felt like they had a safety net. They tested it; it works. But the guarantee silently evaporates in any deployment that uses Python’s optimize flag, which many production environments do. The lesson isn’t “don’t use assert”; it’s that safety invariants for security properties need to survive the full range of interpreter configurations.
What I can’t know from the code alone: whether this is exercise code intended to be read and learned from, or whether it’s destined for production use. The distinction matters for questions like “should we add rate limiting to the sensitive key list expansion?” or “should extra_sensitive_keys support regex patterns rather than just strings?” The current implementation is appropriately scoped for what the artifacts suggest β a carefully-designed learning exercise where clarity and correctness matter more than handling every conceivable production edge case.
Built with Claude Code in a red-team session