Introduction to taint analysis
Imagine tracking a drop of colored dye through a water system. Once you add the dye at any input point, you can see exactly where it flows through all the pipes and connections. You need to ensure the dyed water doesn't reach certain outlets (like drinking fountains) unless it first passes through a purification system. Taint analysis does exactly this for data in software systems, tracking potentially unsafe data as it flows through your code.
Taint analysis is a specialized form of data flow analysis that tracks potentially dangerous or sensitive data as it flows through a program. It's one of the most powerful techniques for finding security vulnerabilities automatically.
The security challenge
Modern applications constantly handle untrusted data. Web applications process user input from forms, URLs, and cookies. APIs receive data from external systems. Mobile apps handle data from sensors and user interactions. Any of this data could be malicious.
The challenge is that dangerous data rarely causes problems immediately. Instead, it flows through the application, gets stored in variables, passes through method calls, and eventually reaches a sensitive operation. By then, it's hard to remember that the data was originally untrusted.
Consider this seemingly innocent code.
public void greetUser(HttpServletRequest request, HttpServletResponse response) {
String name = request.getParameter("name");
String message = "Hello, " + name + "!";
response.getWriter().write("<h1>" + message + "</h1>");
}
This code has a cross-site scripting (XSS) vulnerability. If an attacker provides <script>alert('XSS')</script>
as the name, that JavaScript will execute in victims' browsers. The problem isn't obvious because the dangerous data travels through multiple variables before reaching the dangerous operation.
Core concepts
Taint analysis tracks data through three key components: sources, sinks, and sanitizers.
Sources: Where the dye enters
Sources are program points where untrusted or sensitive data enters the system. Think of them as the input points where colored dye is added to the water system.
Common sources of untrusted data include.
// Web parameters - could contain anything
String userInput = request.getParameter("search");
// HTTP headers - controlled by the client
String userAgent = request.getHeader("User-Agent");
// File contents - might be tampered with
String config = Files.readString(Path.of(uploadedFile));
// Database values - might contain previously stored attacks
String comment = resultSet.getString("user_comment");
// Network data - from untrusted sources
String apiResponse = httpClient.execute(request).getBody();
Sources can also be sensitive data that shouldn't leak.
// Passwords and credentials
String password = user.getPassword();
// Personal information
String ssn = customer.getSocialSecurityNumber();
// Cryptographic keys
byte[] secretKey = keyStore.getKey("api-key");
Sinks: Critical outlets
Sinks are sensitive operations where tainted data could cause problems. These are like critical water outlets (drinking fountains, medical equipment) where you absolutely don't want dyed water to flow unless it's been purified first.
Common security-critical sinks include.
// SQL injection sink
statement.executeQuery("SELECT * FROM users WHERE id = " + userId);
// Command injection sink
Runtime.getRuntime().exec("ping " + hostname);
// XSS sink
response.getWriter().write("<div>" + userContent + "</div>");
// Path traversal sink
new File("/uploads/" + filename).delete();
// LDAP injection sink
context.search("cn=" + username, null);
// Log injection sink
logger.info("User logged in: " + username);
Each type of sink has specific dangers. SQL sinks can lead to data breaches. Command execution sinks can compromise the entire server. XSS sinks can attack other users.
Sanitizers: Purification systems
Sanitizers are operations that neutralize tainted data, making it safe for use in sinks. They're like water purification systems that remove the dye or any harmful properties before the water reaches critical outlets.
Effective sanitizers are specific to the type of sink.
// SQL sanitization: parameterized queries
PreparedStatement pstmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?");
pstmt.setString(1, userId); // Safe - parameter binding prevents injection
// XSS sanitization: HTML encoding
String safe = HtmlUtils.htmlEscape(userInput); // Converts < to < etc.
response.getWriter().write("<div>" + safe + "</div>"); // Now safe
// Command sanitization: validation
if (hostname.matches("^[a-zA-Z0-9.-]+$")) { // Only safe characters
Runtime.getRuntime().exec("ping " + hostname);
}
// Path sanitization: canonicalization and validation
File file = new File(baseDir, filename).getCanonicalFile();
if (file.getPath().startsWith(baseDir.getPath())) { // Ensure within base directory
file.delete();
}
A common mistake is using the wrong sanitizer for the sink. HTML encoding prevents XSS but won't stop SQL injection. URL encoding prevents some attacks but not others. Always match the sanitizer to the specific sink.
How taint analysis works
Taint analysis extends data flow analysis with the concept of "taintedness" - a property that flows with the data.
Taint propagation
When tainted data flows through operations, the taint usually spreads to the results.
String name = request.getParameter("name"); // name is tainted (from source)
String upper = name.toUpperCase(); // upper is tainted (propagated)
String message = "Hello, " + upper; // message is tainted (concatenation)
StringBuilder sb = new StringBuilder(message); // sb is tainted (construction)
String final = sb.toString(); // final is tainted (conversion)
The analysis must understand how different operations propagate taint. String concatenation propagates taint from any operand. Method calls might propagate taint depending on their semantics. Array and collection operations need special handling.
Field sensitivity
OpenRewrite's taint analysis is field-sensitive by default, meaning it tracks taint separately for different fields of objects.
User user1 = new User();
user1.name = request.getParameter("name"); // user1.name is tainted
user1.id = generateId(); // user1.id is NOT tainted
User user2 = new User();
user2.name = "Admin"; // user2.name is NOT tainted
user2.id = request.getParameter("id"); // user2.id is tainted
// Analysis knows exactly which fields are tainted
This precision is crucial for reducing false positives. Without field sensitivity, any tainted field would contaminate the entire object, leading to numerous spurious warnings.
Inter-procedural analysis
Real vulnerabilities often span multiple methods. Taint analysis must track data as it flows through method calls.
public void handleRequest(HttpServletRequest request) {
String input = request.getParameter("search"); // Source
String processed = processQuery(input); // Flows through method
executeSearch(processed); // Eventually reaches sink
}
private String processQuery(String query) {
return query.trim().toLowerCase(); // Taint propagates
}
private void executeSearch(String query) {
String sql = "SELECT * FROM products WHERE name LIKE '%" + query + "%'";
statement.executeQuery(sql); // Sink: SQL injection!
}
The analysis must connect the source in handleRequest
to the sink in executeSearch
, even though they're in different methods.
Common vulnerability patterns
Understanding common patterns helps you recognize vulnerabilities in code reviews and write better taint specifications.
SQL injection
The classic injection vulnerability occurs when untrusted data is concatenated into SQL queries.
// Vulnerable
String query = "SELECT * FROM users WHERE username = '" + username +
"' AND password = '" + password + "'";
ResultSet rs = statement.executeQuery(query);
// Attack: username = "admin' --"
// Results in: SELECT * FROM users WHERE username = 'admin' --' AND password = '...'
// The -- comments out the password check!
// Safe: Parameterized query
PreparedStatement pstmt = conn.prepareStatement(
"SELECT * FROM users WHERE username = ? AND password = ?");
pstmt.setString(1, username);
pstmt.setString(2, password);
ResultSet rs = pstmt.executeQuery();
Cross-site scripting (XSS)
XSS occurs when untrusted data is included in HTML without proper encoding.
// Vulnerable: Reflected XSS
String search = request.getParameter("q");
response.getWriter().write("You searched for: " + search);
// Attack: q = <script>steal(document.cookie)</script>
// Safe: HTML encode
String search = request.getParameter("q");
String safe = HtmlUtils.htmlEscape(search);
response.getWriter().write("You searched for: " + safe);
// Also vulnerable: Stored XSS
String comment = request.getParameter("comment");
database.save(comment); // Stored without sanitization
// ... later ...
String saved = database.load();
response.getWriter().write(saved); // XSS when displayed
Command injection
When untrusted data is used in system commands.
// Vulnerable
String fileName = request.getParameter("file");
Process p = Runtime.getRuntime().exec("cat /logs/" + fileName);
// Attack: file = "innocent.log; rm -rf /"
// Safe: Validate or use APIs that don't invoke shells
if (fileName.matches("^[a-zA-Z0-9_-]+\\.log$")) {
Process p = Runtime.getRuntime().exec(new String[] {"cat", "/logs/" + fileName});
}
// Better: Use Java APIs instead of shell commands
String content = Files.readString(Paths.get("/logs", fileName));
Path traversal
When untrusted data is used in file paths.
// Vulnerable
String fileName = request.getParameter("file");
File file = new File("/uploads/" + fileName);
file.delete();
// Attack: file = "../../etc/passwd"
// Safe: Canonicalize and validate
File file = new File("/uploads", fileName).getCanonicalFile();
if (!file.getPath().startsWith("/uploads/")) {
throw new SecurityException("Path traversal attempt");
}
file.delete();
Even with taint analysis, follow defense-in-depth principles:
- Validate input at trust boundaries
- Use safe APIs that prevent injection by design
- Apply appropriate encoding/escaping for the context
- Use least privilege principles
- Monitor for suspicious patterns
Writing effective taint specifications
To use taint analysis effectively, you need to specify what constitutes sources, sinks, and sanitizers for your application.
Identifying sources
Look for:
- External input points (web parameters, files, network)
- Data from untrusted systems (external APIs, user databases)
- Sensitive data that shouldn't leak (passwords, keys, PII)
Identifying sinks
Consider:
- Operations that interpret strings as code (SQL, OS commands, scripts)
- Output operations that could leak data (HTTP responses, logs, files)
- Security-sensitive operations (authentication, authorization)
Identifying sanitizers
Recognize:
- Validation that ensures data matches safe patterns
- Encoding functions that neutralize special characters
- APIs that separate data from code (prepared statements, template engines)
Next steps
Ready to put taint analysis to work? Explore these topics:
- Comprehensive Guide to Taint Analysis - Deep dive into implementation and advanced patterns
- Security Analysis - Use pre-built analyses for common vulnerabilities
Begin with one vulnerability type (like SQL injection) in a small codebase. Once you understand how taint flows through your application, expand to other vulnerability types and larger scopes.
Further reading
For more on secure coding and vulnerability patterns:
- OWASP Top 10 - Most critical web application security risks
- CWE/SANS Top 25 - Most dangerous software weaknesses
- CERT Secure Coding Standards - Language-specific security guidelines