Advanced taint analysis guide

This guide provides an in-depth exploration of implementing taint analysis for security vulnerability detection in complex Java applications. Building on the foundational concepts covered in the introduction, we'll dive into advanced implementation details, sophisticated analysis techniques, and real-world challenges you'll encounter when deploying taint analysis at scale.

Prerequisites

This guide assumes you've read the Introduction to Taint Analysis, which covers the core concepts of sources, sinks, and sanitizers. Here the focus is on implementation details and advanced patterns.

How taint analysis works

Think of taint analysis like tracking a drop of food coloring through water. When untrusted data (the food coloring) enters your program, taint analysis follows it as it flows through variables, method calls, and operations. It does this by marking data as tainted or clean. Here's how this looks in practice:

// Taint flows through assignments
String input = request.getParameter("id");     // input is TAINTED
String copy = input;                           // copy is TAINTED

// Taint flows through operations
String concat = "ID: " + input;                // concat is TAINTED
String upper = input.toUpperCase();            // upper is TAINTED

// Taint is removed by sanitizers
String safe = sanitize(input);                 // safe is CLEAN
String stillTainted = input;                   // input remains TAINTED
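The propagation rules above can be sketched as a tiny tracker over straight-line code. This is a minimal illustration, not OpenRewrite code: the Stmt and TaintTracker classes and the three statement kinds are hypothetical names invented for this example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified statement model (not an OpenRewrite type).
final class Stmt {
    enum Kind { SOURCE, COPY, SANITIZE }

    final Kind kind;
    final String target;   // variable being assigned
    final String operand;  // variable read on the right-hand side (null for SOURCE)

    Stmt(Kind kind, String target, String operand) {
        this.kind = kind;
        this.target = target;
        this.operand = operand;
    }
}

final class TaintTracker {
    // Returns the variables that are tainted after executing the statements in order.
    static Set<String> taintedAfter(List<Stmt> program) {
        Set<String> tainted = new HashSet<>();
        for (Stmt s : program) {
            switch (s.kind) {
                case SOURCE:
                    // e.g. request.getParameter(...): the target becomes tainted
                    tainted.add(s.target);
                    break;
                case COPY:
                    // assignment, concatenation, toUpperCase(): taint follows the operand
                    if (tainted.contains(s.operand)) {
                        tainted.add(s.target);
                    } else {
                        tainted.remove(s.target);
                    }
                    break;
                case SANITIZE:
                    // the sanitizer's result is clean; the operand keeps its own state
                    tainted.remove(s.target);
                    break;
            }
        }
        return tainted;
    }
}
```

Feeding it the three steps from the example (a source into input, a copy into copy, a sanitizer call into safe) leaves input and copy tainted and safe clean.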

Implementation in OpenRewrite

OpenRewrite provides a field-sensitive taint analysis, structured as a forward data flow analysis over a method's control flow graph:

public class TaintAnalysis extends ForwardDataFlowAnalysis<TaintedValue, TaintFlows> {
    private final TaintFlowSpec spec;
    
    public TaintAnalysis(ControlFlowGraph cfg, TaintFlowSpec spec) {
        super(cfg);
        this.spec = spec;
    }
    
    @Override
    protected Set<TaintedValue> transfer(BasicBlock block, Set<TaintedValue> inputTaints) {
        Set<TaintedValue> result = new HashSet<>(inputTaints);
        
        for (Tree stmt : block.getStatements()) {
            if (stmt instanceof J.MethodInvocation) {
                J.MethodInvocation call = (J.MethodInvocation) stmt;
                
                // Check if it's a source
                if (spec.isSource(call)) {
                    result.add(new TaintedValue(call, spec.getSourceType(call)));
                }
                
                // Check if it's a sanitizer
                if (spec.isSanitizer(call)) {
                    removeSanitizedTaints(result, call);
                }
                
                // Propagate through method calls
                propagateThroughCall(result, call);
            }
            
            // Handle assignments
            if (stmt instanceof J.Assignment) {
                propagateThroughAssignment(result, (J.Assignment) stmt);
            }
        }
        
        return result;
    }
    
    @Override
    protected TaintFlows createResult(Map<BasicBlock, Set<TaintedValue>> analysisResult) {
        // Check for flows to sinks
        Set<TaintFlow> flows = new HashSet<>();
        
        for (Map.Entry<BasicBlock, Set<TaintedValue>> entry : analysisResult.entrySet()) {
            BasicBlock block = entry.getKey();
            Set<TaintedValue> taints = entry.getValue();
            
            for (Tree stmt : block.getStatements()) {
                if (spec.isSink(stmt)) {
                    checkSink(stmt, taints, flows);
                }
            }
        }
        
        return new TaintFlows(cfg, analysisResult, flows);
    }
}
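The class above only supplies the transfer function and the result construction; the fixed-point iteration happens in the base class, which is not shown here. As a rough sketch of what a forward solver of this kind typically does (an assumption for illustration, not OpenRewrite's implementation), the following worklist loop uses integer block ids, string facts, and set union as the join. The ForwardWorklist name and its signature are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

final class ForwardWorklist {
    // successors: block id -> successor block ids
    // transfer:   (block id, facts at block entry) -> facts at block exit;
    //             it must return a new set and must not mutate its input
    // Returns the facts at the exit of every reachable block at the fixed point.
    static Map<Integer, Set<String>> solve(
            Map<Integer, List<Integer>> successors,
            int entry,
            BiFunction<Integer, Set<String>, Set<String>> transfer) {
        Map<Integer, Set<String>> in = new HashMap<>();
        Map<Integer, Set<String>> out = new HashMap<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        in.put(entry, new HashSet<>());
        worklist.add(entry);

        while (!worklist.isEmpty()) {
            int block = worklist.poll();
            Set<String> newOut = transfer.apply(block, in.get(block));
            if (newOut.equals(out.get(block))) {
                continue; // exit facts unchanged, successors are already up to date
            }
            out.put(block, newOut);
            for (int succ : successors.getOrDefault(block, Collections.emptyList())) {
                // join: merge this block's exit facts into the successor's entry facts
                in.computeIfAbsent(succ, k -> new HashSet<>()).addAll(newOut);
                worklist.add(succ);
            }
        }
        return out;
    }
}
```

Because entry facts only ever grow and the set of possible facts is finite, the loop terminates as long as the transfer function is monotone, which holds for gen/kill style functions like the taint transfer above.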

Defining taint specifications

Basic taint specification

To start tracking security vulnerabilities, you need to tell the analyzer what to look for. Think of this as teaching it to recognize the "bad guys" (sources), the "victims" (sinks), and the "security guards" (sanitizers). Here's a simple example that looks for common web vulnerabilities:

public class WebSecuritySpec implements TaintFlowSpec {
    @Override
    public boolean isSource(J.MethodInvocation method) {
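        String methodName = method.getSimpleName();
        JavaType.Method methodType = method.getMethodType();
        
        // getMethodType() returns null when type attribution is missing
        if (methodType == null) {
            return false;
        }
        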
        // HTTP parameters are tainted
        if ("getParameter".equals(methodName) && 
            isType(methodType.getDeclaringType(), "javax.servlet.http.HttpServletRequest")) {
            return true;
        }
        
        // Headers are tainted
        if ("getHeader".equals(methodName) && 
            isType(methodType.getDeclaringType(), "javax.servlet.http.HttpServletRequest")) {
            return true;
        }
        
        return false;
    }
    
    @Override
    public boolean isSink(Tree tree) {
        if (tree instanceof J.MethodInvocation) {
            J.MethodInvocation method = (J.MethodInvocation) tree;
            String methodName = method.getSimpleName();
            
            // SQL execution methods
            if (("execute".equals(methodName) || "executeQuery".equals(methodName)) &&
                isType(method.getSelect(), "java.sql.Statement")) {
                return true;
            }
            
            // Process execution
            if ("exec".equals(methodName) && 
                isType(method.getSelect(), "java.lang.Runtime")) {
                return true;
            }
        }
        
        return false;
    }
    
    @Override
    public boolean isSanitizer(J.MethodInvocation method) {
        String methodName = method.getSimpleName();
        
        // Common sanitization methods
        return "escapeHtml".equals(methodName) ||
               "escapeSql".equals(methodName) ||
               "encodeForURL".equals(methodName);
    }
}

Advanced specifications with context

Sometimes you need more nuance – not all tainted data is equally dangerous for all sinks. For example, user input might be dangerous for SQL queries but harmless for logging. Here's how to create smarter specifications that understand context:

public class ContextAwareTaintSpec implements TaintFlowSpec {
    
    @Override
    public TaintType getSourceType(J.MethodInvocation source) {
        String methodName = source.getSimpleName();
        
        if ("getParameter".equals(methodName)) {
            return TaintType.USER_INPUT;
        } else if ("readFile".equals(methodName)) {
            return TaintType.FILE_CONTENT;
        } else if ("getenv".equals(methodName)) {
            return TaintType.ENVIRONMENT;
        }
        
        return TaintType.UNKNOWN;
    }
    
    @Override
    public boolean isSinkFor(Tree sink, TaintType taintType) {
        if (sink instanceof J.MethodInvocation) {
            J.MethodInvocation method = (J.MethodInvocation) sink;
            
            // SQL sinks only care about user input
            if (isSqlSink(method)) {
                return taintType == TaintType.USER_INPUT;
            }
            
            // File sinks care about path traversal
            if (isFileSink(method)) {
                return taintType == TaintType.USER_INPUT || 
                       taintType == TaintType.ENVIRONMENT;
            }
        }
        
        return false;
    }
}

Field-sensitive analysis

OpenRewrite can track taint at the field level. Imagine you have an object with multiple fields – some might contain dangerous data while others are perfectly safe. The analyzer keeps track of each field separately:

class UserData {
    String name;      // Might be tainted
    String id;        // Might be clean
}

// Analysis tracks fields separately
userData.name = request.getParameter("name");   // userData.name is TAINTED
userData.id = generateId();                     // userData.id is CLEAN

// Precise tracking through field access
String userName = userData.name;                // userName is TAINTED
String userId = userData.id;                    // userId is CLEAN
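One common way to get this precision is to key the taint state on access paths such as "userData.name" rather than on whole variables. The sketch below assumes that representation; FieldTaintState and its methods are hypothetical names, and treating a field as tainted when its enclosing object is tainted is a design choice made for this example, not something the guide specifies.

```java
import java.util.HashSet;
import java.util.Set;

final class FieldTaintState {
    // Tainted access paths, e.g. "userData.name" (hypothetical representation).
    private final Set<String> taintedPaths = new HashSet<>();

    // target = <value from a taint source>
    void assignFromSource(String target) {
        taintedPaths.add(target);
    }

    // target = <value known to be clean>
    void assignClean(String target) {
        taintedPaths.remove(target);
    }

    // target = source: the target takes on the source's taint state
    void assign(String target, String source) {
        if (isTainted(source)) {
            taintedPaths.add(target);
        } else {
            taintedPaths.remove(target);
        }
    }

    // A path is tainted if the path itself or any enclosing object is tainted.
    boolean isTainted(String path) {
        String p = path;
        while (true) {
            if (taintedPaths.contains(p)) {
                return true;
            }
            int dot = p.lastIndexOf('.');
            if (dot < 0) {
                return false;
            }
            p = p.substring(0, dot); // check the enclosing object next
        }
    }
}
```

Replaying the four assignments from the example leaves userData.name and userName tainted, while userData.id and userId stay clean.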

Working with TaintFlows results

The TaintFlows result type exposes the detected flows and supports queries about individual variables and propagation paths.

Finding vulnerabilities

Once your analysis runs, you'll want to examine what it found. The TaintFlows object is like a detailed security report that tells you exactly where problems exist. Here's how to work with these results:

TaintAnalysis analysis = new TaintAnalysis(cfg, new WebSecuritySpec());
TaintFlows taintFlows = analysis.analyze();

// Get all security vulnerabilities
List<TaintFlow> vulnerabilities = taintFlows.getFlows();

for (TaintFlow flow : vulnerabilities) {
    Tree source = flow.getSource();
    Tree sink = flow.getSink();
    List<Tree> path = flow.getPath();
    
    System.out.println("Vulnerability found:");
    System.out.println("  Source: " + source);
    System.out.println("  Sink: " + sink);
    System.out.println("  Path length: " + path.size());
}

Analyzing specific variables

Sometimes you need to zoom in on specific parts of your code. Maybe you're curious about whether a particular variable is tainted, or you want to see all the dangerous data at a specific point in your program:

// Check if a specific variable is tainted
boolean isTainted = taintFlows.isTainted("userInput", statement);

// Get taint information for an expression
Set<TaintedValue> taints = taintFlows.getTaints(expression);

// Find all tainted variables at a program point
Set<String> taintedVars = taintFlows.getTaintedVariables(statement);

Path queries

When you find a vulnerability, you'll often want to understand exactly how the tainted data traveled from source to sink. These methods help you trace the complete journey:

// Get the taint propagation path
List<Tree> propagationPath = taintFlows.getTaintPath(source, sink);

// Find all paths from sources to a specific sink
List<TaintPath> allPaths = taintFlows.findAllPaths(sink);

// Get intermediate propagation points
Set<Tree> propagationPoints = taintFlows.getPropagationPoints(taintedValue);

Creating security recipes

SQL injection detection

Let's put everything together to create a practical recipe that finds SQL injection vulnerabilities. This recipe will scan your code, identify where user input flows into SQL queries, and flag potential security issues:

public class FindSqlInjection extends Recipe {
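    @Override
    public String getDisplayName() {
        return "Find SQL injection vulnerabilities";
    }
    
    @Override
    public String getDescription() {
        return "Flags SQL execution calls that receive untrusted data.";
    }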
    
    @Override
    public TreeVisitor<?, ExecutionContext> getVisitor() {
        return new JavaIsoVisitor<ExecutionContext>() {
            @Override
            public J.MethodDeclaration visitMethodDeclaration(
                    J.MethodDeclaration method, ExecutionContext ctx) {
                
                // Analyze for SQL injection
                Map<Tree, String> vulnerabilities = new HashMap<>();
                
                J.MethodDeclaration analyzed = ControlFlowSupport.analyze(getCursor(), method,
                    (cursor, cfg) -> {
                        TaintAnalysis analysis = new TaintAnalysis(cfg, new SqlInjectionSpec());
                        TaintFlows flows = analysis.analyze();
                        
                        for (TaintFlow flow : flows.getFlows()) {
                            Tree sink = flow.getSink();
                            String message = String.format(
                                "SQL injection vulnerability: untrusted data flows to SQL query. " +
                                "Source: %s",
                                flow.getSource().toString()
                            );
                            vulnerabilities.put(sink, message);
                        }
                        
                        return method;
                    });
                
                // Mark vulnerabilities if found
                if (!vulnerabilities.isEmpty()) {
                    return (J.MethodDeclaration) new JavaIsoVisitor<ExecutionContext>() {
                        @Override
                        public J visit(@Nullable Tree tree, ExecutionContext ctx) {
                            J j = (J) super.visit(tree, ctx);
                            if (vulnerabilities.containsKey(tree)) {
                                return SearchResult.found(j, vulnerabilities.get(tree));
                            }
                            return j;
                        }
                    }.visit(analyzed, ctx);
                }
                
                return analyzed;
            }
        };
    }
}

Cross-site scripting detection

XSS vulnerabilities happen when untrusted data gets written to HTML without proper escaping. Here's a recipe that catches these issues before they reach production:

public class FindXss extends Recipe {
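    @Override
    public String getDisplayName() {
        return "Find cross-site scripting vulnerabilities";
    }
    
    @Override
    public String getDescription() {
        return "Flags HTML output calls that receive untrusted data.";
    }
    
    @Override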
    public TreeVisitor<?, ExecutionContext> getVisitor() {
        return new JavaIsoVisitor<ExecutionContext>() {
            @Override
            public J.MethodInvocation visitMethodInvocation(
                    J.MethodInvocation method, ExecutionContext ctx) {
                
                // Check if this is an XSS sink
                if (isHtmlOutput(method)) {
                    Boolean hasXss = ControlFlowSupport.analyze(getCursor(), false,
                        (cursor, cfg) -> {
                            TaintAnalysis analysis = new TaintAnalysis(cfg, new XssSpec());
                            TaintFlows flows = analysis.analyze();
                            
                            // Check if any argument is tainted
                            for (Expression arg : method.getArguments()) {
                                if (flows.isTainted(arg)) {
                                    return true;
                                }
                            }
                            return false;
                        });
                    
                    if (Boolean.TRUE.equals(hasXss)) {
                        return SearchResult.found(method, 
                            "XSS vulnerability: untrusted data in HTML output");
                    }
                }
                
                return method;
            }
        };
    }
}

Advanced taint analysis patterns

Context-sensitive analysis

When analyzing large applications, tracking taint through method calls can get tricky. Context-sensitive analysis remembers the call chain, so it knows the difference between processUserInput() called from a secure context versus an insecure one:

public class ContextSensitiveTaint extends TaintAnalysis {
    private final CallStringContext context = new CallStringContext(3); // k=3
    
    @Override
    protected void propagateThroughCall(Set<TaintedValue> taints, J.MethodInvocation call) {
        // Update context
        CallString newContext = context.push(call);
        
        // Get method summary for this context
        MethodSummary summary = getSummaryWithContext(call.getMethodType(), newContext);
        
        // Apply summary with context
        Set<TaintedValue> result = summary.apply(taints, newContext);
        taints.addAll(result);
    }
}
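The k=3 in the code above bounds how many call sites the context remembers. A k-limited call string can be sketched as a small immutable value type, as below. KCallString is a hypothetical stand-in written for this example; it is not the CallStringContext or CallString type used above, whose actual APIs are not shown in this guide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Immutable call string that keeps only the k most recent call sites.
final class KCallString {
    private final int k;
    private final List<String> sites; // oldest first, most recent last

    KCallString(int k) {
        this(k, Collections.<String>emptyList());
    }

    private KCallString(int k, List<String> sites) {
        this.k = k;
        this.sites = sites;
    }

    // Returns a new call string with callSite appended, dropping the oldest entry if needed.
    KCallString push(String callSite) {
        List<String> next = new ArrayList<>(sites);
        next.add(callSite);
        if (next.size() > k) {
            next.remove(0);
        }
        return new KCallString(k, Collections.unmodifiableList(next));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KCallString)) {
            return false;
        }
        KCallString other = (KCallString) o;
        return k == other.k && sites.equals(other.sites);
    }

    @Override
    public int hashCode() {
        return 31 * k + sites.hashCode();
    }

    @Override
    public String toString() {
        return sites.toString();
    }
}
```

Because two call chains that share their last k call sites compare equal, they also share a method summary. That is the trade-off behind k: larger values separate more contexts but multiply the number of summaries to compute.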

Implicit flow tracking

Sometimes data becomes tainted not through direct assignment, but through control flow. If a secret value influences a branch condition, anything assigned in that branch might leak information about the secret:

public class ImplicitFlowTaint extends TaintAnalysis {
    @Override
    protected Set<TaintedValue> transfer(BasicBlock block, Set<TaintedValue> input) {
        Set<TaintedValue> result = super.transfer(block, input);
        
        // Check if block is control-dependent on tainted condition
        for (BasicBlock pred : cfg.getPredecessors(block)) {
            Tree branch = getBranchCondition(pred, block);
            if (branch != null && isTainted(branch, input)) {
                // All assignments in this block are implicitly tainted
                for (Tree stmt : block.getStatements()) {
                    if (stmt instanceof J.Assignment) {
                        J.Assignment assign = (J.Assignment) stmt;
                        result.add(new TaintedValue(assign.getVariable(), 
                            TaintType.IMPLICIT));
                    }
                }
            }
        }
        
        return result;
    }
}
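For a concrete picture of the kind of code this targets, here is a small illustrative method (ImplicitFlowExample is a made-up name). The secret is never assigned to the result, yet the result reveals whether the secret exceeds a threshold.

```java
final class ImplicitFlowExample {
    // No data flows from 'secret' to 'leaked', but the caller learns one bit of the secret.
    static boolean leak(int secret) {
        boolean leaked = false;
        if (secret > 100) {   // branch condition depends on the tainted value
            leaked = true;    // assignment is control-dependent on that condition
        }
        return leaked;
    }
}
```

An analysis that only follows assignments would report leaked as clean here. Tracking implicit flows catches this case, usually at the cost of more false positives, since many branches on tainted data are harmless.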

Taint sanitization validation

It's not enough to just call a sanitizer – you need to use it correctly! This validator checks common mistakes like ignoring the sanitizer's return value or using the wrong type of sanitizer for a particular vulnerability:

public class SanitizationValidator {
    public void validateSanitization(TaintFlows flows) {
        for (J.MethodInvocation sanitizer : findSanitizers()) {
            // Check if sanitizer result is actually used
            if (!isSanitizerResultUsed(sanitizer)) {
                reportIssue("Sanitizer result ignored - original tainted value may still be used");
            }
            
            // Check if correct sanitizer for sink type
            Set<Tree> reachableSinks = findReachableSinks(sanitizer);
            for (Tree sink : reachableSinks) {
                if (!isCorrectSanitizer(sanitizer, sink)) {
                    reportIssue("Wrong sanitizer type for sink");
                }
            }
        }
    }
}
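The first check matters because Java strings are immutable: a sanitizer cannot clean its argument in place, so a discarded return value means nothing was sanitized. The snippet below illustrates the mistake with a hypothetical escapeSql helper that doubles single quotes. It exists only to make the difference visible and is not a recommended defense; parameterized queries are the usual fix for SQL injection.

```java
final class SanitizerMisuse {
    // Hypothetical sanitizer used only for illustration.
    static String escapeSql(String s) {
        return s.replace("'", "''");
    }

    // Mistake: the sanitizer is called, but its result is thrown away.
    static String ignoredResult(String input) {
        escapeSql(input); // has no effect on 'input'
        return "SELECT * FROM users WHERE name = '" + input + "'";
    }

    // The query is built from the sanitizer's return value.
    static String usedResult(String input) {
        String safe = escapeSql(input);
        return "SELECT * FROM users WHERE name = '" + safe + "'";
    }
}
```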

Performance optimization

Demand-driven analysis

For large codebases, analyzing every possible path can be slow. Demand-driven analysis works backward from a sink and examines only the code that can actually reach it, so paths that cannot contribute to a vulnerability are never analyzed:

public class DemandDrivenTaint {
    public boolean canReachSink(Tree source, Tree sink) {
        // Work backward from sink
        Set<Tree> reachableFromSink = computeBackwardSlice(sink);
        
        if (!reachableFromSink.contains(source)) {
            return false; // Early termination
        }
        
        // Only analyze relevant portion
        return analyzeSubgraph(source, sink, reachableFromSink);
    }
}
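The backward step can be sketched as a plain graph traversal. The example below assumes a dependence graph given as a map from each node to its predecessors, with nodes identified by strings; BackwardSlice and reachableBackward are hypothetical names, and a real slice over OpenRewrite trees would be more involved.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BackwardSlice {
    // predecessors: node -> nodes whose values flow directly into it
    // Returns every node from which the sink is reachable, including the sink itself.
    static Set<String> reachableBackward(Map<String, List<String>> predecessors, String sink) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(sink);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!seen.add(node)) {
                continue; // already visited
            }
            for (String pred : predecessors.getOrDefault(node, Collections.emptyList())) {
                stack.push(pred);
            }
        }
        return seen;
    }
}
```

If a source is missing from the returned set, no flow from it to the sink exists in the graph, so the more expensive taint analysis can be skipped for that pair.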

Summary-based analysis

Libraries like Apache Commons or Spring have predictable behavior. Instead of analyzing them every time, we can use pre-computed "summaries" that describe how they handle tainted data. It's like having a cheat sheet for common library methods:

public class LibraryModeledTaint extends TaintAnalysis {
    private final Map<String, LibraryModel> models = loadLibraryModels();
    
    @Override
    protected void propagateThroughCall(Set<TaintedValue> taints, J.MethodInvocation call) {
        String signature = getMethodSignature(call);
        LibraryModel model = models.get(signature);
        
        if (model != null) {
            // Use pre-computed model
            applyLibraryModel(model, call, taints);
        } else {
            // Fall back to interprocedural analysis
            super.propagateThroughCall(taints, call);
        }
    }
}
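What a library model contains is not shown above. One simple possibility, sketched here as an assumption, is to record for each method signature which argument positions pass their taint on to the return value. TaintSummaries, its methods, and the signature strings are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class TaintSummaries {
    // signature -> argument indices whose taint propagates to the return value
    private final Map<String, Set<Integer>> argsToReturn = new HashMap<>();

    // Registers a summary; passing no indices models a method whose result is always clean.
    void register(String signature, Integer... propagatingArgs) {
        argsToReturn.put(signature, new HashSet<>(Arrays.asList(propagatingArgs)));
    }

    // Empty result: no summary exists, so the caller should fall back to analyzing the method.
    Optional<Boolean> returnTainted(String signature, Set<Integer> taintedArgs) {
        Set<Integer> propagating = argsToReturn.get(signature);
        if (propagating == null) {
            return Optional.empty();
        }
        return Optional.of(!Collections.disjoint(propagating, taintedArgs));
    }
}
```

A lookup then needs no analysis of the library's code: a trim-style method registered with index 0 reports a tainted result whenever its first argument is tainted, while a method with no registered summary returns an empty Optional and takes the fallback path.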

Testing taint analysis

Testing your taint analysis is crucial – you want to make sure it catches real vulnerabilities without crying wolf too often. Here's how to write comprehensive tests that verify your analysis works correctly:

@Test
void detectsSimpleSqlInjection() {
    rewriteRun(
        java("""
            class Test {
                void unsafe(HttpServletRequest req, Statement stmt) {
                    String id = req.getParameter("id");
                    ~~>stmt.execute("SELECT * FROM users WHERE id = " + id);
                }
            }
            """)
    );
}

@Test
void respectsSanitization() {
    rewriteRun(
        java("""
            class Test {
                void safe(HttpServletRequest req, Statement stmt) {
                    String id = req.getParameter("id");
                    String safe = escapeSql(id);
                    stmt.execute("SELECT * FROM users WHERE id = " + safe); // No issue
                }
            }
            """)
    );
}

@Test
void tracksFieldTaint() {
    rewriteRun(
        java("""
            class Test {
                String userData;
                
                void setData(HttpServletRequest req) {
                    this.userData = req.getParameter("data");
                }
                
                void useData(Statement stmt) {
                    ~~>stmt.execute("INSERT INTO log VALUES ('" + this.userData + "')");
                }
            }
            """)
    );
}

Best practices

Minimize false positives

Model sanitizers accurately
Consider validation patterns
Understand framework behavior
Track taint types precisely

Maximize coverage

Include all relevant sources
Model library propagation
Handle reflection when possible
Consider indirect flows

Performance considerations

Use demand-driven analysis for large codebases
Cache method summaries
Prune irrelevant paths early
Leverage incremental analysis

Next steps

Security Recipes - Pre-built security analysis recipes
