Method summary analysis

Method summaries are compact representations of what a method does, enabling efficient inter-procedural analysis without repeatedly analyzing method bodies. Think of them as "nutrition labels" for methods – they tell you what goes in, what comes out, and what effects occur, without needing to examine all the ingredients.

In this guide, we will walk you through the concepts behind method summaries, show you how to design different summary representations for various analysis needs, and demonstrate practical implementation techniques. We will cover everything from basic flow summaries to advanced compositional approaches, performance optimization strategies, and integration with OpenRewrite's analysis framework.

Understanding method summaries

Before diving into implementation details, let's take a moment to talk through what method summaries are and why they're useful for scalable program analysis.

A method summary captures the essential behavior of a method relevant to your analysis. For taint analysis, this might look like:

// Original method
public String processUser(String name, int id, boolean validate) {
    if (validate) {
        name = sanitize(name);
    }
    logAccess(id);
    return "User: " + name;
}

// Summary:
// * Parameter 0 (name) -> flows to return (conditionally sanitized)
// * Parameter 1 (id) -> flows to log (potential information leak)
// * Parameter 2 (validate) -> affects whether sanitization occurs
// * Side effect: writes to log

Designing summary representations

Now that we have a high-level idea of what a method summary is, let's explore how to design data structures that efficiently represent this information.

The structure of your summaries depends on your analysis needs.

Basic flow summary

For simple taint tracking:

public class BasicFlowSummary {
    // Which parameters flow to the return value
    private final Set<Integer> paramsToReturn;
    
    // Which parameters flow to which fields
    private final Map<Integer, Set<FieldRef>> paramsToFields;
    
    // Whether the return value is always sanitized
    private final boolean returnSanitized;
    
    public boolean parameterFlowsToReturn(int paramIndex) {
        return paramsToReturn.contains(paramIndex);
    }
}

Conditional flow summary

For more precision with path-sensitive information:

public class ConditionalFlowSummary {
    // Flows that always happen
    private final FlowSet unconditionalFlows;
    
    // Flows that depend on conditions
    private final Map<Condition, FlowSet> conditionalFlows;
    
    public static class Condition {
        enum Type { PARAM_NULL, PARAM_EQUALS, FIELD_CHECK }
        private final Type type;
        private final int paramIndex;
        private final Object value;
    }
    
    public FlowSet getFlows(CallContext context) {
        FlowSet flows = new FlowSet(unconditionalFlows);
        
        // Add flows whose conditions are met
        for (Map.Entry<Condition, FlowSet> entry : conditionalFlows.entrySet()) {
            if (evaluateCondition(entry.getKey(), context)) {
                flows.merge(entry.getValue());
            }
        }
        
        return flows;
    }
}

Access path summary

For field-sensitive analysis:

public class AccessPathSummary {
    // Tracks precise paths like param0.field1.field2
    private final Set<AccessPath> accessPaths;
    
    public static class AccessPath {
        private final Source source;
        private final List<String> fields;
        private final Destination destination;
        
        enum Source { PARAM_0, PARAM_1, THIS, STATIC_FIELD }
        enum Destination { RETURN, FIELD, SINK }
    }
    
    // Example: parameter 0's 'name' field flows to return
    // new AccessPath(PARAM_0, ["name"], RETURN)
}

Computing summaries

Next, let's explore the algorithms and strategies for actually computing these summaries from method bodies.

Bottom-up analysis

The standard approach computes summaries starting from leaf methods:

public class SummaryComputer {
    private final Map<MethodId, MethodSummary> computed = new HashMap<>();
    
    public void computeAllSummaries(CallGraph callGraph) {
        // Process in reverse topological order
        List<MethodId> order = callGraph.reverseTopologicalSort();
        
        for (MethodId method : order) {
            if (!computed.containsKey(method)) {
                computeSummary(method, callGraph);
            }
        }
    }
    
    private MethodSummary computeSummary(MethodId method, CallGraph callGraph) {
        // Get methods this one calls
        Set<MethodId> callees = callGraph.getCallees(method);
        
        // Ensure callees are computed first
        Map<MethodId, MethodSummary> calleeSummaries = new HashMap<>();
        for (MethodId callee : callees) {
            calleeSummaries.put(callee, 
                computed.computeIfAbsent(callee, 
                    m -> computeSummary(m, callGraph)));
        }
        
        // Now compute this method's summary using callee summaries
        return computeWithDependencies(method, calleeSummaries);
    }
}

Incremental summary computation

For large codebases, compute summaries incrementally:

public class IncrementalSummaryComputer {
    private final DependencyTracker deps = new DependencyTracker();
    
    public void methodChanged(MethodId changed) {
        // Invalidate the changed method's summary
        invalidateSummary(changed);
        
        // Invalidate methods that depend on it
        Set<MethodId> affected = deps.getDependents(changed);
        for (MethodId method : affected) {
            invalidateSummary(method);
        }
        
        // Recompute in dependency order
        recomputeInvalidated();
    }
    
    private void trackDependency(MethodId caller, MethodId callee) {
        deps.addDependency(caller, callee);
    }
}

Advanced summary features

Once you have the basic summary computation working, you can enhance your summaries with more sophisticated features to capture complex behaviors.

Heap abstractions

Track modifications to heap objects:

public class HeapSummary {
    // Which fields of which parameters are modified
    private final Map<ParamRef, Set<FieldRef>> fieldWrites;
    
    // Abstract objects created and returned
    private final Set<AbstractObject> allocations;
    
    public boolean modifiesField(int param, String field) {
        ParamRef ref = new ParamRef(param);
        Set<FieldRef> writes = fieldWrites.get(ref);
        return writes != null && writes.contains(new FieldRef(field));
    }
}

Effect summaries

Capture side effects beyond data flow:

public class EffectSummary {
    // I/O effects
    private final Set<IOEffect> ioEffects;
    
    // Synchronization effects
    private final Set<LockEffect> lockEffects;
    
    // Exception effects
    private final Set<ExceptionEffect> exceptions;
    
    public static class IOEffect {
        enum Type { FILE_READ, FILE_WRITE, NETWORK, DATABASE }
        private final Type type;
        private final AccessPath target;
    }
}

Compositional summaries

Build complex summaries from simpler ones:

public class CompositionalSummary {
    // Compose summaries for common patterns
    
    public static MethodSummary composeSequence(
            MethodSummary first, MethodSummary second) {
        // If method A calls B, compose their effects
        return new MethodSummary() {
            @Override
            public FlowSet getFlows() {
                FlowSet flows = first.getFlows();
                // Apply second's transformation to first's outputs
                return second.transformFlows(flows);
            }
        };
    }
    
    public static MethodSummary composeConditional(
            Condition cond, MethodSummary trueSummary, MethodSummary falseSummary) {
        // Conditional composition
        return new ConditionalMethodSummary(cond, trueSummary, falseSummary);
    }
}

Using summaries effectively

Summary application

Apply summaries at call sites:

public class SummaryApplication {
    public Set<TaintedValue> applyMethodSummary(
            J.MethodInvocation call,
            MethodSummary summary,
            AnalysisState state) {
        
        // Map arguments to parameters
        List<Expression> args = call.getArguments();
        Map<Integer, Set<TaintedValue>> argTaints = new HashMap<>();
        
        for (int i = 0; i < args.size(); i++) {
            argTaints.put(i, state.getTaintsFor(args.get(i)));
        }
        
        // Apply summary transformations
        Set<TaintedValue> result = new HashSet<>();
        
        // Check parameter-to-return flows
        for (int param : summary.getParamsFlowingToReturn()) {
            if (!argTaints.getOrDefault(param, Set.of()).isEmpty()) {
                result.add(new TaintedValue(call, 
                    summary.getReturnTaintType(param)));
            }
        }
        
        // Apply side effects
        applySideEffects(summary, state, argTaints);
        
        return result;
    }
}

Summary precision policies

Different analyses need different precision levels:

public interface SummaryPrecisionPolicy {
    // Decide how precise summaries should be
    
    boolean useContextSensitivity(MethodId method);
    boolean usePathSensitivity(MethodId method);
    boolean trackHeapEffects(MethodId method);
    int getMaxSummarySize(MethodId method);
}

public class SecurityFocusedPolicy implements SummaryPrecisionPolicy {
    @Override
    public boolean usePathSensitivity(MethodId method) {
        // High precision for security-critical methods
        return method.hasSecurityAnnotation() ||
               method.isInSecurityPackage();
    }
}

Practical example: taint summary

To bring all these concepts together, let's take a look at a complete example of implementing taint summaries for security analysis:

public class TaintSummaryExample {
    
    public static class TaintSummary {
        private final BitSet paramsToReturn;
        private final Map<Integer, BitSet> paramsToParams;
        private final boolean sanitizesReturn;
        private final Set<Integer> paramsSentToSink;
        
        public static TaintSummary compute(J.MethodDeclaration method) {
            ControlFlowGraph cfg = ControlFlowGraphs.build(method);
            
            BitSet paramsToReturn = new BitSet();
            Map<Integer, BitSet> paramsToParams = new HashMap<>();
            boolean sanitizesReturn = false;
            Set<Integer> paramsSentToSink = new HashSet<>();
            
            // Analyze each parameter
            for (int i = 0; i < method.getParameters().size(); i++) {
                ParameterTaintAnalysis analysis = 
                    new ParameterTaintAnalysis(cfg, i);
                ParameterTaintResult result = analysis.analyze();
                
                if (result.flowsToReturn()) {
                    paramsToReturn.set(i);
                }
                
                if (result.isSanitized()) {
                    sanitizesReturn = true;
                }
                
                paramsSentToSink.addAll(result.getSinkParams());
            }
            
            return new TaintSummary(
                paramsToReturn, paramsToParams, 
                sanitizesReturn, paramsSentToSink);
        }
    }
    
    // Using the summary
    public void analyzeCall(J.MethodInvocation call, TaintSummary summary) {
        List<Expression> args = call.getArguments();
        
        // Check each argument
        for (int i = 0; i < args.size(); i++) {
            if (isTainted(args.get(i))) {
                // Does this tainted arg flow to return?
                if (summary.paramsToReturn.get(i)) {
                    if (!summary.sanitizesReturn) {
                        markTainted(call);
                    }
                }
                
                // Does it flow to a sink?
                if (summary.paramsSentToSink.contains(i)) {
                    reportVulnerability(call, "Tainted data flows to sink");
                }
            }
        }
    }
}

Performance considerations

As your analysis scales to larger codebases, performance becomes critical. Here are key strategies for keeping summaries efficient without sacrificing precision.

Summary size management

Keep summaries compact:

public class CompactSummary {
    // Use bit vectors for parameter sets
    private final BitSet parameterMask;
    
    // Compress common patterns
    private static final int PATTERN_ALL_TO_RETURN = 1;
    private static final int PATTERN_NONE_TO_RETURN = 2;
    private static final int PATTERN_IDENTITY = 3;
    
    private final int pattern;
    
    // Only store exceptions to patterns
    private final Map<Integer, Integer> exceptions;
}

Caching strategies

Ensure that you use caches effectively:

public class SummaryCache {
    // Two-level cache: memory and disk
    private final Map<MethodId, MethodSummary> memoryCache;
    private final DiskCache diskCache;
    
    // Version tracking for invalidation
    private final Map<MethodId, Long> versions;
    
    public MethodSummary get(MethodId method) {
        // Check memory cache
        MethodSummary summary = memoryCache.get(method);
        if (summary != null) {
            return summary;
        }
        
        // Check disk cache
        summary = diskCache.load(method);
        if (summary != null && isValid(method, summary)) {
            memoryCache.put(method, summary);
            return summary;
        }
        
        // Compute and cache
        summary = computeSummary(method);
        cache(method, summary);
        return summary;
    }
}

Best practices

Design for your analysis

Don't over-engineer summaries. Include only information relevant to your specific analysis.

Balance precision and size

More precise summaries are larger and slower to compute. Find the right trade-off for your use case.

Validate summaries

Test that summaries accurately represent method behavior:

@Test
void validateSummary() {
    MethodSummary summary = computeSummary(method);
    
    // Run full analysis
    FullAnalysisResult full = runFullAnalysis(method);
    
    // Compare results
    assertEquals(full.getAbstraction(), summary.getAbstraction());
}

Document summary format

Clearly document what your summaries represent and how to interpret them.

Understanding method summaries​

Designing summary representations​

Basic flow summary​

Conditional flow summary​

Access path summary​

Computing summaries​

Bottom-up analysis​

Incremental summary computation​

Advanced summary features​

Heap abstractions​

Effect summaries​

Compositional summaries​

Using summaries effectively​

Summary application​

Summary precision policies​

Practical example: taint summary​

Performance considerations​

Summary size management​

Caching strategies​

Best practices​

Design for your analysis​

Balance precision and size​

Validate summaries​

Document summary format​