Method summary analysis
Method summaries are compact representations of what a method does, enabling efficient inter-procedural analysis without repeatedly analyzing method bodies. Think of them as "nutrition labels" for methods – they tell you what goes in, what comes out, and what effects occur, without needing to examine all the ingredients.
In this guide, we will walk you through the concepts behind method summaries, show you how to design different summary representations for various analysis needs, and demonstrate practical implementation techniques. We will cover everything from basic flow summaries to advanced compositional approaches, performance optimization strategies, and integration with OpenRewrite's analysis framework.
Understanding method summaries
Before diving into implementation details, let's take a moment to talk through what method summaries are and why they're useful for scalable program analysis.
A method summary captures the essential behavior of a method relevant to your analysis. For taint analysis, this might look like:
// Original method
public String processUser(String name, int id, boolean validate) {
    if (validate) {
        name = sanitize(name);
    }
    logAccess(id);
    return "User: " + name;
}
// Summary:
// * Parameter 0 (name) -> flows to return (conditionally sanitized)
// * Parameter 1 (id) -> flows to log (potential information leak)
// * Parameter 2 (validate) -> affects whether sanitization occurs
// * Side effect: writes to log
Designing summary representations
Now that we have a high-level idea of what a method summary is, let's explore how to design data structures that efficiently represent this information.
The structure of your summaries depends on your analysis needs.
Basic flow summary
For simple taint tracking:
public class BasicFlowSummary {
    // Which parameters flow to the return value
    private final Set<Integer> paramsToReturn;
    // Which parameters flow to which fields
    private final Map<Integer, Set<FieldRef>> paramsToFields;
    // Whether the return value is always sanitized
    private final boolean returnSanitized;
    public boolean parameterFlowsToReturn(int paramIndex) {
        return paramsToReturn.contains(paramIndex);
    }
}
Conditional flow summary
For more precision with path-sensitive information:
public class ConditionalFlowSummary {
    // Flows that always happen
    private final FlowSet unconditionalFlows;
    // Flows that depend on conditions
    private final Map<Condition, FlowSet> conditionalFlows;
    public static class Condition {
        enum Type { PARAM_NULL, PARAM_EQUALS, FIELD_CHECK }
        private final Type type;
        private final int paramIndex;
        private final Object value;
    }
    public FlowSet getFlows(CallContext context) {
        FlowSet flows = new FlowSet(unconditionalFlows);
        // Add flows whose conditions are met
        for (Map.Entry<Condition, FlowSet> entry : conditionalFlows.entrySet()) {
            if (evaluateCondition(entry.getKey(), context)) {
                flows.merge(entry.getValue());
            }
        }
        return flows;
    }
}
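To make the condition lookup concrete, here is a minimal, self-contained sketch for the `processUser` example. Everything in it is an assumption made for illustration: the `ConditionalFlowSketch` and `ParamEquals` names, flows modeled as plain strings instead of a `FlowSet`, and the choice to treat a condition on an unknown argument as "may hold" so that no flow is missed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class ConditionalFlowSketch {
    /** Hypothetical condition: "the argument at paramIndex equals expected". */
    static final class ParamEquals {
        final int paramIndex;
        final Object expected;

        ParamEquals(int paramIndex, Object expected) {
            this.paramIndex = paramIndex;
            this.expected = expected;
        }

        /** An argument that is not a known constant is treated as "may hold". */
        boolean mayHold(Map<Integer, ?> knownConstantArgs) {
            if (!knownConstantArgs.containsKey(paramIndex)) {
                return true;
            }
            return Objects.equals(expected, knownConstantArgs.get(paramIndex));
        }
    }

    private final Set<String> unconditionalFlows = new TreeSet<>();
    // Keys use identity equality, so every added condition is a separate entry.
    private final Map<ParamEquals, Set<String>> conditionalFlows = new LinkedHashMap<>();

    void addUnconditional(String flow) {
        unconditionalFlows.add(flow);
    }

    void addConditional(ParamEquals condition, String flow) {
        conditionalFlows.computeIfAbsent(condition, c -> new TreeSet<>()).add(flow);
    }

    /** Flows that may occur at a call site, given the arguments known to be constants. */
    Set<String> flowsFor(Map<Integer, ?> knownConstantArgs) {
        Set<String> flows = new TreeSet<>(unconditionalFlows);
        for (Map.Entry<ParamEquals, Set<String>> entry : conditionalFlows.entrySet()) {
            if (entry.getKey().mayHold(knownConstantArgs)) {
                flows.addAll(entry.getValue());
            }
        }
        return flows;
    }

    /** Summary for processUser(name, id, validate); parameter 2 is the validate flag. */
    static ConditionalFlowSketch processUserSummary() {
        ConditionalFlowSketch summary = new ConditionalFlowSketch();
        summary.addUnconditional("param1 -> log");
        summary.addConditional(new ParamEquals(2, true), "param0 -> return (sanitized)");
        summary.addConditional(new ParamEquals(2, false), "param0 -> return (unsanitized)");
        return summary;
    }

    public static void main(String[] args) {
        ConditionalFlowSketch summary = processUserSummary();
        System.out.println(summary.flowsFor(Map.of(2, true))); // validate is the constant true
        System.out.println(summary.flowsFor(Map.of()));        // nothing known about validate
    }
}
```

When `validate` is known to be `true`, only the sanitized flow is reported. When nothing is known, both variants are reported, which is the conservative answer for a security analysis.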
Access path summary
For field-sensitive analysis:
public class AccessPathSummary {
    // Tracks precise paths like param0.field1.field2
    private final Set<AccessPath> accessPaths;
    public static class AccessPath {
        private final Source source;
        private final List<String> fields;
        private final Destination destination;
        enum Source { PARAM_0, PARAM_1, THIS, STATIC_FIELD }
        enum Destination { RETURN, FIELD, SINK }
    }
    // Example: parameter 0's 'name' field flows to return
    // new AccessPath(Source.PARAM_0, List.of("name"), Destination.RETURN)
}
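One question an access path summary has to answer at a call site is whether a tainted path on an argument overlaps a path recorded in the summary. The sketch below shows one possible rule, assumed here for illustration: taint reaches the flow if either field chain is a prefix of the other. `AccessPathSketch`, `isPrefix`, and `taintReaches` are hypothetical names.

```java
import java.util.List;

final class AccessPathSketch {
    /** True if 'prefix' is a leading sub-chain of 'path' (the empty chain is a prefix of everything). */
    static boolean isPrefix(List<String> prefix, List<String> path) {
        return prefix.size() <= path.size()
                && path.subList(0, prefix.size()).equals(prefix);
    }

    /**
     * taintedFields: field chain on the argument that is known to be tainted
     *                (an empty chain means the whole argument is tainted).
     * summaryFields: field chain that the summary says flows out of the method.
     */
    static boolean taintReaches(List<String> taintedFields, List<String> summaryFields) {
        // Whole object tainted => each of its fields is tainted.
        // A tainted sub-field => the enclosing value that flows out carries taint.
        return isPrefix(taintedFields, summaryFields) || isPrefix(summaryFields, taintedFields);
    }

    public static void main(String[] args) {
        List<String> summary = List.of("name"); // param0.name -> return
        System.out.println(taintReaches(List.of(), summary));                // true
        System.out.println(taintReaches(List.of("name", "first"), summary)); // true
        System.out.println(taintReaches(List.of("id"), summary));            // false
    }
}
```

Real analyses usually also bound the length of tracked field chains to keep summaries finite. That is left out of this sketch.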
Computing summaries
Next, let's explore the algorithms and strategies for actually computing these summaries from method bodies.
Bottom-up analysis
The standard approach computes summaries starting from leaf methods:
public class SummaryComputer {
    private final Map<MethodId, MethodSummary> computed = new HashMap<>();
    public void computeAllSummaries(CallGraph callGraph) {
        // Process callees before callers (reverse topological order)
        List<MethodId> order = callGraph.reverseTopologicalSort();
        for (MethodId method : order) {
            computeSummary(method, callGraph);
        }
    }
    private MethodSummary computeSummary(MethodId method, CallGraph callGraph) {
        // Reuse a summary that has already been computed
        MethodSummary existing = computed.get(method);
        if (existing != null) {
            return existing;
        }
        // Ensure callee summaries are computed first. Plain recursion is used
        // instead of computeIfAbsent, whose mapping function must not modify
        // the map it is called on. Recursive call cycles are not handled here
        // and need separate treatment (for example, a fixed-point iteration).
        Map<MethodId, MethodSummary> calleeSummaries = new HashMap<>();
        for (MethodId callee : callGraph.getCallees(method)) {
            calleeSummaries.put(callee, computeSummary(callee, callGraph));
        }
        // Now compute this method's summary using callee summaries, and record it
        MethodSummary summary = computeWithDependencies(method, calleeSummaries);
        computed.put(method, summary);
        return summary;
    }
}
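The callee-first ordering that bottom-up analysis relies on can be sketched with a depth-first post-order traversal. This is an illustration under stated assumptions, not the framework's implementation: methods are plain strings, the call graph is a map from caller to callees, and `BottomUpOrderSketch` is a hypothetical name. Cycles caused by recursion are simply cut at the first revisit, so the ordering guarantee holds only for the acyclic part of the graph.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BottomUpOrderSketch {
    /** Orders methods so that, outside of call cycles, every callee precedes its callers. */
    static List<String> calleesFirst(Map<String, List<String>> callees) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String method : callees.keySet()) {
            visit(method, callees, visited, order);
        }
        return order;
    }

    private static void visit(String method, Map<String, List<String>> callees,
                              Set<String> visited, List<String> order) {
        if (!visited.add(method)) {
            return; // already emitted, or currently on the DFS stack (a call cycle)
        }
        for (String callee : callees.getOrDefault(method, List.of())) {
            visit(callee, callees, visited, order);
        }
        order.add(method); // post-order: reachable callees were emitted first
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("main", List.of("process", "log"));
        graph.put("process", List.of("sanitize"));
        graph.put("sanitize", List.of());
        graph.put("log", List.of());
        System.out.println(calleesFirst(graph)); // prints [sanitize, process, log, main]
    }
}
```

Leaf methods such as `sanitize` come out first, so their summaries are available by the time `process` and `main` are analyzed.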
Incremental summary computation
For large codebases, compute summaries incrementally:
public class IncrementalSummaryComputer {
    private final DependencyTracker deps = new DependencyTracker();
    public void methodChanged(MethodId changed) {
        // Invalidate the changed method's summary
        invalidateSummary(changed);
        // Invalidate methods that depend on it
        Set<MethodId> affected = deps.getDependents(changed);
        for (MethodId method : affected) {
            invalidateSummary(method);
        }
        // Recompute in dependency order
        recomputeInvalidated();
    }
    private void trackDependency(MethodId caller, MethodId callee) {
        deps.addDependency(caller, callee);
    }
}
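For invalidation to be complete, the set of dependents needs to include indirect callers, because a caller's summary was built from its callee's summary. One possible shape for such a tracker is sketched below. It is a hypothetical stand-in for the `DependencyTracker` used above, with methods modeled as strings, and it assumes that `getDependents` is meant to return transitive callers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class DependencyTrackerSketch {
    // Reverse call edges: callee -> methods that call it directly.
    private final Map<String, Set<String>> callersOf = new HashMap<>();

    void addDependency(String caller, String callee) {
        callersOf.computeIfAbsent(callee, k -> new LinkedHashSet<>()).add(caller);
    }

    /** All direct and indirect callers of 'changed', found with a worklist traversal. */
    Set<String> getDependents(String changed) {
        Set<String> dependents = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(changed);
        while (!worklist.isEmpty()) {
            String current = worklist.poll();
            for (String caller : callersOf.getOrDefault(current, Set.of())) {
                // add() returns false for callers seen before, so cycles terminate
                if (dependents.add(caller)) {
                    worklist.add(caller);
                }
            }
        }
        return dependents;
    }

    public static void main(String[] args) {
        DependencyTrackerSketch deps = new DependencyTrackerSketch();
        deps.addDependency("main", "process");
        deps.addDependency("process", "sanitize");
        deps.addDependency("main", "log");
        System.out.println(deps.getDependents("sanitize")); // prints [process, main]
    }
}
```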
Advanced summary features
Once you have the basic summary computation working, you can enhance your summaries with more sophisticated features to capture complex behaviors.
Heap abstractions
Track modifications to heap objects:
public class HeapSummary {
    // Which fields of which parameters are modified
    private final Map<ParamRef, Set<FieldRef>> fieldWrites;
    // Abstract objects created and returned
    private final Set<AbstractObject> allocations;
    public boolean modifiesField(int param, String field) {
        ParamRef ref = new ParamRef(param);
        Set<FieldRef> writes = fieldWrites.get(ref);
        return writes != null && writes.contains(new FieldRef(field));
    }
}
Effect summaries
Capture side effects beyond data flow:
public class EffectSummary {
    // I/O effects
    private final Set<IOEffect> ioEffects;
    // Synchronization effects
    private final Set<LockEffect> lockEffects;
    // Exception effects
    private final Set<ExceptionEffect> exceptions;
    public static class IOEffect {
        enum Type { FILE_READ, FILE_WRITE, NETWORK, DATABASE }
        private final Type type;
        private final AccessPath target;
    }
}
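Effects propagate up the call graph in the same way flows do: a caller inherits the effects of every method it may invoke. The sketch below illustrates that union for I/O effects only. It is a simplification made for illustration, with effects reduced to a bare enum (no target access path) and a hypothetical `EffectPropagationSketch` class.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class EffectPropagationSketch {
    enum IOEffect { FILE_READ, FILE_WRITE, NETWORK, DATABASE }

    /** A caller's effects: its own direct effects plus the effects of each callee. */
    static Set<IOEffect> combine(Set<IOEffect> direct,
                                 List<? extends Set<IOEffect>> calleeEffects) {
        EnumSet<IOEffect> all = EnumSet.noneOf(IOEffect.class);
        all.addAll(direct);
        for (Set<IOEffect> effects : calleeEffects) {
            all.addAll(effects);
        }
        return all;
    }

    public static void main(String[] args) {
        // Assumed for illustration: the method itself writes a file,
        // and one of its callees talks to the network.
        Set<IOEffect> result = combine(
                EnumSet.of(IOEffect.FILE_WRITE),
                List.of(EnumSet.of(IOEffect.NETWORK), EnumSet.noneOf(IOEffect.class)));
        System.out.println(result); // prints [FILE_WRITE, NETWORK]
    }
}
```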
Compositional summaries
Build complex summaries from simpler ones:
public class CompositionalSummary {
    // Compose summaries for common patterns
    public static MethodSummary composeSequence(
            MethodSummary first, MethodSummary second) {
        // If method A calls B, compose their effects
        return new MethodSummary() {
            @Override
            public FlowSet getFlows() {
                FlowSet flows = first.getFlows();
                // Apply second's transformation to first's outputs
                return second.transformFlows(flows);
            }
        };
    }
    public static MethodSummary composeConditional(
            Condition cond, MethodSummary trueSummary, MethodSummary falseSummary) {
        // Conditional composition
        return new ConditionalMethodSummary(cond, trueSummary, falseSummary);
    }
}
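As a concrete instance of sequential composition, consider a wrapper of the form `second(..., first(x...), ...)`, where the result of `first` is passed as one argument of `second`. The sketch below composes the parameter-to-return flows for that shape. The `Summary` class, the `slot` parameter, and the assumption that the other arguments of `second` are untainted are all hypothetical simplifications.

```java
import java.util.BitSet;

final class SequenceCompositionSketch {
    /** Hypothetical minimal summary: parameter-to-return flows plus a sanitization flag. */
    static final class Summary {
        final BitSet paramsToReturn;
        final boolean sanitizesReturn;

        Summary(BitSet paramsToReturn, boolean sanitizesReturn) {
            this.paramsToReturn = (BitSet) paramsToReturn.clone(); // defensive copy
            this.sanitizesReturn = sanitizesReturn;
        }
    }

    /**
     * Summary of a wrapper that passes first's result as argument 'slot' of second.
     * The wrapper's parameters are assumed to be exactly first's parameters.
     */
    static Summary composeSequence(Summary first, Summary second, int slot) {
        BitSet flows = new BitSet();
        if (second.paramsToReturn.get(slot)) {
            // second forwards that argument, so everything first forwards gets through
            flows.or(first.paramsToReturn);
        }
        // Sanitization at either step covers every flow that passes through both
        boolean sanitized = first.sanitizesReturn || second.sanitizesReturn;
        return new Summary(flows, sanitized);
    }

    static BitSet bits(int... indices) {
        BitSet set = new BitSet();
        for (int index : indices) {
            set.set(index);
        }
        return set;
    }

    public static void main(String[] args) {
        Summary first = new Summary(bits(0, 2), false);  // params 0 and 2 reach first's return
        Summary second = new Summary(bits(1), false);    // only param 1 reaches second's return
        System.out.println(composeSequence(first, second, 1).paramsToReturn); // prints {0, 2}
        System.out.println(composeSequence(first, second, 0).paramsToReturn); // prints {}
    }
}
```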
Using summaries effectively
Summary application
Apply summaries at call sites:
public class SummaryApplication {
    public Set<TaintedValue> applyMethodSummary(
            J.MethodInvocation call,
            MethodSummary summary,
            AnalysisState state) {
        // Map arguments to parameters
        List<Expression> args = call.getArguments();
        Map<Integer, Set<TaintedValue>> argTaints = new HashMap<>();
        for (int i = 0; i < args.size(); i++) {
            argTaints.put(i, state.getTaintsFor(args.get(i)));
        }
        // Apply summary transformations
        Set<TaintedValue> result = new HashSet<>();
        // Check parameter-to-return flows
        for (int param : summary.getParamsFlowingToReturn()) {
            if (!argTaints.getOrDefault(param, Set.of()).isEmpty()) {
                result.add(new TaintedValue(call,
                        summary.getReturnTaintType(param)));
            }
        }
        // Apply side effects
        applySideEffects(summary, state, argTaints);
        return result;
    }
}
Summary precision policies
Different analyses need different precision levels:
public interface SummaryPrecisionPolicy {
    // Decide how precise summaries should be
    boolean useContextSensitivity(MethodId method);
    boolean usePathSensitivity(MethodId method);
    boolean trackHeapEffects(MethodId method);
    int getMaxSummarySize(MethodId method);
}
public class SecurityFocusedPolicy implements SummaryPrecisionPolicy {
    @Override
    public boolean usePathSensitivity(MethodId method) {
        // High precision for security-critical methods
        return method.hasSecurityAnnotation() ||
                method.isInSecurityPackage();
    }
    // The remaining SummaryPrecisionPolicy methods are omitted for brevity
}
Practical example: taint summary
To bring all these concepts together, let's take a look at a complete example of implementing taint summaries for security analysis:
public class TaintSummaryExample {
    public static class TaintSummary {
        private final BitSet paramsToReturn;
        private final Map<Integer, BitSet> paramsToParams;
        private final boolean sanitizesReturn;
        private final Set<Integer> paramsSentToSink;
        private TaintSummary(BitSet paramsToReturn, Map<Integer, BitSet> paramsToParams,
                boolean sanitizesReturn, Set<Integer> paramsSentToSink) {
            this.paramsToReturn = paramsToReturn;
            this.paramsToParams = paramsToParams;
            this.sanitizesReturn = sanitizesReturn;
            this.paramsSentToSink = paramsSentToSink;
        }
        public static TaintSummary compute(J.MethodDeclaration method) {
            ControlFlowGraph cfg = ControlFlowGraphs.build(method);
            BitSet paramsToReturn = new BitSet();
            // Parameter-to-parameter flows are not populated in this example
            Map<Integer, BitSet> paramsToParams = new HashMap<>();
            // Stays true only while every flow into the return value is sanitized
            boolean allReturnFlowsSanitized = true;
            Set<Integer> paramsSentToSink = new HashSet<>();
            // Analyze each parameter
            for (int i = 0; i < method.getParameters().size(); i++) {
                ParameterTaintAnalysis analysis =
                        new ParameterTaintAnalysis(cfg, i);
                ParameterTaintResult result = analysis.analyze();
                if (result.flowsToReturn()) {
                    paramsToReturn.set(i);
                    if (!result.isSanitized()) {
                        allReturnFlowsSanitized = false;
                    }
                }
                paramsSentToSink.addAll(result.getSinkParams());
            }
            // The return value counts as sanitized only if something flows to it
            // and none of those flows bypasses sanitization. Setting the flag as
            // soon as any one parameter is sanitized would hide unsanitized flows.
            boolean sanitizesReturn = !paramsToReturn.isEmpty() && allReturnFlowsSanitized;
            return new TaintSummary(
                    paramsToReturn, paramsToParams,
                    sanitizesReturn, paramsSentToSink);
        }
    }
    // Using the summary
    public void analyzeCall(J.MethodInvocation call, TaintSummary summary) {
        List<Expression> args = call.getArguments();
        // Check each argument
        for (int i = 0; i < args.size(); i++) {
            if (isTainted(args.get(i))) {
                // Does this tainted arg flow to return?
                if (summary.paramsToReturn.get(i)) {
                    if (!summary.sanitizesReturn) {
                        markTainted(call);
                    }
                }
                // Does it flow to a sink?
                if (summary.paramsSentToSink.contains(i)) {
                    reportVulnerability(call, "Tainted data flows to sink");
                }
            }
        }
    }
}
Performance considerations
As your analysis scales to larger codebases, performance becomes critical. Here are key strategies for keeping summaries efficient without sacrificing precision.
Summary size management
Keep summaries compact:
public class CompactSummary {
    // Use bit vectors for parameter sets
    private final BitSet parameterMask;
    // Compress common patterns
    private static final int PATTERN_ALL_TO_RETURN = 1;
    private static final int PATTERN_NONE_TO_RETURN = 2;
    private static final int PATTERN_IDENTITY = 3;
    private final int pattern;
    // Only store exceptions to patterns
    private final Map<Integer, Integer> exceptions;
}
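The "pattern plus exceptions" idea can be illustrated with a small encoder that picks whichever pattern leaves fewer exceptions to store. This is only a sketch under assumptions: the exceptions are kept in a `BitSet` rather than the map shown above, only two of the three patterns are modeled, and `CompactSummarySketch`, `encode`, and `exceptionCount` are hypothetical names.

```java
import java.util.BitSet;

final class CompactSummarySketch {
    static final int PATTERN_ALL_TO_RETURN = 1;
    static final int PATTERN_NONE_TO_RETURN = 2;

    private final int pattern;
    // Parameters whose behavior is the opposite of what the pattern says.
    private final BitSet exceptions;

    private CompactSummarySketch(int pattern, BitSet exceptions) {
        this.pattern = pattern;
        this.exceptions = exceptions;
    }

    /** Assumes every set bit in paramsToReturn is below paramCount. */
    static CompactSummarySketch encode(BitSet paramsToReturn, int paramCount) {
        // Choose the pattern that matches the majority of parameters
        boolean mostFlow = paramsToReturn.cardinality() * 2 > paramCount;
        BitSet exceptions = new BitSet();
        for (int i = 0; i < paramCount; i++) {
            if (paramsToReturn.get(i) != mostFlow) {
                exceptions.set(i);
            }
        }
        return new CompactSummarySketch(
                mostFlow ? PATTERN_ALL_TO_RETURN : PATTERN_NONE_TO_RETURN, exceptions);
    }

    /** Only meaningful for indices below the paramCount used when encoding. */
    boolean flowsToReturn(int paramIndex) {
        boolean patternSaysFlows = pattern == PATTERN_ALL_TO_RETURN;
        // An exception flips the answer given by the pattern
        return patternSaysFlows != exceptions.get(paramIndex);
    }

    int pattern() {
        return pattern;
    }

    int exceptionCount() {
        return exceptions.cardinality();
    }

    public static void main(String[] args) {
        BitSet flows = new BitSet();
        flows.set(0, 3); // parameters 0, 1 and 2 of a 4-parameter method reach the return
        CompactSummarySketch summary = encode(flows, 4);
        System.out.println(summary.exceptionCount()); // prints 1
        System.out.println(summary.flowsToReturn(3)); // prints false
    }
}
```

Whether this pays off depends on how skewed real summaries are. If most methods match a pattern exactly, the exception set is usually empty and can be shared across summaries.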
Caching strategies
Cache computed summaries so that a method is re-analyzed only when it has changed:
public class SummaryCache {
    // Two-level cache: memory and disk
    private final Map<MethodId, MethodSummary> memoryCache;
    private final DiskCache diskCache;
    // Version tracking for invalidation
    private final Map<MethodId, Long> versions;
    public MethodSummary get(MethodId method) {
        // Check memory cache
        MethodSummary summary = memoryCache.get(method);
        if (summary != null) {
            return summary;
        }
        // Check disk cache
        summary = diskCache.load(method);
        if (summary != null && isValid(method, summary)) {
            memoryCache.put(method, summary);
            return summary;
        }
        // Compute and cache
        summary = computeSummary(method);
        cache(method, summary);
        return summary;
    }
}
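The memory level of such a cache needs a size bound and a way to reject stale entries. The sketch below shows one way to get both, using `LinkedHashMap` in access order for least-recently-used eviction and a version stamp stored with each entry. The class name, the `Versioned` wrapper, and the idea of passing the current version into `get` are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruSummaryCacheSketch<K, V> {
    /** A cached summary together with the method version it was computed from. */
    private static final class Versioned<S> {
        final S summary;
        final long version;

        Versioned(S summary, long version) {
            this.summary = summary;
            this.version = version;
        }
    }

    private final LinkedHashMap<K, Versioned<V>> entries;

    LruSummaryCacheSketch(final int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first
        this.entries = new LinkedHashMap<K, Versioned<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Versioned<V>> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
    }

    void put(K method, long version, V summary) {
        entries.put(method, new Versioned<>(summary, version));
    }

    /** Returns the cached summary, or null if it is absent or was computed from another version. */
    V get(K method, long currentVersion) {
        Versioned<V> entry = entries.get(method);
        if (entry == null) {
            return null;
        }
        if (entry.version != currentVersion) {
            entries.remove(method); // stale: drop it so it is recomputed
            return null;
        }
        return entry.summary;
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        LruSummaryCacheSketch<String, String> cache = new LruSummaryCacheSketch<>(2);
        cache.put("a", 1L, "summary-a");
        cache.put("b", 1L, "summary-b");
        cache.get("a", 1L);              // touch "a" so that "b" becomes the eldest entry
        cache.put("c", 1L, "summary-c"); // exceeds the bound and evicts "b"
        System.out.println(cache.get("b", 1L)); // prints null
    }
}
```

A `null` result tells the caller to fall back to the disk cache or to recompute the summary.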
Best practices
Design for your analysis
Don't over-engineer summaries. Include only information relevant to your specific analysis.
Balance precision and size
More precise summaries are larger and slower to compute. Find the right trade-off for your use case.
Validate summaries
Test that summaries accurately represent method behavior:
@Test
void validateSummary() {
    MethodSummary summary = computeSummary(method);
    // Run full analysis
    FullAnalysisResult full = runFullAnalysis(method);
    // Compare results
    assertEquals(full.getAbstraction(), summary.getAbstraction());
}
Document summary format
Clearly document what your summaries represent and how to interpret them.