Matching Algorithm

symtrace compares old and new code nodes in five phases. It starts with the most confident matches and works down to less certain ones; each phase only considers the nodes that earlier phases left unmatched.

Overview

Phase 1 -- Exact match
  Both shape and content match:
    Same location      -> no change (skip)
    Different location -> MOVE

Phase 2 -- Shape match
  Same structure, different text:
    Same name          -> MODIFY
    Only names changed -> RENAME
    Otherwise          -> MODIFY

Phase 3 -- Fuzzy match
  Score remaining pairs by similarity:
    Same kind + name        -> MODIFY
    High similarity (>=90%) -> RENAME
    Moderate similarity     -> MODIFY

Phase 4 -- Unmatched old nodes -> DELETE
Phase 5 -- Unmatched new nodes -> INSERT

Phase 1: Exact match

If a code node has the same shape and content in both commits, it was not edited. If it also sits at the same location, it is unchanged and skipped; if it appears at a different location, it was moved (MOVE).
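A minimal Python sketch of this phase, under stated assumptions: it is not symtrace's implementation, and the `Node` fields (`shape`, `content`, `location`) and the `exact_match` function are hypothetical stand-ins for whatever fingerprints and positions symtrace actually compares. New nodes are bucketed by their `(shape, content)` key, so each old node is matched with a dictionary lookup instead of a scan.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    kind: str
    name: str
    shape: str     # hypothetical structural fingerprint
    content: str   # hypothetical fingerprint (or text) of the node's source
    location: str  # hypothetical position, e.g. "a.py:10"

def exact_match(old, new):
    """Phase 1 sketch: pair nodes whose (shape, content) key is identical.

    Returns (changes, unmatched_old, unmatched_new). Pairs at the same
    location produce no change record; pairs at different locations are MOVE.
    """
    buckets = defaultdict(list)  # (shape, content) -> indices into `new`
    for j, n in enumerate(new):
        buckets[(n.shape, n.content)].append(j)
    matched_new = set()
    changes, unmatched_old = [], []
    for o in old:
        candidates = buckets[(o.shape, o.content)]
        if not candidates:
            unmatched_old.append(o)
            continue
        # Prefer a same-location candidate so an untouched node is not reported as moved.
        j = next((k for k in candidates if new[k].location == o.location), candidates[0])
        candidates.remove(j)
        matched_new.add(j)
        if new[j].location != o.location:
            changes.append(("MOVE", o, new[j]))
    # Keep leftovers in input order so the next phase sees a stable sequence.
    unmatched_new = [n for j, n in enumerate(new) if j not in matched_new]
    return changes, unmatched_old, unmatched_new
```

Only the leftovers are handed on, which is what lets later phases work on a shrinking set.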

Phase 2: Shape match

If the structure is the same but the text differs, something inside the node changed. symtrace checks whether only names changed (a rename) or other text changed as well (a modify). A node that keeps its name is always a modify.
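The rename-versus-modify decision can be sketched as follows. This is a hypothetical illustration, not symtrace's code: it assumes `content` holds the node's source text and that the inputs are nodes left over from Phase 1. The "only names changed" test is a deliberately naive stand-in that substitutes the new name for every whole-word occurrence of the old name and checks whether the two texts then agree; it would miss, for example, renamed parameters.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    kind: str
    name: str
    shape: str     # hypothetical structural fingerprint (ignores identifiers and literals)
    content: str   # hypothetical: the node's source text
    location: str

def only_names_changed(old, new):
    """Naive check: rename whole-word uses of the old name, then compare texts."""
    pattern = rf"\b{re.escape(old.name)}\b"
    renamed = re.sub(pattern, lambda _m: new.name, old.content)
    return renamed == new.content

def classify(old, new):
    if old.name == new.name:
        return "MODIFY"            # same name -> MODIFY
    if only_names_changed(old, new):
        return "RENAME"            # only names changed -> RENAME
    return "MODIFY"                # otherwise -> MODIFY

def shape_match(old, new):
    """Phase 2 sketch: pair leftover nodes that share a shape."""
    by_shape = defaultdict(list)  # shape -> indices into `new`
    for j, n in enumerate(new):
        by_shape[n.shape].append(j)
    matched_new = set()
    changes, unmatched_old = [], []
    for o in old:
        candidates = by_shape[o.shape]
        if not candidates:
            unmatched_old.append(o)
            continue
        # Prefer a same-name candidate when several nodes share a shape.
        j = next((k for k in candidates if new[k].name == o.name), candidates[0])
        candidates.remove(j)
        matched_new.add(j)
        changes.append((classify(o, new[j]), o, new[j]))
    unmatched_new = [n for j, n in enumerate(new) if j not in matched_new]
    return changes, unmatched_old, unmatched_new
```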

Phase 3: Fuzzy match

For nodes still unmatched after the first two phases, symtrace computes a similarity score based on structure, content, and complexity. Pairs scoring above a threshold are matched: a pair with the same kind and name is a modify, a pair with at least 90% similarity is a rename, and any other matched pair is a modify. See Similarity Scoring for how scores work.
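A rough sketch of fuzzy matching, with heavy assumptions: the weights (0.4 structure, 0.4 content, 0.2 complexity) and the 0.6 acceptance threshold are invented for illustration, Python's `difflib.SequenceMatcher.ratio()` is swapped in as a generic string similarity, and best-score-first greedy pairing is one plausible assignment strategy rather than a documented one. Only the 90% rename cutoff and the classification rules come from the overview above; the real scoring is described under Similarity Scoring.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass(frozen=True)
class Node:
    kind: str
    name: str
    shape: str       # hypothetical structural fingerprint, compared as a string
    content: str     # hypothetical: source text
    complexity: int  # hypothetical complexity measure

WEIGHTS = (0.4, 0.4, 0.2)  # hypothetical: structure, content, complexity
MIN_SIMILARITY = 0.6       # hypothetical acceptance threshold
RENAME_SIMILARITY = 0.9    # ">= 90% -> RENAME" from the overview

def similarity(a, b):
    structure = SequenceMatcher(None, a.shape, b.shape).ratio()
    content = SequenceMatcher(None, a.content, b.content).ratio()
    top = max(a.complexity, b.complexity)
    complexity = 1.0 if top == 0 else 1 - abs(a.complexity - b.complexity) / top
    ws, wc, wx = WEIGHTS
    return ws * structure + wc * content + wx * complexity

def classify(a, b, score):
    if a.kind == b.kind and a.name == b.name:
        return "MODIFY"
    if score >= RENAME_SIMILARITY:
        return "RENAME"
    return "MODIFY"

def fuzzy_match(old, new):
    """Phase 3 sketch: greedily accept the best-scoring pairs above the threshold."""
    candidates = []
    for i, a in enumerate(old):
        for j, b in enumerate(new):
            score = similarity(a, b)
            if score >= MIN_SIMILARITY:
                candidates.append((score, i, j))
    # Highest score first; index tie-breaks keep the result deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_old, used_new, changes = set(), set(), []
    for score, i, j in candidates:
        if i in used_old or j in used_new:
            continue
        used_old.add(i)
        used_new.add(j)
        changes.append((classify(old[i], new[j], score), old[i], new[j]))
    unmatched_old = [n for i, n in enumerate(old) if i not in used_old]
    unmatched_new = [n for j, n in enumerate(new) if j not in used_new]
    return changes, unmatched_old, unmatched_new
```

Scoring every pair is quadratic, which stays manageable here because the first two phases have usually consumed most nodes.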

Phases 4 and 5: Leftovers

Any old node without a match is a delete. Any new node without a match is an insert.
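The leftover rule, together with the way phases hand their unmatched nodes to the next one, can be sketched like this. The `Change` record, `run_phases`, and `leftovers` are hypothetical names, and nodes are shown as plain strings for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class Change:
    kind: str            # e.g. "MOVE", "RENAME", "MODIFY", "DELETE", "INSERT"
    old: Optional[str]   # hypothetical node identifier in the old commit
    new: Optional[str]   # hypothetical node identifier in the new commit

# A phase takes the still-unmatched old/new nodes and returns
# (changes found, old nodes left unmatched, new nodes left unmatched).
Phase = Callable[[List[str], List[str]], Tuple[List[Change], List[str], List[str]]]

def leftovers(unmatched_old: List[str], unmatched_new: List[str]) -> List[Change]:
    deletes = [Change("DELETE", o, None) for o in unmatched_old]   # Phase 4
    inserts = [Change("INSERT", None, n) for n in unmatched_new]   # Phase 5
    return deletes + inserts

def run_phases(old: List[str], new: List[str], phases: List[Phase]) -> List[Change]:
    changes: List[Change] = []
    for phase in phases:
        found, old, new = phase(old, new)  # each phase sees only what is left
        changes.extend(found)
    return changes + leftovers(old, new)

def same_name(old, new):
    # Toy phase: identical identifiers count as unchanged and produce no record.
    kept = set(old) & set(new)
    return [], [o for o in old if o not in kept], [n for n in new if n not in kept]

print(run_phases(["parse", "legacy"], ["parse", "fresh"], [same_name]))
# [Change(kind='DELETE', old='legacy', new=None), Change(kind='INSERT', old=None, new='fresh')]
```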

Determinism

The algorithm always produces the same output for the same input, regardless of platform or run order.
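One common way to get this property, shown here as an assumption rather than a description of symtrace's code, is to never depend on input order or hash-map iteration order: sort nodes by a total order and break score ties with that same order. The `Node` fields and the `canonical` and `best_pairs` functions below are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    path: str
    line: int
    kind: str
    name: str

def node_key(n):
    # Total order on nodes that does not depend on hash seeds or input order.
    return (n.path, n.line, n.kind, n.name)

def canonical(nodes):
    return sorted(nodes, key=node_key)

def best_pairs(scored):
    """scored: list of (score, old_node, new_node); greedy one-to-one selection."""
    order = sorted(scored, key=lambda t: (-t[0], node_key(t[1]), node_key(t[2])))
    taken_old, taken_new, result = set(), set(), []
    for _score, o, n in order:
        if o in taken_old or n in taken_new:
            continue
        taken_old.add(o)
        taken_new.add(n)
        result.append((o.name, n.name))
    return result

# Every pair ties at 0.8, so only the tie-break decides the outcome.
a, b = Node("a.py", 1, "function", "f"), Node("a.py", 5, "function", "g")
x, y = Node("b.py", 1, "function", "f2"), Node("b.py", 9, "function", "g2")
scored = [(0.8, a, x), (0.8, a, y), (0.8, b, x), (0.8, b, y)]
shuffled = scored[:]
random.Random(0).shuffle(shuffled)
print(best_pairs(scored) == best_pairs(shuffled))  # True
```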