Matching Algorithm
symtrace uses a 5-phase matching process to compare old and new code nodes. It starts with the most confident matches and works down to less certain ones.
Overview
Phase 1 -- Exact match
  Both shape and content match:
    Same location      -> no change (skip)
    Different location -> MOVE
Phase 2 -- Shape match
  Same structure, different text:
    Same name          -> MODIFY
    Only names changed -> RENAME
    Otherwise          -> MODIFY
Phase 3 -- Fuzzy match
  Score remaining pairs by similarity:
    Same kind + name        -> MODIFY
    High similarity (>=90%) -> RENAME
    Moderate similarity     -> MODIFY
Phase 4 -- Unmatched old nodes -> DELETE
Phase 5 -- Unmatched new nodes -> INSERT

Phase 1: Exact match
If a code node has the same shape and content in both commits, it’s unchanged. If it appears at a different location, it was moved without being edited.
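
As an illustration only, the exact-match phase might be sketched as below. The `Node` fields (`path`, `shape_hash`, `content_hash`) and the function name `exact_match` are hypothetical and are not taken from symtrace itself. The sketch runs two passes so that a node that stayed in place is never reported as a move.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    path: str          # hypothetical location, e.g. "a.py::f"
    shape_hash: str    # hypothetical hash of the syntax structure
    content_hash: str  # hypothetical hash of the exact text


def exact_match(old_nodes, new_nodes):
    """Pair nodes whose shape and content both match.

    Returns (ops, old_left, new_left). Unchanged pairs are consumed
    but produce no operation.
    """
    def key(n):
        return (n.shape_hash, n.content_hash)

    # Index new nodes by (shape, content); store positions to handle duplicates.
    buckets = {}
    for i, n in enumerate(new_nodes):
        buckets.setdefault(key(n), []).append(i)

    used = set()
    ops, pending = [], []

    # Pass 1: identical node at the same location -> unchanged, emit nothing.
    for o in old_nodes:
        hit = next(
            (i for i in buckets.get(key(o), [])
             if i not in used and new_nodes[i].path == o.path),
            None,
        )
        if hit is None:
            pending.append(o)
        else:
            used.add(hit)

    # Pass 2: identical node at a different location -> MOVE.
    old_left = []
    for o in pending:
        hit = next((i for i in buckets.get(key(o), []) if i not in used), None)
        if hit is None:
            old_left.append(o)
        else:
            used.add(hit)
            ops.append(("MOVE", o.path, new_nodes[hit].path))

    new_left = [n for i, n in enumerate(new_nodes) if i not in used]
    return ops, old_left, new_left
```

Whatever is returned in `old_left` and `new_left` would be handed on to the later phases.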
Phase 2: Shape match
If the structure is the same but the text differs, something inside changed. symtrace checks whether just the name changed (a rename) or the body changed (a modify).
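
The rename-versus-modify decision could look roughly like the following. This is a sketch under assumptions: the source does not say how symtrace detects that "only names changed", so here the node's own name is masked out of its text and the remainders are compared. `Node`, `classify_shape_match`, and the `<NAME>` placeholder are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str        # hypothetical symbol name
    shape_hash: str  # hypothetical hash of the syntax structure
    text: str        # source text of the node


def classify_shape_match(old, new):
    """Classify a pair that has the same shape but different text."""
    if old.name == new.name:
        # Same name, different text: the body changed.
        return "MODIFY"

    # Assumption: "only names changed" means the texts are identical
    # once each node's own name is replaced by a placeholder.
    def mask(node):
        return re.sub(rf"\b{re.escape(node.name)}\b", "<NAME>", node.text)

    return "RENAME" if mask(old) == mask(new) else "MODIFY"
```

A real implementation would more likely compare syntax-tree tokens than raw text; the regular expression is only a stand-in for that comparison.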
Phase 3: Fuzzy match
For remaining unmatched nodes, symtrace computes a similarity score based on structure, content, and complexity. Pairs above a threshold are matched as either rename or modify depending on what changed. See Similarity Scoring for how scores work.
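
A minimal sketch of greedy fuzzy matching follows. It substitutes plain text similarity from Python's `difflib.SequenceMatcher` for symtrace's real score, which also weighs structure and complexity. The 90% rename threshold comes from the overview; the lower cut-off `MATCH_THRESHOLD`, the decision to apply it to same-name pairs as well, and all names in the code are assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

RENAME_THRESHOLD = 0.90  # "high similarity" per the overview
MATCH_THRESHOLD = 0.60   # hypothetical lower bound for "moderate" similarity


@dataclass(frozen=True)
class Node:
    kind: str  # e.g. "function"
    name: str
    text: str


def similarity(a, b):
    # Stand-in score: text similarity only, in the range 0.0 to 1.0.
    return SequenceMatcher(None, a.text, b.text).ratio()


def classify(old, new, score):
    """Map a matched pair and its score to an operation."""
    if old.kind == new.kind and old.name == new.name:
        return "MODIFY"
    if score >= RENAME_THRESHOLD:
        return "RENAME"
    return "MODIFY"


def fuzzy_match(old_nodes, new_nodes):
    """Greedily pair the most similar nodes first."""
    scored = [
        (similarity(o, n), i, j)
        for i, o in enumerate(old_nodes)
        for j, n in enumerate(new_nodes)
    ]
    # Best score first; indices break ties so the order is reproducible.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))

    used_old, used_new, ops = set(), set(), []
    for score, i, j in scored:
        if score < MATCH_THRESHOLD:
            break  # every remaining pair scores lower still
        if i in used_old or j in used_new:
            continue
        used_old.add(i)
        used_new.add(j)
        ops.append((classify(old_nodes[i], new_nodes[j], score),
                    old_nodes[i].name, new_nodes[j].name))

    old_left = [o for i, o in enumerate(old_nodes) if i not in used_old]
    new_left = [n for j, n in enumerate(new_nodes) if j not in used_new]
    return ops, old_left, new_left
```

Scoring every pair is quadratic in the number of leftover nodes, which is tolerable here only because the earlier phases have already removed the easy matches.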
Phases 4 and 5: Leftovers
Any old node without a match is a delete. Any new node without a match is an insert.
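
The leftover step reduces to two filters. In this sketch nodes are represented by plain string identifiers, and `leftover_ops` with its parameters is a hypothetical name rather than part of symtrace.

```python
def leftover_ops(old_nodes, new_nodes, matched_old, matched_new):
    """Emit DELETE for unmatched old nodes and INSERT for unmatched new nodes."""
    # Iterate the original lists (not the sets) so output order follows input order.
    deletes = [("DELETE", n) for n in old_nodes if n not in matched_old]
    inserts = [("INSERT", n) for n in new_nodes if n not in matched_new]
    return deletes + inserts
```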
Determinism
The algorithm always produces the same output for the same input, regardless of platform or run order.
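
The source does not say how symtrace achieves this. One common approach, shown here purely as an assumption, is to put results into a canonical order before emitting them, so the output does not depend on the order in which matches were discovered (for example, from iterating a hash-based set). The operation tuples and the `canonical` helper are hypothetical.

```python
def canonical(ops):
    """Return operations in an order independent of discovery order."""
    # Sorting on the full tuple breaks ties using every field,
    # so equal inputs always yield the same sequence.
    return sorted(ops)


ops = [
    ("MOVE", "a.py::g", "b.py::g"),
    ("DELETE", "a.py::h", ""),
    ("INSERT", "", "a.py::k"),
]
# The same operations discovered in a different order give the same result.
same_output = canonical(ops) == canonical(list(reversed(ops)))
```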