Matching Algorithm

symtrace uses a 5-phase matching process to compare old and new code nodes. It starts with the most confident matches and works down to less certain ones.

Phase 1 -- Exact match
  Both shape and content match:
    Same location      -> no change (skip)
    Different location -> MOVE

Phase 2 -- Shape match
  Same structure, different text:
    Same name          -> MODIFY
    Only names changed -> RENAME
    Otherwise          -> MODIFY

Phase 3 -- Fuzzy match
  Score remaining pairs by similarity:
    Same kind + name         -> MODIFY
    High similarity (>=90%)  -> RENAME
    Moderate similarity      -> MODIFY

Phase 4 -- Unmatched old nodes -> DELETE

Phase 5 -- Unmatched new nodes -> INSERT

If a code node has the same shape and content in both commits, it is an exact match. At the same location it is unchanged and skipped. At a different location it is reported as a move: relocated but not edited.
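The exact-match phase can be sketched as a lookup keyed on shape and content. This is an illustrative sketch only, not symtrace's implementation: the `Node` fields, the string fingerprints, and the `exact_match` function are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    shape: str      # hypothetical structural fingerprint
    content: str    # hypothetical text fingerprint
    location: int   # hypothetical position, e.g. a start line


def exact_match(old, new):
    """Pair nodes whose shape and content both match.

    Returns (ops, rest_old, rest_new); unchanged nodes produce no op.
    """
    by_key = {}
    for n in new:
        by_key.setdefault((n.shape, n.content), []).append(n)

    ops, rest_old, used = [], [], set()
    for o in old:
        candidates = by_key.get((o.shape, o.content), [])
        if candidates:
            n = candidates.pop(0)          # first unused new node with this key
            used.add(id(n))
            if n.location != o.location:   # same node, new place
                ops.append(("MOVE", o.name, o.location, n.location))
        else:
            rest_old.append(o)             # left for the later phases

    rest_new = [n for n in new if id(n) not in used]
    return ops, rest_old, rest_new
```

Nodes left in `rest_old` and `rest_new` would then flow into the shape and fuzzy phases. A fuller version would presumably prefer a same-location candidate when several new nodes share a key.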

If the structure is the same but the text differs, something inside changed. symtrace checks whether just the name changed (a rename) or the body changed (a modify).
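A minimal sketch of that decision follows. It assumes "only names changed" can be tested by masking each node's own name in its text and comparing the results; how symtrace actually detects this is not stated here, and `classify_shape_match` is a hypothetical helper.

```python
def classify_shape_match(old_name, old_text, new_name, new_text):
    """Classify a pair that already has the same shape but different text."""
    if old_name == new_name:
        return "MODIFY"  # same name, so the difference is in the body

    # Assumption: mask the node's own name, then compare what is left.
    # Plain substring replacement is crude but enough for a sketch.
    masked_old = old_text.replace(old_name, "<NAME>")
    masked_new = new_text.replace(new_name, "<NAME>")
    if masked_old == masked_new:
        return "RENAME"  # nothing but the name differs

    return "MODIFY"      # name and body both changed
```

For example, `def foo(x): return x + 1` against `def bar(x): return x + 1` classifies as a rename, while changing the `1` to a `2` as well makes it a modify.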

For remaining unmatched nodes, symtrace computes a similarity score based on structure, content, and complexity. A pair with the same kind and name is matched as a modify. Otherwise, a pair scoring at least 90% is matched as a rename, and a pair with moderate similarity is matched as a modify. See Similarity Scoring for how scores work.
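The fuzzy phase might look roughly like the sketch below. Only the 90% rename cutoff comes from this page. The component weights, the 0.60 lower cutoff, the dictionary keys, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure are assumptions for illustration; the real formula is described under Similarity Scoring.

```python
from difflib import SequenceMatcher

RENAME_THRESHOLD = 0.90  # stated on this page
MODIFY_THRESHOLD = 0.60  # hypothetical cutoff for "moderate" similarity


def similarity(old, new):
    """Blend structure, content, and complexity into one score in [0, 1]."""
    structure = SequenceMatcher(None, old["shape"], new["shape"]).ratio()
    content = SequenceMatcher(None, old["text"], new["text"]).ratio()
    lo, hi = sorted((old["complexity"], new["complexity"]))
    complexity = lo / hi if hi else 1.0
    # Hypothetical weights; they only need to sum to 1.
    return 0.4 * structure + 0.4 * content + 0.2 * complexity


def classify_fuzzy(old, new):
    """Return "MODIFY", "RENAME", or None when the pair should not match."""
    if old["kind"] == new["kind"] and old["name"] == new["name"]:
        return "MODIFY"
    score = similarity(old, new)
    if score >= RENAME_THRESHOLD:
        return "RENAME"
    if score >= MODIFY_THRESHOLD:
        return "MODIFY"
    return None
```

A pair that returns `None` stays unmatched, so its nodes fall through to the delete and insert phases.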

Any old node without a match is a delete. Any new node without a match is an insert.
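The last two phases amount to set subtraction. A small sketch, assuming nodes are identified by hypothetical string ids and that earlier phases yield `(old_id, new_id)` pairs:

```python
def leftovers(old_ids, new_ids, matched_pairs):
    """Turn whatever the earlier phases did not pair into DELETE/INSERT ops."""
    matched_old = {o for o, _ in matched_pairs}
    matched_new = {n for _, n in matched_pairs}
    # Iterate the input lists (not the sets) so output order is stable.
    ops = [("DELETE", o) for o in old_ids if o not in matched_old]
    ops += [("INSERT", n) for n in new_ids if n not in matched_new]
    return ops
```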

The algorithm always produces the same output for the same input, regardless of platform or run order.
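One common way to get this kind of determinism is to give every candidate pair a total ordering, so that ties never depend on iteration order. The sketch below illustrates the idea with a greedy best-first pairing. It is an assumption about how such a guarantee could be achieved, not a description of symtrace's internals, and `greedy_pairs` is a hypothetical name.

```python
def greedy_pairs(scored):
    """Pick pairs best-first from (score, old_id, new_id) tuples."""
    # Highest score first; equal scores fall back to the ids, so the
    # order is fully determined by the data, not by input order.
    ordered = sorted(scored, key=lambda t: (-t[0], t[1], t[2]))

    used_old, used_new, pairs = set(), set(), []
    for _score, o, n in ordered:
        if o not in used_old and n not in used_new:
            used_old.add(o)
            used_new.add(n)
            pairs.append((o, n))
    return pairs
```

Feeding the same candidates in a different order yields the same pairs, because the sort key breaks every tie explicitly.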