Similarity Scoring
When symtrace matches a code node across two commits (as a MOVE, RENAME, or MODIFY), it also reports how much changed. This is the similarity score.
How it’s calculated
The score combines three factors:
- Structure (50%) — did the shape of the code change? (nesting, number of children)
- Tokens (30%) — did the actual source text change? (identifiers, literals, operators)
- Complexity (20%) — did the control flow change? (branches, loops, conditions)
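
To make the weighting concrete, here is a minimal sketch of how three per-factor similarities could be folded into one percentage. Only the 50/30/20 weights come from the list above. The plain weighted sum, the 0-to-1 input scale, and the name `combined_similarity` are assumptions for illustration, not symtrace's actual implementation.

```python
# Weights come from the factor list; combining them linearly is an assumption.
WEIGHTS = {"structure": 0.5, "tokens": 0.3, "complexity": 0.2}


def combined_similarity(structure: float, tokens: float, complexity: float) -> float:
    """Return a 0-100 percentage from three per-factor similarities in [0, 1]."""
    factors = {"structure": structure, "tokens": tokens, "complexity": complexity}
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} similarity must be in [0, 1], got {value}")
    # Weighted sum of the three factors, scaled to a percentage.
    score = sum(WEIGHTS[name] * value for name, value in factors.items())
    return round(score * 100, 1)


print(combined_similarity(0.8, 0.6, 0.5))  # 68.0
```

Under this reading, structure dominates: a node whose shape is untouched cannot score below 50%, however much its tokens and control flow change.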
Change intensity
The score maps to a simple rating:
| Similarity | Intensity | What it means |
|---|---|---|
| 80% or higher | low | Minor edit, probably safe |
| 50% to 79% | medium | Real change, worth reviewing |
| Below 50% | high | Major rewrite, treat as new code |
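
The table can be read as a pair of thresholds. The sketch below encodes them in a hypothetical helper, `change_intensity`. How symtrace treats fractional scores between 79% and 80% is not stated, so the assumption here is that anything below 80 counts as medium.

```python
def change_intensity(similarity_percent: float) -> str:
    """Map a similarity percentage (0-100) to the rating from the table."""
    if similarity_percent >= 80:
        return "low"      # minor edit
    if similarity_percent >= 50:
        return "medium"   # real change; assumes values just under 80 land here
    return "high"         # major rewrite


print(change_intensity(75.2))  # medium
```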
Reading similarity in output
In CLI output, you’ll see it inline:

    ~ [MODIFY] function 'parse_body' modified (L10 -> L10) [75% similar, medium]

In JSON output (--json), matched operations include a similarity object:

    {
      "similarity_percent": 75.2,
      "change_intensity": "medium",
      "structure_similarity": 0.84,
      "token_similarity": 0.61,
      "control_flow_changed": true
    }

- INSERT and DELETE operations don’t have similarity scores (there’s nothing to compare against).
- A MOVE with 100% similarity confirms code was relocated without modification.
- A RENAME with 98% similarity usually means only names changed.
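
A script that consumes the JSON output might use the similarity object to decide what deserves a second look. The sketch below works on the object alone, because the shape of the surrounding JSON is not described here. The `needs_review` helper and its policy are hypothetical; only the field names are taken from the sample above.

```python
import json
from typing import Optional

# The similarity object as shown in the sample JSON output.
SAMPLE = """
{
  "similarity_percent": 75.2,
  "change_intensity": "medium",
  "structure_similarity": 0.84,
  "token_similarity": 0.61,
  "control_flow_changed": true
}
"""


def needs_review(similarity: Optional[dict]) -> bool:
    """Hypothetical policy: flag anything that is not a low-intensity edit."""
    if similarity is None:
        # INSERT and DELETE carry no similarity object, so there is
        # nothing to compare against; this policy flags them.
        return True
    not_low = similarity["change_intensity"] != "low"
    return bool(not_low or similarity["control_flow_changed"])


print(needs_review(json.loads(SAMPLE)))  # True
```

Flagging on `control_flow_changed` as well as intensity is a design choice: a node can keep a high overall score while still gaining or losing a branch.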