Similarity Scoring

When symtrace matches a code node across two commits (as a MOVE, RENAME, or MODIFY), it also reports how much changed. This is the similarity score.

The score combines three factors:

  • Structure (50%) — did the shape of the code change? (nesting, number of children)
  • Tokens (30%) — did the actual source text change? (identifiers, literals, operators)
  • Complexity (20%) — did the control flow change? (branches, loops, conditions)
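The weighting above can be sketched as a plain weighted sum. This is a minimal illustration only: it assumes each factor is already normalized to the range 0.0 to 1.0 and that the three factors are combined linearly. The function name `combined_similarity` and the way each factor is measured are hypothetical and are not taken from symtrace itself.

```python
# Weights as stated in the list above: structure 50%, tokens 30%, complexity 20%.
WEIGHTS = {"structure": 0.5, "tokens": 0.3, "complexity": 0.2}


def combined_similarity(structure: float, tokens: float, complexity: float) -> float:
    """Return a similarity percentage from three per-factor scores in [0.0, 1.0].

    Assumption: a simple linear combination. How symtrace derives each
    factor internally is not described in this section.
    """
    score = (
        WEIGHTS["structure"] * structure
        + WEIGHTS["tokens"] * tokens
        + WEIGHTS["complexity"] * complexity
    )
    return round(score * 100, 1)


# Shape mostly preserved, half the tokens changed, control flow untouched.
print(combined_similarity(0.8, 0.5, 1.0))  # 75.0
```

Under this assumption, a large token change alone cannot drop the score below 70%, because tokens carry only 30% of the weight.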

The score maps to a simple rating:

| Similarity | Intensity | What it means |
|---|---|---|
| 80% or higher | low | Minor edit, probably safe |
| 50% to 79% | medium | Real change, worth reviewing |
| Below 50% | high | Major rewrite, treat as new code |
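The thresholds in the table translate into a small lookup. The sketch below is illustrative: the function name `change_intensity` is hypothetical, and it assumes fractional scores between 79% and 80% fall into the medium band, which the table does not state explicitly.

```python
def change_intensity(similarity_percent: float) -> str:
    """Map a similarity percentage to the rating used in the table above."""
    if similarity_percent >= 80:
        return "low"      # minor edit
    if similarity_percent >= 50:
        return "medium"   # real change, worth reviewing
    return "high"         # major rewrite


print(change_intensity(75.2))  # medium
```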

In CLI output, you’ll see it inline:

~ [MODIFY] function 'parse_body' modified (L10 -> L10) [75% similar, medium]

In JSON output (--json), matched operations include a similarity object:

{
  "similarity_percent": 75.2,
  "change_intensity": "medium",
  "structure_similarity": 0.84,
  "token_similarity": 0.61,
  "control_flow_changed": true
}
A few notes on reading these values:

  • INSERT and DELETE operations don’t have similarity scores (there’s nothing to compare against).
  • A MOVE with 100% similarity confirms the code was relocated without modification.
  • A RENAME with a score around 98% usually means only names changed.
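A script consuming the JSON output might use the similarity object to decide which changes deserve a closer look. The sketch below relies only on the fields shown in the sample object above. The review policy and the function name `needs_review` are hypothetical, and treating a missing similarity object (as for INSERT and DELETE) as "review it" is an assumption made for this example, not documented symtrace behavior.

```python
import json
from typing import Optional

# The sample similarity object from this section.
SAMPLE = """
{
  "similarity_percent": 75.2,
  "change_intensity": "medium",
  "structure_similarity": 0.84,
  "token_similarity": 0.61,
  "control_flow_changed": true
}
"""


def needs_review(similarity: Optional[dict]) -> bool:
    """Example policy: flag anything that is not a low-intensity change.

    `similarity` is None for operations without a score (INSERT, DELETE);
    this sketch flags those too, since there is no baseline to compare against.
    """
    if similarity is None:
        return True
    return (
        similarity["change_intensity"] != "low"
        or similarity["control_flow_changed"]
    )


print(needs_review(json.loads(SAMPLE)))  # True
```

Flagging on `control_flow_changed` as well as intensity catches edits that keep a high overall score but still add or remove a branch.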