Hashing Strategy
To figure out what changed between two commits, symtrace needs to identify each piece of code and track it across versions. It does this by computing four different fingerprints (hashes) for every node in the syntax tree.
The four hashes
| Hash | What it captures | Why it matters |
|---|---|---|
| Structural | The shape of the code (nesting, children) | Detects moves — same shape, different location |
| Content | The actual source text | Detects any text change, no matter how small |
| Identity | The shape with names replaced by placeholders | Detects renames — same structure, different names |
| Context | Parent node and depth in the tree | Detects when code is re-parented or restructured |
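As a rough illustration of the table, the sketch below computes all four fingerprints for a toy syntax tree. It is not symtrace's implementation: the `Node` type, the function names, and the exact inputs fed to each hash are assumptions made for the example, and BLAKE2b from Python's standard library stands in for BLAKE3.

```python
import hashlib
from dataclasses import dataclass, field


def digest(*parts):
    """Hash the parts with BLAKE2b (a stdlib stand-in for BLAKE3)."""
    h = hashlib.blake2b(digest_size=16)
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))  # length prefix keeps parts unambiguous
        h.update(data)
    return h.hexdigest()


@dataclass
class Node:
    """Hypothetical syntax-tree node used only for this example."""
    kind: str                                    # e.g. "function", "identifier"
    text: str = ""                               # source text (leaves, operators)
    children: list = field(default_factory=list)


def structural_hash(node):
    # Shape only: node kinds and nesting, no source text.
    return digest("structural", node.kind,
                  *(structural_hash(c) for c in node.children))


def content_hash(node):
    # Shape plus the actual source text.
    return digest("content", node.kind, node.text,
                  *(content_hash(c) for c in node.children))


def identity_hash(node):
    # Assumption: like the content hash, but identifier names become "_".
    text = "_" if node.kind == "identifier" else node.text
    return digest("identity", node.kind, text,
                  *(identity_hash(c) for c in node.children))


def context_hash(parent_kind, depth):
    # Where the node sits: its parent's kind and its depth in the tree.
    return digest("context", parent_kind, str(depth))


def make_function(name, p1, p2):
    """Build a toy tree for `def name(p1, p2): return p1 + p2`."""
    return Node("function", children=[
        Node("identifier", name),
        Node("parameters", children=[Node("identifier", p1), Node("identifier", p2)]),
        Node("binary_op", "+", children=[Node("identifier", p1), Node("identifier", p2)]),
    ])


before = make_function("add", "a", "b")
after = make_function("total", "x", "y")  # same code with every name changed

print(structural_hash(before) == structural_hash(after))  # True
print(content_hash(before) == content_hash(after))        # False
print(identity_hash(before) == identity_hash(after))      # True
```

In this toy rename, the content hash is the only tree hash that changes, which is the pattern the Identity row describes: same structure, different names.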
Why four?
A single fingerprint can’t distinguish between different kinds of changes. For example, if a function is both moved and slightly edited:
- The structural hash still matches (same shape), so it’s a move candidate.
- The content hash differs (body changed), so it’s not a pure move.
- The identity hash matches (names didn’t change), so it’s not a rename.
- The context hash differs (different position), confirming the relocation.
Combining all four lets symtrace classify the change accurately instead of falling back to “deleted + inserted.”
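The moved-and-edited example can be pictured as a pairwise comparison of the four hashes. The sketch below is only an illustration: the `Fingerprint` type, the helper names, and the placeholder hash values are hypothetical, and the two rules shown are one reading of the example above, not symtrace's actual classifier.

```python
from typing import NamedTuple


class Fingerprint(NamedTuple):
    """The four hashes of one node (hypothetical container)."""
    structural: str
    content: str
    identity: str
    context: str


def matching_hashes(old, new):
    """Report, per hash kind, whether the old and new values agree."""
    return {name: getattr(old, name) == getattr(new, name)
            for name in Fingerprint._fields}


def is_pure_move(match):
    # Same text in a different place.
    return match["content"] and not match["context"]


def is_moved_and_edited(match):
    # Same shape, different text, different place.
    return match["structural"] and not match["content"] and not match["context"]


# Placeholder hash values standing in for real digests.
old = Fingerprint(structural="s1", content="c1", identity="i1", context="x1")
new = Fingerprint(structural="s1", content="c2", identity="i1", context="x2")

match = matching_hashes(old, new)
print(match)
# {'structural': True, 'content': False, 'identity': True, 'context': False}
print(is_pure_move(match))         # False
print(is_moved_and_edited(match))  # True
```

With a single hash, the same pair of nodes would only report "equal" or "not equal", which is why a one-fingerprint scheme ends up reporting a deletion plus an insertion.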
Performance
symtrace uses BLAKE3 for hashing, which is fast and collision-resistant. When incremental parsing is enabled (the default), hashes are reused for unchanged parts of the code, avoiding redundant work.
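To make the reuse idea concrete, here is a minimal sketch of hash memoization over subtrees shared between two versions of a tree. How symtrace actually stores and reuses hashes is not described here, so the `Node` and `HashCache` classes are assumptions, and BLAKE2b from Python's standard library again stands in for BLAKE3.

```python
import hashlib


class Node:
    """Hypothetical syntax-tree node; hashable by object identity."""

    def __init__(self, kind, text="", children=()):
        self.kind = kind
        self.text = text
        self.children = list(children)


class HashCache:
    """Memoizes content hashes per node so reused subtrees are hashed once."""

    def __init__(self):
        self._cache = {}   # node object -> hex digest
        self.computed = 0  # number of nodes actually hashed

    def content_hash(self, node):
        if node in self._cache:
            return self._cache[node]  # unchanged subtree: skip all work below it
        h = hashlib.blake2b(digest_size=16)  # stand-in for BLAKE3
        h.update(node.kind.encode("utf-8"))
        h.update(b"\0")
        h.update(node.text.encode("utf-8"))
        for child in node.children:
            h.update(self.content_hash(child).encode("ascii"))
        self.computed += 1
        self._cache[node] = h.hexdigest()
        return self._cache[node]


# An incremental parse is assumed to hand back the same node object
# for a subtree that did not change between versions.
unchanged = Node("function", children=[Node("identifier", "helper")])
old_root = Node("module", children=[unchanged, Node("number", "1")])
new_root = Node("module", children=[unchanged, Node("number", "2")])

cache = HashCache()
cache.content_hash(old_root)
first_pass = cache.computed                 # 4: module, function, identifier, number
cache.content_hash(new_root)
second_pass = cache.computed - first_pass   # 2: the new module and the new number

print(first_pass, second_pass)  # 4 2
```

The second pass touches only the nodes on the path to the edit. The shared `function` subtree is answered from the cache without visiting its children.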