Competitors / Similar Tools (Duplicate Code / Clone Detection)

Goal: quickly locate mature solutions in the ecosystem and clarify a differentiated direction for “Rust core + npm distribution”.

Common tools (by adoption / practicality)

jscpd: text/token duplication detection; multi-language/multi-directory; easy to integrate
PMD CPD: the “Copy/Paste Detector” from the Java-based PMD toolchain (token-level; supports more languages than Java)
SonarQube Duplication: duplication detection inside a quality platform (multi-language, platform reporting)
Simian, etc.: general clone detection products (good references for product shape)

Clone types as a lens

Type-1: identical code, or code that differs only in formatting (whitespace/comments); the current MVP targets this type, but more narrowly: it removes whitespace only, so comments are not normalized
Type-2: renaming identifiers/literals — typically needs tokenization
Type-3: small edits/insertions/deletions — needs more complex similarity/windowing/fingerprints/AST
Type-4: semantic equivalence with different structure — usually beyond “simple duplicate detection”
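
A minimal sketch of the whitespace-only Type-1 normalization described above, assuming the MVP compares fingerprints of normalized text. The function names `normalize_whitespace` and `fingerprint` are hypothetical, and `DefaultHasher` is only a stand-in: its algorithm is not guaranteed to stay the same across Rust releases, so a persisted cache would need an explicitly chosen hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Remove every whitespace character. Comments and identifiers are left
/// untouched, so this only catches the strictest Type-1 clones.
fn normalize_whitespace(source: &str) -> String {
    source.chars().filter(|c| !c.is_whitespace()).collect()
}

/// Hash the normalized text into a 64-bit fingerprint.
/// Assumption: DefaultHasher is a placeholder for whatever hash the tool uses.
fn fingerprint(source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    normalize_whitespace(source).hash(&mut hasher);
    hasher.finish()
}
```

Two snippets that differ only in line breaks and indentation get the same fingerprint, while renaming a parameter (a Type-2 change) produces different normalized text and would not match.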

Research / large-scale approaches (methodology references)

SourcererCC: large-scale clone search (index/retrieval)
Deckard: structured / feature-vector approaches (AST features)
NiCad: normalization + comparison (emphasizes normalization strategy)
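
To make the index/retrieval idea concrete, here is a rough sketch of an inverted token index with an overlap threshold. It is loosely inspired by the large-scale search approach above and is not the algorithm of any of the listed tools. `TokenIndex`, its methods, and the naive tokenizer are all hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical inverted index: token -> ids of the blocks containing it.
struct TokenIndex {
    postings: HashMap<String, HashSet<usize>>,
    block_tokens: Vec<HashSet<String>>,
}

impl TokenIndex {
    fn new() -> Self {
        TokenIndex { postings: HashMap::new(), block_tokens: Vec::new() }
    }

    /// Naive tokenizer (assumption): split on anything that is not
    /// alphanumeric or '_'. A real tool would use a proper lexer.
    fn tokenize(source: &str) -> HashSet<String> {
        source
            .split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .filter(|t| !t.is_empty())
            .map(str::to_string)
            .collect()
    }

    /// Index a block and return its id.
    fn add_block(&mut self, source: &str) -> usize {
        let id = self.block_tokens.len();
        let tokens = Self::tokenize(source);
        for token in &tokens {
            self.postings.entry(token.clone()).or_default().insert(id);
        }
        self.block_tokens.push(tokens);
        id
    }

    /// Ids of indexed blocks whose shared-token count, divided by the size
    /// of the larger token set, is at least `threshold` (0.0..=1.0).
    fn query(&self, source: &str, threshold: f64) -> Vec<usize> {
        let tokens = Self::tokenize(source);
        let mut shared: HashMap<usize, usize> = HashMap::new();
        for token in &tokens {
            if let Some(ids) = self.postings.get(token) {
                for &id in ids {
                    *shared.entry(id).or_insert(0) += 1;
                }
            }
        }
        let mut hits: Vec<usize> = shared
            .into_iter()
            .filter(|&(id, count)| {
                let larger = tokens.len().max(self.block_tokens[id].len());
                larger > 0 && count as f64 / larger as f64 >= threshold
            })
            .map(|(id, _)| id)
            .collect();
        hits.sort_unstable();
        hits
    }
}
```

Only blocks that share at least one token with the query are ever scored, which is the point of retrieval over full pairwise comparison.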

Comparison dimensions (what to record)

granularity: file / span (function/block) / cross-file composition
normalization: whitespace-only / comment removal / tokenization / AST
clone type coverage: Type-1/2/3
output: localization (path + range), JSON/SARIF, thresholds/min-block sizes
integration: CLI, CI, incremental scan, caching, ignore rules
performance: speed/memory, scaling to large or multi-repo codebases (index-based retrieval vs. full pairwise comparison)
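
As one way to pin down the “output” dimension, the sketch below models a clone group with path + line-range localization and renders it as JSON by hand. The struct names, field names, and JSON keys are assumptions for illustration, not an existing schema. A real implementation would more likely use a serialization crate such as `serde_json`, and SARIF output would need its own mapping.

```rust
/// Hypothetical location of one clone instance: file path plus line range.
struct CloneLocation {
    path: String,
    start_line: u32,
    end_line: u32,
}

/// Hypothetical group of locations that share one fingerprint.
struct CloneGroup {
    fingerprint: u64,
    locations: Vec<CloneLocation>,
}

/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn escape_json(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl CloneGroup {
    /// Render the group as a compact JSON object (hand-rolled for the sketch).
    fn to_json(&self) -> String {
        let locations: Vec<String> = self
            .locations
            .iter()
            .map(|loc| {
                format!(
                    "{{\"path\":\"{}\",\"startLine\":{},\"endLine\":{}}}",
                    escape_json(&loc.path),
                    loc.start_line,
                    loc.end_line
                )
            })
            .collect();
        format!(
            "{{\"fingerprint\":\"{:016x}\",\"locations\":[{}]}}",
            self.fingerprint,
            locations.join(",")
        )
    }
}
```

The fingerprint is emitted as a hex string rather than a number because 64-bit integers do not round-trip safely through JavaScript's number type, which matters for a Node-based CLI.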

Suggested differentiation

Rust for scanning/normalization/fingerprints → speed + portability
Node for CLI distribution via npm → easy adoption in frontend/fullstack repos
Start with robust cross-repo, file-level duplicate detection, then expand to span-level clones
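
A sketch of what the first step (file-level duplicates across repos) could look like, assuming whitespace-only normalization and files already loaded into memory. `find_duplicate_files` is a hypothetical name. Keying on the full normalized text keeps the example simple, whereas a real scanner would more likely key on a fingerprint and stream files from disk.

```rust
use std::collections::BTreeMap;

/// Group file paths whose contents are identical once all whitespace is
/// removed. Input is a slice of (path, content) pairs; only groups with
/// two or more files are returned.
fn find_duplicate_files(files: &[(&str, &str)]) -> Vec<Vec<String>> {
    // BTreeMap gives a deterministic group order, which keeps reports stable.
    let mut by_content: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (path, content) in files {
        let key: String = content.chars().filter(|c| !c.is_whitespace()).collect();
        // Skip empty or whitespace-only files so they are not all reported
        // as duplicates of each other.
        if key.is_empty() {
            continue;
        }
        by_content.entry(key).or_default().push(path.to_string());
    }
    by_content
        .into_values()
        .filter(|paths| paths.len() > 1)
        .collect()
}
```

Paths from different repositories can be passed in the same slice, so cross-repo duplicates fall out of the same grouping step.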