Output & Report
dup-code-check supports both text output and JSON output. Text is for humans; JSON is for post-processing and CI integration.
1) Duplicate files (default mode)
Text
You’ll see:
- duplicate groups: <N>
- for each group:
  - hash=<...> normalized_len=<...> files=<...>
  - - [repoLabel] path
JSON (--json)
JSON output is an array, each element:
ts
interface DuplicateGroup {
hash: string; // 16 hex chars (FNV-1a 64)
normalizedLen: number; // byte length after ASCII whitespace removal
files: { repoId: number; repoLabel: string; path: string }[];
}
2) Suspected duplicate code spans (--code-spans)
Text
- duplicate code span groups: <N>
- per group:
  - hash=<...> normalized_len=<...> occurrences=<...>
  - preview=<...>
  - - [repoLabel] path:startLine-endLine
JSON (--json)
JSON output is an array, each element:
ts
interface DuplicateSpanGroup {
hash: string;
normalizedLen: number;
preview: string;
occurrences: {
repoId: number;
repoLabel: string;
path: string;
startLine: number;
endLine: number;
}[];
}
3) Scan stats (--stats)
JSON mode
With --json --stats:
- default / --code-spans: { groups, scanStats }
- --report: { report, scanStats }
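Because the top-level JSON shape differs with and without --stats, a consumer may want to normalize both shapes before processing. The following is a minimal sketch that assumes only the bare-array shape described in sections 1 and 2 and the { groups, scanStats } wrapper listed above. The names `Normalized` and `normalizeOutput` are hypothetical and not part of dup-code-check, scanStats is left loosely typed, and the --report wrapper is not handled.

```typescript
// Hypothetical helper type: not part of dup-code-check.
interface Normalized<G> {
  groups: G[];
  scanStats?: Record<string, unknown>;
}

// Accepts either the bare array (--json) or the wrapped object (--json --stats).
function normalizeOutput<G>(parsed: unknown): Normalized<G> {
  if (Array.isArray(parsed)) {
    return { groups: parsed as G[] };
  }
  if (typeof parsed === "object" && parsed !== null && "groups" in parsed) {
    const wrapped = parsed as { groups: G[]; scanStats?: Record<string, unknown> };
    const out: Normalized<G> = { groups: wrapped.groups };
    if (wrapped.scanStats !== undefined) {
      out.scanStats = wrapped.scanStats;
    }
    return out;
  }
  throw new Error("unrecognized dup-code-check JSON shape");
}
```

A caller would typically pass the result of `JSON.parse` on the captured stdout and then work with `groups` regardless of whether --stats was set.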
scanStats fields include:
- candidateFiles, scannedFiles, scannedBytes
- gitFastPathFallbacks: non-zero when the scan attempted the Git fast path and had to fall back to the filesystem walker
- skippedNotFound, skippedPermissionDenied, skippedTooLarge, skippedBinary, skippedOutsideRoot, skippedRelativizeFailed, skippedWalkErrors
- skippedOutsideRoot: paths outside roots or unsafe paths (e.g. symlink targets outside roots, or unsafe paths emitted by the Git fast path)
- skippedBudgetMaxFiles: non-zero when the scan stopped early due to the maxFiles budget
- skippedBudgetMaxTotalBytes: skipped due to maxTotalBytes (reading would exceed the total bytes budget)
- skippedBudgetMaxNormalizedChars: non-zero when the scan stopped early due to the maxNormalizedChars budget
- skippedBudgetMaxTokens: non-zero when the scan stopped early due to the maxTokens budget (report mode)
- skippedBucketTruncated: detector guardrail; fingerprint buckets were truncated to cap worst-case cost (results may miss some matches)
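To see at a glance what a scan left out, the skipped* counters can be filtered down to the non-zero ones. This is a small sketch under the assumption that every scanStats field is a numeric counter (the list above gives names only); `nonZeroSkips` is a hypothetical helper, and the sample values are made up for illustration.

```typescript
// Assumption: scanStats fields are numeric counters.
type ScanStats = Record<string, number>;

// Returns [name, value] pairs for every non-zero "skipped*" counter, sorted by name.
// gitFastPathFallbacks is not a skip counter and is intentionally not included.
function nonZeroSkips(stats: ScanStats): [string, number][] {
  return Object.entries(stats)
    .filter(([key, value]) => key.startsWith("skipped") && value > 0)
    .sort(([a], [b]) => a.localeCompare(b));
}

// Made-up sample values, for illustration only.
const example: ScanStats = {
  candidateFiles: 120,
  scannedFiles: 117,
  scannedBytes: 48211,
  skippedBinary: 2,
  skippedTooLarge: 1,
  skippedNotFound: 0,
};

for (const [name, count] of nonZeroSkips(example)) {
  console.log(`${name}=${count}`);
}
```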
Text mode
In text mode, --stats prints stats to stderr while keeping results on stdout:
bash
dup-code-check --stats . >result.txt 2>stats.txt
4) Strict mode (--strict)
--strict is intended for CI and answers “was the scan complete?”:
- exits 1 on PermissionDenied, outside_root, relativize_failed, traversal errors, bucket truncation, or budget limits (maxFiles / maxTotalBytes / maxNormalizedChars / maxTokens)
- does not fail on NotFound, TooLarge, or Binary
When --json is enabled and --stats is not, --strict still prints stats to stderr on failure (so you can see why).
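If a pipeline runs without --strict but still wants the same "was the scan complete?" answer from the JSON stats, the rule above can be approximated on the consumer side. The sketch below assumes a mapping from the listed failure conditions to scanStats counter names; that mapping is inferred from the field list in section 3 and is not confirmed by the tool. `STRICT_FAILURE_KEYS` and `strictFailures` are hypothetical names.

```typescript
// Assumed mapping from the --strict failure conditions to scanStats counters.
// NotFound, TooLarge, and Binary are deliberately absent: strict mode does not fail on them.
const STRICT_FAILURE_KEYS = [
  "skippedPermissionDenied",
  "skippedOutsideRoot",
  "skippedRelativizeFailed",
  "skippedWalkErrors",
  "skippedBucketTruncated",
  "skippedBudgetMaxFiles",
  "skippedBudgetMaxTotalBytes",
  "skippedBudgetMaxNormalizedChars",
  "skippedBudgetMaxTokens",
] as const;

// Returns the counters that would make the scan count as incomplete; empty means complete.
function strictFailures(stats: Record<string, number | undefined>): string[] {
  return STRICT_FAILURE_KEYS.filter((key) => (stats[key] ?? 0) > 0);
}
```

A CI step could treat a non-empty result as a failure and print the returned names as the reason.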
5) Report mode (--report)
Text output contains multiple sections (in this order):
- file duplicates
- code span duplicates
- line span duplicates
- token span duplicates
- block duplicates
- AST subtree duplicates
- similar blocks (minhash)
- similar blocks (simhash)
JSON output:
ts
interface DuplicationReport {
fileDuplicates: DuplicateGroup[];
codeSpanDuplicates: DuplicateSpanGroup[];
lineSpanDuplicates: DuplicateSpanGroup[];
tokenSpanDuplicates: DuplicateSpanGroup[];
blockDuplicates: DuplicateSpanGroup[];
astSubtreeDuplicates: DuplicateSpanGroup[];
similarBlocksMinhash: SimilarityPair[];
similarBlocksSimhash: SimilarityPair[];
}
For the meaning/implementation ideas of each section, see Detectors & Algorithms.
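For a quick overview of a report, the number of entries in each section of the JSON can be counted. This is a minimal sketch: SimilarityPair is not defined in this section, so every section is treated as an opaque array, and `sectionCounts` is a hypothetical helper name.

```typescript
// Every report section is an array; element types are left opaque here.
type DuplicationReportLike = Record<string, unknown[]>;

// Maps each section name (e.g. "fileDuplicates") to its number of entries.
function sectionCounts(report: DuplicationReportLike): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const [section, entries] of Object.entries(report)) {
    counts[section] = entries.length;
  }
  return counts;
}
```

With --json --stats the report sits under the `report` key of the wrapper object, so that inner value is what would be passed in.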