Cross-language duplicate code detector - Node.js bindings
Node.js bindings for PolyDup, a cross-language duplicate code detector powered by Rust.
- 🚀 Fast: Built in Rust with Tree-sitter for efficient parsing
- 🔄 Multi-language: Supports Rust, Python, JavaScript/TypeScript
- ⚡ Non-blocking: Async API runs on background threads
- Type-2 clones: Detects structurally similar code with different variable names
- Detailed reports: Statistics and similarity scores
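The Type-2 idea can be illustrated with a simplified sketch of one common technique. The sketch replaces identifiers and literals with placeholder tokens, then indexes every window of `minBlockSize` tokens, so renamed copies produce identical keys. This is an illustration under stated assumptions, not PolyDup's implementation, which parses with Tree-sitter and reports hash signatures. The regex tokenizer, the keyword list, and the `normalize`/`windowKeys`/`findCloneWindows` helpers are hypothetical, and a joined-string key stands in for a real rolling hash.

```javascript
// Simplified Type-2 clone detection sketch (hypothetical; not PolyDup's code).
// A tiny regex tokenizer stands in for a real Tree-sitter parse.
const KEYWORDS = new Set(['function', 'return', 'const', 'let', 'if', 'else', 'for', 'while']);

// Tokenize, then replace identifiers and literals with placeholders so that
// code differing only in names or literal values normalizes to the same tokens.
function normalize(source) {
  const tokens = source.match(/[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|"[^"]*"|'[^']*'|\S/g) || [];
  return tokens.map(tok => {
    if (KEYWORDS.has(tok)) return tok;         // keep language keywords
    if (/^[A-Za-z_$]/.test(tok)) return '$ID'; // identifier
    if (/^\d/.test(tok)) return '$NUM';        // numeric literal
    if (/^["']/.test(tok)) return '$STR';      // string literal
    return tok;                                // operator or punctuation
  });
}

// Index every window of `windowSize` normalized tokens by a joined-string key
// (a real detector would use a rolling hash instead).
function windowKeys(tokens, windowSize) {
  const keys = new Map(); // key -> first start index
  for (let i = 0; i + windowSize <= tokens.length; i++) {
    const key = tokens.slice(i, i + windowSize).join(' ');
    if (!keys.has(key)) keys.set(key, i);
  }
  return keys;
}

// Return [startA, startB] token offsets of windows shared by both sources.
function findCloneWindows(sourceA, sourceB, windowSize) {
  const a = windowKeys(normalize(sourceA), windowSize);
  const b = windowKeys(normalize(sourceB), windowSize);
  const matches = [];
  for (const [key, startA] of a) {
    if (b.has(key)) matches.push([startA, b.get(key)]);
  }
  return matches;
}

const original = 'function total(items) { let sum = 0; for (const x of items) sum += x; return sum; }';
const renamed = 'function addAll(values) { let acc = 0; for (const v of values) acc += v; return acc; }';
console.log(findCloneWindows(original, renamed, 10).length); // 18
console.log(findCloneWindows(original, 'const y = 1;', 10).length); // 0
```

Both functions normalize to the same token sequence, so every 10-token window matches. A real detector would typically merge overlapping windows into larger blocks and compute a similarity score.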
```bash
npm install polydup
```

```javascript
const { findDuplicates } = require('polydup');

// Scan two directories with a 50-token minimum block size and an 85% similarity threshold
findDuplicates(['./src', './lib'], 50, 0.85)
  .then(report => {
    console.log(`Found ${report.duplicates.length} duplicates`);
    console.log(`Scanned ${report.filesScanned} files in ${report.stats.durationMs}ms`);

    report.duplicates.forEach(dup => {
      console.log(`${dup.file1} ↔️ ${dup.file2} (${(dup.similarity * 100).toFixed(1)}%)`);
    });
  })
  .catch(err => console.error('Scan failed:', err));
```
`findDuplicates(paths, minBlockSize?, threshold?)` asynchronously scans for duplicate code on a background thread (recommended).

Parameters:
- `paths: string[]` - File or directory paths to scan
- `minBlockSize?: number` - Minimum code block size in tokens (default: 50)
- `threshold?: number` - Similarity threshold from 0.0 to 1.0 (default: 0.85)

Returns: `Promise<Report>`
Example:
```javascript
// Inside an async function (or an ES module with top-level await)
const report = await findDuplicates(['./src'], 30, 0.9);
```
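Because `minBlockSize` and `threshold` are positional, a small wrapper can make call sites easier to read and reject out-of-range values early. The sketch below assumes only the `findDuplicates(paths, minBlockSize, threshold)` signature documented above. The `scanWithOptions` helper and its options-object shape are hypothetical, and the scanner is passed in as an argument so the sketch runs without the native module.

```javascript
// Hypothetical wrapper around the positional findDuplicates(paths, minBlockSize, threshold)
// API. The scanner is injected so the sketch runs without the native module.
// Defaults mirror the documented ones (50 tokens, 0.85 similarity).
async function scanWithOptions(scan, paths, { minBlockSize = 50, threshold = 0.85 } = {}) {
  if (!Array.isArray(paths) || paths.length === 0 || !paths.every(p => typeof p === 'string')) {
    throw new TypeError('paths must be a non-empty array of strings');
  }
  if (!Number.isInteger(minBlockSize) || minBlockSize <= 0) {
    throw new RangeError('minBlockSize must be a positive integer (tokens)');
  }
  // Written as a negated range check so NaN is rejected too
  if (typeof threshold !== 'number' || !(threshold >= 0 && threshold <= 1)) {
    throw new RangeError('threshold must be between 0.0 and 1.0');
  }
  return scan(paths, minBlockSize, threshold);
}

// With the real module this would be:
//   scanWithOptions(require('polydup').findDuplicates, ['./src'], { threshold: 0.9 })
// Here a stand-in scanner simply echoes the arguments it receives.
const fakeScan = async (paths, size, thr) => ({ paths, size, thr });

scanWithOptions(fakeScan, ['./src'], { threshold: 0.9 })
  .then(args => console.log(args)); // { paths: [ './src' ], size: 50, thr: 0.9 }
```

Declaring the wrapper `async` means validation errors surface as promise rejections, matching how errors from the async scan itself are reported.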
`findDuplicatesSync(paths, minBlockSize?, threshold?)` synchronously scans for duplicate code. It blocks the event loop, so use it sparingly.

Parameters: same as `findDuplicates`.

Returns: `Report`
Example:
```javascript
const report = findDuplicatesSync(['./src'], 50, 0.85);
```
Returns the library version string.
Returns: string
TypeScript type definitions are generated automatically:

```typescript
import { findDuplicates, Report, DuplicateMatch } from 'polydup';

const report: Report = await findDuplicates(['./src']);
report.duplicates.forEach((dup: DuplicateMatch) => {
  console.log(`${dup.file1} ↔️ ${dup.file2}`);
});
```
The returned objects have the following shapes:

```typescript
interface Report {
  filesScanned: number;
  functionsAnalyzed: number;
  duplicates: DuplicateMatch[];
  stats: ScanStats;
}

interface DuplicateMatch {
  file1: string;
  file2: string;
  startLine1: number;
  startLine2: number;
  length: number;     // Block size in tokens
  similarity: number; // 0.0 - 1.0
  hash: string;       // Hash signature
}

interface ScanStats {
  totalLines: number;
  totalTokens: number;
  uniqueHashes: number;
  durationMs: number;
}
```
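As a sketch of post-processing a `Report` using only the fields listed above, the hypothetical `summarize` helper below counts how many matches each file takes part in and averages `similarity`. The sample report object is made up for illustration and does not come from a real scan.

```javascript
// Hypothetical helper: summarize a Report using only its documented fields.
function summarize(report) {
  const perFile = new Map();
  let similaritySum = 0;
  for (const dup of report.duplicates) {
    // A match involves two locations; count it against both files
    // (only once if both sides are in the same file).
    for (const file of new Set([dup.file1, dup.file2])) {
      perFile.set(file, (perFile.get(file) || 0) + 1);
    }
    similaritySum += dup.similarity;
  }
  const count = report.duplicates.length;
  return {
    filesScanned: report.filesScanned,
    duplicateCount: count,
    averageSimilarity: count === 0 ? 0 : similaritySum / count,
    // Files sorted by how many matches they take part in, most first.
    hotspots: [...perFile.entries()].sort((a, b) => b[1] - a[1]),
  };
}

// Made-up sample data in the Report shape (values are illustrative only).
const sample = {
  filesScanned: 3,
  functionsAnalyzed: 12,
  duplicates: [
    { file1: 'src/a.js', file2: 'src/b.js', startLine1: 10, startLine2: 42, length: 60, similarity: 0.75, hash: 'h1' },
    { file1: 'src/a.js', file2: 'lib/c.py', startLine1: 80, startLine2: 5, length: 55, similarity: 1.0, hash: 'h2' },
  ],
  stats: { totalLines: 900, totalTokens: 7000, uniqueHashes: 310, durationMs: 12 },
};

const summary = summarize(sample);
console.log(summary.averageSimilarity); // 0.875
console.log(summary.hotspots[0]); // [ 'src/a.js', 2 ]
```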
Usage tips:
1. Use the async API: always prefer `findDuplicates()` over `findDuplicatesSync()` to avoid blocking the event loop.
2. Adjust the window size: a smaller `minBlockSize` finds more matches but may include false positives.
3. Filter results: post-process the report to filter duplicates by file pattern or directory.
4. Run scans in parallel: use `Promise.all` for multiple independent scans.
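For example, scanning two projects in parallel and then looking for duplicates shared between them:

```javascript
const { findDuplicates } = require('polydup');

async function analyzeCrossProject() {
  // Scan the two projects independently and in parallel
  const [frontend, backend] = await Promise.all([
    findDuplicates(['./frontend/src'], 40, 0.9),
    findDuplicates(['./backend/src'], 40, 0.9),
  ]);
  console.log('Frontend duplicates:', frontend.duplicates.length);
  console.log('Backend duplicates:', backend.duplicates.length);

  // Find cross-project duplicates: scan both trees together, then keep
  // matches that pair a frontend file with a backend file, in either order
  const allPaths = ['./frontend', './backend'];
  const crossProject = await findDuplicates(allPaths, 50, 0.95);
  const crossDuplicates = crossProject.duplicates.filter(d =>
    (d.file1.includes('frontend') && d.file2.includes('backend')) ||
    (d.file1.includes('backend') && d.file2.includes('frontend'))
  );
  console.log('Cross-project duplicates:', crossDuplicates.length);
}

analyzeCrossProject().catch(err => console.error('Scan failed:', err));
```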
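For the "filter results" tip, a path-prefix check is stricter than a substring test such as `includes('frontend')`, which also matches names like `frontend-legacy`. The helpers below (`isUnder`, `crossDirectory`, `excludeFiles`) are hypothetical and assume the `file1`/`file2` paths in a report are absolute or relative to the current working directory. The sample matches are made up and trimmed to the fields the helpers use.

```javascript
const path = require('path');

// True when `file` lies inside directory `dir` (both resolved against the cwd).
function isUnder(file, dir) {
  const rel = path.relative(path.resolve(dir), path.resolve(file));
  return rel !== '' && rel.split(path.sep)[0] !== '..' && !path.isAbsolute(rel);
}

// Keep matches with one side under dirA and the other under dirB, in either order.
function crossDirectory(duplicates, dirA, dirB) {
  return duplicates.filter(d =>
    (isUnder(d.file1, dirA) && isUnder(d.file2, dirB)) ||
    (isUnder(d.file1, dirB) && isUnder(d.file2, dirA))
  );
}

// Drop matches where either file matches `pattern` (e.g. tests or generated code).
// Use a regex without the `g` flag: `test()` on a global regex is stateful.
function excludeFiles(duplicates, pattern) {
  return duplicates.filter(d => !pattern.test(d.file1) && !pattern.test(d.file2));
}

const dups = [
  { file1: 'backend/src/util.js', file2: 'frontend/src/util.js', similarity: 0.97 },
  { file1: 'frontend/src/a.js', file2: 'frontend/src/a.test.js', similarity: 0.9 },
];
console.log(crossDirectory(dups, './frontend', './backend').length); // 1
console.log(excludeFiles(dups, /\.test\.js$/).length); // 1
```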
To build and test from a clone of the repository:

```bash
cd crates/polydup-node
npm install
npm run build
npm test
```
Type definitions are auto-generated during the build:

```bash
npm run typegen
```

This creates `index.d.ts` with TypeScript definitions for all exported functions.
Supported platforms:
- macOS (Intel & Apple Silicon)
- Linux (x64 & ARM64)
- Windows (x64)
License: MIT

Repository: https://github.com/wiesnerbernard/polydup