Longest Repeated Strings
========================
Finds duplicated text and generates a report about the longest substrings or
most frequent words in supplied text, weighted by how much space the string
takes up overall (length * occurrences).
> You supply input text or files. It returns raw data or a text report.
Try an online demo
(This module was designed to analyze JavaScript code for refactoring opportunities in a Gulp task.)
```bash
npm install longestrepeatedstrings -S
```
or
```bash
yarn add longestrepeatedstrings
```
```javascript
var LRS = require('longestrepeatedstrings');
```
### Analyzing a single text
You can analyze a single text by using the text function to find the longest repeated substrings:
```javascript
const text = 'Your text content goes here';
const results = LRS.text(text, { maxRes: 20, minLen: 8 });
console.log(results);
```
Parameters:
- text (String): The input text to analyze.
- opts (Object, optional): A configuration object with the following properties:
  - maxRes (Number, default: 50): The maximum number of results to return. Restricts the final list to the highest-scoring results and does not speed up processing.
  - minLen (Number, default: 4): The minimum length of substrings to consider.
  - maxLen (Number, default: 40): The maximum length of substrings to consider.
  - minOcc (Number, default: 2): The minimum number of occurrences a substring must have to be included.
  - penalty (Number, default: 0): Per-occurrence score penalty, helps order results for deduplication. Setting it to a value such as minLen, for example, will cause longer substrings to appear earlier in the results. A negative penalty will favor more frequent substrings.
  - split (Array, default: [' ', ',', '.', '\n']): Splits input after the specified strings. If you are not using the words and clean options, setting this up properly for the expected input is key to making this module effective.
  - break (Array, default: []): Splits input on these strings and does not include them in matches. Can be used to concatenate an array of texts with a special character.
  - escSafe (Boolean, default: true): Takes extra care around escaped characters. May as well leave this on.
  - words (Boolean, default: true): If true, matches only whole words.
  - clean (Boolean, default: false): If true, strips all symbols from input.
  - trim (Boolean, default: true): If true, trims white space from results.
  - omit (Array, default: []): An array of substrings to omit from the results. Can be used to ignore accepted long/frequent words.
Returns: An array of objects containing the repeated substrings, their count, and a score for each.
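To make the length and count options above concrete, here is a deliberately naive sketch of repeated-substring counting. It is an illustration only and not the library's algorithm: it ignores `split`, `break`, `words` and the other options, it counts overlapping occurrences, and the result field names (`substring`, `count`, `score`) are assumptions rather than the documented output format.

```javascript
// Naive illustration only: NOT the library's algorithm or its result format.
// The field names substring/count/score are assumptions for this sketch.
function naiveRepeats(text, { minLen = 4, maxLen = 40, minOcc = 2, maxRes = 50 } = {}) {
  const counts = new Map();
  // Count every substring whose length lies between minLen and maxLen.
  for (let len = minLen; len <= maxLen; len++) {
    for (let i = 0; i + len <= text.length; i++) {
      const sub = text.slice(i, i + len);
      counts.set(sub, (counts.get(sub) || 0) + 1);
    }
  }
  return [...counts]
    .filter(([, count]) => count >= minOcc) // keep only repeated substrings
    .map(([substring, count]) => ({ substring, count, score: substring.length * count }))
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, maxRes); // trimming happens last, so it does not save any work
}

console.log(naiveRepeats('abcdabcdxx', { minLen: 4, maxLen: 4 }));
// → [ { substring: 'abcd', count: 2, score: 8 } ]
```

Note that `maxRes` is applied only after everything has been counted and sorted, which mirrors the statement above that it does not speed up processing.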
### Analyzing multiple files
You can analyze multiple files by using the files function. This will read the contents of the files and find repeated substrings in each one.
```javascript
const files = ['file1.txt', 'file2.txt'];
const opts = { maxRes: 20, minLen: 8 }; // Same options as the text function
const results = LRS.files(files, opts);
console.log(results);
```
Parameters:
- files (Array): An array of file paths to analyze.
- opts (Object, optional): Same options as in the text function.
Returns: An object where the keys are file names and the values are the repeated substrings found in each file.
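Because the files function reports on each file separately, looking for repeats across several texts needs a different approach. The description of the break option suggests joining texts with a special character, which could be sketched as below. The helper `joinForAnalysis` and the choice of the NUL character as separator are hypothetical and not part of the library; how the library treats the separator in practice should be checked against its actual behavior.

```javascript
// Hypothetical helper, not part of the library.
// Assumption: the separator never occurs in any of the input texts.
const SEPARATOR = '\u0000';

function joinForAnalysis(texts, separator = SEPARATOR) {
  for (const t of texts) {
    if (t.includes(separator)) {
      throw new Error('Separator occurs in an input text; choose another one');
    }
  }
  // Pass the same separator as a break string so matches cannot span two texts.
  return { text: texts.join(separator), opts: { break: [separator] } };
}

const joined = joinForAnalysis(['first document', 'second document']);
// With the package installed, the joined text could then be analyzed as:
// const results = LRS.text(joined.text, joined.opts);
```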
### Generating reports
#### File Analysis Report
```javascript
const report = LRS.filesReport(results, 1); // Pass 1 to log to console
console.log(report);
```
Parameters:
- results (Object): The results returned by the files function.
- out (Number, optional, default: 0): If set to 1, the report is also logged to the console.
- chars (Object, optional): A configuration object with the following properties:
  - delim (String, default: a non-ASCII symbol): Character/s to insert between each result.
  - open (String, default: a non-ASCII symbol): Character/s to insert before the repeat count.
  - close (String, default: a non-ASCII symbol): Character/s to insert after the repeat count.
Returns: A text report summarizing the repeated substrings found in each file.
#### Text Analysis Report
```javascript
const report = LRS.textReport(results, 1); // Pass 1 to log to console
console.log(report);
```
Parameters:
- results (Array): The results returned by the text function.
- out (Number, optional, default: 0): If set to 1, the report will be logged to the console too.
- chars (Object, optional): Same options as in the filesReport function.
Returns: A list of repeated substrings with their occurrence counts.
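As a rough picture of how the delim, open and close characters might shape a report, the sketch below formats a list of results into one line. It is an assumption about the layout, not the library's implementation: the function `formatReport`, the result fields `substring` and `count`, and the default characters used here are all placeholders.

```javascript
// Illustration only: the real report layout and default characters may differ.
function formatReport(results, { delim = ', ', open = ' (', close = ')' } = {}) {
  return results
    // open goes before the repeat count, close goes after it
    .map(({ substring, count }) => `${substring}${open}${count}${close}`)
    // delim goes between results
    .join(delim);
}

const sample = [
  { substring: 'function', count: 12 },
  { substring: 'return', count: 9 },
];

console.log(formatReport(sample)); // function (12), return (9)
console.log(formatReport(sample, { delim: ' | ', open: '[', close: ']' })); // function[12] | return[9]
```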
### Example workflow
1. Analyze either a single text or multiple files:
```javascript
const text = 'This is an example text with repeated substrings';
const results = LRS.text(text);
```
or
```javascript
const files = ['file1.txt', 'file2.txt'];
const results = LRS.files(files);
```
2. Then generate a report with the matching report function:
```javascript
// For results from LRS.text, call LRS.textReport(results, 1) instead.
const report = LRS.filesReport(results, 1); // Logs the report to console
```
### Notes
- Results are sorted by a score, which is calculated based on the length of the substring and the number of occurrences.
- This package is used in JCrush, a JavaScript code deduplicator.
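The interaction between score and the penalty option can be sketched as follows. The formula `(length - penalty) * count` is an assumption inferred from the descriptions in this README (length * occurrences, with a per-occurrence penalty); it is not taken from the library source, and the names `score` and `rank` are hypothetical.

```javascript
// Assumed formula, inferred from the option descriptions; not the library's code.
function score(length, count, penalty = 0) {
  return (length - penalty) * count;
}

// Returns a new array sorted by descending score.
function rank(candidates, penalty = 0) {
  return candidates
    .map((c) => ({ ...c, score: score(c.substring.length, c.count, penalty) }))
    .sort((a, b) => b.score - a.score);
}

const candidates = [
  { substring: 'a fairly long phrase', count: 2 }, // 20 characters
  { substring: 'short', count: 8 }, // 5 characters
];

// With no penalty both candidates score 40.
console.log(rank(candidates, 0).map((c) => c.score)); // [ 40, 40 ]
// A positive penalty favors the longer substring (32 vs 8).
console.log(rank(candidates, 4)[0].substring); // a fairly long phrase
// A negative penalty favors the more frequent substring (72 vs 48).
console.log(rank(candidates, -4)[0].substring); // short
```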
## Gulp usage
In your gulpfile.mjs, use Longest Repeated Strings as a Gulp plugin:
#### Step 1: Import Longest Repeated Strings
```javascript
import LRS from 'longestrepeatedstrings';
```
#### Step 2: Create a Gulp Task for Longest Repeated Strings
```javascript
var analyzeStrings = true;
gulp.task('analyze', function (done) {
  if (analyzeStrings) {
    LRS.filesReport(LRS.files(['./script.min.js', './styles.min.css', './index.html'], {
      clean: 1, words: 1,
      omit: [
        // This is a list of words that we just accept we've used a lot in the
        // content, and we don't need to see them appear in repeated-strings
        // reports. (Supply all in lower case.)
        'consciousness', 'enlightenment', 'ephemeral', 'watching', 'observing',
        'communication', 'inspiring', 'realizing', 'uplifting', 'illusion',
      ],
    }), 1, {delim: ", "});
    analyzeStrings = false;
  }
  setTimeout(() => {analyzeStrings = true}, 1000 * 60 * 60); // Only run once an hour.
  done(); // Signal completion
});
```
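One possible alternative to the flag-and-timer guard in the task above is to compare timestamps, which leaves no pending timer behind. This is a sketch under that assumption and not part of the library or of Gulp; the helper name `makeThrottle` is hypothetical.

```javascript
// Hypothetical helper: returns a function that answers true at most once
// per intervalMs. The clock is injectable so the behavior can be tested.
function makeThrottle(intervalMs, now = Date.now) {
  let lastRun = -Infinity; // the first call is always allowed
  return function shouldRun() {
    const t = now();
    if (t - lastRun < intervalMs) return false;
    lastRun = t;
    return true;
  };
}

const ONE_HOUR = 1000 * 60 * 60;
const shouldAnalyze = makeThrottle(ONE_HOUR);

// Inside the Gulp task, the guard would then read:
// if (shouldAnalyze()) { /* call LRS.filesReport(LRS.files(...), 1) here */ }
```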
#### Step 3: Run Longest Repeated Strings After Minification
To run Longest Repeated Strings after your minification tasks, add Longest Repeated Strings in series after other tasks, such as in this example:
```javascript
gulp.task('default', gulp.series(
  gulp.parallel('minify-css', 'minify-js', 'minify-html'), // Run your minification tasks first
  'analyze' // Then run LRS
));
```