A multi-target library for ensuring all links in a file are alive
`npm install repo-link-check`

Want to ensure that all the links in your git-based website are alive? This
project may help!
This script searches through the source file content of either a directory or a
git branch's difference from main. This means it's possible to both verify the
life of every link in the repo and perform much smaller checks on branches to
quickly ensure all new links are valid.
Install our NPM package, `repo-link-check`, with the Node package manager of
your choice. This package contains a binary entry of the same name, allowing you
to run `yarn repo-link-check` or `npm run repo-link-check`. From there, it's
recommended to use scripts in `package.json` to define different pre-configured
runs.
For example:
```json
"link-check": "repo-link-check -c config/link-check/config.yml -s filesystem -u",
"link-check-diff": "repo-link-check -c config/link-check/config.yml",
"link-check-dev-server": "repo-link-check -c config/link-check/config.yml -r http://localhost:3000",
"link-check-exclude": "repo-link-check -c config/link-check/config.yml -s filesystem -u only"
```
This setup checks all files in the project-relative directory `/content` with
`.css`, `.md`, and `.json` extensions for links, resolving root-relative links
to be relative to `https://www.example.com` and excluding files based on the
lines in `.config/exclude-files` and `.config.exclude-links`.
While not an actual GitHub Action, this project provides reusable GitHub Actions
workflows that use the CLI version of link-check to generate a report. Check
them out in `.github/workflows`.
This application is configured primarily through a YAML configuration file whose
path is specified with the --configFile (or -c) option. Other options that
override the file can be specified with flags.
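
The precedence between the configuration file and flags can be pictured as a simple merge. The following is a minimal sketch, not the package's actual code: `resolveOptions` is a hypothetical helper, and it assumes that flags the user did not pass arrive as `undefined`.

```javascript
// Hypothetical helper illustrating "flags override the file".
// Not part of repo-link-check's API.
function resolveOptions(fileOptions, cliFlags) {
  // Assumption: flags that were not passed are undefined and must not
  // overwrite values that came from the configuration file.
  const passedFlags = Object.fromEntries(
    Object.entries(cliFlags).filter(([, value]) => value !== undefined)
  );
  return { ...fileOptions, ...passedFlags };
}

const fromFile = { rootURL: "https://dvc.org", failsOnly: false };
const fromFlags = { rootURL: "http://localhost:3000", failsOnly: undefined };

// rootURL comes from the flag, failsOnly stays as set in the file.
console.log(resolveOptions(fromFile, fromFlags));
```
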
For example, this is what a configuration file could look like:
```yml
rootURL: https://dvc.org
fileIncludePatterns: '{.github,content,src}/**/!(*.test).{css,js,jsx,md,tsx,ts,json}'
fileExcludePatternFile: config/link-check/excluded-files.yml
linkExcludePatternFile: config/link-check/excluded-links.yml
linkOptions:
  '*.wikipedia.org':
    minTime: 2000
    maxConcurrent: 1
  '(*.)?github.com':
    minTime: 1000
    maxConcurrent: 1
```
#### configFile: string
When set by a runner, Link Check will read this path relative to the root of the
repo for a configuration file, either in JSON or YAML depending on the
extension. Both the CLI and GHA runners can do this, which is particularly
useful for sharing patterns between the two.
#### diff: boolean
When true, uses the output of a git diff between the current working tree and
`origin/main` as input, as opposed to the default behavior of reading the
filesystem. In effect, this mode checks only the links that would be new to
`main` if the current state of the branch were merged, so it finds no links
when run on an up-to-date `main`.
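
To make the idea of "links that would be new to main" concrete, here is a rough sketch that pulls links from the added lines of a unified diff. It is an illustration only: `newLinksFromDiff` and its simple URL pattern are hypothetical, and the package's real diff handling may differ.

```javascript
// Hypothetical sketch: collect links that appear only on added lines of a
// unified diff. This is not repo-link-check's actual parser.
function newLinksFromDiff(diffText) {
  const linkPattern = /https?:\/\/[^\s)>"']+/g; // deliberately simple matcher
  return diffText
    .split("\n")
    // Added lines start with "+"; "+++" marks a file header. An added line
    // whose own text begins with "++" would be skipped by this simple check.
    .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
    .flatMap((line) => line.match(linkPattern) ?? []);
}

const exampleDiff = [
  "--- a/content/docs/index.md",
  "+++ b/content/docs/index.md",
  "@@ -1,2 +1,2 @@",
  " Existing text with https://example.com/old",
  "+See https://example.com/new for details.",
  "-Removed https://example.com/gone",
].join("\n");

// Only the link on the added line is reported.
console.log(newLinksFromDiff(exampleDiff));
```
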
#### rootURL: string
This string is used as the base for root-relative links (those that start with
'/'). It's useful for specifying a deploy preview or local server, particularly
from GitHub Actions.
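
As a rough illustration of what "base for root-relative links" means, the sketch below resolves such links with the standard `URL` constructor. `resolveLink` is a hypothetical helper, not the package's API, and leaving protocol-relative links (`//host/...`) untouched is an assumption made for this example.

```javascript
// Hypothetical helper showing how a root-relative link can be resolved
// against rootURL. Not part of repo-link-check's API.
function resolveLink(link, rootURL) {
  // Assumption: only links starting with a single "/" are rebased;
  // protocol-relative links ("//host/path") are returned unchanged.
  if (link.startsWith("/") && !link.startsWith("//")) {
    return new URL(link, rootURL).href;
  }
  return link;
}

console.log(resolveLink("/doc/start", "http://localhost:3000"));
console.log(resolveLink("https://example.com/page", "http://localhost:3000"));
```

Pointing `rootURL` at a deploy preview or a local dev server means the same source links are checked against that environment instead of production.
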
#### linkIncludePatterns: string[]?
When provided, links to check will be limited to those that pass a micromatch
test with this option as the pattern. Otherwise, all links will be used.
#### linkExcludePatterns: string[]?
When provided, links that pass a micromatch test with this option as the
pattern will show up on the report, passing with no test necessary and marked as
excluded.
Exclusions take precedence over inclusions.
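
The interplay between the two link pattern options can be sketched as follows. This is a simplified model rather than the real implementation: `globToRegExp` is a hypothetical stand-in that only understands `*`, whereas the package uses micromatch, and the labels `"excluded"`, `"ignored"`, and `"checked"` are names chosen for this example.

```javascript
// Simplified stand-in for micromatch: "*" matches any run of characters.
// Real micromatch patterns support far more syntax than this.
function globToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Hypothetical classifier showing that exclusions win over inclusions.
function classifyLink(link, { include = [], exclude = [] } = {}) {
  const matchesAny = (patterns) =>
    patterns.some((pattern) => globToRegExp(pattern).test(link));

  if (matchesAny(exclude)) return "excluded"; // reported, but never requested
  if (include.length > 0 && !matchesAny(include)) return "ignored";
  return "checked";
}

const patterns = {
  include: ["https://example.com/*"],
  exclude: ["*/private/*"],
};

console.log(classifyLink("https://example.com/docs", patterns));
console.log(classifyLink("https://example.com/private/x", patterns));
console.log(classifyLink("https://other.org/", patterns));
```
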
#### fileIncludePatterns: string[]?
When provided, files to check links in will be limited to those whose filenames
pass a micromatch test with this option as the pattern. Otherwise, all files
from the content source will be used.
#### fileExcludePatterns: string[]?
When provided, files whose filenames match a micromatch check with this option
as its pattern will be completely excluded from checks and reports.
Exclusions take precedence over inclusions.
#### Pattern file options
These four sister options mirror the `Patterns` variants, but instead take paths
to files which are top-level arrays in YAML or JSON (the example configuration
above uses `fileExcludePatternFile` and `linkExcludePatternFile`). The parsed
arrays are used alongside the ones provided in the related options.
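
A minimal sketch of how pattern files might be combined with inline patterns is shown below. Both helpers are hypothetical, and JSON text is used so the example needs no dependencies; a YAML pattern file would need a YAML parser instead.

```javascript
// Hypothetical sketch: parse a pattern file and insist on a top-level array.
function loadPatternArray(fileText) {
  const parsed = JSON.parse(fileText);
  if (!Array.isArray(parsed)) {
    throw new TypeError("A pattern file must contain a top-level array");
  }
  return parsed;
}

// Patterns from files are used alongside the inline ones, not instead of them.
function combinePatterns(inlinePatterns = [], patternFileTexts = []) {
  return [
    ...inlinePatterns,
    ...patternFileTexts.flatMap((text) => loadPatternArray(text)),
  ];
}

const fileText = '["https://example.com/private/*", "*.example.org"]';
console.log(combinePatterns(["http://localhost:*"], [fileText]));
```
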
#### dryRun: boolean
When this option is true, no link checks will actually be run. Useful for
debugging link patterns, as excluded links will have a description distinct from
those stopped by the dry run alone.
#### unusedPatternsOnly: boolean
If true, Link Check will use dryRun, report unused patterns, and then exit.
#### output: (string | string[])?
Selects the output strategy to use. Both runners can use "consoleLog", and the
GitHub Action has a "checksAction" mode to generate output for
LouisBRunner/checks-action. Can accept multiple strings to use multiple output
strategies.
#### failsOnly: boolean
When true, only log/report failed link checks. Useful to get around GitHub
Actions' character limit.
Disabled by default on CLI, enabled by default on GitHub Actions.
#### verbose: boolean
When true, the application will console.log the parsed options object before
running.
#### minTime: number
The minimum time, in milliseconds, to wait between two requests to the same
hostname. Defaults to 400.
#### maxConcurrent: number
The maximum number of concurrent requests allowed per hostname. Defaults to 1.
#### userAgent: string
When specified, will use this string as the user-agent header in link check
requests.
Defaults to
`Mozilla/5.0 (compatible; Iterative/link-check; +https://github.com/iterative/link-check)`.
#### linkOptions: Map
This object determines settings that will be applied for each hostname. The keys
will be tried as a micromatch pattern against each link's hostname, and the
object at the first match will have its keys override the defaults for that
instance.
By default, each hostname is allowed one concurrent connection and at least
400ms minimum time between each call per hostname. Sites with more aggressive
429 responses may require a larger minTime, but the defaults handle the majority
of sites well.
This setting can only be defined in an options file.
Currently, the only settings here are minTime and maxConcurrent.
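
The first-match lookup described above can be sketched like this. It is an approximation under stated assumptions: `optionsForLink` is a hypothetical helper, and `globToRegExp` is a simplified stand-in for micromatch that only understands `*`, so it cannot handle patterns such as `(*.)?github.com`.

```javascript
// Defaults taken from the minTime and maxConcurrent descriptions.
const DEFAULTS = { minTime: 400, maxConcurrent: 1 };

// Simplified stand-in for micromatch: "*" matches any run of characters.
function globToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Hypothetical helper: the first pattern matching the link's hostname wins,
// and its keys override the defaults.
function optionsForLink(link, linkOptions = {}) {
  const { hostname } = new URL(link);
  for (const [pattern, overrides] of Object.entries(linkOptions)) {
    if (globToRegExp(pattern).test(hostname)) {
      return { ...DEFAULTS, ...overrides };
    }
  }
  return { ...DEFAULTS };
}

const linkOptions = {
  "*.wikipedia.org": { minTime: 2000 },
  "github.com": { minTime: 1000, maxConcurrent: 1 },
};

console.log(optionsForLink("https://en.wikipedia.org/wiki/URL", linkOptions));
console.log(optionsForLink("https://example.com/", linkOptions));
```
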
Options from CLI flags can be used to override the config file when necessary.
To specify multiple patterns or pattern files, use the relevant flag multiple
times. Use `-h` to get this help output:

```
Usage: repo-link-check [options]

Options:
  -c, --configFile
  -r, --rootURL
  -o, --output
  -d, --diff                  Use git diff from origin/main as a source instead of the whole filesystem.
      --dryRun                Skip checking parsed links and report them as skipped
  -u, --unusedPatternsOnly    Do a dry run and exit after printing unused patterns
  -f, --failsOnly             Only report failing links
  -v, --verbose               Log fully resolved options
  -li, --linkIncludePatterns
  -le, --linkExcludePatterns
  -fi, --fileIncludePatterns
  -fe, --fileExcludePatterns
  -h, --help                  display help for command

To specify multiple patterns, use the relevant flag multiple times.
```
In its current state, most of this project is a standard node CLI package
published on NPM. The reusable GitHub Actions in .github/workflows deploy
through GitHub, but are contributed to the same as any other code.
To manually test the source, build it with `yarn build` and then run
`node dist/cli.js` with whatever flags you would otherwise pass to
`repo-link-check`.