Pagean is a web page analysis tool designed to automate tests requiring web
pages to be loaded in a browser window (for example 404 error loading an
external resource, page renders with horizontal scrollbars). The specific tests
are outlined below, but are all general tests that do not include any
page-specific logic.
Install Pagean globally (as shown below), or locally, via
npm.
```
npm install -g pagean
```
Pagean runs as a command line tool and is executed as follows:
```
Installed globally:
  > pagean [options]
Installed locally:
  > npx pagean [options]
Options:
  -V, --version  output the version number
  -c, --config
  -h, --help     display help for command
```
Pagean requires a configuration file, which can be specified via the CLI
as detailed previously, or use the default file .pageanrc.json in the project
root. This file provides the URLs to be tested and options to configure the
tests and reports. Details on the available tests and the configuration file
format are provided below.
The tests use Puppeteer to launch
a headless Chrome browser. The URLs defined in the configuration file are each
loaded once, and after page load the applicable tests are executed. Test
results are passed or failed, but can be configured to report warning
instead of failure. Only a failed test causes the test process to fail and
exit with an error code (a warning does not). If a page URL fails to load,
it is retried up to two additional times and if unsuccessful the URL is
logged as a page error with the error message.
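The relationship between test results and the process exit code can be sketched as follows. This is a minimal illustration, not Pagean's actual implementation: the function names `resultFor` and `exitCodeFor` are hypothetical, and the exit code value of 1 is an assumption (the text only states that an error code is returned).

```javascript
// Hypothetical helpers illustrating the result rules described above.
// failWarn is the per-test setting that downgrades a failure to a warning.
const resultFor = (testPassed, failWarn) => {
  if (testPassed) return 'passed';
  return failWarn ? 'warning' : 'failed';
};

// Only a 'failed' result makes the process exit with an error code.
// The value 1 is an assumption for illustration.
const exitCodeFor = (results) => (results.includes('failed') ? 1 : 0);

const results = [resultFor(true, false), resultFor(false, true)];
console.log(results, exitCodeFor(results));
```

Here the second test fails but is configured to warn, so the run still exits cleanly.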
The broken link test checks for broken links on the page. It checks any <a>
tag on the page with href pointing to another location on the current page or
another page (that is, only http(s) or file protocols).
- For links within the page, this test checks for existence of the element on
  the page, passing if the element exists and failing otherwise (and passing
  for cases that are always valid, for example # or #top for the current
  page). It doesn't check the visibility of the element. Failing tests return a
  response of "#element Not Found" (where #element identifies the specific
  element).
- For links to other pages, the test tries to confirm as efficiently as
  possible whether the target link is valid. It first makes a HEAD request for
  that URL and checks the response. If an erroneous response is returned
  (>= 400 with no execution error) and not code 429 (Too Many Requests), the
  request is retried with a GET request. The test passes for HTTP responses
  < 400 and fails otherwise (if HTTP response is >= 400 or another error
  occurs).
  - This can result in false failure indications, specifically for file:
    links (404 or ECONNREFUSED) or where the browser passes a domain
    identity with the request (page loads when tested, but 401 response for
    links to that page). For these cases, or other false failures, the test
    configuration allows a Boolean checkWithBrowser option that instead
    checks links by loading the target in the browser (via puppeteer). Note
    this can increase test execution time, in some cases substantially, due to
    the time to open a new browser tab plus load the page and all assets.
  - Note that file: links can only be tested with the checkWithBrowser
    option.
- If the link to another page includes a hash it's removed prior to checking.
  The test in this case is confirming a valid link, not that the element
  exists, which is only done for the current page.
- The test configuration allows an ignoredLinks array listing link URLs to
  ignore for this test. Note this only applies to links to other pages, not
  links within the page, which are always checked.
- To optimize performance, link test results are cached and those links aren't
  re-tested for the entire test run (across all tested URLs). The test
  configuration allows a Boolean ignoreDuplicates option that can be set to
  false to bypass this behavior and re-test all links. The results for any
  failed links are included in the reports in any case.
For any failing test, the data array in the test report includes the original
URL and the response code or error as shown below.
```json
[
  {
    "href": "https://about.gitlab.com/not-found",
    "status": 404
  },
  {
    "href": "http://localhost:3000/brokenLinks.html#notlinked",
    "status": "#notlinked Not Found"
  },
  {
    "href": "https://this.url.does.not.exist/",
    "status": "ENOTFOUND"
  }
]
```
Note: this test checks all links on the page, and doesn't respect mechanisms
intended to limit web crawlers such as robots.txt or noindex tags.
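The decision flow for links to other pages (strip the hash, try HEAD, retry with GET unless the response is 429, cache the result) can be sketched as below. This is an illustration under stated assumptions, not Pagean's code: `checkExternalLink`, `stripHash`, and the injected `request` function are hypothetical names, and `request` is a synchronous stand-in for what would be an asynchronous HTTP call in practice.

```javascript
// Sketch only. `request(method, url)` is a hypothetical stand-in that returns
// an HTTP status code (number) or an error code string such as 'ENOTFOUND'.
const cache = new Map();

// Remove the hash so only the target page itself is checked.
const stripHash = (href) => {
  const url = new URL(href);
  url.hash = '';
  return url.href;
};

const checkExternalLink = (href, request, ignoreDuplicates = true) => {
  const target = stripHash(href);
  // Reuse an earlier result unless duplicate checking is requested.
  if (ignoreDuplicates && cache.has(target)) return cache.get(target);

  let status = request('HEAD', target);
  // Retry with GET on an erroneous HTTP response other than 429.
  if (typeof status === 'number' && status >= 400 && status !== 429) {
    status = request('GET', target);
  }
  const passed = typeof status === 'number' && status < 400;
  const result = { href, status, passed };
  cache.set(target, result);
  return result;
};

// Example: a server that rejects HEAD but answers GET.
const calls = [];
const fakeRequest = (method) => {
  calls.push(method);
  return method === 'HEAD' ? 405 : 200;
};
console.log(checkExternalLink('https://example.com/page#section', fakeRequest));
console.log(calls);
```

In this example the HEAD request returns 405, so the link is retried with GET and passes with status 200.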
The console error test fails if any error is written to the browser console,
but is otherwise simply a subset of the console output test. This separation
allows for testing for console errors, but allowing any other console output.
The console output test fails if any output is written to the browser console.
An array is included in the report with all entries, as shown below:
```json
[
  {
    "type": "error",
    "text": "Failed to load resource: net::ERR_NAME_NOT_RESOLVED",
    "location": {
      "url": "https://this.url.does.not.exist/file.js"
    }
  }
]
```
The external script test is intended to identify any externally loaded
JavaScript files (for example loaded from a CDN) and aggregate those files so
they can undergo further analysis (for example dependency vulnerability
scanning). The test is included here since these tests load fully rendered
pages, therefore allowing the aggregation of this data for pages generated
using any language or framework. By default the test returns a warning if the
page includes any JavaScript files loaded from a different domain than the page
(although this could be overridden to fail instead via setting
failWarn: false, see the Configuration section below). These files are then
downloaded and saved in the "pagean-external-scripts" directory in the project
root. Subdirectories are created for each domain, then following the URL path.
For example, a script loaded from bootstrapcdn.com with the URL path
/bootstrap/4.5.0/js/bootstrap.min.js is saved as
./bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js. The data array in the
test report includes the original file URL and the local saved filename or
applicable error, as shown below.
```json
[
  {
    "url": "https://code.jquery.com/jquery-3.4.1.slim.min.js",
    "localFile": "pagean-external-scripts/code.jquery.com/jquery-3.4.1.slim.min.js"
  },
  {
    "url": "http://bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js",
    "error": "Request failed with status code 404"
  }
]
```
Each external script is saved only once, but is reported on any page where it's
referenced.
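The mapping from a script URL to its saved location (one subdirectory per domain, then the URL path) can be sketched as below. The names `localFileFor` and `isExternal` are hypothetical, the output directory name is taken from the report example above, and comparing hostnames is an assumed interpretation of "a different domain than the page".

```javascript
// Assumption: directory name taken from the "localFile" value in the report.
const OUTPUT_DIR = 'pagean-external-scripts';

// Build the local path: output directory, then domain, then URL path.
const localFileFor = (scriptUrl) => {
  const { hostname, pathname } = new URL(scriptUrl);
  return `${OUTPUT_DIR}/${hostname}${pathname}`;
};

// Assumption: "external" means the script hostname differs from the page's.
const isExternal = (scriptUrl, pageUrl) =>
  new URL(scriptUrl).hostname !== new URL(pageUrl).hostname;

console.log(localFileFor('https://code.jquery.com/jquery-3.4.1.slim.min.js'));
// → pagean-external-scripts/code.jquery.com/jquery-3.4.1.slim.min.js
```

The output matches the `localFile` value shown in the report example for the same URL.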
The horizontal scrollbar test fails if the rendered page has a horizontal
scrollbar. If a specific browser viewport size is desired for this test, that
can be configured in the puppeteerLaunchOptions.
The page load time test fails if the page load time (from start through the
load event) exceeds the defined threshold in the configuration file (or the
default of 2 seconds). The actual load time is included in the report. Tests
time out at twice the page load time threshold.
The rendered HTML test is intended for cases where content is dynamically
created prior to page load (that is, the load event firing). The rendered
HTML is returned and checked with
HTML Hint and the test fails if any
issues are found. An array is included in the report with all HTML Hint issues,
as shown below:
```json
[
  {
    "col": 9,
    "evidence": " ",
    "line": 6,
    "message": "The id value [ div1 ] must be unique.",
    "raw": " id=\"div1\"",
    "rule": {
      "description": "The value of id attributes must be unique.",
      "id": "id-unique",
      "link": "https://github.com/thedaviddias/HTMLHint/wiki/id-unique"
    },
    "type": "error"
  }
]
```
An htmlhintrc file can be specified in the configuration file, otherwise the
default "./.htmlhintrc" file is used (if it exists). See the Configuration
section below.
Note: this test may not find some errors in the original HTML that are
removed/resolved as the page is parsed (for example closing tags with no
opening tags).
Based on the reporters configuration, Pagean results may be displayed in the
console and saved in two reports in the project root directory (any or all of
the three):
- A JSON report named pagean-results.json.
- An HTML report named pagean-results.html.
Both reports contain:
- The time of test execution.
- A summary of the total tests and results (passed, warning, failed, and page
errors).
- The detailed test results, including the URL tested, list of tests performed
on that URL with results, and, if applicable, any relevant data associated
with the test failure (for example the console errors if the console error
test fails).
Complete reports for the example case in this project (the tests as specified
in the project .pageanrc.json file) can be found at the preceding links.
Pagean looks for a configuration file as specified via the CLI, or defaults to
a file named .pageanrc.json in the project root. If the configuration file is
not found, is not valid JSON, or doesn't contain any URLs to check, the job
fails.
Below is an example .pageanrc.json file, which is broken into seven major
properties:
- htmlhintrc: An optional path to an htmlhintrc file to be used in the
  rendered HTML test.
- project: An optional name of the project, which is included in HTML and
  JSON reports.
- puppeteerLaunchOptions: An optional set of options to pass to Puppeteer on
  launch. The complete list of available options can be found at
  https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#puppeteerlaunchoptions.
- reporters: An optional array of reporters indicating the test reports that
  should be provided. There are three possible options: cli, html, and json.
  The cli option reports all test details to the console, but the final
  results summary is always output (even with cli disabled). If reporters
  is specified, at least one reporter must be included. The default
  value, as specified below, is all three reporters enabled.
- settings: These settings enable/disable or configure tests, and are applied
  to all tests overriding the default values.
  - The shorthand notation allows easy enabling/disabling of tests. In this
    format the test name is given with a Boolean value to enable or disable the
    test. In this case any other test-specific settings use the default values.
  - The longhand version includes an object for each test. Every test includes
    two possible properties (some tests include additional settings):
    - enabled: A Boolean value to enable/disable the test (default true for
      all tests).
    - failWarn: A Boolean value causing a failed test to report a warning
      instead of failure. A warning result doesn't cause the test process to
      fail (exit with an error code). The default value for all tests is
      false except the externalScriptTest, as shown below.
The shorthand:
```json
"settings": {
  "consoleErrorTest": true
}
```
is equivalent to the longhand:
```json
"settings": {
  "consoleErrorTest": {
    "enabled": true,
    "failWarn": false
  }
}
```
- sitemap: Specify a sitemap with URLs to test. If a sitemap is specified,
  the URLs from the sitemap are added to the urls array. If a URL is in the
  urls array with settings, those settings are retained. Note that
  is currently not supported. The sitemap object can have
  the following properties:
  - url: The URL of the sitemap (required if sitemap is included). This can
    be either an actual URL or a local file.
  - find: A string to search for in sitemap URLs (for example
    https://somehere.test) (required if replace is specified).
  - replace: The string to replace the find string with (for example
    http://localhost:3000) (required if find is specified).
  - exclude: An array of strings with regular expressions to exclude URLs
    from the sitemap (for example ['\.pdf$'] to exclude any PDF files). Since
    these are string representations of regular expressions, the backslash must
    be escaped (for example \\.). Exclude is performed before find/replace,
    so uses the original URLs from the sitemap.
- urls: An array of URLs to be tested, which must contain at least one value.
  Each array entry can either be a URL string, or an object that contains a
  url string and an optional settings object. This object can contain any
  of the settings values identified previously and overrides that setting for
  testing that URL. The url string can be either an actual URL or a local
  file, as shown in the example below.
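The order of operations for the sitemap options (exclude first, then find/replace) can be sketched as below. This is an illustration only: `applySitemapOptions` is a hypothetical function, the sitemap file name and sample URLs are made up, and replacing only the first occurrence of the find string is an assumption.

```javascript
// Hypothetical sitemap settings using the property names described above.
const sitemap = {
  url: 'https://somehere.test/sitemap.xml',
  find: 'https://somehere.test',
  replace: 'http://localhost:3000',
  // In JSON or a JS string the backslash is escaped, giving the regex \.pdf$
  exclude: ['\\.pdf$'],
};

const applySitemapOptions = (sitemapUrls, { find, replace, exclude = [] }) => {
  const patterns = exclude.map((source) => new RegExp(source));
  return sitemapUrls
    // Exclude first, so patterns are matched against the original URLs.
    .filter((url) => !patterns.some((pattern) => pattern.test(url)))
    // Then rewrite the remaining URLs (first occurrence only, by assumption).
    .map((url) => (find === undefined ? url : url.replace(find, replace)));
};

console.log(applySitemapOptions(
  ['https://somehere.test/', 'https://somehere.test/guide.pdf'],
  sitemap
));
// → [ 'http://localhost:3000/' ]
```

Because exclusion runs before replacement, an exclude pattern written against the local server address would not match anything.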
The following shows all available settings, except sitemap, with the default
values.
```json
{
  "puppeteerLaunchOptions": {
    "headless": "new"
  },
  "reporters": ["cli", "html", "json"],
  "settings": {
    "brokenLinkTest": {
      "enabled": true,
      "failWarn": false,
      "checkWithBrowser": false,
      "ignoreDuplicates": true
    },
    "consoleErrorTest": {
      "enabled": true,
      "failWarn": false
    },
    "consoleOutputTest": {
      "enabled": true,
      "failWarn": false
    },
    "externalScriptTest": {
      "enabled": true,
      "failWarn": true
    },
    "horizontalScrollbarTest": {
      "enabled": true,
      "failWarn": false
    },
    "pageLoadTimeTest": {
      "enabled": true,
      "failWarn": false,
      "pageLoadTimeThreshold": 2
    },
    "renderedHtmlTest": {
      "enabled": true,
      "failWarn": false
    }
  }
}
```
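One way to picture how shorthand values, global settings, and per-URL settings combine with these defaults is the sketch below. It is not Pagean's implementation: `expand` and `resolveSettings` are hypothetical names, only a subset of the defaults is copied in, and merging property by property (rather than replacing a whole test object) is an assumption.

```javascript
// Defaults copied from the configuration listing above (subset shown).
const defaults = {
  consoleErrorTest: { enabled: true, failWarn: false },
  externalScriptTest: { enabled: true, failWarn: true },
  pageLoadTimeTest: { enabled: true, failWarn: false, pageLoadTimeThreshold: 2 },
};

// Expand shorthand (Boolean) values into the longhand object form.
const expand = (settings = {}) =>
  Object.fromEntries(
    Object.entries(settings).map(([name, value]) => [
      name,
      typeof value === 'boolean' ? { enabled: value } : value,
    ])
  );

// Later layers override earlier ones: defaults, then global, then per-URL.
const resolveSettings = (globalSettings, urlSettings) => {
  const layers = [expand(globalSettings), expand(urlSettings)];
  const resolved = {};
  for (const [name, base] of Object.entries(defaults)) {
    // Object.assign skips undefined sources, so missing layers are ignored.
    resolved[name] = Object.assign({}, base, ...layers.map((layer) => layer[name]));
  }
  return resolved;
};

console.log(resolveSettings(
  { consoleErrorTest: false },
  { pageLoadTimeTest: { pageLoadTimeThreshold: 5 } }
));
```

In this example the console error test is disabled globally through shorthand, while one URL raises its page load threshold to 5 seconds and keeps every other default.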
Numerous example config files used in the tests can be found in the
test fixtures.
Provided with the Pagean project are container images configured to run the
tests. All available image tags can be found in the
registry.gitlab.com/gitlab-ci-utils/pagean
repository.
Details on each release can be found on the
Releases page.
Note: any images in the gitlab-ci-utils/pagean/tmp repository are
temporary images used during the build process and may be deleted at any point.
In Puppeteer v19
the default cache location for installing the Chrome binary was changed from
within the project's node_modules folder to ~/.cache/puppeteer. To simplify
execution in a container, the PUPPETEER_CACHE_DIR environment variable is set
to install the Chrome binaries in /home/pptruser/.cache/puppeteer during
container build, so setting it to another value before execution can cause
errors where Puppeteer can't find the Chrome binary.
The following is an example job from a .gitlab-ci.yml file to use this image to
run Pagean against another project in GitLab CI:
```yaml
pagean:
  image: registry.gitlab.com/gitlab-ci-utils/pagean:latest
  stage: test
  script:
    - pagean
  artifacts:
    when: always
    paths:
      - pagean-results.html
      - pagean-results.json
      - pagean-external-scripts/
```
The container image shown previously includes serve and wait-on installed
globally to run a local HTTP server for testing static content. The example
job below illustrates how to use this for Pagean tests. The script starts the
server in this project's ./tests/fixtures/site directory and uses wait-on to
hold the script until the server is running and returns a valid response. The
referenced pageanrc file is the same as the project default pageanrc, but
references all test URLs from the local server.
```yaml
pagean:
  image: registry.gitlab.com/gitlab-ci-utils/pagean:latest
  stage: test
  before_script:
    # Start static server in test cases directory, discarding any console output,
    # and wait until the server is running.
    - serve ./tests/fixtures/site > /dev/null 2>&1 & wait-on http://localhost:3000
  script:
    - pagean -c static-server.pageanrc.json
  artifacts:
    when: always
    paths:
      - pagean-results.html
      - pagean-results.json
      - pagean-external-scripts/
```
A command line tool is also available to lint pageanrc files, which is executed
as follows:
```
Installed globally:
  > pageanrc-lint [options] [file] (default: "./.pageanrc.json")
Installed locally:
  > npx pageanrc-lint [options] [file] (default: "./.pageanrc.json")
Lint a pageanrc file
Options:
  -V, --version  output the version number
  -j, --json     output JSON with full details
  -h, --help     display help for command
```
The --json option outputs the JSON results to stdout in all cases for
consistency ([] if no errors found, so that it always outputs valid
JSON). Otherwise errors are output to stderr, for example:
```sh
.\tests\test-configs\cli-tests\some-test.pageanrc.json
```
In some cases, a single error might result in multiple messages based on the
options in the schema definition, especially for cases that can be either a
single value or an object with specific properties (for example the errors for in the preceding example).
Note that because of the large number of options, which are dependent on an
external project, the linting of puppeteerLaunchOptions only checks that at
least one property is provided; it doesn't check the detailed settings.