Pagean

Pagean is a web page analysis tool designed to automate tests requiring web
pages to be loaded in a browser window (for example 404 error loading an
external resource, page renders with horizontal scrollbars). The specific tests
are outlined below, but are all general tests that do not include any
page-specific logic.

Installation

Install Pagean globally (as shown below), or locally, via
npm.

```sh
npm install -g pagean
```

Usage

Pagean runs as a command line tool and is executed as follows:

```
Installed globally: > pagean [options]

Installed locally: > npx pagean [options]

Options:
  -V, --version  output the version number
  -c, --config   the path to the pagean configuration file (default: "./.pageanrc.json")
  -h, --help     display help for command
```

Pagean requires a configuration file, which can be specified via the CLI as detailed previously; otherwise it uses the default file `.pageanrc.json` in the project root. This file provides the URLs to be tested and options to configure the tests and reports. Details on the available tests and the configuration file format are provided below.

Test cases

The tests use Puppeteer to launch a headless Chrome browser. The URLs defined in the configuration file are each loaded once, and after page load the applicable tests are executed. Test results are `passed` or `failed`, but can be configured to report `warning` instead of failure. Only a `failed` test causes the test process to fail and exit with an error code (a `warning` does not). If a page URL fails to load, it is retried up to two additional times, and if still unsuccessful the URL is logged as a `page error` with the error message.
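The result and retry rules above can be sketched in a few lines. This is only an illustration under stated assumptions, not Pagean's implementation: the helper names (`reportedResult`, `exitCodeFor`, `loadWithRetry`) and the exit code value `1` are hypothetical, and `loadPage` stands in for whatever function actually loads a URL.

```javascript
// Hypothetical helpers illustrating the documented rules; names are assumptions.

// Map a raw test outcome to the reported result, honoring failWarn.
function reportedResult(passed, failWarn) {
  if (passed) return 'passed';
  return failWarn ? 'warning' : 'failed';
}

// Only a 'failed' result produces a non-zero exit code (the value 1 is assumed).
function exitCodeFor(results) {
  return results.includes('failed') ? 1 : 0;
}

// Try to load a page, retrying up to two additional times before
// recording a page error with the last error message.
async function loadWithRetry(loadPage, url, additionalRetries = 2) {
  let lastError;
  for (let attempt = 0; attempt <= additionalRetries; attempt++) {
    try {
      return { url, page: await loadPage(url) };
    } catch (error) {
      lastError = error;
    }
  }
  return { url, pageError: lastError.message };
}
```

With these rules, a run whose results are only `passed` and `warning` exits cleanly, while a single `failed` result makes the process fail.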

```json
[
  {
    "col": 9,
    "evidence": "",
    "line": 6,
    "message": "The id value [ div1 ] must be unique.",
    "raw": " id=\"div1\"",
    "rule": {
      "description": "The value of id attributes must be unique.",
      "id": "id-unique",
      "link": "https://github.com/thedaviddias/HTMLHint/wiki/id-unique"
    },
    "type": "error"
  }
]
```


An htmlhintrc file can be specified in the configuration file; otherwise the
default "./.htmlhintrc" file is used (if it exists). See the Configuration
section below.

Note: this test may not find some errors in the original HTML that are
removed/resolved as the page is parsed (for example closing tags with no
opening tags).

Reports

Based on the `reporters` configuration, Pagean results may be displayed in the console and saved in two reports in the project root directory (any or all of the three):

- A JSON report named `pagean-results.json`.
- An HTML report named `pagean-results.html`.

Both reports contain:

- The time of test execution.
- A summary of the total tests and results (passed, warning, failed, and page errors).
- The detailed test results, including the URL tested, list of tests performed on that URL with results, and, if applicable, any relevant data associated with the test failure (for example the console errors if the console error test fails).

Complete reports for the example case in this project (the tests as specified in the project `.pageanrc.json` file) can be found at the preceding links.

Configuration

Pagean looks for a configuration file as specified via the CLI, or defaults to a file named `.pageanrc.json` in the project root. If the configuration file is not found, is not valid JSON, or doesn't contain any URLs to check, the job fails.

Below is an example `.pageanrc.json` file, which is broken into seven major properties:

- `htmlhintrc`: An optional path to an htmlhintrc file to be used in the rendered HTML test.
- `project`: An optional name of the project, which is included in HTML and JSON reports.
- `puppeteerLaunchOptions`: An optional set of options to pass to Puppeteer on launch. The complete list of available options can be found at https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#puppeteerlaunchoptions.
- `reporters`: An optional array of reporters indicating the test reports that should be provided. There are three possible options: `cli`, `html`, and `json`. The `cli` option reports all test details to the console, but the final results summary is always output (even with `cli` disabled). If `reporters` is specified, at least one reporter must be included. The default value, as specified below, is all three reporters enabled.
- `settings`: These settings enable/disable or configure tests, and are applied to all tests, overriding the default values.
  - The shorthand notation allows easy enabling/disabling of tests. In this format the test name is given with a Boolean value to enable or disable the test. In this case any other test-specific settings use the default values.
  - The longhand version includes an object for each test. Every test includes two possible properties (some tests include additional settings):
    - `enabled`: A Boolean value to enable/disable the test (default `true` for all tests).
    - `failWarn`: A Boolean value causing a failed test to report a warning instead of failure. A warning result doesn't cause the test process to fail (exit with an error code). The default value for all tests is `false` except the `externalScriptTest`, as shown below.

  The shorthand:

  ```json
  "settings": {
    "consoleErrorTest": true
  }
  ```

  is equivalent to the longhand:

  ```json
  "settings": {
    "consoleErrorTest": {
      "enabled": true,
      "failWarn": false
    }
  }
  ```

- `sitemap`: Specify a sitemap with URLs to test. If a sitemap is specified, the URLs from the sitemap are added to the `urls` array. If a URL is in the `urls` array with settings, those settings are retained. Note that is currently not supported. The `sitemap` object can have the following properties:
  - `url`: The URL of the sitemap (required if `sitemap` is included). This can be either an actual URL or a local file.
  - `find`: A string to search for in sitemap URLs (for example `https://somehere.test`) (required if `replace` is specified).
  - `replace`: The string to replace the `find` string with (for example `http://localhost:3000`) (required if `find` is specified).
  - `exclude`: An array of strings with regular expressions to exclude URLs from the sitemap (for example `["\\.pdf$"]` to exclude any PDF files). Since these are string representations of regular expressions, the backslash must be escaped (for example `\\.`). Exclude is performed before find/replace, so it uses the original URLs from the sitemap.
- `urls`: An array of URLs to be tested, which must contain at least one value. Each array entry can either be a URL string, or an object that contains a `url` string and an optional `settings` object. This object can contain any of the `settings` values identified previously and overrides that setting for testing that URL. The `url` string can be either an actual URL or a local file, as shown in the example below.
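The order of sitemap processing described above (exclude first, then find/replace, then merge into `urls` while keeping existing entries) can be sketched as follows. This is a minimal sketch, not Pagean's code: the function name `mergeSitemapUrls` is hypothetical, the sitemap is assumed to be already parsed into a list of URL strings, and replacing only the first occurrence of `find` is an assumption.

```javascript
// Hypothetical helper; assumes sitemapUrls was already extracted from the sitemap.
function mergeSitemapUrls(sitemapUrls, sitemap, urls) {
  // Exclude patterns are strings, so they are compiled to regular expressions.
  const excludes = (sitemap.exclude || []).map((pattern) => new RegExp(pattern));
  const useReplace = sitemap.find !== undefined && sitemap.replace !== undefined;

  const processed = sitemapUrls
    // Exclude runs first, against the original sitemap URLs.
    .filter((url) => !excludes.some((regex) => regex.test(url)))
    // Then find/replace (first occurrence only, an assumption).
    .map((url) => (useReplace ? url.replace(sitemap.find, sitemap.replace) : url));

  // URLs already listed in `urls` are kept as-is, so their settings are retained.
  const known = new Set(urls.map((entry) => (typeof entry === 'string' ? entry : entry.url)));
  const added = [...new Set(processed)].filter((url) => !known.has(url));
  return [...urls, ...added];
}
```

For example, with `find` set to `https://somehere.test`, `replace` set to `http://localhost:3000`, and `exclude` set to `["\\.pdf$"]`, a sitemap entry ending in `.pdf` is dropped before any replacement happens, and an entry already present in `urls` with its own `settings` is not duplicated.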

The following shows all available settings, except sitemap, with the default values.

```json
{
  "puppeteerLaunchOptions": {
    "headless": "new"
  },
  "reporters": ["cli", "html", "json"],
  "settings": {
    "brokenLinkTest": {
      "enabled": true,
      "failWarn": false,
      "checkWithBrowser": false,
      "ignoreDuplicates": true
    },
    "consoleErrorTest": {
      "enabled": true,
      "failWarn": false
    },
    "consoleOutputTest": {
      "enabled": true,
      "failWarn": false
    },
    "externalScriptTest": {
      "enabled": true,
      "failWarn": true
    },
    "horizontalScrollbarTest": {
      "enabled": true,
      "failWarn": false
    },
    "pageLoadTimeTest": {
      "enabled": true,
      "failWarn": false,
      "pageLoadTimeThreshold": 2
    },
    "renderedHtmlTest": {
      "enabled": true,
      "failWarn": false
    }
  }
}
```
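To make the shorthand/longhand relationship concrete, the following sketch expands a shorthand Boolean into the equivalent longhand object using the defaults listed above. The helper name `expandSetting` is hypothetical and only two tests are included for brevity; Pagean's own handling may differ.

```javascript
// Documented defaults for two tests, copied from the listing above.
const defaults = {
  consoleErrorTest: { enabled: true, failWarn: false },
  externalScriptTest: { enabled: true, failWarn: true }
};

// Hypothetical helper: expand a shorthand Boolean into the longhand object,
// or overlay a longhand object on the defaults for that test.
function expandSetting(testName, value) {
  const base = defaults[testName];
  if (typeof value === 'boolean') {
    return { ...base, enabled: value };
  }
  return { ...base, ...value };
}
```

Here `expandSetting('consoleErrorTest', true)` yields `{ enabled: true, failWarn: false }`, matching the longhand form shown earlier, while disabling `externalScriptTest` with the shorthand keeps its default `failWarn` of `true`.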

Numerous example config files used in the tests can be found in the test fixtures.

Container images

Provided with the Pagean project are container images configured to run the tests. All available image tags can be found in the `registry.gitlab.com/gitlab-ci-utils/pagean` repository. Details on each release can be found on the Releases page.

Note: any images in the `gitlab-ci-utils/pagean/tmp` repository are temporary images used during the build process and may be deleted at any point.

In Puppeteer v19 the default cache location for installing the Chrome binary was changed from within the project's `node_modules` folder to `~/.cache/puppeteer`. To simplify execution in a container, the `PUPPETEER_CACHE_DIR` environment variable is set to install the Chrome binaries in `/home/pptruser/.cache/puppeteer` during container build, so setting it to another value before execution can cause errors where Puppeteer can't find the Chrome binary.

GitLab CI configuration

The following is an example job from a .gitlab-ci.yml file to use this image to run Pagean against another project in GitLab CI:

```yaml
pagean:
  image: registry.gitlab.com/gitlab-ci-utils/pagean:latest
  stage: test
  script:
    - pagean
  artifacts:
    when: always
    paths:
      - pagean-results.html
      - pagean-results.json
      - pagean-external-scripts/
```

The container image shown previously includes `serve` and `wait-on` installed globally to run a local HTTP server for testing static content. The example job below illustrates how to use this for Pagean tests. The script starts the server in this project's `./tests/fixtures/site` directory and uses `wait-on` to hold the script until the server is running and returns a valid response. The referenced pageanrc file is the same as the project default pageanrc, but references all test URLs from the local server.

```yaml
pagean:
  image: registry.gitlab.com/gitlab-ci-utils/pagean:latest
  stage: test
  before_script:
    # Start static server in test cases directory, discarding any console output,
    # and wait until the server is running.
    - serve ./tests/fixtures/site > /dev/null 2>&1 & wait-on http://localhost:3000
  script:
    - pagean -c static-server.pageanrc.json
  artifacts:
    when: always
    paths:
      - pagean-results.html
      - pagean-results.json
      - pagean-external-scripts/
```

Linting pageanrc files

A command line tool is also available to lint pageanrc files, which is executed as follows:

```
Installed globally: > pageanrc-lint [options] [file] (default: "./.pageanrc.json")

Installed locally: > npx pageanrc-lint [options] [file] (default: "./.pageanrc.json")

Lint a pageanrc file

Options:
  -V, --version  output the version number
  -j, --json     output JSON with full details
  -h, --help     display help for command
```

The `--json` option outputs the JSON results to stdout in all cases for consistency (`[]` if no errors found, so that it always outputs valid JSON). Otherwise errors are output to stderr, for example:

```sh
.\tests\test-configs\cli-tests\some-test.pageanrc.json
  .puppeteerLaunchOptions must NOT have fewer than 1 properties
  .reporters[0] must be equal to one of the allowed values (cli, html, json)
  .settings.consoleOutputTest must be either Boolean or object with the appropriate properties
  .settings.pageLoadTimeTest.foo must NOT contain additional properties: "foo"
  .settings.pageLoadTimeTest must be either Boolean or object with the appropriate properties
  .sitemap must use 'find' and 'replace' together
  .urls[2].settings.consoleOutputTest must be either Boolean or object with the appropriate properties
  .urls[3] must be either URL string or object with the appropriate properties
  .urls[5] must have required property 'url'
```

In some cases, a single error might result in multiple messages based on the options in the schema definition, especially for cases that can be either a single value or an object with specific properties (for example the errors for `.settings.pageLoadTimeTest` in the preceding example).

Note that because of the large number of options, which are dependent on an external project, the linting of `puppeteerLaunchOptions` only checks that at
least one property is provided; it doesn't check the detailed settings.

Test cases

For any failing test, the `data` array in the test report includes the original URL and the response code or error.

Note: this test checks all links on the page, and doesn't respect mechanisms intended to limit web crawlers such as `robots.txt` or `noindex` tags.

The console output test fails if any output is written to the browser console. An array is included in the report with all entries, as shown below:

```json
[
  {
    "type": "error",
    "text": "Failed to load resource: net::ERR_NAME_NOT_RESOLVED",
    "location": {
      "url": "https://this.url.does.not.exist/file.js"
    }
  }
]
```
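When reviewing a report, it can help to summarize the console entries rather than read them one by one. The sketch below counts entries by `type`, relying only on the entry shape shown above; the helper name `countByType` is hypothetical and is not part of Pagean.

```javascript
// Hypothetical helper: count console entries by their "type" field.
function countByType(entries) {
  const counts = new Map();
  for (const entry of entries) {
    counts.set(entry.type, (counts.get(entry.type) || 0) + 1);
  }
  return Object.fromEntries(counts);
}
```

Applied to the single-entry example above, this returns `{ error: 1 }`.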

Each external script is saved only once, but is reported on any page where it's referenced.
