# Scoop 🍨

![npm version](https://badge.fury.io/js/@harvard-lil%2Fscoop) ![JavaScript Style Guide](https://standardjs.com) ![Linting](https://github.com/harvard-lil/scoop/actions/workflows/lint.yml) ![Test suite](https://github.com/harvard-lil/scoop/actions/workflows/test.yml)

High-fidelity, browser-based, single-page web archiving library and CLI.

Use it in the terminal...

```bash
scoop "https://lil.law.harvard.edu"
```

... or in your Node.js project

```javascript
import { Scoop } from '@harvard-lil/scoop'

const capture = await Scoop.capture('https://lil.law.harvard.edu')
const wacz = await capture.toWACZ()
```

---

## Summary

- About
- Main Features
- Getting Started
- Using Scoop on the command line
- Using Scoop as a JavaScript library
- Development
- FAQ

---

## About

Scoop is a high-fidelity, browser-based web archiving capture engine for witnessing the web, built by the Harvard Library Innovation Lab.
Fine-tune this custom web capture software to create robust single-page captures of the internet with accurate and complete provenance information.
With extensive options for asset formats and inclusions, Scoop creates `.warc`, `.warc.gz` or `.wacz` files that users can store and replay using the web archive replay software of their choosing.

Scoop also comes with built-in support for the WACZ Signing and Verification specification, allowing users to cryptographically sign their captures.

More info:
- "Witnessing the web is hard: Why and how we built the Scoop web archiving capture engine 🍨"
  April 13 2023 - _lil.law.harvard.edu_
- "New Release: High Fidelity Capture Engine for Witnessing the Web 🍨"
  March 28 2023 - _blogs.harvard.edu/perma_

👆 Back to the summary

---

## Main Features

- High-fidelity, browser-based capture of single web pages with no alterations
- Highly configurable
- Optional attachments: 
  - Provenance summary
  - Screenshot
  - Extracted videos with associated subtitles and metadata
  - PDF snapshot
  - DOM snapshot
  - SSL certificates
- Support for `.warc`, `.warc.gz` and `.wacz` output formats
  - Support for the WACZ Signing and Verification specification
  - Optional preservation of _"raw"_ exchanges in WACZ files for later analysis or reprocessing _("wacz with raw exchanges")_

### Examples and screenshots

- 💾 Sample WACZ file captured with Scoop. Playback software such as replayweb.page can be used to explore this sample capture.
- 📷 Entry points
- 📷 Web Capture
- 📷 Provenance Summary
- 📷 PDF Snapshot
- 📷 Embedded videos as attachments [[1]](/.github/assets/screenshot-video-as-attachment-1.png?raw=true) [[2]](/.github/assets/screenshot-video-as-attachment-2.png?raw=true)

👆 Back to the summary

---

## Getting started

### Dependencies

Scoop requires Node.js 18+.

Other _recommended_ system-level dependencies: `curl`, `python3` (for the `--capture-video-as-attachment` option).

While the amount of resources Scoop needs depends entirely on what is being captured, at least 4GB of RAM appears to be needed for complex captures.
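
The requirements above can be checked before launching a capture. The following is a minimal sketch, not part of Scoop: the function name `checkRequirements` and its return shape are illustrative, and the thresholds simply restate the Node.js 18+ and 4GB figures from this section.

```javascript
// Thresholds restate the figures given in this section; names are illustrative.
const MIN_NODE_MAJOR = 18
const MIN_RAM_BYTES = 4 * 1024 ** 3

// Returns a list of human-readable problems (empty when everything looks fine).
function checkRequirements (nodeVersion, totalMemBytes) {
  const problems = []
  const major = Number.parseInt(String(nodeVersion).split('.')[0], 10)

  if (Number.isNaN(major) || major < MIN_NODE_MAJOR) {
    problems.push(`Node.js ${MIN_NODE_MAJOR}+ is required (found ${nodeVersion})`)
  }

  if (totalMemBytes < MIN_RAM_BYTES) {
    problems.push('Less than 4GB of RAM: complex captures may not complete')
  }

  return problems
}

// In a real script, the second argument would typically be os.totalmem()
// from Node's built-in "os" module. A fixed value is used here instead.
console.log(checkRequirements(process.versions.node, 8 * 1024 ** 3))
```

Returning a list of problems rather than throwing lets the caller decide whether a low-memory warning should block the capture.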

### Compatibility

This program has been written for UNIX-like systems and is expected to work on Linux, macOS, and Windows Subsystem for Linux.

### Installation

Scoop is available on npmjs.org and can be installed as follows:

```bash
# As a CLI
npm install -g @harvard-lil/scoop

# As a library
npm install @harvard-lil/scoop --save

# In both cases, you may need to install Playwright's dependencies:
sudo npx playwright install-deps chromium
```

Trouble installing the CLI?

- Make sure you are running Node.js 20-23 (`node -v`)
- Permissions issues are common when installing `npm` packages globally for the first time. See npm's documentation for solutions.
- On certain systems, using `install-deps` without the `chromium` argument might be necessary:
  ```bash
  sudo npx playwright install-deps
  ```
- `npx` may be used as an alternative to a global installation:
  ```bash
  # In a new folder
  npm init
  npm install @harvard-lil/scoop
  npx scoop "https://example.com"
  ```

👆 Back to the summary

---

## Using Scoop on the command line

Here are a few examples of how the `scoop` command can be used to make a customized capture of a web page.

```bash
# This will capture a given url using the default settings.
scoop "https://lil.law.harvard.edu"

# Unless specified otherwise, scoop will save the output of the capture as "./archive.wacz".
# We can change this with the `--output` / `-o` option
scoop "https://lil.law.harvard.edu" -o my-collection/lil.wacz

# But what if I want to change the output format itself?
scoop "https://lil.law.harvard.edu" -f warc -o my-collection/lil.warc

# By default, Scoop runs in headless mode.
# I can turn the "headless" flag off to see what happens in Chromium during capture.
scoop "https://lil.law.harvard.edu" --headless false

# Although it comes with "good defaults", scoop is highly configurable ...
scoop "https://lil.law.harvard.edu" --capture-video-as-attachment false --screenshot false --capture-window-x 320 --capture-window-y 480 --capture-timeout 30000 --max-capture-size 100000 --signing-url "https://example.com/sign"

# ... use --help to list the available options, and see what the defaults are.
scoop --help

# Timeout-related options are good dials to turn first when trying to customize "how much" of a page to capture.
scoop "https://lil.law.harvard.edu" --capture-timeout 90000 --load-timeout 60000 --network-idle-timeout 30000
```
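
When the CLI is driven from a script, it can help to build the argument list programmatically. The sketch below is not part of Scoop: `toFlag` and `buildScoopArgs` are hypothetical helpers, and the camelCase-to-kebab-case mapping is an assumption that matches the flags shown in the examples above but has not been checked against every option.

```javascript
// Converts a camelCase name to a kebab-case flag, e.g. captureTimeout -> --capture-timeout.
// Assumption: CLI flags follow this naming pattern, as in the examples above.
function toFlag (name) {
  return '--' + name.replace(/[A-Z]/g, (c) => '-' + c.toLowerCase())
}

// Builds the argument list for the `scoop` command: URL first, then flag/value pairs.
function buildScoopArgs (url, options = {}) {
  const args = [url]
  for (const [name, value] of Object.entries(options)) {
    args.push(toFlag(name), String(value))
  }
  return args
}

const args = buildScoopArgs('https://lil.law.harvard.edu', {
  captureTimeout: 90000,
  loadTimeout: 60000,
  networkIdleTimeout: 30000
})

// The resulting array could be passed to child_process.spawn('scoop', args).
console.log(['scoop', ...args].join(' '))
```

Passing an array to `spawn` rather than concatenating a shell string avoids quoting problems with URLs that contain `&` or `?`.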


See: Output of `scoop --help` 🔍

```
Usage: scoop [options]

🍨 High-fidelity, browser-based, single-page web archiving library and CLI.
More info: https://github.com/harvard-lil/scoop

Options:
  -v, --version  Display Scoop and Scoop CLI version.
  -o, --output  Output path. (default: "./archive.wacz")
  -f, --format  Output format. (choices: "warc", "warc-gzipped", "wacz", "wacz-with-raw", default: "wacz")
  --json-summary-output  If set, allows for saving a capture summary as JSON. Must be a path to .json file.
  --export-attachments-output  If set, allows for exporting attachments (screenshot, certs, ...). Must be a path to an existing directory.
  --signing-url  Authsign-compatible endpoint for signing WACZ file.
  --signing-token  Authentication token to --signing-url, if needed.
  --screenshot  Add screenshot step to capture? (choices: "true", "false", default: "true")
  --pdf-snapshot  Add PDF snapshot step to capture? (choices: "true", "false", default: "false")
  --dom-snapshot  Add DOM snapshot step to capture? (choices: "true", "false", default: "false")
  --capture-video-as-attachment  Add capture video(s) as attachment(s) step to capture? (choices: "true", "false", default: "true")
  --capture-certificates-as-attachment  Add capture certificate(s) as attachment(s) step to capture? (choices: "true", "false", default: "true")
  --provenance-summary  Add provenance summary to capture? (choices: "true", "false", default: "true")
  --attachments-bypass-limits  If active, attachments will not count towards time and size constraints imposed on capture (--capture-timeout, --max--capture-size). (choices: "true", "false", default: "true")
  --capture-timeout  Maximum time allocated to capture process before hard cut-off, in ms. (default: 60000)
  --load-timeout  Max time Scoop will wait for the page to load, in ms. (default: 20000)
  --network-idle-timeout  Max time Scoop will wait for the in-browser networking tasks to complete, in ms. (default: 20000)
  --behaviors-timeout  Max time Scoop will wait for the browser behaviors to complete, in ms. (default: 20000)
  --capture-video-as-attachment-timeout  Max time Scoop will wait for the video capture process to complete, in ms. (default: 30000)
  --capture-certificates-as-attachment-timeout  Max time Scoop will wait for the certificates capture process to complete, in ms. (default: 10000)
  --capture-window-x  Width of the browser window Scoop will open to capture, in pixels. (default: 1600)
  --capture-window-y  Height of the browser window Scoop will open to capture, in pixels. (default: 900)
  --max-capture-size  Size limit for the capture's exchanges list, in bytes. (default: 209715200)
  --max-video-capture-size  Size limit for the video attachment, in bytes. Scoop will not capture video attachments larger than this. (default: 209715200)
  --auto-scroll  Should Scoop try to scroll through the page? (choices: "true", "false", default: "true")
  --auto-play-media  Should Scoop try to autoplay
```
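
Since `--output` and `--format` are set independently, a script may want to keep the file extension in sync with the chosen format. Here is a small sketch of that idea. The format names are the `--format` choices listed in the help output; the extension mapping is an assumption based on the file types this README mentions, and `outputPathFor` is a hypothetical helper.

```javascript
// Keys are the --format choices from `scoop --help`.
// The extensions are an assumption based on the file types named in this README.
const EXTENSIONS = new Map([
  ['warc', '.warc'],
  ['warc-gzipped', '.warc.gz'],
  ['wacz', '.wacz'],
  ['wacz-with-raw', '.wacz']
])

// Appends the extension matching `format` to a base path (no extension).
function outputPathFor (basePath, format = 'wacz') {
  if (!EXTENSIONS.has(format)) {
    throw new Error(`Unknown format: ${format}`)
  }
  return basePath + EXTENSIONS.get(format)
}

console.log(outputPathFor('my-collection/lil', 'warc-gzipped'))
// prints "my-collection/lil.warc.gz"
```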

👆 Back to the summary

---

## Using Scoop as a JavaScript library

Scoop can be used as a library in a Node.js project. Here are a few examples of how to programmatically capture web pages using the `Scoop.capture()` method, which returns an instance of the `Scoop` class.

```javascript
const capture = await Scoop.capture(url, options)
```

### Quick reference

- List of available options for `Scoop.capture()`
- `Scoop.toWACZ()` method
- `Scoop.toWARC()` method
- `Scoop.fromWACZ()` method (experimental)
- Possible values of the `Scoop.state` property

### Example: Capturing a page and saving it as WACZ

```javascript
import fs from 'fs/promises'
import { Scoop } from '@harvard-lil/scoop'

try {
  const capture = await Scoop.capture('https://lil.law.harvard.edu')
  const wacz = await capture.toWACZ()
  await fs.writeFile('archive.wacz', Buffer.from(wacz))
} catch (err) {
  // ...
}
```
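
The try/catch pattern above extends naturally to a list of URLs. The sketch below is illustrative only: `captureAll` is a hypothetical helper, and `captureFn` stands in for something like `(url) => Scoop.capture(url)`. A stub is used here so the snippet runs without a browser. Captures run one at a time, on the assumption that each one is resource-intensive.

```javascript
// Runs captureFn on each URL in sequence and records success or failure per URL,
// so that one failed capture does not abort the rest of the batch.
async function captureAll (urls, captureFn) {
  const results = []
  for (const url of urls) {
    try {
      const capture = await captureFn(url)
      results.push({ url, ok: true, capture })
    } catch (err) {
      results.push({ url, ok: false, error: err.message })
    }
  }
  return results
}

// Stub standing in for (url) => Scoop.capture(url).
const fakeCapture = async (url) => {
  if (!url.startsWith('https://')) throw new Error('unsupported URL')
  return { url }
}

captureAll(['https://lil.law.harvard.edu', 'ftp://example.com'], fakeCapture)
  .then((results) => console.log(results.map((r) => `${r.url}: ${r.ok}`)))
```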

### Example: Adjusting capture options

```javascript
import fs from 'fs/promises'
import { Scoop } from '@harvard-lil/scoop'

try {
  const capture = await Scoop.capture('https://lil.law.harvard.edu', {
    screenshot: true,
    pdfSnapshot: true,
    captureVideoAsAttachment: false,
    captureTimeout: 120 * 1000,
    loadTimeout: 60 * 1000,
    captureWindowX: 320,
    captureWindowY: 480
  })

  const warc = await capture.toWARC()
  await fs.writeFile('archive.warc', Buffer.from(warc))
} catch (err) {
  // ...
}
```

### Example: Using `Scoop.defaults` to inspect and edit default options

```javascript
import { Scoop } from '@harvard-lil/scoop'

try {
  // "options" will be a copy of Scoop's default settings
  const options = Scoop.defaults

  // It therefore becomes easier to inspect said defaults ...
  console.log(options)

  // ... and edit existing values
  options.pdfSnapshot = true
  options.blocklist.push('/https?:\/\/foo/')

  // Scoop.capture() is asynchronous: await it to get the capture itself
  const capture = await Scoop.capture('https://lil.law.harvard.edu', options)
  // ...
} catch (err) {
  // ...
}
```
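
When several captures share a base configuration but each needs a few changes, it can be useful to derive a fresh options object per capture instead of editing one in place. The following is a sketch under stated assumptions: `withOverrides` is a hypothetical helper, and `stubDefaults` holds placeholder values, not Scoop's actual defaults.

```javascript
// Returns a deep copy of `defaults` with `overrides` applied.
// Array options (such as blocklist) are extended rather than replaced.
function withOverrides (defaults, overrides = {}) {
  const options = structuredClone(defaults)
  for (const [key, value] of Object.entries(overrides)) {
    options[key] = Array.isArray(options[key]) && Array.isArray(value)
      ? [...options[key], ...value]
      : value
  }
  return options
}

// Placeholder values, not Scoop's actual defaults.
const stubDefaults = { pdfSnapshot: false, blocklist: ['/https?:\\/\\/localhost/'] }

const options = withOverrides(stubDefaults, {
  pdfSnapshot: true,
  blocklist: ['/https?:\\/\\/foo/']
})

console.log(options.pdfSnapshot, options.blocklist.length) // prints "true 2"
console.log(stubDefaults.blocklist.length) // prints "1": the base object is untouched
```

In real use, the first argument would be `Scoop.defaults`, and the result would be passed as the second argument of `Scoop.capture()`.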

### Example: Signing a WACZ

```javascript
import fs from 'fs/promises'
import { Scoop } from '@harvard-lil/scoop'

try {
  const capture = await Scoop.capture('https://lil.law.harvard.edu')

  const signedWacz = await capture.toWACZ(true, {
    url: 'https://example.com/sign',
    token: 'some-very-secret-token'
  })

  await fs.writeFile('archive.wacz', Buffer.from(signedWacz))
} catch (err) {
  // ...
}
```
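
In practice, the signing token is better kept out of source code. One possible approach, sketched below, reads the signing server settings from environment variables. `SCOOP_SIGNING_URL` and `SCOOP_SIGNING_TOKEN` are hypothetical variable names chosen for this example, and `signingServerFromEnv` is not part of Scoop.

```javascript
// Builds the { url, token } object used in the signing example above
// from environment variables. Variable names are hypothetical.
function signingServerFromEnv (env = process.env) {
  const url = env.SCOOP_SIGNING_URL
  if (!url) return undefined // signing not configured

  const server = { url }
  if (env.SCOOP_SIGNING_TOKEN) {
    server.token = env.SCOOP_SIGNING_TOKEN
  }
  return server
}

// Intended use, mirroring the example above:
//   const signingServer = signingServerFromEnv()
//   if (signingServer) {
//     const signedWacz = await capture.toWACZ(true, signingServer)
//   }
console.log(signingServerFromEnv({ SCOOP_SIGNING_URL: 'https://example.com/sign' }))
```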

👆 Back to the summary

---

## FAQ

> 🚧 Under construction

### What does "browser-based capture" mean?

Browser-based capture means that Scoop uses a browser - Chromium - to visit the web page to capture and collect resources.

Specifically, it uses an HTTP proxy to _"intercept"_ network exchanges as early as possible and preserve them _"as is"_.

```mermaid
flowchart LR
    A[Scoop]
    B[Playwright]
    C[Chromium]
    D[Website]
    E[HTTP Proxy]
    A <--> |Controls| B
    B <--> C
    C <--> D
    A <-.-> |Capture| E <-.-> C
```

The browser Scoop controls is installed specifically for programmatic access by Playwright, the underlying tool Scoop uses to communicate with it, and is different from the default browser of the machine Scoop is running on. Additionally, Scoop creates a single-use, isolated browsing context for every capture it makes.

More info:
- https://playwright.dev/docs/browsers
- https://playwright.dev/docs/api/class-browsercontext

### Can Scoop capture pages using an existing browser profile?

Not yet - for security reasons - but we're working on it.

Although Playwright supports loading browser profiles, doing so:
- Breaks context isolation
- May lead to the presence of credentials / tokens in the captured exchanges

Help us design this feature: https://github.com/harvard-lil/scoop/issues/118

### Does Scoop capture anything outside of the browser?

Yes, unless specified otherwise.

Namely:
- If the main URL to capture is _not_ a web page _(for example: a PDF file)_, it will be captured using `curl`.
- Videos captured as attachments are captured outside of the browser using `yt-dlp`.
- Same goes for certificates, captured as attachments via `crip`.
- Favicons may be captured out-of-band using `curl`, if not intercepted during capture.

Exchanges captured in that context still go through Scoop's HTTP proxy, with the exception of _crip_.

```mermaid
flowchart LR
    A[Scoop]
    B[curl]
    C[Resource]
    D[HTTP Proxy]
    A <--> |Controls| B
    B <--> C
    A <-.-> |Capture| D <-.-> B
```

### What are "raw" exchanges in a WACZ file?

The `includeRaw` option of `Scoop.toWACZ()` allows for adding a folder named _"raw"_ in the WACZ file, which contains a copy of unprocessed HTTP exchanges coming directly from Scoop's HTTP proxy.

This feature may be used to preserve finer elements that would otherwise be lost, such as ill-formed HTTP headers, and could be relevant in certain contexts such as forensic analysis.

In order to prevent unnecessary use of storage, Scoop only keeps in _"/raw"_ the contents of exchanges that it assesses are presented differently in WARCs. In practice, this most often means the bodies of HTTP exchanges are not included in the _"/raw"_ files because the WARCs already contain the same data.

Experimental: WACZ files stored with the `includeRaw` option can be ingested by Scoop for analysis and processing via the `Scoop.fromWACZ()` method.

### Should I run Scoop in headful mode?

In certain cases, running Scoop in _"headful"_ mode might yield better results.

Passing `--headless false` to the CLI or `{ headless: false }` to the library will instruct Scoop to run Chromium in headful mode.

Simulating a graphical output is necessary when running Scoop in headful mode on a server. The following command can be used for that purpose:

```bash
xvfb-run --auto-servernum -- scoop "https://lil.law.harvard.edu" --headless false
```
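
A script that runs both on desktops and on servers might choose between the two forms of the command automatically. The sketch below assumes a Linux environment and uses the presence of the `DISPLAY` environment variable as a rough heuristic for "a graphical output is available". That heuristic, and the `headfulCommand` helper itself, are assumptions of this example rather than anything Scoop provides.

```javascript
// Returns the command (as an argument array) to run Scoop in headful mode.
// Heuristic: DISPLAY is set in X11 sessions; if it is missing, wrap with xvfb-run.
function headfulCommand (url, env = process.env) {
  const scoop = ['scoop', url, '--headless', 'false']
  return env.DISPLAY
    ? scoop
    : ['xvfb-run', '--auto-servernum', '--', ...scoop]
}

console.log(headfulCommand('https://lil.law.harvard.edu', {}).join(' '))
// prints "xvfb-run --auto-servernum -- scoop https://lil.law.harvard.edu --headless false"
```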

👆 Back to the summary

---

## Development

### Standard JS

This codebase uses the Standard JS coding style.
- `npm run lint` can be used to check formatting.
- `npm run lint-autofix` can be used to check formatting _and_ automatically edit files accordingly when possible.
- Most IDEs can be configured to automatically check and enforce this coding style.

### JSDoc

JSDoc is used for both documentation and loose type checking purposes on this project.

### Testing

This project uses Node.js' built-in test runner.

```bash
npm run test
```

#### Tests-specific environment variables

The following environment variables allow for testing features requiring access to a third-party server.

These are optional, and can be added to a local `.env` file which will be automatically interpreted by the test runner.

| Name | Description |
| --- | --- |
| `TEST_WACZ_SIGNING_URL` | URL of an authsign-compatible endpoint for signing WACZ files. To run such an endpoint locally, use `npm run dev-signer`, which will overwrite `.env` and set this variable to `http://localhost:5000/sign`; see `.services/signer`. |
| `TEST_WACZ_SIGNING_TOKEN` | If required by the server at `TEST_WACZ_SIGNING_URL`, an authentication token. |

### Available npm scripts

```bash
# Runs test suite
npm run test

# Runs linter
npm run lint

# Runs linter and attempts to automatically fix issues
npm run lint-autofix

# Runs a local instance of wacz-signer for test purposes (see "Testing" section)
npm run dev-signer

# Step-by-step NPM publishing helper
npm run publish-util
```

👆 Back to the summary
