Implementation of a `fetch` that extends the implementation from `node-fetch` to add an HTTP cache using a local cache folder for crawling purpose.
npm install fetch-filecache-for-crawlingfetch function that extends the implementation
fetch to add an HTTP cache using a local cache folder.
refresh parameter. By default, the cache follows HTTP expiration rules but
once will make the cache behave completely
npm install fetch-filecache-for-crawling.
js
const fetch = require('fetch-filecache-for-crawling');
// URLs to crawl, some of which may be identical
let urls = [
'https://caniuse.com/data.json',
'https://caniuse.com/data.json'
]
Promise.all(urls.map(url =>
fetch(url, { logToConsole: true })
.then(response => response.json())
.then(json => console.log(Object.keys(json.data).length +
' entries in Can I Use'))
)).catch(err => console.error(err));
`
Configuration
On top of usual fetch options, the following optional parameters can be
passed to fetch in the options parameter to change default behavior:
- cacheFolder: the name of the cache folder to use. By default, the code caches all files in a folder named .cache.
- resetCache: set to true to empty the cache folder when the application starts. Defaults to false. Note that the cache folder will only be reset once, regardless of whether the parameter is set to true in subsequent calls to fetch.
- refresh: the refresh strategy to use for the cache. Values can be one of:
- force: Always consider that the content in the cache has expired
- default: Follow regular HTTP rules (that is the mode by default)
- once: Fetch the URL at least once, but consider the cached entry to then be valid throughout the lifetime of the application
- never: Always consider that the content in the cache is valid
- an integer: Consider that cache entries are valid for the given period of time (in seconds)
- logToConsole: set to true to output progress messages to the console. Defaults to false. All messages start with the ID of the request to be able to distinguish between them.
For instance, you may do:
`js
const fetch = require('fetch-filecache-for-crawling');
fetch('https://www.w3.org/', {
resetCache: true,
cacheFolder: 'mycache',
logToConsole: true
}).then(response => {});
`
Configuration parameters may also be set for all requests programmatically by calling fetch.setParameter(name, value) where name is the name of the parameter to set and value the value to set it to. Note parameters passed in options` take precedence).