A puppeteer-core enhancement with Zyte Smart Proxy Manager services.
Use puppeteer-core with Smart Proxy Manager easily!
A wrapper over puppeteer-core to provide Zyte Smart Proxy Manager specific functionalities.
1. Install Zyte SmartProxy puppeteer-core
```
npm install zyte-smartproxy-puppeteer-core chrome-launcher axios
```
2. Create a file sample.js with the following content and set spm_apikey to your SPM API key
```javascript
const puppeteer = require('zyte-smartproxy-puppeteer-core');
const chromeLauncher = require('chrome-launcher');
const axios = require('axios');

(async () => {
    // Launch a Chrome instance manually, pointed at the Smart Proxy Manager proxy
    const chrome = await chromeLauncher.launch({
        chromeFlags: ['--proxy-server=http://proxy.zyte.com:8011', '--disable-site-isolation-trials'],
    });

    // Ask Chrome's debugging endpoint for its WebSocket URL
    const response = await axios.get(`http://localhost:${chrome.port}/json/version`);
    const { webSocketDebuggerUrl } = response.data;

    // Connect to the Chrome instance
    const browser = await puppeteer.connect({
        spm_apikey: '', // set this to your SPM API key
        spm_host: 'http://proxy.zyte.com:8011',
        ignoreHTTPSErrors: true,
        browserWSEndpoint: webSocketDebuggerUrl,
        static_bypass: false, // enable to save bandwidth (but may break some websites)
        block_ads: false, // enable to save bandwidth (but may break some websites)
    });

    console.log('Before new page');
    const page = await browser.newPage();

    console.log('Opening page ...');
    try {
        await page.goto('https://toscrape.com/', {timeout: 180000});
    } catch (err) {
        console.log(err);
    }

    console.log('Taking a screenshot ...');
    await page.screenshot({path: 'screenshot.png'});
    await browser.close();
})();
```
Make sure that you can make HTTPS requests through Smart Proxy Manager by following the guide "Fetching HTTPS pages with Zyte Smart Proxy Manager".
3. Run sample.js using Node
```bash
node sample.js
```
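
To avoid hard-coding the API key in sample.js, the connect options could be built from an environment variable. This is a minimal sketch, not part of the library: `buildConnectOptions` is a hypothetical helper and `SPM_APIKEY` is an assumed variable name that the package does not read on its own. The option names are the ones used in the sample above.

```javascript
// Hypothetical helper: builds the options object passed to puppeteer.connect.
// SPM_APIKEY is an assumed environment variable name chosen for this sketch.
function buildConnectOptions(webSocketDebuggerUrl, env = process.env) {
    const apiKey = env.SPM_APIKEY;
    if (!apiKey) {
        // Fail early with a clear message instead of connecting without a key
        throw new Error('Set the SPM_APIKEY environment variable before running sample.js');
    }
    return {
        spm_apikey: apiKey,
        spm_host: 'http://proxy.zyte.com:8011',
        ignoreHTTPSErrors: true,
        browserWSEndpoint: webSocketDebuggerUrl,
    };
}
```

In sample.js, the result would be passed as `puppeteer.connect(buildConnectOptions(webSocketDebuggerUrl))`, with the variable set in the shell that runs `node sample.js`.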
ZyteProxyPuppeteer.connect accepts all the arguments accepted by Puppeteer.connect and some
additional arguments defined below:
| Argument | Default Value | Description |
|----------|---------------|-------------|
| `spm_apikey` | `undefined` | Zyte Smart Proxy Manager API key, which can be found in your zyte.com account. |
| `spm_host` | `http://proxy.zyte.com:8011` | Zyte Smart Proxy Manager proxy host. |
| `static_bypass` | `true` | When `true`, ZyteProxyPuppeteer skips the proxy for static assets matched by `static_bypass_regex`. Pass `false` to use the proxy for them. |
| `static_bypass_regex` | `/.*?\.(?:txt\|json\|css\|less\|gif\|ico\|jpe?g\|svg\|png\|webp\|mkv\|mp4\|mpe?g\|webm\|eot\|ttf\|woff2?)$/` | Regex used to filter URLs for `static_bypass`. |
| `block_ads` | `true` | When `true`, ZyteProxyPuppeteer blocks ads defined by `block_list` using the `@cliqz/adblocker-puppeteer` package. |
| `block_list` | `['https://secure.fanboy.co.nz/easylist.txt', 'https://secure.fanboy.co.nz/easyprivacy.txt']` | Block lists used by ZyteProxyPuppeteer to initialize the `@cliqz/adblocker-puppeteer` blocker engine and block ads. |
| `headers` | `{'X-Crawlera-No-Bancheck': '1', 'X-Crawlera-Profile': 'pass', 'X-Crawlera-Cookies': 'disable'}` | Headers to be appended to requests. |

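
To get a feel for which URLs the default `static_bypass_regex` matches, the pattern can be tried against a few strings. This sketch only exercises the regular expression itself; `matchesStaticBypass` is a hypothetical helper, the example.com URLs are made up, and how the library applies the pattern internally (for instance, whether it sees the query string) is not confirmed here.

```javascript
// Default static_bypass_regex, copied from the argument table.
const staticBypassRegex =
    /.*?\.(?:txt|json|css|less|gif|ico|jpe?g|svg|png|webp|mkv|mp4|mpe?g|webm|eot|ttf|woff2?)$/;

// Hypothetical helper: true when the string matches the pattern.
function matchesStaticBypass(url, regex = staticBypassRegex) {
    return regex.test(url);
}

// Made-up sample URLs for illustration only
const samples = [
    'https://example.com/static/main.css',
    'https://example.com/fonts/body.woff2',
    'https://example.com/index.html',
    'https://example.com/img/logo.png?v=2',
];

for (const url of samples) {
    console.log(`${matchesStaticBypass(url) ? 'match   ' : 'no match'}  ${url}`);
}
```

Because the pattern is anchored with `$`, a string that ends in a query string such as `?v=2` does not match it. A custom pattern can be supplied through the `static_bypass_regex` argument if the default is too narrow or too broad for a given site.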
- Some websites may not work with block_ads and static_bypass enabled (default). Try disabling them if you encounter any issues.
- When using a remote browser in headless mode, the values generated for some browser-specific headers differ slightly, which websites may detect. Try using 'X-Crawlera-Profile': 'desktop' in that case:
```javascript
const browser = await puppeteer.connect({
    spm_apikey: '',
    spm_host: 'http://proxy.zyte.com:8011',
    ignoreHTTPSErrors: true,
    browserWSEndpoint: webSocketDebuggerUrl,
    headers: {'X-Crawlera-No-Bancheck': '1', 'X-Crawlera-Profile': 'desktop', 'X-Crawlera-Cookies': 'disable'}
});
```
- Consider our new zyte-smartproxy-plugin for playwright-extra and puppeteer-extra frameworks.
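
For the header override described in the notes above, one way to change a single header while keeping the other defaults is to spread the default set into a new object. This is a sketch under assumptions: `withHeaderOverrides` is a hypothetical helper, and since it is not confirmed here whether the library merges a custom `headers` value with its defaults or replaces them, the sketch always produces the complete set.

```javascript
// Default headers, copied from the argument table.
const defaultHeaders = {
    'X-Crawlera-No-Bancheck': '1',
    'X-Crawlera-Profile': 'pass',
    'X-Crawlera-Cookies': 'disable',
};

// Hypothetical helper: returns a new object with the overrides applied,
// leaving defaultHeaders itself unchanged.
function withHeaderOverrides(overrides) {
    return { ...defaultHeaders, ...overrides };
}

const headers = withHeaderOverrides({ 'X-Crawlera-Profile': 'desktop' });
console.log(headers);
```

The resulting object can be passed as the `headers` argument to `puppeteer.connect`.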