Parse HTML with CSS query selectors

`npm install kirinuki-core`

Kirinuki is a library that converts any HTML to JSON using CSS selectors.
https://rike422.github.io/kirinuki-core
### Node
Parse an HTML string, build a DOM with cheerio, and extract JSON from it.
- node(schema: Object, node: string)
- node(schema: Object, node: string, context: object)
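```javascript
import { node as kirinuki } from 'kirinuki-core';

// Illustrative markup (assumed): two elements matching ".content".
const html = `
<div class="content">Batman come back in Gossam City!</div>
<div class="content">Dr. Strange got into a traffic accident.</div>
`;

const schema = {
  topic: {
    content: ".content",
    contents: ".content"
  }
}

kirinuki(schema, html)
// > { topic: {
//   content: 'Batman come back in Gossam City!',
//   contents: [
//     'Batman come back in Gossam City!',
//     'Dr. Strange got into a traffic accident.',
//   ]
// } }
```
#### Text Node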
To scrape the text node of an `a` tag, pass the selector together with `"text"`, as in the following code:
```javascript
const html = `...`; // HTML containing ".sub-news-list a" links

const schema = {
  topics: {
    _unfold: true,
    title: [".sub-news-list a", "text"],
    link: ".sub-news-list a"
  }
}

kirinuki(schema, html)
```
#### Auto complete
If a URL is a relative path and you want to convert it to an absolute one, pass a context object.
Relative paths are resolved against its `origin` property.
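
How kirinuki-core performs this conversion internally is not shown here. As a rough sketch of the idea only, the standard WHATWG `URL` constructor resolves a relative path against a base origin in the same way. The helper name `toAbsolute` is hypothetical and is not part of the library.

```javascript
// Hypothetical helper, NOT part of kirinuki-core: resolves a path against
// an origin with the standard URL API (available in browsers and Node.js).
function toAbsolute(path, origin) {
  // Relative input is resolved against `origin`;
  // input that is already an absolute URL is returned unchanged.
  return new URL(path, origin).href;
}

console.log(toAbsolute('/assets/batman.png', 'https://example.com'));
// → https://example.com/assets/batman.png
console.log(toAbsolute('https://cdn.example.org/a.png', 'https://example.com'));
// → https://cdn.example.org/a.png
```

Under this assumption, values that are already absolute pass through untouched, so only relative `src` and `href` values would change.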
```javascript
// Illustrative markup (assumed): relative image and link paths inside ".news-list".
const html = `
<div class="news-list">
  <a href="/batman/news/1"><img src="/assets/batman.png"></a>
  <p class="content">Batman come back in Gossam City!</p>
  <a href="/dr_strage/news/1"><img src="/assets/strange.png"></a>
  <p class="content">Dr. Strange got into a traffic accident.</p>
</div>
`;

const context = {
  origin: 'https://example.com'
}

const schema = {
  unfoldTopics: {
    _unfold: true,
    content: ".news-list .content",
    image: ".news-list img",
    link: ".news-list a"
  },
  topics: {
    contents: ".news-list .content",
    images: ".news-list img",
    links: ".news-list a"
  }
}

kirinuki(schema, html, context)
// { unfoldTopics:
//    [ { content: 'Batman come back in Gossam City!',
//        image: 'https://example.com/assets/batman.png',
//        link: 'https://example.com/batman/news/1' },
//      { content: 'Dr. Strange got into a traffic accident.',
//        image: 'https://example.com/assets/strange.png',
//        link: 'https://example.com/dr_strage/news/1' } ],
//   topics:
//    { contents:
//       [ 'Batman come back in Gossam City!',
//         'Dr. Strange got into a traffic accident.' ],
//      images:
//       [ 'https://example.com/assets/batman.png',
//         'https://example.com/assets/strange.png' ],
//      links:
//       [ 'https://example.com/batman/news/1',
//         'https://example.com/dr_strage/news/1' ] } }
```
### Browser
Scrape a Document or HTMLElement using the DOM API.
- browser(schema: Object, node: Document)
- browser(schema: Object, node: HTMLElement)
- browser(schema: Object, node: string)
- browser(schema: Object) // node defaults to window.document
```javascript
import { browser as kirinuki } from 'kirinuki-core';

const schema = {
  topic: {
    content: ".content",
    contents: ".content"
  }
}

// No node argument: the current window.document is scraped.
kirinuki(schema)
// > { topic: {
//   content: 'Batman come back in Gossam City!',
//   contents: [
//     'Batman come back in Gossam City!',
//     'Dr. Strange got into a traffic accident.',
//   ]
// } }
```
#### Standalone js file
`kirinuki.standalone.js` is built in UMD style and includes only the libraries needed by browser JavaScript engines.