Chunking for CAR files. Split a single CAR into multiple CARs.

```sh
npm install carbites
```

Carbites supports 3 different strategies:
1. Simple (default) - fast but naive, only the first CAR output has a root CID, subsequent CARs have a placeholder "empty" CID.
2. Rooted - like simple, but creates a custom root node to ensure all blocks in a CAR are referenced.
3. Treewalk - walks the DAG to pack sub-graphs into each CAR file that is output. Every CAR has the same root CID, but contains a different portion of the DAG.
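
To make the simple strategy concrete, here is a minimal, dependency-free sketch of the kind of greedy packing it implies: blocks are appended to the current chunk until it reaches the target size, then a new chunk is started. The `packBlocks` function and the `{ cid, bytes }` block shape are illustrative assumptions, not carbites' actual implementation, and CAR header overhead is ignored.

```js
// Hypothetical helper: greedily pack blocks into chunks of roughly targetSize bytes.
// Blocks are modelled as { cid: string, bytes: Uint8Array }.
function packBlocks (blocks, targetSize) {
  const chunks = []
  let current = []
  let currentSize = 0
  for (const block of blocks) {
    current.push(block)
    currentSize += block.bytes.length
    // Close the chunk once it has reached the target size.
    if (currentSize >= targetSize) {
      chunks.push(current)
      current = []
      currentSize = 0
    }
  }
  // Emit whatever is left over as the final (smaller) chunk.
  if (current.length > 0) chunks.push(current)
  return chunks
}

const blocks = ['bafy1', 'bafy2', 'bafy3', 'bafy4', 'bafy5']
  .map(cid => ({ cid, bytes: new Uint8Array(40) }))
const chunks = packBlocks(blocks, 100)
console.log(chunks.map(chunk => chunk.map(block => block.cid)))
// [ [ 'bafy1', 'bafy2', 'bafy3' ], [ 'bafy4', 'bafy5' ] ]
```

Because a chunk is only closed after it reaches the target, chunks in this sketch can overshoot `targetSize` by up to one block, which matches the "around `targetSize`" wording rather than a hard limit.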
```js
import { CarSplitter } from 'carbites'
import { CarReader } from '@ipld/car'
import fs from 'fs'

const bigCar = await CarReader.fromIterable(fs.createReadStream('/path/to/big.car'))
const targetSize = 1024 * 1024 * 100 // chunk to ~100MB CARs
const splitter = new CarSplitter(bigCar, targetSize) // (simple strategy)

for await (const car of splitter.cars()) {
  // Each car is an AsyncIterable<Uint8Array>
}
```

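
Since each yielded CAR is an async iterable of byte chunks, it can be convenient to buffer a small CAR into a single `Uint8Array`, for example before uploading it. The helper below is a sketch: `collectBytes` is a hypothetical name that is not part of carbites, and it assumes every chunk is a `Uint8Array`. For CARs near the ~100MB target, streaming to disk or to the network is likely a better fit than buffering.

```js
// Hypothetical helper: concatenate all chunks of an async iterable into one Uint8Array.
async function collectBytes (chunks) {
  const parts = []
  let length = 0
  for await (const chunk of chunks) {
    parts.push(chunk)
    length += chunk.length
  }
  const out = new Uint8Array(length)
  let offset = 0
  for (const part of parts) {
    out.set(part, offset)
    offset += part.length
  }
  return out
}

// Stand-in for a split CAR: any async iterable of Uint8Array chunks works.
async function * fakeCar () {
  yield new Uint8Array([1, 2])
  yield new Uint8Array([3])
}

collectBytes(fakeCar()).then(bytes => console.log(bytes.length)) // 3
```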
⚠️ Note: The first CAR output has the roots in its header; subsequent CARs have an empty root CID, `bafkqaaa`, as recommended by the CAR specification.
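
The placeholder `bafkqaaa` is not arbitrary: it is the string form of a CIDv1 with the `raw` codec and an identity multihash over zero bytes. The dependency-free sketch below encodes those four bytes as lowercase, unpadded base32 with the `b` multibase prefix. The `base32` helper is written here purely for illustration; in practice a library such as `multiformats` would handle the encoding.

```js
// Minimal RFC 4648 base32 encoder (lowercase, no padding), for illustration only.
function base32 (bytes) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz234567'
  let bits = 0 // number of bits currently buffered
  let value = 0 // bit buffer
  let out = ''
  for (const byte of bytes) {
    value = (value << 8) | byte
    bits += 8
    while (bits >= 5) {
      out += alphabet[(value >>> (bits - 5)) & 31]
      bits -= 5
    }
    value &= (1 << bits) - 1 // keep only the bits not yet emitted
  }
  // Pad the final group with zero bits on the right.
  if (bits > 0) out += alphabet[(value << (5 - bits)) & 31]
  return out
}

const emptyCidBytes = Uint8Array.from([
  0x01, // CID version 1
  0x55, // multicodec: raw
  0x00, // multihash function: identity
  0x00 // multihash digest length: 0 bytes
])
const emptyCid = 'b' + base32(emptyCidBytes) // 'b' is the multibase prefix for base32
console.log(emptyCid) // bafkqaaa
```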
Instead of an empty CID, carbites can generate a special root node for each split CAR that references all the blocks _and_ the original roots (only in the first CAR). To do this, use the RootedCarSplitter constructor. When reading/extracting data from the CARs, the root node should be discarded.
Example
```js
import { RootedCarSplitter } from 'carbites/rooted'
import { CarReader } from '@ipld/car/reader'
import * as dagCbor from '@ipld/dag-cbor'
import fs from 'fs'

const bigCar = await CarReader.fromIterable(fs.createReadStream('/path/to/big.car'))
const targetSize = 1024 * 1024 * 100 // chunk to ~100MB CARs
const splitter = new RootedCarSplitter(bigCar, targetSize)
const cars = splitter.cars()

// Every CAR has a single root - a CBOR node that is a tuple of `/carbites/1`,
// an array of root CIDs and an array of block CIDs.
// e.g. ['/carbites/1', ['bafkroot'], ['bafy1', 'bafy2']]
const { done, value: car } = await cars.next()
const reader = await CarReader.fromIterable(car)
const rootCids = await reader.getRoots()
// reader.get() resolves to a block ({ cid, bytes }), so decode its bytes
const rootBlock = await reader.get(rootCids[0])
const rootNode = dagCbor.decode(rootBlock.bytes)

console.log(rootNode[0]) // /carbites/1
console.log(rootNode[1]) // Root CIDs (only in first CAR)
/*
[
  CID(bafybeictvyf6polqzgop3jt32owubfmsg3kl226omqrfte4eyidubc4rpq)
]
*/
console.log(rootNode[2]) // Block CIDs (all blocks in this CAR)
/*
[
  CID(bafybeictvyf6polqzgop3jt32owubfmsg3kl226omqrfte4eyidubc4rpq),
  CID(bafyreihcsxqhd6agqpboc3wrlvpy5bwuxctv5upicdnt3u2wojv4exxl24),
  CID(bafyreiasq7d2ihbqm5xvhjjzlmzsensuadrpmpt2tkjsuwq42xpa34qevu)
]
*/
```

The root node is limited to 4MB in size (the largest message IPFS will bitswap). Depending on the settings used to construct the DAG in the CAR, this may mean a split CAR size limit of around 30GiB.
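
As a rough back-of-the-envelope check of that limit, the sketch below estimates how many block CIDs fit in the root node and what total CAR size that implies. The inputs are assumptions rather than values taken from carbites: the limit is read as 4 MiB, each dag-cbor encoded link to a CIDv1 with a sha2-256 hash is taken to be about 41 bytes (36 CID bytes plus 5 bytes of CBOR framing), and blocks are assumed to be 256 KiB. The small fixed overhead of the `/carbites/1` string and the array headers is ignored.

```js
// Hypothetical helper: estimate the largest CAR a single root node can describe.
function estimateMaxCarSize ({ rootNodeLimit, bytesPerCid, blockSize }) {
  const maxCids = Math.floor(rootNodeLimit / bytesPerCid)
  return { maxCids, maxBytes: maxCids * blockSize }
}

const estimate = estimateMaxCarSize({
  rootNodeLimit: 4 * 1024 * 1024, // assumed: 4 MiB root node limit
  bytesPerCid: 41, // assumed size of one dag-cbor encoded CID link
  blockSize: 256 * 1024 // assumed block size (256 KiB)
})
console.log(estimate.maxCids) // 102300
console.log((estimate.maxBytes / 1024 ** 3).toFixed(1) + ' GiB') // 25.0 GiB
```

Larger blocks or shorter CIDs push the estimate up and smaller blocks push it down, which is why the practical limit depends on how the DAG was built.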
With the treewalk strategy, every CAR file has the _same_ root CID but contains a different portion of the DAG. The DAG is traversed from the root node, and each block is decoded and its links extracted to determine which sub-graph to include in each CAR.
Example
```js
import { TreewalkCarSplitter } from 'carbites/treewalk'
import { CarReader } from '@ipld/car/reader'
import fs from 'fs'

const bigCar = await CarReader.fromIterable(fs.createReadStream('/path/to/big.car'))
const [rootCid] = await bigCar.getRoots()
const targetSize = 1024 * 1024 * 100 // chunk to ~100MB CARs
const splitter = new TreewalkCarSplitter(bigCar, targetSize)

for await (const car of splitter.cars()) {
  // Each car is an AsyncIterable<Uint8Array>
  const reader = await CarReader.fromIterable(car)
  const [splitCarRootCid] = await reader.getRoots()
  console.assert(rootCid.equals(splitCarRootCid)) // all cars will have the same root
}
```

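
The idea of packing sub-graphs under a shared root can be illustrated with a toy model. In the sketch below the DAG is a `Map` from CID string to `{ size, links }`, and whenever a chunk would overflow, a new chunk is started and seeded with the ancestors of the block being visited, so every chunk stays connected to the root. The `treewalkPack` function and the data shapes are assumptions for illustration; this is not carbites' implementation, and it assumes a tree (no block is linked from two parents).

```js
// Hypothetical model: dag is a Map of cid -> { size: number, links: string[] }.
function treewalkPack (dag, rootCid, targetSize) {
  const chunks = []
  let current = [] // CIDs in the chunk being built
  let currentSize = 0
  const path = [] // ancestors of the block being visited, root first

  const add = cid => {
    current.push(cid)
    currentSize += dag.get(cid).size
  }

  const visit = cid => {
    const { size, links } = dag.get(cid)
    // Start a new chunk if this block would overflow one that holds more than ancestors.
    if (currentSize + size > targetSize && current.length > path.length) {
      chunks.push(current)
      current = []
      currentSize = 0
      for (const ancestor of path) add(ancestor) // keep the new chunk rooted
    }
    add(cid)
    path.push(cid)
    for (const link of links) visit(link)
    path.pop()
  }

  visit(rootCid)
  if (current.length > 0) chunks.push(current)
  return chunks
}

const dag = new Map([
  ['root', { size: 10, links: ['a', 'b'] }],
  ['a', { size: 10, links: ['a1', 'a2'] }],
  ['a1', { size: 40, links: [] }],
  ['a2', { size: 40, links: [] }],
  ['b', { size: 10, links: ['b1'] }],
  ['b1', { size: 40, links: [] }]
])
const walkChunks = treewalkPack(dag, 'root', 100)
console.log(walkChunks)
// [ [ 'root', 'a', 'a1', 'a2' ], [ 'root', 'b', 'b1' ] ]
```

Note that the root block appears in both chunks: repeating ancestor blocks is the price paid for every output sharing the same root.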
Install the CLI tool to use Carbites from the comfort of your terminal:
```sh
npm i -g carbites-cli
```

## API

* `class CarSplitter`
    * `constructor(car: CarReader, targetSize: number)`
    * `cars(): AsyncGenerator<AsyncIterable<Uint8Array> & RootsReader>`
    * `static async fromBlob(blob: Blob, targetSize: number): CarSplitter`
    * `static async fromIterable(iterable: AsyncIterable<Uint8Array>, targetSize: number): CarSplitter`
* `class CarJoiner`
    * `constructor(cars: Iterable<CarReader>)`
    * `car(): AsyncGenerator<Uint8Array>`
* `class RootedCarSplitter`
* `class RootedCarJoiner`
* `class TreewalkCarSplitter`
* `class TreewalkCarJoiner`

### `class CarSplitter`

Split a CAR file into several smaller CAR files.
Import in the browser:
```js
import { CarSplitter } from 'https://cdn.skypack.dev/carbites'
```

Import in Node.js:

```js
import { CarSplitter } from 'carbites'
```

Note: This is an alias of `SimpleCarSplitter` - the default strategy for splitting CARs.

#### `constructor(car: CarReader, targetSize: number)`

Create a new `CarSplitter` for the passed CAR file, aiming to generate CARs of around `targetSize` bytes in size.

#### `cars(): AsyncGenerator<AsyncIterable<Uint8Array> & RootsReader>`

Split the CAR file and create multiple smaller CAR files. Returns an `AsyncGenerator` that yields the split CAR files (of type `AsyncIterable<Uint8Array>`).

The yielded CAR files also implement the `RootsReader` interface from `@ipld/car`, which means you can call `getRoots(): Promise<CID[]>` to obtain the root CIDs.

#### `static async fromBlob(blob: Blob, targetSize: number): CarSplitter`

Convenience function to create a new `CarSplitter` from a blob of CAR file data.

#### `static async fromIterable(iterable: AsyncIterable<Uint8Array>, targetSize: number): CarSplitter`

Convenience function to create a new `CarSplitter` from an `AsyncIterable` of CAR file data.

### `class CarJoiner`

Join together split CAR files into a single big CAR.
Import in the browser:
```js
import { CarJoiner } from 'https://cdn.skypack.dev/carbites'
```

Import in Node.js:

```js
import { CarJoiner } from 'carbites'
```

Note: This is an alias of `SimpleCarJoiner` - a joiner for the default CAR splitting strategy.

#### `constructor(cars: Iterable<CarReader>)`

Create a new `CarJoiner` for joining the passed CAR files together.

#### `car(): AsyncGenerator<Uint8Array>`

Join the CAR files together and return the joined CAR.

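
Conceptually, joining the output of the simple strategy amounts to taking the roots from the first CAR and concatenating the blocks of every CAR in order. The sketch below models that with plain objects; the `{ roots, blocks }` shape and the `joinSimple` name are assumptions for illustration, not the `CarJoiner` implementation.

```js
// Hypothetical model of a CAR: { roots: string[], blocks: string[] }.
function joinSimple (cars) {
  const joined = { roots: [], blocks: [] }
  cars.forEach((car, index) => {
    // Only the first split CAR carries the original roots; later CARs
    // hold the placeholder empty CID, which is dropped here.
    if (index === 0) joined.roots = [...car.roots]
    joined.blocks.push(...car.blocks)
  })
  return joined
}

const joined = joinSimple([
  { roots: ['bafyroot'], blocks: ['bafyroot', 'bafy1'] },
  { roots: ['bafkqaaa'], blocks: ['bafy2', 'bafy3'] }
])
console.log(joined.roots) // [ 'bafyroot' ]
console.log(joined.blocks) // [ 'bafyroot', 'bafy1', 'bafy2', 'bafy3' ]
```

This is also why the order of the input CARs matters: passing them out of order would take the roots from the wrong file.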

### `class RootedCarSplitter`

Split a CAR file into several smaller CAR files ensuring every CAR file contains a single root node that references all the blocks _and_ the original roots (only in the first CAR). When reading/extracting data from the CARs, the root node should be discarded.
Import in the browser:
```js
import { RootedCarSplitter } from 'https://cdn.skypack.dev/carbites/rooted'
```

Import in Node.js:

```js
import { RootedCarSplitter } from 'carbites/rooted'
```

The API is the same as for `CarSplitter`.

#### Root Node Format

The root node is a `dag-cbor` node that is a tuple of the string `/carbites/1`, an array of root CIDs (only seen in the first CAR) and an array of block CIDs (all the blocks in the CAR), e.g. `['/carbites/1', ['bafkroot'], ['bafy1', 'bafy2']]`.

Note: The root node is limited to 4MB in size (the largest message IPFS will bitswap). Depending on the settings used to construct the DAG in the CAR, this may mean a split CAR size limit of around 30GiB.

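
Because the root node has a fixed shape, a reader can recognise it and discard it before handing blocks to the rest of an application. The check below is a sketch that operates on an already decoded node; `isCarbitesRootNode` is a hypothetical helper, and CIDs are shown as strings for brevity where real code would see CID objects.

```js
// Hypothetical helper: detect the special root node added by the rooted strategy.
function isCarbitesRootNode (node) {
  return Array.isArray(node) &&
    node.length === 3 &&
    node[0] === '/carbites/1' &&
    Array.isArray(node[1]) && // original root CIDs (only seen in the first CAR)
    Array.isArray(node[2]) // CIDs of all blocks in this CAR
}

const rootNode = ['/carbites/1', ['bafkroot'], ['bafy1', 'bafy2']]
const [, originalRoots, blockCids] = rootNode

console.log(isCarbitesRootNode(rootNode)) // true
console.log(isCarbitesRootNode({ hello: 'world' })) // false
console.log(originalRoots) // [ 'bafkroot' ]
console.log(blockCids) // [ 'bafy1', 'bafy2' ]
```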
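
### `class RootedCarJoiner`

Join together CAR files that were split using `RootedCarSplitter`.

The API is the same as for `CarJoiner`.

### `class TreewalkCarSplitter`
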
Split a CAR file into several smaller CAR files. Every CAR file has the _same_ root CID but a different portion of the DAG. The DAG is traversed from the root node and each block is decoded and links extracted in order to determine which sub-graph to include in each CAR.
Import in the browser:
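```js
import { TreewalkCarSplitter } from 'https://cdn.skypack.dev/carbites/treewalk'
```

Import in Node.js:

```js
import { TreewalkCarSplitter } from 'carbites/treewalk'
```

The API is the same as for `CarSplitter`.

### `class TreewalkCarJoiner`

Join together CAR files that were split using `TreewalkCarSplitter`.

The API is the same as for `CarJoiner`.

## Releasing
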
You can publish by either running `npm publish` in the `dist` directory or using `npx ipjs publish`.

Feel free to dive in! Open an issue or submit PRs.

Dual-licensed under MIT + Apache 2.0