Dataset utilities for batching, shuffling, and splitting data in Starlight ML
It provides a Dataset class and chainable operations (map, filter, shuffle, etc.).
Installation
```bash
npm install starlight-dataset
```
Or import directly in your Starlight environment:
```js
import { Dataset, dataset } from "starlight-dataset";
```
---
Basic Usage
Creating a dataset
```js
import { dataset } from "starlight-dataset";

const ds = dataset([1, 2, 3, 4, 5]);
```
---
Chaining transformations
```js
const processed = ds
  .map(x => x * 2)
  .filter(x => x > 5);

processed.toArray();
// [6, 8, 10]
```
---
Shuffling
```js
const shuffled = ds.shuffle();
```
Pass a seed to make the shuffle deterministic:
```js
const shuffled = ds.shuffle(0.42);
```
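To illustrate why a seed makes the order reproducible, here is a minimal sketch of a seeded Fisher-Yates shuffle on a plain array. It is not the library's implementation: `mulberry32` and `seededShuffle` are hypothetical helpers, and the integer seed used here is an assumption (the README does not say how `shuffle` interprets a seed such as 0.42).

```js
// Hypothetical helper: a small seeded PRNG (mulberry32 variant).
// Not part of starlight-dataset; shown only to make the shuffle repeatable.
function mulberry32(seed) {
  let state = seed >>> 0; // coerce the seed to an unsigned 32-bit integer
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Fisher-Yates shuffle on a copy, so the input array is left untouched.
function seededShuffle(items, seed) {
  const random = mulberry32(seed);
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

const firstRun = seededShuffle([1, 2, 3, 4, 5], 42);
const secondRun = seededShuffle([1, 2, 3, 4, 5], 42);
// The same seed yields the same order in both runs.
```

Because the generator's state depends only on the seed, every run with the same seed performs the same sequence of swaps.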
---
Batching
```js
const batches = ds.batch(2);
batches.toArray();
// [ [1, 2], [3, 4], [5] ]
```
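The output above shows that the final batch is simply shorter when the size does not divide the dataset evenly. A rough sketch of that chunking logic on a plain array follows; `chunk` is a hypothetical helper, not the library's code, and the argument check is an assumption.

```js
// Hypothetical helper: split an array into consecutive chunks of `size`.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let start = 0; start < items.length; start += size) {
    // slice clamps at the end of the array, so the last chunk may be shorter
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

const chunked = chunk([1, 2, 3, 4, 5], 2);
// [[1, 2], [3, 4], [5]]
```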
---
Train / Test Split
```js
const { train, test } = ds.split(0.8);
train.size(); // 4
test.size(); // 1
```
By default, split shuffles the data before splitting. Pass false as the second argument to keep the original order:
```js
ds.split(0.8, false);
```
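The sizes in the split example (4 and 1 from five elements at a ratio of 0.8) are consistent with cutting at `floor(length * ratio)`. The sketch below shows that arithmetic on a plain array without shuffling; `splitAt` is a hypothetical helper, and the rounding rule is an assumption rather than documented library behavior.

```js
// Hypothetical helper: ordered train/test split of a plain array.
// Assumes the cut index is floor(length * ratio); the library may round differently.
function splitAt(items, ratio) {
  if (!(ratio >= 0 && ratio <= 1)) {
    throw new RangeError("ratio must be between 0 and 1");
  }
  const cut = Math.floor(items.length * ratio);
  return {
    train: items.slice(0, cut), // first `cut` elements
    test: items.slice(cut),     // everything after the cut
  };
}

const parts = splitAt([1, 2, 3, 4, 5], 0.8);
// parts.train has 4 elements, parts.test has 1
```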
---
Pairing Features & Labels
```js
import { fromPairs } from "starlight-dataset";

const X = [[1], [2], [3]];
const y = [2, 4, 6];
const paired = fromPairs(X, y);

paired.toArray();
// [ { x: [1], y: 2 }, { x: [2], y: 4 }, { x: [3], y: 6 } ]
```
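Pairing amounts to zipping two arrays by index. A minimal sketch of that idea on plain arrays is shown here; `zipPairs` is a hypothetical name, and the length check is an assumption, since the README does not say how `fromPairs` handles mismatched lengths.

```js
// Hypothetical helper: zip features and labels into { x, y } records.
function zipPairs(features, labels) {
  // Assumption: mismatched lengths are treated as an error.
  if (features.length !== labels.length) {
    throw new Error("features and labels must have the same length");
  }
  return features.map((x, i) => ({ x, y: labels[i] }));
}

const records = zipPairs([[1], [2], [3]], [2, 4, 6]);
// [ { x: [1], y: 2 }, { x: [2], y: 4 }, { x: [3], y: 6 } ]
```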
---
Dataset API
| Method                   | Description             |
| ------------------------ | ----------------------- |
| `map(fn)`                | Transform each element  |
| `filter(fn)`             | Filter elements         |
| `shuffle(seed?)`         | Shuffle dataset         |
| `batch(size)`            | Create batches          |
| `split(ratio, shuffle?)` | Train/test split        |
| `take(n)`                | Take first n elements   |
| `skip(n)`                | Skip first n elements   |
| `repeat(times)`          | Repeat dataset          |
| `size()`                 | Dataset size            |
| `toArray()`              | Convert to array        |
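
The table lists take, skip, and repeat without examples. As a rough guide to what the descriptions suggest, the sketch below gives plain-array equivalents. The function names are hypothetical stand-ins, and reading repeat(times) as "concatenate that many copies" is an assumption, not confirmed behavior.

```js
// Hypothetical plain-array equivalents of take, skip, and repeat.
const takeFirst = (items, n) => items.slice(0, n); // first n elements
const skipFirst = (items, n) => items.slice(n);    // everything after the first n

// Assumption: repeating means concatenating `times` copies in order.
function repeatItems(items, times) {
  const out = [];
  for (let i = 0; i < times; i++) {
    out.push(...items);
  }
  return out;
}

const sample = [1, 2, 3, 4, 5];
const head = takeFirst(sample, 2);        // [1, 2]
const tail = skipFirst(sample, 3);        // [4, 5]
const looped = repeatItems([1, 2], 3);    // [1, 2, 1, 2, 1, 2]
```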