starlight-dataset

A lightweight dataset utility library for the Starlight Machine Learning ecosystem.
It provides a clean abstraction for holding data, batching, shuffling, and train/test splitting, and is designed to work with other Starlight ML packages.

---

Features

* Dataset abstraction (Dataset class)
* Immutable operations (map, filter, shuffle, etc.)
* Deterministic shuffling
* Batch generation
* Train / test split
* Works with regression, classification, clustering, and pipelines

---

Installation

```bash
npm install starlight-dataset
```

Or import directly in your Starlight environment:

```js
import { Dataset, dataset } from "starlight-dataset";
```

---



Basic Usage

Create a dataset from an array:

```js
import { dataset } from "starlight-dataset";

const ds = dataset([1, 2, 3, 4, 5]);
```

---



Transform and filter elements:

```js
const processed = ds
  .map(x => x * 2)
  .filter(x => x > 5);

processed.toArray();
// [6, 8, 10]
```
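
The `map` and `filter` calls above return new datasets rather than changing `ds`, which is what the Features list means by immutable operations. As a rough illustration of that pattern only, here is a minimal stand-in class. `TinyDataset` is a hypothetical name and its internals are assumptions; this is not the library's source.

```js
// Hypothetical stand-in for illustration; not starlight-dataset's implementation.
class TinyDataset {
  constructor(items) {
    // Copy and freeze so the stored array cannot be mutated in place.
    this.items = Object.freeze([...items]);
  }

  // Each operation builds a new TinyDataset instead of changing this one.
  map(fn) {
    return new TinyDataset(this.items.map(fn));
  }

  filter(fn) {
    return new TinyDataset(this.items.filter(fn));
  }

  toArray() {
    // Return a copy so the caller gets a plain, mutable array.
    return [...this.items];
  }
}

const source = new TinyDataset([1, 2, 3, 4, 5]);
const doubled = source.map(x => x * 2).filter(x => x > 5);

console.log(doubled.toArray()); // [ 6, 8, 10 ]
console.log(source.toArray());  // [ 1, 2, 3, 4, 5 ]
```

Because every call returns a fresh object, intermediate results can be kept and reused without worrying that a later step changed them.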





---



Shuffling

```js
const shuffled = ds.shuffle();
```

Deterministic shuffle with seed:

```js
const shuffled = ds.shuffle(0.42);
```
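
One common way to get a repeatable shuffle is to drive a Fisher-Yates shuffle with a seeded pseudo-random generator. The sketch below assumes that approach. This README does not say which algorithm `shuffle(seed)` uses, so `seededShuffle`, the mulberry32 generator, and the way the seed becomes an integer are all illustrative choices rather than the library's internals.

```js
// Small seeded PRNG (mulberry32); each call returns a float in [0, 1).
function mulberry32(state) {
  return function () {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), 1 | state);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: Fisher-Yates shuffle on a copy of the input.
function seededShuffle(items, seed) {
  // Assumption: map a numeric seed such as 0.42 onto a 32-bit integer state.
  const rng = mulberry32(Math.floor(seed * 0xffffffff) >>> 0);
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1)); // random index in [0, i]
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const data = [1, 2, 3, 4, 5];
const first = seededShuffle(data, 0.42);
const second = seededShuffle(data, 0.42);

console.log(first.join() === second.join()); // true
```

The same seed always produces the same order, which keeps experiments reproducible across runs.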





---



Batching

```js
const batches = ds.batch(2);

batches.toArray();
// [ [1, 2], [3, 4], [5] ]
```
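
The output above shows that the last batch may be smaller than the requested size. A plain-array sketch of that chunking behaviour follows; `toBatches` is a hypothetical helper and the argument check is an assumption, not something the library is known to do.

```js
// Hypothetical helper: split an array into consecutive chunks of `size`.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let start = 0; start < items.length; start += size) {
    // slice stops at the end of the array, so the final chunk may be short.
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

console.log(toBatches([1, 2, 3, 4, 5], 2)); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```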





---



Train / Test Split

```js
const { train, test } = ds.split(0.8);

train.size(); // 4
test.size();  // 1
```

Disable shuffle if needed:

```js
ds.split(0.8, false);
```
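
With shuffling disabled, a ratio split reduces to cutting the data at one index. The sketch below shows that idea on a plain array. `splitByRatio` is a hypothetical name, and rounding the cut point down with `Math.floor` is an assumption; the library may round differently.

```js
// Hypothetical helper: ordered train/test split with no shuffling.
function splitByRatio(items, ratio) {
  if (typeof ratio !== "number" || ratio < 0 || ratio > 1) {
    throw new RangeError("ratio must be between 0 and 1");
  }
  // Assumption: the train set gets floor(n * ratio) elements.
  const cut = Math.floor(items.length * ratio);
  return {
    train: items.slice(0, cut),
    test: items.slice(cut),
  };
}

const parts = splitByRatio([1, 2, 3, 4, 5], 0.8);

console.log(parts.train); // [ 1, 2, 3, 4 ]
console.log(parts.test);  // [ 5 ]
```

Keeping the original order matters for data such as time series, where shuffling before the split would leak future values into the training set.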





---



Pairing Features & Labels

```js
import { fromPairs } from "starlight-dataset";

const X = [[1], [2], [3]];
const y = [2, 4, 6];

const paired = fromPairs(X, y);

paired.toArray();
// [ { x: [1], y: 2 }, { x: [2], y: 4 }, { x: [3], y: 6 } ]
```
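
The pairing shown above is an index-wise zip of features and labels. A minimal plain-array sketch of the same shape follows; `pairUp` is a hypothetical helper, and the length check is an assumption about sensible behaviour rather than documented behaviour of `fromPairs`.

```js
// Hypothetical helper: zip features and labels into { x, y } records.
function pairUp(features, labels) {
  if (features.length !== labels.length) {
    throw new Error("features and labels must have the same length");
  }
  return features.map((x, i) => ({ x, y: labels[i] }));
}

const records = pairUp([[1], [2], [3]], [2, 4, 6]);

console.log(records[0]);     // { x: [ 1 ], y: 2 }
console.log(records.length); // 3
```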





---



Dataset API

| Method                   | Description             |
| ------------------------ | ----------------------- |
| `map(fn)`                | Transform each element  |
| `filter(fn)`             | Filter elements         |
| `shuffle(seed?)`         | Shuffle dataset         |
| `batch(size)`            | Create batches          |
| `split(ratio, shuffle?)` | Train/test split        |
| `take(n)`                | Take first `n` elements |
| `skip(n)`                | Skip first `n` elements |
| `repeat(times)`          | Repeat dataset          |
| `size()`                 | Dataset size            |
| `toArray()`              | Convert to array        |
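
`take`, `skip`, and `repeat` are listed in the table but not demonstrated elsewhere in this README. The plain-array sketch below shows one plausible reading of each. The helper functions are hypothetical, and the assumption that `repeat(times)` yields `times` total copies in order is a guess, not confirmed behaviour.

```js
// Hypothetical array equivalents of the table entries; none mutate the input.
const take = (items, n) => items.slice(0, n); // first n elements
const skip = (items, n) => items.slice(n);    // everything after the first n

// Assumption: `times` is the total number of copies, concatenated in order.
const repeat = (items, times) =>
  Array.from({ length: times }, () => items).flat();

const values = [1, 2, 3, 4, 5];

console.log(take(values, 2));   // [ 1, 2 ]
console.log(skip(values, 2));   // [ 3, 4, 5 ]
console.log(repeat([1, 2], 2)); // [ 1, 2, 1, 2 ]
```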

---

Designed for Starlight ML

This package integrates naturally with:

* starlight-ml
* starlight-vec
* starlight-classifier
* starlight-regression
* starlight-pipeline
* starlight-train (future)

---

Philosophy

* Simple over clever
* Immutable over mutable
* Readable over magical
* Educational yet production-ready

---

License

MIT © Dominex Macedon