A TypeScript client for Apache Spark Connect. Construct Spark logical plans entirely in TypeScript and run them against a Spark Connect server.

Install: `npm install ts-spark-connector`
- Build Spark logical plans using a fluent, PySpark-style API in TypeScript
- Evaluate transformations locally or stream results via Arrow
- Tagless Final DSL design with support for multiple backends (see the toy sketch after this list)
- Composable, immutable, and strongly typed DataFrame operations
- Column expressions (col, .gt, .alias, .and, etc.)
- Compatible with the Spark Connect Protobuf protocol and a standard server started via `spark-submit --class org.apache.spark.sql.connect.service.SparkConnectServer`
- Set operations (UNION, INTERSECT, EXCEPT) with by_name, is_all, and allow_missing_columns
- Spark-compatible joins with configurable join types
- Session-aware execution (no global singletons)
- Plan viz / AST dump: export client AST to JSON & Mermaid
- Ready-to-run examples in examples/
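The Tagless Final bullet deserves a quick illustration. Below is a toy sketch of the encoding in TypeScript — an abstract algebra of relational operations with swappable interpreters. It is illustrative only, not this library's actual interfaces:

```ts
// Toy tagless-final sketch (illustrative; not ts-spark-connector's real API).
// Programs are written against an abstract algebra and gain meaning only when
// an interpreter is supplied, e.g. a proto-emitting backend vs. a pretty-printer.
interface RelAlg<Rel> {
    scan(path: string): Rel;
    filter(rel: Rel, predicate: string): Rel;
    limit(rel: Rel, n: number): Rel;
}

// A program is polymorphic in the interpreter's carrier type.
const program = <Rel>(alg: RelAlg<Rel>): Rel =>
    alg.limit(alg.filter(alg.scan("/data/people.tsv"), "amount > 100"), 5);

// One backend renders a readable plan string; another could emit protobuf.
const printer: RelAlg<string> = {
    scan: (p) => `Scan(${p})`,
    filter: (r, pred) => `Filter(${pred}, ${r})`,
    limit: (r, n) => `Limit(${n}, ${r})`,
};

console.log(program(printer)); // Limit(5, Filter(amount > 100, Scan(/data/people.tsv)))
```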
```bash
git clone https://github.com/BaldrVivaldelli/ts-spark-connector
cd ts-spark-connector
npm install
```
> You need a running Spark Connect server. See spark-server/README.md for a ready-to-use Docker setup, or run your own server.
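If you have a local Spark 3.5.x distribution instead, the stock launcher script starts a Connect server too (the pinned artifact version below is illustrative):

```bash
# From the root of a Spark 3.5.x distribution; pulls the Spark Connect plugin.
./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1
```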
```ts
import { SparkSession } from "./src/client/session";
import { col } from "./src/engine/column";

const session = SparkSession.builder()
    // optional: auth / TLS
    .getOrCreate();

const people = session.read
    .option("delimiter", "\t")
    .option("header", "true")
    .csv("/data/people.tsv");

const purchases = session.read
    .option("delimiter", "\t")
    .option("header", "true")
    .csv("/data/purchases.tsv");

await people
    .join(purchases, col("id").eq(col("user_id")), "left")
    .select("name", "product", "amount")
    .filter(col("amount").gt(100))
    .show();
```
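Aggregations follow the same fluent style. A minimal sketch, assuming a PySpark-like `agg({column: function})` map as hinted by the feature table below (the exact argument shape is an assumption):

```ts
// Total amount spent per user (assumes PySpark-style agg semantics).
await purchases
    .groupBy("user_id")
    .agg({ amount: "sum" })
    .show();
```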
```ts
const p2024 = purchases.filter(col("year").eq(2024));
const p2025 = purchases.filter(col("year").eq(2025));

await p2024.union(p2025, { is_all: true, by_name: false })
    .limit(5)
    .show();
```
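When the two sides' schemas have drifted, columns can be matched by name instead of position. A sketch using the option names from the feature list above (defaults and exact behavior are assumptions, presumed to mirror Spark's unionByName with allowMissingColumns):

```ts
// Match columns by name, keep duplicate rows, and tolerate columns that
// exist on only one side.
await p2024
    .union(p2025, { is_all: true, by_name: true, allow_missing_columns: true })
    .show();
```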
Inspect the client-side plan (before server optimization):
```ts
const df = purchases
    .select("user_id", "product", "amount")
    .filter(col("amount").gt(100))
    .orderBy(col("user_id").descNullsLast());

console.log(df.toClientASTJSON());        // JSON AST
console.log(df.toClientASTMermaid());     // Mermaid diagram
console.log(df.toSparkLogicalPlanJSON()); // Client logical plan
console.log(df.toProtoJSON());            // Spark Connect proto
```
> Tip: write these strings to disk (.mmd, .json) and publish them as CI artifacts.
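For example, with Node's built-in fs (file names are illustrative; this assumes the dump methods return strings, as the console.log calls above suggest):

```ts
import { writeFileSync } from "node:fs";

// Persist the plan dumps so CI can publish them as build artifacts.
writeFileSync("plan.mmd", df.toClientASTMermaid());
writeFileSync("plan.ast.json", df.toClientASTJSON());
writeFileSync("plan.proto.json", df.toProtoJSON());
```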
Enable TLS when the server requires it:

```ts
const session = SparkSession.builder()
    .enableTLS({
        keyStorePath: "./certs/keystore.p12",
        keyStorePassword: "password",
        trustStorePath: "./certs/cert.crt",
        trustStorePassword: "password",
    })
    .getOrCreate();
```
| Component | Supported / Tested |
|------------------|------------------------------------|
| Spark Connect | 3.5.x |
| Scala ABI (JAR) | 2.12 (spark-connect_2.12) |
| Node.js | 18, 20, 22 |
| OS | Linux (CI); macOS (local) |
> Planned: add CI jobs for macOS/Windows; update table as coverage expands.
| Feature                                                    | Supported                    |
|------------------------------------------------------------|------------------------------|
| CSV Reading                                                | ✅ |
| Filtering                                                  | ✅ |
| Projection / Alias                                         | ✅ |
| Arrow decoding (.show())                                   | ✅ |
| Column expressions (col, .gt, .and, .alias, etc.)          | ✅ |
| DSL abstraction (Tagless Final)                            | ✅ |
| Joins (configurable types)                                 | ✅ |
| Aggregation (groupBy().agg({...}))                         | ✅ |
| Distinct (distinct(), dropDuplicates(...))                 | ✅ |
| Sorting (orderBy(...), sort(...))                          | ✅ |
| Limit (limit(n))                                           | ✅ |
| Set operations (UNION, INTERSECT, EXCEPT)                  | ✅ |
| Column renaming (withColumnRenamed(...))                   | ✅ |
| Type declarations (.d.ts)                                  | ✅ |
| Modular compiler core (backend-agnostic)                   | ✅ |
| Tests (Unit + Integration + E2E)                           | ✅ |
| withColumn(...)                                            | ✅ |
| when(...).otherwise(...)                                   | ✅ |
| Window functions                                           | ✅ |
| Null handling (isNull, na.drop/fill/replace)               | ✅ |
| Parquet Reading                                            | ✅ |
| JSON Reading                                               | ✅ |
| DataFrameWriter (CSV/JSON/Parquet/ORC/Avro)                | ✅ |
| Write partitionBy, bucketBy, sortBy                        | ✅ |
| describe(), summary()                                      | ✅ |
| unionByName(...)                                           | ✅ |
| Complex types + explode/posexplode                         | ✅ |
| JSON helpers (from_json, to_json)                          | ✅ |
| repartition(...) / coalesce(...)                           | ✅ |
| explain(...) (simple/extended/formatted)                   | ✅ |
| SparkSession.builder.config(...)                           | ✅ |
| Auth/TLS for Spark Connect                                 | ✅ |
| spark.sql(...)                                             | ✅ |
| Temp views (createOrReplaceTempView)                       | ✅ |
| Catalog (read.table, saveAsTable)                          | ✅ |
| Plan viz / AST dump                                        | ✅ |
| cache() / persist() / unpersist()                          | ⚠️ Limited by Spark Connect |
| Join hints (broadcast, etc.)                               | ✅ |
| sample(...), randomSplit(...)                              | ✅ |
| UDF (scalar)                                               | ⚠️ Limited by Spark Connect |
| UDAF / Vectorized UDF (Arrow)                              | ⚠️ Limited by Spark Connect |
| Structured Streaming                                       | ✅ |
| Watermark / triggers / output modes                        | ✅ |
| Lakehouse: Delta/Iceberg/Hudi                              | ❌ |
| JDBC read/write                                            | ❌ |
| MLlib                                                      | ❌ |
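Since temp views and spark.sql(...) are both checked off above, a DataFrame can be registered and queried with SQL. A minimal sketch, assuming session.sql(...) returns a DataFrame the way PySpark's spark.sql does (the view name and query are illustrative):

```ts
// Register the DataFrame under a name visible to SQL, then query it.
purchases.createOrReplaceTempView("purchases");

const totals = session.sql(
    "SELECT product, SUM(amount) AS total FROM purchases GROUP BY product"
);
await totals.show();
```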
License: Apache-2.0