Library for processing lineage from SQL
npm install @meta-sql/lineageA TypeScript library for extracting column-level lineage from SQL queries, implementing the OpenLineage Column Lineage Dataset Facet specification.
> ⚠️ Experimental: This library is currently in active development and may undergo significant changes. APIs, interfaces, and functionality may change without notice in future versions. Use with caution in production environments.
This library analyzes SQL SELECT statements to generate detailed column-level lineage information, tracking how data flows from input columns to output columns through various transformations like joins, aggregations, filters, and CTEs (Common Table Expressions).
- ✅ Column-level lineage extraction from SQL SELECT statements
- ✅ CTE (Common Table Expression) support with nested lineage tracking
- ✅ Direct transformations (IDENTITY)
- ✅ Schema-aware parsing with table and column validation
- ✅ OpenLineage specification compliance for interoperability
- ✅ TypeScript-first with comprehensive type definitions
``bash`
npm install @meta-sql/lineage node-sql-parseror
bun add @meta-sql/lineage node-sql-parser
`typescript
import { getLineage } from "@meta-sql/lineage";
import { Parser } from "node-sql-parser";
const parser = new Parser();
const ast = parser.astify("SELECT id, name FROM users") as Select;
const schema = {
namespace: "my_database",
tables: [{ name: "users", columns: ["id", "name", "email"] }],
};
const lineage = getLineage(ast, schema);
console.log(lineage);
// Output:
// {
// id: {
// inputFields: [{
// namespace: "my_database",
// name: "users",
// field: "id",
// transformations: [{ type: "DIRECT", subtype: "IDENTITY" }]
// }]
// },
// name: {
// inputFields: [{
// namespace: "my_database",
// name: "users",
// field: "name",
// transformations: [{ type: "DIRECT", subtype: "IDENTITY" }]
// }]
// }
// }
`
- Basic SELECT statements
- Column aliases (SELECT id as user_id)
- Common Table Expressions (CTEs)
- Nested subqueries
- Simple column references
Our development roadmap aligns with the OpenLineage Column Lineage Dataset Facet specification:
- ✅ DIRECT/TRANSFORMATION support for computed columns
- ✅ Mathematical operations (SELECT price * quantity)SELECT UPPER(name)
- ✅ String functions ()SELECT DATE_ADD(created_at, INTERVAL 1 DAY)
- ✅ Date functions ()COUNT
- ✅ DIRECT/AGGREGATION support for aggregation functions
- ✅ Basic aggregations (, SUM, AVG, MIN, MAX)SELECT MD5(email)
- ✅ Masking detection for privacy-preserving transformations
- ✅ Hash functions ()SELECT ANONYMIZE(ssn)
- ✅ Anonymization functions ()
- [ ] INDIRECT/JOIN lineage tracking
- Track columns used in JOIN conditions
- Multi-table relationship mapping
- [ ] INDIRECT/FILTER for WHERE clause dependencies
- Identify filtering columns that affect output
- [ ] INDIRECT/GROUP_BY for grouping dependencies
- Track GROUP BY columns impact on aggregations
- [ ] INDIRECT/SORT for ORDER BY clause tracking
- [ ] INDIRECT/WINDOW for window function dependencies
- [ ] INDIRECT/CONDITION for CASE WHEN and IF statements
- [ ] Complex JOIN types (LEFT, RIGHT, FULL OUTER)
- [ ] UNION and INTERSECT operations
- ✅ Recursive CTEs support
- [ ] Dataset-level lineage for operations affecting entire datasets
- [ ] Multi-statement support (DDL operations)
- ✅ Multiple SQL dialect support (PostgreSQL, MySQL, BigQuery, Snowflake)
Extracts column lineage from a SQL SELECT AST.
Parameters:
- select: Parsed SQL SELECT statement from node-sql-parserschema
- : Schema definition with table and column information
Returns: Column lineage mapping conforming to OpenLineage specification
`typescript
type Schema = {
namespace: string;
tables: Table[];
};
type Table = {
name: string;
columns: string[];
};
``
MIT License - see LICENSE for details.