# AutoSQL
An auto-parser of JSON into SQL, with a database class-based architecture.
```bash
npm install autosql
```
---
## 📖 Table of Contents
- Supported SQL Dialects
- Quick Start
- Configuration
- Metadata Format
- SSH Support
- Insert Options
- Core Classes
- Database Class
- Convenience Utilities
- Type Inference & Normalisation
- Config & Metadata Tools
- Metadata Inference & Preparation
- Insert Planning & Execution
- Docker & Local Configuration
---
## 🧬 Supported SQL Dialects
AutoSQL supports:
- MySQL (via mysql2)
- PostgreSQL (via pg)
Optional support for SSH tunneling is available via:
- ssh2
---
## ⚡ Quick Start
```ts
import { Database } from 'autosql';

const config = {
  sqlDialect: 'mysql',
  host: 'localhost',
  user: 'root',
  password: 'root',
  database: 'mysql',
  port: 3306
};

const data = [
  { id: 1, name: 'Alice', created_at: '2024-01-01' },
  { id: 2, name: 'Bob', created_at: '2024-01-02' }
];

const db = Database.create(config);
await db.establishConnection();

// Option 1: Direct insert if schema already exists or is managed externally
await db.autoInsertData({ table: 'target_table', data });

// Option 2: Fully automated schema + insert workflow
await db.autoSQL('target_table', data);

await db.closeConnection();
```
AutoSQL will:
- Infer metadata and key structure
- Create or alter the target table
- Batch insert rows
- Handle dialect-specific quirks automatically
- Automatically manage timestamps and optional history tracking (if configured)
---
## ⚙️ Configuration
```ts
export interface DatabaseConfig {
  // Required connection settings
  sqlDialect: 'mysql' | 'pgsql';
  host?: string;
  user?: string;
  password?: string;
  database?: string;
  port?: number;

  // Optional table target – all settings below this point are optional
  schema?: string;
  table?: string;

  // Metadata control
  metaData?: { [tableName: string]: MetadataHeader };
  existingMetaData?: { [tableName: string]: MetadataHeader };
  updatePrimaryKey?: boolean;
  primaryKey?: string[];

  // Table creation and charset settings
  engine?: string;
  charset?: string;
  collate?: string;
  encoding?: string;

  // Type inference controls
  pseudoUnique?: number; // The % of values that must be unique to be considered pseudo-unique – defaults to 0.9 (90%)
  autoIndexing?: boolean; // Automatically identify and add indexes when altering/creating tables – defaults to TRUE
  decimalMaxLength?: number; // Automatically round decimals to a maximum of X decimal places – defaults to 10
  maxKeyLength?: number; // Prevents indexes/primary keys from using columns longer than this – defaults to 255
  maxVarcharLength?: number; // Caps varchar columns at this length; longer varchar columns are automatically converted to text – defaults to 1024 characters

  // Sampling controls
  sampling?: number; // If the data exceeds samplingMinimum rows, sample this fraction of values when identifying uniques and column types – defaults to 0, accepts values between 0 and 1
  samplingMinimum?: number; // Row count above which sampling kicks in – defaults to 100

  // Insert strategy
  insertType?: 'UPDATE' | 'INSERT'; // UPDATE automatically replaces non-primary-key values when duplicate keys are found
  insertStack?: number; // Maximum number of rows to insert in one query – defaults to 100
  safeMode?: boolean; // Prevents tables from being altered – defaults to false
  deleteColumns?: boolean; // Allows columns to be dropped if needed – defaults to false

  // Timestamp columns
  addTimestamps?: boolean; // If TRUE, runs ensureTimestamps as part of autoSQL, adding dwh_created_at, dwh_modified_at and dwh_loaded_at timestamp columns that are filled automatically – defaults to TRUE

  // Optional advanced insert modes
  useStagingInsert?: boolean; // Enable the temporary staging-table insert pattern (if supported) – defaults to TRUE
  addHistory?: boolean; // Automatically duplicate rows into history tables before overwrites – defaults to FALSE
  historyTables?: string[]; // Names of the tables to have history tracked – pairs with addHistory above
  autoSplit?: boolean; // Automatically split large datasets (columns) across multiple tables if needed – defaults to FALSE
  addNested?: boolean; // Extracts nested JSON values into separate tables with composite primary keys – defaults to FALSE
  nestedTables?: string[]; // Nested table names to apply nested extraction on – if nesting columnA on tableB, this would be ['tableB_columnA']
  excludeBlankColumns?: boolean; // Exclude columns from insert queries if all their values are null or undefined – defaults to TRUE

  // Performance scaling
  useWorkers?: boolean;
  maxWorkers?: number;

  // SSH tunneling support
  sshConfig?: SSHKeys;
  sshStream?: ClientChannel | null;
  sshClient?: SSHClient;
}
```
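For orientation, here is a minimal sketch of a fuller configuration. The values are illustrative assumptions, not recommended defaults, and it presumes `DatabaseConfig` is exported alongside `Database`:
```ts
import { DatabaseConfig } from 'autosql';

// Illustrative values only – everything below `port` is optional.
const config: DatabaseConfig = {
  sqlDialect: 'pgsql',
  host: 'localhost',
  user: 'postgres',
  password: 'postgres',
  database: 'analytics',
  port: 5432,
  schema: 'public',
  insertType: 'UPDATE',  // replace non-primary-key values on duplicate keys
  insertStack: 500,      // batch up to 500 rows per INSERT
  pseudoUnique: 0.95,    // treat a column as pseudo-unique at 95% uniqueness
  maxVarcharLength: 512, // varchar columns longer than this become text
  addTimestamps: true    // manage dwh_created_at / dwh_modified_at / dwh_loaded_at
};
```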
---
## 🧠 Metadata Format
AutoSQL can infer metadata from your data, or you can specify it manually:
```ts
meta_data: [
  {
    created_at: {
      type: 'datetime',
      length: 0,
      allowNull: true,
      default: 'CURRENT_TIMESTAMP',
      index: true
    }
  },
  {
    name: {
      type: 'varchar',
      length: 50,
      allowNull: false,
      unique: true,
      primary: true
    }
  }
]
```
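Manually defined metadata can then be supplied wherever a MetadataHeader is expected – for example as the optional initial metadata argument of autoConfigureTable (shown under the Database class below). A minimal sketch, assuming a flat column-name-to-definition map is accepted:
```ts
// Hypothetical column definitions mirroring the format above.
const initialMeta = {
  name: { type: 'varchar', length: 50, allowNull: false, unique: true, primary: true },
  created_at: { type: 'datetime', length: 0, allowNull: true, default: 'CURRENT_TIMESTAMP', index: true }
};
const sampleData = [{ name: 'Alice', created_at: '2024-01-01' }];

await db.autoConfigureTable('target_table', sampleData, null, initialMeta);
```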
---
## 🔐 SSH Support
AutoSQL supports SSH tunneling for connecting to remote MySQL or PostgreSQL servers via an intermediate gateway.
Include the SSH configuration inside your DatabaseConfig object under the sshConfig key. AutoSQL will automatically establish the tunnel when establishConnection() is called.
```ts
const config: DatabaseConfig = {
  ...
  sshConfig: {
    username: 'ssh_user',
    host: 'remote_host',
    port: 22,
    password: 'password',
    private_key: 'PRIVATE_KEY_STRING',
    private_key_path: '/path/to/key.pem',
    source_address: 'localhost',
    source_port: 3306,
    destination_address: 'remote_sql_host',
    destination_port: 3306
  }
};
const db = Database.create(config);
await db.establishConnection();
// Tunnel is now active and the DB connection is routed through it
```
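Before running real work through the tunnel, you can verify it with testConnection() (documented under Core Methods below) – a minimal sketch:
```ts
// testConnection() returns a boolean, so the tunnel and credentials
// can be sanity-checked before any inserts are attempted.
const reachable = await db.testConnection();
if (!reachable) {
  throw new Error('Could not reach the database through the SSH tunnel');
}
```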
---
## 📦 Insert Options
These control how data is batched, inserted, and optionally how schema alterations are handled.
### Insert Strategy
- insertType: 'UPDATE' | 'INSERT'
Determines behaviour on duplicate keys. UPDATE replaces non-primary key values with new ones. Defaults to 'INSERT'.
- insertStack: number
Maximum number of rows to insert in a single query. Defaults to 100.
- safeMode: boolean
If true, prevents any table alterations during runtime. Defaults to false.
- deleteColumns: boolean
Allows dropping of existing columns when altering tables. Defaults to false.
---
### Timestamp Columns
- addTimestamps: boolean
If true, automatically adds and manages the following timestamp columns:
- dwh_created_at
- dwh_modified_at
- dwh_loaded_at
These are injected and updated during insert operations. Defaults to true.
AutoSQL also checks for a variety of common timestamp column names and only adds its own equivalent if none already exists in the data. For example, the modified timestamp checks for modified_at, modify_at, modified_date, update_date, etc.
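For example, to opt out of managed timestamps entirely (a sketch; the default is true):
```ts
const config: DatabaseConfig = {
  sqlDialect: 'pgsql',
  // ...connection settings...
  addTimestamps: false // no dwh_created_at / dwh_modified_at / dwh_loaded_at columns are added
};
```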
---
### Advanced Insert Modes
- useStagingInsert: boolean
Enables a staging table strategy where data is first inserted into a temporary table before being merged into the target. Useful for large or high-concurrency environments. Defaults to true.
- addHistory: boolean
If enabled, before overwriting rows (in UPDATE mode), AutoSQL writes the previous version into a corresponding history table. Defaults to false.
- historyTables: string[]
List of table names to track with history inserts. Used in conjunction with addHistory.
- autoSplit: boolean
Automatically splits datasets across multiple tables when the row size or column count exceeds allowed limits. Prevents failed inserts due to row-size limits. Defaults to false.
- addNested: boolean
If enabled, AutoSQL will extract nested objects or arrays from a field and insert them into a separate table.
Defaults to false.
- excludeBlankColumns: boolean
When enabled, columns that contain only null or undefined values across all rows are excluded from the generated insert queries and parameter lists. This helps to avoid inserting empty data unnecessarily.
Defaults to true.
- nestedTables: string[]
Used in conjunction with addNested. Specifies which nested structures should be extracted and written into their own relational tables.
Format: each entry should follow the pattern <table>_<column> (for example, nesting columnA on tableB gives tableB_columnA).
For each entry:
- If the dataset includes a table that matches <table>,
- And that table contains a column named <column>,
- And the column contains a JSON object or an array of JSON objects,
- AutoSQL will extract the nested structure into a new table named <table>_<column>
Behaviour:
- The new nested table will include the parent row's primary key (e.g., row1_id) to maintain relationships
- The nested object will define the child table's schema
- Arrays will be flattened – each item becomes a separate row in the nested table
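Putting the nested options together – a hedged sketch nesting columnA on tableB, using the naming pattern from the configuration comments above:
```ts
const config: DatabaseConfig = {
  sqlDialect: 'mysql',
  // ...connection settings...
  addNested: true,
  nestedTables: ['tableB_columnA'] // extract tableB.columnA into its own table
};

// A tableB row such as:
//   { id: 1, columnA: [{ sku: 'A1' }, { sku: 'A2' }] }
// would yield two rows in tableB_columnA, each carrying tableB's primary key.
```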
### Performance Scaling
- useWorkers: boolean
Enables parallel worker threads for inserting batches. Improves performance with large datasets. Defaults to true.
- maxWorkers: number
Maximum number of concurrent workers to use during insertion. Must be used with useWorkers. Defaults to 8.
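A short sketch scaling inserts with workers (illustrative values):
```ts
const config: DatabaseConfig = {
  sqlDialect: 'pgsql',
  // ...connection settings...
  useWorkers: true, // parallelise batch inserts across worker threads
  maxWorkers: 4     // cap concurrency below the default of 8
};
```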
---
## 📚 Core Classes
### Database (with AutoSQL Utilities)
The Database class is the primary entry point into AutoSQL's workflow. It handles connection management and exposes high-level autoSQL methods for automated insertions, table creation, and metadata handling.
```ts
import { Database } from 'autosql';

const db = Database.create(config);
await db.establishConnection();

await db.autoConfigureTable(
  'target_table', // table name
  sampleData,     // raw input data
  null,           // optional existing metadata
  initialMeta     // optional manually defined metadata
);
```
This is the core interface for managing connections, generating queries, and executing inserts.
### Methods
#### 🔸 Static Method
- Database.create(config) – Returns an instance of either MySQLDatabase or PostgresDatabase based on config.
#### 🔹 Core Methods
- getConfig() – Returns the full DatabaseConfig used to initialise this instance.
- updateSchema(schema: string) – Updates the current schema name being used.
- getDialect() – Returns the SQL dialect (mysql or pgsql).
- establishConnection() – Creates and stores a live database connection.
- testConnection() – Attempts to connect and returns success as a boolean.
- runQuery(queryOrParams: QueryInput | QueryInput[]) – Executes a SQL query or list of queries.
- startTransaction() / commit() / rollback() – Manages manual transaction blocks.
- runTransaction(queries: QueryInput[]) – Runs multiple queries inside a single transaction.
- runTransactionsWithConcurrency(queryGroups: QueryInput[][]) – Runs multiple query batches in parallel.
- closeConnection() – Safely closes the active DB connection.
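A sketch of managed versus manual transactions – whether QueryInput accepts raw SQL strings (rather than structured objects) is an assumption:
```ts
// Managed: several statements in one transaction.
await db.runTransaction([
  "UPDATE accounts SET balance = balance - 100 WHERE id = 1",
  "UPDATE accounts SET balance = balance + 100 WHERE id = 2"
]);

// Manual: the same flow with explicit control.
await db.startTransaction();
try {
  await db.runQuery("UPDATE accounts SET balance = balance - 100 WHERE id = 1");
  await db.runQuery("UPDATE accounts SET balance = balance + 100 WHERE id = 2");
  await db.commit();
} catch (err) {
  await db.rollback();
  throw err;
}
```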
#### 🔹 Table and Schema Methods
- checkSchemaExists(schemaName: string) – Returns whether the given schema exists.
- createSchema(schemaName: string) – Creates the schema if it doesn't exist already.
- createTableQuery(table: string, headers: MetadataHeader) – Returns QueryInput[] to create a table.
- alterTableQuery(table: string, oldHeaders: MetadataHeader, newHeaders: MetadataHeader) – Returns QueryInput[] to alter an existing table.
- dropTableQuery(table: string) – Returns a QueryInput to drop a table.
- getTableMetaData(schema: string, table: string) – Fetches current metadata from the DB for a given table.
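A dry-run sketch combining these – eventsMeta is a hypothetical MetadataHeader, and treating createTableQuery as synchronous is an assumption:
```ts
// Ensure the target schema exists first.
if (!(await db.checkSchemaExists('analytics'))) {
  await db.createSchema('analytics');
}

// Hypothetical metadata for the new table.
const eventsMeta = {
  id: { type: 'int', length: 11, allowNull: false, primary: true },
  payload: { type: 'text', allowNull: true }
};

// Build the CREATE TABLE statements without executing them, inspect, then run.
const createQueries = db.createTableQuery('events', eventsMeta);
console.log(createQueries);
await db.runQuery(createQueries);
```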
#### 🔹 AutoSQL Methods (Exposed on db)
- autoSQL(table: string, data: Record<string, any>[])
The simplest way to handle everything – metadata inference, schema changes, batching, inserting, history, workers, and nested structures – in one call.
Designed for production-ready automation and one-liner ingestion.
- autoInsertData(inputOrTable: InsertInput | string, inputData?: Record<string, any>[], inputRunQuery?: boolean)
Executes a full insert using the dialect-aware batching engine.
If inputRunQuery is true, queries are executed via runTransactionsWithConcurrency().
If false, a list of insert queries (QueryInput[]) is returned without running them.
- autoConfigureTable(inputOrTable: InsertInput | string, data?: Record<string, any>[], currentMetaData?: MetadataHeader | null, newMetaData?: MetadataHeader, runQuery?: boolean)
Determines whether a table should be created or altered based on metadata comparison.
If runQuery is true, schema changes are applied immediately via runTransactionsWithConcurrency().
If false, queries are returned for inspection.
- autoCreateTable(table: string, newMetaData: MetadataHeader, tableExists?: boolean, runQuery = true)
Creates a new table with the provided metadata.
If runQuery is false, returns the CREATE TABLE queries without executing them.
- autoAlterTable(table: string, tableChanges: AlterTableChanges, tableExists?: boolean, runQuery = true)
Alters an existing table using a computed diff.
Like above, runQuery controls whether to return or execute the queries.
- fetchTableMetadata(table: string)
Looks up metadata for the given table and returns { currentMetaData, tableExists }.
Used internally for decisions about schema creation or alteration.
- splitTableData(table: string, data: Record<string, any>[])
If autoSplit is enabled, splits a wide dataset across multiple smaller tables.
Returns an array of InsertInput instructions for multi-table insert execution.
- handleMetadata(table: string, data: Record<string, any>[])
Combines metadata inference and comparison into one call.
Returns an object with:
- currentMetaData: existing table metadata from the DB
- newMetaData: metadata inferred from new data
- mergedMetaData: result of merging existing and new metadata
- initialComparedMetaData: diff result, if any
- changes: schema changes needed for alignment
- getMetaData(config: DatabaseConfig, data: Record<string, any>[])
Analyses sample data and returns a metadata map with type, length, nullability, uniqueness, and key suggestions.
- compareMetaData(oldMeta: MetadataHeader, newMeta: MetadataHeader)
Compares two metadata structures and returns:
- changes: an AlterTableChanges diff object
- updatedMetaData: the merged metadata structure
Each method is designed to work with the same Database instance.
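A hedged sketch chaining the above for a dry run – inspect what autoSQL would change before anything executes (tableChangesExist is documented under Convenience Utilities below; importing it from the package root is an assumption):
```ts
import { tableChangesExist } from 'autosql';

const data = [{ id: 1, name: 'Alice' }];

// Infer metadata from the incoming rows and diff it against the live table.
const { changes } = await db.handleMetadata('target_table', data);

// Generate ALTER statements without executing them (runQuery = false).
if (changes && tableChangesExist(changes)) {
  const alterQueries = await db.autoAlterTable('target_table', changes, true, false);
  console.log(alterQueries); // review before applying
}

// Likewise, build the batched INSERT queries without running them.
const insertQueries = await db.autoInsertData({ table: 'target_table', data }, undefined, false);
```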
---
## 🧰 Convenience Utilities
AutoSQL exposes utilities that power autoSQL and can be used independently. These include metadata analysis, SQL formatting, batching, config validation, and more.
### Type Inference & Normalisation
- predictType(value) – Predicts SQL-compatible type (varchar, datetime, int, etc.) based on a single input value.
- collateTypes(typeSetOrArray) – Accepts a Set or Array of types and returns a single compatible SQL type.
- normalizeNumber(input, thousands, decimal) – Standardises numeric values to SQL-safe format with optional locale indicators.
- calculateColumnLength(column, value, sqlLookup) – Dynamically computes and updates column length and decimal precision based on input data.
- shuffleArray(array) – Randomly reorders an array (used for sampling).
- isObject(val) – Type-safe check to determine if a value is a non-null object.
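A quick sketch of the type helpers – the return values shown in comments are illustrative assumptions:
```ts
import { predictType, collateTypes, normalizeNumber } from 'autosql';

predictType('2024-01-01'); // e.g. 'date' or 'datetime'
predictType(42);           // e.g. 'int'

// Reconcile mixed observations for one column into a single compatible type.
collateTypes(new Set(['int', 'decimal'])); // e.g. 'decimal'

// Normalise a locale-formatted number ('.' thousands, ',' decimal) to a SQL-safe value.
normalizeNumber('1.234,56', '.', ','); // e.g. '1234.56'
```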
### Config & Metadata Tools
- validateConfig(config) – Validates and merges the provided DatabaseConfig with default settings.
- mergeColumnLengths(lengthA, lengthB) – Chooses the greater length definition between two metadata column states.
- setToArray(set) – Converts a Set to a regular array.
- normalizeKeysArray(keys) – Flattens and sanitizes arrays of key strings (e.g., for primary keys).
- isValidDataFormat(data) – Checks if the input is a valid array of plain objects suitable for inserts.
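A small sketch of the validation helpers, assuming they are exported from the package root:
```ts
import { validateConfig, isValidDataFormat } from 'autosql';

// Merge user-supplied settings with AutoSQL's defaults.
const merged = validateConfig({ sqlDialect: 'mysql', host: 'localhost' });

// Guard inserts: only arrays of plain objects should be passed on.
const rows = [{ id: 1 }, { id: 2 }];
if (!isValidDataFormat(rows)) {
  throw new Error('Expected an array of plain row objects');
}
```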
### Metadata Inference & Preparation
- initializeMetaData(headers) – Constructs a default metadata object from column headers with default flags and null types.
- getDataHeaders(data, config) – Scans sample data to derive column names and infer initial metadata.
- predictIndexes(metaData, maxKeyLength?, primaryKey?, sampleData?) – Suggests primary keys, unique constraints, and indexes based on uniqueness, length limits, or configured priorities.
- updateColumnType(existingMeta, newValue) – Adjusts the type and attributes of a column based on new sample input.
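A sketch of the inference helpers – whether getDataHeaders accepts a partial config is an assumption:
```ts
import { getDataHeaders, predictIndexes } from 'autosql';

const data = [
  { id: 1, email: 'a@example.com' },
  { id: 2, email: 'b@example.com' }
];

// Derive column names and initial metadata from the sample.
const headers = getDataHeaders(data, { sqlDialect: 'mysql' });

// Suggest primary keys, uniques and indexes, capping key columns at 255 characters.
const withIndexes = predictIndexes(headers, 255, undefined, data);
```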
### Insert Planning & Execution
- splitInsertData(data, config) – Splits large datasets into batches that meet size and row count constraints.
- getInsertValues(metaData, row, dialectConfig) – Extracts a single row's values as a SQL-safe array, accounting for dialect-specific formatting.
- organizeSplitData(data, splitMetaData) – Partitions the dataset by metadata groups for multiple table insert strategies.
- organizeSplitTable(table, newMetaData, currentMetaData, dialectConfig) – Generates split metadata configurations based on structural divergence.
- estimateRowSize(metaData, dialect) – Estimates the byte size of a row using provided metadata and flags potential overflows.
- parseDatabaseMetaData(rows, dialectConfig?) – Transforms SQL column descriptions into AutoSQL-compatible metadata.
- tableChangesExist(alterTableChanges) – Returns true if the proposed table changes indicate schema modification is needed.
- isMetaDataHeader(obj) – Type guard to check if an object qualifies as a metadata header.
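A sketch of the planning helpers – metaData is a hypothetical MetadataHeader and the partial config is an assumption:
```ts
import { splitInsertData, estimateRowSize } from 'autosql';

const data = [
  { id: 1, name: 'Alice' },
  { id: 2, name: 'Bob' }
];
const metaData = {
  id: { type: 'int', length: 11, allowNull: false, primary: true },
  name: { type: 'varchar', length: 50, allowNull: true }
};

// Flag potential row-size overflows before attempting an insert.
const sizeEstimate = estimateRowSize(metaData, 'mysql');

// Break the dataset into batches that respect size and row-count limits.
const batches = splitInsertData(data, { sqlDialect: 'mysql', insertStack: 100 });
```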
---
## 🐳 Docker & Local Configuration
The tests/docker-init folder contains a prebuilt Docker Compose setup to run AutoSQL tools locally. This is especially useful for integration testing or working with supported databases in a consistent environment.
### Folder Structure
```
/tests
├── utils/
│   └── config.local.json – Configuration file used by tests and docker
└── docker-init/
    ├── docker-compose.yml – Starts all test containers
    └── .env – (Optional) Environment variables for overrides
```
### Starting the Containers
Navigate to the docker-init directory and run:
```bash
cd tests/docker-init
docker-compose up
```
This will spin up the configured containers (e.g., Postgres, MySQL, etc.) defined in the docker-compose.yml file.
### Matching Configuration
Make sure the contents of config.local.json in tests/utils/ match the credentials and ports defined in docker-compose.yml. This ensures AutoSQL tests can connect to the correct database containers.
For example, if docker-compose.yml sets the MySQL container like this:
```yaml
mysql:
  image: mysql:8
  ports:
    - "3307:3306"
  environment:
    MYSQL_USER: testuser
    MYSQL_PASSWORD: testpass
    MYSQL_DATABASE: testdb
```
Then your config.local.json should include:
```json
{
  "mysql": {
    "host": "localhost",
    "port": 3307,
    "username": "testuser",
    "password": "testpass",
    "database": "testdb"
  }
}
```