A very simple tool to convert a Prostgres DDL file to JSON format. It is not very robust yet.
If the Node.js version is sufficient (≥ 14), the script runs as expected. If the version is insufficient: you will get an error message.
Error: This script requires Node.js version 14 or higher. Current version: 12.22.1
DDL Parser
is a Node.js command-line utility that reads a SQL DDL (Data Definition Language) file and converts its schema structure into a JSON representation. The JSON output includes metadata for schemas, sequences, tables, columns, constraints, indexes, and column comments, making it easier to work with database schemas programmatically.
-
Schema Parsing: Extracts schema definitions (e.g.,
CREATE SCHEMA
). -
Sequence Parsing: Handles SQL sequence definitions (e.g.,
CREATE SEQUENCE
). - Table Parsing: Processes table definitions, including columns, constraints, and indexes.
-
Comment Parsing: Associates column comments (
COMMENT ON COLUMN
) with the correct table and column. - JSON Output: Outputs a clean, hierarchical JSON representation of the DDL file.
You can install the script globally as an npm package:
npm install -g ddl-parser
Alternatively, you can clone the repository and run it directly using Node.js.
Run the parser with the path to your DDL file:
ddl-parser path/to/ddl.sql
Clone the repository, then run:
node ddl_parser.js path/to/ddl.sql
The JSON representation of the DDL will be printed to the terminal (stdout). You can redirect the output to a file if needed:
ddl-parser path/to/ddl.sql > output.json
The resulting JSON structure contains the following fields:
- schema: The name of the database schema.
-
type: The type of the object (e.g.,
sequence
,table
). - name: The name of the object.
- columns: A list of column definitions, including types and constraints.
- constraints: Primary key definitions and other table-level constraints.
- indexes: Index definitions for the table.
- comments: Column-level comments associated with the table.
For the following DDL:
CREATE TABLE new_ops.person (
person_id int8 NOT NULL,
first_name text NOT NULL,
last_name text NOT NULL
);
COMMENT ON COLUMN new_ops.person.first_name IS 'Person''s first name';
COMMENT ON COLUMN new_ops.person.last_name IS 'Person''s last name';
The output will be:
[
{
"schema": "new_ops",
"type": "table",
"name": "person",
"columns": [
{ "name": "person_id", "type": "int8", "constraints": ["NOT NULL"] },
{ "name": "first_name", "type": "text", "constraints": ["NOT NULL"] },
{ "name": "last_name", "type": "text", "constraints": ["NOT NULL"] }
],
"indexes": [],
"constraints": [],
"comments": {
"first_name": "Person's first name",
"last_name": "Person's last name"
}
}
]
The parser supports the following DDL constructs:
-
CREATE SCHEMA:
- Extracts schema names.
-
CREATE SEQUENCE:
- Extracts sequence definitions and metadata.
-
CREATE TABLE:
- Parses table definitions, including column names, types, and constraints (e.g.,
NOT NULL
).
- Parses table definitions, including column names, types, and constraints (e.g.,
-
CONSTRAINTS:
- Extracts primary key definitions.
-
INDEXES:
- Captures index definitions.
-
COMMENT ON COLUMN:
- Associates comments with the correct table and column.
CREATE SCHEMA my_schema;
CREATE SEQUENCE my_schema.my_sequence;
CREATE TABLE my_schema.my_table (...);
COMMENT ON COLUMN my_schema.my_table.column_name IS 'Column description';
The parser assumes specific formatting for the DDL file. If the file deviates from these assumptions, the parser may not work correctly. The key assumptions are:
-
Table Definitions:
-
CREATE TABLE
must be on a single line, followed by the table name. - Column definitions must each be on their own line, with the column name followed by its type and optional constraints (e.g.,
NOT NULL
). - The table definition must end with
);
on a single line.
Example:
CREATE TABLE schema_name.table_name ( column1 type NOT NULL, column2 type );
-
-
Sequence Definitions:
-
CREATE SEQUENCE
must be on a single line, followed by the sequence name. - Any sequence options (e.g.,
INCREMENT BY
,START
,CACHE
) must be on separate lines if included.
Example:
CREATE SEQUENCE schema_name.sequence_name INCREMENT BY 1 START 1 CACHE 1;
-
-
Constraints:
- Primary key constraints must be included as a single line within the table definition using the
CONSTRAINT
keyword.
Example:
CONSTRAINT table_name_pkey PRIMARY KEY (column1, column2)
- Primary key constraints must be included as a single line within the table definition using the
-
Column Comments:
- Comments must use
COMMENT ON COLUMN
syntax, and the column must be specified in the formatschema_name.table_name.column_name
.
Example:
COMMENT ON COLUMN schema_name.table_name.column_name IS 'Description of the column';
- Comments must use
-
Line-by-Line Parsing:
- The parser processes the DDL file line by line and does not handle multiline constructs like:
CREATE TABLE schema_name.table_name (column1 type, column2 type, ...);
- Each column definition and constraints must be on their own lines.
- The parser processes the DDL file line by line and does not handle multiline constructs like:
-
Parentheses Handling:
- The parser does not explicitly track the opening and closing parentheses of table definitions. It relies on the assumption that:
- Columns start immediately after
(
. - The table definition ends with
);
on its own line.
- Columns start immediately after
- The parser does not explicitly track the opening and closing parentheses of table definitions. It relies on the assumption that:
-
Sequences:
- The parser assumes that
CREATE SEQUENCE
statements are on a single line and any options follow in subsequent lines. - Unexpected formatting (e.g., multiline
CREATE SEQUENCE
statements) may cause parsing errors.
- The parser assumes that
-
Non-Standard SQL:
- The script assumes PostgreSQL DDL syntax and may not handle vendor-specific SQL features (e.g., MySQL, Oracle).
-
Partial Error Handling:
- The parser logs warnings for unexpected constructs but may fail for certain edge cases without recovering.
If you'd like to extend or modify the parser:
- Clone the repository:
git clone https://github.com/your-repo/ddl-parser.git cd ddl-parser
- Install dependencies (if any).
- Make modifications to the script, which uses a state machine approach for parsing.
To test your changes:
node ddl_parser.js path/to/test.sql
-
Misaligned Formatting:
- If the DDL file does not follow the expected formatting (e.g., multiline
CREATE TABLE
or inline comments mixed with code), the parser might fail or produce incorrect JSON output. - Ensure the file follows the documented assumptions for formatting.
- If the DDL file does not follow the expected formatting (e.g., multiline
-
Unexpected Syntax:
- Non-standard SQL or advanced PostgreSQL-specific features (e.g., inheritance, custom types) might not be parsed correctly.
- If you encounter an issue, consider manually editing the DDL file to simplify complex constructs before parsing.
-
Handling Warnings:
- Warnings are printed to the console when:
- A comment references an unknown table or column.
- The script encounters unrecognized or malformed lines.
- Warnings are printed to the console when:
-
Performance:
- The parser processes the DDL line by line, which is efficient for moderately sized files. However, extremely large DDL files might slow down parsing.
While the current version handles most common PostgreSQL DDL constructs, future improvements could include:
-
Support for Multiline Constructs:
- Add support for parsing multiline
CREATE TABLE
and other definitions by tracking opening and closing parentheses.
- Add support for parsing multiline
-
Improved Error Handling:
- Provide more detailed error messages for unrecognized syntax or misaligned formatting.
-
Extended Features:
- Parse additional PostgreSQL constructs, such as:
- Views (
CREATE VIEW
). - Index definitions with specific column mappings.
- Triggers (
CREATE TRIGGER
). - Foreign key constraints.
- Views (
- Parse additional PostgreSQL constructs, such as:
-
Cross-Database Compatibility:
- Extend support for other SQL dialects, such as MySQL, Oracle, or SQL Server.
-
Enhanced JSON Output:
- Include richer metadata, such as column defaults, check constraints, and extended attributes like storage options.
-
Configurable Options:
- Allow users to customize parsing behavior (e.g., enabling or disabling certain features) via command-line arguments.
This project is licensed under the MIT License.
Nurul Choudhury