@uniweb/semantic-parser

A semantic parser for ProseMirror/TipTap content structures that helps bridge the gap between natural content writing and component-based web development.

What it Does

The parser transforms rich text editor content (ProseMirror/TipTap) into structured, semantic groups that web components can easily consume. It provides two complementary views of your content:

1. Sequence: An ordered list of all content elements (for rendering in document order)
2. Groups: Content organized into semantic sections (main content + items)

Installation

``bash npm install @uniweb/semantic-parser`

`Quick Start`

`js import { parseContent } from "@uniweb/semantic-parser";

// Your ProseMirror/TipTap document const doc = { type: "doc", content: [ { type: "heading", attrs: { level: 1 }, content: [{ type: "text", text: "Welcome" }], }, { type: "paragraph", content: [{ type: "text", text: "Get started today." }], }, ], };

// Parse the content const result = parseContent(doc);

// Access different views console.log(result.sequence); // Ordered array of elements console.log(result.title); // Main content fields at top level console.log(result.items); // Additional content groups`

`Output Structure`

`$3`

An ordered array of semantic elements preserving document order:

`js result.sequence = [ { type: "heading", level: 1, content: "Welcome" }, { type: "paragraph", content: "Get started today." } ]`

`$3`

Main content fields are at the top level. The items array contains additional content groups (created when headings appear after content), each with the same field structure:

`js result = { // Header fields (from headings) pretitle: "", // Heading before main title title: "Welcome", // Main heading subtitle: "", // Heading after main title subtitle2: "", // Third heading level

// Body fields paragraphs: ["Get started today."], links: [], // All links (including buttons, documents) imgs: [], videos: [], icons: [], lists: [], quotes: [], data: {}, // Structured data (tagged code blocks, forms, cards) headings: [], // Overflow headings after title/subtitle/subtitle2

// Additional content groups (from headings after content) items: [ { title: "Feature 1", paragraphs: [...], links: [...] }, { title: "Feature 2", paragraphs: [...], links: [...] } ],

// Ordered sequence for document-order rendering sequence: [...],

// Original document raw: { type: "doc", content: [...] } }`

`Common Use Cases`

`$3`

`js const content = parseContent(doc);

const title = content.title; const description = content.paragraphs.join(" "); const image = content.banner?.url;`

`$3`

`js const content = parseContent(doc);

// Main content console.log("Title:", content.title); console.log("Description:", content.paragraphs);

// Additional content groups content.items.forEach(item => { console.log("Section:", item.title); console.log("Content:", item.paragraphs); });`

`$3`

`js const { sequence } = parseContent(doc);

sequence.forEach(element => { switch(element.type) { case 'heading': renderHeading(element); break; case 'paragraph': renderParagraph(element); break; case 'image': renderImage(element); break; } });`

`Content Mapping Utilities`

The parser includes optional mapping utilities to transform parsed content into component-specific formats. Perfect for visual editors and component-based systems.

`$3`

Automatically transform content based on field types with context-aware behavior:

`js const schema = { title: { path: "title", type: "plaintext", // Auto-strips , , etc. maxLength: 60 // Auto-truncates intelligently }, excerpt: { path: "paragraphs", type: "excerpt", // Auto-creates excerpt from paragraphs maxLength: 150 }, image: { path: "imgs[0].url", type: "image", defaultValue: "/placeholder.jpg" } };

// Visual editor mode (default) - silent, graceful cleanup const data = mappers.extractBySchema(parsed, schema);

// Build mode - validates and warns const data = mappers.extractBySchema(parsed, schema, { mode: 'build' });`

Field Types: plaintext, richtext, excerpt, number, image, link

`$3`

`js import { parseContent, mappers } from "@uniweb/semantic-parser";

const parsed = parseContent(doc);

// Extract hero component data const heroData = mappers.extractors.hero(parsed); // { title, subtitle, kicker, description, image, cta, ... }

// Extract card data const cards = mappers.extractors.card(parsed, { useItems: true });

// Extract statistics const stats = mappers.extractors.stats(parsed); // [{ value: "12", label: "Partner Labs" }, ...]

// Extract navigation menu const nav = mappers.extractors.navigation(parsed);

// Extract features list const features = mappers.extractors.features(parsed);`

`$3`

Define custom mappings using schemas:

`js const schema = { brand: "pretitle", title: "title", subtitle: "subtitle", image: { path: "imgs[0].url", defaultValue: "/placeholder.jpg" }, actions: { path: "links", transform: links => links.map(l => ({ label: l.label, type: "primary" })) } };

const componentData = mappers.accessor.extractBySchema(parsed, schema);`

`$3`

- hero- Hero/banner sections -card- Card components -article- Article/blog content -stats- Statistics/metrics -navigation- Navigation menus -features- Feature lists -testimonial- Testimonials -faq- FAQ sections -pricing- Pricing tiers -team- Team members -gallery - Image galleries

See Mapping Patterns Guide for complete documentation.

`Rendering Content`

After extracting content, render it using a Text component that handles paragraph arrays, rich HTML, and formatting marks.

`$3`

`jsx import { parseContent, mappers } from '@uniweb/semantic-parser'; import { H1, P } from './components/Text';

const parsed = parseContent(doc); const hero = mappers.extractors.hero(parsed);

// Render extracted content <>

`{/ Handles arrays automatically /}``
`The Text component: - Handles arrays - Renders`["Para 1", "Para 2"]`as separate paragraphs - Supports rich HTML - Preserves formatting marks - Multi-line headings - Wraps multiple lines in semantic heading tags - Color marks - Supports` `and` `for visual emphasis`
`See Text Component Reference for implementation guide.`
`$3`
`Sanitize content at the engine level (during data preparation), not in components:`
``javascript import { parseContent, mappers } from '@uniweb/semantic-parser';`
`function prepareData(parsed) { const hero = mappers.extractors.hero(parsed); return { ...hero, title: mappers.types.sanitizeHtml(hero.title, { allowedTags: ['strong', 'em', 'mark', 'span'], allowedAttr: ['class', 'data-variant'] }) }; }``
`The parser provides sanitization utilities but doesn't enforce their use. Your engine decides when to sanitize based on security requirements.`
`Content Grouping`
`The parser supports two grouping modes:`
`$3`
`Groups are created based on heading patterns. A new group starts when: - A heading follows content - Multiple H1s appear (no main content created) - The heading level indicates a new section`
`Pretitle Detection: Any heading followed by a more important heading is automatically detected as a pretitle: - H3 before H1 → pretitle ✅ - H2 before H1 → pretitle ✅ - H6 before H5 → pretitle ✅ - H4 before H2 → pretitle ✅`
`No configuration needed - it just works naturally!`
`$3`
`When any horizontal rule (`---`) is present, the entire document uses divider-based grouping. Groups are split explicitly by dividers.`
`Text Formatting`
`Inline formatting is preserved as HTML tags:`
``js // Input: Text with bold mark // Output: "Text with bold"`
`// Input: Text with italic mark // Output: "Text with emphasis"`
`// Input: Link mark // Output: "Click here"`
`// Input: Span mark (bracketed spans) // Output: "This is highlighted text"``
`$3`
`Bracketed spans (`[text]{.class}`) are converted to` `elements with their attributes:`
``js // Input mark { type: "span", attrs: { class: "highlight", id: "note-1" } }`
`// Output HTML 'text'``
`Spans can have classes, IDs, and custom attributes. They combine with other marks—a span with bold becomes` text`.
Documentation
- Content Writing Guide: Learn how to structure content for optimal parsing
- API Reference: Complete API documentation with all element types
- Mapping Patterns Guide: Transform content to component-specific formats
- Text Component Reference: Reference implementation for rendering parsed content
- File Structure: Codebase organization
Use Cases
- Component-based websites: Extract structured data for React/Vue components
- Content management: Parse editor content into database-friendly structures
- Static site generation: Transform rich content into template-ready data
- Content analysis: Analyze document structure and content types
License
GPL-3.0-or-later

@uniweb/semantic-parser

A semantic parser for ProseMirror/TipTap content structures that helps bridge the gap between natural content writing and component-based web development.

What it Does

The parser transforms rich text editor content (ProseMirror/TipTap) into structured, semantic groups that web components can easily consume. It provides two complementary views of your content:

1. Sequence: An ordered list of all content elements (for rendering in document order)
2. Groups: Content organized into semantic sections (main content + items)

Installation

``bash npm install @uniweb/semantic-parser`

`Quick Start`

`js import { parseContent } from "@uniweb/semantic-parser";

// Parse the content const result = parseContent(doc);

`Output Structure`

`$3`

An ordered array of semantic elements preserving document order:

`js result.sequence = [ { type: "heading", level: 1, content: "Welcome" }, { type: "paragraph", content: "Get started today." } ]`

`$3`

Main content fields are at the top level. The items array contains additional content groups (created when headings appear after content), each with the same field structure:

// Additional content groups (from headings after content) items: [ { title: "Feature 1", paragraphs: [...], links: [...] }, { title: "Feature 2", paragraphs: [...], links: [...] } ],

// Ordered sequence for document-order rendering sequence: [...],

// Original document raw: { type: "doc", content: [...] } }`

`Common Use Cases`

`$3`

`js const content = parseContent(doc);

const title = content.title; const description = content.paragraphs.join(" "); const image = content.banner?.url;`

`$3`

`js const content = parseContent(doc);

// Main content console.log("Title:", content.title); console.log("Description:", content.paragraphs);

// Additional content groups content.items.forEach(item => { console.log("Section:", item.title); console.log("Content:", item.paragraphs); });`

`$3`

`js const { sequence } = parseContent(doc);

`Content Mapping Utilities`

The parser includes optional mapping utilities to transform parsed content into component-specific formats. Perfect for visual editors and component-based systems.

`$3`

Automatically transform content based on field types with context-aware behavior:

// Visual editor mode (default) - silent, graceful cleanup const data = mappers.extractBySchema(parsed, schema);

// Build mode - validates and warns const data = mappers.extractBySchema(parsed, schema, { mode: 'build' });`

Field Types: plaintext, richtext, excerpt, number, image, link

`$3`

`js import { parseContent, mappers } from "@uniweb/semantic-parser";

const parsed = parseContent(doc);

// Extract hero component data const heroData = mappers.extractors.hero(parsed); // { title, subtitle, kicker, description, image, cta, ... }

// Extract card data const cards = mappers.extractors.card(parsed, { useItems: true });

// Extract statistics const stats = mappers.extractors.stats(parsed); // [{ value: "12", label: "Partner Labs" }, ...]

// Extract navigation menu const nav = mappers.extractors.navigation(parsed);

// Extract features list const features = mappers.extractors.features(parsed);`

`$3`

Define custom mappings using schemas:

const componentData = mappers.accessor.extractBySchema(parsed, schema);`

`$3`

See Mapping Patterns Guide for complete documentation.

`Rendering Content`

After extracting content, render it using a Text component that handles paragraph arrays, rich HTML, and formatting marks.

`$3`

`jsx import { parseContent, mappers } from '@uniweb/semantic-parser'; import { H1, P } from './components/Text';

const parsed = parseContent(doc); const hero = mappers.extractors.hero(parsed);

// Render extracted content <>

`{/ Handles arrays automatically /}``
`The Text component: - Handles arrays - Renders`["Para 1", "Para 2"]`as separate paragraphs - Supports rich HTML - Preserves formatting marks - Multi-line headings - Wraps multiple lines in semantic heading tags - Color marks - Supports` `and` `for visual emphasis`
`See Text Component Reference for implementation guide.`
`$3`
`Sanitize content at the engine level (during data preparation), not in components:`
``javascript import { parseContent, mappers } from '@uniweb/semantic-parser';`
`function prepareData(parsed) { const hero = mappers.extractors.hero(parsed); return { ...hero, title: mappers.types.sanitizeHtml(hero.title, { allowedTags: ['strong', 'em', 'mark', 'span'], allowedAttr: ['class', 'data-variant'] }) }; }``
`The parser provides sanitization utilities but doesn't enforce their use. Your engine decides when to sanitize based on security requirements.`
`Content Grouping`
`The parser supports two grouping modes:`
`$3`
`Groups are created based on heading patterns. A new group starts when: - A heading follows content - Multiple H1s appear (no main content created) - The heading level indicates a new section`
`Pretitle Detection: Any heading followed by a more important heading is automatically detected as a pretitle: - H3 before H1 → pretitle ✅ - H2 before H1 → pretitle ✅ - H6 before H5 → pretitle ✅ - H4 before H2 → pretitle ✅`
`No configuration needed - it just works naturally!`
`$3`
`When any horizontal rule (`---`) is present, the entire document uses divider-based grouping. Groups are split explicitly by dividers.`
`Text Formatting`
`Inline formatting is preserved as HTML tags:`
``js // Input: Text with bold mark // Output: "Text with bold"`
`// Input: Text with italic mark // Output: "Text with emphasis"`
`// Input: Link mark // Output: "Click here"`
`// Input: Span mark (bracketed spans) // Output: "This is highlighted text"``
`$3`
`Bracketed spans (`[text]{.class}`) are converted to` `elements with their attributes:`
``js // Input mark { type: "span", attrs: { class: "highlight", id: "note-1" } }`
`// Output HTML 'text'``
`Spans can have classes, IDs, and custom attributes. They combine with other marks—a span with bold becomes` text`.
Documentation
- Content Writing Guide: Learn how to structure content for optimal parsing
- API Reference: Complete API documentation with all element types
- Mapping Patterns Guide: Transform content to component-specific formats
- Text Component Reference: Reference implementation for rendering parsed content
- File Structure: Codebase organization
Use Cases
- Component-based websites: Extract structured data for React/Vue components
- Content management: Parse editor content into database-friendly structures
- Static site generation: Transform rich content into template-ready data
- Content analysis: Analyze document structure and content types
License
GPL-3.0-or-later