A Node.js library for extracting slides, notes, and media from PowerPoint (.pptx) files.
PPTX Content Extractor is a Node.js library for extracting slides, notes, and media content (e.g., images) from .pptx files. It uses JSZip to unpack .pptx archives and xml2js to parse their XML content.
- Extract text content from PowerPoint slides (.pptx).
- Retrieve media files (e.g., images) embedded in the presentation.
- Extract speaker notes for each slide.
- Modular structure for extracting specific content types (slides, media, or notes).
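A .pptx file is a ZIP archive, so the JSZip unpacking step mostly comes down to finding the right entries inside it. In the Office Open XML layout, slides live under `ppt/slides/slideN.xml`, speaker notes under `ppt/notesSlides/notesSlideN.xml`, and embedded media under `ppt/media/`. The sketch below is not this library's code. `classifyEntries` is a hypothetical helper that takes a list of entry paths (for example `Object.keys(zip.files)` from a loaded JSZip archive) and groups them, sorting numerically so that `slide10.xml` does not come before `slide2.xml`.

```typescript
// Hypothetical helper, not part of pptx-content-extractor: group the entry
// paths of an unpacked .pptx archive by content type.
interface PptxEntries {
  slides: string[];
  notes: string[];
  media: string[];
}

// Number N from "slideN.xml" or "notesSlideN.xml" (0 if absent).
function entryNumber(path: string): number {
  const match = /(\d+)\.xml$/.exec(path);
  return match ? parseInt(match[1], 10) : 0;
}

function classifyEntries(paths: string[]): PptxEntries {
  // Anchored patterns keep only the part files themselves, skipping
  // "_rels/" entries and directory entries such as "ppt/media/".
  const slides = paths.filter((p) => /^ppt\/slides\/slide\d+\.xml$/.test(p));
  const notes = paths.filter((p) => /^ppt\/notesSlides\/notesSlide\d+\.xml$/.test(p));
  const media = paths.filter((p) => /^ppt\/media\/[^\/]+$/.test(p));
  const byNumber = (a: string, b: string) => entryNumber(a) - entryNumber(b);
  return {
    // Numeric sort, so slide2.xml comes before slide10.xml.
    slides: slides.sort(byNumber),
    notes: notes.sort(byNumber),
    media: media.sort(),
  };
}

const entries = classifyEntries([
  'ppt/slides/slide10.xml',
  'ppt/slides/slide2.xml',
  'ppt/slides/_rels/slide2.xml.rels',
  'ppt/media/image1.png',
  'ppt/notesSlides/notesSlide1.xml',
]);
// entries.slides → ['ppt/slides/slide2.xml', 'ppt/slides/slide10.xml']
```

File numbering is only a convenient approximation of slide order; the authoritative order is the slide ID list (`p:sldIdLst`) in `ppt/presentation.xml`.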
---
Install the library via npm:
```bash
npm install --save pptx-content-extractor
```
Extract all slides, media, and notes from a .pptx file:
```typescript
import { extractPptx } from 'pptx-content-extractor';

(async () => {
  const result = await extractPptx('/path/to/presentation.pptx');
  console.log('Slides:', result.slides);
  console.log('Media:', result.media);
  console.log('Notes:', result.notes);
})();
```
---
#### Slides
```typescript
import { extractPptxSlides } from 'pptx-content-extractor';

(async () => {
  const slides = await extractPptxSlides('/path/to/presentation.pptx');
  console.log('Slides:', slides);
})();
```
---
#### Media
```typescript
import { extractPptxMedia } from 'pptx-content-extractor';

(async () => {
  const media = await extractPptxMedia('/path/to/presentation.pptx');
  console.log('Media:', media);
})();
```
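Each media item's `content` is Base64-encoded (see the `ParsedMedia` interface), so it can be embedded directly as a data URI, for example in an HTML preview. The sketch below assumes that `name` ends with the original file extension, as in the `image23.jpeg` example from `ParsedSlide`. `toDataUri` and its extension table are hypothetical and cover only a few common formats.

```typescript
// Local copy of the documented ParsedMedia shape.
interface ParsedMedia {
  name: string;
  content: string; // Base64-encoded media content
}

// Hypothetical extension-to-MIME table; extend as needed.
const MIME_TYPES: Record<string, string> = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  gif: 'image/gif',
  svg: 'image/svg+xml',
};

// Build a data URI, falling back to a generic binary type for unknown extensions.
function toDataUri(media: ParsedMedia): string {
  const ext = media.name.split('.').pop()?.toLowerCase() ?? '';
  const mime = MIME_TYPES[ext] ?? 'application/octet-stream';
  return `data:${mime};base64,${media.content}`;
}

const uri = toDataUri({ name: 'image23.jpeg', content: '/9j/4AAQ' });
// uri → 'data:image/jpeg;base64,/9j/4AAQ'
```

The result can be used as the `src` of an `<img>` element without writing the file to disk.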
---
#### Notes
```typescript
import { extractPptxNotes } from 'pptx-content-extractor';

(async () => {
  const notes = await extractPptxNotes('/path/to/presentation.pptx');
  console.log('Notes:', notes);
})();
```
---
#### `extractPptx(filePath)`

Extracts slides, media, and notes from a .pptx file.

- `filePath`: Path to the .pptx file.
- Returns: A `Promise<ParsedPowerPoint>` that resolves to an object containing:
  - `slides`: An array of parsed slides.
  - `media`: An array of media content.
  - `notes`: An array of parsed notes.
---
#### `extractPptxSlides(filePath)`

Extracts only the slides.

- `filePath`: Path to the .pptx file.
- Returns: A `Promise<ParsedSlide[]>` that resolves to the parsed slides.
---
#### `extractPptxMedia(filePath)`

Extracts only the media content.

- `filePath`: Path to the .pptx file.
- Returns: A `Promise<ParsedMedia[]>` that resolves to the media content.
---
#### `extractPptxNotes(filePath)`

Extracts only the notes.

- `filePath`: Path to the .pptx file.
- Returns: A `Promise<ParsedNote[]>` that resolves to the parsed notes.
---
Base interface for parsed content.
```typescript
export interface ParsedContent {
  name: string;
  content: unknown;
}
```
---
The object returned by `extractPptx`:

```typescript
export interface ParsedPowerPoint {
  slides: ParsedSlide[];
  media: ParsedMedia[];
  notes: ParsedNote[];
}
```
---
```typescript
export interface ParsedSlide extends ParsedContent {
  content: { id: string; type: string; text: string[] }[];
  mediaNames: string[]; // file names of the media this slide uses, e.g. ['image23.jpeg']
}
```
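A slide's `content` is a list of elements, each carrying its own `text` array, so one plain-text string per slide (for search indexing or a quick preview) can be built by flattening those arrays. `slideToPlainText` below is a hypothetical helper, not part of the library. It assumes that element order in `content` matches reading order, which this README does not guarantee, and it drops empty strings.

```typescript
// Local copies of the documented interfaces.
interface ParsedContent {
  name: string;
  content: unknown;
}

interface ParsedSlide extends ParsedContent {
  content: { id: string; type: string; text: string[] }[];
  mediaNames: string[];
}

// Hypothetical helper: join all non-empty text entries of a slide, one per line.
function slideToPlainText(slide: ParsedSlide): string {
  const lines: string[] = [];
  for (const element of slide.content) {
    for (const line of element.text) {
      const trimmed = line.trim();
      if (trimmed.length > 0) lines.push(trimmed);
    }
  }
  return lines.join('\n');
}

const slide: ParsedSlide = {
  name: 'slide1',
  content: [
    { id: '2', type: 'title', text: ['Quarterly Review'] },
    { id: '3', type: 'body', text: ['Revenue up', '', 'Costs flat'] },
  ],
  mediaNames: [],
};
// slideToPlainText(slide) → 'Quarterly Review\nRevenue up\nCosts flat'
```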
---
```typescript
export interface ParsedMedia extends ParsedContent {
  content: string; // Base64-encoded media content
}
```
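`ParsedSlide.mediaNames` lists the media a slide uses by file name, so a slide's images can be looked up in the `media` array of a `ParsedPowerPoint`. The sketch below is hypothetical and assumes each `ParsedMedia.name` is either the bare file name (e.g. `image23.jpeg`) or a path ending in it (e.g. `ppt/media/image23.jpeg`); it compares only the last path segment to cover both cases.

```typescript
// Local copy of the documented ParsedMedia shape.
interface ParsedMedia {
  name: string;
  content: string; // Base64-encoded media content
}

// Only the ParsedSlide fields this helper needs.
interface SlideMediaRefs {
  name: string;
  mediaNames: string[];
}

// Last path segment, so "ppt/media/image1.png" and "image1.png" compare equal.
function baseName(path: string): string {
  const parts = path.split('/');
  return parts[parts.length - 1];
}

// Hypothetical helper: resolve a slide's mediaNames against the media list.
// Names with no matching media entry are skipped.
function mediaForSlide(slide: SlideMediaRefs, media: ParsedMedia[]): ParsedMedia[] {
  const byName = new Map<string, ParsedMedia>();
  for (const item of media) {
    byName.set(baseName(item.name), item);
  }
  const found: ParsedMedia[] = [];
  for (const name of slide.mediaNames) {
    const item = byName.get(baseName(name));
    if (item) found.push(item);
  }
  return found;
}

const media: ParsedMedia[] = [
  { name: 'ppt/media/image1.png', content: 'iVBORw0KGgo=' },
  { name: 'ppt/media/image23.jpeg', content: '/9j/4AAQ' },
];
const images = mediaForSlide({ name: 'slide1', mediaNames: ['image23.jpeg'] }, media);
// images.map((m) => m.name) → ['ppt/media/image23.jpeg']
```

Building the map once keeps each lookup constant-time, which matters when many slides share a large media list.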
---
```typescript
export interface ParsedNote extends ParsedContent {
  content: string;
}
```
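Slides and notes are returned as separate arrays, so showing each slide next to its speaker notes requires matching them up. This README does not say how `name` values are formed; the sketch below assumes, hypothetically, that they end in the archive part number (e.g. `slide3.xml` and `notesSlide3.xml`) and pairs entries on that number. In Office Open XML the authoritative link is the slide's relationship file (`ppt/slides/_rels/slideN.xml.rels`), not the number, so treat this as an approximation. `pairNotes` is not part of the library.

```typescript
// Local copy of the documented ParsedNote shape.
interface ParsedNote {
  name: string;
  content: string;
}

interface NamedSlide {
  name: string;
}

interface SlideWithNotes<S extends NamedSlide> {
  slide: S;
  notes: string; // empty string when no matching note was found
}

// Trailing number in names such as "slide3.xml" or "notesSlide3" (assumed format).
function partNumber(name: string): number | undefined {
  const match = /(\d+)(?:\.xml)?$/.exec(name);
  return match ? parseInt(match[1], 10) : undefined;
}

// Hypothetical helper: attach each slide's notes by matching part numbers.
function pairNotes<S extends NamedSlide>(slides: S[], notes: ParsedNote[]): SlideWithNotes<S>[] {
  const notesByNumber = new Map<number, string>();
  for (const note of notes) {
    const n = partNumber(note.name);
    if (n !== undefined) notesByNumber.set(n, note.content);
  }
  return slides.map((slide) => {
    const n = partNumber(slide.name);
    const text = n === undefined ? undefined : notesByNumber.get(n);
    return { slide, notes: text ?? '' };
  });
}

const paired = pairNotes(
  [{ name: 'slide1.xml' }, { name: 'slide2.xml' }],
  [{ name: 'notesSlide2.xml', content: 'Mention the Q3 numbers.' }],
);
// paired[0].notes → ''
// paired[1].notes → 'Mention the Q3 numbers.'
```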
---