BlogContentProcessor API Documentation

Overview

BlogContentProcessor is a high-level content processor that handles reading, parsing, and transforming markdown files into HTML with automatic table of contents generation and heading ID assignment.

Namespace

namespace Blog\Renderer;

Class: BlogContentProcessor

Constructor

public function __construct()

Initializes the content processor with a new BlogRenderer instance.

Public Methods

`processMarkdownFile(string $filePath): array`

Processes a markdown file and extracts title, HTML content, and table of contents.

Parameters:

$filePath (string): Path to the markdown file

Returns:

(array): Processed data with keys:

- title (string): Extracted article title
- content (string): HTML content with heading IDs
- toc (array): Table of contents array

Title Extraction Priority:

1. title: field in the YAML front matter
2. First H1 heading in the markdown content
3. Empty string if neither is found

YAML Front Matter Format:

---
title: Custom Title
description: Custom description
---

# This heading is not used as the title

Content goes here...
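
The separation of front matter from the body can be sketched roughly as follows. This is only an illustration of the format shown above, not the class's actual implementation; the function name splitFrontMatter and the regular expression are assumptions made for this example.

```php
<?php
/**
 * Hypothetical helper (not part of BlogContentProcessor): split a markdown
 * document into its YAML front matter and the remaining body.
 *
 * @return array{0: string, 1: string} [frontMatter, body]
 */
function splitFrontMatter(string $markdown): array
{
    // The block must open with '---' on the very first line and be closed
    // by a later line that contains only '---'.
    $pattern = '/\A---\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\z)(.*)\z/s';
    if (preg_match($pattern, $markdown, $m) === 1) {
        return [$m[1], ltrim($m[2], "\r\n")];
    }
    // No front matter: the whole input is body.
    return ['', $markdown];
}

[$frontMatter, $body] = splitFrontMatter("---\ntitle: Custom Title\n---\n\n# Heading\n");
```

Here $frontMatter holds the raw text between the two --- lines and $body holds everything after them, which is the part that would then be parsed as markdown.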

Features:

Reads file from disk
Extracts and removes YAML front matter
Parses markdown to HTML
Adds IDs to headings without them
Generates table of contents
Returns empty content and an empty table of contents if the file doesn't exist

Example:

$processor = new BlogContentProcessor();

// Process a markdown file
$result = $processor->processMarkdownFile('articles/my-post.md');

echo $result['title'];    // Article title
echo $result['content'];   // HTML content
print_r($result['toc']);   // Table of contents

Example YAML Front Matter:

---
title: My Custom Title
author: John Doe
date: 2024-01-15
---

# Regular Content

This content will be processed normally.

`extractTableOfContents(string $html): array`

Extracts table of contents from HTML by finding headings with IDs.

Parameters:

$html (string): HTML content to extract TOC from

Returns:

(array): TOC items, each with keys:

- level (int): Heading level (1-6)
- id (string): Heading ID attribute
- text (string): Heading text content

Heading Pattern:

Matches <h1> through <h6> tags
Requires id attribute
Strips HTML tags from heading text
Decodes HTML entities
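
The four rules above could be implemented along these lines. This is a sketch under assumptions, not the method's real source: the function name sketchExtractToc and the regular expression are hypothetical, and the pattern only recognizes double-quoted id attributes.

```php
<?php
/**
 * Hypothetical stand-in for extractTableOfContents(); the real method may differ.
 *
 * @return array<int, array{level: int, id: string, text: string}>
 */
function sketchExtractToc(string $html): array
{
    // Match <h1>..<h6> that carry a double-quoted id attribute; the closing
    // tag must use the same level as the opening tag (\1).
    $pattern = '/<h([1-6])\b[^>]*\bid="([^"]*)"[^>]*>(.*?)<\/h\1>/is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $toc = [];
    foreach ($matches as $m) {
        // Strip inline tags first, then decode entities such as &amp;.
        $text = html_entity_decode(strip_tags($m[3]), ENT_QUOTES | ENT_HTML5, 'UTF-8');
        $toc[] = ['level' => (int) $m[1], 'id' => $m[2], 'text' => trim($text)];
    }
    return $toc;
}

$toc = sketchExtractToc('<h2 id="q-a">Q &amp; <em>A</em></h2><h3>No id, skipped</h3>');
```

The second heading has no id attribute, so it does not appear in $toc. This is why processMarkdownFile() is described as adding IDs before generating the table of contents.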

Example:

$html = '<h1 id="intro">Introduction</h1>
         <h2 id="setup">Setup</h2>';

$processor = new BlogContentProcessor();
$toc = $processor->extractTableOfContents($html);

// Result:
// [
//     ['level' => 1, 'id' => 'intro', 'text' => 'Introduction'],
//     ['level' => 2, 'id' => 'setup', 'text' => 'Setup']
// ]

`addIdsToHeadings(string $html): string`

Adds ID attributes to headings that don't have them.

Parameters:

$html (string): HTML content

Returns:

(string): HTML with IDs added to headings

ID Generation:

Converts text to lowercase
Replaces characters that are not Unicode letters or digits with hyphens
Collapses multiple hyphens to single hyphen
Trims leading/trailing hyphens
Uses 'section' as fallback for empty IDs
Preserves existing IDs

Example:

$html = '<h1>Hello World</h1>';

$processor = new BlogContentProcessor();
$result = $processor->addIdsToHeadings($html);

// Result: '<h1 id="hello-world">Hello World</h1>'

Example with Existing ID:

$html = '<h1 id="custom-id">Heading</h1>';

$processor = new BlogContentProcessor();
$result = $processor->addIdsToHeadings($html);

// Result: '<h1 id="custom-id">Heading</h1>' (unchanged)
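
One way to get this behavior is a regular-expression callback that skips headings which already have an id. The sketch below is an assumption about the approach, not the class's code; sketchAddIds and the $makeId parameter are hypothetical names, and the ASCII-only ID generator is a simplification used only for the demonstration.

```php
<?php
/**
 * Hypothetical stand-in for addIdsToHeadings(); the real method may differ.
 * $makeId is any callable that turns plain heading text into an ID string.
 */
function sketchAddIds(string $html, callable $makeId): string
{
    return preg_replace_callback(
        '/<h([1-6])\b([^>]*)>(.*?)<\/h\1>/is',
        static function (array $m) use ($makeId): string {
            // Headings that already carry an id attribute are returned unchanged.
            if (preg_match('/\bid\s*=/i', $m[2]) === 1) {
                return $m[0];
            }
            $id = htmlspecialchars($makeId(strip_tags($m[3])), ENT_QUOTES, 'UTF-8');
            return sprintf('<h%s id="%s"%s>%s</h%s>', $m[1], $id, $m[2], $m[3], $m[1]);
        },
        $html
    ) ?? $html;
}

// Deliberately simple ASCII-only ID generator, for this demonstration only.
$asciiId = static fn (string $text): string =>
    trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($text)), '-') ?: 'section';

$result = sketchAddIds('<h1>Hello World</h1><h2 id="custom-id">Heading</h2>', $asciiId);
```

With this input, the first heading gains id="hello-world" and the second keeps id="custom-id".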

`getRenderer(): BlogRenderer`

Returns the internal BlogRenderer instance.

Returns:

(BlogRenderer): The BlogRenderer instance used by this processor

Example:

$processor = new BlogContentProcessor();
$renderer = $processor->getRenderer();

// Use renderer directly
$renderer->setConfig(['title' => 'My Blog']);
$html = $renderer->renderDocument([...]);

Private Methods

`extractTitle(string $content, string $frontMatter = ''): string`

Extracts article title from content or front matter.

Priority:

1. title: field in the YAML front matter
2. First # heading in the markdown content
3. Empty string if neither is found

Parameters:

$content (string): Markdown content (with front matter removed)
$frontMatter (string): YAML front matter string

Returns:

(string): Extracted title
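
Because the method is private, its source is not part of the public contract. The priority order it documents could look something like the following sketch; the function name sketchExtractTitle and both regular expressions are assumptions.

```php
<?php
/**
 * Hypothetical re-creation of the documented priority order; the private
 * extractTitle() method may be implemented differently.
 */
function sketchExtractTitle(string $content, string $frontMatter = ''): string
{
    // 1. A "title:" line in the front matter wins; surrounding quotes are dropped.
    if (preg_match('/^title:[ \t]*(.+)$/mi', $frontMatter, $m) === 1) {
        return trim($m[1], " \t\r\"'");
    }
    // 2. Otherwise use the first level-1 heading written as "# Heading".
    if (preg_match('/^#[ \t]+(.+)$/m', $content, $m) === 1) {
        return trim($m[1]);
    }
    // 3. Nothing found.
    return '';
}

$fromFrontMatter = sketchExtractTitle("# Ignored for the title", "title: Custom Title");
$fromHeading     = sketchExtractTitle("## Subsection\n# Top Heading\n");
$none            = sketchExtractTitle("Just a paragraph.");
```

Note that this simplified pattern would also pick up a line starting with "# " inside a fenced code block, which a production implementation would likely need to guard against.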

`generateId(string $text): string`

Generates a URL-friendly ID from heading text.

Transformation Steps:

Convert to lowercase
Replace characters that are not Unicode letters or digits with hyphens
Collapse multiple hyphens to single hyphen
Trim leading/trailing hyphens
Return 'section' if empty

Parameters:

$text (string): Plain text heading

Returns:

(string): URL-friendly ID

Examples:

generateId('Hello World')      // 'hello-world'
generateId('  Test  ')        // 'test'
generateId('Hello--World')     // 'hello-world'
generateId('123')             // '123'
generateId('')                // 'section'
generateId('你好世界')         // '你好世界'
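
A minimal sketch that reproduces the examples above, assuming the mbstring extension is available. The function name sketchGenerateId and the use of the \p{L} and \p{N} Unicode classes are assumptions; the actual private method may use a different pattern.

```php
<?php
/**
 * Hypothetical re-creation of the documented transformation steps.
 */
function sketchGenerateId(string $text): string
{
    $id = mb_strtolower($text, 'UTF-8');
    // \p{L} matches any Unicode letter, \p{N} any Unicode number. Every run
    // of other characters becomes a single hyphen, so the "+" quantifier
    // also covers the "collapse multiple hyphens" step.
    $id = preg_replace('/[^\p{L}\p{N}]+/u', '-', $id) ?? '';
    $id = trim($id, '-');
    return $id === '' ? 'section' : $id;
}

$ids = array_map('sketchGenerateId', ['Hello World', '  Test  ', 'Hello--World', '123', '', '你好世界']);
```

The /u modifier is what lets non-Latin headings such as 你好世界 pass through unchanged instead of being replaced by hyphens.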

Usage Workflow

Basic Workflow

<?php
use Blog\Renderer\BlogContentProcessor;

$processor = new BlogContentProcessor();

// 1. Process markdown file
$result = $processor->processMarkdownFile('article.md');

// 2. Extract data
$title = $result['title'];
$content = $result['content'];
$toc = $result['toc'];

// 3. Get renderer for HTML generation
$renderer = $processor->getRenderer();

// 4. Render complete document
$html = $renderer->renderDocument([
    'rootPath' => './',
    'title' => $title,
    'content' => $content,
    'toc' => $toc,
]);

Complete Example

<?php
use Blog\Renderer\BlogContentProcessor;

$processor = new BlogContentProcessor();

// Process article with YAML front matter
$articleData = $processor->processMarkdownFile('posts/my-article.md');

// Output metadata
echo "Title: " . $articleData['title'] . "\n";
echo "TOC Items: " . count($articleData['toc']) . "\n";

// Get renderer for custom configuration
$renderer = $processor->getRenderer();
$renderer->setConfig([
    'title' => 'My Tech Blog',
    'description' => 'Technical articles',
]);

// Render with custom breadcrumbs
$html = $renderer->renderDocument([
    'rootPath' => '../',
    'title' => $articleData['title'],
    'content' => $articleData['content'],
    'toc' => $articleData['toc'],
    'breadcrumbs' => [
        ['link' => '../index.php', 'name' => 'Home'],
        ['link' => 'index.php', 'name' => 'Posts'],
    ]
]);

// Output HTML
file_put_contents('posts/my-article.html', $html);

YAML Front Matter Support

Supported Fields

Currently, only title is automatically extracted from front matter, but you can include any custom metadata:

---
title: My Article Title
author: John Doe
date: 2024-01-15
category: PHP
tags: [php, markdown, tutorial]
---

# Article Content

The article content goes here.

Note: Only the title field is used by default. You can extend the class to extract other fields as needed.
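
If you need the other fields, one lightweight option is to read simple key: value lines yourself. The helper below is a hypothetical sketch, not part of the class: the name parseFrontMatterFields is made up, values are kept as raw strings, and nested YAML or lists are not interpreted (a real YAML parser would be needed for that).

```php
<?php
/**
 * Hypothetical helper: read top-level "key: value" pairs from a front
 * matter string. Values are returned as raw, trimmed strings.
 *
 * @return array<string, string>
 */
function parseFrontMatterFields(string $frontMatter): array
{
    $fields = [];
    // \R matches any line ending (\n, \r\n, \r).
    foreach (preg_split('/\R/', $frontMatter) ?: [] as $line) {
        if (preg_match('/^([A-Za-z_][\w-]*):[ \t]*(.*)$/', $line, $m) === 1) {
            $fields[$m[1]] = trim($m[2]);
        }
    }
    return $fields;
}

$fields = parseFrontMatterFields(
    "title: My Article Title\nauthor: John Doe\ndate: 2024-01-15\ntags: [php, markdown, tutorial]"
);
```

In this sketch $fields['tags'] is the literal string "[php, markdown, tutorial]", not an array.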

Integration with BlogRenderer

BlogContentProcessor uses BlogRenderer internally for markdown parsing. You can access the renderer instance to:

Update configuration
Render documents
Parse markdown directly
Use other renderer features

$processor = new BlogContentProcessor();

// Access renderer
$renderer = $processor->getRenderer();

// Configure renderer
$renderer->setConfig(['title' => 'Custom Blog']);

// Parse markdown directly
$html = $renderer->parseMarkdown('# Hello');

Error Handling

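The processor handles missing files gracefully by returning a result array with empty content and an empty TOC. Note that in this case the title is the file name without its extension rather than an empty string: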

$result = $processor->processMarkdownFile('nonexistent.md');

// Returns:
// [
//     'title' => 'nonexistent',
//     'content' => '',
//     'toc' => []
// ]
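
Since a missing file still produces a normal-looking array, callers may want to check for the empty case before rendering. The helper below is a hypothetical sketch based only on the result shape shown above; isEmptyResult is not a method of the class.

```php
<?php
/**
 * Hypothetical helper: treat a result with no content and no TOC entries
 * as "nothing to render".
 */
function isEmptyResult(array $result): bool
{
    return ($result['content'] ?? '') === '' && ($result['toc'] ?? []) === [];
}

// Shape copied from the missing-file example above.
$missing = ['title' => 'nonexistent', 'content' => '', 'toc' => []];

$skipped = false;
if (isEmptyResult($missing)) {
    // For example: log the problem and skip this file instead of
    // writing an empty HTML page.
    $skipped = true;
}
```

An empty but existing file would presumably look the same, so if you need to tell the two cases apart, calling is_file($filePath) before processMarkdownFile() is the more direct check.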

Table of Contents Structure

The TOC array is structured as follows:

[
    [
        'level' => 1,           // Heading level (1-6)
        'id' => 'section-id',   // Heading ID attribute
        'text' => 'Section Title' // Plain text heading content
    ],
    [
        'level' => 2,
        'id' => 'subsection-id',
        'text' => 'Subsection Title'
    ],
    // ... more items
]

This structure is compatible with the toc parameter in BlogRenderer::renderDocument().
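
If you want to consume the array yourself rather than pass it to renderDocument(), a flat list of anchor links is straightforward to build. This is a sketch only; renderTocList and the toc / toc-level-N class names are made up for the example.

```php
<?php
/**
 * Hypothetical helper: render a TOC array as a flat list of anchor links,
 * with the heading level exposed as a CSS class for indentation.
 */
function renderTocList(array $toc): string
{
    $items = '';
    foreach ($toc as $item) {
        $items .= sprintf(
            '<li class="toc-level-%d"><a href="#%s">%s</a></li>',
            $item['level'],
            htmlspecialchars($item['id'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($item['text'], ENT_QUOTES, 'UTF-8')
        );
    }
    // An empty TOC yields an empty string rather than an empty <ul>.
    return $items === '' ? '' : '<ul class="toc">' . $items . '</ul>';
}

$tocHtml = renderTocList([
    ['level' => 1, 'id' => 'intro', 'text' => 'Introduction'],
    ['level' => 2, 'id' => 'setup', 'text' => 'Setup'],
]);
```

The text is escaped again on output because extractTableOfContents() is documented as returning decoded, plain-text headings.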
