BlogContentProcessor API Documentation

Overview

BlogContentProcessor is a high-level content processor that handles reading, parsing, and transforming markdown files into HTML with automatic table of contents generation and heading ID assignment.

Namespace

namespace Blog\Renderer;

Class: BlogContentProcessor

Constructor

public function __construct()

Initializes the content processor with a new BlogRenderer instance.

Public Methods

processMarkdownFile(string $filePath): array

Processes a markdown file and extracts title, HTML content, and table of contents.

Parameters:

  • $filePath (string): Path to the markdown file

Returns:

  • (array): Processed data with keys:

      - title (string): Extracted article title
      - content (string): HTML content with heading IDs
      - toc (array): Table of contents array

Title Extraction Priority:

  1. The title: field in the YAML front matter
  2. First H1 heading in markdown content
  3. Empty string if neither found

YAML Front Matter Format:

---
title: Custom Title
description: Custom description
---

# This heading is ignored for title extraction

Content goes here...
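
The split between front matter and body can be sketched roughly as follows. This is a minimal illustration, not the class's actual implementation: splitFrontMatter() is a hypothetical helper name, and the sketch assumes the block starts on the very first line and is non-empty.

```php
<?php

/**
 * Hypothetical helper: split a leading YAML front matter block from the body.
 * Returns [frontMatter, body]; frontMatter is '' when no block is present.
 */
function splitFrontMatter(string $markdown): array
{
    // The block must open with a '---' line at the very start and close with another '---' line.
    $pattern = '/\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\z)(.*)\z/s';
    if (preg_match($pattern, $markdown, $m) === 1) {
        // Drop the blank lines that usually separate the block from the body.
        return [$m[1], ltrim($m[2], "\r\n")];
    }
    return ['', $markdown];
}

[$frontMatter, $body] = splitFrontMatter("---\ntitle: Custom Title\n---\n\n# Heading\n\nText");
// $frontMatter is "title: Custom Title"; $body is "# Heading\n\nText".
```

Input without a leading --- line is returned unchanged as the body, with an empty front matter string.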

Features:

  • Reads file from disk
  • Extracts and removes YAML front matter
  • Parses markdown to HTML
  • Adds IDs to headings without them
  • Generates table of contents
  • Returns empty content and an empty TOC if the file doesn't exist

Example:

$processor = new BlogContentProcessor();

// Process a markdown file
$result = $processor->processMarkdownFile('articles/my-post.md');

echo $result['title'];    // Article title
echo $result['content'];   // HTML content
print_r($result['toc']);   // Table of contents

Example YAML Front Matter:

---
title: My Custom Title
author: John Doe
date: 2024-01-15
---

# Regular Content

This content will be processed normally.

extractTableOfContents(string $html): array

Extracts table of contents from HTML by finding headings with IDs.

Parameters:

  • $html (string): HTML content to extract TOC from

Returns:

  • (array): TOC items, each with keys:

      - level (int): Heading level (1-6)
      - id (string): Heading ID attribute
      - text (string): Heading text content

Heading Pattern:

  • Matches <h1> through <h6> tags
  • Requires id attribute
  • Strips HTML tags from heading text
  • Decodes HTML entities
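
The heading pattern above could be implemented with a single regular expression. The sketch below is an assumption about how such matching might look, not the method's real source: sketchExtractToc() is a hypothetical name, and the pattern only recognizes double-quoted id attributes.

```php
<?php

/**
 * Hypothetical stand-in for extractTableOfContents(): collect headings that carry an id.
 */
function sketchExtractToc(string $html): array
{
    $toc = [];
    // Capture the level, the id value, and the inner HTML of h1..h6 tags with an id attribute.
    $pattern = '/<h([1-6])\b[^>]*?\sid="([^"]*)"[^>]*>(.*?)<\/h\1>/is';
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $toc[] = [
                'level' => (int) $m[1],
                'id'    => $m[2],
                // Strip nested tags first, then decode entities such as &amp;.
                'text'  => html_entity_decode(strip_tags($m[3]), ENT_QUOTES | ENT_HTML5, 'UTF-8'),
            ];
        }
    }
    return $toc;
}

$toc = sketchExtractToc('<h2 id="faq">Q &amp; <em>A</em></h2><h3>No id here</h3>');
// $toc holds one item: level 2, id "faq", text "Q & A". The h3 has no id and is skipped.
```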

Example:

$html = '<h1 id="intro">Introduction</h1>
         <h2 id="setup">Setup</h2>';

$processor = new BlogContentProcessor();
$toc = $processor->extractTableOfContents($html);

// Result:
// [
//     ['level' => 1, 'id' => 'intro', 'text' => 'Introduction'],
//     ['level' => 2, 'id' => 'setup', 'text' => 'Setup']
// ]

addIdsToHeadings(string $html): string

Adds ID attributes to headings that don't have them.

Parameters:

  • $html (string): HTML content

Returns:

  • (string): HTML with IDs added to headings

ID Generation:

  • Converts text to lowercase
  • Replaces characters that are not letters or digits with hyphens (Unicode-aware, so non-Latin letters are kept)
  • Collapses multiple hyphens to single hyphen
  • Trims leading/trailing hyphens
  • Uses 'section' as fallback for empty IDs
  • Preserves existing IDs

Example:

$html = '<h1>Hello World</h1>';

$processor = new BlogContentProcessor();
$result = $processor->addIdsToHeadings($html);

// Result: '<h1 id="hello-world">Hello World</h1>'

Example with Existing ID:

$html = '<h1 id="custom-id">Heading</h1>';

$processor = new BlogContentProcessor();
$result = $processor->addIdsToHeadings($html);

// Result: '<h1 id="custom-id">Heading</h1>' (unchanged)
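
One way to get the behavior shown in the two examples above is a regex callback that skips any heading already carrying an id. This is a sketch under assumptions rather than the class's code: sketchAddIds() and the $makeId callable are hypothetical, and the demo uses a deliberately simplified ID function instead of the full rules listed under ID Generation.

```php
<?php

/**
 * Hypothetical stand-in for addIdsToHeadings(). $makeId turns heading text into an ID.
 */
function sketchAddIds(string $html, callable $makeId): string
{
    $result = preg_replace_callback(
        '/<h([1-6])\b([^>]*)>(.*?)<\/h\1>/is',
        static function (array $m) use ($makeId): string {
            // Leave the heading untouched when it already has an id attribute.
            if (preg_match('/\sid\s*=/i', $m[2]) === 1) {
                return $m[0];
            }
            // Build the ID from the heading's plain text and keep any other attributes.
            $id = htmlspecialchars($makeId(strip_tags($m[3])), ENT_QUOTES, 'UTF-8');
            return sprintf('<h%s id="%s"%s>%s</h%s>', $m[1], $id, $m[2], $m[3], $m[1]);
        },
        $html
    );
    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

// Simplified ID function used only for this demo.
$simpleId = static fn (string $text): string => strtolower(str_replace(' ', '-', trim($text)));

$out = sketchAddIds('<h1>Hello World</h1><h2 id="keep">Kept</h2>', $simpleId);
// $out is '<h1 id="hello-world">Hello World</h1><h2 id="keep">Kept</h2>'
```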

getRenderer(): BlogRenderer

Returns the internal BlogRenderer instance.

Returns:

  • (BlogRenderer): The BlogRenderer instance used by this processor

Example:

$processor = new BlogContentProcessor();
$renderer = $processor->getRenderer();

// Use renderer directly
$renderer->setConfig(['title' => 'My Blog']);
$html = $renderer->renderDocument([...]);

Private Methods

extractTitle(string $content, string $frontMatter = ''): string

Extracts article title from content or front matter.

Priority:

  1. title: field in YAML front matter
  2. First # heading in markdown content
  3. Empty string if neither found

Parameters:

  • $content (string): Markdown content (with front matter removed)
  • $frontMatter (string): YAML front matter string

Returns:

  • (string): Extracted title
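
The priority order can be expressed as two pattern checks with a fallback. The function below is a hypothetical sketch (sketchExtractTitle() is not part of the class), and it assumes the title is written as a single unquoted title: line and that H1 headings use the "# Heading" form.

```php
<?php

/**
 * Hypothetical stand-in for the private extractTitle() method.
 */
function sketchExtractTitle(string $content, string $frontMatter = ''): string
{
    // 1. A "title:" line in the front matter wins.
    if (preg_match('/^title:[ \t]*(.+)$/mi', $frontMatter, $m) === 1) {
        return trim($m[1]);
    }
    // 2. Otherwise use the first level-1 heading ("# Heading"); "##" and deeper do not match.
    if (preg_match('/^#[ \t]+(.+)$/m', $content, $m) === 1) {
        return trim($m[1]);
    }
    // 3. Neither source produced a title.
    return '';
}

echo sketchExtractTitle("# Heading\n\nText", 'title: Custom Title'), "\n"; // Custom Title
echo sketchExtractTitle("## Sub\n# Top\n"), "\n";                          // Top
```

Note that this simple pattern would also match a line starting with "# " inside a fenced code block, which a real implementation may need to handle.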

generateId(string $text): string

Generates a URL-friendly ID from heading text.

Transformation Steps:

  1. Convert to lowercase
  2. Replace every character that is not a Unicode letter or digit with a hyphen
  3. Collapse multiple hyphens to single hyphen
  4. Trim leading/trailing hyphens
  5. Return 'section' if empty

Parameters:

  • $text (string): Plain text heading

Returns:

  • (string): URL-friendly ID

Examples:

generateId('Hello World')      // 'hello-world'
generateId('  Test  ')        // 'test'
generateId('Hello--World')     // 'hello-world'
generateId('123')             // '123'
generateId('')                // 'section'
generateId('你好世界')         // '你好世界'
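
The five transformation steps map fairly directly onto PHP string functions. The following is a minimal sketch that reproduces the examples above, assuming the mbstring extension is available; sketchGenerateId() is a hypothetical name, and the private method itself may be written differently.

```php
<?php

/**
 * Hypothetical stand-in for the private generateId() method.
 */
function sketchGenerateId(string $text): string
{
    // 1. Lowercase, multibyte-aware so non-ASCII letters are handled.
    $id = mb_strtolower($text, 'UTF-8');
    // 2. Replace anything that is not a Unicode letter or digit with a hyphen.
    $id = preg_replace('/[^\p{L}\p{N}]/u', '-', $id) ?? '';
    // 3. Collapse runs of hyphens into one.
    $id = preg_replace('/-+/', '-', $id) ?? '';
    // 4. Trim hyphens from both ends.
    $id = trim($id, '-');
    // 5. Fall back to 'section' when nothing is left.
    return $id === '' ? 'section' : $id;
}

echo sketchGenerateId('Hello--World'), "\n"; // hello-world
echo sketchGenerateId(''), "\n";             // section
```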

Usage Workflow

Basic Workflow

<?php
use Blog\Renderer\BlogContentProcessor;

$processor = new BlogContentProcessor();

// 1. Process markdown file
$result = $processor->processMarkdownFile('article.md');

// 2. Extract data
$title = $result['title'];
$content = $result['content'];
$toc = $result['toc'];

// 3. Get renderer for HTML generation
$renderer = $processor->getRenderer();

// 4. Render complete document
$html = $renderer->renderDocument([
    'rootPath' => './',
    'title' => $title,
    'content' => $content,
    'toc' => $toc,
]);

Complete Example

<?php
use Blog\Renderer\BlogContentProcessor;

$processor = new BlogContentProcessor();

// Process article with YAML front matter
$articleData = $processor->processMarkdownFile('posts/my-article.md');

// Output metadata
echo "Title: " . $articleData['title'] . "\n";
echo "TOC Items: " . count($articleData['toc']) . "\n";

// Get renderer for custom configuration
$renderer = $processor->getRenderer();
$renderer->setConfig([
    'title' => 'My Tech Blog',
    'description' => 'Technical articles',
]);

// Render with custom breadcrumbs
$html = $renderer->renderDocument([
    'rootPath' => '../',
    'title' => $articleData['title'],
    'content' => $articleData['content'],
    'toc' => $articleData['toc'],
    'breadcrumbs' => [
        ['link' => '../index.php', 'name' => 'Home'],
        ['link' => 'index.php', 'name' => 'Posts'],
    ]
]);

// Output HTML
file_put_contents('posts/my-article.html', $html);

YAML Front Matter Support

Supported Fields

Currently, only title is automatically extracted from front matter, but you can include any custom metadata:

---
title: My Article Title
author: John Doe
date: 2024-01-15
category: PHP
tags: [php, markdown, tutorial]
---

# Article Content

The article content goes here.

Note: Only the title field is used by default. You can extend the class to extract other fields as needed.
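
If you do extend the class, reading the remaining fields could start from something like the sketch below. It is an illustration only: sketchParseFrontMatter() is a hypothetical helper, not an existing method, and it is not a YAML parser. It reads flat "key: value" lines and keeps inline lists such as tags as raw strings.

```php
<?php

/**
 * Hypothetical helper: read flat "key: value" pairs from a front matter string.
 * Nested structures and multi-line values are ignored.
 */
function sketchParseFrontMatter(string $frontMatter): array
{
    $fields = [];
    foreach (preg_split('/\r?\n/', $frontMatter) as $line) {
        // Keys are limited to letters, digits, underscores, and hyphens in this sketch.
        if (preg_match('/^([A-Za-z0-9_-]+):[ \t]*(.*)$/', $line, $m) === 1) {
            $fields[$m[1]] = trim($m[2]);
        }
    }
    return $fields;
}

$fields = sketchParseFrontMatter(
    "title: My Article Title\nauthor: John Doe\ndate: 2024-01-15\ntags: [php, markdown, tutorial]"
);
// $fields['author'] is "John Doe"; $fields['tags'] is the raw string "[php, markdown, tutorial]".
```

For anything beyond flat fields, a dedicated YAML parser would be the safer choice.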

Integration with BlogRenderer

BlogContentProcessor uses BlogRenderer internally for markdown parsing. You can access the renderer instance to:

  • Update configuration
  • Render documents
  • Parse markdown directly
  • Use other renderer features

$processor = new BlogContentProcessor();

// Access renderer
$renderer = $processor->getRenderer();

// Configure renderer
$renderer->setConfig(['title' => 'Custom Blog']);

// Parse markdown directly
$html = $renderer->parseMarkdown('# Hello');

Error Handling

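The processor does not fail when the file is missing. It returns empty content and an empty TOC; in the example below, the title is the file name without its extension: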

$result = $processor->processMarkdownFile('nonexistent.md');

// Returns:
// [
//     'title' => 'nonexistent',
//     'content' => '',
//     'toc' => []
// ]
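
Because a missing file produces a regular result array, callers that want to treat it as an error need their own check. A minimal sketch of such a guard, using only PHP's built-in filesystem functions; isReadableMarkdown() is a hypothetical helper name:

```php
<?php

/**
 * Hypothetical guard: decide up front whether a path is worth processing.
 */
function isReadableMarkdown(string $filePath): bool
{
    // is_file() rejects directories; is_readable() rejects permission problems.
    return is_file($filePath) && is_readable($filePath);
}

$path = 'nonexistent.md';
if (!isReadableMarkdown($path)) {
    // Report the problem explicitly instead of rendering an empty article.
    error_log("Skipping missing or unreadable file: {$path}");
}
```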

Table of Contents Structure

The TOC array is structured as follows:

[
    [
        'level' => 1,           // Heading level (1-6)
        'id' => 'section-id',   // Heading ID attribute
        'text' => 'Section Title' // Plain text heading content
    ],
    [
        'level' => 2,
        'id' => 'subsection-id',
        'text' => 'Subsection Title'
    ],
    // ... more items
]

This structure is compatible with the toc parameter in BlogRenderer::renderDocument().
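
If you want to consume the TOC array yourself instead of passing it to the renderer, a flat list of anchor links is one simple option. The helper below is a hypothetical sketch; sketchRenderToc() and the toc-level-N class names are assumptions made for illustration and are not part of BlogRenderer.

```php
<?php

/**
 * Hypothetical helper: turn a TOC array into a flat HTML list of anchor links.
 */
function sketchRenderToc(array $toc): string
{
    $items = '';
    foreach ($toc as $entry) {
        // Escape id and text because they originate from document content.
        $items .= sprintf(
            '<li class="toc-level-%d"><a href="#%s">%s</a></li>',
            $entry['level'],
            htmlspecialchars($entry['id'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($entry['text'], ENT_QUOTES, 'UTF-8')
        );
    }
    // An empty TOC yields an empty string rather than an empty <ul>.
    return $items === '' ? '' : "<ul>{$items}</ul>";
}

$tocHtml = sketchRenderToc([
    ['level' => 1, 'id' => 'intro', 'text' => 'Introduction'],
    ['level' => 2, 'id' => 'setup', 'text' => 'Setup & Install'],
]);
```

The level is exposed as a CSS class here so that indentation can be handled in a stylesheet; building nested lists is an alternative design.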

See Also

  • BlogRenderer - Main rendering class
  • SecurityFilter - Security filtering
  • Parsedown - Markdown parsing