# Preprocessors

A *preprocessor* is simply a bit of code which gets run immediately after the
book is loaded and before it gets rendered, allowing you to update and mutate
the book. Possible use cases are:

- Creating custom helpers like `\{{#include /path/to/file.md}}`
- Updating links so `[some chapter](some_chapter.md)` is automatically changed
  to `[some chapter](some_chapter.html)` for the HTML renderer
- Substituting LaTeX-style expressions (`$$ \frac{1}{3} $$`) with their
  MathJax equivalents
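As a rough illustration of the link-rewriting use case, here is a naive, regex-based Python sketch. The helper name `md_links_to_html` is hypothetical, and it only handles inline links of the form `[text](target.md)` or `[text](target.md#anchor)`; it does not know about code blocks or reference-style links.

```python
import re

# Matches the `](target.md)` or `](target.md#anchor)` part of an inline link.
MD_LINK = re.compile(r"\]\(([^)\s]+?)\.md(#[^)\s]*)?\)")


def md_links_to_html(markdown):
    """Rewrite `[text](page.md)` links to `[text](page.html)` (naive sketch)."""
    # An unmatched optional anchor group expands to "" in re.sub (Python 3.5+).
    return MD_LINK.sub(r"](\1.html\2)", markdown)
```

For example, `md_links_to_html("[some chapter](some_chapter.md)")` returns `"[some chapter](some_chapter.html)"`, while links without a `.md` target are left alone.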


## Hooking Into MDBook

MDBook uses a fairly simple mechanism for discovering third-party plugins.
A new table is added to `book.toml` (e.g. `preprocessor.foo` for the `foo`
preprocessor) and then `mdbook` will try to invoke the `mdbook-foo` program as
part of the build process.
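The naming convention can be sketched in Python. `resolve_command` is a hypothetical helper, and splitting an explicit `command` string shell-style (as in the configuration below) is an assumption about how mdBook tokenizes it:

```python
import shlex


def resolve_command(name, table):
    """Return the argv that would be run for `[preprocessor.<name>]` (sketch)."""
    # An explicit `command` key wins; otherwise fall back to `mdbook-<name>`.
    command = table.get("command", "mdbook-" + name)
    # Assumption: the command string is split shell-style into program + args.
    return shlex.split(command)
```

So an empty `[preprocessor.foo]` table resolves to `["mdbook-foo"]`, which must then be on your `PATH`.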

Preprocessors can also be restricted to specific backends with the
`preprocessor.foo.renderers` key (e.g. it doesn't make sense for MathJax to be
used with non-HTML renderers).

```toml
[book]
title = "My Book"
authors = ["Michael-F-Bryan"]

[preprocessor.foo]
# The command can also be specified manually
command = "python3 /path/to/foo.py"
# Only run the `foo` preprocessor for the HTML and EPUB renderers
renderers = ["html", "epub"]
```
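The renderer selection can be sketched as follows, as we understand mdBook's behavior and ignoring its built-in preprocessors: an explicit renderer list from the `[preprocessor.foo]` table is authoritative, and only when it is absent is the preprocessor itself asked. The function and parameter names are hypothetical.

```python
def should_run(explicit_renderers, renderer, supports):
    """Decide whether a preprocessor runs for `renderer` (hypothetical sketch).

    explicit_renderers: the renderer list from the preprocessor's table, or None.
    supports: callable that asks the preprocessor itself about `renderer`.
    """
    if explicit_renderers is not None:
        # An explicit list in book.toml decides; `supports` is not consulted.
        return renderer in explicit_renderers
    # No list configured: defer to the preprocessor's own answer.
    return supports(renderer)
```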

Once the preprocessor has been defined and the build process has started, mdBook (unless the renderers were restricted explicitly in `book.toml`) first runs the preprocessor's command, i.e. the one in `preprocessor.foo.command` or `mdbook-foo` by default, with the arguments `supports` and the renderer name (e.g. `supports html`), and checks its exit status.
If the status code is 0, mdBook runs the command again without those arguments, writes both the context and the book to its stdin as a JSON array (`[context, book]`), and reads the modified book, also serialized as JSON, back from its stdout. A non-zero status means the preprocessor is skipped for that renderer.
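To test a preprocessor outside of a full build, the exchange described above can be imitated from Python. This is a minimal host-side sketch assuming the preprocessor asks to be run (no explicit renderer list); `run_preprocessor` is a hypothetical helper, not mdBook's own implementation, which is written in Rust.

```python
import json
import subprocess


def run_preprocessor(argv, renderer, context, book):
    """Imitate the supports check and the JSON exchange for one renderer (sketch)."""
    # Step 1: `<command> supports <renderer>`; only the exit status matters.
    check = subprocess.run(argv + ["supports", renderer], stdin=subprocess.DEVNULL)
    if check.returncode != 0:
        return book  # unsupported renderer: the book is left untouched
    # Step 2: run the command again, send [context, book] on stdin,
    # and parse the modified book from stdout.
    result = subprocess.run(
        argv,
        input=json.dumps([context, book]),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Real preprocessors may expect more fields in the context than a hand-written test dictionary provides, so a context captured from an actual build is a better fixture.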

The easiest way to get started is by creating your own implementation of the
`Preprocessor` trait (e.g. in `lib.rs`) and then creating a shell binary which
translates inputs to the correct `Preprocessor` method. For convenience, there
is [an example no-op preprocessor] in the `examples/` directory which can easily
be adapted for other preprocessors.

<details>
<summary>Example no-op preprocessor</summary>

```rust
// nop-preprocessor.rs

{{#include ../../../examples/nop-preprocessor.rs}}
```
</details>

## Hints For Implementing A Preprocessor

By pulling in `mdbook` as a library, preprocessors can have access to the
existing infrastructure for dealing with books.

For example, a custom preprocessor could use the
[`CmdPreprocessor::parse_input()`] function to deserialize the JSON written to
`stdin`. Then each chapter of the `Book` can be mutated in-place via
[`Book::for_each_mut()`], and then written to `stdout` with the `serde_json`
crate.

Chapters can be accessed either directly (by recursively iterating over
chapters) or via the `Book::for_each_mut()` convenience method.
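Preprocessors that work on the raw JSON, such as ones written in other languages, have to do this recursive walk themselves. A minimal Python sketch, assuming the serialized layout used by the Python example in this chapter (`sections` holding `{"Chapter": {...}}` items) plus a `sub_items` list of nested items on each chapter; `for_each_chapter` is a hypothetical helper, and its visiting order is not claimed to match `Book::for_each_mut()`:

```python
def for_each_chapter(items, func):
    """Call `func` on every chapter dict in `items`, recursing into sub-chapters."""
    for item in items:
        # Chapters serialize as {"Chapter": {...}}; separators and part titles
        # have a different shape and are skipped.
        if isinstance(item, dict) and "Chapter" in item:
            chapter = item["Chapter"]
            func(chapter)
            for_each_chapter(chapter.get("sub_items", []), func)
```

Calling `for_each_chapter(book["sections"], visit)` then lets `visit` mutate each chapter's `content` in place.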

The `chapter.content` is just a string which happens to be markdown. While it's
entirely possible to use regular expressions or do a manual find & replace,
you'll probably want to process the input into something more computer-friendly.
The [`pulldown-cmark`][pc] crate implements a production-quality event-based
Markdown parser, with the [`pulldown-cmark-to-cmark`][pctc] crate allowing you
to translate events back into markdown text.
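To see why a parser is worth the effort, consider a deliberately naive regular-expression approach (`strip_emphasis_naive` is a hypothetical name). It removes the emphasis, but it cannot tell emphasis markers apart from asterisks inside an inline code span:

```python
import re


def strip_emphasis_naive(markdown):
    """Deliberately naive: strip `*...*` and `**...**` markers with a regex."""
    return re.sub(r"\*{1,2}([^*]+)\*{1,2}", r"\1", markdown)


# The emphasis is removed, but so are the asterisks inside the code span:
print(strip_emphasis_naive("Use `a*b*c` for *emphasis*"))  # Use `abc` for emphasis
```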

The following code block shows how to remove all emphasis from markdown,
without accidentally breaking the document.

```rust
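// Assumed imports; exact paths and APIs depend on crate versions (for example,
// pulldown-cmark 0.10+ uses `TagEnd` inside `Event::End`).
use mdbook::book::Chapter;
use mdbook::errors::{Error, Result};
use pulldown_cmark::{Event, Parser, Tag};
use pulldown_cmark_to_cmark::cmark;

/// Strips all emphasis and strong markers from `chapter`, returning the new
/// Markdown and counting the removed events in `num_removed_items`.
fn remove_emphasis(
    num_removed_items: &mut usize,
    chapter: &mut Chapter,
) -> Result<String> {
    let mut buf = String::with_capacity(chapter.content.len());

    // Keep every event except the start/end of `*emphasis*` and `**strong**`.
    let events = Parser::new(&chapter.content).filter(|e| {
        let should_keep = match *e {
            Event::Start(Tag::Emphasis)
            | Event::Start(Tag::Strong)
            | Event::End(Tag::Emphasis)
            | Event::End(Tag::Strong) => false,
            _ => true,
        };
        if !should_keep {
            *num_removed_items += 1;
        }
        should_keep
    });

    // Turn the filtered event stream back into Markdown text.
    cmark(events, &mut buf, None).map(|_| buf).map_err(|err| {
        Error::msg(format!("Markdown serialization failed: {}", err))
    })
}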
```

For everything else, have a look [at the complete example][example].

## Implementing A Preprocessor With A Different Language

Because mdBook communicates with preprocessors over stdin and stdout, preprocessors can easily be written in languages other than Rust.
The following code shows how to implement a simple preprocessor in Python that modifies the content of the first chapter.
The example follows the configuration shown above, with `preprocessor.foo.command` pointing to a Python script:

```python
import json
import sys


if __name__ == '__main__':
    if len(sys.argv) > 1:  # check whether we received any arguments
        if sys.argv[1] == "supports":
            # mdBook is asking `supports <renderer>`; exiting with status 0
            # tells it this preprocessor supports every renderer
            sys.exit(0)

    # load both the context and the book representations from stdin
    context, book = json.load(sys.stdin)
    # modify the content of the first chapter
    book['sections'][0]['Chapter']['content'] = '<h3>Hello</h3>'
    # we are done modifying the book, so print it to stdout as JSON
    print(json.dumps(book))
```
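A slightly more defensive variant of the same script, as a sketch: it declines every renderer except `html` during the `supports` check, appends a footer to each top-level chapter, and is split into functions so it can be tested. The HTML-only policy and the footer text are illustrative assumptions, not mdBook requirements.

```python
import json


def supports(renderer):
    # Illustrative policy: only run for the HTML renderer.
    return renderer == "html"


def process(context, book):
    # `context` is unused here; only top-level chapters get the footer.
    for item in book["sections"]:
        if isinstance(item, dict) and "Chapter" in item:
            item["Chapter"]["content"] += "\n\n*Processed by foo.py*\n"
    return book


def main(argv, stdin, stdout):
    # `foo.py supports <renderer>`: answer with the exit status only.
    if len(argv) > 1 and argv[1] == "supports":
        return 0 if len(argv) > 2 and supports(argv[2]) else 1
    # Otherwise read [context, book] from stdin and write the new book to stdout.
    context, book = json.load(stdin)
    json.dump(process(context, book), stdout)
    return 0
```

To use it as the script entry point, add `import sys` and call `sys.exit(main(sys.argv, sys.stdin, sys.stdout))` under `if __name__ == '__main__':`. Because mdBook reads the modified book from stdout, anything else printed there (debug output, for example) corrupts the JSON; write logs to stderr instead.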


[preprocessor-docs]: https://docs.rs/mdbook/latest/mdbook/preprocess/trait.Preprocessor.html
[pc]: https://crates.io/crates/pulldown-cmark
[pctc]: https://crates.io/crates/pulldown-cmark-to-cmark
[example]: https://github.com/rust-lang/mdBook/blob/master/examples/nop-preprocessor.rs
[an example no-op preprocessor]: https://github.com/rust-lang/mdBook/blob/master/examples/nop-preprocessor.rs
[`CmdPreprocessor::parse_input()`]: https://docs.rs/mdbook/latest/mdbook/preprocess/struct.CmdPreprocessor.html#method.parse_input
[`Book::for_each_mut()`]: https://docs.rs/mdbook/latest/mdbook/book/struct.Book.html#method.for_each_mut