Author: | David Goodger |
---|---|
Contact: | docutils-develop@lists.sourceforge.net |
Date: | 2023-06-26 |
Revision: | 9408 |
Copyright: | This document has been placed in the public domain. |
The docutils.core.Publisher class is the core of Docutils, managing all the processing and relationships between components. See PEP 258 for an overview of Docutils components. Configuration is done via runtime settings assembled from several sources. The Publisher convenience functions are the normal entry points for using Docutils as a library.
There are several convenience functions in the docutils.core module. Each of these functions sets up a docutils.core.Publisher object, then calls its publish() method. docutils.core.Publisher.publish() handles everything else.
See the module docstring, help(docutils.core), and the function docstrings, e.g., help(docutils.core.publish_string), for details and a description of the function arguments.
Function for custom command-line front-end tools (like tools/rst2html.py) or "console_scripts" entry points (like core.rst2html()) with file I/O. In addition to writing the output document to a file-like object, it also returns the document as a str instance (or as bytes for binary output document formats).
For programmatic use with file I/O. In addition to writing the output document to a file-like object, it also returns the document as a str instance (or as bytes for binary output document formats).
For programmatic use with string I/O:
Caution!
The "output_encoding" and "output_encoding_error_handler" runtime settings may affect the content of the output document: Some document formats contain an encoding declaration, some formats use substitutions for non-encodable characters.
Use publish_parts() to get a str instance of the output document as well as the values of the output_encoding and output_encoding_error_handler runtime settings.
This function is provisional because in Python 3 the name and behaviour no longer match.
Parse string input (cf. string I/O) into a Docutils document tree data structure (doctree). The doctree can be modified, pickled & unpickled, etc., and then reprocessed with publish_from_doctree().
Render from an existing document tree data structure (doctree). Returns the output document as a memory object (cf. string I/O).
This function is provisional because in Python 3 the name and behaviour of the string output interface no longer match.
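The two doctree functions can be chained: parse once, then render the resulting tree. The following is a minimal sketch, assuming Docutils is installed. The sample source text is made up for illustration, and the "unicode" output encoding is used here to request a str result rather than encoded bytes.

```python
from docutils.core import publish_doctree, publish_from_doctree

# Illustrative reStructuredText input (not taken from the Docutils docs).
SOURCE = """\
Sample Title
============

Hello, *Docutils*!
"""

# Step 1: parse the string into a document tree (doctree).
doctree = publish_doctree(SOURCE)

# Step 2: render the existing doctree with the HTML writer.
# The special output encoding "unicode" requests a str instead of bytes.
html = publish_from_doctree(
    doctree,
    writer_name="html",
    settings_overrides={"output_encoding": "unicode"},
)
```

Between the two steps, the doctree could be inspected or modified before it is rendered.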
Auxiliary function used by publish_file(), publish_string(), publish_doctree(), and publish_parts(). Applications should not need to call this function directly.
For programmatic use with string input (cf. string I/O). Returns a dictionary of document parts as str instances. [1] Dictionary keys are the part names. Each Writer component may publish a different set of document parts, described below.
Example: post-process the output document with a custom function post_process() before encoding it with the user-customizable encoding and error handler:
from docutils.core import publish_parts

def publish_bytes_with_postprocessing(*args, **kwargs):
    parts = publish_parts(*args, **kwargs)
    # post_process() is the application's own str -> str function
    out_str = post_process(parts['whole'])
    return out_str.encode(parts['encoding'], parts['errors'])
There are more usage examples in the docutils/examples.py module.
Contains the entire formatted document. [1]
[1] | (1, 2) Output documents in binary formats (e.g. ODT) are stored as a bytes instance. |
parts['body_prefix'] contains:
</head> <body> <div class="document" ...>
and, if applicable:
<div class="header"> ... </div>
parts['body_pre_docinfo'] contains (as applicable):
<h1 class="title">...</h1> <h2 class="subtitle" id="...">...</h2>
parts['body_suffix'] contains:
</div>
(the end-tag for <div class="document">), the footer division if applicable:
<div class="footer"> ... </div>
and:
</body> </html>
parts['html_head'] contains the HTML <head> content, less the stylesheet link and the <head> and </head> tags themselves. Since publish_parts() returns str instances which do not know about the output encoding, the "Content-Type" meta tag's "charset" value is left unresolved, as "%s":
<meta http-equiv="Content-Type" content="text/html; charset=%s" />
The interpolation should be done by client code.
parts['html_prolog'] contains the XML declaration and the doctype declaration. The XML declaration's "encoding" attribute's value is left unresolved, as "%s":
<?xml version="1.0" encoding="%s" ?>
The interpolation should be done by client code.
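Client-side interpolation for the html_head and html_prolog parts might look like the following sketch. The two template strings are the placeholder lines quoted in this section. The helper name resolve_encoding() is hypothetical, and the sketch assumes the part contains exactly one "%s" and no other percent signs.

```python
# Placeholder lines as quoted in the part descriptions.
html_prolog = '<?xml version="1.0" encoding="%s" ?>'
html_head = '<meta http-equiv="Content-Type" content="text/html; charset=%s" />'


def resolve_encoding(part, encoding):
    """Fill the single "%s" placeholder with the encoding name.

    Hypothetical helper. Assumes `part` holds exactly one "%s" and
    no other "%" characters; otherwise the "%" operator raises an error.
    """
    return part % (encoding,)


prolog = resolve_encoding(html_prolog, "utf-8")
head = resolve_encoding(html_head, "utf-8")
```

A part that may contain literal percent signs elsewhere would need a more careful substitution.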
The PEP/HTML writer provides the same parts as the HTML4 writer, plus the following:
The S5/HTML writer provides the same parts as the HTML4 writer.
The HTML5 writer provides the same parts as the HTML4 writer. However, it uses semantic HTML5 elements for the document, header and footer.
See the template files default.tex, titlepage.tex, titlingpage.tex, and xelatex.tex for examples of how these parts can be combined into a valid LaTeX document.
parts['body'] contains the document's content. In other words, it contains the entire document, except the document title, subtitle, and docinfo.
This part can be included into another LaTeX document body using the \input{} command.
parts['docinfo'] contains the document bibliographic data, the docinfo field list rendered as a table.
With --use-latex-docinfo, the 'author', 'organization', 'contact', 'address' and 'date' info is moved to titledata.
'dedication' and 'abstract' are always moved to separate parts.
parts['titledata'] contains the combined title data in \title, \author, and \date macros.
With --use-latex-docinfo, this includes the 'author', 'organization', 'contact', 'address' and 'date' docinfo items.
Docutils is configured by runtime settings assembled from several sources:
Docutils overlays default and explicitly specified values from these sources such that settings behave the way we want and expect them to behave. For details, see Docutils Runtime Settings. The individual settings are described in Docutils Configuration.
To pass application-specific setting defaults to the Publisher convenience functions, use the settings_overrides parameter. Pass a dictionary of setting names & values, like this:
app_defaults = {'input_encoding': 'ascii',
                'output_encoding': 'latin-1'}
output = publish_string(..., settings_overrides=app_defaults)
Settings from command-line options override configuration file settings, and they override application defaults.
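The precedence described here can be pictured as layered dictionaries. The sketch below only illustrates the lookup order and is not how Docutils implements its settings machinery. The values in each layer are hypothetical.

```python
from collections import ChainMap

# Hypothetical values for each layer.
app_defaults = {"input_encoding": "ascii", "output_encoding": "latin-1"}
config_file = {"input_encoding": "latin-1", "output_encoding": "utf-8"}
command_line = {"input_encoding": "utf-8"}

# ChainMap searches its maps from left to right, so the layer with the
# highest priority (command line) comes first, application defaults last.
effective = ChainMap(command_line, config_file, app_defaults)
```

Here the command-line value wins for input_encoding, while output_encoding falls through to the configuration file value, which in turn hides the application default.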
See Docutils Runtime Settings or the docstring of publish_programmatically() for a description of all configuration arguments of the Publisher convenience functions.
Important
Details will change over the next Docutils versions. See RELEASE-NOTES.
The input encoding can be specified with the input_encoding setting.
By default, the input encoding is detected from a Unicode byte order mark (BOM) or a "magic comment" [2] similar to PEP 263. The fallback is "utf-8". The default behaviour differs from Python's open():
The default will change to "utf-8" in Docutils 0.22, and the input encoding detection will be removed in Docutils 1.0.
The default output encoding is UTF-8. A different encoding can be specified with the output_encoding setting.
Caution!
Docutils may introduce non-ASCII text if you use auto-symbol footnotes or the "contents" directive. In non-English documents, auto-generated labels may also contain non-ASCII characters.
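Why this matters for a restrictive output encoding can be seen with plain Python, independent of Docutils. The sketch below encodes a string containing a non-ASCII dagger character to ASCII with three of Python's standard codec error handlers. Which handler a given Docutils writer uses by default is not covered here.

```python
text = "See note \u2020"  # U+2020 DAGGER, a non-ASCII character

# "strict" refuses characters the target encoding cannot represent.
try:
    text.encode("ascii", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Replacement handlers substitute an escape sequence instead,
# which changes the content of the encoded output.
as_charref = text.encode("ascii", "xmlcharrefreplace")
as_backslash = text.encode("ascii", "backslashreplace")
```

The same handler names are valid values for Python's str.encode() errors argument, which is the mechanism the "output_encoding_error_handler" setting refers to.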
[2] | (1, 2) A comment like .. text encoding: <encoding name> on the first or second line of a reStructuredText source defines <encoding name> as the source's input encoding. Examples (using formats recognized by popular editors): .. -*- mode: rst -*- -*- coding: latin1 -*- or: .. vim: set fileencoding=cp737 : More precisely, the first and second lines are searched for the following regular expression: coding[:=]\s*([-\w.]+) The first group of this expression is then interpreted as the encoding name. If the first line matches, the second line is ignored. This feature is scheduled to be removed in Docutils 1.0. See the inspecting_codecs package for a possible replacement. |
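The search described in footnote [2] can be sketched with the regular expression quoted there. The function name declared_encoding() is hypothetical, and the sketch works on str for brevity, whereas real detection has to happen on the raw input before it is decoded.

```python
import re

# Regular expression quoted in footnote [2].
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source):
    """Return the encoding named on line 1 or 2 of `source`, else None.

    Hypothetical helper: only the first two lines are searched, and a
    match on the first line means the second line is never examined.
    """
    for line in source.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None
```

For instance, a source whose second line is `.. vim: set fileencoding=cp737 :` yields "cp737", while a declaration on the third line or later is ignored.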