How to Convert AsciiDoc to EPUB3 with Asciidoctor

Asciidoctor EPUB3 is a set of Asciidoctor extensions for converting AsciiDoc documents directly to the EPUB3 and KF8/MOBI e-book formats.

Introduction

Asciidoctor EPUB3 is not merely a converter from AsciiDoc to EPUB3 and KF8/MOBI, but rather a tool to help you create aesthetic, professional, easy-to-read e-books. Let’s face it, many of the technical e-books out there—​especially those produced from software documentation—​are hideous. The Asciidoctor project wants to disrupt the status quo.

[Figure: An excerpt from an e-book produced by Asciidoctor EPUB3 shown in Day, Night and Sepia mode.]

Notable Features

  • Direct AsciiDoc to EPUB3 conversion

  • Direct AsciiDoc to KF8/MOBI conversion

  • Highly aesthetic and readable styles with optimized text legibility

  • EPUB3 metadata, manifest and spine (assembled by Gepub)

  • Document metadata (title, authors, subject, keywords, etc.)

  • Internal cross reference links

  • Syntax highlighting with CodeRay or Pygments (must use inline styles)

  • Unicode callout numbers

  • Page breaks avoided in block content (so much as it’s supported by the reader)

  • Orphan section titles avoided (so much as it’s supported by the reader)

  • Table border settings honored

  • Support for SVG images in the content

Project Mission

Asciidoctor EPUB3 aims to produce EPUB3 documents that meet the following objectives:

Fully Semantic

Produce deeply semantic XHTML5 documents, including use of the recommended epub:type attribute.

Exceptional Readability

Readers should be drawn into the text so that they read and absorb it. Maximize the readability of the text using carefully crafted styles, focusing on:

  • Custom, readable fonts with strong UTF-8 character support

  • Sufficient line spacing and margins

  • Modular font size scale

  • Subtle, pleasing colors with good contrast

  • A responsive design that scales well from small to large screens

  • Widowed and orphaned content avoided where possible

Complete and Accurate Metadata

Fully populate the EPUB3 package metadata using information in the AsciiDoc source document.

Consistent Rendering

Render consistently across a broad range of EPUB3 (and select EPUB2+) readers and respond to any size screen.

Polish, Polish and More Polish

Add polish to the final product such as font-based icons and callout numbers.

We believe that the e-books produced by Asciidoctor EPUB3 are the very best output you can expect to find in digital publishing today. Of course, there’s always room for improvement, so we’ll continue to work with you to achieve and maintain this goal.

Project Status

Asciidoctor EPUB3 is currently alpha software. Use accordingly. Although the bulk of AsciiDoc content is rendered, there’s still work needed to fill in gaps where rendering is incomplete or unstyled.

Asciidoctor EPUB3 only produces variable layout (i.e., reflowable) EPUB3 documents since this layout is best suited for the types of documents typically written in AsciiDoc. We may explore the use of fixed layout documents in the future if the need arises.

Planned Features and Work In Progress

See the WORKLOG.

Converter Workflow

Asciidoctor EPUB3 takes a single, logical AsciiDoc document as input and converts it to an EPUB3 publication archive (often described as a “website in a box”). Using the EPUB3 publication as the digital master, Asciidoctor EPUB3 can then produce a KF8/MOBI, the file format required by Amazon Kindle. The conversion to KF8/MOBI is performed by sending the EPUB3 through KindleGen.

Traditional EPUB conversion

An EPUB3 archive is typically structured with the contents of each chapter in a separate XHTML file. The converter must therefore chunk the source document into multiple XHTML files to put in the EPUB3 archive. Other converters tend to handle this task by automatically slicing up the XHTML output at predetermined heading levels. Asciidoctor EPUB3 takes a different approach.

Declaring the Spine

Asciidoctor EPUB3 relies on top-level include directives (i.e., include directives in the master document) to indicate where the chapter splits should occur. In other words, you must be explicit. Asciidoctor EPUB3 will not try to guess. If your AsciiDoc document is not structured in this way, you’ll need to change it to use the Asciidoctor EPUB3 converter properly.

You can think of the master document as the spine of the book and the include directives as the individual items being bound together. The target of each include directive in the master document is parsed and rendered as a separate AsciiDoc document, with certain options and attributes being passed down from the master to ensure consistent behavior. Each resulting XHTML document is then added to the EPUB3 archive as a chapter document and the master document becomes the navigation file (i.e., the table of contents).

If the master document does not contain any include directives, then the converter treats the document as the sole chapter in the EPUB3 archive and automatically produces a navigation file that references it.
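The spine rule can be illustrated with a small Ruby sketch. This is not the converter's own code: the `spine_items` helper and the `INCLUDE_RX` pattern are hypothetical names, and the sketch only looks at include directives that sit on their own line in the master document.

```ruby
# Matches a top-level include directive such as: include::chapter.adoc[]
INCLUDE_RX = /^include::([^\[\s]+)\[[^\]]*\]\s*$/

# Hypothetical helper: returns the ordered chapter targets declared in the
# master document. The real converter does this work inside Asciidoctor.
def spine_items(master_source, master_name)
  targets = master_source.each_line.map { |line| line[INCLUDE_RX, 1] }.compact
  # With no include directives, the master document is the sole chapter.
  targets.empty? ? [master_name] : targets
end

master = <<~ADOC
  = Asciidoctor EPUB3: Sample Book
  :doctype: book

  include::asciidoctor-epub3-readme.adoc[]

  include::sample-content.adoc[]
ADOC

puts spine_items(master, 'sample-book.adoc')
```

Each returned target corresponds to one chapter document in the archive, in the order it appears in the master document.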

Eventually, we envision introducing a dedicated block macro to represent a spine item so that we don’t overload the meaning of the include directive. However, for the time being, the include directive will suit this purpose.

Prerequisites

All that’s needed to use Asciidoctor EPUB3 is Ruby 1.9.3 or greater and a few RubyGems, which we’ll explain how to install in the next section.

To check if you have Ruby available, use the ruby command to query the version installed:

$ ruby --version
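If you prefer to check the requirement from Ruby itself, a comparison along these lines should work. The `ruby_supported?` helper is a hypothetical name used only for this sketch; the minimum version is the one stated above.

```ruby
require 'rubygems'

# Minimum Ruby version stated in the prerequisites.
MINIMUM_RUBY = Gem::Version.new('1.9.3')

# Returns true when the given version string satisfies the minimum.
def ruby_supported?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MINIMUM_RUBY
end

puts ruby_supported? ? "Ruby #{RUBY_VERSION} is new enough" : "Ruby #{RUBY_VERSION} is too old"
```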

If you’re using RVM, we recommend creating a new gemset to work with Asciidoctor EPUB3:

$ rvm use @asciidoctor-epub3 --create

We like RVM because it keeps the dependencies required by various projects isolated ;)

Getting Started

Asciidoctor EPUB3 isn’t yet published as a RubyGem itself, so you’ll need to get the source code.

Retrieve the project

You can retrieve Asciidoctor EPUB3 in one of two ways:

  1. Clone the git repository

  2. Download a zip archive of the repository

Option 1: Fetch Using git clone

If you want to clone the git repository, copy the GitHub repository URL and pass it to the git clone command.

$ git clone https://github.com/asciidoctor/asciidoctor-epub3

Next, change to the project directory.

$ cd asciidoctor-epub3

Option 2: Download the Archive

If you want to download a zip archive, click on the Download Zip button on the right-hand side of the repository page on GitHub. Once the download finishes, extract the archive, open a console and change to that directory.

We’ll now leverage the project configuration to install the necessary dependencies.

Install the Dependencies

The dependencies needed to use Asciidoctor EPUB3 are defined in the Gemfile at the root of the project. We can use Bundler to install the dependencies for us.

To check if you have Bundler available, use the bundle command to query the version installed.

$ bundle --version

If it’s not installed, use the gem command to install it.

$ gem install bundler

Then use the bundle command to install the project dependencies.

$ bundle install

Build and Install the Gem

Now that the dependencies are installed, you can build and install the Gem.

Use the Rake build tool to build the Gem.

$ rake build

The build will report that it built the Gem into the pkg directory.

Finally, install the Gem.

$ gem install pkg/asciidoctor-epub3-1.0.0.dev.gem

You’re now ready to use Asciidoctor EPUB3! Let’s get an AsciiDoc document ready to convert to EPUB3.

Prepare an AsciiDoc Document

If you don’t already have an AsciiDoc document, you can use the sample-book.adoc file and its chapters found in the data/samples directory of the repository.

Master file named sample-book.adoc
= Asciidoctor EPUB3: Sample Book
Author Name
v1.0, 2014-04-15
:doctype: book
:producer: Asciidoctor
:keywords: Asciidoctor, samples, e-book, EPUB3, KF8, MOBI, Asciidoctor.js
:copyright: CC-BY-SA 3.0
:imagesdir: images

include::asciidoctor-epub3-readme.adoc[]

include::sample-content.adoc[]

include::asciidoctor-js-introduction.adoc[]

include::asciidoctor-js-extension.adoc[]

The metadata in the generated EPUB3 file is populated from attributes in the AsciiDoc document. The names of the attributes and the metadata elements to which they map are documented in this section.

The term package metadata in Table 1 is in reference to the <metadata> element in the EPUB3 package document (e.g., package.opf). The dc namespace prefix is in reference to the Dublin Core Metadata Element Set.

Table 1. AsciiDoc attributes that control the EPUB3 metadata (i.e., package.opf)
Name Description

id

Populates the required unique identifier (<dc:identifier>) in the package metadata. An id will be generated automatically from the doctitle if not specified. The recommended practice is to identify the document by means of a string or number conforming to a formal identification system.

lang

Populates the content language / locale (<dc:language>) in the package metadata.

scripts

Controls the font subsets that are selected based on the specified scripts (e.g., alphabets). (values: latin, latin-ext, latin-cyrillic or multilingual)

revdate

Populates the publication date (<dc:date>) in the package metadata. The date should be specified in a parsable format, such as 2014-01-01.

doctitle

Populates the title (<dc:title>) in the package metadata. The title is added to the metadata in plain text format.

author

Populates the contributors (<dc:contributor>) in the package metadata. The authors in each chapter document are aggregated together with the authors in the master file.

username

Used to resolve an avatar for the author that is displayed in the header of a chapter. The avatar image should be located at the path {imagesdir}/avatars/{username}.png, where {username} is the value of this attribute.

producer

Populates the publisher (<dc:publisher>) in the package metadata.

creator

Populates the creator (<dc:creator>) in the package metadata. If the creator is not specified, the value of the producer attribute is used.

description

Populates the description (<dc:description>) in the package metadata.

keywords

Populates the subjects (i.e., <dc:subject>) in the package metadata. The keywords should be represented as comma-separated values (CSV).

front-cover-image

Populates the front cover image and the image on the cover page (EPUB3 only) in the package metadata. The image is also added to the e-book archive. May be specified as a path or inline image macro. Using the inline image macro is preferred as it allows the height and width to be specified.

copyright

Populates the rights statement (<dc:rights>) in the package metadata.

source

Populates the source reference (<dc:source>) in the package metadata. The recommended practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.

epub-properties

An optional override of the properties attribute for this document’s item in the manifest. Only applies to a chapter document.

doctype

Effectively ignored. The master document is assumed to be a book and each chapter an article.
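To make the mapping in Table 1 concrete, here is a rough Ruby sketch that turns a hash of document attributes into Dublin Core element/value pairs. The `package_metadata` helper is hypothetical and is not part of Asciidoctor EPUB3. The sketch also assumes that each comma-separated keyword becomes its own subject entry, and it skips the automatic id generation because the source does not describe how the id is derived.

```ruby
# Attribute-to-element mapping taken from Table 1 (only the dc:* rows).
DC_ELEMENTS = {
  'id'          => 'dc:identifier',
  'lang'        => 'dc:language',
  'revdate'     => 'dc:date',
  'doctitle'    => 'dc:title',
  'author'      => 'dc:contributor',
  'producer'    => 'dc:publisher',
  'creator'     => 'dc:creator',
  'description' => 'dc:description',
  'keywords'    => 'dc:subject',
  'copyright'   => 'dc:rights',
  'source'      => 'dc:source'
}

# Hypothetical helper: returns [element, value] pairs for the given attributes.
def package_metadata(attrs)
  attrs = attrs.dup
  # Per Table 1, creator falls back to producer when it is not specified.
  attrs['creator'] ||= attrs['producer'] if attrs['producer']
  DC_ELEMENTS.flat_map do |name, element|
    value = attrs[name]
    next [] if value.nil?
    # Assumption: one dc:subject entry per comma-separated keyword.
    values = name == 'keywords' ? value.split(',').map(&:strip) : [value]
    values.map { |v| [element, v] }
  end
end

package_metadata('doctitle' => 'Sample Book', 'producer' => 'Asciidoctor').each do |element, value|
  puts "<#{element}>#{value}</#{element}>"
end
```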

When using the EPUB3 converter, the ebook-format attribute resolves to the name of the e-book format being generated (epub3 or kf8) and the corresponding attribute ebook-format-<name> is defined, where <name> is epub3 or kf8. You can use these attributes in a preprocessor directive if you only want to show certain content to readers using a particular device. For instance, if you want to display a message to readers on Kindle, you can use:

ifdef::ebook-format-kf8[Hello Kindle reader!]
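The relationship between the two attributes can be sketched in Ruby as follows. The `ebook_format_attributes` helper is hypothetical, and the empty-string value for the `ebook-format-<name>` attribute is an assumption; the source only says that the attribute is defined.

```ruby
SUPPORTED_FORMATS = %w(epub3 kf8)

# Hypothetical helper: returns the attributes defined for a requested format.
def ebook_format_attributes(requested = nil)
  format = requested || 'epub3' # epub3 is the documented default
  raise ArgumentError, "unsupported format: #{format}" unless SUPPORTED_FORMATS.include?(format)
  # Assumption: the format-specific attribute is set to an empty string.
  { 'ebook-format' => format, "ebook-format-#{format}" => '' }
end

p ebook_format_attributes('kf8')
```

Because only one `ebook-format-<name>` attribute is defined per run, an `ifdef` on `ebook-format-kf8` is skipped when generating EPUB3.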

With that out of the way, it’s time to convert the AsciiDoc document directly to EPUB3.

Performing the Conversion

You can convert AsciiDoc documents to EPUB3 and KF8/MOBI from the command line using the asciidoctor-epub3 script provided with the Asciidoctor EPUB3 project.

Convert AsciiDoc to EPUB3

Converting an AsciiDoc document to EPUB3 is as simple as passing your document to the asciidoctor-epub3 command. This command should be available on your PATH if you installed the asciidoctor-epub3 gem. Otherwise, you can find the command in the bin folder of the project. We also recommend specifying an output directory using the -D option flag.

$ asciidoctor-epub3 -D output data/samples/sample-book.adoc

When the script completes, you’ll see the file sample-book.epub appear in the output directory. Open that file with an EPUB3 reader to view the result.

Below are several screenshots of this sample book as it appears on an Android phone.

[Figure: An example of a chapter title and abstract shown side-by-side in Day and Night mode]
[Figure: An example of a section title followed by paragraph text separated by a literal block]
[Figure: An example of a figure and an admonition]
[Figure: An example of a sidebar]
[Figure: An example of a table]
The asciidoctor-epub3 command is a temporary solution for invoking the Asciidoctor EPUB3 converter. We plan to remove this script once we have completed proper integration with the asciidoctor command.

Validate the EPUB3 Archive

Next, let’s validate the EPUB3 archive to ensure it built correctly.

EPUB3 with validation
$ asciidoctor-epub3 -D output -a ebook-validate data/samples/sample-book.adoc
Validation success
Epubcheck Version 3.0.1

Validating against EPUB version 3.0
No errors or warnings detected.

If the EPUB3 archive contains any errors, they will be output in your terminal.

EPUB Standard and Validator

The electronic publication (EPUB) standard is developed by the International Digital Publishing Forum (IDPF). EPUB 3.0, released in October 2011, is the latest version of this standard.

An EPUB3 archive contains:

  • a package document (metadata, file manifest, spine)

  • a navigation document (table of contents)

  • one or more content documents

  • assets (images, fonts, stylesheets, etc.)

The IDPF also supports EpubCheck. EpubCheck parses and validates the file against the EPUB schema.

If you want to browse the contents of the EPUB3 file that is generated, or preview the XHTML files in a regular web browser, add the -a ebook-extract flag to the asciidoctor-epub3 command. The EPUB3 file will be extracted to a directory adjacent to the generated file, but without the file extension.

$ asciidoctor-epub3 -D output -a ebook-extract data/samples/sample-book.adoc

In this example, the contents of the EPUB3 will be extracted to the output/sample-book directory.
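The naming rule for the extraction directory is simple enough to express in one line of Ruby. The `extraction_dir` helper below is a hypothetical name used only to illustrate the rule described above.

```ruby
# Hypothetical helper: same directory as the generated file, minus the extension.
def extraction_dir(epub_path)
  File.join(File.dirname(epub_path), File.basename(epub_path, '.*'))
end

puts extraction_dir('output/sample-book.epub') # prints output/sample-book
```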

Convert AsciiDoc to KF8/MOBI

Creating a KF8/MOBI archive directly from an AsciiDoc document is done with the same generation script (asciidoctor-epub3). You just need to specify the format (-a ebook-format) as kf8.

$ asciidoctor-epub3 -D output -a ebook-format=kf8 data/samples/sample-book.adoc

When the script completes, you’ll see the file sample-book.mobi appear in the output directory.

KindleGen does mandatory validation so you don’t need to run the validate command after converting to KF8/MOBI.

What is KF8?

Kindle Format 8 (KF8) is Amazon’s next generation file format offering a wide range of new features and enhancements—​including HTML5 and CSS3 support—​that publishers can use to create a broad range of books. The format is close enough to EPUB3 that it’s safe to think of it simply as an EPUB3 implementation under most circumstances. You can read more about the format on the Kindle Format 8 page.

Amazon continues to use the .mobi file extension for KF8 archives, despite the fact that they’ve switched from the Mobipocket format to the EPUB3-like KF8 format. That’s why we refer to the format in this project as KF8/MOBI.

Command Arguments

-h, --help

Show the usage message

-D, --destination-dir

Writes files to specified directory (defaults to the current directory)

-a ebook-extract

Extracts the EPUB3 to a folder in the destination directory after the file is generated

-a ebook-format=<format>

Specifies the e-book format to generate (epub3 or kf8, default: epub3)

-a ebook-validate

Runs Epubcheck 3.0.1 to validate output file against the EPUB3 specification

-v, --version

Display the program version
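As a rough model of how these arguments combine, the following Ruby sketch parses the same options with the standard OptionParser library. It is not the script's actual implementation: `parse_cli` is a hypothetical helper, only `-D` and `-a` are modeled, and storing a valueless attribute as an empty string is an assumption.

```ruby
require 'optparse'

# Hypothetical parser: models only the -D and -a options listed above.
def parse_cli(argv)
  options = { destination_dir: '.', attributes: {} }
  parser = OptionParser.new do |opts|
    opts.on('-D', '--destination-dir DIR', 'Write files to DIR') do |dir|
      options[:destination_dir] = dir
    end
    opts.on('-a NAME[=VALUE]', 'Set a document attribute') do |attr|
      name, value = attr.split('=', 2)
      # Assumption: an attribute given without a value is stored as ''.
      options[:attributes][name] = value || ''
    end
  end
  # Whatever is left after option parsing is treated as input files.
  options[:files] = parser.parse(argv)
  options
end

p parse_cli(%w(-D output -a ebook-format=kf8 data/samples/sample-book.adoc))
```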

EPUB3 Archive Structure

Here’s a sample manifest of files found in an EPUB3 document produced by Asciidoctor EPUB3.

META-INF/
  container.xml
OEBPS/
  fonts/
    font-awesome.ttf
    font-icons.ttf
    mplus-1mn-latin-bold.ttf
    mplus-1mn-latin-light.ttf
    mplus-1mn-latin-medium.ttf
    mplus-1mn-latin-regular.ttf
    mplus-1p-latin-bold.ttf
    mplus-1p-latin-light.ttf
    mplus-1p-latin-regular.ttf
    noto-serif-bold-italic.ttf
    noto-serif-bold.ttf
    noto-serif-italic.ttf
    noto-serif-regular.ttf
  images/
    avatars/
      default.png
    figure-01.png
    figure-02.png
  styles/
    epub3-css3-only.css
    epub3.css
  chapter-01.xhtml
  chapter-02.xhtml
  ...
  cover.xhtml
  nav.xhtml
  package.opf
  toc.ncx
mimetype

Working with Images

Images that your AsciiDoc document references should be saved in the directory defined in the imagesdir attribute, which defaults to the directory of the document. Asciidoctor EPUB3 will discover all local image references and insert the images into the EPUB3 archive at the same relative path.
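The discovery step might look roughly like the Ruby sketch below. The converter's real logic is not shown in this document, so treat `local_image_paths` and `IMAGE_RX` as hypothetical: the sketch scans for block and inline image macros, ignores targets that look like remote URLs, and prefixes the rest with imagesdir.

```ruby
# Matches block (image::) and inline (image:) macros; captures the target.
IMAGE_RX = /image::?([^\[\s]+)\[[^\]]*\]/

# Hypothetical helper: returns the relative paths of local images.
def local_image_paths(source, imagesdir = '')
  source.scan(IMAGE_RX).flatten
        .reject { |target| target =~ %r{\A[a-z][a-z0-9+.-]*://}i } # skip remote URLs
        .map { |target| imagesdir.empty? ? target : File.join(imagesdir, target) }
        .uniq
end

sample = "image::figure-01.png[]\n\nSee image:avatars/default.png[Avatar].\n"
puts local_image_paths(sample, 'images')
```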

The sample book contains placeholder images for an author avatar and a book cover.

Changing the Cover Image

E-book readers have different image resolution and file size limits regarding a book’s cover. Kindle covers tend to be 1050x1600 (roughly a 2:3 aspect ratio), which is the size of the sample cover provided with Asciidoctor EPUB3. To ensure your cover displays correctly, you’ll want to review the documentation or publisher guidelines for the application you’re targeting.

We’ve found that if the book cover is more than 1600px on any side, Aldiko will not render it and may even crash.

Feel free to use the SVG of the sample cover in the data/images folder as a template for creating your own cover. Once your image is ready, you can replace the placeholder cover image by defining the front-cover-image attribute in the header of the master document.

:front-cover-image: image:cover.png[width=1050,height=1600]

The image is resolved relative to the directory specified in the imagesdir attribute, which defaults to the document directory. The image can be in any format, though we recommend using PNG or JPG as they are the most portable formats.

You should always specify the dimensions of the cover image. This ensures the viewer will preserve the aspect ratio if it needs to be scaled to fit the screen. If you don’t specify a width and height, then the dimensions are assumed to be 1050x1600.
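The two accepted forms of the front-cover-image value, and the fallback dimensions, can be modeled with a short Ruby sketch. `parse_front_cover` is a hypothetical helper written for illustration; it is not how the converter necessarily parses the attribute.

```ruby
# Dimensions assumed when none are specified (from the text above).
DEFAULT_COVER_SIZE = { width: 1050, height: 1600 }

# Hypothetical helper: accepts a plain path or an inline image macro.
def parse_front_cover(value)
  if (m = value.match(/\Aimage:([^\[\s]+)\[(.*)\]\z/))
    # Keep only name=value pairs from the macro's attribute list.
    attrs = m[2].split(',').map { |pair| pair.strip.split('=', 2) }
                .select { |pair| pair.size == 2 }.to_h
    { path: m[1],
      width: (attrs['width'] || DEFAULT_COVER_SIZE[:width]).to_i,
      height: (attrs['height'] || DEFAULT_COVER_SIZE[:height]).to_i }
  else
    { path: value }.merge(DEFAULT_COVER_SIZE)
  end
end

p parse_front_cover('image:cover.png[width=1050,height=1600]')
p parse_front_cover('cover.png')
```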

About the Theme

EPUB3 and KF8/MOBI files are styled using CSS3. However, each e-book reader honors a reduced set of CSS3 styles, and the styles they allow and how they implement them are rarely documented. All we’ve got to say is thank goodness for CSS hacks, media queries and years of CSS experience!

The theme provided with Asciidoctor EPUB3 has been crafted to display EPUB3 and KF8/MOBI files as consistently as possible across the most common EPUB3 reader applications and to degrade gracefully in select EPUB2 readers. The theme maintains readability regardless of the e-book reader’s background mode (i.e., day, night or sepia) or the display device’s pixel density and screen resolution.

The theme’s CSS files are located in the data/styles directory.

Asciidoctor EPUB3 only provides one theme, and, at this time, you cannot replace it with a custom theme using the stylesheet attribute.

Fonts

Asciidoctor EPUB3 embeds a set of fonts and font icons. The theme’s fonts are located in the data/fonts directory.

The M+ Outline fonts are used for titles, headings, literal (monospace) text, and annotation numbers. The body text uses Noto Serif. Admonition icons and the end-of-chapter mark are from the Font Awesome icon font. Refer to the NOTICE for further information about the fonts.

The text justification hack

Many of the EPUB3 readers use the WebKit browser engine to render the content and apply the CSS formatting and styles. Generally speaking, WebKit is a great engine that brings a lot of consistency and power to the e-book reader landscape. It also brings along the same set of bugs.

One of the bugs in WebKit causes rich text to be justified incorrectly. In particular, when the value of the text-align property is justify, WebKit drops the space between formatted text (bold, italic, hyperlink, etc) and non-formatted text, causing the words to be unevenly spaced across the line. You can see an example of this problem in the screenshot below.

[Figure: WebKit justifying rich text incorrectly]

It’s not terrible, but just enough to disrupt a reader’s flow. Here’s how we expect the text to look:

[Figure: WebKit justifying rich text correctly after the “word joiner hack” is applied]

After some time in the tech lab and some dumb luck, we found a way to trick WebKit into justifying the text correctly! We call it the “word joiner hack”.

Here’s the HTML source of the first sentence from the screenshots.

<strong><a href="...">Fork</a>⁠ the repository</strong> <span>and clone it locally.</span>

WebKit treats the space following an inline element as insignificant and thus fails to account for it when justifying the text.

At first glance, you might think to add a normal space character before the closing tag of the inline element (e.g., <a href="…​">Fork </a>). However, that would cause any underline beneath links to extend past the end of the word.

At second glance, you might think to add a zero-width space character immediately following the element (e.g., <a href="…">Fork</a>&#x200b;). However, that’s problematic if the next character is a period or other punctuation because it introduces a wrap opportunity where there shouldn’t be one.

Reflecting on the problem of the zero-width space brings us to either the zero-width no-break space character (e.g., <a href="…">Fork</a>&#xfeff;) or the word joiner character (e.g., <a href="…">Fork</a>&#x2060;). Like the zero-width space, these characters occupy no space. However, instead of introducing a wrap opportunity, they prevent one.

But here’s the clincher. If the character following a zero-width no-break space or a word joiner is a normal space (e.g., <a href="…">Fork</a>&#xfeff; the), then it behaves just like a regular space. We’ve covered all the scenarios! Hey WebKit, you’ve been Unicode punked!

UPDATE: The zero-width no-break space was deprecated in favor of the word joiner. However, as we’ve discovered, font support for the word joiner is abysmal, whereas the zero-width no-break space is supported everywhere we’ve checked. Therefore, we’ve decided to go with the zero-width no-break space to avoid nasty rectangle outlines from font bombing your content.

By adding the zero-width no-break space character immediately after any inline element, we can trick WebKit into justifying the text properly, as shown in the second screenshot above.

You won’t see &#xfeff; anywhere in the HTML source. That’s because we use the actual Unicode character so that any regular expressions being applied to the text still work as expected.
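A minimal Ruby sketch of the idea is shown below. It is an illustration rather than the converter's implementation: `apply_justification_hack` is a hypothetical helper, and the list of inline elements it touches is an assumption.

```ruby
ZWNBSP = "\uFEFF" # zero-width no-break space, written as the actual character

# Hypothetical helper: appends ZWNBSP after the closing tag of some inline
# elements. The element list here is an assumption made for this sketch.
def apply_justification_hack(html)
  html.gsub(%r{</(?:a|strong|em|code|span)>}) { |close_tag| close_tag + ZWNBSP }
end

fixed = apply_justification_hack('<a href="#">Fork</a> the repository')
puts fixed.include?(ZWNBSP)
```

Since the character has no width, the output looks identical to the input when printed; checking for the code point is the easiest way to confirm it was inserted.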

Although the fix may seem like a minor enhancement, it plays an important role in reaching one of the core objectives of this converter: to make the text in the EPUB3 as readable as possible.

Device-specific Styles

For readers that support JavaScript, Asciidoctor EPUB3 adds a CSS class to the body element of each chapter that corresponds to the name of the reader as reported by the epubReadingSystem JavaScript object. This enhancement allows you to use styles targeted specifically at that reader.

Below you can find the readers that are known to support this feature and the CSS class name that gets added to the body element.

Reader body class name

Gitden

gitden-reader

Namo PubTreeViewer

namo-epub-library

Readium

epub-js-viewer

iBooks

ibooks

Google Books

gb-reader-container (div)

Pushing to Android

While it’s certainly possible to view the EPUB3 on your desktop/laptop, you’ll probably want to test it where it’s most likely going to be read—​on a reading device such as a smart phone or a tablet. Assuming you have an Android device available, transferring the EPUB3 to the device is easy once you get a bit of setup out of the way.

You transfer files from your computer to an Android phone over a USB connection using a command from the Android SDK Tools called adb. Follow these steps to get it set up:

  1. Download the Android SDK Tools zip from the table labeled SDK Tools Only on the Get the Android SDK page

  2. Extract the archive

  3. Locate the path to the adb command (Hint: Look in the platform-tools folder)

  4. Set the environment variable named ADB to the path of the adb command

    $ export ADB=~/apps/android-sdk/platform-tools/adb

Now you can use the adb-push-ebook script provided by Asciidoctor EPUB3 to push the EPUB3 and KF8/MOBI files to your Android device.

Publish both EPUB3 and KF8 files to Android device
$ adb-push-ebook output/sample-book
Don’t include the file extension since the script will check for both the .epub and .mobi files.

The adb-push-ebook script copies the files to the following locations on the device:

File type Destination on Android device

*.epub

/sdcard/

*.mobi

/sdcard/Android/data/com.amazon.kindle/files/
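The behavior described above can be approximated in a few lines of Ruby. This is a sketch of the idea, not the adb-push-ebook script itself: `adb_push_commands` is a hypothetical helper that only builds the `adb push` command lines for the files that exist, using the destinations from the table and the ADB environment variable set earlier.

```ruby
# Destinations documented in the table above.
DESTINATIONS = {
  '.epub' => '/sdcard/',
  '.mobi' => '/sdcard/Android/data/com.amazon.kindle/files/'
}

# Hypothetical helper: builds one `adb push` command per existing file.
# Falls back to plain `adb` when the ADB environment variable is not set.
def adb_push_commands(basename, adb = ENV.fetch('ADB', 'adb'))
  DESTINATIONS.map do |ext, dest|
    file = basename + ext
    [adb, 'push', file, dest] if File.exist?(file)
  end.compact
end

adb_push_commands('output/sample-book').each { |cmd| puts cmd.join(' ') }
```

Each returned array could be passed to `system(*cmd)` to run the transfer.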

Amazon Kindle should immediately detect the new file and display it in your “On Device” library. You’ll have to manually import the EPUB3 into your reader application of choice.

E-book Reader Recommendations and Quirks

EPUB3 readers will provide the best reading experience when viewing the book generated by Asciidoctor EPUB3. Here’s a list of some of the readers we know to have good EPUB3 support and the systems on which they run.

To get the full experience, ensure that the reader is configured to use the publisher’s styles. Different readers label this setting in different ways. Look for the option screen that allows you to set the fonts and font colors and disable it. With publisher’s styles active, you’ll still be able to adjust the relative size of the fonts and margins and toggle between day, night and sepia mode.

When the book is viewed in EPUB2 readers and Kindle apps/devices which have reached their end-of-life (EOL), the e-book relies on the strong semantics of the HTML and some fallback styles to render properly. EPUB2 readers, such as Aldiko, don’t understand CSS3 styles and therefore miss out on some of the subtleties in the formatting.

As mentioned in the theme section, the stylesheet attempts to provide as consistent a reading experience as possible in the common EPUB3 readers, despite the different CSS implementation rules and limitations unique to each e-book application. Most of these obstacles were addressed using media queries or explicit classes. Some we haven’t conquered. Yet.

The Kindle quirks list shows you just a few of the constraints we encountered. To see all of the workarounds and why we chose certain style options, check out the code and comments in the epub3.css and epub3-css3-only.css files.

Kindle Quirks
  • overrules margins and line heights like a medieval tyrant

  • font-family can’t be set on <body>

  • requires !important on text-decoration

  • position: relative isn’t permitted

  • strips (or unwraps) <header> tags

  • @page isn’t supported

  • page-break: avoid isn’t supported

  • max-width isn’t supported

  • widows are left in the cold

  • won’t style footers without an explicit class

Contributing

In the spirit of free software, everyone is encouraged to help improve this project.

To contribute code, simply fork the project on GitHub, hack away and send a pull request with your proposed changes.

Feel free to use the issue tracker or Asciidoctor mailing list to provide feedback or suggestions in other ways.

Authors

Asciidoctor EPUB3 was written by Dan Allen and Sarah White of OpenDevise on behalf of the Asciidoctor Project.

Copyright © 2014 OpenDevise Inc. and the Asciidoctor Project. Free use of this software is granted under the terms of the MIT License.

For the full text of the license, see the LICENSE file. Refer to the NOTICE for information about third-party open source software in use.