Automation and hot chocolate

Jens Tröger holds an MS and a PhD in computer science and has more than 20 years of experience in commercial and academic research and development. He’s also passionate about book design and typography, and is happy to wed the two into Bookalope. He’ll be at ebookcraft as a panelist talking about Cybernetic Ebooks: A Panel on Machine Learning and AI in Book Production.

I enjoy reading well-designed books, and I enjoy designing them. But when I receive a new manuscript, I want to be creative rather than clean it up and structure its content to build an ebook or a print-ready book from it. I want to have fun on the typographical playground rather than fix issues inherited from a history of editing.

Ideally, I want to take a “raw” manuscript, click a button, and enjoy my hot chocolate while I watch my computer analyze, clean up, and structure the new manuscript for me. Only when that’s done do I want to put down my cup and get creative knowing that I’m working with a sound foundation. Ideally…

Many years ago, I created my first ebook from scratch by hand. One by one, I arduously transcribed the paragraphs, images, tables, and footnotes from the original Word manuscript, and with tedious handiwork I cleaned up their formatting issues and typos. Sometimes the text styles had me guessing about the book’s structure, but leafing back and forth through the manuscript helped me understand what the author may have intended. With the structured content in place, it was then easy to build the EPUB scaffolding, and there was my first ebook! It was clean and simple, it rendered well on whatever device I tried, and, most importantly, it validated perfectly.

But things got boring quickly, and with the third ebook I started to think about offloading the tedious work onto my computer. These are the moments when being a seasoned software engineer comes in tremendously handy, and so I set out to build myself tools to do most or, ideally, all of the work that I didn’t want to do myself.

What started out as a pet project many years ago has evolved into a chain of complex tools which, together, orchestrate an almost fully automatic book design workflow. Here is how it works today:

  1. Read the raw manuscript and analyze it to “understand” its intended structure. The structure of a manuscript is expressed visually through formatted paragraphs and text portions. Using a custom-designed AI classifier, this first step labels text with semantic structure information, e.g. “Chapter title”, “Quotation paragraph”, “emphasized”, and so forth. I spend a little time reviewing and possibly adjusting the labels, though, because no classifier always gets everything right. (And let’s be honest: Given a messy enough manuscript, even a human reader can get confused quickly.)

  2. Next, review spelling, punctuation, and all sorts of typographical nits. The tools either fix them automatically or flag them for me to look at.

  3. Finally, convert everything into a format that suits my next steps. If I need an ebook, then I’m done: I apply one of my few stylesheets and package everything into an EPUB or MOBI container. (Note that ebooks inherit full accessibility support from the semantic structure that was extracted by the first step.) Generating a print-ready PDF with one of my simple styles works great for some books. If I want to get my hands dirty, I can generate a semantically structured ICML file for InDesign (HTML tags mapped!) and take my design from there. I’ve also played around with other export formats like DocBook, HTMLBook, and Word, but they’re seldom used.
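
To make the three steps above a little more concrete, here is a toy Python sketch of such a pipeline. It is not the actual tool chain: the rule-based `classify` function is only a stand-in for the AI classifier, and every name in it (`Paragraph`, `font_size`, `indented`, the label strings, the tag mapping) is an assumption invented for this illustration.

```python
import html
import re
from dataclasses import dataclass


@dataclass
class Paragraph:
    """One manuscript paragraph plus hypothetical formatting attributes."""
    text: str
    font_size: float = 12.0
    indented: bool = False
    label: str = "paragraph"


def classify(paragraphs):
    """Step 1: derive a semantic label from visual formatting.

    A few hand-written rules stand in for a trained classifier here.
    """
    for p in paragraphs:
        if p.font_size >= 18:
            p.label = "chapter-title"
        elif p.indented:
            p.label = "quotation"
        else:
            p.label = "paragraph"
    return paragraphs


def clean(text):
    """Step 2: fix safe nits automatically, flag the rest for review."""
    flags = []
    fixed = re.sub(r"[ \t]{2,}", " ", text.strip())  # collapse runs of spaces
    fixed = fixed.replace("...", "\u2026")           # three dots -> ellipsis
    # A doubled word may be intentional ("that that"), so flag, don't fix.
    if re.search(r"\b(\w+) \1\b", fixed, flags=re.IGNORECASE):
        flags.append("repeated word")
    return fixed, flags


# Assumed mapping from semantic labels to HTML elements.
TAGS = {"chapter-title": "h1", "quotation": "blockquote", "paragraph": "p"}


def to_html(paragraphs):
    """Step 3: convert the labeled, cleaned paragraphs into markup."""
    return "\n".join(
        f"<{TAGS[p.label]}>{html.escape(p.text, quote=False)}</{TAGS[p.label]}>"
        for p in paragraphs
    )


def run_pipeline(paragraphs):
    """Run all three steps; return the markup and a list of (index, flag)."""
    report = []
    for index, p in enumerate(classify(paragraphs)):
        p.text, flags = clean(p.text)
        report.extend((index, flag) for flag in flags)
    return to_html(paragraphs), report


if __name__ == "__main__":
    manuscript = [
        Paragraph("Chapter  One", font_size=24),
        Paragraph("It was the the best of times..."),
        Paragraph("A quoted passage.", indented=True),
    ]
    markup, report = run_pipeline(manuscript)
    print(markup)
    print(report)
```

The point of the sketch is the order of operations: once the first step has attached labels, the clean-up and conversion steps only ever look at those labels and never at the original formatting. The returned report plays the role of the flagged problems that still need a human look.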

It’s been fun building these tools, and my journey is far from over: Requirements change and new feature requests come in. We’re currently redesigning the AI classifier to improve the accuracy of the semantic structuring, and we continuously improve online access to the tools to make it easier for remote clients to incorporate them into their own workflows.

All that work is very much worth it. Today, if I receive a book manuscript for conversion, I click a button, slurp my hot chocolate, and watch my laptop do a lot of boring work for me. When it’s done and I have finished my cup, I happily dive into designing the book using a semantically structured, clean and fixed, sound foundation.

If you’d like to hear more from Jens Tröger and more about machine learning in digital publishing, register for ebookcraft on March 18 and 19, 2019 in Toronto. You can find more details about the conference here, or sign up for the mailing list to get all of the conference updates.