My workflow for this site

Goals

I found out about quarto after learning about nbdev, which seems like a great all-in way to use Jupyter notebooks as a REPL+literate programming environment for python.

On nbdev: There's a lot to like there. All too often I find myself prototyping something in a notebook, then using some combination of export as python script or copy and paste to take what I figured out in the notebook and turn it into a standalone script. That process is often error-prone and it always feels more time-consuming than it should be.

In addition, as I continue to learn more about deep learning, the notebook workflow makes even more sense. The iteration cycle in ML is even more REPL-y than in other types of programming, as you're likely to have some expensive data operations up front that you want to experiment with without having to pay the price of those data loads etc every time through.

On top of that, as a full-on hobbyist programmer, writing usually just for myself, or occasionally for my family (very occasionally…) the literate programming element scratches another important itch: I tend to start things, drop them for a long time, and then come back much later, and I don't want to continue to rely on memory of how something works, and notebooks are great for this; nbdev is even better, as it can produce what I think are very pretty sites from your notebooks.

This all pointed me to quarto, which is the publishing system used by nbdev. And it turned out that quarto had nearly everything I wanted in a publishing system: it generates fully static sites, its sites look good (to my eyes at least) out of the box, and it makes generating pages that include the output of cells, code, etc, very easy.

Authoring

I write in org, mostly. While there are times that an outline is not the right scaffolding, those times are rare, and I have come to accept, over the years, that if I can't lay out an outline of what I want to think, it may be because I haven't really thought things through.

Notebooks are great for literate programming, but I was not willing to do pure writing in Jupyter notebooks (if I had been, I could have done this all with only nbdev, which would have made a number of things easy). Literate programming is useful when you want your code to have well-written documentation, but if there's not going to be any code, the editing capabilities of Jupyter+chrome or Jupyter+vscode, using straight markdown, are, well, pretty weak in comparison to emacs.

I've been using GNU Emacs since 1984 or so; I know it really well, and for editing, it feels like it's built into my fingers. Especially with org mode, which makes navigation, manipulating structure, and lightweight formatting joyfully easy.

It helps that I'm comfortable mucking around in emacs lisp as needed to make it do what I want. Honestly, sometimes the mucking around in emacs lisp is the most fun.

Emacs hacks

I ended up doing some emacs hacking to make good-looking markdown files out of org files.

My first observation is that Org's default markdown exporter is, well, crap. You can find a number of alternatives, but I quickly discovered that pandoc does a really outstanding job, at least for the org files I've been writing. Note that I haven't yet used org tables, or the org-babel literate programming features, or dozens of other amazing features in org mode, which might be harder for pandoc to deal with.

However, it turns out that by default, pandoc does not pull the #+title info from the org file to populate the markdown frontmatter, which is supposed to look like:

---
title: a document
---

But, because, of course it does, pandoc has a templating system, and fixing markdown output to create properly formatted frontmatter was a matter of making a custom markdown template with a small change from default. Honestly, I'm not sure why the default behavior is not a bug, but it's easy to specify a local template.

Quarto can run scripts before or after your site is rendered, and I'd initially thought that would run whenever an org file was updated in the content tree. But it turns out that the quarto change watcher only looks for changes to or additions of its native file types (.md, .qmd, etc). (I did find another use for pre-render scripts, which I'll share in a minute.)

So instead, I wrote a function that gets called whenever an org file is saved. It figures out if the org file is part of a quarto site, and if it is, it runs pandoc, using the appropriate template, to generate a markdown file. I've included it below. I'll put it up in github, perhaps in a separate repo, because others might find it useful. Some might not like that the .org "source" files are in the same directory as the output markdown files, but that doesn't bother me.

This is nice because when quarto is run locally in preview mode, it watches the site directory, and re-renders the site whenever a new markdown file is added, or an existing one is changed. Your browser just reloads with the new contant, like magic.

Note to self: should I be worried about the public image that comes from publishing emacs lisp code? This is not a new hotness, like Rust or Typescript.

Here's the emacs lisp code

;;; below should perhaps go in a separate libary, but whatever.
;;; an after-save-hook that uses pandoc to make a markdown version of org files after save.
;;; It uses a template I created that preserves the title in a format that quarto likes
(defcustom pandoc-binary "pandoc" "Location of pandoc binary"
  :type 'string)

(defcustom markdown-template (expand-file-name "~/lib/pandoc-org-template")
  "Location of custom pandoc markdown template"
  :type 'string)

(defcustom always-convert-org-to-md nil
  "If non-nil, auto-markdown-after-save will convert the org file to md regardless of
whether or not there is a _quarto.yml file in the current directory"
  :type 'boolean)

(defun auto-markdown-after-save ()
  "Use Pandoc to auto-convert an org file to markdown every time it's saved; 
Set `after-save-hook` in org mode to this value if you use quarto with org"
  (interactive)
  (when (and (eq major-mode 'org-mode)
         (or always-convert-org-to-md (file-exists-p "_quarto.yml")))
    (let* ((errbuf (get-buffer-create "*Pandoc Errors*"))
       (ofile (concat (file-name-sans-extension (buffer-file-name)) ".md")))
      (message "converting org file to markdown...")
      (call-process pandoc-binary nil errbuf nil
            "-s"
            "-o"
            ofile
            (buffer-file-name))
      (message "converting org file to markdown...done"))))

Other bits of the authoring workflow

I really like the left-hand-side site navigation tree that quarto displays, but it would appear that it's not generated automatically, it has to be manually created. There's a yml format for this and other site metadata.

I wrote a prerender script (remmeber those, from a few paragraphs ago?).

Prerender script

Don't judge. This does not have robust error handling, and has a bunch of other sharp edges. That's OK. The author is the only customer, and he forgives himself in advance for any shortcomings.

#!/usr/bin/env python

import os, sys
from pathlib import Path

tocfile = 'sidebar.yml'
newtocfile = tocfile + '.new'
backuptocfile = tocfile + '.orig'

def createtocfromdir(dir, indentlevel):
    retval = ''
    path = Path(dir)
    for p in sorted(path.iterdir()):
        if p.name[0] == '.' or p.name[0] == '_':
            pass
        elif p.suffix == ".md":
            retval += ' ' * indentlevel + '- ' + (str(p)) + '\n'
        elif p.is_dir():
            retval += ' ' * indentlevel + '- section: ' + (str(p)) + '\n'
            retval += ' ' * (indentlevel+2) + 'contents:' + '\n'
            retval += createtocfromdir(p, indentlevel+2)
        else:
            pass
    return retval

tocbody = createtocfromdir(".", 6)

if len(tocbody) < 4:
    print("exiting")
    sys.exit()

with open(newtocfile, "w") as f:
    print("website:",file=f)
    print("  sidebar:", file=f)
    print("    contents:", file=f)
    print(createtocfromdir(".", 6), file=f)

os.rename(tocfile, backuptocfile)
os.rename(newtocfile, tocfile)

File order

File names appear in lexicographic order in the quarto table of contents, so I prefix file names with a number as a way of manually setting the ordering; right now I'm using YYYYMMDDfile.org, based on the thought that I'm trying to do something a bit bloggish, or at least chronologically sequential.

nbdev recommends its users just use two-digit numerics as prefixes for setting file order. If I end up writing things that are more like articles than journal entries, I will try that.