Exporting emacs org-mode html, ain’t sed grand.

I’ve recently begun using emacs org-mode, and I quite like it. I do a lot of my writing, some of which might finally end up here on my blog, in org-mode these days. For a lot of my applications it’s a perfect overlay on plain text documents. I can export what I’ve written as html or latex, which is fantastic.

What’s less fantastic is when I want to cut and paste the html output here to my WordPress blog. Unfortunately org-mode puts linefeeds inside paragraph elements, and for some reason WordPress maintains these, resulting in incorrect word wrapping. So I want a way to remove the linefeeds inside paragraph elements.

This was just a little bit beyond my capabilities with sed, and I’m often telling myself, “Self, you should really learn how to use sed and regular expressions better.” So I thought bugger it, let’s figure out how to do this. So I whipped out the excellent book “sed & awk” from O’Reilly.

As someone who has only used sed for banal substitutions, I had to learn the following:

“:whatever” creates a label. Two commands make use of these labels: “b” branches to a label unconditionally, while “t” jumps to a label only if a successful substitution has been made on the currently addressed line.
“N” is needed to join two lines, since sed normally works in a one-line-at-a-time fashion.
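
Here’s a quick sketch of both tidbits in isolation. The sample input and the label name “again” are made up for illustration, and I’m only assuming a reasonably standard sed.

```shell
# N appends the next input line to the pattern space, separated by a
# newline, so a substitution can see (and replace) that newline.
pairs=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')
printf '%s\n' "$pairs"      # prints "a b" and then "c d"

# ":again" defines a label; "t again" jumps back to it only when the
# preceding s command changed something, so runs of spaces shrink
# one step at a time until no double space is left.
collapsed=$(printf 'a    b  c\n' | sed -e ':again' -e 's/  / /' -e 't again')
printf '%s\n' "$collapsed"  # prints "a b c"
```

The second command gives each piece of the script its own “-e”, which sidesteps any question of where a label name ends.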

With these two tidbits, and a basic understanding of how sed operates, we can construct the desired script.

:top
/<p>/
{:loop
  N
  s/\n/ /
  /<\/p>/{P;D;btop}
  bloop}

In one line the command looks like this:

 sed ':top;/<p>/{:loop;N;s/\n/ /;/<\/p>/{P;D;btop};bloop}'

If you’re like me, it’s not immediately clear what’s going on here, so let’s break it down:

First we create a label with “:top”.
“/<p>/” tells sed to look for the opening paragraph tag. The commands that follow are run only on lines where this tag is found.
The curly braces “{}” group a set of commands, so upon encountering the paragraph tag, sed executes the contents of these braces. In the braces:
- Create a new label “:loop”.
- “N” creates a multiline pattern space by reading the next line of input, and appending it to the contents of the pattern space.
- “s/\n/ /”: substitute a space for the line feed.
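- “/<\/p>/{P;D;btop}”: if the pattern space now contains a closing paragraph tag, sed executes “P;D;btop”: (P) prints the pattern space up to the first newline (here that is all of it, since we just replaced the newline), (D) deletes what was printed, and (btop) branches (b) to the label (top). It’s a little like “if (</p>) goto top”. Strictly speaking, once the pattern space is empty D already starts a fresh cycle on the next input line, so the “btop” is more of a safety net.
- “bloop”: (b) branch to the label (loop).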

So as long as no closing tag (</p>) is found, we have a loop that keeps appending new lines to the pattern space and substituting spaces for linefeeds. When the closing tag (</p>) is found, the joined paragraph is printed and processing starts again from the “top” label. That makes sure all of the paragraph sections get handled.
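
To see the whole thing in action, here’s a small test run. The sample HTML is invented (it only imitates the way org-mode wraps paragraphs), and I’m assuming GNU sed, since other seds can be pickier about labels and braces crammed onto one line.

```shell
# Hypothetical sample: a heading plus two paragraphs wrapped over several lines.
sample='<h2>Title</h2>
<p>
First line of text,
second line of text.
</p>
<p>
Another paragraph
on two lines.</p>'

# Feed the sample through the one-liner; lines outside <p>...</p> pass through untouched.
joined=$(printf '%s\n' "$sample" |
  sed ':top;/<p>/{:loop;N;s/\n/ /;/<\/p>/{P;D;btop};bloop}')

printf '%s\n' "$joined"
# <h2>Title</h2>
# <p> First line of text, second line of text. </p>
# <p> Another paragraph on two lines.</p>
```

Each paragraph comes out on a single line, which is what WordPress wants.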

So that’s it. If anyone knows a more elegant solution to this, I’d be glad to hear about it.
