(x)html, css, latex

Finally a better LaTeX to html converter

About a year ago I wrote a post about my frustration with the lack of a good LaTeX to html converter. Recently I found one. It is called plasTeX, so named because it is written in python. Finally a converter which works well with most any LaTeX package or macro you write, and produces sane, relatively standards-compliant html. Best of all, it comes with a built-in if \ifplastex, so you can specify that some things should be in the pdf version, while others should be in the html version. It also support some other formats like docbook, but I am not too interested in that, so I haven’t tested it all.

Though I discovered plasTeX a few months ago, I finally decided to play around with it some more this week, when I wasn’t feeling like working too much. So I decided to convert my CV from html to latex, so I can have both pretty pdf and html versions. So now I will update my CV only in latex, which is much nicer to write than html anyways.

To get both a pdf and an html version, this is what I do:

pdflatex cv
plastex -c cv.cfg cv
tidy --show-body-only true -asxhtml -utf8 -wrap 78 -indent cv/index.html > cv/index2.html

It took me awhile to figure out the syntax of the plasTeX config file. It is possible to specify these things on the command line, but I got tired of having too many things on the command line. The trick is that the plasTeX options are divided up into different sections. The sec-num-depth option, which tells plasTeX how many section levels deep to number, is in the document option section, while the split-level option is in the files section. The split-level option tells plasTeX how many different html pages to make. For my CV, I only wanted one. My plastex config file looks like:

sec-num-depth = 0
split-level = 0

Though plasTeX is pretty nice, there are 3 complaints I still have. One is that it uses <h1> tags for \section, which I can understand, but there is a long tradition in the web of having only one <h1> tag per page, which usually holds the title of the page. PlasTeX does have the ability to use different themes, so there might be a way to change this with a theme, but I haven’t figured that out yet. For the time being, I simply used \subsection tags instead.

My second complaint about plasTeX is that it doesn’t do any nice formatting of the html like indenting or line-wrapping. That is only a minor complaint though, since I can simply run it through htmltidy (as in the code above).

My third complaint is that plasTeX puts everything inside a <td> or <li> inside a <p>, which seems very strange to me, and creates some ugly formatting, so I used some CSS tricks to basically take away the standard effects of <p> tags inside these:

td > p {margin:0;padding:0;}
li > p {margin:0;padding:0;}

I am not completely finished yet, but you can check out the html and pdf results in: my new CV.