Finally a better LaTeX to html converter

About a year ago I wrote a post about my frustration with the lack of a good LaTeX to html converter. Recently I found one. It is called plasTeX, so named because it is written in python. Finally a converter which works well with most any LaTeX package or macro you write, and produces sane, relatively standards-compliant html. Best of all, it comes with a built-in if \ifplastex, so you can specify that some things should be in the pdf version, while others should be in the html version. It also support some other formats like docbook, but I am not too interested in that, so I haven’t tested it all.

Though I discovered plasTeX a few months ago, I finally decided to play around with it some more this week, when I wasn’t feeling like working too much. So I decided to convert my CV from html to latex, so I can have both pretty pdf and html versions. So now I will update my CV only in latex, which is much nicer to write than html anyways.

To get both a pdf and an html version, this is what I do:

pdflatex cv
plastex -c cv.cfg cv
tidy --show-body-only true -asxhtml -utf8 -wrap 78 -indent cv/index.html > cv/index2.html

It took me awhile to figure out the syntax of the plasTeX config file. It is possible to specify these things on the command line, but I got tired of having too many things on the command line. The trick is that the plasTeX options are divided up into different sections. The sec-num-depth option, which tells plasTeX how many section levels deep to number, is in the document option section, while the split-level option is in the files section. The split-level option tells plasTeX how many different html pages to make. For my CV, I only wanted one. My plastex config file looks like:

sec-num-depth = 0
split-level = 0

Though plasTeX is pretty nice, there are 3 complaints I still have. One is that it uses <h1> tags for \section, which I can understand, but there is a long tradition in the web of having only one <h1> tag per page, which usually holds the title of the page. PlasTeX does have the ability to use different themes, so there might be a way to change this with a theme, but I haven’t figured that out yet. For the time being, I simply used \subsection tags instead.

My second complaint about plasTeX is that it doesn’t do any nice formatting of the html like indenting or line-wrapping. That is only a minor complaint though, since I can simply run it through htmltidy (as in the code above).

My third complaint is that plasTeX puts everything inside a <td> or <li> inside a <p>, which seems very strange to me, and creates some ugly formatting, so I used some CSS tricks to basically take away the standard effects of <p> tags inside these:

td > p {margin:0;padding:0;}
li > p {margin:0;padding:0;}

I am not completely finished yet, but you can check out the html and pdf results in: my new CV.

This entry was posted in (x)html, css, latex and tagged , , , . Bookmark the permalink.

One Response to Finally a better LaTeX to html converter

  1. Blogging is much more than some kind of diary, they are fully functional websites that many would never know are are actually blogs as compared to standard HTML websites.