Today I was working on our family's 2021 annual report. Usually I write this in December, but I am a bit behind this year. We have been writing one of these since 2006, and we generally have a lot of fun doing it. I write them in LaTeX, because I like LaTeX and it produces really high-quality print output. Some years we actually mail them out; other years we send them digitally. I also publish an HTML version on our family website. For this I use plasTeX, a Python-based LaTeX-to-HTML converter (I think it can convert to other formats as well, but I am only interested in HTML right now). It seems like every year I spend some time fiddling with plasTeX to get it working for me, but that is fine: I enjoy it.

This year, I decided to include a table in the report. For this I like to use the tabularx package, because it lets you specify column widths very precisely. I got an error trying to compile the HTML, and it seems that plasTeX doesn't support tabularx. Maybe I could write my own renderer, but for now I decided to just do a hack. Actually, I have several hacks.

I created the table first as a tab-separated values (TSV) file. This is the same as a CSV, except with tabs instead of commas; this way I don't have to parse the content, since the data is very unlikely to contain tabs but fairly likely to contain commas. To convert the TSV file into LaTeX tabular format, I use this simple two-liner. The first command skips the header row with tail, then uses sed to replace every tab character with &, the column delimiter in LaTeX, and to end each row with two backslashes. I discovered that in order to put a line at the end of the table, the \hline command needs to be on the same line as the last row. My second command therefore modifies only the last line of the file (the $ address matches the last line).
# Convert tsv file to tex table
tail -n +2 old-stuff-we-visited.tsv | sed -e 's/\t/ \& /g;s/$/ \\\\/' > old-stuff-we-visited.tex
# put \hline on the last line of the file, right after its closing \\
# (in the replacement, \\hline produces a literal \hline)
sed -i '$s/\(\\\\\)/\1 \\hline/' old-stuff-we-visited.tex
My input file looks like this:
date_visited	monument	location	date_erected
2017-05-28	Nördlingen	Germany	1300
2017-06-01	Nuremberg castle	Germany	1000
2017-06-10	Aachen cathedral	Germany	795
2017-06-11	Cologne Cathedral	Germany	1248
2017-06-15	Notre Dame	Paris, France	1163
2018-07-15	Cochem castle	Cochem, Germany	1100
2018-10-20	Segovia Aqueduct	Spain	98
2018-10-20	Segovia Aqueduct	Spain	98
2019-04-20	Porta Nigra	Trier, Germany	170
2019-10-20	Pompei	Italy	-500
2019-12-21	Stone Henge	England	-2900
2020-10-25	Akropolis	Athens, Greece	-700
2020-10-20	Akrotiri	Greece	-2500
2021-10-20	Tarxien Temples	Malta	-3200
My output file looks like this:
2017-05-28 & Nördlingen & Germany & 1300 \\
2017-06-01 & Nuremberg castle & Germany & 1000 \\
2017-06-10 & Aachen cathedral & Germany & 795 \\
2017-06-11 & Cologne Cathedral & Germany & 1248 \\
2017-06-15 & Notre Dame & Paris, France & 1163 \\
2018-07-15 & Cochem castle & Cochem, Germany & 1100 \\
2018-10-20 & Segovia Aqueduct & Spain & 98 \\
2018-10-20 & Segovia Aqueduct & Spain & 98 \\
2019-04-20 & Porta Nigra & Trier, Germany & 170 \\
2019-10-20 & Pompei & Italy & -500 \\
2019-12-21 & Stone Henge & England & -2900 \\
2020-10-25 & Akropolis & Athens, Greece & -700 \\
2020-10-20 & Akrotiri & Greece & -2500 \\
2021-10-20 & Tarxien Temples & Malta & -3200 \\ \hline
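As a minimal sketch, assuming GNU sed (which understands \t in a regular expression), the two commands can also be folded into a single pass: the $ address restricts the last substitution to the final line, so no second in-place edit is needed. The file names visits.tsv and visits.tex and the two sample rows are made up for the example.

```shell
#!/bin/sh
# Work in a scratch directory so the example doesn't touch real files.
cd "$(mktemp -d)" || exit 1

# A tiny hypothetical TSV: one header row plus two data rows.
printf 'date_visited\tmonument\tlocation\tdate_erected\n' > visits.tsv
printf '2017-06-15\tNotre Dame\tParis, France\t1163\n' >> visits.tsv
printf '2021-10-20\tTarxien Temples\tMalta\t-3200\n' >> visits.tsv

# Skip the header row, then in one sed pass:
#   1. replace every tab with " & " (\& is a literal ampersand),
#   2. end every row with " \\",
#   3. on the last line only ($), also append " \hline".
tail -n +2 visits.tsv |
  sed -e 's/\t/ \& /g' -e 's/$/ \\\\/' -e '$s/$/ \\hline/' > visits.tex

cat visits.tex
# Expected:
# 2017-06-15 & Notre Dame & Paris, France & 1163 \\
# 2021-10-20 & Tarxien Temples & Malta & -3200 \\ \hline
```

Keeping the two steps separate, as above, is easier to debug; the single pass just avoids rewriting the file twice.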
Using the \ifplastex conditional, I was able to include this .tex file in the PDF version of my document but leave it out of the HTML version:
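\begin{table}[htb]
\begin{center}
\ifplastex\else%
% A, B, C and D are custom column types, assumed to be defined
% in the preamble with \newcolumntype
\begin{tabularx}{\columnwidth}{ABCD}
\hline\hline
Date Visited & Monument & Location & Date erected \\
\hline
\input{old-stuff-we-visited.tex}
\end{tabularx}
\fi
\end{center}
\caption{When and where the Fedibbletys have seen old human-made structures}
% \label goes after \caption so that it picks up the table number
\label{T:old-stuff}
\end{table}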
Okay, so back to plasTeX. I decided that the easiest way to get this table into the HTML version was to modify the HTML after running plasTeX. Again, I took my TSV file as the source of truth and converted it with a few lines of sed.
# convert the tsv file to html format
# open the table and its header section
echo "<table><thead>" > old-stuff-we-visited.html
# the first tsv row becomes the header row, made of <th> cells
head -n 1 old-stuff-we-visited.tsv | sed -e 's|\t|</th><th>|g;s|$|</th></tr>|;s|^|<tr><th>|' >> old-stuff-we-visited.html
# close the header and open the body
echo "</thead><tbody>" >> old-stuff-we-visited.html
# every remaining row becomes a <tr> made of <td> cells
tail -n +2 old-stuff-we-visited.tsv | sed -e 's|\t|</td><td>|g;s|$|</td></tr>|;s|^|<tr><td>|' >> old-stuff-we-visited.html
echo "</tbody></table>" >> old-stuff-we-visited.html
My HTML version of the table now looks like this (I'm not going to worry about indentation):
<table><thead>
<tr><th>date_visited</th><th>monument</th><th>location</th><th>date_erected</th></tr>
</thead><tbody>
<tr><td>2017-05-28</td><td>Nördlingen</td><td> Germany</td><td>1300</td></tr>
<tr><td>2017-06-01</td><td>Nuremberg castle</td><td> Germany</td><td>1000</td></tr>
<tr><td>2017-06-10</td><td>Aachen cathedral</td><td> Germany</td><td>795</td></tr>
<tr><td>2017-06-11</td><td>Cologne Cathedral</td><td>Germany</td><td>1248</td></tr>
<tr><td>2017-06-15</td><td>Notre Dame</td><td> Paris, France</td><td>1163</td></tr>
<tr><td>2018-07-15</td><td>Cochem castle</td><td>Cochem, Germany</td><td>1100</td></tr>
<tr><td>2018-10-20</td><td>Segovia Aqueduct</td><td> Spain</td><td>98</td></tr>
<tr><td>2018-10-20</td><td>Segovia Aqueduct</td><td> Spain</td><td>98</td></tr>
<tr><td>2019-04-20</td><td>Porta Nigra</td><td> Trier, Germany</td><td>170</td></tr>
<tr><td>2019-10-20</td><td>Pompei</td><td> Italy</td><td>-500</td></tr>
<tr><td>2019-12-21</td><td>Stone Henge</td><td> England</td><td>-2900</td></tr>
<tr><td>2020-10-25</td><td>Akropolis</td><td> Athens, Greece</td><td>-700</td></tr>
<tr><td>2020-10-20</td><td>Akrotiri</td><td> Greece</td><td>-2500</td></tr>
<tr><td>2021-10-20</td><td>Tarxien Temples</td><td> Malta</td><td>-3200</td></tr>
</tbody></table>
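One caveat with this conversion: cell contents are copied into the HTML verbatim, so a name containing & or < would produce invalid markup. Here is a minimal sketch of the same pipeline wrapped in a shell function that escapes those characters first. The function name tsv_to_html and the demo data are hypothetical, and GNU sed is assumed for \t.

```shell
#!/bin/sh
# tsv_to_html FILE: print FILE (tab-separated, first row = header) as an HTML table.
tsv_to_html() {
  # Escape &, < and > before adding any tags. & is handled first so the
  # entities created by the next two substitutions are not escaped again.
  escape='s/&/\&amp;/g;s/</\&lt;/g;s/>/\&gt;/g'
  echo "<table><thead>"
  head -n 1 "$1" | sed -e "$escape" -e 's|\t|</th><th>|g;s|^|<tr><th>|;s|$|</th></tr>|'
  echo "</thead><tbody>"
  tail -n +2 "$1" | sed -e "$escape" -e 's|\t|</td><td>|g;s|^|<tr><td>|;s|$|</td></tr>|'
  echo "</tbody></table>"
}

# Demo on a made-up file whose data contains characters special in HTML.
cd "$(mktemp -d)" || exit 1
printf 'monument\tlocation\n' > demo.tsv
printf 'Fort <A> & B\tNowhere\n' >> demo.tsv
tsv_to_html demo.tsv > demo.html
cat demo.html
# Expected:
# <table><thead>
# <tr><th>monument</th><th>location</th></tr>
# </thead><tbody>
# <tr><td>Fort &lt;A&gt; &amp; B</td><td>Nowhere</td></tr>
# </tbody></table>
```

The same concern applies on the LaTeX side, where & in the data would be read as a column separator.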
Now all I needed to do was insert this in the right place in the HTML. I immediately thought of sed, because I know that it is easy to append text after a line matching a pattern using the a (for append) command. I did not know whether I could read from a file, though. It turns out that GNU sed can do this! Stack Overflow came to the rescue. The top answer seemed very promising; it suggested
sed -e '/StandardMessageTrailer/r file2' -e 'x;$G' file1
Unfortunately, this didn't do exactly what I wanted, so I had to read some of the sed manual to actually understand what this command does. And thus I am writing this post, so I can remember what these commands do. The r command reads a file and queues its contents to be written to the output stream at the end of the current cycle, that is, right after the line that matched. The problem with the above code is that it inserts the contents before the matching line, and I want them afterwards. After some trial and error and reading the manual, I discovered that the reason is the second -e expression, which uses the x command, which "switches the pattern and the hold space". Because each line is parked in the hold space and printed one cycle later, the queued file contents come out after the previous line, which puts them before the matching line. If I get rid of this, I get what I want. Also, if I ever want to insert the contents of a file into a stream before a line, I now know how to.

According to the comments on Stack Overflow, the $G is there to make this work on the last line of the file. So what does that do exactly? At first, I thought that $G was some sort of predefined variable like the ones awk has, but after more reading, I finally understood what it means. $ means match the last line of the file (as I should have already known), and according to the manual the function of G is to "Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space." On the last line, x has just moved that line into the hold space, so without G it would never be printed.
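A small experiment makes the difference visible. This sketch runs both variants on a made-up three-line file (main.txt, snippet.txt and the MARKER line are invented for the demo). Note that the x variant delays every line by one cycle, so its output starts with an empty line: the hold space is empty when the first line is swapped in.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'before\nMARKER\nafter\n' > main.txt   # the file being edited
printf 'INSERTED\n' > snippet.txt              # the file being read in

# Plain r: snippet.txt is queued when MARKER matches and written at the
# end of that cycle, i.e. right after the MARKER line.
sed '/MARKER/r snippet.txt' main.txt > after.txt

# With x;$G: each cycle swaps the current line into the hold space and
# prints the previous one, so the queued snippet lands before MARKER.
# On the last line, G appends the held line so it is not lost.
sed -e '/MARKER/r snippet.txt' -e 'x;$G' main.txt > before.txt

cat after.txt    # lines: before, MARKER, INSERTED, after
cat before.txt   # lines: (empty), before, INSERTED, MARKER, after
```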
My final command was thus:
sed -i '/class="table"/r old-stuff-we-visited.html' annual-report-2021/index.html
Okay, here's a question left to the reader: how would I accomplish this if I had multiple tables? I have decided not to worry about that myself until it actually comes up. At that point, I might resort to something more complicated (I suppose I could output a unique placeholder per table from the LaTeX source, and then match on that). That's for another day.
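For the record, the placeholder idea could look roughly like this. It is a minimal sketch that assumes each table's \ifplastex branch emits a hypothetical marker such as TABLE-PLACEHOLDER:castles, that the marker ends up alone on its own line in the generated HTML, and that GNU sed is used (for -i). For each fragment, r queues the file and d drops the marker line; the queued text is still written out when sed moves on to the next line.

```shell
#!/bin/sh
# Hypothetical setup: two table fragments and a page with one marker per table.
cd "$(mktemp -d)" || exit 1
printf '<table>first</table>\n' > castles.html
printf '<table>second</table>\n' > temples.html
printf '<h1>Report</h1>\nTABLE-PLACEHOLDER:castles\n<p>text</p>\nTABLE-PLACEHOLDER:temples\n<p>end</p>\n' > index.html

# Replace each "TABLE-PLACEHOLDER:NAME" line with the contents of NAME.html.
# The r filename runs to the end of its -e expression, so d and } get their own -e.
for name in castles temples; do
  sed -i -e "/TABLE-PLACEHOLDER:$name/{r $name.html" -e 'd' -e '}' index.html
done

cat index.html
# Expected:
# <h1>Report</h1>
# <table>first</table>
# <p>text</p>
# <table>second</table>
# <p>end</p>
```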