UNIX tip of the day: Inserting files with sed


Today I was working on my family’s 2021 annual report. Usually I write this in December, but I am a bit behind this year. We have been writing one of these since 2006, and we generally have a lot of fun doing it. I write them in LaTeX, because I like LaTeX and it produces really high-quality print output. Some years we actually mail them out; other years we send them digitally. I also publish an html version on our family website. For this I use PlasTeX, which is a Python-based LaTeX to html converter (actually I think it can convert to other formats as well, but I am only interested in html right now). It seems like every year I spend some of my time fiddling with PlasTeX to get it working for me, but that is fine, I enjoy it.

This year, I decided to include a table in the report. For this I like to use the tabularx package, because it lets you specify column widths very exactly. I got an error trying to compile the html, and it seems that PlasTeX doesn’t support tabularx. Maybe I could write my own renderer, but for now I decided to just do a hack. Actually, I have several hacks.

I created the table first as a tab-separated value (tsv) file (same as a csv, except with tabs instead of commas; this way I don’t have to parse the content, since the data is very unlikely to contain tabs, but fairly likely to contain commas). In order to convert the tsv file into LaTeX tabular format, I use this simple two-liner. The first command skips the header row with tail, and then sed replaces every tab character with &, which is the column delimiter used in LaTeX, and appends the two backslashes that must end each row. I discovered that in order to put a line at the end of the table, the \hline command needs to be on the same line as the last row. My second command therefore only modifies the last line of the file (the $ address matches the last line).

# Convert tsv file to tex table
tail -n +2 old-stuff-we-visited.tsv | sed -e 's/\t/ \& /g;s/$/ \\\\/' > old-stuff-we-visited.tex
# put hline on last line of file
sed -i '$s/\(\\\\\)/\1 \\hline/' old-stuff-we-visited.tex

My input file looks like this:

date_visited	monument	location	date_erected
2017-05-28	Nördlingen	 Germany	1300
2017-06-01	Nuremberg castle	 Germany	1000
2017-06-10	Aachen cathedral	 Germany	795
2017-06-11	Cologne Cathedral	Germany	1248
2017-06-15	Notre Dame	 Paris, France	1163
2018-07-15	Cochem castle	Cochem, Germany	1100
2018-10-20	Segovia Aqueduct	 Spain	98
2018-10-20	Segovia Aqueduct	 Spain	98
2019-04-20	Porta Nigra	 Trier, Germany	170
2019-10-20	Pompei	 Italy	-500
2019-12-21	Stone Henge	 England	-2900
2020-10-25	Akropolis	 Athens, Greece	-700
2020-10-20	Akrotiri	 Greece	-2500
2021-10-20	Tarxien Temples	 Malta	-3200

My output file looks like this:

2017-05-28 & Nördlingen &  Germany & 1300 \\
2017-06-01 & Nuremberg castle &  Germany & 1000 \\
2017-06-10 & Aachen cathedral &  Germany & 795 \\
2017-06-11 & Cologne Cathedral & Germany & 1248 \\
2017-06-15 & Notre Dame &  Paris, France & 1163 \\
2018-07-15 & Cochem castle & Cochem, Germany & 1100 \\
2018-10-20 & Segovia Aqueduct &  Spain & 98 \\
2018-10-20 & Segovia Aqueduct &  Spain & 98 \\
2019-04-20 & Porta Nigra &  Trier, Germany & 170 \\
2019-10-20 & Pompei &  Italy & -500 \\
2019-12-21 & Stone Henge &  England & -2900 \\
2020-10-25 & Akropolis &  Athens, Greece & -700 \\
2020-10-20 & Akrotiri &  Greece & -2500 \\
2021-10-20 & Tarxien Temples &  Malta & -3200 \\ \hline
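The two steps can probably also be folded into a single sed pass. The following is only a sketch under a few assumptions: the function name tsv_to_tex and the sample data are made up for illustration, and the tab character is passed in through a shell variable so the script does not rely on sed understanding \t.

```shell
#!/bin/sh
# Sketch: tsv -> LaTeX table rows in one sed pass.
# tsv_to_tex and the sample file are hypothetical, for illustration only.
TAB=$(printf '\t')

tsv_to_tex() {
  # 1d          drop the header row
  # s/TAB/ & /  turn every tab into the LaTeX column separator
  # s/$/ \\/    end every row with two backslashes
  # $s/$/ ...   append \hline to the last row only
  sed -e '1d' \
      -e "s/${TAB}/ \\& /g" \
      -e 's/$/ \\\\/' \
      -e '$s/$/ \\hline/' "$1"
}

# Small made-up sample to try it on
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'date_visited\tmonument\n2019-04-20\tPorta Nigra\n2019-10-20\tPompei\n' > "$tmp/sample.tsv"
tsv_to_tex "$tmp/sample.tsv" > "$tmp/sample.tex"
cat "$tmp/sample.tex"
```

Because everything happens in one stream, there is no need for sed -i, and the result can be redirected straight into the .tex file.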

Using the \ifplastex conditional, I was able to easily include this tex file in the pdf version of my document while skipping it for the html version:

\begin{table}[htb]
  \begin{center}
    \ifplastex\else%
      \begin{tabularx}{\columnwidth}{ABCD}
        \hline\hline
        Date Visited   & Monument    & Location & Date Erected \\
        \hline
        \input{old-stuff-we-visited.tex}
      \end{tabularx}
    \fi
  \end{center}
  \caption{When and where the Fedibbletys have seen old human-made structures}
  \label{T:old-stuff}
\end{table}

Okay, so back to PlasTeX. I decided that the easiest way to include this table in the html version was to modify the html after running PlasTeX. Again, I took my tsv file as the source of truth, and converted it with a couple of lines of sed.

#convert the tsv file to html format
echo "<table><thead>" > old-stuff-we-visited.html
head -n 1 old-stuff-we-visited.tsv | sed -e 's|\t|</th><th>|g;s|$|</th></tr>|;s|^|<tr><th>|' >> old-stuff-we-visited.html
echo "</thead><tbody>" >> old-stuff-we-visited.html
tail -n +2 old-stuff-we-visited.tsv | sed -e 's|\t|</td><td>|g;s|$|</td></tr>|;s|^|<tr><td>|' >> old-stuff-we-visited.html
echo "</tbody></table>" >> old-stuff-we-visited.html

My html version of the table now looks like this (I’m not going to worry about indenting):

<table><thead>
<tr><th>date_visited</th><th>monument</th><th>location</th><th>date_erected</th></tr>
</thead><tbody>
<tr><td>2017-05-28</td><td>Nördlingen</td><td> Germany</td><td>1300</td></tr>
<tr><td>2017-06-01</td><td>Nuremberg castle</td><td> Germany</td><td>1000</td></tr>
<tr><td>2017-06-10</td><td>Aachen cathedral</td><td> Germany</td><td>795</td></tr>
<tr><td>2017-06-11</td><td>Cologne Cathedral</td><td>Germany</td><td>1248</td></tr>
<tr><td>2017-06-15</td><td>Notre Dame</td><td> Paris, France</td><td>1163</td></tr>
<tr><td>2018-07-15</td><td>Cochem castle</td><td>Cochem, Germany</td><td>1100</td></tr>
<tr><td>2018-10-20</td><td>Segovia Aqueduct</td><td> Spain</td><td>98</td></tr>
<tr><td>2018-10-20</td><td>Segovia Aqueduct</td><td> Spain</td><td>98</td></tr>
<tr><td>2019-04-20</td><td>Porta Nigra</td><td> Trier, Germany</td><td>170</td></tr>
<tr><td>2019-10-20</td><td>Pompei</td><td> Italy</td><td>-500</td></tr>
<tr><td>2019-12-21</td><td>Stone Henge</td><td> England</td><td>-2900</td></tr>
<tr><td>2020-10-25</td><td>Akropolis</td><td> Athens, Greece</td><td>-700</td></tr>
<tr><td>2020-10-20</td><td>Akrotiri</td><td> Greece</td><td>-2500</td></tr>
<tr><td>2021-10-20</td><td>Tarxien Temples</td><td> Malta</td><td>-3200</td></tr>
</tbody></table>
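One thing the conversion above does not handle is data containing &, < or >, which would need escaping to be valid html. A minimal sketch of how that could be added is shown here; the helper names rows and tsv_to_html and the sample data are my own inventions for illustration.

```shell
#!/bin/sh
# Sketch: tsv -> html table, escaping &, < and > before adding the tags.
# rows, tsv_to_html and the sample file are hypothetical names.
TAB=$(printf '\t')

# Wrap each tab-separated line on stdin in <tr>, using the given cell tag (th or td)
rows() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
      -e "s|${TAB}|</$1><$1>|g" \
      -e "s|^|<tr><$1>|" \
      -e "s|\$|</$1></tr>|"
}

tsv_to_html() {
  echo "<table><thead>"
  head -n 1 "$1" | rows th
  echo "</thead><tbody>"
  tail -n +2 "$1" | rows td
  echo "</tbody></table>"
}

# Small made-up sample to try it on
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'monument\tlocation\nCastle & Gardens\tGermany\n' > "$tmp/sample.tsv"
tsv_to_html "$tmp/sample.tsv" > "$tmp/sample.html"
cat "$tmp/sample.html"
```

The escaping has to run before the tags are added, otherwise the angle brackets of the tags themselves would be escaped too.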

Now all I needed to do was insert this in the right place in the html. I immediately thought of sed, because I know that it is easy to append content after a line matching a pattern using the a (for append) command. I did not know if I could read from a file though. It turns out that GNU sed can do this! Stack Overflow came to the rescue. The top answer, which seemed very promising, suggested:

sed  -e '/StandardMessageTrailer/r file2' -e 'x;$G' file1

Unfortunately, this didn’t do exactly what I wanted, so I had to read some of the sed manual to actually understand what this command does. And thus I am writing this post so I can remember what these commands do.

The r command reads a file and queues its contents to be written to the output at the end of the current cycle, that is, after the current line has been printed. The problem with the above code is that it inserts the contents before the matching line, and I want them after it. After some trial and error and reading the manual, I discovered that the reason for this is the second -e expression, which uses the x command, which “switches the pattern and the hold space”. Because of the swap, each line is printed one cycle late, so the file contents end up in front of the matching line. If I get rid of this, then I get what I want. Also, if I ever want to insert the contents of a file into a stream before a line, I now know how to.

According to the comments on Stack Overflow, the $G ensures that the command still works even if the pattern matches on the last line of the file. So what does that do exactly? At first, I thought that $G was some sort of global predefined variable like awk has, but after more reading, I finally understood what it means. $ means match the last line of the file (as I should have already known), and according to the manual the function of G is to “Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space.” Since x has just moved the real last line into the hold space, $G is what gets that line printed at all.
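To convince myself of the difference, a tiny experiment helps. This is just a sketch with made-up file names and contents (file1, file2, MARK, INSERTED), run in a temporary directory.

```shell
#!/bin/sh
# Sketch: where sed's r command puts the file, with and without 'x;$G'.
# All file names and contents here are made up for the experiment.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'one\nMARK\nthree\n' > "$tmp/file1"
printf 'INSERTED\n' > "$tmp/file2"

# Plain r: file2 is queued and written at the end of the cycle,
# so it shows up after the matching line.
sed -e "/MARK/r $tmp/file2" "$tmp/file1" > "$tmp/after.txt"

# With x;$G: x swaps pattern and hold space, so every line is printed one
# cycle late and file2 lands before the matching line. On the last line,
# G appends the held (real) last line so that it is not lost.
sed -e "/MARK/r $tmp/file2" -e 'x;$G' "$tmp/file1" > "$tmp/before.txt"

echo "--- r only:";      cat "$tmp/after.txt"
echo "--- r with x;\$G:"; cat "$tmp/before.txt"
```

One side effect worth knowing: the hold space starts out empty, so the x;$G variant prints an empty line at the very top of the output.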

My final command was thus:

sed  -i '/class="table"/r old-stuff-we-visited.html'  annual-report-2021/index.html

Okay, here’s a question left to the reader: how would I accomplish this if I had multiple tables? I have decided not to worry about that myself until it actually comes up. At that point, I might resort to something more complicated (I suppose I could maybe output a unique placeholder per table from the LaTeX source, and then match on that). For another day.
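For what it’s worth, the placeholder idea might look roughly like the sketch below. Everything in it is an assumption: I have not checked how to get such a placeholder through PlasTeX, and the placeholder format, the function name insert_tables and the file names are all hypothetical. It assumes each placeholder ends up on its own line in the html and that a file NAME.html exists for each table.

```shell
#!/bin/sh
# Sketch: replace one placeholder line per table with that table's html.
# The placeholder format and all names below are hypothetical.
insert_tables() {
  html=$1; shift
  for name in "$@"; do
    # r queues NAME.html for output after the placeholder line,
    # d then deletes the placeholder line itself
    sed -e "/<!-- TABLE:${name} -->/{" -e "r ${name}.html" -e 'd' -e '}' \
        "$html" > "$html.tmp" && mv "$html.tmp" "$html"
  done
}

# Made-up files to try it on
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '<p>intro</p>\n<!-- TABLE:castles -->\n<p>middle</p>\n<!-- TABLE:temples -->\n<p>end</p>\n' > "$tmp/index.html"
printf '<table>castles</table>\n' > "$tmp/castles.html"
printf '<table>temples</table>\n' > "$tmp/temples.html"

( cd "$tmp" && insert_tables index.html castles temples )
cat "$tmp/index.html"
```

Writing to a temporary file and moving it into place does the same job as sed -i, but does not depend on that option being available.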
