sort using TAB as field separator in bash

I have run into this problem several times recently, and decided to finally write down the solution for myself rather than keep searching the internet for it.

This is the problem: if you want to sort a tab-delimited file (and some of the fields contain spaces), you must explicitly tell sort to use TAB as the field separator; otherwise it treats any run of blanks (spaces as well as tabs) as a separator. For commands such as cut, the delimiter is set with -d, and TAB is already the default:

cut -f 1 file

where -f specifies the field number; -d would specify a different field separator. (paste accepts -d '\t' as written, because it interprets the backslash escape itself.)
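
As a quick check of how cut behaves on tab-delimited input, here is a minimal sketch. The rows are made-up sample data, not from any real file; the point is only that cut splits on TAB when no -d is given, so a space inside a field is left alone.

```bash
#!/usr/bin/env bash
# Hypothetical sample rows: "city<TAB>state", where the city contains a space.
rows=$'New York\tNY\nLos Angeles\tCA'

# No -d given: cut splits on TAB by default, so "New York" stays in one piece.
first_field=$(printf '%s\n' "$rows" | cut -f 1)

printf '%s\n' "$first_field"
```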
The sort command uses the -t flag instead. So one would think that this would work:


#INCORRECT
sort -k 2 -t '\t' file

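where -k specifies the sort key (-k 2 starts the key at field 2 and runs to the end of the line; use -k 2,2 to sort on field 2 alone) and -t specifies the field separator.
Unfortunately this does not work: inside ordinary single quotes, '\t' is the two literal characters backslash and t, and sort requires a single-character separator (GNU sort complains about a "multi-character tab"). The solution is to place a $ before it, like so: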


#CORRECT
sort -k 2 -t $'\t' file
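
To see the difference for yourself, here is a small sketch that sorts the same two lines both ways. The names and cities are made-up sample data, and LC_ALL=C is set only so that the ordering is reproducible; neither comes from the commands above.

```bash
#!/usr/bin/env bash
export LC_ALL=C   # assumption: fix the collation order so results are reproducible

# Hypothetical sample data: "name<TAB>city", where the name contains a space.
data=$'John Smith\tBoston\nAnn Zed\tAustin'

# Default separator: field 2 starts at " Smith" / " Zed", so the surname is compared.
by_blanks=$(printf '%s\n' "$data" | sort -k 2)

# TAB separator via ANSI-C quoting: field 2 is the city, so the city is compared.
by_tab=$(printf '%s\n' "$data" | sort -k 2 -t $'\t')

printf 'default separator:\n%s\n\n' "$by_blanks"
printf 'TAB separator:\n%s\n' "$by_tab"
```

With the default separator the John Smith line comes first (Smith sorts before Zed); with the TAB separator the Ann Zed line comes first (Austin sorts before Boston).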

The dollar sign tells bash to use ANSI-C quoting, so \t is turned into a single TAB character before sort ever sees it.
From: http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_03.html

3.3.5. ANSI-C quoting

Words in the form "$'STRING'" are treated in a special way. The word expands to a string, with backslash-escaped characters replaced as specified by the ANSI-C standard. Backslash escape sequences can be found in the Bash documentation.
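
A short sketch of what that means in practice, comparing the length of the string that each quoting style produces. The variable names are just for illustration.

```bash
#!/usr/bin/env bash
plain='\t'     # ordinary single quotes: a backslash followed by the letter t
ansi=$'\t'     # ANSI-C quoting: one TAB character

plain_len=${#plain}
ansi_len=${#ansi}

printf 'plain quotes:  %d character(s)\n' "$plain_len"
printf 'ANSI-C quotes: %d character(s)\n' "$ansi_len"

# An alternative way to get a real TAB: let printf interpret the escape.
tab_from_printf=$(printf '\t')
```

The plain-quoted string is two characters long and the ANSI-C one is a single character, which is why only the latter is accepted as a separator by sort.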

So now I have the answer for myself the next time the problem arises. I hope someone else benefits as well.
