{"id":773,"date":"2016-09-05T15:19:01","date_gmt":"2016-09-05T19:19:01","guid":{"rendered":"https:\/\/www.devolve.net\/blog\/?p=773"},"modified":"2018-07-13T10:17:08","modified_gmt":"2018-07-13T14:17:08","slug":"speed-sort-command","status":"publish","type":"post","link":"https:\/\/www.devolve.local\/speed-sort-command\/","title":{"rendered":"Speed of the sort command"},"content":{"rendered":"

GNU sort is normally crazy fast at what it does. Recently, though, I was trying to sort and deduplicate several huge files and it was taking way too long. A little googling turned up the reason: in a UTF-8 locale, sort has to decode multi-byte characters and apply the locale’s collation rules on every comparison, which is much slower than comparing raw bytes. There’s an easy way to speed up the sort command, given a few caveats.<\/p>\n

I’m not sure how I hadn’t come across this before, but I love stumbling on little gems like this. The solution is pretty simple:<\/p>\n

LC_ALL=C sort -uo uniqueoutput biginput1 biginput2<\/pre>\n
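
To see what that does to the ordering, here is a small sketch on a made-up file. The names sample.txt and sorted_c.txt are hypothetical, and the data is only there to show the effect, not taken from my real inputs.

```shell
# Hypothetical sample file: mixed case, one non-ASCII word, one duplicate.
{ echo zebra; echo Apple; echo éclair; echo apple; echo apple; } > sample.txt

# Same flags as the command above: -u drops duplicates, -o names the output file.
LC_ALL=C sort -uo sorted_c.txt sample.txt

# Byte order: uppercase before lowercase, and the UTF-8 bytes of é after every ASCII letter.
cat sorted_c.txt
# Apple
# apple
# zebra
# éclair
```

Running the same sort without LC_ALL=C in a UTF-8 locale such as en_US.UTF-8 would typically put éclair before zebra, though the exact order depends on the locales installed on your system.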

The C locale simply compares raw byte values, so uppercase ASCII letters sort before lowercase ones and non-ASCII characters may not land where your language’s alphabet would put them. If you don’t need locale-aware ordering, just a consistent one, this seems to be the way to go.<\/p>\n","protected":false},"excerpt":{"rendered":"

GNU sort is normally crazy fast at what it does. However, recently I was trying to sort & unique several huge files and it seemed to be taking way too long. I did a little googling, and realized that it takes a lot longer to sort the full range of Unicode characters because it has […]<\/p>\n","protected":false},"author":3,"featured_media":775,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[34,41,25],"_links":{"self":[{"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/posts\/773"}],"collection":[{"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/comments?post=773"}],"version-history":[{"count":3,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/posts\/773\/revisions"}],"predecessor-version":[{"id":777,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/posts\/773\/revisions\/777"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/media\/775"}],"wp:attachment":[{"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/media?parent=773"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/categories?post=773"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/tags?post=773"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}