{"id":519,"date":"2014-07-18T11:04:48","date_gmt":"2014-07-18T15:04:48","guid":{"rendered":"http:\/\/devolve.net\/blog\/?p=519"},"modified":"2018-07-13T10:17:08","modified_gmt":"2018-07-13T14:17:08","slug":"sort-u-versus-sort-uniq","status":"publish","type":"post","link":"https:\/\/www.devolve.local\/sort-u-versus-sort-uniq\/","title":{"rendered":"sort -u versus sort | uniq"},"content":{"rendered":"<p>I just ran into an interesting situation with <code>sort -u<\/code>. I had generated a couple files with <code>md5sum<\/code> and they had a lot of equal lines in them. So I thought I would create a merged version.<\/p>\n<p>“OK,” I thought, “I’ll just cat the two files into <code>sort -u<\/code>.” But the md5sum is in the first column and the file path in the second. So they were sorted by md5sum and the file paths were all out of order afterwards. “No problem, I’ll just tell it which column to sort on with <code>sort -k 2 -u<\/code>”. This seemed perfectly natural to me, but it didn’t produce the expected results. There weren’t duplicate paths with different md5sums, at least not that I could see.<\/p>\n<p>I think what happened is that it sorted on the second column, but also checked uniqueness on that column alone rather than on the whole line. By splitting the procedure across a pipe (<code>sort -k 2 | uniq<\/code>), you ensure that the whole data set is sorted first, and <code>uniq<\/code> then compares entire lines when stripping out the duplicates.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I just ran into an interesting situation with sort -u. I had generated a couple files with md5sum and they had a lot of equal lines in them. So I thought I would create a merged version.<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[34,41,16],"_links":{"self":[{"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/posts\/519"}],"collection":[{"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/comments?post=519"}],"version-history":[{"count":1,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/posts\/519\/revisions"}],"predecessor-version":[{"id":520,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/posts\/519\/revisions\/520"}],"wp:attachment":[{"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/media?parent=519"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/categories?post=519"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devolve.local\/wp-json\/wp\/v2\/tags?post=519"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}