Cool Shell One-Liner of the Day
awk -F, '{print $1}' CSV | sort | uniq -c | grep -vw 1 | tee /dev/tty | wc -l
UPDATE: I came back to this post and thought to myself, “Self, why didn’t you annotate this garbage, you cheeky bastard?” OK, so the first part is pretty clear: pull the first (or whichever) column you want out of a simple (unquoted) CSV file, then count the dupes with sort | uniq -c. The grep is where we drop the non-dupes, and it should really be grep -Ev '^ *1 ' so it anchors on uniq -c’s leading count instead of accidentally matching a 1 somewhere in the CSV data. Now here’s the magic: the pipe to tee /dev/tty echoes everything straight to the terminal, while a second copy of the output keeps flowing through the rest of the pipeline. So the wc -l is actually counting the number of entries which have duplicates (not the total number of duplicate rows!), and that count shows up at the bottom.
Here’s the tail end of what I get from this on a sample csv:
6 WOOD
4 WRIGHT
3 YOUNG
2 ZIMMERMAN
360