Moving Evernote notes into WordPress

proprietary insecurity

I’ve accumulated many notes (2000+) in Evernote over the years, and love that it can store binary attachments such as images or other media files. My favorite feature is the Evernote Web Clipper browser extension; it does a fantastic job at saving the parts of an article I want to save while keeping the styling intact.

Evernote has a free plan which I’ve enjoyed for a long time, but recently the financial status of the company has come into question, and they restricted syncing to only two devices. Also, the last thing I want to happen is another Google Reader-style shutdown fiasco. I doubt that a shutdown would make my existing notes disappear, but it’s better to be prepared ahead of time. To that end, I’ve been looking for a viable way to migrate my notes to another platform.

distribution: histograms in the terminal

My new favorite tool is a Python program called distribution that can easily show histograms in your terminal.

I used Homebrew to install it, but you can see some usage examples and a few other tools on this Stack Overflow page. I eagerly anticipate showing off some histograms to people.

Discard first column without AWK

UPDATE: Major derp moment on my part, thinking that you needed a loop in AWK to print all but one field. Commandlinefu just caused a forehead-slapping moment when I saw this in my feed:

So, it seems AWK wins again. Carry on.

If you’re trying to print one or more particular columns from some input, it is quite straightforward with AWK. You’d simply specify the field variable(s) you know exist in the input. However, it’s pretty AWKward (sorry) to omit one column of data and to print the rest, particularly if you don’t know exactly how many columns of input are expected on each line. Then you’d need to actually program a loop in AWK. Ugh.
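A minimal sketch of both cases, assuming space-separated input with a single-character delimiter. The sample data and variable names are made up for illustration, and cut is just one common way to drop a leading column, not necessarily the approach the full post settles on.

```shell
# Hypothetical sample input: space-separated, varying number of columns per line.
sample='a b c d
1 2 3'

# Pick specific columns with AWK's positional field variables.
picked=$(printf '%s\n' "$sample" | awk '{ print $1, $3 }')
printf '%s\n' "$picked"

# Drop the first column with cut: -f2- means "field 2 through the end of the line".
rest=$(printf '%s\n' "$sample" | cut -d' ' -f2-)
printf '%s\n' "$rest"
```

Because cut treats every single space as a delimiter, this only behaves well when columns are separated by exactly one character.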

Get items unique only to list1

…from two lists with some overlap. Spent some time working in Python on this problem. Afterwards, I realized it’s a shell one-liner.

comm -23 <(sort f_most) <(sort f_some) | sort -n > f_uniq_to_1

I re-sort the output numerically since comm assumes its input is sorted lexicographically, and I happen to be comparing lists of numbers.
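To see the pieces at work, here is a small sketch with made-up list contents. It writes the sorted lists to temporary files instead of using <(sort ...), which should be equivalent but also runs in shells without process substitution; the temp directory and the numbers are assumptions for illustration.

```shell
tmp=$(mktemp -d)

# Hypothetical sample lists of numbers, one per line.
printf '%s\n' 10 2 33 4 > "$tmp/f_most"
printf '%s\n' 4 10 > "$tmp/f_some"

# comm expects lexicographically sorted input, so sort both lists first.
sort "$tmp/f_most" > "$tmp/f_most.sorted"
sort "$tmp/f_some" > "$tmp/f_some.sorted"

# -2 hides lines found only in the second file, -3 hides lines common to both,
# leaving lines unique to the first file. Re-sort numerically at the end.
comm -23 "$tmp/f_most.sorted" "$tmp/f_some.sorted" | sort -n > "$tmp/f_uniq_to_1"
cat "$tmp/f_uniq_to_1"
```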

Machine-readable Dates

I had some directories named in the format of “Jul 18, 2012”. Thanks, iPhoto export, but no thanks.

Note: gdate is GNU date, available after running brew install coreutils.
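A minimal sketch of the conversion, assuming GNU date is available either as gdate or as plain date. The variable names are hypothetical, and the rename is left as a comment rather than run.

```shell
# Use gdate where Homebrew's coreutils provides it; otherwise assume date is GNU date.
if command -v gdate >/dev/null 2>&1; then DATE=gdate; else DATE=date; fi

old_name='Jul 18, 2012'

# -d parses a free-form date string; +%F prints it as YYYY-MM-DD.
new_name=$("$DATE" -d "$old_name" +%F)
echo "$new_name"

# To rename a directory for real (not run here):
#   mv -- "$old_name" "$new_name"
```

Names in YYYY-MM-DD form also sort chronologically in a plain directory listing.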

join: the command

From the manual:

I had two CSVs, baz01.csv and baz02.csv. They shared the same first column, which was a list of database table names. The second column contained the number of rows in each table. The row counts differed between the two files, and I wanted to compare them. The join command to the rescue!

The command gave me exactly what I wanted: the output contains the shared first column, followed by column 2 of baz01.csv, followed by column 2 of baz02.csv, followed by the third column minus the second.

Of course, this will only work on the simplest CSV files, meaning no escaped or quoted commas allowed.
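As an illustration of that kind of pipeline, here is a sketch using made-up table names and row counts. It is not necessarily the exact command from the post: the awk step for the difference is an assumption about how the last column could be produced.

```shell
tmp=$(mktemp -d)

# Hypothetical contents; join requires both files to be sorted on the join field.
printf '%s\n' 'orders,100' 'users,40' > "$tmp/baz01.csv"
printf '%s\n' 'orders,120' 'users,35' > "$tmp/baz02.csv"

# -t, makes the comma the field delimiter; join matches on the first field by default.
# awk then appends column 3 minus column 2 to each joined line.
result=$(join -t, "$tmp/baz01.csv" "$tmp/baz02.csv" |
  awk -F, '{ print $0 "," ($3 - $2) }')
printf '%s\n' "$result"
```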

AWK blows me away

How did I not know this about awk!? Don’t get me wrong, I’m no awk expert; I’m always using some of the most simple and obvious features it has. But I almost always use the -F option to specify the field separator. Until today, I thought you could only give it either a single character or a string literal. Go ahead, laugh, I’ll wait.

From Stack Overflow:

I just used this to help parse the output of drush pml. Since it uses more than one space to separate fields, I used

(that’s two spaces and a plus sign for the field separator). I initially tried using

but apparently that’s some kind of super advanced regex that awk doesn’t understand.
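A small sketch of the regex separator described above: two spaces and a plus sign, meaning "two or more spaces". The sample line is made up to resemble column-aligned output and is not real drush pml output.

```shell
# Hypothetical column-aligned line: fields are separated by two or more spaces,
# and a field may itself contain single spaces.
line='Core    Block (block)    Module  Enabled'

# A multi-character -F value is treated as a regular expression.
second=$(printf '%s\n' "$line" | awk -F'  +' '{ print $2 }')
echo "$second"
```

Splitting on runs of two or more spaces keeps "Block (block)" together as one field, which a plain single-space separator would break apart.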

Another cool thing about the Stack Overflow page is that one person shows how you can parse a CSV with embedded / escaped commas if you have gawk 4 installed:
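One way to do this in gawk 4 is the FPAT variable, which describes what a field looks like instead of what separates fields. The sketch below assumes that is the technique meant here; the pattern follows the gawk manual's CSV example, and the record is sample data.

```shell
# Sample record with a quoted field that contains a comma.
record='Robbins,Arnold,"1234 A Pretty Street, NE",MyTown'

if command -v gawk >/dev/null 2>&1; then
  # A field is either a run of non-commas or a double-quoted string.
  third=$(printf '%s\n' "$record" |
    gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $3 }')
  printf '%s\n' "$third"
else
  echo "gawk not installed; skipping the FPAT demo" >&2
fi
```

This simple pattern keeps the surrounding quotes on the field and does not handle empty fields.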

MySQL engines, constraints & keys

I wanted to see how I could improve the performance of a MySQL database with mixed table engines by converting all the MyISAM tables to InnoDB, as well as make the huge DB responsive while backing up by using mysqldump with the --single-transaction option. I used the following PHP script (I know, spare me):

After looking at the table status following the script run, there was one table which was still set to use MyISAM. In the mysql shell, I tried manually altering the one table to use InnoDB, and then needed a bunch of additional commands to smooth out the DDL problems before MySQL was happy.

Of course, I edited out a whole mess of trial and error here. The issue was that MySQL wants any column you mark as auto_increment to have its own key. It doesn’t need to be primary, but it can’t rely solely on the composite (multi-column) key it had originally. My solution was to simply add a non-primary key to that column, while keeping the composite key. Using a primary key was out of the question since there are duplicate values in the aid column.

Turns out, all I really needed to do was add the individual index, then alter the table engine. Oh well, it was educational. :-)

climagic is magic

If you’re not following @climagic, you should be forced to listen to this for hours on end:

That’s just one of the many glorious bits from this timeline.

sed is great, but not that great


It turns out, sed has no concept of a non-greedy match. You have to use Perl or some other advanced tool to get that regex feature. The workaround given at Stack Overflow only works if the match ends at a single-character delimiter (in this case, it was [^/]+ to match until the next forward slash).
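A minimal sketch of that workaround, using a made-up URL: the negated character class [^/]+ can never cross a slash, so it stops where a non-greedy match would.

```shell
# Hypothetical input for illustration.
url='http://example.com/path/to/page'

# Keep everything up to (but not including) the first slash after the host.
host=$(printf '%s\n' "$url" | sed -E 's|^(http://[^/]+)/.*|\1|')
echo "$host"

# The Perl equivalent with a true non-greedy match (not run here):
#   perl -pe 's|^(http://.+?)/.*|$1|'
```

If the ending delimiter were a multi-character string, a negated character class could not express "anything up to it", which is why the trick does not generalize.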

Cool Shell One-Liner of the Day

awk -F, '{print $1}' CSV | sort | uniq -c | grep -vw 1 | tee /dev/tty | wc -l

UPDATE: I went back and saw this post and thought to myself, “Self, why didn’t you annotate this garbage, you cheeky bastard?” OK, so the first part is pretty clear: get the first (or whichever) column you want from a simple (unquoted) CSV file, and then count dupes. The grep is where we remove non-dupes; it should probably be grep -Ev '^ *1 ' to avoid matching any of the CSV data. Now here’s the magic. The pipe to tee /dev/tty writes one copy of the output straight to the terminal, while the other copy carries on down the pipeline. So the wc -l is actually counting the number of entries which have duplicates (not the total number of all duplicates!), and displays that number at the bottom.
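A small sketch of the annotated pipeline on a made-up CSV, using the stricter grep suggested above. It substitutes /dev/stderr for /dev/tty so that it also runs without a controlling terminal; that swap and the sample data are assumptions for illustration.

```shell
tmp=$(mktemp -d)

# Hypothetical CSV: alice appears three times, bob twice, carol once.
printf '%s\n' 'alice,1' 'bob,2' 'alice,3' 'carol,4' 'bob,5' 'alice,6' > "$tmp/sample.csv"

# uniq -c prefixes each value with its count; the grep drops lines whose count is 1.
# tee shows the surviving lines on stderr while wc -l counts them.
count=$(awk -F, '{ print $1 }' "$tmp/sample.csv" | sort | uniq -c |
  grep -Ev '^ *1 ' | tee /dev/stderr | wc -l | tr -d ' ')
echo "$count"
```

Here the count is the number of distinct values that have duplicates (alice and bob), not the number of duplicate rows.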

Here’s the tail end of what I get from this on a sample csv: