Speed of the sort command

GNU sort is normally crazy fast at what it does. However, recently I was trying to sort & unique several huge files and it seemed to be taking way too long. I did a little googling, and realized that it takes a lot longer to sort the full range of Unicode characters because it has to decode one or more bytes (UTF-8) before deciding where a character should be placed. There’s an easy way to increase the speed of the sort command, given a few caveats.

I’m not sure how I haven’t run into this already, but I love whenever I run into one of these little gems. The solution is pretty simple:

The C locale simply uses byte-ordering, so non-ASCII characters may end up in the wrong place. If you don’t need strict lexicographical sort, just a consistent sort, this seems to be the way to go.

WordPress performance problem with many posts

If you have a ton of posts in your WordPress blog (we have over 35K in one site at work), it turns out that the Previous and Next links on each post may be running a tough query on your database.

I wanted to know why MySQL was using so much CPU and wrongly assumed it was due to a bad tuning effort (it usually is). I googled “SELECT p.ID FROM wp_posts AS p INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id” which was in my output of MySQL’s show full processlist command. It led me to this StackExchange page which showed an alternative, more efficient version of the WP function calls that produce those previous and next links.

In our case, we just didn’t need those links and our theme let us turn them off from the admin. An instant and dramatic drop in CPU by MySQL ensued.

Raspberry Pi SSH cipher speed

I was curious to see how quickly I could transfer files to my Pi using SSH rather than FTP. Obviously using FTP is way faster than almost any other method, but still I wanted to see how fast I could transfer data over SSH.

Here’s the time it took to transfer a 50 MB file to my Pi using different SSH ciphers.

I later re-tested the aes128-ctr cipher and it took about a second less than what I’d recorded initially. This boils down to:

  • Don’t use triple-DES ever, for both performance and security reasons
  • Most other ciphers give about the same performance, and are generally considered secure
  • arcfour is the fastest class of ciphers, but there is less trust in it from the crypto community. If you’re going to use it, try to avoid the base arcfour cipher and instead use the 128 or 256 version, which tosses out some of the initial bits as a precaution