WordPress performance problem with many posts

If you have a ton of posts in your WordPress blog (we have over 35K on one site at work), the Previous and Next links on each post may be running an expensive query against your database.

I wanted to know why MySQL was using so much CPU and wrongly assumed it was due to a bad tuning effort (it usually is). I googled “SELECT p.ID FROM wp_posts AS p INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id”, which appeared in the output of MySQL’s SHOW FULL PROCESSLIST command. It led me to a StackExchange page that showed an alternative, more efficient version of the WP function calls that produce those previous and next links.

In our case, we just didn’t need those links, and our theme let us turn them off from the admin. MySQL’s CPU usage dropped instantly and dramatically.

Allow webapps to make outgoing requests

I was experiencing a pretty bad slowdown while trying to use the admin pages of a WordPress site recently. The load on the machine was quite low, so I began to suspect that it was trying to call out to external services (Facebook, Pinterest, etc.) that might have been blocked by CSF (ConfigServer Firewall).

I started playing around with tcpdump and friends and then realized that the information I was looking for (blocked outgoing requests) was already being logged in /var/log/kern.log on our Ubuntu system (same on Debian).
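As a rough sketch of how those log entries could be summarised: the function below assumes CSF's kernel log lines contain the text "_OUT Blocked" along with the usual iptables DST= and DPT= fields. The function name blocked_out is made up for illustration, and the exact log prefix may differ with your CSF settings.

```shell
#!/bin/sh
# Count blocked outbound connections per destination IP and port.
# Assumes CSF-style kernel log lines such as:
#   ... Firewall: *TCP_OUT Blocked* IN= OUT=eth0 SRC=... DST=... DPT=443 ...
blocked_out() {
    log=${1:-/var/log/kern.log}
    grep '_OUT Blocked' "$log" |
        sed -n 's/.* DST=\([^ ]*\).* DPT=\([^ ]*\).*/\1 \2/p' |
        sort | uniq -c | sort -rn
}

# Usage: blocked_out                    # reads /var/log/kern.log
#        blocked_out /path/to/other.log
```

The most frequently blocked destination ends up at the top, which makes it easier to decide which outgoing ports or hosts to allow.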

Prepare a PDF file for OCR

If you need to OCR text from a PDF or image file, you may want to use a tool like tesseract to do the job. But it won’t take any old input file; you’ll probably need to convert it first.

The first error I got from tesseract was

The Googles indicated that I can’t pass a PDF to it directly. Then I found that one format it will take is TIFF.
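One common route from PDF to TIFF is ImageMagick's convert, sketched below. This is not necessarily the conversion used in the full post: the density and depth settings are assumptions, and pdf_to_text, CONVERT and TESSERACT are names invented for this example.

```shell
#!/bin/sh
# Convert a PDF to TIFF with ImageMagick, then OCR the TIFF with tesseract.
# CONVERT and TESSERACT can be overridden to point at other binaries.
pdf_to_text() {
    pdf=$1
    base=${pdf%.pdf}
    # Render at 300 DPI, 8 bits per channel, without an alpha channel.
    # tesseract then appends .txt to the output base name itself.
    "${CONVERT:-convert}" -density 300 "$pdf" -depth 8 -alpha off "$base.tiff" &&
        "${TESSERACT:-tesseract}" "$base.tiff" "$base"
}

# Usage: pdf_to_text scan.pdf    # writes scan.tiff and scan.txt
```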

Discard first column without AWK

UPDATE: Major derp moment on my part, thinking that you needed a loop in AWK to print all but one field. Commandlinefu just caused a forehead-slapping moment when I saw this in my feed:

So, it seems AWK wins again. Carry on.

If you’re trying to print one or more particular columns from some input, it is quite straightforward with AWK. You’d simply specify the field variable(s) you know exist in the input (e.g., $1). However, it’s pretty AWKward (sorry) to omit one column of data and print the rest, particularly if you don’t know exactly how many columns of input to expect on each line. Then you’d need to actually program a loop in AWK. Ugh.
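For illustration, here are two ways to drop the first column, assuming single-space-separated input. Neither is claimed to be the one-liner from the Commandlinefu feed or the approach in the full post; the function names are made up for this sketch.

```shell
#!/bin/sh
# Sample input with a varying number of space-separated columns.
sample() {
    printf '%s\n' 'a b c' 'one two three four'
}

# Without AWK: cut keeps field 2 through the end of each line.
drop_first_cut() {
    cut -d' ' -f2-
}

# With AWK and no loop: blank out $1, which rebuilds the line with a
# leading separator, then strip that separator before printing.
drop_first_awk() {
    awk '{ $1 = ""; sub(/^ /, ""); print }'
}

sample | drop_first_cut
sample | drop_first_awk
```

Note that the AWK version rebuilds each line using the output field separator, so runs of whitespace in the input collapse to single spaces, while cut leaves the rest of the line untouched.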

Raspberry Pi can do fast video encoding

Yes, the Raspberry Pi can do fast video encoding. Of course you normally wouldn’t want to re-encode any video with an ARM processor, but that’s not what we’re going to do here. We’re going to leverage the GPU. I should point out before proceeding that the input formats for re-encoding are limited with this method; more about that below.

In order to do this, I’m using a proof-of-concept tool called omxtx, which I think is supposed to be a shortened form of “OpenMAX Transcoding”. Off the top of my head, here are the prerequisites for building the binary from source:

  • Raspbian. It will probably work on other RPi distros, but I haven’t tried them.
  • The build-essential package installed, which you normally need to build anything.
  • Memory split of 64MB for video. I previously had this all the way down to 16 since I don’t use a display on my Pi, but bumping it to only 32MB caused runtime errors from the omxtx binary. You need to give the GPU some breathing room to encode video.
  • There are probably some libraries you may or may not have installed that the build wants to link in. When I run ldd on my finished binary, it loads all kinds of media libs like libav, libvorbis, libvpx, etc. YMMV.


Clone hard disk with rsync

I recently wanted to move a system over to a faster, larger SSD. I didn’t want to have to re-install an OS, figure out which old files to transfer over, and then re-configure everything. That’s not a fun time in my book.

Here’s what I did (on a live system, yeah!) to clone my disk. Note that this may cause data loss, don’t blame me, keep backups, blah blah…

First, use a partition tool like GNU parted to create a nice big partition on the new drive and mark it as bootable. Leave some space for other partitions or swap space. If you use a separate /boot partition, then I think that needs the bootable flag instead. I’m only using a single root partition and swap. For the purposes of this tutorial, I’ll call my new root partition /dev/sdb1. YMMV.

Wait a while.

Take note of the UUID listed for /dev/sdb1.

Or use whatever editor you like and put the UUID for /dev/sdb1 in place of the existing UUID for /.

Now you should just need to swap out the drives.
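The steps above might be sketched roughly as follows. The original commands are not reproduced here, so everything in this block is an assumption: an ext4 filesystem, the mount point, the rsync exclude list, and the helper names run and clone_root. It prints the commands instead of running them unless DRY_RUN=0 is set, and it does not cover editing fstab or installing a bootloader.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (as root, with backups in place).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# clone_root NEW_PARTITION MOUNT_POINT
clone_root() {
    part=$1
    mnt=$2
    run mkfs.ext4 "$part"    # assumption: ext4 root filesystem
    run mkdir -p "$mnt"
    run mount "$part" "$mnt"
    # Copy the live root, preserving ACLs, xattrs and hard links, and
    # skipping pseudo-filesystems and the mount point itself.
    run rsync -aAXH \
        --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' \
        --exclude='/tmp/*' --exclude='/run/*' --exclude='/mnt/*' \
        --exclude='/media/*' --exclude='/lost+found' \
        / "$mnt"
    # Print the new filesystem's UUID, to be placed in $mnt/etc/fstab for "/".
    run blkid -s UUID -o value "$part"
}

# Usage: clone_root /dev/sdb1 /mnt/newroot
```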

Self-hosted open source RSS readers

I think I’ve tried pretty much all of them. After the Google Reader-pocalypse, one of the primary requirements was that I could host it myself. Bonus points go to apps that have configurable keyboard navigation (“j” to open the next item must be distinct from “space” to just scroll down in the browser), as well as decent integration on mobile. Here’s a roundup of the ones I’ve tried.

Newsblur

Awesome platform, but way too big for someone looking to host their own personal solution. I tried upgrading it once and broke it. No idea what I did wrong or how to even figure out why it wasn’t working. Seems very well designed for a massive multi-user operation, though, if you’ve got the Python chops to figure everything out.

Commafeed

Commafeed is also a larger piece of software, but requires many fewer components than Newsblur. You need Java, some Java tools like Maven, a DB and of course more than a little bit of RAM.

TT-RSS (Tiny Tiny RSS)

Nice, but not as configurable as I’d like. This and the rest of the readers listed are written in PHP. There are three larger downsides to tt-rss:

  • I had quite a bit of trouble trying to get it to run from a subdirectory on Nginx. This is not necessarily specific to tt-rss; many apps are hard to configure this way.
  • The primary developer is not friendly. He seems to take pleasure in ridiculing people in the support forums.
  • Although it’s supposed to be tiny, and the application part is, it requires Postgres or MySQL with InnoDB support. I would prefer something that uses less memory on the DB side, either MyISAM tables or better yet SQLite.

SelfOSS

I ran SelfOSS for a while and liked it. However, I didn’t like the Android experience (what, no swipe?) so I went looking for something else.

FreshRSS

I’m currently running FreshRSS and it’s really, really good. But I’m starting to get discouraged by a few nagging bugs and the lack of recent updates to the GitHub repo.

Miniflux

I ran Miniflux for a short time a while back and my memory is a bit hazy on the experience (after a while RSS reader experiences tend to blend in with one another). I think I’m going to give it another shot. Reading down the list on the developer’s site of what Miniflux is not vs. what it is makes me take heart. The developer is clearly trying to convey a no-BS attitude with his intentions for this app. One thing that gives me a spark of hope is that there was a new point release this month. I will update this post with any news on Miniflux.

Finding call-time pass-by-reference in PHP

While trying to move an older code base to a newer system and thus a newer version of PHP (5.3 -> 5.5), I knew that some of the code would need to be changed to avoid using some removed features. Specifically, I mean call-time pass-by-reference. For those who don’t know, this is a somewhat odd feature of earlier versions of PHP that lets the caller pass any argument by reference, rather than by the usual value, by prepending the “reference to” operator & to the argument variable at the call site.

So, to illustrate, normally this code won’t have side effects because of call by value:

However there will be side effects if the caller chooses pass by reference:

I thought a regex might be in order to find these guys and fix them:

but it was a naive idea, and this regex devolved (heh) to its current form before I realized I could just use the built-in linter to find the problem spots.
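A minimal sketch of the linter approach, assuming the php binary on the PATH is the newer version you are migrating to. The function name lint_tree and the PHP override variable are invented for this example.

```shell
#!/bin/sh
# Lint every .php file under a directory and print output only for failures.
# PHP can be overridden to point at a specific interpreter binary.
lint_tree() {
    dir=${1:-.}
    find "$dir" -type f -name '*.php' | while IFS= read -r file; do
        # php -l exits non-zero when the file does not parse.
        if ! out=$("${PHP:-php}" -l "$file" 2>&1); then
            printf '%s\n' "$out"
        fi
    done
}

# Usage: lint_tree /path/to/codebase
```

Files that lint cleanly produce no output, so whatever is printed is the list of spots to fix.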

HTH

Scripted WordPress Upgrades

The command-line interface for administering WordPress is called wp-cli. It’s pretty great.

I wrote a script to run from cron for updating a bunch of different WP installs in the same directory.
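The script itself isn't shown here, but a loop along these lines would do the job. This is only a sketch: the function name update_all_sites, the WP override variable, and the choice to update core, plugins and themes are all assumptions. It treats any subdirectory containing a wp-config.php as an install.

```shell
#!/bin/sh
# Update core, plugins and themes for every WordPress install found
# directly under a parent directory. WP can be overridden to point at
# the wp-cli binary if it is not on the PATH as "wp".
update_all_sites() {
    parent=$1
    for dir in "$parent"/*/; do
        # Skip anything that does not look like a WordPress install.
        [ -f "${dir}wp-config.php" ] || continue
        echo "Updating ${dir}"
        "${WP:-wp}" core update --path="$dir"
        "${WP:-wp}" plugin update --all --path="$dir"
        "${WP:-wp}" theme update --all --path="$dir"
    done
}

# Usage: update_all_sites /var/www
```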

Fix for LFD error in syslog

I noticed that I was getting emails from LFD (part of the ConfigServer Firewall package) about failing to find some added check line it was sending to syslog.

The syslog message looks like this:
lfd[%d]: *SYSLOG CHECK* Failed to detect check line [%s] sent to SYSLOG

Of course I’ve replaced the pid with %d and the check string that it’s looking for with %s, since that will vary.

The fix is simple. Just like how you may need to adjust the path in /etc/csf/csf.conf to the real location of the ipset binary, you also may need to set where your SYSLOG messages are going. On an Ubuntu system, that means /var/log/syslog rather than /var/log/messages. Then just run csf -r to restart LFD with the new settings.

UPDATE:
/var/log/messages appears in more than just csf.conf. Since /var/log/messages doesn’t exist on my system, I’m just going to symlink it to syslog and see what happens.

UPDATE 2:
OK, I thought better of it and just modified csf.syslogs and csf.logfiles. I deleted that messages symlink in /var/log next. LFD was still being a little bitch after I restarted using csf -r, so I ran service lfd stop and then started it again.
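To see up front which CSF files mention the old log path, something like the following could help. It is a sketch only: find_log_refs is a made-up name, and /etc/csf is assumed from the csf.conf path mentioned above.

```shell
#!/bin/sh
# List files under a CSF config directory that still mention a given log path.
find_log_refs() {
    conf_dir=${1:-/etc/csf}
    old_path=${2:-/var/log/messages}
    # -r: recurse, -l: print file names only, -F: match the path literally.
    grep -rlF -- "$old_path" "$conf_dir"
}

# Usage: find_log_refs    # which files mention /var/log/messages?
# After pointing them at /var/log/syslog, restart with: csf -r
```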

Switching from APF to CSF

I was enjoying trying out APF on my Raspberry Pi, but I noticed that it wasn’t blocking repeat attackers the way I wanted it to. fail2ban was working the way it was supposed to work, but it only blocks temporarily, and I never figured out why the gamin back-end to continuously monitor log files didn’t work reliably. I tried to work around that with some extra iptables rules, but was still getting hammered by folks. It made me sad.

ConfigServer Security & Firewall, CSF, has been great so far. Reading through the main config file takes time but that’s good because it’s so well documented. I admit I’m not digging the extensive tuning needed to stop the seemingly endless squawking about IDS-related features (process resources, funny process names, custom cron scripts, etc.) so for now that’s turned off. I may fine-tune it soon.

Other things I like about CSF: optional automatic updates, built-in connection limits and rate limits, the idea of having separate allowed and ignored groups (allowed group may still be banned if not also in the ignored group, which is a nuanced distinction), lots of flexibility & customization, and it also has IPSET support for ultra-fast rule matching!

Long Live ControlPlane!

I’ve written about Locamatic before, and while it’s good at what it does, there are some definite drawbacks. For one, the most recent version is alpha quality and intended for use on Mountain Lion, since prior versions won’t work anymore on a newer system. But as of this writing, Mountain Lion was two major releases ago. I think it’s safe to say that development has stalled, and that’s OK.

Re-map the Caps Lock Key to Esc on Mac

Seil is a very cool utility for key re-mapping / enabling international keys on a Mac. I wanted to re-map the mostly useless Caps Lock key to Esc, which I use constantly in vi. If you’re a regular vi user, you know exactly what I’m talking about. Now I just have to develop the muscle memory to start using it regularly instead of reaching way up for the usual Esc key.