kworker using cpu on an otherwise idle system

I have an old thin client that I upgraded to a home server by adding some additional RAM and storage. I noticed after a recent kernel upgrade that the system seemed sluggish at times, despite doing nothing in particular at the time. top showed that a kworker process was using CPU, not all of it, but perhaps 25 to 50% of the total CPU.

I did a lot of searching to try and track down the offender. I used tools such as perf and iotop, read about various tunables under /proc related to power management. Finally, I ran Intel’s powertop command. It showed that “Audio codec alsa…” was hammering on some event loop.

I looked at the loaded kernel modules, and on a whim, I did sudo rmmod snd_hda_intel and that fixed the issue for me.

Others may find that a kworker is running in a tight loop for some other reason. It could be some other misbehaving driver or an I/O problem.

Finding how much time Apache requests take

When a request is logged in Apache’s common or combined format, it doesn’t actually show you how much time each request took to complete. To make reading logs a bit more confusing, each request is logged only once it’s completed. So a long-running request may have an earlier start time but appear later in the log than quicker requests.

To help look at some timing info without going deep enough to need a debugger, I decided that step one was to use a custom log format that saved the total request time. After adding usec:%D to the end of my Apache custom log format, we can now see how long various requests are taking to complete.

tail -q -1000 *access.log | mawk 'FS="(:|GET |POST | HTTP/1.1|\")" {print $NF" "$6}' | sort -nr | head -100 > /tmp/heavy2

I’m using the “%D” format for compatibility with older Apache releases, which reports the response time in microseconds. I would prefer milliseconds, but when I tried using “%{ms}T” on a server running 2.4.7, it didn’t work; too old. This output is a bit hard to read when looking at the numbers, so we can try to add in a little visual aid with commas as the thousands separator.

cat /tmp/heavy2 | xargs -L 1 printf "%'d %s\n" | less

Note that because we are measuring the total request time, some of the numbers may be high due to remote network latency or a slow client. I recommend correlating several samples before blaming some piece of local application code.

Hope this helps finding your long-running requests!

Keep getting logged out from Selfoss on Debian

I’m running Selfoss RSS reader and loving it!

One thing I don’t love is that it logs me out frequently (BTW, I’m running Apache php-fpm on Debian Jessie). But I think I found a solution. Try adding this to a file called .user.ini in the document root of Selfoss:

The 604800 means one week. If you’re running mod_php rather than FPM, you can add these lines to your .htaccess file.

UPDATE: The format for .user.ini is not the same used in .htaccess. The .user.ini version looks like this:

Use whatever cache_limiter() setting suits your needs best.

Apache, Fastcgi, PHP 7 on Debian Wheezy & Ubuntu 14.04

Intro: The Tyranny of Prefork

There are a lot of tutorials out there that go through the rote instructions on upgrading your Debian or Ubuntu system to use PHP 7. While I’m sure most of them are fine, they assume you’d want to use the prefork process model or event/threaded via CGI (via proxy and fcgi modules). While prefork is certainly battle-tested, it uses a ton more memory than it needs to, so I’m going to document how to upgrade an existing Fastcgi install to PHP 7. Continue reading “Apache, Fastcgi, PHP 7 on Debian Wheezy & Ubuntu 14.04”

Debian server DNS bogosity

Note: I’m running my Raspberry Pi as a server, and NetworkManager is not installed.

I discovered that if you want to manually assign search and nameserver entries in your /etc/resolv.conf file, you can’t just add the relevant entries to static entry in /etc/network/interfaces:

For some unknown reason, the resolvconf utility will still attempt to query an upstream DHCP server to get additional name service data. I don’t know why it works this way, I believe it should be hands-off if you’ve specified static in your interfaces file. I finally found that dhcpcd was called to get the info, and added the following line to /etc/dhcpcd.conf to disable actions relating to eth0:

I suppose if I wanted additional interfaces to work properly using dhcp, I’d have to get rid of all this and configure each interface manually via NetworkManager or wicd.

Sometimes, you don’t have any inodes left

You notice something is wrong with your system. I’ll just put this error message here for the sake of the Googles:

That stinks.
Continue reading “Sometimes, you don’t have any inodes left”

WordPress performance problem with many posts

If you have a ton of posts in your WordPress blog (we have over 35K in one site at work), it turns out that the Previous and Next links on each post may be running a tough query on your database.

I wanted to know why MySQL was using so much CPU and wrongly assumed it was due to a bad tuning effort (it usually is). I googled “SELECT p.ID FROM wp_posts AS p INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id” which was in my output of MySQL’s show full processlist command. It led me to this StackExchange page which showed an alternative, more efficient version of the WP function calls that produce those previous and next links.

In our case, we just didn’t need those links and our theme let us turn them off from the admin. An instant and dramatic drop in CPU by MySQL ensued.

Allow webapps to make outgoing requests

I was experiencing a pretty bad slowdown while trying to use the admin pages of a WordPress site recently. The load on the machine was quite low, so I began to suspect that it was trying to call out to external services (facebook, pinterest, etc) that might have been blocked by CSF (configserver firewall).

I started playing around with tcpdump and friends and then realized that the information I was looking for (blocked outgoing requests) was already being logged in /var/log/kern.log on our Ubuntu system (same on Debian). Continue reading “Allow webapps to make outgoing requests”

Clone hard disk with rsync

I recently wanted to move a system over to a faster, larger SSD. I didn’t want to have to re-install an OS, figure out which old files to transfer over, and then re-configure everything. That’s not a fun time in my book.

Here’s what I did (on a live system, yeah!) to clone my disk. Note that this may cause data loss, don’t blame me, keep backups, blah blah…

First, use a partition tool like GNU parted to create a nice big partition on the new drive and mark it as bootable. Leave some space for other partitions or swap space. If you use a separate /boot partition, then I think that needs the bootable flag instead. I’m only using a single root partition and swap. For the purposes of this tutorial, I’ll call my new root partition /dev/sdb1. YMMV.

Wait a while.

Take note of the UUID listed for /dev/sdb.

Or use whatever editor you like and put the UUID for /dev/sdb in place of the existing UUID for /.

Now you should just need to swap out the drives.

Scripted WordPress Upgrades

This command line interface for administering WordPress is called wp-cli. It’s pretty great.

I wrote a script to run from cron for updating a bunch of different WP installs in the same directory.

Fix for LFD error in syslog

I noticed that I was getting emails from LFD (part of the ConfigServer Firewall package) about failing to find some added check line it was sending to syslog.

The syslog message looks like this:
lfd[%d]: *SYSLOG CHECK* Failed to detect check line [%s] sent to SYSLOG

Of course I’ve replaced the pid with %d and the check string that it’s looking for with %s, since that will vary.

The fix is simple. Just like how you may need to adjust the path in /etc/csf/csf.conf to the real location of the ipset binary, you also may need to set where your SYSLOG messages are going. On an Ubuntu system, that means /var/log/syslog rather than /var/log/messages. Then just run csf -r to restart LFD with the new settings.

UPDATE:
/var/log/messages appears in more than just csf.conf. Since /var/log/messages doesn’t exist on my system, I’m just going to symlink it to syslog and see what happens.

UPDATE 2:
OK, I thought better of it and just modified csf.syslogs and csf.logfiles. I deleted that messages symlink in /var/log next. LFD was still being a little bitch after I restarted using csf -r, so I ran service lfd stop and then started it again.

Switching from APF to CSF

CSF logoI was enjoying trying out APF on my Raspberry Pi, but I noticed that it wasn’t blocking repeat attackers the way I wanted it to. fail2ban was working the way it was supposed to work, but it only blocks temporarily, and I never figured out why the gamin back-end to continuously monitor log files didn’t work reliably. I tried to work around that with some extra iptables rules, but was still getting hammered by folks. It made me sad.

ConfigServer Security & Firewall, CSF, has been great so far. Reading through the main config file takes time but that’s good because it’s so well documented. I admit I’m not digging the extensive tuning needed to stop the seemingly endless squawking about IDS-related features (process resources, funny process names, custom cron scripts, etc.) so for now that’s turned off. I may fine-tune it soon.

Other things I like about CSF: optional automatic updates, built-in connection limits and rate limits, the idea of having separate allowed and ignored groups (allowed group may still be banned if not also in the ignored group, which is a nuanced distinction), lots of flexibility & customization, and it also has IPSET support for ultra-fast rule matching!

Fix the broken APF package on Debian/Ubuntu

R-fx networks logoThe Debian / Ubuntu package for Advanced Policy Firewall (APF) seems a bit unmaintained. By default it won’t run without some initial tweaking. Note that they probably want everyone to just download and run the installer from their site nowadays, but that’s not how I roll (usually).

In functions.apf, change the line

to

That allows the basic functionality of the software to work. Next, for the sake of upgrade-ability, I copy /etc/apf-firewall/conf.apf to /etc/apf-firewall/conf.apf.my. Then the only change needed to the installed config is to source the .my file. Here’s the bottom of the file:

Since it won’t work if you try to source the internals.conf file twice, you need to make sure that the last line in the .my file is commented or removed. Now you can edit the other values in the .my file to your liking. Remember to turn off devel mode and change /etc/default/apf-firewall when you’re satisfied with any config changes, then restart the service in the usual way.

APF, fail2ban & more

APF is wonderful for a good-enough firewall solution for a lot of people. But what if you also want the power of another great tool, fail2ban?

The problem is, fail2ban wants to make changes directly to iptables, which APF is maintaining. Rules that fail2ban writes will be overwritten by APF. I found the solution is pretty straightforward: fail2ban with APF. This works really well.

No, I still didn’t think it was enough. Why? Because the “gamin” backend of fail2ban is buggy on Debian / Ubuntu, and tends to stop working after some time. So, I use the polling backend, which means it seems to wait about 30 seconds or so between checks of the log files. Today I saw a very persistent bot become unbanned and immediately get banned again, about 30-40 seconds later. How many attempts had it made on the SSH server while unbanned? I didn’t even stop to check, I just wanted to find a way to react more quickly.

Search for “limit connection rate linux” or whatnot on the Googs and you’ll find a number of sites with basically the same solution. It looks like this:

This works well if you’re manually (or using home-brewed scripts) to tamp down the baddies. But again it’s modifying iptables directly, which we don’t want. A couple more searches yielded a promising page that said to add any custom rules you need to APF’s postroute.rules file. However, using the plain -I or -A options to iptables without a line number doesn’t work right, since the order of rules, and what happens at the top and bottom of a ruleset are significant. So we need to add the rules just above the first blanket SSH ACCEPT rule. Here’s what I did in postroute.rules:

Notice that I also added the --name option, to prevent a conflict with the default name that APF uses.