I can think of some reasons why folks might use the Googlebot user agent on their non-Google bots, but I can’t think of any good, upstanding reasons to do it. Here’s how one might find some fine folks who would do such a thing. As of right now (May 2018), all valid Googlebot source
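Here is a minimal sketch of one way to do that, assuming Google's published verification method: do a reverse DNS lookup on the source IP, check that the name ends in googlebot.com or google.com, then do a forward lookup and confirm that the name resolves back to the same IP. It assumes the `host` utility is installed, a common/combined-format access log with the client IP in the first field, and IPv4 clients only. The function names and log path are hypothetical.

```shell
# Sketch: flag client IPs that send a Googlebot user agent but fail a
# reverse/forward DNS check. Assumes `host` is installed, the log's first
# field is the client IP, and IPv4 only (IPv6 clients would be flagged).
# Function names are hypothetical.

# Succeeds if a PTR hostname falls under googlebot.com or google.com.
is_google_hostname() {
  local name="${1%.}"   # drop the trailing dot that PTR answers carry
  case "$name" in
    *.googlebot.com|*.google.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Reverse-resolve the IP, check the domain, then forward-resolve the name
# and confirm that one of its A records is the original IP.
verify_googlebot_ip() {
  local ip="$1" name
  name="$(host "$ip" 2>/dev/null | awk '/domain name pointer/ { print $NF; exit }')"
  [ -n "$name" ] || return 1
  is_google_hostname "$name" || return 1
  host "${name%.}" 2>/dev/null |
    awk -v ip="$ip" '/has address/ && $NF == ip { found = 1 } END { exit !found }'
}

# Print each Googlebot-claiming IP in the given log that fails verification.
list_fake_googlebots() {
  grep -F 'Googlebot' "$1" | awk '{ print $1 }' | sort -u |
    while read -r ip; do
      verify_googlebot_ip "$ip" || echo "$ip"
    done
}
```

Usage would look something like `list_fake_googlebots /var/log/apache2/access.log` (a hypothetical path). The suffix check matches on a leading dot so that names like fakegooglebot.com or googlebot.com.example.net don't pass.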
Tag: data wrangling
[code not preserved] Does this look familiar? Maybe you need more fiber in your diet. Or maybe you need THIS: [code not preserved] You’re welcome.
Logwatch is a great utility for emailing me a summary of system logs over the last 24 hours. Among other things, it shows unsuccessful login attempts and their source IP addresses. But the default unsorted output is hard to analyze and take action on, since a single IP may appear many times in
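A minimal sketch of the usual sort-and-count approach, assuming you've saved the report (or just its failed-login section) as text. It pulls out every IPv4-looking address, counts the duplicates, and lists the most frequent first. The function name is hypothetical, and IPv6 addresses are ignored.

```shell
# Sketch: collapse every IPv4 address found in free-form text (e.g. a saved
# Logwatch report) into "count ip" pairs, most frequent first.
count_ips() {
  # grep -o prints each match on its own line; sort groups duplicates,
  # uniq -c counts them, sort -rn ranks by count, awk strips uniq's padding.
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }'
}
```

For example, `count_ips < logwatch_report.txt | head` (hypothetical filename) shows the ten noisiest sources, which are the obvious candidates for a firewall rule.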
When a request is logged in Apache’s common or combined format, it doesn’t actually show you how much time each request took to complete. To make reading logs a bit more confusing, each request is logged only once it’s completed. So a long-running request may have an earlier start time but appear later in the
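Apache's mod_log_config does provide a directive for this: %D logs the time taken to serve the request in microseconds (%T gives whole seconds). Here is a minimal sketch, assuming %D has been appended as the last field of an otherwise common/combined LogFormat, that lists the slowest requests. The function name is hypothetical.

```shell
# Sketch: assumes each access-log line ends with Apache's %D field
# (time to serve the request, in microseconds). Reads the log on stdin and
# prints the N slowest requests as "microseconds path".
slowest_requests() {
  local n="${1:-10}"
  # In common/combined format the request line "GET /path HTTP/1.1" splits
  # into fields 6-8, so $7 is the path; $NF is the trailing %D value.
  awk '{ print $NF, $7 }' | sort -rn | head -n "$n"
}
```

Sorting on the duration sidesteps the ordering problem: it no longer matters where in the file a long-running request was written.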
You don’t see stuff like this every day (I hope). [code not preserved] [code not preserved]
GNU sort is normally crazy fast at what it does. However, recently I was trying to sort and uniq several huge files and it seemed to be taking way too long. I did a little googling, and realized that it takes a lot longer to sort the full range of Unicode characters because it has
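A common workaround (not necessarily the one the full post settles on) is to run sort in the C locale. Then it compares raw bytes instead of applying the locale's collation rules. Here is a minimal sketch; the function name is hypothetical.

```shell
# Sketch: sort and de-duplicate several large files using byte-wise
# collation. LC_ALL=C makes GNU sort compare raw bytes rather than use the
# locale's Unicode collation rules, which is usually much faster.
# Caveat: the output is in byte order (e.g. "B" sorts before "a").
fast_sort_unique() {
  LC_ALL=C sort -u "$@"
}
```

The trade-off is ordering. If the result will later be fed to comm or join, run those in the C locale too, so that everything agrees on what "sorted" means.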
proprietary insecurity I’ve accumulated many notes (2000+) in Evernote over the years, and love that it can store binary attachments such as images or other media files. My favorite feature is the Evernote Web Clipper browser extension; it does a fantastic job at saving the parts of an article I want to save while keeping
My new favorite tool is a Python program called distribution that can easily show histograms in your terminal: [code not preserved] I used Homebrew to install it, and you can see some usage examples and a few other tools on this Stack Overflow page. I eagerly anticipate showing off some histograms to people.
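If distribution isn't installed, you can get a rough version of the same idea from standard tools. The sketch below is a swapped-in technique, not the distribution program or its options. It counts identical input lines and draws a bar of # characters scaled to the largest count. The function name and width parameter are hypothetical.

```shell
# Sketch: a bare-bones terminal histogram from sort, uniq and awk.
# NOT the `distribution` tool: it only counts identical lines and draws a
# '#' bar scaled so the most frequent key gets `width` characters.
text_histogram() {
  local width="${1:-40}"
  sort | uniq -c | sort -rn |
    awk -v width="$width" '
      NR == 1 { max = $1 }                 # first row has the largest count
      {
        count = $1
        $1 = ""; sub(/^ /, "")             # the rest of the line is the key
        bar = int(count * width / max)
        if (bar < 1) bar = 1               # always show at least one mark
        line = ""
        for (i = 0; i < bar; i++) line = line "#"
        printf "%-20s %6d %s\n", $0, count, line
      }'
}
```

For example, `awk '{ print $9 }' access.log | text_histogram` would chart HTTP status codes from a combined-format log (hypothetical filename).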
I find that using an idiom like [code not preserved] is so useful. It replaces the replstr (“%” in this example) with all the arguments at once, or as many as can fit without going over the system’s limit. I couldn’t believe it when I learned that the GNU version of xargs lacks this flag. Yes, it’s
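The behavior described matches BSD xargs's -J replstr option. GNU xargs's -I does substitute a replstr, but it runs the command once per input item. A widely used GNU-compatible workaround is to let xargs batch the arguments as usual and hand them to a tiny sh -c script that puts them in the right position. Here is a minimal sketch for the "move everything into a directory" case; the function name is hypothetical.

```shell
# Sketch: emulate BSD `xargs -J % mv % DEST` with GNU xargs.
# xargs appends a batch of filenames after "$dest"; inside sh -c, "$0" is
# the first argument after the script (the destination) and "$@" is the
# batch, so mv sees: mv -- file1 file2 ... DEST.
# -0 expects NUL-separated input; -r (GNU) skips the run if input is empty.
move_all_to() {
  local dest="$1"
  xargs -0 -r sh -c 'mv -- "$@" "$0"' "$dest"
}
```

For example, `find . -name '*.log' -print0 | move_all_to /tmp/old-logs` (hypothetical paths). For mv and cp specifically, GNU coreutils also offers -t DIR, so `xargs -0 mv -t DEST` works without the sh -c trick. The sh -c form is the general pattern for commands that have no such option.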
Just a quick one-liner for posterity. [code not preserved]
I was experiencing a pretty bad slowdown while trying to use the admin pages of a WordPress site recently. The load on the machine was quite low, so I began to suspect that it was trying to call out to external services (Facebook, Pinterest, etc.) that might have been blocked by CSF (ConfigServer Security & Firewall). I
I recently skimmed a paper describing successful attacks on the security of various password database file formats. The only format that withstood both the passive and the active attacks was the Password Safe format.
UPDATE: Major derp moment on my part, thinking that you needed a loop in AWK to print all but one field. Commandlinefu just caused a forehead-slapping moment when I saw this in my feed: [code not preserved] So, it seems AWK wins again. Carry on. If you’re trying to print one or more particular columns from some
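Here is a minimal sketch of both cases in plain AWK, without a loop. It is not necessarily the commandlinefu one-liner, whose text isn't preserved here. The function names are hypothetical.

```shell
# Print selected columns: fields 1 and 3, joined by the output separator.
print_cols_1_3() {
  awk '{ print $1, $3 }'
}

# Print every field except field N by blanking it. Assigning to a field
# makes awk rebuild the record with OFS, so the blanked field leaves a
# doubled separator behind ("a b c d" minus field 2 -> "a  c d").
drop_field() {
  awk -v n="$1" '{ $n = ""; print }'
}
```

The doubled space is usually harmless when the output is fed to another whitespace-splitting tool. If it matters, you can squeeze it afterwards, e.g. with `tr -s ' '`.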
While trying to move an older code base to a newer system and thus a newer version of PHP (5.3 -> 5.5), I knew that some of the code would need to be changed to avoid using some removed features. Specifically, I mean call-time pass by references. For those who don’t know, this is kind
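Call-time pass-by-reference (e.g. foo(&$x)) was removed in PHP 5.4 and is a fatal error from then on. Here is a rough heuristic sketch for finding candidates before upgrading. It lists lines where &$ directly follows ( or , and then drops lines containing "function", because by-reference parameters in declarations such as function bar(&$y) remain legal. Expect false positives (array(&$x) is also legal) and misses (calls split across lines), so review each hit by hand. The function name is hypothetical.

```shell
# Rough heuristic: find possible call-time pass-by-reference in *.php files
# under a directory (default: current). Prints grep's "path:line:text".
# Lines containing "function" are dropped, since by-reference parameters in
# declarations are still valid. Review every hit manually.
find_calltime_refs() {
  grep -rnE --include='*.php' '[(,][[:space:]]*&[[:space:]]*\$' "${1:-.}" |
    grep -v 'function'
}
```

Running `php -l` on each file under PHP 5.4+ is a stricter check, since the removed syntax fails at compile time. The grep version is handy when the newer PHP isn't installed on the old box yet.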
…from two lists with some overlap. Spent some time working in Python on this problem. Afterwards, I realized it’s a shell one-liner:

comm -23 <(sort f_most) <(sort f_some) | sort -n > f_uniq_to_1

I re-sort the output numerically since comm assumes its input is sorted lexicographically, and I happen to be comparing lists of numbers.
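The same pipeline can also be wrapped as a function that works in a plain POSIX sh, where <(...) process substitution isn't available. This sketch swaps one process substitution for a temporary file and feeds the other sorted list on stdin. The function name is hypothetical.

```shell
# Sketch: print lines of file $1 that don't appear in file $2, sorted
# numerically. comm needs both inputs sorted the same (lexicographic) way;
# -2 hides lines only in $2, -3 hides lines common to both.
uniq_to_first() {
  local sorted_first
  sorted_first="$(mktemp)" || return 1
  sort "$1" > "$sorted_first"
  # "-" tells comm to read the second (already sorted) list from stdin.
  sort "$2" | comm -23 "$sorted_first" - | sort -n
  rm -f "$sorted_first"
}
```

For example, with 1 2 3 10 11 12 in one file and 2 11 in the other, comm sees the lexicographic order 1, 10, 11, 12, 2, 3. The final sort -n then turns the survivors back into 1, 3, 10, 12.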