Prepare a PDF file for OCR

If you have some need to OCR some text from a PDF or image file, you may want to use a tool like tesseract to do the job. But it won’t take any old input file, you’ll probably need to convert it first.

The first error I got from tesseract was

The Googles indicated that I can’t pass a PDF to it directly. Then I found that one format it will take is tiff. Continue reading “Prepare a PDF file for OCR”

Discard first column without AWK

UPDATE: Major derp moment on my part, thinking that you needed a loop in AWK to print all but one fields. Commandlinefu just cause a forehead-slapping moment when I saw this in my feed:

So, it seems AWK wins again. Carry on.

If you’re trying to print one or more particular columns from some input it is quite straightforward with AWK. You’d simply specify the variable(s) you know exist from the input (e.g.,

). However, it’s pretty AWKward (sorry) to omit one column of data and to print the rest, particularly if you don’t know exactly how many columns of input are expected on each line. Then you’d need to actually program a loop in AWK. Ugh. Continue reading “Discard first column without AWK”

Raspberry Pi can do fast video encoding

Yes, the Raspberry Pi can do fast video encoding. Of course you normally wouldn’t want to re-encode any video with an ARM processor, but that’s not what we’re going to do here. We’re going to leverage the GPU. I should point out before proceeding that the input formats for re-encoding are limited in this method, more about that below.

In order to do this, I’m using a proof-of-concept tool called omxtx, which I think is supposed to be a shortened form of “OpenMAX Transcoding”. Off the top of my head, here are the prerequisites for building the binary from source:

  • Raspbian. It will probably work on other RPi distros, but I haven’t tried them.
  • The build-essential package installed, which you normally need to build anything.
  • Memory split of 64MB for video. I previously had this all the way down to 16 since I don’t use a display on my Pi, but bumping it to only 32MB caused runtime errors from the omxtx binary. You need to give the GPU some breathing room to encode video.
  • There’s probably some libraries you may or may not have installed that the build wants to link in. When I run ldd on my finished binary, it loads all kinds of media libs like libav, libvorbis, libvpx, etc. YMMV.

Continue reading “Raspberry Pi can do fast video encoding”

Clone hard disk with rsync

I recently wanted to move a system over to a faster, larger SSD. I didn’t want to have to re-install an OS, figure out which old files to transfer over, and then re-configure everything. That’s not a fun time in my book.

Here’s what I did (on a live system, yeah!) to clone my disk. Note that this may cause data loss, don’t blame me, keep backups, blah blah…

First, use a partition tool like GNU parted to create a nice big partition on the new drive and mark it as bootable. Leave some space for other partitions or swap space. If you use a separate /boot partition, then I think that needs the bootable flag instead. I’m only using a single root partition and swap. For the purposes of this tutorial, I’ll call my new root partition /dev/sdb1. YMMV.

Wait a while.

Take note of the UUID listed for /dev/sdb.

Or use whatever editor you like and put the UUID for /dev/sdb in place of the existing UUID for /.

Now you should just need to swap out the drives.