(Mis)Adventure With Spell Checking

June 24, 2017

Categories: Technical Tags: Linux

An Old Question

What’s the difference between aspell, ispell, myspell, gspell, gtkspell, hunspell, uspell, hspell? The answer always was: I dread to even care to know. Just one of those things it’s better to be willfully ignorant of.

Then suddenly I needed to know.

Spell checking for my native chat client

Lately I have grown very fond of weechat. I am not exactly like this guy, but I have always had an affinity for IRC. Is it dying? Sure. But a lot of the chat services coming into vogue are just well-embellished IRC, so it probably doesn’t take much to bridge the gap. Gitter works seamlessly here. Slack offers a gateway too, I think. If you use BitlBee, you suddenly have an array of protocols supported through plugins. And with the much anticipated Matrix protocol in the offing, the possibilities are vast and weechat has a role to play right now.

But that’s IRC; I love weechat in particular. It is software after my own heart: very modular and customisable, and extensible through its fantastic scripting support (Perl, Python, Ruby, Lua, Guile and Tcl). I spent a bit of last night getting notifications to work through Dunst (and libnotify). There are a fair number of options, but I went with notifym.pl. It’s simple, powerful and even kinda fun.

It’s no wonder that spell checking support is also delegated to a plugin. I had tried weechat on Manjaro and it worked out of the box, but my daily driver is Void Linux, and one of the things I love it for is its package granularity. It was hardly surprising that the plugins are neatly separated into subpackages. You only install what you need, and the dependencies are automatically accounted for. So I installed the plugin and idly observed that a new package called enchant was also installed.

Except it didn’t work. Doing /aspell listdict shows that it can’t even find a dictionary, but I could swear that I have the packages aspell and aspell-en installed. Running aspell dicts in the shell reveals that I do have the dict files on the system and that aspell can find them. In fact I use them in Emacs, so why can’t weechat find them? This reeks of a path issue, but my system places the dict files in /usr/share/dict which is pretty standard, right?

The thick plottens

So what the hell? Instead of being logical, I went after a lot of red herrings, but that’s only obvious in retrospect. Not much was clear from the source of the plugin, except that it seemed to delegate the task of finding dictionaries to libenchant. More reading revealed that enchant is not a spellchecker in itself, but an intermediate layer that wraps the various incompatible spellcheckers behind one stable API. Nice, now what’s wrong here?
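The layering is easier to picture with a toy model. This is only a sketch of the general idea, not Enchant's actual code: the class names, backend names and language lists are all made up for illustration.

```python
class Provider:
    """Stand-in for one spell-checking backend (hypothetical)."""

    def __init__(self, name, languages):
        self.name = name
        self.languages = set(languages)

    def has_dict(self, tag):
        return tag in self.languages


class Broker:
    """The single entry point an application talks to."""

    def __init__(self, providers):
        self.providers = list(providers)

    def request_dict(self, tag):
        # Ask each backend in turn; the caller never sees which one answers.
        for provider in self.providers:
            if provider.has_dict(tag):
                return f"{tag} via {provider.name}"
        raise LookupError(f"no provider has a dictionary for {tag}")


# Made-up backends and languages, purely for illustration.
broker = Broker([
    Provider("backend_a", ["en_GB"]),
    Provider("backend_b", ["de_DE", "en_US"]),
])
print(broker.request_dict("en_US"))
```

This prints "en_US via backend_b". The point of the sketch is the failure mode: if none of the registered backends can supply a dictionary, the request fails no matter what else is installed on the system.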

Enchant was definitely not finding my aspell dicts. To drive home the point:

$ echo 'ths is misspllt' | enchant -a
@(#) International Ispell Version 3.1.20 (but really Enchant 1.6.0)
Couldn't create a dictionary for en_GB.UTF-8

Or in Python:

>>> import enchant
>>> enchant.list_dicts()

I still didn’t know what to make of that. Also, why RTFM when you have got strace? /s

$ strace 2> log weechat
$ vim log

Strace records all the system calls a program makes at runtime. I asked the aspell plugin to list dictionaries once again and promptly exited weechat. Then I went backwards through the log and immediately found something:

open("/home/natrys/.config/enchant/myspell", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/natrys/.enchant/myspell", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/share/myspell/dicts", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/share/myspell/dicts", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/share/enchant/myspell", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/share/hunspell", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

So those are the places it looked. I don’t know what myspell is, though I have heard of hunspell. The Enchant manual said it supports aspell and myspell, but made no mention of hunspell. And also:

$ enchant-lsmod
myspell (Myspell Provider)

But my package manager shows there is no such thing as myspell in the Void repo. I hit Google; the entry for it in the Debian repo is a dummy transitional package for … hunspell?! The wiki clarified: myspell was a predecessor of hunspell, and the latter is backward compatible, so things are kinda rotten. I installed a hunspell dict and now everything works. Now I see that the Void wiki even has a related entry, but it’s typically a bit threadbare so I didn’t think of looking into it at first.
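Scrolling through a strace log by hand worked here, but the same search can be done mechanically. Below is a minimal sketch in Python that pulls out every path an open() or openat() call failed to find. The function name missing_paths and the keyword filter are my own inventions, and the regular expression only assumes the line format shown in the log excerpt above.

```python
import re

# Matches failed lookups such as
#   open("/usr/share/hunspell", O_RDONLY|...) = -1 ENOENT (No such file or directory)
# and the openat(AT_FDCWD, "/path", ...) form that newer systems tend to log.
FAILED_OPEN = re.compile(
    r'\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]+)".*=\s*-1\s+ENOENT'
)


def missing_paths(log_lines, keyword=""):
    """Return paths that could not be opened, in log order, without repeats."""
    found = []
    for line in log_lines:
        match = FAILED_OPEN.search(line)
        if not match:
            continue
        path = match.group(1)
        if keyword in path and path not in found:
            found.append(path)
    return found


# Two lines in the format of the excerpt above, plus one successful call.
sample = [
    'open("/usr/share/myspell/dicts", O_RDONLY|O_DIRECTORY) = -1 ENOENT (No such file or directory)',
    'open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'open("/usr/share/hunspell", O_RDONLY|O_DIRECTORY) = -1 ENOENT (No such file or directory)',
]
for path in missing_paths(sample, keyword="spell"):
    print(path)
```

For a real log, passing an open file object instead of the sample list works the same way, since the function only iterates over lines.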

So, just two questions remain. Why do the weechat people call it the ‘aspell’ plugin then? It makes no particular sense. I guess the answer is that initially they only supported aspell, hence the name, but at some later point they switched to libenchant.

And secondly, enchant should still work with aspell. Why does it not? The answer turned out to be in the template that builds enchant, specifically here:

configure_args="--disable-zemberek --disable-ispell --disable-aspell --with-myspell-dir=/usr/share/hunspell"

Ah, so my distro explicitly disables it. I found the exact pull request where that happened; it’s a couple of years old already. I suppose the reasoning provided does make sense.
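To read a line like that at a glance, the flags can be split into what is switched off and what is configured. This is just a small sketch: the string is the configure_args value quoted above, while the summarize helper is a made-up name and only understands the --disable-X and --with-X=VALUE forms.

```python
import shlex

# The configure_args value quoted above, copied verbatim.
CONFIGURE_ARGS = (
    "--disable-zemberek --disable-ispell --disable-aspell "
    "--with-myspell-dir=/usr/share/hunspell"
)


def summarize(configure_args):
    """Split configure flags into disabled features and --with-* settings."""
    disabled, settings = [], {}
    for arg in shlex.split(configure_args):
        if arg.startswith("--disable-"):
            disabled.append(arg[len("--disable-"):])
        elif arg.startswith("--with-") and "=" in arg:
            key, value = arg[len("--with-"):].split("=", 1)
            settings[key] = value
    return disabled, settings


disabled, settings = summarize(CONFIGURE_ARGS)
print("disabled:", ", ".join(disabled))
print("myspell-dir:", settings.get("myspell-dir"))
```

The disabled list includes aspell, and the myspell directory points at /usr/share/hunspell, which is the last path in the strace excerpt.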

Here is a reddit thread that I wish I found sooner:

And for an alternative perspective which I am not sure what to make of:

In the end, things are still sometimes problematic. But solutions are usually also in plain sight.