Hacking


7
Jan 10

Spanish and Galician dictionaries for Vim 7

If you write long, literary texts (such as end-user documentation) in Spanish and/or Galician, and Vim 7 is your editor of choice, you may want to download the spell checker dictionaries for those languages. To install them, drop the files into ~/.vim/spell (you may need to create that directory first).

To enable a dictionary, just run the corresponding Ex command. For Galician, that would be:

  :set spell spelllang=gl

If you use modelines in your text files, you may want to put those settings there as well: that is an easy way to choose a different language for each file. Also, do not forget to take a look at the spell checker documentation (:help spell) to learn more about it (tip: some of its keybindings, such as ]s to jump to the next misspelled word and z= to get suggestions, are really useful).

Note that you will only be able to edit UTF-8 texts. I have not crafted ISO-8859-1 versions of the dictionary tables because nobody should be using an encoding other than UTF-8 nowadays (for a number of good reasons). If someone has a strong need for ISO-encoded tables, please let me know.

Last but not least, let me stress that the dictionaries were converted from the ones used by OpenOffice.org, plus some small patches I took from the Vim SVN repository. Big thanks go to all the people working on both projects. I am not an expert in legal matters, but as the source files are under the LGPL, I think it is safe to assume that the Spanish and Galician dictionaries generated from them are LGPL’d, too.

Remember: it is always good to deliver well-written documents. Happy 2010 ;-)


14
Jul 09

Embedding widgets from other processes with PyGTK

After discovering the Surf minimalistic web browser, I was curious about how difficult it would be to use the XEmbed protocol to wrap it into another application. It turns out to be far easier than I initially thought, thanks to the gtk.Socket class, which implements the protocol in a convenient way. For example, the following Python code does the job:

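import gtk, sys

# Usage: python embed.py <window-id>
socket = gtk.Socket()
window = gtk.Window()
window.set_title(u"Embedded widget")
window.connect("destroy", gtk.main_quit)  # quit the main loop when the window is closed
window.add(socket)

# Embed *after* inserting the socket in a window!
# Base 0 accepts the X window ID both in decimal and in 0x-prefixed hex.
socket.add_id(int(sys.argv[1], 0))
window.show_all()
gtk.main()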

Save this in a file (e.g. embed.py), and then you can run Surf as follows:

# Running surf with -e will print the X window ID
surf -e -u http://google.com &
python embed.py <window-id>

Easy, wasn’t it? ;-)

(Thanks go to Claudio for pointing out the gtk.Socket/gtk.Plug classes)


8
Jul 09

Making Claws-Mail look better

Those of you who use Claws-Mail on a daily basis and like to tune how your Gnome desktop looks by means of the fine themes available for it have surely noticed the weird, ugly 2-pixel spacing between the toolbars and the window border. This is not only unpleasant to see, but also totally breaks some dark themes (e.g. Dust). The bad news: the padding is hardwired in the Claws source code. The good news: there is a small patch I made which will make your day happier :-D

Update: The patch has already been included in the Claws-Mail repository. Thank you, guys!


3
Jul 09

Light charset detection (mostly CJK)

What happens when you have a pile of text you want to convert to a sane encoding like UTF-8, but you do not know which encoding it currently uses? In general, you have two options:

  • Trying all possible encodings. This may be more or less difficult depending on the language in which the text is written, because some languages can be written in a number of encodings. For example, the encodings covering Cyrillic characters are a mess: Macintosh Russian encoding, Windows CP1251, KOI-8 (and several variants of it), ISO 8859-5…
  • Asking the author of the text. This may not be feasible at all, as you may not even know who the author is :-(
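The first option can be sketched in a few lines of Python: decode the bytes with every candidate encoding and keep only the ones that do not fail. This is a minimal sketch; the candidate list is an assumption covering the Cyrillic encodings mentioned above, spelled with Python's codec names.

```python
# Assumed candidate list (Python codec names), in order of preference.
CANDIDATES = ["utf-8", "koi8-r", "cp1251", "mac-cyrillic", "iso8859-5"]


def try_encodings(data: bytes, candidates=CANDIDATES):
    """Return (encoding, text) pairs for every candidate that decodes data cleanly."""
    results = []
    for name in candidates:
        try:
            results.append((name, data.decode(name)))
        except UnicodeDecodeError:
            continue  # this encoding cannot produce these bytes; drop it
    return results


for name, text in try_encodings("Привет".encode("koi8-r")):
    print(name, text)
```

Keep in mind that single-byte encodings such as KOI8-R or ISO 8859-5 map nearly every byte to some character, so several candidates will usually "succeed". Brute force narrows the list, but you still have to look at the output to pick the right one.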

But there is another option: detecting it programmatically. This is one of the things Enca can do for a variety of languages. But imagine for a second that you want similar functionality, you are mostly handling Unicode and CJK (Chinese-Japanese-Korean) text in different encodings, and you prefer a lighter solution than Enca… enter GChardet, a wrapper on top of the Mozilla encoding detection routines (as used e.g. by Firefox) with a plain C interface designed to blend nicely with code using Glib.
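Detectors of this kind usually begin with cheap checks before doing any statistical work. The following minimal Python sketch is not GChardet's code, and `quick_guess` is a hypothetical name. It covers only the Unicode side: it looks for a byte-order mark, then checks whether the data is valid UTF-8, and returns None when a real detector is needed (e.g. for legacy CJK encodings).

```python
import codecs

# Byte-order marks mapped to Python codecs that strip the BOM when decoding.
# UTF-32-LE (FF FE 00 00) must be tested before UTF-16-LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def quick_guess(data: bytes):
    """Cheap first pass: BOM sniffing, then strict UTF-8 validation.

    Returns a codec name, or None if a statistical detector is needed.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # hand the bytes over to a heavier detector
    return "utf-8"


print(quick_guess("日本語".encode("shift_jis")))
```

Plain ASCII input is reported as UTF-8, which is harmless because ASCII is a subset of it.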

This is a nice hack I did in a couple of hours by adding some definitions in fake header files, because the detection code is not totally isolated from the rest of the Mozilla code base. Also, to provide the C-only API I had to do some subclassing and override a couple of methods. After that, adding the “G-friendly” API on top was straightforward. What I like most about this solution is that it compiles to a small library of ~120 kB on amd64, and the original Mozilla sources were not touched at all.
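The subclass-and-override trick can be illustrated with a hypothetical Python sketch of the same pattern (it is neither GChardet's nor Mozilla's code, and all names here are made up). A detector base class announces its verdict through an overridable callback, a subclass overrides that callback to simply store the verdict, and a plain function on top hides the class entirely. The toy detection logic, which drops a candidate as soon as its incremental decoder rejects the input, only stands in for the real probers.

```python
import codecs


class CallbackDetector:
    """Hypothetical detector that announces its result through report()."""

    CANDIDATES = ("utf-8", "euc-jp", "shift_jis", "gb18030", "euc-kr")

    def __init__(self):
        # One incremental decoder per candidate, so input may arrive in chunks.
        self._decoders = {name: codecs.getincrementaldecoder(name)()
                          for name in self.CANDIDATES}

    def _try(self, chunk: bytes, final: bool) -> None:
        # Drop every candidate whose decoder rejects the bytes seen so far.
        for name, dec in list(self._decoders.items()):
            try:
                dec.decode(chunk, final)
            except UnicodeDecodeError:
                del self._decoders[name]

    def feed(self, chunk: bytes) -> None:
        self._try(chunk, final=False)

    def close(self) -> None:
        # A truncated multi-byte sequence at the end also eliminates a candidate.
        self._try(b"", final=True)
        survivors = [n for n in self.CANDIDATES if n in self._decoders]
        self.report(survivors[0] if survivors else None)

    def report(self, charset):
        raise NotImplementedError  # subclasses decide what to do with the verdict


class _StoringDetector(CallbackDetector):
    """Overrides the callback to keep the verdict instead of acting on it."""

    def __init__(self):
        super().__init__()
        self.result = None

    def report(self, charset):
        self.result = charset


def detect(data: bytes, chunk_size: int = 4096):
    """Plain, callback-free API: bytes in, encoding name (or None) out."""
    det = _StoringDetector()
    for start in range(0, len(data), chunk_size):
        det.feed(data[start:start + chunk_size])
    det.close()
    return det.result


print(detect("日本語".encode("shift_jis")))
```

Picking the first surviving candidate in a fixed preference order is a crude tie-break. The Mozilla code instead ranks candidates using character-distribution statistics, which is what makes it useful for CJK text.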

Just in case this could be useful for someone else, I have uploaded it to Gitorious. Feel free to clone the repository, use it, and provide feedback. By the way: as this uses Mozilla code, I have set the license to MPL.