The trashiest blog in the World...


French documentation, IRC and searching

bochecha on #fedora-fr told me that it would be a good idea to translate my last blog post into English. So, here it is! Thanks to him for his help with this translation :-)

In March 2008, eponyme announced the arrival on the French IRC channels of a bot he had developed: trustyRC.

For about two years now, trustyRC has tirelessly answered users' requests about the French documentation, the FAS (Fedora Account System), ...

But he's now tired. eponyme is off on new adventures, and two major issues remain with trustyRC:

  • FAS data has to be updated by hand; that was rarely done (someone has to think of it, and have the guts, we all know that!),
  • search within the French Fedora documentation only queries MediaWiki and gets an HTML result, although everyone knows that MediaWiki's built-in search is rather limited, and is simply not functional. As such, search results are not as relevant as we would like them to be, but let's not blame poor trustyRC.

Recently, I've been taking a close look at Apache Solr (a Lucene-based search solution); I've also recently added the php-pecl-solr extension to Fedora's repositories.


My goal was to index the French documentation wiki; because I know the data and the queries that are performed (at least from IRC) quite well, it was a good point of comparison for me.
The result is quite impressive: Solr's search power really cannot be compared to what a "simple" PHP application like MediaWiki can provide. Solr can, among other things, fold accented characters (like our beloved 'é', 'è', or 'à' ;-) ), lowercase characters, split terms, highlight matches, provide faceting, ... For example, searches for réseau, reseau and network would produce - on the index I'm working on - the same results.
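To give an idea of why réseau, reseau and network can end up matching the same documents, here is a minimal Python sketch of that kind of term normalization. This is not Solr's code and not my index configuration: the `SYNONYMS` table and the function names are made up for the illustration, and a real Solr setup does this through its own analysis configuration.

```python
import unicodedata

# Hypothetical synonym table, for illustration only; the synonym
# configuration of the real index is not shown in this post.
SYNONYMS = {"network": "reseau"}


def fold(term):
    """Lowercase a term and strip its diacritics ('Réseau' becomes 'reseau')."""
    decomposed = unicodedata.normalize("NFKD", term.lower())
    # After NFKD, an accented letter is a base letter followed by a
    # combining mark; dropping the combining marks removes the accent.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Split on whitespace, fold each term, then map synonyms to one form."""
    terms = [fold(t) for t in text.split()]
    return [SYNONYMS.get(t, t) for t in terms]
```

If the same normalization is applied to the documents at indexing time and to the query at search time, the three spellings all reduce to the single term `reseau`.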

In order to test that indexing and search system, I needed a public querying interface. That was a good opportunity to try out several IRC bots. I decided not to contribute to trustyRC, mainly because I do not have the required skills. Instead, I looked at and tested several existing Python bots; I found a few, but only Supybot really satisfied me (in fact, it was the only one that did not reconnect to the freenode network every five minutes :( ).

The result? A Supybot plugin for the French documentation, running on a bot connected to the French Fedora IRC channels under the name MrBot!

External plugins loaded into this Supybot instance are:

  • the French documentation search plugin (developed by myself, sources are available under the BSD license),
  • an (X)HTML validation plugin, just for fun (I developed it as well; it's based on the Phenny validation plugin, and the source code is also available under the BSD license),
  • the Fedora plugin, which you can use to query FAS and find out, for example, who maintains a specific package,
  • the Koji plugin, which gives some information about the Koji builders,
  • the Bugzilla plugin, which displays details for each valid Bugzilla URL entered (or for a string like bug #{bug number}).
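As an illustration of how a bot can spot bug references in channel messages, here is a small Python sketch. It is only a guess at the general approach: the actual patterns used by the Bugzilla plugin are not shown in this post, and `BUG_PATTERN` and `find_bug_ids` are names invented for the example.

```python
import re

# Assumed pattern: either a Bugzilla URL of the form "show_bug.cgi?id=N"
# or the text "bug #N". The real plugin may recognize other forms.
BUG_PATTERN = re.compile(r"(?:show_bug\.cgi\?id=|\bbug\s*#)(\d+)", re.IGNORECASE)


def find_bug_ids(message):
    """Return the bug numbers mentioned in an IRC message, in order."""
    return [int(number) for number in BUG_PATTERN.findall(message)]
```

The bot would then look up each returned number and print the bug details on the channel.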

MrBot usage for the documentation is as follows:

  • .wiki what I search: search in wiki titles. If that does not return any results, it will automatically perform a plain text search,
  • .wiki plain what I search: force a plain text search only,
  • .wiki solr {solr query}: query with a Solr request (mainly for my own testing).

Searching with the wiki command will return two links at most; the number of results not shown will also be returned.
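The behaviour of the wiki command (titles first, plain text as a fallback, two links at most) can be summed up with the following Python sketch. It is a simplified illustration, not the plugin's source: `search_titles` and `search_plain` are hypothetical callables standing in for the real Solr queries, each returning a list of links.

```python
MAX_LINKS = 2  # the bot shows two links at most


def wiki_search(query, search_titles, search_plain, force_plain=False):
    """Return (links to show, number of results not shown).

    search_titles and search_plain are hypothetical callables that take
    the query string and return a list of matching links.
    """
    # ".wiki plain ..." skips the title search entirely.
    results = [] if force_plain else search_titles(query)
    # No title matched (or plain search forced): fall back to plain text.
    if not results:
        results = search_plain(query)
    shown = results[:MAX_LINKS]
    return shown, len(results) - len(shown)
```

For example, if the title search finds nothing and the plain text search finds five pages, the bot shows the first two links and reports that three more results were not shown.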

For most functions, it is possible to ask MrBot in private:

  • /msg MrBot wiki what I search

To check a URL's validity against the W3C validator:

  • .validate blog.ulysses.fr
  • .validate http://blog.ulysses.fr
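Both forms above are accepted, which means the address has to be normalized before it is sent to the validator. A minimal Python sketch of that step could look like this; `normalize_url` is a name invented for the example, and defaulting to http:// is an assumption rather than a description of the plugin's exact code.

```python
def normalize_url(argument):
    """Add a default scheme when the user typed a bare host name."""
    argument = argument.strip()
    # Assumption: anything without an explicit scheme is treated as http.
    if "://" not in argument:
        argument = "http://" + argument
    return argument
```

With this, `.validate blog.ulysses.fr` and `.validate http://blog.ulysses.fr` end up checking the same address.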

Querying Fedora services:

  • .whoowns package: returns the package maintainer's name (FAS)
  • .fas fasname: returns the FAS account information for the user. You should probably use this command in private rather than on a channel.
  • .branches package: returns the list of active branches for the specified package
  • .what package: returns a brief package description
  • .list Fedora: shows the available commands for the Fedora plugin