Code archived articles


Code and Debian and Free software and Maps and Openstreetmap – 25 Aug 2015 at 10:52 by Jean-Marc Liotier

There you are, in some Openstreetmap editor, correcting the same typo for the 16th time, cursing contributors who neglect correct capitalization and thinking about how tedious this necessary data gardening is. While JOSM is endowed with unfathomable depths of cartographic potentiality, you long for a way to simply whip out your favourite text editor and apply its familiar power to the pedestrian problem of repeatedly editing text. Or the problem requires editing multiple mutually dependent tags and some XML-aware logic is therefore required – all the same: you just want to perform Openstreetmap editing as text processing.

Of course, as an experienced Openstreetmap gardener, you are well aware of the dangers of casually wielding a rather large chainsaw around our burgeoning yet fragile data nursery. So you understand why automated processing is generally not conducive to improvement in data quality – rare is the automation whose grasp of context equals human judgment. But human judgment could use some power tools… So there.

My overall workflow shall be as follows:

0 – Read the Automated Edits code of conduct
1 – Get data
2 – Edit data
3 – Review data
4 – Commit data

The meticulous reader might object that making the review an explicit step separate from the editing is superfluous, since no self-respecting cartographer would commit edited data without having performed a review as a mandatory step integral to editing. But the reader who closely observes Openstreetmap activity might counter that this level of self-disciplined care is not universal, so the step is worth mentioning. Moreover, I’ll add that as soon as any level of automation is introduced, I consider reviewing a necessary checklist item.

So, first let’s get the data ! There are many ways… The normal JOSM way of course – but your mass editing requirement probably means that you wish to edit a body of data much larger than what the Openstreetmap servers will let JOSM download at once – and, if you ever had to repeatedly download rectangles until you had covered your whole working area, you don’t want to do it again.

To illustrate this article, I chose to edit places of worship in Senegal (I am a rather active Openstreetmap contributor for Senegal and places of worship are socially and cartographically important landmarks). This dataset is rather small – in such cases you might want to peruse Overpass Turbo. The relevant Overpass Turbo query is as follows:

[out:xml][timeout:25];
 {{geocodeArea:Senegal}}->.searchArea;
 (
 node["amenity"="place_of_worship"](area.searchArea);
 way["amenity"="place_of_worship"](area.searchArea);
 relation["amenity"="place_of_worship"](area.searchArea);
 );
 out meta;
 >;
 out meta qt;

Another option, viable even for targeting the whole planet, is to use Osmosis (package available from good distributions) to filter a planet extract:

wget http://download.geofabrik.de/africa-latest.osm.pbf
 osmosis \
 --read-pbf file=africa-latest.osm.pbf \
 --bounding-box top=16.7977 bottom=12.0832 \
 left=-17.6317 right=-11.162 \
 --tag-filter accept-nodes amenity=place_of_worship \
 --tag-filter reject-relations \
 --tag-filter reject-ways outPipe.0=nodesPipe \
 --read-pbf file=africa-latest.osm.pbf \
 --bounding-box top=16.7977 bottom=12.0832 \
 left=-17.6317 right=-11.162 \
 --tag-filter accept-ways amenity=place_of_worship \
 --tag-filter reject-relations \
 --used-node outPipe.0=waysPipe \
 --merge inPipe.0=nodesPipe inPipe.1=waysPipe \
 --write-xml senegal-place_of_worship.osm

Yes, I didn’t take relations into account – there are only a couple of amenity=place_of_worship relations in Senegal’s Openstreetmap data… So adding relations to this query is left as an exercise for the reader.
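
If you do want to tackle that exercise, here is roughly where I would start – an untested sketch that only extracts the relations themselves, without pulling in their member ways and nodes, which is the genuinely fiddly part :

# Untested sketch : a third pipeline for relations, analogous to the two
# above – shown here writing to its own file ; to fold it into the command
# above, give it an outPipe.0 name and chain a second --merge (each --merge
# task only accepts two inputs).
osmosis \
 --read-pbf file=africa-latest.osm.pbf \
 --bounding-box top=16.7977 bottom=12.0832 \
 left=-17.6317 right=-11.162 \
 --tag-filter accept-relations amenity=place_of_worship \
 --tag-filter reject-ways \
 --tag-filter reject-nodes \
 --write-xml senegal-place_of_worship_relations.osm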

A gigabyte download and a couple of minutes of osmosis execution later, your data is ready and you have found new appreciation of how fast Overpass Turbo is. Our Osmosis computation might have been a little faster if there were a Senegal planet extract available, but we had to make do with taking the whole of Africa as input and filtering it through a bounding box.

By the way, the dedicated reader who assiduously tries to reproduce my work might notice that the two methods don’t return the same data. This is because the Overpass Turbo query filters properly by intersection with Senegal’s national borders whereas my Osmosis command uses a rectangular bounding box that includes bits of Mauritania, Mali, Guinea and Guinea Bissau. One can feed Osmosis a polygon produced out of the national borders relations, but I have not bothered with that.
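
For reference, switching to a polygon would barely change the invocation – something like this untested fragment, where senegal.poly is a polygon filter file that you would have to produce yourself, for instance from the boundary relation or from one of the .poly files that Geofabrik publishes alongside its extracts :

# Untested sketch : the nodes pipeline from above, with the rectangular
# bounding box replaced by a polygon filter file.
osmosis \
 --read-pbf file=africa-latest.osm.pbf \
 --bounding-polygon file=senegal.poly \
 --tag-filter accept-nodes amenity=place_of_worship \
 --tag-filter reject-relations \
 --tag-filter reject-ways \
 --write-xml senegal-place_of_worship_nodes.osm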

Example OSM XML elements extracted by the osmosis query:

<way id="251666247" version="2" timestamp="2014-03-27T22:16:56Z"
      uid="160042" user="Jean-Marc Liotier" changeset="21354510">
   <nd ref="2578488987"/>
   <nd ref="2578488988"/>
   <nd ref="2578488989"/>
   <nd ref="2578488990"/>
   <nd ref="2578488991"/>
   <nd ref="2748583071"/>
   <nd ref="2578488987"/>
   <tag k="name" v="Grande mosquée de Ndandia"/>
   <tag k="source" v="Microsoft Bing orbital imagery"/>
   <tag k="amenity" v="place_of_worship"/>
   <tag k="religion" v="muslim"/>
   <tag k="denomination" v="sunni"/>
</way>

<node id="2833508543" version="2" timestamp="2014-09-01T09:57:07Z"
      uid="160042" user="Jean-Marc Liotier" changeset="25155955"
      lat="14.7069108" lon="-17.      4580774">
   <tag k="name" v="Mosquée Mèye Kane"/>
   <tag k="amenity" v="place_of_worship"/>
   <tag k="religion" v="muslim"/>
</node>

<node id="2578488987" version="1" timestamp="2013-12-13T17:49:40Z"
   uid="1219752" user="fayecheikh75" changeset="19436962"
   lat="14.2258174" lon="-16.8134644"/>

The first two example OSM XML elements will not come as a surprise: they both contain <tag k="amenity" v="place_of_worship"/> – but what about the third, which does not ? Take a look at its node id – you’ll find it referred to by one of the first example’s <nd /> elements, which means that this node is one of the six that compose this way. Including nodes used by the ways selected by the query is the role of the --used-node option in the osmosis command.

But anyway, why are we including nodes used by the ways selected by the query ? In the present use-case, I only care about correcting trivial naming errors – so why should I care about the way’s geometry ? Well… Remember the step “3 – Review data” ? Thanks to being able to display the way geometrically, I could see that an English-language name was probably not an error, because the node bearing it is located in Guinea Bissau, not in Senegal, where it would definitely be an error outside of a name:en tag. Lacking this information I would have erroneously translated the name into French. Actually, I did at first – and only corrected my error after having reviewed my data in JOSM. Lesson learned !

Talking about reviewing, is your selection of data correct ? Again, one way to find out is to load it in JOSM to check tags and geographic positions.

And while in JOSM, you might also want to refresh your data – it might have become stale while you were mucking around with osmosis (do you really think I got the query right the first time ?) and Geofabrik’s Planet extracts are only daily anyway… So hit Ctrl-U to update your data – and then save the file.

This concludes step “1 – Get data” – let’s move on to step ‘2 – Edit data’ ! First, do not edit the file you just saved: we will need it later to determine what we have modified. So produce a copy, which is what we’ll edit – execute ‘cp senegal-place_of_worship.osm senegal-place_of_worship.mod.osm’ for example.

Now take your favourite text processing device and go at the data ! I used Vim – here is how it looks:

A few edits later:

$ diff -U 0 senegal-place_of_worship.osm senegal-place_of_worship.mod.osm \
  | grep ^@ | wc -l
 164
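
Most of those edits were mechanical fixes of the same few kinds. Purely for illustration – the pattern is made up and I actually performed the edits interactively in Vim – a scripted version of one such fix could look like this :

# Illustration only : fix a recurring capitalization error in name tags,
# editing the working copy in place.
sed -i 's/\(<tag k="name" v="[^"]*\)TALL"/\1Tall"/' senegal-place_of_worship.mod.osm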

As an example of modification, let’s have a look at this node :

<node id="2165307529" version="1" timestamp="2013-02-21T13:40:05Z"
      uid="1234702" user="malamine19" changeset="15112373"
      lat="16.0326014" lon="-16.5084412">
   <tag k="name" v="Mosquée Sidy TALL"/>
   <tag k="amenity" v="place_of_worship"/>
   <tag k="religion" v="muslim"/>
</node>

A few keystrokes later, its name’s capitalization is fixed :

<node id="2165307529" version="1" timestamp="2013-02-21T13:40:05Z"
      uid="1234702" user="malamine19" changeset="15112373"
      lat="16.0326014" lon="-16.5084412">
   <tag k="name" v="Mosquée Sidy Tall"/>
   <tag k="amenity" v="place_of_worship"/>
   <tag k="religion" v="muslim"/>
</node>

Let’s open the file in JOSM and upload this awesome edit to Openstreetmap – here we go !

“Whaaat ? No changes to upload ? But where are my edits ? You said we could just edit and upload !” – No, and anyway I said that you had to review your data beforehand !

Fear not, your edits are safe (if you saved them before closing your editor…) – it is only JOSM that does not know which objects you edited. Looking at the above data, it has no way to determine whether any part has been edited. We’ll have to tell it !

When JOSM modifies any element of an Openstreetmap object, it marks the Openstreetmap object with an action="modify" attribute. So, we’ll emulate this behaviour.
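
To make what “emulate” means concrete, here is how a single known object could be marked by hand – a hypothetical xmlstarlet one-liner, purely for illustration :

# Hand-marking one known object (illustration only) :
xmlstarlet ed --inplace \
  --insert '//node[@id="2165307529"]' --type attr -n action -v modify \
  senegal-place_of_worship.mod.osm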

“Whaaat ? Do I really have to write or copy/paste action="modify" on the parent Openstreetmap object of every single modification ? You said this article was about automation !” – fear not, I have you covered with this article’s crowning achievement: the OSMXML_mark_modified_JOSM-style script.

Remember when I said earlier “First, do not edit the file you just saved: we will need it later to determine what we have modified. So produce a copy, which is what we’ll edit – ‘cp senegal-place_of_worship.osm senegal-place_of_worship.mod.osm’ ” ? We are now later, and the OSMXML_mark_modified_JOSM-style script will not only determine what we have modified but also mark the parent Openstreetmap object of each modification with an action="modify" attribute.

This blog needs a wider stylesheet, so no inline code in the article – read OSMXML_mark_modified_JOSM-style on Github instead and save me the paraphrasing of my own code ! This script owes everything to XML::SemanticDiff and XML::LibXML – it is a mere ten-line conduit for their blinding awesomeness so all credits go to Shlomi Fish and Kip Hampton.

So, just make sure that you have XML::SemanticDiff and XML::LibXML installed from the CPAN or preferably from your distribution’s packages and execute the command line:

OSMXML_mark_modified_JOSM-style \
    originalOSMfile.xml \
    locally_modified_originalOSMfile.xml

or in our current example

OSMXML_mark_modified_JOSM-style \
    senegal-place_of_worship.osm \
    senegal-place_of_worship.mod.osm

As a result, the parent Openstreetmap object of each modification will have been marked with an action="modify" attribute – as in our example object:

<node id="2165307529" version="1" timestamp="2013-02-21T13:40:05Z"
      uid="1234702" user="malamine19" changeset="15112373"
      lat="16.0326014" lon="-16.5084412" action="modify">
   <tag k="name" v="Mosquée Sidy Tall"/>
   <tag k="amenity" v="place_of_worship"/>
   <tag k="religion" v="muslim"/>
</node>

Now open the modified file in JOSM and review the result. As I mention in passing in the script’s comments: BLOODY SERIOUSLY REVIEW YOUR CONTRIBUTION IN JOSM BEFORE UPLOADING OR THE OPENSTREETMAP USERS WILL COME TO EAT YOU ALIVE IN YOUR SLEEP ! Seriously though, take care : mindless automatons that trample the daisies are a grievous Openstreetmap faux pas. The Automated Edits code of conduct is mandatory reading.

Ok, I guess you got the message – you can now upload to Openstreetmap:

If you spent too long editing, you might encounter conflicts. Carefully resolve them without stepping on anyone’s toes… And enjoy the map !

Incidentally, this is my first time using XML::LibXML and actually understanding what I’m doing – I love it and there will be more of that !

Code and Free software and Networking & telecommunications and Systems administration and Unix – 01 Mar 2011 at 20:06 by Jean-Marc Liotier

I loathe Facebook and its repressive user-hostile policy that provides no value to the rest of the Web. But like that old IRC channel known by some of you, I keep an account there because some people I like & love are only there. I seldom go to Facebook unless some event, such as a comment on one of the posts that I post there through Pixelpipe, triggers a notification by mail. I would like to treat IRC that way: keeping an IRC application open and connected is difficult when mobile or when using the stupid locked-down mandatory corporate Windows workstation, and I’m keen to eliminate that attention-hogging stream from my environment – especially when an average of two people post a dozen lines a day, most of which are greetings and mealtimes notifications. But when a discussion flares up there, it is excellent discussion… And you never know when that will happen – so you need to keep an eye on the channel. Let’s delegate the watching to some automation !

So let me introduce to you my latest short script : bipIRCnickmailnotify.sh – it sends IRC log lines by mail when a specific string is mentioned by other users. Of course in the present use case I set it up to watch for occurrences of my nickname, but I could have set it to watch any other string. The IRC logging is done by the bip IRC proxy that among other things keeps me permanently present on my IRC channels of choice and provides me with the full backlog whenever I join with a regular IRC client.

This Unix shell script also uses ‘since’ – a Unix utility similar to ‘tail’ that, unlike ‘tail’, only shows the lines appended since the last execution. I’m sure that ‘since’ will come in handy in the future !
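
To give an idea of how little there is to it, here is a minimal sketch of the principle – not the actual bipIRCnickmailnotify.sh, and the log path, watched string and mail address are placeholders to adjust to your own bip setup (the real script also takes care of ignoring your own lines) :

#!/bin/sh
# Minimal sketch : mail any new log lines mentioning the watched string.
LOG="$HOME/.bip/logs/freenode/#mychannel.log"   # assumed bip log location
WATCH="jim"                                     # string to watch for
MAILTO="jim@example.org"
mentions=$(since "$LOG" | grep -i "$WATCH")
[ -n "$mentions" ] && echo "$mentions" | mail -s "IRC mention of $WATCH" "$MAILTO"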

So there… I no longer have to monitor IRC – bipIRCnickmailnotify.sh does it for me.

With trivial modification and the right library it could soon do XMPP notifications too – send me an instant message if my presence is ‘available’ and mail otherwise. See you next version !

Code and Mobile computing and Social networking and The Web – 01 Sep 2010 at 13:58 by Jean-Marc Liotier

Twenty two days ago, my periodically running script ceased to produce any check-ins on Brightkite. A quick look at the output showed that the format of the returned place object had changed. Had I used proper XML parsing, that would not have been a problem – but I’m using homely grep, sed and awk… Not robust code in any way, especially when dealing with XML. At least you get a nice illustration of why defensive programming with proper tools is good for you.

So here is a new update of latitude2brightkite.sh – a script that checks-in your Google Latitude position to Brightkite using the Brightkite API and the Google Public Location Badge. Description of the whole contraption may be found in the initial announcement.

The changes are :

% diff latitude2brightkite_old.sh latitude2brightkite.sh
69,70c69,70
< id=`wget -qO- "http://brightkite.com/places/search.xml?q=$latitude%2C$longitude" | grep "<id>" | sed s/\ \ \<id\>// | sed s/\<\\\/id\>//`
< place=`wget -qO- "http://brightkite.com/places/search.xml?q=$latitude%2C$longitude" | grep "<name>" | sed s/\ \ \<name\>// | sed s/\<\\\/name\>//`
---
> id=`wget -qO- "http://brightkite.com/places/search.xml?q=$latitude%2C$longitude" | grep "<id>" | sed s/\ \ \<id\>// | sed s/\<\\\/id\>// | tail -n 1`
> place=`wget -qO- "http://brightkite.com/places/search.xml?q=$latitude%2C$longitude" | grep "<name>" | sed s/\ \ \<name\>// | sed s/\<\\\/name\>// | md5sum | awk '{print $1}'`

I know I should use a revision control system… Posting this diff that does not even fit this blog is yet another reminder that a revision control system is not just for “significant” projects – anything should use one and considering how lightweight Git is in comparison to Subversion, there really is no excuse anymore.

Back to the point… To get the place identifier, I now only take the last line of the field – which is all we need. I md5sum the place name – I only need to compare it to the place name at the time of the former invocation, so an md5sum does the job and keeps me from having to deal with accented characters and newlines… Did I mention how hackish this is ?

Anyway… It works for me™ – get the code !

Code and Design and Systems and Technology – 13 Apr 2010 at 16:27 by Jean-Marc Liotier

Following a link from @Bortzmeyer, I was leafing through Felix von Leitner’s “Source Code Optimization” – a presentation demonstrating how unreadable code is rarely worth the hassle considering how good compilers have become at optimizing nowadays. I have never written a single line of C or Assembler in my whole life – but I like to keep an understanding of what is going on at low level so I sometimes indulge in code tourism.

I got the author’s point, though I must admit that the details of his demonstration flew over my head. But I found the memory access timings table particularly evocative :

Access Cost
Page Fault, file on IDE disk 1.000.000.000 cycles
Page Fault, file in buffer cache 10.000 cycles
Page Fault, file on ram disk 5.000 cycles
Page Fault, zero page 3.000 cycles
Main memory access 200 cycles (Intel says 159)
L3 cache hit 52 cycles (Intel says 36)
L1 cache hit 2 cycles

Of course you know that swapping causes a huge performance hit and you have seen the benchmarks where throughput is reduced to a trickle as soon as the disk is involved. But still I find that quantifying the number of cycles wasted illustrates the point even better. Now you know why programmers insist on keeping memory usage tight.

Code – 24 Mar 2010 at 18:08 by Jean-Marc Liotier

In my inbox right now :

“I agree with you that is not logic to have some 0=OFF and 0=ON but this is the way is coded in this version; HQ will try to improve in next version”.

Can you imagine how someone thought it would be a jolly good idea to have “0” mean “OFF” or “ON” for different variables in the same context… That will make parameter management so much more fun !

Names withheld to protect the somewhat innocents.

Code and Design and Knowledge management and Social networking and The Web – 21 Aug 2009 at 16:01 by Jean-Marc Liotier

LinkedIn’s profile PDF render is a useful service, but its output is lacking in aesthetics. I like the HTML render by Jobspice, especially the one using the Green & Simple template – but I prefer hosting my resume on my own site. This is why since 2003 I have been using the XML Résumé Library. It is an XML and XSL based system for marking up, adding metadata to, and formatting résumés and curricula vitae. Conceptually, it is a perfect tool – and some trivial shell scripting provided me with a fully automated toolchain. But the project has been completely quiet since 2004 – and meanwhile we have seen the rise of the hresume microformat, an interesting case of “less is more” – especially compared to the even heavier HR-XML.
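
For the record, the “trivial shell scripting” amounts to little more than driving the library’s stylesheets through an XSLT processor and FOP – along these lines, with the stylesheet names given purely as placeholders since the library ships its own :

# Sketch of the XML Résumé Library toolchain (stylesheet names are placeholders) :
xsltproc html-stylesheet.xsl resume.xml > resume.html
xsltproc fo-stylesheet.xsl resume.xml > resume.fo
fop resume.fo resume.pdf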

Interestingly, both LinkedIn and Jobspice use hresume. A PHP LinkedIn hResume grabber, part of a WordPress plugin by Brad Touesnard, takes the hresume microformat block from a LinkedIn public profile page and weeds out all the LinkedIn-specific chaff. With pure hresume semantic XHTML, you just have to add CSS to obtain a presentable CV. So my plan is now to use LinkedIn as a resume writing aid and a social networking tool, and use hresume microformatted output extracted from it to host a nice CSS-styled CV on my own site.

Preparing to do that, I went through the “hResume examples in the wild” page of the microformats wiki and selected the favorite styles that I’ll use for inspiration :

Great excuse to play with CSS – and eventually publish an updated CV…

Code and Mobile computing and Social networking and The Web – 17 Jun 2009 at 11:11 by Jean-Marc Liotier

I just released a new update of latitude2brightkite.sh – a script that checks-in your Google Latitude position to Brightkite using the Brightkite REST API and the Google Public Location Badge.

The changes are :

20090607 – 0.3 – The working directory is now a parameter
20090612 – 0.4 – Only post updates if the _name_ of the location changes, not if only the _internal BK id_ of the place does (contribution by Yves Le Jan <inliner@grabeuh.com>).
20090615 – 0.5 – Perl 5.8.8 compatibility of the JSON coordinate parsing (contribution by Jay Rishel <jay@rishel.org>).

Yves’ idea smooths location sampling noise and makes check-ins much more meaningful.

Thanks to Yves and Jay for their contributions ! Maybe it is time for revision control…

Code and Mobile computing and Social networking and The Web – 05 Jun 2009 at 21:43 by Jean-Marc Liotier

Tired of waiting for Google to release a proper Latitude API, I went ahead and scribbled latitude2brightkite.sh – a script that checks-in your Google Latitude position to Brightkite using the Brightkite REST API and the Google Public Location Badge. See my seminal post from yesterday for more information about how I cobbled it together.

Since yesterday I cleaned it up a little, but most of all, as promised, I made it more intelligent by having it compare the current position with the last one, in order to check in with Brightkite only if the Google Latitude position has changed. Not checking in at each invocation will certainly reduce the number of check-ins by 99% – and I’m sure that Brightkite will be thankful for the lesser load on their HTTP servers…
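
The logic is as simple as it sounds – something along these lines, where the state file is a placeholder and the actual latitude2brightkite.sh of course differs in the details :

# Sketch of the “only check in when the position changed” logic :
state="$workdir/last_position"                      # assumed state file
current=$(echo "$latitude $longitude" | md5sum | awk '{print $1}')
previous=$(cat "$state" 2>/dev/null)
if [ "$current" != "$previous" ]; then
    # ... perform the Brightkite check-in with curl here ...
    echo "$current" > "$state"
fi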

So grab the code for latitude2brightkite.sh, put it in your crontab and have more fun with Brightkite and Google Latitude !

There is quite a bit of interest for this script – it seems that I have filled a widely felt need.

Code and Mobile computing and Social networking and The Web – 05 Jun 2009 at 0:51 by Jean-Marc Liotier

Tired of waiting for Google to release a proper Latitude API, I went ahead and scribbled latitude2brightkite.sh – a script that checks-in your Google Latitude position to Brightkite using the Brightkite REST API and the Google Public Location Badge.

This script is an ugly mongrel hack, but that is what you get when an aged script kiddie writes something in a hurry. The right way to do it would be to parse Latitude’s JSON output cleanly using the Perl JSON library. But that dirty prototype took me all of ten minutes to set up while unwinding between meetings, and it now works fine in my crontab.

Apart from Bash, the requirements to run this script are the Perl JSON library (available in Debian as libjson-perl) and Curl.

The main limitation of this script is that your Google Public Location Badge has to be enabled and it has to show the best available location. This means that for this script to work, your location has to be public. The privacy conscious among my readers will surely love it !

This script proves that automatic Google Latitude position check-in in Brightkite can be done, it works for me, and the official Google Latitude API will hopefully soon make it obsolete !

Meanwhile, grab the code for latitude2brightkite.sh, put it in your crontab and have more fun with Brightkite and Google Latitude… To me, this is what both services were missing to become truly usable.

Of course, doing it with “XEP-0080 – User Location” via publish-subscribe (“XEP-0060 – Publish-Subscribe”) would make much more sense than polling an HTTP server all the time, but we are not there yet. Meanwhile this script could be made more intelligent by only checking in with Brightkite if the Google Latitude position has changed. I’ll think about it for the next version…

Code and Debian and Free software and Knowledge management and RSS and Social networking and Systems and Unix – 18 May 2009 at 12:15 by Jean-Marc Liotier

If you want to skip the making-of story, you can go straight to the laconica2IRC.pl script download. Or in case anyone is interested, here is the why and how…

Some of my best friends are die-hard IRC users that make a point of not touching anything remotely looking like a social networking web site, especially if anyone has ever hinted that it could be tagged as “Web 2.0” (whatever that means). As much as I enjoy hanging out with them in our favorite IRC channel, conversations there are sporadic. Most of the time, that club house increasingly looks like an asynchronous forum for short updates posted infrequently on a synchronous medium… Did I just describe microblogging ? Indeed it is a very similar use case, if not the same. And I don’t want to choose between talking to my close accomplices and opening up to the wider world. So I still want to hang out in IRC for a nice chat from time to time, but while I’m out broadcasting dents I want my paranoid autistic friends to get them too. To satisfy that need, I need to have my IRC voice say my dents on the old boys channel.

The data source could be an OpenMicroblogging endpoint, but being lazy I found a far easier solution : use Laconi.ca‘s Web feeds. Such a solution looked easier because there are already heaps of code out there for consuming Web feeds, and it was highly likely that I would find some that I could bend into doing my bidding.

To talk on IRC, I had previously had the opportunity to peruse the Net::IRC library with great satisfaction – so it was an obvious choice. In addition, in spite of being quite incompetent with it, I appreciate Perl and I was looking for an excuse to hack something with it.

With knowledge of the input, the output and the technology I wanted to use, I could start implementing. Being lazy and incompetent, I of course turned to Google to provide me with reusable code that would spare me building the script from the ground up. My laziness was of course quick to be rewarded as I found rssbot.pl by Peter Baudis in the public domain. That script fetches an RSS feed and says the new items in an IRC channel. It was very close to what I wanted to do, and it had no exotic dependencies – only the Net::IRC library (alias libnet-irc-perl in Debian) and XML::RSS (alias libxml-rss-perl in Debian).

So I set upon hacking this script into the shape I wanted. I added IRC password authentication (courtesy of Net::IRC), I commented out a string sanitation loop which I did not understand and whose presence caused the script to malfunction, I pruned out the Laconi.ca user name and extraneous punctuation to have my IRC user “say” my own Identi.ca entries just as if I was typing them myself, and after a few months of testing I finally added an option for @replies filtering so that my IRC buddies are not annoyed by the noise of remote conversations.

I wanted my own IRC user to “say” the output, and that part was very easy because I use Bip, an IRC proxy which supports multiple clients on one IRC server connection. This script was just going to be another client, and that is why I added password authentication. Bip is available in Debian and is very handy : I usually have an IRC client at home, one in the office, occasionally a CGI-IRC, rarely a mobile client and now this script – and to the dwellers of my favorite IRC channel there is no way to tell which one is talking. And whichever client I choose, I never miss anything thanks to logging and replay on login. Screen with a command-line IRC client provides part of this functionality, but the zero-maintenance Bip does so much more and is so reliable that one has to wonder if my friends cling to Irssi and Screen out of sheer traditionalism.

All that remained to do was to launch the script in a sane way. To control this sort of simple and permanently running piece of code and keep it from misbehaving, Daemon is a good tool. Available in Debian, Daemon proved its worth when the RSS file went missing during the Identi.ca upgrade and the script crashed every time it tried to access it, for lack of exception catching. Had I simply put it in an infinite loop, it would have hogged significant resources just by running in circles like a headless chicken. Daemon not only restarted it after each crash, but also killed it after a set number of retries within a set duration – thus preventing any interference with the rest of what runs on our server. Here is the Daemon launch command that I have used :

#!/bin/bash
path=/usr/local/bin/laconica2IRC
daemon -a 16 -L 16 -M 3 -D $path -N -n laconica2IRC_JML -r -O $path/laconica2IRC.log -o $path/laconica2IRC.log $path/laconica2IRC.pl

And that’s it… Less cut and paste from Identi.ca to my favorite IRC channel, and my IRC friends who have not yet adopted microblogging don’t feel left out of my updates anymore. And I can still jump into IRC from time to time for a real time chat. I have the best of both worlds – what more could I ask ?

Sounds good to you ? Grab the laconica2IRC.pl script !

Code and Knowledge management and Social networking and Technology – 29 Jan 2009 at 15:46 by Jean-Marc Liotier

I sometimes get requests for help. Often they are smart questions and I’m actually somewhat relevant to them – for example questions about a script or an article that I wrote, or an experience I had. But sometimes it is not the case. This message I received today is particularly bad, so I thought it might be a public service to share it as an example of what not to do. This one is especially appalling because it comes not from some wet-behind-the-ears teenager to whom I would gracefully have issued a few hints and a gentle reminder of online manners, but from the inside of the corporate network of Wipro – a company that has a reputation as a global IT services organization.

From: xxxxxx.kumar@wipro.com
Subject: Perl
To: jim@liotier.org
Date: Thu, 29 Jan 2009 16:22:32 +0530

Hi Jim,

Could you please help me in finding out the solution for my problem. Iam new to perl i have tried all the options whatever i learned but couldn’t solve. Please revert me if you know the solution.

Here is the problem follows:

Below is the XML in which you could see the lines with AssemblyVersion and Version in each record i need to modify these values depending on some values which i get from perforce. Assuming hardcode values as of now need to change those values upon user wish using Perl. Upon changing these lines it should effect in existing file .

<FileCopyInfo>
<Entries>
<Entry>
<DeviceName>sss</DeviceName>
<ModuleName>general1</ModuleName>
<AssemblyVersion>9</AssemblyVersion>
<Language>default</Language>
<Version>9</Version>
<DisplayName>Speech – eneral</DisplayName>
<UpdateOnlyExisting>false</UpdateOnlyExisting>
</Entry>
<Entry>
<DeviceName>sss</DeviceName>
<ModuleName>general2</ModuleName>
<AssemblyVersion>9</AssemblyVersion>
<Language>default</Language>
<Version>9</Version>
<DisplayName>Speech – recog de_DE</DisplayName>
<UpdateOnlyExisting>false</UpdateOnlyExisting>
</Entry>
</Entries>
</FileCopyInfo>

Thanks & Regards,
Xxxxxxx

From what I gather from the convoluted use of approximative English, the problem is about changing the value of two XML elements in a file. Can anyone believe that this guy has even tried to solve this simple problem on his own ? It is even sadder that he tries to obtain answers by spamming random strangers by mail, soliciting answers that will never be shared with the wider world. The least he could have done is post his message on a Perl forum so that others with similar questions can benefit from the eventual answer.

Had he performed even a cursory Google search, he would have found that one of his compatriots has done exactly that and gotten three different answers to a similar question, letting him choose between XML::Twig, XML::Rules and XML::Simple. These are just three – but the Perl XML FAQ enumerates at least a dozen CPAN modules for manipulating XML data. The documentation for any of them or the examples in the FAQ would also have put him on the track to a solution.

Everyone can be clueless about something and learning is a fundamental activity for our whole lives. But everyone can do some research, read the FAQ, ask smart questions and make sure that the whole community benefits from their learning process, especially as it does not cost any additional effort. Knowledge capitalization within a community of practice is such an easy process with benefits for everyone involved that I don’t understand why it is not a universally drilled reflex.

The funny part is that while I’m ranting about it and wielding the cluebat over the head of some random interloper, I realize that the same sort of behavior is standard internally in a very large company I know very well, because a repository of community knowledge has not even been made available for those willing to share. Is there any online community without a wiki and a forum ?

Ten years ago I was beginning to believe that consulting opportunities in knowledge management were drying up because knowledge management skills had entered the mainstream and percolated everywhere. I could not have been more wrong : ten years of awesome technological progress have proved beyond reasonable doubt that technology and tools are a peripheral issue : knowledge management is about the people and their attitudes; it is about cooperation. This was the introduction of my graduation paper ten years ago, with the prisoner’s dilemma illustrating cooperation issues – and it is today still as valid as ever.

Code and Systems administration and Unix and VOIP – 01 Jun 2008 at 18:55 by Jean-Marc Liotier

Ever since the Linux Advanced Routing & Shaping HOWTO introduced it, I have been a big fan of the Wondershaper, a traffic shaping script that drives Linux‘s class based queuing with stochastic fairness queuing (SFQ) in a pretty effective attempt at maintaining low latency for interactive traffic while at the same time maintaining high throughput. There is even a ‘wondershaper’ Debian package that includes some additional polish. This script is key to the joy of perfectly responsive SSH sessions while peer to peer file sharing traffic saturates the uplink.

Some people have even concluded the resulting quality of service is good enough for voice traffic. But even with the Debian Wondershaper ruling my ADSL link I noticed that SIP and IAX still suffer too much packet loss with the saturating traffic occupying the background. I needed better traffic control.

As usual, being a late adopter I am not the only one to have hit that obstacle, and solutions have already been put forth. After rummaging through various mutations, I found Robert Koch’s version of the Wondershaper for the Asus WL-xxx, documented on the Wondershaper package page of the WL-500G wiki, to be quite promising. Compared to the standard version it prioritizes VOIP traffic by source port for idiot-proof configuration, but also by type of service, which is much more flexible and can be used thanks to Asterisk being capable of correctly setting TOS fields. As a bonus, using TOS also makes this version of the script capable of distinguishing between interactive console SSH traffic and bulk SCP traffic using the same protocol and port. And to top it all, it is based on the better hierarchical token bucket (HTB) discipline, which has been standard since Linux 2.4.20, while the Debian Wondershaper version uses the older class based queuing which used to be the more widespread discipline.
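
To make the TOS-based prioritization concrete, here is an illustrative fragment in the spirit of those scripts – not a copy of Robert Koch’s version nor of mine, with device name and rates as placeholders – an HTB class tree whose filter steers minimum-delay (TOS 0x10) packets into the high-priority class :

# Illustration only – an HTB tree on the uplink with a TOS-based filter.
DEV=ppp0          # placeholder uplink device
tc qdisc add dev $DEV root handle 1: htb default 20
tc class add dev $DEV parent 1:  classid 1:1  htb rate 500kbit
tc class add dev $DEV parent 1:1 classid 1:10 htb rate 128kbit ceil 500kbit prio 1
tc class add dev $DEV parent 1:1 classid 1:20 htb rate 372kbit ceil 500kbit prio 2
# Asterisk can set the minimum-delay TOS bit on its traffic ; steer such
# packets into the interactive class :
tc filter add dev $DEV parent 1: protocol ip prio 1 u32 \
   match ip tos 0x10 0xff flowid 1:10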

The first shortcoming I found is that it prioritizes SIP and RTP but not IAX and others which I’ll have to add using the SIP stanzas as templates. The other is that taking lists of low priority ports as arguments could make the command line messy and configuration puzzling for the inexperienced user, so I prefer to have this configuration item as a documented variable allocation inside the script. But those are trifles compared to the new VOIP support, enhanced SSH discrimination and overall upgrade.

Hacking on the script I couldn’t resist reorganizing a few things. I originally intended to provide a diff, but that would be pointless since I ended up touching most of the lines. Also be warned that I do not understand why putting ‘prio 1’ everywhere makes the script work whereas other ‘prio’ values at various places made traffic end up in the wrong class and did not make sense at all. In effect, I think that by putting ‘prio 1’ everywhere I just eschewed the use of priority bands inside the classes, which is just fine with me for the intended use. But this shows that my tc fluency is still limited and that there are therefore surely ways to enhance this script. I’ll also welcome feedback – whether it works for you or not.

Anyway – it works ! I had a few VOIP conversations across an IAX trunk with lots of background traffic on the uplink and no perceptible effect on voice quality. Life is good now that I have removed the last obstacle to taking full advantage of VOIP at home. Soon all my traffic will be routed through Asterisk and there shall be no more RJ11 sockets nor their French T-socket alter ego in my home.

Here is my modified wondershaper script in all its glory – contrary to Robert Koch’s version it is a drop-in replacement for Debian’s package. Inheriting from the original Wondershaper it is licensed under the GPL so enjoy, modify and share !

Code and Jabber and Music – 08 May 2008 at 2:27 by Jean-Marc Liotier

Now that in its 2.0 incarnation Ejabberd supports publish-subscribe and therefore personal eventing, it is time to play with it and demonstrate to the wider world the marvellous use-cases that the future holds. A nice first one that should be popular and therefore useful for propaganda purposes is using Psi so that contacts can see in your presence status the music that you are playing. I stumbled upon an Amarok script that notifies Psi through its tune file interface and lets Psi publish the currently playing song via PEP – and it looked good.

PEP is defined by “XEP-0163: Personal Eventing via Pubsub”. And Pubsub is defined by “XEP-0060: Publish-Subscribe”. So far so good. But digging around a bit I learned about “XEP-0118: User Tune” and then it dawned on me that there appeared to be room for improvement : the script outputs a composite “tune” element which is a radical simplification of the schema specified in XEP-0118.

So I had a go at modifying the script to get it as close to the specification as possible. You can judge of the resulting output for yourself : not quite XEP-0118 compliant but a good step in that direction.

The source code is available from the usual dump, but if you are an Amarok and Psi user you might actually want to use the Amarok script package that installs and runs in a couple of clicks – thanks to the previous authors whose work I built upon.

While I was at it I discovered a bug that causes Psi 0.11 to use the element tag “source” to contain the album information, so I promptly provided the psi project team with the trivial patch needed.

It is 3:30 AM and a few hours ago I did not realize that upgrading Ejabberd would get me that far for today…

Code and Social networking – 11 Apr 2008 at 11:12 by Jean-Marc Liotier

As far as I have looked, there is no working FQL console application (I just tested the four FQL consoles that are published in the applications directory on Facebook but they either don’t load or crash on query). Although Facebook mentions that one is supposed to exist on the “Tools” page, there is actually none there at the moment. I guess I’ll have to build a small PHP application for playing with FQL.

My immediate practical goal is to be able to select members of two different groups. The query should be something like ‘SELECT uid FROM group_member WHERE gid=my_gid AND uid in (SELECT uid FROM group_member WHERE gid=my_other_gid)’ – for example to cross special interest groups or geographical areas.

There is plenty of potential for useful data mining that is not exposed by Facebook’s default interface. Search with multiple criteria of the same category is an obvious need for finding interesting people. Maybe Facebook decided that the cost of additional clutter was not worth it for the average user. Or maybe they would prefer that the users don’t realize how much information can emerge from mining their data…

Code and PHP and Systems – 14 Aug 2007 at 14:35 by Jean-Marc Liotier

Since I began playing with Net_SmartIRC, I found a new way to put that library to work : a Munin plugin script to monitor the number of users in an IRC channel.

Here is an example of the graphical output provided by Munin :

As you can see, the Debian IRC channel is a very crowded place ! You may also notice small gaps in the data : the script sometimes fails on a refused connection, and I have not elucidated the cause. But as the graph shows, I have coded the script so that those failure cases only result in a null output, which Munin handles well by showing a blank record.

Because my lacking skills and crass laziness prevented me from writing it all in a single language, I hacked that plugin by simply patching together the parts I could produce rapidly :

The PHP script uses Net_SmartIRC, which is available in Debian as php-net-smartirc. It must be configured by modifying the hardcoded server and channel – that may not be what is best in production use, but for the moment it works for me. Here is the full extent of the PHP code :

<?php
include_once('/usr/share/php/Net/SmartIRC.php');
$irc = &new Net_SmartIRC();
//$irc->setDebug(SMARTIRC_DEBUG_ALL);
$irc->setUseSockets(TRUE);
$irc->setBenchmark(TRUE);
$irc->connect('irc.eu.freenode.net', 6667);
$irc->login('usercount', 'Users counting service for Munin monitoring',
'0', 'usercount');
$irc->getList('#test_channel');
$resultar = $irc->objListenFor(SMARTIRC_TYPE_LIST);
$irc->disconnect();
if (is_array($resultar)) {
    echo $resultar[0]->rawmessageex[4];
} else {
}
?>

The irc_channel_users Bash script is also quite simple. Apart from the barely modified boilerplate adapted from other simple Munin bash scripts, the specific meat of the script is as follows :

work_directory=/home/jim/applications/munin/irc_channel_users
php_interpreter=`which php`
user_population=`$php_interpreter $work_directory/irc_channel_users.php \
 | awk -F"#" '{print($1)}' | grep -E '^[0-9]+$'`
echo -n "population.value "
echo $user_population

As you can see, the munin bash script is mostly about setting a few Munin variables, calling the php script and formatting the output.
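
The boilerplate in question is the usual Munin plugin protocol handling, which looks roughly like this – reconstructed from the sample outputs below rather than copied from my script :

# Hypothetical reconstruction of the boilerplate part of the plugin :
case $1 in
    autoconf)
        echo yes
        exit 0;;
    config)
        echo 'graph_title #b^2 IRC channel users'
        echo 'graph_args --base 1000 -l 0'
        echo 'graph_vlabel population'
        echo 'graph_scale no'
        echo 'population.label users'
        exit 0;;
esac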

Here are sample outputs :

15:32 munin@kivu /etc/munin/plugins% ./irc_channel_users autoconf
yes

15:32 munin@kivu /etc/munin/plugins% ./irc_channel_users config
graph_title #b^2 IRC channel users
graph_args --base 1000 -l 0
graph_vlabel population
graph_scale no
population.label users

15:32 munin@kivu /etc/munin/plugins% ./irc_channel_users
population.value 6

No demonstration is available on a public site, but the above graph is about all there is to know about the output of this plugin.

The code resides on its own page and updates if they ever appear shall be stored there.

This experience taught me that coding basic Munin plugins is fun and easy. I will certainly come back to it for future automated graphing needs.

And for those who wonder about the new syntax highlighting, it is produced using GeSHi via Ryan McGeary‘s very nice WP-Syntax WordPress plugin.

Code and PHP and RSS – 27 Jul 2007 at 0:53 by Jean-Marc Liotier

Since my migration to PHP 5 revealed a problem with the ancient Lilina 0.7, my interest in Lilina has been rekindled. As I said, I am quite hopeful because I would like to keep using Lilina for small aggregations and avoid deploying the more complex Gregarius where its better scalability is not needed. What first attracted me toward Lilina still holds true.

I started by checking out the development version from Google Code and I deployed it on a handful of small aggregations such as my personal feed. I then immediately started gently pestering the project lead about a few bugs, first by mail and then using the nice Google Code Lilina issue tracker. Ryan Mc Cue has been very responsive and very nice. I look forward to making my small contributions toward helping him bring Lilina to 1.0.

A year has passed since Ryan announced his intentions. He has already done a lot but his roadmap for Lilina 1.0 is very ambitious so there is still a lot of work remaining. At the current pace it may take a few more months, but this is a journey I am eager to follow : I like Lilina and since I am a novice PHP coder I am quite happy whenever I find a project I like and to which I can give a little help. I am actually quite proud of being mentioned in the credits for having brought RSS output to Lilina : it was the first time I had some of my code included in a significant project.
