copyright

  • © 2004-2011
    Ludovico Magnocavallo
    all rights reserved

Reports from the past - The Gibigianna/1

23 August 2003

my grandmother is the girl in the centre of the picture

Since I’m a bit fed up with programming, but as usual the idea of going to sleep (well, actually of lying in bed with a book, currently the 7th of the 26 in Terry Pratchett’s Discworld series) does not look so great, I might as well start putting my family’s ancient pictures on my blog. Ancient, because that’s the word that came out while writing in (more or less) American English on a computer; thinking about it, though, stuff from the early 1900s should be considered old, not ancient.

Scanning all the family pictures scattered around my relatives’ homes is an old pet project of mine, one I started last year but never managed to stick to. I hope this will be an incentive to scan a few more pictures and letters (especially my paternal grandfather’s sometimes humorous letters from the First World War, a good number of them written to his parents by his attendant on stationery pre-signed by my grandfather, or so the family lore says).

I am starting this fragmented slideshow with the gibigianna. It’s a regional, rather disused word (the region being Lombardy, in Italy, where Milan is) meaning

  • a glitter of light reflected from a mirror or glass
  • humorous, for a woman who displays elegance

A beautiful word, isn’t it? It’s a special language where you have a word for a glitter of light reflected from a mirror, whose other meaning has to do with a woman’s ostentatious display of elegance (but then I suspect most languages have beautiful, very special words). Not very politically correct, but it fits my grandmother’s pictures perfectly.

In my grandmother’s case, the gibigianna was a sort of party where young people dressed in costumes and reenacted historical or literary scenes (or so my aunt tells me). My paternal grandmother was the daughter of a noblewoman (who unfortunately does not appear to have been one of the heirs to her family’s huge fortune, or at least to part of it), had relatives among some of Milan’s noble families, and so got to take part in this kind of event.

I have always found these pictures wonderful, and appalling. Imagine a few teenagers (one of the pictures has 20 people in it) who spend a day dressing up in stage-quality costumes, only to reenact historical scenes in their family’s private theater inside their villa. Not only that: they even had silver ashtrays made for the event.

Full coverage of the Gibigianna coming soon, if I manage not to get distracted by something else, as usual. =)

update: the second and last part of this Report from the Past is now online. All the images of the Gibigianna are available in a separate directory (since getting text to wrap around multiple images in a blog page is a tedious and error-prone process).

PHPTAL (re)visited

20 August 2003

As promised in a previous entry, I managed to take a look at PHPTAL. Well, not a serious, in-depth look at the implementation, more of an “I like TAL, let’s see if this is good enough” look. Okay, okay, I can see your noses wrinkling, let me explain this a bit.

My involvement with PHP, which led me to develop the PEAR DB LDAP driver and to maintain the Interbase driver for a while, plus using it for lots of personal and work projects, came to an abrupt halt last year for two reasons: I got a new job at a major Italian bank managing a group of consultants, so little or no development; and I fell in love with Python, so I began using it more and more for my personal projects.

Lately I started developing again at work, partly because I like it more than managing people (managing a development team is better, but unfortunately we delegate this sort of stuff to consultants), partly because the current economic climate (and plenty of available spare time at work) lets me convince the bosses from time to time to try and develop some of our projects in-house.

Thus my involvement with PHP began again, since some of the stuff we’re rewriting or refactoring is already done in PHP, and very few things (none that I know of) can beat PHP web applications in speed of development/deployment and performance. Performance is in fact one of our major issues, since the applications I am working on are used daily by a good part of our 60,000 internal users. The other major issue related to PHP and its ancillary libraries/classes is reliability. We’re not working on financial applications but on simple intranet stuff like our internal telephone directory. Simple stuff, on which however some of our business processes depend: often the only way to reach somebody in a timely manner is to call them on their mobile phone, be it for troubleshooting an “important” application or process or to schedule a meeting, and one of the few ways to know who deals with what (if the what is something outside your usual tasks) is to look it up in the telephone/organizational units directory.

Thus, by my criteria, good enough means fast enough for our loads, and reliable enough not to show strange or random behaviour.

For templating we currently use a mix of the old PHPlib template class (used by our consultants) and my rewrite of it (in the new projects). It’s simple, it’s fast, and it separates the logic from the presentation pretty well, although not as much as TAL does. So yesterday morning I spent a few minutes with a colleague trying to benchmark PHPTAL, to see if it is fast enough to try and develop something with it.

The results were pretty much what I expected: PHPTAL is too slow to be used for our applications. What I did not expect, however, was the order of magnitude of its slowness compared to what we are using (more on that later). Not satisfied with the (basic, but sufficient for our needs) performance tests, I did a quick and dirty reliability test by comparing PHPTAL with my reference TAL implementation, the Python library simpleTAL. I was pretty surprised to discover that simpleTAL is slightly faster than PHPTAL, and that it spits out warnings if you try to use the TAL templates used in the PHPTAL examples. This did not sound good for PHPTAL’s quality.

So tonight I did a bit of reading around the PHPTAL documentation, and was pretty surprised to learn that PHPTAL requires a separate Types library that defines new data types on top of the (perfectly complete, in PHP’s context) native PHP ones. Urgh! I am allergic to too many abstractions (what Joel of Joel On Software fame calls leaky abstractions), and this definitely looks like a bad case of abstractitis. What’s the need for a reference helper? Every decent PHP developer should know his way around references; they’re not so hard (well, after you bang your head against the wall a few times in frustration but decide to stick with it). What’s the need for an Iterator interface on top of PHP’s very good, feature-rich and fast arrays? Something like this does not look like sensible PHP code to me:

require_once 'Types/Iterator.php';
$i = $iterable->getNewIterator();
while ($i->isValid()) {
    $value =& $i->value();
    $i->next();
}

It is suspiciously similar in functionality to this:

$i = array('a'=>'b', 'c'=>'d');
foreach ($i as $k => $v)
    $value =& $v;

Two lines fewer, more clarity, more speed, fewer abstractions. Of course, this is only my opinion. Well, back to the original topic of this entry: testing.

For the tests, I tried to use one of the templates used in the PHPTAL documentation. Since it sports invalid TAL syntax according to simpleTAL (the Python library), I had to drop a row which was used only as a placeholder anyway. The resulting template looks like this:

<?xml version="1.0"?> 
<html>
<head>
    <title tal:content="title">place for the page title</title>
</head>
<body>
<h1 tal:content="title">sample title</h1>
<table>
<tr>
    <td>name</td>
    <td>phone</td>
</tr>
<tr tal:repeat="item users"> 
    <td tal:content="item/matricola">matricola</td>
    <td tal:content="string: ${item/cognome} ${item/nome}">item name</td>
    <td tal:content="item/telefono">item phone</td>
</tr>
</table>
</body> 
</html>

The data is in a separate PHP file that simply declares a “$users” array of 200 elements, each an associative array with the required fields. To time results, I used the very good PEAR Benchmark_Timer class by Sebastian Bergmann.

The sample PHPTAL code looks like this:

#!/usr/bin/php -q
<?php
require_once 'users.php';
require_once 'Benchmark/Timer.php';

$t =& new Benchmark_Timer(true);

require_once "HTML/Template/PHPTAL.php";
$t->setMarker('post-require');

$tpl =& new PHPTAL("tal_template.xml");
$t->setMarker('post-template');

$tpl->set("title", "Test Page");
$t->setMarker('post-set-title');

$tpl->setRef("users", $users);
$t->setMarker('post-set-users');

$res = $tpl->execute();
$t->setMarker('post-execute');

if (PEAR::isError($res))
    echo $res->toString(), "\n";
?>

Running this test the first time gives:

----------------------------------------------------------------------
marker             time index            ex time               perct
----------------------------------------------------------------------
Start              1061421217.91528700   -                       0.00%
----------------------------------------------------------------------
post-require       1061421217.93816100   0.022874                5.79%
----------------------------------------------------------------------
post-template      1061421217.93849100   0.000330                0.08%
----------------------------------------------------------------------
post-set-title     1061421217.93855700   0.000066                0.02%
----------------------------------------------------------------------
post-set-users     1061421217.93861100   0.000054                0.01%
----------------------------------------------------------------------
post-execute       1061421218.30995300   0.371342               94.04%
----------------------------------------------------------------------
Stop               1061421218.31017400   0.000221                0.06%
----------------------------------------------------------------------
total              -                     0.394887              100.00%
----------------------------------------------------------------------

Subsequent runs use a cached version of the parsed template (something I don’t like much: caching should be an option you turn on, not one you have to turn off) and give:

----------------------------------------------------------------------
marker             time index            ex time               perct
----------------------------------------------------------------------
Start              1061421220.91106600   -                       0.00%
----------------------------------------------------------------------
post-require       1061421220.93519000   0.024124               15.37%
----------------------------------------------------------------------
post-template      1061421220.93551900   0.000329                0.21%
----------------------------------------------------------------------
post-set-title     1061421220.93558300   0.000064                0.04%
----------------------------------------------------------------------
post-set-users     1061421220.93563700   0.000054                0.03%
----------------------------------------------------------------------
post-execute       1061421221.06779900   0.132162               84.20%
----------------------------------------------------------------------
Stop               1061421221.06802400   0.000225                0.14%
----------------------------------------------------------------------
total              -                     0.156958              100.00%
----------------------------------------------------------------------

Pretty slow, considering it’s running on a sufficiently fast machine and it’s practically doing nothing. On top of that, our real application has lots of templating operations, LDAP searches, etc.
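The ex time and perct columns in these tables can be recomputed from the time index column: each ex time is the gap since the previous marker, and each percentage is that gap divided by the Start-to-Stop total. The Python sketch below assumes that is how Benchmark_Timer derives them (the assumption does reproduce the first run's figures); `timer_report` is a hypothetical helper, not part of Benchmark_Timer. It uses `Decimal` so that subtracting the large timestamps stays exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Marker timestamps copied from the first (uncached) PHPTAL run above.
MARKERS = [
    ("Start", "1061421217.91528700"),
    ("post-require", "1061421217.93816100"),
    ("post-template", "1061421217.93849100"),
    ("post-set-title", "1061421217.93855700"),
    ("post-set-users", "1061421217.93861100"),
    ("post-execute", "1061421218.30995300"),
    ("Stop", "1061421218.31017400"),
]


def timer_report(markers):
    """Return ([(name, ex_time, percent), ...], total).

    ex_time is the gap since the previous marker (None for Start);
    percent is ex_time / total * 100, rounded to two decimals.
    """
    stamps = [(name, Decimal(t)) for name, t in markers]
    total = stamps[-1][1] - stamps[0][1]
    rows = [(stamps[0][0], None, Decimal("0.00"))]
    for (_, prev), (name, cur) in zip(stamps, stamps[1:]):
        ex_time = cur - prev
        percent = (ex_time / total * 100).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP)
        rows.append((name, ex_time, percent))
    return rows, total


if __name__ == "__main__":
    rows, total = timer_report(MARKERS)
    for name, ex_time, percent in rows:
        shown = "-" if ex_time is None else str(ex_time)
        print(f"{name:<18} {shown:>12} {percent:>7}%")
    print(f"{'total':<18} {str(total):>12}")
```

Running it shows the same breakdown as the first table, with post-execute accounting for 94.04% of the 0.394887 seconds.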

My second test tried to replicate the same functionality using my Template class. The template looks like this:

<html>
<head>
<title>{title}</title>
</head>
<body>
  <h1>{title}</h1>
  <table>
  <tr>
    <td>name</td>
    <td>phone</td>
  </tr>
  <!-- BEGIN row -->
  <tr> 
    <td>{row_matricola}</td>
    <td>{row_cognome} {row_nome}</td>
    <td>{row_telefono}</td>
  </tr>
  <!-- END row -->
  <!-- BEGIN dummy -->
  <tr> 
   <td>sample name</td>
   <td>sample phone</td>
  </tr>
  <tr>
    <td>sample name</td>
    <td>sample phone</td>
  </tr>
  <!-- END dummy -->
</table>
</body> 
</html>

The dummy block is there to serve the same purpose as the placeholder row I removed from the TAL template after trying it with Python. I left the dummy rows in, since the speed difference is large enough anyway. The code used for the second test is:

#!/usr/bin/php -q
<?php
require_once 'users.php';
require_once 'Benchmark/Timer.php';

$t =& new Benchmark_Timer(true);

require_once "Template.php";
$t->setMarker('post-require');

$tpl =& new Template('/home/ludo/tests');
$tpl->setFile('main', 'template.html');
$t->setMarker('post-template');

$tpl->setVar("title", "Test Page");
$t->setMarker('post-set-title');

$tpl->parseBlock('ROW', 'row', $users, 'main');
$t->setMarker('post-set-users');

$tpl->setBlock('main', 'dummy', 'DUMMY');
$tpl->setVar('DUMMY', '');

$res = $tpl->parse('MAIN', 'main');
$t->setMarker('post-execute');

// result may be an error
if (PEAR::isError($res))
    echo $res->toString(), "n";
?>
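For readers who haven't used PHPlib-style templates, the `<!-- BEGIN row -->`/`<!-- END row -->` markers delimit a block whose body is repeated once per data row, with the `{row_...}` placeholders filled from each row; passing no rows simply drops the block, which is what happens to the dummy block. The Python sketch below is a hypothetical re-creation of that idea: `set_vars` and `parse_block` are illustrative names, not the API of the Template class used above or of PHPlib, and the two user rows are made-up sample data.

```python
import re


def set_vars(template, variables):
    """Replace {name} placeholders; unknown placeholders are left untouched."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )


def parse_block(template, block, rows):
    """Expand the <!-- BEGIN block --> ... <!-- END block --> section.

    The block body is repeated once per row; inside it, {block_key} is
    replaced by row[key]. An empty rows list removes the block entirely.
    """
    name = re.escape(block)
    pattern = re.compile(
        r"<!-- BEGIN %s -->(.*?)<!-- END %s -->" % (name, name), re.S)
    match = pattern.search(template)
    if match is None:
        raise ValueError("block %r not found" % block)
    body = match.group(1)
    copies = []
    for row in rows:
        copy = body
        for key, value in row.items():
            copy = copy.replace("{%s_%s}" % (block, key), str(value))
        copies.append(copy)
    return template[:match.start()] + "".join(copies) + template[match.end():]


if __name__ == "__main__":
    page = (
        "<h1>{title}</h1>\n<table>\n"
        "<!-- BEGIN row --><tr><td>{row_matricola}</td>"
        "<td>{row_cognome} {row_nome}</td><td>{row_telefono}</td></tr>\n"
        "<!-- END row -->"
        "<!-- BEGIN dummy --><tr><td>sample name</td></tr><!-- END dummy -->"
        "</table>"
    )
    users = [  # made-up sample rows
        {"matricola": "A1", "cognome": "Rossi", "nome": "Mario", "telefono": "1234"},
        {"matricola": "B2", "cognome": "Bianchi", "nome": "Anna", "telefono": "5678"},
    ]
    page = set_vars(page, {"title": "Test Page"})
    page = parse_block(page, "row", users)
    page = parse_block(page, "dummy", [])  # drop the placeholder rows
    print(page)
```

Plain variables are substituted before the rows are expanded, so any braces inside user data are never treated as placeholders.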

Running this test gives:

----------------------------------------------------------------------
marker             time index            ex time               perct
----------------------------------------------------------------------
Start              1061421680.50660500   -                       0.00%
----------------------------------------------------------------------
post-require       1061421680.50944200   0.002837               22.93%
----------------------------------------------------------------------
post-template      1061421680.50970300   0.000261                2.11%
----------------------------------------------------------------------
post-set-title     1061421680.50974900   0.000046                0.37%
----------------------------------------------------------------------
post-set-users     1061421680.51790200   0.008153               65.88%
----------------------------------------------------------------------
post-execute       1061421680.51878400   0.000882                7.13%
----------------------------------------------------------------------
Stop               1061421680.51898000   0.000196                1.58%
----------------------------------------------------------------------
total              -                     0.012375              100.00%
----------------------------------------------------------------------

More than 10 times faster, and that’s without caching anything. More than 30 times faster against the uncached version of PHPTAL.
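Those ratios come straight from the total rows of the three Benchmark_Timer tables; a quick arithmetic check in Python, with the totals copied from the tables above:

```python
# Totals (in seconds) copied from the three Benchmark_Timer tables above.
PHPTAL_UNCACHED = 0.394887
PHPTAL_CACHED = 0.156958
TEMPLATE = 0.012375


def speedup(slow, fast):
    """How many times faster the `fast` run is than the `slow` one."""
    return slow / fast


if __name__ == "__main__":
    print(f"vs cached PHPTAL:   {speedup(PHPTAL_CACHED, TEMPLATE):.1f}x")
    print(f"vs uncached PHPTAL: {speedup(PHPTAL_UNCACHED, TEMPLATE):.1f}x")
```

It prints 12.7x and 31.9x respectively.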

To compare with Python, I timed the execution of the three scripts with the shell time command, since I was just interested in a rough overview of the relative speed.

Python:

/-(ludo@pippozzo)-(53/pts)-(01:24:37:Thu Aug 21)--
-($:~/tests)-- time ./test.py
 
real    0m0.591s
user    0m0.580s
sys     0m0.010s

PHPTAL:

/-(ludo@pippozzo)-(54/pts)-(01:24:38:Thu Aug 21)--
-($:~/tests)-- time ./test_tal.php >/dev/null

real    0m0.421s
user    0m0.310s
sys     0m0.110s

Template:

/-(ludo@pippozzo)-(56/pts)-(01:24:53:Thu Aug 21)--
-($:~/tests)-- time ./test_tpl.php >/dev/null

real    0m0.268s
user    0m0.190s
sys     0m0.080s

To sum it up, we’re going to keep using our Template class. PHPTAL is a commendable effort, and something that I would definitely use in most of my PHP development, but it needs to be more lightweight, more reliable and faster.

If you’re interested in the Python script, here it is:

#!/usr/bin/env python
def test():
    from simpletal import simpleTAL, simpleTALES
    import sys, cPickle

    users = cPickle.load(file('users.pickle', 'r'))

    context = simpleTALES.Context()
    context.addGlobal("title", "Hello World")
    context.addGlobal("users", users)

    template = simpleTAL.compileHTMLTemplate (file("tal_template.xml", 'r'))

    template.expand(context, file('/dev/null', 'w'))

if __name__ == '__main__':
    test()

PHPTAL (re)visited -- moved

20 August 2003

As promised in a previous entry, I managed to take a look at PHPTAL. Since the entry is a bit long, I copied it as a standalone article, and left only this brief notice on the blog.

PHP 101 - PEAR error handling

18 August 2003

Today I read with interest A Few Tips for Writing Useful Libraries in PHP (via PHP Everywhere), a nice article by the author of the MagpieRSS parser.

I posted a couple of comments on the site, to which the author kindly replied by email. Nothing much: the usual “don’t name your php files with an extension other than the one defined in your webserver”, and an observation about error handling.

As all PHP developers know, PHP has no exceptions (and thus no try/something clauses): errors are either stored in special variables or accessed by calling functions, depending on which extension you’re working with (PHP5 will introduce exceptions). All pretty messy, as the article’s author correctly points out.

One thing most developers don’t know is that PEAR (the default PHP library bundled with the PHP source code) has a nice PEAR base class with great error handling features (why a good number of PHP developers stay away from PEAR is another matter entirely).

Basically, if your classes (you use PHP’s OO features when writing reusable code, don’t you?) inherit the PEAR base class, you get error handling for free (and a few other things, like destructors).

Using PEAR error handling is very easy; a quick example will suffice:

// our imaginary library (not very useful, is it?)

define('MYLIB_ERROR', 100);

require_once 'PEAR.php';

class myClass extends PEAR {

    function myClass($myargs = null) {
        // run the PEAR base class constructor
        $this->PEAR();
    }

    // wrap PEAR::raiseError(), prefixing the message with
    // class name, method name and line number
    function &raiseError($message, $method, $line) {
        $error = PEAR::raiseError(sprintf("%s.%s() line %d: %s",
            get_class($this), $method, $line, $message),
            MYLIB_ERROR);
        return $error;
    }

    function myMethod() {
        // return the error, so callers can test it with PEAR::isError()
        return $this->raiseError("hmmm something went wrong.....", 'myMethod', __LINE__);
    }

}

For our library developer, that’s all there is to it. Basically you import the PEAR base class and use it as the parent of your library classes. Whenever you need to “throw” an exception, just raise a PEAR error and return it to the caller. The raiseError method above is the one I use in my classes: it makes debugging much easier by prefixing the message with the class and method name and the line number where the error occurred, and it passes a constant I use to trap errors depending on the originating class.

Now let’s look at the user side of PEAR’s error handling routines:

// user code accessing our library

PEAR::setErrorHandling(PEAR_ERROR_CALLBACK, 'errorHandler');
function errorHandler($err) {
    echo("<b>PEAR error</b><br>message: <i>"
        . $err->getMessage()
        . "</i><br>user info: <i>"
        . $err->getUserInfo()
        . "</i><br>");
}

// or use any of the standard PEAR error messages, eg
// PEAR::setErrorHandling(PEAR_ERROR_PRINT);
// for development or a more sophisticated handler for production,
// eg mailing a copy of the error or saving it in a log

$myinst =& new myClass();
$res = $myinst->myMethod();
if (PEAR::isError($res)) {
    // do something, the error is handled
    // (in our case printed) by the handler routine
} else {
    // do something else
}

The user only needs to import the PEAR base class (or alternatively use $myinst->isError(), since myClass extends PEAR) and check method return values for possible errors. An added benefit of PEAR’s error handling is that you can switch from debugging (i.e. printing verbose error messages) to production (i.e. not letting the user see errors, and handling them in your application) just by changing the error handling routine. A common practice is to automate this by checking the environment of the running server (e.g. its hostname).
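Python has real exceptions, so it needs none of this, but for comparison the PEAR pattern (errors returned as values, every new error passed to a globally registered callback, and the callback picked per environment) fits in a few lines. Everything here is a hypothetical sketch only loosely modelled on PEAR: `LibError`, `raise_error`, `set_error_handling`, `is_error` and the `DEV_HOSTS` set are illustrative names, not PEAR's API.

```python
import logging
import socket

MYLIB_ERROR = 100  # error code, as in the PHP example


class LibError:
    """An error returned to the caller, loosely modelled on PEAR_Error."""

    def __init__(self, message, code):
        self.message = message
        self.code = code


def is_error(value):
    """Counterpart of PEAR::isError(): is this return value an error?"""
    return isinstance(value, LibError)


_error_callback = None


def set_error_handling(callback):
    """Register the function every new error is passed to (cf. PEAR_ERROR_CALLBACK)."""
    global _error_callback
    _error_callback = callback


def raise_error(message, code=MYLIB_ERROR):
    """Build an error, hand it to the registered callback, then return it."""
    err = LibError(message, code)
    if _error_callback is not None:
        _error_callback(err)
    return err


class MyClass:
    def my_method(self):
        # Errors are returned, not raised, so callers must check is_error().
        return raise_error("MyClass.my_method(): hmmm something went wrong")


def debug_handler(err):
    # Development: show the full message.
    print(f"error {err.code}: {err.message}")


def production_handler(err):
    # Production: keep details away from users, log them instead.
    logging.getLogger("mylib").error("error %s: %s", err.code, err.message)


DEV_HOSTS = {"dev-box"}  # hypothetical list of development machines

if __name__ == "__main__":
    on_dev_host = socket.gethostname() in DEV_HOSTS
    set_error_handling(debug_handler if on_dev_host else production_handler)
    res = MyClass().my_method()
    if is_error(res):
        pass  # already reported by the handler; recover here
    else:
        pass  # normal path
```

As with PEAR, the calling code stays the same in both environments; only the registered handler changes.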

Windows Tools

18 August 2003

I will slowly migrate my bookmarks (currently scattered across at least three machines at work and at home) to this site, even though I’m still undecided about how to do it. One possible option is a new category type for bookmarks, whose entries are not displayed on the front page or in the RSS feeds, but which can accommodate folders. We’ll see.

In the meantime, I will transfer useful links from my old site from time to time (they’re stored in a MySQL DB; things would, ahem, be much easier if they were in ASCII files).

Here’s tonight’s batch, in no particular order:

  • ELM (Multiple Boot Manager), a tiny, useful Japanese boot manager that reads the partition table and allows you to boot any valid partition on the disk, very handy when you totally mess things up =)
  • WinMerge and (better, IMHO) ExamDiff, Windows visual diff tools
  • StrokeIt, mouse gestures for Windows, pretty nice if you’re into this sort of thing

After 9 years of going back and forth from Windows to Linux with brief periods on various commercial Unices, a few months ago I finally managed to get rid of Windows (thanks to Slackware, which I unfortunately overlooked for a loong time), so these links are the first to go, and will slowly be buried at the bottom of my archives. =)

The Italian Experience

18 August 2003

If you’re a veteran procrastinator then you too have known these moments when large matters, easily visible from a distance of six months away, finally somehow depend on the heroic scramble to beat a five minute deadline or to find a new ink cartridge for the printer. Ridiculous. [Making Change]

Being sort of an outsider in most life situations, I usually enjoy descriptions of the Italian Life by foreigners, outsiders in a pretty strange culture.

Thus I spent an amusing five minutes browsing through Stumbling Tongue, a blog kept by an American (?) living in Milano (found via geourl). I especially liked Making Change, a description of the Italian Way of dealing with cash and change.

Recommended.

StaticBlog

18 August 2003

Just a quick entry to open up the category, since my title points to it. =)

StaticBlog is a tool I’m developing to manage my blog. In as few words as possible, your entry writing process is

  • fire up your preferred editor and write your entry in any of the available formats (as of now, plain text, HTML and reStructuredText)
  • optionally define author, date, title and encoding (entry keywords are not yet implemented) in commented out lines at the beginning of the file if you don’t want the entry to get the defaults
  • save the file in the directory corresponding to its category inside staticblog’s data dir
  • run staticblog to catch changes, index them and generate the modified HTML pages and the RSS feeds
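The workflow above could be sketched roughly as follows in Python. The header syntax is an assumption (the steps only say “commented out lines”; here they are taken to be `# key: value` lines), as are the hypothetical helper names `parse_entry` and `changed_entries`; change detection is assumed to be a simple modification-time comparison against the last run. This is not StaticBlog's actual code.

```python
import os

# Assumed header keys, from the list above (keywords are not implemented yet).
HEADER_KEYS = ("author", "date", "title", "encoding")


def parse_entry(text, defaults):
    """Split an entry into (metadata, body).

    Assumed format: leading '# key: value' lines. Known keys override the
    defaults; parsing stops at the first line not starting with '#'.
    """
    meta = dict(defaults)
    lines = text.splitlines()
    body_start = 0
    for i, line in enumerate(lines):
        if not line.startswith("#"):
            break
        key, sep, value = line[1:].partition(":")
        key = key.strip().lower()
        if sep and key in HEADER_KEYS:
            meta[key] = value.strip()
        body_start = i + 1
    return meta, "\n".join(lines[body_start:])


def changed_entries(data_dir, since):
    """Yield (category, path) for files modified after the `since` timestamp.

    The category is the file's directory relative to data_dir, mirroring
    the one-directory-per-category layout described above.
    """
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        category = os.path.relpath(dirpath, data_dir)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                yield category, path


if __name__ == "__main__":
    entry = "# title: StaticBlog\n# date: 2003-08-18\nJust a quick entry..."
    meta, body = parse_entry(entry, {"author": "ludo", "encoding": "utf-8"})
    print(meta)
    print(body)
```

Headers only override the defaults they mention, so an entry with no header lines at all simply gets every default.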

It is similar in concept to bzero, but unlike bzero you don’t have to explicitly tell it to parse a new entry, and, well, lots of other stuff. I gave up on bzero after a few minutes, when it refused to work on one of my machines (a pretty plain Slackware 9.0 setup) and I noticed that its source code is not available.

You can read a bit more about StaticBlog in one of my previous entries.

If you’re interested in helping develop StaticBlog, or want to give it a try (it’s still very much work-in-progress, mind you), drop me a note.

update: I forgot to add that StaticBlog is written in Python.

Generating abstracts from HTML snippets

18 August 2003

A couple of days ago while working on this site’s generator, I had to solve the (small) problem of generating abstracts of possibly arbitrary length from HTML snippets.

I wanted my code not only to trim down a snippet to a certain word length, but also to count how many words were left out of the abstract, and to preserve HTML tags (not counting them as words, of course).

A brief look at Text Processing in Python by David Mertz pointed me in the right direction. It took a few minutes, and the abstracts appear to be good so far.

Last night I was browsing through Fredrik Lundh’s blog and stumbled upon his solution to the same problem, which I had only skimmed without much interest when it appeared in my aggregator. I did not (and still do not) fully understand his code, mainly because I have never used formatter classes.

Ever the curious person, I decided to benchmark the two solutions against each other, mentally prepared to take a beating from a far better programmer than me. I was surprised when my code turned out to be about 35% faster (of course, this may be due to it not actually being my code, but a variation on David Mertz’s code).

/-(ludo@pippozzo)-(84/pts)-(00:50:25:Mon Aug 18)--
-($:~/pystuff/staticblog)-- ./test.py
effbot code, 100 runs
2.05247092247
my code, 100 runs
1.30467903614

Not satisfied, I thought the difference might lie in my code reusing the same instance, while effbot’s code creates a new instance at each call (correct me if I’m wrong).

/-(ludo@pippozzo)-(102/pts)-(00:59:40:Mon Aug 18)--
-($:~/pystuff/staticblog)-- ./test.py
effbot code, 100 runs
2.05521595478
effbot code, 100 * 1 run min 0.011666 max 0.043925 avg 0.019288
my code, 100 runs
1.25740003586
my code, 100 * 1 run min 0.007895 max 0.037792 avg 0.013004

Still faster, though not by much. Here’s my code:

import re
from HTMLParser import HTMLParser

class abstractParser(HTMLParser):
    """inspired by a simpler parser described in Text Processing in Python chap 5
    http://gnosis.cx/TPiP/chap5.txt"""
    space_re = re.compile(r'(?:\s|&nbsp;)+', re.S)
    
    def __init__(self, abstract_length):
        HTMLParser.__init__(self)
        self.abstract_length = abstract_length
    
    def reset(self):
        HTMLParser.reset(self)
        self.tagstack = []
        self.abstract = []
        self.wordcount = 0
        self.morewords = 0
        self.completed = False
    
    def handle_starttag(self, tag, attrs):
        if not self.completed:
            self.tagstack.append('</%s>' % tag)
            self.abstract.append(self.get_starttag_text())
    
    def handle_endtag(self, tag):
        if not self.completed:
            self.abstract.append(self.tagstack.pop())
    
    def handle_data(self, data):
        if self.completed:
            self.morewords += len(self.space_re.findall(data))
        else:
            if data:
                words = []
                for word in self.space_re.split(data):
                    if self.completed and word != '':
                        self.morewords += 1
                        continue
                    if self.wordcount == self.abstract_length:
                        self.completed = True
                    if word != '':
                        self.wordcount += 1
                    words.append(word)
                self.abstract.append(' '.join(words))
    
    def feed(self, content):
        self.reset()
        # TODO: split feeding in reasonable chunks until self.completed
        HTMLParser.feed(self, content)
        HTMLParser.close(self)
        if self.morewords > 0:
            self.abstract.append("... (%s more words)" % self.morewords)
        self.tagstack.reverse()
        for t in self.tagstack:
            self.abstract.append(t)
        return ''.join(self.abstract)

if __name__ == '__main__':
    snippet1 = "<p>Lorem ipsum dolor sit <b>amet</b> ipso facto.</p>"
    snippet2 = "<p>Lorem ipsum <i>dolor sit <b>amet</b> ipso facto</i>.</p>"
    p = abstractParser(5)
    print p.feed(snippet1)
    print p.feed(snippet2)
    # gives
    # <p>Lorem ipsum dolor sit <b>amet</b>... (2 more words)</p>
    # <p>Lorem ipsum <i>dolor sit <b>amet</b>... (2 more words)</i></p>

Hmmm, I even caught a bug while writing this blog entry. Grrr, when will I learn to write tests before coding, even for small things?
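For anyone wanting to repeat the comparison, numbers in a similar shape to the test.py output above (a total for 100 runs, then min/max/avg over single runs) could be collected with a small harness like this one. It is a hypothetical Python 3 sketch, not the test.py used for the figures above, and `trim_words` is only a stand-in for whichever abstract generator is being measured.

```python
import time


def time_runs(func, arg, runs=100):
    """Call func(arg) `runs` times; return (total, per_call) in seconds."""
    per_call = []
    start = time.perf_counter()
    for _ in range(runs):
        t0 = time.perf_counter()
        func(arg)
        per_call.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return total, per_call


def report(label, func, arg, runs=100):
    """Print a total plus min/max/avg per call, in the spirit of test.py."""
    total, per_call = time_runs(func, arg, runs)
    print(f"{label}, {runs} runs")
    print(total)
    avg = sum(per_call) / len(per_call)
    print(f"{label}, {runs} * 1 run min {min(per_call):f} "
          f"max {max(per_call):f} avg {avg:f}")


def trim_words(text, length=5):
    """Stand-in workload: keep the first `length` words of plain text."""
    return " ".join(text.split()[:length])


if __name__ == "__main__":
    report("trim_words", trim_words, "Lorem ipsum dolor sit amet ipso facto. " * 50)
```

To test the instance-reuse theory, the same harness can be run twice: once with a function that reuses a single parser object, and once with one that builds a new parser on every call.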