Storing hierarchical data: Materialized Path

Web applications often need a way to represent and store hierarchies.
A menu with its submenus. A category with its subcategories. A comment and its replies.
Storing the hierarchy and later reconstructing it from the stored data is just one part of the puzzle. We also need a way to find the parents or children of an item. We need to be able to re-parent an item (move it to another part of the hierarchy). Finally, we need to order items in a way that reflects their position in the hierarchy.

There are several ways to do this, each with its own pros and cons:

  • Adjacency list
  • Nested set
  • Closure table (aka bridge table)
  • Materialized path (path enumeration)

I won’t compare all of them, but a quick surf through search results and Stack Overflow will tell you that closure table and materialized path are potentially the two best choices.

Looking at our storage requirements, materialized path starts to look like a simpler option:
Storing the hierarchy in a materialized path requires only one column in the table.
Storing the hierarchy in a closure table requires an additional table with a large number of rows.
The closure table also won’t work if you need to sort items by hierarchy, and re-parenting items is slow and costly. On the other hand, it’s normalized, which can’t be said for materialized paths.
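
To make the one-column claim a bit more concrete, here is a minimal in-memory sketch of the general materialized path idea. It is not the implementation from this post: the rows, the slash-separated path format and the mp_* function names are all made up for illustration. In a database, the prefix check would typically become a LIKE '1/2/%' style query.

```php
<?php
// Hypothetical rows. The "path" column holds the ids of all ancestors
// plus the item's own id, each followed by a slash.
$rows = array(
    array('id' => 1, 'name' => 'Books',   'path' => '1/'),
    array('id' => 2, 'name' => 'Fiction', 'path' => '1/2/'),
    array('id' => 3, 'name' => 'Sci-fi',  'path' => '1/2/3/'),
    array('id' => 4, 'name' => 'Music',   'path' => '4/'),
);

// Descendants share the parent's path as a prefix. The trailing slash
// keeps '1/' from matching an unrelated path such as '10/'.
function mp_descendant_ids(array $rows, $path) {
    $ids = array();
    foreach ($rows as $row) {
        if ($row['path'] !== $path && strpos($row['path'], $path) === 0) {
            $ids[] = $row['id'];
        }
    }
    return $ids;
}

// Ancestors can be read straight out of the path, no extra lookup needed.
function mp_ancestor_ids($path) {
    $ids = array_map('intval', explode('/', trim($path, '/')));
    array_pop($ids); // Drop the item's own id.
    return $ids;
}

// Re-parenting swaps the old path prefix for the new one, both on the
// moved item and on everything below it.
function mp_move(array $rows, $old_path, $new_parent_path) {
    $segments = explode('/', trim($old_path, '/'));
    $new_path = $new_parent_path . end($segments) . '/';
    foreach ($rows as $key => $row) {
        if (strpos($row['path'], $old_path) === 0) {
            $rows[$key]['path'] = $new_path . substr($row['path'], strlen($old_path));
        }
    }
    return $rows;
}

print_r(mp_descendant_ids($rows, '1/'));  // ids 2 and 3
print_r(mp_ancestor_ids('1/2/3/'));       // ids 1 and 2
$moved = mp_move($rows, '1/2/', '4/');    // Move Fiction under Music.
echo $moved[2]['path'], "\n";             // 4/2/3/
```

Note that sorting these paths as plain strings would not give a correct hierarchical order once ids reach two digits ('1/10/' sorts before '1/2/'), so this naive format covers lookups and moves, not ordering.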

So, let’s give materialized paths a shot. We’ll then see how our encoding trick makes it even better.

How Devel causes heisenbugs

Here’s what killed my Friday.
The story has been edited to remove pain, suffering, and prolonged coffee intake.

Inside a module I have code similar to the commerce cart refresh code.
It loads a line item, clones it, runs pricing rules and other modifications on the created clone. Then it compares the cloned and original line items to determine whether something had changed (requiring the line item in the database to be updated).

That looks something like this:

$line_item = commerce_line_item_load($line_item_id);
// dsm($line_item);
$cloned_line_item = clone $line_item;
// Pretend that as a result of a complex calculation
// or function call, the unit price of the new line
// item has been changed.
$cloned_line_item->commerce_unit_price[LANGUAGE_NONE][0] = array(
  'amount' => '66600',
  'currency_code' => 'EUR',
);
// Now let's compare the old and the new amount.
$old_amount = $line_item->commerce_unit_price[LANGUAGE_NONE][0]['amount'];
$new_amount = $cloned_line_item->commerce_unit_price[LANGUAGE_NONE][0]['amount'];
if ($old_amount != $new_amount) {
  dsm('The price has been changed.');
}
else {
  dsm('This can never happen.');
}

I am using Devel to do some light debugging. At one point, wanting to see which line items get processed, I uncomment the dsm($line_item) call at the top of the script.

Suddenly, the output changes to “This can never happen.” What happened?
The dsm call had turned the line item’s properties (such as commerce_unit_price) into references. PHP’s clone is a shallow clone, so it only cloned the line item itself, not caring about what’s inside.
With both $line_item and $cloned_line_item holding the same references for their fields, changing $cloned_line_item->commerce_unit_price also changed $line_item->commerce_unit_price. This means that the line item in the entity controller static cache has been changed as well, so doing another commerce_line_item_load() inside the same request will return a line item with the wrong unit price.
Many subtle bugs followed, along with suspicious glances at Entity Wrapper (later proven innocent).

So, my simple debugging call caused all of the bugs I was seeing from that point on.
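
The effect can be reproduced without Devel or Commerce. The sketch below is an assumption-laden stand-in: a plain stdClass plays the line item, and an ordinary PHP reference ($leftover, a made-up name) plays whatever the debugging call leaves behind. It also shows one possible workaround, a serialize round-trip, which is not something the original code did.

```php
<?php
// Hypothetical stand-in for a loaded line item.
$line_item = new stdClass();
$line_item->commerce_unit_price = array(
    'amount' => '100',
    'currency_code' => 'EUR',
);

// Assumption: this reference mimics the side effect of the debugging call.
// While it exists, the property itself is a PHP reference.
$leftover = &$line_item->commerce_unit_price;

// clone copies the property slots as they are, so a slot that is a
// reference stays a reference shared by both objects.
$shallow = clone $line_item;
$shallow->commerce_unit_price['amount'] = '66600';

// The "untouched" original now reports the new amount as well.
var_dump($line_item->commerce_unit_price['amount']);

// One possible workaround: a serialize round-trip builds a fully
// independent copy. It won't work for objects holding closures or resources.
$deep = unserialize(serialize($line_item));
$deep->commerce_unit_price['amount'] = '1';

// This time the original is unaffected.
var_dump($line_item->commerce_unit_price['amount']);
```

Both var_dump calls print "66600": the first because the shallow clone wrote through the shared reference, the second because the deep copy no longer shares anything with the original.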

Futurama: You changed the outcome by measuring it

There’s an issue in Devel’s issue queue from November 2012 about this:
Krumo side effect: Object vars become references.
Click the “follow” button, and be very careful what you use Devel for.
Note that Devel in D8 uses Kint instead of Krumo, so it might be immune to this.

Detecting the system timezone from PHP

PHP Warning: date(): It is not safe to rely on the system’s timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone ‘UTC’ for now, but please set date.timezone to select your timezone.

We’ve all seen this at least once.
PHP 5 requires you to have a timezone specified in your php.ini.
If you don’t, it issues the warning above, and versions before 5.4 also try to autodetect the system timezone.
This didn't always work, so PHP 5.4 dropped the autodetection code completely and left us on our own. Wonderful.

Normally, this isn’t such a big deal: a script can always call date_default_timezone_set() to set a new default.
However, I’m currently writing a CLI tool (using Symfony Console) and prompting the user to specify a timezone is much more annoying in this context.
So I wrote some code that tries to autodetect the system timezone, with a UTC fallback:

$timezone = 'UTC';
if (is_link('/etc/localtime')) {
    // Mac OS X (and older Linuxes)
    // /etc/localtime is a symlink to the
    // timezone in /usr/share/zoneinfo.
    $filename = readlink('/etc/localtime');
    if (strpos($filename, '/usr/share/zoneinfo/') === 0) {
        // Strip the 20-character directory prefix.
        $timezone = substr($filename, 20);
    }
} elseif (file_exists('/etc/timezone')) {
    // Ubuntu / Debian.
    $data = file_get_contents('/etc/timezone');
    if ($data) {
        // The file ends with a newline, which isn't part of the identifier.
        $timezone = trim($data);
    }
} elseif (file_exists('/etc/sysconfig/clock')) {
    // RHEL / CentOS
    $data = parse_ini_file('/etc/sysconfig/clock');
    if (!empty($data['ZONE'])) {
        $timezone = $data['ZONE'];
    }
}

// Apply the detected timezone (or the UTC fallback).
date_default_timezone_set($timezone);


A further improvement would be to try and make autodetection work on Windows as well.
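
Whichever file the name comes from, it may be worth validating it before handing it to date_default_timezone_set(), which complains about identifiers it doesn’t know. Here is a small sketch of that idea; pick_timezone() is a hypothetical helper, not part of the tool described above.

```php
<?php
// Hypothetical helper: returns $candidate if PHP knows it as a timezone
// identifier, otherwise the fallback.
function pick_timezone($candidate, $fallback = 'UTC') {
    // Files such as /etc/timezone usually end with a newline.
    $candidate = trim((string) $candidate);
    // ALL_WITH_BC also accepts legacy names that system files may still use.
    $known = timezone_identifiers_list(DateTimeZone::ALL_WITH_BC);
    return in_array($candidate, $known, true) ? $candidate : $fallback;
}

date_default_timezone_set(pick_timezone("Europe/Berlin\n"));
echo date_default_timezone_get(), "\n";

// An unknown or empty name quietly falls back to UTC.
echo pick_timezone('Not/AZone'), "\n";
```

The strict in_array() check keeps an empty string or a partially read file from ever reaching date_default_timezone_set().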

Learning PHP development with Silex

A small group of students wants to learn PHP and use it to develop a small portal for a university project.
It should show best practices, and have the usual set of features (comments, a bit of ajax, sending an email, login / logout, a small admin panel).
They are already familiar with OOP through previous Java work, have learned the basic PHP syntax, and are now asking you how to proceed. A framework? Which one?

I was asked the same thing back in June, and I wasn’t sure what to answer. The last PHP frameworks I developed with were Zend Framework 1 (ugh) and CodeIgniter. Some research had to be done.

Making the choice

The basic requirements were:

  1. Modern PHP code.
    This means PHP 5.3+, namespaces, PSR-0, hopefully Composer.
  2. Decent number of users.
    More popular frameworks have more documentation, Stack Exchange answers and other resources. They are also more likely to be useful later in the job market.
  3. Minimal and clear.
    We wanted something that is easy to read and understand, approachable to a beginner.


Entity Bundle Plugin

After weeks of work, Commerce License is finally up, as well as Commerce File 7.x-2.x to go along with it.

Commerce License provides a framework for selling access to local or remote resources.

In practice, this means that there’s a license entity, usually created during order checkout, that holds information about accessing the purchased resource, and it has a status and an optional expiration date.
This allows selling access to anything from files to node types, or perhaps ZenDesk tickets and accounts on remote sites, all using a common API, while always having a record of the purchased access.

At the heart of that API is the entity bundle plugin, which allows different license types to have different logic.
What is the entity bundle plugin? The project page says only this:

This API module allow developers to build an entity type which is attached to strong behaviors.

That doesn’t help much, so let’s dive in. Let’s start by looking at how entities are built on D7.


Let’s stop using the issue queues for providing support

We spend a lot of our time in the issue queues, working on bug fixes and adding new features. They’re an okay tool for organizing development, and we’ve grown used to their quirks by now.
Users also try to use the issue queues to receive support, with more or less luck (typically, the bigger the module is, the less luck they have).
And while it’s great to have the support request category to reclassify confused bug reports or already implemented feature requests, as an actual tool for support the issue queue is terrible and it hurts the community.


Using Features for install profiles: The problem of default configuration

Recently I’ve made several remarks on Twitter about the pain of using the Features module for install profiles. Since they weren’t universally understood, I’ve decided to blog a bit about what is one of my pet topics: the problem of default configuration.

The problem


  1. An install profile / distribution needs to be able to export and provide default configuration (content types, fields, variables, etc).
  2. The user needs to be able to revert to default configuration when desired.
  3. The user needs to be able to “untrack” any piece of configuration, allowing it to be deleted.
  4. The user needs to be able to export and version his own export of the configuration, maintaining his changes across distribution upgrades.

Features can’t satisfy the last two requirements, because it has no concept of “default configuration” as something distinct from regular configuration.
