Liberated Pixel Cup and distributed free culture projects

cwebber, July 11th, 2012

Screenshot of the LPC style guide navigation boxes
The Liberated Pixel Cup style guide, which is the cornerstone of coordinating collaboration in Liberated Pixel Cup themed artwork.

We announced Liberated Pixel Cup just a little over two months ago. In the time since the announcement, a stunning number of things have happened: we hit… exceeded!… the fundraising goal for the project, completed a style guide, base asset repository, and interactive demo, wrapped up the art phase, and we are now well into the coding phase.

I’m happy to see things going so well. Liberated Pixel Cup is an important project to me, and its success is near and dear to my heart. This isn’t just because I think “games are cool” either: there are actually a lot of motivations behind Liberated Pixel Cup. I discussed one of these in depth on my personal blog in a post titled “Why games matter to free software and free culture”. If you’re looking for a larger policy reason for working on free software and free culture games, you may wish to read that.

But there are other reasons why I think the Liberated Pixel Cup is important that were outside the scope of that post. One of these is that games are one of the few direct intersections of free culture and free software (or otherwise directly functional) works (the other major instance of these I think of being 3d printing). Ideally, we’ll want to see only more and more of these combinations in the future, but what we see is that when we do, a lot of interesting questions are raised. And getting these movements to connect is surely important generally. It’s also true that getting into game development is presently extremely hard as you end up splitting your efforts between designing a game engine and designing artwork assets, so reducing the barrier for certain types of games seems to be a worthy goal.

But one of the biggest reasons for doing Liberated Pixel Cup was very specific: doing distributed free culture projects is hard and has barely happened at all in the way that distributed free software projects have. I wanted to prove that there is a way to do distributed free culture projects and achieve a coherent result.

Background thinking on distributed free culture projects

Free software projects have operated in a distributed fashion for some time. But for whatever reason, there are few examples of successful distributed free culture projects. Sure, there’s the obvious exception of Wikipedia and wikis generally. However, these are very much the exception, and it’s even arguable that while these are free culture projects, they fall more on the data and factual information side of things than on the highly creative and artistically directed side (one might argue that Wikimedia Commons is more on the side of creative works, but this is more of an aggregation of creative works than work to create a coherent and stylistically consistent whole).

Probably the most famous example of artistic free cultural works that comes close to free software works is the Blender Foundation’s films. There’s a lot that comes close here: obviously first of all in that they use a major piece of free software (Blender) as the key part of their pipeline, but more specific to workflow aspects, the Blender Foundation films release their works in much the way that a free software project releases their works: with all the source files used to build the film attached under a free license so that anyone can build, modify, and study the project.

But for the most part, Blender films aren’t developed in a decentralized manner (or haven’t been, at the time of this writing). For Sintel, Big Buck Bunny, and Elephants Dream, the development of the work was done by a small and closely knit team of artists behind closed doors and the entirety of the film was released all at once at the end of the project. This is not to say such a pattern would comparatively disqualify something from being called free software, but it would make it a non-distributed project along the lines of what we call in free software “throwing code over the wall”.

This hasn’t gone unnoticed by members of the community; if you frequent the BlenderArtists forums, every now and then someone brings up the idea of doing a distributed open movie project that anyone can join and contribute to. These projects tend to start with a bunch of enthusiasm but after not too long seem to generally fizzle out (I’m not going to point to any in particular because I don’t want to make anyone feel on the spot, but it’s easy enough to find on your own by doing a search for the terms “open movie” and browsing through the archives).

Why does this happen? Is it simply that distributed free culture projects aren’t possible in the way that free software projects are? I don’t believe this is true, and have been trying to think it through for some time.

One event in particular led to my thinking about how to approach a distributed free culture project: during the making of Sintel there was a period where they asked for some outside help improving the characters. I was excited about this, but thought that with some more careful direction they could have better results. I sent an email to the director Colin Levy and suggested that they do a sprint instead: it would give them the opportunity to pre-allocate a list of tasks so that multiple people could work on multiple things without conflicting with each other’s work (resolving conflicts in 3d modeling is not as easy as it is with plaintext source code), they could set aside a block of time in which to give direct feedback to people working on things, and they could have a short timeline in which to see how well things worked. And the good news is that the Sintel team ran such a sprint (I’m credited in the post as “cwebb”) and it was a “stellar success”. (There was also later an animation sprint, but that went a bit less well… I did an interview with Ton Roosendaal where he explained a bit why he thought that was.) I’m not trying to take credit for any of Sintel’s success here, even in the modeling sprint (I didn’t participate, I just sent that email); that would be stupid. But I was excited to see that, when things are framed properly, something collaborative was possible for a free culture project. This was a small subset of a larger project (which was still mostly done in a throw-the-work-over-the-wall fashion), but maybe some lessons here could be expanded into something larger?

I’ve continued to think about this and whether or not things could be expanded to larger projects. I had a call with a friend of mine, Bassam Kurdali, who is the director of the first open movie project, Elephants Dream, as well as Tube, a new and exciting open movie film project. I picked his brain on how he thought running a largeish distributed free culture project would work.

We came up with the following points:

  • For one thing, the idea that you just throw open a project and everyone shows up and just builds their piece of the universe and bam, your world is created! … is wrong. In fact, it’s even wrong for software: most free software projects that go far might have a lot of contributors with a large and varied set of interests, but there tends to be one or just a few people setting out a very specific set of “project vision” for the software. If you don’t have this, the software heads in all sorts of conflicting directions and falls apart under its own weight and lack of cohesion.
  • This is even more so a problem for free culture. When you develop an animated film, a game, or whatever, you need clear stylistic direction. If you don’t have that, you end up with a ton of pieces that you can try to mash together but don’t really look like they belong together at all. Everyone has a different idea of where the project should go and a different preference in look, and eventually you hit creative difficulties, and the piece falls apart. But is there a way to get past this?
  • There are probably two ways to get past this. One is to have a strong “artistic director” of the project who coordinates the entire style of the project from start to finish. And another is to borrow an idea from programming and set up a “style guide”. Bassam pointed out that Python programmers are more than happy to conform to PEP 8, the style guide that dictates the general look and feel of code. And of course, there are plenty of other conventions in Python code imposed by the language’s design itself. Could the same system work for artists?

Relatedly, around this time, Jonathan Palecek (CC software engineer, and author of the Liberated Pixel Cup Demo) and I were speaking about an engine he was building and looking around on OpenGameArt for resources to use when prototyping the engine. The trouble we found is exactly the problem laid out above: there are tons of wonderful resources on OpenGameArt. But the problem is that you simply can’t use most of them together. There’s simply too much variance between the items.

… and then it became obvious. This is the perfect space to try to prove distributed free culture projects are possible with enough direction and preparation…

Why we did what we did

So that’s the thinking that led up to the point of Liberated Pixel Cup. I approached OpenGameArt and the Free Software Foundation about the idea of a competition that bridged both free software and free culture with a specifically laid out style. Once they were both on board (Mozilla would join later) and we knew our general direction, it was time to figure out: how exactly would we structure the contest and the style guide specifically?

For one thing, we knew what building a style guide generally would look like, because there was a wonderful project called Tango which had already done the same for vectorish application icons. They seemed to do it exactly right with their existing style guide coupled with a base icon library. So we went this route: we built a style guide, we commissioned a series of base assets, and to show that the Liberated Pixel Cup style dream was in fact real, we even built an interactive demo that you can play with.

With that in mind, we had to make a decision on what style we wanted to shoot for. We went with a raster-based top-down, orthographic view tiling at 32×32. But why this specific style? It was no accident; we had these very specific goals in mind:

  • We wanted it to be adaptable to a wide variety of types of gameplay. In the “16-bit era” of game consoles, there were a wide variety of games that used this perspective: adventure games, RPGs, real-time strategy games, farming simulations, civilization simulations, and so on. We knew that it was flexible, and flexibility was critical to a design that could be useful for a variety of games for a long time to come.
  • We wanted it to be extendable to a variety of thematic genres. Even though the base assets that we commissioned for Liberated Pixel Cup had some specific thematic elements to them (vaguely victorianesque interiors, some traditional fantasy game tropes), there was nothing specific about the elements described in the style guide towards one thematic genre or another. (This was proven true in the art phase of the competition; we got some nice looking science fiction themed submissions.)
  • We wanted the style to be easy to collaborate on without a persistent “art director”. The style we were going for was fairly well understood thanks to a long history of games with similar (though not exactly the same) styling. The orthographic style and the abstracted and highly stylized proportions of the characters we had commissioned were chosen with clear intentions; for example, we had considered a character style with more realistic proportions, but such a style would require either much more intensive art direction (a resource we could not provide with the kind of contest we wanted to run) or much longer and more specific descriptions and layouts around the characters. Similarly, there are ways to do tiled games that have a more “faked” but high definition sense of perspective (even though on a flat grid you can’t have real perspective lines by definition) but this would again require a lot more hand-holding than just “keep it in an orthographic projection.”
  • We wanted artwork that looked beautiful but could have a low barrier to entry for a variety of artists. The art style we chose was intended to have specific but easy to understand rules, but ones where fairly new artists could still accomplish nice things that seemed to match, and advanced artists could use their full skills. This affected decisions like “just what kind of texturing/shading are we going to push for?”
  • Despite the above, we wanted something that looked nice and, even though borrowing from a long history of games with similar and well understood styling, had distinctive elements. To this end, we laid out the base directions of the art style first, then commissioned a base set of artwork, then developed a clear style guide based on the existing set of work we had. Decisions like the general “camera angle” we wanted, finalized shading directions, how to handle outlines (which are colored instead of black and white in our style), were made based on the artwork produced by our commissioned work. And we do feel that the result was something that was easy to build things off of but had a clear and distinct “Liberated Pixel Cup look” to it.

Knowing all this, we split out the contest into three phases. First, we had a commission / style guide phase. Bart Kelsey of OpenGameArt and I wrote out the core aspects of the style we knew we wanted and then brought on Lanea Zimmerman as our lead artist to work out initial tiles then gradually brought in four other artists (Stephen Challener developing the base characters, Charles Sanchez doing monsters, Manuel Riecke doing character hair/accessories, and Daniel Armstrong building us a bonus castle set). With the artists coordinating we extrapolated what we had into a style guide.

At that point we were able to move into part two, the art phase, which recently wrapped up. And I’m happy to say, it was a success! We got a large variety of entries and, for the most part, they look like they fit beautifully with the style we set out. And it’s hard not to feel validated: we’ve seen in Liberated Pixel Cup that it’s possible to get a large, distributed set of people to collaborate on something big and make things that actually can work together. And now we’re finally moving on to stage three of the project, the coding phase; hopefully seeing contributions from the art contest being used in games will make that achievement seem more real.

In conclusion

So one of the agendas behind Liberated Pixel Cup was proving that distributed, collaborative free culture projects can be done if approached right. And I believe we showed that they can in this case: with enough forethought, careful planning, and the kind of conditions that make artists want to create a cohesive set of works.

Can the approach we took with Liberated Pixel Cup work across all free culture projects? I don’t think it’s quite that simple; we made some very specific choices as to the decisions we wanted to take given the kind of project we intended to run (eg, the choice for non-realistically proportioned characters being done because we knew with everyone working separately we couldn’t have a careful “art director” type person… but if we were working on a film, we probably would go for something with a style that might need more negotiation). But what I do think is that if you have people who know their field well and decide to take the forethought to give a project clear planning and direction, a “distributed, collaborative free culture project” is more than possible. Creating something that’s cohesive takes more work than just “throwing open the doors and letting everyone toss in whatever they feel like”, but if you take the time to plan things out, you really can get people collaborating on something wonderful and even have it fit together beautifully. And so, I hope we see more of such projects in the future!

Dissecting the Liberated Pixel Cup Demo

lunpa, July 10th, 2012


The Liberated Pixel Cup Demo (LPCD) was written by yours truly over the course of two weeks, prior to the art phase of the Liberated Pixel Cup contest. The demo had several intended purposes. First, to test the usability of the base tile set for building levels. Second, to show character sprites interacting with environments and to demonstrate animations. And third, to inspire. As there has been some interest in the construction of the demo, this article is an overview of how the demo was constructed. Before I go into any detail, it is worth noting that this demo was put together without really knowing how much time would be available to work on it. Because of this, the demo progressed through several stages – each playable and a plausible endpoint – before arriving at what it is today. This is reflected in a few places in the source code, either in code that was written with the best of intentions or in code that was written to be the foundation for something that never came to be.


Complex JavaScript programs get messy pretty fast. This is largely because it is impractical to split a JavaScript program across several files. Lack of namespaces and overly verbose language features (like Object.__defineGetter__) probably don’t help the matter. There is a ridiculous amount of information on how to organize your code and keep sane. I’ve yet to fall madly in love with any of these solutions.

Here’s what I usually do:

I start by defining a dummy module using the object notation (I call this the ‘header’). Then I monkey patch all of my functions into it. As I add function definitions and the like, I update the module to reflect the expected structure. Function stubs have comments next to them outlining the expected arguments. I don’t use a closure to fake a private scope for the module. Instead, the module is organized to keep calls, callbacks, and different sorts of data separate. It makes testing your code much easier. If you want to scare people from touching something, throw some underscores in front of its name.
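As a rough illustration of the "header plus monkey patching" approach described above, here is a minimal sketch. The module name GAME and every function in it are hypothetical stand-ins, not names taken from the demo's source.

```javascript
// The "header": a dummy module that documents the expected structure.
// All names here are illustrative assumptions, not the demo's real API.
var GAME = {
    // calls
    "move_actor": function (actor, x, y) {}, // actor object, world x, world y
    "describe": function (actor) {},         // actor object; returns a string

    // data
    "ACTORS": {
        "registry": {}                       // actor name -> actor object
    },

    // underscores signal "please don't touch"
    "__next_id": 0
};

// Later, possibly in another file appended after the header,
// the stubs are replaced with real implementations.
GAME.move_actor = function (actor, x, y) {
    actor.x = x;
    actor.y = y;
    return actor;
};

GAME.describe = function (actor) {
    return actor.name + " at " + actor.x + "," + actor.y;
};
```

Because nothing is hidden in a closure, every function can be called directly from a test or a debugger console, which is the testing benefit mentioned above.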

The program itself is split into several files, grouping code more or less by purpose. Header.js contains the module object definition, and the starting point of execution for the game engine. All of the remaining files are appended to the end of this file (the order doesn’t really matter). Assembly of the program (as well as minification) is automated via a make file.

The advantages of using this organizational scheme are:

  • The header provides a simple reference and easy visualization of the program’s structure.
  • Doesn’t do anything clever with language features to make it work.
  • Looks cleaner to me.

The only disadvantage I can think of is that the header must be maintained as the program is written. It isn’t easy to tell if the header is maintained well, since the program can still run if function stubs are missing or some of the variables aren’t defined.


Levels are built using the program Tiled, with the level data exported to json. The levels are tiled on a 32×32 grid, which turned out to be a mistake. If I wrote this again, I would go with a 16×16 grid instead, to simplify the conversion of world coordinates to and from screen space coordinates. This is explained further in the section about the physics engine.

Tile boards are rendered upon two html5 canvas elements; an iframe between the two is where the actors are drawn. Level data may contain more than two layers, but will be automatically flattened into two layers when the level is rendered. Actors are represented with div elements; css is used to crop and position them. For actors inheriting from VisibleKind, Z-index is used to do depth sorting, which is why the actors are in an iframe. Depth sorting behavior is done on the actor’s _dirty method, which may be overridden.

Art assets are fetched in the background by creating a new Image object in JavaScript. The onload callback is used to inform the engine when the resource is ready for use. When the json file for a level is being parsed, the number of pending downloads is incremented when an image download is started, and decremented on its callback. This allows the program to wait for all of the images to finish downloading before drawing the tile boards. A similar technique is used with art assets for actors, but this is unnecessary because the asset is displayed using css. This is a throwback from when a third canvas element was used to draw the actors.
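The pending-download counter can be sketched roughly as follows. This is not the demo's loader: createLoader, imageDownload, and the done() call are hypothetical names, and the download function is passed in so the counting logic can be exercised outside a browser.

```javascript
// Hypothetical loader illustrating the counting technique only.
// startDownload(url, callback) must call callback when the resource is ready.
function createLoader(startDownload, onAllReady) {
    var pending = 0;              // downloads started but not yet finished
    var finishedQueueing = false; // true once the level data is fully parsed

    function maybeFinish() {
        if (finishedQueueing && pending === 0) {
            onAllReady();
        }
    }

    return {
        // Start one download and count it as pending.
        fetch: function (url) {
            pending += 1;
            startDownload(url, function () {
                pending -= 1;
                maybeFinish();
            });
        },
        // Call after parsing, so an early callback cannot fire onAllReady too soon.
        done: function () {
            finishedQueueing = true;
            maybeFinish();
        }
    };
}

// In a browser, startDownload could wrap the Image object like this.
function imageDownload(url, callback) {
    var img = new Image();
    img.onload = callback;
    img.src = url;
}
```

The explicit done() step is a design choice of this sketch: without it, the counter could briefly reach zero while the level file is still being parsed.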

The redraw event is scheduled when the focused character’s coordinates change (it might still be when any actor’s coordinates change, which would be a throwback from when all actors were drawn on a canvas). Because a bunch of functions may request a redraw at once (some might do this multiple times), the first request is honored and the rest are ignored. This simplifies things quite a bit: because the request itself is inexpensive, it can be used when in doubt without worrying about a significant performance cost. I’m thinking of generalizing this for another JavaScript game engine I am planning, where there are various engine functions that would make sense to schedule like this. I’m thinking in that version, I’ll have the scheduling function be named “please”. Eg, please(“redraw scene”), etc.
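One way such a "first request wins" scheduler might look is sketched below. Only the name please comes from the text; createScheduler, register, and the injected defer function are assumptions made for illustration.

```javascript
// Hypothetical scheduler: repeated requests collapse into a single deferred run.
// defer(fn) decides when fn runs, e.g. function (fn) { setTimeout(fn, 0); }.
function createScheduler(defer) {
    var handlers = {}; // request name -> function to run
    var queued = {};   // request name -> true while a run is pending

    function please(name) {
        if (queued[name]) {
            return false;         // already scheduled; ignore the repeat
        }
        queued[name] = true;
        defer(function () {
            queued[name] = false; // allow fresh requests once this run starts
            handlers[name]();
        });
        return true;
    }

    // Associate a request name with the work it should trigger.
    please.register = function (name, fn) {
        handlers[name] = fn;
    };

    return please;
}
```

In a browser this could be set up with createScheduler(function (fn) { setTimeout(fn, 0); }), after which any number of please("redraw scene") calls in the same tick would result in one redraw.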


Physics information is stored on a 16×16 conceptual grid. Originally, this was to be 32×32, but proved to be a mistake: in some cases, this would prevent the character from walking right up to the edge of something. Because many hours of work already spent building levels would be lost by making the whole engine use a 16×16 grid, I opted for a flimsy workaround. Physics info for tiles is now one of A, N, NE, E, SE, S, SW, W, NW; which describes the wall coverage in a given graphical tile’s conceptual subtiles.
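To make the workaround concrete, here is a sketch of how the per-tile codes could expand into the finer physics grid. The meaning assigned to each code below (which of a tile's four subtiles are walls, with A meaning all of them) is an assumption; the text does not spell out the mapping, and COVERAGE, makeGrid, and applyTile are hypothetical names.

```javascript
// ASSUMED meaning of each code: which of a tile's four 16x16 subtiles are walls.
// Order is [top-left, top-right, bottom-left, bottom-right].
var COVERAGE = {
    "A":  [true,  true,  true,  true],
    "N":  [true,  true,  false, false],
    "NE": [false, true,  false, false],
    "E":  [false, true,  false, true],
    "SE": [false, false, false, true],
    "S":  [false, false, true,  true],
    "SW": [false, false, true,  false],
    "W":  [true,  false, true,  false],
    "NW": [true,  false, false, false]
};

// Build an empty physics grid with twice the resolution of the tile grid.
function makeGrid(tilesWide, tilesHigh) {
    var grid = [], row, x, y;
    for (y = 0; y < tilesHigh * 2; y += 1) {
        row = [];
        for (x = 0; x < tilesWide * 2; x += 1) {
            row.push(false);
        }
        grid.push(row);
    }
    return grid;
}

// Mark the subtiles covered by the wall code of graphical tile (tx, ty).
function applyTile(grid, tx, ty, code) {
    var cover = COVERAGE[code], i, sx, sy;
    for (i = 0; i < 4; i += 1) {
        sx = tx * 2 + (i % 2);
        sy = ty * 2 + Math.floor(i / 2);
        grid[sy][sx] = grid[sy][sx] || cover[i];
    }
}
```

With a table like this, level data can stay on the 32×32 grid while collision checks run against 16×16 cells.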

The physics grid is populated during level load. Several helper functions exist to check if a given coordinate is blocked by a wall, an actor, or a warp point.

Actors that prototype AnimateKind (which also happens to be the actors which can be the focused player) have a _move_to function that initiates the walk cycle. The walk cycle function is probably the most complex singular part of the game engine. This is in part due to the fact that the character’s coordinates are floating point values, not array indices. A good chunk of this code is used to make sure the character doesn’t appear to be walking through walls when cutting around a corner; this had the added side effect of the movement trajectory appearing to be adaptive to obstructions despite the lack of a real path finding algorithm. Part of the complexity of this function also comes from the fact that it is possible to call events on other actors when colliding into them.

The player character is an actor. Any actor that prototypes AnimateKind can be focused as the main character. This is used in the demo a bit, allowing you to play as Alice (by default), Bobby Tables, and a secret character. Using a JavaScript debugger and a little know-how, you can take control of many other actors; such as any of the students or any of the monsters.


Each entity in gameplay is represented by a JavaScript object that contains data describing the actor, and event handler functions. Actor objects are stored in LPCD.ACTORS.registry, and there exist several helper functions to manage them. If you use the api functions to create your actors, this process is entirely automatic.

There is an inheritance chain used in creating an actor, allowing different engine features to be implemented on the actors themselves while keeping the code isolated. This means that the code for things like human characters, monsters, treasure boxes, and so on is responsible for rendering itself in the graphics engine. These actor type constructors can be found on the header object in LPCD.ACTORS, and are defined in the file actor_model.js. For the most part, these constructors are fairly concise, with the exceptions of VisibleKind and AnimateKind.

All actors inherit from AbstractKind. The most important aspect of this actor is the variable “_binding”, which determines if an actor is cleared from memory or not when a new level is loaded. This allows focused actors to travel from level to level. There was going to be a feature for persistent actors, allowing for things like items and treasure, though this was never implemented. Thus, PersistentKind exists, though I don’t believe anything actually uses it.

VisibleKind inherits from AbstractKind and is used to provide a presence for the actor in the graphics engine by creating a div element and inserting it into the iframe used to display actors. This object also provides world coordinates (since they’re needed for drawing) to the actor. This object does not make an actor responsive to collision detection.

ObjectKind inherits from VisibleKind, and is used for inanimate objects. It provides the _blocking function, so that the actor can be used in the physics system.

AnimateKind inherits from ObjectKind. It provides the _gain_input_focus function, directional facing information, a _look_at function, and the walk cycle via the _move_to function. This does not implement any animation features, but is simply for animate objects. CritterKind and HumonKind both inherit from AnimateKind and implement animation specific features.
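The shape of this chain can be sketched with plain prototype inheritance. The constructor and method names mirror the ones mentioned above, but the bodies are illustrative assumptions with all DOM work left out; the demo's actual constructors may be wired together differently.

```javascript
// Stripped-down stand-ins for the actor constructors. Bodies are assumptions.
function AbstractKind() {
    // Assumed placeholder value; per the text, _binding decides whether
    // the actor is cleared when a new level loads.
    this._binding = "level";
}

function VisibleKind() {
    AbstractKind.call(this);
    this.x = 0; // world coordinates, needed for drawing
    this.y = 0;
}
VisibleKind.prototype = Object.create(AbstractKind.prototype);
VisibleKind.prototype.constructor = VisibleKind;

function ObjectKind() {
    VisibleKind.call(this);
}
ObjectKind.prototype = Object.create(VisibleKind.prototype);
ObjectKind.prototype.constructor = ObjectKind;
// Lets the physics system ask whether this actor occupies a coordinate.
ObjectKind.prototype._blocking = function (x, y) {
    return this.x === x && this.y === y;
};

function AnimateKind() {
    ObjectKind.call(this);
}
AnimateKind.prototype = Object.create(ObjectKind.prototype);
AnimateKind.prototype.constructor = AnimateKind;
// Placeholder: the real walk cycle is far more involved than a teleport.
AnimateKind.prototype._move_to = function (x, y) {
    this.x = x;
    this.y = y;
};
```

Each level adds exactly one capability, so a VisibleKind actor is drawn but never collides, while anything from ObjectKind down takes part in the physics system.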


Level scripts are found in the dynamics folder, and have the file name of the level they correspond to + “.js”. So for example, the starting level’s file name is “start1.json” (level data is found in the levels folder; I do not recommend viewing it via web browser), and the corresponding dynamics script is “start1.json.js”. To make it easy to clean things up when the level changes, an iframe is created when the level is loaded, and the level dynamics script is loaded within that iframe. It is given access to LPCD.API via a global variable named API, but is left blind to the rest of the engine. This allows us to dispose of the script easily by deleting the iframe.

An amusing side effect of this is if you define within a dynamics script an actor that inherits from AnimateKind, and change your input focus to this new actor and leave the level; the object for the actor remains, but none of its member functions may be called anymore. However, anything in the prototype chain still works fine provided that it was defined in the engine itself. Because of this, characters.js is used to define game-specific characters and useful objects outside of the levels and instance them from the level dynamics script via the API.instance function. Because the code was defined outside of the level, the object remains functional after the level has been flushed.

Conveniently, this behavior is consistent between Firefox and Chrome. If this behavior for scripts in iframes is standardized, I imagine this was never an intended use case.


Overall, I’m quite pleased with how the demo turned out. There are some rough spots where it isn’t clear where things are happening (eg, flushing the level actors by changing the innerHTML property of a DOM element), which I had forgotten about prior to writing this article. Despite that, I think the code is pretty usable as a game engine, and should still be fairly easy to extend. Hopefully this article serves as a guide for others to tinker with the engine, to use the code in their own projects, or even to study in building something entirely new.

Setting kernel clocksource to HPET solves mysterious performance issues

nkinkade, April 10th, 2012

For quite a long time the server which runs this very site has had some performance issues. This same server runs one or two instances of Mediawiki, and I have always just presumed that Mediawiki was the cause of the problems. I really didn’t give it too much more thought, since the issues weren’t causing many horrible user-facing performance issues. The server sort of hobbled along in the background, fairly loaded, but still managing to serve up pages decently. However, the problem most seriously manifested itself for me personally when working in a remote shell. Sometimes I’d go to save a file and the operation would take 10 or 15 seconds to complete. I ignored this, too, for some time, but it reached a point where I couldn’t take it any longer.

I watched the output of top for a while, sorting on various metrics, and noticed that flush and kjournald were pegged at the top when sorted by process state, both being in a disk-wait (“D”) state. This didn’t make any sense to me, since the machine doesn’t host any really busy sites and should have plenty of memory to handle what it has. I decided to do a web search for “linux flush kswapd” to see what it would turn up. As it turns out, the very first article returned in the search ended up indirectly shedding light on this issue, even though it turned out to be mostly unrelated to my own problem. What I did take away from it was a utility that I didn’t previously know about: perf, and specifically perf top -a.

What I discovered upon running this command was that the kernel was spending a huge amount of time (60% to 80%) running the function acpi_pm_read. A little investigation on this tracked it back to the kernel clocksource being set to acpi_pm. The current, and available, clocksource(s) can be discovered by running the following, respectively:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource

I then went to another machine, also running Mediawiki, but one not having any performance issues, and found its clocksource to be hpet. After a little more research, some experimenting, and a few reboots, I found that adding the kernel parameter hpet=force to the variable GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then running update-grub got the system using hpet as the clocksource. And this seems to have totally cleared up the issues on the machine. Processor usage is way down, memory usage is way down, processes in the disk-wait state are down, and our Mediawiki site is returning pages much faster than it ever has.

For reference, here are a few machine specifications which might be useful for others investigating this:

  • OS: Debian Squeeze
  • Processors: 2 x AMD Opteron 246

Converting a remote, headless server to RAID1

nkinkade, April 6th, 2012

We have a particular server which has been running very well for the past few years, but I have had a certain amount of low-level anxiety that the hard disk in the machine might fail, causing the sites it hosts to go down. We have backups, of course, but restoring a machine from backups would take hours, if for no other reason than transferring gigabytes of data to the restored disk. So I had our hosting company add a second hard disk to the machine so that I could attempt to convert it to boot to a RAID1 array. Playing around with how a system boots on a remote, headless machine to which you have no console access is a little nerve-racking, because it’s very easy to make a small mistake and have the machine fail to boot. You are then at the mercy of a data center technician, and the machine may be down for some time.

There are several documents on the Web about how to go about doing this, but I found one in particular to be the most useful. I pretty much followed the instructions in that document line for line, until it broke down because that person was on a Debian Etch system running GRUB version 1, but our system is running Debian Squeeze with GRUB 2 (1.98). The steps I followed differ from those in that document starting on page two, around half way down, where the person says “Now up to the GRUB boot loader.” Here are the steps I used to configure GRUB 2:

The instructions below assume the following. Things may be different on your system, so you will have to change device names to match those on your system:

Current system:

/dev/sda1 = /boot
/dev/sda5 = swap
/dev/sda6 = /

New RAID1 arrays:

/dev/md0 = /boot
/dev/md1 = swap
/dev/md2 = /

Create a file called /etc/grub.d/06_raid with the following contents. Be sure to make the file executable:

#!/bin/sh
exec tail -n +3 $0
# A custom file for CC to get system to boot to RAID1 array

menuentry "Debian GNU/Linux, with Linux 2.6.32-5-amd64, RAID1" {
        insmod raid
        insmod mdraid
        insmod ext2
        set root='(md0)'
        search --no-floppy --fs-uuid --set 075a7fbf-eed4-4903-8988-983079658873
        echo 'Loading Linux 2.6.32-5-amd64 with RAID1 ...'
        linux /vmlinuz-2.6.32-5-amd64 root=/dev/md2 ro quiet panic=5
        echo 'Loading initial ramdisk ...'
        initrd /initrd.img-2.6.32-5-amd64
}

Of course, you are going to need to change the UUID in the search line, and also the kernel and initrd image names to match those on your system, and probably some other details.

We should tell GRUB to fall back on a known good boot scenario, just in case something doesn’t work. I will say up front that this completely saved my ass, since it took me numerous reboots before I found a working configuration, and if it weren’t for this fall-back mechanism the machine would have stuck on a failed boot screen in GRUB. I found some instructions about how to go about doing this in GRUB 2. Create a file named /etc/grub.d/01_fallback and add the following contents. Be sure to make the file executable:

#! /bin/sh -e

if [ ! "x${GRUB_DEFAULT}" = "xsaved" ] ; then
  if [ "x${GRUB_FALLBACK}" = "x" ] ; then
    export GRUB_FALLBACK=""
    GRUB_FALLBACK=$( ls /boot | grep -c 'initrd.img' )
  fi
  echo "fallback set to menuentry=${GRUB_FALLBACK}" >&2

  cat << EOF
  set fallback="${GRUB_FALLBACK}"
EOF
fi


Then add the following line to /etc/default/grub:

export GRUB_FALLBACK="1"

And while you are in there, also uncomment or add the following line, unless you plan to be using UUIDs everywhere. I'm not sure if this is necessary, but since I was mostly using device names (e.g. /dev/md0) everywhere, I figured it couldn't hurt.


Update the GRUB configuration by executing the following command:

# update-grub

Make sure GRUB is installed and configured in the MBR of each of our disks:

# grub-install /dev/sda
# grub-install /dev/sdb

Now we need to update the initramfs image so that it knows about our RAID setup. You could do this by simply running update-initramfs -u, but I found that running the following command did this for me, and perhaps some other relevant things(?), and it also verified that my mdadm settings were where they needed to be:

# dpkg-reconfigure mdadm

I used rsync, instead of cp, to move the data from the running system to the degraded arrays like so:

# rsync -ax /boot/ /mnt/md0
# rsync -ax / /mnt/md2

When rsync finishes moving / to /mnt/md2, edit the following files, changing any references to the current disk to our new mdX devices:

# vi /mnt/md2/etc/fstab
# vi /mnt/md2/etc/mtab

Warning: do not edit /etc/fstab and /etc/mtab on the currently running system, as the instructions would seem to indicate, else if the new RAID configuration fails and the machine has to fall back to the current system, then it won't be able to boot that either.
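If you would rather not make those edits by hand, the device renaming can be sketched in Python. This is only an illustration: the function name is made up, the mapping simply mirrors the example layout above (/dev/sda1, /dev/sda5, /dev/sda6), and it only rewrites the first field of each line. Entries that use UUID= or LABEL= are left untouched and would still need attention.

```python
# Mapping taken from the example layout in this post; adjust to your system.
DEVICE_MAP = {
    "/dev/sda1": "/dev/md0",  # /boot
    "/dev/sda5": "/dev/md1",  # swap
    "/dev/sda6": "/dev/md2",  # /
}


def remap_devices(text, device_map=DEVICE_MAP):
    """Rewrite the device field of fstab/mtab style text (hypothetical helper)."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        # Only the first whitespace-separated field names the device.
        if fields and fields[0] in device_map:
            line = line.replace(fields[0], device_map[fields[0]], 1)
        out.append(line)
    return "\n".join(out) + "\n"
```

You would read /mnt/md2/etc/fstab, pass its contents through remap_devices, review the result, and write it back to the same file under /mnt/md2, never to /etc/fstab on the running system.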

I believe that was it, though it's possible I may have forgotten to add a step here. Don't run the following command unless you can afford to possibly have the machine down for a while. This is a good time to also make sure you have good backups. But if you're ready, then run:

# shutdown -rf now

Now cross your fingers and hope the system comes back up on the RAID1 arrays, or at all.

If the machine comes back up on the RAID1 arrays, then you can now add the original disk to the new arrays with commands like the following:

# mdadm --manage /dev/md0 --add /dev/sda1
# mdadm --manage /dev/md1 --add /dev/sda5
# mdadm --manage /dev/md2 --add /dev/sda6

The arrays will automatically rebuild themselves, and you can check the status by running:

# cat /proc/mdstat
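Since you cannot watch a console on a headless machine, it can be handy to check the rebuild from a script. Below is a rough Python sketch that parses text in the usual /proc/mdstat layout and reports which arrays are degraded or rebuilding. The function name, the returned keys, and the sample numbers are all made up for illustration, and the parser only handles the simple active-array case.

```python
import re

# "md2 : active raid1 sda6[2] sdb6[1]"
ARRAY_RE = re.compile(r"^(md\d+) : (\S+) (\S+) (.+)$")
# "[2/1] [_U]" means 2 devices expected, 1 currently up.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return {array_name: info} for text in /proc/mdstat format (a sketch)."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = {
                "state": header.group(2),
                "level": header.group(3),
                "members": header.group(4).split(),
                "degraded": False,
                "rebuilding": False,
            }
            arrays[header.group(1)] = current
            continue
        if current is None:
            continue
        status = STATUS_RE.search(line)
        if status:
            current["degraded"] = int(status.group(2)) < int(status.group(1))
        if "recovery" in line or "resync" in line:
            current["rebuilding"] = True
    return arrays


# Illustrative sample only; the block counts and speeds are invented.
SAMPLE = """\
Personalities : [raid1]
md2 : active raid1 sda6[2] sdb6[1]
      97659008 blocks [2/1] [_U]
      [=>...................]  recovery =  8.1% (7912704/97659008) finish=25.4min speed=58800K/sec

md0 : active raid1 sda1[0] sdb1[1]
      248896 blocks [2/2] [UU]

unused devices: <none>
"""

status = parse_mdstat(SAMPLE)
```

On a real system you would pass in the contents of /proc/mdstat instead of SAMPLE and wait until no array is marked as degraded or rebuilding.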

cc.legalerrata: errata annotations without republishing licenses

lunpa, February 22nd, 2012

Screenshot of the errata tool.

The HTML for the legalcode pages cannot be changed once they are published. One reason is that we provide SHA-1 hashes of them so that they may be redistributed. It is also a matter of credibility: the license you’ve applied to your work today must still be the same one tomorrow. However, sometimes there are errors. Corrections need to be accessible, yet they are usually too small to merit releasing a new version of the license.

The solution to this problem so far has been an errata page on our wiki. But that isn’t apparent from just looking at the license, and the errata page is disorganized and confusing to read.

I’m proud to say that we’re currently testing a tool I wrote last week that will fix this problem: cc.legalerrata.  This was originally intended to be implemented with the upcoming 4.0 licenses, but it turns out the 3.0 licenses have a hook for a tool like this already in place.  The 3.0 licenses include a script at the address, which was blank until a few days ago.  Now the script is used to bootstrap an application in the page.  Once bootstrapped, the tool queries the server for appropriate errata; if errata is returned, a toolbar appears and the user is presented with the option to apply the errata to the text of the page.  Additionally, the changes made can be highlighted via the toolbar.

Currently, this tool is disabled on the live site while we verify that the machine readable errata is actually correct.  You can however try out the tool while we test it, via our staging site.  For example, you can try our BY-SA legalcode on staging here.

If you’re interested, you can read the original proposal for the tool. There are two versions of the tool described there, and some pretty UI diagrams that I drew of both versions. Here is one of my diagrams:

UI mockup for the errata tool.

The actual implementation of the tool ended up being much simpler than the proposed one. JSON is still used for storing the machine-readable errata, but rather than a convoluted scheme of managing text diffs, the machine-readable errata is a collection of entries that each contain a CSS selector, optional attribute overrides, and an HTML fragment. jQuery uses the CSS selector to select an element in the DOM; the element’s innerHTML is then overwritten with the HTML fragment. The fragment itself is the original innerHTML of the node, but with subtractions marked by <del></del> tags and additions marked by <ins></ins> tags. These files are currently maintained by hand, with no plans to write a frontend for them. The errata tool takes a snapshot of the page’s HTML before and after overriding it, so that you can efficiently toggle between view modes. CSS is used in both errata modes, either to make the text look clean (subtractions hidden) or to accomplish highlighting.
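To make the entry format concrete, here is a minimal Python sketch (the tool itself runs in the browser with jQuery). The field names selector, attributes, and html, as well as the sample text, are assumptions for illustration and not the tool's actual schema. It shows how one fragment carrying <del> and <ins> markers can yield both the original and the corrected text.

```python
import json
import re

# Hypothetical errata file: field names and content are illustrative only.
ERRATA_JSON = """
[
  {
    "selector": "#example-section p",
    "attributes": {"class": "errata-applied"},
    "html": "The quick <del>brwon</del><ins>brown</ins> fox."
  }
]
"""

DEL_RE = re.compile(r"<del>(.*?)</del>", re.DOTALL)
INS_RE = re.compile(r"<ins>(.*?)</ins>", re.DOTALL)


def original_text(fragment):
    """Text as published: drop insertions, unwrap deletions."""
    without_ins = INS_RE.sub("", fragment)
    return DEL_RE.sub(r"\1", without_ins)


def corrected_text(fragment):
    """Text with errata applied: drop deletions, unwrap insertions."""
    without_del = DEL_RE.sub("", fragment)
    return INS_RE.sub(r"\1", without_del)


entries = json.loads(ERRATA_JSON)
```

In the browser the same effect comes from CSS (hiding del elements, or styling both del and ins for highlighting) rather than from rewriting the text, so this only illustrates why a single annotated fragment is enough to drive every view mode.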


See also roundup

ml, January 25th, 2012

The Learning Resource Metadata Initiative specification (which Creative Commons is coordinating) has entered its final public commenting period. Please look if you’re at all interested in education metadata and/or how efforts spurred by (which LRMI is) will shape up.

The W3C recently published drafts that ought to be of great interest to the Creative Commons technology community: a family of documents regarding provenance and a guide to using microdata, microformats, and RDFa in HTML. I mentioned these on my personal blog here and here.

Speaking of things mentioned on my personal blog, a couple days ago I posted some analysis of how people are deploying CC related metadata, based on structured data extracted by the Web Data Commons project from a sample of the Common Crawl corpus. Earlier this month I posted a marginally technical explanation of using CSS text overlays to provide attribution and a brief historical overview of ‘open hardware licensing’, something which the CC technology team hasn’t been involved in, but is vaguely labs-ish, and needs deep technical attention.

Other things needing deep technical attention: how CC addresses Digital Restrictions Management in version 4.0 of its licenses is being discussed. We don’t know enough about the technical details of various restricted systems (see last sentence) that CC licensed works are being distributed on/to/with every day, and ought to. Another needs-technical-attention issue is ‘functional content’ for example in games and 3D printing. And we’re still looking for a new CTO.

Finally, Jonathan Rees just posted on how to apply CC0 to an ontology. You should subscribe to Jonathan’s blog as almost every post is of great interest if you’ve read this far.

Addendum: It seems remiss to not mention SOPA, so I’m adding it. Thanks to the technology community for rising up against this bad policy. CC promoted the campaign on its main website through banners and a number of blog posts. Don’t forget that SOPA/PIPA may well rise again, the so-called Research Works Act is very different but is motivated by the same thinking, and ACTA threatens globally. Keep it up! In the long term, is not building a healthy commons (and thus technology needed to facilitate building a healthy commons) a big part of the solution? On that, see yet another post on my personal blog…


Creative Commons: Using Provenance in the Context of Sharing Creative Works

ml, October 3rd, 2011

I provided a brief non-technical writeup on Creative Commons and provenance for the W3C Provenance Working Group‘s Connection Task Force documenting “Communities Addressing Important Issues in Provenance”.

See the writeup on the Provenance WG wiki (please suggest edits in comments below), current version follows.

Creative Commons

Creative Commons (CC) provides licenses and public domain tools that can be used for any kind of creative works like texts, images, websites, or other media, as well as databases. CC tools are well known and used, especially in online publications. Each CC license and public domain tool is identified by a unique URL, allowing proper identification and reference of these as part of a work’s provenance information.

Additionally, Creative Commons provides a vocabulary to describe its tools and works licensed or marked with those tools in a machine interpretable way: The Creative Commons Rights Expression Language (CC REL). CC REL can be expressed in RDF.

The provenance of assertions about a work’s license or public domain status is of great importance for licensors, licensees, curators, and future potential users. All CC licenses legally require certain information (attribution and license notice) be retained; even in the case of its public domain tools, retaining such information is a service to readers and in accordance with research and other norms. To the extent license and related information is not retained or cannot be trusted, users’ ability to find and rely upon freedoms to use such works is degraded. In many cases, the original publication location of a work will disappear (linkrot) or rights information will be removed, either unintentionally (eg template changes) or intentionally (here especially, provenance is important; CC licenses are irrevocable). In the degenerate case, a once CC-licensed work becomes just another orphan work.

The core statements needed are who licensed, dedicated to the public domain, or marked as being in the public domain, which work, and when? Each of these statements has sub-statements, eg the relationship of “who” to rights in the work or knowledge about the work, and exactly what work and at what granularity?
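As a rough illustration of those core statements, the sketch below records who, which work, which license, and when, and serializes them as N-Triples. The class name and fields are hypothetical. cc:license and cc:attributionName are CC REL terms, while using dcterms:date for "when" and attaching everything directly to the work are simplifying assumptions; a fuller provenance model would describe the assertion itself as a separate resource.

```python
from dataclasses import dataclass
from datetime import date

CC_NS = "http://creativecommons.org/ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


@dataclass(frozen=True)
class LicenseAssertion:
    """Hypothetical record of one licensing statement about a work."""

    work_url: str      # which work
    license_url: str   # which license or public domain tool
    asserted_by: str   # who (assumed to contain no quote characters)
    asserted_on: date  # when

    def to_ntriples(self):
        subject = f"<{self.work_url}>"
        return [
            f"{subject} <{CC_NS}license> <{self.license_url}> .",
            f'{subject} <{CC_NS}attributionName> "{self.asserted_by}" .',
            f'{subject} <{DCTERMS_NS}date> '
            f'"{self.asserted_on.isoformat()}"^^<{XSD_DATE}> .',
        ]


# Example values are invented for illustration.
assertion = LicenseAssertion(
    work_url="http://example.org/photo",
    license_url="http://creativecommons.org/licenses/by/3.0/",
    asserted_by="Example Author",
    asserted_on=date(2011, 10, 3),
)
triples = assertion.to_ntriples()
```

Because the license is identified by its URL, the record stays meaningful even if the page that originally carried the notice disappears.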

Provenance information is also necessary for discovering the uses of shared works and building new metrics of cultural relevance, scientific contribution, etc, that do not strictly rely on centralized intermediaries.

Finally, in CC’s broader context, an emphasis on machine-assisted provenance aligns with renewed interest in copyright formalities (eg work registries), puts a work’s relationship to society’s conception of knowledge in a different light (compare intellectual provenance and intellectual property), and is in contrast with technical restrictions which aim to make works less useful to users rather than more.


Converting cc.engine from ZPT to Jinja2 and i18n logical keys to english keys

cwebber, September 2nd, 2011

Some CC-specific background

Right now I’m in the middle of retooling our translation infrastructure. cc.engine and related tools have a long, complex history (dating back, as I understand, to TCL scripts running on AOL server software). The short of it is, CC’s tools have evolved a lot over the years, and sometimes we’re left with systems and tools that require a lot of organization-specific knowledge for historical reasons.

This has been the case with CC’s translation tools. Most of the world these days uses English-key translations. CC used logical-key translations. This means that if you marked up a bit of text for translation, instead of the key being the actual text being translated (such as “About The Licenses”), the key would be an identifier code which mapped to said English string, like “util.View_Legal_Code”. What’s the problem with this? Actually, there are a number of benefits that I’ll miss and that I won’t get into here, but the real problem is that the rest of the translation world mostly doesn’t work this way. We use Transifex (and previously used Pootle) as the tool with which our translators manage our translations. Since these tools don’t expect logical keys, we had to write tools to convert from logical keys to English keys on upload and from English keys back to logical keys on download, plus a whole bunch of other crazy custom tooling.

Another time suck has been that we’d love to be able to just dynamically extract all translations from our Python code and templates, but this also turns out to be impossible with our current setup. A strange edge case in ZPT, involving dynamic attributes in ZPT-translated HTML, means that we have to edit certain translations after they’re extracted, so we can’t rely on an auto-extracted set of translations.

So we’d like to move to a future with no or very few custom translation tools (which means we need English keys) and auto-extraction of translations (which means because of that edge case, no ZPT). Since we need to move to a new templating engine, I decided that we should go with my personal favorite templating engine, Jinja2.

ZPT vs Jinja2

Aside from the issue I’ve described above, briefly I’d like to describe the differences between ZPT and Jinja2, as they’re actually my two favorite templating languages.

ZPT (Zope Page Templates) is an XML-based templating system where your tags and elements actually become part of the templating logic and structure. For example, here’s an example of us looping over a list of license versions on our “helpful” 404 pages for when you type in the wrong license URL (like at

  <h4>Were you looking for:</h4>

  <ul class="archives" id="suggested_licenses">
    <li tal:repeat="license license_versions">
      <a tal:attributes="href license/uri">
        <b tal:content="python: license.title(target_lang)"></b>
      </a>
    </li>
  </ul>

As you can see, the for loop, the attributes, and the content are actually elements of the (X)HTML tree. The neat thing about this is that you can be mostly sure that you won’t end up with tag soup. It’s also pretty neat conceptually.

Now, let’s look at the same segment of code in Jinja2:

  <h4>Were you looking for:</h4>

  <ul class="archives" id="suggested_licenses">
    {% for license in license_versions %}
      <li>
        <a href="{{ license.uri }}">
          <b>{{ license.title(target_lang) }}</b>
        </a>
      </li>
    {% endfor %}
  </ul>

If you’ve used Django’s templating system before, this should look very familiar, because that’s the primary source of inspiration for Jinja2. There are a few things I like about Jinja2 though that Django’s templating system doesn’t have, but the biggest and clearest of these things is the ability to pass arguments into functions, as you can see that we’re doing here with license.title(target_lang). Anyway, it massively beats making a template tag every time you want to pass an argument into a function.

The conversion process

Not too much to say about converting from ZPT to Jinja2. It’s really just a lot of manual work, combing through everything and moving it around.

More interesting might be our translation conversion process. Simply throwing out old translations and re-extracting new ones is not an option: it’s a lot of effort for translators to go through and translate things, and asking them to do it all over again is simply too much to ask and just not going to happen. Pass 1 was to simply get the templates moved over rather than try to convert both the templates and the logical->English key system all at once. (This move away from logical keys has been tried and fizzled before, probably because there are simply too many moving parts across our codebase, so we wanted to take this incrementally, and this seemed like the best place to start.) We’re simply doing stuff like this:

  <h3>{{ cctrans(locale, "deed.retired")|safe }}</h3>

Where cctrans is a simple logical key translation function. Next steps:

  • Create a script that converts all our .po files to eliminate the logical keys and move them to English-only.
  • Write a script to auto-interpolate {{ cctrans() }} calls in templates to {% trans %}{% endtrans %} Jinja2 tags.
  • Do all the many manual changes to all our python codebases.

At that point, we should be able to wrap this all up.
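The first two of those next steps might look roughly like the following Python sketch. Everything here is illustrative: the catalogs are plain dictionaries rather than real .po files, the English and Spanish strings are invented, and the function names are hypothetical. The regular expression only recognizes cctrans calls of the exact shape shown above.

```python
import re

# Hypothetical catalogs: logical key -> English text, logical key -> translation.
ENGLISH = {"deed.retired": "This license has been retired."}
SPANISH = {"deed.retired": "Esta licencia ha sido retirada."}


def to_english_keys(english, translated):
    """Re-key a logical-key catalog by its English source strings."""
    return {
        english[key]: text
        for key, text in translated.items()
        if key in english
    }


# Matches {{ cctrans(locale, "some.key") }} with an optional |safe filter.
CCTRANS_RE = re.compile(
    r'\{\{\s*cctrans\(locale,\s*"([^"]+)"\)(?:\|safe)?\s*\}\}'
)


def rewrite_template(source, english):
    """Replace cctrans() calls with {% trans %} blocks holding English text."""

    def replace(match):
        key = match.group(1)
        if key not in english:
            return match.group(0)  # leave unknown keys for manual review
        return "{% trans %}" + english[key] + "{% endtrans %}"

    return CCTRANS_RE.sub(replace, source)


converted = rewrite_template(
    '<h3>{{ cctrans(locale, "deed.retired")|safe }}</h3>', ENGLISH
)
```

Note that dropping the |safe filter changes how markup inside a translated string is escaped, so strings containing HTML would need individual review.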


Summary of current licensing tools

cwebber, August 31st, 2011

I’ve been considering license integration into a personal project of mine and thoughts of that have spilled over into work. And so we’ve been talking at Creative Commons recently about the current methods for licensing content managed by applications and what the future might be. The purpose of this post is to document the present state of licensing options. (A post on the future of licensing tools may come shortly afterward.)

Present, CC supported tools

To begin with, there are these three CC-hosted options:

  • CC licensing web API — A mostly-RESTful interface for accessing CC licensing information. Some language-specific abstraction layers are provided. Supported and kept up to date. Lacking a JSON layer, which people seem to want. Making a request for every licensing action in your application may be a bit heavy.
  • Partner interface — Oldest thing we support, part of the license engine. Typical case is that you get a popup and when the popup closes the posting webpage can access the info that’s chosen. Still gets you your chooser based interface but on your own site. Internet Archive uses it, among others.
  • LicenseChooser.js — Allows you to get a local chooser user interface by dumping some javascript into your application, and has the advantage of not requiring making any web requests to Creative Commons’ servers. Works, though not recently updated.

All of these have the problem that the chooser of CC licenses is only useful if you want exactly the choices we offer (and specifically the most current version of the licenses we provide). You need to track those changes in the database anyway, which means either you are not keeping track of the version used, or you are, and when we change it you might be in for a surprise.

Going it alone

So instead there are these other routes that sites take:

  • Don’t use any tools and store license choices locally — What Flickr and every other major option does: reproduce everything yourself. In the case of Flickr, the six core licenses at version 2.0. In YouTube, just one license (CC BY 3.0). That works when you have one service, when you know what you want, and what you want your users to use. It doesn’t work well when you want people to install a local copy and you don’t know what they want to use.
  • Allow any license you want as long as it fits site policy — but you don’t facilitate it, and it sits kind of outside the workflow of the main CMS you’re using… wiki sites are an example of this, but they usually have a mechanism for adding a license to the footer of uploaded media. The licenses are handled by wiki templates, and anyone can make a template for any license they choose.

None of those are really useful for software you expect other people to install, where you want to give some assistance to the administrators who are installing it, or where you want the administrator to offer users a choice or choices relevant to that particular site.

The liblicense experiment

This brings us to another solution that CC has pursued:

  • liblicense — Packages all licenses we provide and gives an API for users to get info and metadata about them. Allows for web-request-free access to the CC licenses. It doesn’t address non-CC licenses, however, and is mostly unmaintained.

So, these are the present options that application developers have at their disposal for doing licensing of application-managed content. There’s a tradeoff with each one of them though: either you have to rely on web requests to CC for each licensing decision you make, you go it alone, or you use something unmaintained which is CC-licensing-specific anyway. Nonetheless, cc.api and the partner interface are supported if you want something from CC, and people do tend to make do with doing things offline. But none of the tools we have are that flexible, so what can software like MediaGoblin or an extension for WordPress do?

There’s one more option, one that to my knowledge hasn’t really been explored, and would be extremely flexible but also well structured.

The semantic web / linked data option?

It goes like this: let either users or admins specify licenses by their URL. Assuming that page describes itself via some metadata (be it RDFa, a rel="alternate" RDF page in your headers, or microdata), information about that license could be extracted directly from the URL and then cached or recorded in the database. This provides a flexible way of adding new licenses, is language-agnostic, and allows for a canonical set of information about said licenses. Libraries could be written to make the extraction of said information easier, and could even cache metadata for common licenses (and for common licenses which don’t provide any metadata at their canonical URLs…).
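As a rough illustration of the extraction step, the sketch below uses only Python's standard html.parser to pull two kinds of hints out of a license page that has already been fetched: links to an RDF alternate and simple RDFa property values. The class and function names, the sample page, and the property names in it are all assumptions for illustration; a real tool would use a proper RDFa parser and handle nested markup.

```python
from html.parser import HTMLParser


class LicenseMetadataParser(HTMLParser):
    """Collect RDF alternates and flat RDFa property values (a sketch)."""

    def __init__(self):
        super().__init__()
        self.rdf_alternates = []
        self.properties = {}
        self._pending = None  # property whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").split()
        if (tag == "link" and "alternate" in rel
                and attrs.get("type") == "application/rdf+xml"):
            self.rdf_alternates.append(attrs.get("href"))
        prop = attrs.get("property")
        if prop:
            if attrs.get("content") is not None:
                self.properties[prop] = attrs["content"]
            else:
                self._pending = prop
                self.properties[prop] = ""

    def handle_data(self, data):
        if self._pending:
            self.properties[self._pending] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property value.
        self._pending = None


def extract_license_metadata(html):
    parser = LicenseMetadataParser()
    parser.feed(html)
    parser.close()
    return {
        "rdf_alternates": parser.rdf_alternates,
        "properties": parser.properties,
    }


# Invented sample page; not the markup of any real license deed.
SAMPLE_PAGE = """
<html><head>
<link rel="alternate" type="application/rdf+xml" href="rdf">
</head><body>
<h1 property="dc:title">Attribution 3.0 Unported</h1>
<span property="dcterms:hasVersion" content="3.0"></span>
</body></html>
"""

metadata = extract_license_metadata(SAMPLE_PAGE)
```

An application would store the resulting dictionary alongside the license URL, so that the URL stays the canonical identifier and the metadata is just a cache.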

I’m hoping that in the near future I’ll have a post up here demonstrating how this could work with a prototypical tool and use case.

Thanks to Mike Linksvayer, for most of this post was just transforming a braindump of his into a readable blogpost.


CC license use in Latin America brief analysis

ml, August 12th, 2011

Carolina Botero (of CC Colombia and now CC regional manager for Latin America) has posted a brief analysis of CC license use in Latin America (es2en machine translation).

As with previous looks at CC licenses across different jurisdictions and regions, this one is based on search engine reported links to jurisdiction “ported” versions of licenses. It is great to see what researchers have done with this limited mechanism.

I am eager to explore other means of characterizing CC usage across regions and fields, and hope to provide more data that will enable researchers to do this. This will be increasingly important as we attempt to forge a universal license that does not call for porting in the same way, with version 4.0 over the coming year (as well as with the increasing importance of CC in the world).

