More helpful 404 pages

cwebber, January 28th, 2011

This is one of those small, helpful features that go into the license engine that runs on the website but are easy to miss unless someone points them out. I usually do a pretty bad job of making note of these when they go out, but this time, I’m doing better!

Even most people who don’t know anything about HTTP know that a 404 status code on the web somehow means that the thing you were looking for isn’t actually there. How frustrating! But if it’s not there, maybe we have enough information to help you find what you actually wanted.

That’s the idea behind the work that went into Issue 255: “Smart” 404 pages. Maybe we didn’t find a license (or public domain tool) under the URL you put in, but we might be able to help you find a license that does exist. For example, licenses listed under /licenses/ are parsed out like /licenses/{code}/{version}/ or /licenses/{code}/{version}/{jurisdiction}/. Knowing that, we can give a list of the licenses someone might have meant when they:

The pages mostly look like a normal 404 page, but with just a bit more contextually helpful information (the “were you looking for” section). And, of course, they still return a 404 status code!
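
To make the idea concrete, here is a minimal sketch of how a path could be split into its parts and matched against known licenses. It is not the cc.engine implementation: the KNOWN_LICENSES table and both function names are hypothetical, made up for illustration.

```python
# Hypothetical catalogue of (code, version, jurisdiction) entries;
# a jurisdiction of None stands for a license without one.
KNOWN_LICENSES = [
    ("by", "3.0", None),
    ("by", "3.0", "us"),
    ("by-sa", "3.0", None),
    ("by-sa", "2.5", "ca"),
]


def parse_license_path(path):
    """Split /licenses/{code}/{version}/[{jurisdiction}/] into its parts.

    Missing trailing parts come back as None. Paths outside /licenses/
    (or with too many segments) yield None.
    """
    parts = [part for part in path.split("/") if part]
    if not parts or parts[0] != "licenses" or len(parts) > 4:
        return None
    code, version, jurisdiction = (parts[1:] + [None, None, None])[:3]
    return code, version, jurisdiction


def suggest_licenses(path):
    """Return paths of known licenses sharing the code of a path that 404ed."""
    parsed = parse_license_path(path)
    if parsed is None or parsed[0] is None:
        return []
    code = parsed[0]
    return [
        "/licenses/" + "/".join(part for part in entry if part) + "/"
        for entry in KNOWN_LICENSES
        if entry[0] == code
    ]
```

A 404 view could call suggest_licenses(request_path) and render the result as the “were you looking for” section, while still returning the 404 status code.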


GSoC Project Introduction: CC WordPress Plugin

erlehmann, May 24th, 2010

Who I am

I’m Nils Dagsson Moskopp, a 22 year old student of philosophy and computer science, living in Berlin (German speakers may check out my blog). I dislike the act of programming, but love the results, so I seem to have no other choice than to do it from time to time.

Recently, after submitting a proposal, I got accepted into the Google Summer of Code program, being mentored by Nathan Kinkade. In the rest of this entry, I will do my best to explain how it came to that and what kind of software I intend to create.

The Idea

As far as I know, there is currently no automated, easy way to have human- and machine-readable markup for specific subsections of a blog post in the blogging software WordPress; it is only possible to have an entire page licensed under one specific license. Further complicating the issue is the fact that the WordPress media manager does not actually know about licenses associated with specific content. This poses a problem for the not-so-uncommon use case of embedding CC-licensed media, mainly photos.

I was first confronted with the idea of having an automated way to mark up media with Creative Commons licensing information when reading Matthias Mehldau’s post More precise Creative Commons HTML-Codes. He envisioned an annotation containing not only the well-known CC license symbols, but also the jurisdiction and a button to show the markup used, for easy re-embedding. Well versed in graphics design, he also created a mockup:

Matthias Mehldau’s Mockup

Shortly after that, Julia Seeliger posted a suggestion for what a Creative Commons plugin backend for WordPress could look like. She suggested integrating the official license chooser or a drop down list within the WordPress upload form.

The three of us chatted about these ideas, and I even implemented some parts (see next section), but nothing qualifying as currently usable came from that. When, approximately one year later, I came upon the CC wiki page describing the developer challenge titled Support for CC licenses in WordPress Media Manager, I chose to apply for it with Google Summer of Code.

Existing Solutions

As mentioned in the introduction to the last section, a tool that provides easy licensing management for WordPress media content, automating subsequent insertion, currently does not exist. Several existing projects, however, have features related to this goal; code re-use may be possible.

In 2006, George Notaras created the Creative-Commons-Configurator WordPress Plugin. It allows the user to choose a global license for his or her blog using the web-based license selection engine, and adds localized machine- and human-readable license information to pages and feeds. Similar plugins, which put global licensing information into footers and sidebars, exist.

For GSoC 2009, Dinishika Nuwangi made a WordPress plugin called wprdfa (not to be confused with wp-rdfa). Unfortunately, the project has an empty README file. Judging from a quick glance at the source code, part of its intended purpose is to add buttons to the TinyMCE editor integrated into WordPress, although after installing it I was unable to see this functionality. The related developer challenge still seems to be open; on its page the plugin is described as foundational work.

Also in 2009, I created two pieces of software, inspired by a post on the WHATWG mailing list. First, a web application generating HTML5 license markup (enriched with Microdata, see next section), second a WordPress plugin capable of generating similar markup using both Microdata and RDFa. And there was much rejoicing.
It is important to note that since then the HTML5 standard has changed and the generated markup is no longer valid.

On a less technical note, the German blog Spreeblick has a unique way of presenting the license information, only showing a small “cc” in the bottom left corner; on hovering, author name and a link to the source are presented (live example). However, while Spreeblick is using WordPress, the folks behind it have no intention of releasing their plugin: As Max Winde told me in April 2009, it is tightly entangled with their image layout functionality and would require complex cleanup.

Planned Interface

I plan to implement the presentation part using the new HTML5 elements figure and figcaption. Together, they can be used to denote content with a caption […] that is self-contained and is typically referenced as a single unit from the main flow of the document. A code example shows what markup using the figure element may look like:

<figure>
 <!-- content goes here -->
 <figcaption><!-- caption goes here --></figcaption>
</figure>

Naturally, as a rather general markup language HTML5 does not contain any elements to give this construct more specific meaning, such as marking up which license applies to said content. However, two markup extensions provide this capability at the attribute level, the complex and established RDFa and the simpler, newer Microdata proposal, part of HTML5. While both standards are sufficiently open, RDFa is endorsed by Creative Commons; for this reason I will not stray into Microdata territory unless I have time to spare at the end of GSoC.

Up to this point, I have only been accounting for machine readability. With CSS, however, it is easily possible to beautify the visual presentation in nearly any way imaginable. The following two screenshots, taken with the now-defunct WordPress plugin I created in 2009, exemplify this; both are based on the same markup.

simple style

Spreeblick style

On the author side, I plan to have the plugin look like the Spreeblick one (screenshot), which will mean adding options to the WordPress media uploader:

  • an additional drop down list, for choosing one of the six main CC licenses
  • an additional text input for the author or rights holder
  • an additional text input for specifying the source URI

Media could then be inserted the usual way, with the RDFa annotation automatically generated.
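
To give a rough idea of what such generated markup could look like, here is a minimal sketch. It is written in Python purely for illustration (a WordPress plugin would of course do this in PHP), and the function name, the default license version, and the choice of cc:attributionName and dct:source as annotation properties are my assumptions, not a finished design.

```python
from html import escape

CC_NS = "http://creativecommons.org/ns#"
DCT_NS = "http://purl.org/dc/terms/"


def licensed_figure(image_url, license_code, author, source_uri, version="3.0"):
    """Return a figure/figcaption snippet annotated with RDFa attributes.

    Hypothetical helper: the three inputs mirror the planned uploader
    fields (license, author or rights holder, source URI).
    """
    license_url = "http://creativecommons.org/licenses/%s/%s/" % (
        license_code, version)
    values = {
        "image": escape(image_url, quote=True),
        "license": escape(license_url, quote=True),
        "code": escape(license_code.upper()),
        "author": escape(author),
        "source": escape(source_uri, quote=True),
        "cc": CC_NS,
        "dct": DCT_NS,
    }
    return (
        '<figure about="%(image)s" xmlns:cc="%(cc)s" xmlns:dct="%(dct)s">\n'
        '  <img src="%(image)s" alt="">\n'
        '  <figcaption>\n'
        '    <a rel="dct:source" href="%(source)s">Photo</a> by\n'
        '    <span property="cc:attributionName">%(author)s</span>,\n'
        '    <a rel="license" href="%(license)s">CC %(code)s</a>\n'
        '  </figcaption>\n'
        '</figure>'
    ) % values
```

The about attribute ties the license statement to the embedded image rather than to the whole page, which is the point of per-media licensing.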

Now that the GSoC coding phase has started, I plan to do a weekly progress report; the impatient may want to track the contents of the official Git repository. Stay tuned.


Using virtualenv and zc.buildout together

cwebber, March 16th, 2010

Virtualenv and zc.buildout are both great ways to develop Python packages and deploy collections of packages without needing to touch the system library. They are fairly similar, but also fairly different.

The primary difference between them is that zc.buildout focuses on having a single package, and all relevant dependencies are installed automatically within that package’s directory via the buildout script (Nathan Yergler points out that you don’t have to use things this way, but that seems to me to be the way things happen in practice… anyway, I’m not a buildout expert). The buildout script is very automagical and does all the configuration and installation of dependencies for you.  Since this is a build system, you can also configure it to do a number of other neat things, such as compile all your gettext catalogs, or scp the latest cheesefile.txt from… whatever you need to do to build a package.

Virtualenv is mostly the same creature, but it’s like you reached your hand inside and pulled it inside out. Instead of a bunch  of packages installed within a subdirectory of one package, there is a more generic directory layout that allows you to set up a number of packages within it. Installing a package and keeping it up to date is much more manual in general, but also a bit more flexible in the sense that you can switch paths around within the environment fairly easily and simultaneously developing multiple interwoven packages is not difficult.

I came to CC with a lot of experience with virtualenv and no experience with zc.buildout. Initially I could discern no difference in use case between them, but now I have a pretty good sense of when you’d want to use one over the other. An example use case, which has actually come up pretty often for me: say you have two packages, one of which is a dependency of the other. In this case, we’ll use both cc.license and cc.engine, where cc.engine has cc.license as a dependency.

Now say I’m adding a feature to cc.engine, but this feature also requires that I add something in cc.license. At this point it is easier for me to switch to using virtualenv; I can set up both development packages in the same virtualenv and use them together. This is great because it means that I should have little to no difficulty switching back and forth between them. If I make a change in cc.license, it is immediately available to me in cc.engine. This also saves me from either setting up a tedious-to-switch configuration that checks out cc.license into cc.engine, or making a bunch of unnecessary releases just to make sure things work. In my experience, it’s easier to work on multiple packages at once in virtualenv.

Now let’s assume that we got things in working order: cc.license has the new feature, cc.engine is able to use it properly, tests are passing, et cetera. This is the point where I think returning to zc.buildout is a good idea. One of the things I like about zc.buildout is that it provides a certain type of integrity checking with the buildout command. If you forget to mark a dependency, or even remove one by accident, buildout will simply unlink it from your path the next time you run it. In this case, I think zc.buildout is especially useful because I might forget to make a cc.license release here or some such thing. There are some other reasons for using zc.buildout (as the name implies, buildout is a full build system, so there are a lot of neat things you can do with it), but for a forgetful person such as myself this is by far the most important to me (and the most relevant to this example).

So I’ve described use cases for both cc.engine and cc.license. How do we get them to work nicely together? Let’s assume we just want to check out these packages once. Let’s also assume that our virtualenv directory is ~/env/ccommons (because I’m clearly basing this off my own current setup, heh).

First, we’ll create our virtualenv environment, if we haven’t already:

$ virtualenv ~/env/ccommons

Next, we’ll check out cc.engine and cc.license into ~/devel/:

$ cd ~/devel/
$ git clone git://
$ git clone git://

Next, we’ll buildout the packages:

$ cd ~/devel/cc.license
$ wget*checkout*/zc.buildout/trunk/bootstrap/
$ python
$ ./bin/buildout
$ cd ~/devel/cc.engine
$ python # the cc engine already has checked in
$ ./bin/buildout

Buildout can take a while, so be prepared to go grab some cookies and coffee and/or tea. But once it’s done, getting these packages set up in virtualenv is super simple.

First activate the virtualenv environment, then install each package into it in development mode:

$ source ~/env/ccommons/bin/activate
$ cd ~/devel/cc.license
$ python develop
$ cd ~/devel/cc.engine
$ python develop

That’s it! Now we can verify that these packages are set up in virtualenv. Open python and verify that you get the following output (adjusted to your own home directory, etc.):

>>> import cc.engine
>>> cc.engine.__file__
>>> import cc.license
>>> cc.license.__file__
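
If you’d rather script this check than eyeball the paths, something like the following sketch would do. The helper names are hypothetical (they are not part of virtualenv or zc.buildout), and it assumes the packages live in checkouts under ~/devel/ as in the steps above.

```python
import importlib
import os


def module_location(name):
    """Import a module by name and return the real path it was loaded from."""
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None)
    if path is None:
        raise ValueError("%s has no __file__ to inspect" % name)
    return os.path.realpath(path)


def is_installed_under(name, directory):
    """True if the module was imported from somewhere below directory."""
    directory = os.path.realpath(os.path.expanduser(directory))
    try:
        return os.path.commonpath([module_location(name), directory]) == directory
    except ValueError:
        # No usable __file__, or paths on different drives.
        return False
```

With the virtualenv activated, is_installed_under("cc.engine", "~/devel/cc.engine") should then be true if the development install took effect.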

To leave virtualenv, you can simply type “deactivate”.

That’s it! Now you have a fully functional zc.buildout AND virtualenv setup, where switching back and forth is super simple.


cc.engine and web non-frameworks

cwebber, January 13th, 2010

The sanity overhaul has included a number of reworkings, one of them being a rewrite of cc.engine, which in its previous form was a Zope 3 application. Zope is a full-featured framework and we already knew we weren’t using many of its features (most notably the ZODB); we suspected that something simpler would serve us better, but weren’t certain what. Nathan suggested one of two directions: either we go with Django (although it wasn’t clear this was “simpler”, it did seem to be where a large portion of the Python knowledge and effort in the web world is pooling), or we go with repoze.bfg, a minimalist WSGI framework that pulls in some Zope components. After some discussion we both agreed that repoze.bfg seemed like the better choice, for a couple of reasons. First, Django seemed like it would be providing quite a bit more than necessary: in cc.engine we don’t have a traditional database (we do have an RDF store that we query, but no SQL), we don’t have a need for a user model, and so on. The application is simple: show some pages and apply some specialized logic. Second, repoze.bfg built upon and reworked Zope infrastructure and paradigms, and so in that sense it looked like an easier transition. So we went forward with that.

As I went on developing it, I started to feel more and more that, while repoze.bfg certainly had some good ideas, I was having to create a lot of workarounds to support what I needed. For one thing, the URL routing is unordered and based off a ZCML config file. It got to the point where, for resolving the license views, I had to route to a view method that then called other view methods. We also needed the kind of functionality that Django provides with its “APPEND_SLASH=True” feature. I discussed this with the repoze.bfg people, and they were accommodating to the idea and actually applied it to their codebase for the next release. There were some other components they provided that were well developed but were not what we really needed (and were, besides, technically decoupled from the repoze.bfg core framework). As an example, the chameleon zpt engine is very good, but it was easier to just pull Zope’s template functionality into our system than to make the minor conversions necessary to go with chameleon’s zpt.

Repoze was also affecting the Zope queryutility functionality in a way that made internationalization difficult. Once again, this was done for reasons that make sense and are good within a certain context, but did not seem to mesh well with our existing needs. I was looking for a solution and reading over the repoze.bfg documentation when I came across these lines:

repoze.bfg provides only the very basics: URL to code mapping, templating, security, and resources. There is not much more to the framework than these pieces: you are expected to provide the rest.

But if we weren’t using the templating, we weren’t using the security model, and we weren’t using the resources, the URL mapping was making things difficult, and those were the things that repoze.bfg was providing on top of what was otherwise just WSGI + WebOb, how hard would it be to just strip things down to just the WSGI + WebOb layer? It turns out, not too difficult, and with an end result of significantly cleaner code.

I went through Ian Bicking’s excellent tutorial Another Do-It-Yourself Framework and applied those ideas to what we already had in cc.engine. Within a night I had the entire framework replaced with a single module, cc/engine/, which contained these few lines:

import sys
import urllib

from webob import Request, exc

from cc.engine import routing

def load_controller(string):
    # Resolve a "module.path:function_name" string to the function itself.
    # The module must already have been imported.
    module_name, func_name = string.split(':', 1)
    module = sys.modules[module_name]
    func = getattr(module, func_name)
    return func

def ccengine_app(environ, start_response):
    """
    Really basic wsgi app using routes and WebOb.
    """
    request = Request(environ)
    path_info = request.path_info
    route_match = routing.mapping.match(path_info)
    if route_match is None:
        # APPEND_SLASH-style fallback: redirect if the slashed URL routes.
        if not path_info.endswith('/') \
                and request.method == 'GET' \
                and routing.mapping.match(path_info + '/'):
            new_path_info = path_info + '/'
            if request.GET:
                # Python 2 spelling; Python 3 has urllib.parse.urlencode
                new_path_info = '%s?%s' % (
                    new_path_info, urllib.urlencode(request.GET))
            redirect = exc.HTTPTemporaryRedirect(location=new_path_info)
            return request.get_response(redirect)(environ, start_response)
        return exc.HTTPNotFound()(environ, start_response)
    controller = load_controller(route_match['controller'])
    request.start_response = start_response
    request.matchdict = route_match
    return controller(request)(environ, start_response)

def ccengine_app_factory(global_config, **kw):
    return ccengine_app

The main function of importance in this module is ccengine_app. This is a really simple WSGI application: it takes the routes defined in cc.engine.routing (which uses the very enjoyable Routes package) and sees if the current URL (or rather, the path_info portion of it) matches one of them. If it finds a result, it loads that controller and passes a WebOb-wrapped request into it, with any special URL matching data tacked onto the matchdict attribute. And actually, the only reason that this function is even this long is the “if route_match is None” block in the middle: that whole part provides APPEND_SLASH=True type functionality, as one would find in Django. (I.e., if you’re visiting the URL “/licenses”, and that doesn’t resolve to anything, but the URL “/licenses/” does, redirect to /licenses/.) The portions before and after are just getting the controller for a URL and passing the request into it. That’s all! (The current version is a few lines longer than this, using a callable class rather than a function in place of ccengine_app for the sake of configurability and attaching a few more things onto the request object, but it is not much longer or more complicated. The functionality is otherwise pretty much the same.)
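
For anyone who wants to play with the append-slash idea without installing Routes or WebOb, here is a stripped-down sketch using nothing but the standard library and Python 3. It is not the cc.engine code: the ROUTES dictionary stands in for real routing, and the function names are made up for this example.

```python
# Hypothetical route table mapping exact paths to response text.
ROUTES = {
    "/licenses/": "license index",
    "/choose/": "license chooser",
}


def append_slash_app(environ, start_response):
    """Tiny WSGI app: serve known paths, redirect slashless GETs, else 404."""
    path = environ.get("PATH_INFO", "")
    body = ROUTES.get(path)
    if body is None:
        slashed = path + "/"
        if (not path.endswith("/")
                and environ.get("REQUEST_METHOD") == "GET"
                and slashed in ROUTES):
            # Keep the query string when redirecting to the slashed URL.
            query = environ.get("QUERY_STRING", "")
            location = slashed + ("?" + query if query else "")
            start_response("307 Temporary Redirect", [("Location", location)])
            return [b""]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode("utf-8")]


def call(app, path, method="GET", query=""):
    """Invoke a WSGI app directly and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"PATH_INFO": path, "REQUEST_METHOD": method,
               "QUERY_STRING": query}
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling call(append_slash_app, "/licenses") exercises the redirect branch, while a POST to the same path falls through to the 404, just as the “request.method == 'GET'” check in the module above intends.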

Most interesting is that I swapped in this code, changed over the routing, fired up the server and... it pretty much just worked. I swapped out a framework for a module of about 50 lines and everything was just as nice and functional as before. In fact, with the improved routing provided by Routes, I was able to cut out the fake routing view, and thus the amount of code was actually *less* than what it was before I stripped out the framework. Structurally there was no real loss either; the application still looks similar to what you’d see in a pylons/django/whatever application.

I’m still a fan of frameworks, and I think we are very fortunate to *have* Zope, Pylons, Django, Repoze.bfg, et cetera. But in the case of cc.engine I do believe that the position we are at is the right one for us; our needs are both minimal and special-case, and the components out there for Python are rich and easily tied together. So it seems the best framework for cc.engine turned out to be no framework at all, and in the end I am quite happy with it.

ADDENDUM: Chris McDonough’s comments below are worth reading.  It’s quite possible that the issues I experienced were my own error, and not repoze.bfg’s.  I also hope that in no way did I give the impression that we moved away from repoze.bfg because it was a bad framework, because repoze.bfg is a great framework, especially if you are using a lot of zope components and concepts.  It’s also worth mentioning that the type of setup that we ended up at, as I described, probably wouldn’t have happened unless I had adapted my concepts directly from repoze.bfg, which does a great job of showing just how usable Zope components are without using the entirety of Zope itself.  Few ideas are born without prior influence; repoze.bfg was built on ideas of Zope (as many Python web frameworks are in some capacity), and so too was the non-framework setup I described here based on the ideas of repoze.bfg.  It is best for us to be courteous to giants as we step on their shoulders, but it is also easier to forget or unintentionally fail to extend that courtesy as I may have done here.  Thankfully I’ve talked to Chris offline and he didn’t seem to have taken this as an offense, so for that I am glad.


MediaWiki, the application platform

nathan, January 8th, 2010

As noted on the CC weblog and elsewhere, AcaWiki launched back in October. I’m much later noting the launch, which is really inexcusable since CC did much of the tech buildout. AcaWiki is only the most recent example of our work with MediaWiki and Semantic MediaWiki. Both are critical pieces of our infrastructure here, and tools we’d like to see developed further. One area that we’ve brainstormed but not attempted to implement is the idea of “actions”; you can read an overview of the idea here.

The combination of MediaWiki and Semantic MediaWiki has allowed us to build applications faster and increase the effective number of “developers” on our team by lowering the barriers to entry. I expect 2010 to be a really interesting year for wiki applications.


Caching deeds for peak performance

nathan, January 6th, 2010

As Chris mentioned, he’s been working on improving the license chooser, among other things simplifying it and making it a better behaved WSGI citizen. That code also handles generating the license deeds. For performance reasons we like to serve those from static files; I put together some details about wsgi_cache, a piece of WSGI middleware I wrote this week to help with this, on my personal blog:

The idea behind wsgi_cache is that you create a disk cache for results, caching only the body of the response. We only cache the body for a simple reason—we want something else, something faster, like Apache or other web server, to serve the request when it’s a cache hit. We’ll use mod_rewrite to send the request to our WSGI application when the requested file doesn’t exist; otherwise it hits the on disk version. And cache “invalidation” becomes as simple as rm (and as fine grained as single resources).
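
To illustrate the approach (and only the approach: this is a minimal sketch, not wsgi_cache’s actual code or API), a body-only disk cache can be written as a small piece of WSGI middleware. The class name, the index.html naming for directory-style URLs, and the decision to cache only successful GET responses are all assumptions made for this example.

```python
import os


class DiskCacheMiddleware:
    """Write the body of successful GET responses to files under cache_dir."""

    def __init__(self, app, cache_dir):
        self.app = app
        self.cache_dir = os.path.realpath(cache_dir)

    def _cache_path(self, path_info):
        relative = path_info.lstrip("/")
        if not relative or relative.endswith("/"):
            relative += "index.html"
        target = os.path.realpath(os.path.join(self.cache_dir, relative))
        # Refuse anything that would land outside the cache directory.
        if os.path.commonpath([target, self.cache_dir]) != self.cache_dir:
            return None
        return target

    def __call__(self, environ, start_response):
        seen = {}

        def capture(status, headers, exc_info=None):
            seen["status"] = status
            return start_response(status, headers, exc_info)

        body = b"".join(self.app(environ, capture))
        target = self._cache_path(environ.get("PATH_INFO", "/"))
        if (target is not None
                and environ.get("REQUEST_METHOD") == "GET"
                and seen.get("status", "").startswith("200")):
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as handle:
                handle.write(body)
        return [body]
```

Because only the body lands on disk, the front-end web server can serve the file directly on later requests, and removing the file is all it takes to invalidate that one resource.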

You can read the full entry here, find wsgi_cache documentation on PyPI, and get the source code from our git repository.


Understanding the State of Sanity (via whiteboards and ascii art)

cwebber, December 18th, 2009

Since I started working at Creative Commons a number of months ago, I’ve been primarily focused on something we refer to as the “sanity overhaul”. In this case, sanity refers to trying to simplify what is kind of a long and complicated code history surrounding Creative Commons’ licenses, both in terms of the internal tooling for modifying, deploying, and querying licenses and the public-facing web interfaces for viewing and downloading them. Efforts toward the sanity overhaul started before I began working here, executed by both Nathan Yergler and Frank Tobia, but for a long time they were in a kind of limbo as technical effort had to be dedicated to other important tasks. The good news is that my efforts have been permitted to be (almost) entirely dedicated toward the sanity overhaul since I started, and we are reaching a point where all of those pieces are falling into place and we are very close to launch.

To give an idea of the complexity of things as they were and how much that complexity has been reduced, it is useful to look at some diagrams.  When Nathan Kinkade first started working at Creative Commons (well before I did), Nathan Yergler took some time to draw on the whiteboard what the present infrastructure looked like:

as well as what he envisioned the “glorious future” (sanity) would look like:

When I started, the present infrastructure had shifted a little bit further still, but the vision of the “glorious future” (sanity) had mostly stayed the same.

This week (our “tech all-hands week”) I gave a presentation on the “State of Sanity”.  Preparing for that presentation I decided to make a new diagram.  Since I was already typing up notes for the presentation in Emacs, I thought I might try and make the most minimalist and clear ASCII art UML-like diagram that I could (my love of ASCII art is well known to anyone who hangs out regularly in #cc on Freenode).  I figured that I would later convert said diagram to a traditional image using Inkscape or Dia, but I was so pleased with the end result that I just ended up using the ASCII version:


     ( o_o)
     |USER| --.
     '----'   |
         ___   .---.
       .'   ','     '.
     -'               '.
    (     INTARWEBS     )
     '_.     ____    ._'
        '-_-'    '--'
      +---------------+  Web interface user
      |   cc.engine   |  interacts with
      +---------------+

      +---------------+  Abstraction layer for
      |  cc.license   |  license querying and
      +---------------+  pythonic license API

      +---------------+  Actual rdf datastore and
      |  license.rdf  |  license RDF operation tools
      +---------------+


  +--------------+
  |  cc.i18npkg  |
  | .----------. |
  | | i18n.git | |
  | '----------' |
  +--------------+


  +------------+  +-----------+  +---------+  +-------------+
  |    old     |  | old zope  |  | licenze |  | license_xsl |
  | cc.license |  | cc.engine |  +---------+  +-------------+
  +------------+  +-----------+

This isn’t completely descriptive on its own, and I will be annotating it when I include it as part of the Sphinx developer docs we are bundling with the new cc.engine. But I think that even without annotation, it is clear how much cleaner the new infrastructure is than the old “present infrastructure” whiteboard drawing, which means that we are making good progress!


Creative Commons Drupal Module — GSoC 2009

blaise, September 3rd, 2009

This past year was my last at the University of Toronto, making this summer my last chance to participate in the Google Summer of Code. I searched hard for a project and mentor organization that would suit my interests, and when I noticed that the Creative Commons Drupal module was in need of some developer love, I knew exactly what I wanted to spend my summer doing. With John Doig as my CC mentor, and Kevin Reynen (the module’s maintainer and initial author) as an unofficial Drupal mentor, I’ve been privileged to have spent the past few months updating and extending the module.

A couple years ago, development for Drupal 4.7 was begun, but it was never quite completed. CC Lite came to be the reliable choice for Drupal 6. However, CC Lite’s scope is limited — it allows you to attach a license to content in Drupal, but that’s about it. The main CC module’s vision is broader — to fully integrate CC technology with the Drupal platform — and I hope I’ve helped to realize that just a little.

Some of the module’s features:

  • it uses the CC API for license selection and information (so, for example, when new license versions are released, they become available on your Drupal site automatically)
  • you can set a site-wide default license/jurisdiction, and users can set their own default license/jurisdiction
  • ccREL metadata is supported, output in RDFa (and, optionally, RDF/XML for legacy systems)
  • supports CC0, along with the 6 standard licenses and the Public Domain Certification tool
  • you can control which licenses and metadata fields are available to users
  • basic support for the Views API has been added (including a default /creativecommons view)
  • there’s a CC site search option

The module is still listed as a beta release, as some folks have been submitting bug fixes and patches over the past few weeks, though it’s quite usable. Special thanks to Turadg Aleahmad, who’s helped with a lot of the recent bug fixes towards the end of the GSoC term, and committed to being active in future development. If you’re into Drupal development, we could use help with testing, and any translations would be greatly appreciated too.

Right now, the focus is on getting to a stable release, but we’ve got lots of ideas for the future too. Thanks to John and Kevin for their support through the summer, and to Turadg for his recent help. I look forward to seeing the module put to good use!

Check it out!

I’m a musician, writer, software developer, free culture / free software advocate and recent graduate of the University of Toronto — get in touch at


Developers landing page revamp

greg, August 11th, 2009

If you haven’t looked at the Developers landing page on the Creative Commons wiki lately, you’re missing out! We’ve recently put a lot of effort into reorganizing the information, making the important things easier to find, and overall just making the whole place a bit more welcoming.


First of all, we’ve made a semantic split between information for desktop-based development and web-based development. On each page there is a list (with short descriptions) of the various tools to help you integrate CC-license metadata functionality and a short list of open Developer Challenges. These challenges are things the developer community thinks would be cool to have; a wishlist everyone can help with!

Also, we’re starting to produce some more “HowTo” guides for developers who are interested in the best practices of integrating CC-license metadata. Thus far we have one for Web Integration, which lists the various ways a service could support CC licenses, with best-practice examples (pictorial and code) of how existing services did it. See, for example, the page on adding license choice when uploading content.

We hope this redesign will make it easier for developers to find the information they need to improve their services. If you have any other suggestions, don’t hesitate to send an email to


License Engine path changes

nathan, July 21st, 2009

We just pushed out a resolution to Issue 381, updating the path to the license engine. The license engine (the cc.engine application) is responsible for handling the license selection portions of the website as well as the deeds. Some background may be helpful.

When we launched CC0 earlier this year we made a conscious decision to locate the deed on a different path than the rest of the license deeds. CC0 is a waiver, not a license, so the deed appropriately is located at (as opposed to the license deeds which URLs that look like At the time we decided to leave the CC0 chooser in the same group as the other choosers. This unfortunately gave us a URL that contained license in it — — which caused confusion for some users. Is it a license or isn’t it?

It’s not.

So starting this afternoon we’ve relocated the license engine to and CC0 to Redirects are in place so you shouldn’t notice any changes.

