Converting cc.engine from ZPT to Jinja2 and i18n logical keys to english keys

cwebber, September 2nd, 2011

Some CC-specific background

Right now I’m in the middle of retooling our translation infrastructure. cc.engine and related tools have a long, complex history (dating back, as I understand, to TCL scripts running on AOL server software). The short of it is, CC’s tools have evolved a lot over the years, and sometimes we’re left with systems and tools that require a lot of organization-specific knowledge for historical reasons.

This has been the case with CC’s translation tools. Most of the world these days uses English-key translations. CC used logical-key translations. This means that if you marked up a bit of text for translation, instead of the key being the actual text being translated (such as “About The Licenses”), the key would be an identifier code which mapped to said English string, like “util.View_Legal_Code”. What’s the problem with this? Actually, there are a number of benefits that I’ll miss and that I won’t get into here, but the real problem is that the rest of the translation world mostly doesn’t work this way. We use Transifex (and previously used Pootle) as the tool our translators use to manage our translations. Since these tools don’t expect logical keys, we had to write tools to convert from logical keys to English keys on upload and from English keys to logical keys on the way back, plus a whole bunch of other crazy custom tooling.
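To make the difference concrete, here is a minimal Python sketch of the two lookup styles and of the re-keying step our upload tooling had to perform. It is an illustration, not our actual tooling: the catalogs and the helper names are made up, and the English text and Spanish translation paired with “util.View_Legal_Code” are assumptions.

```python
# Logical-key style: the key is an opaque identifier, so every lookup
# needs an extra catalog that maps the identifier to its English text.
LOGICAL_TO_ENGLISH = {
    "util.View_Legal_Code": "View Legal Code",  # assumed English text
}
LOGICAL_SPANISH = {
    "util.View_Legal_Code": "Ver el código legal",  # assumed translation
}

# English-key style: the English text is itself the key, which is what
# tools such as Transifex and Pootle expect.
ENGLISH_SPANISH = {
    "View Legal Code": "Ver el código legal",
}


def translate_logical(key, catalog):
    """Look up a logical key, falling back to the English text."""
    return catalog.get(key, LOGICAL_TO_ENGLISH.get(key, key))


def translate_english(text, catalog):
    """Look up an English string, falling back to the string itself."""
    return catalog.get(text, text)


def to_english_keys(logical_catalog):
    """Re-key a logical-key catalog by English text (the upload step)."""
    return {
        LOGICAL_TO_ENGLISH[key]: value
        for key, value in logical_catalog.items()
        if key in LOGICAL_TO_ENGLISH
    }
```

With English keys the fallback comes for free (an untranslated string is just shown in English), whereas the logical-key version always needs the extra identifier-to-English catalog.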

Another time suck has been that we’d love to be able to just dynamically extract all translations from our python code and templates, but this also turns out to be impossible with our current setup. A strange edge case in ZPT, involving dynamic attributes in ZPT-translated HTML, means that we have to edit certain translations after they’re extracted, so we can’t rely on an auto-extracted set of translations.

So we’d like to move to a future with no or very few custom translation tools (which means we need English keys) and auto-extraction of translations (which means because of that edge case, no ZPT). Since we need to move to a new templating engine, I decided that we should go with my personal favorite templating engine, Jinja2.

ZPT vs Jinja2

Aside from the issue I’ve described above, briefly I’d like to describe the differences between ZPT and Jinja2, as they’re actually my two favorite templating languages.

ZPT (Zope Page Templates) is an XML-based templating system where your tags and elements actually become part of the templating logic and structure. Here’s an example of us looping over a list of license versions on our “helpful” 404 pages for when you type in the wrong license URL (like at

  <h4>Were you looking for:</h4>

  <ul class="archives" id="suggested_licenses">
    <li tal:repeat="license license_versions">
      <a tal:attributes="href license/uri">
        <b tal:content="python: license.title(target_lang)"></b>
      </a>
    </li>
  </ul>

As you can see, the for loop, the attributes, and the content are actually elements of the (X)HTML tree. The neat thing about this is that you can be mostly sure that you won’t end up with tag soup. It’s also pretty neat conceptually.

Now, let’s look at the same segment of code in Jinja2:

  <h4>Were you looking for:</h4>

  <ul class="archives" id="suggested_licenses">
    {% for license in license_versions %}
      <li>
        <a href="{{ license.uri }}">
          <b>{{ license.title(target_lang) }}</b>
        </a>
      </li>
    {% endfor %}
  </ul>

If you’ve used Django’s templating system before, this should look very familiar, because that’s the primary source of inspiration for Jinja2. There are a few things I like about Jinja2 though that Django’s templating system doesn’t have, but the biggest and clearest of these things is the ability to pass arguments into functions, as you can see that we’re doing here with license.title(target_lang). Anyway, it massively beats making a template tag every time you want to pass an argument into a function.

The conversion process

Not too much to say about converting from ZPT to Jinja2. It’s really just a lot of manual work, combing through everything and moving it around.

More interesting might be our translation conversion process. Simply throwing out the old translations and re-extracting new ones is not an option: translating everything is a lot of effort for translators, and asking them to do it all over again is too much to ask and just not going to happen. Pass 1 was to simply get the templates moved over rather than try to convert both the templates and the logical->english key system all at once. (This move away from logical keys has been tried before and fizzled, probably because there are simply too many moving parts across our codebase, so we wanted to take this incrementally, and this seemed like the best place to start.) We’re simply doing stuff like this:

  <h3>{{ cctrans(locale, "deed.retired")|safe }}</h3>

Where cctrans is a simple logical key translation function. Next steps:

  • Create a script that converts all our .po files to eliminate the logical keys and move them to English-only.
  • Write a script to auto-interpolate {{ cctrans() }} calls in templates to {% trans %}{% endtrans %} Jinja2 tags.
  • Do all the many manual changes to all our python codebases.

At that point, we should be able to wrap this all up.
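The second of those steps could look roughly like the sketch below. This is not the script we will actually run, just an illustration of the idea: the `ENGLISH` mapping, its sample text for “deed.retired”, and the `convert_template` name are all assumptions, and real templates will have cases (variables inside strings, embedded HTML) that need hand-editing.

```python
import re

# Hypothetical logical-key -> English mapping; in practice this would be
# read from the existing English .po file.
ENGLISH = {
    "deed.retired": "This license has been retired.",  # assumed text
}

# Matches {{ cctrans(locale, "some.key") }} with an optional |safe filter.
CCTRANS_CALL = re.compile(
    r"""\{\{\s*cctrans\(\s*locale\s*,\s*["']([^"']+)["']\s*\)\s*(?:\|\s*safe\s*)?\}\}"""
)


def convert_template(source, english=ENGLISH):
    """Replace cctrans() calls with {% trans %} blocks where possible."""
    def replace(match):
        key = match.group(1)
        if key not in english:
            # Leave unknown keys alone so they can be fixed by hand.
            return match.group(0)
        return "{% trans %}" + english[key] + "{% endtrans %}"
    return CCTRANS_CALL.sub(replace, source)


print(convert_template('<h3>{{ cctrans(locale, "deed.retired")|safe }}</h3>'))
```

Leaving unmatched keys untouched keeps the conversion incremental: anything the script can’t handle is still a working cctrans() call.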


Not Panicking: switching to Virtualenv for deployment

cwebber, July 29th, 2011

I’ve written about zc.buildout and virtualenv before and how to use them both simultaneously, which I find to be useful for development on my own machine. I really admire both of these tools; I especially think that buildout is really great for projects where you want developers to be able to get your package running quickly without having to understand how python packaging works. (I use buildout for this purpose for one of my own personal projects, MediaGoblin, and I think it’s served a wonderful purpose of getting new contributors up and going quickly.)

Anyway, in that previous blogpost about zc.buildout and virtualenv I erroneously suggested that virtualenv is best for multiple packages in development and zc.buildout is better for just one. I was rightly corrected that you can use the develop line of a buildout config file to specify multiple python packages. So this is what we’ve been doing for the last year roughly, running a meta-package with cc.engine checked out of git and the rest running out of python packages.

We’ve been doing packaging and releasing to our own egg basket for a while, and for the most part that has worked out, but our system administrator Nathan Kinkade pointed out that we don’t really need packages, it’s a bunch of extra steps to build, nobody outside of CC is using these packages, and it’s a lot easier to roll back a git repository in case of an emergency than it is a python package.

That led me to reconsider the way we’re currently doing deployment and my growing feeling that maybe zc.buildout, while great for developing locally, really just isn’t a good option for deployment. Whenever I wanted to pull down new versions of packages, I would run buildout. But buildout likes to do something which makes this period very, very painful: if for whatever reason it can’t manage to install all packages, it tears down the entire environment. It removes ./bin/python, it removes all other scripts. I’ve found this to be highly stressful, especially because you never know if some package on PyPI is going to time out and then suddenly, as punishment, your environment no longer works, parts of the site aren’t running, and you start to have a minor panic attack as you rush to get things up again. That’s not very great.

Anyway, I always stress out about this, which has led me to adding coping mechanisms to our fabric deploy script:

Don't Panic! screenshot

This helps reduce my blood pressure somewhat, but anyway, we decided to move from buildout to virtualenv for deployment. Actually, there’s not much more to say; it only took a couple of hours to make the switch and there really wasn’t anything special to say about it. It just works and generally seems a lot simpler.

In short: buildout is pretty great. If you’re looking for an option to make it really, really easy for people who want to try out your project to get something working or start contributing, it’s the closest the python world has to an interface as simple as (or simpler than) `./configure && make`. But as for deployment… especially if you’d like to do code checkouts of your main packages, just go with virtualenv.


Your own python egg baskets / package repositories

cwebber, February 14th, 2011

At Creative Commons sometimes we have python packages that we use a lot but which aren’t generally useful enough to put on PyPI (such as cc.license and cc.licenserdf). Luckily, as long as you and your organization don’t have a problem with having your packages available publicly, it’s fairly easy to put up a public egg basket (that is, a simple repository to put your python eggs/packages in). And doing so has several advantages:

  • You can avoid cluttering up PyPI with packages that have a very marginal or internal audience. cc.licenserdf matters a lot for our site, but probably is worthless to anyone searching for “RDF” tools on PyPI.
  • Putting a package up on PyPI suggests a certain amount of responsibility to keep that updated and keep its API stable. If it’s for internal use, maybe that responsibility is unwanted or unneeded.
  • Using proper python packaging means you can take full advantage of python’s ecosystem of installation tools, including useful dependency resolution.
  • Making packaged releases encourages a certain amount of responsibility toward your internal dependencies, reducing the “cowboy coding” factor (fun, but often not good in an environment that requires stability).

So, how to do it? The first step is to make a directory that’s statically served by Apache. Ours is at The directory has the sticky bit set so that everyone on the server can write to it without clobbering each other, kind of like in /tmp/ on most GNU/Linux installs. Everyone puts their tarballs up here. As for you or your organization’s “your own” packages, I think it’s fine to put all your eggs in just one basket like this. :)
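If you’d rather set the directory up from Python than from the shell, a minimal sketch might look like this. The `make_basket` and `has_sticky_bit` names are made up, and mode 1777 (what /tmp usually has) is my assumption about what “everyone can write without clobbering each other” should mean on your server; the demo runs against a throwaway temp directory rather than a real web root.

```python
import os
import stat
import tempfile


def make_basket(path):
    """Create a world-writable directory with the sticky bit set."""
    os.makedirs(path, exist_ok=True)
    # 0o1777 = rwx for everyone plus the sticky bit, the same mode /tmp
    # usually has: anyone may add files, but only a file's owner (or the
    # directory's owner, or root) may delete or rename it.
    os.chmod(path, 0o1777)
    return path


def has_sticky_bit(path):
    """Return True if the sticky bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_ISVTX)


# Demonstrate on a throwaway directory rather than a real web root.
basket = make_basket(os.path.join(tempfile.mkdtemp(), "eggs"))
print(has_sticky_bit(basket))
```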

But what should those packages look like?

Generally at the toplevel of your package you have something that looks like:

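from setuptools import setup, find_packages
import sys, os

setup(
    name='licensetools',  # placeholder name; use your package's own
    version='0.1',  # placeholder version
    description="Tools to clarify any licensing issue you have ever had",
    author='John Doe(ig)',
    packages=find_packages(),

    # ... etc ...

    # Put your dependencies here...
    install_requires=[
        'setuptools',  # placeholder dependency
    ],
    # And a link to your basket here
    dependency_links=[
        'http://example.org/basket/',  # placeholder URL for your egg basket
    ],
)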

Obviously, replace values with ones that make sense for your package. The main things we’re talking about here are install_requires and dependency_links. Replace install_requires with whatever list of dependencies your package requires, and put the URL of your own egg basket in the dependency_links list. After that, I guess set up whatever other large number of attributes you can apply to the setup function that make sense to your package (entry_points maybe? zip_safe=False is a nice one).

Lastly, maybe now that you have an egg basket, you’d like to know how to make some eggs to put in that basket. Pretty easy once you have a setup.py like the one above! All you really need to do is this:

python setup.py sdist

Assuming things run well, do an `ls` into your dist/ directory. Hey, what do you know! There’s a tarball there. cp or scp or (heaven forbid) ftp it to your egg basket, and there you go. You just made a release! Maybe the next time you do it you’ll want to increment the version number.

Appendix: Why you should always list your dependencies in your package

Lastly I’d like to make a comment on only using buildout’s config file or pip’s requirements file to declare requirements: it’s completely crazy, don’t do it! Well okay, it’s not completely crazy, but you really should fill out the install_requires section regardless of whether you are also using those tools. There are a couple of reasons for this:

  • Recursive dependencies: maybe someone will use your module as a library, and then they’ll have to declare your dependencies all over again if you don’t put them in install_requires. Let python’s package management tools handle the dependency graph logic, that’s where it belongs.
  • What happens if you build your tool in buildout and someone wants to use virtualenv/pip or vice versa? Both of these tools will check the install_requires section anyway, so be nice and fill it out!

Of course there are some awesome things that buildout and the pip requirements file can do too; for example, they can install dependencies from VCS, which is useful in particular cases, especially for certain in-development packages. By all means, use these tools to do that (and lots of other cool things, because both buildout and virtualenv/pip are pretty awesome). Just be a good packager and also fill out install_requires.

Now you have your own egg basket (which is completely vegan), you know how to make packages to put in it, and everyone is happy. Hooray! Today we learned things (maybe).

Thanks to Asheesh Laroia for his detailed feedback on this post.


Orgmode and Roundup: Bridging public bugtrackers and local tasklists

cwebber, November 10th, 2010

So maybe you’re already familiar with the problem. You’re collaborating with other people, and especially if you’re in a free software environment (but maybe even some install at your work) you have some bugtracker, and that’s where everyone collaborates. But on the other hand, you have a life, your own todo systems, your own notes, etc. Even for the tasks that are on the bugtracker, you might keep your own local copy of that task and notes on that task. Eventually things start to get out of sync. Chaos!

Wouldn’t it be great if you could sync both worlds? Keep the notes that are relevant to being public on the public bugtracker, but keep private notes that would just clutter up the ticket/issue/bugreport private. Mesh the public task system with your private task system. Well, why not?

So this was the very problem I’d run into. I have my work bugtracker for here at CC, our install of roundup, and then I have my own TODO setup, a collection of Org-mode files.

There are a lot of things I like about org-mode. It’s in emacs (though there’s apparently a lean vim port in the works), it’s plaintext (which means I can sync across all my machines with git… which I do!), tasks are nested trees / outlines (I really tend to break down tasks in very granular fashions as I go so I don’t get lost), notes are integrated directly with tasks (I take a lot of notes), and it’s as simple as you need it or as complex as you want to get (I started out very simple, now my usage of org-mode is fairly intricate). It also does a good job of spanning across multiple files while still retaining the ability to pull everything together with its agenda, which is useful since I like to keep things semi-organized.

And of course, the relevant file here is all my Creative Commons stuff, which I keep in a file called There’s a lot of private data in here, but I’ve uploaded a minimalist version of my file.

So! Syncing things. If you open the file in an emacs version with org-mode installed, you’ll notice 4 sections. Two of these are crucial to my setup, but we won’t be using them today: “Events” holds, say, a meeting at X time or travel on certain days; “Various Tasks” contains tasks not related to roundup. Then there are the other two: “Roundup” will collect all the tasks we need to work on, and “Supporting funcs” has a couple of org-babel blocks in Python and emacs-lisp.

Anyway, enough talk, let’s give it a spin. You’ll need a recent version of org-mode and a copy of emacs. Make sure that newer org-mode is on your load-path and then evaluate:

(require 'org)
(require 'org-install)
(require 'ob-python)
(setq org-confirm-babel-evaluate nil)
(setq org-src-fontify-natively t)
(setq org-src-tab-acts-natively t)

Next open up the relevant org-mode file. Move to the “Roundup” line, hit Tab to cycle its visibility, and move to the line that starts with “#+call:”

Now press “Ctrl+c Ctrl+c”. You’ll see it populate with issues from my issues list:

What’s happening here? So we’re executing an org-babel block at point. Org-babel is an org-mode extension that allows you to make blocks of code executable, and even chain from one language to another (it also has some stuff relevant to Donald Knuth’s “literate programming” which is cool but I’m not using here). If we look at the code blocks:

Anyway, there are three code blocks here.

  • ccommons-roundup-parse: uses python to read the CSV file generated by roundup which is relevant to my task list, converts it into a list of two-item lists (task id, task title)
  • ccommons-roundup-insert-func: the function that actually inserts items into our “* Roundup” heading. It checks the ROUNDUPID property to see if that task is already inserted or not. If not, it inserts the task with the appropriate title and ROUNDUPID.
  • ccommons-roundup-insert: the actual block we end up invoking. It binds together the data from ccommons-roundup-parse with a function call to the function defined in ccommons-roundup-insert-func.
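For readers who don’t want to open the org file, the logic of the first two blocks can be sketched in plain Python as follows. This is not the code from the file: the sample CSV, its “id” and “title” column names, and the function names are assumptions, and the real insert step is emacs-lisp working on org headings rather than a Python list.

```python
import csv
import io

# Made-up sample of a roundup CSV export; the column names are assumed.
SAMPLE_CSV = '''id,title
101,Fix the deed footer
102,"Update chooser, then test"
'''


def parse_issues(csv_text):
    """Return a list of [task id, task title] pairs from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[row["id"], row["title"]] for row in reader]


def new_issues(issues, existing_ids):
    """Keep only issues whose id is not already in the task list."""
    return [issue for issue in issues if issue[0] not in existing_ids]


issues = parse_issues(SAMPLE_CSV)
# Pretend task 101 already has a ROUNDUPID property in the org file.
print(new_issues(issues, existing_ids={"101"}))
```

Filtering on the id is what makes the whole thing safe to re-run: tasks that are already in the list are simply skipped.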

You can evaluate it multiple times. It’ll only insert new tasks that aren’t on your list currently. Now you can take notes on your tasks, schedule them for various dates, make subtasks, etc. When you’re ready to close out a task, close it out both on the ticket and in org-mode. If you want to use a similar setup for org-mode, I think it’s easy enough to borrow these methods and just change the CSV URL to whatever URL is appropriate for your user’s tasks.

Now admittedly this still isn’t even the best setup. It would be good if it told you when some tasks are marked as closed in your org-mode and open in roundup and vice versa. Org-babel still feels a bit hacky… I probably wouldn’t use it on anything other than scripts-I-want-to-embed-in-my-orgmode-files (for now at least). I even had to strip out quotes from the titles because org-babel python doesn’t escape quotations from strings correctly currently (but that’s a bug, one that will hopefully be fixed). Even so, I’ve been trying to close out a lot of roundup tasks lately, and it’s really helped me to bridge both worlds.

Edit: And in case you’re wondering why I didn’t use url.el instead of piping to python, the reason is because of CSV support… there’s none builtin to emacs as far as I know, and splitting on commas doesn’t handle all of the escaping intricacies… and org-babel makes it pretty easy to be lazy and simply use python for what python already handles well.
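A tiny illustration of the escaping problem, using a made-up ticket line (the id and title are not from our tracker):

```python
import csv

line = '102,"Update chooser, then test"'

# Naive splitting breaks the quoted title at its embedded comma.
naive = line.split(",")

# The csv module understands the quoting and keeps the title whole.
parsed = next(csv.reader([line]))

print(len(naive), len(parsed))
```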


Using virtualenv and zc.buildout together

cwebber, March 16th, 2010

Virtualenv and zc.buildout are both great ways to develop python packages and deploy collections of packages without needing to touch the system library. They are both fairly similar, but also fairly different.

The primary difference between them is that zc.buildout focuses on having a single package, and all relevant dependencies are installed automatically within that package’s directory via the buildout script (Nathan Yergler points out that you don’t have to use things this way, but that seems to me to be the way things happen in practice… anyway, I’m not a buildout expert). The buildout script is very automagical and does all the configuration and installation of dependencies for you.  Since this is a build system, you can also configure it to do a number of other neat things, such as compile all your gettext catalogs, or scp the latest cheesefile.txt from… whatever you need to do to build a package.

Virtualenv is mostly the same creature, but it’s like you reached your hand inside and pulled it inside out. Instead of a bunch of packages installed within a subdirectory of one package, there is a more generic directory layout that allows you to set up a number of packages within it. Installing a package and keeping it up to date is much more manual in general, but also a bit more flexible: you can switch paths around within the environment fairly easily, and simultaneously developing multiple interwoven packages is not difficult.

I came to CC with a lot of experience with virtualenv and no experience with zc.buildout. Initially I could discern no differences of use case between them, but now I have a pretty good sense of when you’d want to use one over the other. An example use case, which has come up pretty often with me actually: say you have two packages, one of which is a dependency of the other. In this case, we’ll use both cc.license and cc.engine, where cc.engine has cc.license as a dependency.

Now say I’m adding a feature to cc.engine, but this feature also requires that I add something in cc.license. At this point it is easier for me to switch to using virtualenv; I can set up both development packages in the same virtualenv and use them together. This is great because it means that I should have little to no difficulty switching back and forth between both of them. If I make a change in cc.license it is immediately available to me in cc.engine. This also saves me from either setting up a tedious-to-switch-around configuration (checking out cc.license into cc.engine, etc.) or making a bunch of unnecessary releases just to make sure things work. In my experience, it’s easier to work on multiple packages at once in virtualenv.

Now let’s assume that we got things in working order: cc.license has the new feature, cc.engine is able to use it properly, tests are passing, et cetera. This is the point where I think returning to zc.buildout is a good idea. One of the things I like about zc.buildout is that it provides a certain type of integrity checking with the buildout command. If you forget to mark a dependency, or even remove one by accident or whatever, buildout will simply unlink it from your path the next time you run it. In this case, I think zc.buildout is especially useful because I might forget to make a cc.license release here or some such thing. There are some other reasons for using zc.buildout (as the name implies, buildout is a full build system, so there are a lot of neat things you can do with it), but for a forgetful person such as myself this is by far the most important to me (and the most relevant to this example).

So I’ve described use cases for both virtualenv and zc.buildout. How do we get them to work nicely together? Let’s assume we just want to check out these packages once. Let’s also assume that our virtualenv directory is ~/env/ccommons (because I’m clearly basing this off my own current setup, heh).

First, we’ll create our virtualenv environment, if we haven’t already:

$ virtualenv ~/env/ccommons

Next, we’ll check out cc.engine and cc.license into ~/devel/ so we can run buildout on each:

$ cd ~/devel/
$ git clone git://
$ git clone git://

Next, we’ll buildout the packages:

$ cd ~/devel/cc.license
$ wget*checkout*/zc.buildout/trunk/bootstrap/
$ python bootstrap.py
$ ./bin/buildout
$ cd ~/devel/cc.engine
$ python bootstrap.py # cc.engine already has bootstrap.py checked in
$ ./bin/buildout

Buildout can take a while, so be prepared to go grab some cookies and coffee and/or tea. But once it’s done, getting these packages set up in virtualenv is super simple.

First activate the virtualenv environment:

$ source ~/env/ccommons/bin/activate
$ cd ~/devel/cc.license
$ python setup.py develop
$ cd ~/devel/cc.engine
$ python setup.py develop

That’s it! Now we can verify that these packages are set up in virtualenv. Open python and verify that you get the following output (adjusted to your own home directory, etc.):

>>> import cc.engine
>>> cc.engine.__file__
>>> import cc.license
>>> cc.license.__file__
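If you’d rather script this check than type it into the interpreter, here is a small sketch. The helper names are made up, and I’m using the standard library’s json package as a stand-in so the snippet runs anywhere; in the setup above you would pass "cc.engine" and "cc.license" and expect paths under ~/devel/.

```python
import importlib
import sys


def module_location(name):
    """Import a module by dotted name and return the file it came from."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


def in_virtualenv():
    """Best-effort check for an active virtualenv or venv."""
    # Classic virtualenv sets sys.real_prefix; the newer venv module
    # makes sys.base_prefix differ from sys.prefix.
    return hasattr(sys, "real_prefix") or (
        getattr(sys, "base_prefix", sys.prefix) != sys.prefix
    )


# "json" is only a stand-in so the sketch runs anywhere.
print(module_location("json"))
print(in_virtualenv())
```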

To leave virtualenv, you can simply type “deactivate”.

That’s it! Now you have a fully functional zc.buildout AND virtualenv setup, where switching back and forth is super simple.


cc.engine and web non-frameworks

cwebber, January 13th, 2010

The sanity overhaul has included a number of reworkings, one of them being a rewrite of cc.engine, which in its previous form was a Zope 3 application. Zope is a full featured framework and we already knew we weren’t using many of its features (most notably the ZODB); we suspected that something simpler would serve us better, but weren’t certain what. Nathan suggested one of two directions: either we go with Django (although it wasn’t clear this was “simpler”, it did seem to be where a large portion of the python knowledge and effort in the web world is pooling), or we go with repoze.bfg, a minimalist WSGI framework that pulls in some Zope components. After some discussion we both agreed: repoze.bfg seemed like the better choice for a couple of reasons: for one, Django seemed like it would be providing quite a bit more than necessary… in cc.engine we don’t have a traditional database (we do have an RDF store that we query, but no SQL), we don’t have a need for a user model, etc… the application is simple: show some pages and apply some specialized logic. Second, repoze.bfg built upon and reworked Zope infrastructure and paradigms, and so in that sense it looked like an easier transition. So we went forward with that.

As I went on developing it, I started to feel more and more like, while repoze.bfg certainly had some good ideas, I was having to create a lot of workarounds to support what I needed. For one thing, the URL routing is unordered and based off a ZCML config file. It was at the point where, for resolving the license views, I had to route to a view method that then called other view methods. We also needed the kind of functionality that Django provides with its “APPEND_SLASH=True” feature. I discussed this with the repoze.bfg people, and they were accommodating to the idea and actually applied it to their codebase for the next release. There were some other components they provided that were well developed but were not what we really needed (and were besides technically decoupled from repoze.bfg the core framework). As an example, the chameleon zpt engine is very good, but it was easier to just pull Zope’s template functionality into our system than make the minor conversions necessary to go with chameleon’s zpt.

Repoze was also affecting the Zope queryutility functionality in a way that made internationalization difficult. Once again, this was done for reasons that make sense and are good within a certain context, but did not seem to mesh well with our existing needs. I was looking for a solution and reading over the repoze.bfg documentation when I came across these lines:

repoze.bfg provides only the very basics: URL to code mapping, templating, security, and resources. There is not much more to the framework than these pieces: you are expected to provide the rest.

But if we weren’t using the templating, we weren’t using the security model, and we weren’t using the resources, the URL mapping was making things difficult, and those were the things that repoze.bfg was providing on top of what was otherwise just WSGI + WebOb, how hard would it be to just strip things down to just the WSGI + WebOb layer? It turns out, not too difficult, and with an end result of significantly cleaner code.

I went through Ian Bicking’s excellent tutorial Another Do-It-Yourself Framework and applied those ideas to what we already had in cc.engine. Within a night I had the entire framework replaced with a single module, cc/engine/, which contained these few lines:

import sys
import urllib

from webob import Request, exc

from cc.engine import routing

def load_controller(string):
    module_name, func_name = string.split(':', 1)
    module = sys.modules[module_name]
    func = getattr(module, func_name)
    return func

def ccengine_app(environ, start_response):
    """
    Really basic wsgi app using routes and WebOb.
    """
    request = Request(environ)
    path_info = request.path_info
    route_match = routing.mapping.match(path_info)
    if route_match is None:
        # APPEND_SLASH-style behavior: if adding a trailing slash would
        # make the URL match, redirect there instead of returning a 404.
        if not path_info.endswith('/') \
                and request.method == 'GET' \
                and routing.mapping.match(path_info + '/'):
            new_path_info = path_info + '/'
            if request.GET:
                # Preserve the query string across the redirect
                new_path_info = '%s?%s' % (
                    new_path_info, urllib.urlencode(request.GET))
            redirect = exc.HTTPTemporaryRedirect(location=new_path_info)
            return request.get_response(redirect)(environ, start_response)
        return exc.HTTPNotFound()(environ, start_response)
    # Look up the matched controller and hand it the wrapped request
    controller = load_controller(route_match['controller'])
    request.start_response = start_response
    request.matchdict = route_match
    return controller(request)(environ, start_response)

def ccengine_app_factory(global_config, **kw):
    return ccengine_app

The main method of importance in this module is ccengine_app. This is a really simple WSGI application: it takes routes as defined in cc.engine.routing (which uses the very enjoyable Routes package) and sees if the current URL (or, the path_info portion of it) matches one of them. If it finds a result, it loads that controller and passes a WebOb-wrapped request into it, with any special URL matching data tacked into the matchdict attribute. And actually, the only reason that this method is even so long at all is because of the “if route_match is None” block in the middle: that whole part is providing APPEND_SLASH=True type functionality, as one would find in Django. (I.e., if you’re visiting the URL “/licenses”, and that doesn’t resolve to anything, but the URL “/licenses/” does, redirect to /licenses/.) The portions before and after are just getting the controller for a URL and passing the request into it. That’s all! (The current version is a few lines longer than this, utilizing a callable class rather than a method in place of ccengine_app for the sake of configurability and attaching a few more things onto the request object, but not longer or more complicated by much. The functionality otherwise is pretty much the same.)
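If you want to play with the append-slash idea without installing WebOb or Routes, here is a stripped-down sketch using nothing but the WSGI calling convention. It is not the cc.engine code: the `ROUTES` dict stands in for a real route mapper, and the controller signature (environ in, status and body out) is an assumption made to keep the example short.

```python
# Hypothetical route table: path -> controller taking the WSGI environ
# and returning (status, body).
ROUTES = {
    "/licenses/": lambda environ: ("200 OK", b"license index"),
}


def app(environ, start_response):
    """Tiny WSGI app with APPEND_SLASH-style redirects."""
    path = environ.get("PATH_INFO", "")
    controller = ROUTES.get(path)
    if controller is None:
        method = environ.get("REQUEST_METHOD", "GET")
        if not path.endswith("/") and method == "GET" and path + "/" in ROUTES:
            # Redirect to the slash-terminated URL, keeping the query.
            location = path + "/"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("307 Temporary Redirect", [("Location", location)])
            return [b""]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    status, body = controller(environ)
    start_response(status, [("Content-Type", "text/plain")])
    return [body]
```

Only GET requests are redirected, for the same reason as in the real module: redirecting a POST to a different URL is rarely what the client expects.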

Most interesting is that I swapped in this code, changed over the routing, fired up the server and... it pretty much just worked. I swapped out a framework for about a 50 line module and everything was just as nice and functioning as it was. In fact, with the improved routing provided by Routes, I was able to cut out the fake routing view, and thus the amount of code was actually *less* than what it was before I stripped out the framework. Structurally there was no real loss either; the application still looks similar to what you’d see in a pylons/django/whatever application.

I’m still a fan of frameworks, and I think we are very fortunate to *have* Zope, Pylons, Django, Repoze.bfg, et cetera. But in the case of cc.engine I do believe that the position we are at is the right one for us; our needs are both minimal and special case, and the components out there for python are quite rich and easily tied together. So it seems the best framework for cc.engine turned out to be no framework at all, and in the end I am quite happy with it.

ADDENDUM: Chris McDonough’s comments below are worth reading.  It’s quite possible that the issues I experienced were my own error, and not repoze.bfg’s.  I also hope that in no way did I give the impression that we moved away from repoze.bfg because it was a bad framework, because repoze.bfg is a great framework, especially if you are using a lot of zope components and concepts.  It’s also worth mentioning that the type of setup that we ended up at, as I described, probably wouldn’t have happened unless I had adapted my concepts directly from repoze.bfg, which does a great job of showing just how usable Zope components are without using the entirety of Zope itself.  Few ideas are born without prior influence; repoze.bfg was built on ideas of Zope (as many Python web frameworks are in some capacity), and so too was the non-framework setup I described here based on the ideas of repoze.bfg.  It is best for us to be courteous to giants as we step on their shoulders, but it is also easier to forget or unintentionally fail to extend that courtesy as I may have done here.  Thankfully I’ve talked to Chris offline and he didn’t seem to have taken this as an offense, so for that I am glad.


Caching deeds for peak performance

nathan, January 6th, 2010

As Chris mentioned, he’s been working on improving the license chooser, among other things simplifying it and making it a better behaved WSGI citizen. That code also handles generating the license deeds. For performance reasons we like to serve those from static files; I put together some details about wsgi_cache, a piece of WSGI middleware I wrote this week to help with this, on my personal blog:

The idea behind wsgi_cache is that you create a disk cache for results, caching only the body of the response. We only cache the body for a simple reason—we want something else, something faster, like Apache or other web server, to serve the request when it’s a cache hit. We’ll use mod_rewrite to send the request to our WSGI application when the requested file doesn’t exist; otherwise it hits the on disk version. And cache “invalidation” becomes as simple as rm (and as fine grained as single resources).

You can read the full entry here, find wsgi_cache documentation on PyPI, and get the source code from our git repository.
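
To make the idea concrete, here is a minimal sketch of middleware along those lines. It is not wsgi_cache’s actual code or API: the class name BodyDiskCache, the cache_dir parameter, and the index.html convention for paths ending in a slash are all assumptions made for illustration. It stores only the body of successful GET responses, so that a front-end web server could serve the file directly on later requests.

```python
import os
import tempfile


class BodyDiskCache:
    """Hypothetical WSGI middleware: save successful response bodies to disk."""

    def __init__(self, app, cache_dir):
        self.app = app
        self.cache_dir = os.path.abspath(cache_dir)

    def _cache_path(self, environ):
        # Map the request path onto a file below cache_dir.
        rel = environ.get("PATH_INFO", "/").lstrip("/")
        if rel == "" or rel.endswith("/"):
            rel += "index.html"  # assumed convention for directory-like URLs
        path = os.path.abspath(os.path.join(self.cache_dir, rel))
        # Refuse paths that would escape the cache directory (e.g. "..").
        if os.path.commonpath([self.cache_dir, path]) != self.cache_dir:
            return None
        return path

    def __call__(self, environ, start_response):
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            # Remember the status so we only cache "200" responses.
            captured["status"] = status
            return start_response(status, headers, exc_info)

        result = self.app(environ, capturing_start_response)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        path = self._cache_path(environ)
        is_get = environ.get("REQUEST_METHOD") == "GET"
        is_ok = captured.get("status", "").startswith("200")
        if path and is_get and is_ok:
            directory = os.path.dirname(path)
            os.makedirs(directory, exist_ok=True)
            # Write to a temporary file and rename it into place, so the
            # web server never sees a half-written file.
            fd, tmp = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "wb") as handle:
                handle.write(body)
            os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the server read it
            os.replace(tmp, path)
        return [body]
```

Invalidation in this sketch works the way the quoted passage describes: remove the file, and the next request falls through to the application and regenerates it.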


>>> py >> file … Also if __name__ == ‘__main__’:

ankitg, September 1st, 2008

Some major updates, and we have the scripts running. Thanks to Asheesh for the redirection idea: it works, but I couldn’t get it to give me a progress bar, since everything was being redirected to the file. I tried using two different functions, but they needed a shared variable, so that failed. Still, it was nice, since I ended up with “real” Python files with a main().

The journey was interesting: we went from trying >> inside Python, to including # -*- coding: UTF-8 -*- and # coding: UTF-8 to get it to work, and after a few more bumps finally figured out the __main__ idiom.
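
For readers who hit the same wall, here is a minimal sketch of one common way to get both a results file and visible progress. It is not the code from these scripts: the function names, the tab-separated output, and the first-field counting are placeholders chosen for illustration. The idea is that results go to a file opened by the script itself, while progress goes to stderr, which a shell redirect of stdout does not capture.

```python
import sys


def summarize(lines, progress=sys.stderr):
    """Count lines by their first field, reporting progress on a separate stream."""
    counts = {}
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if fields:
            counts[fields[0]] = counts.get(fields[0], 0) + 1
        if number % 1000 == 0:
            # "\r" rewrites the same terminal line, giving a crude progress meter.
            progress.write("\rprocessed %d lines" % number)
    return counts


def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    if len(argv) != 2:
        sys.stderr.write("usage: script.py INPUT_LOG OUTPUT_FILE\n")
        return 2
    in_path, out_path = argv
    with open(in_path, encoding="utf-8") as src:
        counts = summarize(src)
    # Results are written to the named file rather than to stdout.
    with open(out_path, "w", encoding="utf-8") as out:
        for key in sorted(counts):
            out.write("%s\t%d\n" % (key, counts[key]))
    return 0


if __name__ == "__main__":
    main()
```

Because the work lives in main() and summarize(), the file can also be imported by another script without running anything, which is the point of the __main__ guard.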

I still need to update all the scripts, but licChange, which is at the forefront of all the latest developments, just got bumped up to version 8.2 (which reminds me of a dire need to update GIT:Loggy!).

This also gave me an idea of how to go about getting data out of S3 for “free” … S3 to EC2 is free … SCP from EC2 is free and voila! Why would I ever want to do that? Well, for starters, the EC2 AMI runs out of space around 5 GB (note: logs for are 4.7 GB), and secondly, the scripts seem to run faster locally. The icing on the cake: I wouldn’t have to scp the result files being generated, and I could possibly automate the process of running the scripts.

That’s all for now … class at 0830 hrs in the morning (it’s criminal, I know).

I guess I’ll just have to keep at it.


EC2, S3Sync and back to Python.

ankitg, August 31st, 2008

So this is where we are.

Now we have EC2, we have S3Sync Ruby scripts on the EC2 AMI to pull the data from S3, and we have updated Python scripts that read one line at a time and use Geo-IP (which was surprisingly easy to install once GCC was functional and the right versions of the C and Python modules were obtained). So deployment is at full throttle; one final bug fix for generating the final results and we are done.

So, now back to the Python code. We now have 4 scripts:

  • License Change (Logs for [Version 7]
  • License Chooser (Logs for [Version 5]
  • CC Search (Logs for [Version 4]
  • Deeds (Logs for*) [Version 2]

Each of these polls a directory for new logs, reads each new log in that directory line by line, and uses regular expressions to parse the information into usable statistics. Throughout the development phase, the results were written to stdout (the console). With deployment, they now need to be written to a file, which, interestingly, is still to be resolved. (TypeError: ‘str’ object is not callable: sound familiar to anyone?)
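
The shape of such a script can be sketched roughly as follows. This is an illustration only, not one of the four scripts: the function names are hypothetical, and the regular expression assumes Apache “common” style log lines, which may not match the real log format. It does one polling pass over a directory, reads each new log one line at a time, counts successful requests per path, and writes the result to a file.

```python
import os
import re
from collections import Counter

# Assumed format: Apache "common" log lines; the real logs may differ.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def new_logs(directory, seen):
    """Return files in `directory` not yet recorded in `seen`, and record them."""
    fresh = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if path not in seen and os.path.isfile(path):
            seen.add(path)
            fresh.append(path)
    return fresh


def count_paths(log_path, counts):
    """Read one log line by line and count successful requests per path."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LINE_RE.match(line)
            if match and match.group("status") == "200":
                counts[match.group("path")] += 1


def write_report(counts, out_path):
    """Write the statistics to a file, most requested path first."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path, hits in counts.most_common():
            out.write("%s\t%d\n" % (path, hits))
```

A caller would keep one `seen` set and one `Counter` alive between passes, calling new_logs() on a timer and feeding each returned path to count_paths().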

I am grateful to Asheesh (whom I should have totally bugged more). I should’ve put more work into the project when vacationing back home; having less to do at school would’ve helped too (studies + 3 research projects is not a recommended workload), but if it were easy, it wouldn’t be fun! Oh well, I learnt a fair bit through the project and with a bit more troubleshooting we’d be good to go … for now!


License-oriented metadata validator and viewer: summertime is winding up

hugo dworak, August 16th, 2008

Google Summer of Code 2008 is approaching its end: less than forty-eight hours are left to submit the code that will then be evaluated by mentors. It is therefore fitting to pause for a moment, sum up the work that has been done on the license-oriented metadata validator and viewer, and compare it with the original proposal for the project.

A Web application capable of parsing and displaying license information embedded in both well-formed and ill-formed Web pages has been developed. It supports the following means of embedding license information: Dublin Core metadata, RDFa, RDF/XML linked externally or embedded (utilising the data URL scheme) using the link and a elements, and RDF/XML embedded in a comment or as an element (the last two being deprecated). This functionality has been verified by unit tests. The source code of a Web page can be uploaded or pasted by a user; it is also possible to provide a URI for the Web application to analyse. The software has been written in Python and uses the Pylons Web Framework and the Genshi toolkit. Should you be willing to test this Lynx-friendly application, please visit its Web site.
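
As a rough illustration of the simplest case, the sketch below collects the targets of link and a elements that carry rel="license". It is not taken from the validator or libvalidator: it uses only the html.parser module from the Python standard library, the names LicenseLinkFinder and find_license_links are hypothetical, and it deliberately ignores Dublin Core metadata and RDF/XML, which the real application handles.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect href values of <a> and <link> elements with rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. "license nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_values and href:
            self.licenses.append(href)


def find_license_links(html_text):
    """Return license URIs in document order; tolerant of unclosed tags."""
    finder = LicenseLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.licenses
```

Since html.parser does not insist on well-formed markup, a page with unclosed elements is still scanned, which matches the spirit of accepting ill-formed pages.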

The Web application itself uses a library called “libvalidator”, which in turn is powered by cc.license (a library developed by Creative Commons that returns information about a given license), pyRdfa (a distiller that generates the RDF triples from an (X)HTML+RDFa file), html5lib (an HTML parser/tokenizer), and RDFLib (a library for working with RDF). The choice of this set of tools was not obvious, and the library has undergone several redesigns, which included removing the code that employed encutils, XML canonicalization, µTidylib, and BeautifulSoup. The idea of using librdf, librdfa, and rdfadict has been abandoned. The source code of both the Web application (licensed under the GNU Affero General Public License version 3 or newer) and its core library (licensed under the GNU Lesser General Public License version 3 or newer) is available through the Git repositories of Creative Commons.

In contrast to the original proposal, the following goals have not been met: traversal of special links, parsing of syndication feeds, statistics, and cloning the layout of the Creative Commons Web site. However, these were never mandatory requirements for the Web application. It is also worth noting that the software has been written from scratch, although a now-defunct metadata validator existed. Nevertheless, the development does not end with Google Summer of Code: these and several new features (such as validation of multimedia files via liblicense and support for different language versions) are planned to be added, albeit at a slower pace.

After the test period, the validator will be available under

