cc.engine and web non-frameworks

cwebber, January 13th, 2010

The sanity overhaul has included a number of reworkings, one of them being a rewrite of cc.engine, which in its previous form was a Zope 3 application. Zope is a full featured framework and we already knew we weren’t using many of its features (most notably the ZODB); we suspected that something simpler would serve us better, but weren’t certain what. Nathan suggested one of two directions: either we go with Django (although it wasn’t clear this was “simpler”, it did seem to be where a large portion of the python knowledge and effort in the web world is pooling), or we go with repoze.bfg, a minimalist WSGI framework that pulls in some Zope components. After some discussion we both agreed: repoze.bfg seemed like the better choice for a couple of reasons: for one, Django seemed like it would be providing quite a bit more than necessary… in cc.engine we don’t have a traditional database (we do have an RDF store that we query, but no SQL), we don’t have a need for a user model, etc… the application is simple: show some pages and apply some specialized logic. Second, repoze.bfg built upon and reworked Zope infrastructure and paradigms, and so in that sense it looked like an easier transition. So we went forward with that.

As I went on developing it, I started to feel more and more that, while repoze.bfg certainly had some good ideas, I was having to create a lot of workarounds to support what I needed. For one thing, the URL routing is unordered and based on a ZCML config file. It got to the point where, for resolving the license views, I had to route to a view method that then called other view methods. We also needed the kind of functionality Django provides with its “APPEND_SLASH=True” feature. I discussed this with the repoze.bfg people, and they were accommodating of the idea and actually applied it to their codebase for the next release. There were some other components they provided that were well developed but were not what we really needed (and were, besides, technically decoupled from the core repoze.bfg framework). As an example, the chameleon zpt engine is very good, but it was easier to just pull Zope’s template functionality into our system than to make the minor conversions necessary to go with chameleon’s zpt.

Repoze was also affecting the Zope queryUtility functionality in a way that made internationalization difficult. Once again, this was done for reasons that make sense and are good within a certain context, but it did not seem to mesh well with our existing needs. I was looking for a solution and reading over the repoze.bfg documentation when I came across these lines:

repoze.bfg provides only the very basics: URL to code mapping, templating, security, and resources. There is not much more to the framework than these pieces: you are expected to provide the rest.

But if we weren’t using the templating, we weren’t using the security model, and we weren’t using the resources, the URL mapping was making things difficult, and those were the things that repoze.bfg was providing on top of what was otherwise just WSGI + WebOb, how hard would it be to just strip things down to just the WSGI + WebOb layer? It turns out, not too difficult, and with an end result of significantly cleaner code.

I went through Ian Bicking’s excellent tutorial Another Do-It-Yourself Framework and applied those ideas to what we already had in cc.engine. Within a night I had the entire framework replaced with a single module, cc/engine/app.py, which contained these few lines:

import sys
import urllib

from webob import Request, exc

from cc.engine import routing

def load_controller(string):
    module_name, func_name = string.split(':', 1)
    __import__(module_name)
    module = sys.modules[module_name]
    func = getattr(module, func_name)
    return func

def ccengine_app(environ, start_response):
    """
    Really basic wsgi app using routes and WebOb.
    """
    request = Request(environ)
    path_info = request.path_info
    route_match = routing.mapping.match(path_info)
    if route_match is None:
        if (not path_info.endswith('/')
                and request.method == 'GET'
                and routing.mapping.match(path_info + '/')):
            new_path_info = path_info + '/'
            if request.GET:
                new_path_info = '%s?%s' % (
                    new_path_info, urllib.urlencode(request.GET))
            redirect = exc.HTTPTemporaryRedirect(location=new_path_info)
            return request.get_response(redirect)(environ, start_response)
        return exc.HTTPNotFound()(environ, start_response)
    controller = load_controller(route_match['controller'])
    request.start_response = start_response
    request.matchdict = route_match
    return controller(request)(environ, start_response)

def ccengine_app_factory(global_config, **kw):
    return ccengine_app

The main function of importance in this module is ccengine_app. This is a really simple WSGI application: it takes the routes defined in cc.engine.routing (which uses the very enjoyable Routes package) and sees whether the current URL (or rather, the path_info portion of it) matches one of them. If it finds a match, it loads that route’s controller and passes a WebOb-wrapped request into it, with any special URL matching data attached as the matchdict attribute. And actually, the only reason that this function is even this long is the “if route_match is None” block in the middle: that whole part provides APPEND_SLASH=True type functionality, as one would find in Django. (That is, if you’re visiting the URL “/licenses”, and that doesn’t resolve to anything, but the URL “/licenses/” does, redirect to /licenses/.) The portions before and after are just getting the controller for a URL and passing the request into it. That’s all! (The current app.py is a few lines longer than this, using a callable class rather than a function in place of ccengine_app for the sake of configurability and attaching a few more things to the request object, but it is not much longer or more complicated. The functionality is otherwise pretty much the same.)
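To make the slash-appending decision easier to see in isolation, here is a minimal sketch that uses only the standard library. It is not the cc.engine code: the ROUTES table, the match() helper, the resolve() function, and the controller string "myapp.views:licenses_index" are all hypothetical stand-ins (the real module delegates matching to Routes and builds responses with WebOb), and it is written for Python 3, so urlencode comes from urllib.parse rather than urllib.

```python
from urllib.parse import urlencode

# Hypothetical route table standing in for routing.mapping; the real
# application builds its mapping with the Routes package.
ROUTES = {
    '/licenses/': 'myapp.views:licenses_index',
}


def match(path_info):
    """Toy matcher: return a match dict for an exact path, else None."""
    controller = ROUTES.get(path_info)
    if controller is None:
        return None
    return {'controller': controller}


def resolve(path_info, method='GET', query=None):
    """Decide whether to dispatch, redirect to the slashed URL, or 404."""
    route_match = match(path_info)
    if route_match is not None:
        return ('dispatch', route_match['controller'])
    # Only retry with a trailing slash for GET requests whose path lacks
    # one, and only when the slashed path really does resolve.
    if (not path_info.endswith('/')
            and method == 'GET'
            and match(path_info + '/') is not None):
        location = path_info + '/'
        if query:
            # Carry the query string over so the redirect does not drop it.
            location = '%s?%s' % (location, urlencode(query))
        return ('redirect', location)
    return ('not_found', None)
```

Under these assumptions, resolve('/licenses') yields a redirect to '/licenses/', while a POST to the same path falls through to not_found, which mirrors the request.method == 'GET' check in the module above.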

Most interesting is that I swapped in this code, changed over the routing, fired up the server and… it pretty much just worked. I swapped out a framework for a module of about 50 lines and everything was just as nice and functional as it was before. In fact, with the improved routing provided by Routes, I was able to cut out the fake routing view, and thus the amount of code was actually *less* than it was before I stripped out the framework. Structurally there was no real loss either; the application still looks similar to what you’d see in a Pylons/Django/whatever application.

I’m still a fan of frameworks, and I think we are very fortunate to *have* Zope, Pylons, Django, repoze.bfg, and so on. But in the case of cc.engine I do believe that the position we are at is the right one for us; our needs are both minimal and special case, and the components available for Python are rich and easily tied together. So it seems the best framework for cc.engine turned out to be no framework at all, and in the end I am quite happy with it.

ADDENDUM: Chris McDonough’s comments below are worth reading.  It’s quite possible that the issues I experienced were my own error, and not repoze.bfg’s.  I also hope that in no way did I give the impression that we moved away from repoze.bfg because it was a bad framework, because repoze.bfg is a great framework, especially if you are using a lot of zope components and concepts.  It’s also worth mentioning that the type of setup that we ended up at, as I described, probably wouldn’t have happened unless I had adapted my concepts directly from repoze.bfg, which does a great job of showing just how usable Zope components are without using the entirety of Zope itself.  Few ideas are born without prior influence; repoze.bfg was built on ideas of Zope (as many Python web frameworks are in some capacity), and so too was the non-framework setup I described here based on the ideas of repoze.bfg.  It is best for us to be courteous to giants as we step on their shoulders, but it is also easier to forget or unintentionally fail to extend that courtesy as I may have done here.  Thankfully I’ve talked to Chris offline and he didn’t seem to have taken this as an offense, so for that I am glad.

Removing duplicate rows in MySQL

nkinkade, January 12th, 2010

There are thousands of articles out there on removing duplicate rows from a SQL database. However, almost all of the first-page results of a search at Google for something like “mysql remove duplicates” involved creating temporary tables and other convoluted ways of solving the problem. I’m posting this simple method here in the hope that it could simplify this process for someone else. This is likely old news for people highly experienced with MySQL and SQL databases in general, but it’s not that frequent that I have to tackle the duplicate row issue, so I post here for my own reference and that of others. The idea is that you create a unique index on the table based on the columns that should not be duplicated. MySQL will delete the duplicates in order to comply with the uniqueness of the index created. You then simply remove the temporary index.

There was a bug in CiviCRM which was causing duplicate records in a particular table. Finding the duplicates:

mysql> SELECT contact_id, contribution_id, receive_date, product_id, COUNT(*) FROM civicrm_contribution_product JOIN civicrm_contribution ON civicrm_contribution_product.contribution_id = civicrm_contribution.id GROUP BY contribution_id, product_id HAVING COUNT(*) > 1 ORDER BY receive_date;

Removing most of the duplicates:

mysql> ALTER IGNORE TABLE civicrm_contribution_product ADD UNIQUE INDEX `tmp_index` (contribution_id, product_id);

Removing the temporary index:

mysql> ALTER TABLE civicrm_contribution_product DROP INDEX tmp_index;
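The find-then-remove flow above can be tried out safely against a throwaway database. The sketch below uses Python's built-in sqlite3 module with a hypothetical two-column contribution_product table, not the real CiviCRM schema. SQLite has no ALTER IGNORE TABLE (and MySQL itself dropped that syntax in 5.7.4), so the removal step swaps in a different technique: a DELETE that keeps the row with the lowest rowid in each duplicate group.

```python
import sqlite3


def make_demo_table():
    """Build an in-memory table with some duplicate rows (made-up data)."""
    conn = sqlite3.connect(':memory:')
    conn.execute(
        "CREATE TABLE contribution_product "
        "(contribution_id INTEGER, product_id INTEGER)")
    conn.executemany(
        "INSERT INTO contribution_product VALUES (?, ?)",
        [(1, 10), (1, 10), (2, 20), (3, 30), (3, 30), (3, 30)])
    conn.commit()
    return conn


def find_duplicates(conn):
    """Return (contribution_id, product_id, count) for duplicated pairs."""
    return conn.execute(
        "SELECT contribution_id, product_id, COUNT(*) "
        "FROM contribution_product "
        "GROUP BY contribution_id, product_id "
        "HAVING COUNT(*) > 1 "
        "ORDER BY contribution_id, product_id").fetchall()


def remove_duplicates(conn):
    """Delete all but the lowest-rowid row of each pair; return rows deleted."""
    cursor = conn.execute(
        "DELETE FROM contribution_product "
        "WHERE rowid NOT IN ("
        "  SELECT MIN(rowid) FROM contribution_product "
        "  GROUP BY contribution_id, product_id)")
    conn.commit()
    return cursor.rowcount
```

Running find_duplicates() again after remove_duplicates() is a cheap way to confirm the table is clean before adding a permanent unique index.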

Thanks to Paul Swarthout’s comments on a thread at databasejournal.com for this simple solution.

MediaWiki, the application platform

nathan, January 8th, 2010

As noted on the CC weblog and elsewhere, AcaWiki launched back in October. I’m noting the launch much later, which is really inexcusable since CC did much of the tech buildout. AcaWiki is only the most recent example of our work with MediaWiki and Semantic MediaWiki. Both are critical pieces of our infrastructure here, and tools we’d like to see developed further. One area that we’ve brainstormed but not attempted to implement is the idea of “actions”; you can read an overview of the idea here.

The combination of MediaWiki and Semantic MediaWiki has allowed us to build applications faster and increase the effective number of “developers” on our team by lowering the barriers to entry. I expect 2010 to be a really interesting year for wiki applications.

Conferencing with Asterisk on a $20/month Linode

nkinkade, January 7th, 2010

For quite some years CC has been using Free Conference Call for tele-conferencing needs. It generally worked pretty well, but people frequently complained of not being able to connect, or getting erroneously dropped into some empty conference room, and things of that nature … to say nothing of the questionable practices used by services like Free Conference Call that allow them to offer such a service for free. Paid conferencing systems are actually quite expensive, and CC doesn’t have the resources to roll out some $10,000 to $20,000 custom conferencing system. We ended up deciding to set up Asterisk on one of our servers. We didn’t really want to load one of our core servers, so we decided to give it a try on a $20/month Linode which we were already using for server monitoring.

Getting Asterisk installed on a Debian system is as easy as $ apt-get install asterisk. Configuring it is covered by thousands of other articles on the Internet. However, the problem is that Debian’s default kernel is shipped with a particular configuration that is unacceptable for VoIP applications like Asterisk. Debian’s default kernel sets CONFIG_HZ=250, but for tele-conferencing with Asterisk to have acceptable audio quality it needs to be 1000. No problem, building a custom Debian kernel package is pretty easy, but we wanted to run this on a Linode VPS, which is a Xen environment. So the question became how to get Xen patches applied to the vanilla Debian kernel sources. It may sound trivial, but it actually took me quite some time to work it out. The first issue was resources — a $20/month Linode doesn’t have many, and the VPS ran out of memory while trying to compile the kernel. I got around this by killing virtually every other un-needed process. The next problem was what turned out to be a bug in the Debian kernel-package package, which took me quite a long time to find … I didn’t locate the bug in the code myself, but it took me a long time to realize it was a bug causing my problem and then find the existing bug report and a workaround.
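Before rebuilding anything, it can help to confirm what timer frequency a kernel was built with. Below is a small sketch, assuming the kernel's build configuration is available as text; Debian's stock kernels ship it as /boot/config-<release>, but a VPS-provided kernel may not, so treat that path as an assumption. The function names parse_config_hz and running_kernel_config_path are hypothetical helpers made up for this example.

```python
import os
import platform
import re


def parse_config_hz(config_text):
    """Return the integer CONFIG_HZ value from kernel config text, or None."""
    # Matches a line such as "CONFIG_HZ=250" but not "CONFIG_HZ_250=y".
    found = re.search(r'^CONFIG_HZ=(\d+)\s*$', config_text, re.MULTILINE)
    return int(found.group(1)) if found else None


def running_kernel_config_path():
    """Guess where the running kernel's config lives (Debian convention)."""
    return os.path.join('/boot', 'config-' + platform.release())


if __name__ == '__main__':
    path = running_kernel_config_path()
    if os.path.exists(path):
        with open(path) as config_file:
            print('CONFIG_HZ =', parse_config_hz(config_file.read()))
    else:
        print('No kernel config found at', path)
```

A result of 250 would indicate the default setting described above, and 1000 the value wanted for conferencing.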

I’ll try to post here again soon with the actual steps needed to get this working, partly for the benefit of the community, but also to document what I did in case I ever have to do it again. For now, suffice it to say that we are successfully running our own conferencing system using FOSS telephony software (Asterisk) on a $20/month VPS. Not only does this give us much more control over the system, but it opened up the possibility for people to start calling in via VoIP (SIP or IAX!) software instead of dialing in through the PSTN. This saves CC even more money because dialing into our conferencing system through the PSTN is not free. For that we had to find a good DID provider. We ended up going with Flowroute and so far I’ve been very happy with their service. Their rates are very competitive (< 1¢/minute), and the web interface for account management is very clean and intuitive. We have had conference calls with 20 people and the call quality has been just fine. Not only that, we can set up as many conference rooms as we want and hold multiple conferences simultaneously.

UPDATE: Wed Aug 11 21:56:24 UTC 2010 Belorussian translation provided by Patricia Clausnitzer.

Caching deeds for peak performance

nathan, January 6th, 2010

As Chris mentioned, he’s been working on improving the license chooser, among other things simplifying it and making it a better behaved WSGI citizen. That code also handles generating the license deeds. For performance reasons we like to serve those from static files; I put together some details about wsgi_cache, a piece of WSGI middleware I wrote this week to help with this, on my personal blog:

The idea behind wsgi_cache is that you create a disk cache for results, caching only the body of the response. We only cache the body for a simple reason—we want something else, something faster, like Apache or other web server, to serve the request when it’s a cache hit. We’ll use mod_rewrite to send the request to our WSGI application when the requested file doesn’t exist; otherwise it hits the on disk version. And cache “invalidation” becomes as simple as rm (and as fine grained as single resources).
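As a rough illustration of the body-only disk cache idea, here is a minimal standard-library sketch. It is not wsgi_cache's actual code or API: the class name BodyCacheMiddleware, the index.html convention for paths ending in a slash, and the choice to cache only 200 responses to GET requests are all assumptions made for this example (Python 3).

```python
import os


class BodyCacheMiddleware(object):
    """Write the bodies of successful GET responses to files in cache_dir."""

    def __init__(self, app, cache_dir):
        self.app = app
        self.cache_dir = cache_dir

    def _cache_path(self, path_info):
        # '/licenses/by/3.0/' maps to '<cache_dir>/licenses/by/3.0/index.html'
        relative = path_info.lstrip('/')
        if not relative or relative.endswith('/'):
            relative += 'index.html'
        root = os.path.normpath(self.cache_dir)
        full = os.path.normpath(os.path.join(root, relative))
        # Refuse any path that would escape the cache directory.
        if not full.startswith(root + os.sep):
            return None
        return full

    def __call__(self, environ, start_response):
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            # Remember the status so we only cache successful responses.
            captured['status'] = status
            return start_response(status, headers, exc_info)

        result = self.app(environ, capturing_start_response)
        try:
            body = b''.join(result)
        finally:
            if hasattr(result, 'close'):
                result.close()

        path = self._cache_path(environ.get('PATH_INFO', ''))
        if (path is not None
                and environ.get('REQUEST_METHOD') == 'GET'
                and captured.get('status', '').startswith('200')):
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, 'wb') as cache_file:
                cache_file.write(body)
        return [body]
```

With a layout like this, the front-end web server can serve the file directly whenever it exists and fall back to the WSGI application otherwise, and deleting a single file invalidates a single resource.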

You can read the full entry here, find wsgi_cache documentation on PyPI, and get the source code from our git repository.

Understanding the State of Sanity (via whiteboards and ascii art)

cwebber, December 18th, 2009

Since I started working at Creative Commons a number of months ago, I’ve been primarily focused on something we refer to as the “sanity overhaul”.  In this case, sanity refers to an attempt to simplify the long and complicated code history surrounding Creative Commons’ licenses, both in terms of the internal tooling for modifying, deploying, and querying licenses and the public-facing web interfaces for viewing and downloading them.  Efforts toward the sanity overhaul started before I began working here, carried out by both Nathan Yergler and Frank Tobia, but for a long time they were in a kind of limbo as technical effort had to be dedicated to other important tasks.  The good news is that my efforts have been permitted to be (almost) entirely dedicated toward the sanity overhaul since I started, and we are reaching a point where all of those pieces are falling into place and we are very close to launch.

To give an idea of the complexity of things as they were and how much that complexity has been reduced, it is useful to look at some diagrams.  When Nathan Kinkade first started working at Creative Commons (well before I did), Nathan Yergler took some time to draw on the whiteboard what the present infrastructure looked like:

as well as what he envisioned the “glorious future” (sanity) would look like:

When I started, the present infrastructure had shifted a little bit further still, but the vision of the “glorious future” (sanity) had mostly stayed the same.

This week (our “tech all-hands week”) I gave a presentation on the “State of Sanity”.  Preparing for that presentation I decided to make a new diagram.  Since I was already typing up notes for the presentation in Emacs, I thought I might try and make the most minimalist and clear ASCII art UML-like diagram that I could (my love of ASCII art is well known to anyone who hangs out regularly in #cc on Freenode).  I figured that I would later convert said diagram to a traditional image using Inkscape or Dia, but I was so pleased with the end result that I just ended up using the ASCII version:

*******************
* CORE COMPONENTS *
*******************

      .--.
     ( o_o)
     /'---
     |USER| --.
     '----'   |
              |
              V
         ___   .---.
       .'   ','     '.
     -'               '.
    (     INTARWEBS     )
     '_.     ____    ._'
        '-_-'    '--'
              |
              |
              V
      +---------------+  Web interface user
      |   cc.engine   |  interacts with
      +---------------+
              |
              |
              V
      +---------------+  Abstraction layer for
      |  cc.license   |  license querying and
      +---------------+  pythonic license API
              |
              |
              V
      +---------------+  Actual rdf datastore and
      |  license.rdf  |  license RDF operation tools
      +---------------+  

****************
* OTHER PIECES *
****************

  +--------------+
  |  cc.i18npkg  |
  | .----------. |
  | | i18n.git | |
  | '----------' |
  +--------------+

********************************************
* COMPONENTS DEPRECATED BY SANITY OVERHAUL *
********************************************

  +------------+  +-----------+  +---------+  +-------------+
  |    old     |  | old zope  |  | licenze |  | license_xsl |
  | cc.license |  | cc.engine |  +---------+  +-------------+
  +------------+  +-----------+

This isn’t completely descriptive on its own, and I will be annotating it as I include it as part of the Sphinx developer docs we are bundling with the new cc.engine.  But I think that even without annotation, it is clear how much cleaner the new infrastructure is than the old “present infrastructure” whiteboard drawing, which means that we are making good progress!

One click PayPal donations with CiviCRM

nkinkade, November 9th, 2009

About a month ago CC launched its annual Fall fundraising campaign. Along with it we also rolled out a streamlined donation process. I wrote about this on the CiviCRM blog, and also wrote up some documentation on the CC Wiki. This new donation method required some custom code and leveraged an existing CiviCRM script written by Donald Lobo.

CC @ Mozilla Service Week

nathan, September 9th, 2009

Next week is Mozilla Service Week and Creative Commons is participating by hosting a week long help desk in IRC. You can find more details on our earlier blog post or in the wiki. Several CC staff members and community volunteers will be available during the week to answer questions about using CC licenses and the associated tools. We’ll be answering questions about:

  • General CC help
  • CC technology (ccREL and software projects)
  • Where and how to publish CC works
  • Where and how to find CC works
  • CC in education and science

If you’d like to help out and educate others about using CC licenses and tools, you can sign up on the wiki page.

Creative Commons Drupal Module — GSoC 2009

blaise, September 3rd, 2009

This past year was my last at the University of Toronto, making this summer my last chance to participate in the Google Summer of Code. I searched hard for a project and mentor organization that would suit my interests, and when I noticed that the Creative Commons Drupal module was in need of some developer love, I knew exactly what I wanted to spend my summer doing. With John Doig as my CC mentor, and Kevin Reynen (the module’s maintainer and initial author) as an unofficial Drupal mentor, I’ve been privileged to have spent the past few months updating and extending the module.

A couple years ago, development for Drupal 4.7 was begun, but it was never quite completed. CC Lite came to be the reliable choice for Drupal 6. However, CC Lite’s scope is limited — it allows you to attach a license to content in Drupal, but that’s about it. The main CC module’s vision is broader — to fully integrate CC technology with the Drupal platform — and I hope I’ve helped to realize that just a little.

Some of the module’s features:

  • it uses the CC API for license selection and information (so, for example, when new license versions are released, they become available on your Drupal site automatically)
  • you can set a site-wide default license/jurisdiction, and users can set their own default license/jurisdiction
  • ccREL metadata is supported, output in RDFa (and, optionally, RDF/XML for legacy systems)
  • supports CC0, along with the 6 standard licenses and the Public Domain Certification tool
  • you can control which licenses and metadata fields are available to users
  • basic support for the Views API has been added (including a default /creativecommons view)
  • there’s a CC site search option

The module is still listed as a beta release, as some folks have been submitting bug fixes and patches over the past few weeks, though it’s quite usable. Special thanks to Turadg Aleahmad, who’s helped with a lot of the recent bug fixes towards the end of the GSoC term, and committed to being active in future development. If you’re into Drupal development, we could use help with testing, and any translations would be greatly appreciated too.

Right now, the focus is on getting to a stable release, but we’ve got lots of ideas for the future too. Thanks to John and Kevin for their support through the summer, and to Turadg for his recent help. I look forward to seeing the module put to good use!

Check it out!

I’m a musician, writer, software developer, free culture / free software advocate and recent graduate of the University of Toronto — get in touch at http://blaise.ca/

Developers landing page revamp

greg, August 11th, 2009

If you haven’t looked at the Developers landing page on the Creative Commons wiki lately, you’re missing out! We’ve recently put a lot of effort into reorganizing the information, making the important things easier to find, and overall just making the whole place a bit more welcoming.

First of all, we’ve made a semantic split between information for desktop-based development and web-based development. On each page there is a list (with short descriptions) of the various tools to help you integrate CC-license metadata functionality, along with a short list of open Developer Challenges. These challenges are things the developer community thinks would be cool to have; a wishlist everyone can help with!

Also, we’re starting to produce some more “HowTo” guides for developers who are interested in the best practices of integrating CC-license metadata. Thus far we have one for Web Integration, which lists the various ways a service could support CC licenses, with best-practice examples (pictorial and code) of how existing services did it. See, for example, the page on adding license choice when uploading content.

We hope this redesign will make it easier for developers to find the information they need to improve their services. If you have any other suggestions, don’t hesitate to send an email to greg@creativecommons.org
