See also roundup
The Learning Resource Metadata Initiative specification (which Creative Commons is coordinating) has entered its final public commenting period. Please take a look if you’re at all interested in education metadata and/or how efforts spurred by schema.org (which LRMI is) will shape up.
The W3C recently published drafts that ought to be of great interest to the Creative Commons technology community: a family of documents regarding provenance and a guide to using microdata, microformats, and RDFa in HTML. I mentioned these on my personal blog here and here.
Speaking of things mentioned on my personal blog, a couple days ago I posted some analysis of how people are deploying CC related metadata, based on structured data extracted by the Web Data Commons project from a sample of the Common Crawl corpus. Earlier this month I posted a marginally technical explanation of using CSS text overlays to provide attribution and a brief historical overview of ‘open hardware licensing’, something which the CC technology team hasn’t been involved in, but is vaguely labs-ish, and needs deep technical attention.
Other things needing deep technical attention: how CC addresses Digital Restrictions Management in version 4.0 of its licenses is being discussed. We don’t know enough about the technical details of various restricted systems (see last sentence) that CC licensed works are being distributed on/to/with every day, and ought to. Another needs-technical-attention issue is ‘functional content’ for example in games and 3D printing. And we’re still looking for a new CTO.
Finally, Jonathan Rees just posted on how to apply CC0 to an ontology. You should subscribe to Jonathan’s blog as almost every post is of great interest if you’ve read this far.
Addendum: It seems remiss to not mention SOPA, so I’m adding it. Thanks to the technology community for rising up against this bad policy. CC promoted the campaign on its main website through banners and a number of blog posts. Don’t forget that SOPA/PIPA may well rise again; the so-called Research Works Act is very different but is motivated by the same thinking; and ACTA threatens globally. Keep it up! In the long term, is not building a healthy commons (and thus technology needed to facilitate building a healthy commons) a big part of the solution? On that, see yet another post on my personal blog…
Creative Commons: Using Provenance in the Context of Sharing Creative Works
I provided a brief non-technical writeup on Creative Commons and provenance for the W3C Provenance Working Group’s Connection Task Force documenting “Communities Addressing Important Issues in Provenance”.
See the writeup on the Provenance WG wiki (please suggest edits in comments below), current version follows.
Creative Commons
Creative Commons (CC) provides licenses and public domain tools that can be used for any kind of creative works like texts, images, websites, or other media, as well as databases. CC tools are well known and used, especially in online publications. Each CC license and public domain tool is identified by a unique URL, allowing proper identification and reference of these as part of a work’s provenance information.
Additionally, Creative Commons provides a vocabulary to describe its tools and works licensed or marked with those tools in a machine interpretable way: The Creative Commons Rights Expression Language (CC REL). CC REL can be expressed in RDF.
The provenance of assertions about a work’s license or public domain status is of great importance for licensors, licensees, curators, and future potential users. All CC licenses legally require certain information (attribution and license notice) be retained; even in the case of its public domain tools, retaining such information is a service to readers and in accordance with research and other norms. To the extent license and related information is not retained or cannot be trusted, users’ ability to find and rely upon freedoms to use such works is degraded. In many cases, the original publication location of a work will disappear (linkrot) or rights information will be removed, either unintentionally (eg template changes) or intentionally (here especially, provenance is important; CC licenses are irrevocable). In the degenerate case, a once CC-licensed work becomes just another orphan work.
The core statements needed are who licensed, dedicated to the public domain, or marked as being in the public domain, which work, and when? Each of these statements has sub-statements, eg the relationship of “who” to rights in the work or knowledge about the work, and exactly what work and at what granularity?
Provenance information is also necessary for discovering the uses of shared works and building new metrics of cultural relevance, scientific contribution, etc, that do not strictly rely on centralized intermediaries.
Finally, in CC’s broader context, an emphasis on machine-assisted provenance aligns with renewed interest in copyright formalities (eg work registries), puts a work’s relationship to society’s conception of knowledge in a different light (compare intellectual provenance and intellectual property), and is in contrast with technical restrictions which aim to make works less useful to users rather than more.
Converting cc.engine from ZPT to Jinja2 and i18n logical keys to english keys
Some CC-specific background
Right now I’m in the middle of retooling our translation infrastructure. cc.engine and related tools have a long, complex history (dating back, as I understand, to TCL scripts running on AOL server software). The short of it is, CC’s tools have evolved a lot over the years, and sometimes we’re left with systems and tools that require a lot of organization-specific knowledge for historical reasons.
This has been the case with CC’s translation tools. Most of the world these days uses English-key translations; CC used logical-key translations. This means that if you marked up a bit of text for translation, instead of the key being the actual text being translated (such as “About The Licenses”), the key would be an identifier code which mapped to said English string, like “util.View_Legal_Code”. What’s the problem with this? Actually, there are a number of benefits that I’ll miss and that I won’t get into here, but the real problem is that the rest of the translation world mostly doesn’t work this way. We use Transifex (and previously used Pootle) as the tool our translators use to manage our translations. Since these tools don’t expect logical keys, we had to write tools to convert from logical keys to English keys on upload and from English keys back to logical keys afterwards, plus a whole bunch of other crazy custom tooling.
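To make the difference concrete, here is a minimal sketch in Python that uses plain dictionaries instead of real .po files. The catalogs, the English and French strings, and the `rekey` function are all invented for illustration; none of this is CC's actual tooling or data.

```python
# Hypothetical catalogs, for illustration only (not CC's real strings).
# Logical-key style: the key is an identifier, and English is just
# one more catalog that maps identifiers to text.
english = {
    "util.View_Legal_Code": "View Legal Code",
    "deed.retired": "This license has been retired.",
}
french = {
    "util.View_Legal_Code": "Voir le code juridique",
}


def rekey(source, translated):
    """Re-key a logical-key catalog by its English source strings.

    Entries without a translation are kept with an empty string, which
    is how gettext-style catalogs usually mark untranslated messages.
    """
    return {text: translated.get(key, "") for key, text in source.items()}


# English-key style: the English text itself is the lookup key.
french_by_english = rekey(english, french)
print(french_by_english)
```

Going the other way (English keys back to logical keys) needs the inverse mapping, and that only works if no two logical keys share the same English text. That constraint is one reason round-tripping between the two styles takes custom tooling.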
Another time suck has been that we’d love to be able to just dynamically extract all translations from our python code and templates, but this also turns out to be impossible with our current setup. Because of a strange edge case in ZPT, certain situations with dynamic attributes in ZPT-translated HTML force us to edit certain translations after they’re extracted, meaning we can’t rely on an auto-extracted set of translations.
So we’d like to move to a future with no or very few custom translation tools (which means we need English keys) and auto-extraction of translations (which means because of that edge case, no ZPT). Since we need to move to a new templating engine, I decided that we should go with my personal favorite templating engine, Jinja2.
ZPT vs Jinja2
Aside from the issue I’ve described above, briefly I’d like to describe the differences between ZPT and Jinja2, as they’re actually my two favorite templating languages.
ZPT (Zope Page Templates) is an XML-based templating system where your tags and elements actually become part of the templating logic and structure. Here’s an example of us looping over a list of license versions on our “helpful” 404 pages for when you type in the wrong license URL (like at http://creativecommons.org/by/2.33333/):
<h4>Were you looking for:</h4>
<ul class="archives" id="suggested_licenses">
  <li tal:repeat="license license_versions">
    <a tal:attributes="href license/uri">
      <b tal:content="python: license.title(target_lang)"></b>
    </a>
  </li>
</ul>
As you can see, the for loop, the attributes, and the content are actually elements of the (X)HTML tree. The neat thing about this is that you can be mostly sure that you won’t end up with tag soup. It’s also pretty neat conceptually.
Now, let’s look at the same segment of code in Jinja2:
<h4>Were you looking for:</h4>
<ul class="archives" id="suggested_licenses">
  {% for license in license_versions %}
    <li>
      <a href="{{ license.uri }}">
        <b>{{ license.title(target_lang) }}</b>
      </a>
    </li>
  {% endfor %}
</ul>
If you’ve used Django’s templating system before, this should look very familiar, because that’s the primary source of inspiration for Jinja2. There are a few things I like about Jinja2 that Django’s templating system doesn’t have, and the biggest and clearest of these is the ability to pass arguments into functions, as we’re doing here with license.title(target_lang). It massively beats making a template tag every time you want to pass an argument into a function.
The conversion process
Not too much to say about converting from ZPT to Jinja2. It’s really just a lot of manual work, combing through everything and moving it around.
More interesting might be our translation conversion process. Simply throwing out old translations and re-extracting new ones is not an option: it’s a lot of effort for translators to go through and translate things, and asking them to do it all over again is simply too much and just not going to happen. Pass 1 was to simply get the templates moved over rather than try to convert both the templates and the logical->english key system all at once (this move away from logical keys has been tried and fizzled before, probably because there are simply too many moving parts across our codebase… so we wanted to take this incrementally, and this seemed like the best place to go first). We’re simply doing stuff like this:
<h3>{{ cctrans(locale, "deed.retired")|safe }}</h3>
Where cctrans is a simple logical key translation function. Next steps:
- Create a script that converts all our .po files to eliminate the logical keys and move them to English-only.
- Write a script to auto-interpolate {{ cctrans() }} calls in templates to {% trans %}{% endtrans %} Jinja2 tags.
- Do all the many manual changes to all our python codebases.
At that point, we should be able to wrap this all up.
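As a rough idea of what the template-rewriting step could look like, here is a small Python sketch built on a regular expression. It only handles calls shaped exactly like the `cctrans(locale, "deed.retired")|safe` example above. The `ENGLISH` catalog, its text, and the `to_trans_blocks` function are hypothetical names made up for this illustration, not the script CC wrote.

```python
import re

# Hypothetical English catalog keyed by logical key (illustrative text only).
ENGLISH = {"deed.retired": "This license has been retired."}

# Matches calls shaped like: {{ cctrans(locale, "some.key")|safe }}
CCTRANS_CALL = re.compile(
    r"""\{\{\s*cctrans\(\s*locale\s*,\s*["']([^"']+)["']\s*\)\s*(?:\|\s*safe\s*)?\}\}"""
)


def to_trans_blocks(template_source, english=ENGLISH):
    """Replace cctrans() calls with {% trans %} blocks holding English text."""

    def replace(match):
        key = match.group(1)
        if key not in english:
            # Leave unknown keys untouched so they can be fixed by hand.
            return match.group(0)
        return "{% trans %}" + english[key] + "{% endtrans %}"

    return CCTRANS_CALL.sub(replace, template_source)


print(to_trans_blocks('<h3>{{ cctrans(locale, "deed.retired")|safe }}</h3>'))
```

A real conversion would need more care than this: strings that contain markup (the reason for `|safe`) or placeholders would have to be reviewed by hand, and the `{% trans %}` tag is only available once Jinja2's i18n extension is enabled.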
Summary of current licensing tools
I’ve been considering license integration into a personal project of mine and thoughts of that have spilled over into work. And so we’ve been talking at Creative Commons recently about the current methods for licensing content managed by applications and what the future might be. The purpose of this post is to document the present state of licensing options. (A post on the future of licensing tools may come shortly afterward.)
Present, CC supported tools
To begin with, there are these three CC-hosted options:
- CC licensing web API — A mostly-RESTful interface for accessing CC licensing information. Some language-specific abstraction layers are provided. Supported and kept up to date. Lacking a JSON layer, which people seem to want. Making a request for every licensing action in your application may be a bit heavy.
- Partner interface — Oldest thing we support, part of the license engine. Typical case is that you get a popup and when the popup closes the posting webpage can access the info that’s chosen. Still gets you your chooser based interface but on your own site. Internet Archive uses it, among others.
- LicenseChooser.js — Allows you to get a local chooser user interface by dumping some javascript into your application, and has the advantage of not requiring making any web requests to Creative Commons’ servers. Works, though not recently updated.
All of these have the problem that the chooser of CC licenses is only useful if you want exactly the choices we offer (and specifically the most current version of the licenses we provide). You need to track license choices in your own database anyway, which means either you are not keeping track of the version used, or you are, and when we change versions you might be in for a surprise.
Going it alone
So instead there are these other routes that sites take:
- Don’t use any tools and store license choices locally — What Flickr and every other major option does: reproduce everything yourself. In the case of Flickr, the six core licenses at version 2.0. In YouTube, just one license (CC BY 3.0). That works when you have one service, when you know what you want, and what you want your users to use. It doesn’t work well when you want people to install a local copy and you don’t know what they want to use.
- Allow any license you want as long as it fits site policy — and you don’t facilitate it, and it gets kind of outside the workflow of the main CMS you’re using… wiki sites are an example of this, but usually have a mechanism for adding a license to the footer of uploaded media. The licenses are handled by wiki templates, and anyone can make a template for any license they choose.
None of those are really useful for software you expect other people to install, where you want to give some assistance to the administrators installing it, or where you want the administrator to offer users a choice or choices relevant to that particular site.
The liblicense experiment
This brings us to another solution that CC has pursued:
- liblicense — Packages all licenses we provide and gives an API for users to get info and metadata about them. Allows for web-request-free access to the CC licenses. It doesn’t address non-CC licenses, however, and is mostly unmaintained.
So, these are the present options that application developers have at their disposal for doing licensing of application-managed content. There’s a tradeoff with each one of them though: either you have to rely on web requests to CC for each licensing decision you make, you go it alone, or you use something unmaintained which is CC-licensing-specific anyway. Nonetheless, cc.api and the partner interface are supported if you want something from CC, and people do tend to make do with doing things offline. But none of the tools we have are that flexible, so what can software like MediaGoblin, an extension for WordPress, etc. do?
There’s one more option, one that to my knowledge hasn’t really been explored, and would be extremely flexible but also well structured.
The semantic web / linked data option?
It goes like this: let either users or admins specify licenses by their URL. Assuming that page describes itself via some metadata (be it RDFa, providing a rel="alternate" RDF page in your headers, or microdata), information about that license could be extracted directly from the URL and stored in the database. (This information could of course then be cached / recorded in the database.) This provides a flexible way of adding new licenses, is language-agnostic, and allows for a canonical set of information about said licenses. Libraries could be written to make the extraction of said information easier, and could even cache metadata for common licenses (and for common licenses which don’t provide any metadata at their canonical URLs…).
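The idea can be sketched with nothing but the Python standard library. This is only an illustration under several assumptions: the sample page, its example.org URL, and the names `PropertyCollector` and `license_info` are made up; it reads only RDFa-style `property` attributes; and it is not a complete RDFa parser.

```python
from html.parser import HTMLParser

# Hypothetical self-describing license page (not a real license).
SAMPLE_PAGE = """
<html><body about="http://example.org/licenses/sample/1.0/">
  <h1 property="dc:title">Sample License 1.0</h1>
  <meta property="dc:identifier" content="sample-1.0">
</body></html>
"""


class PropertyCollector(HTMLParser):
    """Collect values of elements that carry a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # (tag, property) whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # Value given directly, as on <meta> elements.
            self.properties[prop] = attrs["content"]
        else:
            self._current = (tag, prop)
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current:
            self.properties[self._current[1]] += data

    def handle_endtag(self, tag):
        if self._current and tag == self._current[0]:
            prop = self._current[1]
            self.properties[prop] = self.properties[prop].strip()
            self._current = None


_cache = {}


def license_info(url, fetch):
    """Return metadata for a license URL, fetching the page at most once.

    `fetch` is any callable mapping a URL to HTML text; passing it in
    keeps this sketch free of network code.
    """
    if url not in _cache:
        collector = PropertyCollector()
        collector.feed(fetch(url))
        collector.close()
        _cache[url] = collector.properties
    return _cache[url]


def fake_fetch(url):
    # Stand-in for an HTTP GET; a real tool might use urllib.request here.
    return SAMPLE_PAGE


info = license_info("http://example.org/licenses/sample/1.0/", fake_fetch)
print(info)
```

An application would store the returned dictionary alongside the license URL in its own database, so that later lookups never touch the network. A library could also ship pre-filled entries for common licenses.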
I’m hoping that in the near future I’ll have a post up here demonstrating how this could work with a prototypical tool and use case.
Thanks to Mike Linksvayer, for most of this post was just transforming a braindump of his into a readable blogpost.
CC license use in Latin America brief analysis
Carolina Botero (of CC Colombia and now CC regional manager for Latin America) has posted a brief analysis of CC license use in Latin America (es2en machine translation).
As with previous looks at CC licenses across different jurisdictions and regions, this one is based on search engine reported links to jurisdiction “ported” versions of licenses. It is great to see what researchers have done with this limited mechanism.
I am eager to explore other means of characterizing CC usage across regions and fields, and hope to provide more data that will enable researchers to do this. This will be increasingly important as we attempt to forge a universal license that does not call for porting in the same way, with version 4.0 over the coming year (as well as with the increasing importance of CC in the world).
Not Panicking: switching to Virtualenv for deployment
I’ve written about zc.buildout and virtualenv before and how to use them both simultaneously, which I find to be useful for development on my own machine. I really admire both of these tools; I especially think that buildout is really great for projects where you want developers to be able to get your package running quickly without having to understand how python packaging works. (I use buildout for this purpose for one of my own personal projects, MediaGoblin, and I think it’s served a wonderful purpose of getting new contributors up and going quickly.)
Anyway, in that previous blogpost about zc.buildout and virtualenv I erroneously suggested that virtualenv is best for multiple packages in development and zc.buildout is better for just one. I was rightly corrected that you can use the develop line of a buildout config file to specify multiple python packages. So this is what we’ve been doing for the last year roughly, running a meta-package with cc.engine checked out of git and the rest running out of python packages.
We’ve been doing packaging and releasing to our own egg basket for a while, and for the most part that has worked out, but our system administrator Nathan Kinkade pointed out that we don’t really need packages, it’s a bunch of extra steps to build, nobody outside of CC is using these packages, and it’s a lot easier to rollback a git repository in case of an emergency than it is a python package.
That led me to reconsider the way we’re currently doing deployment and my growing feeling that maybe zc.buildout, while great for developing locally, really just isn’t a good option for deployment. Whenever I want to pull down new versions of packages, I would run buildout. But buildout likes to do something which makes this period very, very painful: if for whatever reason it can’t manage to install all packages, it tears down the entire environment. It removes ./bin/python, it removes all other scripts. I’ve found this to be highly stressful, especially because you never know if some package on PyPI is going to time out and then suddenly as punishment your environment no longer works, suddenly parts of creativecommons.org aren’t running, and you start to have a minor panic attack as you rush to get things up again. That’s not very great.
Anyway, I always stress out about this, which has led me to adding coping mechanisms to our fabric deploy script:

This helps reduce my blood pressure somewhat, but anyway, we decided to move from buildout to virtualenv for deployment. Actually, there’s not much more to say; it only took a couple of hours to make the switch and there really wasn’t anything special to say about it. It just works and generally seems a lot simpler.
In short: buildout is pretty great. If you’re looking for an option to make it really, really easy for people who want to try out your project to get something working or start contributing, it’s the closest the python world has to an interface as simple as (or simpler than) `./configure && make`. But as for deployment… especially if you’d like to do code checkouts of your main packages, just go with virtualenv.
LRMI tech WG CFP
If you know your stuff, then you might be able to guess from the subject what this is about. Perhaps LR = Learning Resource is not obvious. More on the main CC blog…
CTO
Creative Commons is hiring its next Chief Technology Officer.
If you follow the links in the post linked above, you can find out a lot about the technology we’re looking for someone to be chief officer of. Why not submit a patch, bug report, or documentation edit with your resume? ;-)
RDFaCE: an RDFa-enhanced TinyMCE rich editor
For a long time — it feels like much longer than the RDFa Plugin for WordPress tech challenge has been on the wiki (28 months) — the idea that there should be such a thing has been around. I recall multiple Summer of Code applications proposing to tackle the problem. However, it is a really hard UI problem.
I’m really happy to see the announcement of RDFaCE, which does most of the hard work.
Without reading any documentation or watching their screencast (still haven’t watched it, no idea if it is any good!) I was able to add a cc:attributionName annotation specific to the image in their demo on my first try:
- select the photographer name, insert cc:attributionName annotation with literal value already in the text. RDFaCE seems to already know the correct cc: namespace mapping.
- select content around photo, set subject to photo URL
- verify that triples produced are correct
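For readers who have not seen this kind of annotation, the following Python sketch builds a hand-written equivalent of the markup those steps produce. It is not RDFaCE's actual output. The `attribution_markup` function, the example.org photo URL, and the photographer name are made up for illustration, while `cc:attributionName` and the `http://creativecommons.org/ns#` namespace come from CC REL.

```python
from html import escape

CC_NS = "http://creativecommons.org/ns#"  # CC REL namespace


def attribution_markup(photo_url, photographer):
    """Wrap an image and its credit in RDFa, with the photo as the subject."""
    url = escape(photo_url, quote=True)
    name = escape(photographer)
    return (
        f'<div xmlns:cc="{CC_NS}" about="{url}">\n'
        f'  <img src="{url}" alt="">\n'
        f'  Photo by <span property="cc:attributionName">{name}</span>\n'
        f"</div>"
    )


# Hypothetical photo URL and name, for illustration only.
markup = attribution_markup("http://example.org/photo.jpg", "Jane Doe")
print(markup)
```

An RDFa parser reading this markup should yield a single triple: the photo URL as subject, `cc:attributionName` as the property, and the literal "Jane Doe" as the value.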
Granted I more or less know what I’m doing. But, so do lots of other people. Contrary to some impressions, annotating stuff on the web with name-value pairs (“stuff” is the subject in the “triple”) is hardly brain-twisting.
I look forward to seeing RDFaCE bundled in a WordPress plugin with some awareness of the WordPress media manager, and using it on this very blog.
TinyMCE is the free software rich text editor used in lots of projects in addition to WordPress, so this is a great step forward!
Libre Graphics Magazine interview at Libre Graphics Meeting
As discussed previously, I represented Creative Commons at Libre Graphics Meeting 2011. Also attending were the people behind Libre Graphics Magazine. If you aren’t already familiar with Libre Graphics Magazine, it’s a cool project crossing free software and free cultural works. It isn’t as much a magazine of free software design tutorials (though to some extent it is that also) as it is a design magazine offering a critical perspective on and showcasing works made with such tools.
It’s valuable that we have a magazine that can show off the strengths of libre graphics tools when put in the hands of capable artists. But the people behind this magazine can probably describe this much better themselves. On that note, Danny Piccirillo recorded an interview with the main people behind Libre Graphics Magazine (Ana Carvalho, ginger “all lowercase” coons, Ricardo Lafuente). Amongst other things, the interview touched on why even the printing of the magazine itself is useful:
Ana Carvalho: In the professional world, one of the things that is usually pointed out to people that use FLOSS [Free/Libre/Open Source Software] for design is that it’s not good for printing.
ginger coons: We proved them wrong!
Ana Carvalho: Yes. And… you can see it’s possible. And you can do it with the same quality that you can do it with other kinds of tools. So that’s a very strong point.
ginger coons: That really is a constant refrain even within our own community. People always still talk about the printing problem. So… what printing problem?
Ricardo Lafuente: There’s a lot of edges to be ironed out, but on the other hand we do get compliments from printers on how good our PDFs are constructed. And that’s thanks to the quality of FLOSS software. There’s still this kind of misconception that FLOSS software is not up to par with professional standards… that’s not true, people still don’t believe that, but that’s their problem, and this is one of our ways to try and prove them wrong and actually try and get their interest toward alternate ways of making beautiful things.
There are plenty of other gems in the interview. Assuming we’ve piqued your interest, you can watch the whole thing below:

View on YouTube or archive.org / CC BY-SA 3.0
And of course check out Libre Graphics Magazine itself. The magazine is licensed as CC BY-SA 3.0, and PDFs are available at no cost on the site, but it really is a magazine that is designed for and shines best in print, so consider purchasing a physical copy. Thanks to ginger coons and also Ana Carvalho and Ricardo Lafuente of Manufactura Independente for taking the time to do this interview and to Danny Piccirillo for the large time investment in both filming it and editing it down.