Creative Commons Drupal Module — GSoC 2009
Blaise Alleyne, September 3rd, 2009
This past year was my last at the University of Toronto, making this summer my last chance to participate in the Google Summer of Code. I searched hard for a project and mentor organization that would suit my interests, and when I noticed that the Creative Commons Drupal module was in need of some developer love, I knew exactly what I wanted to spend my summer doing. With John Doig as my CC mentor, and Kevin Reynen (the module’s maintainer and initial author) as an unofficial Drupal mentor, I’ve been privileged to have spent the past few months updating and extending the module.
A couple of years ago, development of a version for Drupal 4.7 began, but it was never quite completed. CC Lite became the reliable choice for Drupal 6. However, CC Lite’s scope is limited: it allows you to attach a license to content in Drupal, but that’s about it. The main CC module’s vision is broader, namely to fully integrate CC technology with the Drupal platform, and I hope I’ve helped to realize that just a little.
Some of the module’s features:
- it uses the CC API for license selection and information (so, for example, when new license versions are released, they become available on your Drupal site automatically)
- you can set a site-wide default license/jurisdiction, and users can set their own default license/jurisdiction
- ccREL metadata is supported, output in RDFa (and, optionally, RDF/XML for legacy systems)
- supports CC0, along with the 6 standard licenses and the Public Domain Certification tool
- you can control which licenses and metadata fields are available to users
- basic support for the Views API has been added (including a default /creativecommons view)
- there’s a CC site search option
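The ccREL RDFa mentioned in the list is, at its simplest, an HTML link with `rel="license"` plus optional attribution properties in the `cc:` namespace. As a rough illustration (in Python, not the module's own PHP), the sketch below builds such a fragment. The helper name `ccrel_snippet` and the exact markup are assumptions; the module's real output may carry more fields.

```python
from html import escape

CC_NS = "http://creativecommons.org/ns#"


def ccrel_snippet(work_url: str, attribution_name: str,
                  license_url: str, license_name: str) -> str:
    """Return an HTML fragment stating a work's license as ccREL RDFa.

    Hypothetical helper: it shows the general pattern only.
    """
    return (
        f'<div xmlns:cc="{CC_NS}" about="{escape(work_url)}">'
        # cc:attributionURL / cc:attributionName say whom to credit, and where
        '<a rel="cc:attributionURL" property="cc:attributionName" '
        f'href="{escape(work_url)}">{escape(attribution_name)}</a>'
        ' licensed this work under '
        # rel="license" is the link that ccREL-aware tools look for
        f'<a rel="license" href="{escape(license_url)}">{escape(license_name)}</a>.'
        '</div>'
    )


print(ccrel_snippet("http://example.org/song", "Blaise",
                    "http://creativecommons.org/licenses/by/3.0/", "CC BY 3.0"))
```

All user-supplied values are HTML-escaped, so a name such as "A & B" cannot break the markup.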
The module is still listed as a beta release, since bug fixes and patches have been coming in over the past few weeks, but it’s quite usable. Special thanks to Turadg Aleahmad, who has helped with a lot of the recent bug fixes towards the end of the GSoC term and has committed to being active in future development. If you’re into Drupal development, we could use help with testing, and any translations would be greatly appreciated too.
Right now, the focus is on getting to a stable release, but we’ve got lots of ideas for the future too. Thanks to John and Kevin for their support through the summer, and to Turadg for his recent help. I look forward to seeing the module put to good use!
I’m a musician, writer, software developer, free culture / free software advocate and recent graduate of the University of Toronto — get in touch at http://blaise.ca/
New validator released!
asheesh, January 6th, 2009
This past summer, Hugo Dworak worked with us (thanks to Google Summer of Code) on a new validator. This work was long overdue, and we are very pleased that Google could fund Hugo to work on it. Our previous validator had not been updated to reflect our new metadata standards, so we disabled it some time ago to avoid creating further confusion. The definitive reference on CC metadata is the “Creative Commons Rights Expression Language”, or ccREL, which specifies the use of RDFa on the web. (If this sounds like keyword soup, rest assured that the License Engine generates HTML that you can copy and paste; that HTML is fully compliant with ccREL.) We hoped Hugo’s work would let us offer the Creative Commons community a validator that publishers can use to test their web pages and make sure they encode the information they intended.
Hugo’s work was a success; in August 2008 he announced a test version of the validator. He built on top of the work of others: the new validator uses the Pylons web framework, html5lib for HTML parsing and tokenizing, and RDFlib for working with RDF. He shared his source code under AGPLv3, the recent free software license built for network services.
So I am happy to announce that the test period is complete, and we are now running the new code at http://validator.creativecommons.org/. Our thanks go out to Hugo, and we look forward to the new validator gaining some use as well as hearing your feedback. If you want to contribute to the validator’s development or check it out for any reason, take a look at the documentation on the CC wiki.
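At its core, validating ccREL means finding the license statements a page makes. The sketch below is far simpler than Hugo's Pylons/html5lib/RDFlib implementation: it uses only Python's standard `html.parser` and looks for `<a>` or `<link>` elements whose `rel` includes `license`. It is not a full RDFa processor (it ignores `about`, CURIE prefixes, and other properties), and the names `LicenseLinkFinder` and `find_licenses` are hypothetical.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect href values of <a>/<link> elements whose rel includes 'license'."""

    def __init__(self) -> None:
        super().__init__()
        self.licenses: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="license nofollow";
        # attributes without a value come through as None, hence the `or ""`.
        rel_values = (attr_map.get("rel") or "").split()
        href = attr_map.get("href")
        if "license" in rel_values and href:
            self.licenses.append(href)


def find_licenses(html_text: str) -> list[str]:
    """Return every license URL the page declares, in document order."""
    finder = LicenseLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.licenses
```

Self-closing tags such as `<link ... />` are handled too, because `HTMLParser` routes them through `handle_starttag` by default.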
liblicense 0.8.1: The bugfixiest release ever
asheesh, December 25th, 2008
I’m greatly pleased to announce liblicense 0.8.1. Steren and Greg found a number of major issues (Greg found a consistent crasher on amd64, and Steren found a consistent crasher in the Python bindings). These issues, among some others, are fixed by the wondrous liblicense 0.8.1. I mentioned to Nathan Y. that liblicense is officially “no longer ghetto.”
The best way to enjoy liblicense is from our Ubuntu and Debian package repository, at http://mirrors.creativecommons.org/packages/. More information on what liblicense does is available on our wiki page about liblicense. You can also get it in fresh Fedora 11 packages, and the source tarball is available for download from sourceforge.net.
P.S. MERRY CHRISTMAS!
The full ChangeLog snippet goes like this:
liblicense 0.8.1 (2008-12-24):
* Cleanups in the test suite: test_predicate_rw’s path joiner finally works
* Tarball now includes data_empty.png
* Dynamic tests and static tests treat $HOME the same way
* Fix a major issue with requesting localized informational strings, namely that the first match would be returned rather than all matches (e.g., only the first license of a number of matching licenses). This fixes the Python bindings, which use localized strings.
* Add a cooked PDF example that actually works with exempi; explain why that is not a general solution (not all PDFs have XMP packets, and the XMP packet cannot be resized by libexempi)
* Add a test for writing license information to the XMP in a PNG
* Fix a typo in exempi.c
* Add basic support for storing LL_CREATOR in exempi.c
* In the case that the system locale is unset (therefore, is of value “C”), assume English
* Fix a bug with the TagLib module: some lists were not NULL-terminated
* Use calloc() instead of malloc()+memset() in read_license.c; this improves efficiency and closes a crasher on amd64
* Improve chooser_test.c so that it is not strict as to the *order* the results come back so long as they are the right licenses.
* To help diagnose possible xdg_mime errors, if we detect the hopeless application/octet-stream MIME type, fprintf a warning to stderr.
* Test that searching for unknown file types returns a NULL result rather than a segfault.
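Two of the fixes above are easy to misread in ChangeLog shorthand: a localized lookup must return every match rather than stopping at the first, and an unset locale ("C") should be treated as English. The Python sketch below models both behaviors on a hypothetical in-memory table; liblicense itself is written in C, and its real data structures and function names differ.

```python
# Hypothetical table of (license URI, language, localized name)
LOCALIZED_NAMES = [
    ("http://creativecommons.org/licenses/by/3.0/", "en", "Attribution 3.0"),
    ("http://creativecommons.org/licenses/by-sa/3.0/", "en", "Attribution-ShareAlike 3.0"),
    ("http://creativecommons.org/licenses/by/3.0/", "de", "Namensnennung 3.0"),
]


def effective_language(system_locale: str) -> str:
    """Treat an unset locale ('' or 'C') as English, as the 0.8.1 entry describes."""
    if system_locale in ("", "C"):
        return "en"
    # Keep only the language part, e.g. 'de_DE.UTF-8' -> 'de'
    return system_locale.split(".")[0].split("_")[0]


def licenses_for_locale(system_locale: str) -> list[str]:
    """Return *all* matching URIs; the old bug returned only the first match."""
    lang = effective_language(system_locale)
    return [uri for uri, loc, _name in LOCALIZED_NAMES if loc == lang]
```

A list comprehension makes the "all matches" behavior explicit; the pre-0.8.1 bug is equivalent to returning `matches[0]` instead.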
liblicense 0.8 (important) fixes RDF predicate error
asheesh, July 30th, 2008
Brown paper bag release: earlier versions of liblicense claimed that the RDF predicate for a file’s license is http://creativecommons.org/ns#License rather than http://creativecommons.org/ns#license. Only the latter is correct.
Any code compiled with liblicense between 0.6 and 0.7.1 (inclusive) contains this mistake.
This time I have audited the library for other insanities like the one fixed here, and there are none. Great thanks to Nathan Yergler for spotting this. I took this chance to change ll_write() and ll_read() to *NOT* take NULL as a valid predicate; this makes the implementation simpler (and more correct).
Sadly, I have bumped the API and ABI numbers accordingly. It’s available on SourceForge at http://sf.net/projects/cctools, and will be uploaded to Debian and Fedora shortly (and will follow from Debian to Ubuntu).
I’m going to head to Argentina for a vacation and DebConf shortly, so there’ll be no activity from me on liblicense for a few weeks. I would love help with liblicense in the form of further unit tests. Let’s squash those bugs by demonstrating all the cases the library should work in.
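The predicate bug comes down to RDF URIs being case-sensitive: in ccREL, `cc:License` names the class of licenses, while `cc:license` is the property linking a work to its license. The Python sketch below shows a hypothetical writer that, like the 0.8 change to `ll_write()`/`ll_read()`, rejects a missing predicate and also refuses the class URI where the property belongs. The function and its dictionary store are illustrative only, not the liblicense API.

```python
from typing import Optional

# URIs are case-sensitive; only the lowercase one is the license property.
CC_LICENSE_PREDICATE = "http://creativecommons.org/ns#license"  # property: work -> its license
CC_LICENSE_CLASS = "http://creativecommons.org/ns#License"      # class: "this thing is a license"


def write_statement(store: dict, subject: str,
                    predicate: Optional[str], value: str) -> None:
    """Hypothetical writer: store value under (subject, predicate)."""
    if predicate is None:
        # As in liblicense 0.8, a missing predicate is an error.
        raise ValueError("a predicate is required; None is not accepted")
    if predicate == CC_LICENSE_CLASS:
        # The pre-0.8 mistake: using the class URI as if it were the property.
        raise ValueError("use the cc:license property, not the cc:License class")
    store[(subject, predicate)] = value
```

Requiring an explicit predicate keeps the code path simple: there is no hidden default to get wrong.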
32 to 64 bit remotely
Nathan Kinkade, July 15th, 2008
A couple of months ago I posted here about some of our experiences with Varnish Cache as an HTTP accelerator. By and large I have been very impressed with Varnish. We even found that it had the unexpected benefit of acting as a buffer in front of Apache, preventing Apache from getting overwhelmed with too many slow requests. Apache would get wedged once it had reached its MaxClients limit, whereas Varnish seems to happily queue up thousands of requests even if the backend (Apache) is going slowly.
However, after a while we started running into other problems with Varnish, and I found the probable answer in a bug report at the Varnish site. It turns out that Varnish was written with a 64 bit system in mind. That isn’t to say that it won’t work nicely on a 32 bit system, just that you had better not expect it to handle high server load, or else you’ll start running into resource limitations in a hurry. This left us with about two options: move to 64 bit or ditch Varnish for something like Squid. Seeing as how I was loath to do the latter, we decided to go 64 bit, which in any case is another logical step into the 21st century.
The problem was that our servers are co-located in data centers around the country, and we didn’t want the hassle of reprovisioning all of them. Asheesh did the first remote conversion based on an outdated document he found on remotely converting from Red Hat Linux to Debian. It went well and we haven’t had a single problem on that converted machine since. Varnish loves 64bit.
I have now converted two more machines, and this last time I documented the steps I took. I post them here for future reference and with the hope that it may help someone else. Note that these steps are somewhat specific to Debian Linux, but the concepts should be generally applicable to any UNIX-like system. There are no real instructions below, so you just have to infer the method from the steps. See the aforementioned article for more verbose, though dated, explanations. BE WARNED that if you make a mistake and don’t have some lovely rescue method then you may be forced to call your hosting company to salvage the wreckage:
- [ssh server]
- aptitude install linux-image-amd64
- reboot
- [ssh server]
- sudo su -
- aptitude install debootstrap # if not already installed
- swapoff -a
- sfdisk -l /dev/sda # to determine swap partition, /dev/sda5 in this case
- mke2fs -j /dev/sda5
- mount /dev/sda5 /mnt
- cfdisk /dev/sda # set /dev/sda5 to type 83 (Linux)
- debootstrap --arch amd64 etch /mnt http://http.us.debian.org/debian
- mv /mnt/etc /mnt/etc.LOL
- cp -a /etc /mnt/
- mv /mnt/boot /mnt/boot.LOL
- cp -a /boot /mnt/ # this is really just so that the dpkg post-install hooks don’t issue lots of warnings about things not being in /boot that it expects.
- chroot /mnt
- aptitude update
- aptitude dist-upgrade
- aptitude install locales
- dpkg-reconfigure locales # optional (I selected All locales, default UTF-8)
- aptitude install ssh sudo grub vim # and any other things you want
- aptitude install linux-image-amd64
- vi /etc/fstab # change /dev/sda5 to mount on / and comment out old swap entry
- mkdir /home/nkinkade # just so I have a home, not necessary really
- exit # get out of chroot
- vi /boot/grub/menu.lst # change root= of default option from sda6 to sda5
- reboot
- [ssh server]
- sudo su -
- mount /dev/sda6 /mnt
- chroot /mnt
- dpkg --get-selections > ia32_dpkg_selections
- exit
- mv /home /home.LOL
- cp -a /mnt/home /
- mv /root /root.LOL
- cp -a /mnt/root /
- mkdir /mnt/ia32
- mv /mnt/* /mnt/ia32
- mv /mnt/.* /mnt/ia32
- cp -a bin boot dev etc etc.LOL home initrd initrd.img lib lib64 media opt root sbin srv tmp usr var vmlinuz /mnt
- mkdir /mnt/proc /mnt/sys
- vi /mnt/etc/fstab # make /dev/sda6 be mounted on / again, leave swap commented out
- vi /boot/grub/menu.lst # change the default boot option back to root=/dev/sda6
- reboot
- [ssh server]
- sudo su -
- mkswap /dev/sda5
- vi /etc/fstab # uncomment swap line
- swapon -a
- dpkg --set-selections < /ia32/ia32_dpkg_selections
- apt-get dselect-upgrade # step through all the questions about changed /etc files, etc.
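The fstab edits (first pointing / at the temporary root, later pointing it back) are among the easiest steps to get wrong on a remote machine. As a hedged illustration, this Python helper performs that edit on fstab text: it retargets the `/` entry to a given device and comments out swap entries. The function name `retarget_fstab` is hypothetical, it assumes a conventional whitespace-separated fstab, and you should still eyeball the result before rebooting.

```python
def retarget_fstab(fstab_text: str, new_root_device: str) -> str:
    """Point the / entry at new_root_device and comment out swap entries."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Blank lines, comments and short/malformed lines pass through untouched
        if len(fields) < 3 or line.lstrip().startswith("#"):
            out.append(line)
            continue
        if fields[1] == "/":
            fields[0] = new_root_device
            out.append("\t".join(fields))
        elif fields[2] == "swap":
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


old = ("proc /proc proc defaults 0 0\n"
       "/dev/sda6 / ext3 errors=remount-ro 0 1\n"
       "/dev/sda5 none swap sw 0 0\n")
print(retarget_fstab(old, "/dev/sda5"))
```

Calling it again with `/dev/sda6` covers the later step that moves / back, since an already-commented swap line is left alone.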
License-oriented metadata validator and viewer: the development has just started
Hugo Dworak, May 26th, 2008
Creative Commons is participating in Google Summer of Code™ and has accepted Hugo Dworak’s proposal (see the abstract), which was based on its task description for rewriting its now-defunct metadata validator. Asheesh Laroia has been assigned as the project’s mentor. Work began on May 26th, 2008, as per the project timeline, and is expected to be completed in twelve weeks. More details will be provided in the dedicated CC Wiki article, and progress will be featured weekly on this blog.
The project focuses on developing an on-line tool (free software written in Python) to validate digitally embedded Creative Commons licenses within files of different types. Files will be pasted directly into a form, identified by a URL, or uploaded by a user. The application will present the results in a human-readable fashion and notify the user if the means used to express the license terms are deprecated.
Varnish Cache at CC
Nathan Kinkade, April 3rd, 2008
Over the past few months we have been migrating most of our web services to new servers. Squid Cache was in use on a number of the old servers as an HTTP accelerator, and we decided that while upgrading hardware and OS we might as well bring our HTTP accelerator fully into the 21st century. Enter Varnish Cache, which has some interesting architectural/design features.
Varnish was easy to install thanks to the Debian package management system, and the configuration file is vastly simpler than that of Squid despite a horrendous dearth of documentation. Varnish runs well and we are generally happy with it. However, after a few months we have encountered a number of gotchas, most of which probably have workarounds:
- Varnish seems to choke on files that are larger than around 600MB. No errors, just sends the client a 200 response with no other data.
- Bazaar (bzr) apparently does not work through Varnish, even when Varnish is instructed to “pass” requests to bzr repositories.
- bbPress for some unknown reason won’t function through Varnish.
- KeepAlives must be turned off in Apache; otherwise some pages randomly take 1 to 2 minutes to load. There is an open bug report for this at Varnish’s Trac page.
- Varnish logs are big, and they get out of hand in a hurry. For creativecommons.org the log file can grow to 2GB+ in less than 30 minutes. That alone is no problem, but varnishlog doesn’t seem to want to write to a file larger than 2GB. Judging from an email thread I read at Varnish’s site, it might be related to the fact that we are running everything in 32 bit mode, though I believe our hardware supports both 32 and 64 bit operation. This means that I have to run a special logrotate script about every 10 or 15 minutes to keep varnishlog from crashing.
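The 2GB ceiling is suggestive: 2^31 - 1 bytes is the largest file offset a signed 32-bit `off_t` can represent, a common limit for programs built on 32-bit systems without large-file support. That cause is an assumption, not something confirmed above. The arithmetic below takes the reported growth (treated here as 2 GiB in 30 minutes) and shows why rotating every 10 to 15 minutes stays safely under that limit.

```python
# Largest offset a signed 32-bit off_t can hold (assumed cause of the 2GB limit)
MAX_32BIT_FILE = 2**31 - 1  # 2,147,483,647 bytes, one byte short of 2 GiB

# Reported growth: roughly 2 GiB in 30 minutes
bytes_per_second = 2 * 1024**3 / (30 * 60)


def log_size_after(minutes: float) -> float:
    """Approximate varnishlog size, in bytes, after the given number of minutes."""
    return bytes_per_second * minutes * 60


for minutes in (10, 15, 30):
    size = log_size_after(minutes)
    status = "over" if size > MAX_32BIT_FILE else "under"
    print(f"{minutes:>2} min: {size / 1024**3:.2f} GiB, {status} the 32-bit limit")
```

At this rate a 15-minute rotation leaves the file at about half the limit, which gives some headroom for traffic spikes.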
I was recently experimenting and discovered that for some things that were apparently broken, configuring Varnish to “pipe” requests works, while using “pass” does not. This won’t make any sense unless you are familiar with VCL (Varnish Configuration Language). I know that “piping” fixed the bbPress issue, and I suspect that it will fix the Bazaar issue as well, though I haven’t tested it.
A week or so ago I experimented with turning off Varnish for creativecommons.org to see how Apache would handle the load unaided. Things seemed to be going well for a while, but within a week’s time the site went down twice. The second time I couldn’t revive Apache. There were kernel messages like ip_conntrack: table full: packet dropped. Apparently the machine was just flooded and Apache was pegged at its MaxClients limit. I re-enabled Varnish and the problem went away immediately. So it appears that not only is Varnish doing a nice job of caching, but it is also able to handle many more simultaneous TCP connections than Apache without blowing up. Asheesh and I ran some experiments that seemed to demonstrate that Varnish actually helps to mitigate floods of traffic, whether they be natural or malicious.
Exempi 1.99.3 Released
Jason Kivlighn, July 11th, 2007
Hubert Figuiere has released Exempi 1.99.3.
An important addition in this release is the ability to serialize XMP to a string, making sidecar XMP possible. The soon-to-be-released Liblicense 0.1 already takes advantage of this feature; it uses Exempi to read and write licenses within XMP sidecar files.
Hopefully, the API will soon stabilize in preparation for the 2.0 release.
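To make the sidecar idea concrete, here is a sketch of what a minimal XMP sidecar carrying a license might contain, built with Python's standard `xml.etree.ElementTree` rather than Exempi. The layout follows the usual XMP structure (`x:xmpmeta` wrapping an `rdf:RDF` description) with the ccREL `cc:license` property. The function name `license_sidecar` is an assumption, and real XMP written by Exempi may add more, such as the `<?xpacket?>` wrapper.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "cc": "http://creativecommons.org/ns#",
}
# Use the conventional prefixes when serializing
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def license_sidecar(license_uri: str) -> str:
    """Serialize a minimal XMP packet stating the license of the described file."""
    meta = ET.Element(f"{{{NS['x']}}}xmpmeta")
    rdf = ET.SubElement(meta, f"{{{NS['rdf']}}}RDF")
    # rdf:about="" means "the resource this metadata accompanies"
    desc = ET.SubElement(rdf, f"{{{NS['rdf']}}}Description",
                         {f"{{{NS['rdf']}}}about": ""})
    ET.SubElement(desc, f"{{{NS['cc']}}}license",
                  {f"{{{NS['rdf']}}}resource": license_uri})
    return ET.tostring(meta, encoding="unicode")


print(license_sidecar("http://creativecommons.org/licenses/by/3.0/"))
```

Serializing to a string, rather than into the file itself, is exactly what makes a sidecar possible: the string can be written next to a file whose format has no room for an embedded XMP packet.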
It’s coming! CC OpenOffice.Org Add-in
ksiomelo, June 27th, 2007
As you may know, I am working on this project as part of the Google Summer of Code program. Before I started, François Dechelle was already working on it, and now we are joining efforts to develop this potentially popular application!
The current prototype already adds the licenses to the body of the document.

After a license has been chosen, it becomes available as an AutoText entry, so it can easily be reused in the document without running the wizard again. The name and URL of the license are also stored in the document’s metadata.

Currently I am working on a weird bug that prevents the OOo API from retrieving some images of the available licenses at http://i.creativecommons.org/l/ (it was working perfectly a few days ago…)
We’re going to commit a stable version ready for download in the coming weeks, but if you are eager to use it, you can check out this folder in the cctools repository:
Feedback and suggestions are welcome!
Cheers
System Integrated Licensing
Scott Shawcroft, June 19th, 2007
I’ve been asked, as a tech intern here at Creative Commons, to create a way of locally tracking file licenses on a system. A while back Jon wrote down his ideas about system-wide license tracking on the Creative Commons wiki. The purpose of this system would be to provide an interface for developers to access the available licenses on a system. Additionally, like the existing online license chooser, this library, called libLicense, will feature a way to choose a license through toggling certain flags available for a family of licenses. Naturally, the first family available will be the Creative Commons licenses. The larger goal for the summer is to utilize this library in a few initial systems. Currently, I’m looking at integration into Gnome and Sugar (from the One Laptop Per Child project). This further work will occur after libLicense is working.
Data
To run, libLicense will need the data for all the licenses to be stored in some fashion. My initial thought is this:
- All data will be stored in a directory. On Linux this directory would be /usr/share/licenses . (This is borrowed from Jon’s thoughts.)
- Families of licenses will be stored in a subdirectory of the licenses directory. For example, the Creative Commons licenses would be stored within creative_commons.
- Within these family directories each specific license will be stored in a file with the naming scheme <bitcode>-<short name>-<jurisdiction>-<locale>.license . These files will store the license uri, name, status (active or retired), description and legal text. How this will be stored is up in the air. My initial thoughts include putting each attribute on its own line or using a format similar to .desktop files.
- In addition to storing license data, some family information must be stored, namely the family bit flags. In the case of the Creative Commons licenses, the bit flags would be Attribution, Share-Alike, Non-Commercial and No Derivatives. They would combine to create the bitcode present in the license filename. These bit flags would be the heart of the license chooser logic. If the combination does not exist, the flags are incompatible.
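The bit-flag scheme can be sketched in a few lines of Python. The bit values (Attribution = 1, Share-Alike = 2, Non-Commercial = 4, No Derivatives = 8), the sample `AVAILABLE` set, and the helper names are assumptions for illustration; the post only fixes the four flags, the rule that a missing combination means incompatible flags, and the `<bitcode>-<short name>-<jurisdiction>-<locale>.license` naming scheme.

```python
from enum import IntFlag


class CCFlag(IntFlag):
    """Hypothetical bit assignments for the Creative Commons family flags."""
    ATTRIBUTION = 1
    SHARE_ALIKE = 2
    NON_COMMERCIAL = 4
    NO_DERIVATIVES = 8


A, SA, NC, ND = (CCFlag.ATTRIBUTION, CCFlag.SHARE_ALIKE,
                 CCFlag.NON_COMMERCIAL, CCFlag.NO_DERIVATIVES)

# Bitcodes that have a license file: the six standard CC licenses
AVAILABLE = {int(f) for f in (A, A | SA, A | ND, A | NC, A | NC | SA, A | NC | ND)}


def bitcode(*flags: CCFlag) -> int:
    """OR the chosen flags together into a single integer."""
    code = 0
    for flag in flags:
        code |= flag
    return int(code)


def license_filename(flags, short_name: str, jurisdiction: str, locale: str) -> str:
    """Build <bitcode>-<short name>-<jurisdiction>-<locale>.license, or fail if incompatible."""
    code = bitcode(*flags)
    if code not in AVAILABLE:
        # No license exists for this combination, so the flags are incompatible
        raise ValueError(f"no license for flag combination {code}")
    return f"{code}-{short_name}-{jurisdiction}-{locale}.license"


print(license_filename([A, NC, SA], "by-nc-sa", "us", "en_US"))
```

One design wrinkle shows up immediately: short names like `by-nc-sa` contain hyphens themselves, so a parser reading these filenames cannot simply split on `-`.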
API
The library would potentially have these functions:
get_jurisdiction(uri) - returns the jurisdiction for the given license.
get_jurisdictions(short or bitcode) - returns the available jurisdictions for the given short name or bitcode.
get_locale(uri) - returns the locale for the given license.
get_locales(jurisdiction, short or bitcode) - returns the available locales for the given jurisdiction and short name or bitcode.
get_name(uri) - returns the name of the license.
get_version(uri) - returns the version of the license.
get_versions(short, jurisdiction) - returns the available versions for the given short name or bitcode and jurisdiction.
get_short(uri) - returns the short name for the given uri.
has_flag(attribute, uri) - returns whether the flag is set for the given uri.
family_flags(family) - returns the flags available for a given family.
family(uri) - returns the family the given uri belongs to.
get_notification(uri[, url]) - returns the notification string for the given uri, with an option to provide a verification url.
verify_uri(uri) - returns whether or not the given uri is recognized by the system.
get_license(family, flags, jurisdiction, locale) - returns the uri which satisfies the given attributes.
get_all_licenses() - returns all general licenses available.
get_general_licenses(family) - returns all general licenses in a family.
get_families() - returns a list of available families.
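To check that the proposed functions fit together, here is a Python sketch of three of them (`verify_uri`, `get_jurisdictions`, `get_license`) over a hypothetical in-memory registry. The `LicenseRecord` type, its fields, the sample URIs, and the bitcode values are assumptions; the real library would load this data from the license files described above, and the proposal does not specify return types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseRecord:
    uri: str
    family: str
    bitcode: int       # hypothetical: 1 = Attribution, 7 = Attribution+Share-Alike+Non-Commercial
    jurisdiction: str  # "" stands for the unported (generic) license
    locale: str


REGISTRY = [
    LicenseRecord("http://creativecommons.org/licenses/by/3.0/", "creative_commons", 1, "", "en"),
    LicenseRecord("http://creativecommons.org/licenses/by/3.0/us/", "creative_commons", 1, "us", "en"),
    LicenseRecord("http://creativecommons.org/licenses/by-nc-sa/3.0/", "creative_commons", 7, "", "en"),
]


def verify_uri(uri: str) -> bool:
    """Whether the system recognizes this license URI."""
    return any(rec.uri == uri for rec in REGISTRY)


def get_jurisdictions(bitcode: int) -> list[str]:
    """Available jurisdictions for a bitcode, sorted and de-duplicated."""
    return sorted({rec.jurisdiction for rec in REGISTRY if rec.bitcode == bitcode})


def get_license(family: str, flags: int, jurisdiction: str, locale: str) -> Optional[str]:
    """URI of the license matching all attributes, or None if no such license exists."""
    wanted = (family, flags, jurisdiction, locale)
    for rec in REGISTRY:
        if (rec.family, rec.bitcode, rec.jurisdiction, rec.locale) == wanted:
            return rec.uri
    return None
```

Returning `None` from `get_license` doubles as the chooser's "incompatible or unavailable" signal, in line with the rule that a missing flag combination means the flags are incompatible.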
Did I miss something? Does something not make sense? Please post a comment.