This is a special guest post by John Bishop of John Bishop Images.
Prior to Adobe’s Creative Suite 4, adding Creative Commons license metadata via the FileInfo… dialog (found in Photoshop, Illustrator, InDesign and more) meant coding a relatively simple text based XML panel definition and has been available from the Creative Commons Wiki since 2007.
Starting with Creative Suite 4 Adobe migrated the XMP FileInfo panel to a Flash based application, meaning that adding Creative Commons metadata became much more complex, requiring Adobe’s XMP SDK and the ability to develop applications in Flash, C++ or Java.
After significant development and testing john bishop images is pleased to announce the availability of a custom Creative Commons XMP FileInfo Panel for Creative Suite 4 and Creative Suite 5 – free of charge.
This comprehensive package offers the ability to specify Creative Commons license metadata directly in first class, industry standard tools and places Creative Commons licensing metadata on the same footing as the standardized, commercial metadata sets like Dublin Core (DC), IPTC and usePLUS and tightly integrates all the metadata fields required for a Creative Commons license in one panel.
Also included is a metadata panel definition that exposes the Creative Commons license metadata in the mini metadata panels found in Bridge, Premiere Pro, etc. And finally a set of templates that can be customized for the various license types and more is also included; these templates can be accessed from Acrobat.
For more information and to download the Creative Commons XMP FileInfo panel visit john bishop images’ Creative Commons page.
Note: The panels are localized and a English-US language file is supplied. To contribute localization files in other languages please contact john bishop images.1 Comment »
Who I am
I’m Nils Dagsson Moskopp, a 22 year old student of philosophy and computer science, living in Berlin (German speakers may check out my blog). I dislike the act of programming, but love the results, so I seem to have no other choice than to do it from time to time.
Recently, after submitting a proposal, I got accepted into the Google Summer of Code program, being mentored by Nathan Kinkade. In the rest of this entry, I will do my best to explain how it came to that and what kind of software I intend to create.
As far as I know, there currently is no automated, easy way to have human- and machine-readable markup for specific subsections of a blog post in the blogging software WordPress; it is only possible to have an entire page licensed under one specific license. Further complicating the issue is the fact that the WordPress media manager actually does not know about licenses accociated with specific content. This poses a problem for the not-so-uncommon use case of embedding CC-licensed media, mainly photos.
I was first confronted with the idea of having an automated way to markup media with Creative Commons licensing information when reading Matthias Mehldau‘s post More precise Creative Commons HTML-Codes. He envisioned an annotation containing not only the well-known CC license symbols, but also the jurisdiction and a button to show the markup used, for easy re-embedding. Well versed in graphics design, he also created a mockup:
Matthias Mehldau’s Mockup
Shortly after that, Julia Seeliger posted a suggestion how a Creative Commons plugin backend for WordPress could look like. She suggested integrating the official license chooser or a drop down list within the WordPress upload form.
We three chatted about these ideas, me even actually implementing some parts (see next section), but nothing qualifying as currently usable came from that. When approximately one year later, I did come upon the CC wiki site describing the developer challenge titled Support for CC licenses in WordPress Media Manager, I chose to apply for that with Google Summer of Code.
As mentioned in the introduction to the last section, a tool that provides easy licensing management for WordPress media content, automating subsequent insertion, currently does not exist. Several existing projects, however, have features related to this goal; code re-use may be possible.
In 2006, George Notaras created the Creative-Commons-Configurator WordPress Plugin. It allows the user to choose a global license for his or her blog, using
the web-based license selection engine from CreativeCommons.org and adds localized machine- and human-readable license information to pages and feeds. Similar plugins, putting global licensing information into footer and sidebars, exist.
For GSoC 2009, Dinishika Nuwangi made a WordPress plugin called wprdfa (not to confuse with wp-rdfa). Unfortunately, the project has an empty README file. Judging from a quick glance at the source code, part of its intended purpose is to add buttons to the TinyMCE editor integrated into WordPress, although on installation I was unable to see this functionality. The related developer challenge still seems to be open, on the page the plugin is described as
Also in 2009, I created two pieces of software, inspired by a post on the WHATWG mailing list. First, a web application generating HTML5 license markup (enriched with Microdata, see next section), second a WordPress plugin capable of generating similar markup using both Microdata and RDFa. And there was much rejoicing.
It is important to note that since then the HTML5 standard has changed and the generated markup is no longer valid.
On a less technical note, the German blog Spreeblick has a unique way of presenting the license information, only showing a small “cc” in the bottom left corner; on hovering, author name and a link to the source are presented (live example). However, while Spreeblick is using WordPress, the folks behind it have no intention of releasing their plugin: As Max Winde told me in April 2009, it is tightly entangled with their image layout functionality and would require complex cleanup.
I plan to implement the presentation part using the new HTML5 elements figure and figcaption. Together, they can be used to denote content
with a caption […] that is self-contained and is typically referenced as a single unit from the main flow of the document. A code example shows how markup using the figure element may looks like:
<!-- content goes here -->
Naturally, as a rather general markup language HTML5 does not contain any elements to give this construct more specific meaning, such as marking up which license applies to said content. However, two markup extensions provide this capability at the attribute level, the complex and established RDFa and the simpler, newer Microdata proposal, part of HTML5. While both standards are sufficiently open, RDFa is endorsed by Creative Commons; for this reason I will not stray into Microdata territory unless I have time to spare at the end of GSoC.
To this point, I have been only accounting for machine readability. Nevertheless, with CSS it is easily possible to beautify the visual presentation in nearly any way imaginable. The following two screenshots, taken with the now-defunct WordPress plugin I created in 2009, exemplify this — both are based on the same markup.
On the author side, I plan to have the plugin look like the Spreeblick one (screenshot), which will mean adding options to the WordPress media uploader:
- an additional drop down list, for choosing one of the six main CC licenses
- an additional text input for the author or rights holder
- an additional text input for specifying the source URI
Media could then be inserted the usual way, with the RDFa annotation automatically generated.10 Comments »
Since I started working at Creative Commons a number of months ago, I’ve been primarily focused on something we refer to as the “sanity overhaul”. In this case, sanity refers to try and simplify what is kind of a long and complicated code history surrounding Creative Commons’ licenses, both as in terms of the internal tooling to modifying, deploying, and querying licenses and the public facing web interfaces for viewing and downloading them. Efforts toward the sanity overhaul started before I began working here, executed by both Nathan Yergler and Frank Tobia, but for a long time they were in a kind of state of limbo as other technical efforts had to be dedicated to other important tasks. The good news is that my efforts have been permitted to be (almost) entirely dedicated toward the sanity overhaul since I have started, and we are reaching a point where all of those pieces are falling into place and we are very close to launch.
To give an idea of the complexity of things as they were and how much that complexity has been reduced, it is useful to look at some diagrams. When Nathan Kinkade first started working at Creative Commons (well before I did), Nathan Yergler took some time to draw on the whiteboard what the present infrastructure looked like:
as well as what he envisioned the “glorious future” (sanity) would look like:
When I started, the present infrastructure had shifted a little bit further still, but the vision of the “glorious future” (sanity) had mostly stayed the same.
This week (our “tech all-hands week”) I gave a presentation on the “State of Sanity”. Preparing for that presentation I decided to make a new diagram. Since I was already typing up notes for the presentation in Emacs, I thought I might try and make the most minimalist and clear ASCII art UML-like diagram that I could (my love of ASCII art is well known to anyone who hangs out regularly in #cc on Freenode). I figured that I would later convert said diagram to a traditional image using Inkscape or Dia, but I was so pleased with the end result that I just ended up using the ASCII version:
******************* * CORE COMPONENTS * ******************* .--. ( o_o) /'--- |USER| --. '----' | | V ___ .---. .' ',' '. -' '. ( INTARWEBS ) '_. ____ ._' '-_-' '--' | | V +---------------+ Web interface user | cc.engine | interacts with +---------------+ | | V +---------------+ Abstraction layer for | cc.license | license querying and +---------------+ pythonic license API | | V +---------------+ Actual rdf datastore and | license.rdf | license RDF operation tools +---------------+ **************** * OTHER PIECES * **************** +--------------+ | cc.i18npkg | | .----------. | | | i18n.git | | +--------------+ ******************************************** * COMPONENTS DEPRECATED BY SANITY OVERHAUL * ******************************************** +------------+ +-----------+ +---------+ +-------------+ | old | | old zope | | licenze | | license_xsl | | cc.license | | cc.engine | +---------+ +-------------+ +------------+ +-----------+
This isn’t completely descriptive on its own, and I will be annotating as I include it in part of the Sphinx developer docs we are bundling with the new cc.engine. But I think that even without annotation, it is clear how much cleaner the new infrastructure is at than the old “present infrastructure” whiteboard drawing, which means that we are making good progress!7 Comments »
This past year was my last at the University of Toronto, making this summer my last chance to participate in the Google Summer of Code. I searched hard for a project and mentor organization that would suit my interests, and when I noticed that the Creative Commons Drupal module was in need of some developer love, I knew exactly what I wanted to spend my summer doing. With John Doig as my CC mentor, and Kevin Reynen (the module’s maintainer and initial author) as an unofficial Drupal mentor, I’ve been privileged to have spent the past few months updating and extending the module.
A couple years ago, development for Drupal 4.7 was begun, but it was never quite completed. CC Lite came to be the reliable choice for Drupal 6. However, CC Lite’s scope is limited — it allows you to attach a license to content in Drupal, but that’s about it. The main CC module’s vision is broader — to fully integrate CC technology with the Drupal platform — and I hope I’ve helped to realize that just a little.
Some of the module’s features:
- it uses the CC API for license selection and information (so, for example, when new license versions are released, they become available on your Drupal site automatically)
- you can set a site-wide default license/jurisdictoin, and user’s can set their own default license/jurisdiction
- ccREL metadata is supported, output in RDFa (and, optionally, RDF/XML for legacy systems)
- supports CC0, along with the 6 standard licenses and the Public Domain Certification tool
- you can control which licenses and metadata fields are available to users
- basic support for the Views API has been added (including a default /creativecommons view)
- there’s a CC site search option
The module is still listed as a beta release, as some folks have been submitting bug fixes and patches over the past few weeks, though it’s quite usable. Special thanks to Turadg Aleahmad, who’s helped with a lot of the recent bug fixes towards the end of the GSoC term, and committed to being active in future development. If you’re into Drupal development, we could use help with testing, and any translations would be greatly appreciated too.
Right now, the focus is on getting to a stable release, but we’ve got lots of ideas for the future too. Thanks to John and Kevin for their support through the summer, and to Turadg for his recent help. I look forward to seeing the module put to good use!
I’m a musician, writer, software developer, free culture / free software advocate and recent graduate of the University of Toronto — get in touch at http://blaise.ca/Comments Off
This past summer, Hugo Dworak worked with us (thanks to Google Summer of Code) on a new validator. This work was greatly overdue, and we are very pleased that Google could fund Hugo to work on it. Our previous validator had not been updated to reflect our new metadata standards, so we disabled it some time ago to avoid creating further confusion. The textbook on CC metadata is the “Creative Commons Rights Expression Language”, or ccREL, which specifies the use of RDFa on the web. (If this sounds like keyword soup, rest assured that the License Engine generates HTML that you can copy and paste; that HTML is fully compliant with ccREL.) We hoped Hugo’s work on a new validator would let us offer a validator to the Creative Commons community so that publishers can test their web pages to make sure they encode the information they intended.
Hugo’s work was a success; he announced in August 2008 a test version of the validator. He built on top of the work of others: the new validator uses the Pylons web framework, html5lib for HTML parsing and tokenizing, and RDFlib for working with RDF. He shared his source code under the recent free software license built for network services, AGPLv3.
So I am happy to announce that the test period is complete, and we are now running the new code at http://validator.creativecommons.org/. Our thanks go out to Hugo, and we look forward to the new validator gaining some use as well as hearing your feedback. If you want to contribute to the validator’s development or check it out for any reason, take a look at the documentation on the CC wiki.1 Comment »
I’m greatly pleased to announce liblicense 0.8.1. Steren and Greg found a number of major issues (Greg found a consistent crasher on amd64, and Steren found a consistent crasher in the Python bindings). These issues, among
some others, are fixed by the wondrous liblicense 0.8.1. I mentioned to Nathan Y. that liblicense is officially “no longer ghetto.”
The best way enjoy liblicense is from our Ubuntu and Debian package repository, at http://mirrors.creativecommons.org/packages/. More information on what liblicense does is available on our wiki page about liblicense. You can also get them in fresh Fedora 11 packages. And the source tarball is available for download from sourceforge.net.
P.S. MERRY CHRISTMAS!
The full ChangeLog snippet goes like this:
liblicense 0.8.1 (2008-12-24):
* Cleanups in the test suite: test_predicate_rw’s path joiner finally works
* Tarball now includes data_empty.png
* Dynamic tests and static tests treat $HOME the same way
* Fix a major issue with requesting localized informational strings, namely that the first match would be returned rather than all matches (e.g., only the first license of a number of matching licenses). This fixes the Python bindings, which use localized strings.
* Add a cooked PDF example that actually works with exempi; explain why that is not a general solution (not all PDFs have XMP packets, and the XMP packet cannot be resized by libexempi)
* Add a test for writing license information to the XMP in a PNG
* Fix a typo in exempi.c
* Add basic support for storing LL_CREATOR in exempi.c
* In the case that the system locale is unset (therefore, is of value “C”), assume English
* Fix a bug with the TagLib module: some lists were not NULL-terminated
* Use calloc() instead of malloc()+memset() in read_license.c; this improves efficiency and closes a crasher on amd64
* Improve chooser_test.c so that it is not strict as to the *order* the results come back so long as they are the right licenses.
* To help diagnose possible xdg_mime errors, if we detect the hopeless application/octet-stream MIME type, fprintf a warning to stderr.
* Test that searching for unknown file types returns a NULL result rather than a segfault.
Brown paper bag release: liblicense claims that the RDF predicate for a file’s license is http://creativecommons.org/ns#License rather than http://creativecommons.org/ns#license. Only the latter is correct.
Any code compiled with liblicense between 0.6 and 0.7.1 (inclusive) contains this mistake.
This time I have audited the library for other insanities like the one fixed here, and there are none. Great thanks to Nathan Yergler for spotting this. I took this chance to change ll_write() and ll_read() to *NOT* take NULL as a valid predicate; this makes the implementation simpler (and more correct).
Sadly, I have bumped the API and ABI numbers accordingly. It’s available in SourceForge at http://sf.net/projects/cctools, and will be uploaded to Debian and Fedora shortly (and will follow from Debian to Ubuntu).
I’m going to head to Argentina for a vacation and Debconf shortly, so there’ll be no activity from on liblicense for a few weeks. I would love help with liblicense in the form of further unit tests. Let’s squash those bugs by just demonstrating all the cases the license should work in.Comments Off
A couple months ago I posted here about some of our experiences with Varnish Cache as an HTTP accelerator. By and large I have been very impressed with Varnish. We even found that it had the unexpected benefit of acting as a buffer in front of Apache, preventing Apache from getting overwhelmed with too many slow requests. Apache would get wedged once it had reached it’s MaxClients limit, whereas Varnish seems to happily queue up thousands of requests even if the backend (Apache) is going slowly.
However, after a while we started running into other problems with Varnish, and I found the probable answer in a bug report at the Varnish site. It turns out that Varnish was written with a 64 bit system in mind. That isn’t to say that it won’t work nicely on a 32 bit system, just that you better not expect high server load, or else you’ll start running into resource limitations in a hurry. This left us with about 2 options: Move to 64 bit or ditch Varnish for something like Squid. Seeing as how I was loathe to do the latter, we decided to go 64 bit, which in any case is another logical step into the 21st century.
The problem was that our servers are co-located in data centers around the country. We didn’t want to hassle with reprovisioning all of the them. Asheesh did the the first remote conversion based on some outdated document he found on remotely converting from Red Hat Linux to Debian. It went well and we haven’t had a single problem on that converted machine since. Varnish loves 64bit.
I have now converted two more machines, and this last time I documented the steps I took. I post them here for future reference and with the hope that it may help someone else. Note that these steps are somewhat specific to Debian Linux, but the concepts should be generally applicable to any UNIX-like system. There are no real instructions below, so you just have to infer the method from the steps. See the aforementioned article for more verbose, though dated, explanations. BE WARNED that if you make a mistake and don’t have some lovely rescue method then you may be forced to call your hosting company to salvage the wreckage:
- [ssh server]
- aptitude install linux-image-amd64
- [ssh server]
- sudo su -
- aptitude install debootstrap # if not already installed
- swapoff -a
- sfdisk -l /dev/sda # to determine swap partition, /dev/sda5 in this case
- mke2fs -j /dev/sda5
- mount /dev/sda5 /mnt
- cfdisk /dev/sda # set /dev/sda5 to type 83 (Linux)
- debootstrap –arch amd64 etch /mnt http://http.us.debian.org/debian
- mv /mnt/etc /mnt/etc.LOL
- cp -a /etc /mnt/
- mv /mnt/boot /mnt/boot.LOL
- cp -a /boot /mnt/ # this is really just so that the dpkg post-install hooks don’t issue lots of warnings about things not being in /boot that it expects.
- chroot /mnt
- aptitude update
- aptitude dist-upgrade
- aptitude install locales
- dpkg-reconfigure locales # optional (I selected All locales, default UTF-8)
- aptitude install ssh sudo grub vim # and any other things you want
- aptitude install linux-image-amd64
- vi /etc/fstab # change /dev/sda5 to mount on / and comment out old swap entry
- mkdir /home/nkinkade # just so I have a home, not necessary really
- exit # get out of chroot
- vi /boot/grub/menu.lst # change root= of default option from sda6 to sda5
- [ssh server]
- sudo su -
- mount /dev/sda6 /mnt
- chroot mnt
- dpkg –get-selections > ia32_dpkg_selections
- mv /home /home.LOL
- cp -a /mnt/home /
- mv /root /root.LOL
- cp -a /mnt/root /
- mkdir /mnt/ia32
- mv /mnt/* /mnt/ia32
- mv /mnt/.* /mnt/ia32
- cp -a bin boot dev etc etc.LOL home initrd initrd.img lib lib64 media opt root sbin srv tmp usr var vmlinuz /mnt
- mkdir /mnt/proc /mnt/sys
- vi /mnt/etc/fstab # make /dev/sda6 be mounted on / again, leave swap commented out
- vi /boot/grub/menu.lst # change the default boot option back to root=/dev/sda6
- [ssh server]
- sudo su -
- mkswap /dev/sda5
- vi /etc/fstab (uncomment swap line)
- swapon -a
- dpkg –set-selections < /ia32/ia32_dpkg_selections
- apt-get dselect-upgrade # step through all the questions about changed /etc/files, etc.
Creative Commons participates in Google Summer of Code™ and has accepted a proposal (see the abstract) of Hugo Dworak based on its description of a task to rewrite its now-defunct metadata validator. Asheesh Laroia has been assigned as the mentor of the project. The work began on May 26th, 2008 as per the project timeline. It is expected to be completed in twelve weeks. More details will be provided in the dedicated CC Wiki article and the progress will be weekly featured on this blog.
The project focuses on developing an on-line tool — free software written in Python — to validate digitally embedded Creative Commons licenses within files of different types. Files will be pasted directly to a form, identified by a URL, or uploaded by a user. The application will present the results in a human?readable fashion and notify the user if the means used to express the license terms are deprecated.1 Comment »
Over the past few months we have been migrating most of our web services to new servers. Squid Cache was in use on a number of the old servers as an HTTP accelerator, and we decided that while upgrading hardware and OS we might as well bring our HTTP accelerator fully into the 21st century. Enter Varnish Cache, which has some interesting architectural/design features.
Varnish was easy to install thanks to the Debian package management system, and the configuration file is vastly simpler than that of Squid despite a horrendous dearth of documentation. Varnish runs well and we are generally happy with it. However, after a few months we have encountered a number of gotchas, most of which probably have workarounds:
- Varnish seems to choke on files that are larger than around 600MB. No errors, just sends the client a 200 response with no other data.
- For some reason Bazaar (bzr) apparently does not function through Varnish, even when Varnish was instructed to “pass” requests to bzr repositories.
- bbPress for some unknown reason won’t function through Varnish.
- KeepAlives must to be turned off in Apache, otherwise pages randomly take 1 to 2 minutes to load sometimes. There is an open bug report for this at Varnish’s Trac page.
- Varnish logs are big. They get out of hand in a hurry. For creativecommons.org the log file can grow to 2GB+ in less than 30 minutes. No problem, but varnishlog doesn’t seem to want to write to a file larger than 2GB. It could have something to do with an email thread I read at Varnish’s site, which makes it seems like it might be related to the fact that we are running everything in 32 bit mode, though I believe our hardware support both 32 and 64 bit operation. This means that I have to run a special logrotate script about every 10 or 15 minutes to keep varnishlog from crashing.
I was recently experimenting and discovered that for some things that were apparently broken, configuring Varnish to “pipe” requests works, while using “pass” does not. This won’t make any sense unless you are familiar with VCL (Varnish Configuration Language). I know that “piping” fixed the bbPress issue, and I suspect that it will fix the Bazaar issue as well, though I haven’t tested it.
A week or so ago I experimented with turning off Varnish for creativecommons.org to see how Apache would handle the load unaided. Things seemed to be going well for a while, but within a weeks time the site went down twice. The second time I couldn’t revive Apache. There were kernel messages like
ip_conntrack: table full: packet dropped. Apparently the machine was just flooded and Apache was pegged at it’s MaxClients limit. I re-enabled Varnish and the problem went away immediately. So it appears that not only is Varnish doing a nice job of caching, but it also is able to handle many more simultaneous TCP connections than Apache without blowing up. Asheesh and I ran some experiments that seemed to demonstrate that Varnish actually helps to mitigate floods of traffic, whether they be natural or malicious.