Labs News
More Summer of Code — Liblicense, Tracker, Beagle,…
Jason Kivlighn, June 30th, 2007
Let’s see where I’m at.
Code in GStreamer to read the license URI is getting pushed through. There is now also Bug #451939, which adds license and copyright URI tags to the GStreamer API. Once all of this is committed, the license URI will be available through GST_TAG_LICENSE_URI and/or GST_TAG_COPYRIGHT_URI.
In Tracker, I’ve written code to handle generic indexing of embedded/sidecar XMP. Previously only the license was extracted; now any element can be pulled out and indexed. Currently, Dublin Core and CC elements are indexed. The code is still local and has yet to be committed.
In another direction, I’ve been lending a hand to liblicense. As mentioned in Scott’s previous post, I’ve got two i/o modules ready, both based on Exempi. One reads/writes license metadata directly in QuickTime, AVI, PDF, PNG, TIFF, and JPEG files. The other reads/writes sidecar XMP for any format. There’s more to come.
I also want to look into a liblicense config module and frontend for KDE4. I figure I can put my KDE programming experience to good use.
And in yet another direction, I’m looking into indexing licenses in Beagle. After browsing the code, I think I can apply most of what I learned about license metadata while working with Tracker to extending Beagle. I also noticed that its image format filters already support extracting XMP, so adding the extra license checks is straightforward. A preliminary patch and a request for feedback have been posted on their mailing list.
All in all, I’ve done some work here and there, for this project and that…
Liblicense is alpha!
Scott Shawcroft, June 29th, 2007
Over the last week a lot of progress has been made on liblicense. Yesterday Jason and I got the module_read and module_write functions working with a stub io module and an XMP sidecar module. On Tuesday and Wednesday I got the library’s system license functions working. Today I plugged some memory leaks and wrote the system default functions. Nearly every part of the library works as planned. While it’s still rough, the bulk of the library work is done.
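The post doesn’t show how the i/o modules plug in, so here is a minimal sketch of one plausible design. It assumes each module exposes read/write callbacks plus a NULL-terminated list of MIME types it handles. The `io_module` struct, the `io_read_license`/`io_write_license` dispatchers and the in-memory stub are all hypothetical and are not liblicense’s actual module API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical module interface (NOT liblicense's real definition). */
typedef struct {
    const char *name;
    const char *const *mime_types;                        /* NULL-terminated */
    char *(*read)(const char *filename);                  /* malloc'd URI or NULL */
    int (*write)(const char *filename, const char *uri);  /* 0 on success */
} io_module;

/* Portable strdup replacement (strdup is POSIX, not ISO C). */
static char *copy_string(const char *s)
{
    char *c = malloc(strlen(s) + 1);
    if (c) strcpy(c, s);
    return c;
}

/* Stub module: remembers a single filename/URI pair in memory. */
static char *stub_file, *stub_uri;

static int stub_write(const char *filename, const char *uri)
{
    free(stub_file);
    free(stub_uri);
    stub_file = copy_string(filename);
    stub_uri = copy_string(uri);
    return (stub_file && stub_uri) ? 0 : -1;
}

static char *stub_read(const char *filename)
{
    if (stub_file && stub_uri && strcmp(stub_file, filename) == 0)
        return copy_string(stub_uri);
    return NULL;
}

static const char *const stub_types[] = { "text/plain", NULL };
static const io_module stub_module = { "stub", stub_types, stub_read, stub_write };

/* Registered modules, NULL-terminated; an XMP sidecar module could be added here. */
static const io_module *const modules[] = { &stub_module, NULL };

/* Return the first module that claims the MIME type, or NULL. */
static const io_module *find_module(const char *mime)
{
    for (size_t i = 0; modules[i]; i++)
        for (size_t j = 0; modules[i]->mime_types[j]; j++)
            if (strcmp(modules[i]->mime_types[j], mime) == 0)
                return modules[i];
    return NULL;
}

int io_write_license(const char *filename, const char *mime, const char *uri)
{
    const io_module *m = find_module(mime);
    return m ? m->write(filename, uri) : -1;
}

char *io_read_license(const char *filename, const char *mime)
{
    const io_module *m = find_module(mime);
    return m ? m->read(filename) : NULL;
}
```

Dispatching on MIME type keeps format knowledge inside the modules, so adding a format means registering one more struct.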
The most common data structure I’ve been using is a null-terminated list (really an array) of strings (char*). Yesterday I wrote some common functions to be shared throughout the library; these are in list.c. My hope is that they will keep the rest of the code cleaner. Next week I plan on fixing up system_licenses.c to use the list functions. At the moment it is the largest, ugliest and leakiest of all the files. That will all be fixed Monday.
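A sketch of what shared helpers for such a NULL-terminated array of strings might look like. The `strlist_*` names are hypothetical (the post doesn’t list the functions in list.c), and each string is assumed to be heap-allocated and owned by the list.

```c
#include <stdlib.h>
#include <string.h>

/* Number of strings before the NULL terminator (a NULL list has length 0). */
size_t strlist_length(char **list)
{
    size_t n = 0;
    while (list && list[n]) n++;
    return n;
}

/* 1 if s is in the list, 0 otherwise. */
int strlist_contains(char **list, const char *s)
{
    for (size_t i = 0; list && list[i]; i++)
        if (strcmp(list[i], s) == 0) return 1;
    return 0;
}

/* Append a copy of s and keep the list NULL-terminated. Returns the
   (possibly moved) list, or NULL on failure, in which case the old
   list is left untouched and still owned by the caller. */
char **strlist_append(char **list, const char *s)
{
    size_t n = strlist_length(list);
    char *copy = malloc(strlen(s) + 1);
    if (!copy) return NULL;
    strcpy(copy, s);
    char **grown = realloc(list, (n + 2) * sizeof *grown);
    if (!grown) { free(copy); return NULL; }
    grown[n] = copy;
    grown[n + 1] = NULL;
    return grown;
}

/* Free every string and then the array itself. */
void strlist_free(char **list)
{
    for (size_t i = 0; list && list[i]; i++) free(list[i]);
    free(list);
}
```

Growing one slot at a time keeps the terminator invariant trivial to maintain, at the cost of a realloc per append, which is fine for short license lists.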
After the code cleanup on Monday, the much more exciting task of creating modules and clients of the library begins. We’d like to support embedding in as many file formats as possible; without this ability, license tracking only works locally. One of the most useful libraries so far is Exempi, which can embed metadata in a number of formats. Jason wrote an Exempi liblicense module yesterday. On my list of clients to do are a Gnome Control Panel system default, Nautilus license selection, Sugar license selection and a Creative Commons default license chooser. Am I missing anything important? Where else could licenses be integrated? Perhaps Amarok or an equivalent? ccHost? Let me know what you think.
It’s coming! CC OpenOffice.Org Add-in
ksiomelo, June 27th, 2007
As you may know, I am working on this project as part of the Google Summer of Code program. Before I started, François Dechelle was already working on it, and now we are joining efforts to develop this potentially popular application!
The current prototype can already add the license to the body of the document.

After a license has been chosen, it becomes available as an Auto-Text entry, so it can easily be reused in the document without having to run the wizard again. The name of the license and the URL are also stored in the document’s metadata.

Currently I am working on a strange bug that prevents the OOo API from retrieving some of the license images available at http://i.creativecommons.org/l/ (it was working perfectly a few days ago…)
We’re going to commit a stable version ready for download in the coming weeks, but if you are eager to use it you can check out this folder in the cctools repository:
Feedback and suggestions are welcome!
Cheers
Enhanced Metadata Graduates from Labs
Nathan Yergler, June 21st, 2007
Early this morning we launched some functionality on the main “license chooser”:http://creativecommons.org/license that was previously available only on “Labs”:http://labs.creativecommons.org. As many (ok, at least a few) people have noted, we previously stopped embedding RDF in the HTML generated by the chooser. As we’ve “noted”:http://wiki.creativecommons.org/Extend_Metadata#Embedding_RDF_in_HTML in the past, RDF in a comment has several drawbacks, not the least of which is that it’s opaque to parsers. The new update to the license chooser restores the embedded metadata using “RDFa”:http://rdfa.info.
As the name implies, RDFa is a way of expressing RDF using _attributes_ in the HTML. This is similar to microformats, but different in that any RDFa parser can read any RDFa information — no special knowledge required. So the new metadata once again allows you to encode the name of your work, your name, and the type of work, all in the HTML. A full example (with all fields filled in) is shown here:

CC TechBlog by Creative Commons is licensed under a Creative Commons Attribution 3.0 License.
Based on a work at creativecommons.org.
Permissions beyond the scope of this license may be available at http://creativecommons.org/policies.
So how do you know the metadata is there? Check out the “RDFa Bookmarklets”:http://www.w3.org/2006/07/SWD/RDFa/impl/js/, which demonstrate how you can expose the information using some simple JavaScript.
*UPDATE* Unfortunately WordPress MU strips out attributes it doesn’t recognize, so the example above isn’t as complete as it could be.
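In the generated markup, the license link is the anchor that carries `rel="license"`. As a rough illustration only, the following C sketch pulls the `href` out of the first `<a>` tag with `rel="license"`. It is a naive string scan that assumes double-quoted attributes and no `>` inside attribute values; it is not an RDFa parser, and `find_license_link` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Copy the href of the first <a ...> tag containing rel="license" into out.
   Returns 0 on success, -1 if no such link is found or out is too small. */
int find_license_link(const char *html, char *out, size_t outsize)
{
    const char *p = html;
    while ((p = strstr(p, "<a ")) != NULL) {
        const char *end = strchr(p, '>');          /* end of this start tag */
        if (!end) return -1;
        const char *rel = strstr(p, "rel=\"license\"");
        const char *href = strstr(p, "href=\"");
        if (rel && rel < end && href && href < end) {
            href += strlen("href=\"");
            const char *close = strchr(href, '"');
            if (!close || close > end) return -1;
            size_t n = (size_t)(close - href);
            if (n >= outsize) return -1;
            memcpy(out, href, n);
            out[n] = '\0';
            return 0;
        }
        p = end;                                   /* try the next tag */
    }
    return -1;
}
```

Anything beyond a quick check (other RDFa properties such as the work’s title or author) needs a real parser, since RDFa attributes can appear on any element.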
System Integrated Licensing
Scott Shawcroft, June 19th, 2007
I’ve been asked, as a tech intern here at Creative Commons, to create a way of locally tracking file licenses on a system. A while back Jon wrote down his ideas about system-wide license tracking on the Creative Commons wiki. The purpose of this system would be to provide an interface for developers to access the available licenses on a system. Additionally, like the existing online license chooser, this library, called libLicense, will feature a way to choose a license by toggling certain flags available for a family of licenses. Naturally, the first family available will be the Creative Commons licenses. The larger goal for the summer is to use this library in a few initial systems. Currently, I’m looking at integration into Gnome and Sugar (from the One Laptop Per Child project). This further work will happen once libLicense is working.
Data
For libLicense to work, the data for all the licenses will need to be stored in some fashion. My initial thought is this:
- All data will be stored in a directory. On Linux this directory would be /usr/share/licenses . (This is borrowed from Jon’s thoughts.)
- Families of licenses will be stored in a subdirectory of the licenses directory. For example, the Creative Commons licenses would be stored within creative_commons.
- Within these family directories each specific license will be stored in a file with the naming scheme <bitcode>-<short name>-<jurisdiction>-<locale>.license . These files will store the license uri, name, status (active or retired), description and legal text. How this will be stored is up in the air. My initial thoughts include putting each attribute on its own line or using a format similar to .desktop files.
- In addition to storing license data, some family information must be stored, namely the family bit flags. In the case of the Creative Commons licenses, the bit flags would be Attribution, Share-Alike, Non-Commercial and No Derivatives. They would combine to create the bitcode present in the license filename. These bit flags would be the heart of the license chooser logic. If the combination does not exist, the flags are incompatible.
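One wrinkle with the proposed naming scheme is that Creative Commons short names such as `by-sa` themselves contain dashes. A minimal parser can cope by taking the locale and jurisdiction from the right and the bitcode from the left. This sketch assumes the bitcode is written in decimal and uses hypothetical bit assignments for the four CC flags; neither is specified in the post.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical bit values for the Creative Commons family flags. */
enum {
    CC_ATTRIBUTION   = 1 << 0,
    CC_SHARE_ALIKE   = 1 << 1,
    CC_NONCOMMERCIAL = 1 << 2,
    CC_NO_DERIVS     = 1 << 3
};

struct license_name {
    unsigned bitcode;
    char short_name[32];
    char jurisdiction[16];
    char locale[16];
};

/* Copy src into a fixed-size field; fails if src is empty or too long. */
static int copy_field(char *dst, size_t size, const char *src)
{
    size_t n = strlen(src);
    if (n == 0 || n >= size) return -1;
    memcpy(dst, src, n + 1);
    return 0;
}

/* Split "<bitcode>-<short name>-<jurisdiction>-<locale>.license".
   Returns 0 on success, -1 if the name doesn't fit the scheme. */
int parse_license_filename(const char *filename, struct license_name *out)
{
    static const char ext[] = ".license";
    const size_t ext_len = sizeof ext - 1;
    char buf[128];
    size_t len = strlen(filename);

    if (len <= ext_len || len - ext_len >= sizeof buf) return -1;
    if (strcmp(filename + len - ext_len, ext) != 0) return -1;
    memcpy(buf, filename, len - ext_len);
    buf[len - ext_len] = '\0';

    char *locale = strrchr(buf, '-');        /* last field */
    if (!locale) return -1;
    *locale++ = '\0';
    char *jurisdiction = strrchr(buf, '-');  /* next-to-last field */
    if (!jurisdiction) return -1;
    *jurisdiction++ = '\0';
    char *short_name = strchr(buf, '-');     /* first dash ends the bitcode */
    if (!short_name) return -1;
    *short_name++ = '\0';

    char *end;
    unsigned long code = strtoul(buf, &end, 10);
    if (end == buf || *end != '\0') return -1;
    out->bitcode = (unsigned)code;
    if (copy_field(out->short_name, sizeof out->short_name, short_name) != 0 ||
        copy_field(out->jurisdiction, sizeof out->jurisdiction, jurisdiction) != 0 ||
        copy_field(out->locale, sizeof out->locale, locale) != 0)
        return -1;
    return 0;
}
```

With these assumed values, `3-by-sa-us-en_US.license` yields bitcode 3 (Attribution plus Share-Alike), and testing a single flag is just `bitcode & CC_SHARE_ALIKE`.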
API
The library would potentially have these functions:
get_jurisdiction(uri) – returns the jurisdiction for the given license.
get_jurisdictions(short or bitcode) – returns the available jurisdictions for the given short name or bitcode.
get_locale(uri) – returns the locale for the given license.
get_locales(jurisdiction, short or bitcode) – returns the available locales for the given jurisdiction and short name or bitcode.
get_name(uri) – returns the name of the license.
get_version(uri) – returns the version of the license.
get_versions(short or bitcode, jurisdiction) – returns the available versions for the given short name or bitcode and jurisdiction.
get_short(uri) – returns the short name for the given uri.
has_flag(attribute, uri) – returns whether the flag is set for the given uri.
family_flags(family) – returns the flags available for a given family.
family(uri) – returns the family the given uri belongs to.
get_notification(uri[, url]) – returns the notification string for the given uri, optionally including a verification url.
verify_uri(uri) – returns whether or not the given uri is recognized by the system.
get_license(family, flags, jurisdiction, locale) – returns the uri that satisfies the given attributes.
get_all_licenses() – returns all general licenses available.
get_general_licenses(family) – returns all general licenses in a family.
get_families() – returns a list of available families.
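To show how the bit flags could drive the chooser logic, here is a toy, in-memory sketch of two of the proposed functions, `get_license` and `has_flag`. The table stands in for the files under /usr/share/licenses; its entries, the “unported” jurisdiction label and the flag values are illustrative assumptions. A missing combination returns NULL, which is how incompatible flags would show up.

```c
#include <stddef.h>
#include <string.h>

enum { CC_BY = 1, CC_SA = 2, CC_NC = 4, CC_ND = 8 };   /* hypothetical bit values */

struct license_entry {
    const char *family, *jurisdiction, *locale;
    unsigned bitcode;
    const char *uri;
};

/* Illustrative stand-in for the installed license files. */
static const struct license_entry installed[] = {
    { "creative_commons", "unported", "en", CC_BY,
      "http://creativecommons.org/licenses/by/3.0/" },
    { "creative_commons", "unported", "en", CC_BY | CC_SA,
      "http://creativecommons.org/licenses/by-sa/3.0/" },
    { "creative_commons", "unported", "en", CC_BY | CC_NC | CC_ND,
      "http://creativecommons.org/licenses/by-nc-nd/3.0/" },
};
#define N_INSTALLED (sizeof installed / sizeof installed[0])

/* Return the uri matching all attributes exactly, or NULL when no license
   has this flag combination (i.e. the flags are incompatible). */
const char *get_license(const char *family, unsigned flags,
                        const char *jurisdiction, const char *locale)
{
    for (size_t i = 0; i < N_INSTALLED; i++) {
        const struct license_entry *e = &installed[i];
        if (e->bitcode == flags && strcmp(e->family, family) == 0 &&
            strcmp(e->jurisdiction, jurisdiction) == 0 &&
            strcmp(e->locale, locale) == 0)
            return e->uri;
    }
    return NULL;
}

/* 1 if the known license at uri has the given flag set, else 0. */
int has_flag(unsigned flag, const char *uri)
{
    for (size_t i = 0; i < N_INSTALLED; i++)
        if (strcmp(installed[i].uri, uri) == 0)
            return (installed[i].bitcode & flag) != 0;
    return 0;
}
```

Because compatibility is defined purely by which files exist, the chooser needs no hard-coded rules such as “Share-Alike excludes No Derivatives”.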
Did I miss something? Does something not make sense? Please post a comment.
Indexing License Metadata in Tracker, Week 2
Jason Kivlighn, June 19th, 2007
I’ve made progress extracting licenses from the following formats: Vorbis, MP3, FLAC, PDF, JPEG, TIFF, PNG, HTML, and MSOffice. They are by no means all done, but for several formats I have patches that are awaiting approval from the Tracker developers.
I’ve filed a GStreamer bug report and submitted a patch to allow reading the WCOP (License URI) ID3v2 tag. Discussion continues there.
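For reference, locating a WCOP frame in a raw ID3v2.3 tag is fairly simple. This is a minimal standalone sketch, not the GStreamer patch. It assumes an ID3v2.3 tag at the start of the buffer and rejects tags that use unsynchronisation or an extended header instead of handling them; `find_wcop` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Decode a 28-bit "synchsafe" integer (7 significant bits per byte),
   as used for the ID3v2 tag size. */
static size_t synchsafe32(const unsigned char *p)
{
    return ((size_t)(p[0] & 0x7f) << 21) | ((size_t)(p[1] & 0x7f) << 14) |
           ((size_t)(p[2] & 0x7f) << 7)  |  (size_t)(p[3] & 0x7f);
}

/* Decode a plain big-endian 32-bit integer (ID3v2.3 frame sizes). */
static size_t be32(const unsigned char *p)
{
    return ((size_t)p[0] << 24) | ((size_t)p[1] << 16) |
           ((size_t)p[2] << 8)  |  (size_t)p[3];
}

/* Copy the URL of the WCOP frame into out. Returns 0 on success, -1 otherwise. */
int find_wcop(const unsigned char *buf, size_t len, char *out, size_t outsize)
{
    if (len < 10 || memcmp(buf, "ID3", 3) != 0 || buf[3] != 3) return -1;
    if (buf[5] & 0xC0) return -1;                /* unsync / extended header */

    size_t end = 10 + synchsafe32(buf + 6);      /* size excludes the 10-byte header */
    if (end > len) end = len;

    size_t pos = 10;
    while (pos + 10 <= end && buf[pos] != 0) {   /* a zero byte starts padding */
        size_t size = be32(buf + pos + 4);
        if (size > end - pos - 10) return -1;    /* truncated frame */
        if (memcmp(buf + pos, "WCOP", 4) == 0) {
            if (size >= outsize) return -1;
            memcpy(out, buf + pos + 10, size);   /* URL frames hold ISO-8859-1 text */
            out[size] = '\0';
            return 0;
        }
        pos += 10 + size;                        /* 10-byte frame header + payload */
    }
    return -1;
}
```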
No luck with video metadata (AVI, Matroska, OGM, Quicktime). Things are just too ad-hoc in that arena to get anything worthwhile done. For Tracker, GStreamer is doing all the work on extracting video metadata, but as far as I can tell, nothing relating to licenses ever gets extracted and passed on to Tracker. GStreamer would need to be updated to read the tags, but that can’t be done unless there are consistent specs on how to do so. Exempi can embed XMP into MOV and AVI, but I don’t know how to get it back out. It may or may not be feasible to write an extractor that only extracts XMP using Exempi.
Information on the metadata of various file formats is available here: http://wiki.creativecommons.org/Tracker_CC_Indexing
While Tracker won’t be indexing every format mentioned there, I’m trying to document all the formats relevant to Creative Commons. If I’m missing any important formats, please let me know.
Overall, things are progressing well. At the rate things are going, by the end of the summer I’ll have become a manual for file format specifications :-/
Cheers
Indexing License Metadata in Tracker, Week 1
Jason Kivlighn, June 12th, 2007
Week 1 of Google Summer of Code is complete and already I’m seeing much progress. There’s a mess of formats to embed licenses into and a mess of ways to embed them. My first task has been straightening out where licenses are embedded in each format and how exactly to go about extracting them. Here’s where I’m at:
| Format | Form of Metadata | Location of Metadata | Extraction with Tracker | Test content |
| MP3 | | | Extracting MP3 tags has moved from an ID3 parser to handing off the work to GStreamer/MPlayer/Totem. As far as I can tell, this prevents me from extracting the XMP. | XMP embedded with Exempi |
| PDF | XMP | metadata field | Extend the current PDF extractor (which uses Poppler) to read the metadata field. However reading the metadata field isn’t wrapped in Poppler’s glib bindings, but I have written and submitted a patch. | XMP embedded with Exempi |
| OGG | | | Extend the GStreamer extractor to check for the presence of an XMP comment field. GStreamer places this within the EXTENDED_COMMENTS tag (requires GStreamer 0.10.10). | XMP embedded with vorbiscomment |
| JPEG | XMP | Exif XML Packet field | Extend the Imagemagick extractor, using 'convert file.jpg xmp:-' to read XMP | XMP embedded with Exempi |
| PNG | XMP | iTXt, XML:com:adobe:xmp field | Extend the PNG extractor, adding a check for XML:com:adobe:xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.) | XMP embedded with Exempi |
| HTML | RDFa | <a rel="license" href="…"></a> | Write a new HTML extractor, using libxml2, and scan for RDFa | Various actual sites, including creativecommons.org |
| SVG | RDF | /svg/metadata/rdf | I could specifically parse the XML, checking for the RDF schema used by Inkscape. Should I check for XMP also??? | Inkscape |
| Any XML | XMP | Wherever valid | Write a generic XML extractor (and/or extractor for each particular format), scanning with libxml2 | |
| OpenOffice.org (OASIS) | OO.org CC License Add-In SoC Project is working on the spec | | | OO.org Add-In |
| MS Office | DocumentSummaryInformation Infile | CreativeCommons_LicenseURL property | Extend existing msoffice extractor | MSOffice Add-in |
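As an illustration of the PNG row, the XMP packet can be located by walking the chunk list and looking for an `iTXt` chunk whose keyword is `XML:com:adobe:xmp`. This standalone sketch (not the Tracker extractor, and `find_png_xmp` is a hypothetical name) assumes the whole file is in memory, skips CRC verification, and does not handle compressed iTXt text.

```c
#include <stddef.h>
#include <string.h>

/* Decode a big-endian 32-bit integer (PNG chunk lengths). */
static size_t be32(const unsigned char *p)
{
    return ((size_t)p[0] << 24) | ((size_t)p[1] << 16) |
           ((size_t)p[2] << 8)  |  (size_t)p[3];
}

/* Return a pointer to the XMP text inside buf and set *xmp_len, or NULL. */
const char *find_png_xmp(const unsigned char *buf, size_t len, size_t *xmp_len)
{
    static const unsigned char sig[8] = { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };
    static const char key[] = "XML:com:adobe:xmp";   /* sizeof key includes the NUL */

    if (len < 8 || memcmp(buf, sig, 8) != 0) return NULL;
    size_t pos = 8;
    while (pos + 12 <= len) {                        /* length + type + CRC */
        size_t clen = be32(buf + pos);
        const unsigned char *type = buf + pos + 4;
        const unsigned char *data = buf + pos + 8;
        if (clen > len - pos - 12) return NULL;      /* truncated chunk */

        if (memcmp(type, "iTXt", 4) == 0 && clen > sizeof key &&
            memcmp(data, key, sizeof key) == 0) {
            /* keyword NUL, compression flag, method, language NUL, translated keyword NUL */
            size_t i = sizeof key;
            if (i + 2 > clen || data[i] != 0) return NULL;   /* compressed: unsupported */
            i += 2;
            const unsigned char *nul = memchr(data + i, 0, clen - i);  /* language tag */
            if (!nul) return NULL;
            i = (size_t)(nul - data) + 1;
            nul = memchr(data + i, 0, clen - i);                      /* translated keyword */
            if (!nul) return NULL;
            i = (size_t)(nul - data) + 1;
            *xmp_len = clen - i;
            return (const char *)(data + i);
        }
        if (memcmp(type, "IEND", 4) == 0) break;
        pos += 12 + clen;
    }
    return NULL;
}
```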
If this all looks right, I’d like to help update the CC Wiki with the new embedding specifications.
As far as coding goes, I wrote the code for Tracker to check for and extract metadata from XMP sidecar files. The XMP is parsed by Hubert’s XMP library, Exempi. The timing of Adobe’s release of its XMP Toolkit, and Hubert’s subsequent release of Exempi 1.99.x, has been an early boon to the project. The ‘license’ tag in the CC namespace is the only metadata extracted at the moment.
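A minimal sketch of the two steps involved in sidecar handling: deriving the sidecar name and pulling out the CC license. It assumes the common convention of replacing the extension (photo.jpg becomes photo.xmp) and looks for a literal `cc:license rdf:resource` element. Both helper names are hypothetical, and a real implementation would let Exempi resolve namespaces rather than matching the `cc:` prefix as text.

```c
#include <stddef.h>
#include <string.h>

/* Build the sidecar path by replacing the file extension with ".xmp"
   (or appending it if there is none). Returns 0 on success, -1 if out is too small. */
int sidecar_path(const char *path, char *out, size_t outsize)
{
    const char *slash = strrchr(path, '/');
    const char *dot = strrchr(path, '.');
    size_t stem = (dot && (!slash || dot > slash)) ? (size_t)(dot - path) : strlen(path);
    if (stem + sizeof ".xmp" > outsize) return -1;   /* ".xmp" plus NUL */
    memcpy(out, path, stem);
    memcpy(out + stem, ".xmp", sizeof ".xmp");
    return 0;
}

/* Copy the rdf:resource of the first <cc:license .../> element into out.
   Returns 0 on success, -1 if absent or too long. */
int xmp_cc_license(const char *xmp, char *out, size_t outsize)
{
    static const char tag[] = "<cc:license rdf:resource=\"";
    const char *p = strstr(xmp, tag);
    if (!p) return -1;
    p += sizeof tag - 1;
    const char *q = strchr(p, '"');
    if (!q || (size_t)(q - p) >= outsize) return -1;
    memcpy(out, p, (size_t)(q - p));
    out[q - p] = '\0';
    return 0;
}
```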
I’ve also been hacking on the extractors for the formats listed above to determine whether, and how, license metadata can be extracted from each.
For now, feedback on the above would be much appreciated, and if all is well I can get my XMP sidecar code pushed into Tracker’s Subversion repository soon.
Happy hacking, indeed.
Exempi 1.99.0 Released
alex, May 30th, 2007
Following on from Jon’s follow-up, Hubert Figuiere has released Exempi 1.99.0, now based on Adobe’s XMP SDK.
His blog post includes a nice snippet of code showing how to apply a CC license to a PDF:
#include <exempi/xmp.h>
...
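/* Open the PDF for in-place update and get a copy of its XMP packet. */
XmpFilePtr f;
f = xmp_files_open_new("test.pdf", XMP_OPEN_FORUPDATE);
XmpPtr xmp = xmp_files_get_new_xmp(f);
/* Record a copyright notice naming the CC license in the XMP rights namespace. */
xmp_set_property(xmp, NS_XAP_RIGHTS, "Copyright", "(c) ACME Inc., some rights reserved"
" - This work is licensed to the public under the Creative Commons Attribution-ShareAlike "
"license http://creativecommons.org/licenses/by-sa/2.0/");
/* Write the packet back into the file, free our copy, and close safely. */
xmp_files_put_xmp(f, xmp);
xmp_free(xmp);
xmp_files_close(f, XMP_CLOSE_SAFEUPDATE);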
Excellent news for the community, and for the continuing saga of XMP.
ccPublisher 5.0 requirements
Mike Linksvayer, May 18th, 2007
Described at Move My Data. Found via this comment.
(But don’t get too excited — ccPublisher is actually only at version 2.2.1 and Move My Data looks like vaporware at this point.)
CC: The Clear Choice for BotNets Everywhere
Nathan Yergler, May 17th, 2007
I was working this morning and noticed that “#cc”:http://wiki.creativecommons.org/IRC, our IRC channel, was particularly active. But I couldn’t figure out what they were talking about. It didn’t look like any conversation about licensing _I’d_ ever seen. And then I realized: it was a botnet rental negotiation. What I especially loved is the question, “you a fed?” Presumably *K_Soze* is under the false impression that if a law enforcement officer answers the question dishonestly, they’re guilty of entrapment.
Presented for your enjoyment, the logs:
[09:19am] K_Soze: Evenin all
[09:20am] K_Soze: I’m wondering what seller sells?
[09:20am] Seller: hi K_Soze
[09:20am] Seller: that depends…
[09:20am] Seller: what you want to buy?
[09:21am] K_Soze: Well that’s always the question… but it’s also a matter of whether people can supply… and people can supply the right quality and quantity?
[09:21am] Seller: K_Soze: you will have to be more specific?
[09:22am] K_Soze: I thought a channel called cc , talking a guy called seller was specific enough, but maybe not….
[09:23am] Seller: you are not a fed are you?
[09:23am] K_Soze: There’s no need to get spooked… I’ve dealt with people on this chan before who I’m sure will vouch for me
[09:24am] Seller: lets just say I like to help people out
[09:24am] Seller: you got a problem – i got a solution
[09:25am] K_Soze: Yeah but not everyone is looking for such a high quality solution… the cc bit is generally easy… it’s other more…… “mechanical” things I’m interested in…
[09:25am] Seller: hmmm
[09:25am] Seller: i sell by the thousands
[09:25am] Seller: top quality
[09:26am] K_Soze: Thousands? Well I’ve spoken to people who sell by the thousands… they ask if I want one or two thousand….
[09:26am] K_Soze: but that’s obviously not what’ i’m interested in
[09:26am] Seller: you dont want a couple thousand mechanical friends?
[09:26am] Seller: to help out?
[09:26am] K_Soze: No, I want more than a couple…
[09:26am] Seller: how much more?
[09:27am] K_Soze: Well I guess it’s not so much the number as the commotion such a bunch of friends could induce…
[09:28am] Seller: can do pings, can do http, can do smtp
[09:28am] K_Soze: Geographically dispersed?
[09:28am] Seller: very very effective
[09:28am] Seller: all over the place – china, russia, usa
[09:29am] Seller: australia
[09:29am] Seller: i can mix them up for ya
[09:29am] K_Soze: Hrmmmm…..
[09:29am] K_Soze: by the job or by day / week?
[09:29am] Seller: you rent them per week
[09:29am] Seller: web interface – very easy
[09:30am] K_Soze: web? no IRC?
[09:30am] Seller: yep web – very easy
[09:30am] K_Soze: hrmm interesting…. would be curious to give them a run…. this collection… they attracting much attention?
[09:31am] Seller: barely used so far
[09:31am] nathany: uh, do you guys realize this channel is for Creative Commons license-related discussion?
[09:31am] Seller: oh oops
[09:31am] Seller: K_Soze: see ya later
[09:31am] nathany: yeah, oops
[09:31am] Seller left the chat room.
[09:31am] K_Soze: cc = Creative commons? laters.
[09:31am] K_Soze left the chat room. (“leaving”)
