metadata
RFC 4946
Mike Linksvayer, July 19th, 2007
James Snell writes that the Atom License Extension is now Experimental RFC 4946.
Many thanks to James Snell for at least two years of work on this.
What is needed to move further along the standards track? More implementations.
There’s a page on the CC wiki about licensing and syndication standards.
Exempi 1.99.3 Released
Jason Kivlighn, July 11th, 2007
Hubert Figuiere has released Exempi 1.99.3.
An important addition in this release is the ability to serialize XMP to a string, making sidecar XMP possible. The soon-to-be-released Liblicense 0.1 already takes advantage of this feature; it uses Exempi to read and write licenses within XMP sidecar.
Hopefully, the API will soon stabilize in preparation for the 2.0 release.
Sidecar XMP and License Extractors in Tracker
Jason Kivlighn, July 10th, 2007
Tracker has accepted my patches to read XMP sidecar, as well as patches to extract licenses from MS Office (old format), TIFF, HTML, PNG, and PDF. This support will be available in the 0.6 release, which potentially will be released later this week.
My final set of patches will additionally add support for extracting licenses from JPEG, SVG, and OpenOffice’s OASIS. Also, through GStreamer, Tracker already recognizes licenses of Vorbis and FLAC.
This marks the half-way point of Summer of Code 2007.
CC OpenOffice.Org AddIn updates
ksiomelo, July 7th, 2007
Hello all,
updates of the version 0.0.2:
* Creative Commons menu became visible in Calc and Impress
Although it’s not working properly yet, the addin now supports the other ooo applications.
* License Image bug fixed
Now the addin is retrieving the license images perfectly!
* Display dialog when opening licensed documents
A simple dialog box is shown when a CC licensed document is opened.
* Checks if the document is already licensed and warns the user
Something like “You have chosen a different license, do you want to proceed anyway?”
… several other minor updates were made in the addin.
Take a look at the screen shots!
License inserted in the document
Document already licensed
Opening a licensed document
Using the CC autotext to replicate the license
Want to try? Just download the ccooo.oxt file and install it from Extension Manager in OpenOffice.Org!
Next steps:
More Summer of Code — Liblicense, Tracker, Beagle,…
Jason Kivlighn, June 30th, 2007
Let’s see, where am I at.
Code in GStreamer to read the license URI is getting pushed through. Now there’s Bug #451939 that updates the GStreamer API with a license and copyright uri tag. When this all gets pushed through, access to the license URI will be available through GST_TAG_LICENSE_URI and/or GST_TAG_COPYRIGHT_URI.
In Tracker, I’ve written code to handle generic indexing of embedded/sidecar XMP. Previously it just extracted the license, and now any elements can be pulled out and indexed. Currently, Dublin Core and CC elements are indexed. The code is still local, and yet to be committed.
In another direction, I’ve been lending a hand to liblicense. As mentioned in Scott’s previous post, I’ve got two i/o modules ready. Both are based on Exempi. One reads/writes license metadata directly into Quicktime, AVI, PDF, PNG, TIFF, and JPEG formats. The other reads/writes sidecar XMP for any format. There’s more to come.
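To make the sidecar idea concrete, here is a minimal sketch in Python rather than the Exempi-based C module described above. It writes a bare XMP packet containing only a license property into a file next to the original. The `photo.jpg` to `photo.xmp` naming rule, the helper names, and the `cc:` namespace URI are all assumptions for illustration, and the `xpacket` wrapper that XMP tools often emit is left out.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# The x: and rdf: URIs are the usual XMP ones; the cc: URI is an assumption
# (older CC metadata used a different namespace).
NS_X = "adobe:ns:meta/"
NS_RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS_CC = "http://creativecommons.org/ns#"


def sidecar_path(path):
    """Hypothetical naming rule: photo.jpg -> photo.xmp beside the original."""
    return os.path.splitext(path)[0] + ".xmp"


def write_license_sidecar(path, license_uri):
    """Write a minimal XMP document holding only a cc:license property."""
    ET.register_namespace("x", NS_X)
    ET.register_namespace("rdf", NS_RDF)
    ET.register_namespace("cc", NS_CC)
    root = ET.Element(f"{{{NS_X}}}xmpmeta")
    rdf = ET.SubElement(root, f"{{{NS_RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{NS_RDF}}}Description",
                         {f"{{{NS_RDF}}}about": ""})
    ET.SubElement(desc, f"{{{NS_CC}}}license",
                  {f"{{{NS_RDF}}}resource": license_uri})
    out = sidecar_path(path)
    ET.ElementTree(root).write(out, encoding="utf-8", xml_declaration=True)
    return out
```

Because the sidecar lives outside the media file, the same function works for any format, which is the appeal of the second module.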
I also want to look into a liblicense config module and frontend for KDE4. I figure I can put my KDE programming experience to good use.
And in yet another direction, I’m looking into indexing licenses in Beagle. After browsing the code, I can adapt most of what I learned about license metadata while working with Tracker to extending Beagle. I even notice that their image format filters already support extracting XMP, so adding the extra license checks is straightforward. A preliminary patch and request for feedback have been posted on their mailing list.
All in all, I’ve done some work here and there, for this project and that…
Liblicense is alpha!
Scott Shawcroft, June 29th, 2007
Over the last week a lot of progress has been made on liblicense. Yesterday Jason and I got the module_read and module_write functions working with a stub io module and an XMP sidecar module. Tuesday and Wednesday I got the library’s system license functions working. Today I did some memory leak plugging and wrote out the system default functions. Nearly every part of the library works as planned. While it’s still rough, the bulk of the library work is done.
The most common data structure I’ve been using is a null-terminated list (really an array) of strings (char*). Yesterday I wrote out some common methods to be shared throughout the library. These are in list.c. My hope is that these common functions will allow the other code to be cleaner. Next week I plan on fixing up system_licenses.c to use the list functions. At the moment it is the largest, ugliest and leakiest of all the files. That will all be fixed Monday.
After the code cleanup on Monday the much more exciting task of creating modules and clients of the library begins. We’d like to support embedding in as many file formats as possible. Without this ability, the license tracking only works locally. One of the most useful libraries so far is Exempi, which can embed in a number of formats. Jason wrote an Exempi liblicense module yesterday. On my list of clients to do is a Gnome Control Panel system default, Nautilus license select, Sugar license select and Creative Commons default license chooser. Am I missing anything important? Where could licenses be integrated besides this? Perhaps Amarok or an equivalent? ccHost? Let me know what you think.
It’s coming! CC OpenOffice.Org Add-in
ksiomelo, June 27th, 2007
As you may know, I am working on this project as part of the Google Summer of Code program. Before starting, François Dechelle was already engaged and now we are joining efforts to develop this potentially-popular application!
The current prototype already adds the license to the body of the document.

After the license has been chosen, it becomes available as an Auto-Text, so it can be easily replicated in the document without having to start a new wizard. The name of the license and the URL are also stored in the document’s metadata.

Currently I am working on a weird bug which is not allowing the ooo API to retrieve some images of the available licenses at http://i.creativecommons.org/l/ (it was working perfectly a few days ago…)
We’re going to commit a stable version ready for download in the next few weeks, but if you are eager to use it you can check out this folder at the cctools repository:
Feedback and suggestions are welcome!
Cheers
Enhanced Metadata Graduates from Labs
Nathan Yergler, June 21st, 2007
Early this morning we launched some functionality on the main “license chooser”:http://creativecommons.org/license previously available only on “Labs”:http://labs.creativecommons.org. As many (ok, at least a few) people have noted, we previously stopped embedding RDF in the HTML generated by the chooser. As we’ve “noted”:http://wiki.creativecommons.org/Extend_Metadata#Embedding_RDF_in_HTML in the past, RDF in a comment has several drawbacks, not the least of which is that it’s opaque to parsers. The new update to the license chooser restores the embedded metadata using “RDFa”:http://rdfa.info.
As the name implies, RDFa is a way of expressing RDF using _attributes_ in the HTML. This is similar to microformats, but different in that any RDFa parser can read any RDFa information — no special knowledge required. So the new metadata once again allows you to encode the name of your work, your name, and the type of work, all in the HTML. A full example (with all fields filled in) is shown here:

CC TechBlog by
Creative Commons is licensed under a
Creative Commons Attribution 3.0 License.
Based on a work at
creativecommons.org.
Permissions beyond the scope of this license may be available at
http://creativecommons.org/policies.
So how do you know the metadata is there? Check out the “RDFa Bookmarklets”:http://www.w3.org/2006/07/SWD/RDFa/impl/js/ which demonstrate how you can expose the information using some simple Javascript.
*UPDATE* Unfortunately WordPress MU strips out attributes it doesn’t recognize, so the example above isn’t as complete as it could be.
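Since the attributes did not survive above, here is a rough Python sketch of what a consumer of this kind of markup might do. It only looks for links carrying `rel="license"` and does not implement RDFa as a whole (no handling of other attributes or vocabularies). The class name, function name, and sample markup are made up for illustration and are not the chooser's actual output.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of <a>/<link> elements whose rel includes 'license'."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel is a space-separated list of tokens, compared case-insensitively
        rels = (attr.get("rel") or "").lower().split()
        if "license" in rels and attr.get("href"):
            self.licenses.append(attr["href"])


def find_licenses(html):
    """Return every license URI linked from the given HTML text."""
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


# Hypothetical markup, not the chooser's real output
SAMPLE = (
    '<p>CC TechBlog is licensed under a '
    '<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">'
    'Creative Commons Attribution 3.0 License</a>.</p>'
)
```

Calling `find_licenses(SAMPLE)` yields the single license URI from the sample, and a page with no such link yields an empty list.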
Indexing License Metadata in Tracker, Week 2
Jason Kivlighn, June 19th, 2007
I’ve made progress extracting licenses from the following formats: Vorbis, MP3, FLAC, PDF, JPEG, TIFF, PNG, HTML, and MSOffice. They are by no means all done, but for several formats I have patches and am awaiting approval from Tracker.
I’ve written a GStreamer bug report and submitted a patch to allow reading the WCOP (License URI) id3v2 tag. Discussion continues there.
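For readers unfamiliar with the tag, here is a small Python sketch of pulling WCOP out of an ID3v2 header by hand. It is not the GStreamer patch, only an illustration of the frame layout: a 10-byte tag header followed by frames with a 4-character ID, a 4-byte size, and 2 flag bytes. The function names are made up, and the sketch deliberately gives up on ID3v2.2 tags, extended headers, and unsynchronised tags.

```python
import struct


def synchsafe(b):
    """Decode a 4-byte synchsafe integer (7 useful bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def read_wcop(data):
    """Return the WCOP URL from an ID3v2.3/2.4 tag at the start of data, or None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, flags = data[3], data[5]
    # Not handled in this sketch: v2.2, unsynchronisation, extended header
    if major not in (3, 4) or flags & 0xC0:
        return None
    end = min(10 + synchsafe(data[6:10]), len(data))
    pos = 10
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":
            break  # reached padding
        raw = data[pos + 4:pos + 8]
        # v2.4 frame sizes are synchsafe, v2.3 sizes are plain big-endian
        size = synchsafe(raw) if major == 4 else struct.unpack(">I", raw)[0]
        body = data[pos + 10:pos + 10 + size]
        if frame_id == b"WCOP":
            # URL link frames hold plain ISO-8859-1 text, no encoding byte
            return body.split(b"\x00", 1)[0].decode("latin-1")
        pos += 10 + size
    return None
```

A real extractor would also need to cope with the cases skipped here, which is one reason handing the work to GStreamer is attractive.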
No luck with video metadata (AVI, Matroska, OGM, Quicktime). Things are just too ad-hoc in that arena to get anything worthwhile done. For Tracker, GStreamer is doing all the work on extracting video metadata, but as far as I can tell, nothing relating to licenses ever gets extracted and passed on to Tracker. GStreamer would need to be updated to read the tags, but that can’t be done unless there are consistent specs on how to do so. Exempi can embed XMP into MOV and AVI, but I don’t know how to get it back out. It may or may not be feasible to write an extractor that only extracts XMP using Exempi.
Information on various file formats’ metadata is available here: http://wiki.creativecommons.org/Tracker_CC_Indexing While Tracker won’t specifically be indexing every format mentioned, I’m trying to document the formats relevant to Creative Commons. If I’m missing any important formats, please let me know.
Overall, things are progressing well. At the rate things are going, by the end of the summer I’ll have become a manual for file format specifications :-/
Cheers
Indexing License Metadata in Tracker, Week 1
Jason Kivlighn, June 12th, 2007
Week 1 of Google Summer of Code is complete and already I’m seeing much progress. There’s a mess of formats to embed licenses into and a mess of ways to embed them. My first task has been straightening out where licenses are embedded in each format and how exactly to go about extracting them. Here’s where I’m at:
| Format | Form of Metadata | Location of Metadata | Extraction with Tracker | Test content |
| MP3 | | | Extracting MP3 tags has moved from an ID3 parser to handing off the work to GStreamer/MPlayer/Totem. As far as I can tell, this prevents me from extracting the XMP. | XMP embedded with Exempi |
| PDF | XMP | metadata field | Extend the current PDF extractor (which uses Poppler) to read the metadata field. However, reading the metadata field isn’t wrapped in Poppler’s glib bindings, but I have written and submitted a patch. | XMP embedded with Exempi |
| OGG | | | Extend the GStreamer extractor to check for the presence of an XMP comment field. GStreamer places this within the EXTENDED_COMMENTS tag (requires GStreamer 0.10.10). | XMP embedded with vorbiscomment |
| JPEG | XMP | Exif XML Packet field | Extend the Imagemagick extractor, using ‘convert file.jpg xmp:-’ to read XMP | XMP embedded with Exempi |
| PNG | XMP | iTXt, XML:com:adobe:xmp field | Extend the PNG extractor, adding a check for XML:com:adobe:xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.) | XMP embedded with Exempi |
| HTML | RDFa | <a rel="license" href="…"></a> | Write a new HTML extractor, using libxml2, and scan for RDFa | Various actual sites, including creativecommons.org |
| SVG | RDF | /svg/metadata/rdf | I could specifically parse the XML, checking for the RDF schema used by Inkscape. Should I check for XMP also??? | Inkscape |
| Any XML | XMP | Wherever valid | Write a generic XML extractor (and/or extractor for each particular format), scanning with libxml2 | |
| OpenOffice.org (OASIS) | OO.org CC License Add-In SoC Project is working on the spec | | | OO.org Add-In |
| MS Office | | DocumentSummaryInformation Infile, CreativeCommons_LicenseURL property | Extend existing msoffice extractor | MSOffice Add-in |
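As an illustration of the PNG row above, the following Python sketch walks the chunks of a PNG file and returns the text of an iTXt chunk whose keyword is XML:com:adobe:xmp. It reads the chunks directly instead of going through libpng, so it is a stand-in for the approach in the table, not the Tracker extractor itself. The function name is made up, and the sketch does not verify chunk CRCs.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
XMP_KEYWORD = b"XML:com:adobe:xmp"


def xmp_from_png(data):
    """Return the XMP packet stored in an iTXt chunk, or None if absent."""
    if data[:8] != PNG_SIGNATURE:
        return None
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"IEND":
            break
        if ctype != b"iTXt":
            continue
        # iTXt layout: keyword NUL, compression flag, compression method,
        # language tag NUL, translated keyword NUL, UTF-8 text
        keyword, _, rest = body.partition(b"\x00")
        if keyword != XMP_KEYWORD or len(rest) < 2:
            continue
        compressed = rest[0] == 1
        _lang, _, rest = rest[2:].partition(b"\x00")
        _translated, _, text = rest.partition(b"\x00")
        if compressed:
            text = zlib.decompress(text)
        return text.decode("utf-8")
    return None
```

The returned string is the raw XMP packet; finding the license inside it is a separate XML parsing step.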
If this is all well and good, I’d like to help update the CC Wiki with updated embedding specifications.
As far as coding goes, I wrote the code for Tracker to check for and extract metadata from XMP sidecar files. XMP is parsed by Hubert’s XMP library. The timing of Adobe’s release of their XMP Toolkit and Hubert’s subsequent release of Exempi 1.99.x has been an early boon to the project. The ‘license’ tag in the CC namespace is the only metadata extracted at the moment.
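To show roughly what extracting that one tag involves, here is a Python sketch that uses the standard library's ElementTree in place of Exempi. It looks for a `license` property in the CC namespace, written either as an element or as an attribute on `rdf:Description`. The namespace URI, the function name, and the sample packet are assumptions for illustration, not taken from the Tracker code.

```python
import xml.etree.ElementTree as ET

NS_RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS_CC = "http://creativecommons.org/ns#"  # assumed CC namespace URI


def license_from_xmp(xmp_text):
    """Return the license URI found in an XMP packet, or None."""
    root = ET.fromstring(xmp_text)
    # Element form: <cc:license rdf:resource="..."/> or <cc:license>...</cc:license>
    for el in root.iter(f"{{{NS_CC}}}license"):
        uri = el.get(f"{{{NS_RDF}}}resource") or (el.text or "").strip()
        if uri:
            return uri
    # Attribute form: <rdf:Description cc:license="...">
    for desc in root.iter(f"{{{NS_RDF}}}Description"):
        uri = desc.get(f"{{{NS_CC}}}license")
        if uri:
            return uri
    return None


# Hypothetical sidecar contents
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:cc="http://creativecommons.org/ns#">
   <cc:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""
```

With the sample above, `license_from_xmp(SAMPLE_XMP)` returns the Attribution 3.0 license URI; a packet without the property returns `None`.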
I’ve also been hacking the extractors of the above list of formats to determine the feasibility and processes of extracting license metadata from each.
At this point, feedback on the above would be much appreciated, and if all is well I can soon get my XMP sidecar code pushed into Tracker’s Subversion repository.
Happy hacking, indeed.