embedding

liblicense 0.8.1: The bugfixiest release ever

asheesh, December 25th, 2008

I’m greatly pleased to announce liblicense 0.8.1. Steren and Greg found a number of major issues (Greg found a consistent crasher on amd64, and Steren found a consistent crasher in the Python bindings). These issues, among some others, are fixed by the wondrous liblicense 0.8.1. I mentioned to Nathan Y. that liblicense is officially “no longer ghetto.”

The best way to enjoy liblicense is from our Ubuntu and Debian package repository, at http://mirrors.creativecommons.org/packages/. More information on what liblicense does is available on our wiki page about liblicense. You can also get it in fresh Fedora 11 packages. And the source tarball is available for download from sourceforge.net.

P.S. MERRY CHRISTMAS!

The full ChangeLog snippet goes like this:

liblicense 0.8.1 (2008-12-24):
* Cleanups in the test suite: test_predicate_rw’s path joiner finally works
* Tarball now includes data_empty.png
* Dynamic tests and static tests treat $HOME the same way
* Fix a major issue with requesting localized informational strings, namely that the first match would be returned rather than all matches (e.g., only the first license of a number of matching licenses). This fixes the Python bindings, which use localized strings.
* Add a cooked PDF example that actually works with exempi; explain why that is not a general solution (not all PDFs have XMP packets, and the XMP packet cannot be resized by libexempi)
* Add a test for writing license information to the XMP in a PNG
* Fix a typo in exempi.c
* Add basic support for storing LL_CREATOR in exempi.c
* In the case that the system locale is unset (therefore, is of value “C”), assume English
* Fix a bug with the TagLib module: some lists were not NULL-terminated
* Use calloc() instead of malloc()+memset() in read_license.c; this improves efficiency and closes a crasher on amd64
* Improve chooser_test.c so that it is not strict as to the *order* the results come back so long as they are the right licenses.
* To help diagnose possible xdg_mime errors, if we detect the hopeless application/octet-stream MIME type, fprintf a warning to stderr.
* Test that searching for unknown file types returns a NULL result rather than a segfault.
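
The locale change in the list above can be sketched in a few lines of C. This is only an illustration of the fallback rule, not the code that shipped in liblicense: the function name effective_locale and the choice of "en" as the English code are assumptions, and treating NULL, the empty string, and "POSIX" the same way as "C" is an extra assumption beyond what the ChangeLog states.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of liblicense's API.
 * Returns the locale to use when looking up localized strings.
 * An unset locale shows up as "C"; in that case assume English. */
const char *effective_locale(const char *locale)
{
    /* Assumption: a missing or empty value is treated like "C". */
    if (locale == NULL || locale[0] == '\0')
        return "en";
    /* "C" (and its alias "POSIX") means no locale was configured. */
    if (strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
        return "en";
    /* Anything else is passed through unchanged. */
    return locale;
}
```

A caller would typically feed it the current locale name, for example the string returned by setlocale(LC_MESSAGES, NULL) from <locale.h>.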

Government Information Licensing Framework

frank, December 12th, 2008

David Torpie of the Office of Economic and Statistical Research at the Queensland Treasury gave a talk on “Government Information Licensing Framework: a multidisciplinary project improving access to Public Sector Information.” This is a project to give greater access to Australian government data, to make government more transparent, and in doing so to develop a standard set of terms and conditions that are broadly applicable to other government contexts.

David first answered the question of why the Australian government even needs to worry about licensing its works. In Australia, unlike the United States, the government has copyright over works produced by government agencies. Australian copyright law also extends to less-than-creative works (such as telephone directories), which makes it all the more important that public licensing be clear and simple.

The solution developed at the Queensland Treasury is “digital license management”, or DLM. DLM is a technology, developed in Java, for embedding license metadata into documents and other works. Benefits include ease of linking from data to license, and finding information based on its license. DLM was developed before a suitable alternative was available; liblicense now provides similar functionality in C. The team developing DLM is collaborating with CC’s tech team, and initial indications are that a dedicated Java tool may prove very useful.

Sidecar XMP and License Extractors in Tracker

jakin, July 10th, 2007

Tracker has accepted my patches to read XMP sidecar files, as well as patches to extract licenses from MS Office (old format), TIFF, HTML, PNG, and PDF. This support will be available in the 0.6 release, which will potentially be released later this week.

My final set of patches will additionally add support for extracting licenses from JPEG, SVG, and OpenOffice’s OASIS. Also, through GStreamer, Tracker already recognizes licenses of Vorbis and FLAC.

This marks the half-way point of Summer of Code 2007.

More Summer of Code — Liblicense, Tracker, Beagle,…

jakin, June 30th, 2007

Let’s see, where am I at.

Code in GStreamer to read the license URI is getting pushed through. Now there’s Bug #451939, which updates the GStreamer API with license and copyright URI tags. When this all gets pushed through, access to the license URI will be available through GST_TAG_LICENSE_URI and/or GST_TAG_COPYRIGHT_URI.

In Tracker, I’ve written code to handle generic indexing of embedded/sidecar XMP. Previously it just extracted the license, and now any elements can be pulled out and indexed. Currently, Dublin Core and CC elements are indexed. The code is still local, and yet to be committed.

In another direction, I’ve been lending a hand to liblicense. As mentioned in Scott’s previous post, I’ve got two i/o modules ready. Both are based on Exempi. One reads/writes license metadata directly into Quicktime, AVI, PDF, PNG, TIFF, and JPEG formats. The other reads/writes sidecar XMP for any format. There’s more to come.

I also want to look into a liblicense config module and frontend for KDE4. I figure I can put my KDE programming experience to good use.

And in yet another direction, I’m looking into indexing licenses in Beagle. After browsing the code, I can adapt most of what I learned about license metadata while working with Tracker to extending Beagle. I even notice that their image format filters already support extracting XMP, so adding the extra license checks is straightforward. A preliminary patch and request for feedback have been posted on their mailing list.

All in all, I’ve done some work here and there, for this project and that…

Liblicense is alpha!

tannewt, June 29th, 2007

Over the last week a lot of progress has been made on liblicense. Yesterday Jason and I got the module_read and module_write functions working with a stub io module and an XMP sidecar module. Tuesday and Wednesday I got the library’s system license functions working. Today I did some memory leak plugging and wrote out the system default functions. Nearly every part of the library works as planned. While it’s still rough, the bulk of the library work is done.

The most common data structure I’ve been using is a null-terminated list (really an array) of strings (char*). Yesterday I wrote out some common methods to be shared throughout the library. These are in list.c. My hope is that these common functions will allow the other code to be cleaner. Next week I plan on fixing up system_licenses.c to use the list functions. At the moment it is the largest, ugliest and leakiest of all the files. That will all be fixed Monday.
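
For readers who haven’t met the pattern, here is a minimal sketch of a NULL-terminated list of strings in C. It is not the contents of list.c: the strlist_* names are hypothetical, chosen only for this illustration, and the real helpers may look quite different.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for illustration; not liblicense's list.c API. */

/* Create an empty list: a single slot holding the NULL terminator. */
char **strlist_new(void)
{
    char **list = malloc(sizeof(char *));
    if (list != NULL)
        list[0] = NULL;
    return list;
}

/* Count entries by walking until the NULL terminator. */
size_t strlist_length(char **list)
{
    size_t n = 0;
    assert(list != NULL);
    while (list[n] != NULL)
        n++;
    return n;
}

/* Append a copy of item and re-terminate the list.
 * Returns the (possibly moved) list, or NULL on allocation failure,
 * in which case the original list is left untouched and still valid. */
char **strlist_append(char **list, const char *item)
{
    size_t n = strlist_length(list);
    char **grown;
    char *copy = malloc(strlen(item) + 1);

    if (copy == NULL)
        return NULL;
    strcpy(copy, item);

    /* Room for the old entries, the new one, and the terminator. */
    grown = realloc(list, (n + 2) * sizeof(char *));
    if (grown == NULL) {
        free(copy);
        return NULL;
    }
    grown[n] = copy;
    grown[n + 1] = NULL;
    return grown;
}

/* Free every string, then the array itself. */
void strlist_free(char **list)
{
    size_t i;
    if (list == NULL)
        return;
    for (i = 0; list[i] != NULL; i++)
        free(list[i]);
    free(list);
}
```

The appeal of the layout is that callers need no separate length field; the cost is that every function that builds a list has to remember the terminator, or the next reader walks off the end of the array.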

After the code cleanup on Monday, the much more exciting task of creating modules and clients of the library begins. We’d like to support embedding in as many file formats as possible. Without this ability, the license tracking only works locally. One of the most useful libraries so far is Exempi, which can embed in a number of formats. Jason wrote an Exempi liblicense module yesterday. On my list of clients to do is a Gnome Control Panel system default, Nautilus license select, Sugar license select and Creative Commons default license chooser. Am I missing anything important? Where could licenses be integrated besides this? Perhaps Amarok or an equivalent? ccHost? Let me know what you think.

Indexing License Metadata in Tracker, Week 2

jakin, June 19th, 2007

I’ve made progress extracting licenses from the following formats: Vorbis, MP3, FLAC, PDF, JPEG, TIFF, PNG, HTML, and MSOffice. They are by no means all done, but for several formats I have patches and am awaiting approval from Tracker.

I’ve written a GStreamer bug report and submitted a patch to allow reading the WCOP (License URI) id3v2 tag. Discussion continues there.
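
To show where that tag lives, here is a rough standalone sketch that looks for a WCOP frame in an ID3v2.3 or ID3v2.4 tag held in memory. It is not the GStreamer patch and uses no GStreamer API; the function name id3v2_find_wcop is made up for this example, and it deliberately gives up on tags that use unsynchronisation or an extended header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode a "synchsafe" integer: four bytes with 7 useful bits each. */
static unsigned long synchsafe32(const unsigned char *p)
{
    return ((unsigned long)(p[0] & 0x7f) << 21) |
           ((unsigned long)(p[1] & 0x7f) << 14) |
           ((unsigned long)(p[2] & 0x7f) << 7) |
           (unsigned long)(p[3] & 0x7f);
}

/* Decode a plain 32-bit big-endian integer. */
static unsigned long be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

/* Hypothetical helper: copy the URL from the first WCOP frame into out.
 * Returns 1 if a WCOP frame was found, 0 otherwise. */
int id3v2_find_wcop(const unsigned char *buf, size_t len,
                    char *out, size_t out_size)
{
    size_t pos = 10; /* frames start right after the 10-byte tag header */
    size_t tag_end;
    unsigned char major;

    assert(buf != NULL && out != NULL);
    if (len < 10 || out_size == 0 || memcmp(buf, "ID3", 3) != 0)
        return 0;
    major = buf[3];
    if (major != 3 && major != 4)
        return 0; /* v2.2 uses 3-character frame IDs; not handled here */
    if (buf[5] & 0xc0)
        return 0; /* unsynchronisation or extended header: not handled */

    tag_end = 10 + (size_t)synchsafe32(buf + 6);
    if (tag_end > len)
        tag_end = len;

    /* Frame header: 4-byte ID, 4-byte size, 2 flag bytes.
     * A zero byte where an ID should be means padding has started. */
    while (tag_end - pos >= 10 && buf[pos] != 0) {
        /* v2.4 frame sizes are synchsafe; v2.3 sizes are plain. */
        unsigned long fsize = (major == 4) ? synchsafe32(buf + pos + 4)
                                           : be32(buf + pos + 4);
        if (fsize > tag_end - pos - 10)
            return 0; /* frame claims more data than the tag holds */

        if (memcmp(buf + pos, "WCOP", 4) == 0) {
            /* URL frames carry the URL directly, with no encoding byte. */
            size_t n = fsize < out_size - 1 ? (size_t)fsize : out_size - 1;
            memcpy(out, buf + pos + 10, n);
            out[n] = '\0';
            return 1;
        }
        pos += 10 + (size_t)fsize;
    }
    return 0;
}
```

A real extractor would lean on an existing ID3 parser rather than this; the point is only that the license URI sits in an ordinary, easily located frame.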

No luck with video metadata (AVI, Matroska, OGM, Quicktime). Things are just too ad-hoc in that arena to get anything worthwhile done. For Tracker, GStreamer is doing all the work on extracting video metadata, but as far as I can tell, nothing relating to licenses ever gets extracted and passed on to Tracker. GStreamer would need to be updated to read the tags, but that can’t be done unless there are consistent specs on how to do so. Exempi can embed XMP into MOV and AVI, but I don’t know how to get it back out. It may or may not be feasible to write an extractor that only extracts XMP using Exempi.

Information on various file formats’ metadata is available at http://wiki.creativecommons.org/Tracker_CC_Indexing. While Tracker won’t specifically be indexing every format mentioned, I’m trying to document the formats relevant to Creative Commons. If I’m missing any important formats, please let me know.

Overall, things are progressing well. At the rate things are going, by the end of the summer I’ll have become a manual for file format specifications :-/

Cheers

Indexing License Metadata in Tracker, Week 1

jakin, June 12th, 2007

Week 1 of Google Summer of Code is complete and already I’m seeing much progress. There’s a mess of formats to embed licenses into and a mess of ways to embed them. My first task has been straightening out where licenses are embedded in each format and how exactly to go about extracting them. Here’s where I’m at:

Format: MP3
  Form of metadata:
    • XMP
    • Native id3 tags
  Location of metadata:
    • For id3v24, the PRIV,XMP field
    • WCOP tag
  Extraction with Tracker: Extracting MP3 tags has moved from an ID3 parser to handing off the work to GStreamer/MPlayer/Totem. As far as I can tell, this prevents me from extracting the XMP.
  Test content: XMP embedded with Exempi

Format: PDF
  Form of metadata: XMP
  Location of metadata: metadata field
  Extraction with Tracker: Extend the current PDF extractor (which uses Poppler) to read the metadata field. Reading the metadata field isn’t wrapped in Poppler’s glib bindings, but I have written and submitted a patch.
  Test content: XMP embedded with Exempi

Format: OGG
  Form of metadata:
    • XMP
    • Native comment field
  Location of metadata:
    • XMP comment field
    • LICENSE comment field
  Extraction with Tracker: Extend the GStreamer extractor to check for the presence of an XMP comment field. GStreamer places this within the EXTENDED_COMMENTS tag (requires GStreamer 0.10.10).
  Test content: XMP embedded with vorbiscomment

Format: JPEG
  Form of metadata: XMP
  Location of metadata: Exif XML Packet field
  Extraction with Tracker: Extend the Imagemagick extractor, using ‘convert file.jpg xmp:-’ to read XMP
  Test content: XMP embedded with Exempi

Format: PNG
  Form of metadata: XMP
  Location of metadata: iTXt, XML:com:adobe:xmp field
  Extraction with Tracker: Extend the PNG extractor, adding a check for XML:com:adobe:xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.)
  Test content: XMP embedded with Exempi

Format: HTML
  Form of metadata: RDFa
  Location of metadata: <a rel="license" href="…"></a>
  Extraction with Tracker: Write a new HTML extractor, using libxml2, and scan for RDFa
  Test content: Various actual sites, including creativecommons.org

Format: SVG
  Form of metadata: RDF
  Location of metadata: /svg/metadata/rdf
  Extraction with Tracker: I could specifically parse the XML, checking for the RDF schema used by Inkscape. Should I check for XMP also???
  Test content: Inkscape

Format: Any XML
  Form of metadata: XMP
  Location of metadata: Wherever valid
  Extraction with Tracker: Write a generic XML extractor (and/or extractor for each particular format), scanning with libxml2

Format: OpenOffice.org (OASIS)
  Form/location of metadata: OO.org CC License Add-In SoC Project is working on the spec
  Test content: OO.org Add-In

Format: MS Office
  Form/location of metadata: DocumentSummaryInformation Infile, CreativeCommons_LicenseURL property
  Extraction with Tracker: Extend existing msoffice extractor
  Test content: MSOffice Add-in
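
To make the PNG entry concrete, here is a rough sketch of finding the XMP packet by walking the PNG chunks by hand. It is not the Tracker extractor and does not use libpng: the function name png_find_xmp is invented for this example, it works on a file already loaded into memory, it does not verify chunk CRCs, and it only handles an uncompressed iTXt chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define XMP_KEYWORD "XML:com:adobe:xmp"

/* Read a 32-bit big-endian integer, the byte order PNG uses. */
static unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

/* Find the first NUL byte in [p, end); returns NULL if there is none. */
static const unsigned char *find_nul(const unsigned char *p,
                                     const unsigned char *end)
{
    return p < end ? memchr(p, 0, (size_t)(end - p)) : NULL;
}

/* Hypothetical helper: return a pointer to the XMP packet inside buf and
 * store its length in *xmp_len, or return NULL if no packet is found.
 * The packet is not NUL-terminated. */
const unsigned char *png_find_xmp(const unsigned char *buf, size_t len,
                                  size_t *xmp_len)
{
    static const unsigned char sig[8] =
        {0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'};
    const size_t klen = sizeof(XMP_KEYWORD); /* keyword plus its NUL */
    size_t pos = 8;

    assert(buf != NULL && xmp_len != NULL);
    if (len < 8 || memcmp(buf, sig, 8) != 0)
        return NULL;

    /* Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC. */
    while (len - pos >= 12) {
        unsigned long dlen = read_be32(buf + pos);
        const unsigned char *type = buf + pos + 4;
        const unsigned char *data = buf + pos + 8;
        const unsigned char *end, *p;

        if (dlen > len - pos - 12)
            return NULL; /* chunk runs past the end of the buffer */
        end = data + dlen;

        /* iTXt data: keyword\0, compression flag, compression method,
         * language tag\0, translated keyword\0, then the text. */
        if (memcmp(type, "iTXt", 4) == 0 && dlen >= klen + 2 &&
            memcmp(data, XMP_KEYWORD, klen) == 0 &&
            data[klen] == 0 /* flag 0 = uncompressed text */) {
            p = find_nul(data + klen + 2, end);  /* end of language tag */
            if (p != NULL)
                p = find_nul(p + 1, end);        /* end of translated keyword */
            if (p != NULL) {
                *xmp_len = (size_t)(end - (p + 1));
                return p + 1;
            }
        }
        pos += 12 + (size_t)dlen;
    }
    return NULL;
}
```

The returned bytes could then be handed to an XMP parser; going through libpng or Exempi instead has the advantage of CRC checking and of coping with the cases this sketch skips.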

If this is all well and good, I’d like to help update the CC Wiki with updated embedding specifications.

As far as coding goes, I wrote the code for Tracker to check for and extract metadata from XMP sidecar files. XMP is parsed by Hubert’s XMP library. The timing of Adobe’s release of their XMP Toolkit, and Hubert’s subsequent release of Exempi 1.99.x, has been an early boon to the project. The ‘license’ tag in the CC namespace is the only metadata extracted at the moment.

I’ve also been hacking the extractors of the above list of formats to determine the feasibility and processes of extracting license metadata from each.

Where I stand now: feedback on the above would be much appreciated, and if all is well I can soon get my XMP sidecar code pushed into Tracker’s Subversion repository.

Happy hacking, indeed.

More XMP Toolkit Plugs

rejon, May 13th, 2007

In a follow-up to Mike’s post about XMP, I (through CC) have been working with Adobe XMP’s product manager, Gunar Penikis, on how CC and Adobe can work together on XMP. Also, in the same line, I’m friends with and working with Cyrille Berger and Hubert Figuiere, who have each noted how positive a step releasing the XMP SDK/Toolkit under a BSD license is for the larger community.

I’m having some other discussions with all the above-mentioned folks about how this is going to pan out, but all I can say is that it is going to encourage XMP to flourish, and in turn help smooth out metadata and embedding across the board.

This really frees up the space for more developments.
