metadata
New validator released!
asheesh, January 6th, 2009
This past summer, Hugo Dworak worked with us (thanks to Google Summer of Code) on a new validator. This work was greatly overdue, and we are very pleased that Google could fund Hugo to work on it. Our previous validator had not been updated to reflect our new metadata standards, so we disabled it some time ago to avoid creating further confusion. The textbook on CC metadata is the “Creative Commons Rights Expression Language”, or ccREL, which specifies the use of RDFa on the web. (If this sounds like keyword soup, rest assured that the License Engine generates HTML that you can copy and paste; that HTML is fully compliant with ccREL.) We hoped Hugo’s work on a new validator would let us offer a validator to the Creative Commons community so that publishers can test their web pages to make sure they encode the information they intended.
Hugo’s work was a success; in August 2008 he announced a test version of the validator. He built on top of the work of others: the new validator uses the Pylons web framework, html5lib for HTML parsing and tokenizing, and RDFlib for working with RDF. He shared his source code under AGPLv3, the recent free software license built for network services.
So I am happy to announce that the test period is complete, and we are now running the new code at http://validator.creativecommons.org/. Our thanks go out to Hugo, and we look forward to the new validator gaining some use as well as hearing your feedback. If you want to contribute to the validator’s development or check it out for any reason, take a look at the documentation on the CC wiki.
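To give a feel for what "testing a page" means, here is a minimal sketch using only the Python standard library. It is not the validator's code (which uses html5lib and RDFlib) and it is not a full RDFa processor: it only looks for links whose rel attribute contains "license". The sample markup and the names LicenseLinkFinder and find_license_links are made up for illustration.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect href values of <a>/<link> elements whose rel includes 'license'."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel is a space-separated token list; a bare attribute has value None
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "license" in rel_tokens and attributes.get("href"):
            self.licenses.append(attributes["href"])


def find_license_links(html_text):
    """Return every license URL declared in the given HTML text."""
    finder = LicenseLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.licenses


# Illustrative markup only; not copied from the License Engine.
SAMPLE = (
    '<p>This work is licensed under a '
    '<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">'
    'Creative Commons Attribution 3.0 License</a>.</p>'
)

print(find_license_links(SAMPLE))
```

A page with no rel="license" link yields an empty list, which is the kind of "you did not encode what you intended" signal a validator can report.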
liblicense 0.8.1: The bugfixiest release ever
asheesh, December 25th, 2008
I’m greatly pleased to announce liblicense 0.8.1. Steren and Greg found a number of major issues (Greg found a consistent crasher on amd64, and Steren found a consistent crasher in the Python bindings). These issues, among some others, are fixed by the wondrous liblicense 0.8.1. I mentioned to Nathan Y. that liblicense is officially “no longer ghetto.”
The best way to enjoy liblicense is from our Ubuntu and Debian package repository, at http://mirrors.creativecommons.org/packages/. More information on what liblicense does is available on our wiki page about liblicense. You can also get it in fresh Fedora 11 packages. And the source tarball is available for download from sourceforge.net.
P.S. MERRY CHRISTMAS!
The full ChangeLog snippet goes like this:
liblicense 0.8.1 (2008-12-24):
* Cleanups in the test suite: test_predicate_rw’s path joiner finally works
* Tarball now includes data_empty.png
* Dynamic tests and static tests treat $HOME the same way
* Fix a major issue with requesting localized informational strings, namely that the first match would be returned rather than all matches (e.g., only the first license of a number of matching licenses). This fixes the Python bindings, which use localized strings.
* Add a cooked PDF example that actually works with exempi; explain why that is not a general solution (not all PDFs have XMP packets, and the XMP packet cannot be resized by libexempi)
* Add a test for writing license information to the XMP in a PNG
* Fix a typo in exempi.c
* Add basic support for storing LL_CREATOR in exempi.c
* In the case that the system locale is unset (therefore, is of value “C”), assume English
* Fix a bug with the TagLib module: some lists were not NULL-terminated
* Use calloc() instead of malloc()+memset() in read_license.c; this improves efficiency and closes a crasher on amd64
* Improve chooser_test.c so that it is not strict as to the *order* the results come back so long as they are the right licenses.
* To help diagnose possible xdg_mime errors, if we detect the hopeless application/octet-stream MIME type, fprintf a warning to stderr.
* Test that searching for unknown file types returns a NULL result rather than a segfault.
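Two of the fixes above (the "C" locale fallback and returning all matches instead of only the first) are easy to picture in a few lines. This is a Python sketch of the idea only; liblicense itself is C, and the catalogue data and function names below are invented for the example rather than taken from the library.

```python
# Hypothetical catalogue rows: (license URI, language, localized name).
CATALOGUE = [
    ("http://creativecommons.org/licenses/by/3.0/", "en", "Attribution"),
    ("http://creativecommons.org/licenses/by-sa/3.0/", "en", "Attribution-ShareAlike"),
    ("http://creativecommons.org/licenses/by/3.0/", "de", "Namensnennung"),
]


def effective_language(system_locale):
    """Treat an unset or "C" locale as English, as the changelog describes."""
    if not system_locale or system_locale == "C":
        return "en"
    # "de_DE.UTF-8" -> "de"
    return system_locale.split(".")[0].split("_")[0]


def first_match_only(language):
    """The old, buggy behaviour: stop at the first hit."""
    for _uri, lang, name in CATALOGUE:
        if lang == language:
            return [name]
    return []


def all_matches(language):
    """The fixed behaviour: return every matching entry."""
    return [name for _uri, lang, name in CATALOGUE if lang == language]


language = effective_language("C")
print(first_match_only(language))
print(all_matches(language))
```

The chooser_test.c change has the same flavour: a test that compares sorted(results) against sorted(expected) checks that the right licenses came back without caring about their order.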
Government Information Licensing Framework
Frank Tobia, December 12th, 2008
David Torpie of the Office of Economic and Statistical Research at the Queensland Treasury gave a talk on “Government Information Licensing Framework: a multidisciplinary project improving access to Public Sector Information.” This is a project to give greater access to Australian government data, to make government more transparent, and in doing so to develop a standard set of terms and conditions that are broadly applicable to other government contexts.
David first answered the question of why the Australian government even needs to worry about licensing its works. In Australia, unlike the United States, the government has copyright over works produced by government agencies. Australian copyright law also extends to less-than-creative works (such as telephone directories), which makes it all the more important that public licensing be clear and simple.
The solution developed at the Queensland Treasury is “digital license management”, or DLM. DLM is a technology, developed in Java, to embed license metadata into documents and other works. Benefits include ease of linking from data to license, and finding information based on its license. DLM was developed before a suitable alternative was available; liblicense now provides similar functionality in C. The team developing DLM is collaborating with CC’s tech team, and initial indications are that a dedicated Java tool may prove very useful.
liblicense 0.8 (important) fixes RDF predicate error
asheesh, July 30th, 2008
Brown paper bag release: liblicense claims that the RDF predicate for a file’s license is http://creativecommons.org/ns#License rather than http://creativecommons.org/ns#license. Only the latter is correct.
Any code compiled with liblicense between 0.6 and 0.7.1 (inclusive) contains this mistake.
This time I have audited the library for other insanities like the one fixed here, and there are none. Great thanks to Nathan Yergler for spotting this. I took this chance to change ll_write() and ll_read() to *NOT* take NULL as a valid predicate; this makes the implementation simpler (and more correct).
Sadly, I have bumped the API and ABI numbers accordingly. It’s available in SourceForge at http://sf.net/projects/cctools, and will be uploaded to Debian and Fedora shortly (and will follow from Debian to Ubuntu).
I’m going to head to Argentina for a vacation and Debconf shortly, so there’ll be no activity from me on liblicense for a few weeks. I would love help with liblicense in the form of further unit tests. Let’s squash those bugs by just demonstrating all the cases the license should work in.
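For anyone wondering why one capital letter matters: RDF predicates are URIs, and URIs are compared as exact strings. In the CC namespace, ns#License names the class of licenses while ns#license is the property that links a work to its license. The Python sketch below illustrates this with plain tuples; it is not liblicense's API, the file path is hypothetical, and the None check only mirrors the spirit of ll_read() and ll_write() no longer accepting NULL.

```python
WRONG_PREDICATE = "http://creativecommons.org/ns#License"
RIGHT_PREDICATE = "http://creativecommons.org/ns#license"

SUBJECT = "file:///tmp/song.ogg"  # hypothetical file
LICENSE = "http://creativecommons.org/licenses/by/3.0/"

# What an affected build would have recorded: (subject, predicate, object).
TRIPLES = [(SUBJECT, WRONG_PREDICATE, LICENSE)]


def licenses_of(triples, subject, predicate):
    """Return the objects of all triples matching subject and predicate."""
    if predicate is None:
        # No implicit default: the caller must say which predicate it means.
        raise ValueError("predicate must be given explicitly")
    return [o for s, p, o in triples if s == subject and p == predicate]


# A consumer asking for the correct predicate finds nothing.
print(licenses_of(TRIPLES, SUBJECT, RIGHT_PREDICATE))
# The data is only visible under the wrong name.
print(licenses_of(TRIPLES, SUBJECT, WRONG_PREDICATE))
```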
License-oriented metadata validator and viewer: libvalidator
Hugo Dworak, July 8th, 2008
As the Google Summer of Code 2008 midterm evaluation deadline is approaching, it is a good time to report on the progress of the license-oriented metadata validator and viewer.
The source code is located in two dedicated git repositories. The first is validator, which contains the source code of the Web application based on Pylons and Genshi. The second is libvalidator, which hosts the files that constitute the core library that the project will utilise. This is the component that the development focuses on right now.
The purpose of the aforementioned library is to parse input files, scan them for relevant license information, and output the results in a machine-readable fashion. More precisely, its workflow is the following: parse the file and associated RDF information so that a complete set of RDF data is available, filter the results with regard to license information (not only related to the document itself, but also to other objects described within it), and return the results in a manner preferable for the usage by the Web application.
pyRdfa seems to be the best tool for the parsing stage so far. It handles the current recommendation for embedding license metadata (namely RDFa) as well as other non-deprecated methods: linking to external or embedded (using the “data” URL scheme) RDF files and utilising the Dublin Core. Its one significant gap is the invalid direct embedding of RDF/XML within the HTML/XHTML source code (as an element or in a comment), which it does not handle. This is resolved by first capturing all such instances using a regular expression and then parsing the data just like external RDF/XML files.
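The capture-then-parse workaround can be sketched as follows. This is a simplified illustration using only the Python standard library, not libvalidator's actual code: the regular expression assumes the prefix is literally "rdf", the sample page is invented, and the function names are hypothetical. It also reads only the simple cc:license pattern rather than doing general RDF/XML processing.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC_NS = "http://creativecommons.org/ns#"

# Non-greedy match of one <rdf:RDF ...> ... </rdf:RDF> block; DOTALL lets it
# span lines. It matches inside HTML comments as well as inline elements.
RDF_BLOCK = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def extract_rdf_blocks(html_text):
    """Return every embedded RDF/XML block found in the page source."""
    return RDF_BLOCK.findall(html_text)


def license_statements(rdf_xml):
    """Return (subject, license URI) pairs from one RDF/XML block."""
    root = ET.fromstring(rdf_xml)
    results = []
    for node in root:
        subject = node.get("{%s}about" % RDF_NS, "")
        for child in node:
            if child.tag == "{%s}license" % CC_NS:
                results.append((subject, child.get("{%s}resource" % RDF_NS)))
    return results


# Invented example page with RDF/XML hidden in a comment.
PAGE = """<html><body>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cc="http://creativecommons.org/ns#">
  <cc:Work rdf:about="http://example.org/photo">
    <cc:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
  </cc:Work>
</rdf:RDF>
-->
</body></html>"""

for block in extract_rdf_blocks(PAGE):
    print(license_statements(block))
```

Each captured block carries its own namespace declarations on the rdf:RDF element, which is why it can be handed to an XML parser exactly as if it had come from an external file.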
Once the RDF triples are extracted, one can use SPARQL to narrow the results just to the triples related to the licensed objects. Both librdf and rdflib support this language. Moreover, the RDF/XML related to the license must be parsed, so that its conditions (permissions, requirements, and restrictions) are then presented to the user.
The library takes advantage of standard Python tools such as Buildout and nose. When it is completed, the project will be all about writing a Web application that will serve as an interface to libvalidator.
Juggling Metadata at CC
Nathan Yergler, July 3rd, 2008
It’s the day before a long weekend and inspiration hits — RDFa and ccREL are exactly like juggling!
In other news of the weird, I was actually able to use PiTiVi to trim the video up after shooting it on my phone. Cool.
RDFa for Semantic MediaWiki [GSoC 2008]
David McCabe, July 1st, 2008
Hello, world!
My name is David McCabe, and this summer I am adding RDFa support to Semantic MediaWiki, as part of the Google Summer of Code 2008. I am an undergraduate in Mathematics at Portland State University. For the Google Summer of Code 2006, I wrote Liquid Threads, a MediaWiki extension that replaces talk pages with a threaded discussion system.
Semantic MediaWiki (SMW) is the software used for the CC wiki and many other wikis. SMW allows authors to mark up wiki pages so that their contents and relationships are machine-readable. SMW already publishes this machine-readable data in RDF/XML format.
You can read about RDFa on the CC Wiki. There is also a Google Tech Talk on RDFa.
License-oriented metadata validator and viewer: the development has just started
Hugo Dworak, May 26th, 2008
Creative Commons participates in Google Summer of Code™ and has accepted a proposal (see the abstract) from Hugo Dworak, based on CC’s description of a task to rewrite its now-defunct metadata validator. Asheesh Laroia has been assigned as the mentor of the project. The work began on May 26th, 2008 as per the project timeline. It is expected to be completed in twelve weeks. More details will be provided in the dedicated CC Wiki article and progress will be featured weekly on this blog.
The project focuses on developing an on-line tool (free software written in Python) to validate digitally embedded Creative Commons licenses within files of different types. Files will be pasted directly into a form, identified by a URL, or uploaded by a user. The application will present the results in a human-readable fashion and notify the user if the means used to express the license terms are deprecated.
Metadata work of interest
Mike Linksvayer, August 3rd, 2007
Some of these could turn out to be useful for describing licensed content on the web; all are rather interesting.
hAudio proposed microformat.
Proposed hAudio to RDFa mapping.
RDFa-deployed Multimedia Metadata (ramm.x) may be an effort to map and standardize use of existing and upcoming media description standards in RDFa … I had to skim “ramm.x in 10 sec” and “what ramm.x is NOT” a few times to gather that, but the key description on that page seems to be:
- Does ramm.x replace RDF-based multimedia vocabularies, as, e.g., the Music Ontology Specification?
- No! ramm.x aims at bringing existing formats, as MPEG-7 and the like, into the Semantic Web. It acts as a bridge using a certain formalisation of an existing vocabulary.
Getting a bit more esoteric, Protocol for Web Description Resources (POWDER):
facilitates the publication of descriptions of multiple resources such as all those available from a Web site.
Which is a bit of an understatement.
CC OpenOffice Addin Update: Added Impress functionality!
ksiomelo, July 24th, 2007
Hello everyone!
I’ve made some changes to the addin and made it possible to put licenses in Impress documents. The entire architecture has changed and I’m working on this right now. The newer version is in the CC sourceforge repository!
