rdf
New validator released!
asheesh, January 6th, 2009
This past summer, Hugo Dworak worked with us (thanks to Google Summer of Code) on a new validator. This work was greatly overdue, and we are very pleased that Google could fund Hugo to work on it. Our previous validator had not been updated to reflect our new metadata standards, so we disabled it some time ago to avoid creating further confusion. The textbook on CC metadata is the “Creative Commons Rights Expression Language”, or ccREL, which specifies the use of RDFa on the web. (If this sounds like keyword soup, rest assured that the License Engine generates HTML that you can copy and paste; that HTML is fully compliant with ccREL.) We hoped Hugo’s work on a new validator would let us offer a validator to the Creative Commons community so that publishers can test their web pages to make sure they encode the information they intended.
Hugo’s work was a success; in August 2008 he announced a test version of the validator. He built on top of the work of others: the new validator uses the Pylons web framework, html5lib for HTML parsing and tokenizing, and RDFLib for working with RDF. He shared his source code under the recent free software license built for network services, AGPLv3.
So I am happy to announce that the test period is complete, and we are now running the new code at http://validator.creativecommons.org/. Our thanks go out to Hugo, and we look forward to the new validator gaining some use as well as hearing your feedback. If you want to contribute to the validator’s development or check it out for any reason, take a look at the documentation on the CC wiki.
License-oriented metadata validator and viewer: summertime is winding up
Hugo Dworak, August 16th, 2008
Google Summer of Code 2008 is approaching its end: less than forty-eight hours are left to submit the code that the mentors will then evaluate. It is therefore fitting to pause for a moment, sum up the work that has been done on the license-oriented metadata validator and viewer, and compare it with the original proposal for the project.
A Web application capable of parsing and displaying license information embedded in both well-formed and ill-formed Web pages has been developed. It supports the following means of embedding license information: Dublin Core metadata, RDFa, RDF/XML linked externally or embedded (utilising the data URL scheme) using the link and a elements, and RDF/XML embedded in a comment or as an element (the last two being deprecated). This functionality has been proven by unit testing. A user can upload or paste the source code of a Web page, or provide a URI for the Web application to analyse. The software has been written in Python and uses the Pylons Web Framework and the Genshi toolkit. Should you be willing to test this Lynx-friendly application, please visit its Web site.
The Web application itself uses a library called “libvalidator”, which in turn is powered by cc.license (a library developed by Creative Commons that returns information about a given license), pyRdfa (a distiller that generates the RDF triples from an (X)HTML+RDFa file), html5lib (an HTML parser/tokenizer), and RDFLib (a library for working with RDF). The choice of this set of tools was not obvious, and the library underwent several redesigns, which included removing the code that employed encutils, XML canonicalization, µTidylib, and BeautifulSoup. The idea of using librdf, librdfa, and rdfadict was abandoned. The source code of both the Web application (licensed under the GNU Affero General Public License version 3 or newer) and its core library (licensed under the GNU Lesser General Public License version 3 or newer) is available through the Git repositories of Creative Commons.
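To give a feel for the simplest of these cases, here is a minimal sketch that scans a page for a and link elements whose rel attribute contains "license". It is not the validator's code (which relies on pyRdfa, html5lib and RDFLib); it uses only Python's standard html.parser, and the names LicenseLinkFinder, find_license_links and SAMPLE_PAGE are made up for this illustration.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect href values of <a>/<link> elements whose rel includes "license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel holds space-separated tokens; an attribute without a value is None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_tokens and href:
            self.licenses.append(href)


def find_license_links(html_text):
    """Return the license URIs found in html_text, in document order."""
    finder = LicenseLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.licenses


# Hypothetical page: the unclosed <p> elements show that ill-formed
# markup does not stop the scan.
SAMPLE_PAGE = """
<p>This page is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution 3.0 License</a>.
<p>Another paragraph that is never closed.
"""

print(find_license_links(SAMPLE_PAGE))
```

A real RDFa distiller does much more than this (namespaces, subjects, the property attribute, and so on); the sketch only shows why a tolerant parser matters when pages are not well-formed.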
In contrast to the contents of the original proposal, the following goals have not been met: traversal of special links, syndication feeds parsing, statistics, and cloning the layout of the Creative Commons Web site. However, these were never mandatory requirements for the Web application. It is also worth noting that the software has been written from scratch, although a now-defunct metadata validator existed. Nevertheless, the development does not end with Google Summer of Code — these and several new features (such as validation of multimedia files via liblicense and support for different language versions) are planned to be added, albeit at a slower pace.
After the test period, the validator will be available under http://validator.creativecommons.org/.
liblicense 0.8 (important) fixes RDF predicate error
asheesh, July 30th, 2008
Brown paper bag release: liblicense claims that the RDF predicate for a file’s license is http://creativecommons.org/ns#License rather than http://creativecommons.org/ns#license. Only the latter is correct.
Any code compiled with liblicense between 0.6 and 0.7.1 (inclusive) contains this mistake.
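Why a single capital letter matters: URIs are compared character by character, so a consumer looking for the correct predicate never sees a triple written with the wrong one. The sketch below illustrates this with plain tuples; it does not use liblicense, and the file URI and helper name licenses_of are invented for the example.

```python
CC_NS = "http://creativecommons.org/ns#"

CORRECT_PREDICATE = CC_NS + "license"  # the predicate for a file's license
WRONG_PREDICATE = CC_NS + "License"    # what liblicense 0.6 through 0.7.1 claimed


def licenses_of(triples, work, predicate=CORRECT_PREDICATE):
    """Return the objects of all (work, predicate, object) triples."""
    return [o for (s, p, o) in triples if s == work and p == predicate]


work = "http://example.org/song.ogg"  # hypothetical file URI
by = "http://creativecommons.org/licenses/by/3.0/"

written_by_old_library = [(work, WRONG_PREDICATE, by)]
written_by_fixed_library = [(work, CORRECT_PREDICATE, by)]

# A consumer asking for the correct predicate misses the old metadata.
print(licenses_of(written_by_old_library, work))
print(licenses_of(written_by_fixed_library, work))
```

The first call finds nothing, which is why metadata written by the affected versions needs to be regenerated rather than just read more leniently.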
This time I have audited the library for other insanities like the one fixed here, and there are none. Great thanks to Nathan Yergler for spotting this. I took this chance to change ll_write() and ll_read() to *NOT* take NULL as a valid predicate; this makes the implementation simpler (and more correct).
Sadly, I have bumped the API and ABI numbers accordingly. It’s available in SourceForge at http://sf.net/projects/cctools, and will be uploaded to Debian and Fedora shortly (and will follow from Debian to Ubuntu).
I’m going to head to Argentina for a vacation and Debconf shortly, so there’ll be no activity from me on liblicense for a few weeks. I would love help with liblicense in the form of further unit tests. Let’s squash those bugs by just demonstrating all the cases the license should work in.
RDFa for Semantic MediaWiki [GSoC 2008]
David McCabe, July 1st, 2008
Hello, world!
My name is David McCabe, and this summer I am adding RDFa support to Semantic MediaWiki, as part of the Google Summer of Code 2008. I am an undergraduate in Mathematics at Portland State University. For the Google Summer of Code 2006, I wrote Liquid Threads, a MediaWiki extension that replaces talk pages with a threaded discussion system.
Semantic MediaWiki (SMW) is the software used for the CC wiki and many other wikis. SMW allows authors to mark up wiki pages so that their contents and relationships are machine-readable. SMW already publishes this machine-readable data in RDF/XML format.
You can read about RDFa on the CC Wiki. There is also a Google Tech Talk on RDFa.
Metadata work of interest
Mike Linksvayer, August 3rd, 2007
Some of these could turn out to be useful for describing licensed content on the web; all are rather interesting.
hAudio proposed microformat.
Proposed hAudio to RDFa mapping.
RDFa-deployed Multimedia Metadata (ramm.x) may be an effort to map and standardize use of existing and upcoming media description standards in RDFa … I had to skim “ramm.x in 10 sec” and “what ramm.x is NOT” a few times to gather that, but the key description on that page seems to be:
- Does ramm.x replace RDF-based multimedia vocabularies, as, e.g., the Music Ontology Specification?
- No! ramm.x aims at bringing existing formats, as MPEG-7 and the like, into the Semantic Web. It acts as a bridge using a certain formalisation of an existing vocabulary.
Getting a bit more esoteric, Protocol for Web Description Resources (POWDER):
facilitates the publication of descriptions of multiple resources such as all those available from a Web site.
Which is a bit of an understatement.
Sidecar XMP and License Extractors in Tracker
Jason Kivlighn, July 10th, 2007
Tracker has accepted my patches to read XMP sidecar files, as well as patches to extract licenses from MS Office (old format), TIFF, HTML, PNG, and PDF. This support will be available in the 0.6 release, which may be released later this week.
My final set of patches will additionally add support for extracting licenses from JPEG, SVG, and OpenOffice’s OASIS. Also, through GStreamer, Tracker already recognizes licenses of Vorbis and FLAC.
This marks the half-way point of Summer of Code 2007.
Liblicense has licenses! 376 of them…
Jason Kivlighn, July 6th, 2007
Prepping for the 0.1 release, I’ve generated RDF descriptions of all CC licenses in all available jurisdictions, as well as the GPL, LGPL, and Public Domain.
Available here:
https://cctools.svn.sourceforge.net/svnroot/cctools/liblicense/trunk/licenses/
Each license, if applicable, has all the attributes laid out on the wiki, including localization. One problem, however, is getting localized descriptions of the licenses. That isn’t available at https://cctools.svn.sourceforge.net/svnroot/cctools/i18n/trunk/i18n/
Licenses were generated with this Python script, which reads the relevant information from creativecommons.org and cctools svn.
Liblicense is alpha!
Scott Shawcroft, June 29th, 2007
Over the last week a lot of progress has been made on liblicense. Yesterday Jason and I got the module_read and module_write functions working with a stub io module and an XMP sidecar module. Tuesday and Wednesday I got the library’s system license functions working. Today I did some memory leak plugging and wrote out the system default functions. Nearly every part of the library works as planned. While it’s still rough, the bulk of the library work is done.
The most common data structure I’ve been using is a null-terminated list (really an array) of strings (char*). Yesterday I wrote out some common methods to be shared throughout the library. These are in list.c. My hope is that these common functions will allow the other code to be cleaner. Next week I plan on fixing up system_licenses.c to use the list functions. At the moment it is the largest, ugliest and leakiest of all the files. That will all be fixed Monday.
After the code cleanup on Monday the much more exciting task of creating modules and clients of the library begins. We’d like to support embedding in as many file formats as possible. Without this ability, the license tracking only works locally. One of the most useful libraries so far is Exempi, which can embed in a number of formats. Jason wrote an Exempi liblicense module yesterday. On my list of clients to do is a Gnome Control Panel system default, Nautilus license select, Sugar license select and Creative Commons default license chooser. Am I missing anything important? Where could licenses be integrated besides this? Perhaps Amarok or an equivalent? ccHost? Let me know what you think.
Enhanced Metadata Graduates from Labs
Nathan Yergler, June 21st, 2007
Early this morning we launched some functionality on the main “license chooser”:http://creativecommons.org/license previously available only on “Labs”:http://labs.creativecommons.org. As many (ok, at least a few) people have noted, we previously stopped embedding RDF in the HTML generated by the chooser. As we’ve “noted”:http://wiki.creativecommons.org/Extend_Metadata#Embedding_RDF_in_HTML in the past, RDF in a comment has several drawbacks, not the least of which is that it’s opaque to parsers. The new update to the license chooser restores the embedded metadata using “RDFa”:http://rdfa.info.
As the name implies, RDFa is a way of expressing RDF using _attributes_ in the HTML. This is similar to microformats, but different in that any RDFa parser can read any RDFa information — no special knowledge required. So the new metadata once again allows you to encode the name of your work, your name, and the type of work, all in the HTML. A full example (with all fields filled in) is shown here:

CC TechBlog by
Creative Commons is licensed under a
Creative Commons Attribution 3.0 License.
Based on a work at
creativecommons.org.
Permissions beyond the scope of this license may be available at
http://creativecommons.org/policies.
So how do you know the metadata is there? Check out the “RDFa Bookmarklets”:http://www.w3.org/2006/07/SWD/RDFa/impl/js/ which demonstrate how you can expose the information using some simple JavaScript.
*UPDATE* Unfortunately WordPress MU strips out attributes it doesn’t recognize, so the example above isn’t as complete as it could be.
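Because the attributes do not survive in the rendered example, here is a rough sketch of what RDFa-annotated license markup can look like, written as a small Python function that builds the HTML. The attribute names (rel="license", cc:attributionName, cc:attributionURL, dc:title) are ccREL and Dublin Core terms, but the exact markup the license chooser emits may differ; the function name license_statement and its parameters are made up for this illustration.

```python
from html import escape


def license_statement(title, author, author_url, license_name, license_url):
    """Return an HTML fragment carrying license metadata as RDFa attributes."""
    template = (
        '<div xmlns:cc="http://creativecommons.org/ns#" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<span property="dc:title">{title}</span> by '
        '<a rel="cc:attributionURL" property="cc:attributionName" '
        'href="{author_url}">{author}</a> is licensed under a '
        '<a rel="license" href="{license_url}">{license_name}</a>.'
        '</div>'
    )
    # escape() also escapes quotes, so values are safe inside attributes.
    return template.format(
        title=escape(title),
        author=escape(author),
        author_url=escape(author_url),
        license_name=escape(license_name),
        license_url=escape(license_url),
    )


print(license_statement(
    "CC TechBlog",
    "Creative Commons",
    "http://creativecommons.org",
    "Creative Commons Attribution 3.0 License",
    "http://creativecommons.org/licenses/by/3.0/",
))
```

The visible text is the same sentence a reader sees; the rel and property attributes are what an RDFa parser picks up.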
Summer of Code: “Converting biomedical text mining data to RDF, integrating results with existing Neurocommons RDF data, generating a RDFa-based web interface for presentation”
matthiassamwald, April 18th, 2007
About the project
The Neurocommons project wants to lay the foundation for a Semantic Web for neuroscience by creating resources that others will want to link to, extend, and build upon, and in so doing, to set an example that can be replicated in other scientific disciplines. One aspect of that goal is to make relationships in biomedical texts openly accessible on the Semantic Web.
The proposed project will add to the value of the Neurocommons project in two ways:
1) It will use the Whatizit text mining resource of the European Bioinformatics Institute as a source to generate RDF.
The text mining data of Whatizit are of a high quality and provide rich information that complements existing text mining data from the Neurocommons project. The RDF derived from Whatizit will be integrated with the existing Neurocommons annotations, resulting in a significantly increased coverage. The software written will demonstrate a typical pipeline for conversion of text mining results to good quality RDF.
The Whatizit service can mine either Pubmed abstracts or free text supplied by a user. We will provide a simple interface so that open access scientific articles can be easily acquired and mined by the tool, and so that users with rights to non-open access scientific articles can mine that text and submit the resultant RDF, which, as knowledge, is not protected by copyright.
2) Software will be written to set up a website to present the information derived from the Neurocommons textmining and the Whatizit derived facts. Pages on this website will use RDFa for markup, integrating the human readable text with the machine readable markup in a single document.
The resulting resource will not only be valuable for neuroscientific researchers – it will also serve as a model implementation that unifies text mining, Semantic Web standards and the philosophy of Science Commons / Creative Commons for the advancement of scientific research and global information exchange in general.
About myself
My name is Matthias Samwald and I come from Austria. I studied neurobiology at the University of Vienna from 2000 to 2005. The work of my doctoral thesis, which I started in 2005, is focussed on the use of Semantic Web technologies in neuroscience and biomedicine. Since 2006 I have been a member of the World Wide Web Consortium (W3C) as an ‘invited expert’. I am an active participant in the “Semantic Web in Health Care and Life Science Interest Group” of the W3C.
My curriculum
My doctoral thesis and corresponding publications
