rdfa

New validator released!

asheesh, January 6th, 2009

This past summer, Hugo Dworak worked with us (thanks to Google Summer of Code) on a new validator. This work was greatly overdue, and we are very pleased that Google could fund Hugo to work on it. Our previous validator had not been updated to reflect our new metadata standards, so we disabled it some time ago to avoid creating further confusion. The textbook on CC metadata is the “Creative Commons Rights Expression Language”, or ccREL, which specifies the use of RDFa on the web. (If this sounds like keyword soup, rest assured that the License Engine generates HTML that you can copy and paste; that HTML is fully compliant with ccREL.) We hoped Hugo’s work on a new validator would let us offer a validator to the Creative Commons community so that publishers can test their web pages to make sure they encode the information they intended.

Hugo’s work was a success, and in August 2008 he announced a test version of the validator. He built on the work of others: the new validator uses the Pylons web framework, html5lib for HTML parsing and tokenizing, and RDFLib for working with RDF. He shared his source code under the AGPLv3, the recent free software license built for network services.

So I am happy to announce that the test period is complete, and we are now running the new code at http://validator.creativecommons.org/. Our thanks go out to Hugo, and we look forward to the new validator gaining some use as well as hearing your feedback. If you want to contribute to the validator’s development or check it out for any reason, take a look at the documentation on the CC wiki.


Closing: What’s next in 2009

Frank Tobia, December 12th, 2008

Nathan Yergler proceeded to wrap up the tech conference with some humble predictions about where CC tech will be headed.

The following is a brief list of these future initiatives:

  • Using RDFa to publish metadata in a distributed fashion
  • The Next Generation of MozCC
  • Making attribution easier
  • Universal Education Search
  • CC0 & Public Domain Assertion
  • OSCRI / CC Network and creating an interoperable registry with Safe Creative and Registered Commons

Life After REC: The Future of RDFa

Frank Tobia, December 12th, 2008

Ben Adida is back again from the first tech conference with a new talk about RDFa.

First he gave a brief review of RDFa: there exists a huge chasm between the human web and the data web. RDFa addresses our need to bridge this gap. We want machine-readable metadata so we can use computer programs to answer simple questions about a work to save on time and effort. He then moved on to explaining ccREL, the Creative Commons Rights Expression Language. There are four principles for publishing in HTML: 1) visual correspondence, 2) don’t repeat yourself, 3) remix friendliness, 4) extensibility and modularity.
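To make the first two principles concrete, here is a minimal sketch of the kind of RDFa-annotated HTML that ccREL describes: the visible attribution and license links are themselves the metadata, so nothing has to be stated twice. The function name `ccrel_snippet` and the example URLs are hypothetical and chosen for illustration; this is not the License Engine's actual code.

```python
from html import escape

# Namespace used by ccREL for the cc: prefix.
CC_NS = "http://creativecommons.org/ns#"


def ccrel_snippet(work_url, author_name, author_url, license_url, license_name):
    """Build an HTML fragment whose visible links also carry license metadata.

    Hypothetical helper for illustration only.
    """
    return (
        f'<div xmlns:cc="{CC_NS}" about="{escape(work_url)}">'
        # The author link doubles as cc:attributionName / cc:attributionURL.
        f'<a rel="cc:attributionURL" property="cc:attributionName" '
        f'href="{escape(author_url)}">{escape(author_name)}</a> / '
        # The license link is both human-clickable and machine-readable.
        f'<a rel="license" href="{escape(license_url)}">{escape(license_name)}</a>'
        f"</div>"
    )


if __name__ == "__main__":
    print(
        ccrel_snippet(
            "http://example.org/photo",
            "Alice Example",
            "http://example.org/alice",
            "http://creativecommons.org/licenses/by/3.0/",
            "CC BY 3.0",
        )
    )
```

Because the markup is a self-contained fragment that names its own subject with `about`, it can be copied into another page without losing its meaning, which is roughly what "remix friendliness" asks for.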

For the main portion of his talk, Ben went over the events of the past six months regarding RDFa adoption.

  • April-May: Digg deployed RDFa.
  • June: RDFa goes W3C “Candidate Recommendation” with around 12 implementations (parsers).
  • June: Open Archives Initiative supports RDFa; UK National Archive uses RDFa.
  • September: Yahoo SearchMonkey deploys RDFa support.
  • October: RDFa goes W3C Recommendation.
  • November: CC launches the CC Network; Drupal announces roadmap for RDFa integration.

He concluded with a demonstration of sample SearchMonkey functionality that grabs CC license metadata from search results and displays that information on the search page.

What’s next? asks Ben, with a strong disclaimer that this is just a small glimpse of what is possible. He points to HTML 4 and 5 integration, the simplification of common cases (not having to keep redefining common namespaces), finding common ground with the microformat community, and better search and in-browser tools.

The question Ben asks you to take away is this: What are you waiting for to consume and/or publish RDFa?


Government Information Licensing Framework

Frank Tobia, December 12th, 2008

David Torpie of the Office of Economic and Statistical Research at the Queensland Treasury gave a talk on “Government Information Licensing Framework: a multidisciplinary project improving access to Public Sector Information.” This is a project to give greater access to Australian government data, to make government more transparent, and in doing so to develop a standard set of terms and conditions that are broadly applicable to other government contexts.

David first answered the question of why the Australian government even needs to worry about licensing its works. In Australia, unlike the United States, the government holds copyright over works produced by government agencies. Australian copyright law also extends to less-than-creative works (such as telephone directories), which makes it all the more important that public licensing be clear and simple.

The solution developed at the Queensland Treasury is “digital license management”, or DLM. DLM is a technology, developed in Java, for embedding license metadata into documents and other works. Benefits include ease of linking from data to license, and finding information based on its license. DLM was developed before a suitable alternative was available; liblicense now provides similar functionality in C. The team developing DLM is collaborating with CC’s tech team, and initial indications are that a dedicated Java tool may prove very useful.


Building the CC Network

Frank Tobia, December 12th, 2008

Creative Commons CTO Nathan Yergler discussed the Creative Commoner network, which was developed beginning in October and is still under active development. The network allows creators to collect references to their work in one place, acting as a registry. It also serves to bring people into the CC community, and to aid interoperability and connection of existing data and works.

The CC network sports personalized profile pages, OpenID, and a simple registry, which Nathan discussed in turn. Creative Commons can build layers of trust by validating a user’s “confirmed” name from PayPal transactions, which makes license claims more credible than they would otherwise be. But there are issues, such as name changes and incorrect or outdated information from PayPal.

OpenID is an open single sign-on standard which CC provides with a Commoner account. There are issues around this as well, such as a need to trust your provider. Nathan laid out the various ways CC is working to mitigate these issues.

But “the meat of the CC network” is in the work registry. As yet it is a simple implementation. Reciprocal claims and validation are key: the registry judges the validity of a work registration claim based on the presence of similar license data on the page in question. This shows that the user making the claim does indeed have the ability to edit the work in question.
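As a rough illustration of the reciprocal-claim idea, the check can be thought of as comparing the license named in a registration claim against the license URLs actually found on the work's page. The sketch below assumes those URLs have already been extracted; the function names and the normalization rules (ignore scheme, host case, and trailing slash) are assumptions made for this example, not a description of how the CC Network code works.

```python
from urllib.parse import urlsplit


def normalize_license_url(url):
    """Reduce a license URL to (host, path) so trivial variants compare equal.

    Assumption for this sketch: scheme, host case, and a trailing slash
    do not matter when comparing license URLs.
    """
    parts = urlsplit(url.strip())
    return (parts.netloc.lower(), parts.path.rstrip("/"))


def claim_is_reciprocated(claimed_license, licenses_on_page):
    """Return True if the page itself declares the license named in the claim."""
    wanted = normalize_license_url(claimed_license)
    return any(normalize_license_url(u) == wanted for u in licenses_on_page)


if __name__ == "__main__":
    found = ["https://CreativeCommons.org/licenses/by/3.0"]
    print(claim_is_reciprocated("http://creativecommons.org/licenses/by/3.0/", found))
    print(claim_is_reciprocated("http://creativecommons.org/licenses/by-nc/3.0/", found))
```

A real registry would presumably also check that the page points back at the claimant's profile; that half of the handshake is left out here.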

Future developments include better identification of works and metadata, registration of feeds, the ability to follow creators in their subsequent works, and general future efforts exploring registry technology.


CC Boston Tech Summit kickoff

Frank Tobia, December 12th, 2008

CC board member Hal Abelson kicked off today’s Boston tech summit with a brief history of where CC tech has been, and where CC tech is going.

Where the licenses provide interoperability for legal code, the interoperability for technologies has taken longer to develop. RDFa, which allows for distributed embedding of metadata in web pages, has been accepted as a W3C recommendation. CC and others are developing technologies that are inherently distributed and extensible. Clearly interoperability is the name of the game.

Hal also touted the as-yet-unreleased book Viral Spiral: how the commoners built a digital republic of their own as a story of what we have accomplished, and what more is possible.


License-oriented metadata validator and viewer: summertime is winding up

Hugo Dworak, August 16th, 2008

Google Summer of Code 2008 is approaching its end: fewer than forty-eight hours are left to submit the code that mentors will then evaluate. It is therefore fitting to pause for a moment, sum up the work that has been done on the license-oriented metadata validator and viewer, and compare it with the original proposal for the project.

A Web application capable of parsing and displaying license information embedded in both well-formed and ill-formed Web pages has been developed. It supports the following means of embedding license information: Dublin Core metadata, RDFa, RDF/XML linked externally or embedded (utilising the data URL scheme) using the link and a elements, and RDF/XML embedded in a comment or as an element (the last two being deprecated). This functionality has been verified by unit tests. A user can upload or paste the source code of a Web page, or provide a URI for the Web application to analyse. The software has been written in Python and uses the Pylons Web Framework and the Genshi toolkit. Should you be willing to test this Lynx-friendly application, please visit its Web site.
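To give a flavour of what extracting license information from an ill-formed page involves, here is a minimal sketch that uses only Python's standard `html.parser`, which tolerates unclosed tags. It is not the validator's code (the validator relies on html5lib, pyRdfa and RDFLib), and it covers just two simple cases: `rel="license"` on `a` or `link` elements, and a `DC.rights` `meta` element, the latter being an assumption about how a page might carry Dublin Core rights data. The names `LicenseClaimParser` and `find_license_claims` are made up for this example.

```python
from html.parser import HTMLParser


class LicenseClaimParser(HTMLParser):
    """Collect (kind, value) pairs for license hints found in start tags."""

    def __init__(self):
        super().__init__()
        self.claims = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel may hold several space-separated tokens, e.g. rel="license nofollow".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag in ("a", "link") and "license" in rel_tokens and attrs.get("href"):
            self.claims.append(("rel-license", attrs["href"]))
        elif (
            tag == "meta"
            and (attrs.get("name") or "").lower() == "dc.rights"
            and attrs.get("content")
        ):
            self.claims.append(("dublin-core", attrs["content"]))


def find_license_claims(html_text):
    """Return license hints in document order; tolerant of unclosed tags."""
    parser = LicenseClaimParser()
    parser.feed(html_text)
    parser.close()
    return parser.claims


if __name__ == "__main__":
    # Deliberately sloppy markup: head, p and a are never closed.
    page = """<html><head>
<meta name="DC.rights" content="http://creativecommons.org/licenses/by-sa/3.0/">
<body><p>Photo by Alice, <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY
</html>"""
    for kind, value in find_license_claims(page):
        print(kind, value)
```

A sketch like this only spots license hints; turning the markup into RDF triples and validating them is the job of a proper RDFa distiller.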

The Web application itself uses a library called “libvalidator”, which in turn is powered by cc.license (a library developed by Creative Commons that returns information about a given license), pyRdfa (a distiller that generates the RDF triples from an (X)HTML+RDFa file), html5lib (an HTML parser/tokenizer), and RDFLib (a library for working with RDF). The choice of this set of tools was not obvious, and the library underwent several redesigns, which included removing the code that employed encutils, XML canonicalization, µTidylib, and BeautifulSoup. The idea of using librdf, librdfa, and rdfadict has been abandoned. The source code of both the Web application (licensed under the GNU Affero General Public License version 3 or newer) and its core library (licensed under the GNU Lesser General Public License version 3 or newer) is available through the Git repositories of Creative Commons.

In contrast to the contents of the original proposal, the following goals have not been met: traversal of special links, syndication feeds parsing, statistics, and cloning the layout of the Creative Commons Web site. However, these were never mandatory requirements for the Web application. It is also worth noting that the software has been written from scratch, although a now-defunct metadata validator existed. Nevertheless, the development does not end with Google Summer of Code — these and several new features (such as validation of multimedia files via liblicense and support for different language versions) are planned to be added, albeit at a slower pace.

After the test period, the validator will be available under http://validator.creativecommons.org/.


RDFa for Semantic MediaWiki [GSoC 2008]

David McCabe, July 1st, 2008

Hello, world!

My name is David McCabe, and this summer I am adding RDFa support to Semantic MediaWiki, as part of the Google Summer of Code 2008. I am an undergraduate in Mathematics at Portland State University. For the Google Summer of Code 2006, I wrote Liquid Threads, a MediaWiki extension that replaces talk pages with a threaded discussion system.

Semantic MediaWiki (SMW) is the software used for the CC wiki and many other wikis. SMW allows authors to mark up wiki pages so that their contents and relationships are machine-readable. SMW already publishes this machine-readable data in RDF/XML format.

You can read about RDFa on the CC Wiki. There is also a Google Tech Talk on RDFa.


License-oriented metadata validator and viewer: the development has just started

Hugo Dworak, May 26th, 2008

Creative Commons participates in Google Summer of Code™ and has accepted a proposal (see the abstract) by Hugo Dworak, based on Creative Commons’ description of a task to rewrite its now-defunct metadata validator. Asheesh Laroia has been assigned as the mentor of the project. The work began on May 26th, 2008 as per the project timeline. It is expected to be completed in twelve weeks. More details will be provided in the dedicated CC Wiki article, and progress will be featured weekly on this blog.

The project focuses on developing an on-line tool, free software written in Python, to validate digitally embedded Creative Commons licenses within files of different types. Files will be pasted directly to a form, identified by a URL, or uploaded by a user. The application will present the results in a human-readable fashion and notify the user if the means used to express the license terms are deprecated.


Metadata work of interest

Mike Linksvayer, August 3rd, 2007

Some of these could turn out to be useful for describing licensed content on the web; all are rather interesting.

hAudio proposed microformat.

Proposed hAudio to RDFa mapping.

RDFa-deployed Multimedia Metadata (ramm.x) may be an effort to map and standardize use of existing and upcoming media description standards in RDFa … I had to skim “ramm.x in 10 sec” and “what ramm.x is NOT” a few times to gather that, but the key description on that page seems to be:

Does ramm.x replace RDF-based multimedia vocabularies, as, e.g., the Music Ontology Specification?
No! ramm.x aims at bringing existing formats, as MPEG-7 and the like, into the Semantic Web. It acts as a bridge using a certain formalisation of an existing vocabulary.

Getting a bit more esoteric, Protocol for Web Description Resources (POWDER):

facilitates the publication of descriptions of multiple resources such as all those available from a Web site.

Which is a bit of an understatement.
