Toward expressive and interoperable Common Core metadata

Alex Kozak, December 10th, 2010

It’s been suggested with increasing frequency that whether an educational resource complies with the new Common Core standards is the kind of thing that could be published as metadata on the web. This metadata could provide a platform upon which tools could be built. For example, an educational search tool could allow anyone to search for learning objects that satisfy one or more Common Core standards.

The CCSSO, which published the Common Core standards and is stewarding them through adoption, has not yet proposed a format for this metadata. In that vacuum, others are proposing their own solutions.

Karen Fasimpaur (who is awesome and was interviewed for a CC Talks With feature) recently published a set of tags to identify the Common Core standards. These tags are strings of text that uniquely identify the standards. For example, the College and Career Readiness Anchor Standards for Reading, Key Ideas and Details, Standard 1 is identified as “cc-k-5e-r-ccr-1”.

The goal, it seems, is to publish unique identifiers for the Common Core standards so that those unique identifiers could be attached to objects on the web as metadata, identifying which educational standards those objects meet.

We applaud efforts to identify when educational resources meet educational standards, and projects to catalog or tag resources with that data. Karen’s tags are one step forward in providing human-readable labels that encode that data, much like the string “by-sa” identifies the Creative Commons Attribution ShareAlike license.

The next step would be to provide stable URIs as identifiers for those tags such that machines, in addition to humans, can parse that metadata. These URIs could be maintained by authoritative organizations such as the CCSSO, but that isn’t technically necessary.

In addition, the URIs to the Common Core standards ought to be self-descriptive. That is, there should be metadata about that URI discoverable within that URI. For example, the CC licenses are self-descriptive. They contain metadata about the licenses so that when someone marks up a work as CC licensed, a machine could discover facts about that license by visiting the URL. This metadata is encoded in RDFa, and can be seen by looking at the source to the deed or viewing it through an RDFa distiller.
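To make the idea of discovering facts by visiting a URL concrete, here is a minimal Python sketch that pulls license facts out of RDFa-style markup using only the standard library. It handles a deliberately tiny subset of RDFa (a `rel` attribute paired with `href` or `resource`, with a hard-coded `cc:` prefix), and the sample markup is illustrative rather than copied from a real deed; an actual tool would fetch the deed and hand it to a full RDFa processor. The `cc:permits` and `cc:requires` terms come from the CC REL vocabulary.

```python
from html.parser import HTMLParser

# Assumption: a fixed prefix table. A real RDFa processor resolves prefixes
# declared in the document instead of hard-coding them.
PREFIXES = {"cc": "http://creativecommons.org/ns#"}


def expand(curie):
    """Expand a CURIE such as 'cc:permits'; return anything else unchanged."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return curie


class LicenseFactParser(HTMLParser):
    """Collect (predicate, object) pairs from rel + href/resource attributes.

    Only a small subset of RDFa is handled: the subject set by 'about' is
    ignored, and 'typeof' and 'property'/'content' literals are not supported.
    """

    def __init__(self):
        super().__init__()
        self.facts = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = a.get("rel")
        obj = a.get("href") or a.get("resource")
        if rel and obj:
            # rel may hold several space-separated predicates.
            for predicate in rel.split():
                self.facts.append((expand(predicate), expand(obj)))


# Illustrative markup in the style of a license deed (not a verbatim copy).
DEED_HTML = """
<div about="http://creativecommons.org/licenses/by-sa/3.0/">
  <link rel="cc:permits" href="http://creativecommons.org/ns#Reproduction" />
  <link rel="cc:permits" href="http://creativecommons.org/ns#DerivativeWorks" />
  <link rel="cc:requires" href="http://creativecommons.org/ns#Attribution" />
  <link rel="cc:requires" href="http://creativecommons.org/ns#ShareAlike" />
</div>
"""

parser = LicenseFactParser()
parser.feed(DEED_HTML)
permits = [o for p, o in parser.facts if p == expand("cc:permits")]
print(permits)
```

The point of the design is that the work page never has to repeat these facts: a tool that knows the license URI can retrieve them from the license itself.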

A URI identifying each standard brings other benefits. When used in subject-predicate-object expressions in metadata standards like RDFa, the identifier becomes far more expressive. One could identify an arbitrary URI as being standards-aligned and make complex statements about the standard, whereas with a human-readable tag, interpretation is left to the reader. For example, you could place metadata describing an educational resource on a “landing page” rather than in the resource itself, or mark up specific blocks of text as meeting certain standards. Stable URIs for the Common Core standards, coupled with a metadata standard like RDFa, would allow for a precision about subjects that the K12 OpenEd metadata proposal lacks.
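The difference between a bare tag and a URI-based statement can be sketched with plain subject-predicate-object triples in Python. Every example.org URI below is hypothetical: no organization publishes these standard URIs or the `alignsWith` predicate, and the subjects are placeholder resources. One subject is a PDF whose statement could live on its landing page; the other is a fragment identifier pointing at a single block of text.

```python
# Hypothetical URIs throughout: no organization publishes these identifiers.
ALIGNS_WITH = "http://example.org/vocab#alignsWith"
READING_CCR_1 = "http://example.org/ccss/cc-k-5e-r-ccr-1"
OTHER_STANDARD = "http://example.org/ccss/another-standard"

PDF = "http://example.org/lessons/fables.pdf"
BLOCK = "http://example.org/lessons/unit3.html#discussion-questions"

# (subject, predicate, object) triples. The PDF's statement could be published
# on its landing page; BLOCK uses a fragment to point at one passage of a page.
triples = [
    (PDF, ALIGNS_WITH, READING_CCR_1),
    (BLOCK, ALIGNS_WITH, READING_CCR_1),
    (BLOCK, ALIGNS_WITH, OTHER_STANDARD),
]


def aligned_with(standard):
    """Return every subject stated to align with the given standard URI."""
    return [s for s, p, o in triples if p == ALIGNS_WITH and o == standard]


def standards_for(resource):
    """Return every standard a given resource is stated to align with."""
    return [o for s, p, o in triples if s == resource and p == ALIGNS_WITH]


print(aligned_with(READING_CCR_1))
```

Because both the subject and the standard are URIs, the query is the same no matter which page published the statement; with a bare tag, a tool would first have to guess what the tag is attached to.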

Efforts like K12 OpenEd’s to publish Common Core standards metadata for educational resources represent good progress. They give us all a starting point that can inform future work.

«Semantic Copyright» and CC REL

Nathan Yergler, September 2nd, 2010

On Monday Safe Creative announced a new project, Semantic Copyright. The project includes an ontology for describing the rights associated with a work, as well as for modeling copyright registration. Safe Creative and Creative Commons are both part of OSCRI, an ad hoc group of organizations developing copyright registry technology. Enabling tools which reduce the cost of using copyrighted material is an important part of Creative Commons’ mission, and making the rights information machine readable is critical to fulfilling that. Unfortunately there are a couple of issues with the project in its current form.

Safe Creative’s proposed ontology includes a model for Creative Commons licenses (for example, CC BY), including the rights and permissions associated with each license. While we’re very happy to have support for CC licenses in the platform, our goal over the past three years has been to move the authoritative information about the licenses to the appropriate location: with the license.

This is an easy principle to overlook, one we ourselves didn’t consider initially. When Creative Commons first began generating HTML for marking works, we included a copy of the license metadata with every work. We later realized that this had several issues, not the least of which is that users have no reason to trust the description of the license published with the work: the work’s publisher is not the authoritative source of information regarding the license. The idea that authority matters is reflected in the fact that we now encourage people to link to the license; tools can follow the link to discover information about the license. This is the sort of basic principle that becomes obvious with lots of experience stewarding the technical infrastructure for licensing online.
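On the work’s side, linking to the license means the page carries only a pointer. A minimal standard-library Python sketch of the first step a tool takes: find links whose `rel` includes `license` and resolve them to absolute URLs it can then follow. The sample page and base URL are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LicenseLinkFinder(HTMLParser):
    """Collect href values of elements whose rel attribute includes 'license'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "license" in rels and a.get("href"):
            # Resolve relative links so a tool can follow them to the license.
            self.licenses.append(urljoin(self.base_url, a["href"]))


# Placeholder work page marked up with a license link.
WORK_PAGE = """
<p>This photo is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Creative
Commons Attribution 3.0 License</a>.</p>
"""

finder = LicenseLinkFinder("http://example.org/photos/1")
finder.feed(WORK_PAGE)
print(finder.licenses)
```

Whatever the tool then learns about permissions and requirements comes from the URL it follows, not from claims the work’s publisher made on the page.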

This principle is also reflected in our more recent work with the Free Software Foundation. When we worked with the FSF to build a CC REL model of their licenses that they would host (i.e., GPL 3.0), we were also pushing the authoritative version to the license publisher (the CC GPL is a silly wrapper, and people have no reason to trust our assessment of the GPL).

In addition to issues of authority, Safe Creative’s project does not (as far as I have seen) provide instructions for processing CC REL and their ontology side by side. One of the use cases for machine readable rights information is the ability to ask, “can I make derivatives of this work?” and “are these works under the same license?” Because Safe Creative provides its own model for CC licenses and the rights and restrictions associated with them, software that wants to work with both will need two different paths — one for works described with Semantic Copyright, and one for the 350 million (and growing) works described with CC REL. One way to deal with this would be through the development of axioms that convert one to another, but this is unfortunately absent from the documentation.
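A rough Python sketch of what such conversion axioms buy: a mapping from a second vocabulary onto CC REL terms lets one code path answer “can I make derivatives?” for works described either way. The CC REL side uses real `cc:permits` values for CC BY and CC BY-ND (a subset of what CC REL records). The second vocabulary is entirely hypothetical: its URIs are placeholders, not terms from Safe Creative’s actual ontology, and a plain lookup table stands in for proper ontology axioms.

```python
CC = "http://creativecommons.org/ns#"

# A subset of the cc:permits values CC REL attaches to two licenses.
CC_REL_PERMITS = {
    "http://creativecommons.org/licenses/by/3.0/":
        {CC + "Reproduction", CC + "Distribution", CC + "DerivativeWorks"},
    "http://creativecommons.org/licenses/by-nd/3.0/":
        {CC + "Reproduction", CC + "Distribution"},
}

# Hypothetical terms from a second, independent vocabulary. These URIs are
# placeholders, NOT Safe Creative's real terms. The table plays the role of
# the conversion axioms the documentation lacks.
OTHER_TO_CC = {
    "http://example.org/other#Copy": CC + "Reproduction",
    "http://example.org/other#Share": CC + "Distribution",
    "http://example.org/other#Adapt": CC + "DerivativeWorks",
}


def normalize(permissions):
    """Rewrite other-vocabulary terms as CC REL terms; leave CC terms alone."""
    return {OTHER_TO_CC.get(term, term) for term in permissions}


def may_make_derivatives(permissions):
    return CC + "DerivativeWorks" in normalize(permissions)


def same_permissions(a, b):
    return normalize(a) == normalize(b)


# A work described with the hypothetical second vocabulary.
described_elsewhere = {
    "http://example.org/other#Copy",
    "http://example.org/other#Share",
    "http://example.org/other#Adapt",
}
print(may_make_derivatives(described_elsewhere))
```

Without a published mapping, every tool has to write and maintain its own version of `OTHER_TO_CC`, which is exactly the two-path problem. Comparing permission sets is also weaker than comparing licenses; a fuller check would include `cc:requires` and `cc:prohibits` as well.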

Safe Creative’s project (and OSCRI) has the potential to improve our ability to model licenses and rights in a compatible and interoperable way. At this point, without deployed code or examples, it’s impossible to predict how this project will be adopted. Hopefully the 0.10 version number indicates that further improvements are coming.


Search and Discovery for OER

Nathan Yergler, January 14th, 2010

Last summer the Open Society Institute generously provided funding for CC to host a meeting about search and discovery for OER. The goal was to bring together people with experience and expertise in different areas of OER discovery and see if there was common ground: pragmatic recommendations publishers could follow to achieve increased visibility right now.

After many months of inactivity, I’m happy to announce that we’ve published a draft of an initial document, a basic publishing guide for OER. “Towards a Global Infrastructure For Sharing Learning Resources” describes steps creators and publishers of OER can take today to make sure their work spreads as widely as possible. This draft was developed by attendees of the meeting, and is currently being reviewed; as such it may (and probably will) change.

As you can see from the meeting notes, this isn’t the last thing we hope to get from this meeting. The next steps involve getting our hands a little dirtier — figuring out how we link registries of OER repositories and implementing code to do so. It should be interesting to see how this work develops, and how it influences our prototype, DiscoverEd.

Open Source Knowledge Management: What Comes After Access?

Frank Tobia, December 12th, 2008

Jonathan Rees of Science Commons discussed the open source knowledge management system that Science Commons is developing. He explained the importance of interfacing different stores of data and knowledge, and described the progress Science Commons is making on these issues. Along the way, Jonathan laid out six layers that make up an interface: permission, access, container, syntax, vocabulary, and semantics.

The project focuses on data integration, whose value lies in reducing the huge transaction costs of using different data stores that were assembled for different purposes. Data integration does happen, but at a huge expense of effort: it is hard, complex, and fragile; “glue” is necessary at all levels; and the process is manual and error-prone.
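The “glue” problem is easy to reproduce in miniature. In this hypothetical Python sketch, two stores record the same measurement under different field names and in different units, so each needs its own hand-written adapter covering the vocabulary layer (field names) and the semantics layer (what the numbers mean). The store contents, field names, and shared schema are all invented for illustration.

```python
import math

# Two hypothetical stores describing the same kind of record differently.
store_a = [{"gene_symbol": "BRCA1", "expr_log2": 3.0}]
store_b = [{"symbol": "BRCA1", "expression_linear": 8.0}]


def from_a(rec):
    # Vocabulary glue only: rename fields to the shared schema.
    return {"gene": rec["gene_symbol"], "expression_log2": rec["expr_log2"]}


def from_b(rec):
    # Vocabulary glue plus semantic glue: store B keeps linear values,
    # so convert them to log2 to match the shared schema.
    return {"gene": rec["symbol"],
            "expression_log2": math.log2(rec["expression_linear"])}


integrated = [from_a(r) for r in store_a] + [from_b(r) for r in store_b]
print(integrated)
```

Every new store means another adapter, and a unit mix-up in any one of them silently corrupts the merged data; agreeing on shared vocabularies and semantics is what removes the need for this per-source glue.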

By developing and testing the whole interface stack for scientific data, Science Commons makes the data integration problem vastly easier, and with it the work of understanding, browsing, searching, consulting, transforming, analyzing, visualizing, modeling, annotating, and organizing data.

Jonathan closed with a call to action: to “choose, promote, and nourish sharing solutions at every level in the stack”.

Copyright Registries 2.0

Frank Tobia, December 12th, 2008

Mario Pena of Safe Creative, Joe Benso of Registered Commons, and Mike Linksvayer of CC gave a talk on “Copyright Registries 2.0” as a continuation of the registration conversation we had at our first tech summit in June.

Mario began with a summary of registries and how they should work: they must provide pointers to works, and they must facilitate the sharing of relevant information. He pointed to RDFa and ccREL as examples of technologies in this sphere promoting interoperability. He also mentioned the Open Standards for Copyright Registry Interop as an example of the work being done to foster interoperability and standardization among online registries.

Next, Joe discussed what he sees as necessary for registries moving forward. The big point he made was that Registered Commons feels a registry authority is a necessary condition for registries to be successfully implemented. He started with a brief history of Registered Commons and named the features they provide, including use of the CC API, timestamping of works, and physical identity verification. He finished with the need for an authority: to allocate namespaces, appoint registries based on criteria, identify entities to be certified, etc.
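A hypothetical Python sketch of how several of these pieces fit together in a single registry record: an authority-allocated namespace for the identifier, a content hash as the pointer to the work, a timestamp, and a license URI. The record layout, namespace list, and function name are invented for illustration and do not describe Registered Commons or any other registry. The timestamp here is only the local clock; a real timestamping service needs a trusted third party, and identity verification is out of scope.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical: namespaces an authority has allocated to registries.
ALLOCATED_NAMESPACES = {"example-registry"}


def register(namespace, serial, content, license_uri, now=None):
    """Build a registry record (illustrative layout, not any registry's format).

    The record points to the work by content hash and carries what other
    registries would need to share: identifier, time, and license.
    """
    if namespace not in ALLOCATED_NAMESPACES:
        raise ValueError(f"namespace {namespace!r} has not been allocated")
    now = now or datetime.now(timezone.utc)
    return {
        "id": f"{namespace}:{serial}",
        "sha256": hashlib.sha256(content).hexdigest(),
        "timestamp": now.isoformat(),
        "license": license_uri,
    }


record = register(
    "example-registry", "0001", b"my photo bytes",
    "http://creativecommons.org/licenses/by/3.0/",
    now=datetime(2008, 12, 12, tzinfo=timezone.utc),
)
print(record)
```

Rejecting unallocated namespaces is the small-scale version of Joe’s argument: identifiers only stay unique across registries if someone is responsible for handing out the prefixes.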

liblicense 0.8 (important) fixes RDF predicate error

asheesh, July 30th, 2008

Brown paper bag release: liblicense claims that the RDF predicate for a file’s license is rather than Only the latter is correct.

Any code compiled with liblicense between 0.6 and 0.7.1 (inclusive) contains this mistake.

This time I have audited the library for other insanities like the one fixed here, and there are none. Great thanks to Nathan Yergler for spotting this. I took this chance to change ll_write() and ll_read() to *NOT* take NULL as a valid predicate; this makes the implementation simpler (and more correct).

Sadly, I have bumped the API and ABI numbers accordingly. It’s available in SourceForge at, and will be uploaded to Debian and Fedora shortly (and will follow from Debian to Ubuntu).

I’m going to head to Argentina for a vacation and Debconf shortly, so there’ll be no activity from me on liblicense for a few weeks. I would love help with liblicense in the form of further unit tests. Let’s squash those bugs by just demonstrating all the cases the license should work in.

RDFa for Semantic MediaWiki [GSoC 2008]

David McCabe, July 1st, 2008

Hello, world!

My name is David McCabe, and this summer I am adding RDFa support to Semantic MediaWiki, as part of the Google Summer of Code 2008. I am an undergraduate in Mathematics at Portland State University. For the Google Summer of Code 2006, I wrote Liquid Threads, a MediaWiki extension that replaces talk pages with a threaded discussion system.

Semantic MediaWiki (SMW) is the software used for the CC wiki and many other wikis. SMW allows authors to mark up wiki pages so that their contents and relationships are machine-readable. SMW already publishes this machine-readable data in RDF/XML format.
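One way to picture the RDFa work: an SMW annotation such as `[[Located in::Portland]]` already names a property and a value, so it could be rendered as an HTML element carrying an RDFa `property` attribute. This is a hypothetical Python sketch, not how the extension is actually implemented. The `prop:` prefix is an assumption and would have to be declared on the page, and piped display text (`[[property::value|text]]`) is not handled.

```python
import html
import re

# Matches simple SMW-style annotations such as [[Located in::Portland]].
# Piped display text is deliberately not matched in this sketch.
ANNOTATION = re.compile(r"\[\[([^:\]|]+)::([^\]|]+)\]\]")


def to_rdfa(wikitext, prefix="prop"):
    """Replace each annotation with a span carrying an RDFa property attribute.

    The prefix is hypothetical and must be declared elsewhere on the page.
    """
    def repl(match):
        prop = match.group(1).strip().replace(" ", "_")
        value = match.group(2).strip()
        return (f'<span property="{prefix}:{html.escape(prop)}">'
                f'{html.escape(value)}</span>')

    return ANNOTATION.sub(repl, wikitext)


print(to_rdfa("Portland State is [[Located in::Portland]]."))
```

The value stays visible text for human readers, while the attribute makes the same statement machine-readable in the page itself rather than only in a separate RDF/XML export.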

You can read about RDFa on the CC Wiki. There is also a Google Tech Talk on RDFa.

RFC 4946

Mike Linksvayer, July 19th, 2007

James Snell writes that the Atom License Extension is now Experimental RFC 4946.
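RFC 4946 registers a `license` link relation for Atom, so a license is attached with an ordinary `atom:link` element. A minimal sketch using Python’s standard-library ElementTree; the title, id, date, and function name are placeholders, and a complete entry would also need an author, either its own or inherited from its feed.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def entry_with_license(title, entry_id, updated, license_uri):
    """Build an Atom entry fragment whose license is given by a link element."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    # RFC 4946: rel="license" identifies a license associated with the entry.
    ET.SubElement(entry, f"{{{ATOM}}}link", rel="license", href=license_uri)
    return entry


entry = entry_with_license(
    "Example post",
    "tag:example.org,2007:1",
    "2007-07-19T00:00:00Z",
    "http://creativecommons.org/licenses/by/3.0/",
)
print(ET.tostring(entry, encoding="unicode"))
```

Because it reuses `atom:link`, feed readers that ignore the extension can still parse the entry, and those that understand it can follow the link to the license itself.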

Many thanks to James Snell for at least two years of work on this.

What is needed to move further along the standards track? More implementations.

There’s a page on the CC wiki about licensing and syndication standards.

Atom License Extension

Mike Linksvayer, May 4th, 2007

Thanks to the work of James Snell the Atom License Extension has been approved for publishing as an Experimental RFC.

Read about CC license support in RSS 1.0, RSS 2.0, and Atom 1.0 on our wiki page about syndication.
