One of our goals is to continue to make the licenses more useful as self-describing resources. We’ve described the licenses themselves (using CC REL) for quite a while. Last year we started marking up the license name, so software could dereference the license and show the human-readable name to users. Last month we added support for the identifiers (short names) as well. While working with OpenAttribute, I realized that one thing we weren’t doing well was scoping our assertions. In RDFa, the default scope of an assertion is the URI of the current page. That means if you followed a link to a specific translation of a license (such as the French translation of CC BY 3.0 Unported), the RDFa was actually describing that translation’s URI rather than the license itself.
It’s a subtle but important point: the canonical license URI is the “bare” URI, without any translation component; for example,
http://creativecommons.org/licenses/by/3.0/, and not
http://creativecommons.org/licenses/by/3.0/deed.fr. At the same time I realized that while the license translations link to one another, that relationship is not described. To improve this situation, we’ve made three changes to the license deeds (all in the RDFa, not visible to humans browsing the pages):
- We’ve started asserting that the information published with the license summary is about the canonical URL.
- We’re declaring that the license is actually a License (the fact that it was used as part of a
rel="license" assertion implied as much, but explicit is better than implicit in my book).
- We’re describing the relationship between translations, using the FRBR Core vocabulary.
The choice of vocabulary to describe the translation wasn’t obvious; an inquiry on the semantic-web mailing list revealed no clear winners, so we wound up choosing the one that seemed to best fit the semantics of the license summaries (to be clear, these assertions only apply to the summary of the license, the “deed”, and not the actual text of the license). It’s possible we’ll revise this in the future, but one of the great things about RDFa is that we don’t have to choose one; if we find one that works better, we can easily publish assertions using both vocabularies, easing the transition for any tools using the RDFa.
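To make the scoping point concrete, here is a minimal sketch of how a tool might map a deed URL back to the canonical license URI before attaching assertions to it. The helper name and the regular expression are illustrative assumptions, not part of any CC library; the sketch only assumes that translated deeds end in `deed.<language>`, as in the example above.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: a deed path ends in "deed" or "deed.<lang>", e.g. "deed.fr".
_DEED_SUFFIX = re.compile(r"deed(?:\.[A-Za-z_-]+)?$")


def canonical_license_uri(url):
    """Return the "bare" license URI for a deed URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Remove the translation component, if there is one.
    path = _DEED_SUFFIX.sub("", parts.path)
    if not path.endswith("/"):
        path += "/"
    # Drop any query string or fragment as well.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Called on http://creativecommons.org/licenses/by/3.0/deed.fr, this returns the bare URI, which is the subject the RDFa assertions are meant to describe.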
Yesterday we launched a refresh of the site design for creativecommons.org. Included in the changes pushed was one small one originally suggested by our international Affiliate Network: the inclusion of the license identifier on the deeds.
Anyone who’s been in the CC community for any length of time has seen people refer to the licenses by their short-hand names: CC BY for Attribution, BY-SA for Attribution-ShareAlike, etc. But that short hand, while useful, has been a bit of inside baseball: it’s part of the URL, but never appeared on the deeds, which we want to be the human readable summary of the license. As of yesterday the short-hand name is now on the deed. We’ve also annotated it with RDFa, so the licenses self-describe their short name (software can dereference the license URI and look for information describing it there). Thanks again to Alek and the affiliate network for suggesting this change.
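Since the short-hand name is already part of the URL, a rough sketch of recovering it from a license URI might look like the following. The function name and the exact output format ("CC BY 3.0") are assumptions made for illustration; software that wants the authoritative identifier should read the RDFa on the deed instead.

```python
from urllib.parse import urlsplit


def license_short_name(uri):
    """Guess a short-hand name from a license URI (hypothetical helper)."""
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    # Assumed layout: /licenses/<code>/<version>/...
    if len(segments) < 3 or segments[0] != "licenses":
        raise ValueError("not a recognised license URI: %r" % uri)
    code, version = segments[1], segments[2]
    return "CC {} {}".format(code.upper(), version)
```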
The coding period of GSoC 2010 officially ended on the 16th. I was able to release the new stable version of the OpenOffice Plug-in on that day. So it’s time to look back at what I did over the past 4+ months and discuss the issues I ran into.
The proposal for the development had the following tasks.
- Supporting OpenOffice.org 3.0 – 3.1.
- Adding support for CC0.
- Adding support for Draw.
- User Interface Improvements.
  - Adding references to important pages like FAQ and About.
  - Provide help around what each option means (“what is Share Alike”).
  - Display license information when opening CC licensed documents.
  - Other improvements for the UI.
- Speed up the first time license insertion process.
- Support for Internationalization.
I started working on the project in mid-April. Moving the code base to the OpenOffice.org 3.1 SDK was easy. I also updated the other libraries to their newest versions. Then I tried to implement Draw support. I found that the implementation for Draw should be the same as the implementation for Impress, so adding Draw support was fairly easy, but I had some problems with page sizes. The existing code did not consider page sizes and margins. The next problem was that the visible license notice was added to the master page of the Draw/Impress document, like the background of an Impress document, so the user could not arrange the notice as they wanted.
While I was fixing those issues, Nathan asked me to have a look at the Flickr Image Re-Use plug-in. I started looking at it and integrating its functionality into the CC plug-in. I had to update the API wrappers for the Flickr plug-in to make it work. Then I started using that code to add image reuse support for Open Clip Art. I also improved the loading time of the Flickr dialog.
Then I started adding RDF metadata to the document. Currently this works only in Writer documents, because OpenOffice.org currently supports adding RDF only for Writer documents. Then I went back to improving image insertion and added Wikimedia Commons and Picasa. Users can now add images from four well-known image sharing sites: Flickr, Picasa, Wikimedia Commons and Open Clip Art.
The next things I did were the UI improvements and adding the public domain tools. Those two things were carried out in parallel. After discussing with my mentors Nathan and Christopher, we decided to use a tabbed window. This makes licensing less confusing and easier to use.
Next was the internationalization phase. I used the existing i18n repository to add some translations to the licensing dialog, but there are still new strings in the dialog that should be added to Transifex. I hope those new strings will be added in the future. These translation files (.po files) need to be converted to Java resource bundles (the internationalization method used in Java). The source folder includes a script for this, and the instructions can be found in the README.
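For readers unfamiliar with that conversion step, the following is a rough sketch of what turning .po entries into Java .properties lines involves. It is not the script shipped in the source folder: the function names are hypothetical, and it only handles simple single-line msgid/msgstr pairs (no plurals, no multi-line strings, no characters outside the Basic Multilingual Plane).

```python
import ast


def _escape(text, is_key):
    """Escape text for a Java .properties file (BMP characters only)."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == "\n":
            out.append("\\n")
        elif is_key and ch in " =:#!":
            # Separators and comment markers must be escaped in keys.
            out.append("\\" + ch)
        elif ord(ch) > 126:
            # Write non-ASCII characters as \uXXXX escapes.
            out.append("\\u{:04x}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def po_to_properties(po_text):
    """Convert single-line msgid/msgstr pairs into key=value lines."""
    lines_out = []
    msgid = None
    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            # PO strings use C-style quoting, which literal_eval can read
            # for the simple cases this sketch targets.
            msgid = ast.literal_eval(line[len("msgid "):])
        elif line.startswith("msgstr ") and msgid:
            msgstr = ast.literal_eval(line[len("msgstr "):])
            if msgstr:  # skip untranslated entries
                lines_out.append(
                    "{}={}".format(_escape(msgid, True), _escape(msgstr, False))
                )
            msgid = None
    return "\n".join(lines_out)
```

The header entry (the one with an empty msgid) is skipped, and each translated string becomes one line that `java.util.ResourceBundle` can load from a .properties file.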
For the last few weeks I was testing the plug-in and trying to find any issues related to it.
Though the new version was released as a stable release, there are some problems which are actually out of our control. This plug-in will not work on Mac OS X Leopard where 64-bit Java is the default; this is a known issue in OpenOffice.org. In addition to that, Mac OS X also has a problem displaying AWT windows (issue 92926). I am still trying to see whether there is anything I can do.
The next issue is with Linux. We tested the plug-in on Debian and Ubuntu; in both cases, if the plug-in is installed through the extension manager, the menus either become gray or are not available at all. This issue is most probably a problem in Ubuntu 9.10 and 10.04, because the plug-in worked well on Ubuntu 9.04 and Windows. I am glad to say that I have found a workaround for this: if the plug-in is installed using the terminal, there is no problem. Users can use both the CLI and GUI versions of unopkg (the extension manager).
To use the CLI version run
/usr/lib/openoffice/program/unopkg -f ccooo.oxt
and for the GUI version use
/usr/lib/openoffice/program/unopkg gui -f ccooo.oxt
Actually, this is not an issue with the plug-in itself; it appears to be a problem with Ubuntu. Similar plug-ins like Ooo2GD and Open Cards are also affected by this problem.
I tested this plug-in on Windows and Linux, in both x86 and x86-64 versions, and it works without any problem (on Linux, with the workaround).
Windows users can simply install it by double-clicking the plug-in. Ubuntu users (and maybe other Linux users) need to install it through the terminal using one of the following commands:
/usr/lib/openoffice/program/unopkg gui -f ccooo.oxt
/usr/lib/openoffice/program/unopkg -f ccooo.oxt
The source code of the plug-in can be found at http://code.creativecommons.org/viewsvn/ccooo/branches/akila-gsoc-2010/
Comments and Feedback
While I was searching the Internet to find more issues related to the plug-in, I found some blog posts and comments about the plug-in. I have listed them below.
All of this feedback was good, except for the Mac OS X issue. I am trying to fix it as soon as possible.
To add new feedback, users can use the extension page at the extension repository. There can still be issues related to the plug-in. Those issues and feature requests can be added to the CC issues list.
Nathan suggested a feature for Flickr: adding images from a given URL. Users could copy and paste the URL into OpenOffice.org and the plug-in would automatically download the image and add the licensing data as well. This could be used for other services too.
In addition to that, localization should be completed.
This is the time to thank all the people who helped me in achieving this. I should especially thank the following people for helping me:
- Nathan Yergler: for helping me with good suggestions in the pre GSoC period and in the community bonding period.
- Christopher Allan Webber: for mentoring me in this process and for all the good suggestions he gave.
- Nathan Kinkade: for helping with the SVN issues (yes I had some :)).
- Alex Roberts and Greg Grossmeier: for their valuable suggestions in making the GUI.
Now it is time to rip off the “test” part from the plug-in name “Creative Commons Licensing (test)” :). The plug-in has been available for beta testing since May 2010 and has been downloaded more than 400 times. But making a perfect program is impossible, so I would appreciate it if users reported any issues and suggestions as I mentioned in the Feedback section. I will always try to fix or implement them.
More information about the plug-in and the screen shots are available in the wiki page.
Google Summer of Code 2008 approaches its end, as fewer than forty-eight hours are left to submit the code that will then be evaluated by mentors. It is therefore fitting to pause for a moment, sum up the work that has been done on the license-oriented metadata validator and viewer, and compare it with the original proposal for the project.
A Web application capable of parsing and displaying license information embedded in both well-formed and ill-formed Web pages has been developed. It supports the following means of embedding license information: Dublin Core metadata, RDFa, RDF/XML linked externally or embedded (utilising the data URL scheme) using the link and a elements, and RDF/XML embedded in a comment or as an element (the last two being deprecated). This functionality has been proven by unit testing. The source code of a Web page can be uploaded or pasted by a user; it is also possible to provide a URI for the Web application to analyse. The software has been written in Python and uses the Pylons Web Framework and the Genshi toolkit. Should you be willing to test this Lynx-friendly application, please visit its Web site.
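As a small illustration of one of these embedding methods, the sketch below collects the targets of rel="license" links from a page using only Python's standard library. This is not how the validator is implemented (it relies on the libraries described next); the class and function names are hypothetical, and the standard-library parser is used here simply because it tolerates ill-formed markup.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect href values of <a> and <link> elements with rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. "license nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_values and href:
            self.licenses.append(href)


def find_license_links(html):
    """Return license URIs linked from the given HTML source."""
    finder = LicenseLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.licenses
```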
The Web application itself uses a library called “libvalidator”, which in turn is powered by cc.license (a library developed by Creative Commons that returns information about a given license), pyRdfa (a distiller that generates the RDF triples from an (X)HTML+RDFa file), html5lib (an HTML parser/tokenizer), and RDFLib (a library for working with RDF). The choice of this set of tools has not been obvious and the library has undergone several redesigns, which included removing the code that employed encutils, XML canonicalization, µTidylib, and BeautifulSoup. The idea of using librdf, librdfa, and rdfadict has been abandoned. The source code of both the Web application (licensed under the GNU Affero General Public License version 3 or newer) and its core library (licensed under the GNU Lesser General Public License version 3 or newer) is available through the Git repositories of Creative Commons.
In contrast to the contents of the original proposal, the following goals have not been met: traversal of special links, syndication feeds parsing, statistics, and cloning the layout of the Creative Commons Web site. However, these were never mandatory requirements for the Web application. It is also worth noting that the software has been written from scratch, although a now-defunct metadata validator existed. Nevertheless, the development does not end with Google Summer of Code — these and several new features (such as validation of multimedia files via liblicense and support for different language versions) are planned to be added, albeit at a slower pace.
After the test period, the validator will be available under http://validator.creativecommons.org/.
The source code is located in two dedicated git repositories. The first is validator, which contains the source code of the Web application based on Pylons and Genshi. The second is libvalidator, which hosts the files that constitute the core library that the project will utilise. This is the component that the development focuses on right now.
The purpose of the aforementioned library is to parse input files, scan them for relevant license information, and output the results in a machine-readable fashion. More precisely, its workflow is the following: parse the file and associated RDF information so that a complete set of RDF data is available, filter the results with regard to license information (not only related to the document itself, but also to other objects described within it), and return the results in a manner preferable for the usage by the Web application.
pyRdfa seems to be the best tool for the parsing stage so far. It handles the current recommendation for embedding license metadata (namely RDFa) as well as other non-deprecated methods: linking to an external or embedded (using the “data” URL scheme) RDF file and utilising the Dublin Core. Its significant shortcoming is that it does not handle the invalid direct embedding of RDF/XML within the HTML/XHTML source code (as an element or in a comment). This is resolved by first capturing all such instances using a regular expression and then parsing the data just like external RDF/XML files.
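The capturing step could be sketched roughly as follows. The pattern and function name are assumptions for illustration rather than the expression libvalidator actually uses; the sketch assumes the embedded block uses the conventional `rdf:` prefix.

```python
import re

# Non-greedy match from an opening <rdf:RDF ...> to its closing tag.
# DOTALL lets "." span line breaks, since the blocks are usually multi-line.
_RDF_BLOCK = re.compile(r"<rdf:RDF\b.*?</rdf:RDF\s*>", re.DOTALL)


def extract_embedded_rdf(source):
    """Return every RDF/XML block found in the page source, whether it
    sits inside an HTML comment or directly among the elements."""
    return _RDF_BLOCK.findall(source)
```

Each returned string can then be handed to an RDF/XML parser in the same way as an externally linked file.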
Once the RDF triples are extracted, one can use SPARQL to narrow the results just to the triples related to the licensed objects. Both librdf and rdflib support this language. Moreover, the RDF/XML related to the license must be parsed, so that its conditions (permissions, requirements, and restrictions) are then presented to the user.
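To show what that narrowing amounts to without depending on an RDF library, here is the same filtering sketched over plain (subject, predicate, object) tuples instead of a SPARQL query. The function name and the choice of predicates are assumptions; a real implementation would run a query through rdflib or librdf as described above.

```python
# Predicates commonly used to attach a license to a resource (assumed set).
LICENSE_PREDICATES = {
    "http://www.w3.org/1999/xhtml/vocab#license",
    "http://creativecommons.org/ns#license",
    "http://purl.org/dc/terms/license",
}


def licensed_objects(triples):
    """Map each licensed subject to the set of license URIs asserted for it."""
    result = {}
    for subject, predicate, obj in triples:
        if predicate in LICENSE_PREDICATES:
            result.setdefault(subject, set()).add(obj)
    return result
```

Because the result is keyed by subject, it covers licenses asserted about other objects described in the page (such as an embedded image) as well as about the document itself.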
The library takes advantage of standard Python tools such as Buildout and nose. When it is completed, the project will be all about writing a Web application that will serve as an interface to libvalidator.
Last week Kinkade asked me for a brief overview of how the license engine, web services and other bits of code all fit together to create the joy that is creativecommons.org. “Sure,” I thought; “that’s simple!”
Er, maybe not. Forty-five minutes, five marker colors and multiple digressions later, I had the following diagram of life as it is today.
Asheesh joined us and we started talking about how we can make this better. The above, while eminently sucky, has grown up during my time at Creative Commons. All those decisions made sense at the time, but in aggregate we’ve got lots of duplicated code, a branch of code named the
gradually-increasing-sanity-branch which doesn’t (I take the blame for that one), and plenty of unnecessary complexity. Half an hour later, we had mapped out The Glorious Future®:
A little simpler, huh? And the “future” diagram shows all the functionality of the present, plus three packages not displayed on the original diagram. Our immediate goal in moving in this direction is the completion of
cc.license (labeled as “cc.licenze” in the diagrams to distinguish it from the existing implementation), which will replace the existing XSLT processing used for issuing licenses and wrap the RDF (which is the canonical representation of the licenses anyway). We’ll also manage to dramatically reduce the number of
svn:externals we use, which is good since we’re moving away from Subversion for some projects. My goal is to get this upgrade done as soon as possible so we can focus on things that are actually interesting instead of our own infrastructure.