summer of code

A Farewell

Scott Shawcroft, August 24th, 2007

Well, today is my last day. I’ve thoroughly enjoyed my time here at CC. Thanks to everyone here at the office and elsewhere whom I worked with this summer. I’ve met a lot of innovative, hardworking folks, and they all deserve big kudos. While I’ve been here I’ve created liblicense, which Jon had conceived a while back. I think it’s at a point where getting it included in content platforms is now the key. We’ve done three releases (technically four). None of this would have been possible without the help of Jason Kivlighn. Back at the University of Washington (UW), he and I were roommates, and we will be living in the same house this fall. Jason is a super coder who finished his own Google Summer of Code project about halfway through the summer. Instead of just fizzling out, he helped me by revamping parts of liblicense and writing the modules for all of the formats we currently support. I wouldn’t have gotten nearly as far without Jason’s help. Before I bid farewell and move on, let me recap a bit of what I did.

My primary project was liblicense. Basically, I was handed the idea of tracking licenses on ‘the desktop’. I arrived to find tons of UI mockups created by Rebecca for my implementing pleasure. Before I began implementing the UIs, I spent the first three or four weeks conceptualizing and implementing the core liblicense library. The first (and second) release was on July 13. After this release I began focusing on GNOME desktop integration. I really enjoyed seeing a frontend to the hard work I did on the backend. Hopefully, more will be done. While I worked on the GNOME integration, Jason mirrored my efforts in KDE4. After the release of the GNOME work and library changes on July 30, I moved on to Sugar integration. I’ve since finished the Sugar integration, and the files are available here: http://cctools.svn.sourceforge.net/viewvc/cctools/liblicense-sugar/trunk/ . We’ll probably never officially release it, since it is exclusive to the Sugar interface. I’m proud of how far liblicense has come. As I’ve written before, I think license awareness is important, and liblicense is a small but important step in the right direction.

My other project, which I volunteered to help with, was the LiveContent LiveCD. When I got here, Tim was assigned the task of producing the new Creative Commons LiveCD, with LinuxWorld (August 8th) as the target release date. After a week or so here, I offered to advise the project technically. While my intention was only to be a reference, I quickly became technically responsible for the CD and its creation. Although this was not what I had bargained for, it was well worth it. To create the CD we used livecd-creator, Fedora’s LiveCD creation tool. It was a challenge for me to adapt to a different packaging system (from Portage on Gentoo to RPM on Fedora). Additionally, although livecd-creator is a step in the right direction (like liblicense), it did not fulfill all of our needs, which resulted in some hacky scripts to build the CD. To counter those frustrations I added an easter egg to the CD, which I will not disclose. :-D Having Tim orchestrate the entire project also reduced my stress level quite a bit. However, it is quite nerve-racking to create a LiveCD that will be duplicated 1000 times and promoted by both Creative Commons and Fedora. All in all, it turned out great. The next version(s), which I will not be a part of, should be even better.

I’m moving on. While I’ve enjoyed my time here at CC, I won’t be continuing on these projects. I’ve got a somewhat short attention span for projects, and it has run dry for both LiveContent and liblicense. Over the last few days I’ve been brain-dumping so that Asheesh can pick up where I left off. This move should be good for the projects because Asheesh is a very smart and driven person. You should check out his current project, jsWidget. Since I’m done, I’ll be driving home to Washington (the state, not the district) tomorrow morning. When I get there I’ll be enjoying home and working on educational materials for teaching Python in the intro Computer Science course at UW. After that I plan on focusing on my own projects.

Lastly, I’d like to thank everyone here at CC. Alex, our graphic designer, has been a huge help on both liblicense (he wrote the Ruby bindings) and LiveContent (he did the sweet packaging). Nathan Yergler’s flexibility allowed me to switch between projects at will and thus keep stress to a minimum. All of the other CC interns (Tim, Rebecca, Cameron and Thierry) made this summer great because we bonded as a group as we worked hard and hung out at various events. Finally, Jon has been an immense help by entertaining my questions, dealing with my grumpiness and encouraging me on both of my projects. As always, there are many others who made this a great experience, but the folks mentioned are those I worked with day to day. Thanks, everyone. I hope that what I started here at CC sees much use in the future. Cheers.

CC OpenOffice Addin Update: Added Impress functionality!

ksiomelo, July 24th, 2007

Hello everyone!

I’ve made some changes to the addin, and it can now put licenses in Impress documents. The entire architecture has changed, and I’m still working on it. The newer version is in the CC SourceForge repository!

CC OpenOffice Addin working on Impress

Sidecar XMP and License Extractors in Tracker

Jason Kivlighn, July 10th, 2007

Tracker has accepted my patches to read XMP sidecar files, as well as patches to extract licenses from MS Office (old format), TIFF, HTML, PNG, and PDF files. This support will be available in the 0.6 release, which may come out later this week.
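
For the PNG case, the license sits inside an XMP packet stored in an iTXt chunk whose keyword is XML:com:adobe:xmp. Tracker’s extractor is C; what follows is only a minimal Python sketch of the same idea, assuming a well-formed file and using just the standard library. The function names are illustrative, and chunk CRCs are not verified.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
XMP_KEYWORD = b"XML:com:adobe:xmp"


def iter_png_chunks(data):
    """Yield (chunk_type, chunk_data) pairs; CRCs are not checked in this sketch."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        yield chunk_type, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if chunk_type == b"IEND":
            break


def find_png_xmp(data):
    """Return the XMP packet from the XML:com:adobe:xmp iTXt chunk, or None."""
    for chunk_type, body in iter_png_chunks(data):
        if chunk_type != b"iTXt":
            continue
        keyword, rest = body.split(b"\x00", 1)
        if keyword != XMP_KEYWORD:
            continue
        compressed = rest[0] == 1  # compression flag; rest[1] is the method
        # Language tag and translated keyword are each NUL-terminated.
        _language, rest = rest[2:].split(b"\x00", 1)
        _translated, text = rest.split(b"\x00", 1)
        if compressed:
            text = zlib.decompress(text)
        return text.decode("utf-8")
    return None
```

The returned string is the raw XMP packet; the cc:license value can then be read from it with any XML parser.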

My final set of patches will add support for extracting licenses from JPEG, SVG, and OpenOffice’s OASIS format. Also, through GStreamer, Tracker already recognizes licenses in Vorbis and FLAC files.
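
SVG is plain XML, so pulling out an Inkscape-style license is mostly a matter of walking /svg/metadata/rdf:RDF. A minimal Python sketch follows. It assumes the license is recorded as a cc:license element with an rdf:resource attribute, and it accepts both the current CC namespace (http://creativecommons.org/ns#) and the older http://web.resource.org/cc/ one, since files written by different tool versions may use either.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Current CC namespace plus the older one (assumption: both occur in the wild).
CC_NAMESPACES = ("http://creativecommons.org/ns#", "http://web.resource.org/cc/")


def svg_license(svg_text):
    """Return the license URI from an SVG's metadata/rdf:RDF block, or None."""
    root = ET.fromstring(svg_text)
    for metadata in root.iter(f"{{{SVG_NS}}}metadata"):
        for rdf in metadata.iter(f"{{{RDF_NS}}}RDF"):
            for ns in CC_NAMESPACES:
                for lic in rdf.iter(f"{{{ns}}}license"):
                    uri = lic.get(f"{{{RDF_NS}}}resource")
                    if uri:
                        return uri
    return None
```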

This marks the half-way point of Summer of Code 2007.

CC OpenOffice.org AddIn updates

ksiomelo, July 7th, 2007

Hello all,

Here are the updates in version 0.0.2:

* The Creative Commons menu is now visible in Calc and Impress

Although it’s not working properly yet, the addin now supports the other OOo applications.

* License image bug fixed

Now the addin retrieves the license images perfectly!

* Display dialog when opening licensed documents

A simple dialog box is shown when a CC-licensed document is opened.

* Checks whether the document is already licensed and warns the user

Something like “You have chosen a different license, do you want to proceed anyway?”

… and several other minor updates were made to the addin.

Take a look at the screenshots!

License inserted in the document
Document already licensed
Opening a licensed document
Using the CC autotext to replicate the license

Want to try it? Just download the ccooo.oxt file and install it from the Extension Manager in OpenOffice.org!

Next steps:

  • Internationalization support;
  • Exception handling (including timeout);
  • Some changes on GUI, such as adding progress bars;
  • Settings menu?
  • Work on the same functionalities in Calc and Impress.

Liblicense has licenses! 376 of them…

Jason Kivlighn, July 6th, 2007

Prepping for the 0.1 release, I’ve generated RDF descriptions of all CC licenses in all available jurisdictions, as well as the GPL, LGPL, and Public Domain.

Available here:
https://cctools.svn.sourceforge.net/svnroot/cctools/liblicense/trunk/licenses/

Each license, where applicable, has all the attributes laid out on the wiki, including localization. One problem, however, is getting localized descriptions of the licenses: they aren’t available in the i18n repository at https://cctools.svn.sourceforge.net/svnroot/cctools/i18n/trunk/i18n/

Licenses were generated with this Python script, which reads the relevant information from creativecommons.org and the cctools Subversion repository.
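
The generating script itself isn’t reproduced in this post. As a rough illustration only, this Python sketch builds one license description with ElementTree. The cc:permits/cc:requires/cc:prohibits properties and terms such as Reproduction, Attribution and ShareAlike come from the CC REL vocabulary; the function names, the use of dc:title, and the by-sa 3.0 US example are assumptions, and the real files likely carry more attributes than this.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC = "http://creativecommons.org/ns#"
DC = "http://purl.org/dc/elements/1.1/"
for _prefix, _uri in (("rdf", RDF), ("cc", CC), ("dc", DC)):
    ET.register_namespace(_prefix, _uri)


def license_uri(code, version, jurisdiction=None):
    """Build a CC license URI, e.g. .../licenses/by-sa/3.0/us/ (unported if no jurisdiction)."""
    uri = f"http://creativecommons.org/licenses/{code}/{version}/"
    return f"{uri}{jurisdiction}/" if jurisdiction else uri


def license_rdf(uri, title, permits=(), requires=(), prohibits=()):
    """Return RDF/XML describing one license with CC REL permits/requires/prohibits."""
    root = ET.Element(f"{{{RDF}}}RDF")
    lic = ET.SubElement(root, f"{{{CC}}}License", {f"{{{RDF}}}about": uri})
    ET.SubElement(lic, f"{{{DC}}}title").text = title
    for prop, terms in (("permits", permits), ("requires", requires),
                        ("prohibits", prohibits)):
        for term in terms:
            # e.g. <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction"/>
            ET.SubElement(lic, f"{{{CC}}}{prop}", {f"{{{RDF}}}resource": CC + term})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(license_rdf(license_uri("by-sa", "3.0", "us"),
                      "Attribution-ShareAlike 3.0 United States",
                      permits=["Reproduction", "Distribution", "DerivativeWorks"],
                      requires=["Notice", "Attribution", "ShareAlike"]))
```

Looping over (code, version, jurisdiction) combinations with a helper like this is one way a batch of per-jurisdiction files could be produced.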

More Summer of Code: Liblicense, Tracker, Beagle,…

Jason Kivlighn, June 30th, 2007

Let’s see where I’m at.

Code in GStreamer to read the license URI is getting pushed through. There’s now Bug #451939, which updates the GStreamer API with license and copyright URI tags. Once this all gets pushed through, the license URI will be accessible through GST_TAG_LICENSE_URI and/or GST_TAG_COPYRIGHT_URI.

In Tracker, I’ve written code to handle generic indexing of embedded and sidecar XMP. Previously it extracted only the license; now any element can be pulled out and indexed. Currently, Dublin Core and CC elements are indexed. The code is still local and has yet to be committed.

In another direction, I’ve been lending a hand to liblicense. As mentioned in Scott’s previous post, I’ve got two I/O modules ready, both based on Exempi. One reads and writes license metadata directly in QuickTime, AVI, PDF, PNG, TIFF, and JPEG files. The other reads and writes sidecar XMP for any format. There’s more to come.
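
The liblicense sidecar module goes through Exempi in C. Purely to show what such a sidecar contains, here is a minimal Python sketch that writes a license URI as cc:license into an XMP file next to the media file. Two assumptions to flag: the sidecar is named by replacing the extension with .xmp (some tools append .xmp instead), and only cc:license is written, without the xpacket wrapper or other rights fields a full implementation might add.

```python
import os
from xml.sax.saxutils import quoteattr

# Minimal XMP packet carrying only cc:license (no xpacket wrapper).
XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:cc="http://creativecommons.org/ns#">
   <cc:license rdf:resource={uri}/>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
"""


def sidecar_path(media_path):
    """Assumed naming convention: photo.jpg -> photo.xmp."""
    return os.path.splitext(media_path)[0] + ".xmp"


def write_sidecar_license(media_path, license_uri):
    """Write a sidecar XMP file declaring license_uri; return its path."""
    path = sidecar_path(media_path)
    with open(path, "w", encoding="utf-8") as f:
        # quoteattr escapes the URI and wraps it in quotes for the attribute.
        f.write(XMP_TEMPLATE.format(uri=quoteattr(license_uri)))
    return path
```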

I also want to look into a liblicense config module and frontend for KDE4. I figure I can put my KDE programming experience to good use.

And in yet another direction, I’m looking into indexing licenses in Beagle. After browsing the code, I think I can apply most of what I learned about license metadata while working with Tracker to extending Beagle. I even noticed that their image format filters already support extracting XMP, so adding the extra license checks is straightforward. A preliminary patch and a request for feedback have been posted on their mailing list.

All in all, I’ve done some work here and there, for this project and that…

It’s coming! CC OpenOffice.org Add-in

ksiomelo, June 27th, 2007

As you may know, I am working on this project as part of the Google Summer of Code program. Before I started, François Dechelle was already involved, and now we are joining efforts to develop this potentially popular application!

The current prototype already adds the license to the body of the document.

The license wizard

After a license has been chosen, it becomes available as an AutoText entry, so it can easily be replicated in the document without going through the wizard again. The license name and URL are also stored in the document’s metadata.
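
OpenOffice.org documents are ZIP archives, and document metadata lives in meta.xml, where custom fields appear as meta:user-defined elements. Assuming the add-in stores the license name and URL as such user-defined fields (the post doesn’t say which mechanism or field names it uses), this minimal Python sketch reads them back:

```python
import zipfile
import xml.etree.ElementTree as ET

META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def user_defined_properties(odf_file):
    """Return {name: value} for each meta:user-defined field in an ODF file's meta.xml.

    `odf_file` may be a path or a binary file-like object (.odt/.ods/.odp are ZIPs).
    """
    with zipfile.ZipFile(odf_file) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    return {
        field.get(f"{{{META_NS}}}name"): field.text or ""
        for field in root.iter(f"{{{META_NS}}}user-defined")
    }
```

A call such as user_defined_properties("report.odt") returns a dict keyed by field name; which keys the add-in actually writes is not covered here.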

License text inserted

Currently I am working on a weird bug that prevents the OOo API from retrieving some of the license images available at http://i.creativecommons.org/l/ (it was working perfectly a few days ago…)

We’re going to commit a stable version ready for download in the next few weeks, but if you are eager to try it you can check out this folder in the cctools repository:

./ccooo/

Feedback and suggestions are welcome!

Cheers

Indexing License Metadata in Tracker, Week 2

Jason Kivlighn, June 19th, 2007

I’ve made progress extracting licenses from the following formats: Vorbis, MP3, FLAC, PDF, JPEG, TIFF, PNG, HTML, and MS Office. They are by no means all done, but for several formats I have patches and am awaiting approval from Tracker.

I’ve filed a GStreamer bug report and submitted a patch to allow reading the WCOP (License URI) ID3v2 tag. Discussion continues there.

No luck with video metadata (AVI, Matroska, OGM, QuickTime). Things are just too ad hoc in that arena to get anything worthwhile done. For Tracker, GStreamer does all the work of extracting video metadata, but as far as I can tell, nothing relating to licenses ever gets extracted and passed on to Tracker. GStreamer would need to be updated to read the tags, but that can’t be done unless there are consistent specs on how to do so. Exempi can embed XMP into MOV and AVI, but I don’t know how to get it back out. It may or may not be feasible to write an extractor that only extracts XMP using Exempi.

Information on various file formats’ metadata is available on the wiki at http://wiki.creativecommons.org/Tracker_CC_Indexing . While Tracker won’t specifically index every format mentioned, I’m trying to document the formats relevant to Creative Commons. If I’m missing any important formats, please let me know.

Overall, things are progressing well. At this rate, by the end of the summer I’ll have become a manual for file format specifications :-/

Cheers

Indexing License Metadata in Tracker, Week 1

Jason Kivlighn, June 12th, 2007

Week 1 of Google Summer of Code is complete and already I’m seeing much progress. There’s a mess of formats to embed licenses into and a mess of ways to embed them. My first task has been straightening out where licenses are embedded in each format and how exactly to go about extracting them. Here’s where I’m at:

MP3
  • Form of metadata: XMP; native ID3 tags
  • Location of metadata: for ID3v2.4, the PRIV,XMP field; the WCOP tag
  • Extraction with Tracker: Extracting MP3 tags has moved from an ID3 parser to handing off the work to GStreamer/MPlayer/Totem. As far as I can tell, this prevents me from extracting the XMP.
  • Test content: XMP embedded with Exempi

PDF
  • Form of metadata: XMP
  • Location of metadata: metadata field
  • Extraction with Tracker: Extend the current PDF extractor (which uses Poppler) to read the metadata field. However, reading the metadata field isn’t wrapped in Poppler’s GLib bindings, so I have written and submitted a patch.
  • Test content: XMP embedded with Exempi

OGG
  • Form of metadata: XMP; native comment field
  • Location of metadata: XMP comment field; LICENSE comment field
  • Extraction with Tracker: Extend the GStreamer extractor to check for the presence of an XMP comment field. GStreamer places this within the EXTENDED_COMMENTS tag (requires GStreamer 0.10.10).
  • Test content: XMP embedded with vorbiscomment

JPEG
  • Form of metadata: XMP
  • Location of metadata: Exif XML Packet field
  • Extraction with Tracker: Extend the ImageMagick extractor, using ‘convert file.jpg xmp:-’ to read XMP
  • Test content: XMP embedded with Exempi

PNG
  • Form of metadata: XMP
  • Location of metadata: iTXt chunk, XML:com:adobe:xmp keyword
  • Extraction with Tracker: Extend the PNG extractor, adding a check for XML:com:adobe:xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.)
  • Test content: XMP embedded with Exempi

HTML
  • Form of metadata: RDFa
  • Location of metadata: <a rel="license" href="…"></a>
  • Extraction with Tracker: Write a new HTML extractor, using libxml2, and scan for RDFa
  • Test content: various actual sites, including creativecommons.org

SVG
  • Form of metadata: RDF
  • Location of metadata: /svg/metadata/rdf
  • Extraction with Tracker: I could specifically parse the XML, checking for the RDF schema used by Inkscape. Should I also check for XMP?
  • Test content: Inkscape

Any XML
  • Form of metadata: XMP
  • Location of metadata: wherever valid
  • Extraction with Tracker: Write a generic XML extractor (and/or an extractor for each particular format), scanning with libxml2

OpenOffice.org (OASIS)
  • Form and location of metadata: the OO.org CC License Add-In SoC project is working on the spec
  • Test content: OO.org Add-In

MS Office
  • Form of metadata: DocumentSummaryInformation
  • Location of metadata: in-file CreativeCommons_LicenseURL property
  • Extraction with Tracker: Extend the existing msoffice extractor
  • Test content: MSOffice Add-in
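
The HTML entry boils down to finding links whose rel attribute includes license. Tracker’s extractor is planned around libxml2; as a stand-in, this minimal Python sketch uses the standard library’s html.parser and checks both <a> and <link> elements. It treats rel as a space-separated list of values and ignores all other RDFa constructs.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of <a>/<link> elements whose rel includes 'license'."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="license alternate".
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_values and href:
            self.licenses.append(href)


def find_licenses(html_text):
    """Return license URIs in document order."""
    parser = LicenseLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.licenses
```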

If this is all well and good, I’d like to help update the CC Wiki with the revised embedding specifications.

As far as coding goes, I wrote the code for Tracker to check for XMP sidecar files and extract metadata from them. XMP is parsed by Hubert’s XMP library, Exempi. The timing of Adobe’s release of their XMP Toolkit, and Hubert’s subsequent release of Exempi 1.99.x, has been an early boon to the project. The ‘license’ tag in the CC namespace is the only metadata extracted at the moment.
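
Tracker’s sidecar code relies on Exempi, but the lookup itself is simple enough to sketch in Python with only the standard library. This version assumes the sidecar is named by swapping the extension for .xmp (photo.jpg to photo.xmp) and returns the first cc:license it finds, whether written as an rdf:resource, as element text, or in the attribute shorthand that XMP allows on rdf:Description.

```python
import os
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC_LICENSE = "{http://creativecommons.org/ns#}license"


def read_sidecar_license(media_path):
    """Return cc:license from <basename>.xmp next to media_path, or None."""
    sidecar = os.path.splitext(media_path)[0] + ".xmp"
    if not os.path.exists(sidecar):
        return None
    root = ET.parse(sidecar).getroot()
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Shorthand form: <rdf:Description cc:license="..."/>
        if CC_LICENSE in desc.attrib:
            return desc.attrib[CC_LICENSE]
        for lic in desc.iter(CC_LICENSE):
            # Either <cc:license rdf:resource="..."/> or <cc:license>...</cc:license>
            value = lic.get(f"{{{RDF_NS}}}resource") or (lic.text or "").strip()
            if value:
                return value
    return None
```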

I’ve also been hacking on the extractors for the formats listed above to determine whether, and how, license metadata can be extracted from each.

Where I stand now: feedback on the above would be much appreciated, and if all is well I can push the XMP sidecar code into Tracker’s Subversion repository soon.

Happy hacking, indeed.

Summer of Code: “Converting biomedical text mining data to RDF, integrating results with existing Neurocommons RDF data, generating an RDFa-based web interface for presentation”

matthiassamwald, April 18th, 2007

About the project

The Neurocommons project wants to lay the foundation for a Semantic Web for neuroscience by creating resources that others will want to link to, extend, and build upon, and in so doing, to set an example that can be replicated in other scientific disciplines. One aspect of that goal is to make relationships in biomedical texts openly accessible on the Semantic Web.

The proposed project will add to the value of the Neurocommons project in two ways:

1) It will use the Whatizit text mining resource of the European Bioinformatics Institute as a source to generate RDF.
Whatizit’s text mining data are of high quality and provide rich information that complements existing text mining data from the Neurocommons project. The RDF derived from Whatizit will be integrated with the existing Neurocommons annotations, significantly increasing coverage. The software written will demonstrate a typical pipeline for converting text mining results to good-quality RDF.
The Whatizit service can mine either PubMed abstracts or free text supplied by a user. We will provide a simple interface so that open access scientific articles can be easily acquired and mined by the tool, and so that users with rights to non-open-access articles can mine that text and submit the resulting RDF, which, as knowledge, is not protected by copyright.

2) Software will be written to set up a website presenting the information derived from the Neurocommons text mining and the Whatizit-derived facts. Pages on this website will use RDFa for markup, integrating human-readable text and machine-readable markup in a single document.
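
As a rough sketch of point 2, here is how text-mining hits might be rendered as RDFa in Python. Everything specific is hypothetical: the (start, end, concept URI) annotation format stands in for whatever Whatizit actually returns, and the ex: prefix, the ex:mentions predicate and the example.org URIs are placeholders rather than Neurocommons vocabulary. Each annotated term becomes a span whose rel/resource attributes link the document to the concept, so one page carries both the readable sentence and the triples.

```python
from html import escape


def annotate_rdfa(text, annotations, doc_uri,
                  prefix="ex", vocab="http://example.org/vocab#",
                  predicate="ex:mentions"):
    """Return an RDFa <p> in which each annotated term links doc_uri to a concept.

    `annotations` is a list of non-overlapping (start, end, concept_uri)
    character spans; the prefix, vocabulary and predicate are placeholders.
    """
    parts, pos = [], 0
    for start, end, concept in sorted(annotations):
        parts.append(escape(text[pos:start]))  # plain text before the term
        parts.append(f'<span rel="{escape(predicate)}" resource="{escape(concept)}">'
                     f"{escape(text[start:end])}</span>")
        pos = end
    parts.append(escape(text[pos:]))  # trailing plain text
    return (f'<p xmlns:{prefix}="{escape(vocab)}" about="{escape(doc_uri)}">'
            + "".join(parts) + "</p>")
```

An RDFa-aware parser reading this output would yield one triple per span: the document URI, the placeholder predicate, and the concept URI.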

The resulting resource will not only be valuable for neuroscientific researchers; it will also serve as a model implementation that unifies text mining, Semantic Web standards and the philosophy of Science Commons / Creative Commons for the advancement of scientific research and global information exchange in general.

About myself

My name is Matthias Samwald and I come from Austria. I studied neurobiology at the University of Vienna from 2000 to 2005. My doctoral thesis, begun in 2005, focuses on the use of Semantic Web technologies in neuroscience and biomedicine. Since 2006 I have been a member of the World Wide Web Consortium (W3C) as an ‘invited expert’. I am an active participant in the W3C’s “Semantic Web in Health Care and Life Science Interest Group”.

My curriculum
My doctoral thesis and corresponding publications
