summer of code
Creative Commons Drupal Module — GSoC 2009
Blaise Alleyne, September 3rd, 2009
This past year was my last at the University of Toronto, making this summer my last chance to participate in the Google Summer of Code. I searched hard for a project and mentor organization that would suit my interests, and when I noticed that the Creative Commons Drupal module was in need of some developer love, I knew exactly what I wanted to spend my summer doing. With John Doig as my CC mentor, and Kevin Reynen (the module’s maintainer and initial author) as an unofficial Drupal mentor, I’ve been privileged to have spent the past few months updating and extending the module.
A couple years ago, development for Drupal 4.7 was begun, but it was never quite completed. CC Lite came to be the reliable choice for Drupal 6. However, CC Lite’s scope is limited — it allows you to attach a license to content in Drupal, but that’s about it. The main CC module’s vision is broader — to fully integrate CC technology with the Drupal platform — and I hope I’ve helped to realize that just a little.
Some of the module’s features:
- it uses the CC API for license selection and information (so, for example, when new license versions are released, they become available on your Drupal site automatically)
- you can set a site-wide default license/jurisdiction, and users can set their own default license/jurisdiction
- ccREL metadata is supported, output in RDFa (and, optionally, RDF/XML for legacy systems)
- supports CC0, along with the 6 standard licenses and the Public Domain Certification tool
- you can control which licenses and metadata fields are available to users
- basic support for the Views API has been added (including a default /creativecommons view)
- there’s a CC site search option
The module is still listed as a beta release, as some folks have been submitting bug fixes and patches over the past few weeks, though it’s quite usable. Special thanks to Turadg Aleahmad, who’s helped with a lot of the recent bug fixes towards the end of the GSoC term, and committed to being active in future development. If you’re into Drupal development, we could use help with testing, and any translations would be greatly appreciated too.
Right now, the focus is on getting to a stable release, but we’ve got lots of ideas for the future too. Thanks to John and Kevin for their support through the summer, and to Turadg for his recent help. I look forward to seeing the module put to good use!
I’m a musician, writer, software developer, free culture / free software advocate and recent graduate of the University of Toronto — get in touch at http://blaise.ca/
OpenOffice.org Add-in Updates – GSoC 2009
NiMaL, July 6th, 2009
I’ve been working on a GSoC 2009 project to update the existing Creative Commons Add-in for OpenOffice.org. The project is mentored by Nathan Yergler.
The main goal of this project is to provide the following updates to the existing plugin:
- Update the codebase to the OOo 3 SDK
- Refine the license selection UI to provide help around what each option means (“what is Share Alike”)
- Display license information when opening CC licensed documents
- Internationalization – prepare the code for translation and write the scripts to integrate PO files prepared by translators
- Support for OOo Draw
- Add support for CC0
- Make a release incorporating Flickr Image Re-Use for OpenOffice.org.
As there are many distinct tasks, I’m working to complete as many of them as I can. I have been struggling a bit with my progress over the past couple of weeks, but I’m now bouncing back and getting back on track.
I hope I’ll be able to present the CC community with an updated version of the Add-in at the end of the project.
New validator released!
asheesh, January 6th, 2009
This past summer, Hugo Dworak worked with us (thanks to Google Summer of Code) on a new validator. This work was greatly overdue, and we are very pleased that Google could fund Hugo to work on it. Our previous validator had not been updated to reflect our new metadata standards, so we disabled it some time ago to avoid creating further confusion. The textbook on CC metadata is the “Creative Commons Rights Expression Language”, or ccREL, which specifies the use of RDFa on the web. (If this sounds like keyword soup, rest assured that the License Engine generates HTML that you can copy and paste; that HTML is fully compliant with ccREL.) We hoped Hugo’s work on a new validator would let us offer a validator to the Creative Commons community so that publishers can test their web pages to make sure they encode the information they intended.
Hugo’s work was a success; he announced in August 2008 a test version of the validator. He built on top of the work of others: the new validator uses the Pylons web framework, html5lib for HTML parsing and tokenizing, and RDFlib for working with RDF. He shared his source code under the recent free software license built for network services, AGPLv3.
So I am happy to announce that the test period is complete, and we are now running the new code at http://validator.creativecommons.org/. Our thanks go out to Hugo, and we look forward to the new validator gaining some use as well as hearing your feedback. If you want to contribute to the validator’s development or check it out for any reason, take a look at the documentation on the CC wiki.
Loggy: some results
Ankit Guglani, September 2nd, 2008
So after setting up EC2 and S3, grabbing the files from S3, SCP-ing the Python scripts over and running them, one would expect to see some results. At Asheesh’s polite request, here is a sampler.
The first script (dealing with urls that change their license, named licChange.py) results in an output which lists the URLs (that change their license [type, version or jurisdiction]), the license info and the date(s) of change:
http://blog.aikawa.com.ar/ [['by-nc-sa', '2.5', 'ar'], ['by-nc-nd', '2.5', 'ar']] ['21/Sep/2007:11:38:56 +0000', '22/Sep/2007:05:40:22 +0000']
The line above shows that the license for the URL ‘http://blog.aikawa.com.ar/’ was changed from ‘by-nc-sa 2.5 Argentina’ to ‘by-nc-nd 2.5 Argentina’ some time between 11:38:56 GMT on the 21st of September 2007 and 05:40:22 GMT on the 22nd of September 2007. The format may seem a bit awkward, but you can expect a facelift for the results file. I was previously planning to re-read the file to generate statistics, but we can have a separate file for storing data and another one for the stats.
Similarly, the following lines out of the results file for licChange.py from 2007-09 show license changes for ‘http://0.0.0.0:3000/’ and ‘http://127.0.0.1/actibands/castellano/licencias.htm’ and many other internal URLs:
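For anyone who wants to post-process the results file in its current shape, here is a minimal sketch of reading one such line back into Python values. It assumes every line looks exactly like the sample above (URL, a list of license triples, a list of timestamps, separated by single spaces); the function name parse_result_line is made up for illustration and is not part of licChange.py.

```python
import ast
from datetime import datetime

def parse_result_line(line):
    """Split one licChange-style result line into URL, licenses and timestamps."""
    url, rest = line.strip().split(" ", 1)
    # Assumption: the license list ends with "]]" and is followed by " [".
    split_at = rest.index("]] [") + 2
    licenses = ast.literal_eval(rest[:split_at])
    raw_dates = ast.literal_eval(rest[split_at:].strip())
    # Timestamps use the day/month/year:time zone layout seen in the sample.
    dates = [datetime.strptime(d, "%d/%b/%Y:%H:%M:%S %z") for d in raw_dates]
    return url, licenses, dates

line = ("http://blog.aikawa.com.ar/ "
        "[['by-nc-sa', '2.5', 'ar'], ['by-nc-nd', '2.5', 'ar']] "
        "['21/Sep/2007:11:38:56 +0000', '22/Sep/2007:05:40:22 +0000']")
url, licenses, dates = parse_result_line(line)
```

ast.literal_eval is used instead of eval so that only plain Python literals are accepted from the file.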
http://0.0.0.0:3000/ [['by-nc-sa', '3.0', ''], ['by-nc-sa', '3.0', ''], ['by-nc-sa', '3.0', ''], ['by-nc-sa', '3.0', ''], ['by-nc-sa', '3.0', ''], ['by-nc-nd', '3.0', 'nl'], ['by-nc-nd', '3.0', 'nl']] ['17/Sep/2007:08:10:28 +0000', '17/Sep/2007:17:50:28 +0000', '18/Sep/2007:16:25:47 +0000', '19/Sep/2007:13:03:23 +0000', '19/Sep/2007:13:11:16 +0000', '20/Sep/2007:22:16:09 +0000', '20/Sep/2007:22:16:39 +0000']
http://127.0.0.1/actibands/castellano/licencias.htm [['by-sa', '2.5', 'es'], ['by-nc-sa', '2.5', 'es'], ['by-sa', '2.5', 'es'], ['by-nc-sa', '2.5', 'es'], ['by-sa', '2.5', 'es'], ['by-nc-sa', '2.5', 'es']] ['27/Sep/2007:20:50:44 +0000', '27/Sep/2007:20:50:44 +0000', '27/Sep/2007:20:51:00 +0000', '27/Sep/2007:20:51:00 +0000', '27/Sep/2007:20:51:23 +0000', '27/Sep/2007:20:51:23 +0000']
The by-nc-nd licenses for http://0.0.0.0:3000/ are ported for the Netherlands (nl) and the ones for http://127.0.0.1/actibands/castellano/licencias.htm are ported for Spain (es). Note that presently every occurrence of any URL that changes its license is output; this will be changed in the next nightly build. That build will include a better-formatted results file with stats on the total number of URLs changing licenses, and even stats distinguishing license changes from version changes.
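One possible way to keep only the real changes and to tell a license change from a version change is sketched below. This is an assumption about how the planned cleanup could work, not the code in licChange.py; the function name and the three labels ("license", "version", "jurisdiction") are invented for the example.

```python
def license_changes(licenses, dates):
    """Return (date, old, new, kind) for each point where the license differs
    from the previous occurrence; repeated identical entries are skipped."""
    changes = []
    for index in range(1, len(licenses)):
        old, new = licenses[index - 1], licenses[index]
        if old == new:
            continue
        if old[0] != new[0]:
            kind = "license"        # e.g. by-nc-sa -> by-nc-nd
        elif old[1] != new[1]:
            kind = "version"        # e.g. 2.5 -> 3.0
        else:
            kind = "jurisdiction"   # e.g. unported -> nl
        changes.append((dates[index], old, new, kind))
    return changes

# The http://0.0.0.0:3000/ entry from the results file above.
licenses = [['by-nc-sa', '3.0', '']] * 5 + [['by-nc-nd', '3.0', 'nl']] * 2
dates = ['17/Sep/2007:08:10:28 +0000', '17/Sep/2007:17:50:28 +0000',
         '18/Sep/2007:16:25:47 +0000', '19/Sep/2007:13:03:23 +0000',
         '19/Sep/2007:13:11:16 +0000', '20/Sep/2007:22:16:09 +0000',
         '20/Sep/2007:22:16:39 +0000']
changes = license_changes(licenses, dates)
```

For this entry the seven occurrences collapse to a single change, dated at the first by-nc-nd occurrence.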
Akin to this (licChange.py) there are 3 more scripts, licChooser.py, licSearch.py and deedLogs.py.
licChooser.py grabs metadata usage information and generates stats in absolute numbers and percentage of all entries, eg.: “16 out of 100 items are tagged as Audio [16%] of total entries and 29% of items with Metadata”
licSearch.py grabs information from the logs for search.creativecommons.org like the query, the engine and the search options (commercial use and derivatives).
deedLogs.py looks at the logs for the deed pages, employs MaxMind GeoIP to do a location lookup and grabs the deed page being looked at.
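To make the two percentages in the licChooser.py example concrete, here is a rough sketch of the arithmetic. The function name and the counts are made up for illustration (they were picked so that the Audio row reproduces the 16% and 29% quoted above), and it assumes each item carries at most one tag, so the number of items with metadata is the sum of the tag counts.

```python
def usage_stats(tag_counts, total_entries):
    """Map each tag to (count, % of all entries, % of entries with metadata)."""
    # Assumption: one tag per item, so tagged items = sum of all tag counts.
    with_metadata = sum(tag_counts.values())
    stats = {}
    for tag, count in tag_counts.items():
        pct_total = 100.0 * count / total_entries if total_entries else 0.0
        pct_meta = 100.0 * count / with_metadata if with_metadata else 0.0
        stats[tag] = (count, pct_total, pct_meta)
    return stats

# Hypothetical counts, not taken from the real logs.
stats = usage_stats({"Audio": 16, "Video": 20, "Text": 19}, total_entries=100)
for tag, (count, pct_total, pct_meta) in stats.items():
    print("%d out of 100 items are tagged as %s [%.0f%%] of total entries "
          "and %.0f%% of items with Metadata" % (count, tag, pct_total, pct_meta))
```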
So this is what we have so far.
>>> py >> file … Also if __name__ == ‘__main__’:
Ankit Guglani, September 1st, 2008
Some major updates: the scripts are now running. Thanks to Asheesh for the redirection idea. It works, but I couldn’t get it to give me a progress bar, since everything was being redirected to the file. I tried using two different functions, but they needed a shared variable, so that failed. Still, it was worthwhile, since I ended up with “real” Python files with a main().
The journey was interesting: we went from trying >> inside Python, to including # -*- coding: UTF-8 -*- and # coding: UTF-8 to get it to work, and after a few more bumps finally figured out the __main__ idiom.
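One way to keep a progress indicator while redirecting results with >> is to write the results to stdout and the progress to stderr, since a plain >> only redirects stdout. The sketch below shows that idea together with the __main__ guard; it is not the Loggy code, the function names are placeholders, and the "parsing" step is just a pass-through.

```python
import sys

def process(lines, out=sys.stdout, progress=sys.stderr):
    """Write one result per input line to `out` and a counter to `progress`."""
    total = len(lines)
    for done, line in enumerate(lines, start=1):
        out.write(line.rstrip("\n") + "\n")      # stand-in for the real parsing
        progress.write("\rprocessed %d/%d" % (done, total))
    progress.write("\n")
    return total

def main(paths):
    """Process every log file named on the command line."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            process(handle.readlines())

if __name__ == "__main__":
    # Runs only when executed as a script, not when imported as a module.
    main(sys.argv[1:])
```

Run as python script.py access.log >> results.txt (file names hypothetical), the results land in the file while the counter stays visible in the terminal.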
I still need to update all the scripts, but licChange, which is at the forefront of all the latest developments, just got bumped up to version 8.2 (which reminds me of a dire need to update GIT:Loggy!).
This also gave me an idea of how to go about getting data out of S3 for “free” … S3 to EC2 is free … SCP from EC2 is free and voila! Why would I ever want to do that? Well, for starters, the EC2 AMI runs out of space around 5 GB (note: logs for i.creativecommons.org are 4.7 GB) and secondly, the scripts seem to run faster locally. The icing on the cake: I wouldn’t have to scp the result files being generated. I could possibly automate the process of running the scripts.
That’s all for now … class at 0830 hrs in the morning (it’s criminal, I know).
I guess, I’ll just have to keep at it.
EC2, S3Sync and back to Python.
Ankit Guglani, August 31st, 2008
So this is where we are.
Now we have EC2, we have the S3Sync Ruby scripts on the EC2 AMI to pull the data from S3, and we have updated Python scripts that read one line at a time and use Geo-IP (which was surprisingly easy to install once GCC was functional and the right versions of the C and Python modules were obtained). So deployment is at full throttle; one final bug fix for generating the final results and we are done.
So, now back to the Python code. Now we have 4 scripts:
- License Change (Logs for i.creativecommons.org) [Version 7]
- License Chooser (Logs for creativecommons.org) [Version 5]
- CC Search (Logs for search.creativecommons.org) [Version 4]
- Deeds (Logs for creativecommons.org/licenses/*) [Version 2]
Each of these polls a directory for new logs, reads each new log in that directory line by line, and uses regular expressions to parse the information into usable statistics. Throughout the development phase, the results were written to stdout / the console. With deployment, they now need to be written to a file, which, interestingly, is still to be resolved. (TypeError: 'str' object is not callable sound familiar to anyone?)
I am grateful to Asheesh (whom I should have totally bugged more). I should’ve put more work into the project when vacationing back home; having less to do at school would’ve helped too (studies + 3 research projects is not a recommended workload), but if it were easy, it wouldn’t be fun! Oh well, I learnt a fair bit through the project, and with a bit more troubleshooting we’d be good to go … for now!
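The poll / read line by line / parse / write-to-file loop can be sketched roughly as follows. This is an illustration under assumptions, not the Loggy scripts themselves: the regular expression assumes Apache "combined"-style log lines (the real log format may differ), and the names LOG_LINE, new_logs and process_logs are invented here. As an aside, one common cause of that TypeError in general (not necessarily the cause here) is rebinding a name such as file or str to a string and then calling it.

```python
import os
import re

# Assumption: Apache "combined"-style lines; adjust to the real log format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)"'
)

def new_logs(directory, seen):
    """Return paths of files in `directory` whose names are not in `seen`."""
    names = sorted(n for n in os.listdir(directory) if n not in seen)
    return [os.path.join(directory, n) for n in names]

def process_logs(directory, results_path, seen):
    """Parse each unseen log line by line and append matches to a results file.

    `results_path` should live outside `directory`, otherwise the results
    file would itself be picked up as a log on the next poll.
    """
    matched = 0
    with open(results_path, "a", encoding="utf-8") as results:
        for path in new_logs(directory, seen):
            with open(path, encoding="utf-8", errors="replace") as log:
                for line in log:  # one line at a time, never the whole file
                    m = LOG_LINE.match(line)
                    if m:
                        results.write("%s %s %s\n"
                                      % (m["referrer"], m["path"], m["when"]))
                        matched += 1
            seen.add(os.path.basename(path))
    return matched
```

Opening the results file in append mode keeps earlier results when the function is called again on the next poll.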
License-oriented metadata validator and viewer: summertime is winding up
Hugo Dworak, August 16th, 2008
Google Summer of Code 2008 is approaching its end, with less than forty-eight hours left to submit the code that will then be evaluated by mentors. It is therefore fitting to pause for a moment, sum up the work that has been done on the license-oriented metadata validator and viewer, and compare it with the original proposal for the project.
A Web application capable of parsing and displaying license information embedded in both well-formed and ill-formed Web pages has been developed. It supports the following means of embedding license information: Dublin Core metadata, RDFa, RDF/XML linked externally or embedded (utilising the data URL scheme) using the link and a elements, and RDF/XML embedded in a comment or as an element (the last two being deprecated). This functionality has been proven by unit testing. The source code of a Web page can be uploaded or pasted by a user; it is also possible to provide a URI for the Web application to analyse. The software has been written in Python and uses the Pylons Web Framework and the Genshi toolkit. Should you be willing to test this Lynx-friendly application, please visit its Web site.
The Web application itself uses a library called “libvalidator”, which in turn is powered by cc.license (a library developed by Creative Commons that returns information about a given license), pyRdfa (a distiller that generates the RDF triples from an (X)HTML+RDFa file), html5lib (an HTML parser/tokenizer), and RDFLib (a library for working with RDF). The choice of this set of tools was not obvious, and the library underwent several redesigns, which included removing the code that employed encutils, XML canonicalization, µTidylib, and BeautifulSoup. The idea of using librdf, librdfa, and rdfadict has been abandoned. The source code of both the Web application (licensed under the GNU Affero General Public License version 3 or newer) and its core library (licensed under the GNU Lesser General Public License version 3 or newer) is available through the Git repositories of Creative Commons.
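As a rough illustration of the simplest of these cases, the sketch below pulls the targets of rel="license" links out of a page using only the Python standard library. It is not how the validator works (the application relies on dedicated RDFa and RDF tooling and handles far more cases); the class and function names are invented for this example.

```python
from html.parser import HTMLParser

class LicenseLinkParser(HTMLParser):
    """Collect href values of <a>/<link> elements whose rel includes 'license'."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rels = (attributes.get("rel") or "").lower().split()
        if "license" in rels and attributes.get("href"):
            self.licenses.append(attributes["href"])

def find_license_links(html):
    """Return license URLs found in `html`, tolerating unclosed tags."""
    parser = LicenseLinkParser()
    parser.feed(html)
    return parser.licenses

page = ('<p>Licensed under <a rel="license" '
        'href="http://creativecommons.org/licenses/by/3.0/">CC BY</a>'
        '<p>an unclosed paragraph')
links = find_license_links(page)
```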
In contrast to the contents of the original proposal, the following goals have not been met: traversal of special links, syndication feeds parsing, statistics, and cloning the layout of the Creative Commons Web site. However, these were never mandatory requirements for the Web application. It is also worth noting that the software has been written from scratch, although a now-defunct metadata validator existed. Nevertheless, the development does not end with Google Summer of Code — these and several new features (such as validation of multimedia files via liblicense and support for different language versions) are planned to be added, albeit at a slower pace.
After the test period, the validator will be available at http://validator.creativecommons.org/.
Flickr Image Re-Use for OpenOffice.org new updates
Mihai Husleag, August 12th, 2008
Since my last article, new functionality has been implemented:
- more results per page (16, to be exact)
- an image is inserted when you double-click it (previously it was a single click)
- I added the functionality to Impress and Calc
- fixed some bugs related to search
Unfortunately, I have a problem with the popup menu on right click. It seems that if I set the location of the popup to the place where the right click happens, the popup does appear, but only for a moment. This does not happen for all 16 results, but for, let’s say, more than half.
I have now found some settings that make the popup appear for every result; unfortunately, the popup does not appear exactly on the result (it is slightly above). I have to work more on this.
Some screenshots:
Download extension (right click and save as)
GeoIP Hates Me … phail.
Ankit Guglani, August 6th, 2008
Not that I am expecting much trouble coding with the Geo-IP module, but trying to get it onto the system has me believing that this module is out to get me! First, Mac OS X (Leopard) doesn’t come with GCC installed (shocker!) and this module needs building, so I go to get it. GCC is packaged with the developer tools, which are about a 2 GB install, and I can’t hand-pick the components … fail. So I go get myself DarwinPorts and try that route. It installs, gives me the sweet *ding* install-complete sound, and when I go to the terminal … fail … no such file or directory. So I give in to its terrorist demands and make room for the developer tools, thinking I’ll make up for it by actually using them. I wait 19 minutes for the install to complete, check that I have GCC [i686-apple-darwin9-gcc-4.0.1] … and happily run python setup.py build … and what followed was not nice … a screen full of warnings and errors and no build. =(
I am going to find another source and try again till it finally works!
In other news, I am changing all my code into methods and appending results to a file, and I am looking to add file-list comparison as a feature. Coming soon to a GIT repository near you!
Flickr Image Re-Use for OpenOffice.org Demo available
Mihai Husleag, July 12th, 2008
Never trust a programmer when he gives you a date for something to be done. That’s what I did in my last article (two weeks, I think I said then), and here we are a month later.
What has been done since my last article:
- right-clicking a result (image) shows a popup menu with the sizes available on the Flickr server
- left-clicking a result inserts the image directly into Writer, with medium as the default size
- once the image is inserted, some text is added beside it (title, link to the image, license and link to the license)
- I improved searching and the way the image is added to Writer
- I added a friendlier interface for searching by license (similar to the advanced search on the Flickr website)
- added a previous button to see previous results if needed
- if you insert an image, the previous search is run again immediately when you reopen the extension (at exactly the same position if you used the previous or next buttons)
- about searching: you can enter multiple tags (separated by a space: “ ”), and the relation between them is AND. The results are ordered by interestingness
- a progress bar was added
- after installation, the extension can be found under Insert \ Picture \ From Flickr …
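The search behaviour in the list above (several space-separated tags combined with AND, results ordered by interestingness, optional license filter) maps fairly directly onto the parameters of Flickr's flickr.photos.search method. The sketch below only builds the request URL and is written in Python for brevity; it is not the extension's code, the function name and the default page size are arbitrary choices for the example, and you would need your own API key.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

def build_search_url(api_key, query, license_ids=(), page=1, per_page=16):
    """Build a flickr.photos.search URL for space-separated tags (AND-ed)."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": ",".join(query.split()),   # Flickr expects comma-separated tags
        "tag_mode": "all",                 # "all" means AND, "any" means OR
        "sort": "interestingness-desc",
        "page": page,                      # lets previous/next buttons page through
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,
    }
    if license_ids:
        # Numeric Flickr license ids, comma-separated.
        params["license"] = ",".join(str(i) for i in license_ids)
    return FLICKR_REST + "?" + urlencode(params)

url = build_search_url("YOUR_API_KEY", "sunset beach", license_ids=[4, 5], page=2)
```

Keeping the page number as a parameter is also what makes it easy to restore the previous search at the same position when the dialog is reopened.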
Some screenshots:
The results from a search by the extension vs Flickr search
I would also like to add that this extension, at the moment, works only in Writer.
Download (right click and save as)
Any suggestions or remarks are greatly appreciated.