Upgrade to Debian Squeeze and Mediawiki woes

Nathan Kinkade, February 10th, 2011

Just a few days ago Debian released Squeeze as its new stable version. I decided to test the upgrade on one or two of CC’s servers to see how it would go. The upgrade process was standard and went without problems, as one comes to expect with Debian. Nothing seemed wrong until I noticed that one of our sites running on MediaWiki had apparently broken.

I narrowed the problem down to several extensions. Upgrading to Squeeze brought in a new version of PHP, taking it from 5.2.6 (in Lenny) to 5.3.3. PHP was emitting warnings in the Apache logs like:

Warning: Parameter 1 to somefunction() expected to be a reference, value given in /path/to/some/file.php on line ##

Looking at the PHP code in question didn’t immediately reveal the problem to me. I finally stumbled across PHP bug 50394. A specific comment on that bug revealed that the issues I was seeing were not a bug, necessarily, but the result of the way PHP 5.3.x handles a specific form of incorrect coding.

In summary, it turns out the problem is related to MediaWiki hooks and their use of the PHP built-in function call_user_func_array(). The function takes two arguments: a callback (such as a function name) and an array of arguments. If the called function expects some of its arguments to be passed by reference, then the corresponding elements of the passed array must be explicitly marked as references. For example, this is correct:

function lol( &$var1, $var2 ) { /* do something */ }
$a = 'foo';
$b = 'bar';
$args = array( &$a, $b ); // &$a matches the by-reference parameter $var1
call_user_func_array( 'lol', $args );

However, you will get a PHP warning, and a subsequent failure of call_user_func_array(), if $args is defined like (missing the & before $a):

$args = array( $a, $b );

Interestingly, the “correct” way of handling this case, where the callback function expects referenced variables, also happens to be deprecated, as a form of call-time referencing, and the call_user_func_array() documentation states this:

Referenced variables in param_arr are passed to the function by reference, regardless of whether the function expects the respective parameter to be passed by reference. This form of call-time pass by reference does not emit a deprecation notice, but it is nonetheless deprecated, and will most likely be removed in the next version of PHP.

As far as I can tell, this deprecated method is the only way to handle this, yet PHP may drop this functionality. Presumably another method will replace it before that happens, but the ambiguity at the moment leaves one wondering how to properly code for this without risking that the code will break in a future release of PHP. I suppose the only sure way is to make sure that your call-back doesn’t require or need any referenced variables. I’d be happy for someone to point me to the right way to handle this, if for some reason my research just failed to produce the correct method.
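The reference-passing form described above can be sketched as a complete script. This is only an illustration: shout() and its arguments are made-up names, not code from any of the affected extensions.

```php
<?php
// Hypothetical callback: the first parameter is declared by reference,
// so changes to it should be visible to the caller.
function shout( &$text, $suffix ) {
    $text = strtoupper( $text ) . $suffix;
}

$a = 'foo';
$b = '!';

// Mark the first element as a reference so it matches the
// by-reference parameter in shout().
$args = array( &$a, $b );
call_user_func_array( 'shout', $args );

echo $a, "\n"; // FOO!
```

Because the first array element is a reference to $a, the change made inside the callback shows up in the caller's variable.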

I found this breakage in the following modules, but presumably it exists in many more:


The fix for the ReCAPTCHA extension was easy, since it’s published on the extension’s page. For the other extensions, I investigated the places where this problem was occurring and removed the references from the function definitions, but not before poking around a bit to make reasonably sure that the references weren’t fully necessary.
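One reason dropping the & can be safe: in PHP 5, objects are passed around as handles, so a handler can still change an object's state without a by-reference parameter. Here is a minimal sketch, assuming the handler only modifies the object it receives. The Page class and renamePage() are hypothetical names used for illustration.

```php
<?php
// Hypothetical stand-in for an object that a hook might receive.
class Page {
    public $title = 'Draft';
}

// No & on $page: the handler still operates on the caller's object.
function renamePage( $page, $newTitle ) {
    $page->title = $newTitle;
    return true;
}

$page = new Page();

// A plain value array is enough here; renamePage() declares no
// by-reference parameters, so nothing needs to be marked with &.
call_user_func_array( 'renamePage', array( $page, 'Published' ) );

echo $page->title, "\n"; // Published
```

This only holds when the handler changes the object's state. A handler that assigns a whole new value to the parameter still needs the reference.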

Lesson: use caution when doing any upgrade that moves you from PHP 5.2.x or earlier to 5.3.x or later. Google searches reveal that this issue is rife not only in MediaWiki, but also in Joomla!, and presumably in any other CMS or framework that makes use of call_user_func_array().


Technical case studies now on the wiki

Alex Kozak, November 18th, 2010

Have you ever tried to implement CC licensing into a publishing platform? Would it have been helpful to know how other platforms have done it?

I’ve just added a collection of technical case studies on the CC wiki looking at how some major adopters have implemented CC. The studies look at how some major platforms have implemented CC license choosers, the license chooser partner interface, CC license marks, search by license, and license metadata.

Several of the case studies are missing some more general information about the platform, so feel free to add your own content to the pages. Also, everyone is welcome to add their own case studies to the CC wiki.

Here is a list of new technical case studies:

No Comments »

Case study of a simple but highly effective use of Semantic MediaWiki on the CC Wiki

Alex Kozak, September 8th, 2010

Whether it’s setting up the columns of a spreadsheet or defining a data structure in Python, any project that involves gathering structurable data for some purpose requires technical support. Choosing the right technical tool for the job involves careful consideration of your requirements while acknowledging your constraints.

At Creative Commons, we never run out of ideas for useful collections of information. We’re always looking for ways to highlight interesting uses of our legal and technical tools, to approximate the impact that our tools are having, and to better engage our variety of user bases and communities. But while we have a lot of exciting ideas for new datasets, we don’t always have the resources or infrastructure to build new data collection and management tools for each of those projects.

As a solution to this constraint, we rely heavily on Semantic MediaWiki on the CC Wiki to manage various datasets related to Creative Commons. For the uninitiated, Semantic MediaWiki is an extension to MediaWiki, the popular open source wiki platform that powers Wikipedia (all of the extensions discussed here are also open source). Semantic MediaWiki adds powerful organizational tools to your MediaWiki installation: data queries, data I/O, and powerful methods for page organization and collection. When combined with helper extensions such as Semantic Forms and Semantic Drilldown, it also provides user-friendly template call creation and data browsing.

One example of an effective use of Semantic MediaWiki, which recently underwent some maintenance, is the Case Studies database on the CC wiki. The Case Studies database uses Semantic MediaWiki and Semantic Forms to collect, annotate, and aggregate data contained on wiki pages about uses of Creative Commons licenses from around the world. For an example page, see the Case Study on Cory Doctorow.

Each Case Study page contains two basic elements: A template call and free text. The free text is unstructured and typically contains no semantic annotations (unless provided by the user). When you create a new wiki page in MediaWiki, you’re just editing the page text. In the form to create a page, we’ve set it up to pre-populate the free text with some suggested structure for the Case Study, but otherwise the free text is a blank slate for whatever content the contributor wants to provide.

The template call is where all of the semantic annotations and interesting data queries are enabled. In each Case Study, the “Case Study” template is called with parameters defined in that template. The strings in each template call parameter get assigned to semantic properties and processed for rendering (e.g. to turn a string into a link to a wiki page if it exists). There can be many arbitrarily named parameters for any MediaWiki template, and it wouldn’t be easy for anyone to add a Case Study if they had to know the proper parameters for the template. Thanks to Semantic Forms, we’re able to construct forms for users to fill out that then construct a template call and free text on a page.

But you might ask: What are the qualities of Semantic MediaWiki that make it useful for my projects?

Semantic MediaWiki enables “view source” for databases. This means that all of the templates, property pages, forms, drilldown filters, and pages are viewable and editable, with a complete page history for each. That is, the markup defining the database is editable by anyone. Of course, the pages could be protected from edits, but in general the markup is at least accessible. This opens up the possibility of user-driven development and rapid feedback. You might define a data structure for a community of users and come back to find that it’s been modified to be more useful for its application without you having been involved at all.

Any registered user on the CC Wiki can create a wiki page, and thus any user can contribute an item to the Case Studies database. Because Semantic Forms just creates or modifies template calls on pages, each page constructed or edited with Semantic Forms shows up on Recent Changes, and the complete page history is available for review. In this respect, the data collection process in a Semantic MediaWiki database is transparent. This is important for most kinds of data, since the data you collect will usually either require some subjective assessment or be meant to represent facts. For projects for which you expect a large contributor base, you can expect with near-certainty that someone will eventually make a subjective assessment that diverges from common sense, or will add data that misrepresents some important fact. In either case, having a transparent data collection process mitigates the risk of bad data. This holds true for pages outside the main namespace as well (template pages, property pages, forms, etc.).

Lastly, the database structure is highly mutable. In many data collection efforts, particularly those with some idea how the data will be analyzed or applied, the process of gathering data informs the data you collect. For example, in collecting case studies of Creative Commons licenses, you might find that almost all of them fit into a few media types. With Semantic MediaWiki, it becomes trivial to create a new field in a form and associated property in the template, or if you have an existing structure for that type of data, to modify the kinds of data that property accepts. You could even change the allowed values for a property or change the data type and easily fix any incompatibilities that arise.

For example, we recently decided to add a method for Case Studies evaluation to the database. All it required was creating a partial form using Semantic Forms that populates two new property mappings in the template (Quality and Importance). This new form just contains two drop-down menus that let users select Quality or Importance values for the page and save that data back into the template call on the page. SMW allowed us to extend the data we collected on each page. Additionally, halfway through the development process we decided to use a different metric for quality. It was trivial to change the list of allowed values on the property page for that property and then query the existing data for pages needing updating to the new metric.

In short, Semantic MediaWiki is a powerful tool that allows rapid, decentralized development of complex databases with minimal investment in technical infrastructure. It’s also a way to create a truly collaborative database that is an asset to you and to your community.

1 Comment »

RDFa for Semantic MediaWiki [GSoC 2008]

David McCabe, July 1st, 2008

Hello, world!

My name is David McCabe, and this summer I am adding RDFa support to Semantic MediaWiki, as part of the Google Summer of Code 2008. I am an undergraduate in Mathematics at Portland State University. For the Google Summer of Code 2006, I wrote Liquid Threads, a MediaWiki extension that replaces talk pages with a threaded discussion system.

Semantic MediaWiki (SMW) is the software used for the CC wiki and many other wikis. SMW allows authors to mark up wiki pages so that their contents and relationships are machine-readable. SMW already publishes this machine-readable data in RDF/XML format.

You can read about RDFa on the CC Wiki. There is also a Google Tech Talk on RDFa.


Semantic Annotations on CC Wiki

Thierry Kennes, August 9th, 2007

We have just implemented Semantic MediaWiki on our wiki. SMW allows additional markup in the wikitext and improves the overall quality and consistency of the wiki. It may appear to make things more complicated, but it actually makes it easier for users to find more information. Using SMW’s own inline querying tools, a page can be created that lists almost anything you want.

From now on, when you add content, please use semantic annotations. We have created special pages that will help you do that easily.
At the moment, you can find instructions for Books, Content Curators and Content Registry.
Also, do not hesitate to use our forms; they are much easier.


OpenID on the CC Wiki

Thierry Kennes, July 26th, 2007

Creative Commons’ wiki is now an OpenID-enabled site. If you don’t have an OpenID account yet, don’t hesitate to create one. There are several OpenID providers, so just choose one from the list below:

VeriSign Personal Identity Provider

A really nice screencast by Don McAllister explaining how to use OpenID can be viewed here.

Edit:  Another screencast from Simon Willison


Semantic Videowiki

Mike Linksvayer, July 6th, 2007

Don’t get too excited by the post title; I’m just pointing out a nice 5-minute video (mp4) from Oxford Geek Nights explaining the basic features of Semantic MediaWiki. Thierry is experimenting with SMW features on the CC Wiki.



Mike Linksvayer, May 10th, 2007

Today’s release of MediaWiki 1.10 reminds me that the CC Wiki is badly in need of an upgrade (from 1.6). Here’s the todo list:

  • Upgrade to MediaWiki 1.10
  • Add more spam prevention, remove barriers
    • possibly no image upload on new accounts?
    • add captcha?
    • remove need for email confirmation?
  • Install OpenID extension
  • Install Semantic MediaWiki extension

That’s in addition to migrating and adding lots more content there and doing a theme refresh.

The release also reminds me to look at an old patch to AJAX-ify the Creative Commons license selection option in the installer, make it work in 1.10, and try to get it into the mainline.

skin available

Nathan Yergler, April 23rd, 2007

The skin for Creative Commons’ wiki is now available from Subversion. You can find decidedly minimalist details (where else) in the wiki.
