Using wget to login to Mediawiki

nkinkade, April 30th, 2011

For a couple of years CC has been using the Pywikipediabot to do a few small operations on a password-protected, private installation of Mediawiki. The bot used to create a basic page, people would then be asked to add information to that page, and a few days later the bot would email the contents of the page to a group of people.

As of today we are no longer using Pywikipediabot to create a page, but only to mail the contents of a page. It occurred to me that Pywikipediabot was really overkill for such a small task, so I decided to write a simple shell script using wget instead. My initial thought was to use the Mediawiki API, but all the documentation I found indicated that if one merely wants the content of a page, one should pass the action query parameter to index.php, as in /SomeArticle?action=raw. It wasn’t even clear to me that there would be a way to accomplish what I wanted via the API without having to parse an XML response (there may be, I just didn’t readily find it).
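As a rough illustration of the action=raw form mentioned above, here is a small helper that builds such a URL. The function name raw_url and the wiki.example.org address are made up for this sketch, and it assumes the wiki serves pages directly under the base address, as in /SomeArticle.

```shell
#!/bin/sh
# Hypothetical helper: build a URL that asks Mediawiki for the raw text of a page.
# Assumes pages are reachable as <base>/<Title>; adjust if the wiki is set up differently.
raw_url() {
    base=$1     # wiki base address, with or without a trailing slash
    title=$2    # page title, with spaces already written as underscores
    printf '%s/%s?action=raw\n' "${base%/}" "$title"
}

# Usage (needs network access, and a logged-in cookie jar for a private wiki):
#   wget -q -O page.txt --load-cookies cookies.txt \
#        "$(raw_url https://wiki.example.org SomeArticle)"
```

Keeping the URL construction in one place makes it easy to switch to a different layout if the wiki does not use short page addresses.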

So I decided to use wget to work with the normal user interface of Mediawiki, but I didn’t quickly find any good information on how to go about this, or what I found was outdated and no longer worked. I’m posting this here in case it could be useful to anyone else. Here is the basic idea:


MAIL_FROM="'John Q. Public' <>"

MW_LOGIN="Some Login"

# Mediawiki uses a login token, and we must have it for this to work.
WP_LOGIN_TOKEN=$(wget -q -O - --save-cookies cookies.txt --keep-session-cookies

                                     | grep wpLoginToken | grep -oE '[a-z0-9]{32}')

wget -q --load-cookies cookies.txt --save-cookies cookies.txt --keep-session-cookies 
        --post-data "wpName=${MW_LOGIN}&wpPassword=${MW_PASSWD}

wget -q -O email_body.txt --load-cookies cookies.txt 

cat email_body.txt | mail -s "${MAIL_SUBJECT}" -a "From: ${MAIL_FROM}" ${RCPT_TO}
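For anyone who prefers to pass the addresses in rather than hard-code them, the same steps can be sketched as a few small functions. This is only an outline under some assumptions: the function names are invented here, every URL is supplied by the caller, the example.org addresses in the usage comment are placeholders, and your version of Mediawiki may expect more form fields than the three shown.

```shell
#!/bin/sh
# Sketch only. Nothing here runs until fetch_private_page is called.

# Read login-form HTML on stdin and print the first 32-character lowercase
# alphanumeric string found on a line that mentions wpLoginToken.
extract_login_token() {
    grep wpLoginToken | grep -oE '[a-z0-9]{32}' | head -n 1
}

# Build the POST body for the login form. Values must already be URL-encoded
# (a space in the login name, for example, has to be written as %20).
login_post_data() {
    printf 'wpName=%s&wpPassword=%s&wpLoginToken=%s' "$1" "$2" "$3"
}

# Log in, then save the raw text of one page to a file.
# Arguments: login form URL, login submit URL, raw page URL, output file.
# Reads MW_LOGIN and MW_PASSWD from the environment.
fetch_private_page() {
    form_url=$1
    submit_url=$2
    page_url=$3
    out_file=$4

    # Step 1: fetch the login form, keeping the session cookie, and pull out the token.
    token=$(wget -q -O - --save-cookies cookies.txt --keep-session-cookies \
                "$form_url" | extract_login_token)
    if [ -z "$token" ]; then
        echo "no login token found at $form_url" >&2
        return 1
    fi

    # Step 2: submit the credentials together with the token.
    wget -q -O /dev/null --load-cookies cookies.txt --save-cookies cookies.txt \
         --keep-session-cookies \
         --post-data "$(login_post_data "$MW_LOGIN" "$MW_PASSWD" "$token")" \
         "$submit_url" || return 1

    # Step 3: fetch the page with the logged-in cookies.
    wget -q -O "$out_file" --load-cookies cookies.txt "$page_url"
}

# Usage (placeholder addresses; substitute the ones your wiki actually uses):
#   MW_LOGIN="Some%20Login" MW_PASSWD="secret" fetch_private_page \
#       "https://wiki.example.org/Special:UserLogin" \
#       "https://wiki.example.org/index.php?title=Special:UserLogin&action=submitlogin&type=login" \
#       "https://wiki.example.org/SomeArticle?action=raw" \
#       email_body.txt
```

Checking for an empty token before posting gives an early, readable failure if the login form changes or the URL is wrong, instead of a silent failed login.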

3 Responses to “Using wget to login to Mediawiki”

  1. D says:

    Nice script, I’ve been searching for the updated process to do this.

    Strangely, WP_LOGIN_TOKEN has no value. What is this for? Isn’t this value stored in cookies.txt?

    The cookie part now works again with the latest Mediawiki 1.16; however, feeding the cookie back into wget to download a wiki entry does not work.

    Are you sure about the ?action=raw?

  2. D says:

    I had missed a / in the wrong place. It works perfectly.

    It may be easier to include a URL instead of hard coding the website address.

    Thanks a lot man !

  3. carchaias says:

    Very helpful script. I modified it to download semantic data from a File: page.