$url = sprintf('http://en.wikipedia.org/w/api.php?action=query&titles=%s&prop=info&format=json', urlencode($search));
$f = fopen($url, 'r');
$res = '';
while (!feof($f)) {
    $res .= fgets($f);
}
fclose($f);
require_once 'Zend/Json.php';
$val = Zend_Json::decode($res);

Once this has executed, $val is an array with the response details.
The problem was that this code began throwing a 403 HTTP status error. The 403 status code means access is denied.
A quick investigation turned up the page at meta.wikimedia.org/wiki/User-Agent_policy, which explains that, in order to use the API, you now need to pass a User Agent string along with the request; requests without one are refused. User Agent strings are sent automatically by browsers and describe the software that is making the request.
The problem was that a plain fopen() call doesn't send a User Agent string, and the simple form used above gives no way to set one (a stream context can add custom headers, but cURL offers more control).
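For completeness, here is a minimal sketch of the stream-context alternative; the User Agent value and search term are placeholders, not anything Wikipedia prescribes:

```php
<?php
$search = 'football'; // example search term

$url = sprintf(
    'http://en.wikipedia.org/w/api.php?action=query&titles=%s&prop=info&format=json',
    urlencode($search)
);

// Attach a User-Agent header to the HTTP stream wrapper;
// the value below is a placeholder for your own site or app name.
$context = stream_context_create(array(
    'http' => array(
        'header' => "User-Agent: my-example-app/1.0 (http://example.com)\r\n",
    ),
));

// The fourth argument to fopen() applies the context to this request.
$f = fopen($url, 'r', false, $context);
$res = stream_get_contents($f);
fclose($f);
```

That said, the cURL approach below is what the site actually switched to.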
This is where cURL comes in (www.php.net/manual/en/intro.curl.php). cURL is a library for communicating over various internet protocols and allows you to set headers in requests. The same code above, rewritten to use cURL, is as follows:
$url = sprintf('http://en.wikipedia.org/w/api.php?action=query&titles=%s&prop=info&format=json', urlencode($search));
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'your website address or app name');
$res = curl_exec($ch);
curl_close($ch);
require_once 'Zend/Json.php';
$val = Zend_Json::decode($res);

This small change was all that was needed to satisfy Wikipedia's User Agent requirement.
Thank you so much for this article; I was looking at all the wrong reasons for the new 403 my script was receiving.
One question though (I'm new to PHP) -- what does "Zend_Json::decode($res);" do (including the handling in the Zend/Json.php file)?
Thanks,
Casey
Hi Casey
Zend_Json::decode() is part of the Zend Framework (http://zendframework.com/manual/en/zend.json.basics.html) and is used here to turn the JSON string that Wikipedia returns into native PHP associative arrays.
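To show the idea in a self-contained sketch (the sample JSON below is made up, not a real API response): PHP's built-in json_decode() does the same job when its second argument is true.

```php
<?php
// Made-up JSON string standing in for a Wikipedia API response.
$res = '{"query":{"pages":{"123":{"pageid":123,"title":"Football"}}}}';

// Native equivalent of Zend_Json::decode(): the second argument (true)
// asks for associative arrays rather than stdClass objects.
$val = json_decode($res, true);

echo $val['query']['pages']['123']['title']; // prints "Football"
```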
Thanks! Exactly what I wanted to know.
Hi moo. Thank you for your example. I've managed to get a response; however, the array only gives me this information from Wikipedia (I searched for football) -
[pages] => Array
(
[23976719] => Array
(
[pageid] => 23976719
[ns] => 0
[title] => Football
[touched] => 2011-04-28T16:48:04Z
[lastrevid] => 426407882
[counter] =>
[length] => 89919
)
)
How would I actually get the information about the subject, football?
Thanks,
DIM3NSION
Hi DIM3NSION
To get the actual content, I used the parse action from the wiki API. So, the URL in your case would be
http://en.wikipedia.org/w/api.php?action=parse&page=football&redirects=1&format=json&prop=text
I hope this helps.
Michael
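Combining Michael's URL with the cURL pattern from the article, a fetch of the parsed page might look like this sketch (the User Agent value is a placeholder, and the exact layout of the decoded response should be checked against what the API actually returns):

```php
<?php
$page = 'football';

$url = sprintf(
    'http://en.wikipedia.org/w/api.php?action=parse&page=%s&redirects=1&format=json&prop=text',
    urlencode($page)
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'my-example-app/1.0 (http://example.com)'); // placeholder
$res = curl_exec($ch);
curl_close($ch);

$val = json_decode($res, true);
// The page HTML should be under $val['parse']['text'] in the decoded
// response; inspect the JSON you get back to confirm the exact keys.
```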
Perfect, thanks for that. I'm struggling to figure out a way of parsing just the introduction text that accompanies the article. Any ideas?
Thanks,
DIM3NSION
In follow-up to my previous message, moo: I've been able to cut it down by adding &section=0 to the end of the URL. However, this still returns the images; all I want is the introduction text. I'd much appreciate your help on this topic.
thanks,
DIM3NSION
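One possible approach, offered here as a sketch rather than something confirmed in the thread: since the parse action returns HTML, strip_tags() can reduce it to plain text, which also discards any <img> tags. The sample response below is made up purely to show the shape.

```php
<?php
// Made-up stand-in for a decoded parse-action response; the real
// structure may differ, so check the actual JSON your request returns.
$val = array(
    'parse' => array(
        'text' => array(
            '*' => '<p>Football is a family of <a href="/wiki/Team_sport">team sports</a>.</p>',
        ),
    ),
);

$html = $val['parse']['text']['*'];

// strip_tags() removes all markup (links, images, etc.), leaving text only.
$text = trim(strip_tags($html));

echo $text; // prints "Football is a family of team sports."
```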
This is good; I want to use it on my website http://www.satelliteview.org
Thank you for this helpful post.