5 Answers

Data retrieval: Bots may not be used to retrieve bulk content for any use not directly related to an approved bot task. This includes dynamically loading pages from another website, which may result in the website being blacklisted and permanently denied access. If you would like to download bulk content or mirror a project, please do so by downloading or hosting your own copy of our database.
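For Wikipedia specifically, the database copies that the policy refers to are the public dumps at dumps.wikimedia.org. A minimal Python 2 urllib2 sketch of fetching one (the dump filename below is a placeholder; actual names change per dump run):

    import urllib2

    # Stream a database dump to disk instead of scraping live pages.
    # The URL is a placeholder; pick the actual file from
    # https://dumps.wikimedia.org/ for the wiki and dump run you need.
    url = 'https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2'
    response = urllib2.urlopen(url)
    with open('enwiki-latest-pages-articles.xml.bz2', 'wb') as out:
        while True:
            chunk = response.read(1 << 20)  # read 1 MiB at a time
            if not chunk:
                break
            out.write(chunk)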

"That is why Python is blocked. " I don't get what is this sentence means? However, even I made a list of 'User-Agent' and randomly choose one of them to construct a url, the website will sent me "urllib2.URLError: <urlopen error [Errno 10060] >" or just blocked my ip from visiting their website. Can you give me more ideas? Many thanks.
–
MaiTiano Mar 6 '12 at 1:52
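For reference, the User-Agent rotation described in that comment might look like this sketch (Python 2 urllib2; the User-Agent strings and timeout are example values, not guaranteed to be accepted):

    import random
    import urllib2

    # Example User-Agent strings (placeholders).
    USER_AGENTS = [
        'Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11',
    ]

    def fetch(url):
        # Pick a User-Agent at random for each request.
        request = urllib2.Request(url, headers={'User-Agent': random.choice(USER_AGENTS)})
        try:
            return urllib2.urlopen(request, timeout=10).read()
        except urllib2.URLError as e:
            # Errno 10060 is a Windows connection timeout; rotating the
            # User-Agent alone does not help once the server blocks the IP.
            print 'request failed:', e
            return None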

It's totally ridiculous that they also block HEAD requests, which are useful, e.g., to validate all links posted by a user.
–
ThiefMaster♦ Mar 28 '12 at 16:09

Oftentimes, websites will filter access by checking whether they are being accessed by a recognised user agent. Wikipedia is just treating your script as a bot and rejecting it. Try spoofing as a browser. The following link takes you to an article that shows you how.
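A minimal sketch of that approach, assuming Python 2's urllib2 (to match the urllib2.URLError in the comments above); the User-Agent string is just an example browser identifier:

    import urllib2

    # Send a browser-like User-Agent instead of the default
    # "Python-urllib/2.x" identifier that the server rejects.
    url = 'http://en.wikipedia.org/wiki/Main_Page'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0'}

    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    print response.read()[:200]  # first 200 bytes of the page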

Some websites will block access from scripts by reading the headers urllib sends, to avoid 'unnecessary' load on their servers. I don't know and can't imagine why Wikipedia does or would do this, but have you tried spoofing your headers?
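One way to do that, sketched again with Python 2's urllib2 (the header value is an example): install an opener whose default headers replace the stock User-Agent for every request made through it.

    import urllib2

    # An opener's addheaders list supplies default headers for all
    # requests made through it, overriding "Python-urllib/2.x".
    opener = urllib2.build_opener()
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0')]

    response = opener.open('http://en.wikipedia.org/wiki/Albert_Einstein')
    print response.getcode(), len(response.read())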