UPDATE: 2.x support is now mainline! Please read the wiki page for important information about the update.

A warm welcome to you, traveller. You have arrived at the home of Py-StackExchange, the library definitively proven† to be the best library for using the SE API from Python. If you are still interested (and by golly, you should be) after glancing at the masterpiece below, please check the wiki on Github.

† Ahem.

About

So, what is Py-StackExchange? Well, I'm glad you asked.

It is a Python library for querying the StackExchange API from your Python applications. Integration, ahoy!

So why should you use it? After all, the SE API is sooo simple that you might think it'd be quicker to just write your own, and that it'd be faster and you wouldn't have to look at all that documentation and do all that thinking... well:

Let's start with the API coverage - what can the API do? And, more importantly, what can the library do?

Access any StackExchange site, with just its URL! Even those that aren't online yet!

If you just can't decide which one to use, you can use StackAuth to look up the full list of sites.

Once you're online, you can view everything about users, questions, answers, badges, comments and tags.

You can even go back in time by playing with post revisions.

Stalk... Generate a detailed profile of a user's life... Help users by looking up every StackExchange account they have. Every single one.

And, on any of those sites, peruse a detailed history of everything they've ever done - every edit, every comment, every time they were awarded a badge... Watch StackOverflow become the new Facebook overnight with the timeline feature.

See how well an SE site is doing; obsessively check its site statistics.

Search the questions of StackExchange sites.

So, why not write your own classes to consume said pure, concentrated brilliance?

Let someone else deal with all that laborious HTTP request business... you know you want to...

All the little idiosyncratic potholes on your road to API happiness have been filled in for you. We have little elves which jump into your code and parse your JSON and your dates and your lists until every response is itself a little baby python.

URLs change 99.9% more often than the interface of this module. Fact.

It's faster than Michael Palin on a broken bicycle. It also knows about request throttling, so when it gets too fast for its own good, it applies the brakes just enough to restore order.

It lazily loads any information that would take another request to fetch, meaning you never use more of your limit than you need to.

It caches requests automatically, so you need to care slightly less about writing efficient code! (new in 1.1)
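To illustrate the throttling idea only, here is a minimal sketch of a sliding-window limiter. It is not the library's own code: the `Throttle` class, its parameters and the limits used are hypothetical, and the clock and sleep functions are injectable purely so the behaviour can be checked without waiting.

```python
import time
from collections import deque


class Throttle:
    """Allow at most `max_requests` calls per `window` seconds (hypothetical helper)."""

    def __init__(self, max_requests, window, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.stamps = deque()   # times of the most recent requests

    def wait(self):
        """Block until another request is allowed, then record it."""
        now = self.clock()
        # Forget requests that have already left the sliding window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_requests:
            # Sleep until the oldest remembered request leaves the window.
            self.sleep(self.window - (now - self.stamps[0]))
            now = self.clock()
            self.stamps.popleft()
        self.stamps.append(now)
```

Calling `wait()` before each HTTP request would then apply the brakes only when the limit is about to be exceeded.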

Now, onto the religious advantages:

Documentation? Bah, we have naming conventions. (This feature was inspired by Rails.) Pssst - don't tell anyone, but there is documentation too, if that's your style. (README/Wiki)

Naming conventions? Who needs them? We have an interactive program that writes your code for you while you look around the StackExchange site of your choice. (This feature was inspired by Jon Skeet.)

Almost-sentient, artificially intelligent programming programs? Ugh, how 20th century. There are metric heaps of example code available in the source repo, a small excerpt of which is presented below for your viewing pleasure.

Please note: This is not an official product of Stack Overflow Internet Services, Inc.

Code Snippet

The wiki has details of all the example code in the code repository. In fact, here's a small taster from the Narcissism demo.

However, there is a new and improved way to get Py-StackExchange: you can install it straight from PyPI! Just type:

~$ easy_install py-stackexchange

Also, distutils gives me fantastical benefits on the side, such as a completely original Windows installer with an all-new design. You can also find a stable source distribution on the downloads page @ Github.

Platform

The library is written in standard Python 2.6, with, as far as I am aware, no specific platform dependency. As long as your Python install has the full standard library available, it should work fine.

Python 2.6 is required for the json module. (EDIT: @ADB in the comments has noted that the SimpleJson library can be used instead. This means it works on Python 2.5 and also on the Google App Engine.)
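The SimpleJson fallback is usually written as a guarded import. A minimal sketch, assuming the third-party simplejson package is installed wherever the standard json module is missing; the JSON sample is invented for illustration.

```python
# Prefer the standard-library json module (Python 2.6+); fall back to the
# third-party simplejson package, which offers the same loads/dumps interface.
try:
    import json
except ImportError:
    import simplejson as json

# A made-up response fragment, parsed the same way with either module.
data = json.loads('{"total": 1, "items": [{"user_id": 41981}]}')
print(data["items"][0]["user_id"])
```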

Python 3.x is also supported.

Contact

The library is being written by Lucas Jones (lucasjones.co.uk / SO). If you want to contact me, send me some mail at lucas @ lucasjones.co.uk.

@George: Thanks. Neat site, by the way - more structured than StackApps for that kind of thing.
– Lucas Jones, Jul 6 '10 at 21:42

@Lucas: Thank you! Any suggestions for new features or enhancements are much appreciated.
– Nathan Osman♦, Jul 6 '10 at 22:17

@Lucas great work, any plans to also support the v2.0 API?
– systempuntoout, Apr 12 '12 at 10:27

@apnorton: I've added support for it just now; you'll need to use the latest version in the repository (github.com/lucjon/Py-StackExchange). If you call Site#question or Site#answer with the filter= optional parameter pointing to a filter set to return the body_markdown field, then the returned object will have a body_markdown attribute.
– Lucas Jones, Apr 25 at 17:52

@Tim: It is indeed possible to get the Markdown; you can request the body_markdown attributes on questions and answers using an appropriate filter. See my responses to apnorton on Apr 25 for more detail.
– Lucas Jones, May 26 at 16:07

FYI, this doesn't work in Python 2.5. Since I'm using Django, I modified the above to try to import from Django as well (from django.utils import simplejson as json). I don't know if you should add this to the wrapper or not, but if anyone is using Django, just use that line instead.
– Edan Maor, Jun 16 '10 at 8:08

@Edan: I think it might be too specialised a case, but I'll definitely include it in the FAQ.
– Lucas Jones, Jun 16 '10 at 15:09

My problem below has been "solved", and it's a problem with the fact that I'm getting back gzip-compressed data from stackapps. See the SO Answer. I'm still unsure why this happens only on my computer (possible reason: routers in my network adding content-headers), but I'm guessing this should be fixed in the wrapper itself.

Come to think of it, the wrapper should probably be requesting gzip-compressed data in the first place, to save download time.
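One way a client could handle this, sketched with only the standard library: ask for gzip explicitly and decompress the body whenever the response says it is compressed. The function names below are illustrative assumptions, not the wrapper's actual code.

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding):
    """Return the response body as bytes, un-gzipping it when necessary."""
    if content_encoding and "gzip" in content_encoding.lower():
        return gzip.decompress(raw)
    return raw


def fetch(url):
    """Request `url` with gzip enabled and return the decoded body (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

Checking the Content-Encoding header, rather than assuming the body is plain, covers both the compressed and the uncompressed case.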

Ah, sorry. Due to a change in the API (I think - or perhaps me just being stupid and not reading the docs closely enough!) question and answer bodies need to be explicitly requested. You can do this through so.be_inclusive(), which you should call before your first request.
– Lucas Jones, Jun 13 '10 at 15:36

Yeah just figured out the be_inclusive() bit. Is there any way to specify it per-request? (I assume most people won't want to get all the bodies and all the comments from every request, just from specific requests). I can see that there are places where you check for a "body" keyword param, but I'm not sure where I can send one in the snippet above...
– Edan Maor, Jun 13 '10 at 15:38

I've also updated the FAQ with this, as it's not documented very clearly.
– Lucas Jones, Jun 13 '10 at 15:57

@Edan: Not in your specific case right now (see the FAQ - link in question - for those that are covered), as I'm not sure how best to implement it. Do you think calling me.answers.fetch(body='true') would be best?
– Lucas Jones, Jun 13 '10 at 15:59

I think so. It's the only use case I've actually run into.
– Edan Maor, Jun 13 '10 at 16:07

Right. I have an idea how I'd do this! :)
– Lucas Jones, Jun 14 '10 at 18:13

Is this done in the latest version? If so, do I use it like you wrote in your comment?
– Edan Maor, Jun 15 '10 at 13:44

@Edan: Not quite, but it should be done soon.
– Lucas Jones, Jun 15 '10 at 16:48

@Edan: Done now in the latest revision, with the same syntax. fetch_page (but not other overloads right now) can do the same, too.
– Lucas Jones, Jun 15 '10 at 17:18

Aha. I should probably rename build.sh to release.sh - it's the script I use for publishing new releases to PyPI. I'll write a README or something too... anyway, it should just work, from Github, out-of-the-box without running the script. If it doesn't, let me know! :)
– Lucas Jones, Feb 23 '11 at 22:07

Updated version on Github, and the egg should work now.
– Lucas Jones, Feb 23 '11 at 23:50

Here is the full traceback.
I'd be very happy if any of you could tell me what the problem is. Is it the fact that I'm using python3?
I converted the example files using 2to3 and it seems to me that it should work fine...

This is indeed a Python 3 issue; I have fixed this and a few other compatibility problems just now in the latest version of the library available on Github. The master branch targets v2.x of the StackExchange API, as will the latest PyPI version in the near future. If you want to continue using v1.x (e.g. you have an API key registered), the compatibility fixes have been backported to the v1.1 branch on Github.
– Lucas Jones, Jun 26 '14 at 22:13

Many thanks! I had no problems after your fix.
– Tomazz, Jun 30 '14 at 13:47

The idea of producing a transparent iterator (getting the next page as needed) is cool, but it just isn't working; the page size affects the total number of results returned (apparently to a multiple of the page size)...:-(

My code needs to look at all Qs asked for a given tag in a given month:

With a page size of 100, as here, I see, from a loop calling this for different months on a certain tag:

2014/11: 242 good Qs out of 600 (40%)
2014/12: 242 good Qs out of 500 (48%)
2015/1: 235 good Qs out of 500 (47%)

but e.g with an arbitrary page size of 37, it's instead

2014/11: 179 good Qs out of 518 (35%)
2014/12: 179 good Qs out of 518 (35%)
2015/1: 172 good Qs out of 518 (33%)

(518 is 37 * 14, whence my hypothesis that the returned number of items is somehow constrained to be a multiple of the page size -- but clearly it's not just that, as the page size of 100 gave 600 questions for the tag in Nov'14, but the page size of 37 still gives up at 518). I guess this is connected with the already reported bug of fetch_next returning nothing at unpredictable times, even though here it's buried in the iteration.

But I don't understand at all the code of the next method of StackExchangeResultset in the core.py file...: it starts...:

    def next(self):
        for obj in self.items:
            yield obj
        current = self
        while current.has_more:
            for obj in current.items:
                yield obj

won't this yield each item in self.items twice? Once from self.items, and then again from current.items after setting current = self?! I just don't understand the logic of this snippet. I'm going to instrument my code to check for duplicates, which should be present if my doubts are well-founded, and report on that check...

EDIT: yep, confirmed, the duplicates appear exactly as I had thought they would -- I've added a set of ids and return prematurely after seeing a duplicate, and the number of total results reported for each and every month is exactly the page size -- confirming that each question is being yielded again right after all items on the first page have been yielded, exactly as the code I show above appears to say they would be.

So, any suggested workaround so I can examine all Qs meeting the constraints, and, only once each?-)

EDIT AGAIN: so I partly fixed def next(): in the result-set class (there's still a bug I can't yet fathom -- shows duplicates with a page size of 37 -- but at least it seems to work with a page size of 100, with no duplicates nor truncation to a multiple of 100). To make python3 setup.py install work I also had to change setup.py since (at the github master) it was missing some modules (?).
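For comparison, here is a sketch of a paging loop that yields each item exactly once. It is not the library's code and not the partial fix described above: `Page` is a stand-in class invented for this example, mirroring only the `items`, `has_more` and `fetch_next` names mentioned in this thread.

```python
class Page:
    """Stand-in for one page of results (hypothetical, for illustration only)."""

    def __init__(self, pages, index=0):
        self._pages = pages
        self._index = index
        self.items = pages[index]
        self.has_more = index + 1 < len(pages)

    def fetch_next(self):
        return Page(self._pages, self._index + 1)


def iter_all(first_page):
    """Yield every item from every page, visiting each page once."""
    current = first_page
    while True:
        for obj in current.items:
            yield obj
        if not current.has_more:
            return
        # Only move on after the current page has been fully yielded.
        current = current.fetch_next()


pages = [[1, 2, 3], [4, 5, 6], [7]]
print(list(iter_all(Page(pages))))  # [1, 2, 3, 4, 5, 6, 7]
```

The difference from the quoted snippet is that the first page is yielded inside the same loop as every later page, so nothing is emitted twice.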

I'm using the most recent version of the library as of yesterday. I needed to get some information on users' answers and the question itself. However, this didn't work. Every time I tried to access another page of answers I got ()

Thanks - I'll add the diff for now, to keep compatibility. I may factor it into the be_inclusive() method later, though.
– Lucas Jones, Jun 12 '10 at 21:43

I've added the patch manually (it's only small :D), but I can't figure out how to do it with Git - git apply didn't work. Just for future reference, do you know how I'd do that?
– Lucas Jones, Jun 12 '10 at 21:47

Right now, there is no way to get the url of a Question/Answer from the Answer object (I'm talking about the actual URL on the site, e.g. stackoverflow.com/answers/id). This is also not returned by the api itself (see this answer).

Ideally, I think the wrapper should include a method that builds the url for you, i.e. the Answer object will have a getUrl method which will build up the url based on which site you're querying.

If not, another good idea would be to provide the Site object with a method that gets the url of the site. When you build a Site object you send in a constant like "api.stackoverflow.com", so the object should have a method which strips out the api part.
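A rough sketch of both suggestions follows. The function names are made up for illustration, and the /answers/<id> path simply follows the example URL given above; neither is an existing method of the wrapper.

```python
def site_url(api_domain):
    """Turn an API domain such as 'api.stackoverflow.com' into the site's URL."""
    prefix = "api."
    domain = api_domain[len(prefix):] if api_domain.startswith(prefix) else api_domain
    return "http://" + domain


def answer_url(api_domain, answer_id):
    """Build an answer link using the /answers/<id> pattern quoted above (an assumption)."""
    return "%s/answers/%d" % (site_url(api_domain), answer_id)


print(answer_url("api.stackoverflow.com", 12345))  # http://stackoverflow.com/answers/12345
```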

Is there any way to get a user's answers without going through the user object? I want to save a fetch of a user's answers, and since I already have his id, I could go to: http://api.stackoverflow.com/0.8/users/id/answers?body=true&pagesize=100.

Also, for bonus points, when getting a user's answers, you get back info on the user as well (e.g. you get the user's display name). So such a call could automatically create the user object that's linked to each answer.

Looking good so far. One quick question: right now, doing site.answers(user_id=some_id_that_does_not_exist) just gives back an empty list. I don't know about this, but would it be better to raise a "User does not exist" exception? You could also raise this in other places that have the same "problem". Not sure if this would actually be better, just an idea.
– Edan Maor, Jun 16 '10 at 6:50

I was thinking about that when I accidentally requested page 2 of what I expected to be a multi-page list (it was only 14 items!). Some distinction could be useful...
– Lucas Jones, Jun 16 '10 at 15:07

@Lucas so is there no way for each request to be counted per user, as opposed to against the API key for the app itself? That way I wouldn't have to worry about the limit, since I am just displaying data that my app requests on the user's behalf.
– garbagecollector, May 3 '11 at 15:46

Not as far as I know, unless you use client-side JavaScript or something like that to fetch the result. You'll probably be fine with a single key, with an increased quota if necessary. The SO team are very approachable ;).
– Lucas Jones, May 4 '11 at 20:57

I'm building a Django app, and I'd like to be able to search for users' SO accounts based on their names. I've got the following, which works from a Python shell, but it doesn't work when called from within a Django view. Any ideas?

Sorry for the delay - OpenID/hosting problems... Anyway, doing some testing, I noticed that Django view parameters are Unicode strings. I hadn't thought to have handled Unicode explicitly in the library, and it turned out that the urllib module encodes it as UTF-16 by default, which the API (understandably) doesn't like. I've changed it to UTF-8, which now works under Django for me. Summary: update to the latest version in the Git repo. (Or, I can push a new release of the egg out if necessary.)
– Lucas Jones, May 4 '11 at 21:00
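The idea of the fix, encoding non-ASCII text as UTF-8 before it goes into the query string, can be sketched as follows in Python 3 syntax. This is an illustration rather than the library's code; `build_query` and the `filter` parameter name are assumptions.

```python
from urllib.parse import urlencode


def build_query(params):
    """Encode query parameters as UTF-8 percent-escapes (hypothetical helper)."""
    return urlencode(params, encoding="utf-8")


print(build_query({"filter": "Jos\u00e9"}))  # filter=Jos%C3%A9
```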

Do you have any debugging suggestions? If I could turn on a debugging flag, that would cause py-SE to print the exact URL that was requested, that would help... Then I could visit that URL myself in the browser and see if a more detailed error message was shown, such as "your parameter X was invalid" or "you've exceeded your API key limit" or something.

I could modify __init__.py myself to print such debugging messages, but I'm not up on how to recompile python functions within an egg and redeploy them.

Thanks...

Update:

P.S. It wasn't my intention to ask you to find the actual problem, but heck, I'd be just as happy to have that answer as to have the debugging tool described above, and it might be easier to supply. So here's my code:

Hello; sorry for the delay, glad you solved your problem. To answer your first question, you can turn on debug printing of URLs by setting stackexchange.web.WebRequestManager.debug = True. I agree that the documentation in that area is lacking; a wiki page is coming up. I'll look into that inconsistency; it seems quite pointless... I apologise on behalf of my younger self.
– Lucas Jones, May 18 '11 at 15:05

Doing some testing... strange: with from_date=aWhileAgo and to_date=currentDate it works, as it does with fromdate=int(aWhileAgo) and todate. But using a float as fromdate (as aWhileAgo was), which is formatted with a trailing .0 fails with error 500. So both work with integers; this might be an API inconsistency.
– Lucas Jones, May 18 '11 at 15:21

Ahem. Revisited this, doing browser testing. The reason from_date and to_date work is that they are ignored. Compare the total field on api.stackoverflow.com/1.1/… to api.stackoverflow.com/1.1/…. Either way, you can pass in a float if you want now; in the latest revision it'll be appropriately converted behind-the-scenes.
– Lucas Jones, Jun 10 '11 at 0:46
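On versions without that conversion, the trailing .0 can be avoided by truncating the timestamp yourself before passing it in. A small sketch; `to_api_timestamp` is a made-up helper name, and the 30-day range is arbitrary.

```python
import time


def to_api_timestamp(value):
    """Truncate a Unix time (int or float) to whole seconds for use in a URL."""
    return int(value)


a_while_ago = time.time() - 30 * 24 * 60 * 60   # time.time() returns a float
params = {
    "fromdate": to_api_timestamp(a_while_ago),
    "todate": to_api_timestamp(time.time()),
}

print(str(1305731100.0))                    # 1305731100.0 (the form that failed)
print(str(to_api_timestamp(1305731100.0)))  # 1305731100
```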

I'll double-check this, but unaccepted_questions is probably one of the fields that is not fetched automatically; instead use ii.get_user().unaccepted_questions.fetch().count. I know this is a little clunky, but otherwise it would need to pull in tonnes of data each request.
– Lucas Jones, May 25 '11 at 20:37

Unfortunately, about all you can do is pass pagesize=100 to Site.questions. You can then use the methods on the returned resultset to advance through the pages. You won't be able to get anything bigger than 100 at a time, though. There are a couple of other options: you could try and export results from data.stackexchange.com, but those will probably be size-limited too. 'Worst case', you could download the data dump, but it's rather large (multi-gigabyte range) and hard to deal with.
– Lucas Jones, Jun 9 '11 at 23:38

As you can see I am displaying the title and id attributes of each question object.

My question has 2 parts:

How can I get a list of attributes to each method (such as recent_questions)? I simply guessed correctly that id would work, but it does not seem to be listed when I refer to >>> help(stackexchange)

I would like to extend my script to mark as 'starred' ('favourite') some questions that grab my interest. Is this a reasonable prospect? Is there a method/attribute for this, and in which class? Most importantly, is authentication even possible via the API? This would presumably be necessary in order to 'star' questions.

Currently, keyword arguments are effectively passed verbatim to the API (albeit with a bit of processing to make them work in a URL). The best documentation I can offer in that regard is the official API docs. Regarding your second question: unfortunately, the current version of the API (v1.1) and the next planned version (v2.0) are read-only; you won't be able to star questions, although I think v2.0 will provide authentication functionality.
– Lucas Jones, Jan 7 '12 at 0:34

I am running py-stackexchange v1.1-4 and I'm hitting a strange issue after not touching this library for perhaps eight months or so... My queries used to be relatively fast... I could get results in 30 seconds.

Now my queries literally take hours with the same code... example query:

I have a feeling I may have fixed this bug in the latest version of the code, but that I haven't pushed it to PyPI. I'll try and push the latest code up; in the meantime you can get it straight from the repository at github.com/lucjon/Py-StackExchange.
– Lucas Jones, May 7 '12 at 18:06

OK. The behaviour of the iterators changed recently. Site.questions.no_answers, Site.questions.recent_questions and Site.questions.unanswered will go on forever. The pagesize= argument actually tells the API how many items to return on each iteration - it will just keep iterating until it's returned all the pages. I agree this is not the most useful behaviour for this case, and I'm now planning to add a cleaner interface. For now, though, change your loop to for qq in ii(...).items, which will only iterate through the questions on the first page, of size pagesize.
– Lucas Jones, May 11 '12 at 14:31
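A generic alternative, not mentioned in the comment above, is to cap any long-running iterator with itertools.islice from the standard library. In this sketch `endless_questions` is a stand-in generator invented for illustration; it does not call the library.

```python
import itertools


def endless_questions():
    """Stand-in for a result iterator that keeps fetching pages (hypothetical)."""
    n = 0
    while True:
        n += 1
        yield "question %d" % n


# Stop after three items, however many the underlying iterator could produce.
for title in itertools.islice(endless_questions(), 3):
    print(title)
```

Because islice stops pulling from the iterator once the cap is reached, no further pages would be requested beyond the ones needed.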

@LucasJones, I will try that. Thank you for following up
– Mike Pennington, May 11 '12 at 15:18

Thanks for getting in touch. The demo works for me locally; could I ask you to do the following: (1) ensure that you are using the latest version of the library from the Github repository; (2) if the problem persists, enable debug mode by adding the line stackexchange.web.WebRequestManager.debug = True to the top of your script.
– Lucas Jones, May 27 '14 at 22:40

Thanks for letting me know about this; however, I've not been able to reproduce it myself. Is there anything unusual about your setup (in particular, are there any proxies between you and Stack Exchange?) which might get in the way? Either way, if changing the 'encoding' is as harmless as it seems to be from my initial look at the documentation, I may well just change it.
– Lucas Jones, Jul 4 '14 at 0:56

I'm not sure what caused the problem. It seems to be something quite unusual that appears when you combine things like Windows 8.1, Cygwin and Python 3 all together.
– Tomazz, Jul 10 '14 at 14:02

Thanks for bringing this to my attention; I've fixed the problem in the latest revision on the Github repository (github.com/lucjon/Py-StackExchange). Your code should now work if you use the version of the library from there.
– Lucas Jones, Aug 28 '14 at 16:58