I'd like to use the Python Requests library to GET a file from a URL and use it as a multipart-encoded file in a POST request. The catch is that the file could be very large (50 MB-2 GB) and I don't want to load it into memory. (Context here.)

Replace the with ... as f: statement with f = ..., and f.content with f; with needs __enter__ and __exit__, and the docs say you can pass the file directly.
– User Apr 12 '13 at 15:50

@User thanks for the tip, but I get this error: AttributeError: 'Response' object has no attribute 'read' - I assume this means that my code expects the file and not the response
– ergelo Apr 13 '13 at 8:45

4 Answers

There actually is an issue about that on Kenneth Reitz's GitHub repo.
I had the same problem (although I'm just uploading a local file), and I worked around it with a wrapper class that holds a list of streams corresponding to the different parts of the request. It has a read() method that iterates through the list and reads each part in turn, and it also computes the values needed for the headers (the boundary and the content length):
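
A minimal sketch of that idea, with class and parameter names of my own choosing, assuming the size of every file part is known up front (here taken from the GET response's Content-Length header):

    import io
    import uuid

    import requests


    class MultipartStream(object):
        """A list of streams (multipart headers and file bodies) exposed
        through a single read() method, with the boundary and total length
        available for the request headers."""

        def __init__(self, files):
            # files: dict of field name -> (filename, file-like object, size in bytes)
            self.boundary = uuid.uuid4().hex
            self.parts = []  # list of (file-like object, length) tuples
            for name, (filename, fileobj, size) in files.items():
                head = ("--%s\r\n"
                        'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
                        "Content-Type: application/octet-stream\r\n\r\n") % (
                            self.boundary, name, filename)
                self._add_bytes(head.encode("utf-8"))
                self.parts.append((fileobj, size))
                self._add_bytes(b"\r\n")
            self._add_bytes(("--%s--\r\n" % self.boundary).encode("utf-8"))
            self._index = 0

        def _add_bytes(self, data):
            self.parts.append((io.BytesIO(data), len(data)))

        def __len__(self):
            # Lets requests compute Content-Length instead of chunking the upload.
            return sum(length for _, length in self.parts)

        @property
        def content_type(self):
            return "multipart/form-data; boundary=%s" % self.boundary

        def read(self, size=-1):
            # Read across part boundaries until `size` bytes are collected,
            # or until every part is exhausted when size is negative.
            chunks = []
            while self._index < len(self.parts) and size != 0:
                data = self.parts[self._index][0].read(size if size > 0 else -1)
                if not data:
                    self._index += 1
                    continue
                chunks.append(data)
                if size > 0:
                    size -= len(data)
            return b"".join(chunks)


    # Example use (URLs are placeholders): the GET response's raw attribute
    # is a file-like object, so it can be streamed straight into the body.
    src = requests.get("http://example.com/big.file", stream=True)
    body = MultipartStream({"file": ("big.file", src.raw,
                                     int(src.headers["Content-Length"]))})
    requests.post("http://example.com/upload", data=body,
                  headers={"Content-Type": body.content_type})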

This solution requires a valid Content-Length header (a known file size) in the GET response. If the file size is unknown, then chunked transfer encoding could be used to upload the multipart/form-data content. A similar solution could be implemented using urllib3.filepost, which ships with the requests library, e.g. based on @AdrienF's answer but without using poster.
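
For the unknown-size case, a minimal sketch of the chunked-encoding fallback (the URLs are placeholders; for brevity this re-posts the raw body rather than building the multipart/form-data encoding around it):

    import requests

    # Passing an iterator as the body makes requests send the request with
    # chunked transfer encoding, so no Content-Length is needed.
    src = requests.get("http://example.com/big.file", stream=True)
    requests.post("http://example.com/upload",
                  data=src.iter_content(chunk_size=64 * 1024))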

I'd read about poster in my research but didn't look deeply enough - I thought it had the same limitations as requests - my bad. This is a good solution, but I accepted @AdrienF's as I was actually planning on using requests exclusively. Thanks anyway :)
– ergelo Apr 28 '13 at 10:41

If you absolutely need to do multipart/form-data encoding, then you will have to create an abstraction layer that takes the generator and the Content-Length header from the response in its constructor (to provide an answer for len(file)) and exposes a read() method that reads from the generator. The issue, again, is that I'm pretty sure the entire thing will be read into memory before it is uploaded.
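
A rough sketch of that abstraction layer, with names of my own choosing (whether requests then streams it or still buffers the whole body, as the caveat above says, is left aside here):

    class GeneratorFile(object):
        """Wraps a chunk generator (e.g. response.iter_content()) together
        with the Content-Length taken from the GET response, exposing
        read() and __len__() so requests sees a file-like body with a
        known length."""

        def __init__(self, generator, length):
            self.generator = generator
            self.length = int(length)
            self.buffer = b""

        def __len__(self):
            # Provides the answer for len(file).
            return self.length

        def read(self, size=-1):
            if size < 0:
                data, self.buffer = self.buffer + b"".join(self.generator), b""
                return data
            # Pull chunks from the generator until `size` bytes are buffered.
            while len(self.buffer) < size:
                try:
                    self.buffer += next(self.generator)
                except StopIteration:
                    break
            data, self.buffer = self.buffer[:size], self.buffer[size:]
            return data


    # e.g. (URL is a placeholder):
    # src = requests.get("http://example.com/big.file", stream=True)
    # f = GeneratorFile(src.iter_content(chunk_size=64 * 1024),
    #                   src.headers["Content-Length"])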

Edit #2

You might be able to make a generator of your own that produces the multipart/form-data encoded data yourself. You could pass it the same way as a chunk-encoded request, but you'd have to make sure you set your own Content-Type and Content-Length headers. I don't have time to sketch an example, but it shouldn't be too difficult.
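
A rough sketch of that generator approach, assuming a single file field named "file"; the URLs, field name, and filename are placeholders. With a generator body, requests falls back to chunked transfer encoding, so only the Content-Type header (carrying the boundary) is set explicitly here:

    import uuid

    import requests


    def multipart_body(resp, boundary, filename="big.file"):
        # Opening boundary plus the part headers for a single file field.
        head = ("--%s\r\n"
                'Content-Disposition: form-data; name="file"; filename="%s"\r\n'
                "Content-Type: application/octet-stream\r\n\r\n") % (boundary, filename)
        yield head.encode("utf-8")
        # Stream the downloaded file through in chunks.
        for chunk in resp.iter_content(chunk_size=64 * 1024):
            yield chunk
        # Closing boundary.
        yield ("\r\n--%s--\r\n" % boundary).encode("utf-8")


    boundary = uuid.uuid4().hex
    src = requests.get("http://example.com/big.file", stream=True)
    requests.post("http://example.com/upload",
                  data=multipart_body(src, boundary),
                  headers={"Content-Type": "multipart/form-data; boundary=%s" % boundary})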

iter_chunks throws an exception - did you mean iter_content? With the latter I get an error similar to the one I mentioned in the comments above: TypeError: object of type 'generator' has no len() in the call to write
– ergelo Apr 13 '13 at 8:46

I'm going to update the answer because I forgot something, sorry. (In short, you cannot know the content length of that big file (I'm guessing), so you need to use chunked encoding to upload it.)
– Ian Stapleton Cordasco Apr 13 '13 at 14:44

I can see why this would work, thanks. Unfortunately, I do actually need the multipart/form-data encoding, as I need to send it to the GAE blobstore handler, as you can see in the linked context question.
– ergelo Apr 14 '13 at 21:08

Sorry. I hadn't seen the context question. I do have a different idea now though. :)
– Ian Stapleton Cordasco Apr 14 '13 at 21:40

Thanks, I guess that would be similar to @AdrienF's answer?
– ergelo Apr 28 '13 at 10:38