@greglinch thanks, fixed! @markwk to my understanding there is no way to get these without using a third party or asking the user to download their history.
@riznad hard to say what's going on there; is it possible an extra space got inserted on that line? There should only be one tab on that line.

By the way, I've created a Twitter API app for myself. In the tweet_dumper.py file, I've entered my four Twitter API credentials, and in the last line of the .py file I've put in the username whose tweets I want to download.

This worked great, thanks! I had to get pip and tweepy installed first, but it worked out fine. Also, note that if the targeted user's Twitter account is protected, the account used to authorize the API calls must be following the targeted user.

Thanks for posting the script in the first place; it's a good way to start tweaking with this library. After playing around with it a bit, it seems the updated versions of the library solve both the "cope with the number of requests per window" and the "don't get busted by the error" problems: pass the wait_on_rate_limit parameter to the API constructor to have it deal with the server.

Hi guys, I'm using Python 2.7 and the script works fine. I just have a problem with the CSV: is there a way to strip \n from the retrieved tweets? A newline causes the text to spill into a new cell, so in Excel or OpenRefine it's almost impossible to manually fix all the cells in the "id" column.
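One way to handle this is to flatten the newlines before writing the rows; a minimal sketch, using a stand-in class in place of the tweepy status objects the script builds:

```python
import csv

class FakeTweet:
    """Stand-in for a tweepy status object with id_str and text attributes."""
    def __init__(self, id_str, text):
        self.id_str = id_str
        self.text = text

# stand-in for the alltweets list the script builds
alltweets = [FakeTweet("1", "line one\nline two")]

# flatten embedded newlines so each tweet stays on a single CSV row
outtweets = [[t.id_str, t.text.replace("\n", " ").replace("\r", " ")]
             for t in alltweets]

with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "text"])
    writer.writerows(outtweets)
```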

@Sourabh87 thanks for the offer! I ended up figuring it out by just using tweet.user.screen_name. Super easy. Now I am working on migrating the code from Python 2 to Python 3.4. Has anyone else done this yet on Windows?

I'm using this to pull tweets for a list of users, but I'm running into an error every so often. I think it might have to do with the number of queries you can make to the Twitter API, but I'm not sure. Here's the error below; please help.

Thanks for the code @yanofsky.
I have modified your code: I am using pandas to store the downloaded tweets in a CSV, along with some additional fields. I also have another script that uses the CSV created by your code to download the latest tweets from a user's timeline.
Here is my GitHub link: https://github.com/suraj-deshmukh/get_tweets

I am also working on Cassandra-Python integration to store all tweets in a Cassandra database instead of a CSV file.
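A minimal sketch of the pandas variant described above; the column names are illustrative, not taken from the linked repo:

```python
import pandas as pd

# illustrative rows; in the real script these come from the tweepy statuses
rows = [{"id": "123", "created_at": "2016-01-01", "text": "hello",
         "retweet_count": 2, "favorite_count": 5}]

df = pd.DataFrame(rows)
df.to_csv("tweets.csv", index=False)  # one call replaces the csv.writer loop
```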

Thanks for this @yanofsky, it's awesome code. I'm trying to rework it so I can drop the data into a MySQL table. I'm running into some issues; could you take a look at this snippet of my code to see if I'm doing anything obviously wrong? Much appreciated.

def get_all_tweets(screen_name):
    # Twitter only allows access to a user's most recent 3240 tweets with this method
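For the database step, a sketch of inserting the collected rows into a SQL table. sqlite3 is used here as a runnable stand-in for MySQL; with mysql-connector-python the code is nearly identical, except the connect() call changes and the "?" placeholders become "%s":

```python
import sqlite3

# the list of rows the script builds (id, created_at, text)
outtweets = [("1", "2016-01-01 00:00:00", "hello"),
             ("2", "2016-01-02 00:00:00", "world")]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tweets (id TEXT, created_at TEXT, text TEXT)")

# executemany inserts the whole batch in one call
cur.executemany("INSERT INTO tweets VALUES (?, ?, ?)", outtweets)
conn.commit()
```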

Just an FYI for people trying to use this in Sublime (if you happen to be using Anaconda on a Windows machine): you need to run python -m pip install tweepy from the directory Sublime expects it to be installed in; pip install tweepy alone may not work. Some people who run the code and think they installed tweepy may get an error saying otherwise.

This truly is a glorious script, yanofsky! I plan on playing around with it for the next few days for a stylometry project, and thanks to you, getting the raw data I want is no longer an issue!

I've seen speed gains of over 150 seconds for users who have posted more tweets than the maximum retrievable. The error handling is a bit trickier but doable thanks to the max_id parameter: just stick the fetch I posted into a try/except inside a while True loop, and the cursor will pick up where it left off after each error. Try it out!

BTW, MAX_TIMELINE_PAGES theoretically goes up to 16, but I've seen it go to 17.
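The retry loop described above can be sketched like this, assuming a tweepy-style api object; in real code the bare except would catch tweepy.TweepError, but a broad except is shown here to keep the sketch dependency-free:

```python
import time

def fetch_all(api, screen_name):
    """Page back through a timeline with max_id, retrying on errors so the
    cursor resumes where it left off."""
    alltweets = []
    oldest = None
    while True:
        try:
            kwargs = {"screen_name": screen_name, "count": 200}
            if oldest is not None:
                kwargs["max_id"] = oldest
            batch = api.user_timeline(**kwargs)
            if not batch:
                break
            alltweets.extend(batch)
            oldest = batch[-1].id - 1  # continue just past the oldest tweet seen
        except Exception:  # in real code: except tweepy.TweepError
            time.sleep(60)  # back off, then retry from the same cursor
    return alltweets
```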

Thank you for this code. It worked as expected to pull a given user's tweets.

However, I have a side problem retrieving the tweets after saving them to a JSON file. I saved the list of alltweets in a JSON file using the following; note that without repr, I wasn't able to dump the alltweets list to the file.

with open('file.json', 'a') as f:
    json.dump(repr(alltweets), f)

Attached is a sample JSON file containing the dump. Now I need to access the text in each tweet, but I'm not sure how to deal with "Status".

I tried to iterate over the lines in the file, but the whole file is seen as a single line.

with open(fname, 'r') as f:
    for line in f:
        tweet = json.loads(line)

I also tried to iterate over statuses after reading the JSON file as a string, but the iteration runs over the individual characters in the file instead.
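The repr of a tweepy Status object isn't valid JSON, which is why loading it back fails. A sketch of a round-trippable approach, assuming the statuses expose the raw API payload as a dict on ._json (a stand-in class is used here): dump one JSON object per line, then parse line by line.

```python
import json

class FakeStatus:
    """Stand-in for a tweepy Status, which keeps the raw dict on ._json."""
    def __init__(self, payload):
        self._json = payload

alltweets = [FakeStatus({"id_str": "1", "text": "hello"}),
             FakeStatus({"id_str": "2", "text": "world"})]

# write one JSON object per line (JSON Lines), not the repr of the list
with open("file.json", "w") as f:
    for tweet in alltweets:
        f.write(json.dumps(tweet._json) + "\n")

# read back: each line parses independently
with open("file.json") as f:
    texts = [json.loads(line)["text"] for line in f]
```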

I read all the comments but haven't tried it yet... So, assuming a user (not me or anyone I know personally) with roughly 15.5k tweets, is there any way to get just the FIRST few thousand and not the last? Thanks! 👍

Has anyone figured out how to grab retweeted_status.text when retweeted_status is present? It seems that one needs to specify: api.user_timeline(screen_name=screen_name, count=200, include_rts=True)
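One way to sketch it: a status is a retweet when it carries a retweeted_status attribute, so fall back to the original text in that case (this assumes tweepy-style status objects):

```python
def full_text_of(tweet):
    """Return the original text when a status is a retweet; tweepy marks
    retweets by the presence of a retweeted_status attribute."""
    if hasattr(tweet, "retweeted_status"):
        return tweet.retweeted_status.text
    return tweet.text
```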

Thank you for posting this! May I ask how you found out that each tweet has information like id_str, location, etc.? I used dir() to look at it, but location is not included, so I was a bit confused.
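One way to see every field, assuming tweepy-style status objects: the raw API payload is kept as a plain dict on the _json attribute, so its keys list everything Twitter returned. location lives under the nested user dict, which is why dir() on the status itself doesn't show it. A sketch with a stand-in payload shaped like Twitter's status JSON:

```python
class FakeStatus:
    """Stand-in for a tweepy Status carrying the raw payload on ._json."""
    def __init__(self, payload):
        self._json = payload

tweet = FakeStatus({"id_str": "1", "text": "hi",
                    "user": {"screen_name": "a", "location": "NYC"}})

top_level = sorted(tweet._json.keys())       # every field Twitter returned
location = tweet._json["user"]["location"]   # location nests under user
```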

@invrl are you using Python 3.x? There is a chance that this could be the issue. The syntax for print changed with 3.x: print is now a function, so you have to call it, e.g. print("getting tweets before %s" % (oldest)).

I edited it for Python 3.x. Also, I removed the URLs and the RTs from the user's tweets.

def get_all_tweets(screen_name):
    """Download the last 3240 tweets from a user. Do text processing to remove URLs and retweets.
    Adapted from https://gist.github.com/yanofsky/5436496"""
    # Twitter only allows access to a user's most recent 3240 tweets with this method
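The text-processing part can be sketched like this; the regex is illustrative (it matches http(s) links of the t.co style), and the "RT @" prefix check is a simple heuristic for retweets:

```python
import re

URL_RE = re.compile(r"https?://\S+")  # illustrative pattern for t.co-style links

def clean_tweets(texts):
    """Drop retweets and strip URLs, collapsing leftover whitespace."""
    kept = [t for t in texts if not t.startswith("RT @")]
    return [" ".join(URL_RE.sub("", t).split()) for t in kept]
```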

I work with similar code; with the script I use I can input the username and download the timeline directly, without having to edit the code itself. But the output format is unreadable. So, is there any way of turning this code into a macro? Like, with an Excel table, put in a bunch of users and download every timeline?
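The batch idea can be sketched with a loop that reads screen names from the first column of a CSV (export the Excel table as CSV) and calls the gist's function for each one. get_all_tweets is assumed from the gist; a stand-in is defined here so the sketch is self-contained:

```python
import csv

def get_all_tweets(screen_name):
    """Stand-in for the gist's function of the same name."""
    print("downloading %s" % screen_name)

def dump_all_timelines(path):
    """Read screen names from the first column of a CSV and fetch each one."""
    with open(path, newline="") as f:
        names = [row[0] for row in csv.reader(f) if row]
    for name in names:
        get_all_tweets(name)
    return names
```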

Hi! The code works just fine, thanks for sharing.
Yet, I would like to extend the code to retrieve non-English tweets, since with this method Arabic letters are turned into funny combinations of Roman letters and numbers. I have seen other people ask the same question, but so far no answer; maybe this time it attracts more attention.
Has someone found a solution? I'm a bit desperate.
Thank you very much!
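A common cause of those "funny combinations" is writing or reading the CSV without an explicit encoding. A sketch under that assumption: open the output file with UTF-8 explicitly so non-Latin scripts survive intact (in Excel, import the file specifying UTF-8):

```python
import csv

tweets = [u"\u0645\u0631\u062d\u0628\u0627"]  # "marhaba" in Arabic script

# write with an explicit UTF-8 encoding so non-Latin scripts survive intact
with open("arabic.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    for text in tweets:
        writer.writerow([text])
```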

It uses two steps. First, Selenium essentially takes over a browser to collect as many tweet IDs as possible by going to each page day by day; I believe this should be possible with the API approach above as well. The second step uses Tweepy to request the metadata for those IDs.

After successfully running the code, I face one problem: for long tweets I get only a portion of the text. For example, one instance of what I get:

"Glad to have joined the Bahubali Mahamasthakabhisheka Mahotsava at Shravanabelagola in Karnataka. Spoke about the r… https://t.co/qG85rbCgIh"

And what the tweet actually is:

"Glad to have joined the Bahubali Mahamasthakabhisheka Mahotsava at Shravanabelagola in Karnataka. Spoke about the rich contribution of saints and seers to our society. Here is my speech. http://nm-4.com/tf25"

That means this portion is missing from my output: "rich contribution of saints and seers to our society. Here is my speech. http://nm-4.com/tf25"

Has anyone faced this problem? Please suggest a solution.
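Twitter truncates statuses to the classic length unless you request the extended representation; with tweet_mode="extended" the untruncated text then lives on full_text rather than text. A sketch, assuming a tweepy-style api object:

```python
def fetch_full_texts(api, screen_name):
    """Request untruncated tweets; with tweet_mode="extended" the text
    lives on .full_text instead of .text."""
    statuses = api.user_timeline(screen_name=screen_name, count=200,
                                 tweet_mode="extended")
    return [s.full_text for s in statuses]
```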
