How to collect data from Instagram business profiles

May 14, 2018 7 min read

If your work requires collecting data from Instagram business profiles, you have probably used a mobile application for it. You were forced to, because some business data was simply missing from the web version. In particular, it was impossible to tell whether you were looking at a business profile or a personal one. Now it is possible to process profiles automatically with a web scraper that uses the mobile API. We found this solution on the Internet: one of our users wrote it and shared it with the community on a popular Internet marketing resource. Let’s examine how the web scraper works.

To use the scraper, you must specify the login and password for your Instagram account, and the list of accounts you want to collect business information about. Bear in mind that using this web scraper may violate the TOS, and Instagram can block your account for it, so use it at your own risk; we are publishing it for educational purposes only. Below is the actual web scraper code:

As you probably already know, the config section is intended for presetting the scraper: in this case, it sets the debug mode level (which is only required for development and could be omitted) and the name of the browser on whose behalf the web scraper sends requests to the server. It could be Chrome or Safari, but the author decided it should be Firefox. By the way, the server can sometimes return different data depending on the browser name. Also, it may occasionally be necessary to use a complete User-Agent string instead of a preset; such strings can be found here.

config:
    agent: Firefox
    debug: 2
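Outside Diggernaut, the same browser-identification trick can be sketched in plain Python with the standard library; the User-Agent string below is illustrative, not an official preset.

```python
import urllib.request

# A complete User-Agent string (illustrative; pick one matching a real browser build)
FIREFOX_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0"

def make_request(url: str) -> urllib.request.Request:
    """Build a request that identifies itself to the server as Firefox."""
    return urllib.request.Request(url, headers={"User-Agent": FIREFOX_UA})

req = make_request("https://www.instagram.com/")
```

Swapping the string for a Chrome or Safari one changes what some servers return, which is exactly why the scraper exposes the agent setting.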

The main logic of the scraper is located in the do section. At the very beginning, variables are initialized with your login, your Instagram password, and the list of accounts you want to extract:

Now there is an extracted JavaScript object (JSON) in our context, represented as a DOM, and we can walk through its elements as if it were a standard HTML page. So we find the config node and, inside it, the csrf_token node, parse the content, and extract the token we need to log in to Instagram. We save it to the token variable. Then we log in to Instagram using the token, username, and password that we are already keeping in variables:
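As a rough Python sketch of the token-extraction step (not the author's Diggernaut code): the Instagram web page embeds a JSON blob whose config node holds csrf_token. The window._sharedData layout assumed here is how the page looked at the time and may change.

```python
import json
import re

def extract_csrf_token(html: str) -> str:
    """Pull csrf_token out of the embedded JSON blob.
    The window._sharedData structure is an assumption about the page source."""
    match = re.search(r"window\._sharedData\s*=\s*(\{.*?\});</script>", html, re.DOTALL)
    if not match:
        raise ValueError("sharedData blob not found")
    shared = json.loads(match.group(1))
    return shared["config"]["csrf_token"]

sample = '<script>window._sharedData = {"config": {"csrf_token": "abc123"}};</script>'
token = extract_csrf_token(sample)
```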

If the login fails, you will see an error and the scraper stops. If you see this error in the log, try logging in through your browser and resolving the challenge manually. After that, you will be able to sign in to your account from the web scraper. If the authorization is successful, the scraper continues and transfers the necessary cookies to variables so they can be used in subsequent requests:
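The cookie-transfer step can be sketched with Python's standard library; the cookie name and value below are made up for illustration.

```python
from http.cookies import SimpleCookie

def cookies_to_dict(set_cookie_header: str) -> dict:
    """Parse a Set-Cookie header into name -> value pairs,
    the way the scraper keeps session cookies in variables for later requests."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    return {name: morsel.value for name, morsel in jar.items()}

cookies = cookies_to_dict("sessionid=xyz789; Path=/; Secure")
```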

Then the scraper reads the variable with the list of accounts into the register, converts the text in the register to a block, and switches to that context. This is done so that the split command can be used, since it works with the contents of a block, not the register. After splitting, the scraper iterates over each account and executes the commands in the do block:

- split:
    context: text
    delimiter: ','
- find:
    path: div.splitted
    do:
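In plain Python, the same split-and-iterate step looks like this (account names are illustrative):

```python
accounts_csv = "natgeo, nasa , instagram"

# Split the comma-separated list and process each account,
# mirroring the split command and the find/do loop above
for raw in accounts_csv.split(","):
    account = raw.strip()
    # ... per-account commands go here ...
```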

Everything that happens next applies to each account listed in the CSV string you passed in. The scraper parses the block containing the account name, strips extra spaces, and writes it to a variable so it can be used in requests.

- parse
- space_dedupe
- trim
- variable_set: account
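The cleanup commands have a direct Python equivalent:

```python
import re

def clean_account_name(raw: str) -> str:
    """Collapse runs of whitespace and trim the ends,
    like space_dedupe followed by trim."""
    return re.sub(r"\s+", " ", raw).strip()

account = clean_account_name("  nat   geo  ")
```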

The scraper then fetches the account's page to extract the account ID, because the ID is required to call the mobile API. The ID is stored in a variable.
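As a hedged Python sketch of this step: at the time, the profile page source contained the numeric ID in a "profilePage_<id>" marker. That marker is an assumption about the page layout and may no longer hold.

```python
import re

def extract_account_id(html: str) -> str:
    """Find the numeric account ID on the profile page.
    The 'profilePage_<id>' marker is an assumption about the page source."""
    match = re.search(r"profilePage_(\d+)", html)
    if not match:
        raise ValueError("account ID not found")
    return match.group(1)

account_id = extract_account_id('"logging_page_id":"profilePage_787132"')
```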

The mobile API returns a response in JSON format. Diggernaut automatically converts it to XML and lets you work with the DOM structure using the standard find command. So all further code extracts the data using specific CSS selectors and saves it to the data object.
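Without the JSON-to-XML conversion, the equivalent in plain Python is to walk the parsed response directly. The field names below (public_email, contact_phone_number, and so on) are hypothetical stand-ins for the business fields the mobile API exposed, not a documented schema.

```python
import json

# A made-up mobile-API response; real field names may differ
response_body = '''{
  "user": {
    "username": "someshop",
    "is_business": true,
    "public_email": "hi@example.com",
    "contact_phone_number": "+1-555-0100"
  }
}'''

# Walk the parsed JSON and collect the business fields into a data object
user = json.loads(response_body)["user"]
data = {
    "account": user["username"],
    "is_business": user["is_business"],
    "email": user.get("public_email"),
    "phone": user.get("contact_phone_number"),
}
```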