Start using ScrapeBox

ScrapeBox is not a blackhat tool. Yes, it can be used for spam comments, but its uses go well beyond that. First and foremost, ScrapeBox is a scraping tool (duh). You can scrape for any given keyword and footprint, and that makes it really powerful.

When you create a new website or a blog, it is important to get the word out. To do that you need to reach the places and people where your ideas are noted, encouraged, debated or discussed. ScrapeBox is the tool to do just that. Use ScrapeBox to –

Get related and long tail keywords for a given keyword

Get articles/posts from Google that target the same keywords

Get blogs and websites that are in the targeted niche, and have articles based on the targeted keywords

Make your commenting tasks easier

At each step, you can check how popular the website and content is, and get the PageRank, social media activities, and so on.

How to start using ScrapeBox?

Remember, this tutorial is aimed at absolute beginners.

Fire up ScrapeBox after installation, and you will see the following screen.

Though the screen looks confusing, remember that it is built for maximum usage with minimum fuss.

First things first. The most basic function is to enter some keywords and find out where they are being used.

Get started by entering keywords at the left.

Hit the “Start Harvesting” button.

Congratulations, you are now an official ScrapeBox user!

You will learn a lot more than this, but this is the basic idea of ScrapeBox. It is also one that I use for the most part.

Get proxies for/from ScrapeBox

You will probably need proxies. ScrapeBox works without them, but with proxies you can run more threads and work faster.

New to proxies? No worries – you can easily get started using public proxies.

There are a lot of proxies on the internet that let you browse other parts of the internet through them. When you are sending hundreds or thousands of queries to a search engine, you do not want it to conclude that you are trying to hack or spam websites. You may also get banned when too many queries originate from your IP.

So, you need proxies for sure. It so happens that ScrapeBox is one of the best scrapers of proxies around.

Scrape proxies using ScrapeBox

Click the “Manage” button in the Proxy section of ScrapeBox.

Click “Harvest Proxies” in the next box.

You can just accept defaults, and click “Start” to start harvesting proxies.

Note that you can also add your own sources of public proxies using the “Add Source” button. ScrapeBox provides its own sources (designated as “Supported Source XX”). Although that list is periodically refreshed, thousands of users are using the same proxies, so you need fresh sources. Google is your friend here.

After the harvester completes its job, click on “Apply”. This will copy the harvested proxies over to the “Proxy Manager” window. Click “Test All Proxies” to remove the proxies that don’t work.

You will need a lot of patience. The majority of the proxies that you scrape this way turn out to be duds.
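If you keep harvested proxies in a text file, it can save testing time to drop malformed and duplicate entries first. The following is a minimal Python sketch of that idea, assuming one plain `ip:port` entry per line. It is not how ScrapeBox works internally, and the function name and sample addresses are made up for illustration.

```python
import ipaddress


def clean_proxy_list(lines):
    """Return unique, well-formed 'ip:port' entries in their original order."""
    seen = set()
    cleaned = []
    for raw in lines:
        entry = raw.strip()
        host, sep, port = entry.rpartition(":")
        # Skip entries without a port, or with a port outside 1..65535.
        if not sep or not (port.isascii() and port.isdigit()):
            continue
        if not 0 < int(port) < 65536:
            continue
        # Skip entries whose host part is not a valid IPv4 address.
        try:
            ipaddress.IPv4Address(host)
        except ValueError:
            continue
        if entry not in seen:
            seen.add(entry)
            cleaned.append(entry)
    return cleaned


# Sample data uses documentation-only address ranges (assumption, not real proxies).
harvested = [
    "203.0.113.10:8080",
    "203.0.113.10:8080",
    "not-a-proxy",
    "198.51.100.7:99999",
    " 192.0.2.44:3128 ",
]
print(clean_proxy_list(harvested))
```

This only checks the format. Whether a proxy actually responds still has to be tested, for example with “Test All Proxies”.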

Once you find the proxies that meet the requirements, you can filter the result set, remove whatever is required, select the ones that you love, and click the “Save” button > “Save proxies to ScrapeBox” option. This will get all the proxies to the ScrapeBox proxy window.

Easy, huh?

Buy Proxies for ScrapeBox

Unless you develop your own method to scrape for proxies, you will not find them easily. You will have to search again and again, repeating the entire process each time.

Even when you do that, the proxies get changed, expired, or invalidated, or whatever happens to proxies that get blasted with unknown traffic for a few hours. So, you end up scraping for proxies continuously to replace inactive proxies.

Unless you have multiple VPS and have ScrapeBox running 24×7 in rinse/repeat mode, this is a waste of time. So you do the next best thing.

Subscribe to proxy lists that people with multiple VPS scrape and sell. You can then use those proxies, but remember that they also die after some time, so you will need to keep refreshing from the list to stay current.

Get private proxies

Private proxies are maintained by individuals/organizations, and let out to users like you and me.

Typically you are given the list of proxies that you bought, along with the user IDs and passwords needed to use them.
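Outside ScrapeBox, many HTTP clients expect an authenticated proxy as a single URL of the form `scheme://user:password@host:port`. Here is a small Python sketch that converts a provider's line into that form. The `host:port:user:password` input layout is an assumption, so check what your provider actually sends; the function name and credentials are placeholders.

```python
from urllib.parse import quote


def to_proxy_url(line, scheme="http"):
    """Convert 'host:port:user:password' (assumed layout) into a proxy URL."""
    # Split at most three times so a password containing ':' stays intact.
    host, port, user, password = line.strip().split(":", 3)
    # Percent-encode the credentials so characters like '@' do not break the URL.
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


print(to_proxy_url("192.0.2.10:8080:alice:p@ss"))
```

A line with fewer than four fields raises `ValueError`, which is a cheap way to catch entries that are missing credentials.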

Cheap proxy providers typically offer ~10 proxies for $5 per month. The proxies are moderately fast and just work. If they don’t work they get replaced.

Whether you scrape or buy proxies, remember to check the “Use Proxies” checkbox. ScrapeBox does warn you when you start harvesting keywords without proxies, but all of us have the itch to click OK whenever we encounter a popup box. You have been forewarned.

Let's move on to the rest of the settings to optimize your ScrapeBox experience.

Additional help – Use VPS

At this point, let me also tell you this – ScrapeBox can work quite well from your local desktop computer, but you may not want to run ScrapeBox all the time, your internet speeds may be slow, or you may spend more money on electricity bills if you keep the entire infrastructure running.

Serious scrapers use Virtual Private Servers (VPS).

Think of a VPS as your own desktop that is hosted and administered remotely. All you have to do is access the machine and use it just like your own computer. Even if you use a VPS you will still need proxies – remember that you will be sending thousands of automated queries and you don’t want your VPS IP banned.

You do not need VPS to get started.

Learn about Footprints in ScrapeBox

In simple words, a footprint tells you the type of website that you are dealing with.

Websites may –

allow guest blogs

show a link to your latest post (e.g. using CommentLuv WordPress plugin)

have a specific domain extension like .edu (which is considered better than .org)

allow images or photos to be uploaded

exhibit specific characteristics based on the technology platform used (e.g. the Plogger PHP script is used to collect and showcase images)

Depending on what you are trying to do, you need to collect and use the right footprints in ScrapeBox.

For example, I want to spread the word about my ScrapeBox tutorial. I know most SEO guys use the WordPress platform (haha). Instead of scraping for each and every site with my keywords, I will just use the WordPress platform footprint.

In ScrapeBox you can use a pre-selected set of platform footprints, or you can use your own by just entering it in the topmost text box.

While I can choose WordPress as the platform for my above problem, not all problems are simple.

For example, I want to get only those websites that are using KeywordLuv WordPress plugin. This plugin allows you to use specific keywords while commenting on the website and linking back to your own site. Since this presents more interesting opportunities to build links through comments, you would, of course, love sites that use KeywordLuv.

You can find those in ScrapeBox by selecting the “Custom Footprint” option, and typing in “This site uses KeywordLuv” (with quotes). ScrapeBox will then find only those websites that have your keyword and the footprint you specified.
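Conceptually, a footprint plus a keyword is just a search query that contains both. The Python sketch below illustrates that pairing for several footprints and keywords at once. It is an illustration of the idea only; how ScrapeBox merges them internally is an assumption, and the function name is hypothetical.

```python
from itertools import product


def build_queries(footprints, keywords):
    """Pair every footprint with every keyword to form search queries."""
    return [f"{footprint} {keyword}" for footprint, keyword in product(footprints, keywords)]


footprints = ['"This site uses KeywordLuv"']
keywords = ["scrapebox tutorial", "reel mowers"]

for query in build_queries(footprints, keywords):
    print(query)
```

Keeping the quotes around the footprint matters: they ask the search engine for that exact phrase rather than the individual words.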

The next obvious question is where to get additional footprints. The process is easier than you think.

You can generate as many footprints as you like. Go to your favourite website from where you would love to build links/relationships through comments.

Now, look out for any distinct factors that make the site stand out. For example, you may find the software that is used for the site at the bottom. Take an example of XenForo, a forum software.

You can see the footprint of XenForo forums at the bottom of the website. Now, you can Google for

inanchor:"Forum software by XenForo"

If this provides the SERPs as you expected, this is the footprint that you want to use.
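To check a candidate footprint quickly in a browser, you can turn it into a search URL. This is a minimal Python sketch using only the standard library; the helper name is made up, and it simply builds the URL without sending any request.

```python
from urllib.parse import urlencode


def google_search_url(query):
    """Build a Google search URL for a footprint query (no request is sent)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_search_url('inanchor:"Forum software by XenForo"'))
```

Paste the printed URL into your browser and judge the results by eye before adding the footprint to ScrapeBox.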

Set up your engines

For Google, you can add country-specific search engines if you are into that. You can also use the “T” dropdown to select only videos, news or blogs for scraping.

While you are here, also select the number of “Results” (defaulted to 50). I leave it at 25 to 50 since I use the results to manually comment. If you are experimenting with automatically posting comments, you can set this to a higher number.

Do remember that you are going to hit the maximum number of results no matter what you do. For example, a simple “ScrapeBox Tutorial” search throws up 200,000+ results.

What you want to do is to use long tail keywords, or use stop words.

If for some reason I try to only target the tutorial that also covers footprints for XenForo, I will Google “ScrapeBox tutorial custom footprint xenforo” and find 1/10th the results.

Although I have not taken a good example to demonstrate this, you can easily use longer keywords like “review of reel mowers uneven lawns” and compare the results against a more generic search that says “review of lawn mowers”.

The other option is to use stop words, which can get you a different set of search results to work with.

Know about ScrapeBox add-ons

What makes ScrapeBox even more powerful are the add-ons.

Go to the Addons menu > Show Available Addons to check out a big list. A few add-ons are available as paid options (check out Premium Plugins), but you can stick to the free ones for now. The most commonly used add-ons are shown in the screenshot – be sure to add them!

Now that you have learnt all about settings, it is time to get ahead with the task.

Scrape Keywords using ScrapeBox

ScrapeBox can help you get related keywords by the thousands using multiple sources. Just hit the “Scrape” button > “Keyword Scraper” under the big text box on the left.

This opens up a new popup window where you get to set a few options and get related keywords.

You can rinse and repeat this process to get longer phrases and an immensely large set of related keywords. Remember to use “Remove Duplicates” to remove duplicate keywords.

Also note the additional helper functions here –

Use the “Remove” button to filter out keywords with words that you don’t like

Look up domain name availability for the keywords using “Lookup Domain” button

Once you are done, use the “Save” button to save results to a text file, or to the main ScrapeBox window.
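If you save the keywords to a text file, the same clean-up (removing duplicates and dropping keywords that contain unwanted words) is easy to repeat outside ScrapeBox. Here is a minimal Python sketch. Treating keywords as duplicates regardless of case and spacing is my assumption, not necessarily what “Remove Duplicates” does, and the function name is hypothetical.

```python
def clean_keywords(keywords, banned_words=()):
    """Drop duplicates and keywords containing banned words, keeping first-seen order."""
    banned = {word.lower() for word in banned_words}
    seen = set()
    result = []
    for keyword in keywords:
        # Normalize case and collapse repeated whitespace before comparing.
        normalized = " ".join(keyword.lower().split())
        if not normalized or normalized in seen:
            continue
        # Skip the keyword if any of its words is on the banned list.
        if banned & set(normalized.split()):
            continue
        seen.add(normalized)
        result.append(normalized)
    return result


scraped = [
    "ScrapeBox tutorial",
    "scrapebox  tutorial",
    "scrapebox blackhat tricks",
    "scrapebox footprints",
]
print(clean_keywords(scraped, banned_words=["blackhat"]))
```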

You do remember that you hit the shiny “Scrape” button at the beginning of this post, don’t you? Just do that again to see ScrapeBox rip.

You now have a big list of keywords and the URLs that have targeted those keywords.

Important things to do after harvesting URLs in ScrapeBox

Explore the buttons on the right side of the harvested URLs. Play around a bit to find what they can do.

Typically I do two things –

Apply filters to remove URLs that I don’t like. I remove URLs with extensions like .ru and .cc, which are typically full of spam.
Next, I manually go through the URLs to check whether any unrelated URLs are included. For example, if I want to remove all references to blackhat, I will use “Remove URLs Containing” and instruct ScrapeBox to remove URLs containing the word “blackhat”.

I check for Google PageRank. This was a quick way to check reputed sites that I would want to spend my time on.
I typically just use the PR for the domain instead of the page itself. Once the list is filtered I run the URLs through the addon called “Fake PageRank Checker” and clean out the URLs that are just doing redirects or earning PR in some other way. Since PageRank is updated infrequently, it may not give a correct picture of the trust Google places in the site, or of whether the site is indeed ranking high for the selected keyword.
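The URL filtering from the first step above can also be reproduced on an exported list. The Python sketch below drops URLs on unwanted domain extensions and URLs containing unwanted words. The default values mirror the examples in this post, and the function name is hypothetical; it is not ScrapeBox's own implementation.

```python
from urllib.parse import urlparse


def filter_urls(urls, blocked_tlds=(".ru", ".cc"), blocked_words=("blackhat",)):
    """Keep only URLs that avoid the blocked domain extensions and words."""
    kept = []
    for url in urls:
        # hostname is None for URLs without a scheme; those pass the TLD check.
        host = (urlparse(url).hostname or "").lower()
        if host.endswith(tuple(blocked_tlds)):
            continue
        if any(word in url.lower() for word in blocked_words):
            continue
        kept.append(url)
    return kept


urls = [
    "http://example.com/scrapebox-tutorial",
    "http://example.ru/scrapebox-tutorial",
    "http://example.org/blackhat-seo",
]
print(filter_urls(urls))
```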

More recently I have started using another addon called “ScrapeBox Page Authority”. This uses the Moz API to find out the Domain Authority (DA) / Page Authority (PA) of the scraped URLs. Typically you can observe that the higher the Domain Authority, the higher the URL ranks in Google. Since DA/PA are kept up to date, they can be invaluable in determining the value of a URL.

That is all there is to scraping.

Export the harvested URLs to a ‘txt’ file for future use.

Comment using ScrapeBox

After doing a lot of hard work scraping the heck out of the internet for your keyword, it is time to get down to business and get the backlinks and engagement you deserve.

The action shifts to the right bottom window in ScrapeBox.

You need to set up Names, Emails, Websites, Comments and Blog Lists before you start commenting. You can also generate names and emails using the option in the “Tools” menu > “Name and Email Generator”.

We will only cover the “Manual Poster” function here.

For any white hat uses, I end up having one or two names and the corresponding email ids.

Go to the installation folder of ScrapeBox. You will find a “Comment Poster” folder in which you will see a bunch of text (*.txt) files.

Edit the text files using Notepad to add names, emails, websites (or the complete URL), and typical comments that you enter.

Go back to ScrapeBox, and point to these files in the Comment Poster section. You can also edit the files from ScrapeBox itself, but do note that the edits will be temporary.

Once you load up the required files, click on the “Test Comments” button to check what your comment would look like.

ScrapeBox picks one name, one email and one comment from the lists you supplied to generate one combination at a time.
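The idea of picking one entry from each list can be pictured with a few lines of Python. This is only a sketch of the concept, assuming a uniformly random pick from each list. How ScrapeBox actually chooses is not documented in this post, and all names and values below are placeholders.

```python
import random


def pick_combination(names, emails, websites, comments, rng=random):
    """Pick one entry from each list to form a single comment profile."""
    return {
        "name": rng.choice(names),
        "email": rng.choice(emails),
        "website": rng.choice(websites),
        "comment": rng.choice(comments),
    }


# Placeholder values standing in for the contents of the text files.
names = ["Jane Doe", "J. Doe"]
emails = ["jane@example.com"]
websites = ["https://example.com/scrapebox-tutorial"]
comments = ["Thanks, the section on footprints was helpful.", "Great walkthrough."]

print(pick_combination(names, emails, websites, comments))
```

With only one or two names and emails, as in the white hat case, the combinations stay few and every one of them is something you wrote yourself.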

Next, make sure that you have chosen the ‘txt’ file of the harvested URLs against the “Blog Lists”.

Hit the “Start Posting” button to open the internal browser in ScrapeBox for each blog in the list. ScrapeBox automatically takes you to the bottom of the web page so that you can easily add comments. Move to the next blog, and repeat the process.

So far you have seen how ScrapeBox helps you scrape URLs, and how to comment on them using manual posting. A few advanced uses of ScrapeBox, along with a few of my go-to plugins, are next.

4 Comments

Ed Brancheau
on September 5, 2014 at 6:46 am

Boy, that was really detailed. I’m scraping right now as you suggested, but I was wondering if you need real email addresses?