Automating OSINT with Python (by Lorand Bodo)

Let me introduce your friendly OSINT helper, Python – a powerful and easy to learn programming language, thanks to its elegant syntax, dynamic typing and interpreted nature. It is, therefore, no surprise that it’s widely used across many domains. Python’s standard library supports many Internet protocols and data formats, such as HTTP, HTML, XML and JSON, making it the ideal OSINT helper. And that’s not all. There are numerous useful modules, packages and libraries that you can use for different purposes, such as data scraping, network analysis, natural language processing and many others – all for free!

There are three main reasons why I learned Python and why I’d recommend it for OSINT-related purposes. First, you can automate tasks that are either labour-intensive, tedious or even both. Automating certain processes during your research would save you valuable time, which in turn could be spent on data analysis or report writing. But to be clear, automation is no silver bullet. It also has its limitations and being aware of this is very important.

Second, you don’t want to pay for tools, especially those that often look better than they actually are. But more importantly, when you start learning Python and make your first HTTP request or API call, you will start to understand how these “OSINT tools” actually work and above all, where the limitations lie. This will provide you with a solid understanding of automating OSINT. For example, just because a Python script hasn’t found that piece of information you were looking for, doesn’t necessarily mean it’s not there. Automating OSINT can aid the analyst, but it should not be a replacement! Finding the needle in the haystack is still a human task.
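
To make the idea of a first HTTP request or API call concrete, here is a minimal sketch that uses only the standard library. It is an illustration, not the code of any particular tool: the function names and the User-Agent string are made up, and the URL is whatever JSON API you want to query.

```python
import json
import urllib.request


def build_request(url, user_agent="osint-demo/0.1"):
    """Prepare a GET request; the User-Agent value is a made-up placeholder."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_json(url, timeout=10):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_json with the address of a JSON API returns the parsed answer as ordinary Python dictionaries and lists. An empty answer only tells you that this one request found nothing, which is exactly the kind of limitation to keep in mind.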

Lastly, having a good understanding of Python will also allow you to customise scripts, powerful scripts that were developed by amazing Python gurus. This could be in the form of re-using snippets of code or even contributing entirely new modules that others could use. And if something goes wrong and you can’t find the bug, you can either use Google or ask for help on Stack Overflow.

When it comes to “automating OSINT”, you don’t have to reinvent the wheel. In most cases, someone has already written a Python script for what you want to do. So, do the research first before you start developing something from scratch.

One last thing I want to point out is, don’t get intimidated by a bunch of lines of code. I don’t have a computer science or tech background, so learning how to code was something completely new. But I also want to say that it was lots of fun and still is! There’s no downside to it. In fact, you will only benefit from it. So, if you’re interested in learning Python, here are a couple of resources that I highly recommend (in no particular order) that will get you quickly up to speed. Happy coding!

Automation with Python

How I Use Python (By: WebBreacher)

I remember when I learned how to code in Python in 2012. I did all the tutorials I could find. I read books and listened to podcasts. But when I tried to write my own scripts, I found that I really needed a reason to code. That reason was OSINT.

My Code

Over the years since those days, I’ve written a bunch of modules for the Recon-ng tool, written my own scripts (https://github.com/webbreacher), and modified others’ work. Originally, I used the 2.7.x branch of Python but have since upgraded most of my work to the 3.7.x branch.

I use Python for two reasons:

To scrape data from remote websites

To manipulate data and files that I already have

When I began learning to code, I tried to figure out the overall action I needed to perform (e.g. visit a web site, grab data, store it on my system) and then I built the final project in steps (1. how do I visit a web site; 2. how do I grab the data I want from it; 3. how do I write that to a file). Then, I’d put those pieces together into my uber script.
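
As an illustration of that step-by-step approach, here is a minimal sketch with one small function per step and a final function that glues them together. It is not one of my actual scripts: the function names are invented for this example, and grabbing the page title simply stands in for whatever data you are after.

```python
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def visit(url):
    """Step 1: visit the web site and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def grab_title(html):
    """Step 2: grab the data we want, here just the page title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def write_to_file(path, text):
    """Step 3: write the result to a file on the local system."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text + "\n")


def uber_script(url, path):
    """Put the three pieces together."""
    write_to_file(path, grab_title(visit(url)))
```

Each step can be tried on its own first, for example by feeding grab_title a small piece of HTML typed by hand, before uber_script is pointed at a real site.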

My Suggestions for You

Find something that interests you (crypto, data science, web hacking, web scraping, file manipulation, creating a Twitter bot…whatever). Python can do ALL of that and so much more!!

Treat learning Python just like you would learning a foreign language.

You need to take time to learn the vocabulary (functions, methods, and variables) and how it works together (the syntax and grammar).

Finally, you have to practice regularly if you want to be able to remember it. Think back to when you were in school and maybe were forced to take a language. I’m betting that, if you haven’t been using it, you’ve probably forgotten much of what you learned. Same goes for coding. Practicing it makes it easier to remember.

Useful Scripts (by Sector035)

Once you have learned the basics of Python and followed the advice of WebBreacher, you are ready for some testing. Of course you need Python itself, which is included in nearly all Linux distributions. And if you are running Windows, there are different ways you can run Python 3 on it, with the easiest being to install Python for Windows.

To start with a simple script that does not need any extra configuration, setup or installing of libraries, we will have a look at GitHub-OSINT. You don’t even have to pull the whole GitHub repository to your own computer, since downloading the file “github-osint.py” is enough to get you started. To do that, open the page on GitHub, click the button that says “Raw” and download the file to the directory of your choice. Then open a terminal session, go to the directory and start the script by typing:

python github-osint.py vulnbe

The first part, “python”, tells the operating system that we want to run a script that needs to be interpreted by Python. The second part is the script itself that we want to run. And the very last part is what we input into the script, namely a GitHub user account. In this case it is the maker of this script, named vulnbe. The script queries some API endpoints on GitHub and requests information about the user that is being investigated. The result is shown here:

Running the OSINT-Github script
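
To give an idea of what happens under the hood, here is a minimal sketch of the same pattern: read the user name from the command line, request the public profile from the GitHub REST API and print a few fields. This is not the code of github-osint.py. The function names, the User-Agent string and the selection of fields are assumptions made for illustration.

```python
import json
import sys
import urllib.request

API_ROOT = "https://api.github.com"


def user_url(username):
    """Build the address of the public profile endpoint for one account."""
    return f"{API_ROOT}/users/{username}"


def summarise(profile):
    """Keep a few fields of the profile; the selection is arbitrary."""
    wanted = ("login", "name", "company", "location", "created_at")
    return {key: profile.get(key) for key in wanted}


def main(argv):
    """argv[0] is the script name, argv[1] the account to look up."""
    if len(argv) != 2:
        print(f"usage: python {argv[0]} <github-username>")
        return 1
    request = urllib.request.Request(
        user_url(argv[1]), headers={"User-Agent": "osint-demo/0.1"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        profile = json.loads(response.read().decode("utf-8"))
    for key, value in summarise(profile).items():
        print(f"{key}: {value}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a name of your choice and started with a user name as the last part of the command, the sketch receives that name as argv[1], just like the script above receives vulnbe.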

Cloning From GitHub

Scripts that consist of multiple files have to be ‘cloned’ from GitHub before you can use them. Simply copying a single file won’t be enough in that case. We need to use a “git clone” command to create a local copy of the complete tool and all of its files. For that I recommend you first choose a directory where you want your custom programs to land. A lot of people choose a directory under /usr/local, but you can of course also run the “git clone” command in your Documents folder. The advantage of /usr/local is that its bin subdirectory is normally part of the $PATH environment variable, so tools that are installed or linked there can be run right away, no matter where you are.
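
Whether a directory is really searched for commands depends on the $PATH variable of your own system. A minimal sketch for checking this from Python is shown below; the two function names are invented for this example.

```python
import os


def path_directories(path_value=None):
    """Return the directories listed in PATH, in search order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(os.pathsep) if entry]


def is_on_path(directory, path_value=None):
    """True if the directory itself (not a parent of it) is listed in PATH."""
    target = os.path.normpath(directory)
    return any(
        os.path.normpath(entry) == target
        for entry in path_directories(path_value)
    )


# Print every directory the shell searches for commands, one per line.
for entry in path_directories():
    print(entry)
```

Note that only the exact directories in the list are searched. A tool cloned into a subdirectory of one of them is not found automatically.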

To clone a tool from GitHub, you go to the repository and look for the “Clone or download” button. The script I am going to mention here also needs access to the Twitter API, so do bear in mind that after following all the steps you also need to apply for a Twitter API key.

Copy the URL that is displayed, go to your shell and run the following command:

git clone {URL copied from GitHub}

Cloning a tool from GitHub and it shows up in a subdirectory with the same name

After that you will find the tool in the desired directory “tweets_analyzer”. Most scripts, like this one, need some extra libraries before they can work. For that Python has a little tool called “pip”, a recursive acronym for “Pip Installs Packages”. When you come across a script that has certain dependencies, you find them in a file called “requirements.txt” in the GitHub repository and now also in your local directory. You don’t have to open the file to look inside; you can simply tell pip to install all the needed libraries by running:

pip install -r requirements.txt

With this command you instruct pip to read the file and install all the libraries listed there. To make sure that this script works, we need to run this command first.

Installing libraries that are needed
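
If you are curious about what pip is going to install, the file is plain text and easy to read from Python. The sketch below is a simplified reader that only skips blank lines and comments; it does not handle every feature of the requirements format (such as option lines or line continuations), and the function name is invented for this example.

```python
from pathlib import Path


def list_requirements(path="requirements.txt"):
    """Return the non-empty, non-comment lines of a requirements file."""
    entries = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.split(" #", 1)[0].strip()  # drop trailing comments
        if line and not line.startswith("#"):
            entries.append(line)
    return entries
```

Calling list_requirements() inside the cloned directory returns one entry per library, including any version constraints written next to the name.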

After “pip” is done, this script is ready to be used. There are scripts that require you to run a “setup.py” or a similar script to install and configure the script before use. The steps needed to get it ready are usually described in the “README.md” file in the repository.

But this script is now ready to use. So you can start investigating tweets from users!

An analysis into the tweets of Donald Trump

Resources

With regard to resources, a lot has been mentioned already. For absolute beginners I would like to direct people towards Codecademy. Their course covers the bare basics and is interactive and fun. After completing it, you should know enough of the basics of Python to understand most scripts.

After that, one can look at another free course, this time at Cybrary. The course is meant for security professionals, but can be a very useful basis for some readers.