Sunday, December 30, 2012

Automated Open Source Intelligence (OSINT) Using APIs

Introduction

The first step to performing any successful security engagement is reconnaissance. How much information one is able to enumerate about given personnel (for social engineering engagements) or systems can often impact the effectiveness of the engagement. In this post, we will discuss what Open Source Intelligence (OSINT) is and why it takes so much time, as well as ways we can use various application programming interfaces (APIs) to automate much of this process for us. Hopefully this post will help shed light on the importance of proper privacy settings, and the threat of automated information gathering due to APIs.

Table of Contents

Since this blog post covers quite a bit of information, I thought it might be handy to include a short outline/table of contents for those who may find it useful. For the sake of brevity, in this post I will only be covering APIs for finding information about individuals (as opposed to information about systems and networks).

What is Open Source Intelligence (OSINT)?

The process of gathering information from publicly available sources is known as Open Source Intelligence (OSINT). Publicly available sources can be anything from websites to WHOIS information to published court documents, etc. Likewise, the information we are looking for can be practically anything we want. From names and positions of company employees, to subdomain information and web server versions in use - it's all fair game.

Why We Should Try to Automate the Process

Since there are so many sources of information, it can often be overwhelming to try and manage the information gathered about a person or company. Also, this process can take a large amount of time if only manual techniques are used. Fortunately, many sites have APIs that make this process easier for us by returning the results in a very manageable JSON format. Let's take a look at a few social networking APIs now.

Facebook Open Graph API and Batch Requests

Facebook unveiled its Graph API in 2010 as a way to help streamline access to information. From an OSINT point of view, this API allows a social engineer to quickly and easily search for user profiles and public posts. This functionality is provided by the "search" feature. This feature allows us to search for public profiles, posts, events, groups, and more based on a given keyword. An example URL might look like the following:
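For example, a keyword search for user profiles might look like this (the query and access token are placeholders):

```
https://graph.facebook.com/search?q=mark&type=user&access_token=ACCESS_TOKEN
```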

While this format may look a bit unfamiliar to some, it's actually very convenient and easy to work with. Before continuing coverage of the API's features, let's look at how we can easily obtain and access this data using Python.
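As a minimal sketch of that process (endpoint and parameter names follow the public Graph API search documentation; the query and token values are placeholders, and this version uses only the standard library):

```python
import json
import urllib.parse
import urllib.request

GRAPH_SEARCH = "https://graph.facebook.com/search"

def build_search_url(query, obj_type, access_token):
    # Assemble a Graph API search URL for the given keyword and object type.
    params = urllib.parse.urlencode({
        "q": query,
        "type": obj_type,          # "user", "post", "event", "group", ...
        "access_token": access_token,
    })
    return GRAPH_SEARCH + "?" + params

def graph_search(query, obj_type, access_token):
    # Fetch the URL and return the "data" list from the JSON response.
    with urllib.request.urlopen(build_search_url(query, obj_type, access_token)) as resp:
        return json.loads(resp.read().decode("utf-8"))["data"]

# Example (requires a valid access token):
# for profile in graph_search("John Smith", "user", "YOUR_TOKEN"):
#     print(profile["id"], profile["name"])
```

Once the JSON is decoded, each result is just a Python dictionary, which is what makes this data so easy to manipulate.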

As we can see, it is very easy to programmatically obtain, access, and manipulate this data. This makes the process of gathering this data automatic, and very quick.

While in our previous example we used the search feature to find people based on name, the query ("q") parameter also searches other fields for matches. For example, if we want to find people that have either had their education at, or work for Texas Tech University, we would use the following URL:
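An illustrative version of that URL (token placeholder again):

```
https://graph.facebook.com/search?q=Texas+Tech+University&type=user&access_token=ACCESS_TOKEN
```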

This same technique can be extended to any company. Usually the results are very accurate; however, there will be some outliers - especially if we are searching for a big company like Google or Microsoft (since these terms can appear in quite a few fields on people's profiles).

But wait, there's more!

If we thought the search feature was neat already, it actually has even more functionality that we can use to our advantage. For example, by changing the "type" parameter to "post", we can find public posts that include the word we search for. We can use this to find out what people are saying about our target company, and we might be able to use this to our advantage.

In addition to this, a little-known feature of the API search is that we can find profiles using a particular email address or phone number. If we put the email address or phone number in the "q" parameter, we can see whether there is a Facebook profile that uses it - provided the owner of the profile allows themselves to be found using these attributes (a setting that is enabled by default, I believe).
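A lookup by email address or phone number just swaps the value of "q" (the address and number below are illustrative placeholders):

```
https://graph.facebook.com/search?q=jsmith%40example.com&type=user&access_token=ACCESS_TOKEN
https://graph.facebook.com/search?q=%2B18065551234&type=user&access_token=ACCESS_TOKEN
```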

There's a ton of other features offered by the Graph API which we can use to our advantage as social engineers. I would highly recommend reading through the documentation to see other features that might suit whatever need you have. Facebook also offers the ability to make Batch Requests, which essentially allow developers to make multiple API requests in one call to Facebook. An example of when this can be handy would be checking for matches of multiple email addresses to Facebook profiles.
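Per the Batch Requests documentation, a batch is a JSON array of requests POSTed to the Graph API root in a single round trip. A hedged sketch of building such a payload for the multiple-email use case (the execution step is commented out since it needs a real token):

```python
import json
import urllib.parse

def build_batch_payload(emails):
    # One relative search request per email address; Facebook executes
    # them all in a single call to https://graph.facebook.com.
    batch = [
        {
            "method": "GET",
            "relative_url": "search?" + urllib.parse.urlencode(
                {"q": email, "type": "user"}
            ),
        }
        for email in emails
    ]
    return {"batch": json.dumps(batch)}

# To execute (requires a valid token):
# import urllib.request
# payload = build_batch_payload(["a@example.com", "b@example.com"])
# payload["access_token"] = "YOUR_TOKEN"
# data = urllib.parse.urlencode(payload).encode()
# resp = urllib.request.urlopen("https://graph.facebook.com", data)
```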

As a side note, you may have noticed that these queries require an access token. The simplest approach is to create your own Facebook App, then use a user profile to generate an access token for that app, which can then be used to execute these queries.

Google Custom Search API

In 2010, Google deprecated its Web Search API, which had previously been the most efficient way for developers to programmatically access Google search results. Since then, Google has encouraged developers to migrate to the new Custom Search API. This new API allows developers to set up a Custom Search Engine (CSE) which searches for results from a specific set of domains, and then programmatically access these results in JSON or Atom format. While only being able to search a subset of domains may seem restrictive, with a little bit of effort we can create a CSE that includes all sites - emulating the previous Web Search API.

After setting up this CSE, we can use our Google-fu to easily pull results for things like Twitter users, LinkedIn users, documents from the company's website, etc. Let's take a look at a few examples.

LinkedIn

Using the CSE we created, we can craft queries which will help us quickly find profile information for LinkedIn users of a particular company. While these will be the public profiles of users, it is very common for privacy settings to be lax enough to let us see an individual's current position and company, prior work and educational experience, as well as any occupation- or education-related information they want potential employers to know about. This can amount to a large amount of information that is very useful to a social engineer.

This query searches LinkedIn for profiles of people who have past or present work (or, in this case, educational) experience at Texas Tech University. It can be adapted to fit any company we wish. Let's see what kind of results we get when performing this query on our CSE using the fantastic Python Requests module:
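A sketch of that request (the `key` and `cx` values are placeholders you receive when creating the engine, and the `site:linkedin.com` dork is my reconstruction of the query; I've used only the standard library here so the snippet runs without extra dependencies, though the Requests module makes it even shorter):

```python
import json
import urllib.parse
import urllib.request

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_cse_url(api_key, cx, query, start=1):
    # key/cx identify your API key and Custom Search Engine;
    # "start" pages through results ten at a time.
    params = urllib.parse.urlencode({
        "key": api_key,
        "cx": cx,
        "q": query,
        "start": start,
    })
    return CSE_ENDPOINT + "?" + params

def cse_search(api_key, cx, query, start=1):
    # Returns the "items" list from the JSON response (empty if no results).
    with urllib.request.urlopen(build_cse_url(api_key, cx, query, start)) as resp:
        return json.loads(resp.read().decode("utf-8")).get("items", [])

# Example (requires real key/cx values):
# items = cse_search("API_KEY", "CX_ID", 'site:linkedin.com "Texas Tech University"')
# for item in items:
#     print(item["title"], item["formattedUrl"])
```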

We can see that it's very straightforward to access and manipulate this data using the Requests module and our CSE. More importantly, we can see just how much data is provided about each LinkedIn profile. Let's take a look at the useful data.

We can see that the "person" attribute contains the "role" and "location" of the person. For parsing purposes, it is probably best to take only the "location" attribute from this key, since the "role" is also listed elsewhere. The "hcard" attribute is arguably the most useful source of simple data: it contains the name, title (the same as the previous "role" attribute), and picture URL for the user. In addition, it contains the full names of all affiliations or associations with which the user identifies. This could be extremely useful in social engineering if we wish to build rapport with the user ("Why yes, I'm a member of the 'Caribbean Jobs' group, too!"), or for making phishing emails much more targeted and effective.

Also, if we ever want more data that may not have been included in these results (such as specific job descriptions and projects worked on), the "formattedUrl" attribute provides us with a direct link to the person's public LinkedIn profile.

Let's see a quick example of how we can extract the useful information from this data. Let's aim to get the name, position, company, location, and other affiliations. We'll pick up right where we left off in the previous code example.
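A sketch of that extraction step, operating on one item from the CSE response. The field names mirror the pagemap structure described above, and the sample item is entirely hypothetical - Google's pagemap output varies from profile to profile, so treat the keys as illustrative:

```python
def extract_profile(item):
    # item: one result dict from the Custom Search response's "items" list.
    pagemap = item.get("pagemap", {})
    person = (pagemap.get("person") or [{}])[0]
    hcards = pagemap.get("hcard") or [{}]
    return {
        "name": hcards[0].get("fn"),
        "title": hcards[0].get("title"),
        "location": person.get("location"),
        "url": item.get("formattedUrl"),
        # Any additional hcard entries are the user's listed affiliations.
        "affiliations": [h.get("fn") for h in hcards[1:] if h.get("fn")],
    }

# Hypothetical sample item, shaped like a CSE result for a LinkedIn profile:
sample_item = {
    "formattedUrl": "www.linkedin.com/pub/jane-doe/0/1/2",
    "pagemap": {
        "person": [{"role": "Research Assistant", "location": "Lubbock, Texas"}],
        "hcard": [
            {"fn": "Jane Doe", "title": "Research Assistant"},
            {"fn": "Caribbean Jobs"},
        ],
    },
}

print(extract_profile(sample_item))
```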

It should be clear by now just how easy it is to manipulate this data. This is very passive reconnaissance: notice that we never browse to LinkedIn directly to gather this information. It should be noted that LinkedIn does have its own API, but it comes with a very strict ToS, and I can't think of much information it provides that is not already listed in the Custom Search API results.

This same automation with the Google Custom Search API can be extended to find files on company websites with a specific extension (such as .xls, .doc, etc.), and much, much more (perhaps there will be more coverage in a future post). For now, let's see how we can find Twitter profiles using this API, and then let's see what we can do with them.

Twitter (finding profiles using the Google Custom Search API)

Now let's take a look at how we can find Twitter profiles using the Google Custom Search API. Again, we will turn to our simple Google-fu skills to search for only profile pages. There isn't an easy way (that I know of) to find only profiles of people who work for a specific company; however, we can include the company name as another keyword, and Google will return profiles that are associated in some way with that name, which proves fairly successful. Here's the query that we will use:
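A likely shape for this query (my reconstruction - the `intitle:` trick works because Twitter profile pages are titled "Name (@handle) on Twitter"):

```
site:twitter.com intitle:"on Twitter" "Texas Tech University"
```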

As you can see, we are able to easily enumerate profiles related to Texas Tech University. Most importantly, this search provides us with the profile link (and also the Twitter handle). We can extract this information in the same way we extracted the LinkedIn information above. Now that we have acquired the profile links and other information, what else can we obtain about the profiles using Twitter's own API?

Twitter API

Twitter has recently made changes to its API that caused problems for quite a few third-party applications. However, we can still use this API to our advantage to find quite a bit of information about the profiles we enumerated using the Custom Search API.

As a quick note, Twitter recently "upgraded" their API to version 1.1. This version of the API no longer allows anonymous queries, so we will need to create an application to use with OAuth (much like we did with Facebook). In addition to this, new query limits have been placed on particular API calls.

Our main source of information will be found in the documentation regarding API calls for user information. Let's briefly take a look at the useful API functions that will allow us to gather the information we want.

users/lookup
This function allows us to retrieve the "extended information" for up to 100 users in one call. This information includes the following (and more):

Twitter handle

Name

Profile display information

Profile Description

Links to profile image, profile, etc.

Whether or not they have Geolocation enabled on Tweets


With the ability to specify a substantial number of users in one API call, we can quickly get the extended information for our enumerated user profiles. A typical API call would look like the following:
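In API 1.1 the endpoint is `https://api.twitter.com/1.1/users/lookup.json`, and every request must be OAuth-signed, so the helper below only builds the URLs - it chunks a handle list into groups of 100, the documented per-call maximum:

```python
def lookup_urls(handles, chunk=100):
    # users/lookup accepts at most 100 comma-separated screen names per call,
    # so split the handle list into appropriately sized requests.
    base = "https://api.twitter.com/1.1/users/lookup.json?screen_name="
    return [base + ",".join(handles[i:i + chunk])
            for i in range(0, len(handles), chunk)]

# Example: 250 enumerated handles become 3 OAuth-signed requests.
# for url in lookup_urls(enumerated_handles):
#     ... sign and send with your OAuth library of choice ...
```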

We can also use this API to get another critical piece of information: users following our enumerated profile, and who the profile is following. These API calls return the "user objects" (similar to the output of users/lookup) about each of the friends or followers. This information can be a critical asset when preparing for a social engineering engagement. Typical API calls to these functions will look like the following:
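In API 1.1, the cursored endpoints that return full user objects for followers and friends are (again, OAuth-signed; `SomeHandle` is a placeholder):

```
https://api.twitter.com/1.1/followers/list.json?screen_name=SomeHandle
https://api.twitter.com/1.1/friends/list.json?screen_name=SomeHandle
```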

Google+ API

As another resource, Google+ offers an API for developers which allows us to enumerate information about potential targets. As before, we can use the Google Custom Search API with the following query to find users working for a specific company:
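Following the pattern used for LinkedIn and Twitter above, the query would be along these lines:

```
site:plus.google.com "Texas Tech University"
```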

After finding the profile for users, we can easily extract their user ID since it will be part of the profile URL. We can use the ID in a GET request to obtain the "people resource" for the profile using the "People:get" API function.
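A sketch of both steps - pulling the numeric ID out of a profile URL and building the People:get request (the API key is a placeholder, and the sample profile URL is illustrative):

```python
import re
import urllib.parse

def extract_user_id(profile_url):
    # Profile URLs look like https://plus.google.com/<numeric id>/...
    match = re.search(r"plus\.google\.com/(\d+)", profile_url)
    return match.group(1) if match else None

def people_get_url(user_id, api_key):
    # People:get is a simple keyed GET against the plus/v1 endpoint.
    return ("https://www.googleapis.com/plus/v1/people/%s?%s"
            % (user_id, urllib.parse.urlencode({"key": api_key})))

# Example:
# uid = extract_user_id("https://plus.google.com/112233445566778899000/about")
# print(people_get_url(uid, "API_KEY"))
```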

Granted, this is the result from the main Texas Tech page. If we were looking for a standard person's profile, we could also obtain education and work history, more description information, and potentially emails.

Unfortunately, Google does not offer an official API call to retrieve the circles information for a particular profile. However, with a little bit of reverse engineering, it is fairly simple to create our own that works just fine. I may leave this for another post, since it is a bit of an involved process.

With this summary of some basic APIs concluded, let's briefly discuss some other automated tools and techniques for information enumeration.

Other Automated Resources

There are many other tools that can help us in our OSINT gathering process. Let's discuss a couple of them now:

Jigsaw.rb - The tool jigsaw.rb is included by default in BackTrack. It is a Ruby script which scrapes the contact website Jigsaw for contact details and generates email addresses on the fly. It's a very handy script, and I am planning on posting a quick how-to guide for it in the next couple of days (I'll update this post when it's published).

Maltego - One of the most useful and widely used tools in the industry is Maltego, the free community version of which is included by default in BackTrack. This tool automates OSINT gathering using "transforms". The data is then presented and manipulated through an intuitive graphical interface built around a force-directed graph.

Spokeo - With the tagline "Not your grandma's phone book", Spokeo is a search engine for social information. By just entering a name, email address, username, phone number, etc., one can find information across a variety of social networking platforms and other sources.

Username Enumeration
Once we have a username (such as a Twitter username), how would we go about finding other sites this username is registered to? This kind of information is very useful in determining other interests or profiles for a given target. There are quite a few sites that do this for us, but here are my two favorites:

namechk.com - Quick and easy, namechk provides an easy interface that searches over 150 popular sites for occurrences of the given username.

checkusernames.com - Very similar to namechk, checkusernames.com provides an easy interface that checks a substantial number of sites (160) to see if a given username is registered.

But checking usernames manually is no fun. With a little reverse engineering, I've created a simple script which automatically queries the checkusernames.com interface for occurrences of a username. Here it is:
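Since the reverse-engineered checkusernames.com interface isn't reproduced here, the sketch below takes the simpler route of probing a few profile URLs directly: a 200 response suggests the username is taken, a 404 suggests it is available. The site list is a small illustrative sample, not the 160 sites the real service checks:

```python
import urllib.error
import urllib.request

# Illustrative site list; profile-URL patterns are assumptions.
SITES = {
    "Twitter": "https://twitter.com/{0}",
    "GitHub": "https://github.com/{0}",
    "Reddit": "https://www.reddit.com/user/{0}",
}

def profile_urls(username):
    # Map each site name to the candidate profile URL for this username.
    return {site: url.format(username) for site, url in SITES.items()}

def check_username(username):
    # Probe each candidate URL; True means the username appears taken.
    results = {}
    for site, url in profile_urls(username).items():
        try:
            code = urllib.request.urlopen(url, timeout=10).getcode()
        except urllib.error.HTTPError as err:
            code = err.code
        except urllib.error.URLError:
            code = None          # network trouble: inconclusive
        results[site] = (code == 200)
    return results

# Example:
# print(check_username("some_handle"))
```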

It is important to note that there are countless other (more manual) resources that can provide information for personnel, and we haven't even started covering APIs for finding system and network entity information. However, just as a quick recap, let's review the information we gathered using the resources above:
