Facebook Scraped Data Issue Surfaces in Vietnam

The Safety Detectives security research team, led by Anurag Sen, has uncovered a significant leak of Facebook data. As much as 3 gigabytes of scraped Facebook user data was found on an exposed Elastic server, raising further concerns about the company’s security measures.

This follows not only the Cambridge Analytica scandal of March 2018, but also a previous scrape of Facebook users’ data in January 2020 by hackers purportedly based in Vietnam. The data our researchers found comes on top of what had already been uncovered, adding another 12 million records to the total. Many, but not all, of the entries included full personally identifiable information (PII) drawn from multiple sources, Facebook included. We still do not know who is ultimately responsible for this scrape or how they were able to carry out such an extensive and invasive operation.

Since we discovered the leak, the server has been taken offline.

What is Data Scraping?

Data scraping is a means of extracting information from a website, usually in an automated way. It’s a fairly common practice, with several vendors providing tools that allow anyone to scrape data.

Most data scraping is completely innocuous and is carried out by web developers, business intelligence analysts, market researchers, and honest businesses such as travel booking sites. It is only when the practice is weaponized with the specific goal of extracting personal information that negative consequences can occur.

Social media companies such as Facebook allow users to access third-party websites using their existing Facebook login information. However, if security protocols are not properly implemented, this can allow hackers using so-called ‘scraper bots’ to extract private information.

Why is Data Scraping dangerous?

When private information – such as login details, addresses, and birth information – is extracted, it can allow unauthorized users to commit heinous acts, including identity theft and financial fraud.

Because scraping is often automated, with bots carrying out all of the data extraction, millions of innocent users can have their information leaked within a short period of time. It is worth noting that Facebook currently has around 2.2 billion monthly users.

Data can then be sold or provided to other malicious parties, thereby making the potential ramifications of data scraping wide-ranging and severe.

What has been leaked?

The Elastic server exposed data related to 12 million Facebook users, amounting to as much as 3 gigabytes.

Number of Records Leaked: 12 million Facebook users

Size: 3 gigabytes

Location: Vietnam

It is important to note that much of the scraped and leaked information is not meant to be publicly visible, especially without the knowledge and approval of the user.

Data included in the breach:

Full name

Hometown location

Current location

Education details

Family relations with other Facebook users

Birthdates

GPS coordinates

Email addresses

Facebook usernames and IDs

Profile scores

PII, including birthdate, city of residence, gender, email address, and the individual’s unique Facebook ID

Family relationships and connections – along with each member’s Facebook ID.

Education information about each user, including unique Facebook user IDs and school IDs.

Facebook concerns

This latest breach in Vietnam is particularly sensitive because of Facebook’s recent history with data scraping. The beginning of the year saw an earlier leak of Vietnamese Facebook users’ data, and this discovery shows that the threat to that data extends even further than previously believed. In March 2018, it emerged that a political consulting firm called Cambridge Analytica had been able to ‘harvest’ – or scrape – personal data related to roughly 50 million Facebook users. The number of people affected was later revised up to 87 million, with Facebook declaring that resolving the vulnerabilities would be a “multi-year process”.

The incident made headlines around the world because of its political connections and suspected impact on US elections. In response, Facebook locked down some of its API functions in order to make data scraping more difficult to carry out. The US social media giant also blocked users from using its reverse search tool, a means of using snippets of data to identify users and gather even larger data sets.

It was this feature that allowed malicious actors to scrape Facebook user data, and the social media company stated that the issue had been resolved.

Clearly, there are still data-scraping vulnerabilities that can be exploited, especially where the security protocols implemented by third-party websites do not match those implemented by Facebook.

Preventing Data Exposure

How can you prevent your personal information from being exposed in a data leak and ensure that you’re not a victim of attacks – both online and offline – if it is leaked?

Be cautious of what information you give out and to whom

Check that the website you’re on is secure (look for an https designation and/or a closed lock in the address bar)

Only give out what you feel confident cannot be used against you (avoid government ID information and sensitive personal information that can compromise you, if made public)

Create secure passwords by combining letters, numbers and symbols – a password manager such as Dashlane can help with this

Do not click links in emails unless you are sure that the sender is legitimate and trustworthy

Double-check the privacy settings on all of your social media accounts (even ones you no longer use) to ensure that your posts and personal details are visible only to people you trust

Avoid using credit card information and typing out passwords over unsecured WiFi networks

Conduct further research into what constitutes ‘cybercrime’ and stay up to date on the latest hacks and online threats, such as phishing attacks and ransomware.
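To illustrate the password tip above, here is a minimal Python sketch of how a random password combining letters, numbers and symbols could be generated. It is not how any particular password manager works; the function name generate_password and the default length of 16 are assumptions chosen for this example.

```python
import secrets
import string


# Hypothetical helper for illustration only; the name and the default
# length are assumptions, not part of any password manager.
def generate_password(length: int = 16) -> str:
    """Return a random password with lowercase, uppercase, digits and symbols."""
    if length < 4:
        # Four character classes are required, so four characters is the minimum.
        raise ValueError("length must be at least 4")

    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        # secrets.choice draws from a cryptographically secure random source.
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until every character class appears at least once.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate


if __name__ == "__main__":
    print(generate_password())
```

The sketch uses Python's secrets module rather than random, because secrets is intended for security-sensitive values such as passwords. In practice, a reputable password manager will also store each password and let you use a different one for every account.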

About Us

SafetyDetectives.com is the world’s largest antivirus review website. The Safety Detectives research lab is a pro bono service that aims to help the online community defend itself against cyber threats, while educating organizations on protecting their users’ data.