zvelo BLOG

Articles, news, advice, research and open discussions from zveloLABS® – a team of software engineers, web analysts and other professionals dedicated to the development and enhancement of zvelo’s content and context categorization databases and technologies, and malicious website detection capabilities.

The URL checker found on the zvelo.com homepage, previously known as the “Test-a-site” tool, demonstrates various contextual data sets about URLs that are available by licensing zvelo’s contextual categorization and malicious website detection services. When queried, the URL checker displays, through an intuitive GUI, a sample of the data sets stored within zvelo’s URL database, which technology vendors in the ad tech, data analytics, network security, web filtering and other markets can integrate into their own offerings via a cloud API or a local SDK.

The Internet age has shown us a myriad of online scams, from get rich quick schemes to winning the lottery, typically originating via an email hook. This is a blind way of distributing scams, since scammers have no way of knowing if the scam is relevant to the person they are trying to lure. For example, a scammer might send a Lotto scam in a country where there is no Lotto. This has changed; scammers now have access to online advertising networks, which have proven to be very powerful targeting tools because of the vast amount of demographic and web usage behavioral information they possess about real people.

If we have a thousand monkeys typing away on a thousand typewriters, surely they can produce great works of literature – or so goes the popular adaptation of the Infinite Monkey Theorem. In the context of information security, a similar idea has been taking shape in the past few years. Crowdsourced security, which leverages input from a host of geographically dispersed systems, is slowly gaining ground as a means of providing actionable threat intelligence for both the public and private sectors.

Advertisements are everywhere, from print publications to roadside billboards, and of course on TV and the Web. The intent of advertising is the same regardless of the medium: advertisers constantly compete to win over consumer sentiment. On the Internet, ad-serving technologies have become so advanced that ads can now be targeted based on one’s individual web browsing history and behaviors, likes, shares, location, device type and other factors. From time to time, however, ad placements land severely out of context, and here is one such example of online advertising gone bad.

Ad fraud continues to plague the online advertising industry and advertiser trust in automated ad-serving technologies continues to dwindle. It’s not just traditional display advertising that’s susceptible. Digital video and mobile advertising are seeing their fair share of bot (non-human) generated impressions and clicks as well. zvelo has recently become an Associate Member of the Interactive Advertising Bureau (IAB) to help mold industry best practices to combat ad fraud.

Our willingness to surrender personal privacy in exchange for services that we now consider essential, as discussed in a previous article, has made it much easier for large governments and private individuals alike to collect information.

We are constantly reminded of the growing number of privacy concerns arising from the use of Information and Communications Technology (ICT). Some are quick to blame governments or commercial entities when our personal information is compromised. Very few stop to think whether some of the blame should be directed at ourselves. To what extent are we, as end-users, responsible for safeguarding our own personal privacy?

Searching for “use www or not” returns well over a billion results in many of the most popular search engines, and the focus of each result may differ. For zvelo, the choice is irrelevant because its contextual categorization processes are designed to identify and handle each component of a URL. At the simplest level, the basic components of a URL are the following:

Cybercrime against high-profile entities like eBay and Target is on the rise, and the media has conjured up nightmarish scenarios of cyber-criminals going on shopping sprees with our hard-earned cash – easily obtained through stolen credit card information. The risks that the general public faces vary, however, and should not all be weighed equally.

zvelo once offered 53 categories that were used to classify content on websites about Businesses & Services, Politics & Law, Portal Sites and others. This was later raised to 141 categories to cover even more topics. The latest version offers nearly 500 categories, making it one of the most granular categorization sets in the industry. We’ve upgraded our categorization systems to better serve the needs of our existing and future technology partners, and the following is one example of why this matters.

The importance of the Alexa top websites in zvelo’s day-to-day operations can never be discounted. Providing contextual data sets about the Alexa top sites is vital for the online advertising market because it can help determine the most suitable and brand-safe placement of online ads and other promotional materials.

Recent events serve as the best example of how the context of security has shifted from a once server-centric model to a decentralized threat landscape. From the Heartbleed attacks to the widespread Internet Explorer vulnerabilities and finally the sensationalized OAuth issues, it appears that even organizations with a hardened perimeter infrastructure are just as vulnerable as an end-user at home. Although threats aimed at enterprise infrastructure are by no means going away, the prevalence of vulnerabilities affecting end-users is alarming, to say the least.

Given the dynamic nature of most of today’s websites, categorization at the full-path URL level, rather than at the base domain, is superior and now required. Parts of a website include the top-level domain (.com, .org, etc.), the base domain (example.com), the sub-domain (subdomain.example.com) and the sub-path (example.com/page). When categorizing content, it is critical to recognize exactly what is being classified within a website, because content can differ dramatically across full-path URLs.
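To illustrate why the distinction matters, here is a minimal Python sketch that splits a URL into the parts named above and then looks up a category starting from the most specific full-path URL and falling back to the base domain. It assumes a naive rule (the TLD is the last host label and the base domain is the last two labels), which breaks on multi-label suffixes such as .co.uk, and the `categories` table is a hypothetical stand-in, not zvelo’s actual database or method.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def url_parts(url: str) -> dict[str, str]:
    """Split a URL into TLD, base domain, sub-domain, host and path.

    Naive assumption: the TLD is the last host label and the base domain is
    the last two labels. Suffixes like .co.uk need the Public Suffix List.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    return {
        "tld": labels[-1],
        "base_domain": ".".join(labels[-2:]),
        "subdomain": ".".join(labels[:-2]),
        "host": host,
        "path": parts.path or "/",
    }


def lookup_keys(url: str) -> list[str]:
    """Candidate keys, most specific (full path) first, base domain last."""
    p = url_parts(url)
    segments = [s for s in p["path"].split("/") if s]
    # Full path first, then each parent sub-path on the same host.
    keys = [p["host"] + "/" + "/".join(segments[:i]) for i in range(len(segments), 0, -1)]
    keys.append(p["host"])
    if p["host"] != p["base_domain"]:
        keys.append(p["base_domain"])  # fall back from sub-domain to base domain
    return keys


def categorize(url: str, categories: dict[str, str]) -> str | None:
    """Return the category of the most specific key found in a (hypothetical) table."""
    for key in lookup_keys(url):
        if key in categories:
            return categories[key]
    return None
```

With a table that maps example.com to one category and news.example.com/sports to another, a page under /sports on the news sub-domain receives the more specific category, while every other page on the site falls back to the base domain’s category.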

What is a URL parameter? Quite simply, it is a string of characters containing data, known as a query string, that is appended to a URL. This data is passed to predefined web applications, which find the appropriate content and return it to the user’s web browser, which then renders the entire web page. The query string can also serve other purposes, such as identifying a user’s session or looking up information about your online bank account after you have logged in. URLs with parameters are used by many types of websites; however, online shopping, auction and banking sites are probably the most prevalent.
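A minimal Python sketch using the standard library’s urllib.parse shows how a query string breaks down into name/value pairs, and how a parameter that only identifies a user’s session might be removed so that the remaining URL describes the content being requested. The example URL and the `sessionid` parameter name are hypothetical; which parameters can safely be dropped varies from site to site.

```python
from __future__ import annotations

from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def query_params(url: str) -> dict[str, list[str]]:
    """Return the query string as a mapping of parameter name -> list of values."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def drop_params(url: str, names: set[str]) -> str:
    """Rebuild the URL without the named parameters, keeping the others in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in names
    ]
    # urlencode re-encodes spaces as "+", matching the usual query-string form.
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "https://shop.example.com/search?q=red+shoes&page=2&sessionid=abc123"
print(query_params(url))
print(drop_params(url, {"sessionid"}))
```

Each parameter maps to a list because the same name may legitimately appear more than once in a query string.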

Manually classifying the content on a single web page takes but a few seconds to accomplish. Analyzing the keywords – words or phrases – used and the number of instances of each – keyword density – is one way to go about it. When needing to classify the content on billions of web pages at a time, however, the task becomes overwhelmingly daunting for any human eye to handle. In this scenario, only an automated content classification engine can succeed.
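As a rough illustration of the keyword-density idea, the following Python sketch computes the share of a page’s words taken up by each keyword and then picks the category whose keyword list scores highest. The stop-word list, category names and keyword lists are hypothetical, and this toy scoring is one simple technique, not a description of zvelo’s classification engine.

```python
from __future__ import annotations

import re
from collections import Counter

# A tiny, illustrative stop-word list; real systems use far larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def keyword_density(text: str) -> dict[str, float]:
    """Share of the page's (non-stop) words taken up by each keyword."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS]
    if not words:
        return {}
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


def best_category(text: str, category_keywords: dict[str, list[str]]) -> str | None:
    """Pick the category whose keywords account for the highest total density."""
    density = keyword_density(text)
    scores = {
        category: sum(density.get(k, 0.0) for k in keywords)
        for category, keywords in category_keywords.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # No matching keywords at all means no confident category.
    return best if scores[best] > 0 else None
```

Run against a short recipe snippet with a “Food & Recipes” list containing words such as recipe, pasta and basil, the food category wins; the same loop over billions of pages is where automation becomes unavoidable.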

Prior to this blog post, zveloLABS published a phishing URL alert about fake Apple account verification websites. Now, zvelo’s team of engineers and researchers has unearthed a new phishing attack campaign using fraudulent Facebook log-in sites.

Large-scale compromises of both private industry and public institutions in 2013 prompted a flurry of activity among security researchers to identify emerging and established threats. Commonly identified as Advanced Persistent Threats (APTs), this phenomenon is expected to continue well into the foreseeable future. Fundamental to the spread of these threats is one of their foremost methods of propagation – the watering hole attack.

zvelo has received many requests from its technology partners who are in the web filtering and parental control sectors to institute and support a new category that can be used to identify websites that promote self-harm behaviors. As a result of such demand, a new “Self Harm” category has been added to the zveloDB® URL database.

How does zvelo provide the most accurate content categorization service and the best URL database available? The approach is twofold: while a substantial share of the workload is handled by zvelo’s line-up of machine learning and artificial intelligence-based categorization processes and systems, the quality assurance and other daily efforts of its human Web Analysts can never be discounted.

This article will be updated periodically in support of the numerous global online safety awareness campaigns occurring every year – Safer Internet Day (February), Cyber Security Awareness Month (October), IWF Awareness Day (also in October) and others. During these times, web safety advocates, companies, organizations and professionals worldwide raise awareness about safer and more responsible use of online technologies and mobile devices. The following is a living repository of online resources, guides, tips and entities aimed at helping everyone enjoy worry-free Internet experiences. Additional web safety resources will be hand-picked and added as they are discovered. To be considered for this list, or to suggest other online safety resources that deserve mention, please feel free to comment below. Including a link and a brief description with each comment helps.

Static HTML websites are becoming increasingly rare, and today’s sites pack quite a punch. We’ve grown accustomed to photo and video slideshows, widgets, feeds, social network integrations and other dynamic elements. Websites come loaded with media, are more interactive, and their content can vary dramatically from page to page, and even more between end-users or browsing sessions. Much of the content is pulled in dynamically from external sources, and most of us fuel the Internet’s growth by creating and uploading content of our own every day, at extremely high rates. Making sense of it all can be quite a challenge for technology vendors “needing to know,” and the following are insights into zvelo’s content categorization approach.

Reports of non-human bots gaming the online advertising industry by delivering fraudulent impressions and click traffic are plentiful, and the Interactive Advertising Bureau (IAB) took note. On December 5, 2013, the IAB released “Traffic Fraud: Best Practices for Reducing Risk to Exposure” to help online media buyers, publishers and ad networks mitigate the problem.

People don’t seem to worry much about privacy when “checking in” to a favorite local restaurant or coffee shop, or when making other social media posts that reveal their location. What if you were approached by a complete stranger who knew your name and other personally identifiable information within minutes of your making an upload? A few socialites got quite a shock after a social media experiment revealed how much personal information can be extracted from publicly viewable status updates.

In mid-2013, British Prime Minister David Cameron began a push to block pornographic material on the Web in UK households. Under the new legislation, porn would be filtered by default and citizens would have to opt in to view such adult content. Enforcing such an ambitious initiative comes with many content categorization and technical challenges, not just in the UK, but within any Internet service provider infrastructure.

zveloLABS once again attended the 2013 Hack In The Box (HITB) conference in Kuala Lumpur, Malaysia, held in mid-October. Of the wide variety of talks presented during the conference, I found two related to the vulnerabilities of RFID systems to be the most intriguing. I’ve summarized them below.

Wi-Fi hotspots commonly found in American coffee shops, restaurants and other popular after-school hangouts are providing kids with what they demand – free Internet access. This may help keep them connected with family or friends, in addition to sparing parents from costly data plan overages, but an independent Adaptive Mobile study showed that the complimentary Web access comes with a twist. The adult, dating, extremist, drug, gambling and other similarly objectionable content typically blocked at home by some type of parental controls solution is easily accessible to kids at these Wi-Fi locations.

Once again, zveloLABS participated in the 2013 ROOTCON annual hacker conference and security gathering in Cebu City, Philippines. It aims to share best practices and technologies through talks by qualified speakers and demos of exciting hacks, tools, tips, and more. The event was attended by groups and individuals who share similar interests in information security. Following is a summary of a few of the topics presented.

Ad blocking has gained wide consumer acceptance over the past couple of years, and a PageFair report suggests it could be costing web-based businesses hundreds of thousands of dollars in lost advertising revenue. In some instances, ad blocking has hurt certain websites so much that they are no longer online. With the use of ad blocking software on the rise, the ad-tech market has a significant need to make the most of the ad placements that do make the cut. In other words, it’s more important than ever for ad units to be in context with the content on web pages, no matter how deep within a website the placements land.

I attended one of the Black Hat training sessions titled “Advanced C++ Source Code Analysis.” It was quite fascinating! Looking through source code for bugs seems to be a different mindset from writing software. While reading the buggy code I often found myself thinking, “Yes, that should work,” and then realized that what looked fine was actually horribly dangerous.

The annual DEF CON® hacker conference came and went as swiftly as a light rain against the hot Las Vegas strip. Consumer tech was a big focus and speakers demonstrated how various network-connected gadgets, once hacked, could be controlled to affect the real, physical world. Here are some highlights from two particular lectures about the hacking of network-connected and radio-frequency identification (RFID) enabled devices that got much attention.

The Anti-Phishing Working Group (APWG) released its quarterly Phishing Attack Trends Report for the first quarter of 2013. Payment Services were reported as the most phished industry sector, followed by Financial Services. Considering the goal of the cyber-criminals behind such scams – typically obtaining usernames, passwords and credit card information for monetary gain – these targets certainly make sense. While the total number of reported phishing website detections is seemingly on the decline, as illustrated in the trend line below, actual attacks may tell a different story.

With the growing number of alleged cyber-attacks taking place between the United States and the People’s Republic of China, the talks in early June 2013 between President Barack Obama and President Xi Jinping were viewed as a much-needed response to the crisis. Unfortunately, such steps may end in half-hearted agreements or collapse entirely under their own weight. Depressing as this outlook may be, the pessimism is rooted in the fact that cyber space, as a medium through which to expand national policy, is too valuable for either party to pass up. Central to this idea is the fact that both countries have invested heavily in cyber space, not only as a means of communication but also for economic growth.

I got my hands on a copy of a Northwestern University research paper titled “Evaluating Android Anti-malware against Transformation Attacks.” After digging into it, my zveloLABS colleagues and I decided to conduct an experiment of our own based on the information provided in the research paper.

The Internet Watch Foundation has celebrated a major milestone. It has taken action against its 100,000th URL containing inappropriate child sexual abuse content. In addition, the IWF reports it has aided the rescue of at least 12 children in the past two years. The body’s 2012 accomplishments deserve some praise.

In early 2013, zvelo deployed a new approach to detecting spam web pages. These web pages have little value and consist mostly of meaningless content and links that are sometimes objectionable in nature; worse yet, they can be used to host and spread malware. Spam web pages continue to sprout online, and the following are some interesting trends, mapped out by zveloLABS, in the types of web content spammers are targeting.

The Dow Jones Industrial Average recently dropped by about 145 points and the S&P 500 index lost $136.5 billion in value after a tweet from the Associated Press claimed that an explosion had taken place at the White House and that President Obama was injured. The tweet turned out to be false and originated from a hacked Associated Press Twitter account. The precedent has been set for us to take a long, hard and uncomfortable look at the challenges we face when relying on automated trading systems that gauge and react to public sentiment, sometimes with drastic results.

Consumers will soon know exactly how much of their personal information is being collected online, by whom, and may one day be able to correct errors or opt-out entirely from such activity. The name of the game is “privacy” and thanks to a combination of recent investigative reporting and pressure from advocacy groups, regulatory entities and politicians, the urgency to reach this point is now mainstream news.

Two notable botnets have cost online advertisers millions of dollars in advertising click fraud in recent weeks. The first botnet, Bamital, was taken down by Microsoft and Symantec in February. A second botnet was later identified and dubbed Chameleon by Spider.io, a security company that specializes in analyzing web traffic. Since zvelo is also in the business of analyzing and categorizing web content viewed by actual users, this story resonated strongly with zveloLABS.