Recent works by Dr. Andrew G. West
https://works.bepress.com/andrew_g_west/

Analyzing the Keystroke Dynamics of Web Identifiers
https://works.bepress.com/andrew_g_west/39/
Andrew G. West. 01 Jun 2017. Work at Verisign Labs.

Web identifiers such as usernames, hashtags, and domain names serve important roles in online navigation, communication, and community building. Therefore, the entities that choose such names must ensure that end-users are able to quickly and accurately enter them in applications. Uniqueness requirements, a desire for short strings, and an absence of delimiters often constrain this name selection process.

To gain perspective on the speed and correctness of name entry, we crowdsource the typing of 51,000+ web identifiers. Surface-level analysis reveals, for example, that typing speed is generally a linear function of identifier length. Examining keystroke dynamics at finer granularity proves more interesting. First, we identify features predictive of typing time/accuracy, finding (1) the commonality of character bi-grams inside a name and (2) the degree of ambiguity when tokenizing a name to be most indicative. A machine-learning model built over 10 such features exhibits moderate predictive capability. Second, we evaluate our hypothesis that users subconsciously insert pauses in their typing cadence where text delimiters (e.g., spaces) would exist, if permitted. The data generally supports this claim, suggesting its application alongside algorithmic tokenization methods, and possibly in name suggestion frameworks.
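The abstract names bi-gram commonality as one of the two most indicative features. As a rough illustration of how such a feature could be computed, the sketch below scores a name's character bigrams against a reference corpus; the corpus, scoring, and averaging scheme are illustrative assumptions, not the paper's exact formulation.

```python
from collections import Counter

def bigram_commonality(name: str, corpus: list[str]) -> float:
    """Score how common `name`'s character bigrams are in a reference
    corpus; higher scores suggest easier (faster) typing. The averaging
    scheme here is an assumption, not the paper's exact feature."""
    counts = Counter()
    for word in corpus:
        w = word.lower()
        counts.update(w[i:i + 2] for i in range(len(w) - 1))
    total = sum(counts.values()) or 1
    bigrams = [name.lower()[i:i + 2] for i in range(len(name) - 1)]
    if not bigrams:
        return 0.0
    return sum(counts[b] / total for b in bigrams) / len(bigrams)

# Hypothetical usage: "theater" is built from common English bigrams,
# while "xqzjv" is not, so the former should score higher.
corpus = ["the", "heat", "eat", "ate", "tear", "hat"]
print(bigram_commonality("theater", corpus) > bigram_commonality("xqzjv", corpus))
```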
Chatter: Classifying Malware Families Using System Event Ordering
https://works.bepress.com/andrew_g_west/34/
Aziz Mohaisen et al. 01 Oct 2014. Work at Verisign Labs.

Using runtime execution artifacts to identify malware and its associated "family" is an established technique in the security domain. Many papers in the literature rely on explicit features derived from network, file system, or registry interaction. While effective, the use of these fine-granularity data points makes such techniques computationally expensive. Moreover, the signatures and heuristics this analysis produces are often circumvented by subsequent malware authors.

To this end we propose CHATTER, a system concerned only with the order in which high-level system events take place. Individual events are mapped onto an alphabet, and execution traces are captured via terse concatenations of those letters. Then, leveraging an analyst-labeled corpus of malware, n-gram document classification techniques are applied to produce a classifier predicting malware family. This paper describes that technique and its proof-of-concept evaluation. In its prototype form, only network events are considered and three malware families are highlighted. We show the technique achieves roughly 80% accuracy in isolation and yields non-trivial performance improvements when integrated with a baseline classifier built on non-ordered features (reaching roughly 95% accuracy).
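For illustration, here is a minimal sketch of the pipeline described above: events are mapped onto an alphabet, traces become short strings, and character n-grams feed a document classifier. The event names, alphabet, n-gram range, and model choice are assumptions for demonstration, not the paper's configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

EVENT_ALPHABET = {"dns_query": "a", "tcp_connect": "b",
                  "http_get": "c", "tls_handshake": "d"}  # assumed mapping

def trace_to_string(events):
    """Concatenate per-event letters into a terse execution trace."""
    return "".join(EVENT_ALPHABET[e] for e in events)

traces = [trace_to_string(t) for t in [
    ["dns_query", "tcp_connect", "http_get"],         # toy "family A" run
    ["dns_query", "tcp_connect", "http_get"],
    ["tcp_connect", "tls_handshake", "tcp_connect"],  # toy "family B" run
    ["tcp_connect", "tls_handshake", "tcp_connect"],
]]
labels = ["A", "A", "B", "B"]

# Character n-grams capture event *order*, the signal CHATTER relies on.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 3)),
    MultinomialNB(),
)
model.fit(traces, labels)
print(model.predict([trace_to_string(["dns_query", "tcp_connect", "http_get"])]))
```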
ADAM: Automated Detection and Attribution of Malicious Webpages
https://works.bepress.com/andrew_g_west/37/
Ahmed E. Kosba et al. 01 Aug 2014. Work at Verisign Labs.

Malicious webpages are a prevalent and severe threat in the Internet security landscape, a fact that has motivated numerous static and dynamic mitigation techniques. Building on this literature, this work introduces the design and evaluation of ADAM, a system that applies machine learning over network metadata derived from the sandboxed execution of webpage content. ADAM aims to detect malicious webpages and identify the nature of the underlying vulnerabilities using a simple set of features. Machine-trained models are not novel in this problem space; rather, the dynamic network artifacts (and their feature representations) collected during rendering are the greatest contribution of this work. Using a real-world operational dataset that includes different forms of malicious behavior, our results show that dynamic, low-cost network artifacts can be used effectively to detect most vulnerabilities, achieving accuracy of up to 96%. The system is also able to identify the precise vulnerability type in 91% of cases. Further, this work highlights those cases that frequently evade detection, suggesting areas of future emphasis, alongside the desire to extend this work to practical contexts.
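A toy sketch of the classification step might look as follows; the four per-page features and the random-forest model are hypothetical stand-ins, since the paper's contribution is the network-artifact feature set itself rather than any particular model.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-page features: (# requests, # distinct domains,
# # redirects, bytes transferred) gathered during sandboxed rendering.
X = [
    [12,  3, 0,  54000],   # benign-looking rendering
    [11,  4, 1,  61000],
    [87, 41, 9, 410000],   # noisy, redirect-heavy rendering
    [90, 38, 7, 390000],
]
y = ["benign", "benign", "malicious", "malicious"]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[85, 40, 8, 400000]]))  # -> ['malicious']
```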
On the Privacy Concerns of URL Query Strings
https://works.bepress.com/andrew_g_west/33/
Andrew G. West et al. 01 May 2014. Work at Verisign Labs.

URLs often utilize query strings (i.e., key-value pairs appended to the URL path) as a means to pass session parameters and form data. Often these arguments are not privacy sensitive but are necessary to render the web page. However, query strings may also contain tracking mechanisms, usernames, email addresses, and other information that users may not wish to reveal. In isolation such URLs are not particularly problematic, but the growth of Web 2.0 platforms such as social networks and micro-blogging means URLs (often copy-pasted from web browsers) are increasingly being publicly broadcast.
This position paper argues that the threat posed by such privacy disclosures is significant and prevalent. It demonstrates this by analyzing 892 million user-submitted URLs, many disseminated in (semi-)public forums. Within this corpus our case study identifies troves of personal data, including 1.7 million email addresses. In the most egregious examples, the query string contains plaintext usernames and passwords for administrative and extremely sensitive accounts. With this as motivation, the authors propose a privacy-aware service named "CleanURL". CleanURL's goal is to transform addresses by stripping non-essential key-value pairs and/or notifying users when sensitive data is critical to proper page rendering. This logic is based on difference algorithms, mining of URL corpora, and human feedback loops. Though realized as a link shortener in its prototype implementation, CleanURL could be leveraged on any platform to scan URLs before they are published or to retroactively sanitize existing links.
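A minimal sketch of the stripping step, assuming a static blocklist of sensitive or non-essential keys; CleanURL's fuller design also uses page-difference algorithms, corpus mining, and human feedback loops, all omitted here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

SENSITIVE_KEYS = {"email", "user", "username", "password",
                  "utm_source", "utm_medium", "utm_campaign"}  # assumed list

def clean_url(url: str) -> str:
    """Drop query-string pairs whose keys look sensitive/non-essential.
    CleanURL additionally verifies via page diffs that the stripped URL
    still renders correctly; that feedback loop is omitted here."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SENSITIVE_KEYS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(clean_url("https://example.com/page?id=42&email=a@b.com&utm_source=feed"))
# -> https://example.com/page?id=42
```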
Damage Detection and Mitigation in Open Collaboration Applications
https://works.bepress.com/andrew_g_west/29/
Andrew G. West. 01 May 2013. Collaborative security and broader perspectives.

Collaborative functionality is changing the way information is amassed, refined, and disseminated in online environments. A subclass of these systems characterized by "open collaboration" uniquely allows participants to modify content with low barriers to entry. A prominent example and our case study, English Wikipedia, exemplifies the vulnerabilities: 7%+ of its edits are blatantly unconstructive. Our measurement studies show this damage manifests in novel socio-technical forms, limiting the effectiveness of computational detection strategies from related domains. In turn, this has made much mitigation the responsibility of a poorly organized and ill-routed human workforce. We aim to improve all facets of this incident response workflow.
Complementing language-based solutions, we first develop content-agnostic predictors of damage. We implicitly glean reputations for system entities and overcome sparse behavioral histories with a spatial reputation model that combines evidence from multiple granularities. We also identify simple yet indicative metadata features that capture participatory dynamics and content maturation. When brought to bear on damage corpora, our contributions: (1) advance benchmarks over a broad set of security issues ("vandalism"), (2) perform well in the first anti-spam specific approach, and (3) demonstrate their portability over diverse open collaboration use cases.
Probabilities generated by our classifiers can also intelligently route human assets using prioritization schemes optimized for capture rate or impact minimization. Organizational primitives are introduced that improve workforce efficiency. These strategies are then implemented in a tool ("STiki") that has been used to revert 350,000+ damaging instances from Wikipedia. This usage is analyzed to learn about human aspects of the edit review process, including scalability, motivation, and latency. Finally, we conclude by measuring the practical impact of this work, discussing how to better integrate our solutions, and revealing outstanding vulnerabilities that speak to research challenges for open collaboration security.
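The prioritization idea, routing the highest-probability damage to reviewers first, can be sketched as a simple max-priority queue keyed on classifier output; the class and probability values below are illustrative stand-ins, not STiki's implementation.

```python
import heapq

class ReviewQueue:
    """Max-priority queue of edits keyed on P(damage)."""
    def __init__(self):
        self._heap = []

    def push(self, edit_id: str, p_damage: float) -> None:
        heapq.heappush(self._heap, (-p_damage, edit_id))  # negate: max-heap

    def pop(self):
        p, edit_id = heapq.heappop(self._heap)
        return edit_id, -p

q = ReviewQueue()
for edit, p in [("rev_101", 0.12), ("rev_102", 0.93), ("rev_103", 0.57)]:
    q.push(edit, p)
print(q.pop())  # -> ('rev_102', 0.93): most suspicious edit reviewed first
```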
AS-TRUST: A Trust Quantification Scheme for Autonomous Systems in BGP
https://works.bepress.com/andrew_g_west/12/
Jian Chang et al. Routing reputation.

The Border Gateway Protocol (BGP) works by frequently exchanging updates that disseminate reachability information about IP prefixes (i.e., IP address blocks) between Autonomous Systems (ASes) on the Internet. The ideal operation of BGP relies on three major behavioral assumptions (BAs): (1) information contained in the update is legal and correct, (2) a route to a prefix is stable, and (3) the route adheres to the valley-free routing policy. The current operation of BGP implicitly trusts all ASes to adhere to these assumptions. However, several documented violations of these assumptions attest to the fact that such trust is perilous. This paper presents AS-TRUST, a scheme that comprehensively characterizes the trustworthiness of ASes with respect to their adherence to the behavioral assumptions. AS-TRUST quantifies trust using the notion of AS reputation. To compute reputation, AS-TRUST analyzes updates received in the past. It then classifies the resulting observations into multiple types of feedback. The feedback is used by a reputation function that uses Bayesian statistics to compute a probabilistic view of AS trustworthiness. This information can then be used to improve quotidian BGP operation by enabling better route preference and dampening decision making at the ASes. Our implementation of the AS-TRUST scheme using publicly available BGP traces demonstrates that: (1) the number of ASes involved in violating the BGP behavioral assumptions is significant, and (2) the proposed reputation mechanism provides a multi-fold improvement in the ability of ASes to operate in the presence of BA violations.
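The abstract says reputation is computed from classified feedback using Bayesian statistics. One standard way to realize that, shown here purely as an assumed sketch, is a Beta-Bernoulli posterior over assumption-conforming versus assumption-violating observations.

```python
def as_reputation(good_obs: int, bad_obs: int,
                  prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Posterior mean of P(AS adheres to the behavioral assumptions),
    under a Beta(prior_a, prior_b) prior. The prior and update rule are
    standard Beta-Bernoulli machinery, assumed rather than the paper's."""
    return (prior_a + good_obs) / (prior_a + good_obs + prior_b + bad_obs)

# An AS with a long clean history outranks one with recent violations;
# such scores could then drive route preference or dampening decisions.
print(as_reputation(good_obs=980, bad_obs=2))   # ~0.997
print(as_reputation(good_obs=40,  bad_obs=25))  # ~0.61
```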
Trust in Collaborative Web Applications
https://works.bepress.com/andrew_g_west/21/
Collaborative security and broader perspectives.

Collaborative functionality is increasingly prevalent in web applications. Such functionality permits individuals to add, and sometimes modify, web content, often with minimal barriers to entry. Ideally, large bodies of knowledge can be amassed and shared in this manner. However, such software also provides a medium for nefarious persons to operate. By determining the extent to which participating content/agents can be trusted, one can identify useful contributions. In this work, we define the notion of trust for Collaborative Web Applications and survey the state-of-the-art for calculating, interpreting, and presenting trust values. Though the techniques can be applied broadly, Wikipedia's archetypal nature makes it a focal point for discussion.

An Evaluation Framework for Reputation Management Systems
https://works.bepress.com/andrew_g_west/1/
Trust and reputation management.

Reputation management (RM) is employed in distributed and peer-to-peer networks to help users compute a measure of trust in other users based on initial belief, observed behavior, and run-time feedback. These trust values influence how, or with whom, a user will interact. Existing literature on RM focuses primarily on algorithm development, not comparative analysis. To remedy this, we propose an evaluation framework based on the trace-simulator paradigm.

Trace file generation emulates a variety of network configurations, and particular attention is given to modeling malicious user behavior. Simulation is trace-based, and incremental trust calculation techniques are developed to allow experimentation with networks of substantial size. The described framework is available as open source so that researchers can evaluate the effectiveness of other reputation management techniques and/or extend its functionality.
This chapter reports on our framework's design decisions. Because our goal is to build a general-purpose simulator, we have the opportunity to characterize the breadth of existing RM systems. Further, we demonstrate our tool using two reputation algorithms (EigenTrust and a modified TNA-SL) under varied network conditions. Our analysis permits us to make claims about the algorithms' comparative merits. We conclude that such systems, assuming their distribution is secure, are highly effective at managing trust, even against adversarial collectives.
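Since the framework is demonstrated using EigenTrust, a compact power-iteration sketch of that algorithm follows; the local-trust matrix is toy data, and details such as pre-trusted peers are omitted.

```python
import numpy as np

def eigentrust(C: np.ndarray, iters: int = 50) -> np.ndarray:
    """Global trust = principal left eigenvector of the row-normalized
    local-trust matrix C, found by repeated multiplication."""
    C = C / C.sum(axis=1, keepdims=True)       # normalize local trust
    t = np.full(C.shape[0], 1.0 / C.shape[0])  # uniform start vector
    for _ in range(iters):
        t = C.T @ t
    return t

# Three peers; peer 2 receives the most local trust, so it ends up
# with the highest global trust value (~0.43 vs. ~0.29 for the others).
C = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 3.0],
              [1.0, 1.0, 0.0]])
print(eigentrust(C).round(3))
```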
Calculating and Presenting Trust in Collaborative Content
https://works.bepress.com/andrew_g_west/5/
Collaborative security and broader perspectives.

Collaborative functionality is increasingly prevalent in Internet applications. Such functionality permits individuals to add, and sometimes modify, web content, often with minimal barriers to entry. Ideally, large bodies of knowledge can be amassed and shared in this manner. However, such software also provides a medium for biased individuals, spammers, and nefarious persons to operate. By computing trust/reputation for participating agents and/or the content they generate, one can identify quality contributions.

In this work, we survey the state-of-the-art for calculating trust in collaborative content. In particular, we examine four proposals from the literature based on: (1) content persistence, (2) natural-language processing, (3) metadata properties, and (4) incoming link quantity. Though each technique can be applied broadly, Wikipedia provides a focal point for discussion. Finally, having critiqued how trust values are calculated, we analyze how the presentation of these values can benefit end-users and application security.

Towards the Effective Temporal Association Mining of Spam Blacklists
https://works.bepress.com/andrew_g_west/2/
Miscellaneous security.

IP blacklists are a well-regarded anti-spam mechanism that captures global spamming patterns. These properties make such lists a practical ground truth by which to study email spam behaviors. Observing one blacklist for nearly a year and a half, we collected data on roughly half a billion listing events. In this paper, that data serves two purposes.
First, we conduct a measurement study on the dynamics of blacklists and email spam at large. The magnitude and duration of the data enable scrutiny of long-term trends, at scale. Further, these statistics help parameterize our second task: the mining of blacklist history for temporal association rules. That is, we search for IP addresses with correlated histories. Strong correlations would suggest that group members are not independent entities and likely share botnet membership.
Unfortunately, we find that statistically significant groupings are rare. This result is reinforced when rules are evaluated in terms of their ability to: (1) identify shared botnet members, using ground truth from botnet infiltrations and sinkholes, and (2) predict future blacklisting events. In both cases, performance improvements over a control classifier are nominal. This outcome forces us to re-examine the appropriateness of blacklist data for this task and to suggest refinements to our mining model that may allow it to better capture the dynamics by which botnets operate.
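To make the "correlated histories" notion concrete, the sketch below scores pairs of binary listing timelines with Pearson correlation; the day-granularity encoding and the choice of statistic are assumptions, not the paper's exact mining model.

```python
import numpy as np

def listing_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation of two binary listed/not-listed timelines."""
    return float(np.corrcoef(a, b)[0, 1])

# Toy timelines over ten days: ip1 and ip2 move in lockstep (candidate
# association rule), while ip3 is independent background noise.
ip1 = np.array([1, 1, 0, 0, 1, 1, 0, 0, 1, 1])
ip2 = np.array([1, 1, 0, 0, 1, 1, 0, 0, 1, 1])
ip3 = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 0])

print(listing_correlation(ip1, ip2))  # 1.0: strongly correlated histories
print(listing_correlation(ip1, ip3))  # ~ -0.17: weak, likely unrelated
```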