Monday, August 24, 2009

Known by his colleagues as simply "Ous" (pronounced like "moose" without the "m"), Oussama originally did his undergrad at the American University of Beirut in Computer and Communications Engineering. Shortly after, he enrolled at WVU in the Electrical Engineering graduate program, where he conducted his research in software engineering models, search-based AI, and software project management. His research produced, in addition to his final Master's thesis, several papers that have been accepted at conferences such as ASE, ICSE, and ICSP, among others.

Currently Oussama is commencing his PhD studies, working as a GTA and continuing his research work with Dr. Tim Menzies. When not occupied with his work, Oussama enjoys reading technology news, researching open source software and its use, listening to various genres of metal music, and spending time with his beautiful wife.

In their report, "Strengthening Forensic Science in the United States: A Path Forward," the National Academy of Sciences (NAS) laid out not only the challenges facing the forensic science community but also recommendations for the improvements required to alleviate the problems, disparities, and lack of mandatory standards currently facing the community. Highlighted in this report is the concern that faulty forensic science can contribute to the wrongful conviction of innocent people. On the other side of the coin, someone who is guilty of a crime can be wrongfully acquitted.

Take, for instance, a hypothetical case of a suspect being connected to a crime scene by trace evidence such as glass. It would be very easy to challenge the forensic analysis by simply requesting that it be performed by at least five (5) different forensic labs. One can be almost certain that the court would have no choice but to throw out the evidence because of the inconsistency of the respective results.

It is for this reason that we have taken up the NAS's challenge to develop standard forensic interpretation methods that convincingly "demonstrate a connection between evidence and a specific individual or source". It is our hope that this project will supply the analyst not only with an interpretation of the data, but also with a measure of what minimal changes in the data could result in a different interpretation.

For this project we studied current methods of forensic interpretation of glass evidence. In this subfield of forensics we have so far identified at least three separate branches of research leading to three different 'best' models: the 1995 Evett model, the 1996 Walsh model, and the 2002 Koons model. To date our research has shown that these models are 'broken'. There are two root causes for the broken mathematical models:

1. 'Brittle' interpretations, where small input changes can lead to large output changes.

2. Inappropriate assumptions about the distribution of the data.

To deal with these two issues, our project proposes the use of (a) clustering algorithms (meso and ant), and (b) treatment learning. The clustering algorithms will allow us to reason about crime scene data without knowledge of standard statistical distributions, while treatment learning offers the analyst a measure of how strongly an interpretation should be believed.
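As a rough illustration of the treatment-learning idea (a toy sketch, not the lab's actual TAR3/TAR4 implementation; the attribute names and data below are invented), a "treatment" is an attribute range that, when used to filter the data, raises the frequency of a preferred class:

```python
def treatments(rows, classes, good="match"):
    """Score each single attribute=value 'treatment' by how much
    restricting the data to it raises the frequency of the good class."""
    baseline = classes.count(good) / len(classes)
    scores = {}
    for col in rows[0]:                       # column names from the first row
        for val in {r[col] for r in rows}:    # each observed value of that column
            picked = [c for r, c in zip(rows, classes) if r[col] == val]
            scores[(col, val)] = round(picked.count(good) / len(picked) - baseline, 3)
    return scores

# invented toy data: discretized refractive index and calcium content of glass
rows = [{"ri": "low", "ca": "hi"}, {"ri": "low", "ca": "lo"},
        {"ri": "hi",  "ca": "hi"}, {"ri": "hi",  "ca": "lo"}]
classes = ["match", "match", "nomatch", "match"]
scores = treatments(rows, classes)
best = max(scores, key=scores.get)
print(best, scores[best])   # → ('ri', 'low') 0.25
```

A real treatment learner also searches conjunctions of ranges and weights classes by utility, but the "lift over the baseline" score above is the core of the idea, and a small lift is exactly the signal that an interpretation is fragile.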

Our project will also boast a plotting tool (CLIFF) which offers visualization of the data. The software features four (4) of the current models used in the forensic community, namely the 1995 Evett model, the 1996 Walsh model, the Seheult model, and the Grove model. For each model, data can be generated randomly and plotted. Other features of CLIFF include:

the ability to perform dimensionality reduction by applying the Fastmap algorithm

and the ability to determine if the results gained from a particular region of space are important and well supported.
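For the curious, a single Fastmap pass can be sketched as follows (a simplified illustration under a Euclidean-distance assumption, not CLIFF's actual code). It projects each point onto the axis through two far-apart "pivot" points using the cosine law:

```python
import math
import random

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def fastmap_1d(points):
    """One Fastmap pass: project every point onto the axis through two
    far-apart pivots (repeat on residual distances for more axes)."""
    a = random.choice(points)                    # heuristic pivot search:
    b = max(points, key=lambda p: dist(a, p))    # farthest point from a,
    a = max(points, key=lambda p: dist(b, p))    # then farthest from b
    dab = dist(a, b)
    # cosine law gives each point's coordinate along the a-b axis
    return [(dist(a, p) ** 2 + dab ** 2 - dist(b, p) ** 2) / (2 * dab)
            for p in points]

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(fastmap_1d(pts))   # coordinates preserve the spacing (up to reflection)
```

The appeal for forensic data is that Fastmap only needs pairwise distances, so high-dimensional elemental-composition measurements can be reduced to a few plottable axes in linear time.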

In the end, it must be made clear that our purpose is not to determine innocence or guilt, or to give a 100% guarantee of a match or non-match. Instead, our goal is to aid the decision-making process of forensic interpretation. We want to provide the analyst with a standard, dependable tool/model which reports an interpretation of the crime scene data as well as the treatment learner's analysis of what minimal 'treatment' could change that interpretation. What happens after this is in the hands of the analyst.

Ultimately we want to be able to rely on the results presented in a court of law, results based on correct models accepted and used by the entire forensic community. So, if a suspect demands that tests be done by five (5) different forensic labs, there should be no dramatic differences in the results.

Friday, August 14, 2009

Adam is a Computer Science graduate student at WVU who studies data mining as well as other applications for machine learning. His interests off-campus include (but are not limited to) playing MMOs, going to see movies frequently, sleeping, eating, etc.

Wednesday, August 12, 2009

He was testing some tools developed at the MILL on flight control problems at NASA. His results, available on-line, showed that some models are better analyzed with non-continuous contrast set learning than with traditional continuous methods.

Concept location is a critical activity during software evolution, as it produces the location where a change is to start in response to a modification request, such as a bug report or a new feature request. Lexical-based concept location techniques rely on matching the text embedded in the source code to queries formulated by the developers. The efficiency of such techniques is strongly dependent on the ability of the developer to write good queries. We propose an approach to augment information retrieval (IR) based concept location via an explicit relevance feedback (RF) mechanism. RF is a two-part process in which the developer judges existing results returned by a search and the IR system uses this information to perform a new search, returning more relevant information to the user. A set of case studies performed on open source software systems reveals the impact of RF on IR-based concept location.
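As one concrete illustration of explicit RF (the abstract does not name its exact mechanism; Rocchio's classic update is assumed here purely for the sketch), the query vector is moved toward documents the developer judged relevant and away from those judged non-relevant:

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: pull the query vector toward the centroid of
    relevant documents and push it away from non-relevant ones."""
    terms = set(query) | {t for d in relevant + nonrelevant for t in d}
    def centroid(docs, t):
        return sum(d.get(t, 0.0) for d in docs) / len(docs) if docs else 0.0
    return {t: alpha * query.get(t, 0.0)
               + beta * centroid(relevant, t)
               - gamma * centroid(nonrelevant, t)
            for t in terms}

# invented toy vectors: term -> weight (e.g. tf-idf over source code)
updated = rocchio({"parse": 1.0},
                  relevant=[{"parse": 1.0, "token": 1.0}],
                  nonrelevant=[{"gui": 1.0}])
print(updated)   # "token" is pulled into the query; "gui" is pushed negative
```

The new query then drives a second search, so code containing terms like "token" that the developer never thought to type can still rank highly.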

Computer Science researcher by day, gaming journalist by night: is there anything that Greg can’t do? Hint: the answer is lots. Greg is a pop-culture junkie with an entertainment addiction. He’s also likely to quote Oscar Wilde at any given moment. Greg loves quirky Japanese video games, esoteric role-playing games, Tetris, film noir, Lovecraftian horror, and Irish whiskey. He isn’t wild about Russian literature or DC crossover events, both of which result in confusing, mind-shattering crises.

Greg is a master's student at WVU. When not researching the latest information retrieval and parametric testing methods, he writes about video games for 4 Color Rebellion. He used to head up WVU's ACM chapter.

Prediction of fault-prone software components is one of the most researched problems in software engineering. Many statistical techniques have been proposed, but there is no consensus on the methodology for selecting the "best model" for a specific project. In this paper, we introduce and discuss the merits of cost curve analysis of fault prediction models. Cost curves allow software quality engineers to introduce project-specific costs of module misclassification into model evaluation. Classifying a software module as fault-prone implies the application of some verification activities, thus adding to the development cost. Misclassifying a module as fault-free carries the risk of system failure, also associated with cost implications. Through the analysis of sixteen projects from public repositories, we observe that software quality does not necessarily benefit from the prediction of fault-prone components. The inclusion of misclassification cost in model evaluation may indicate that even the "best" models achieve performance no better than trivial classification. Our results support a recommendation favoring the use of cost curves in practice, with the hope that they will become a standard tool for software quality model performance evaluation.
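The cost-curve idea (Drummond and Holte's normalized expected cost) can be sketched as follows; the function and parameter names here are illustrative, not taken from the paper:

```python
def normalized_expected_cost(tpr, fpr, p_pos, cost_fn, cost_fp):
    """One point on a cost curve: a classifier's error rates weighted by
    the probability-cost PC(+), which folds together the fault-prone
    prior and the asymmetric misclassification costs."""
    pc_pos = (p_pos * cost_fn) / (p_pos * cost_fn + (1 - p_pos) * cost_fp)
    fnr = 1.0 - tpr
    return fnr * pc_pos + fpr * (1.0 - pc_pos)

# invented numbers: 20% of modules faulty; a missed fault costs 5x a false alarm
nec = normalized_expected_cost(tpr=0.7, fpr=0.3, p_pos=0.2,
                               cost_fn=5.0, cost_fp=1.0)
print(round(nec, 3))   # → 0.3
```

The comparison with trivial classification drops out directly: predicting "always fault-free" (tpr = fpr = 0) costs exactly PC(+), and "always faulty" (tpr = fpr = 1) costs 1 − PC(+), so a prediction model only earns its keep where its cost line falls below both.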

We implemented Boehm-Turner’s model of agile and plan-based software development. That tool is augmented with an AI search engine to find the key factors that predict the success of agile or traditional plan-based software development. According to our simulations and AI search engine: (1) in no case did agile methods perform worse than plan-based approaches; (2) in some cases, agile performed best. Hence, we recommend that the default development practice for organizations be an agile method. The simplicity of this style of analysis begs the question: why is so much time wasted on evidence-less debates about software process when a simple combination of simulation plus automatic search can mature the dialogue much faster?

SEESAW combines AI search tools, a Monte Carlo simulator, and some software process models. We show here that, when selecting technologies for a software project, SEESAW out-performs a variety of other search engines. SEESAW’s recommendations are greatly affected by the business context of its use. For example, the automatic defect reduction tools explored by the ASE community are only relevant to a subset of software projects, and only according to certain value criteria. Therefore, when arguing for the value of a particular technology, that argument should include a description of the value function of the target user community.

Assoc. Prof. Tim Menzies (tim@menzies.us) has been working on advanced modeling and AI since 1986. He received his PhD from the University of New South Wales, Sydney, Australia, and is the author of over 164 refereed papers.

A former research chair for NASA, Dr. Menzies is now an associate professor at West Virginia University's Lane Department of Computer Science and Electrical Engineering and director of the Modeling Intelligence Lab.

Greg went to NASA AMES this summer and found that treatment learning (tar3 and tar4) beat the heck out of a range of standard methods (simulated annealing and gradient descent) for optimizing NASA simulators.

Meanwhile, he wowed the hell out of the NASA folks. Maybe we can send grad students there every year?