Data provenance refers to the process of tracing and recording the origins of data and its movement between databases. The LDA-GA program serves as a funnel in the multi-level funneling model for data provenance reconstruction. The model has very good precision and recall, but it is extremely time-consuming. In this project, the single-threaded program is optimized with parallel computing techniques. The optimized program can run simultaneously on different machines, and on each machine the computation is carried out by multiple threads.

By making use of multiple cores on multiple machines, the performance of the LDA-GA program is improved greatly. Compared with the single-threaded version, the multi-threaded one runs several times faster.

Finding motifs in networks usually involves traversing a large network to enumerate all possible subgraphs of a given size, and then determining their statistical uniqueness by sampling subgraphs from a large number of randomly generated graphs. Current algorithms for network motif analysis follow either a network-centric or a motif-centric approach. Network-centric algorithms cannot choose which subgraph patterns to search for, whereas motif-centric algorithms find instances of just one or a small set of subgraph patterns in the network. Researchers interested in only one or a few subgraph shapes will find motif-centric tools useful, because no work is wasted on finding subgraphs they do not care about. This project introduces a motif-centric tool that enables researchers to find instances in the input network of only the subgraphs of interest, called query graphs, as opposed to finding instances of all possible (non-isomorphic) k-graphs. The tool, called ParaMODA, incorporates the existing motif-centric algorithms as well as a new algorithm for carrying out the same task. The tool also collects discovered instances and stores them on disk for future retrieval and analysis. Experimental results show that the new algorithm outperformed the existing ones, although the performance improvements varied depending on the set of query graphs used. More importantly, the new algorithm allows large portions of the task to be parallelized, which will be especially helpful for studying motifs of double-digit sizes in large networks.
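To make the motif-centric idea concrete, the following is a minimal brute-force sketch in Python. It is not ParaMODA's algorithm or any of the existing algorithms it incorporates; the function name `find_instances`, the edge-list input format, and the choice to count non-induced instances in an undirected simple graph are all assumptions made for illustration.

```python
from itertools import permutations


def find_instances(network_edges, query_edges):
    """Return the distinct instances of one query graph in an undirected network.

    Hypothetical helper for illustration only. An instance is the set of
    network edges that the query edges map onto (non-induced matching).
    Brute force: tries every injective node mapping, so it only suits tiny inputs.
    """
    net = {frozenset(e) for e in network_edges}
    nodes = sorted({v for e in network_edges for v in e})
    q_nodes = sorted({v for e in query_edges for v in e})

    found = set()
    for image in permutations(nodes, len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        mapped = frozenset(
            frozenset((mapping[u], mapping[v])) for u, v in query_edges
        )
        # Keep the mapping only if every query edge lands on a network edge.
        if mapped <= net:
            found.add(mapped)  # the set removes duplicates from symmetric mappings
    return found


# A small example network: a triangle 1-2-3 with a pendant node 4.
network = [(1, 2), (2, 3), (1, 3), (3, 4)]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
path = [("a", "b"), ("b", "c")]

triangles = find_instances(network, triangle)
paths = find_instances(network, path)
```

Only the query graphs passed in are searched for, which is the property that distinguishes a motif-centric search from enumerating every k-node subgraph.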

Thursday, March 2

According to Prevent Blindness America, one in four school-age children has vision problems that, if left untreated, can affect learning ability, personality, and adjustment in school. A child with a near vision problem may still pass a regular eye exam because the exam is short. In addition, children's eye health changes rapidly as they grow. To find an effective way to detect and track children's near vision problems, the Educating Young Eyes (EYE) project was established to improve children's eye health as well as their performance in school. The EYE project develops engaging, kid-friendly mobile games as screening tools for the early detection of children's near vision problems, and it also serves as a research center that advocates the study of these problems. This project, as part of the EYE project, proposed a shared database to support the data persistence requirements and implemented web portals for stakeholders to access and modify the data in the shared database. In particular, the shared database will store the user test data generated by the mobile apps, and the web portals will visualize that data so that parents can understand their child's eye health, medical practitioners can manage patients' historical test results and prescriptions, and researchers can further study near vision problems and discover new solutions. Specific goals include accommodating a variety of new assessment and vision therapy tools and games; cloud-based support for the repository; compliance with health and student information privacy requirements; and real-time as well as batch uploading of data.

Monday, March 6

In this project, we aim to identify and track people in given video footage. We perform face detection and recognition, and then use a block matching algorithm to continuously track each detected face. We develop a simple, intuitive, user-friendly interface that enables search by name or image, which is useful and convenient for people identification and activity recognition. By plotting the position data associated with each individual as a function of time, we generate a trajectory map T, which represents the movement of each person throughout the day. By overlaying the trajectories of all individuals and computing the proximity of the trajectories at any given time, we build a potential interaction graph I that identifies and records the potential interaction patterns between individuals and groups of individuals. These interaction patterns can be further analyzed to provide useful insight for improving collaboration, productivity, and security.
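The step from trajectories to a potential interaction graph can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout of `T`, the Euclidean distance measure, the `radius` threshold, and the function name `interaction_graph` are hypothetical and are not taken from the project.

```python
from itertools import combinations
from math import hypot


def interaction_graph(trajectories, radius):
    """Build {(person_a, person_b): [t, ...]} from per-person trajectories.

    `trajectories` maps a person's name to {time: (x, y)} (assumed layout).
    Two people potentially interact at time t if both have a position at t
    and the positions are at most `radius` apart.
    """
    graph = {}
    for a, b in combinations(sorted(trajectories), 2):
        shared_times = sorted(set(trajectories[a]) & set(trajectories[b]))
        close_times = [
            t for t in shared_times
            if hypot(trajectories[a][t][0] - trajectories[b][t][0],
                     trajectories[a][t][1] - trajectories[b][t][1]) <= radius
        ]
        if close_times:
            graph[(a, b)] = close_times
    return graph


# Toy trajectory map T: positions sampled at integer time steps.
T = {
    "alice": {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 5.0)},
    "bob":   {0: (4.0, 0.0), 1: (1.5, 0.0), 2: (5.0, 5.5)},
    "carol": {0: (9.0, 9.0), 1: (9.0, 9.0)},
}
I = interaction_graph(T, radius=1.0)
```

In this toy data, only alice and bob come within the threshold, at time steps 1 and 2, so the graph contains a single edge annotated with those times.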

Recent trends indicate a steady rise in cyber attacks targeting the healthcare industry and patient data. Mobile applications in healthcare are becoming increasingly popular, and users presume these applications are inherently secure. However, a lack of security awareness among developers and a rush to market have introduced a plethora of security vulnerabilities in mobile health applications. Our initial research showed that health-related mobile applications contain numerous vulnerabilities that attackers could exploit to obtain medical data. Because medical data is so sensitive, security is one of the most vital requirements for any device or application that uses it. The goal of this capstone project is to ensure secure storage and exchange of personal health information used in mobile applications while maintaining the expected level of user functionality. To this end, this project develops a security framework for mobile applications in healthcare that addresses common security vulnerabilities in the application development process.

MASS is a parallel-computing library for multi-agent and spatial simulation over a cluster of computing nodes. The library is built on two important concepts: Agent and Place. A Place represents one array index of the given data set, while Agent instances execute a set of instructions to perform the actual computation. Agents communicate with each other to exchange data, and they can spawn new child Agent instances, migrate between Places, or be terminated. The library is used for simulation and big data analysis. This project primarily focuses on the memory efficiency of the MASS Java library for big data analysis.

Many contributors have improved this library since its initial release in 2010, and some of them made significant changes to the MASS Java version during their capstone project or thesis work. Our findings show that an Agent instance uses up to 1 MB of memory and that some scientific applications, such as UW Climate Analysis or Biological Network Motif, require 3 to 5 million Agents during their execution. At that scale the system can require terabytes of memory (up to roughly 3 to 5 TB at 1 MB per Agent), which none of the computing nodes can provide. This project introduces an agent population control mechanism that restricts the number of active agents in the system and serializes new incoming agents into byte streams for later use. In addition, the library communicates with the user application in each iteration cycle and transmits data. The overhead of this communication degrades performance through both excessive memory consumption and high CPU usage. We also address this issue by implementing a practical way of making recursive method calls and of executing spawn, kill, and migrate operations without sending data back to the user application in each iteration.
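The population control idea described above can be sketched in a few lines. This is a simplified Python illustration rather than the MASS Java implementation; the class name `AgentPool`, its methods, and the use of `pickle` for serialization are assumptions chosen for the example.

```python
import pickle
from collections import deque


class AgentPool:
    """Hypothetical pool that caps the number of live agents.

    Agents beyond `max_active` are serialized to byte strings and parked;
    a parked agent is revived whenever an active agent is removed.
    """

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = []       # live agent objects
        self.parked = deque()  # serialized agents waiting for a free slot

    def add(self, agent):
        if len(self.active) < self.max_active:
            self.active.append(agent)
        else:
            # Over the cap: keep only the compact byte stream in memory.
            self.parked.append(pickle.dumps(agent))

    def remove(self, agent):
        self.active.remove(agent)
        if self.parked:
            # A slot is free, so deserialize the oldest parked agent.
            self.active.append(pickle.loads(self.parked.popleft()))


# Agents are plain dictionaries here purely for illustration.
pool = AgentPool(max_active=2)
for agent_id in range(3):
    pool.add({"id": agent_id})

active_before = [a["id"] for a in pool.active]
parked_before = len(pool.parked)

pool.remove({"id": 0})
active_after = [a["id"] for a in pool.active]
```

A first-in, first-out queue is used for parked agents so that the agents that have waited longest are activated first; a real system might prioritize differently.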

Wednesday, March 8

Detecting and tracking vehicles in transportation surveillance videos is essential for applications ranging from traffic queue detection and volume calculation to incident and vehicle identification. However, challenges such as object occlusion and shadows often lead to poor detection and tracking performance. In this project, we propose a new approach to tackle these challenges. It proceeds in four stages: background subtraction, vehicle segmentation, shadow detection and removal, and vehicle tracking. We then apply the results of this approach to vehicle classification and vehicle counting. Experimental results show that our method is simple, robust, and effective on videos with occlusion and under various illumination and weather conditions.

The goal of the Tribal Education Network (TEN) is to provide culturally relevant content within STEM-related courses to help improve learning outcomes for Native American students. This work focuses on the concept of “cultural learning objects,” which are described as any audio, video, and/or text-based content that has important metaphors, learning constructs, and objectives that may be linked to existing STEM course content. This provides an important context for learning. My capstone project involves researching, evaluating, and building search mechanisms that can be incorporated into the evolving TEN application architecture. Three search methods (keyword search, full-text search, and semantic search) are leveraged to generate the best matches between cultural and STEM content.

Thursday, March 9

Using 360-degree video to record complex environments has the advantage of capturing a much wider field of view than standard video. For researchers who perform video-based analytic work, 360-degree videos allow them to follow much more of what is going on around their subject. This can be quite useful in situations like recording a meeting in a “war room” where people are collaborating at various places in the room (e.g., seated, standing at whiteboards, standing around sticky notes on the wall). The problem is that the field of view is wider than the researcher can see in a naturalistic projection (e.g., with a head-mounted display), which limits the researcher to seeing only the portion of the 360-degree video sphere that he or she is facing. Because of this limited view, describing a complex 360-degree video scene in text is often difficult and ineffective. This project, titled ‘SWIS: See What I Saw’, aims to simplify collaborative video-based analytic work by enabling researchers to share what they saw in a 360-degree video with their colleagues, exactly as they saw it. SWIS allows users to record their view, generate an intermediate viewpath file, and play back the recorded information from the viewpath file. In addition to being useful to video-based analysis researchers, we believe that SWIS can be extended to aid other types of users, such as professional video game coaches, video surveillance analysts, or individuals who simply wish to share what they saw in a 360-degree video from a video hosting website.
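The record-and-playback cycle around a viewpath file might look like the sketch below. The actual SWIS file format is not described here, so the JSON layout (timestamp, yaw, pitch), the function names `save_viewpath` and `view_at`, and the linear interpolation between samples are all assumptions made for illustration.

```python
import json
from bisect import bisect_right


def save_viewpath(samples):
    """Serialize (time_s, yaw_deg, pitch_deg) samples to a JSON string.

    The field names are hypothetical; samples must be sorted by time.
    """
    return json.dumps(
        [{"t": t, "yaw": yaw, "pitch": pitch} for t, yaw, pitch in samples]
    )


def view_at(viewpath_json, t):
    """Return the (yaw, pitch) the recorded viewer was facing at time t.

    Values between samples are linearly interpolated. For simplicity this
    ignores yaw wrap-around at 360 degrees.
    """
    samples = json.loads(viewpath_json)
    times = [s["t"] for s in samples]
    i = bisect_right(times, t)
    if i == 0:                      # before the first sample
        return samples[0]["yaw"], samples[0]["pitch"]
    if i == len(samples):           # at or after the last sample
        return samples[-1]["yaw"], samples[-1]["pitch"]
    a, b = samples[i - 1], samples[i]
    w = (t - a["t"]) / (b["t"] - a["t"])
    return (a["yaw"] + w * (b["yaw"] - a["yaw"]),
            a["pitch"] + w * (b["pitch"] - a["pitch"]))


# Record two head orientations, then query the view halfway between them.
viewpath = save_viewpath([(0.0, 0.0, 0.0), (2.0, 90.0, 10.0)])
midpoint = view_at(viewpath, 1.0)
```

Storing only orientation samples keeps the viewpath file small and independent of the video itself, so a colleague with the same video can replay the view.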

The literature of visualization is dominated by quantitative discourse. This thesis asks: how might visualizations be approached qualitatively? First, this thesis examines the “multiplicities of engagements” (Cope and Elwood, 2009) between visualizations and the literature of qualitative research. Then, it presents the findings from a series of qualitative interviews that explored how researchers use visualizations in the process of qualitative data analysis. It suggests that visualizations used within a stream of qualitative inquiry are strongly embedded in the context of analysis. Finally, this thesis presents the insights gained from a study on the use of visualizations within the space of Wide-Field-Ethnography (WFE). It provides considerations for the Human-Centric Design of a visualization for navigation and describes a simple application of these considerations to a stream-oriented viewer of a large, multi-modal dataset.

Friday, March 10

Currently there are approximately 7,000 languages worldwide, and one language disappears every two weeks. At this rate of extinction, only one half of these languages will continue to exist by the end of the 21st century. Language endangerment is a serious concern to which linguists have turned their attention in the last several decades. Documentation work can help to maintain, consolidate, or revitalize endangered languages and safeguard the full range of uses of a given language. We aim to design, develop, and deploy web services on a cloud platform that process endangered language recordings to facilitate effective documentation and analysis. To achieve this goal, we deployed ELAN's audio analysis component, time-aligned annotation, as a cloud-based web application. ELAN is an open-source linguistics tool used for creating complex annotations on audio resources. However, ELAN is a standalone application and does not expose any libraries or services for use in third-party applications. The deployed ELAN component can be used as a web service and integrated into any application that supports HTTP communication. We have also developed a web UI, hosted in the Microsoft Azure cloud, that lets general users access the ELAN component via web browsers. With these ELAN components, users can play an audio file, view the waveform, select a specific time range of the audio file, add hierarchical tiers, and add annotations (comments/notations) based on specific requirements. In addition, all annotation-related information can be saved in an EAF file on the system and retrieved later or shared among users.