Congratulations to the Spring 2018 Masters!

Congratulations to the new Spring 2018 Masters from my research lab! This announcement is a few months late; nonetheless, both of these individuals deserve to be congratulated on their achievements!

Alexander Rodriguez, Master of Science, Data Science and Analytics

Alexander David Rodriguez completed the Master of Science in Data Science and Analytics. I first met Alexander when he took my graduate course in Intelligent Data Analytics as a foreign-exchange undergraduate student from Peru. He performed so incredibly well in my course that I immediately started talking with him about the possibility of continuing his graduate studies with me. After returning to Peru and working for one year in industry, he came back and joined the DSA program at OU. He worked for me as both a GTA and a GRA, funded in part by the NIST Center of Excellence on Risk-Based Community Resilience Planning. His MS thesis is entitled “Data-based Stochastic Network Mitigation”. The abstract follows.

Data-based Stochastic Network Mitigation abstract: Current decision-support frameworks to assist mitigation planning do not account for the uncertainty and complexity of network failures. To close this research gap, this thesis first demonstrates the importance of including uncertainty in the decision analysis and then proposes a novel methodology that employs simulation data encapsulating both the uncertainty and the complexity of failures modeled by domain experts. Thus, this work is divided into two parts. The first part examines how component importance measures fail to give the necessary intuition for mitigation planning in the light of uncertainty. The analysis is assisted by a novel component importance measure called probabilistic delta centrality, which demonstrates how previously neglected stochastic considerations change the suggested decisions. In the second part, a new paradigm for stochastic network mitigation is proposed. The approach leverages realizations from scenario event simulations to develop a probabilistic framework that supports constrained decision making. This scenario event simulation framework is capable of incorporating component fragilities, correlation among random variables, and other physical aspects that affect component failure probabilities. On top of that, a statistical learning model is built to enable rapid estimation of post-disruption impact, which permits a metaheuristic to intelligently explore feasible discrete enhancements from mitigation strategies. The search for near-optimal solutions can be restricted by limited resources and potential political, social, and safety limitations. Two examples are presented to exhibit how this method provides detailed information for mitigation. The level of complexity embedded in the search, along with its detailed solutions, is pioneering in network mitigation planning.
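The thesis's probabilistic delta centrality measure isn't reproduced here, but the core idea of ranking components by their simulated impact under uncertainty can be sketched in a few lines. The following is a minimal, hypothetical illustration (the toy network, failure probability, and connectivity metric are all my assumptions for demonstration, not Alexander's actual model):

```python
import random

# Toy 5-node undirected network (edges are the "components" at risk).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 3)]

def connected_pairs(edges, n=5):
    """Count node pairs that can still reach each other (DFS from each node)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    total = 0
    for s in range(n):
        seen, stack = {s}, [s]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w); stack.append(w)
        total += len(seen) - 1
    return total // 2  # each pair was counted from both endpoints

def expected_performance(fail_prob, protected=(), trials=2000, seed=1):
    """Monte Carlo estimate of post-disruption connectivity when every
    unprotected edge fails independently with probability fail_prob."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        draws = {e: rng.random() for e in EDGES}   # common random numbers
        surviving = [e for e in EDGES if e in protected or draws[e] > fail_prob]
        total += connected_pairs(surviving)
    return total / trials

# Rank each edge by the expected connectivity gained from hardening it.
baseline = expected_performance(0.3)
gain = {e: expected_performance(0.3, protected=[e]) - baseline for e in EDGES}
```

The point of the stochastic view is visible even at this toy scale: the ranking produced by `gain` can differ from what a deterministic centrality measure on the intact network would suggest.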

From this work Alexander and I have published one conference paper, and have two journal papers in progress. Alexander will be pursuing his PhD in Computer Science at Virginia Tech starting Fall 2018.

Yanbin Chang, Master of Science, Industrial and Systems Engineering

Yanbin completed his Master of Science in Industrial and Systems Engineering in the Spring as well. His MS thesis is entitled “Heuristic approach to network recovery”.

Abstract: This study addresses optimization modeling for the recovery of a transportation system after a major disaster. In particular, a novel metric based on the shape of the recovery curve is introduced as the objective to minimize. This metric is computed as the distance from the pre-disaster system performance, at a time immediately before disruption, to the two-dimensional location of the centroid of the area beneath the recovery curve. The recovery trajectories derived from optimization models with this new metric are considered along with two other recovery goals from the literature, i.e., minimizing the total recovery time and minimizing the skew of the recovery trajectory. A genetic algorithm is implemented to search for optimal restoration schedules under each objective, and empirical analysis is used to evaluate the corresponding quality of the solutions. Additionally, a particle swarm optimization algorithm is employed as an alternative metaheuristic, and the quality of the recovery schedules, as well as the observed computational efficiency, is analyzed.
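The centroid-based metric described in the abstract can be computed directly for a stepwise recovery curve. Here is a minimal sketch of one plausible formulation based on the abstract's description (the exact definition is in the thesis; the sample curve below is made up):

```python
def centroid_metric(times, perf, pre_disaster=1.0):
    """Distance from the pre-disruption performance point (t=0, pre_disaster)
    to the centroid of the area under a stepwise recovery curve."""
    area = mx = my = 0.0
    for i in range(len(times) - 1):
        w = times[i + 1] - times[i]   # width of this recovery step
        h = perf[i]                   # performance level during the step
        a = w * h
        area += a
        mx += a * (times[i] + w / 2)  # x-moment of the rectangle
        my += a * (h / 2)             # y-moment of the rectangle
    cx, cy = mx / area, my / area
    return ((cx - 0.0) ** 2 + (cy - pre_disaster) ** 2) ** 0.5

# Example: performance restored in steps over 14 days after a disruption.
times = [0, 2, 5, 9, 14]              # days since the disruptive event
perf = [0.4, 0.55, 0.7, 0.9]          # performance level during each step
score = centroid_metric(times, perf)  # smaller is better under this metric
```

A metaheuristic such as the genetic algorithm in the thesis would then search over restoration schedules, scoring each candidate schedule's recovery curve with a function like this one.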

Yanbin is currently preparing this thesis work for submission as a journal article. He will begin his PhD in Industrial Engineering at Clemson this Fall as well.

BONUS Material

Brad Osborn, Bachelor of Science, Industrial and Systems Engineering

Brad Osborn completed his undergraduate degree in Industrial and Systems Engineering. While Brad was not officially a part of the OU Analytics Lab, he worked with AT&T as an intern and used some of the skills he mastered in my class ISE 4113 Spreadsheet-based Decision Support Systems to wow his superiors. They offered him a full-time job and relocated him to Seattle, WA. However, his relocation is also bittersweet: Brad was a key player on my soccer team “Total Chaos” in the Norman area adult soccer league. Regardless, I wish him great success at AT&T!

Congratulations to Vigneshwaran Dharmarajan on his new job!

Congratulations Vignesh! Vigneshwaran just received a job offer as a data scientist and has relocated to Kansas. Vignesh recently completed his Master of Science in Data Science and Analytics in the Gallogly College of Engineering at the University of Oklahoma, and he also worked for me as a TA in the Fall of 2017 for the ISE/DSA 5103 Intelligent Data Analytics course.

Vignesh came by to see me the other day to let me know about his recent success and then sent me a note describing a little bit about the interview process and how his education in the DSA program helped him land the job.

I would like to thank you so much for offering the Intelligent Data Analytics (IDA) course. The knowledge that I acquired through the coursework helped me get a job offer as a Data Scientist. As part of the interview process, I was given a case study with a real-world data set to analyze, to predict from, and to present results that decisions could be made from. From the coursework, I had learned techniques like exploratory data analysis, feature engineering, model training, model validation, and prediction. With these I was able to complete all the requirements of the case study without any difficulty, and I used the visual aids learned in the course to present the data in a more palatable and comprehensible way. This helped me move to the next round, and in the final round the discussion started with what I learned and the projects I did in the IDA course, along with other experiences and projects. I know the knowledge and techniques I learned in the course were a significant factor in getting this job offer, and they will definitely help me apply this in industry to improve business processes. Thank you once again!

— Vigneshwaran Dharmarajan

I have heard similar stories lately regarding what companies seem to care about in hiring new data scientists. They regularly require candidates to address case studies and data sets as a first step in the interview process. I am glad that the DSA program helped prepare Vignesh for this.

Vignesh also supplied me with the job description (excerpted below with highlights added).

Job Description

Provide accurate and timely data support, analysis and maintenance using statistical math and algorithms on a variety of reports, charts, models and projects that have a direct impact on all aspects of the organization.

Build Efficiency

Work closely with the Lead Data Scientist regarding efficiency, analysis, quality goals, and market and product trends

Contribute using SQL and R language to the design, development, and maintenance of ongoing metrics, reports, analyses, dashboards, etc. to drive key business decisions.

Support cross-functional teams on the day-to-day delivery of projects and initiatives

As a quick note: in the ISE/DSA 5103 course you will learn and use R, learn how to clean and deal with messy data, work very hard on turning data into insight, use visualizations to explore and explain data, and learn an array of techniques for predictive modeling, among other things.

I will post more student stories on data science interviews and jobs in the future. And if you are a former student who wants to be highlighted on the blog, just send me your information and picture, and I am happy to boast about your success!

Community Recovery Webinar Announcement

Thursday, May 3, 10:00 AM – 12:00 PM (CDT)

In this Community Recovery Webinar learn more about the NIST-funded Center for Risk-Based Community Resilience Planning and how the Center’s research is progressing to models of community recovery and to field studies.

A resilient community is prepared for and can adapt to changing conditions and can withstand and recover rapidly from disruptions to its physical, economic, and social infrastructure. Modeling community resilience for purposes of risk-informed decision making requires a collaborative effort by experts in engineering, economics, social sciences, and information sciences to explain how community systems interact and affect recovery efforts. Over the last three years, Center researchers have been working on fundamental research on hazard characterization; models of physical, social, and economic systems; damage and losses following hazard events; recovery of community systems; and optimization of alternative options to improve resilience, all at the community scale.

Join this community recovery webinar to learn more about the Center’s recent activities. A brief introduction of the Center and a recap of the past webinars and the Center’s research accomplishments will be followed by two presentations. The first summarizes the interdependent physical, social, and economic models of communities being developed. These include physics-based models of networks and buildings, combined with computable general equilibrium models and population dislocation models, which are presented within the context of several testbed communities. The second presentation explains the approaches being used within the Center to model housing recovery in a testbed community, followed by integrated data sampling techniques developed for the Center’s longitudinal resilience field study of Lumberton, NC. The latter highlights the integration of social science and engineering data collection techniques to better inform community resilience models.

Additionally, the webinar will have an open Q&A “chat” session at the end.

Undergrad Summer Internship at Noble Research Institute

Job: Computing Services Data Intern

Overview

The Noble Research Institute LLC is accepting applications for an intern in the Computing Services Solutions team for the summer of 2018. The position is located in Ardmore, OK and is geared toward undergraduates in Analytics, Computer Science, or other related STEM work.

This internship will also provide experience in the daily operations of the solutions team, including: organizing and addressing data formatting and quality issues with spreadsheets; profiling data sets using SQL queries and other data profiling tools; identifying data points to be cleaned in source systems; researching, comparing, and experimenting with data technology products; preparing written summaries for review; assisting with data modeling and with data extraction, transformation, and loading pipelines; and helping develop reports and visualizations.

Duration: Approximately 12 weeks during the summer of 2018.

Hourly Rate: This Intern position will earn $12-14 per hour (subject to federal and state income tax withholding). The Intern will work 40 hours per week for up to 12 weeks.

Qualifications:

The successful candidate must:

Be enrolled in an undergraduate degree program in a college or university within the United States, with such program resulting in the award of a baccalaureate.

Have completed his/her sophomore year at the time the internship begins (at least 60 credit hours) with a declared major in Computer Science, MIS, Data Analytics, or related STEM field of study with appropriate fundamental coursework completed.

Be legally authorized to work in the United States (for any employer) and WILL NOT require employment visa sponsorship for this internship; and

Be capable of working 40 hours per week for 12 weeks. The intern will work in Ardmore, Oklahoma during the program.

Open positions

PhD-level teaching/research assistantships

The School of Industrial and Systems Engineering at the University of Oklahoma has multiple PhD-level teaching and research assistantships available for the Fall 2018 semester.

In the School of ISE, we are focusing on applying methods from analytics and systems engineering to problems in (i) Cyber-Physical-Social Systems and (ii) Health and Medical Systems, and opportunities exist in both of these application domains.

From the Interim Director, Dr. Shiva Raman: “Industrial and Systems Engineering at OU is a dynamic program that maintains great balance between research and teaching. Our faculty have been consistently recognized as outstanding teachers and have received many awards for excellence in teaching. Our faculty have received several grants from external agencies including the National Science Foundation, the Federal Aviation Administration, the Department of Defense, the Department of Transportation, NIST, and NASA. Faculty publications appear in leading journals in the areas of Operations Research, Risk and Reliability, Human Factors, and Manufacturing. The School of ISE provides our students with cutting edge laboratories and other resources to lead them to successful professional careers. Our most recent Ph.D. graduates have found employment at prestigious academic programs such as Iowa State University, Texas A&M University, and Vanderbilt University. We are very proud of the accomplishments of our Industrial and Systems Engineering family.”

I got an email from a current student in the analytics courses. He is doing an internship this summer with Pioneer Natural Resources and told me that they are looking for an additional intern this summer. If you are interested, please see the following listing for a Data Scientist / Machine Learning Engineering Intern. Email Dr. Nicholson if you would like to apply.

“Pioneer Natural Resources is a large, Texas-based independent exploration and production company that is focused on helping to meet the world’s energy needs. We deliver industry-leading production and reserve growth through onshore, unconventional, oil and gas resource development in the United States, while providing opportunities for growth and enrichment for our business partners, employees and the communities in which we operate.”

Pioneer Natural Resources

Data Scientist / Machine Learning Engineering Intern

We are currently seeking graduate/undergraduate students (advanced degrees preferred: M.S., Ph.D.) for Pioneer’s Analytics intern program with a strong fundamental understanding of various modern artificial intelligence (AI) and machine learning (ML) methods, and with good experience in a few of the following areas: deep neural networks / LSTMs, tensor factorization, reinforcement learning, Markov random fields, Bayesian networks, signal processing, distributed computing, operations research, and large-scale optimization. The candidate(s) may work on one or more of the following:

Research and develop data analytics (including streaming) and / or machine learning systems for Upstream Exploration & Production (E&P) applications

Weili Zhang was the first Analytics Lab @ OU student to join the team, the first MS Data Science and Analytics graduate from OU, and will be Dr. Nicholson’s first student to complete his PhD in Industrial & Systems Engineering. He accepted a machine learning job at eBay in San Jose, CA last year, but is back this week to defend his PhD research on Friday, December 8, and then, at 4:30p, to give a seminar presentation, open to the public, on machine learning at eBay. I expect this to be a pretty casual meeting and expect that Weili will be open to lots of Q&A and discussion.

It is my great pleasure to invite you to attend the seminar if you can: Friday, December 8, 2017 @ 4:30p in the Carson Engineering Center, Room 117 (map below). Also if you would like to join remotely, you can connect via Zoom: https://onenet.zoom.us/j/414767800

This week I am very happy to congratulate all of the students completing their Master’s of Science and PhD degrees.

Several of these students are my advisees and I am quite proud of their accomplishments. As of today, all of my MSc students have defended their work. And on Friday, my first PhD student will defend his research. I’ll post the results of that as soon as I have it!

For now, let’s focus on the Analytics Lab 2017 new masters!

New Masters and the MSc Research Path

The Master’s thesis student’s academic path has three major components: (1) successful completion of rigorous graduate coursework; (2) an in-depth research effort, spanning one to two years, in an area of specialization that results in the Master’s thesis (usually a 50- to 100-page manuscript detailing the background of the problem, the complexities of the work, and the results); and (3) the Master’s defense.

The defense is a presentation, to a committee of faculty members and any others present, summarizing the entire research effort. During the defense, the committee members ask questions relating to any detail of the work. Questions are aimed at determining whether or not the student truly understands the concepts, methods, and results. These are often open-ended and require critical, yet on-the-spot, reflection on his or her work.

Most defenses last 30 minutes to 1 hour, but some may exceed 1.5 hours, depending on the questions and student responses. While the process is not ‘grueling’ per se, it is significant.

Successful defenders…

This semester, I am privileged to participate on 8 MS thesis committees and 2 PhD committees of students completing in December. Most of the defenses are occurring this week. So it is a busy week!

However, I am particularly happy about the successful results of 4 of the MS students, since I am their advisor. Congratulations to Yunjie “Nicole” Wen, Gowtham Talluru, Samineh Nayeri, and Pauline Ribeyre!

Yunjie “Nicole” Wen, Master of Science in Data Science and Analytics

Thesis: Game theory application of resilience community road-bridge transportation system

Abstract: “This paper considers the application of game theory to a resilience-based road-bridge transportation network. Bridges in a community may be owned and maintained by separate entities. These owners may have different and even competing objectives for recovering the transportation system after a disaster. In this work, we assume that each player attempts to maximize the efficiency of repairs to the system from the perspective of their own damaged bridges after a hazard. The problem is modeled as an N-player nonzero-sum game. Strategic-form and sequential-form games are designed to demonstrate the methodology. A genetic algorithm is applied to the computation of the problem. The transportation network from Shelby County, TN is used to demonstrate the proposed methodology.”

Nicole will be continuing her academic career by pursuing a PhD in Industrial and Systems Engineering at the University of Oklahoma.

Gowtham Talluru, Master of Science in Data Science and Analytics

Thesis: Dynamic Uplift Modelling

Abstract: “A new approach to uplift modelling which considers the time-dependent behavior of customers is analyzed. Uplift modelling (also known as true lift or incremental modelling) has applications in marketing, insurance, banking, and personalized medicine, among other fields. The objective of an uplift model is to identify the individual entities who should be targeted for treatment (e.g., a marketing campaign) to maximize the overall incremental impact.

Research to date has treated this as a static problem modelled at a single instant in time. The method introduced in this work models uplift in a dynamic environment. In particular, I consider a series of direct marketing contacts and simulate the periodic purchasing behavior of customers. In contrast to static uplift models, the uplift in a customer’s purchase probability depends on time as well as on the customer’s previous purchases and the offers received. Appropriate modifications are made to static modelling approaches to adapt them to a dynamic setting.

This study demonstrates significant potential for both researchers and retail companies in thinking about the problem of uplift longitudinally.”
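For readers unfamiliar with uplift modelling, the static baseline that Gowtham's work extends can be illustrated with a tiny segment-level estimate: compare purchase rates between treated and control customers within each segment and target where the difference is positive. This is a hypothetical sketch with invented data, not the thesis's dynamic model:

```python
# Hypothetical campaign records: (customer segment, treated?, purchased?)
data = [
    ("young", True, True), ("young", True, True), ("young", True, False),
    ("young", False, True), ("young", False, False), ("young", False, False),
    ("senior", True, True), ("senior", True, False), ("senior", True, False),
    ("senior", False, True), ("senior", False, True), ("senior", False, False),
]

def uplift_by_segment(records):
    """Estimate uplift = P(buy | treated) - P(buy | control) per segment."""
    stats = {}  # segment -> [treated buys, treated n, control buys, control n]
    for seg, treated, bought in records:
        s = stats.setdefault(seg, [0, 0, 0, 0])
        if treated:
            s[0] += bought; s[1] += 1
        else:
            s[2] += bought; s[3] += 1
    return {seg: s[0] / s[1] - s[2] / s[3] for seg, s in stats.items()}

# Target only segments where the campaign actually changes behavior.
uplift = uplift_by_segment(data)
```

Note that a segment can show a negative uplift (the campaign hurts), which is exactly what distinguishes uplift modelling from ordinary response modelling; the dynamic version in the thesis lets these estimates evolve with each customer's contact and purchase history.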

Gowtham has accepted a prestigious job in data science with PricewaterhouseCoopers (PwC) in the Oil and Gas sector of their business.

Abstract: “A wide range of network flow problems, primarily used in transportation, are categorized as time-space fixed charge network flow problems. In this family of networks, each node is associated with a specific time and is replicated across all time periods. The cost structure in these problems consists of variable and fixed costs; continuous and binary variables are required to formulate the problem as a mixed-integer linear program, and the problem is known to be NP-hard. When the time dimension is added to the problem, solution approaches become even more time-consuming and CPU- and memory-intensive.

In this work, a decomposition heuristic is proposed that subdivides the problem into time epochs to create smaller, more manageable subproblems. These subproblems are solved sequentially to find an overall solution to the original problem. To evaluate the capability and efficiency of the decomposition method versus the exact method, a total of 1600 problems are generated and solved using the Gurobi MIP solver, which runs a parallel branch-and-bound algorithm. Statistical analysis indicates that, depending on the problem specification, the average solution time under the decomposition improves by more than four orders of magnitude, while the solutions found are of high quality (<2.5% from optimal, on average).”
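The epoch-by-epoch idea can be illustrated on a much simpler fixed-charge problem than the time-space networks in the thesis: single-item lot sizing, where a rolling-horizon heuristic solves each epoch exactly and concatenates the pieces. This toy stand-in, its costs, and the epoch length are all my assumptions for illustration:

```python
from itertools import combinations

def solve_exact(demand, fixed=10.0, hold=1.0):
    """Exact fixed-charge lot sizing by enumerating production periods:
    each producing period covers all demand until the next producing period."""
    n = len(demand)
    best_cost, best_plan = float("inf"), ()
    for r in range(1, n + 1):
        for prods in combinations(range(n), r):
            if prods[0] != 0:          # period-0 demand must be covered
                continue
            cost = fixed * len(prods)
            bounds = list(prods) + [n]
            for i, p in enumerate(prods):
                # demand of period t is produced at p and held (t - p) periods
                cost += sum(hold * (t - p) * demand[t]
                            for t in range(p, bounds[i + 1]))
            if cost < best_cost:
                best_cost, best_plan = cost, prods
    return best_cost, best_plan

def solve_by_epochs(demand, epoch=4, **kw):
    """Decomposition heuristic: cut the horizon into epochs, solve each
    small subproblem exactly, and stitch the schedules together."""
    total, plan = 0.0, []
    for s in range(0, len(demand), epoch):
        cost, prods = solve_exact(demand[s:s + epoch], **kw)
        total += cost
        plan += [s + p for p in prods]
    return total, plan

demand = [3, 1, 4, 2, 5, 1, 3, 2]
exact_cost, _ = solve_exact(demand)      # global optimum, exponential search
heur_cost, _ = solve_by_epochs(demand)   # near-optimal, much smaller searches
```

The trade-off mirrors the thesis's finding: the per-epoch searches are exponentially smaller than the full-horizon search, at the cost of ignoring interactions that cross epoch boundaries, so the heuristic cost can only match or exceed the exact optimum.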

Abstract: “Multidrug resistance is the simultaneous resistance to two or more chemically unrelated therapeutics, including some therapeutics the cell has never been exposed to. It is one of the biggest obstacles to effective cancer chemotherapy treatments. Multidrug resistance can be caused by drug efflux, an otherwise useful body mechanism that prevents a too-high drug concentration in cells, by using proteins called transporters. Some chemical compounds have the ability to sensitize the cells to the drugs by disabling these transporters. The focus of this work is to find key characteristics of compounds that may disable a specific transporter, the P-glycoprotein. Three datasets listing compounds, their values for different features, and their ability to disable the transporters are provided by experts. Using the programming language R, various data analytics methods are applied to these datasets with the objective of predicting whether compounds are P-glycoprotein inhibitors or not. The main issue encountered is the fact that the most important dataset did not contain enough samples for the number of predictor variables. Ultimately, the decision tree and random forest models prove to be the most effective in predicting the compounds’ ability to disable the transporter.”

Probabilistic Prediction of Post-disaster Functionality Loss of Community Building Portfolios Considering Utility Disruptions

I am proud to announce that the latest collaborative work from the CORE lab has been accepted for publication in the ASCE’s Journal of Structural Engineering. The new paper title is a mouthful, “Probabilistic Prediction of Post-disaster Functionality Loss of Community Building Portfolios Considering Utility Disruptions”, but the researchers (Weili Zhang, Peihui Lin, Naiyu Wang, Charles Nicholson, and Xianwu Xue) have been just calling the effort the “PPPD” project.

The study proposes a framework for the probabilistic prediction of building portfolio functionality loss in a community following an earthquake hazard. Building functionality is jointly affected by both the structural integrity of the building itself and the availability of critical utilities.

To this end, the framework incorporates three analyses for a given earthquake scenario:

(1) evaluation of the spatial distribution of physical damage to both buildings and utility infrastructure;

(2) computation of utility disruptions deriving from the cascading failures occurring in the interdependent utility networks; the cascading failures are simulated by use of a new mixed-integer, multicommodity network flow optimization model; and

(3) by integrating (1) and (2), a probabilistic prediction of the post-event functionality loss of building portfolios at the community scale.

Overview of the PPPD Framework

The framework couples functionality analyses of physical systems of distinct topologies and hazard response characteristics in a consistent spatial scale, providing a rich array of information for community hazard mitigation and resilience planning.

Case Study

An implementation of the framework is illustrated using the residential building portfolio in Shelby County, TN, subjected to an earthquake hazard. A single realization of an earthquake scenario in Shelby County, TN is depicted below.

Single disruptive event simulation realization

Since the building damage model, the flow model, and the data collection/aggregation can all be completed efficiently, it is easy to extend the single simulation realization to many realizations. This allows for a spatial probabilistic analysis of the vulnerabilities in the affected area. The figure below depicts the expected impact to the region based on 1,000 simulations of the scenario earthquake.

Expected impact based on multiple earthquake simulation realizations

The intricacies of how the electric power network (EPN) supports the potable water network (PWN), along with the particular individual component vulnerabilities of the EPN and PWN, produce probabilistic failure patterns in building functionality (see sub-figure d above) that are not obvious!

RUO and RFL

Additionally, the framework allows us to compare a more traditional building portfolio analysis with the practical implications of disruptive events. That is, even if your place of employment is not damaged, if the building does not have power or water, then it will be closed for business anyway!

The green line in the figure to the right denotes the probability of exceedance for the ratio of buildings which cannot be occupied (RUO) due to physical damage. The dotted line relates to the ratio of functional loss of buildings (RFL) which is due to any combination of direct damage and utility loss. Clearly, the RUO is a conservative estimate compared to RFL. For example, there is only a 40% chance that 40+% of the buildings will be directly damaged to the extent of restricted occupancy. However, that number jumps to 80% when the utilities are considered!
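Exceedance curves like the RUO and RFL lines are straightforward to derive from the simulation output: for each threshold, count the fraction of realizations whose loss ratio exceeds it. A minimal sketch with invented ratios (not the paper's data):

```python
# Ratio of non-functional buildings in each simulated realization (made up).
ratios = [0.12, 0.35, 0.41, 0.28, 0.55, 0.47, 0.33, 0.62, 0.25, 0.44]

def prob_exceedance(samples, threshold):
    """Empirical P(X > threshold): fraction of realizations above the threshold."""
    return sum(1 for x in samples if x > threshold) / len(samples)

# One point per threshold traces out the exceedance curve.
curve = {t / 10: prob_exceedance(ratios, t / 10) for t in range(11)}
```

Computing one curve from physical-damage ratios (RUO) and another from combined damage-plus-utility ratios (RFL) and plotting them together reproduces the kind of comparison shown in the figure.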

Teamwork

This work represents a wonderful collaborative effort within the CORE lab. Weili Zhang developed the interdependency model and worked closely with Peihui Lin, who provided the building analyses. And both worked closely with Xianwu Xue, the GIS expert. And of course, I am always pleased to work with my colleague Naiyu Wang in Civil Engineering. We have much, much collaborative work already in progress and planned for the future!

I was happy to represent the Analytics Lab recently as a part of a larger team from OU who were invited down to Dallas, TX near Love Field to meet with Southwest Airlines (SWA) to learn more about the airline business and operations. The attendees from OU included the Vice President of Research; directors from the School of Computer Science in the Gallogly College of Engineering and Management Information Systems in the Price College of Business; senior researchers and specialists from political science, psychology, computer science, and of course, data science.

We were privileged to take a tour of the famous Southwest Airlines Network Operations Control, a.k.a., the NOC. This facility and the employees who work here are at the very core of the SWA network operations. From dispatchers to air traffic control specialists to flight operations to maintenance to crew schedulers to weather analysts — this is where the major operational decisions are made.

The unique look of the NOC, bathed in blue as it is, was designed scientifically to help with mood and to reduce eye strain. And, well, it simply looks cool.

While we were at the NOC, it so happened that Southwest Airlines was actively engaged in planning for the expected impacts from the impending Hurricane Harvey. Obviously, weather, and especially major weather events like hurricanes, plays a huge role in flight delays and cancellations for all airlines. Such disruptive events can have impacts across an entire transportation network. Analyzing and optimizing under this larger “system-wide” view is what ISEs are famous for. These are hard problems, but they are worth solving!

The analytics lab at the University of Oklahoma is actively pursuing research in various aspects of data science and analytics, with particular interest in enhancing "community resilience" to natural hazards and disruptive events through analysis of complex interdependent networks, predictive modeling, and optimal allocation of resources for both mitigation and recovery.

Dr. Charles Nicholson, an Assistant Professor in the School of Industrial and Systems Engineering, is the Analytics Lab director. The lab team members and active collaborators include Masters and PhD students in Data Science and Analytics, Industrial and Systems Engineering, and Civil Engineering.