Abstracts of Talks and Panel Discussions

Multimodal magnetic resonance imaging can generate huge amounts of data. Not all of it is
informative of brain anatomy or function, but in conventional hypothesis-driven studies data reduction may
result in critical loss of information. This is an issue in the study of autism spectrum disorders (ASD) because
there is little consensus on what brain features or functional systems may constitute the ‘core’ of the disorder.
I will present examples of highly multivariate MRI datasets in ASD and show how narrowly focused hypothesis-driven
approaches can miss the ‘big picture’. I will then turn to some exploratory data-driven studies, which aim to
uncover imaging biomarkers of ASD. Trade-offs between sample size, data quality, and coverage of anatomical and
functional brain features through multimodal imaging remain a challenge. Nonetheless, data mining in neuroimaging
provides a promising approach to identifying currently unknown ASD subtypes, which may be linked to specific genetic
(or epigenetic) risk factors and may respond to specifically tailored treatments. The challenges in ASD research
discussed here exemplify those encountered in the study of other disorders (e.g., fetal alcohol syndrome,
dyslexia, Alzheimer’s disease) by other members of the Center for Clinical and Cognitive Neuroscience.
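The exploratory, data-driven subtype discovery described above can be sketched, in highly simplified form, as clustering of per-subject feature vectors. The sketch below runs a plain k-means loop on fabricated two-feature "subjects"; real analyses would involve many more imaging features, careful preprocessing, and validated cluster counts:

```python
import math

# Toy "subjects": each row is a feature vector one might derive from
# multimodal MRI (e.g., a cortical thickness and a connectivity measure).
# All values are fabricated for illustration only.
subjects = [
    (2.1, 0.30), (2.2, 0.28), (2.0, 0.33), (2.3, 0.31),   # putative subtype A
    (3.4, 0.55), (3.5, 0.58), (3.3, 0.52), (3.6, 0.57),   # putative subtype B
]

def kmeans(points, centroids, iters=10):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [math.dist(p, c) for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    labels = [min(range(len(centroids)),
                  key=lambda k: math.dist(p, centroids[k])) for p in points]
    return labels, centroids

labels, centroids = kmeans(subjects, centroids=[(2.0, 0.3), (3.6, 0.6)])
print(labels)  # the first four subjects fall in one cluster, the last four in the other
```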

It’s well known that the substantial increase in data volume produced by high‐resolution Earth modeling systems
or derived from satellite observations poses a great challenge: staging, handling, and managing these data and
extracting scientific insights from them. We believe that efficiently handling and analyzing these massive data sets,
from terabytes for short‐term runs to petabytes for long‐term runs, requires innovative thought processes and approaches.
To achieve these goals, I will introduce (1) the scalable Concurrent Visualization (CV) technology and (2) the multi‐level
Parallel Ensemble Empirical Mode Decomposition (PEEMD) method, both developed at NASA. In CV, a simulation code is
instrumented such that its data can be extracted for analysis while the simulation is running without having to write the
data to disk. By avoiding file system I/O, CV provides much higher temporal resolution than is possible with traditional
post‐processing. The original Empirical Mode Decomposition (EMD; Huang et al., 1998) and the ensemble EMD (Wu et al., 2009)
were developed for multiscale analysis because the processes involved are non‐stationary and nonlinear. To efficiently
analyze high‐resolution, global, multi‐dimensional data sets, we implement multi‐level parallelism in the ensemble EMD
and obtain a parallel speedup of 720 using 200 eight‐core processors.
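The reported figures permit a quick back-of-the-envelope check of parallel efficiency. The sketch below computes the efficiency of a speedup of 720 on 200 eight-core processors (1,600 cores) and, as an idealized reading via Amdahl's law, the serial fraction that would cap the speedup at that level; the real run's losses would also include communication and load imbalance:

```python
# Parallel efficiency of the reported PEEMD run: speedup of 720 on
# 200 eight-core processors (1,600 cores total).
cores = 200 * 8
speedup = 720
efficiency = speedup / cores
print(f"parallel efficiency: {efficiency:.1%}")        # 45.0%

# Amdahl's law, S = 1 / (f + (1 - f)/N), solved for the serial fraction f
# that would limit the speedup to 720 on N = 1600 cores (idealized model).
N = cores
f = (N / speedup - 1) / (N - 1)
print(f"implied serial fraction: {f:.4f}")             # ~0.0008
```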

Co-authors: Suzanne Hughes, Vincent Berardi, John Bellettiere, Neil Klepeis, Saori Obayashi, Jennifer Jones, Sandy Liles and Marie Boman-Davis
Principles of Behavior have been scientifically vetted for more than 100 years. Pavlov showed that reflex behavior is elicited
by novel stimuli; Skinner showed that “voluntary” or operant behavior could be selected by its consequences.
Yet, basic science has not yet offered a reliable means of engineering human behavior. This is likely due to the lack of
necessary tools and incompletely-specified theory. Real time and continuous measures and interventions guided by our Behavioral
Ecological Model (BEM) may set the stage for engineering and sustaining human behavior. Our investigators are conducting
research to alter smoking behavior; separately, accelerometers are being used to shape increased daily step counts to
enhance fitness in high-risk populations, with repeated measures ranging from 1-2 million per family over about 4 months
to billions for family samples as large as 300. These studies serve as models for future multi-disciplinary research that
may culminate in technology that can shape and sustain healthy behavior in individuals and populations. This lightning
presentation will introduce the BEM, show how its use generates big data, and demonstrate preliminary results of
real-time interventions.

Discovering new insights about spatial, temporal, and behavioral patterns from large moving-object data has been a
major challenge in the data science community, particularly since the rapid advancement and deployment of
location-aware technologies and services. The challenge stems from the high volume, high velocity, and high variety
of these data, which make it difficult to efficiently and effectively process and analyze moving objects’ behavior.
In this talk, I will present a suite of analytic methods to reveal human behavioral patterns in space and time
using GPS tracking data. Our data analytics approach involves classifications of movement patterns based on both
geometric and semantic properties of movement data and investigations of associations among movement patterns,
interaction behaviors, geographic contexts, and individual characteristics.
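A minimal version of the geometric side of such movement classification — deriving per-segment speeds from timestamped GPS fixes and labeling stops versus moves — can be sketched as follows. The track, the 0.5 m/s threshold, and the two-class labeling are illustrative assumptions, not the actual method used in the talk:

```python
import math

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    R = 6371000.0
    la1, lo1, la2, lo2 = map(math.radians, (*p, *q))
    a = (math.sin((la2 - la1) / 2) ** 2
         + math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

# (seconds, lat, lon) -- a fabricated track: a stationary minute, then a walk north.
track = [
    (0,   32.7757, -117.0719),
    (60,  32.7757, -117.0719),   # no displacement
    (120, 32.7766, -117.0719),   # ~100 m in 60 s, about 1.7 m/s
    (180, 32.7775, -117.0719),
]

def segment_speeds(track):
    """Speed (m/s) over each consecutive pair of fixes."""
    out = []
    for (t0, *p0), (t1, *p1) in zip(track, track[1:]):
        out.append(haversine_m(p0, p1) / (t1 - t0))
    return out

speeds = segment_speeds(track)
labels = ["stop" if v < 0.5 else "move" for v in speeds]  # illustrative 0.5 m/s threshold
print(labels)  # ['stop', 'move', 'move']
```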

Public interactions that a generation ago were ephemeral are now mediated by the internet and leave a permanent
and accessible record. Combined with large-scale natural language processing and text analysis, this has created
unprecedented opportunities for scientists studying human linguistic behavior. This talk will present an instructive
example, using conversations gathered from an online forum frequented by people who are planning to relocate. Their
discussions typically center around the pros and cons of neighborhoods in various US cities. Using topic modeling,
we mined this corpus to construct conceptual city maps in which distances are derived from neither physical proximity
nor demographic similarity, but rather from the subjective role that neighborhoods play in the popular imagination as
reflected in the text. This map of a city's cultural landscape can be useful to both residents and researchers.
For example, areas which are likely targets of gentrification may look different in the cultural space than other
areas which objectively have very similar attributes. Or, by overlaying cultural maps of different cities, we can match
up equivalent regions, allowing someone to find a neighborhood in an unfamiliar city which plays a similar role to a
neighborhood in a more familiar one.
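One simple way such topic-model output could be turned into a "cultural distance" is to represent each neighborhood by its topic proportions and compare those vectors. The sketch below uses invented three-topic profiles and cosine distance; the talk's actual model, topics, and metric may differ:

```python
import math

# Hypothetical topic-proportion vectors for three neighborhoods, as a topic
# model (e.g., LDA over forum posts) might produce. The topics could be
# things like "nightlife", "schools", "transit"; all values are invented.
profiles = {
    "A": [0.70, 0.20, 0.10],
    "B": [0.65, 0.25, 0.10],
    "C": [0.10, 0.10, 0.80],
}

def cosine_distance(u, v):
    """1 - cosine similarity: 0 for identical directions, up to 1 here."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

d_ab = cosine_distance(profiles["A"], profiles["B"])
d_ac = cosine_distance(profiles["A"], profiles["C"])
print(d_ab < d_ac)  # A and B play similar cultural roles; C does not
```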

Dr. Sam Shen (Math and Statistics, Co-Director of Center for Climate and Sustainability Studies)
Title: Big climate data at SDSU and around the world.

The SDSU Climate Informatics Lab (CIL) was established in 2006 and has been training students in climate data analysis
through Masters thesis research and independent studies. The lab emphasizes error estimation, optimization,
and big data. The lab’s students have been hired by various consulting firms, SPAWAR, financial companies, the
IT industry, and others. Some students choose to teach or to continue PhD studies at SDSU or other top institutions.
SDSU-CIL has many national and international collaborations on big data, such as with NASA Jet Propulsion Lab on
deep ocean data, NOAA National Climatic Data Center on data errors, and the Third Pole Environment program on
Tibetan plateau data. The rapid increase of modeled and observed climate data requires efficient big-data technologies
to facilitate various applications. The Coupled Model Intercomparison Project (CMIP) for the Intergovernmental Panel on
Climate Change (IPCC) is an example of exponential data growth: 1 GB in 1995, 500 GB in 2001, 35 TB in 2007, and 3.5 PB in 2013.
SDSU-CIL is developing cutting-edge tools to meet the challenges of big data analysis and visualization. The CIL’s
latest release is the 2014 SOGP 1.0 software package: a Weather History Time Machine.

There is untapped potential for geographic visualization to expand its reach and impact beyond traditional geocentered
applications. This presentation offers an expanded vision of visualization, one informed foremost by the
recognition of "space" as a useful construct for supporting all kinds of pattern discovery and decision-making.
This applies in particular to the artifacts that are produced and consumed by domain actors in the course of various
knowledge‐based activities. When such geographic concepts as distance, scale, or region are combined with high‐dimensional
approaches, novel techniques and applications can emerge that are applicable to vast collections of structured and
unstructured data. This will be demonstrated with examples ranging from thousands of medical records to millions of
research publications and social media artifacts. The lightning talk will also touch upon the question of how creative
research involving big data and compute‐intensive methods can be translated into meaningful and sustainable innovation
via strategic partnerships and product development.

Co-authors: Melbourne Hovell, Suzanne Hughes, John Bellettiere, Neil Klepeis, Saori Obayashi, Jennifer Jones, Sandy Liles and Marie Boman-Davis
Project Fresh Air is a clinical trial aiming to reduce secondhand smoke (SHS) exposure via feedback from air particle monitors that measure air quality every ten seconds over the course of several months. Each of the approximately 300 homes is fitted with two such monitors, which yields roughly 1-2 million data points per home over the course of the study. The data are transmitted in near real time to web servers, where they are converted to time-series graphs, allowing study personnel to expeditiously appraise participant progress and to identify and react to equipment failures. Once a home has completed the intervention, substantial processing is required to convert the data to a format appropriate for analysis. Many options exist when interpreting the data, including whether to assess the study as a whole or to focus on single-home units. In either case, effects can be gauged graphically or through statistical procedures. The time scale of the data can range from seconds to weeks, which significantly affects analysis. Additionally, missing and anomalous data must be taken into account. This presentation will provide an overview of the management and interpretation complications that occur when big data moves from the abstract to the tangible. Ultimately, such data should be converted “automatically” and in real time into analytical products that serve practical and scientific outcomes.
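One small piece of such a processing pipeline — coarsening 10-second readings to a longer time scale while tolerating gaps — can be sketched as follows. The readings, the dropped sample, and the one-minute target scale are fabricated for illustration:

```python
from datetime import datetime, timedelta
from statistics import mean

# Simulated 10-second particle readings (values invented); one reading is
# dropped to mimic a transmission gap.
start = datetime(2014, 3, 1, 12, 0, 0)
readings = [(start + timedelta(seconds=10 * i), v)
            for i, v in enumerate([12, 14, 13, 15, 40, 42, 41, 39, 38, 37, 36])]
del readings[3]  # missing sample at 12:00:30

def minute_averages(readings):
    """Bucket readings into whole minutes and average each bucket --
    the kind of coarsening needed before week-scale analysis. Buckets
    simply average whatever samples survive, so gaps do not break it."""
    buckets = {}
    for t, v in readings:
        buckets.setdefault(t.replace(second=0, microsecond=0), []).append(v)
    return {k: mean(vs) for k, vs in sorted(buckets.items())}

for minute, avg in minute_averages(readings).items():
    print(minute.strftime("%H:%M"), round(avg, 1))
```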

Co-Authors: O'Connell, C., et al.
The purpose of this research was to examine the web of global security concerns from a network perspective and to map salient relationships among constituent elements. An action research process was used to identify relationships between the seemingly disparate elements of crime, corruption, proliferation, sustainability, and health and disaster management currently plaguing the Middle East and North Africa, and to illustrate how these elements connect, overlap, and potentially fuel each other. By visualizing the connections between actors within the arena of global security using social network analysis software, a holistic understanding of influence within criminal networks was achieved. Visually coding connections between elements helped researchers determine the strength and validity of current research in multiple domains, exposed non-obvious relationships between elements, and revealed knowledge gaps with regard to geographic and attribute spaces.
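A degree-centrality computation of the kind underlying such a network analysis can be sketched on a toy graph of the elements named above; the edges are invented for illustration and are not drawn from the study's data:

```python
from collections import defaultdict

# A toy network of global-security elements; each edge marks a hypothesized
# link between two elements (links invented for illustration).
edges = [
    ("crime", "corruption"),
    ("crime", "proliferation"),
    ("corruption", "proliferation"),
    ("corruption", "health"),
    ("sustainability", "health"),
]

# Build an undirected adjacency structure.
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# Degree centrality: the fraction of other nodes each element touches directly.
n = len(adj)
centrality = {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}
most = max(centrality, key=centrality.get)
print(most)  # 'corruption' has the most direct ties in this toy network
```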