we directly compare the incidence rates of women’s and men’s cancers in the United States to the corresponding levels of traffic that these cancers elicited during World Cancer Day across two social media platforms, Twitter and Instagram.

we examine social media activity for breast cancer versus prostate cancer on both Twitter and Instagram during the dedicated month-long campaigns (October and November, respectively).

we compare the top terms associated with each campaign on these two social media platforms to discover whether there are differences in the terms associated with these online discussions.

Below you can read the abstract of our paper and see some of our results; the full citation and a link to the paper are at the bottom of the post.

Abstract:

Social media are often heralded as offering cancer campaigns new opportunities to reach the public. However, these campaigns may not be equally successful, depending on the nature of the campaign itself, the type of cancer being addressed, and the social media platform being examined. This study is the first to compare social media activity on Twitter and Instagram across three time periods: #WorldCancerDay in February, the annual month-long campaigns of National Breast Cancer Awareness Month (NBCAM) in October and Movember in November, and during the full year outside of these campaigns. Our results suggest that women’s reproductive cancers – especially breast cancer – tend to outperform men’s reproductive cancer – especially prostate cancer – across campaigns and social media platforms. Twitter overall generates substantially more activity than Instagram for both cancer campaigns, suggesting Instagram may be an untapped resource. However, the messaging for both campaigns tends to focus on awareness and support rather than on concrete actions and behaviors. We suggest health communication efforts need to focus on effective messaging and building engaged communities for cancer communication across social media platforms.

A comparison of percentages of cancer cases (green bars) and references to corresponding cancers in Twitter (blue bar) and Instagram (orange bar) during World Cancer Day 2016.
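As a rough illustration of the kind of comparison shown in the figure above, the following Python sketch converts raw counts into percentage shares and divides each cancer's share of social media references by its share of cases. The function names and all numbers are placeholders made up for illustration; they are not the paper's data or code.

```python
def shares(counts):
    """Convert raw counts into percentage shares that sum to 100."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {name: 100.0 * value / total for name, value in counts.items()}


def representation_ratio(case_counts, mention_counts):
    """Share of mentions divided by share of cases, per cancer type.

    A ratio above 1 means a cancer draws more references than its share of
    cases alone would suggest; a ratio below 1 means it draws fewer.
    """
    case_share = shares(case_counts)
    mention_share = shares(mention_counts)
    return {
        name: mention_share.get(name, 0.0) / case_share[name]
        for name in case_counts
        if case_share[name] > 0
    }


# Placeholder numbers for illustration only; they are NOT from the paper.
example_cases = {"breast": 50, "prostate": 30, "other": 20}
example_mentions = {"breast": 700, "prostate": 100, "other": 200}
ratios = representation_ratio(example_cases, example_mentions)
```

With these made-up inputs, the first category ends up over-represented relative to its share of cases and the second under-represented, which is the kind of gap the bars in the figure are meant to make visible.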

References to breast cancer (green line), prostate cancer (orange line), and Movember (blue line) over the full year 2015 in Instagram.

Continuing our work on geosocial analysis we recently had a paper entitled “Social Media Engagement with Cancer Awareness Campaigns Declined During the 2016 U.S. Presidential Election” published in World Medical and Health Policy. In the paper we…

Below you can read the abstract of the chapter, see some of the figures we used to support our discussion, along with the full reference and a pdf proof of the chapter. As always any thoughts or comments are welcome.

Abstract:

Big Data (BD) offers researchers the scope to simulate population behavior through vastly more powerful Agent Based Models (ABMs), presenting exciting opportunities in the design and appraisal of policies and plans. Agent-based simulations capture system richness by representing micro-level agent choices and their dynamic interactions. They aid analysis of the processes which drive emergent population level phenomena, their change in the future, and their response to interventions. The potential of ABMs has led to a major increase in applications, yet models are limited in that the individual-level data required for robust, reliable calibration are often only available in aggregate form. New (‘big’) sources of data offer a wealth of information about the behavior (e.g. movements, actions, decisions) of individuals. By building ABMs with BD, it is possible to simulate society across many application areas, providing insight into the behavior, interactions, and wider social processes that drive urban systems. This chapter will discuss, in context of urban simulation, how BD can unlock the potential of ABMs, and how ABMs can leverage real value from BD. In particular, we will focus on how BD can improve an agent’s abstract behavioral representation and suggest how combining these approaches can both reveal new insights into urban simulation, and also address some of the most pressing issues in agent-based modeling; particularly those of calibration and validation.

Keywords: Agent-based models, Big Data, Emergence, Cities.

The growth in agent-based modeling, based on search results from Web of Science and Google Scholar.

Hotspots of activity of Twitter users: Tweet locations and associated densities for a selection of prolific users.

Building on our work on narratives and social media at the 15th International Symposium on Spatial and Temporal Databases (SSTD’17) we have a paper entitled: “Predicting the Evolution of Narratives in Social Media.” In the paper we discuss briefly the …

In the paper we explored how health narratives and event storylines pertaining to the recent Zika outbreak emerged in social media and how it related to news stories and actual events.

Specifically, we combined actors (e.g. Twitter users), locations (e.g. where the tweets originated) and concepts (e.g. emerging narratives such as pregnancy) to gain insights on the mechanisms that drive participation, contributions, and interactions on social media during a disease outbreak. Below you can read a summary of our paper along with some of the figures which highlight our methodology and findings.

An overview of the Twitter narrative analysis approach, starting with data collection, and proceeding with preprocessing and data analysis to identify narrative events, which can be used to build an event storyline.

Abstract:

Background: The recent Zika outbreak witnessed the disease evolving from a regional health concern to a global epidemic. During this process, different communities across the globe became involved in Twitter, discussing the disease and key issues associated with it. This paper presents a study of this discussion in Twitter, at the nexus of location, actors, and concepts.

Objective: Our objective in this study was to demonstrate the significance of 3 types of events: location related, actor related, and concept related for understanding how a public health emergency of international concern plays out in social media, and Twitter in particular. Accordingly, the study contributes to research efforts toward gaining insights on the mechanisms that drive participation, contributions, and interaction in this social media platform during a disease outbreak.

Methods: We collected 6,249,626 tweets referring to the Zika outbreak over a period of 12 weeks early in the outbreak (December 2015 through March 2016). We analyzed this data corpus in terms of its geographical footprint, the actors participating in the discourse, and emerging concepts associated with the issue. Data were visualized and evaluated with spatiotemporal and network analysis tools to capture the evolution of interest on the topic and to reveal connections between locations, actors, and concepts in the form of interaction networks.

Results: The spatiotemporal analysis of Twitter contributions reflects the spread of interest in Zika from its original hotspot in South America to North America and then across the globe. The Centers for Disease Control and World Health Organization had a prominent presence in social media discussions. Tweets about pregnancy and abortion increased as more information about this emerging infectious disease was presented to the public and public figures became involved in this.

Conclusions: The results of this study show the utility of analyzing temporal variations in the analytic triad of locations, actors, and concepts. This contributes to advancing our understanding of social media discourse during a public health emergency of international concern.

Spatiotemporal participation patterns and identifiable clusters over 4 weeks of our twelve-week study. The top left panel shows the data during the first week, and time progresses from left to right and from top to bottom.

Subsets of the full retweet network pertaining to the WHO (left) and CDC (right), and clusters identified within them. Magenta clusters are centered upon health entities, green upon news organizations, orange upon political entities.

Figure 1: Map Mashup of Twitter data, where each dot represents a tweet, the text corresponds to the selected tweet marked with a star. In the recently released “The International Encyclopedia of Geography: People, the Earth, Environment, and Technolo…

“This cutting edge special issue responds to the latest digital revolution, setting out the state of the art of the new technologies around so-called Big Data, critically examining the hyperbole surrounding smartness and other claims, and relating it to age-old urban challenges. Big data is everywhere, largely generated by automated systems operating in real time that potentially tell us how cities are performing and changing. A product of the smart city, it is providing us with novel data sets that suggest ways in which we might plan better, and design more sustainable environments. The articles in this issue tell us how scientists and planners are using big data to better understand everything from new forms of mobility in transport systems to new uses of social media. Together, they reveal how visualization is fast becoming an integral part of developing a thorough understanding of our cities.”

In our paper we discuss and show how crowdsourced data is leading to the emergence of alternate views of urban morphology that better capture the intricate nature of urban environments and their dynamics. Specifically, we show how such data can provide us with information pertaining to linked spaces and geosocial neighborhoods. We argue that a geosocial neighborhood is not defined by its administrative boundaries, planning zones, or physical barriers, but rather by its emergence as an organic self-organized social construct that is embedded in geographical spaces that are linked by human activity. Below is the abstract of the paper and some of the figures we have in it which showcase our work.

“Traditionally urban morphology has been the study of cities as human habitats through the analysis of their tangible, physical artefacts. Such artefacts are outcomes of complex social and economic forces, and their study is primarily driven by traditional modes of data collection (e.g. based on censuses, physical surveys, and mapping). The emergence of Web 2.0 and through its applications, platforms and mechanisms that foster user-generated contributions to be made, disseminated, and debated in cyberspace, is providing a new lens in the study of urban morphology. In this paper, we showcase ways in which user-generated ‘big data’ can be harvested and analyzed to generate snapshots and impressionistic views of the urban landscape in physical terms. We discuss and support through representative examples the potential of such analysis in revealing how urban spaces are perceived by the general public, establishing links between tangible artefacts and cyber-social elements. These links may be in the form of references to, observations about, or events that enrich and move beyond the traditional physical characteristics of various locations. This leads to the emergence of alternate views of urban morphology that better capture the intricate nature of urban environments and their dynamics.”

Over the summer, Arie Croitoru and I took part in the George Mason University Aspiring Scientists Summer Internship Program. We worked with three very talented high-school students who, over the course of the seven-and-a-half-week program, produced some excellent research around the areas of agent-based modeling and social media analysis. An overview of their work can be seen in the posters and abstracts that the students produced at the end of the internship.

Lawrence Wang explored how social media could be used to predict election results in a project entitled “And the Winner Is? Predicting Election Results using Social Media”. Below you can read Lawrence’s abstract and see his poster.

“The 2012 U.S. presidential election demonstrated how Twitter can serve as a widely accessible forum of political discourse. Recently, researchers have investigated whether social media, particularly Twitter, can function as a predictive tool. In the past decade, multiple studies have claimed to successfully predict the results of elections using Twitter data. However, many of these studies fail to account for the inherent population bias present in Twitter data, leading to ungeneralizable results. In this project, I investigate the prospects of using Twitter data as an alternative to poll data for predicting the 2012 presidential election. The tweet corpus consisted of tweets published one month before the November election day. Using VADER, a sentiment analysis tool, I analyzed over 140,000 tweets for political sentiment. I attempted to circumvent the Twitter population bias by comparing age, race, and gender metrics of the Twitter population with that of the U.S. population. Furthermore, I utilized Bayesian inference with prior distributions from the results of the 2008 presidential election in order to mitigate the effects of limited tweet data in certain states. The resulting model correctly predicted the likely outcomes of 46 of the 50 states and predicted that President Obama would be reelected with a probability of 0.945. Such a model could be used to explore the forthcoming elections.”
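To give a sense of how a prior taken from a previous election can stabilize estimates in states where tweets are scarce, here is a minimal Beta-Binomial sketch in Python. It illustrates the general technique only and is not Lawrence's actual model: the prior strength, the tweet counts, and the function names are all hypothetical.

```python
import random


def posterior_params(prior_share, prior_strength, tweets_for, tweets_against):
    """Update a Beta prior with counts of favorable and unfavorable tweets.

    prior_share is the candidate's vote share in the previous election and
    prior_strength is how many pseudo-observations that prior is worth
    (both are modeling assumptions made for this sketch).
    """
    a = prior_share * prior_strength + tweets_for
    b = (1.0 - prior_share) * prior_strength + tweets_against
    return a, b


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def prob_majority(a, b, draws=20000, seed=0):
    """Monte Carlo estimate of P(share > 0.5) under Beta(a, b)."""
    rng = random.Random(seed)
    wins = sum(rng.betavariate(a, b) > 0.5 for _ in range(draws))
    return wins / draws


# Hypothetical state with very few tweets: the estimate stays near the prior.
sparse_a, sparse_b = posterior_params(0.55, 100, tweets_for=4, tweets_against=6)

# Hypothetical state with many tweets: the data dominate the prior.
dense_a, dense_b = posterior_params(0.55, 100, tweets_for=4000, tweets_against=6000)
```

In the sparse case the posterior mean stays close to the prior share of 0.55, while in the dense case it moves to roughly the observed 40% of favorable tweets. Treating each tweet as an independent vote is a strong simplification, and it ignores the population bias that the abstract discusses.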

In a second project, Varun Talwar explored how knowledge bases could be utilized to better contextualize social media discussions with a project entitled “Context Graphs: A Knowledge-Driven Model for Contextualizing Twitter Discourse.” Below you can read Varun’s project abstract and his end of project poster.

“Introduction: User posted content through online social media (SM) platforms in recent years has emerged as a rich field for narrative analysis of topics captured during the discussion discourse. In particular, collective discourse has been used to manually contextualize public perception of health related events.

Objective: As SM feeds tend to be noisy, automated detection of the context of a given SM discourse stream has proven to be a challenging task. The primary objective of this research is to explore how existing knowledge bases could be utilized to better contextualize SM discussions through topic modeling and mining. By utilizing such existing knowledge it would then be possible to explore to what extent a given discourse is related to a known or a new context, as well as compare and contrast SM discussions through their respective contexts.

Methods: In order to accomplish these goals this research proposes a novel approach for contextualizing SM discourse. In this approach, topic modeling is combined with a knowledgebase in a two-step process. First, key topics are extracted from a SM data corpus by applying a statistical topic-modeling algorithm, a process that also results in data dimensionality reduction. Once a set of salient topics are extracted, each topic is then used to mine the knowledge base for sub graphs that represent the contextual linkages between knowledge elements. Such sub-graphs can then further disambiguate the topic modeling results, and be utilized for qualifying context similarity across SM discussions.

Results: The time-series analysis of the Twitter discourse via graph-matching algorithms reveals the change in topics as evidenced by the emergence of the terms “pregnancy” and “abortion” as information about the virus propagated through the Twitter community.”

Elizabeth Hu explored the current migration crisis in Europe in a project entitled “Across the Sea: A Novel Agent-Based Model for the Migratory Patterns of the European Refugee Crisis”. Below is Elizabeth’s abstract, poster and an example model run.

“Since 2010, a growing number of refugees have sought asylum in European nations, fleeing violence and military conflict in their home countries. Most of the refugees originate from Syria, Iraq, Afghanistan, and African nations. The vast majority of refugees risk their lives in the popular yet perilous Mediterranean Sea Route often prone to boat accidents and subsequent deaths of migrants. The flow of millions of refugees has introduced a humanitarian crisis not seen since World War II. European nations are struggling to cope with the influx of refugees through various border policies.

In order to explore this crisis, a geographically explicit agent-based model has been developed to study the past and future patterns of refugee flows. Traditional migration models, which represent the population as an aggregate, fail to consider individual decision-making processes based on personal status and intervening opportunities. However, the novel agent-based model developed here of migration allows population behavior to emerge as the result of individual decisions. Initial population, city, and route attributes are based upon data from the UNHCR, EU agencies, crowd-sourced databases, and news articles. The agents, refugees, select goal destinations in accordance with the Law of Intervening Opportunities. Thus, goals are prone to change with fluctuating personal needs. Agents choose routes not only based on distance, but also other relevant route attributes. The resulting migration flows generated by the model under various circumstances could provide crucial guidance for policy and humanitarian aid decisions.”
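As a rough sketch of how destination choice under the Law of Intervening Opportunities might look in code, the Python snippet below weights each candidate destination by its own opportunities relative to all opportunities at or before it along the route. This is a simplified, Stouffer-style weighting written for illustration; it is not Elizabeth's model, and the destination names, distances, opportunity counts, and the choice of denominator are all assumptions.

```python
def intervening_opportunity_weights(destinations):
    """Return choice probabilities for a list of candidate destinations.

    destinations is a list of (name, distance, opportunities) tuples.
    Each destination is scored by its opportunities divided by the
    cumulative opportunities up to and including it, so that nearer
    opportunities "intervene" and reduce the pull of farther ones.
    Including the destination's own opportunities in the denominator is
    an assumption made here to avoid dividing by zero for the nearest one.
    """
    ordered = sorted(destinations, key=lambda d: d[1])  # nearest first
    cumulative = 0.0
    scores = {}
    for name, _distance, opportunities in ordered:
        cumulative += opportunities
        scores[name] = opportunities / cumulative if cumulative > 0 else 0.0
    total = sum(scores.values())
    if total == 0:
        raise ValueError("at least one destination needs opportunities > 0")
    return {name: score / total for name, score in scores.items()}


# Hypothetical destinations: (name, distance in km, number of opportunities).
example = [("A", 100, 10), ("B", 300, 10), ("C", 200, 20)]
weights = intervening_opportunity_weights(example)
```

Here the farthest destination ("B") receives the smallest weight even though it offers as many opportunities as the nearest one, because the opportunities at "A" and "C" intervene. An agent-based model could re-run this weighting whenever an agent's needs or the available opportunities change.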

The movie below gives a sense of the migration paths the refugees are taking.

Following on with our GeoSocial Analysis work, we recently had a paper published in PLOS ONE entitled “Crowdsourcing A Collective Sense of Place.” In the paper we discuss and showcase how one can take a quantitative approach to derive a collectiv…

We recently received word that we had a paper accepted for the upcoming 2016 International Conference on Social Media and Society entitled “Accuracy Of User-Contributed Image Tagging In Flickr: A Natural Disaster Case Study.” In the paper w…

Megacities, which can be roughly defined as cities with a population of over 10 million people, are on the increase due to ongoing urbanization trends. The United Nations notes that since the 1970s the number of megacities has more than tripled (from 8 to 34), and is expected to double again by 2050 (to exceed 60).

Due to ongoing urbanization trends the worldwide urban population is projected to grow from half of the global population (today) to two thirds of it by 2030. Almost all the new megacities that will emerge through this process are in geopolitical hotspots of southeast Asia and sub-Saharan Africa. Therefore, the U.S. Department of Defense must consider the challenges presented by engagement in such environments when planning for the future. The physical challenge of operating in such dense, highly three-dimensional environments is only compounded by the added challenge presented by the advanced functional complexity of these environments: megacities function at the intersection of the physical, social, and cyber spaces. Accordingly, military operations in these locations must prepare to engage in environments where news, ideas, and opinions are shaped in cyberspace and propagated across the physical urban landscape. As social networks connect (or, often, divide) populations they form communities and facilitate their mobilization.

We have observed these processes time and again, from the streets of Cairo during the Arab Spring, to the streets of Tokyo during the Fukushima nuclear disaster, and the streets of Paris during the recent ISIL terrorist attacks. Advancing our capability to analyze crowd-generated content in the form of social media feeds is a substantial scientific challenge with considerable implications for future DoD operations. In this publication, we use representative examples to demonstrate the opportunities and challenges associated with such information, especially as they relate to large urban areas.

An emerging framework to study urban systems.

Social networks embedded within a geographical context, leading to connected, non-contiguous areas.

We recently co-authored a paper entitled “Lessons from the Ebola Outbreak: Action Items for Emerging Infectious Disease Preparedness and Response” in EcoHealth with several other researchers from George Mason University. In the paper we discuss t…

This week I am attending the AAG Annual Meeting in Chicago. While here, we organized 3 sessions entitled “Geosimulation and Big Data: A Marriage made in Heaven or Hell?” in which I presented a paper, co-authored with Sarah Wise: “Leveraging Crowdsource…

After a very full first day, the second day opened with a breakfast that provided opportunity to meet the board of the Citizen Science Association (CSA), and to have a nice way to talk with people who got up early (starting at 7am) for another full day of citizen science. Around the breakfast tables, new […]

“Urban form and function have been studied extensively in urban planning and geographic information science. However, gaining a greater understanding of how they merge to define the urban morphology remains a substantial scientific challenge. Towards this goal, this paper addresses the opportunities presented by the emergence of crowdsourced data to gain novel insights into form and function in urban spaces. We are focusing in particular on information harvested from social media and other open-source and volunteered datasets (e.g. trajectory and OpenStreetMap data). These data provide a first-hand account of form and function from the people who define urban space through their activities. This novel bottom-up approach to study these concepts complements traditional urban studies work to provide a new lens for studying urban activity. By synthesizing recent advancements in the analysis of open-source data we provide a new typology for characterizing the role of crowdsourcing in the study of urban morphology. We illustrate this new perspective by showing how social media, trajectory, and traffic data can be analyzed to capture the evolving nature of a city’s form and function. While these crowd contributions may be explicit or implicit in nature, they are giving rise to an emerging research agenda for monitoring, analyzing and modeling form and function for urban design and analysis.”

This paper builds on and considerably extends our prior work on crowdsourcing and on volunteered and ambient geographic information. In the scope of this paper we use the term ‘urban form’ to refer to the aggregate of the physical shape of the city, its buildings, streets, and all other elements that make up the urban space: in essence, the geometry of the city. In contrast, we use the term ‘urban function’ to refer to the activities that are taking place within this space. To this end we contrast how crowdsourced data can relate to more traditional sources of such information, both explicitly and implicitly, as shown in the table below.

A typology of implicit and explicit form and function content

In addition, we discuss in the paper how these new sources of data, which are often at finer resolutions than more authoritative data, are allowing us to customize the way we aggregate the data at various geographical levels, as shown below. Such aggregations can range from building footprints and addresses to street blocks (e.g. for density analysis), or street networks (e.g. for accessibility analysis). For large-scale urban analysis we can revert to the use of zonal geographies or grid systems.

Aggregation methods for varied scales of built environment analysis
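As a rough illustration of the coarsest option mentioned above, a grid system, the following sketch counts point observations (for example, geolocated posts) per square cell. It assumes projected coordinates in consistent units; the function name and the sample points are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math

def grid_counts(points, cell_size):
    """Count (x, y) points per square grid cell of side cell_size."""
    counts = Counter()
    for x, y in points:
        # floor() keeps cell indices consistent for negative coordinates too.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts

sample = [(0.1, 0.2), (0.9, 0.8), (1.5, 0.2), (-0.5, 0.5)]
cells = grid_counts(sample, cell_size=1.0)
```

Changing `cell_size` is the simplest way to move between finer and coarser levels of aggregation; blocks or street networks would need polygon or network geometry instead of a regular grid.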

In the application section of the paper we highlight how we can extract implicit form and function from crowdsourced data. The image below, for example, shows how we can take information from Twitter and differentiate neighborhoods over space and time.

Neighborhood map and topic modeling results showing the mixture of social functions in each area.

Finally, in the paper we outline an emerging research agenda related to the “persistent urban morphology concept” as shown below, specifically how crowdsourcing is changing how we collect, analyze and model urban morphology. Moreover, we discuss how this new paradigm provides a new lens for studying how cities operate, at much finer temporal, spatial, and social scales than we have been able to study so far.

In particular, we discuss the notion of physical presence within social media and its importance for exploring the relation between the cyber and the physical domains. We discuss how communities and groups can be detected in both the cyber and physical space, and how they can be processed to form a ‘hybrid’ geosocial view of communities using social network analysis, community detection (the Louvain method) and DenStream. To showcase these concepts and their benefits, we present the analysis of two case studies that make use of Twitter data associated with two different types of events: a planned activity during the Occupy Wall Street (OWS) Day of Action (November 17th, 2011), and the response to the Boston Marathon Bombing (April 15, 2013). We conclude with a summary and outlook. Below is the abstract of the paper:

Over the last decade we have witnessed a significant growth in the use of social media. Interactions within their context lead to the establishment of groups that function at the intersection of the physical and cyber spaces, and as such represent hybrid communities. Gaining a better understanding of how information flows in these hybrid communities is a substantial scientific challenge with significant implications on our ability to better harness crowd-contributed content. This paper addresses this challenge by studying how information propagates and evolves over time at the intersection of the physical and cyber spaces. By analyzing the spatial footprint, social network structure, and content in both physical and cyber spaces we advance our understanding of the information propagation mechanisms in social media. The utility of this approach is demonstrated in two real-world case studies, the first reflecting a planned event (the Occupy Wall Street – OWS – movement’s Day of Action in November 2011), and the second reflecting an unexpected disaster (the Boston Marathon bombing in April 2013). Our findings highlight the intricate nature of the propagation and evolution of information both within and across cyber and physical spaces, as well as the role of hybrid networks in the exchange of information between these spaces.
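For readers unfamiliar with the Louvain method mentioned above: it searches for a partition of a network that maximizes a score called modularity. The sketch below only computes that score for a given partition of an undirected, unweighted graph without self-loops; it is not the Louvain optimization itself, and the function name and toy graph are illustrative assumptions.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity of a partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # edges with both ends inside the community
    degree = defaultdict(int)    # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Observed share of internal edges minus the share expected at random.
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy graph: two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
score = modularity(toy_edges, split)
```

Placing every node in a single community gives a score of zero, which is why a partition that separates the two triangles scores higher.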

Research highlights include:

Our analysis includes two major events as captured in Twitter.

The themes in cyber and physical communities tend to converge over time.

Messages among physical space users are more consistent at the onset of the event.

Geolocated users are consuming information more than they produce.

Below are some of the images from the paper. The first image shows how one can think of the relationships between physical and cyber spaces. The next image provides an overview of our geosocial analysis framework for examining cyber and physical communities.

Our Geosocial analysis framework

In the figure below we show an example of using DenStream for spatiotemporal clustering and how the process can capture the protest activities that were planned for the Occupy Wall Street movement’s Day of Action. Each dot corresponds to the originating location of a geolocated tweet; the color of each point indicates the time of the corresponding tweet, ranging from dark blue (early morning, 0) to dark red (late night, 1). Each circle represents a specific spatiotemporal cluster. For example, the circle labeled A marks the start of the day, when people congregated around Wall Street, while the circle labeled C shows a cluster at Foley Square.

Physical space groups identified in the lower Manhattan area. Each dot corresponds to the originating location of a geolocated tweet; the color of each point indicates the time of the corresponding tweet, ranging from dark blue (early morning, 0) to dark red (late night, 1).
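To give a feel for what spatiotemporal clustering of geolocated tweets involves, here is a small sketch. Note that it is not DenStream, which is a streaming density-based algorithm; it swaps in a simpler batch, DBSCAN-style procedure with separate distance and time thresholds. The function name, thresholds, and sample points are assumptions made for illustration, with time scaled from 0 to 1 as in the figure.

```python
import math

def cluster_space_time(points, eps_space, eps_time, min_pts):
    """Label (x, y, t) points with cluster ids; -1 marks noise."""
    def neighbors(i):
        xi, yi, ti = points[i]
        # A neighbor must be close in both space and time (includes i itself).
        return [
            j for j, (xj, yj, tj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps_space
            and abs(ti - tj) <= eps_time
        ]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

tweets = [(0.0, 0.0, 0.10), (0.1, 0.0, 0.12), (0.0, 0.1, 0.11),
          (5.0, 5.0, 0.80), (5.1, 5.0, 0.82), (5.0, 5.1, 0.81),
          (10.0, 10.0, 0.50)]
labels = cluster_space_time(tweets, eps_space=0.5, eps_time=0.1, min_pts=3)
```

Using a separate time threshold means that two gatherings at the same place but at different times of day end up in different clusters.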

In the figure below we show one example of linking the cyber and physical communities. Specifically, (a) shows the top five communities (node degree > 100) in the cyber space retweet network (each community is designated by one color); (b) shows the physical space groups; and (c) shows the resulting hybrid meta-network, in which the connections between physical groups (P nodes) and cyber space communities (C nodes) are shown.
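One simple way to think about such a hybrid meta-network is as a bipartite graph whose edge weights count the users that a physical group and a cyber community have in common. The sketch below builds those weights under that assumption; the paper may define the connections differently, and the function name, labels, and user ids here are hypothetical.

```python
from collections import Counter

def hybrid_edges(cyber_membership, physical_membership):
    """Count shared users between physical groups and cyber communities.

    Both arguments map a user id to a group label. The result maps
    (physical_group, cyber_community) pairs to the number of shared users.
    """
    edges = Counter()
    for user, p_group in physical_membership.items():
        c_group = cyber_membership.get(user)
        if c_group is not None:  # user appears in both spaces
            edges[(p_group, c_group)] += 1
    return edges

cyber = {"u1": "C1", "u2": "C1", "u3": "C2", "u4": "C2"}
physical = {"u1": "P1", "u2": "P1", "u3": "P1", "u5": "P2"}
meta_network = hybrid_edges(cyber, physical)
```

Users who appear in only one of the two spaces (such as `u4` and `u5` here) contribute no edge, so the meta-network only reflects the overlap between cyber and physical activity.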

As regular visitors will know, we have been developing our ability to collect and analyze social media. To this end we have just received word from Transactions in GIS that our paper entitled “Triangulating Social Multimedia Content for Event Localizat…

Recently the USGIF published a book entitled “Human Geography: Socio-Cultural Dynamics and Global Security” in which we have a chapter called “Social Media and the Emergence of Open-Source Geospatial Intelligence”. This book has been some time in…

Readers of the blog know that I have an interest in social media, and how through it we can gain an understanding of society at large. The question is how does the cyber community reflect the corresponding physical community? To this end, papers from 6…
