Category Archives: Data sources

Had a nice chat with Flynn about the paper. He agrees generally, and points out that physical security will converge with cybersecurity. I think that the addition of social security to physical and cyber security may also be hovering on the horizon.

Fix directory code of LMN so that it remembers the input and output directories – done
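
A minimal sketch of one way to persist those directories between runs (the settings file name and keys are hypothetical, not the actual LMN code):

import json
from pathlib import Path

SETTINGS_FILE = Path.home() / ".lmn_settings.json"  # hypothetical location

def load_dirs():
    # Return the last-used input/output directories, or empty strings if none saved
    if SETTINGS_FILE.exists():
        with open(SETTINGS_FILE) as f:
            s = json.load(f)
        return s.get("input_dir", ""), s.get("output_dir", "")
    return "", ""

def save_dirs(input_dir, output_dir):
    # Persist the directories so the next session can restore them
    with open(SETTINGS_FILE, "w") as f:
        json.dump({"input_dir": input_dir, "output_dir": output_dir}, f)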

Add time bucketing capabilities. Do this by taking the complete conversation and splitting the results into N sublists. Take the beginning and ending time from each list and then use those to set the timestamp start and stop for each player’s posts.
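
A rough sketch of that bucketing step, assuming each post is a (timestamp, player, text) tuple (the data layout and names are my assumption, not the tool's):

def time_buckets(posts, n):
    # Split a time-ordered conversation into n sublists and record each
    # bucket's start/stop timestamps, which can then be used as the
    # timestamp start and stop for each player's posts in that bucket.
    posts = sorted(posts, key=lambda p: p[0])          # sort by timestamp
    size = max(1, -(-len(posts) // n))                 # ceiling division: posts per sublist
    buckets = []
    for i in range(0, len(posts), size):
        chunk = posts[i:i + size]
        buckets.append({
            "start": chunk[0][0],    # beginning time of this sublist
            "stop": chunk[-1][0],    # ending time of this sublist
            "posts": chunk,
        })
    return buckets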

Thinking about a time-series LMN tool that can chart the relative occurrence of the sorted terms over time. I think this could be done with tkinter. I would need to create an executable as described here, though the easiest answer seems to be pyinstaller.
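
A minimal sketch of what the charting piece could look like, assuming matplotlib for the plot rather than raw tkinter and reusing the time_buckets() sketch above (all names are mine):

from collections import Counter
import matplotlib.pyplot as plt

def plot_term_trends(buckets, terms):
    # Chart the relative occurrence of each term across time buckets
    series = {t: [] for t in terms}
    labels = []
    for b in buckets:
        words = [w.lower() for _, _, text in b["posts"] for w in text.split()]
        counts = Counter(words)
        total = max(len(words), 1)
        for t in terms:
            series[t].append(counts[t] / total)   # relative, not absolute, occurrence
        labels.append(str(b["start"]))
    for t in terms:
        plt.plot(labels, series[t], label=t)
    plt.xlabel("bucket start time")
    plt.ylabel("relative frequency")
    plt.legend()
    plt.show()

Packaging that as a standalone executable would then be something like pyinstaller --onefile lmn_trends.py (script name hypothetical).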

Here are two papers that show the advantages of herding over nomadic behavior:

Predation was a powerful selective force promoting increased morphological complexity in a unicellular prey held in constant environmental conditions. The green alga, Chlorella vulgaris, is a well-studied eukaryote, which has retained its normal unicellular form in cultures in our laboratories for thousands of generations. For the experiments reported here, steady-state unicellular C. vulgaris continuous cultures were inoculated with the predator Ochromonas vallescia, a phagotrophic flagellated protist (‘flagellate’). Within less than 100 generations of the prey, a multicellular Chlorella growth form became dominant in the culture (subsequently repeated in other cultures). The prey Chlorella first formed globose clusters of tens to hundreds of cells. After about 10–20 generations in the presence of the phagotroph, eight-celled colonies predominated. These colonies retained the eight-celled form indefinitely in continuous culture and when plated onto agar. These self-replicating, stable colonies were virtually immune to predation by the flagellate, but small enough that each Chlorella cell was exposed directly to the nutrient medium.

The transition from unicellular to multicellular life was one of a few major events in the history of life that created new opportunities for more complex biological systems to evolve. Predation is hypothesized as one selective pressure that may have driven the evolution of multicellularity. Here we show that de novo origins of simple multicellularity can evolve in response to predation. We subjected outcrossed populations of the unicellular green alga Chlamydomonas reinhardtii to selection by the filter-feeding predator Paramecium tetraurelia. Two of five experimental populations evolved multicellular structures not observed in unselected control populations within ~750 asexual generations. Considerable variation exists in the evolved multicellular life cycles, with both cell number and propagule size varying among isolates. Survival assays show that evolved multicellular traits provide effective protection against predation. These results support the hypothesis that selection imposed by predators may have played a role in some origins of multicellularity.

Overview: The Internet Argument Corpus (IAC) is a corpus for research in political debate on internet forums. It consists of ~11,000 discussions, ~390,000 posts, and some ~73,000,000 words. Subsets of the data have been annotated for topic, stance, agreement, sarcasm, and nastiness, among others.

The Data: The data is stored in JSON files with most annotations in CSV format (see included readme for details). Python code to load and use the data is included. The zip archive is 158MB.
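
A hedged sketch of how one might walk the unzipped archive (paths and file names are guesses for illustration; the included readme and Python code are authoritative):

import json, csv
from pathlib import Path

IAC_ROOT = Path("iac_v1.1")   # hypothetical unzip location

def iter_discussions():
    # Yield each discussion as a parsed JSON object
    for path in sorted(IAC_ROOT.glob("discussions/*.json")):
        with open(path, encoding="utf-8") as f:
            yield json.load(f)

def load_annotations(name):
    # Load one of the CSV annotation tables (e.g. stance, sarcasm) as a list of dict rows
    with open(IAC_ROOT / (name + ".csv"), newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))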

Normal Accidents drew attention to two different forms of organizational structure that Herbert Simon had pointed to years before: vertical integration, and what we now call modularity. Examining risky systems in the Accident book, I focused upon the unexpected interactions of different parts of the system that no designer could have expected and no operator comprehend or be able to interdict.

Reading Charles Perrow's Normal Accidents. Riveting. All about dense, tightly connected networks with hidden information.

Building generators.

Need to change the “stepsize” in the Torrance generator to be variable – done. Here’s my little ode to The Shining:

#confg: {"rows":100, "sequence_length":26, "step":26, "type":"words"}
all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes
jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work
and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a
dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no
play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy
all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes
jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work
and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a
dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no
play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy
all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes
jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work
and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a

Need to be able to turn out a numeric equivalent. Done with floating point.
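
A minimal reconstruction of what such a generator might look like, with a kind switch covering both the word output above and the floating-point variant (rows, sequence_length, and step come from the config line; everything else, including the random float stream, is my guess rather than the actual generator code):

import random

PHRASE = "all work and no play makes jack a dull boy".split()

def generate(rows=100, sequence_length=26, step=26, kind="words"):
    # Emit `rows` lines. For kind="words", each line is a sequence_length-word
    # window onto the repeated phrase, advancing `step` words per row (a step
    # smaller than sequence_length gives overlapping windows). For
    # kind="floats", the same windowing is applied to random floating-point values.
    if kind == "words":
        stream = [PHRASE[i % len(PHRASE)] for i in range(rows * step + sequence_length)]
        fmt = lambda seq: " ".join(seq)
    else:
        stream = [random.random() for _ in range(rows * step + sequence_length)]
        fmt = lambda seq: " ".join("{:.3f}".format(x) for x in seq)
    lines = []
    for r in range(rows):
        start = r * step
        lines.append(fmt(stream[start:start + sequence_length]))
    return lines

# e.g. print("\n".join(generate(rows=3, kind="floats")))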

An important use of machine learning is to learn what people value. What posts or photos should a user be shown? Which jobs or activities would a person find rewarding? In each case, observations of people’s past choices can inform our inferences about their likes and preferences. If we assume that choices are approximately optimal according to some utility function, we can treat preference inference as Bayesian inverse planning. That is, given a prior on utility functions and some observed choices, we invert an optimal decision-making process to infer a posterior distribution on utility functions. However, people often deviate from approximate optimality. They have false beliefs, their planning is sub-optimal, and their choices may be temporally inconsistent due to hyperbolic discounting and other biases. We demonstrate how to incorporate these deviations into algorithms for preference inference by constructing generative models of planning for agents who are subject to false beliefs and time inconsistency. We explore the inferences these models make about preferences, beliefs, and biases. We present a behavioral experiment in which human subjects perform preference inference given the same observations of choices as our model. Results show that human subjects (like our model) explain choices in terms of systematic deviations from optimal behavior and suggest that they take such deviations into account when inferring preferences.
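
A toy illustration (mine, not the paper's model) of the basic inversion step the abstract describes: put a prior on a few candidate utility functions, model choices as softmax-optimal, and compute a posterior over which utility function the agent has.

import numpy as np

def posterior_over_utilities(choices, options, candidate_utils, beta=2.0):
    # Score each candidate utility function by the softmax ("approximately
    # optimal") likelihood of the observed choices, then normalize.
    # The prior is uniform over the candidates.
    log_post = np.zeros(len(candidate_utils))
    for k, util in enumerate(candidate_utils):
        u = np.array([util[o] for o in options])
        log_probs = beta * u - np.logaddexp.reduce(beta * u)   # softmax choice model
        for c in choices:
            log_post[k] += log_probs[options.index(c)]
    return np.exp(log_post - np.logaddexp.reduce(log_post))

# Example: two hypotheses about what the agent values, three observed choices.
options = ["gym", "tv", "walk"]
candidates = [{"gym": 2, "tv": 0, "walk": 1},   # values exercise
              {"gym": 0, "tv": 2, "walk": 1}]   # values relaxing
print(posterior_over_utilities(["gym", "gym", "walk"], options, candidates))

In the paper's terms, systematic deviations such as false beliefs or hyperbolic discounting would enter as changes to the choice model being inverted.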

This article presents an overview of the Schwartz theory of basic human values. It discusses the nature of values and spells out the features that are common to all values and what distinguishes one value from another. The theory identifies ten basic personal values that are recognized across cultures and explains where they come from. At the heart of the theory is the idea that values form a circular structure that reflects the motivations each value expresses. This circular structure, which captures the conflicts and compatibility among the ten values, is apparently culturally universal. The article elucidates the psychological principles that give rise to it. Next, it presents the two major methods developed to measure the basic values, the Schwartz Value Survey and the Portrait Values Questionnaire. Findings from 82 countries, based on these and other methods, provide evidence for the validity of the theory across cultures. The findings reveal substantial differences in the value priorities of individuals. Surprisingly, however, the average value priorities of most societal groups exhibit a similar hierarchical order whose existence the article explains. The last section of the article clarifies how values differ from other concepts used to explain behavior: attitudes, beliefs, norms, and traits.

Jamieson’s Post article was grounded in years of scholarship on political persuasion. She noted that political messages are especially effective when they are sent by trusted sources, such as members of one’s own community. Russian operatives, it turned out, disguised themselves in precisely this way. As the Times first reported, on June 8, 2016, a Facebook user depicting himself as Melvin Redick, a genial family man from Harrisburg, Pennsylvania, posted a link to DCLeaks.com, and wrote that users should check out “the hidden truth about Hillary Clinton, George Soros and other leaders of the US.” The profile photograph of “Redick” showed him in a backward baseball cap, alongside his young daughter—but Pennsylvania records showed no evidence of Redick’s existence, and the photograph matched an image of an unsuspecting man in Brazil. U.S. intelligence experts later announced, “with high confidence,” that DCLeaks was the creation of the G.R.U., Russia’s military-intelligence agency.

Jamieson argues that the impact of the Russian cyberwar was likely enhanced by its consistency with messaging from Trump’s campaign, and by its strategic alignment with the campaign’s geographic and demographic objectives. Had the Kremlin tried to push voters in a new direction, its effort might have failed. But, Jamieson concluded, the Russian saboteurs nimbly amplified Trump’s divisive rhetoric on immigrants, minorities, and Muslims, among other signature topics, and targeted constituencies that he needed to reach.

Twitter released the IRA dataset (announcement, archive), and Kate Starbird's group has done some preliminary analysis.

Any exchange that supports this format should be able to participate. Additionally, each exchange should contain a list of other exchanges that a consumer can request, so we don't need another level of hierarchy. Exchanges could rate other exchanges as a quality measure.
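
A sketch of what one exchange's descriptor might look like under that scheme (field names, URLs, and the rating scale are invented for illustration):

# Hypothetical descriptor an exchange could publish. Any exchange that can
# produce and consume this structure can participate, and the "peers" list
# stands in for another level of hierarchy.
exchange_descriptor = {
    "name": "example-exchange",
    "endpoint": "https://exchange.example.org/api",   # placeholder URL
    "format_version": "1.0",
    "peers": [   # other exchanges a consumer can request
        {"name": "peer-a", "endpoint": "https://a.example.net/api", "rating": 4.5},
        {"name": "peer-b", "endpoint": "https://b.example.net/api", "rating": 3.0},
    ],   # ratings of other exchanges serve as a quality measure
}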

It also occurs to me that there could be some kind of peer-to-peer or mesh network for degraded modes. A degraded mode implies a certain level of emergency, which would affect the (now small-scale) allocation of resources.

PSC will convene a working group meeting on Thursday, Oct. 18, from 9 am to 10 am to identify actions and policy considerations related to advancing the use of AI solutions in government. Come prepared to share your ideas and experience. We would welcome your specific feedback on these questions:

How can PSC help make the government a “smarter buyer” when it comes to AI/ML?

How are agencies effectively using AI/ML today?

In what other areas could these technologies be deployed in government today?

Looking for bad sensors on NOAA satellites

What is the current federal market and potential future market for AI/ML?

Notes:

How to help our members – federal contracts. Help make the federal market frictionless

Kevin – SmartForm? What are the main government concerns? Is the worry about false positives?

The killer app is cost savings, particularly when one part of government is getting a better price than another part.

Federal Data Strategy

Send a note to Kevin about data availability. The difference between NOAA sensor data (clean and abundant) and financial data (constantly changing spreadsheets that are not standardized). Maybe the creation of tools that make it easier to standardize data than to use artisanal (usually Excel-based) solutions. Wrote it up for Aaron to review. It turned out to be a page.

This research examines the competing narratives about the role and function of Syria Civil Defence, a volunteer humanitarian organization popularly known as the White Helmets, working in war-torn Syria. Using a mixed-method approach based on seed data collected from Twitter, and then extending out to the websites cited in that data, we examine content sharing practices across distinct media domains that functioned to construct, shape, and propagate these narratives. We articulate a predominantly alternative media “echo-system” of websites that repeatedly share content about the White Helmets. Among other findings, our work reveals a small set of websites and authors generating content that is spread across diverse sites, drawing audiences from distinct communities into a shared narrative. This analysis also reveals the integration of government funded media and geopolitical think tanks as source content for anti-White Helmets narratives. More broadly, the analysis demonstrates the role of alternative newswire-like services in providing content for alternative media websites. Though additional work is needed to understand these patterns over time and across topics, this paper provides insight into the dynamics of this multi-layered media ecosystem.

In topic modeling, identifiability of the topics is an essential issue. Many topic modeling approaches have been developed under the premise that each topic has an anchor word, which may be fragile in practice, because words and terms have multiple uses; yet it is commonly adopted because it enables identifiability guarantees. Remedies in the literature include using three- or higher-order word co-occurrence statistics to come up with tensor factorization models, but identifiability still hinges on additional assumptions. In this work, we propose a new topic identification criterion using second order statistics of the words. The criterion is theoretically guaranteed to identify the underlying topics even when the anchor-word assumption is grossly violated. An algorithm based on alternating optimization, and an efficient primal-dual algorithm are proposed to handle the resulting identification problem. The former exhibits high performance and is completely parameter-free; the latter affords up to 200 times speedup relative to the former, but requires step-size tuning and a slight sacrifice in accuracy. A variety of real text corpora are employed to showcase the effectiveness of the approach, where the proposed anchor-free method demonstrates substantial improvements compared to a number of anchor-word based approaches under various evaluation metrics.