The change in popularity of a page rapidly affects its rank, so the connections are stiff.

The relationships of the returned links, both to each other and to the broader information landscape, are hidden.

An additional density and stiffness issue is that everyone uses Google, so there is a dense, stiff connection between the search engine and the population of users.

Write up something about how:

ML can make maps, which decrease the likelihood of IR contributing to normal accidents.

AI can use these maps to understand the shape of human belief space, and where the positive regions and dangerous sinks are.

Two measures for maps are the concepts of range and length. Range is the distance over which a trajectory can be placed on the map and remain contiguous. Length is the total distance that a trajectory travels, independent of the map it's placed on.
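One way to make these two measures concrete (this is my reading of the definitions; the contiguity threshold max_step is an assumption):

```python
import numpy as np

def trajectory_length(traj):
    # total distance traveled along the trajectory, independent of any map
    return float(np.linalg.norm(np.diff(traj, axis=0), axis=1).sum())

def trajectory_range(traj_on_map, max_step):
    # longest contiguous stretch: consecutive mapped points no farther
    # apart than max_step still count as one unbroken placement
    steps = np.linalg.norm(np.diff(traj_on_map, axis=0), axis=1)
    best = run = 0.0
    for s in steps:
        run = run + s if s <= max_step else 0.0
        best = max(best, run)
    return best
```

A good map should then give a range close to the trajectory's full length.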

Write up the basic algorithm for ML map production:

Take a set of trajectories that are known to be in the same belief region (why JuryRoom is needed) as the input.

Generate an N-dimensional coordinate frame that best preserves length over the greatest range.
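As a sketch of that step, classical multidimensional scaling is one standard way to recover a low-dimensional coordinate frame that preserves pairwise distances between trajectory samples (a stand-in here, not necessarily the method we'll end up using):

```python
import numpy as np

def map_from_samples(points, n_dims=2):
    # classical MDS: find an n_dims coordinate frame that best
    # preserves the pairwise distances between trajectory samples
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    n = len(points)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ d2 @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_dims]  # keep the top eigenpairs
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

If the samples really do lie in a low-dimensional region, the recovered coordinates preserve the original pairwise distances, and hence trajectory lengths.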

What is used as the basis for the trajectory may matter. The range (at a minimum) can go from letters to high-level topics. I think any map reconstruction based on letters would be a tangle, with clumps around TH, ER, ON, and AN. At the other end, an all-encompassing meta-topic, like WORDS, would be a single, accurate, but useless point. So map reconstruction becomes possible somewhere between these two extremes.

The Nietzsche text is pretty good. In particular, check out the way the sentences form based on the seed “s when one is being cursed”.

the fact that the spirit of the spirit of the body and still the stands of the world

the fact that the last is a prostion of the conceal the investion, there is our grust

the fact them strongests! it is incoke when it is liuderan of human particiay

the fact that she could as eudop bkems to overcore and dogmofuld

In this case, the first 2-3 words are the same, followed by random, semi-structured text. That’s promising, since the comparison would be on the seed plus the generated text.

Today, see how fast a “Shining” text (All work and no play makes Jack a dull boy.) can be learned, and then try each keyword as a start. As we move through the sentence, the probability of the next words should change.

Generate the text set

Train the Nietzsche model on the new text. Done. Here are examples with one epoch and a batch size of 32, with a temperature of 1.0:

----- diversity: 0.2
----- Generating with seed: "es jack a
dull boy all work and no play"
es jack a
dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes
----- diversity: 0.5
----- Generating with seed: "es jack a
dull boy all work and no play"
es jack a
dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes
----- diversity: 1.0
----- Generating with seed: "es jack a
dull boy all work and no play"
es jack a
dull boy all work and no play makes jack a dull boy anl wory and no play makes jand no play makes jack a dull boy all work and no play makes jack a
----- diversity: 1.2
----- Generating with seed: "es jack a
dull boy all work and no play"
es jack a
dull boy all work and no play makes jack a pull boy all work and no play makes jack andull boy all work and no play makes jack a dull work and no play makes jack andull

Note that the errors start at a temperature of 1.0 or greater.
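That matches how temperature reshapes the output distribution before sampling. A sketch of the reweighting step, essentially the sample() helper from the Keras character-generation example:

```python
import numpy as np

def sample(preds, temperature=1.0):
    # reweight the predicted character distribution by temperature:
    # low temperature sharpens it (fewer errors, more repetition),
    # high temperature flattens it (more errors, more variety)
    preds = np.asarray(preds, dtype="float64")
    preds = np.log(preds + 1e-10) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return int(np.argmax(np.random.multinomial(1, preds, 1)))
```

At temperatures below 1.0 the most probable characters dominate, which is why the low-diversity runs above loop perfectly; at 1.0 and above, lower-probability characters get picked and the misspellings appear.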

Rewrite the last part of the code to generate text based on each word in the sentence.

So I tried that and got gobbledygook. The issue is that the prediction only works on waveform-sized chunks. To verify this, I created a seed from the input text, truncating it to maxlen (20 in this case):

sentence = "all work and no play makes jack a dull boy"[:maxlen]

That worked, but it means that the character-based approach isn’t going to work.

----- temperature: 0.2
----- Generating with seed: [all work and no play]
all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes
----- temperature: 0.5
----- Generating with seed: [all work and no play]
all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes
----- temperature: 1.0
----- Generating with seed: [all work and no play]
all work and no play makes jack a dull boy all work and no play makes jack a dull boy pllwwork wnd no play makes
----- temperature: 1.2
----- Generating with seed: [all work and no play]
all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes

Based on this result and the ensuing chat with Aaron, we’re going to revisit the whole LSTM with numbers and build out a process that will support words instead of characters.

Late last week, about 60 percent of the conversation was driven by likely bots. Over the weekend, even as the conversation about the caravan was overshadowed by more recent tragedies, bots were still driving nearly 40 percent of the caravan conversation on Twitter. That’s according to an assessment by Robhat Labs, a startup founded by two UC Berkeley students that builds tools to detect bots online. The team’s first product, a Chrome extension called BotCheck.me, allows users to see which accounts in their Twitter timelines are most likely bots. Now it’s launching a new tool aimed at news organizations called FactCheck.me, which allows journalists to see how much bot activity there is across an entire topic or hashtag

Today, we’re announcing the Cloud Inference API to address this need. Cloud Inference API is a simple, highly efficient and scalable system that makes it easier for businesses and developers to quickly gather insights from typed time series datasets. It’s fully integrated with Google Cloud Storage and can handle datasets as large as tens of billions of event records. If you store any time series data in Cloud Storage, you can use the Cloud Inference API to begin generating predictions.

Realized that there are additional matrices that can post-multiply the Laplacian. That way we can break down the individual components that contribute to “stiffness”. The reason for this is that only identical oscillators will synchronize; similarity is a type of implicit coordination.

Master matrix [M]: degree on the diagonal, with “1” for a connection and “0” for no connection

Bandwidth matrix [B]: has a value in (0, 1) for each connection

Alignment matrix [A]: the direction cosine between each pair of connected nodes. Completely aligned nodes get an edge value of 1.0

There can also be a Weight vector [W], which contains the “mass” of each node. A high-mass node will be more influential in the network.
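A sketch of how these factors might compose, assuming element-wise combination into an effective coupling before forming the Laplacian (the three-node network and all the values are hypothetical):

```python
import numpy as np

# Master matrix [M]: 1 for a connection, 0 for none
M = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
# Bandwidth matrix [B]: per-edge value in (0, 1)
B = np.array([[0., .9, .5],
              [.9, 0., .2],
              [.5, .2, 0.]])
# Alignment matrix [A]: direction cosine between connected nodes
A = np.array([[0., 1., .7],
              [1., 0., .3],
              [.7, .3, 0.]])
# Weight vector [W]: node "mass"; heavier nodes are more influential
W = np.array([1., 2., 1.])

adj = M * B * A                      # effective coupling per edge
L = np.diag(adj.sum(axis=1)) - adj   # graph Laplacian of that network
L_eff = L @ np.diag(1.0 / W)         # post-multiply: mass scales response
```

Each factor can then be varied independently to see how it contributes to stiffness.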

Had a few thoughts about JuryRoom self governance. The major social networks seem to be a mess with respect to what rights users have, and what constitutes a violation of terms of service. The solutions seem pretty brittle (Radiolab podcast on Facebook rule making). JuryRoom has built in a mechanism for deliberation. Can that be used to create an online legal framework for crowdsourcing the rules and the interpretation? Roughly, I think that this requires the following:

A constitution – a simple document that lays out how JuryRoom will be governed.

A bill of rights. What are users entitled to?

The concept of petition, trial, binding decisions, and precedent.

Is there a concept of testifying under oath?

The addition of “evidence” attachments that can be linked to posts. This could be existing documents, commissioned expert opinion, etc.

A special location for the “legal decisions”. These will become the basis for the precedent in future deliberations. Links to these prior decisions are done as attachments? Or as something else?

Localization. Since what is acceptable (within the bounds of the constitution and the bill of rights) changes as a function of culture, there needs to be a way that groups can split off from the main group to construct and use their own legal history. Voting/membership may need to be a part of this.

What is visible to non-members?

What are the requirements to be a member?

How are legal decisions implemented in software?

What are the duties of a “citizen”?

More iConf paper

I wanted to make figures align on the bottom. Turns out that the way that you do this is to set top alignment [t] for each minipage. Here’s my example:
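A minimal sketch of the pattern (the figure files and widths are placeholders), with [t] set on each minipage:

```latex
\begin{figure}
  \begin{minipage}[t]{0.48\textwidth}
    \centering
    \includegraphics[width=\textwidth]{figureA} % hypothetical file
    \caption{First figure}
  \end{minipage}\hfill
  \begin{minipage}[t]{0.48\textwidth}
    \centering
    \includegraphics[width=\textwidth]{figureB} % hypothetical file
    \caption{Second figure}
  \end{minipage}
\end{figure}
```

With [t], each minipage's reference point is its top line, so the side-by-side contents hang down together and their bottoms line up.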

One of the main objectives facing marketers is to present consumers with information on which to base their decisions. In doing so, marketers have to select the type of information system they want to utilize in order to deliver the most appropriate information to their consumers. One of the most interesting and distinguishing dimensions of such information systems is the level of control the consumer has over the information system. The current work presents and tests a general model for understanding the advantages and disadvantages of information control on consumers’ decision quality, memory, knowledge, and confidence. The results show that controlling the information flow can help consumers better match their preferences, have better memory and knowledge about the domain they are examining, and be more confident in their judgments. However, it is also shown that controlling the information flow creates demands on processing resources and therefore under some circumstances can have detrimental effects on consumers’ ability to utilize information. The article concludes with a summary of the findings, discussion of their application for electronic commerce, and suggestions for future research avenues.

This may be a good example of work that relates to socio-cultural interfaces.

The Greeks had experts determine the choices, and the public voted between the expert choices.

A satisfactory model of decision-making in an epistemic democracy must respect democratic values, while advancing citizens’ interests, by taking account of relevant knowledge about the world. Analysis of passages in Aristotle and legislative process in classical Athens points to a “middle way” between independent-guess aggregation and deliberation: an epistemic approach to decision-making that offers a satisfactory model of collective judgment that is both time-sensitive and capable of setting agendas endogenously. By aggregating expertise across multiple domains, Relevant Expertise Aggregation (REA) enables a body of minimally competent voters to make superior choices among multiple options, on matters of common interest. REA differs from a standard Condorcet jury in combining deliberation with voting based on judgments about the reputations and arguments of domain-experts.

The Centre for Collective Intelligence Design will explore how human and machine intelligence can be combined to make the most of our collective knowledge and develop innovative and effective solutions to social challenges.

H1: Groups are defined by a common location, orientation, and velocity (LOV) through a navigable physical or cognitive space. The amount of group cohesion and identification is proportional to the amount of similarity along all three axes.

H2: Group Behavior emerges from mutual influence, based on awareness and trust. Mutual influence is facilitated by Dimension Reduction: The lower the number of dimensions, the easier it is to produce a group.

H3: Group behavior has three distinct patterns: Nomadic, Flocking, and Stampeding. These behaviors are dictated by the level of trust and awareness between individuals having similar LOVs.

Nomadic behavior emphasizes environmental gradients, pursued by an individual or small group of agents. This supports the broadest awareness of the belief space, though it may be difficult to infer fitness peaks. Gradient discovery is less influenced by additional social effects.

Flocking behavior results from environmentally constrained social gradient seeking. For example, distance attenuates social influence. If an agent finds a risk or reward, that information cascades through the population as a function of the environmental constraints. (Note: In-group and out-group could be manifestations of pure social gradient creation.)

Stampede emphasizes social gradients. This becomes easier as groups become larger and a strong ‘social reality’ occurs. When social influence is dominant at the expense of environmental awareness, a runaway stampede can occur. The beliefs and associated information that underlie a stampede can be inferred to be untrustworthy.

H4: Individual trajectories through these spaces, when combined with large numbers of other individual trajectories, produce maps which reflect the dimensions that define the groups in that space.

I’m very excited to announce my latest project, a book on data visualization. The working title is “Fundamentals of Data Visualization”. The book will be published with O’Reilly, and a preview is available here. The entire book is written in R Markdown, and the figures are made with ggplot2. The source for the book is available on github.

Social learning provides an effective route to gaining up-to-date information, particularly when information is costly to obtain asocially. Theoretical work predicts that the willingness to switch between using asocial and social sources of information will vary between individuals according to their risk tolerance. We tested the prediction that, where there are sex differences in risk tolerance, altering the variance of the payoffs of using asocial and social information differentially influences the probability of social information use by sex. In a computer-based task that involved building a virtual spaceship, men and women (N = 88) were given the option of using either asocial or social sources of information to improve their performance. When the asocial option was risky (i.e., the participant’s score could markedly increase or decrease) and the social option was safe (i.e., their score could slightly increase or remain the same), women, but not men, were more likely to use the social option than the asocial option. In all other conditions, both women and men preferentially used the asocial option to a similar degree.

In line with past work in well-mixed populations, we find that selection favors either the intuitive defector (ID) strategy which never deliberates, or the dual-process cooperator (DC) strategy which intuitively cooperates but uses deliberation to switch to defection in Prisoner’s Dilemma games. We find that sparser networks (i.e. smaller average degree) facilitate the success of DC over ID, while also reducing the level of deliberation that DC agents engage in; and that these results generalize across different kinds of networks.

Working out how to add capability to the sim for P&RCH paper. My thoughts from vacation:

The agent’s contribution is the heading and speed.

The UI is what the agents can ‘see’.

The IR is what is available to be seen.

An additional part might be to add the ability to store data in the space. Then the behavior of the IR (e.g. empty areas) would be more apparent, as would the effects of the UI (only certain data is visible, or maybe only nearby data is visible). Data could be a vector field in Hilbert space, and visualized as color.

Updated IntelliJ

Working out how to have a voxel space for the agents to move through that can also be drawn. It can be any number of dimensions, but it has to project to 2D. In the case of the agents, I just choose the first two axes. Each agent has an array of statements that are assembled into a belief vector. The space can be an array of beliefs. Are these just constructed so that they fill a space according to a set of rules? Then the xDimensionName and yDimensionName axes would go from (0, 1), which would scale to stage size? IR would still be a matter of comparing the space to the agent’s vector. Hmm.
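A sketch of one way this could be set up, with the dimension count, grid size, and similarity measure all being assumptions on my part:

```python
import numpy as np

rng = np.random.default_rng(1)
DIMS = 5    # belief-vector dimensionality (assumed)
GRID = 8    # voxels per drawn axis (assumed)

# the space is an array of belief vectors; agents compare against it
space = rng.random((GRID, GRID, DIMS))
agent_belief = rng.random(DIMS)   # assembled from the agent's statements

def ir_similarity(agent, voxel):
    # IR comparison as cosine similarity between belief vectors
    return float(agent @ voxel /
                 (np.linalg.norm(agent) * np.linalg.norm(voxel)))

# project on the first two axes: (x, y) in [0, 1) scales to the grid
x, y = 0.3, 0.7
voxel = space[int(x * GRID), int(y * GRID)]
similarity = ir_similarity(agent_belief, voxel)
```

Drawing then only needs the 2D grid, while the comparison uses the full vector.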

Voters are often highly dependent on partisanship to structure their preferences toward political candidates and policy proposals. What conditions enable partisan cues to “dominate” public opinion? Here I theorize that variation in voters’ reliance on partisanship results, in part, from the opportunities their environment provides to learn about politics. A conjoint experiment and an observational study of voting in congressional elections both support the expectation that more detailed information environments reduce the role of partisanship in candidate choice.

9:00 – 5:00 BRI

Good lord, the BoA corporate card comes with SIX separate documents to read.

Onward to Chapter Three and Spring database interaction

Well that’s pretty clean. I do like the JdbcTemplate behaviors. Not sure I like the way you specify the values passed to the query, but I can’t think of anything better if you have more than one argument:

Reworking the lit review. Meeting set up with Wayne for tomorrow at 4:00.

Still thinking about modelling. I could use sets of strings that would define a CA’s worldview and then compare individuals by edit distance.

Not sure how to handle weights: a number, or repetitions of the character?

Comparing a set of CAs using centrality could show what the most important items are in that (overall and sub-) population. How closely an individual CA conforms to that distribution is a measure of ‘belonging’?

CAs could adjust their internal model. Big changes should be hard, little changes should be easy. Would dropping a low-ranked individual item result in a big change in edit distance with a group that doesn’t have the item?
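A sketch of the comparison, using plain Levenshtein distance over hypothetical worldview strings (repeating a character is one possible way to encode weight):

```python
def edit_distance(a, b):
    # dynamic-programming Levenshtein distance between two strings
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# hypothetical worldviews; 'a' repeated to weight that item more heavily
ca1 = "aabcd"
ca2 = "abce"
distance = edit_distance(ca1, ca2)
```

With this encoding, dropping a low-weight (unrepeated) item costs one edit, while dropping a heavily weighted item costs one edit per repetition, which gives the "big changes should be hard" property for free.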

Working on infrastructure that builds, collects and maintains Factoids

Split out the calculation and spreadsheet functions to support snapshots and debugging.

Set up the base class to be the control. Explorers only look outside their SD, while confirmers and avoiders stay within it. Not sure how to tease out the difference between those two; I think it will have something to do with the way they look for information, which is beyond the scope of this model for now. Also switched to a random distribution. Here’s an initial result. Much more work to follow.

In the book ‘‘The Filter Bubble: What the Internet Is Hiding from You’’, Eli Pariser argues that Internet is limiting our horizons (Parisier, 2011). He worries that personalized filters, such as Google search or Facebook delivery of news from our friends, create individual universes of information for each of us, in which we are fed only with information we are familiar with and that confirms our beliefs. These filters are opaque, that is to say, we do not know what is being hidden from us, and may be dangerous because they threaten to deprive us from serendipitous encounters that spark creativity, innovation, and the democratic exchange of ideas. Similar observations have been previously made by Gori and Witten (2005) and extensively developed in their book ‘‘Web Dragons, Inside the Myths of Search Engine Technology’’ (Witten, Gori, & Numerico, 2006), where the metaphor of search engines as modern dragons or gatekeepers of a treasure is justified by the fact that ‘‘the immense treasure they guard is society’s repository of knowledge’’ and all of us accept dragons as mediators when having access to that treasure. But most of us do not know how those dragons work, and all of us (probably the search engines’ creators, either) are not able to explain the reason why a specific web page ranked first when we issued a query. This gives rise to the so called bubble of Web visibility, where people who want to promote visibility of a Web site fight against heuristics adopted by most popular search engines, whose details and biases are closely guarded trade secrets.

Added both papers to the corpus. Need to read and code. What I’m doing is different in that I want to add a level of interactivity to the serendipity display that looks for user patterns in how they react to the presented serendipity and incorporate that pattern into a trustworthiness evaluation of the web content. I’m also doing it in Journalism, which is a bit different in its constraints. And I’m trying to tie it back to Group Polarization and opinion drift.

More on libraries and serendipity. Found lots, and then went on to look for mentions in electronic retrieval. Found Foster’s A Nonlinear Model of Information-Seeking Behavior, which also has some spiffy citations. Going to take a break from writing and actually read this one. Because, I just realized that interdisciplinary researchers are the rough academic equivalent of the explorer pattern.

Page 3 – To design and develop a new research method we used Sonnenwald’s (1999) framework for human information behavior as a theoretical foundation. This theoretical framework suggests that within a context and situation is an ‘information horizon’ in which we can act. For a particular individual, a variety of information resources may be encompassed within his/her information horizon. They may include social networks, documents, information retrieval tools, and experimentation and observation in the world. Information horizons, and the resources they encompass, are determined socially and individually. In other words, the opinions that one’s peers hold concerning the value of a particular resource will influence one’s own opinions about the value of that resource and, thus, its position within one’s information horizon.