It’s somewhat ironic that I’m starting my second database-related post with the same “I was looking for a solution but couldn’t find one” passage. The difference is that this time the problem is more general than querying a weighted list: how to handle complex changes in a distributed database.

In the RDBMS world, a single transaction lets you do as many operations as you like, on as many tables or rows as you like. Without proper design and forward thinking, though, this practice more often than not ends up hogging a significant portion of your resources.

Distributed databases can’t let this happen, so they place certain limitations on transactions. In the Google App Engine (GAE) datastore, for example, you can only operate on rows that share a common ancestor (rows, or entities as they’re called in GAE, may have parent-child relationships ensuring that all entities in the same tree are stored on the same server). Bad design will still lead to problems (e.g. storing all your data in a single tree), but this way it’s your app that gets hurt, not the database.

The lazy way

While this is generally good news, you’re faced with a new problem: handling changes that depend on the outcome of others. Here’s an example: you’re adding a new entry to the table (model in GAE) “grades” in a school database. You have to keep the corresponding row in the table “averages” (identified by student, subject, and semester) in sync. The lazy way of doing this would be to perform the following steps in one transaction:

Add row to grades

Calculate new average

Update average
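The three steps above can be sketched in plain Python. This is a stdlib-only simulation, not the actual GAE datastore API; the dicts standing in for the two tables and all names are hypothetical:

```python
# Simulated datastore: two "tables" as plain dicts (names hypothetical).
grades = {}    # (student, subject, semester) -> list of grades
averages = {}  # (student, subject, semester) -> stored average

def add_grade_lazy(student, subject, semester, grade):
    """One big read-modify-write 'transaction' spanning both tables:
    add the grade, recalculate the average, update the averages row."""
    key = (student, subject, semester)
    grades.setdefault(key, []).append(grade)   # 1. add row to grades
    avg = sum(grades[key]) / len(grades[key])  # 2. calculate new average
    averages[key] = avg                        # 3. update average
    return avg

add_grade_lazy("alice", "math", "2010S", 4)
add_grade_lazy("alice", "math", "2010S", 5)
print(averages[("alice", "math", "2010S")])  # 4.5
```

Note that step 3 writes an absolute value, which is exactly why the whole thing has to sit in one long transaction.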

‘By’ instead of ‘to’

There is a solution, however, using two transactions (or more in complex cases). In the last step of the above example, instead of changing the average (or rather the sum) to a certain value, we change it by a certain value in a separate, “delta” transaction. The figure below demonstrates the difference between the two: at the top, the timeline of two conventional transactions; below, their overlapping delta equivalents.
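A minimal sketch of the delta variant, assuming the averages row stores a running sum and count (again a plain-Python simulation, not datastore code):

```python
# Delta version: the averages "table" stores a running (sum, count),
# so every update is a commutative, reversible increment.
totals = {}  # (student, subject, semester) -> [sum_of_grades, grade_count]

def apply_delta(key, d_sum, d_count):
    """A separate, short 'delta transaction': change the row BY a value,
    not TO one. Applying (-d_sum, -d_count) rolls the change back."""
    row = totals.setdefault(key, [0, 0])
    row[0] += d_sum
    row[1] += d_count

key = ("alice", "math", "2010S")
apply_delta(key, 4, 1)    # first grade arrives
apply_delta(key, 5, 1)    # overlapping second grade: order doesn't matter
apply_delta(key, -4, -1)  # roll back the first grade
s, n = totals[key]
print(s / n)  # 5.0
```

Because the increments commute, two overlapping delta transactions never need to see each other’s intermediate state, and a batch can be undone by applying the inverse deltas.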

Advantages of delta transactions:

Isolated changes remain consistent.

You can still roll back a batch of changes from code as delta transactions don’t care about subsequent changes as long as they are reversible and commutative.

Using more but shorter transactions (often operating on a single entity), there’s a lower chance of contention.

Even when contention occurs, fewer retries are needed to avoid failure.

The obvious constraint to delta transactions is that they are only applicable to numeric values.

Support?

Considering how often this is needed, I was expecting some support for this kind of transaction in NoSQL-based platforms. GAE features a method called get_or_insert(), which is similar in the sense that it wraps a transaction inserting the specified row (entity) before returning it, in case it doesn’t exist. But there could just as well be a method delta_or_insert() that either inserts the specified row (entity) with the arguments as initial values if it doesn’t exist, or updates it with the arguments added, multiplied, etc. to the currently stored values.
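A sketch of what such a hypothetical delta_or_insert() could do, simulated over a plain dict. To be clear, the method does not exist in the GAE API; both the name and the semantics here are assumptions:

```python
# Hypothetical delta_or_insert(), simulated over a plain dict. This is
# NOT part of the GAE datastore API; name and semantics are assumed.
store = {}

def delta_or_insert(key, initial, delta):
    """Insert `initial` if the row is missing; otherwise apply `delta`
    to the stored value. Either case fits in one short transaction."""
    if key not in store:
        store[key] = initial
    else:
        store[key] += delta
    return store[key]

print(delta_or_insert("counter", 10, 3))  # 10 -- missing, so inserted
print(delta_or_insert("counter", 10, 3))  # 13 -- exists, so incremented
```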

Moreover, there could be support for rollback too, using delta transaction lists, and the possibility to evaluate expressions lazily, only when they are needed, would be a great addition. Features like these, which simplify transactions while increasing application stability and data integrity, would, I think, be much appreciated by many developers new to these platforms.

In previous posts, especially in those related to content mapping, I frequently referred to collective actions and efforts in describing certain concepts, but never elaborated on the exact meaning of these terms. One could think that collectivity and collaboration are identical (they often are mentioned in the same context) as both have something to do with individuals working together. In fact, I find it important to highlight their differences for I expect collectivity to play as vital a role in Web 3.0 as collaboration did in Web 2.0.

Transition

Drawing on popular Web 2.0 applications such as Wikipedia, Google Docs, or WordPress, we can define collaboration as sharing workload within a group of individuals who engage in a complex task, work towards a common goal in a managed fashion, and are conscious of the process’s details all the way.

As the number of participants grows, however, it becomes apparent that collaboration is not scalable beyond a certain level while remaining faithful to the definition outlined above. Although there is such a thing as large-scale collaboration, what that term covers is lots of people having the possibility to contribute but in reality only a few doing so. Mass collaboration goes further by blurring the definition of collaboration so much that it practically becomes just another expression for collectivity.

And when I speak of collectivity, I think of a crowd performing a simple, uncoordinated task where participants don’t have to be aware of their involvement in the process while contributing. The outcome of a collective action is merely a statistical aggregation of individual results.

Different realms

Collaboration and collectivity operate in different realms. Collaboration can be thought of as an incremental process (linear) while collectivity is more similar to voting (parallel). On the figure below, arrows represent the timeline of sub-tasks performed by participants.

Suppose such a sub-task were the creation or modification of a Wikipedia entry. In this case collaboration proves more effective, as it offers a higher chance of eliminating factual errors during the process, while a collective approach would preserve all of them (and merely offer the version with the fewest). The semantic complexity of a document does not fit the more or less hit-and-miss approach of collectivity.

However, if we decrease the complexity of the content, say, to one sentence, individual solutions become about as likely to be ‘good’ as the products of collaboration. Collective approaches therefore suit low-complexity content better.

The synaptic web

What content is of lower complexity than the connections within a content network? Relations such as identity, generalization, abstraction, response, or ‘part-of’ require no more than a yes-no answer. Collectivity is cut out for exactly this kind of task.
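As a sketch of how such yes-no judgements aggregate collectively (the relation types, post names, and the “fraction of yes votes” measure are all illustrative):

```python
from collections import defaultdict

# Each participant answers a yes-no question about one connection; the
# collective strength of the connection is the fraction of "yes" votes.
votes = defaultdict(lambda: [0, 0])  # (src, relation, dst) -> [yes, total]

def vote(src, relation, dst, yes):
    row = votes[(src, relation, dst)]
    row[0] += 1 if yes else 0
    row[1] += 1

def strength(src, relation, dst):
    yes, total = votes[(src, relation, dst)]
    return yes / total if total else 0.0

for answer in (True, True, False, True):
    vote("post-a", "response", "post-b", answer)
print(strength("post-a", "response", "post-b"))  # 0.75
```

No participant needs to coordinate with, or even know about, the others; the outcome is a pure statistical aggregation of individual answers.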

With the advent of the real-time web, however, increasingly effective publishing, sharing and engagement tools are making it easier to find connections between nodes in near-real time by observing human gestures at scale, rather than relying on machine classification.

Hence the synaptic web calls for collectivity. What we need now is more applications that make use of it.

Updates

Just one day before my post, @wikinihiltres posted an article comparing the efficiency of collective and collaborative approaches to content production through the examples of Wikipedia and Wikinews, concluding that “the balance that ought to be sought is one that continues to accept the powerful aggregative influence, but that greatly promotes collaboration where possible, since collaboration most reliably produces good results”.

(Figure: Collaboration versus collectivity)

Database Options for Content Mapping
https://collectiveweb.wordpress.com/2010/02/05/database-options-for-content-mapping/
Fri, 05 Feb 2010

While writing posts on the relation between content mapping and semantic web-related topics, I’m also working out the technical background for a specific content mapping solution.

Organic network of content

Content mapping is a collective effort to organize content into an organic, “living” network. As opposed to technologies that attempt to understand content by semantic analysis, content mapping facilitates understanding by having humans classify the connections in between. It is built on the presumption that conceptualization in the human brain follows the same pattern, where comprehension manifests through the connections between otherwise meaningless words, sounds, and mental images gathered by experience. Content mapping therefore is not restricted to textual content as NLP is. It’s applicable to images, audio, and video as well.

The purpose of content mapping is to guide from one piece of content to its most relevant peers. Searching in a content map amounts to looking for the ‘strongest’ paths between a node and its network.
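One way to make ‘strongest path’ concrete is a max-product variant of Dijkstra’s algorithm, treating each connection’s relevance as a probability-like weight in (0, 1]; a stdlib-only sketch under that assumption:

```python
import heapq

# Edge weights are relevances in (0, 1]; a path's strength is the
# product of its edges, and the search keeps each node's strongest
# path from the start (a max-product variant of Dijkstra's algorithm).
def strongest_paths(graph, start):
    best = {start: 1.0}
    heap = [(-1.0, start)]  # negated so the max strength pops first
    while heap:
        neg, node = heapq.heappop(heap)
        if -neg < best.get(node, 0.0):
            continue  # stale entry
        for nxt, rel in graph.get(node, []):
            cand = -neg * rel
            if cand > best.get(nxt, 0.0):
                best[nxt] = cand
                heapq.heappush(heap, (-cand, nxt))
    return best

graph = {"a": [("b", 0.9), ("c", 0.5)], "b": [("c", 0.8)]}
best = strongest_paths(graph, "a")
print(round(best["c"], 2))  # 0.72 -- via a -> b -> c, beating the direct 0.5
```

Since weights never exceed 1, path strength only shrinks as paths grow, which is what makes the greedy Dijkstra-style expansion valid here.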

Approach & architecture

The technical essence of content mapping is the way we store and manage connections. From a design perspective, I see three approaches.

Graph: Querying neighbors in arbitrary depth while aggregating connection properties along paths. Limited in the sense that it works only on networks with no more than a fixed number of edges on a path, e.g. question-answer pairs.

Recursive: Crawling all paths in a node’s network while calculating and sorting weights. Resource hungry due to recursion. Aggregated weights have to be stored until the result is returned, and cached until an affected connection changes.

Indexing: Tracking paths as implicit connections on the fly. All implicit connections have to be stored separately to make sure they’re quickly retrievable.
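The recursive approach, for instance, can be sketched as an exhaustive depth-first crawl; visiting every path is exactly what makes it resource hungry (graph and weights are illustrative):

```python
# Exhaustive depth-first crawl: every acyclic path from the start node
# is visited, multiplying weights along the way and keeping the best
# aggregate per reachable node.
def crawl(graph, node, weight=1.0, seen=None, best=None):
    seen = {node} if seen is None else seen
    best = {} if best is None else best
    for nxt, rel in graph.get(node, []):
        if nxt in seen:
            continue  # skip cycles
        w = weight * rel
        if w > best.get(nxt, 0.0):
            best[nxt] = w
        crawl(graph, nxt, w, seen | {nxt}, best)
    return best

graph = {"a": [("b", 0.9), ("c", 0.5)], "b": [("c", 0.8)]}
print(round(crawl(graph, "a")["c"], 2))  # 0.72
```

In a real network the number of paths explodes combinatorially, hence the need to cache aggregated weights until an affected connection changes.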

When deciding on an architecture upon which to implement the solution, three choices come to mind.

Relational: Traditional RDBMS, mature and familiar. The richness of SQL and data integrity is highly valuable for most web applications, but such advantages often come at the price of costly joins, tedious optimizations and poor scalability.

Graph: Fits applications dealing with networks. Despite the structural resemblance with content maps, this genre of databases – being relatively young – lacks certain features necessary for a content mapping solution, such as aggregation along paths.

Distributed: Scalability and performance are given highest priority. Consequently, access to resources, and features common in relational databases such as references, joins, or transactions are limited or completely missing.

The following table summarizes the key characteristics of each of the nine approach-architecture combinations.

|             | Graph                            | Recursive                          | Indexing                                  |
|-------------|----------------------------------|------------------------------------|-------------------------------------------|
| Relational  | Costly self-joins in fixed depth | Complex, caching required          | Writing is not scalable                   |
| Graph       | No aggregation along paths       | Graph architecture not exploitable | Implicit connection as separate edge type |
| Distributed | Lacks joins, same as recursive   | Limited access to resources        | Needs concurrency management              |

Finalists

The table above shows that most options have at least one showstopper: either complexity, lack of features and scalability, costly operations or unfitting architecture.

Only two of them seem to satisfy the purpose of content mapping as described in the first section: the graph and distributed implementations of the indexing approach.

Even though it’s not the graph approach we’re talking about, this combination exploits the advantages of the graph database to the fullest. By storing implicit connections as separate edges, there’s no need to query paths deeper than one neighbor.

In a distributed database there are no constraints or triggers, demanding more attention in regard to concurrency management. Graph structure is not supported on a native level, but scalability and performance make up for it.
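A sketch of the indexing idea: whenever an explicit edge is added, the implicit edges it creates are materialized immediately, so reads never go deeper than one hop. For brevity this only composes two-step paths and keeps the maximum weight per implicit edge; both simplifications are assumptions:

```python
# Indexing sketch: adding an explicit edge also writes the implicit
# edges for any new two-step paths, stored as their own edge type.
explicit = {}  # (src, dst) -> weight
implicit = {}  # (src, dst) -> derived weight

def add_edge(src, dst, weight):
    explicit[(src, dst)] = weight
    # Compose the new edge with existing explicit edges in both directions.
    for (a, b), w in list(explicit.items()):
        if b == src and a != dst:  # a -> src -> dst
            implicit[(a, dst)] = max(implicit.get((a, dst), 0.0), w * weight)
        if a == dst and b != src:  # src -> dst -> b
            implicit[(src, b)] = max(implicit.get((src, b), 0.0), weight * w)

add_edge("a", "b", 0.9)
add_edge("b", "c", 0.8)
print(round(implicit[("a", "c")], 2))  # 0.72 -- readable in one hop
```

The cost moves from reads to writes, which is exactly why concurrency management becomes the critical concern on a distributed architecture.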

(Figures: Approaches; Architectures)

Ontologies in Content Mapping
https://collectiveweb.wordpress.com/2010/01/28/ontologies-in-content-mapping/
Thu, 28 Jan 2010

Ontology in computer science is a formal representation of concepts within a domain. With the emergence of the semantic web, ontologies will take the role of the anchor to which all content can and should relate.

Ownership

If their purpose is indeed to serve as the lighthouse on the sea of information, ontologies must be unambiguous, and therefore be defined and maintained by a single entity. Will this single entity be a company, a consortium, a committee, an organization or perhaps a government agency? What guarantees that formal ontologies will follow the changes that may occur in the instance domain?

Collective definition

There is one guarantee: collective ontology management. And I’m not thinking of Wikipedia-style collaboration, but real collective effort where everyone throws in his/her two cents.

Take a look at this very simple comparison between equivalent fragments of a content map and an ontology. (A content map is based on collective definition of probabilistic relations between content elements.)

The resemblance is hard to miss. It’s no surprise: both deal with concepts and instances bound into a network through different sorts of relations. But on closer inspection, it becomes obvious that content mapping is fundamentally different.

There’s no distinction between elements such as classes, instances, attributes, et cetera. They’re all content. What constitutes a class from an ontology point of view depends solely on the relation. One element may be instance and class simultaneously.

There are fewer, more general types of connections. You can extend an ontology with new relations that specify the way certain elements are connected to each other. Content mapping defines only a few, from which new, implicit ones can be derived via machine learning.

Domains don’t have definite borders. It is very likely that elements have connections leading out of a domain, superseding what we call ontology alignment. As an element may be instance and class at the same time, it can also belong to more than one domain. In fact, these are the connections through which cross-ontology relationships emerge.

Dynamics is inherently embedded into the system. As content changes, connections follow. Classes are constantly created, updated or deleted by changing generalization connections.

Content mapping creates an organic system where ontologies float on the surface.

Defining ontologies in this environment is no longer necessary; they crystallize as the map naturally progresses. We only have to harvest the upper generalization layers to get an understanding of the conceptual connections in any data set. Domains needn’t be defined beforehand either. Instead, we draw their outlines wherever we deem fitting.

Clues to rely on

However flexible content mapping technology may seem in defining and following ontologies, its purpose is to connect previously unconnected content, and therefore it needs clues to follow up. Prior user input, search indexes, or existing ontologies may provide these clues. Once those clues are there, content mapping simplifies ontology management in several aspects.

Fewer relations: Only a handful of general relations are explicit, domain specific relations are all derived from those.

No need for focused attention: Ontology management requires no supervision as implicit connections change with content.

No knowledge of semantics: Connections (both explicit and implicit) can be set or changed without any knowledge on the subject of semantics or ontologies.

(Figure: Content map vs. Ontology)

Stochastic Linguistics and Content Mapping
https://collectiveweb.wordpress.com/2010/01/26/stochastic-linguistics-and-content-mapping/
Tue, 26 Jan 2010

Stochastic linguistics deals with the probabilities of certain patterns occurring in natural language and is therefore very likely to play an important role in future natural language processing (NLP) applications, including the semantic web.

… the leading edge of web technology developing towards globalized information interflow is almost exclusively based on stochastic technologies.

Stochastic techniques

Stochastic techniques, such as n-gram and latent semantic analysis (LSA) help us identify and classify patterns in natural language, through which we are able to compare, search and analyze documents in a language independent manner.
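A toy structural similarity in the same spirit, using character trigrams instead of a full n-gram model or LSA (a deliberately crude, language-independent sketch):

```python
# Character-trigram overlap: a tiny stand-in for stochastic techniques,
# comparing strings with no notion of meaning at all.
def trigrams(text):
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def similarity(a, b):
    """Jaccard overlap of trigram sets -- purely structural."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

print(similarity("it's 5 o'clock", "it's 6 o'clock") > 0.5)  # True
print(similarity("it's 5 o'clock", "time for tea") < 0.2)    # True
```

Note what the second pair illustrates: the two phrases are closely related in meaning yet score near zero, while the first pair scores high on form alone. That is precisely the structure-versus-meaning blindness discussed below.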

However, they fail to further machine understanding beyond a broad structural analysis. These techniques recognize a very limited set of relationships between terms, usually narrowing down to “identical” or “interchangeable”. Such relations say nothing about the quality of the connection, i.e. whether two similar terms are similar in structure or in meaning.

This is where a catch-22 begins to unfold. In order to provide relations that reflect the meaning of terms, by which they could be understood, machines running stochastic analyses would have to understand those terms first. The only way to resolve this catch leads through adding human intelligence to the mix, extending the network of terms with the missing semantic links, which is how we arrive at content mapping.

Mapping language

Content mapping is a system that does exactly that: it collectively defines and maintains a rich set of relations between bits of content, including natural language patterns. Relations create equivalence classes for content elements, where one term may belong to several classes based on its meaning, structure, and function. Synonymy and polysemy, merely hinted at by LSA for instance, are not only explicitly defined in content mapping but extended by relations vital to machine understanding, like generalization and identical meaning.
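The “identical meaning” relation, for example, induces equivalence classes that a union-find structure captures directly (the terms below are illustrative):

```python
# Union-find over terms: "identical meaning" votes merge terms into
# equivalence classes that can then be queried in near-constant time.
parent = {}

def find(term):
    parent.setdefault(term, term)
    while parent[term] != term:
        parent[term] = parent[parent[term]]  # path halving
        term = parent[term]
    return term

def declare_identical(a, b):
    """Merge the classes of two terms voted to mean the same thing."""
    parent[find(a)] = find(b)

declare_identical("it's 5 o'clock", "the time is 5")
declare_identical("the time is 5", "5 pm sharp")
print(find("it's 5 o'clock") == find("5 pm sharp"))  # True
```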

Let’s see an example. The figure below places the term “it’s 5 o’clock” in a content map. Colored connections are based on the votes of people participating in the mapping process. Outlined white arrows represent generated connections. Similarly, bubbles with solid outlines are actual pieces of content (terms), ones with dotted outlines are generated.

In the map, different relations contribute to language understanding in different ways.

Generalizations help conceptualizing terms.

Responses indicate contextual relationships between terms by connecting effects to their causes or answers to questions.

Identical meaning creates pathways for terms with a low number of connections to other relations such as generalizations or responses.

Abstractions extract structural similarity from terms to be used later by pattern recognition.

It takes two

Even though we obtain richer and more reliable information about term relationships through content mapping, it would take a lot of guesswork before actually related terms got connected. To reduce the number of unnecessary passes, LSA could provide candidate connections, even at a higher-than-normal error rate, as clues for the content mapping process to follow up on.

A composite solution that unites the two could point to a direction where a language-independent structural and semantic understanding of text finally comes within our reach.

(Figure: Sample content map)

Entropy and the Future of the Web
https://collectiveweb.wordpress.com/2010/01/19/entropy-and-the-future-of-the-web/
Tue, 19 Jan 2010

Inspired by this post of Chris Dixon, I summarized my thoughts on the future of the web in a single tweet like this:

The fundamental question that will shape the future of the web is how we deal with entropy.

Options

The disorder of the web resides in both content and the connections between content. Today’s approach to the web of tomorrow depends on how we address this issue. The figure below shows what we can expect as a result of different combinations of low and high entropy in the two layers.

Entropy in the content layer reflects the degree of internal disorder. If we choose to lower content entropy through the addition of relevant metadata or structure, we’ll realize the semantic web. If we don’t, content will remain unorganized and we’ll end up with the noisy web.

Entropy in the connection layer expresses disorder in the network of content. By defining meaningful relations between content elements, connection entropy will decrease, leading to the synaptic web. Should we leave connections in their ad-hoc state, we’ll arrive at the unorganized web.
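One way to make connection-layer entropy concrete is Shannon entropy over the possible relation types of a single link: an untyped hyperlink could mean anything, while a collectively typed one is nearly certain. The relation types and probabilities below are illustrative:

```python
from math import log2

# Shannon entropy over the possible relation types of one link:
# high entropy = we know nothing about what the link means.
def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

relation_types = ["extends", "debates", "cites", "duplicates"]
untyped = [1 / len(relation_types)] * len(relation_types)  # plain hyperlink
voted = [0.9, 0.05, 0.03, 0.02]                            # mostly "extends"
print(entropy(untyped))          # 2.0
print(round(entropy(voted), 2))  # 0.62
```

Defining meaningful relations is, in this picture, literally the act of pushing each link’s distribution away from uniform, i.e. lowering its entropy.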

The study of web entropy becomes interesting when we take a look at the intersections of these domains.

Semantic – synaptic: The most organized, ideal form of the web. Content and connections are thoroughly described, transparent and machine readable. Example: linked data.

Semantic – unorganized: Semantic content loosely connected throughout the web. Most blog posts have the valid semantic structure of documents; however, they’re connected by hyperlinks that say nothing about their relation. (Say, whether a blog entry extends, reflects on, or debates the linked one.)

Noisy – unorganized: A sparse network of unstructured content. This is the domain we’ve known for a decade and a half, where keyword-based indexing and search still dominate the web. If the web continues to develop in this direction, technologies such as linguistic parsing and topic identification will definitely come into play.

Which one?

The question is obvious: which domain represents the optimal course to take? Based on the domains’ descriptions, semantic – synaptic seems to be the clear choice. But we’re discussing entropy here, and from thermodynamics we know that entropy grows in systems prone to spontaneous change, and that order is restored only at the cost of energy and effort.

Ultimately, the question comes down to this: are we going to fight entropy or not?

Bringing the semantic web into existence is an enormous task. To me, overcoming people’s reluctance to adopt metadata and semantic formats seems unimaginable. The synaptic web looks more feasible, as the spread of social media already indicates. But in the end, what matters is which domain or combination of domains will be popular among early adopters. The rest will follow.

(Figure: Entropy levels of the web)

Advertising with Content Mapping
https://collectiveweb.wordpress.com/2010/01/14/advertising-with-content-mapping/
Thu, 14 Jan 2010

Advertising is the number one option to monetize content on the web. Even with the advent of the real-time web, until a viable model of in-stream advertising is conceived, search engines and their means of online marketing, such as SEO, AdWords, and AdSense, remain dominant. As in-stream and other real-time web marketing models mature, becoming relevant and non-intrusive, search engines will have to undergo fundamental change to keep a significant segment of the market.

Keywords are bad

In order to induce fundamental change, we first have to identify the fundamental flaw. At the dawn of the World Wide Web, the paradigm of search was borrowed from text documents, where a certain paragraph is easily spotted by looking up a few words we presume to be in it.

With more extensive content and less prior knowledge about it this paradigm became harder and harder to apply. However, in our efforts to rank content by its structure, context and user preferences we kept keywords all along. Moreover, keywords today fuel an entire industry of online advertising, recklessly overlooking the distortion they add between content and the user’s specific preferences.

To make search and search-based ads as relevant and as non-intrusive as the real-time web has to offer, keywords must be forgotten once and for all.

Content mapping

Content mapping connects units of content directly, by user interaction, via a rich, irreducible set of relations. Between your content and others there may be similarities, equivalences, references, and other sorts of relations of various significance and strength (relevance). The content relevant to yours makes up its ideal context. The first n results a search engine returns are, on the other hand, the actual context.

The distance between the ideal and actual context marks the accuracy of a search engine.
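A crude way to quantify that distance is one minus the overlap of the two result sets; the documents and scores below are made up purely for illustration:

```python
# Distance between the ideal and the actual context of a piece of
# content, measured as 1 - Jaccard overlap of the two result sets
# (a deliberately crude stand-in for "search engine accuracy").
def context_distance(ideal, actual):
    ideal, actual = set(ideal), set(actual)
    return 1 - len(ideal & actual) / len(ideal | actual)

ideal = {"doc1", "doc2", "doc3", "doc4"}
keyword_results = {"doc1", "doc7", "doc8", "doc9"}  # keyword search
map_results = {"doc1", "doc2", "doc3", "doc5"}      # content mapping
print(round(context_distance(ideal, keyword_results), 2))  # 0.86
print(context_distance(ideal, map_results))                # 0.4
```

The lower the distance, the closer the engine’s actual context sits to the ideal one.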

Now, when you’re searching with Google, you’re basically trying to define the ideal context for the content you need. Imagine just how clumsy and inefficient it is to do that through a couple of keywords.

Using a content mapping engine you type in a piece of content, not context. That content (or one that’s semantically identical) is probably already placed and centered in its ideal context. You’ll receive the elements inside as results in decreasing order of relevance.

SEO

Search engine optimization is an attempt to match the actual context to the ideal. Inevitably, when you tune your webpage for certain keywords you’re guaranteed to bolt it into the wrong context.

With content mapping, there’s no need for SEO. Not in a fair-use scenario anyway; on the other hand, the well-known SEO exploits (black hat, article spinning, keyword stuffing) obviously won’t work either. If the actual context of your content changes, it will re-position itself automatically to a new context that approximates the ideal as closely as possible.

Shooting in the dark

When it comes to online marketing, SEO is just one of your options. Ranking algorithms may change, and your content easily gets ripped out of the context you worked so hard to match. So you turn to a different, somewhat more reliable marketing tool, AdWords for example.

What happens from then on is again viewed through the smudgy glass of keywords. First, you take a wild guess at what keywords will best match your ideal context, bid for them and see what happens. If the conversion rates are not satisfactory, repeat the process until you get the best achievable results.

Assuming your campaign was successful, along the way you’ve probably

lost a lot of time tweaking

lost potential customers / deals

paid for the wrong keywords

taken an exam or hired a consultant

and ended up in the wrong context anyway

In a content mapping environment, however, you land at the center of your ideal context, with no tweaking and no time or money lost.

What’s the catch?

I’ve hinted in the definition that content mapping relies on user input. In fact, it relies on almost nothing but that. I admit that building and maintaining the connection index takes a huge collective effort, but I’m convinced of its feasibility.

We only have to make sure it

Provides frictionless tools for contribution: When the entire index has to be collected from the network it’s vital for the process not to demand more time and attention from contributors than what’s necessary.

Treats harmful activity as noise: Random noise is natural in content mapping. Useful information within the system – however small percentage – is expected to be coherent and thus extractable. In order to suppress useful information a successful attack would have to insert harmful information of at least equal coherence. Input gathering tools within the system must be designed with that in mind.
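A tiny simulation of the noise claim above: even when 90% of the votes are random, the small coherent fraction still dominates the aggregate (the relation types and proportions are illustrative):

```python
import random

# Coherent signal vs. random noise: noise spreads evenly across the
# options, so even a small coherent fraction rises above the baseline.
random.seed(1)  # fixed seed so the run is reproducible
relation_types = ["extends", "debates", "cites", "duplicates"]
votes = [random.choice(relation_types) for _ in range(900)]  # pure noise
votes += ["extends"] * 100                                   # coherent input

counts = {r: votes.count(r) for r in relation_types}
print(max(counts, key=counts.get))  # extends
```

To suppress that signal, an attacker would have to inject votes at least as coherent as the genuine ones, which is exactly the property the input-gathering tools must preserve.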

Regardless of how cleverly we gather information from the network, latency remains an integral property of content mapping. The actual context needs time to catch up to the ideal one, depending on the size of the network: the bigger it is, the faster the response. At the start of a campaign, one must be clear about the delay by which the content gets centered in its context.

Unfortunately, neither of the concerns above is comparable to building the network itself in terms of size and difficulty. The steps through which that can be achieved, however, are yet to be defined.

Updates

Click-through rates: It’s sort of self-explanatory, but it may be necessary to emphasize the following: when a piece of content is centered in its ideal context, it will yield the highest click-through rates when placed on a blog or website in an AdSense fashion.

Similar solutions: MyLikes has implemented a system in which advertisers may reach a higher click-through rate by placing their ads next to (or embedded into) relevant content produced by trusted “influencers”.

In-stream solutions: Take a look at this list of Twitter-based marketing tools on oneforty. They may not be all in-stream, but they can give you a general idea of advertising in the real-time web.

Originally, this post started out as an analysis of collective intelligence in Asimov‘s Foundation universe, just to see if my model is applicable to fictional entities as well. By the time I finished the first draft, I realized it all boiled down to a new collective consciousness property: robustness.

Following my original plan, I’m going to start by examining the two kinds of collective consciousness that form the parallel and competing alternatives for the development of the Galaxy.

Gaia

Asimov’s Gaia is a planet-sized autonomous collective entity formed by all the living beings and inanimate objects connected to it. Gaia’s consciousness is so strong that anything that becomes a part of it has to be ingested first through an energy-consuming process. Once an object or being becomes a part of Gaia, its own consciousness is added to the whole, and from then on contributes to the collective wisdom, memory, and actions of Gaia. The ingestion process is reversible: one may leave the collective of Gaia by severing the connection, either by distance or by becoming part of a non-Gaia system. Death doesn’t lead out of Gaia; it only decreases Gaia’s overall consciousness, which is kept in balance by the birth of new, highly conscious components.

To see the Asimovian Gaia in context, I’m going to use the Lovelockian Gaia as reference in its analysis.

Quantitative analysis

Asimov’s Gaia and the Lovelockian Gaia share most of their properties in the Collective Entity Space (CES). The only obvious difference is in their level of awareness: in the Asimovian Gaia, one is not only aware of the collective consciousness and one’s connection to it, but also actively accesses and uses it for individual and global purposes.

| Gaia        | Components    | Engagement | Awareness | Nature | Technology |
|-------------|---------------|------------|-----------|--------|------------|
| Lovelockian | Entire planet | Passive    | Unaware   | Mixed  | None       |
| Asimovian   | Entire planet | Passive    | Aware     | Mixed  | None       |

Qualitative analysis

Through Asimov’s story we get to know a lot more about Gaia than we know about the collective consciousness of the Gaia Theory. The only Consciousness Classification Criteria (CCC) property that we don’t know is control, probably because of its irrelevance to the deeply integrated Gaians.

Gaia consciously maintains ecological balance that best suits the needs of its habitants.

The mental well-being and happiness of Gaians is credited to collective consciousness.

| Gaia | Communication | Problem solving | Maturity | Usefulness | Control |
|---|---|---|---|---|---|
| Lovelockian | ? | ? | Infant / mature | Life | ? |
| Asimovian | Telepathy | Multiple | Mature | Multiple | ? |

In a way, the Asimovian Gaia is an evolved Lovelockian Gaia.

Foundation

Although it’s not explicitly referred to as a collective consciousness in the novels, the First Foundation unquestionably shows the expected characteristics. The existence of the Second Foundation is the proof: the Second Foundation communicates with the First, controls it and uses it for a purpose (the hastened establishment of a stable second Galactic Empire).

Quantitative analysis

The First Foundation, just like any collective entity, can be placed in the CES. If you’ve read the series, you know about the importance of unawareness: the science of psychohistory and the control exerted by the Second Foundation don’t work if the First Foundation becomes aware that it is observed or manipulated in any way. This translates to the CES as a small nudge on the awareness axis.

| Components | Engagement | Awareness | Nature | Technology |
|---|---|---|---|---|
| The galaxy’s human population | Passive | Unaware | Psychical / mixed | None |

Qualitative analysis

The Second Foundation communicates with the First Foundation, as a collective entity (galactic consciousness), through its mental powers and the mathematics of psychohistory. The First Foundation needs to be under constant control in order to stay on track and have its purpose fulfilled, hence it’s reasonable to assume that it’s still in a state of infancy.

| Communication | Problem solving | Maturity | Usefulness | Control |
|---|---|---|---|---|
| Psychohistory | ? | Infant | Social development | Second Foundation |

The Mule

An analysis of the Foundation can’t be thorough without a glimpse at the Mule and his actions. The Mule is a mutant individual with unusual mental powers, which he uses to manipulate the ruling class of the First Foundation and take over the Galaxy.

In terms of the CES he changes the behavioral structure of the entity’s components by altering their engagement and introducing a technology (governing by his mental influence) into the entity.

The Second Foundation

The Mule’s search for the Second Foundation during his struggle to rule the Galaxy draws the First Foundation’s attention to it. Feeling threatened by its mere existence, the First Foundation wipes out the group believed to be its members. Only through the purposeful and delicate planning of the Second Foundation is its total destruction avoided, unawareness restored and the galactic consciousness returned to its previous state.

Robustness

If we compare Asimov’s two collective consciousnesses, we’ll see that there’s a fundamental difference in how they react to perturbation in the CES. While Gaia is either indifferent to it or responds by returning to (near) its original position, the Foundation can be ruined by a small shift on the awareness axis, and returned only with extreme difficulty. Gaia is therefore a robust collective entity, while the First Foundation is a fragile one.

The robustness property of a collective consciousness reflects the probability of returning to its original state in the Collective Entity Space in response to perturbation.

Let’s take a look at the examples I used in the previous posts. The following table shows the expected reaction and robustness of a collective entity to perturbation on the given axis.

| Entity | Engagement | Nature | Technology | Robustness |
|---|---|---|---|---|
| Weak Gaia | – | ? | – | robust |
| GCP | – | collapses | communication collapses | moderately fragile |
| Twitter | collapses | collapses | collapses | fragile |

Note that components and awareness are not included in the table, as all three entities are indifferent to small changes along those axes. Adding or removing a few components, or a little awareness, even when spread among components, doesn’t have an impact on any of them.
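To make the robustness property more concrete, here is a toy sketch in Python. This is my own formalization, not part of the original model: an entity is represented as a point in the five-dimensional CES, and it counts as robust if simple restoring dynamics pull it back near its origin after a perturbation.

```python
# Toy model: an entity is a point in the Collective Entity Space (CES).
AXES = ("components", "engagement", "awareness", "nature", "technology")

def settle(position, origin, restoring, steps=100):
    """Pull each coordinate back toward the origin with strength
    `restoring` in [0, 1]; 0 means the perturbation sticks."""
    for _ in range(steps):
        position = {a: position[a] + restoring * (origin[a] - position[a])
                    for a in AXES}
    return position

def is_robust(origin, perturbed, restoring, tol=0.01):
    """Robust: the entity returns to (near) its original CES position."""
    final = settle(perturbed, origin, restoring)
    return all(abs(final[a] - origin[a]) < tol for a in AXES)

gaia = {a: 1.0 for a in AXES}
nudged = dict(gaia, awareness=2.0)           # a nudge on the awareness axis
print(is_robust(gaia, nudged, restoring=0.5))  # prints True: returns to origin
print(is_robust(gaia, nudged, restoring=0.0))  # prints False: the shift sticks
```

The `restoring` strength is of course hypothetical; the point is only that robustness is a property of how the entity responds to displacement, not of the displacement itself.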

Through robustness we are able to measure the sustainability of a collective entity. The examples from Asimov’s Foundation indirectly indicate its importance through the choice made in the story between the Foundation and a galactic Gaia, “Galaxia”, in favor of the latter. Golan Trevize, the protagonist of Foundation’s Edge, unconsciously chooses the robust collective entity, the one that ensures the long-term safety of a humankind scattered throughout the galaxy.

(Originally published at https://collectiveweb.wordpress.com/2010/01/08/collective-entity-robustness/)

The Noisy Web

January 6, 2010 | https://collectiveweb.wordpress.com/2010/01/06/the-noisy-web/

We are witnessing the decay of Google Search. The recent improvements (categories, promotion, real-time results) are insignificant compared to the magnitude of the problem, namely, poor relevance of results. By relevance I mean the results’ relation to the specific idea in the user’s mind, and not their relation to the keywords.

Keyword-based search increases the distance and distortion between results and what the user is really looking for.

Poor relevance

Why do keywords perform so poorly? After all, they would work perfectly in a world where all data on the web is semantically indexed through relevant metadata. In reality, however, the gap between relevant information and noise is so huge that keywords are likely to be caught in both. The keyword meta tag fiasco around the millennium has proven the inefficiency and vulnerability of metadata.

The widely criticized Semantic Web, anticipated to be an integral part of the third-generation Web, aims in that direction anyway. How it’s going to deal with obvious obstacles such as entropy and human behavior remains unanswered.

Making sense of noise

Instead of going into the problems posed by metadata, let’s focus on the naturally noisy web. Since reducing entropy in general requires immense efforts let’s turn the problem around and start digging in the noise.

There are two ways to do this:

treat the entire set of data as noise and recognize patterns that are interesting to us

prepare useful data for extraction from the background noise as we come across them

The first option calls for some sort of AI. While this is a viable solution, I’d question its feasibility. I don’t see algorithms, no matter how complex they are, covering every single aspect of content recognition and interpretation.

For the second option I can show a very fitting example. In digital watermarking we’re hiding drops of information in a vast ocean of noise. In order to recover that tiny amount of data, we have to make sure that it is one or both of the following:

significantly more coherent than the background noise (coherent)

repeated over and over throughout different domains of the signal (redundant)
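The redundancy route can be sketched in a few lines of Python. This is a toy model, not an actual watermarking scheme: a single bit is embedded as many copies in a noisy channel, and majority voting lets the signal rise back out of the noise.

```python
import random

def embed(bit, copies, noise_flip_prob):
    """Embed one bit as many redundant copies; background noise
    may flip each individual copy."""
    return [bit ^ (1 if random.random() < noise_flip_prob else 0)
            for _ in range(copies)]

def recover(samples):
    """Majority vote: redundancy makes the payload recoverable
    even though most of the signal is noise."""
    return 1 if sum(samples) > len(samples) / 2 else 0

random.seed(42)
received = embed(1, copies=1001, noise_flip_prob=0.3)
print(recover(received))  # prints 1: the embedded bit survives 30% noise
```

The same logic underlies the Web analogy below: a piece of content connected over and over, across different contexts, becomes recoverable from the background noise.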

We can put the same concept in Web terms by connecting relevant content through user interaction.

Content mapping

There are a couple of attempts at using the crowd to add context to content: Google’s Promote button, Digg, Twitter lists just to name a few. It’s easy to see that these tools don’t connect content to content. They connect content to metadata which brings us back to the original problem. OWL, the language of the Semantic Web can be used to define connections indirectly via class connections, but this solution again favors the metadata domain.

Direct content-to-content connections are practically non-existent as of today, except for online stores where articles refer to each other through a recommendation system. These connections are quite limited by the narrow niche and the very few and specific relations (also bought / also viewed / similar). Unquestionably, creating these connections on a grand scale is an enormous, yet far more feasible task than keeping entropy low. The good news is that tools like the ones mentioned above (Digg, Twitter) spread a completely new user behavior that will perfectly fit content mapping.

By defining a sufficiently rich set of relations in content connections, the mapping will be machine-readable. It won’t know that, e.g., a certain text element represents a book author, as it would in a semantic solution, but through a series of connections it’s going to have implicit knowledge about it.
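A minimal sketch of what such a content map might look like, with hypothetical relation names and item identifiers: content items are linked directly to other content items by typed relations, with no metadata layer in between.

```python
from collections import defaultdict

class ContentMap:
    """Toy content-to-content map: items connected by typed relations."""

    def __init__(self):
        self.edges = defaultdict(list)  # item -> [(relation, item)]

    def connect(self, a, relation, b):
        self.edges[a].append((relation, b))

    def related(self, item, relation=None):
        """Items one hop away, optionally filtered by relation type."""
        return [b for rel, b in self.edges[item]
                if relation is None or rel == relation]

cmap = ContentMap()
cmap.connect("article-on-gaia", "written_by", "author-page")
cmap.connect("article-on-gaia", "also_viewed", "article-on-gcp")
print(cmap.related("article-on-gaia", "also_viewed"))  # prints ['article-on-gcp']
```

Nothing here declares that "author-page" is an author; that knowledge is implicit in the pattern of `written_by` connections, which is exactly the contrast with a metadata-driven semantic solution.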

The “Google killer” cometh

Whatever follows in the footsteps of Google Search (perhaps a new Google Search?) is going to end the era of keywords. Ideally, it’s going to feature strong content mapping, induced by fundamentally changed online behavior, mixed with light semantics. It will be dumb enough in terms of algorithmic complexity, yet smart enough to harness the collective intelligence and knowledge of content creators and consumers alike.

Updates

In Google abandons Search, Andrew Orlowski elaborates on how real-time results and voting kill PageRank and, through the generated noise and irrelevance, push the entire Internet back into the chaos from which it emerged.

Nova Spivack tears down the hype encircling search engines in Eliminating the Need for Search by observing that search is an “intermediary stepping stone” that’s “in the way” between intention and action. He lists a couple of solutions that aim to break out of the conventional search engine mold but in the end fail to bring about drastic change. Instead, he proposes the concept of “help engines” that would help the user in a proactive way.

Classifying Global Consciousness

January 1, 2010 | https://collectiveweb.wordpress.com/2010/01/01/classifying-global-consciousness/

In the previous post I introduced a quantitative definition for global consciousness derived from the Collective Entity Space (CES). While quantitative analysis based on this definition helps to understand (or lay down) the fundamentals of a global consciousness, we need a more abstract frame of reference: a comprehensible system of criteria through which we can observe, classify or even design its perceivable characteristics.

Consciousness Classification Criteria (CCC)

I propose one such system that’s connected to the CES and consists of five components: communication, problem solving, maturity, usefulness and control.

Communication

How do we communicate with it?

It’s reasonable to expect a global consciousness to be able to communicate with other intelligent entities, including humans.

To find a communication channel to a global consciousness, it’s vital to understand its event-response mechanisms. One entity may offer several choices of various complexity and response time. For instance, a Lovelockian Gaia entity would take eons to respond, making communication with humans practically impossible due to non-matching time windows.

For communication to succeed the complexities and response times must agree on both sides.

Language

The mechanisms described above model the internal working of the ‘brain’. Based on our own experiences however, communication hardly occurs in a brain-to-brain fashion. There’s usually a series of abstraction layers on top forming an interface, through which flows a stream of information encoded by a mutually understood protocol.

In a human analogy the brain controls the diaphragm, vocal cords and tongue, and interprets audio signals captured by the tympanum. All this is accurately coordinated to form a structured system of voice patterns we call language.

How can the same kinds of abstractions be found or developed for / by a global consciousness? For instance, how do we manipulate world events in the case of the GCP in order to purposefully teach it something, or ask it a question? Similarly, how do we arrange its correlation patterns into meaningful answers, acknowledgements, warnings or requests?

Problem solving

What problems does it solve?

Sometimes the existence of a global consciousness in itself solves certain problems. For instance, Twitter is today the medium of choice even when other means of news reporting are available. Natural disasters, protests against totalitarian regimes and unexpected events in general are repeatedly reported on Twitter for the first time due to its accessibility and real-time nature.

But that’s not the kind of problem solving my question refers to.

Assume we’ve overcome the difficulties of communication. There’s a collective intelligence with the combined (or entirely new level of) wisdom of the people (and perhaps the whole biosphere, atmosphere, lithosphere, etc.) of Earth, and we are able to conduct meaningful conversations with it. What sort of problems can it solve as a collective intelligence? What sort of questions can we ask of the global brain, Twitter?

It’s quite likely that the problem solving capabilities of a global consciousness are restricted to a very narrow set of problems.

Maturity

How far developed is it?

I define two major stages in the development of a collective consciousness: preconscious (unrelated to the same term used in psychoanalysis) and postconscious development.

Preconscious development covers the process up to the point of gaining consciousness. During this stage the components are probably still underdeveloped or too few, or the required technology, engagement or awareness has not yet arisen. The ‘brain’ is not yet functional at this point. Imagine a global ‘preconsciousness’ as a five-dimensional blob forming and expanding in the Collective Entity Space.

After gaining consciousness, development advances to the postconscious stage. For the sake of simplicity, I divide this stage into two sub-stages: infancy and maturity. During infancy the collective consciousness is learning. Although it already has the capacity to communicate or solve problems, it lacks the necessary amount of information, experience and wisdom. As soon as those are obtained, the collective consciousness reaches maturity. That is when its full potential is unleashed and it becomes a true global consciousness.

Here’s an example of such a lifeline.

| Entity | Preconscious | Infancy | Maturity |
|---|---|---|---|
| Twitter | users are too few and interaction too slow | language being formed upon retweet patterns | communication using developed language |

For global consciousnesses that presumably already exist, these stages are mostly unknown. For instance, the Global Consciousness Project doesn’t investigate the internal mechanisms of the noosphere; it only gathers evidence for its existence and studies its behavior. The same applies to the Gaia theory.

Even if the lifeline of a collective consciousness is known it’s still not necessarily clear which stage it is currently in. The Twitter example aptly demonstrates the connection between the obviousness of a lifeline and the fact that it is based on well-known technology.

Usefulness

What do we benefit from it?

Similar to problem solving, the usefulness of a global consciousness may originate in either a) its existence (native usefulness), or b) its conscious quality (conscious usefulness). But unlike with problem solving, both are equally important, as both may have an impact on evaluation or design. Furthermore, a global consciousness can be beneficial on both the individual and the global scale.

The following examples show these conceivable benefits for three different entities.

Gaia

The Gaia entity, as Earth, is the only known celestial object in the universe to support life. Even if it does not consciously maintain the optimal circumstances for life, life does exist on Earth, and that’s certainly useful for every living thing on its surface.

Depending on the different views (strong Gaia vs. weak Gaia), a conscious Gaia facilitates co-evolution, or manipulates the physical environment for biologically optimal conditions. Both are advantageous on a global scale even though individual benefits are not so straightforward to see.

| Gaia | Individual | Global |
|---|---|---|
| Native | individual life | life |
| Conscious | ? | co-evolution / homeostasis |

Global Consciousness Project

The GCP is very much like eavesdropping on a global consciousness. We don’t know anything about the entity itself; we only hear it whispering and look for patterns. As long as we don’t get to know more, native usefulness remains a mystery on both the global and the individual level.

The conscious usefulness of GCP however, is exactly what the project is about. The correlation between world events (resonance in the noosphere) and REG output may help predict and avoid disasters.

| GCP | Individual | Global |
|---|---|---|
| Native | ? | ? |
| Conscious | avoiding disasters | predicting disasters |

Twitter – global brain

The native usefulness of Twitter is very easy to identify. It’s obvious to anyone who understands how and for what purpose it is being used. An individual may connect with friends, build a network, access information, keep in touch, and all that in real-time. There’s no consensus about its advantages, but it’s mostly agreed that it ignites personal and professional communities and increases online transparency.

The benefits of a conscious Twitter, or of any real-time social network that, enhanced with automated agents, resembles a neural network, are unknown as of today.

| Twitter | Individual | Global |
|---|---|---|
| Native | access to news | emerging communities |
| Conscious | ? | ? |

Again, it seems that the use of technology affects our knowledge of a global consciousness. As the examples indicate, information on native usefulness is more likely to be obtained when the entity relies on technology.

Control

Are we in control of it?

The word consciousness implies autonomous behavior; without it, the abilities, knowledge and wisdom of a global consciousness would not be exploitable. Still, I expect people to be afraid of a planet-size autonomous, conscious entity, even if they are part of it.

To that end I outline two ways to ‘neutralize’ a global consciousness.

1. Disabling its components by reducing the level of engagement or awareness below the entity’s threshold.

2. Destroying connections between components by restricting access to the technology or natural phenomena involved.

In quantitative terms, this is a suitable transformation in the CES. There may be certain cases, however, where such a transformation is not possible. A global consciousness that requires neither active involvement nor awareness from its components, and whose connections are made on the quantum level (the GCP comes to mind), could only be neutralized with highly advanced (and thus currently unattainable) technology reaching down from individual consciousness to the most basic elements of matter.

Control and technology, too, seem connected: the more technology-dependent a global consciousness is, the more control we have over it.

Structure and goals

The five criteria making up the CCC are connected to, and depend on, each other. Communication is a prerequisite for problem solving and maturity. Usefulness refers to both. Communication refers to the CES implicitly; maturity, usefulness and control refer to it directly.

Besides these connections, criteria may also be categorized by bias. Communication, problem solving and maturity are unbiased, as they are viewed from an objective perspective. Usefulness and control, however, are concerns very specific to human society.

Maturity, usefulness and control are also connected by their reliance on technology. Through that we see how technology affects our knowledge about global consciousnesses.

We have much more control over entities that are technology-dependent.

These three points show how much we still don’t understand beyond what’s already in our control. The CCC aims to uncover these blind spots in regard to global consciousness by asking the right questions.
