
Companies like Google and IBM are opening up services through APIs that will allow you to do things like check if an image contains adult/violent content, check to see what mood a face on a picture is in, or detect the language a piece of text is written in. Artificial Intelligence as a Service as it were (or maybe Machine Learning as a Service would be more appropriate).

So imagine building your product on top of these services. What happens if they start asking you to pay? Or if they censor particular types of input? Or if they stop existing? Where are the open alternatives that you can host yourself?

For anyone who likes logical Lego, the availability of these plug and play services means that in many cases you don’t have to worry about the base technology, at least to get a simple demo running. Instead, the creativity comes in the orchestration of services, and putting them together in interesting ways in order to do useful things with them…
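The orchestration idea is easy to sketch in code. Below is a minimal Python sketch in which the two "services" are local stub functions standing in for hypothetical remote APIs; the names, logic and example data are all made up for illustration:

```python
# Sketch of orchestrating hypothetical ML-as-a-service endpoints.
# The service functions below are stand-ins: a real version would call
# a vision or language-detection API over HTTP instead.

def detect_language(text):
    """Stub for a hypothetical language-detection service."""
    return "nl" if " het " in text or text.startswith("De ") else "en"

def is_safe_image(image_bytes):
    """Stub for a hypothetical image-moderation service."""
    return len(image_bytes) > 0  # placeholder decision

def moderate_post(text, image_bytes):
    """Compose the two services into one 'can we publish this?' check."""
    return {
        "language": detect_language(text),
        "image_ok": is_safe_image(image_bytes),
    }

result = moderate_post("De kat zit op de mat", b"\x89PNG...")
print(result)
```

The creativity is in the composition: each stub could be swapped for a real hosted service (or a self-hosted alternative) without changing the orchestration layer.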

I was at a full day about innovation at Mediaplaza in Utrecht today. We used a room with a stage in the centre and chairs on four sides around it. This is a bit weird, as the speaker has to look in four directions to connect with the audience. The funny thing is that it actually works (also because there is a screen on each of the four walls): every speaker had no choice but to be dynamic on stage.

Below my public notes on a few of the presentations:

Gijs van der Hulst, Business Development Manager at Google

The Wall Street Journal has done some research and found out that there has been an increase of 65% in how often top 500 companies mention the word “innovation” in their public documents in the last five years. Unfortunately the business practices of these companies have not really changed. How can you really effect change?

Google has nine “rules for innovation”:

Innovation, not instant perfection. Another way of saying this is “launch and iterate”: first push it to the market and then see if it is working.

Ideas come from everywhere. They can come from employees, but also from acquisitions or from outsiders.

A licence to pursue your dreams. An example of a 20% project that was very successful is Gmail. This was started by somebody who didn’t like how email was working at the time.

Share as much information as you can. This is very different from most companies. The default for documents within the company is to share with everyone.

Users, users, users. At Google they innovate on the basis of what users want, not on profit.

Data is apolitical. Opinions are less important than the data that supports them. They always seek evidence in the data to support their ideas. Personal note from me: Really? Really?? You cannot be serious!

Creativity loves constraints. Their obsession with speed (with hard criteria for how quickly the interface has to react to user input) is an example of an enabler for many of their innovations.

You’re brilliant? We’re hiring. In the end it is about people and Google puts a lot of effort into making sure they have the right people on board.

Larger companies are more bureaucratic than smaller companies. Google is now more bureaucratic than it used to be. One of the ways this can be battled is by reorganizing which is exactly what Google has done recently.

Sean Gourley, Co-founder and CTO of Quid

Sean talked about our eye as an incredible machine with an incredible range. We enhanced our sight through microscopy and telescopy which opened up views towards the very small and the very big. We have yet to develop something that helps us see the very complex. He calls that “macroscopy”. For macroscopy you need:

big data

algorithms

visualization

He used this framing for his PhD work on understanding war. His team used publicly available information to analyze the war. When WikiLeaks leaked the US “sig event” database they could validate their data set and found that they had 81% coverage. His work was published in Science and in Nature. He decided to take it further though, as he really wanted to understand complex systems. They needed to go from $300K in funding and 6 people towards an ambition level of about $100M and 1,000 people. He sought venture capital and had Peter Thiel as his first funder for Quid.

Sean then demoed the Quid software analyzing the term “big data”. Quid allows you to interactively play with the information. They extract entities from the information. So for example there are about 1500 companies involved in the big data space which can be put into different themes allowing you to see the connections between them while also sizing them for influence. Next was a fractal zoom into American Express where they looked at their patents portfolio and explored their IP creating a cognitive map of what it is that American Express does.

In 1997 Deep Blue changed the way we discussed artificial intelligence. We were beaten in chess by brute horsepower. As a reaction Kasparov started a new way of playing chess where you are allowed to bring anything you want to the chess table. The combination of human and machine turned out to be the best one. Gourley sees that as a metaphor for what he is trying to do with Quid: enhancing human cognitive capacity with machines, augmenting our ability to perceive this complex world.

Sean also talked about the adjacent possible: the way the world could be if we used the pieces that are on the table right in front of us (e.g. the Apollo 13 air filter and duct tape).

His research on insurgents has taught him that some of them are successful and when they are, it is because of the following reasons:

Many groups

Internal Competition

Long Distance Connections

Reinforce Success

Fail

Shatter

Redistribute

Polly Sumner, Chief Adoption Officer at Salesforce

Salesforce was recently recognized by Forbes as the most innovative company in the world. According to Polly the tech industry has significant innovations every 10 years. For each of these ten-year cycles the industry has 10 times more users.

Polly talked about how she uses their social platform, Chatter, to collaborate in a completely “flat” way. They now even use Chatter to make the worldwide management offsite meeting radically transparent. The next step for the Chatter platform is to “gamify” it, letting individual contributors rise to the top and having their contributions recognized (they’ve acquired Rypple, for example).

Agile is about maintaining innovation velocity and delivering at speed. The “prioritize, create, deliver, get feedback, iterate”-cycle needs to be sped up. One way of doing this is by listening to your customers as they are all a natural source for ideas. She showed a couple of examples from Starbucks and KLM:

Polly then shared an example of where Salesforce made a mistake: they announced a premium service that they wanted to charge extra for. Customers complained loudly on social media and within 24 hours they reversed their decision.

In 2000 they asked themselves the question: why isn’t all enterprise software like Amazon.com? Now, in 2011, they ask themselves a different question: why isn’t all enterprise software like Facebook? She would consider 2011 the year of the Social Revolution. Salesforce’s vision is that of a social enterprise: allowing the employee social network and the customer social network to connect (preferably in a single social profile).

Bjarte Bogsnes, Statoil

On the Fortune 500 list Statoil ranks first on social responsibility and seventh on innovation.

Bjarte discussed the problems with traditional management. He used my favourite metaphor, traffic, comparing traffic lights to roundabouts. Roundabouts are more efficient, but also more difficult to navigate. A roundabout is values-based and a traffic light is rules-based. Roundabouts are self-regulating and this is what we need in management models too. He then touched on Theory X and Theory Y.

When you combine Theory X with a perception of a stable business environment you get traditional management (rigid, detailed and annual, rules-based micromanagement, centralised command and control, secrecy, sticks and carrots). If you perceive the business environment as stable and you have Theory Y your management is based on values, autonomy, transparency (can be an alternative control mechanism) and internal motivation. If you combine Theory X with a dynamic business environment you get relative and directional goals, dynamic planning, forecasting and resource allocation and holistic performance evaluation.

Finally, if you combine Theory Y with a dynamic business environment you get Beyond Budgeting.

Beyond Budgeting has a set of twelve principles (it isn’t a recipe, but more of an idea or a philosophy):

Governance and transparency

Values: Bind people to a common cause; not a central plan

Governance: Govern through shared values and sound judgement; not detailed rules and regulations

Transparency: Make information open and transparent; don’t restrict and control it

Accountable teams

Teams: Organize around a seamless network of accountable teams; not centralized functions

Trust: Trust teams to regulate their performance; don’t micro-manage them

Accountability: Base accountability on holistic criteria and peer reviews; not on hierarchical relationships

Targets, forecasts and resource allocation serve different purposes, so when we combine them in a single number we run into conflicts. The first step towards Beyond Budgeting is therefore separating these three things: the target is what you want to happen, the forecast is what you think will happen. The next step is to become event-driven rather than calendar-driven.

Statoil has a programme called “Ambition to Action”:

Performance is ultimately about performing better than those we compare ourselves with.

Do the right thing in the actual situation, guided by the Statoil book, your Ambition to action, decision criteria & authorities and sound business judgement.

Within this framework, resources are made available or allocated case-by-case.

Business follow-up is forward-looking and action-oriented.

Performance evaluation is a holistic assessment of delivery and behaviour.

From strategic ambitions to KPIs (“Nothing happens just because you measure: you don’t lose weight by weighing yourself.”) and then into actions/forecasts and finally into individual or team goals.

Fosdem is the place where you’ll find a Google engineer who, as a “full-time hobby”, is lead developer of WorldForge, an open source massively multiplayer online game, or where you have a beer with a developer who has a hard time finding a job because all the code he writes has to have a free software license: “you don’t ask a vegan to eat a little bit of meat, do you?”. It is probably the world’s biggest free software conference: more than 5000 people show up yearly in Brussels, there is no fee to attend and there is no registration process.

I really enjoy going because there are few other events with so few barriers to attendance and to approaching the event the way you want to approach it. I like wandering around and thinking about how these are the people that actually keep the Internet working. Below are some notes about the different talks that I attended (very little educational technology to be found, beware!).

Free Software: A viable model for Commercial Success

Robert Dewar from AdaCore had an interesting talk about how to use free software as a true commercial offering. There was no ideology in his talk but only a pure commercial perspective. They usually sell free software as “open source” and focus on convenience and utility in their selling proposition. They tell the customer they get the source code included without locks and with no limits on the number of installs.

The business model is based around subscriptions (for support, testing, etc.). What he really likes about that model is that their interests and the customer’s are fully aligned: they only make money when the customer renews. Companies often have to get used to asking for support though; they have not been “trained” to value support in the past.

He considers commercial versus open source a bogus distinction. In many ways he would consider AdaCore very similar to Microsoft in what they do. The main difference is the license of the software: the AdaCore license is much more permissive, as you are allowed to copy the software and do with it what you want.

He also spent some time thinking about whether AdaCore’s approach would work for other companies. Could Microsoft open source Windows? He thinks they could without it affecting them badly: people would be willing to pay for timely updates and support. Could a games company open source their games? Copyright protection is one way they currently protect their very large investments, so it might be hard for them to open source, but in general the model could be used much more widely. Every company is in the business of giving users what they want, and open source licenses are that much more convenient for users.

A New OSI For A New Decade

Simon Phipps has joined the board of the Open Rights Group and the Open Source Initiative (OSI). He talked about reptiles: they have no morality, are very old and react only to fear and hunger. Corporations are reptiles too. Corporations don’t have ethics; people have ethics. OSI tried to find a way to show large organizations that the four software freedoms (use, study, modify and distribute) are important for them too. A pragmatic rather than a moral perspective on open source software helped the OSI get corporate involvement. Their initial focus was very much on licensing. They have been successful: OSI has become the standard for open source in government, and the fear around the term has been turned around: others are now appropriating the term.

We are now in a new decade: Open Source is the default and digital liberty is moving to centre stage. OSI has lost some of its relevance, so they decided to reinvigorate the organization with a member-based governance which should include all stakeholders. They now have new affiliates (other open source non-profits like Mozilla or Drupal) and the next stage will be government bodies and non-entities (whatever that might mean). Later they will get personal associates and then corporate patrons. All of this should enable a bottom-up governance. Members will decide how OSI will operate, they will create OSI initiatives, they can use OSI as a policy venue and they will co-ordinate initiatives locally and globally.

A new OSI project will try and help educators educate the world about open source: FLOSSBOK. I am personally not sure the world is waiting for another project like this. There are quite a few alternatives already.

Mozilla Devroom

Tristan Nitot, Principal Mozilla Evangelist, kickstarted the Mozilla Devroom. He told us that six European organisations have received significant grants from Mozilla (one of them being Fosdem). Mozilla strives to create an Internet that benefits everyone; the Internet that is currently being built does not. He focused on a couple of trends on the net:

App Stores have good sides (app discovery and monetization), but also very bad sides: they create vendor lock-in, prevent people from switching platforms (I have personally felt this when contemplating switching away from the iOS platform) and occasionally inhibit free speech through “censorship”. Mozilla believes you can get the good of the app stores without the bad.

Social networks have obvious good sides, but they also profile users, prevent users from porting their data to other services, and identity providers can even lock people out of their digital lives. Using Facebook is OK, but don’t use it exclusively to interact with others. When you use something for free, you can assume that you are the product. He showed us a great cartoon about Facebook users:

The "Free" Model by Geek&Poke

Newer devices (tablets, smartphones and netbooks) are increasingly convenient and popular. Very often they force users to a specific browser (e.g. Chrome on the Chromebook or Safari on iOS), making them by definition the opposite of the web.

What is Mozilla doing about these things?

Open Web Apps are based on open web technologies, are cross-browser and are available in multiple app stores. You can even host your own apps on your website for others to install in their browser. WebRT takes this a step further: it is a runtime for web applications that makes web apps look and feel like native apps on multiple platforms. Things like a Media Capture API will really change what it is possible to do with JavaScript in a browser. Other surprising APIs are the Battery API, the WebNFC (Near Field Communications) API and the Vibration API(!). More documentation is available here

They are trying to solve identity in a decentralized, browser agnostic and privacy respecting way. The codename for the project is BrowserID and it is based on using email addresses to provide identity.
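As a rough illustration of the underlying idea (an identity provider vouching for an email address, and a relying site verifying that assertion), here is a toy Python sketch. Note that this is not the actual BrowserID protocol, which uses public-key cryptography; the shared HMAC secret and all names here are made-up stand-ins that keep the sketch short:

```python
import hashlib
import hmac
import json
import time

# Toy sketch of email-based identity: a provider signs an assertion
# binding an email address to an expiry time, and a relying site
# verifies it. Real BrowserID uses public-key signatures instead of
# a shared HMAC secret; this is illustration only.

PROVIDER_SECRET = b"demo-only-secret"  # hypothetical provider key

def issue_assertion(email, lifetime=300):
    """Provider side: sign a short-lived claim about an email address."""
    payload = json.dumps({"email": email, "exp": time.time() + lifetime})
    sig = hmac.new(PROVIDER_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload, sig

def verify_assertion(payload, sig):
    """Relying-site side: check the signature and the expiry time."""
    expected = hmac.new(PROVIDER_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or forged
    claims = json.loads(payload)
    return claims["email"] if claims["exp"] > time.time() else None

payload, sig = issue_assertion("alice@example.org")
print(verify_assertion(payload, sig))    # the verified email address
print(verify_assertion(payload, "bad"))  # None: signature does not match
```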

In my book these three projects (especially the last one) make Mozilla a group of absolute heroes. Donate here!

There was an interesting talk about how Mozilla organizes its own IT services. Currently that is done by paid staff, but they strongly believe they can get this done through the community (MediaWiki does something similar).

Kai Engert talked about a very important topic: “Web security, and how to prevent the next DigiNotar“. He has a, let’s say, “unconventional” presentation style: instead of slides he used a piece of written text that he displayed on the screen and read out loud. Maybe this should be called something like “live visual podcasting”. His points were good though. He explained how it is a problem that every Certificate Authority (CA) has unlimited power, and he listed the alternatives. You could use a web of trust like the CAcert community, but this still doesn’t solve the problem of a single root key. Another proposed solution is Convergence, which uses notaries to monitor certificates; Kai sees too many problems with this as a solution for general users. One suggestion could be to build on top of DNSSEC, but again that has problems: how do you know who has signed the DNS? Google has also proposed something called Certificate Transparency, which might work but could also create some problems. His proposed solution builds on what exists, using the existing CAs combined with the notary system. The talk was a bit dense (I got lost halfway, if I am honest, obsessively reading Megan Amram), so if you want to read it yourself, find it here.

Michelle Thorne is the global event strategist for Mozilla. She is currently very focused on creating communities of “webmakers” and they are starting with children, video makers and journalists first. She presented three tools/projects for these webmakers:

Hackasaurus lets anybody edit the web. Kids are suddenly empowered to remix existing web pages. Check out the hacktivity kit if you want to use this in the classroom.

Popcorn.js is an HTML5 media framework that allows you to connect web content with video.

OpenNews (formerly called Knight-Mozilla) puts web developers in newsrooms, building tools that help solve journalistic challenges.

One thing I noticed is that she used htmlpad to present a few slides. I need to check this out as it is probably one of the simplest ways of collaborating around text or getting a quick HTML page online.

The focus for Mozilla at Fosdem is very much on the technology side of things and less on the broader themes that the Mozilla Foundation is tackling. I had a hard time finding somebody from the Mozilla learning team to talk about Open Badges, but I did make some good connections to have this conversation later in the year.

Wikiotics

Wikiotics gave a very short lightning talk of which I only managed to catch the tail end. Their goal is to make a site that allows anybody to create, update and remix interactive language lessons.

The Pandora

The Pandora is a small, Nintendo DS-sized open Linux computer designed for gaming. It has an 800×480 touchscreen, wifi, bluetooth, two SDHC card slots, S-Video output, two analogue controllers, a D-pad, L/R buttons, a QWERTY thumb keyboard, 256/512MB RAM and 512MB NAND storage. It has about 10 hours of battery life (in full use).

It comes with its own repository (an app store), allowing for easy installation and updating of games and other applications. One thing that will appeal to many people is the number of emulators it can run: if you want to relive the days you spent on the Amiga 500, Commodore 64, Apple II or Atari ST, it will work for you.

Because the device is so open, the possibilities are limitless. For example, you could connect a keyboard and mouse using a USB hub and hook it up to a TV to turn the Pandora into a small desktop PC, or connect a USB hard disk and turn it into a web or file server. The price will be €375 (ex VAT). What is great is that the device is produced in Germany and so is not built under appalling labour conditions.

Balancing Games, The Open Source Way

Jeremy Rosen has been working on Battle for Wesnoth, a turn-based strategy game, since 2004. He talked about how to achieve balance in a game. When you are talking about multiplayer balance:

No match should be decided by the matchup

No match should be decided by the chosen map

The best player should win… usually

Single player balance is different: in a single player game fairness is no longer important, it is just about having fun:

The AI won’t complain if the game is unfair (Jeremy on the AI: “By the way our AI doesn’t cheat, but is very good in math”)

Players want the game to be challenging

Each player has different capacities, we need to decide who we balance for

Balance problems can occur in many places (e.g. map balance, cross scenario balance, unit characteristics) and aren’t easy to find. One way of finding them is by organizing tournaments as people will do their best to exploit balance weaknesses to win. Balance will always be a moving target and new strategies will appear. User feedback is not so useful because players think they never make mistakes and that all their strategies should work. Sometimes you can find some good providers of feedback: “These persons are important, and like all of us, they are fueled by ego. Don’t forget to fuel them”.

His recommendation is to find somebody in your game’s community who can make balance a full-time job.

Freedom Box: Out of the Box!

The FreedomBox Foundation

Bdale Garbee gave us an update on the activities of the FreedomBox Foundation. According to him it really is a problem that we willfully hand over a lot of personal data to companies to manage on our behalf, without thinking much about the consequences. Regardless of their intentions, for-profit companies have to operate within the rules of the jurisdictions in which they operate, and this can lead to things like PhotoDNA.

FreedomBox’s vision is to create a personal server running a free software operating system and applications designed to create and preserve personal privacy, running on cheap, power-efficient plug computers that people can install in their own homes. That will then be a platform on which privacy-respecting, federated alternatives to current social networks can be built. These devices will probably be mesh-networked to augment or replace the current infrastructure.

The foundation has to do four things:

Technology

User Experience (this is very important if it is going to be useful for people who are not “geeks”)

Publicity and Fund-Raising

Industry Relations

They have had to bound the challenge by focusing on software rather than custom hardware, and on servers and services rather than client devices. They have also decided to use existing networking infrastructure where appropriate, while working to move away from central infrastructure control points (like the Domain Name System (DNS)). Another decision has been to build all elements of their reference implementation on top of Debian, a completely open, volunteer-based international organisation. This means that however successful they are as a foundation, all of their work will survive and remain available. Their goal is that new stable releases of Debian should have everything needed to create FreedomBoxes “out of the box”.

The first “application” they want to deliver is a secure chat service. They have based this on XMPP with Prosody on a single host (by chance I was sitting next to one of the Prosody developers).

They have also decided to make OpenPGP (GnuPG) keys the root of trust. It is great technology, but it is hard to establish initial trust relationships. One interesting idea is to take advantage of the smartphone technology we all walk around with to facilitate the initial key exchange (see the work of Stefano Maffulli).

They have done some investigations into plug computers. They focused mostly on the Dreamplug (which gave them quite a bit of GPL related headaches), but you also have the Sheeva and the Tonido.

He finished his talk by quoting Benjamin Franklin:

They who can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety.

What I should have written last year: distributed and federated systems

There is an overarching trend at Fosdem that I could already see last year: the idea of decentralised, distributed and federated systems for social networking and collaboration. There is a whole set of people working on creating social networks without a center (e.g. BuddyCloud or Status.net), distributed filesystems (like OpenAFS), alternatives to Google Docs (LibreDocs) and mesh networking (like Village Telco with the Mesh Potato). There are even people trying to separate cloud storage from the cloud application (Project Unhosted). These are very important projects that have my full attention.

If you have reached this far in the post and still want to read more (with a little bit more of a learning perspective) then you should check out Bert De Coutere’s blogpost. Through him I learned about Open Advice, an interesting approach to capturing lessons learned.

Two weeks ago I visited Learning Technologies 2011 in London (blog post forthcoming). This meant I had less time to write down some thoughts on Lak11. I did manage to read most of the reading materials from the syllabus and did some experimenting with the different tools that are out there. Here are my reflections on week 3 and 4 (and a little bit of 5) of the course.

The Semantic Web and Linked Data

This was the main topic of week three of the course. The semantic web has a couple of characteristics: it tries to separate the presentation of the data from the data itself, and it does this by structuring the data, which then allows all of the data to be linked up. Technically this is done through so-called RDF triples: a subject, a predicate and an object.
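A minimal sketch of the triple idea in Python, using plain tuples instead of a real RDF library like rdflib, and with made-up example data:

```python
# RDF-style triples as plain Python (subject, predicate, object) tuples.
# A real system would use proper URIs and a library such as rdflib.

triples = [
    ("TimBernersLee", "invented", "WorldWideWeb"),
    ("WorldWideWeb", "isA", "InformationSystem"),
    ("TimBernersLee", "worksAt", "W3C"),
]

def query(s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Everything we know about Tim Berners-Lee:
print(query(s="TimBernersLee"))
```

Because every statement has the same shape, data from different sources can be merged into one list and queried uniformly, which is exactly what makes linked data linkable.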

Although he is a better writer than speaker, I still enjoyed this video of Tim Berners-Lee (the inventor of the web) explaining the concept of linked data. His point about the fact that we cannot predict what we are going to make with this technology is well taken: “If we end up only building the things I can imagine, we would have failed“.

The benefits of this are easy to see. In the forums there was a lot of discussion about whether the semantic web is feasible and whether it is actually necessary to put effort into it. People seemed to think that putting in a lot of human effort to make something easier to read for machines is turning the world upside down. I don’t think that is strictly true. I don’t believe we need strict ontologies, but I do think we could define simpler machine-readable formats and create great interfaces for inputting data into them.

Microformats: where are the learning related ones?

These formats actually already exist and they are called microformats. Examples are hCard, hCalendar and hReview. These formats are simple and easy to understand, and they are created in a transparent and open process. Currently it does require some understanding of how these formats work to be able to use them, but in the near future this functionality will be built into the tools we use to publish to the web. So just by filling in a little form about yourself you would be able to create an editable piece of text with an embedded hCard microformat.
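To make this concrete, here is a small Python sketch that generates hCard markup. The class names (vcard, fn, org, url) are actual hCard property names; the helper function and the person data are made up for illustration:

```python
# Generate a snippet of HTML with an embedded hCard microformat.
# The hCard class names (vcard, fn, org, url) carry the machine-readable
# meaning; the surrounding markup is ordinary HTML.

def hcard(name, org, url):
    """Hypothetical helper: wrap contact details in hCard markup."""
    return (
        '<div class="vcard">'
        f'<a class="url fn" href="{url}">{name}</a>, '
        f'<span class="org">{org}</span>'
        '</div>'
    )

print(hcard("Jane Example", "Example Corp", "https://example.org"))
```

This is the appeal of microformats: the page still reads as normal text for humans, while a parser can pull the name, organisation and URL out of the class attributes.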

So where are the learning related formats? I think it would be great to have small microformats that can describe a course or a learning object. I am aware of Dublin Core and IEEE LOM as ways of describing content, but these are a bit too complex (and actually mix data and presentation in some weird way). Is anybody aware of initiatives to create simpler formats? Are they built into any existing learning-related products?

Thinking about this has inspired me to add two microformats to my blog. The little text about me now contains machine readable hCard information and the license at the bottom of the sidebar is now machine readable too (using rel=”license”). I will also start to work on building my resume into the hResume format and publish it on my site. Check http://www.hansdezwart.info/qr in a couple of weeks to see how I have been getting on.

Use cases for analytics in corporate learning

Weeks ago Bert De Coutere started creating a set of use cases for analytics in corporate learning. I have been wanting to add some of my own ideas, but wasn’t able to create enough “thinking time” earlier. This week I finally managed to take part in the discussion. Thinking about the problem I noticed that I often found it difficult to make a distinction between learning and improving performance. In the end I decided not to worry about it. I also did not stick to the format: it should be pretty obvious what kind of analytics could deliver these use cases. These are the ideas that I added:

Portfolio management through monitoring search terms
You are responsible for the project management learning portfolio. In the past you mostly worried about “closing skill gaps” by making sure there were enough courses on the topic. In recent years you have switched to making sure the community is healthy, and from developing “just in case” learning interventions towards “just in time” learning interventions. One thing that really helps you in your work is the weekly trending questions/topics/problems list you get in your mailbox: an ever-changing list of things that have been discussed and searched for recently in the project management space. It wasn’t until you saw this dashboard that you noticed a sharp increase in demand for information about privacy laws in China. Because of it you were able to create a document with some relevant links that you now show as a recommended result when people search for privacy and China.
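The trending list in this use case could be sketched roughly like this in Python; all the search terms and counts below are invented:

```python
from collections import Counter

# Sketch of the weekly trending-terms idea: compare this week's search
# counts against last week's and surface the biggest risers.

last_week = Counter({"gantt charts": 40, "risk register": 30, "privacy china": 2})
this_week = Counter({"gantt charts": 42, "risk register": 28, "privacy china": 25})

def trending(now, before, top=3):
    """Rank terms by their week-over-week increase in search volume."""
    rises = {term: now[term] - before.get(term, 0) for term in now}
    return sorted(rises, key=rises.get, reverse=True)[:top]

print(trending(this_week, last_week))  # "privacy china" rises to the top
```

The point is that the signal is the change, not the absolute volume: an evergreen topic with stable traffic should not crowd out a sudden new demand.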

Social Contextualization of Content
Whenever you look at any piece of content in your company (e.g. a video on the internal YouTube, an office document from a SharePoint site or news article on the intranet), you will not only see the content itself, but you will also see which other people in the company have seen that content, what tags they gave it, which passages they highlighted or annotated and what rating they gave the piece of content. There are easy ways for you to manage which “social context” you want to see. You can limit it to the people in your direct team, in your personal network or to the experts (either as defined by you or by an algorithm). You love the “aggregated highlights view” where you can see a heat map overlay of the important passages of a document. Another great feature is how you can play back chronologically who looked at each URL (seeing how it spread through the organization).
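The “aggregated highlights view” could be computed along these lines; this is a Python sketch with invented highlight offsets:

```python
from collections import Counter

# Sketch of the aggregated-highlights heat map: each reader highlights a
# character range in a document; overlap counts show the hot passages.

highlights = [(10, 25), (12, 30), (15, 20), (80, 90)]  # (start, end) offsets

def heat(highlights):
    """Count how many readers highlighted each character position."""
    counts = Counter()
    for start, end in highlights:
        for pos in range(start, end):
            counts[pos] += 1
    return counts

counts = heat(highlights)
hottest = max(counts, key=counts.get)
print(hottest, counts[hottest])  # positions 15-19 are covered by 3 readers
```

A renderer would then map these counts onto an opacity scale to produce the heat map overlay described above.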

Data enabled meetings
Just before you go into a meeting you open the invite. Below the title of the meeting and the location you see the list of participants. Next to each participant you see which other people in your network they have met or emailed with before, and how recent those engagements have been. This gives you more context for the meeting. You don’t have to ask the vendor anymore whether your company is already using their product in some other part of the business. The list also jogs your memory: often you vaguely remember speaking to somebody but cannot seem to remember when you spoke and what you spoke about. This tool also gives you easy access to notes on and recordings of past conversations.

Automatic “getting-to-know-yous”
About once a week you get an invite created by “The Connector”. It invites you to get to know a person that you haven’t met before and always picks a convenient time to do it. Each time you and the other invitee accept one of these invites you are both surprised that you have never met before, as you operate with similar stakeholders, work on similar topics or have similar challenges. In your settings you have given your preference for face-to-face meetings, so “The Connector” does not bother you with those video-conferencing sessions that other people seem to like so much.

“Train me now!”
You are in the lobby of the head office waiting for your appointment to arrive. She has just texted you that she will be 10 minutes late as she has been delayed by the traffic. You open the “Train me now!” app and tell it you have 8 minutes to spare. The app looks at the required training that is coming up for you, at the expiration dates of your certificates and at your current projects and interests. It also looks at the most popular pieces of learning content in the company and checks to see if any of your peers have recommended something to you (it actually also checks whether they have recommended it to somebody else, because the algorithm has learned that this is a useful signal too). It eliminates anything that is longer than 8 minutes, anything that you have looked at before (and haven’t marked as something that could be shown to you again) and anything from a content provider that is on your blacklist. This all happens in a fraction of a second, after which it presents you with a shortlist of videos to watch. The fact that you chose the second pick instead of the first is of course fed back into the system to make an even better recommendation next time.
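The filtering steps in that scenario can be sketched as a simple pipeline. Everything below (the field names, the flat relevance score, the catalogue records) is hypothetical, just to make the logic concrete:

```python
def shortlist(items, minutes_available, seen_before, blacklist, top_n=3):
    """Filter a content catalogue the way the 'Train me now!' scenario
    describes: drop anything too long, anything already seen, anything
    from a blacklisted provider, then rank by a relevance score."""
    candidates = [
        item for item in items
        if item["minutes"] <= minutes_available
        and item["id"] not in seen_before
        and item["provider"] not in blacklist
    ]
    return sorted(candidates, key=lambda i: i["score"], reverse=True)[:top_n]

# A tiny made-up catalogue to exercise the rules.
catalogue = [
    {"id": 1, "minutes": 5, "provider": "a", "score": 0.9},
    {"id": 2, "minutes": 12, "provider": "a", "score": 0.8},  # too long
    {"id": 3, "minutes": 6, "provider": "b", "score": 0.7},
    {"id": 4, "minutes": 4, "provider": "c", "score": 0.6},   # blacklisted
]
picks = shortlist(catalogue, minutes_available=8, seen_before=set(),
                  blacklist={"c"})
```

In the scenario the score itself would of course be learned from signals like peer recommendations and which pick you chose last time, rather than being a static number.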

Using micro formats for CVs
A simple structured data format is used to capture all CVs in the central HR management system. In combination with the API that was put on top of it, this has enabled a wealth of applications for the structured data.
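As a small illustration of what “a simple structured data format plus an API” buys you: once every CV is machine-readable, a query like “who has skill X” becomes a one-liner. The record layout below is invented for the example:

```python
# Hypothetical structured CV records, the kind a micro format would yield.
cvs = [
    {"name": "Alice", "skills": ["project management", "prince2"]},
    {"name": "Bob", "skills": ["python", "statistics"]},
]

def people_with_skill(cvs, skill):
    """A trivial query that is only possible because the CVs are
    structured data rather than free-form documents."""
    return [cv["name"] for cv in cvs if skill in cv["skills"]]
```

Any application built on the API (skills dashboards, staffing tools, expertise finders) is essentially a variation on this kind of query.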

There are three more titles that I wanted to write, but have not had the chance to yet.

Using external information inside the company

Suggested learning groups to self-organize

Linking performance data to learning excellence

Book: Head First Data Analysis

I have always been intrigued by O’Reilly’s Head First series of books. I don’t know any other publisher who is that explicit about how their books try to implement research-based good practices like an informal style, repetition and the use of visuals. So when I encountered Data Analysis in the series I decided to give it a go. I wrote the following review on Goodreads:

The “Head First” series has a refreshing ambition: to create books that help people learn. They try to do this by following a set of evidence-based learning principles. Things like repetition, visual information and practice are all incorporated into the book. This is a good introduction to data analysis, but in the end it only scratches the surface and was a bit too simplistic for my taste. I liked the refreshers around hypothesis testing, solver optimisation in Excel, simple linear regression, cleaning up data and visualisation. The best thing about the book is how it introduced me to the open source multi-platform statistical package “R”.

Learning impact measurement and KnowledgeAdvisors

The day before Learning Technologies, Bersin and KnowledgeAdvisors organized a seminar about measuring the impact of learning. David Mallon, analyst at Bersin, presented their High-Impact Measurement framework.

Bersin High-Impact Measurement Framework

The thing that I thought was interesting was how the maturity of your measurement strategy is basically a function of how much your learning organization has moved towards performance consulting. How can you measure business impact if your planning and gap analysis isn’t close to the business?

Jeffrey Berk from KnowledgeAdvisors then tried to show how their Metrics that Matter product allows measurement and dashboarding around all the parts of the Bersin framework. They basically do this by asking participants to fill in surveys after they have attended any kind of learning event. Their name for these surveys is “smart sheets” (a much improved iteration of the familiar “happy sheets”). KnowledgeAdvisors has a complete software-as-a-service infrastructure for sending out these digital surveys and collating the results. Because they have all this data they can benchmark your scores against yourself or against their other customers (in aggregate of course). They have done all the sensible statistics for you, so you don’t have to filter out the bias of self-reporting or think about cultural differences in the way people respond to these surveys. Another thing you can do is pull in real business data (think of things like sales volumes). By doing some fancy regression analysis it is then possible to see what part of the improvement can be attributed, with some level of confidence, to the learning intervention, allowing you to calculate a return on investment (ROI) for the learning programs.
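At its core, that regression analysis means fitting a business outcome against a learning-exposure variable and reading off the coefficient. A toy sketch of the idea with ordinary least squares; the numbers are made up and a real analysis would control for many more variables:

```python
def ols_slope(xs, ys):
    """Ordinary-least-squares slope: covariance(x, y) / variance(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Invented data: hours of training per salesperson vs. units sold.
training_hours = [0, 2, 4, 6, 8]
sales_volume = [100, 110, 120, 130, 140]

# The slope estimates extra units sold per extra training hour;
# multiply by margin and subtract training cost to estimate ROI.
slope = ols_slope(training_hours, sales_volume)
```

The “level of confidence” part would come from the standard error of this slope, which tells you how much of the apparent effect could be noise.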

All in all I was quite impressed with the toolset that they can provide and I do think they will probably serve a genuine need for many businesses.

The best question of the day came from Charles Jennings who pointed out to David Mallon that his talk had referred to the increasing importance of learning on the job and informal learning, but that the learning measurement framework only addresses measurement strategies for top-down and formal learning. Why was that the case? Unfortunately I cannot remember Mallon’s answer (which probably does say something about the quality or relevance of it!)

Experimenting with Needlebase, R, Google charts, Gephi and ManyEyes

The first tool that I tried out this week was Needlebase. This tool allows you to create a data model by defining the nodes in the model and their relations. Then you can train it on a web page of your choice to teach it how to scrape the information from the page. Once you have done that Needlebase will go out to collect all the information and will display it in a way that allows you to sort and graph the information. Watch this video to get a better idea of how this works:

I decided to see if I could use Needlebase to get some insights into resources on Delicious that are tagged with the “lak11” tag. Once you understand how it works, it only takes about 10 minutes to create the model and start scraping the page.
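Under the hood, what Needlebase automates is ordinary scraping: fetch a page, pull the fields your model defines out of the markup, repeat over pagination. A rough hand-written stand-in in Python; the HTML snippet is invented for the example (real Delicious markup looked different):

```python
import re

# A tiny stand-in for two bookmark entries on a tag page.
html = """
<div class="bookmark"><a class="title" href="http://example.org/a">Post A</a>
<span class="user">alice</span></div>
<div class="bookmark"><a class="title" href="http://example.org/b">Post B</a>
<span class="user">bob</span></div>
"""

def scrape_bookmarks(page):
    """Extract (url, title, user) triples, the job a trained
    Needlebase model would do, here with a hand-written regex."""
    pattern = re.compile(
        r'href="([^"]+)">([^<]+)</a>\s*<span class="user">([^<]+)</span>'
    )
    return pattern.findall(page)
```

The point of Needlebase was precisely that you never had to write this regex yourself: you clicked the fields on an example page and it inferred the pattern.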

I wanted to get answers to the following questions:

Which five users have added the most links and what is the distribution of links over users?

Which twenty links were added the most with a “lak11” tag?

Which twenty links with a “lak11” tag are the most popular on Delicious?

Can the tags be put into a tag cloud based on the frequency of their use?

In which week were the Delicious users the most active when it came to bookmarking “lak11” resources?

Imagine that the answers to the questions above were all that somebody could see about this Learning and Knowledge Analytics course. Would they get a relatively balanced idea of the key topics, resources and people related to the course? What are some of the key things that they would miss?

Unfortunately, after I had trained the model (and had written the above) I learned that Delicious explicitly blocks Needlebase from accessing the site. I therefore had to switch plans.

The Twapperkeeper service keeps a copy of all the tweets with a particular tag (Twitter itself only gives access to the last two weeks of messages through its search interface). I managed to train Needlebase to scrape all the tweets, along with the username, the URL to the user picture and the user id of the person adding the tweet, who the tweet was a reply to, the unique ID of the tweet, the longitude and latitude, the client that was used and the date of the tweet.

I had to change my questions too:

Which ten users have added the most tweets and what is the distribution of tweets over users?
This was easy to get and graph with Needlebase itself:

Top 11 Lak11 Twitter Users

I personally like treemaps for this kind of data, so I tried to create one in IBM’s ManyEyes. Unfortunately they seem to have some persistent issues with their site:

ManyEyes error message

Which twenty links were added the most with a “lak11” tag? Another way of asking this would be: which twenty links created the most buzz?
This was a bit harder because Needlebase did not get the links for me. I had to download all the text into a text file and use some regular expressions to get a list of all the URLs in the tweets. 796 of the 967 tweets had a URL (that is more than 80%), 453 of these were unique. I could then do some manipulations in a spreadsheet (sorting, adding and some appending) to come up with a list. Most of these URLs are shortened, so I had to check them online to get their titles. This is the result:

One problem I noticed is that two of the twenty results were the same URL behind different shortened URLs (the link to the Moodle course and to the Paper.li paper): URL shorteners make the web a more difficult place in many ways.
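The regular-expression step is simple enough to show. A minimal version that extracts and counts the URLs (the sample tweets are invented):

```python
import re
from collections import Counter

tweets = [
    "Great reading list for #lak11 http://bit.ly/abc123",
    "RT @someone: http://bit.ly/abc123 is worth your time",
    "No link in this one, just chatter",
]

# Crude but serviceable: a URL runs from the scheme to the next space.
URL_RE = re.compile(r"https?://\S+")

urls = [u for t in tweets for u in URL_RE.findall(t)]
counts = Counter(urls)          # which link created the most buzz
with_url = sum(1 for t in tweets if URL_RE.search(t))
share = with_url / len(tweets)  # fraction of tweets carrying a URL
```

The shortened-URL problem remains, though: two different `bit.ly` links pointing at the same page would still be counted separately unless you resolve each one first.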

What other hashtags are used next to Lak11?
Here I used a similar methodology as for the URLs. In the end I had a list of all the tags with their frequencies. I used Wordle and ManyEyes to put them into tag clouds:

Wordle Lak11 Hashtags

ManyEyes Lak11 Hashtags

Also compare them to tag clouds of the complete texts of the tweets (cleaned up to remove usernames, “RT”, “Lak11”, URLs and the # in front of the hashtags):

Wordle Lak11 Tweets Texts

ManyEyes Lak11 Tweets Texts

Which one do you find more insightful? I personally prefer the latter one as it would give somebody who knows nothing about Lak11 a good flavor of the course.
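The counting step behind the hashtag clouds is the same trick as with the URLs: extract with a regex, tally the frequencies, feed the list to Wordle or ManyEyes. A sketch with invented sample tweets:

```python
import re
from collections import Counter

tweets = [
    "Learning analytics reading for #lak11 #edtech",
    "#lak11 week 3 is about #visualisation and #r",
]

TAG_RE = re.compile(r"#(\w+)")

tags = Counter(
    tag.lower()
    for tweet in tweets
    for tag in TAG_RE.findall(tweet)
    if tag.lower() != "lak11"   # drop the course tag itself
)
# `tags` now maps each co-occurring hashtag to its frequency,
# ready to paste into a tag-cloud tool.
```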

How are the Tweets distributed over time? Is the traffic increasing with time or decreasing?
I decided to just get a simple list of days with the number of tweets per day. As an exercise I wanted to graph it in R. These are the results:

Tweets per day

I couldn’t learn anything interesting from that one.
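I did the per-day counting and graphing in R, but the step itself is generic; in Python it would look like this (the timestamps and their format are assumptions for the example, not the actual Twapperkeeper export):

```python
from collections import Counter
from datetime import datetime

# Tweet timestamps, format assumed for the example.
timestamps = [
    "2011-01-24 09:13:00",
    "2011-01-24 17:45:00",
    "2011-01-25 08:02:00",
]

per_day = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date().isoformat()
    for ts in timestamps
)
# `per_day` maps each day to its tweet count, ready for plotting
# as a simple bar or line chart.
```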

Imagine that the answers to the questions above were all that somebody could see about this Learning and Knowledge Analytics course. Would they get a relatively balanced idea of the key topics, resources and people related to the course? What are some of the key things that they would miss? If you could automate getting the answers to all these questions (no more manual writing of regexes!), would that be useful for learners and facilitators?
I have to say that I was pleasantly surprised by how fruitful the little exercise with getting the top 20 links was. I really do believe that these links capture much of the best material of the first couple of weeks of the course. If you used the Wordle as the single image to give a flavour of the course, pointed to the 20 URLs and got the names of the top Twitterers, then you would not be too badly off.

Another great resource that I re-encountered in these weeks of the course was Hans Rosling’s Gapminder project:

Google has acquired part of that technology and thus allows a similar kind of visualization with their spreadsheet data. What makes the visualization smart is the way that it shows three variables (x-axis, y-axis and size of the bubble) and how they change over time. I thought hard about how I could use the Twitter data in this way, but couldn’t find anything sensible. I still wanted to play with the visualization, so from the World Bank’s Open Data Initiative I downloaded data about population size, investment in education and unemployment figures for a set of countries per year (they have a nice iPhone app too). When I loaded that data I got the following result:

Click to be able to play the motion graph

The last tool I installed and took a look at was Gephi. I first used SNAPP on the forums of week 1 and exported that data into an XML-based format. I then loaded that into Gephi and could play around a bit:

Week 1 forum relations in Gephi

My participation in numbers

Adding up my participation for these two (to three) weeks: in week 3 and week 4 of the course I wrote 6 Moodle posts, tweeted 3 times about Lak11, wrote 1 blog post and saved 49 bookmarks to Diigo.

The hours that I have spent playing with all the different tools mentioned above are not included in my self-measurement. However, I really enjoyed playing with these tools and learned a lot of new things.

Arjen Vrielink and I write a monthly series titled: Parallax. We both agree on a title for the post and on some other arbitrary restrictions to induce our creative process. In our previous post we tried to argue whether you could engineer serendipity. The conclusion was: no, you cannot engineer serendipity (on the web). In this post we use the same recipe to investigate the corollary: the (social) web is hindering serendipity by clustering and clumping similar information around our web presence based on our online behaviour (e.g. the social graph). You can read Arjen’s post with the same title here.

In my teens I went to a Montessori high school in Amsterdam Zuid. The school is known for its liberal and cultural approach to education. My friends and I all thought we were free thinkers and radicals. It was therefore quite a shock to me when I learned at the college for PE teacher education that not all people had the “VPRO gids” at home and read the “Volkskrant”. It suddenly dawned on me how siloed my experience at high school had been and how similar we all were in our drive to be different. Occasionally I get the feeling that I am in a very similar position in my current educational technology profession.

The current toolset on the web helps us find people that are like ourselves, recommends us books that are similar to the ones we have already read and amplifies our existing opinions by aligning them to people who think the same as us. There are no tools to do the opposite: find people who are very different from you or content that gives new perspectives. In this post I would like to give a couple of examples of how the web helps in turning us into mussels (sessile animals that like being close to each other).

Example 1: The concept of RSS and Google Reader
Every day I spend 30 to 60 minutes reading my news feeds through Google Reader. I have subscribed to over 300 feeds and try not to miss any news items from about 100 of them. These feeds are very specific (one of the affordances of RSS is that it can easily be generated based on tags or search words). None of them carry general world news. Instead of reading the Guardian’s most important world news, I read the Guardian news that is tagged with Royal Dutch Shell. Instead of general feeds about the state of education and learning, I read the posts of certain learning gurus. This means that in my Google Reader news from the last couple of days there was no way for me to encounter the release of Aung San Suu Kyi (I only learned about it by looking it up just now), whereas I read about Facebook’s new messaging system at least three times (here, here and here), with very similar perspectives each time.

Google is also willing to suggest some new feeds for me to subscribe to. As of today the first four suggested sources that Google gives me are as follows:

Google's first recommendations

More of the same! Wouldn’t it be far more beneficial for me to be confronted with people, opinions and news that are very different from the things I already know? It seems like there isn’t enough semantic understanding of the things that I am reading to be able to tell me: “You always read news about Shell in the Guardian; the Financial Times usually has a very different perspective”. How far off do you think we are before that becomes a reality?

Example 2: Amazon suggestions

Amazon recommends the book I am already reading

Amazon was one of the first companies to make use of its customers’ behaviour to improve the service to those same customers. When you browse at Amazon they track everything: not just your purchases, but also your browsing history, the links you click, the reviews you read and write, the books you don’t buy and probably how much time you spend doing each of these things. They use this data and correlate it with other people’s data to suggest a couple of books that should interest you.

I haven’t bought anything at Amazon for a while (I now buy my books at Book Depository as they ship for free), but my current suggestions include titles like Drive (which I am reading right now), Free and Growing Up Digital (and many other similar titles that I have already read). These books increase my specialization in the field of the Internet and educational technology. There is no way for me to find books on Amazon that can function as a bridge to other genres.

There is also no way to really browse serendipitously. Like RSS, the categorization of the books is incredibly specific, much more so than in a traditional book store. On Amazon I can go straight to one of my favourite subjects, cognitive psychology (finding more than 8000 titles), whereas in a book store I would have to go to “popular science”. The latter forces me to run into books in fields of science that I wouldn’t usually look at. A bookshelf also offers a nicer (and faster!) browsing experience: running a finger past all the books, taking one out and quickly scanning its contents all do not work on Amazon.

Example 3: Anglo-Saxon focus through the English language and through Silicon Valley based innovation
Silicon Valley seems to be a village. I listen to Leo Laporte’s podcasts (e.g. This Week in Tech), read TechCrunch, Mashable and ReadWriteWeb and am inundated with news about Facebook, Google, Apple, Microsoft and mobile phone carriers in the US. A lot of web technology innovation is indeed driven by companies in Silicon Valley, and innovative start-ups from all over the world flock to California to be successful (see here for an example). But it does leave me wondering whether I am missing out on a large part of the technium by not being able to read Japanese, Mandarin, German, etc. Through Western (English) media I have learned that Japan has a very specific mobile phone culture. But in all other ways I am completely disconnected from it.

To experience how true this is, I would like you to do the following assignment: Use Google to try and find three sites in Japanese about technology culture. Let me know in the comments how that went…