Frederik Durant's .data blogJekyll2018-10-06T11:46:28+02:00/Frederik Durant//blog/karma-points-on-the-commute2018-10-06T13:45:00+02:002018-10-06T13:45:00+02:00Frederik Durant<p>After twenty years of splendid isolation in despicable company cars, I
recently joined the ranks of the daily commuters. I’m not a
big fan of loyalty cards, but still: in no time have my green and
social karma scores reached an all-time high. Wholly undeserved, I
must admit.</p>
<p>Because, I ask you: what could possibly be more delightful than
spending thirty minutes in the intimate presence of perfect strangers — <em>twice a day?</em>
The daily journey from my flat to work and back is <em>such</em> a rewarding
experience — it would make the most ardent atheist start begging for
reincarnation.</p>
<figure>
<img src="/images/wonderful-world-offpeak-hp_640x520.jpg" alt="Tube Heaven" />
<figcaption>Commuting in Heaven. Source: <a href="https://tfl.gov.uk">TfL</a></figcaption>
</figure>
<p>Now, not sharing an opportunity for cosmic harmony would be utterly
selfish. Therefore, at the risk of preaching to the choir, allow me
to spread the Word of the <a href="https://tfl.gov.uk">Tube</a>.</p>
<p>It all starts at the Gates.</p>
<p>Like spermatozoa racing towards the ovum, my fellow
commuters and I swiftly maneuver our way through the
masses. Whilst looking for our <a href="https://tfl.gov.uk/fares-and-payments/oyster">Oyster card</a> — how
aptly named! — we desperately try to avoid the
inevitable collision. Where are the roundabouts when you need them?
Anyway, with ever growing anticipated joy, we get in line. One
redemptive blink of the green light, and we wrestle ourselves through
the Gates. Kind of a <a href="https://www.huffingtonpost.co.uk/entry/jade-eggs-vagina-goop_us_588641dbe4b096b4a2335935">jade egg</a> experience, <a href="https://www.theguardian.com/film/2018/sep/05/gwyneth-paltrow-goop-to-pay-out-over-unproven-health-benefits-of-vaginal-eggs">so I’m
told</a>, but at a mere fraction of the cost.</p>
<figure>
<img src="/images/eggs.jpg" alt="Eggs" />
<figcaption>Free-range eggs, as laid by free-range poultry.</figcaption>
</figure>
<p>Down on the platform, the brilliant self-organising
spectacle continues as we queue up once more, now neatly in <em>double</em>
lines. Spurred by motivational messages from the station manager
— “Have an amazing day!” — and carefully <a href="https://en.wikipedia.org/wiki/Mind_the_gap">minding the Gap of
Death</a>, I finally make it onto the Train of Life. This is what birth must
have felt like, had I paid a bit more attention.</p>
<p>Once inside, spontaneous Chants of Praise screaming from the
neighbors’ headphones fill my ears with enchantment. Tears of joy follow
when, like a Sign from Above, the aircon blows away any remaining
doubts and worries. How wonderful to sneeze as I feel the Breeze of
Freeze, right there, down my neck.</p>
<p>While reaching out for the yellow handle to stay on my feet, I generously
offer my vertically challenged co-passenger an olfactory glimpse into
my Axe-sprayed armpit. Gratefully, she retaliates by spilling hot
coffee on my trousers. So much human warmth, you can’t imagine.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/yi__tc23plg" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>
<p>Fortunately, at each station, more and more passengers get stuffed
onto the Godly Carriage. There definitely is a place for <em>everyone</em> on
the Train of Eternity. #YouToo, #MeToo, the more the merrier.</p>
<p>Alas, just as Nirvana comes within reach, the Heavenly Horses
approach my final destination — at least in this life. Sweatily stumbling
over three suitcases and a drunk, I fall face flat on the floor. As I
crawl my way out through the Doors of Illumination into the darkness
on <a href="https://en.wikipedia.org/wiki/Moorgate_station">Moorgate’s</a> platform, I jealously look up, straight into
the new entrants’ eyes. Such hope, such happiness, such health!</p>
<p>Damn, where’s my Oyster card?</p>
<p><a href="/blog/karma-points-on-the-commute/">Karma points on the commute</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on October 06, 2018.</p>/blog/london-calling-or-is-it-europe2018-09-06T15:45:00+02:002018-09-06T15:45:00+02:00Frederik Durant<p>In the winter of 1990-1991, while studying computer science and <a href="https://en.wikipedia.org/wiki/Natural_language_processing">natural
language processing</a> at the <a href="https://www.essex.ac.uk">University of Essex</a>, I spent my first
weekend ever in London. <a href="https://en.wikipedia.org/wiki/Margaret_Thatcher">Margaret Thatcher</a>’s 11-year service as Prime Minister
in Her Majesty’s Government had just come to a dramatic end.</p>
<p>The Iron Lady left Downing Street just a year after <a href="https://www.youtube.com/watch?v=b8GzptqhT68">the Iron Curtain
fell</a>. Back in the idyllic <a href="https://goo.gl/maps/4h9sCVxy73m">Ardleigh Park Lodge</a> in Essex, I saw my
German housemate <a href="https://www.linkedin.com/in/matthias-jäschke-7394352a/">Matthias</a> cast his vote in the first
all-German multi-party elections since <a href="https://en.wikipedia.org/wiki/1933_in_Germany">1933</a>. The other student in the
house was <a href="https://www.linkedin.com/in/annemariemineur/">Anne-Marie</a>, nowadays Member of the
European Parliament for the <a href="https://www.sp.nl">Dutch Socialistische Partij</a>.</p>
<p>We were young, we were Europeans in England, and the times were historic.</p>
<p>And then, a quarter century later, there was Brexit.</p>
<p>True, I may be skipping a few life events here, like: getting
married and having daughters; buying a house; starting and ending a dozen
jobs. The fact is: just as Marcel Proust’s mind wandered off to his
madeleines, Brexit threw me back to that snowy winter.</p>
<p>And now, half a year before the United Kingdom is set to leave the
European Union, I’m going back. Back to England. Back to London!</p>
<p>What’s more: not just on a visit.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/-Gp6lYQrqSE" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>
<p>Next Monday, I start as Sr. Data Scientist
at <a href="https://aire.io">Aire Labs</a> in <a href="https://en.wikipedia.org/wiki/East_London_Tech_City">Shoreditch</a>, just north of <a href="https://en.wikipedia.org/wiki/City_of_London">the City of
London</a>. Aire is, in its own words, “a new
credit assessment service” that helps “lenders make more informed
decisions, and borrowers get fairer access to credit”.</p>
<p>I will be commuting from Brussels to London on a weekly basis
(Monday-Friday). Weekends will be spent at home, with wife and daughters, as
usual.</p>
<p>I sometimes dream of meeting the Brexit negotiators —my fellow
travellers— on the Eurostar. In the lounge or the restaurant
coach, I would tell them this story. And invite them for a beer. Or,
in my case, a <a href="https://en.wikipedia.org/wiki/Pint_glass">pint</a> of semi-skimmed milk.</p>
<p>For old times’ sake.</p>
<p><a href="/blog/london-calling-or-is-it-europe/">London calling. Or is it Europe?</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on September 06, 2018.</p>/blog/truisms-to-counter-company-culture-traps2018-07-26T18:00:00+02:002018-07-26T18:00:00+02:00Frederik Durant<h1 id="problem-to-solve-systemic-inertia">Problem to solve: systemic inertia</h1>
<p>Anyone who has spent enough time in a large organization must have witnessed
the tragic power of <strong>systemic inertia</strong>. Even more so when it is
a by-product of its <strong>company culture</strong>.</p>
<figure>
<img src="/images/facepalm.jpg" alt="Facepalm" />
</figure>
<p>Cultural traditionalists —especially those born in or raised by
the powers that be— expect, sustain, and even thrive on
the status quo for a living. The <strong>inertia</strong> that results from their
long-standing but backward-looking methods and views is euphemistically
repackaged as <strong>strategic sustainability</strong>. This way, they try to rationalize
and justify the status quo. Tragically, they even see themselves as
defenders of the long term.</p>
<p>Assisted by an army of conceptual architects, designers and
other consultants, they spend person-centuries and tons of money on
<strong>all-encompassing yet <em>paper</em> visions</strong> of a corporate
future that one day will solve all of today’s and tomorrow’s problems.</p>
<p>Their motto: <em><strong>Just wait and see,
we’re almost there!</strong></em> Of course, they never are, and never
will be. For the simple reason that <strong>the as-is world out there keeps moving at a higher speed
than the desired to-be world can be designed</strong> — let alone realized.</p>
<figure>
<img src="/images/are_we_there_yet.jpg" alt="Are we there yet? Almost" />
</figure>
<p>In the meantime, among the lower ranks of the organization, people are exposed
to the <strong>daily realities</strong> of business life: a fast-changing competitive
landscape, <a href="/blog/what-to-learn-when-and-why/">technological generations that last no more than a couple of years</a>,
and customers demanding reactive solutions, <em><strong>now</strong></em>. ICT professionals <a href="/blog/what-to-learn-when-and-why/">who take
care of themselves and their career</a> experiment with <a href="https://hbr.org/2018/05/agile-at-scale">agile
at scale</a>, <a href="https://www.holacracy.org">holacracy</a>, <a href="https://www.atlassian.com/devops">DevOps</a>, <a href="https://aws.amazon.com/what-is-cloud-computing/">cloud
computing</a> and other aspects of the <a href="http://www.peterhinssen.com/books/the-new-normal">new
normal</a>.
To fulfill their customers’ — and therefore their own
— <strong>need for speed</strong>, they rightfully expect their organization
to act as an adaptive, loosely coupled, self-organized and
self-(re)organizing network of small, autonomous cells.</p>
<p>The <strong>bravest</strong> departments, teams and individuals <strong>don’t wait</strong> for corporate
manna to fall out of the sky: they make the <strong>perilous desert journey
from reality to vision</strong> by themselves. While they may
have <strong>no final game plan</strong>, they are driven by strong <strong>intuition, belief and
leadership</strong> — not necessarily personified by a current member of their formal
management.</p>
<h1 id="solution-partial-culture-change">Solution: (partial) culture change</h1>
<p>Corporate reality always has many moving parts, so fixing
organizational sclerosis is never a walk in the park. The
undertaking is not for the faint of heart. Especially since
the corporate patient is not supposed to end up on the undertaker’s
table.</p>
<p>For what it’s worth, here are a
couple of <strong>truisms</strong>, i.e. commonly accepted truths or advice from the
<em>world out there</em>. Take them to heart to prevent pernicious
aspects of corporate culture from doing any further harm. <em>Or not.</em></p>
<p>So, to whom it may concern:</p>
<h3 id="adapt-or-die">#1. Adapt or die</h3>
<p>Whether it’s your job, your life or your company: nothing lasts
forever. As evolution has shown, species can survive over thousands of
generations. They do so through “selection”: individuals with slight variations
that happen to be better adapted than others to their changing
environments are the ones that reproduce.
Seen holistically over time and from a distance, it looks as if the species
has survived. Likewise, <strong>in order for a company (culture) to keep on
thriving</strong> across generations, <strong>it needs to adapt</strong> every now and
then. The driver for change, obviously, is the fast-moving political,
socio-economic, business and technological environment. In
organizations that rely on the status quo, aspects of
corporate culture that once were a great asset “suddenly” become a liability.</p>
<p>In today’s turbulent times, hanging onto existing company culture can
be reassuring: it gives a — possibly false! — sense of security and
control. As said, <strong>the obvious danger is that the environment changes
faster than your culture can afford.</strong> From the ancient civilizations to the
Kodaks and Nokias of this world, the hard lesson is this: adapt in time,
or you will die.</p>
<figure>
<img src="/images/dinosaurs-noahs-ark-oh-crap-today.jpg" alt="Dinosaurs: Oh crap, was that today?" />
</figure>
<p>Change brings (more) uncertainty, and uncertainty means risk. Nevertheless,
<strong>the biggest risk is not to take enough risk, or not fast
enough</strong>. Reinvent yourself, or someone else will.</p>
<h3 id="high-priests-are-just-women-in-funny-dresses">#2. High-priests are just (wo)men in funny dresses</h3>
<p>In any long-standing organization, culture is just as much a
matter of emotional experience as of rational learning. That’s why it
takes quite a while to fully integrate or assimilate into a new
environment. Unfortunate consequence: <strong>strong cultures take more time to
adapt, i.e. to <em>unlearn</em>,</strong> when such becomes a necessity.</p>
<p>Apart from norms, habits and regulations, company culture manifests itself
in the cumulative <a href="https://en.wikipedia.org/wiki/Newspeak"><strong>language and vocabulary</strong></a> that corporate high-priests
create, speak and disseminate among their followers. Cultural
assimilation is complete when the words and their sometimes special
meanings have been internalised to the extent that their usage
<em>feels</em> natural and normal.</p>
<p>Eye-opener: <strong>however important high-priests may look, they are fundamentally just
men or women in funny dresses.</strong> What I mean is: however valuable the high-priests’
statements have been historically, and however much their actions may seem logical to their
followers today, <strong>their methods and techniques are <em>not necessarily</em>
adapted to tomorrow’s problems.</strong></p>
<figure>
<img src="/images/highpriests.jpg" alt="High-priests" />
</figure>
<p>To give cultural change a chance, <strong>it is vital
to <em>unmask</em> the high-priests, and replace a sufficient number of
them</strong> with people who can credibly represent the next cultural
wave.</p>
<h3 id="dont-just-allow-agile-and-devops-embrace-them">#3. Don’t just <em>allow</em> Agile and DevOps: <em>embrace</em> them</h3>
<p>Any software professional who hasn’t been asleep for the last ten to fifteen
years knows that the <a href="http://agilemanifesto.org"><strong>Agile Movement</strong></a>, the <a href="https://www.atlassian.com/devops"><strong>DevOps approach</strong></a> and related
initiatives have profoundly changed the nature of the ICT profession,
across all industries. In these times of digital transformation,
jumping onto these trains is an absolute <strong>no-brainer</strong> — which
doesn’t mean it’s an easy task.</p>
<figure>
<img src="/images/devops.png" alt="The DevOps cycle" />
<figcaption><a href="https://medium.com/@neonrocket/devops-is-a-culture-not-a-role-be1bed149b0" title="DevOps is a culture, not a role!">Source: Irma
Kornilova. DevOps is a culture, not a role!</a></figcaption>
</figure>
<p><strong>High-priests</strong> who proclaim that their followers are <strong>not ready (yet)</strong> for
Agile, DevOps and the like, therefore totally miss the mark. They probably mean
that <em>they themselves</em> are <strong>not able and/or willing to
adapt</strong> — proving on the spot that they never will be.</p>
<p>Simple advice: Don’t listen to them, and <strong>do the right thing</strong>.</p>
<h3 id="meet-less-decide-more">#4. Meet less, decide more</h3>
<p>The number and quality of decisions are inversely proportional to the
number of meeting platforms and participants involved in making
them. In that spirit, I would strongly advise you to:</p>
<ul>
<li>abolish any meeting platform that fails to gather at least half
of its participants three times in a row</li>
<li>never attend an agenda-less meeting: it is probably useless anyway</li>
</ul>
<p>Some people justify the large number of meetings and meeting attendees by
the need to create <strong>broad support</strong> throughout the
organization. Which in turn is supposed to facilitate
speedy execution after the decision is made.</p>
<figure>
<img src="/images/dilbert_4_hour_meeting.jpg" alt="Dilbert: 4 hour meeting" />
</figure>
<p>In my experience, however, the “let’s get everyone on board up-front” approach
boils down to a <strong>self-inflicted veto power mechanism</strong>, as seen for
example in the <a href="https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/United_Nations_Security_Council_veto_power.html">United Nations Security Council</a> or the
<a href="http://www.consilium.europa.eu/en/council-eu/voting-system/unanimity/">Council of the European Union</a>. When world matters are at
stake, thoughtful stability may make sense; when companies need to make
and execute decisions fast, <strong>veto power kills progress</strong>.</p>
<h3 id="think-networks-not-hyperdimensional-spaces">#5. Think networks, not hyperdimensional spaces</h3>
<p>There is a good reason why MBA students all over the world are
commonly exposed to <a href="http://2x2matrix.com/downloads/bcg.pdf">Boston Consulting Group’s 2x2
matrices</a>: that seems to be the maximum number of dimensions
that ordinary mortals —including those same students, of
course— are able to visually digest.</p>
<p>An “intelligent” yet static design of an organization that requires its
members to think in <strong>three dimensions or more</strong>, stretches their mental
capabilities: it makes it <strong>too hard for
collaborators to understand their own place</strong> in the
organization. Indeed, as more and more conceptual dimensions are
added, the number of possible paths and connections through the
organizational and terminological search space grows
exponentially. Result: people can’t see the woods for the trees anymore.</p>
<p><strong><a href="https://en.wikipedia.org/wiki/Complex_adaptive_system">Complex, adaptable organisms and organizations</a></strong> <em>are</em> able to
prosper and function, though: examples from nature and biology abound.
The condition is that the individual cells that make them up only
need to know three things: <strong>their own purpose, that of their immediate
dependents, and (possibly) which services their dependees use.</strong> How
the cells want to operate <strong>internally, is for them —and only
them— to decide.</strong></p>
<figure>
<img src="/images/connected_company.jpg" alt="The connected company" />
<figcaption><a href="http://www.xplaner.com/connectedco/" title="The Connected Company">Source: Dave Gray, The Connected
Company</a></figcaption>
</figure>
<p><a href="https://medium.com/slingr-for-slack/what-year-did-bezos-issue-the-api-mandate-at-amazon-57f546994ca2">Jeff Bezos’ famous
2002 API mandate</a> is brilliant because it applies
the principle of <strong>information abstraction</strong> to <strong>teams <em>and</em> information
systems</strong> in one symbiotic go. To the outside world, the only thing
that matters are
the <strong>services</strong> offered, with a description of <strong>inputs and outputs</strong>, based on a
<strong>common information exchange protocol</strong> (e.g. <a href="https://en.wikipedia.org/wiki/Representational_state_transfer">REST</a>). From the inside, each team gets a
mandate to pick whatever technology suits them best to implement their
mission. There is no further need to get permissions from central
boards or other kinds of people with veto power.</p>
<p><a href="https://medium.com/slingr-for-slack/what-year-did-bezos-issue-the-api-mandate-at-amazon-57f546994ca2">Amazon</a> and any massively deployed microservice-based
software application demonstrate, paradoxically so, the structural
principles of our era: <strong>more complex yet less complicated</strong>
organizations that organize themselves around simple principles, allow
for <strong>more adaptability, flexibility and growth.</strong></p>
<h3 id="use-generally-available-state-of-the-art-it">#6. Use generally available, state-of-the-art IT</h3>
<p>The more complicated the way an organization is (re)designed and (re)built, the harder
it is for its collaborators to find who is responsible for
what. <strong><a href="https://www.aiim.org/What-is-Enterprise-Search">State-of-the-art enterprise search</a> technology</strong> becomes even more
important then. The time when a company could develop a competitive
advantage by building its own core information sharing systems lies
at least two decades behind us: Google was founded in 1998.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/LVV_93mBfSU" frameborder="0" allowfullscreen=""></iframe>
<p>Today’s knowledge workers are used to finding <strong>relevant information in
fractions of a second</strong>. That’s not only true for millennials, by the way.</p>
<p>On the information storage and processing side, <a href="https://blog.pa.com.au/cloud-2/cloud-vs-premise-debate/">the cloud debate is
over</a> too. <strong>There is no need to build and push your own cart, when you
can rent and drive someone else’s ten-ton truck.</strong> If you <em>really must</em> have
your own, then buy one. That is, if you have the time.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/uYGQcmZUTaw" frameborder="0" allowfullscreen=""></iframe>
<p>But first and foremost, hire and train your proverbial truck drivers!
<em>They</em> are the new bottleneck.</p>
<h3 id="want-to-innovate-build-the-path-to-production">#7. Want to innovate? Build the path to production</h3>
<p>To play the <strong>innovation game</strong>, many companies feel the need to
externalize their efforts in so-called <strong>skunkworks or
garages</strong>. There is certainly value in the argument
that internal processes focusing on operational excellence would
easily and immediately kill off any innovative approach. Backed by rules
and regulations, corporate process watchers are very
skilled indeed at fighting off non-compliant behaviour.</p>
<p>So from a short-term point of view, this makes some sense. In the longer run,
however, there is <strong>more value</strong> in a fundamental revision of the rules
and regulations that govern the <strong>core company information processes</strong>
themselves.</p>
<p>Simply stated, companies need to make the following <strong>structural changes</strong>:</p>
<ul>
<li>less centralized control, more trust in distributed knowledge and skills (see fewer meetings)</li>
<li>fewer committees, more self-organization and devolution of decision power (see Agile)</li>
<li>fewer hand-overs from silo to silo, more customer-focused end-to-end thinking and automation (see DevOps)</li>
<li>organize the company in terms of customer-focused value paths, that stream through networks of autonomous cells. Somewhat like physical goods that pass swiftly through a supply chain (see complex systems)</li>
<li>less top-down and idealistic descriptions of hyperdimensional organizations with roles and functions, more focus on locally decided purpose, self-development and self-determination (see my other post on <a href="/blog/what-to-learn-when-and-why/">what to learn when and why</a>)</li>
<li>think less in terms of projects and portfolios. <strong>Bring work to (networks of) (teams of) people, rather than people to work</strong></li>
</ul>
<p>In short: rent, buy and/or build the <strong>innovation highway to
production</strong> <em>first</em>: your ten-ton truck drivers will step forward,
and use it.</p>
<p><strong>Wanna bet?</strong></p>
<p><a href="/blog/truisms-to-counter-company-culture-traps/">Truisms to counter company culture traps</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on July 26, 2018.</p>/blog/the-more-things-change2018-05-21T12:00:00+02:002018-05-21T12:00:00+02:00Frederik Durant<p>One of the delights of being a millennial is that every new technology fad looks
completely novel, even if it isn’t. That thought crossed my mind when
attending a <a href="https://dialogflow.com/">DialogFlow</a> workshop at Google Brussels a
couple of weeks ago.</p>
<figure>
<img src="/images/echo-dot-google-home-mini.jpg" alt="Photo of Amazon Echo Dot and Google Home Mini" />
</figure>
<p>In our times of ever accelerating change,
technological generations last shorter and shorter. The World Wide Web
— better known as the Internet — went
mainstream 25 years ago, mobile (voice) telephony followed 7 years later.
The smartphone as we know it is barely 10 years old.
Data science reached the age of consent — <a href="https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century">sexiest
job</a>, remember — just 5 years ago. And if you
believe the newspapers, the age of artificial intelligence has now
finally arrived. With — would you believe it — the <a href="https://venturebeat.com/2016/09/01/are-chatbots-an-evolution-or-a-revolution/">chatbot</a>
aka conversational interface as one of its poster children.</p>
<p>Back to Google. Looking around in their classroom, I seemed to be the only
attendee older than 35. By a wide margin, I must admit. Logically, most of the
digital natives in the room started their professional life around or
after the arrival of the <a href="https://youtu.be/9hUIxyE2Ns8">iPhone</a>. Which means they were not
yet around during the early years of speech-driven phone applications
and call center automation, say the beginning of this century and
millennium.</p>
<p>Nostalgia alert: who remembers nowadays the fully automated speech services offered
in the (phone) cloud by companies such as <a href="https://en.wikipedia.org/wiki/Tellme_Networks">Tellme
Networks</a> (a former employer of mine), <a href="https://www.crunchbase.com/organization/bevocal">BeVocal</a>, and
<a href="https://en.wikipedia.org/wiki/Voxeo">Voxeo</a>? Let alone 20-year-old technologies like
<a href="https://en.wikipedia.org/wiki/VoiceXML">VoiceXML</a> or the <a href="https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language">Speech Synthesis Markup Language</a> aka SSML?</p>
<p>I’m afraid I do. For the simple reason that for the better part of a decade, <a href="http://blog.prompt-speechapps.com">I
made a living out of building voice applications</a> with said platforms
and technologies. Pleasant surprise: <a href="https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language">SSML</a> has survived the generational
gap. It is actively <a href="https://developers.google.com/actions/reference/ssml">supported in the Google Actions
Simulator</a>.</p>
<p>So when I asked our instructor if <a href="https://assistant.google.com/">Google Assistant</a> offered shared
revenue models <em>like the toll numbers in the years
before the smartphone GUI replaced the voice channel for
mobile information access</em>, he was speechless at first. Then he asked
politely if I could repeat the question. By the way: the answer was no,
even though technically speaking, there is certainly room for such
business models on the fulfillment side, using some form of
account linking and automated payment provider.</p>
<p>Since the workshop, I have dabbled with <a href="https://github.com/fdurant/dialogflow-rock-paper-scissors">a toy
chatbot</a> for Google <a href="https://dialogflow.com/">DialogFlow</a> and <a href="https://developers.google.com/actions/">Actions</a>. My
first impression is that the democratization of chatbot
development is both a curse and a blessing. The innate volatility of
spoken conversation means that the quality of the interaction
depends <em>at least</em> as much on conversational interface
design as on the technical platform it is implemented on. Nothing
new there, I’m afraid: user-friendly, intuitive web interfaces like
<a href="https://dialogflow.com/">DialogFlow</a> won’t change that. In fact, despite Google’s own <a href="https://dialogflow.com/docs/best-practices/agent-design">agent
design guidelines</a>, the ease of the point &amp; click development
interface might and will trick some developers into thinking that
chatbots are easy to build. <em>Quod non.</em></p>
<p>A few years into the VoiceXML era, the advent of integrated, multi-platform tools for application
development like <a href="https://en.wikipedia.org/wiki/VoiceObjects">VoiceObjects</a> made it easier for the industry to shift from bare
bones VoiceXML programming (a coding job) to the art of conversational
interface design (a voice user interface specialist’s job). In the <a href="https://developer.amazon.com/alexa">Amazon Alexa</a> and
<a href="https://store.google.com/product/google_home">Google Home</a> era, voice application framework providers like
<a href="https://www.jovo.tech">jovo</a> may benefit from studying these ancient predecessors,
lest they reinvent the wheel.</p>
<p>In that respect, a timeless book still worth reading is <a href="https://www.amazon.com/Voice-Interface-Design-James-Giangola/dp/0321185765/">Voice User
Interface Design</a> by Michael H. Cohen, James P. Giangola and Jennifer
Balogh. I bought it in … 2004.</p>
<p><a href="/blog/the-more-things-change/">Chatbots: the more things change, the more they are the same</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on May 21, 2018.</p>/blog/most-epic-professional-fails2018-03-25T12:00:00+02:002018-03-25T12:00:00+02:00Frederik Durant<p>We all make mistakes in life, don’t we? Especially when we’re at work.</p>
<figure>
<img src="/images/errare-humanum-est.jpg" alt="Photo of eraser gum with inscription 'errare humanum est'" />
</figure>
<p>Here’s my <strong>personal top-10</strong> of professional fails (so far), carefully collected over the years.
They range from slightly reckless behaviour and laughably embarrassing
situations, to outright unforgivable actions.</p>
<p>Enjoy!</p>
<h3 id="driving-180-kmh-with-a-harddisk-full-of-data">#10. Driving at 180 km/h with a hard disk full of data</h3>
<p>More than twenty years ago, when I was still in my twenties and
therefore immortal,
a customer of ours needed to urgently launch a
massive website containing tens of thousands of pre-generated pages.</p>
<p>Too much data to send over the wire or by <a href="https://en.wikipedia.org/wiki/IP_over_Avian_Carriers">avian carrier</a>,
so we went for the highway. Never was I driven faster from Brussels to
Luxembourg than on that particular day.</p>
<p>The website went online alright, but
I swore to never do that again.</p>
<p><em>Good</em>: <strong>Made the deadline</strong><br />
<em>Bad</em>: <strong>Almost made the deadline</strong></p>
<h3 id="stomach-problems-on-an-intercontinental-flight">#9. Stomach problems on an intercontinental flight</h3>
<p>What do you get when you combine a week of jetlag and lack of sleep
with a chicken curry inflight meal, and a glass of port?</p>
<p>A big
mess, a lot of hassle and a plastic bag full of clothes. Plus complimentary pajamas
carrying the British Airways logo — a real collector’s item!</p>
<p><em>Good</em>: <strong>deep respect</strong> for flight attendants and airport paramedics<br />
<em>Bad</em>: <strong>embarrassing walk</strong> in BA pajamas through Heathrow transit zone</p>
<h3 id="not-checking-company-critical-backup-tapes">#8. Not checking company-critical backup tapes</h3>
<p>This is a classic.</p>
<p>When I was working at one of Belgium’s first web
development shops in the mid-nineties, the external SCSI disk
containing half of all our managed websites suddenly decided to stop
working. To make matters worse, the backup tape also seemed corrupt.</p>
<p>Just when I was about to take the blame and write my resignation
letter, the disk came up again, and the company was saved. As was my job.</p>
<p>A few weeks
later, we got our state-of-the-art automated backup system, and a rack
full of fault-tolerant RAID disks.</p>
<p><em>Good</em>: <strong>never waste a good crisis</strong><br />
<em>Bad</em>: <strong>heart rate</strong> and <strong>blood pressure</strong></p>
<h3 id="firing-a-key-person-in-the-team-and-then-quitting">#7. Firing a key person in the team, and then quitting</h3>
<p>At that same startup, I had to let go of an important team member,
knowing that I might resign myself a week later. Which I did, once I
got the green light from my new employer.</p>
<p>Every so often, timing is everything.</p>
<p><em>Good</em>: <strong>the company survived</strong>, of course, at least for a while<br />
<em>Bad</em>: there’s little fun in dealing with <strong>ethical dilemmas</strong></p>
<h3 id="agreeing-to-get-paid-half-in-money-half-in-shares">#6. Agreeing to get paid half in money, half in shares</h3>
<p>In the mid-noughties, when I was a freelancer building speech-driven
phone applications, I was called in to program the one and only Beavis
&amp; Butthead Hotline. The New York startup I was working for didn’t have
a lot of cash, so I agreed to be paid in equity for half of my
work.</p>
<p>Needless to say, this assignment was by far the funniest
thing I’ve ever worked on. But also the least well paid.</p>
<p>Oh, never mind!</p>
<p><em>Good</em>: It was damned <strong>COOL heh hehheh heheheheh heh heh</strong><br />
<em>Bad</em>: I <strong>never met Beavis &amp; Butthead</strong> in person</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/OQUaguZawJQ" frameborder="0" allowfullscreen=""></iframe>
<p>So, what else did I suck at?</p>
<h3 id="telling-political-jokes-to-foreign-co-workers">#5. Telling political jokes to foreign co-workers</h3>
<p>Over lunch at work, I once jokingly made a comment to a French project
partner about “Hirochirac” — the
nickname given to the then French president Jacques Chirac. There was
no laughter: <em>apparemment, ce n’était pas très marrant.</em></p>
<p>Years later, when I should have known better, I wanted to display my knowledge
about the Po Valley to Italian colleagues we were visiting.
To do so, I innocently used the word <a href="https://en.wikipedia.org/wiki/Padania">Padania</a>, which
turned out to have acquired a
strong political connotation. A few colleagues choked on their
espressos and cappuccinos.</p>
<p>Don’t. Just don’t.</p>
<p><em>Good</em>: <strong>Follow the international news</strong><br />
<em>Bad</em>: <strong>Discuss it with locals</strong></p>
<h3 id="not-putting-all-contractual-clauses-on-paper">#4. Not putting all contractual clauses on paper</h3>
<p>When joining a Belgian company that had just been acquired by a very
well funded Silicon Valley startup, I was promised a sizeable amount of
stock options. Alas, I was gullible and naïve enough to settle for a
gentleman’s agreement with the local boss.</p>
<p>One year later, the daughter company went bankrupt. Stock
options were nowhere to be seen, of course.</p>
<p>Six years later, the mother company was sold to Microsoft for a bit
less than a billion dollars. A small part of which could have been mine.</p>
<p>Since then, everything is on paper. Always.</p>
<p><em>Good</em>: <strong>I could have made some decent money</strong><br />
<em>Bad</em>: <strong>I didn’t</strong></p>
<h3 id="agreeing-to-split-an-rd-team-in-r-and-d-subteams">#3. Agreeing to split an R&amp;D team into R and D subteams</h3>
<p>A couple of years ago, my boss wanted to make sure that the R&amp;D team I
was leading would spend enough time on innovative research, next to
the more operational work of developing models. He wanted to
achieve this by splitting the team into two virtual subteams, along
these lines.</p>
<p>Intuitively, I knew this was a very bad idea: most R&amp;D people — especially the ones
with a PhD — prefer to see themselves as
… researchers. Nevertheless, against
my own will and intuition, I gave in and complied with my boss’ wish.</p>
<p>It didn’t take a day for the team to lose its internal coherence, and
fall apart into two camps: the “winners” (researchers) and the “losers” (developers).</p>
<p>Next time: just say no.</p>
<p><em>Good</em>: I had been <strong>loyal to the chief</strong><br />
<em>Bad</em>: I was <strong>squeezed as a middle manager</strong></p>
<h3 id="telling-truth-to-ceo-bypassing-local-boss">#2. Telling truth to CEO, bypassing local boss</h3>
<p>The daughter company from item #4 was in dire straits, so the
mother company CEO decided to cut his losses and pull the
plug. Because I had, rightly so, lost all confidence in
our boss’s ability to represent, let alone defend,
his local staff, I sent a
mail directly to the CEO with my version of the truth, pleading for a reversal of his decision. To
no avail, of course.</p>
<p>A few hours later, I got one of the most unpleasant phone calls in my life. A
few weeks later, I was fired, together with a couple of other people. A few months later, everyone was fired.</p>
<p>That is, except for the local boss, who parked the exit funds meant for the turnaround of the daughter company in his
personal Luxemburg holding. He lived happily ever after, I
think. I never checked.</p>
<p><em>Good</em>: Keep your <strong>self-respect</strong> by staying <strong>loyal to your own beliefs</strong><br />
<em>Bad</em>: Be prepared to <strong>take the bullet</strong></p>
<h3 id="send-blame-mail-to-team-member-for-assumed-lack-of-motivation-with-whole-team-in-cc">#1. Send blame mail to team member for assumed lack of motivation, with whole team in Cc:</h3>
<p>A project team member wrote to the team that she was unable to perform a certain task
by a certain deadline, giving a reason that I deemed bogus. In a
moment of weakness, anger and frustration, I sent a not-so-friendly
reply mail, clearly singling her out. Three seconds after pushing the
Send button, I fully realized how stupid I had been.</p>
<p>Written and oral apologies followed to the team member, her boss and the team. It took a couple
of days to be on speaking terms again.</p>
<p><em>Good</em>: I <strong>assumed responsibility</strong> and <strong>apologized</strong><br />
<em>Bad</em>: <strong>Everything else</strong></p>
<h3 id="morale-of-the-story">Moral of the story</h3>
<p>Next time you’re acting stupidly, know that you’re not alone. There’s
quite some competition out there.</p>
<p><a href="/blog/most-epic-professional-fails/">Most epic professional fails</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on March 25, 2018.</p>/blog/hypekus2018-01-20T19:00:00+01:002018-01-20T19:00:00+01:00Frederik Durant<h1 id="artificial-intelligence">Artificial intelligence</h1>
<blockquote>
<p>Knowing humankind’s<br />
Usual intelligence<br />
<a href="https://www.youtube.com/watch?v=kZp-l22OD-c">Can AI do worse</a>?</p>
</blockquote>
<h1 id="blockchain">Blockchain</h1>
<blockquote>
<p>When the snake is <a href="https://en.wikipedia.org/wiki/Fork_(blockchain)">forked</a><br />
And schismatizes in two<br />
What is fool’s truth worth?</p>
</blockquote>
<h1 id="crypto-currency">Crypto-currency</h1>
<blockquote>
<p>Them <a href="https://en.wikipedia.org/wiki/Cryptocurrency">coins and tokens</a><br />
Soaped up in bubbly wallets<br />
Yearning for value</p>
</blockquote>
<h1 id="deep-learning">Deep learning</h1>
<blockquote>
<p><a href="https://en.wikipedia.org/wiki/Gradient_descent">Valley</a> in sight, when<br />
<a href="https://en.wikipedia.org/wiki/Selection_bias">Selection bias</a> abyss<br />
Swallows the climber</p>
</blockquote>
<h1 id="internet-of-things">Internet of things</h1>
<blockquote>
<p>On the Internet<br />
Each <a href="https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog">dog</a> knows a <a href="https://internetofbusiness.com/internet-smells-olfaction-via-nanomechanical-sensors/">smelly thing</a><br />
To <a href="https://techcrunch.com/2015/10/24/why-iot-security-is-so-critical/">sniff</a> that’s secret</p>
</blockquote>
<p><a href="/blog/hypekus/">Five hypekus — because we can</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on January 20, 2018.</p>/blog/what-to-learn-when-and-why2017-12-29T21:45:00+01:002017-12-29T21:45:00+01:00Frederik Durant<h1 id="a-forward-looking-story-with-hindsight">A forward-looking story, with hindsight</h1>
<p>In a previous century, towards the end of the eighties, I got interested in computers
thanks to <a href="https://en.wikipedia.org/wiki/WordPerfect">WordPerfect</a>. This word processor helped
me write my first master’s thesis - which, by the way, had nothing to do with computing.
Had I been born a few years sooner, I would surely have used a <a href="https://en.wikipedia.org/wiki/Typewriter">typewriter</a> instead.</p>
<p>In the early nineties, I learnt my first programming languages: <a href="https://en.wikipedia.org/wiki/Pascal_(programming_language)">Pascal</a>, <a href="https://en.wikipedia.org/wiki/Lisp_(programming_language)">Lisp</a> and <a href="https://en.wikipedia.org/wiki/Prolog">Prolog</a>.
If you’ve never heard of these, that’s fine. Today I hardly ever use them anymore.</p>
<p>Forward-looking “fact”, with hindsight: without WordPerfect, I would never have tried <a href="https://www.latex-project.org">LaTeX</a>. Nor <a href="https://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language">SGML</a>, <a href="https://en.wikipedia.org/wiki/HTML">HTML</a> or <a href="https://en.wikipedia.org/wiki/XML">XML</a>, for that matter.
And without Pascal, Lisp and Prolog, I would never have learnt and used <a href="https://en.wikipedia.org/wiki/Perl">Perl</a> in the nineties,
<a href="https://en.wikipedia.org/wiki/Java_(programming_language)">Java</a> in the noughties or <a href="https://en.wikipedia.org/wiki/Python_(programming_language)">Python</a> in this decade as my respective main programming language.</p>
<p>On the business side, a similar <strong>knowledge investment path</strong> is discernible in my career - it just started ten years later.
In 2000, the <a href="https://en.wikipedia.org/wiki/Lernout_&amp;_Hauspie">Lernout &amp; Hauspie</a> works council presented its members with balance sheets and profit &amp; loss statements, but I was unable to read them. [Not that they were accurate, but that is <a href="http://www.standaard.be/cnt/dexx05012001_001">another story</a>.]
So two years later, to bridge the knowledge gap, I found myself doing an MBA at <a href="https://www.vlerick.com/en">Vlerick</a>. Which, two more years later,
made me decide to start as a self-employed consultant in voice-driven dialog systems.</p>
<p>With the hindsight of time and experience, these and <a href="http://frederikdurant.com/blog/survive-data-science-bootcamp/">other</a> seemingly random knowledge investments do display
some internal logic.
In these times where the <strong><a href="https://en.wikipedia.org/wiki/Half-life_of_knowledge">half-life of knowledge</a></strong> is getting <strong>shorter and shorter</strong>, it is more
important than ever to make <strong>conscious investment choices</strong>.</p>
<p>So if you’re faced with the choice to spend your precious time, energy and money on a time-tested, mainstream
and/or brand new technology, here’s some advice.</p>
<h1 id="guideline-1-know-your-local-industry">Guideline 1: Know your (local) industry</h1>
<p>Even though knowledge and skills are easier to transmit and learn than ever, <strong>different industries and regions do move
at different speeds</strong>. In a Silicon Valley blockchain startup, or in a top-notch academic lab,
you’re more likely to <em>produce and share</em> new knowledge and software code, than to “simply” consume it. Conversely,
in a traditional retail company in Belgium, it’s definitely possible to be innovative while using
not-so-new technologies.</p>
<p>Both options are valid, as long as you <strong>know the league you’re playing in</strong>. The advantage of playing in a (s)lower league
is that, all other things being equal, introducing a new technology comes with a lower (technological) risk.</p>
<p>Don’t ride a bicycle on a highway; likewise, don’t drive a Formula 1 car on a local road. <strong>Be in sync</strong> with your local industry’s
pace of innovation.</p>
<p>Then again, also be aware that <a href="http://bigthink.com/think-tank/big-idea-technology-grows-exponentially"><strong>innovation speed is sharply on the rise</strong></a>, including in many traditional “slower” industries.</p>
<h1 id="guideline-2-know-yourself">Guideline 2: Know yourself</h1>
<p>Are you an <strong>early adopter</strong>, or would you rather take a <strong>more conservative</strong> approach? Both are fine,
as long as you know what gives you energy, and what pays your - and your employer’s - bills.</p>
<p>Easily bored innovators may adopt a <strong>serial</strong> approach and happily jump from one novel technology to the next.
However, if overstretched and out-of-sync, these fast-moving innovators may <strong>not stay around long enough</strong> to see their
innovations make it into production. Which somehow defeats the purpose.</p>
<p>The option at the other extreme is to learn, master and apply a winning technology <strong>from its cradle to its grave</strong>.
In large and (supposedly?) stable industries and companies,
many people make a living out of supporting tried and tested technologies.
Only to discover one day that they’ve been <strong>feeding a dinosaur</strong>.</p>
<p>Both personal strategies are fine and come with their own risks. Just know what you’re doing and learning, and why.</p>
<h1 id="guideline-3-diversify">Guideline 3: Diversify</h1>
<p>A particularly <strong>effective and robust knowledge investment strategy</strong> can be to ride the technology
<a href="https://en.wikipedia.org/wiki/Hype_cycle">hype cycle</a> simultaneously at different points of the curve. Just as in a financial portfolio, it never hurts
to <strong>spread knowledge investment risk</strong>
across multiple technologies: pick a stable one from the <strong>plateau of productivity</strong> to pay for today’s bills, and a few smaller but
more risky ones from the <strong>steeper and more slippery slopes</strong> of the curve, to pay for tomorrow’s bills.</p>
<p>Secondly, on a wider scale, also balance your investments according to their
<strong>expected payback time</strong>. Knowledge decay may be accelerating on average,
but an MBA is still a more time-tested investment than learning the newest
web development stack. <strong>Mix infrequent long-term investments</strong> (personal examples: MA, MSc, MBA, <a href="https://www.thisismetis.com">data science bootcamp</a>)
<strong>with more frequent short-lived investments</strong>
(recent examples: MOOCs on <a href="https://www.coursera.org/specializations/gcp-data-machine-learning">Google Cloud Platform</a> and on <a href="https://www.coursera.org/specializations/deep-learning">Deep Learning</a>).</p>
<p>Combining short and longer investment cycles significantly reduces the risk of falling through the professional cracks when an
industry or company gets disrupted. In the best scenario, you may even get a chance to join the disruptor on its path
to becoming the next incumbent.</p>
<h1 id="guideline-4-show-dont-tell">Guideline 4: Show, don’t tell</h1>
<p>Are you willing to blindly link your professional fate to that of a traditional, incumbent company? If not, you should
at all times <strong>be willing and able to leave tomorrow</strong>, so to speak. To reach that level of agility and freedom, never stop investing
in your knowledge. But more is needed.</p>
<p>Money is a convention based on trust, and so is knowledge. <strong>Anyone can be an expert, on the condition that others are willing to
recognize it</strong> - most notably by paying for the expertise. To reach that level of trust, it never hurts to demonstrate
newly acquired knowledge and skills through code, blogs, videos or any other <strong>tangible means visible to the outside world</strong>.</p>
<p>Each new technology offers its learners the opportunity to move through the ranks from apprentice via practitioner to
master and trainer. Over the last few years, <a href="http://github.com/">GitHub</a>, <a href="https://www.coursera.org">Coursera</a> and the like
have often been more <strong>credible sources of trust</strong> than (expensively paid for) professional certifications from established technology
vendors. The latter have an obvious conflict of interest, at least to some extent.</p>
<h1 id="conclusion">Conclusion</h1>
<p>To invest wisely in your career, <strong>create and actively manage your personal knowledge portfolio</strong>. Know your industry
and yourself, so you can make your personal choices in a professional context that is willing to provide a return
for your past, current and future investments <strong>in a time horizon that suits you</strong>.
In the meantime, keep on acquiring new knowledge so as to build up <strong>agility
and resilience</strong> against professional earthquakes that can happen at any time. <strong>Diversification</strong> by payback
time, hype cycle position and personal risk profile allows for a <strong>balanced and therefore robust portfolio</strong>.</p>
<p>Happy learning!</p>
<p><a href="/blog/what-to-learn-when-and-why/">What to learn when, and why</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on December 29, 2017.</p>/blog/unexpected-encounters2017-11-18T01:00:01+01:002017-11-18T01:00:01+01:00Frederik Durant<h1 id="lembarras-du-choix">L’embarras du choix</h1>
<blockquote>
<p>Ferré me fait rire<br />
Dalida me fait pleurer<br />
Le Temps? Il s’en fout</p>
</blockquote>
<iframe width="560" height="315" src="https://www.youtube.com/embed/ZH7dG0qyzyg?ecver=2" frameborder="0" allowfullscreen=""></iframe>
<p> <br /></p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/oqFK-2X0xsc" frameborder="0" allowfullscreen=""></iframe>
<h1 id="die-welt-von-gestern">Die Welt von Gestern</h1>
<blockquote>
<p>Amidst the ashes<br />
Of the dancing empire<br />
Words left on <a href="https://en.wikipedia.org/wiki/The_World_of_Yesterday" target="_paper">paper</a></p>
</blockquote>
<figure>
<img src="/images/Stefan_Zweig_1900_cropped.jpg" alt="Stefan Zweig around 1900" />
<figcaption><a href="https://en.wikipedia.org/wiki/The_World_of_Yesterday">Photo Credit: Wikipedia</a></figcaption>
</figure>
<h1 id="der-tod-in-venedig">Der Tod in Venedig</h1>
<blockquote>
<p>At <a href="https://en.wikipedia.org/wiki/Gustav_Mahler" target="_gustav">Gustav</a>’s gravestone<br />
in <a href="https://www.gustav-mahler.eu/index.php/plaatsen/139-austria/vienna/1158-grinzing-cemetery" target="_grinzing">Grinzing</a>, young <a href="https://en.wikipedia.org/wiki/Anna_Mahler" target="_anna">Anna</a> stood<br />
heading for <a href="https://en.wikipedia.org/wiki/Highgate_Cemetery" target="_highgate">Highgate</a></p>
</blockquote>
<iframe width="560" height="315" src="https://www.youtube.com/embed/BJT5BUZr_9Y?ecver=2" frameborder="0" allowfullscreen=""></iframe>
<p><a href="/blog/unexpected-encounters/">Unexpected encounters</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on November 18, 2017.</p>/projects/firstname-network-belgium2016-10-30T16:45:00+01:002016-10-30T16:45:00+01:00Frederik Durant<p>I’ve been working — well, playing — with network visualization tools like <a href="https://gephi.org">Gephi</a> or <a href="http://www.yworks.com/products/yed">yEd</a> for a couple of years now.
But for confidentiality reasons, none of this work could be openly shared.</p>
<p>As it happens, yesterday afternoon, with a cosy cup of coffee and an even cosier piece of rice tart at my favorite <a href="http://www.lepainquotidien.be/en/store/le-pain-quotidien-sablon/">Pain Quotidien</a>,
I downloaded the <a href="http://statbel.fgov.be/nl/modules/publications/statistiques/bevolking/bevolking_-_voornamen_van_de_pasgeborenen_1995-2014.jsp">full list of first names given to babies</a> born in Belgium over the last 20 years. And wondered how this data set could be turned into a network.</p>
<p>After a moderately short labor of <a href="https://github.com/fdurant/belgian_firstname_network">134 lines of program code</a> (including quite a number of fruitful multiplications), two <em>orthographic word similarity networks</em> were born. In layman’s terms: similar-looking names are linked, and therefore visualized close(r) to one another.</p>
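<p>For readers who want to see the idea in miniature: the sketch below is not the code from the linked repository, just a hypothetical illustration of how similar spellings can be turned into network edges. It assumes plain Levenshtein edit distance as the similarity measure and a cut-off of one edit; the function names <code>edit_distance</code> and <code>similarity_edges</code>, the <code>max_distance</code> parameter and the sample names are all made up for this example.</p>

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def similarity_edges(names, max_distance=1):
    """Link every pair of distinct names at most max_distance edits apart.

    The threshold is an assumption for this sketch, not a value taken
    from the original project.
    """
    unique = sorted(set(names))
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if edit_distance(a.lower(), b.lower()) <= max_distance
    ]


# Hypothetical sample, not taken from the Statbel data set.
sample = ["Marie", "Marieke", "Maria", "Mario", "Jef", "Jeff", "Jens"]
edges = similarity_edges(sample)
for a, b in edges:
    print(a, "-", b)
```

<p>The resulting edge list can be exported to a graph file and laid out in a tool such as Gephi or yEd. Comparing every pair of names is quadratic in the number of names, which is acceptable for a couple of thousand entries but would need a smarter candidate search for much larger lists.</p>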
<p>The <a href="/projects/firstname-network-belgium/pdf/1902_first_names_for_girls_network.pdf">girls’ name network</a> connects 1900 names. Here’s a sample:</p>
<p><a href="/projects/firstname-network-belgium/pdf/1902_first_names_for_girls_network.pdf"><img src="/images/girl_name_network.jpg" /></a></p>
<p>Marie(ke) is quite a popular name indeed:</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/wfGDpzL9H7Y" frameborder="0" allowfullscreen=""></iframe>
<p>The <a href="/projects/firstname-network-belgium/pdf/1500_first_names_for_boys_network.pdf">young boys’ network</a> contains 1500 names in total, including:</p>
<p><a href="/projects/firstname-network-belgium/pdf/1500_first_names_for_boys_network.pdf"><img src="/images/boy_name_network.jpg" /></a></p>
<p>So no, Jef, you’re not alone!</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/5EpEW82p4i8?ecver=2" frameborder="0" allowfullscreen=""></iframe>
<p>Together, the <a href="/projects/firstname-network-belgium/pdf/1500_first_names_for_boys_network.pdf">boys’</a> and <a href="/projects/firstname-network-belgium/pdf/1902_first_names_for_girls_network.pdf">girls’</a> graphs provide a holistic overview of the Belgian baby name space.</p>
<p>So, young or not-so-young friends &amp; colleagues: if you’re in procreation mode or expecting, and want your newborn to bear that fancy name: <em>no more excuses</em>.</p>
<p><a href="/projects/firstname-network-belgium/">Mapping the Belgian baby name space</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on October 30, 2016.</p>/blog/club-chair-bookstores2016-03-06T08:00:00+01:002016-03-06T08:00:00+01:00Frederik Durant<p>One of my favorite pastimes since I was a philology student in the late eighties has
been, rather unsurprisingly, visiting bookstores. In my <a href="http://www.kuleuven.be/english">student</a> town <a href="https://en.wikipedia.org/wiki/Leuven">Leuven</a>, I spent
hours - and relative tons of money - at <a href="http://boekhandelpeeters.be/nl">Boekhandel Peeters</a>, specialized in the liberal
arts. For computing and other
so-called hard sciences, Wouters was my
preferred place to go. It sadly <a href="http://www.boekblad.nl/acco-neemt-deel-fonteyn-wouters-over.121794.lynkx">went bankrupt</a> in 2006,
when it was swallowed by the cooperative <a href="https://www.acco.be/en/boekhandel/onzeboekhandels">Acco</a>, itself founded in
1960.</p>
<p>In 1995, just after I started my professional career, <a href="https://en.wikipedia.org/wiki/Jeff_Bezos">Jeff
Bezos</a> founded <a href="http://www.amazon.com">Amazon.com</a>. The brick-and-mortar
bookstore as we knew it was indeed never going to be the same as before. In the
US, super-bookstore chain <a href="https://en.wikipedia.org/wiki/Borders_Group">Borders</a> <a href="http://business.time.com/2011/07/19/5-reasons-borders-went-out-of-business-and-what-will-take-its-place/">didn’t adapt fast
enough</a>, started going downhill and finally
<a href="http://dealbook.nytimes.com/2011/02/16/borders-files-for-bankruptcy/">disappeared</a> in 2011. <a href="http://stores.barnesandnoble.com">Barnes &amp; Noble</a>
gobbled them up.</p>
<p>As much as I appreciate Amazon’s search capabilities and limitless offering, I
still prefer the smell and touch of physical books and, by extension,
bookstores. For the latter, on the condition that they have <strong>something extra</strong> to
offer. That can be a combination of:</p>
<ul>
<li>a choice adapted to my specialized taste - quality always trumping quantity</li>
<li>a <a href="https://en.wikipedia.org/wiki/Club_chair">club chair</a>, vital for bringing the <strike>buyer</strike> reader into the correct mental state</li>
<li>a cup of coffee or tea, with or without <a href="https://en.wikipedia.org/wiki/Madeleine_(cake)">madeleine</a> (or similar)</li>
<li><em>à la limite</em>, silent whispers from the other <em>flâneurs</em></li>
</ul>
<p>Yesterday, I was lucky enough to discover such a place in the <a href="http://albertine.com/about-us/">Librairie
Albertine</a> in New York City. The whole atmosphere breathes <em>luxe, calme et volupté</em>, as <a href="https://en.wikipedia.org/wiki/Charles_Baudelaire">Charles B.</a> would say.</p>
<figure>
<img src="/images/albertine.jpg" alt="Marcel Proust Reading Room at Albertine" />
<figcaption>The Marcel Proust Reading Room at the Librairie Albertine in NYC</figcaption>
</figure>
<p>Other bookstores that I have visited and particularly like include:</p>
<ul>
<li><a href="http://www.passaporta.be/en/home">Passa Porta</a> and <a href="http://www.tropismes.com">Tropismes</a> in Brussels</li>
<li><a href="https://stores.barnesandnoble.com/store/2234">Barnes &amp; Noble on 5th Avenue</a> in New York City</li>
<li><a href="http://www.cambridge.org/about-us/visit-bookshop">The Cambridge University Press Bookshop</a> in Cambridge, England</li>
<li><a href="http://web.mit.edu/bookstore/www/">The MIT Press Bookstore</a>, <a href="http://store.thecoop.com">Coop</a> and <a href="http://www.harvard.com/about/hbs_in_brief/">Harvard Bookstore</a> in Cambridge, Massachusetts</li>
</ul>
<p>Ironically, Amazon.com recently announced <a href="http://www.cnbc.com/2015/11/03/amazons-next-chapter-opening-a-physical-bookstore.html">plans to open
hundreds of <strong>physical</strong> bookstores</a>.</p>
<p>I fear the worst, but hope for the best!</p>
<hr />
<p><strong><small>Note: article updated on Sun March 7th, 2016</small></strong></p>
<p><a href="/blog/club-chair-bookstores/">In praise of the club chair bookstore</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on March 06, 2016.</p>/blog/open-belgium-20162016-03-01T23:40:00+01:002016-03-01T23:40:00+01:00Frederik Durant<p>Being <a href="/blog/changing-course/">in-between jobs</a> for a (short) while, I spent the leap day yesterday at the
<a href="http://2016.openbelgium.be/">Open Belgium conference</a> in Antwerp. The announcement of the federal <a href="http://www.openknowledge.be/2015/07/24/green-light-for-the-belgian-federal-open-data-strategy/">Open Data strategy</a>
in December 2015 has raised new expectations in the Belgian open data community. So, how are we doing?</p>
<p>During the opening panel discussion, <a href="http://www.fedict.belgium.be/nl">Fedict</a> Innovation Manager <a href="https://be.linkedin.com/in/christinecopers">Christine Copers</a>
warned that the shift to open government data is not going to happen overnight. Take for example
the <a href="http://www.ngi.be/">National Geographic Institute</a>. From its <a href="http://www.ngi.be/Common/NGI_2013_verslag.pdf">2013 annual report</a>, we learn that more than 15%
of annual income is self-generated. At this moment, giving away map data for free without direct monetary compensation
seems like a no-go.</p>
<p>Yet all is not lost. Since 1999, the Flemish authorities have invested €93 million into the development of the Large-Scale
(Geographical) Reference Database. Keeping this database up to date requires another €7 million per annum. Nevertheless,
in December 2015, the database was made freely available under the <a href="https://dov.vlaanderen.be/dovweb/html/pdf/Gratis%20Open%20Data%20Licentie.pdf">Gratis Open Data Licentie Vlaanderen</a>. A simple registration
is all that’s required. The open map data have recently supported the creation of
a map of available youth space in the Flemish region. Local youth organizations including <a href="https://chiro.be/english">Chiro Vlaanderen</a> and
<a href="https://www.scoutsengidsenvlaanderen.be">Scouts en Gidsen Vlaanderen</a> are contributing themselves to create this extra layer, to their own benefit and that of society
at large.</p>
<p>As a regular attendee - or should I say spectator - of <a href="http://www.dilbeek.be/bestuur-administratie/gemeentebestuur/gemeenteraad.html">my own municipal council</a>,
I participated actively in a brainstorm session on open data use cases
for local government. One topic was proactive transparency in local decision making.
What if the complete process leading to a local council decision was public by default,
and open for Wikipedia-style amendments by any citizen or other stakeholder? Even though the final vote would
still be cast by elected representatives,
much of a decision’s quality lies in how it was prepared. Civic engagement platforms like <a href="http://citizenlab.co">CitizenLab</a> are available <em>right now</em>
to serve as modern communication bridges between municipalities and citizens. To prevent reinvention of
the wheel, Flemish organizations like <a href="http://www.vvsg.be/">VVSG</a> and <a href="https://www.v-ict-or.be">V-ICT-OR</a> can (read: must) play a coordinating
and facilitating role here, without doing all the work themselves. Even more so, because municipal budgets are scarcer than ever.</p>
<p>Data journalist <a href="http://www.maartenlambrechts.be/">Maarten Lambrechts</a> listed do’s and don’ts
for data publishers like <a href="http://ec.europa.eu/eurostat">Eurostat</a>, <a href="http://www.oecd.org">OECD</a>, <a href="http://statbel.fgov.be">Statistics Belgium</a>
or any other local data provider.
As a data exchange format, PDFs are evil. As a story inducer, aggregate figures like the mean are useless.
To give journalists and their graphic collaborators an incentive to comment on your data and graphs, make sure
they can be easily recreated straight from the source. For complex data that require more time to handle, an embargo
works just fine. To see what’s possible when all these conditions are met, check out this <a href="http://www.tijd.be/ondernemen/transport/Interactief_De_pendelaars_van_en_naar_uw_gemeente.9734600-3084.art?ckc=1">interactive map of
daily commuters in Belgium</a>.</p>
<p>The maxim “open your data and they will come and build” has not yet been proven in the (Belgian) field.
Nevertheless, success will keep on depending on the collaboration between (governmental) publishers and users.
In a data publication project, elaborate requirements, specifications and cost estimates are not the way to go.
A better approach is to start small, and work iteratively towards on-the-spot discovery
of new uses and applications, which may in turn lead to new data requirements.
To promote this agile approach, large government agencies and institutions need a cultural change in the way they handle
public procurement - if only to give start-ups in this field a reasonable chance to win their first deals.</p>
<p><a href="/blog/open-belgium-2016/">Impressions from Open Belgium 2016</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on March 01, 2016.</p>/blog/new-york-video2016-01-16T00:00:00+01:002016-01-16T00:00:00+01:00Frederik Durant<p>One year ago, I was in New York for the experience of a lifetime.</p>
<p>I could turn this into a very long blog post, but …</p>
<blockquote>
<p>Words do not express thoughts very well. They always become a little different immediately they are expressed,
a little distorted, a little foolish.
And yet it also pleases me and seems right that what is of value and wisdom to one man seems nonsense to another.<br />
– <cite>Hermann Hesse</cite></p>
</blockquote>
<p>Hence this collection of moving pictures - you can take that literally, or not.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/Qyx9-PFJR4k" frameborder="0" allowfullscreen=""></iframe>
<p>The music that best reflects my mood of the day is by jazz trumpeter Kenny Dorham.</p>
<p><a href="/blog/new-york-video/">New York, one year ago</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on January 16, 2016.</p>/blog/changing-course2015-12-13T16:00:00+01:002015-12-13T16:00:00+01:00Frederik Durant<p>People come, people go.</p>
<p>Earlier this week, I resigned from speech &amp; language company <a href="http://www.nuance.com/index.htm">Nuance Communications</a>, and signed with <a href="http://www.colruytgroup.be/en">Colruyt Group</a>.</p>
<p>For my foreign friends: <a href="http://www.colruytgroup.be/en">Colruyt Group</a> is Belgium’s largest food
retailer - among other things - with a 24.7% local market share (source: <a href="http://www.gondola.be/nl/content/gondola-scant-de-belgische-food-retail-76829">gondola.be</a>).</p>
<p>From mid-March 2016 on, I’ll be mining Colruyt’s vast data resources,
in search of actionable insights and innovative applications.</p>
<p>I can’t be very specific at this moment,
but <a href="https://en.wikipedia.org/wiki/Text_mining">text mining</a> and <a href="https://en.wikipedia.org/wiki/Process_mining">process mining</a> will likely be on my radar. While
I did some work on <a href="http://www.clips.ua.ac.be/projects/biomint-biological-text-mining">(biological) text mining</a> before - it’s a classical subdiscipline of <a href="https://en.wikipedia.org/wiki/Natural_language_processing">Natural
Language Processing</a> - process mining is rather new to me. Which makes it all
the more exciting, of course. I’m already reading an <a href="http://www.processmining.org/book/start">introductory
book</a> and taking the <a href="https://www.coursera.org/course/procmin">Coursera course</a>.</p>
<p>Looking forward to 2016!</p>
<p><a href="/blog/changing-course/">Changing course, once more</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on December 13, 2015.</p>/blog/corporate-buddhism2015-10-03T14:01:00+02:002015-10-03T14:01:00+02:00Frederik Durant<h1 id="conference-call">Conference call</h1>
<blockquote>
<p>Let’s all go on mute<br />
One hour full of silence<br />
Now that’s a brainstorm</p>
</blockquote>
<h1 id="work">Work</h1>
<blockquote>
<p>There’s no I in ‘team’<br />
They say - and no U either<br />
Then why are we here?</p>
</blockquote>
<h1 id="project-planning">Project planning</h1>
<blockquote>
<p>So many milestones<br />
Along the critical path<br />
To stumble upon</p>
</blockquote>
<h1 id="acceleration">Acceleration</h1>
<blockquote>
<p>Faster and faster<br />
Time reduced to one moment<br />
Absolute standstill</p>
</blockquote>
<h1 id="hierarchy">Hierarchy</h1>
<blockquote>
<p>With each new layer<br />
The game of Chinese whispers<br />
Beautifies the truth</p>
</blockquote>
<p><a href="/blog/corporate-buddhism/">Corporate buddhism</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on October 03, 2015.</p>/blog/poetry2015-07-29T14:01:00+02:002015-07-29T14:01:00+02:00Frederik Durant<h1 id="memories">Memories</h1>
<blockquote>
<p>Through the veil of time<br />
Distant features once so close<br />
Glimmer on the mind</p>
</blockquote>
<h1 id="city-walls">City walls</h1>
<blockquote>
<p>High on the rooftops<br />
Water towers bleed to death<br />
Staining the brownstones</p>
</blockquote>
<h1 id="broken-light">Broken light</h1>
<blockquote>
<p>On the window sill<br />
The glass shines, filled with moonbeams<br />
scattered on the floor</p>
</blockquote>
<h1 id="winter">Winter</h1>
<blockquote>
<p>Snowflake by snowflake<br />
Early spring’s nipped in the bud<br />
Flowers in the clouds</p>
</blockquote>
<h1 id="moments">Moments</h1>
<blockquote>
<p>From tomorrow on<br />
Yesterday won’t be the same<br />
as it was today</p>
</blockquote>
<p><a href="/blog/poetry/">Musings from a distant present</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on July 29, 2015.</p>/projects/kiva-talk-brussels-data-science-community2015-05-06T01:55:00+02:002015-05-06T01:55:00+02:00Frederik Durant<p>On Thursday April 23, 2015 I was invited to present my <a href="/projects/kiva-loan-funding-predictor-project/">loan
funding predictor project for Kiva.org</a> at the <a href="http://www.meetup.com/Brussels-Data-Science-Community-Meetup/events/219310846/">Data for Good Meetup organized by the
Brussels Data Science Community</a>.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/jZgtw-eEPPk" frameborder="0" allowfullscreen=""></iframe>
<p>Here is the <a href="https://www.parleys.com/tutorial/meetup-data4good-proof-concept-micro-finance-loan-funding-predictor-kiva-org"><strong>link to the 27.5-minute video</strong></a> of the
presentation, available exclusively to members of the Brussels Data
Science Community.
To gain access, simply register first via your Facebook,
LinkedIn, Twitter or Google Plus account.</p>
<p>If you prefer not to register or are just short on time, here’s the slide deck:</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/E3YuFhceWq3Q78" width="800" height="575" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;
border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen=""> </iframe>
<p><strong>Postscript (May 14, 2015)</strong></p>
<p>Ultimately, and with hindsight, all four soon-to-expire example
loans (<a href="http://www.kiva.org/lend/858602">1</a>, <a href="http://www.kiva.org/lend/858607">2</a>, <a href="http://www.kiva.org/lend/858570">3</a>, <a href="http://www.kiva.org/lend/857469">4</a>) from this
presentation <em>did</em> get fully funded, whereas the model
predicted that only one of them would. Does this mean the model is
flawed? Not necessarily.</p>
<p>As explained, the prediction is <em>a priori</em>: it
does not take into account any effect introduced by, to name one
thing, the prominence of a loan on <a href="http://www.kiva.org/lend#/?sortBy=expiringSoon">Kiva’s web page with loans that are about to
expire</a>. The mere act of highlighting a loan on this page greatly
increases the probability that a potential lender visiting that page
will actually contribute to the loan.</p>
<p>It would be interesting to know which algorithm the Kiva
website builders employ to drive this ranked list. If the algorithm is random,
different viewers see different loan proposals; in that case,
influence should be minimal, since multiple lenders are needed to
fully fund a loan. If not, the act of consistent highlighting
obviously steers lending behavior, favoring some loan proposals over
others.</p>
<p>The really interesting question, now, is <strong>whether the a
priori prediction offered by the model should play a role in
highlighting soon-to-expire loans</strong> (and/or other types of featured
loans).
Highlighting loans with higher predicted a priori probabilities should increase the productivity of money in the
system: that’s a global benefit. The other side of the
coin, the ethical one, is whether such a global optimization is fair to loan proposals
with lower predicted scores.</p>
<p>As discussed during the <a href="https://www.parleys.com/tutorial/meetup-data4good-proof-concept-micro-finance-loan-funding-predictor-kiva-org">Q&amp;A session after the presentation</a>, it is up to
Kiva to decide on the right trade-off between individual fairness and
global productivity of money.</p>
<p><a href="/projects/kiva-talk-brussels-data-science-community/">Presenting the Kiva project to the Brussels Data Science Community</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on May 06, 2015.</p>/blog/age-prediction-child-diseases2015-05-01T18:00:00+02:002015-05-01T18:00:00+02:00Frederik Durant<p><img style="float:left; margin:0px;" width="75" src="/images/microsoft_how_old/actual_46_estimated_38_360x520px.png" />
<img style="float:left; margin-left:51px" width="75" src="/images/microsoft_how_old/actual_46_estimated_39_600x800px.png" />
<img style="float:left; margin-left:51px" width="75" src="/images/microsoft_how_old/actual_45_estimated_42_600x800px.png" />
<img style="float:left; margin-left:51px" width="75" src="/images/microsoft_how_old/actual_45_estimated_43_440x540px.png" />
<img style="float:left; margin-left:51px" width="75" src="/images/microsoft_how_old/actual_46_estimated_43_360x510px.png" />
<img style="float:left; margin-left:51px" width="75" src="/images/microsoft_how_old/actual_46_estimated_45_400x540px.png" /></p>
<p><br style="clear:both" /></p>
<p><img style="clear:both; float:right; margin:5px" width="75" src="/images/microsoft_how_old/actual_46_estimated_47_550x750px.png" />
In this <a href="http://www.secondmachineage.com">(second) machine age</a>, it was only a matter of time before
machines would start guessing our age. Researchers from Microsoft have
recently released an online tool called “<a href="http://how-old.net">How Old Do I
Look?</a>”. When given a photo, the tool will not only tell you how
old you are, but also your gender.</p>
<p><img style="clear:both; float:right; margin:5px" width="75" src="/images/microsoft_how_old/actual_45_estimated_48_440x550px.png" />
Since one is never too old to learn
— especially about oneself — I gave
it a go. Unlike this <a href="http://www.theguardian.com/media/2015/may/01/how-old-do-i-look-another-way-to-feel-bad-about-yourself-online">rather biased article in The
Guardian</a>, I wanted to use more than a single
data point per person to evaluate the predictor’s quality. (Data) Science oblige!</p>
<p><img style="clear:both; float:right; margin:5px" width="75" src="/images/microsoft_how_old/actual_45_estimated_54_460x560px.png" />
Enter my often mocked selfie archive, painstakingly collected over the
years for exactly this kind of scientific purpose. Thirteen pictures
from the last year were pseudo-randomly selected, with some variation in
hairstyle, location, sunlight, season, hours of sleep, bespectacledness, and headwear.</p>
<p><img style="clear:both; float:right; margin:5px" width="75" src="/images/microsoft_how_old/actual_45_estimated_54_580x700px.png" />
Let’s start with the good news: in 13 out of 13 cases, my gender was determined as
being of the <strong>male</strong> kind. To whom it may concern, and for the record: this prediction is <strong>100%
congruent with the actual situation</strong>.</p>
<p><img style="clear:both; float:right; margin:5px" width="75" src="/images/microsoft_how_old/actual_45_estimated_55_420x540px.png" />
Then again, according to Microsoft, my <strong>average age</strong> over the last year
was (38 + 39 + 42 + 43 + 43 + 45 + 47 + 48 + 54 + 54 + 55 + 55 + 58) /
13 = <strong>47.8</strong> years. This is <strong>almost 2 years more than the actual</strong>
average age I had when the photos were taken. Not brilliant, but not
that bad either, given the relatively small test sample size.</p>
<p><img style="clear:both; float:right; margin:5px" width="75" src="/images/microsoft_how_old/actual_45_estimated_55_470x570px.png" />
With a <strong>standard deviation</strong> of <strong>6.7 years</strong> (about 1/7 of the
average), the age predictor does allow itself a fair amount of
variability. As the saying goes: <em>with age comes wisdom, but sometimes age comes
alone.</em> I’m talking about the predictor, of course — it may
still have some child diseases to outgrow.</p>
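<p>For readers who want to check the arithmetic, here is a minimal sketch that recomputes both figures from the thirteen estimates listed above. One assumption: the post does not say which standard deviation was used, so the sketch takes the sample version (n - 1 in the denominator), which is the one that rounds to 6.7.</p>

```python
from statistics import mean, stdev

# Estimated ages quoted in the post, one per selfie.
estimated_ages = [38, 39, 42, 43, 43, 45, 47, 48, 54, 54, 55, 55, 58]

average = mean(estimated_ages)  # arithmetic mean of the 13 estimates
spread = stdev(estimated_ages)  # sample standard deviation (n - 1 denominator)

print(f"average estimated age: {average:.1f} years")
print(f"standard deviation:    {spread:.1f} years")
print(f"spread is about 1/{average / spread:.0f} of the average")
```

<p>With the population formula (<code>statistics.pstdev</code>) the spread comes out slightly smaller, which is why the choice is called out here.</p>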
<p><img style="clear:both; float:right; margin:5px" width="75" src="/images/microsoft_how_old/actual_45_estimated_58_603x950px.png" />
On a less serious and scientific note: both my daughters will no doubt be delighted that <strong>funny hats</strong> do not make a
middle-aged person — let alone a father — look any
younger. Quite the contrary.</p>
<p><a href="/blog/age-prediction-child-diseases/">Age prediction's child diseases</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on May 01, 2015.</p>/blog/survive-data-science-bootcamp2015-04-22T17:00:00+02:002015-04-22T17:00:00+02:00Frederik Durant<p>It’s been two weeks now since the <a href="http://www.thisismetis.com/ds-alumni">Winter Class of 2015</a> presented their final projects to
a few dozen hiring companies. For us presenters, summarizing our
<a href="/projects/kiva-loan-funding-predictor-project/">project work of 4-6 weeks</a> into a three-minute
presentation was a challenge. Then again, it forced us to <strong>focus
on the absolute essence</strong>. Which, at higher granularity, was a necessity throughout the
bootcamp.</p>
<p>In this post, I want to share a number of tips that helped me survive 12 very intense
weeks of information overload, never-ending programming assignments
(euphemistically called <em>challenges</em>) and long days.</p>
<p>So, without further ado:</p>
<h1 id="tip-1-define-your-learning-goals-upfront-and-adapt-as-you-go">Tip #1: Define your learning goals upfront, and adapt as you go</h1>
<p>Each student starts class with his/her own background, strengths and
weaknesses. Given the wide array of addressed topics, you may find
yourself positioned on the knowledge/skills curve at the fifth percentile today, and at the
ninety-fifth tomorrow. That’s fine, really — it’s just a
statistical fact.</p>
<p>Apply a <a href="http://en.wikipedia.org/wiki/Triage"><strong>knowledge and
relevance triage</strong></a> to each topic, so you don’t waste time on either “lost” causes
(personal example: <a href="http://en.wikipedia.org/wiki/Generalized_linear_model">generalized linear models</a>) or already won causes
(personal examples: <a href="http://en.wikipedia.org/wiki/Representational_state_transfer">RESTful API</a>, <a href="http://en.wikipedia.org/wiki/SQL">SQL</a>, <a href="http://en.wikipedia.org/wiki/MySQL">MySQL</a>,
and a few others). <strong>Focus your time and effort
exclusively on what you can, want and need to learn.</strong></p>
<p>It goes without saying that this is different for each student.</p>
<h1 id="tip-2-know-what-you-dont-know-and-want">Tip #2: Know what you (don’t) know and want</h1>
<p>A precondition to tip #1 is that you <a href="http://en.wikipedia.org/wiki/There_are_known_knowns"><strong>know your knowns and unknowns</strong></a>. You
should also know what you (don’t) want to reach as your end
goal. It’s OK to define this self-knowledge as the bootcamp
progresses, but about half-way into the bootcamp, this picture should
become clear.</p>
<p>If you don’t know where you’re going, you won’t get there. If you’re
lost, ask for help. You already paid for it, after all :-)</p>
<h1 id="tip-3-skip-a-lecture-challenge-or-speaker-its-ok">Tip #3: Skip a lecture, challenge or speaker, it’s OK</h1>
<p>If your mind is a muscle, and it’s been strained, give it a
rest. <strong>Don’t feel bad about skipping class every now and then, if
that’s what you need.</strong> Chances are the
information offered that day won’t sink in anyway, so this is probably the
best you can do.</p>
<p>For clarity’s sake: this does <em>not</em> mean that you should give up on the <em>bootcamp</em>,
quite the contrary! The goal is to get your grey cells functioning
again as soon as possible.</p>
<h1 id="tip-4-write-your-first-blog-post-today-yes-today">Tip #4: Write your first blog post today. Yes, today!</h1>
<p>The knowledge and skills that you acquire are one thing — their <strong>public
visibility</strong> quite another. Don’t wait too long to set up your
blog and tell the world about that cool problem you’re working on.
If you do wait too long, you may already be suffering from a mid-bootcamp crisis. Instead,
<strong>lay your blogging foundations early enough in the game</strong>.</p>
<h1 id="tip-5-always-always-store-your-work-on-github">Tip #5: Always, <em>always</em> store your work on GitHub</h1>
<p><strong>Push your code and key results at least once a day.</strong> Believe me, you
<em>don’t</em> want to be that one student who lost
everything because of a disk crash, a stolen laptop or that
“funny” classmate aka Unix guru who told you that typing “sudo rm -rf /”
would solve all your problems.</p>
<p>Two out of these three purely
hypothetical examples actually happened during our class.</p>
<h1 id="tip-6-be-ambitious-overstretch-it-but-only-so-far">Tip #6: Be ambitious, overstretch it, but only so far</h1>
<p>There is <strong>no learning within the comfort zone</strong>, so if the bootcamp doesn’t hurt
every now and then, you’re not paying attention and/or not making much progress. Then again,
beware of setting unrealistic project goals: the deadline at the end of the
bootcamp is real, and cannot be postponed.</p>
<p>Therefore, design your final project in such a way that the
solution will be <strong>based on low-risk technologies</strong>
you know you have mastered. Complement this foundation with <strong>newly acquired
skills and techniques that you feel comfortable applying</strong> during the
bootcamp. Finally, top it off with <strong>a fancy technique or crazy idea</strong>.
Just make sure that this cherry on the cake is dispensable — you
never know when the odds/gods are against you.</p>
<p>Applied to <a href="/projects/kiva-loan-funding-predictor-project/">my own final project</a>:</p>
<ul>
<li>Foundation: RESTful API, end-to-end solution, GitHub, Python, JSON,
object-oriented programming, agile process.</li>
<li>Novelties: pandas/numpy, serialization and deployment of trained logistic regression model;
D3 visualization of money flows; Supervised Latent Dirichlet
Allocation (SLDA)</li>
<li>Cherry on the cake: browser integration via Google Chrome extension
(written in JavaScript).</li>
</ul>
<h1 id="tip-7-be-agile">Tip #7: Be agile</h1>
<p>To further reduce risk while making steady progress, apply a number of
well-known principles from Agile development:</p>
<ul>
<li>Tackle the <strong>highest-risk problems first</strong>.</li>
<li>Work in <strong>short iterations</strong> of 3-5 days each; make sure you have a
<strong>working Minimum Viable Product to show</strong> at the end of each one.</li>
<li>Use so-called <a href="http://en.wikipedia.org/wiki/Method_stub">stubs</a> to replace complex functionality in
your first iteration, so that the <strong>end-to-end solution chain</strong> is never
broken.</li>
<li>Apply <a href="http://en.wikipedia.org/wiki/Test-driven_development"><strong>test-driven development</strong></a>.</li>
</ul>
<p>Applied to my project:</p>
<ul>
<li>I started with an explorative iteration in which I tested if a
<a href="https://github.com/chbrown/slda">crucial but
external C++ software package for SLDA</a> would work at all — it did.</li>
<li>In the second iteration, the RESTful API hid a <a href="https://github.com/fdurant/kiva_project/commit/594a08af1123af3853f5af47bea75e5b4af139c6">stub function</a> that
returned a score that was randomly generated, but in the right JSON
format. This way, all interfaces between the solution components
were successfully tested end-to-end right from the start. After
that, I “only” needed to fill in the blanks.</li>
<li>My initial idea was to visualize the loan prediction score in a
<a href="http://jqwidgets.com/jquery-widgets-demo/demos/jqxgauge/index.htm">JavaScript gauge
widget</a>. Since I couldn’t get it to work, and the functionality was
non-essential, I ditched this idea quite quickly.</li>
<li>80% of all code developed in the explorative iteration was
reorganized into separate Python classes, with a fairly comprehensive set of
test cases.</li>
</ul>
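<p>To make the stub idea from the second iteration concrete, here is a minimal sketch of what such a placeholder could look like. It is not the code from the linked commit: the function name <code>predict_funding_score_stub</code> and the JSON field names are hypothetical, chosen only to show a random score wrapped in a fixed response shape, plus a small contract test that the real predictor would have to keep passing.</p>

```python
import json
import random


def predict_funding_score_stub(loan_id):
    """Stand-in for the real model: a random score in the final JSON shape."""
    # Field names are hypothetical, not the project's actual schema.
    response = {
        "loan_id": loan_id,
        "score": round(random.random(), 3),  # placeholder value in [0, 1]
        "is_stub": True,
    }
    return json.dumps(response)


def test_stub_returns_valid_payload():
    """Contract test: the real predictor should later pass this unchanged."""
    payload = json.loads(predict_funding_score_stub(12345))
    assert payload["loan_id"] == 12345
    assert 0.0 <= payload["score"] <= 1.0


test_stub_returns_valid_payload()
print(predict_funding_score_stub(12345))
```

<p>Because every caller only sees the JSON shape, the stub can be swapped for the trained model without touching the other components.</p>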
<h1 id="tip-8-whenever-possible-take-shortcuts">Tip #8: Whenever possible, take shortcuts</h1>
<p>By definition, <strong>the most efficiently performed piece of work is the one you choose
not to do</strong>. When picking my final project, the availability of a
comprehensive snapshot of historical Kiva data was a major decision
factor. It meant that I didn’t need to spend (read: waste) my own development and
the machine’s downloading time on web scraping. Since I had done web
scraping many times before, there was no learning argument for it either.</p>
<p>Therefore: <strong>be pragmatic. Cut the waste.</strong></p>
<h1 id="tip-9-avoid-dependencies">Tip #9: Avoid dependencies</h1>
<p>Dependencies on external parties, customers, data deliveries,
knowledge gaps, unreliable cloud services, etc. are all risks:
whenever possible, try to avoid them. Focus on what works for you, and
<strong>don’t rely (too much) on external factors that you cannot control.</strong> If
you <em>must</em> rely on an external piece of code or data, make sure to take
away that risk <em>first</em>.</p>
<p>In my project, I decided not to contact Kiva.org before the work was
completed (I <em>am</em> talking to them <em>now</em>). By doing so, I may have missed
an opportunity to work on the most important problem they have, but I
also gained a lot of time, and had total freedom.</p>
<h1 id="tip-10-remember-oscar-wilde">Tip #10: Remember Oscar Wilde</h1>
<p><em>“The only thing to do with good advice is to pass it on. It is never of
any use to oneself.”</em></p>
<p>(source: <a href="http://www.brainyquote.com/quotes/quotes/o/oscarwilde103888.html">BrainyQuote</a>)</p>
<p><strong>To all current and future students: enjoy the pain - and don’t
forget to have <em>some</em> fun in between!</strong></p>
<iframe width="420" height="315" src="https://www.youtube.com/embed/imhrDrE4-mI" frameborder="0" allowfullscreen=""></iframe>
<p><a href="/blog/survive-data-science-bootcamp/">9 1/2 data science bootcamp survival tips</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on April 22, 2015.</p>/blog/usa-picture-haikus2015-04-12T04:00:00+02:002015-04-12T04:00:00+02:00Frederik Durant<h1 style="clear:both; float:left" id="Six">Arlington Cemetery</h1>
<p><img style="float:right; margin-top:20px;" src="/images/jfk_and_so_my_fellow_americans.jpg" alt="JFK's inaugural address at Arlington Cemetery" /></p>
<blockquote style="clear:left">
What's America?<br />
No greater duty on earth<br />
Than live by free will
</blockquote>
<h1 style="clear:both; float:left" id="Six">Art</h1>
<p><img style="float:right; margin-top:30px;" src="/images/nam_june_paik_electronic_superhighway.jpg" alt="Nam June Paik: Electronic Superhighway" /></p>
<blockquote style="clear:left">
The canvas of truth<br />
Is more than often painted<br />
In hellish colors
</blockquote>
<h1 style="float:left" id="Six">New York Subway</h1>
<p><img style="float:right; margin-top:30px;" src="/images/faith_vs_fate.jpg" alt="Faith versus fate" /></p>
<blockquote style="clear:left">
Motion and movement<br />
Driven by skies or by man<br />
A squeaking balance
</blockquote>
<h1 style="clear:both; float:left" id="Six">Brooklyn</h1>
<p><img style="float:right; margin-top:30px;" src="/images/brooklyn_graffiti.jpg" alt="Brooklyn graffiti" /></p>
<blockquote style="clear:left">
Looneys keeping guard<br />
On the street, women yelling<br />
Spring is in the air
</blockquote>
<h1 style="clear:both; float:left" id="Six">Broadway</h1>
<p><img style="float:right; margin-top:30px;" src="/images/helen_mirren_the_audience.jpg" alt="Helen Mirren in The Audience" /></p>
<blockquote style="clear:left">
A ticket to life<br />
Please &mdash; between fame and forgotten<br />
There's no mezzanine
</blockquote>
<p><a href="/blog/usa-picture-haikus/">American picture haikus</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on April 12, 2015.</p>/projects/kiva-loan-funding-predictor-project2015-04-07T13:00:00+02:002015-04-07T13:00:00+02:00Frederik Durant<html>
<head>
<title>Project: an end-to-end loan funding predictor for kiva.org</title>
<!--script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script-->
<!--script src="http://d3js.org/d3.v3.js" charset="utf-8"></script-->
<!--script src="nv.d3.min.js" charset="utf-8"></script)-->
<script src="d3.v3.min.js" charset="utf-8"></script>
<link rel="stylesheet" type="text/css" href="visualizer.css" />
<script src="http://code.jquery.com/jquery-1.11.2.min.js" charset="utf-8"></script>
<script type="text/javascript" src="jquery.lettering-0.6.1.min.js"></script>
<script type="text/javascript" src="career_center.js"></script>
<meta name="viewport" content="width=device-width, minimum-scale=1.0, maximum-scale=1.0" />
<script src="sankey.js" charset="utf-8"></script>
<script src="loanflow_visualizer.js" charset="utf-8"></script>
</head>
<body>
<h2>No time? Here's the short story</h2>
<iframe src="//www.slideshare.net/slideshow/embed_code/46767328" width="476" height="400" frameborder="1" marginwidth="0" marginheight="0" scrolling="no"></iframe>
<h2>Introduction</h2>
<p>My <a href="/projects/kiva-topic-modelling-project/" target="_">previous blog post</a> offered a high-level glance
into the loan descriptions at <a href="http://kiva.org/" target="_">kiva.org</a>.
For my final project at
the <a href="http://www.thisismetis.com/data-science" target="_">Metis Data Science Bootcamp</a>,
I delved a bit deeper into the micro-finance mechanics at Kiva,
looking for a practical problem to solve.
Earlier <a href="#kiva_research">research papers on Kiva</a>
have studied for example how images of Kiva borrowers
influence loan decisions by lenders <a href="#jenq_pan_theseira">[1]</a>,
or which factors motivate
lenders to make additional
loans <a href="#Choo_Lee_Lee_Zha_Park">[2]</a>. My high-level
aim in this project was twofold: produce <b>new insights</b>, and, above all, build
something <b>ready to use and demo</b> &mdash; all in the course of
four weeks.
</p>
<p>Let's start by setting the stage.</p>
<h3>The micro-finance process at Kiva.org</h3>
<p>
The above-mentioned paper <a href="#Choo_Lee_Lee_Zha_Park">[2]</a> contains a clear overview of
the <a href="http://www.kiva.org/about/how/even-more" target="_">Kiva loan process</a>. It is reproduced below, but with
a couple of additions (two ovals and one curved arrow) to point
out some particulars that drew my attention:</p>
<ul>
<li>Field partners actually <b>pre</b>disburse loans to
borrowers <b>before</b> they are published to lenders
on the Kiva website. This obviously involves a certain risk.</li>
<li>So, when (typically an ad hoc group of) lenders allocate their money to a
loan, they are actually <b>back-filling a
predisbursed loan</b>. Only when a loan is fully backfilled
does the loan default <b>risk
move from the field partner to the lenders</b>.</li>
</ul>
<figure>
<img src="/images/kiva_loan_process.png" />
<figcaption>Augmented overview of the Kiva loan process
(adapted from [<a href="#Choo_Lee_Lee_Zha_Park">2</a>])</figcaption>
</figure>
<p>The crucial point, of course, is whether <b>enough</b> lenders will be
found <b>within a loan request's expiration time</b> to back-fill the
loan.
When a loan is not completely funded
in time, the allocated amounts return to the Kiva lenders, and the full
risk stays with the field partner. The predisbursed loan itself,
however, remains: the borrower already received the money, and is expected to
pay it back over time.</p>
<p>Getting loans fully funded is in the interest of multiple
parties:
<ul>
<li>Even though <b>field partners</b> must have some capital of their own,
they want to limit their overall risk by having the worldwide
Kiva lender community maximally back-fill the predisbursed loans.</li>
<li>Kiva <b>lenders</b> &mdash; who are benevolent by definition &mdash; want to
see their available capital used, rather than stay
dormant in the system.</li>
<li><b>Borrowers</b> will have an easier time getting loans
predisbursed when field partners are more confident that
they will be backfilled in time.</li>
</ul>
</p>
<p><b>Efficient loan funding is therefore beneficial to the Kiva
ecosystem as a whole</b>. This is especially true in times when the total amount of
loan requests exceeds the available money in the system.</p>
<p>By the way,
since Kiva loans are interest-free, there is no built-in
mechanism for steering supply and demand other than trust, benevolence
and advertising &mdash; whether by word of mouth or otherwise.</p>
<h3>The downside of Kiva's growth: non-backfilled loans</h3>
<p>In terms of growth, Kiva has been a phenomenal success. Since
its start in 2005, the number of loan requests has risen year after year. In the
early years, there seems to have been enough available money in the
system to fully backfill most, if not all loan requests. However, <b>since
2012, a significant number of predisbursed loans do not get
backfilled</b>, as is illustrated below.</p>
<figure>
<img src="/images/nr_loans_per_quarter.png" style="margin-top:20px; margin-bottom:20px;" />
<figcaption>Per-quarter bar chart of loan requests in period 2006-2014</figcaption>
</figure>
<p>The rising number of non-backfilled loans since 2012 may be due to the
relative lack of fresh inflowing capital from Kiva lenders, and/or to
an intrinsic deterioration of loan quality. Whichever the reason,
it is unknown if, to what extent and how soon the rising proportion of
non-backfilled loans has translated into a growing reluctance among
field partners to predisburse loans in the
first place: no data are currently available to test either of these
hypotheses. But from the
field partners' perspective, <b>there must
be an inherent limit to how far they can increase their
own capital and take on more risk</b>. In a system with acceptable
risk-taking, any transgression of this limit must
eventually <b>hamper the (invisible) predisbursal rate of initial loan
requests, <em>including valid ones</em></b>.</p>
<p>Male borrowers in particular are lagging behind. Not
only have they historically entered the Kiva system less
frequently than female borrowers;
since 2012,
loans to male borrowers have also had a proportionally harder time
getting backfilled
than loans predisbursed to female borrowers.</p>
<p><b>Irrespective of the amount of money present in the
Kiva ecosystem at any moment, we should at least try to make the
available funds flow as efficiently as possible</b>.</p>
<h3>Kiva micro-finance is big</h3>
<p>As a final introductory note, let's
visualize <a href="sankey.html" target="_">how (much) money
flows through the global Kiva ecosystem</a>. The interactive
<a href="http://en.wikipedia.org/wiki/Sankey_diagram" target="_">Sankey diagram</a>
below summarizes, for the period 2012 through 2014, the
money flows between lenders and borrowers. In these three years,
more than <b>321 million dollars</b> have been lent &mdash; and then
(hopefully) repaid.</p>
<p> The connections in the center &quot;column&quot; show the
country-to-country money flows. The left and right columns do not
really represent flows; they rather group the countries by region,
respectively from the lenders' and the borrowers' perspective.</p>
<p>To see the dollar amounts, simply hover with the mouse over the
graph. After a second or two, a popup will appear.</p>
<div id="sankey_container" style="margin-top:20px; margin-bottom:40px; border-width:0px;border-style:solid;border-color:black;">
<h4 style="margin-top:20px; margin-bottom:20px;" align="center">Kiva microloans, grouped per region and country<br />
(2012-2014, combined)</h4>
<div id="heading">
<div class="header" id="lending_region">Lending Regions</div>
<div class="header" id="lending_country">Lending Countries</div>
<div class="header" id="borrowing_country">Borrowing Countries</div>
<div class="header" id="borrowing_region">Borrowing Regions</div>
</div>
<div id="loanFlowChart"></div>
<div style="float:left; text-align:left;"><small>[This <a href="http://d3js.org/" target="_blank">D3</a> <a href="http://bost.ocks.org/mike/sankey/" target="_blank">Sankey Diagram</a> is inspired
by <a href="https://apps.carleton.edu/career/visualize/" target="_blank">Carleton College Career Path</a>]</small></div>
<div style="float:right; text-align:right;"><small>[<a href="sankey.html" target="_blank">Open in separate window</a>]</small></div>
</div><!-- sankey_container -->
<p style="clear:both">
All figures are calculated from
a <a href="http://build.kiva.org/" target="_">Kiva JSON data
snapshot</a> downloaded on February 17, 2015. The software code
is available on <a href="https://github.com/fdurant/kiva_project/tree/master/d3" target="_">GitHub</a>.</p>
<p>Now that the stage is set, let's define our problem more precisely.</p>
<h2>Business goal: optimal money flow, by predicting the funding
of new loan requests</h2>
<p>
The <b>business goal</b> in this project is to <b>reduce the amount of
non-productive, sleeping
money in the system</b>.</p>
<p>The main observation is that <b>it is
better for X% of loans to be 100% funded, than for 100% of loans
to be funded at X% each.</b> In the latter (granted, somewhat
extreme) case, following Kiva policy, all funds would simply
return to the lenders. This means that not a single loan would be backfilled!</p>
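<p>A small worked example makes the argument concrete. It is only a sketch: the loan sizes and the available budget are hypothetical, and the all-or-nothing refund rule is modelled in its simplest form.</p>

```python
def disbursed_total(loan_amounts, funded_amounts):
    """Sum the money that actually reaches borrowers.

    Assumption of this sketch: a loan that is not 100% funded by its
    expiration date is refunded to its lenders in full.
    """
    return sum(
        funded
        for requested, funded in zip(loan_amounts, funded_amounts)
        if funded >= requested
    )


# Hypothetical numbers: 100 loan requests of $1,000 each, $50,000 available.
requests = [1000] * 100

concentrated = [1000] * 50 + [0] * 50   # 50% of the loans funded at 100%
spread_thin = [500] * 100               # 100% of the loans funded at 50%

print(disbursed_total(requests, concentrated))  # 50000
print(disbursed_total(requests, spread_thin))   # 0
```

<p>The same $50,000 either reaches fifty borrowers or nobody at all, depending only on how it is spread.</p>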
<p>
To reach this business goal, I propose a tool that
provides an <b>a priori insight into the chance of a loan request
being fully funded</b> by available Kiva lender money. Next to
that, this study aims to uncover <b>actionable insights</b> for the
parties concerned, who may benefit as follows:</p>
<ul>
<li>Field partners and borrowers can work together to <b>adapt certain
characteristics of a loan request</b> in order to increase its
chance of being funded.</li>
<li>Field partners and/or the Kiva platform can <b>promote certain
loan requests</b>, in the interest of system
efficiency. This way they can avoid or mitigate the &quot;many
loans <em>almost</em> funded&quot; trap.</li>
<li>Lenders may want to know up-front if their money is going
to <b>fund a &quot;winning&quot; loan request</b>, and adapt their
decisions accordingly.</li>
</ul>
<p><b>Disclaimer</b>:
I am fully aware that each of these uses is <b>possibly
controversial</b>. They might indeed carry unintended
side-effects, such as information asymmetry, perceived or real
favoritism, and, in the worst case, intentional misrepresentation of
loan characteristics. Since trust and transparency are key values for Kiva
&mdash; or any other finance system, for that matter &mdash;
any potential deployment into a global production setting should only happen after careful
analysis of the risks and benefits involved. <b>In the framework
of this 4-week final
project, I make no further statements about this important
question</b>. By the way, and for the record:
Kiva.org was not informed about this project before it was
completed and published here.</p>
<h3>Solution architecture</h3>
<p>My technical solution consists of 4 main components:
<ul>
<li>Web services offered by <a href="http://kiva.org/" target="_">kiva.org</a>,
for <a href="http://build.kiva.org/" target="_">offline</a> and <a href="http://build.kiva.org/api" target="_">online</a> data retrieval</li>
<li>An offline process for training and evaluating the predictor</li>
<li>A live predictor, deployed in the cloud</li>
<li>A web browser (Google Chrome) with a self-developed extension that virtually integrates all
solution elements</li>
</ul>
</p>
<figure>
<img src="/images/kiva_loan_funding_predictor_architecture.png" style="margin-top:20px; margin-bottom:20px;" />
<figcaption>Architectural overview of the end-to-end loan funding
predictor solution</figcaption>
</figure>
<p>I now elaborate on each component, following
the steps (1-6) in which they come into play. This description is somewhat
idealistic, in that the real-life R&amp;D process was more of an
agile and iterative nature.</p>
<p>All code is available
on <a href="https://github.com/fdurant/kiva_project" target="_">GitHub</a>. Steps 1 and 2 were run from
an <a href="http://nbviewer.ipython.org/github/fdurant/kiva_project/blob/master/Kiva_predicting_loan_funding.ipynb" target="_">IPython notebook</a>.</p>
<h2>Step 1: Data Preparation</h2>
<table>
<tr>
<th>Technology</th>
<th>Used for</th>
<th>Useful for</th>
</tr>
<tr>
<td><a href="https://www.mongodb.org/" target="_">MongoDB</a></td>
<td>Local storage of documents/objects in the <a href="http://build.kiva.org" target="_">Kiva JSON snapshot</a></td>
<td>Selection of loans from specific years</td>
</tr>
<tr>
<td><a href="https://github.com/saffsd/langid.py" target="_">langid.py</a></td>
<td>Language identification of Kiva loan descriptions</td>
<td>Correcting erroneous language identification by Kiva, due
to mixed original versions and their translations</td>
</tr>
<tr>
<td><a href="http://pandas.pydata.org" target="_">Pandas</a>
and <a href="http://www.numpy.org" target="_">numpy</a></td>
<td>Efficient storage of (mostly numeric) data points</td>
<td>Training and test data for supervised learning</td>
</tr>
<tr>
<td><a href="http://matplotlib.org" target="_">Matplotlib</a></td>
<td>Plotting data</td>
<td>Producing the loan bar chart, split per gender and
quarter (see above), and feature polarity graphs (see below)</td>
</tr>
<tr>
<td><a href="https://radimrehurek.com/gensim/" target="_">gensim</a></td>
<td>Conversion of loan descriptions to a bag-of-words
representation</td>
<td>Formatting these data in Blei Corpus Format, readable by
SLDA (see below)</td>
</tr>
<tr>
<td><a href="https://github.com/chbrown/slda" target="_">SLDA</a> (Supervised Latent Dirichlet Allocation)</td>
<td>Training and evaluation of an <em>auxiliary</em> topic model</td>
<td>Reduction of 1000-dimensional bag-of-words vectors from the
loan descriptions to 20 features (distribution over topics)</td>
</tr>
</table>
<p>In preparation for a previous project, I had already downloaded
a Kiva snapshot of approximately 775,000 loans and their
respective lenders on February 17,
2015. The data are reused in this project.</p>
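<p>To illustrate the bag-of-words step from the table, here is a dependency-free sketch of the sparse "count, then id:count pairs" line layout used by Blei's LDA-C tools. The project itself used gensim for this; the helper names, the whitespace tokenization and the two example sentences are assumptions made for the illustration.</p>

```python
from collections import Counter


def build_vocabulary(tokenized_docs, max_size=1000):
    """Map the max_size most frequent tokens to 0-based integer ids."""
    counts = Counter(token for doc in tokenized_docs for token in doc)
    most_common = counts.most_common(max_size)
    return {token: idx for idx, (token, _) in enumerate(most_common)}


def to_blei_line(tokens, vocabulary):
    """Encode one document as 'N id:count id:count ...'.

    N is the number of distinct in-vocabulary terms in the document.
    """
    bag = Counter(vocabulary[t] for t in tokens if t in vocabulary)
    pairs = " ".join(f"{term_id}:{count}" for term_id, count in sorted(bag.items()))
    return f"{len(bag)} {pairs}".strip()


# Hypothetical, already lower-cased loan descriptions.
docs = [
    "maria sells vegetables and sells fruit".split(),
    "john repairs bicycles".split(),
]
vocab = build_vocabulary(docs)
for doc in docs:
    print(to_blei_line(doc, vocab))
```

<p>Capping the vocabulary at 1000 entries mirrors the 1000-dimensional vectors mentioned in the table; words outside the vocabulary are simply dropped.</p>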
<h2>Step 2: Predictor/classifier training and evaluation</h2>
<table>
<tr>
<th>Technology</th>
<th>Used for</th>
<th>Useful for</th>
</tr>
<tr>
<td><a href="http://scikit-learn.org/" target="_">Scikit-learn</a></td>
<td>Training and evaluating a logistic regression model</td>
<td>Producing predictions and (rankable) prediction scores
between 0-1</td>
</tr>
<tr>
<td><a href="https://docs.python.org/2/library/pickle.html" target="_">pickle</a></td>
<td>Serialization of the trained model</td>
<td>Deployment to the live web app</td>
</tr>
</table>
<h3>Feature creation</h3>
<p>
Three types of input features were extracted and/or derived from the JSON snapshot:
<ul>
<li>19 <b>loan features</b> extracted from
the <a href="http://api.kivaws.org/v1/loans/844974.json" target="_">JSON representation</a> of a <a href="http://www.kiva.org/lend/844974" target="_">Kiva loan</a></li>
<li>20 <b>topic model features</b> inferred for each loan
from a Supervised Latent Dirichlet
Allocation model that I created on the side</li>
<li>4 <b>partner features</b> extracted from the <a href="http://api.kivaws.org/v1/partners.json" target="_">JSON
representation</a> of
the <a href="http://api.kivaws.org/v1/partners.html" target="_">Kiva partner list</a></li>
</ul>
</p>
<p>The to-be-predicted label is defined as follows: divide the actual
&quot;funded_amount&quot; by the requested &quot;loan_amount&quot;. If this
ratio is greater than or equal to 1, the loan is fully funded
(<b>label 1</b>). If not, the loan is not fully funded (<b>label 0</b>).
</p>
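<p>The label definition translates almost literally into code. The sketch below uses the two JSON field names quoted above; rejecting a non-positive requested amount is my own assumption about what "usable" means here.</p>

```python
def funding_label(loan):
    """Return 1 if the loan is fully funded, else 0.

    `loan` is a dict with the "funded_amount" and "loan_amount" fields.
    Treating a non-positive loan_amount as unusable is an assumption
    of this sketch.
    """
    requested = loan["loan_amount"]
    if requested <= 0:
        raise ValueError("loan_amount must be positive")
    ratio = loan["funded_amount"] / requested
    return 1 if ratio >= 1 else 0


print(funding_label({"funded_amount": 500, "loan_amount": 500}))  # 1
print(funding_label({"funded_amount": 475, "loan_amount": 500}))  # 0
```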
<p>The data set consists of <b>all</b> usable loans from the years 2012
through 2014: <b>21 K negative and 420 K positive</b>
instances. This a priori 20/1 ratio calls for
careful <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html" target="_">class weight settings</a> during
model training.</p>
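<p>To get a feel for what class weighting does with such an imbalance, the sketch below computes inverse-frequency weights. This is the heuristic behind scikit-learn's <code>class_weight='balanced'</code> option; I am assuming it as a reasonable stand-in, and the exact formula used by the library version in this project may differ.</p>

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Rounded instance counts quoted in the text: 21 K negative, 420 K positive.
weights = inverse_frequency_weights({0: 21_000, 1: 420_000})
print(weights)
print(weights[0] / weights[1])  # how much heavier one negative example weighs
```

<p>With these counts, one unfunded loan weighs about twenty times as much as one funded loan, which cancels out the 20/1 ratio in the loss function.</p>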
<h3>Model training</h3>
<p>From the outset, it was my plan to fit
a <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html" target="_">logistic regression</a> model
against the training data, using 10-fold cross validation. The
chosen <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html" target="_">class weight setting is 'auto'</a>, which should compensate
for the strong predicted label imbalance in the training set. I also included
some experimentation with varying regularization parameters. Since
my prime goal in this project is to <b>deliver an end-to-end
system, rather than
only <a href="http://www.vogue.com/1066581/supermodel-workouts-and-trainers/" target="_">an optimally trained &quot;supermodel&quot;</a></b>, I did not try
out any other algorithm here. This pragmatic choice helped me maintain
focus throughout the project.</p>
<figure>
<img src="/images/optimally_trained_supermodel.jpg" />
<figcaption>Depiction of an optimally trained supermodel (source:
<a href="http://www.vogue.com/1066581/supermodel-workouts-and-trainers/" target="_">Vogue.com</a>)</figcaption>
</figure>
<h3>Incremental feature contribution to model performance</h3>
<p>As evaluation metric, I chose
the <a href="http://en.wikipedia.org/wiki/Receiver_operating_characteristic" target="_">ROC</a> <a href="http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html" target="_">area under the curve</a>. To detect the <b>most
informative feature groups</b>, I started, in a 10-fold cross validation setting,
from an information-less
baseline model with 50% ROC AUC performance. I then tested each
of the remaining feature groups independently, selecting the best one
as the
&quot;constant&quot; feature group for the next round. There, it
was in turn combined with
each of the remaining feature groups; again, a winner was
selected and added to the ranked list of feature groups. The
end result of this iterative process is the following <b>feature group ranking</b>,
in descending order of <b>marginal contribution</b>:</p>
<table>
<tr>
<th>Type</th>
<th>Feature group</th>
<th>ROC AUC</th>
<th>Percentage point improvement</th>
</tr>
<tr>
<td>&mdash;</td>
<td>Baseline</td>
<td>50.00 &#37;</td>
<td>&mdash;</td>
</tr>
<tr>
<td>Loan</td>
<td>Log10LoanAmount</td>
<td>76.50 &#37;</td>
<td>+ 26.50</td>
</tr>
<tr>
<td>SLDA</td>
<td>20 loan description topics</td>
<td>80.72 &#37;</td>
<td>+ 4.22</td>
</tr>
<tr>
<td>Loan</td>
<td>PostedMonth[Jan..Dec]</td>
<td>83.55 &#37;</td>
<td>+ 2.83</td>
</tr>
<tr>
<td>Loan</td>
<td>MajorityGender</td>
<td>85.99 &#37;</td>
<td>+ 2.44</td>
</tr>
<tr>
<td>Loan</td>
<td>Log10NumberOfBorrowers</td>
<td>87.13 &#37;</td>
<td>+ 1.14</td>
</tr>
<tr>
<td>Partner</td>
<td>LoansPosted &amp; TotalAmountRaised</td>
<td>87.42 &#37;</td>
<td>+ 0.29</td>
</tr>
<tr>
<td>Loan</td>
<td>GeoLongitude &amp; GeoLatitude</td>
<td>87.63 &#37;</td>
<td>+ 0.21</td>
</tr>
<tr>
<td>Loan</td>
<td>RepaymentTerm</td>
<td>87.77 &#37;</td>
<td>+ 0.14</td>
</tr>
<tr>
<td>Partner</td>
<td>DelinquencyRate &amp; Rating</td>
<td>87.77 &#37;</td>
<td>+ 0.00</td>
</tr>
<tr>
<td>Loan</td>
<td>BonusCreditElegibility</td>
<td>87.82 &#37;</td>
<td>+ 0.05</td>
</tr>
</table>
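<p>The ranking procedure behind this table is a greedy forward selection over feature groups. A minimal sketch follows, assuming the caller supplies a <code>score</code> function that returns a cross-validated ROC AUC for a given set of groups. The function name, the toy gains and the additive toy scorer are hypothetical and only there to make the example runnable.</p>

```python
def greedy_group_ranking(groups, score, baseline=0.5):
    """Rank feature groups by greedy forward selection.

    groups: iterable of group names.
    score:  callable taking a tuple of group names and returning a
            metric such as cross-validated ROC AUC.
    Returns a list of (group, score, gain) tuples in selection order.
    """
    selected, remaining, ranking = [], list(groups), []
    current = baseline
    while remaining:
        # Try every remaining group on top of the ones already selected.
        best = max(remaining, key=lambda g: score(tuple(selected + [g])))
        best_score = score(tuple(selected + [best]))
        ranking.append((best, best_score, best_score - current))
        selected.append(best)
        remaining.remove(best)
        current = best_score
    return ranking


# Toy scorer with hypothetical, purely additive gains per group.
toy_gain = {"month": 0.03, "amount": 0.26, "topics": 0.04}


def toy_score(selected):
    return 0.5 + sum(toy_gain[g] for g in selected)


for name, auc, gain in greedy_group_ranking(toy_gain, toy_score):
    print(f"{name:8s} {auc:.2f} (+{gain:.2f})")
```

<p>In a real run, <code>score</code> would train and evaluate a model per candidate, so the number of model fits grows quadratically with the number of groups.</p>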
<p>A few observations:
<ul>
<li>The mere combination of the <b>four best feature groups</b>
already achieves a ROC AUC of <b>85.99%</b>. The six remaining feature
groups together add less than 2 percentage points (87.82% versus 85.99%).</li>
<li>Given the abovementioned difference between male and
female borrowers, I expected <b>gender of the borrower</b>
&mdash; or, more generally and precisely,
of the <em>majority</em> of borrowers &mdash; to be among the most
informative features. This is confirmed, but it is somewhat surprising
that the <b>20-dimensional semantic content of the loan
description</b> is even more informative. A plausible
explanation is that <b>gender is partially encoded in various implicit
and explicit ways in the loan description</b>. Check for yourself,
as an example, in
this <a href="http://www.kiva.org/lend/844974" target="_">female</a>
and <a href="http://www.kiva.org/lend/847570" target="_">male</a> loan description for explicit
gender-specific clues like pronouns. The implicit ones may be
hidden inside particular types of gender-specific activities
or textual clues (not further
mined in this project).</li>
<li>The <b>month</b> in which a loan request is posted is also highly
informative. This was already apparent from the histogram
above, where the second and third quarter of each year
show a spike in loan requests, which has an impact on the
backfill ratio.</li>
</ul>
</p>
<p>Also of interest is the feature polarity, i.e. whether a feature
influences a positive prediction (= full funding) in the positive
or negative sense. Let's look successively at the
&quot;PostedMonth&quot; features, the topic features, and all the
other ones.</p>
<h3>What is the optimal time of year to post a loan?</h3>
<p>All other things being equal, loans posted in the <b>first
three months</b> of the calendar year are the easiest to get fully funded. September and June are the least
advantageous.</p>
<figure>
<img src="/images/LogRes_Model_Coefficients_Months.png" />
<figcaption>Overview of PostedMonth[Jan..Dec] features with their
respective logistic regression model coefficients, ranked from
left to right in descending order</figcaption>
</figure>
<h3>What is the impact of the topics found in the loan descriptions?</h3>
<p>Twelve topics contribute positively towards full funding, the
remaining eight negatively. The extent to which a <b><a href="http://www.kiva.org/lend/844974" target="_">specific
loan description</a></b> contributes in either direction also depends,
of course, on the actual <b>distribution of topics</b> found in that
particular document.</p>
<figure>
<img src="/images/LogRes_Model_Coefficients_Topics.png" />
<figcaption>Overview of 20 topic features with their
respective logistic regression model coefficients, ranked from
left to right in descending order &mdash;
[<a href="/images/LogRes_Model_Coefficients_Topics.png" target="_">for better viewing, click here to open the graph in a
separate window</a>]</figcaption>
</figure>
<p>Each topic is represented here by its ten most prominent words.</p>
<h3>And what about the other features?</h3>
<p>A <b>positive</b> contributor to full funding is the <b>total amount of
money raised by the field partner</b>: this can be interpreted as a proxy
for the partner's longevity and reliability. On the other hand,
the <b>number of loans</b> (LoansPosted)
that went through that same field partner's hands <b>negatively</b>
impacts full funding - strangely so.</p>
<p><b>The more borrowers</b> take part in a loan request, <b>the better its
chances</b> of getting fully funded. This can be interpreted as a form
of risk reduction, the idea being that multiple borrowers
will help and/or control each other, thereby increasing loan
quality. Otherwise said: lenders tend to appreciate and foster collaboration
between borrowers - with their money.</p>
<figure>
<img src="/images/LogRes_Model_Coefficients_Other_Features.png" />
<figcaption>Overview of the remaining features with their
respective logistic regression model coefficients, ranked from
left to right in descending order</figcaption>
</figure>
<p>In my system, gender is encoded as female=0, and male=1. The
negative impact of the MajorityGender feature therefore confirms
the initial hypothesis that <b>male borrowers have a harder time
getting their predisbursed loan requests backfilled</b>.</p>
<p>The feature with the <b>most important negative impact</b> across the
whole feature set (<em>including</em> PostedMonth and topic features) is
the loan amount: <b>the more money is requested, the harder it
is to get the loan fully funded</b>. Assuming a constant allocatable
sum per lender at a particular moment in time, it follows that
more lenders are required to backfill a larger loan amount. If
not enough lenders are found within the expiration period, the loan does
not get fully funded.</p>
<h3>Code organization for easy deployment</h3>
<p>In order to facilitate deployment of the model in a live
setting, and maximize code reuse, all
relevant data preprocessing, feature creation (especially for SLDA),
and feature scaling functionality was cleanly implemented in separate Python modules:
<ul>
<li><a href="https://github.com/fdurant/kiva_project/blob/master/src/SldaTextFeatureGenerator.py" target="_">SldaTextFeatureGenerator</a></li>
<li><a href="https://github.com/fdurant/kiva_project/blob/master/src/KivaLoan.py" target="_">KivaLoan</a></li>
<li><a href="https://github.com/fdurant/kiva_project/blob/master/src/KivaLoans.py" target="_">KivaLoans</a></li>
<li><a href="https://github.com/fdurant/kiva_project/blob/master/src/KivaPartner.py" target="_">KivaPartner</a></li>
<li><a href="https://github.com/fdurant/kiva_project/blob/master/src/KivaPartners.py" target="_">KivaPartners</a></li>
<li><a href="https://github.com/fdurant/kiva_project/blob/master/src/KivaLoanFundingPredictor.py" target="_">KivaLoanFundingPredictor</a></li>
</ul>
</p>
<p>
Finally, the best performing model
was <a href="https://docs.python.org/2/library/pickle.html" target="_">pickled</a> and <a href="https://github.com/fdurant/kiva_project/blob/master/data/predicting_funding/logres_out/kivaLoanFundingPredictor.pkl" target="_">stored in GitHub</a>.
The main script that runs model training and evaluation is
<a href="https://github.com/fdurant/kiva_project/blob/master/src/loan_funding_predictor.py" target="_">loan_funding_predictor.py</a>. See the <a href="http://nbviewer.ipython.org/github/fdurant/kiva_project/blob/master/Kiva_predicting_loan_funding.ipynb" target="_">IPython notebook</a> for its usage.
</p>
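<p>For readers unfamiliar with pickling, the round trip looks roughly like this. The sketch serializes a plain dictionary of hypothetical coefficients instead of the actual scikit-learn model object, and writes to a temporary file instead of the repository path.</p>

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the trained model: plain coefficients.
model = {
    "intercept": 1.5,
    "coefficients": {"Log10LoanAmount": -2.0, "MajorityGender": -0.4},
}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Training side: serialize the model to disk.
with open(path, "wb") as handle:
    pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)

# Serving side: load it back. Only unpickle files from a trusted source.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == model)
```

<p>When pickling a real model object, the serving side needs the same classes (and ideally the same library versions) importable as the training side.</p>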
<h2>Step 3: Deployment of the predictor to the live back-end</h2>
<table>
<tr>
<th>Technology</th>
<th>Used for</th>
<th>Useful for</th>
</tr>
<tr>
<td><a href="https://github.com/fdurant/kiva_project/tree/master/data/predicting_funding/logres_out" target="_">GitHub</a></td>
<td>Storage of the best performing model</td>
<td>Lightweight deployment to the cloud environment (transient, until further notice)</td>
</tr>
<tr>
<td><a href="https://docs.python.org/2/library/pickle.html" target="_">pickle</a></td>
<td>Deserialization of the trained model</td>
<td>Loading the trained model in the live web app</td>
</tr>
<tr>
<td><a href="https://github.com/chbrown/slda" target="_">SLDA</a> (Supervised Latent Dirichlet Allocation)</td>
<td>Inference of gamma values against a pre-trained auxiliary
topic model</td>
<td>On-the-fly feature generation for previously unseen loan descriptions</td>
</tr>
<tr>
<td><a href="http://flask.pocoo.org/" target="_">Flask</a>
and <a href="https://flask-restful.readthedocs.org/" target="_">Flask-RESTful</a></td>
<td>Setup of live RESTful web application</td>
<td>Serving on-the-fly predictions for previously unseen Kiva loan requests</td>
</tr>
</table>
<p>Deployment of the software and prediction model to the virtual machine
at <a href="https://www.digitalocean.com" target="_">Digital Ocean</a>
was simply done by logging onto that machine, and pulling all
relevant files from GitHub. The trained model itself
was <a href="https://docs.python.org/2/library/pickle.html" target="_">unpickled</a> and loaded into
a <a href="https://github.com/fdurant/kiva_project/blob/master/src/predictLoanFundingWebApp.py" target="_">Flask-based web application</a>. Obviously, the
virtual box also includes preinstalled copies of Python, all
required libraries
(including <a href="https://github.com/fdurant/kiva_project/blob/master/src/predictLoanFundingWebApp.py" target="_">sklearn</a>), and a precompiled <a href="https://github.com/chbrown/slda" target="_">slda</a> binary.</p>
<h2>Step 4: Getting a loan page from Kiva</h2>
<p>This step is simply a retrieval of a
new <a href="http://www.kiva.org/lend/844974" target="_">loan page
from the Kiva website</a>. Kiva offers its lenders a constant stream
of <a href="http://www.kiva.org/browse" target="_">recently
predisbursed loan requests</a> from which to choose.</p>
<p>The following steps describe how we can &mdash; at least
virtually &mdash; <b>augment each loan page with live information
from the loan funding predictor</b>.</p>
<h2>Step 5: Getting a prediction from the live back-end</h2>
<p>
<b>Disclaimer</b>: at the time of writing, and until further
notice, the live environment is not
meant to be running permanently. The backend will only be live
during planned demonstrations.
</p>
<p>A typical backend request-response cycle looks like this:</p>
<figure>
<img src="/images/loan_funding_prediction_REST_request_response.png" style="border-style:solid;border-width:1px;" />
<figcaption>Example request and JSON response of the live
predictor, with highlighted fields (blue) and topic gamma
values (orange)</figcaption>
</figure>
<p>Each loan funding prediction response returns:
<ul>
<li>the <b>prediction</b>: 0 (not fully funded) or 1 (fully funded)</li>
<li>the <b>loan funding score</b>: a floating point number between 0 and 1</li>
<li>the <b>topic scores</b>: a list of 20 topics for the loan
description at hand, ranked by descending
<a href="https://www.cs.princeton.edu/~blei/papers/BleiMcAuliffe2007.pdf" target="_">gamma value</a> (prominence).</li>
</ul>
</p>
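<p>The three response elements listed above could be assembled along these lines. This is a sketch only: the key names, the function name and the example values are hypothetical and need not match the JSON shown in the screenshot.</p>

```python
import json


def build_response(score, topic_gammas, topic_words, threshold=0.5):
    """Assemble prediction, score and topics ranked by descending gamma.

    topic_gammas: dict mapping topic id to gamma value.
    topic_words:  dict mapping topic id to its most prominent words.
    """
    ranked = sorted(topic_gammas.items(), key=lambda item: item[1], reverse=True)
    return {
        "prediction": 1 if score >= threshold else 0,
        "loan_funding_score": score,
        "topic_scores": [
            {"topic": topic_id, "gamma": gamma, "words": topic_words[topic_id]}
            for topic_id, gamma in ranked
        ],
    }


# Hypothetical values for three topics instead of twenty.
gammas = {0: 0.1, 1: 0.7, 2: 0.2}
words = {0: ["farm", "seeds"], 1: ["shop", "stock"], 2: ["school", "fees"]}
print(json.dumps(build_response(0.83, gammas, words), indent=2))
```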
<p>
Again, each topic is
represented by its 10 most prominent words. In isolation, the gamma values say
nothing about the propensity of a given loan to get funded or not.
As explained earlier, the polarity or direction of the
relationship between each topic and the class label is
captured in the logistic regression model. At prediction time,
the loan document's
gamma values serve as input to the logistic
regression model.
This prediction happens together with all the other features in one go,
of course.</p>
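<p>In other words, the topic values are just additional entries in the feature vector. A minimal sketch of the scoring step, with hypothetical feature values and coefficients and without the feature scaling applied in the real pipeline:</p>

```python
import math


def funding_probability(features, coefficients, intercept):
    """Logistic regression score: sigmoid of intercept plus weighted sum."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical loan features and topic values, merged into one vector.
loan_features = {"Log10LoanAmount": 3.0, "MajorityGender": 0.0}
topic_features = {"topic_0": 0.7, "topic_1": 0.3}
features = {**loan_features, **topic_features}

# Hypothetical coefficients: the sign of each one is its polarity.
coefficients = {
    "Log10LoanAmount": -1.0,
    "MajorityGender": -0.5,
    "topic_0": 2.0,
    "topic_1": -1.0,
}

p = funding_probability(features, coefficients, intercept=2.0)
print(round(p, 3))
```

<p>A topic with a large gamma value pushes the score up or down depending on the sign of its coefficient, which is why the gamma values alone say nothing about funding chances.</p>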
<h2>Step 6: Getting Kiva's information on a new loan</h2>
<p>Before the predictor could send the abovementioned response, it
obviously needed some loan information to work on. In the current
architecture, the predictor web
application
itself <a href="http://api.kivaws.org/v1/loans/844974.json" target="_">reaches out to the Kiva REST API</a>, to get all
relevant loan information on the fly.</p>
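<p>A sketch of that lookup is shown below. The URL pattern is the one linked above; the assumption that the response body wraps the loan in a top-level <code>"loans"</code> list, as well as the function names, are mine.</p>

```python
import json

KIVA_LOAN_URL = "http://api.kivaws.org/v1/loans/{loan_id}.json"


def loan_url(loan_id):
    """Build the API URL for a single loan id."""
    return KIVA_LOAN_URL.format(loan_id=int(loan_id))


def parse_loan(payload):
    """Extract the first loan from a JSON response body.

    Assumes a body shaped like {"loans": [{...}]}.
    """
    loans = json.loads(payload).get("loans", [])
    if not loans:
        raise ValueError("no loan in response")
    return loans[0]


# Hypothetical, heavily abbreviated response body.
sample = '{"loans": [{"id": 844974, "loan_amount": 500, "funded_amount": 125}]}'
loan = parse_loan(sample)
print(loan_url(loan["id"]), loan["loan_amount"])
```

<p>In the live application, the body would come from an HTTP GET on <code>loan_url(...)</code>, for instance with <code>urllib.request.urlopen</code>, wrapped in a timeout and error handling.</p>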
<h2>Step 4 revisited: visual integration in Google Chrome</h2>
<table>
<tr>
<th>Technology</th>
<th>Used for</th>
<th>Useful for</th>
</tr>
<tr>
<td><a href="http://kangoextensions.com/" target="_">Kango</a></td>
<td>Creating a Google Chrome browser extension</td>
<td>Virtual integration of the loan prediction info into a
Kiva loan web page</td>
</tr>
<tr>
<td><a href="http://jquery.com" target="_">jQuery</a></td>
<td>Faster JavaScript development</td>
<td>Creating two DOM elements that display loan funding
information on a Kiva loan web page</td>
</tr>
</table>
<p>To dynamically integrate the information received in step 5, I
wrote
a Google Chrome extension based
on <a href="http://kangoextensions.com/" target="_">Kango</a> and <a href="https://jquery.com/" target="_">jQuery</a>. Once it is <a href="https://support.google.com/chrome_webstore/answer/2664769?hl=en">installed</a>
and <a href="https://support.google.com/chrome/answer/167997?hl=en">activated</a>,
the <a href="kivaloanfundingvisualization_0.1.0_chrome_webstore.zip" target="_">Kiva Loan Funding Visualization</a> extension displays two
new boxes inside any Kiva Loan page visited from that
browser. It then looks like this:</p>
<figure>
<img src="/images/kiva_loan_page_with_augmented_prediction_info.png" style="border-style:solid;border-width:1px;" />
<figcaption>Augmented Kiva loan web page, containing two extra blue
boxes with information predicted on the fly</figcaption>
</figure>
<p>The box in the upper-right corner contains the prediction
score expressed as a (rounded) percentage, and a thumbs up/down image that
represents the binary prediction. The threshold is currently set
at 50%. The probability is called a priori, to stress that
the <em>actual</em> funding status (displayed by Kiva) does not play any role in
the predicted value. In other words, the predictor would predict the same
probability before, during or after the funding period - with one
exception: the field partner features (e.g. LoansPosted) may have changed over time.</p>
<p>The box below the borrower's image contains the three most
prominent topics for the loan (description) at hand.</p>
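<p>The display logic of the two boxes is simple enough to sketch. The extension itself is written in JavaScript; the version below is in Python to stay consistent with the rest of the project code, and its function names and example values are hypothetical.</p>

```python
def badge(score, threshold=0.5):
    """Content of the upper-right box: rounded percentage and thumb."""
    return {
        "percent": round(score * 100),
        "thumb": "up" if score >= threshold else "down",
    }


def top_topics(topic_gammas, n=3):
    """Ids of the n most prominent topics, by descending gamma value."""
    return sorted(topic_gammas, key=topic_gammas.get, reverse=True)[:n]


# Hypothetical score and gamma values for five topics.
print(badge(0.8349))
print(top_topics({0: 0.05, 1: 0.40, 2: 0.10, 3: 0.30, 4: 0.15}))
```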
<h2>Conclusion and further work</h2>
<p>In this project, I have developed an end-to-end system that enables multiple
actors in the Kiva ecosystem to get an a priori insight into the
funding chances of previously unseen loans. While the current
logistic regression model can no doubt be improved in various
ways, its current performance of 87.82% ROC Area Under Curve is
sufficient for a proof of concept.</p>
<p>As the virtual
integration through a browser extension demonstrates, the system can
indeed be deployed with minimum extra engineering effort, should
Kiva wish to do so.</p>
<p> At the same time,
the positive or negative contribution of each feature was
identified. This may help Kiva actors in the field to adapt certain
behaviours or revise certain decision strategies, should they feel
the need to do so.</p>
<p>As far as deployment in the Kiva ecosystem is concerned, then, there
will be <b>more strategic and ethical questions to be answered,
than technical ones</b>.</p>
<p>Potential avenues for model performance improvement include:
<ul>
<li>A more refined and tuned Supervised Latent Dirichlet Allocation model</li>
<li>Different logistic regression settings</li>
<li>Alternative estimators</li>
<li>Additional and/or better features. I am especially interested in
adding &quot;trust&quot; features detected from the borrower's
image. The reason is simple: when I made my first Kiva loan
in February, the image contents played a major role in my own
choice.</li>
</ul>
</p>
<h2><a name="kiva_research">Research papers on Kiva</a></h2>
<p>
<ul>
<li>[<a name="#jenq">1</a>] Christina Jenq, Jessica Pan and Walter Theseira: <a href="http://riped.utcc.ac.th/wp-content/uploads/2012/03/Dr_Jessica_Pan_30_March_2012.pdf" target="_">What Do Donors Discriminate On? Evidence from Kiva.org</a></li>
<li>[<a name="Choo_Lee_Lee_Zha_Park">2</a>] Jaegul Choo,
Changhyun Lee, Daniel Lee, Hongyuan Zha and Haesun Park:
<a href="http://www.cc.gatech.edu/~joyfull/resources/2014_wsdm_kiva.pdf" target="_">Understanding and Promoting Micro-Finance Activities in Kiva.org</a></li>
</ul>
</p>
<h2>Acknowledgements</h2>
<p>Having come to the end of this bootcamp in New York City, I would like to thank:
<ul>
<li>my
instructors <a href="https://www.linkedin.com/pub/irmak-sirer/2a/752/846" target="_">Irmak
Sirer</a>, <a href="https://www.linkedin.com/profile/view?id=92036474" target="_">Bo
Peng</a> and <a href="https://www.linkedin.com/in/ajschumacher" target="_">Aaron Schumacher</a> for having turned this Data
Science Bootcamp into such a pleasant and rewarding
experience. You guys rock!</li>
<li>my <a href="https://krash.io/" target="_">Krash Brooklyn</a> housemate and PhD
candidate <a href="https://www.linkedin.com/in/mgerritzen" target="_">Marc Gerritzen</a> for
his valuable comments on an earlier version of this blog
post, and, more generally, his interest in this project.</li>
<li>all my fellow <a href="http://www.thisismetis.com/ds-alumni" target="_">Metis Data Science students (Winter Cohort
2015)</a>, and my fellow <a href="https://krash.io/" target="_">Krashers in Brooklyn</a>.</li>
<li>my family and friends at home, for putting up with my three-month absence.</li>
<li>the amazing and awesome city and people of New York, for
their inspiration and energy.</li>
</ul>
</p>
<figure>
<img src="/images/Metis_Data_Science_Bootcamp_NYC_Winter_Cohort_2015.jpg" style="border-style:solid;border-width:1px;" />
<figcaption>The Metis Data Science Bootcamp Winter Cohort 2015</figcaption>
</figure>
</body>
</html>
<p><a href="/projects/kiva-loan-funding-predictor-project/">An end-to-end loan funding predictor for kiva.org</a> was originally published by Frederik Durant at <a href="">Frederik Durant's .data blog</a> on April 07, 2015.</p>