The paper, written with Benoit le Maux, “Natural Catastrophe Insurance: How Should Government Intervene?” should appear soon in the Journal of Public Economics.

This paper develops a theoretical framework for analyzing the decision to provide or buy insurance against the risk of natural catastrophes. In contrast to conventional models of insurance, the insurer has a non-zero probability of insolvency which depends on the distribution of the risks, the premium rate, and the amount of capital in the company. When the insurer is insolvent, each loss reduces the indemnity available to the victims, thus generating negative pecuniary externalities. Our model shows that government-provided insurance will be more attractive in terms of expected utility, as it allows these negative pecuniary externalities to be spread equally among policyholders. However, when heterogeneous risks are introduced, a government program may be less attractive in safer areas, which could yield inefficiency if insurance ratings are not chosen appropriately.

This morning, in the ACT2040 class (on non-life insurance), we discussed the difference between observable and non-observable heterogeneity in ratemaking (from an economic perspective). To illustrate that point (we will spend more time, later on, discussing observable and non-observable risk factors), we looked at the following simple example. Let $X$ denote the height of a person. Consider the following dataset

If you look at that black line, you might think of a mixture, i.e. something like

$$X \sim p_1\,\mathcal{N}(\mu_1,\sigma_1^2) + p_2\,\mathcal{N}(\mu_2,\sigma_2^2)$$

(using standard mixture notations). Mixtures are obtained when we have a non-observable heterogeneity factor: with probability $p_1$, we have a random variable $X_1$ (call it type [1]), and with probability $p_2$, a random variable $X_2$ (call it type [2]). So far, nothing new. And we can fit such a mixture distribution, using e.g.

Here, we include some constraints, to ensure that the probability belongs to the unit interval, and that the variance parameters remain positive. Note that we have something close to the previous output.
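The listing used in class is not reproduced here. As a minimal sketch of such a constrained fit, assuming a two-component Gaussian mixture and simulated heights (the data, the starting values and the bounds below are all hypothetical), one could minimise the negative log-likelihood with box constraints:

```r
# Simulated heights in cm (hypothetical data, standing in for the class dataset)
set.seed(1)
n <- 1000
type <- rbinom(n, 1, 0.4)
X <- ifelse(type == 1, rnorm(n, 165, 6), rnorm(n, 178, 7))

# Negative log-likelihood of a two-component Gaussian mixture
# theta = (p1, mu1, sigma1, mu2, sigma2)
neglogL <- function(theta) {
  dens <- theta[1] * dnorm(X, theta[2], theta[3]) +
    (1 - theta[1]) * dnorm(X, theta[4], theta[5])
  # guard against log(0) so that the optimiser always sees finite values
  -sum(log(pmax(dens, 1e-300)))
}

# Box constraints: p1 stays in the unit interval, standard deviations stay positive
fit <- optim(par = c(0.5, 160, 5, 180, 5), fn = neglogL, method = "L-BFGS-B",
             lower = c(0.01, 100, 0.1, 100, 0.1),
             upper = c(0.99, 250, 50, 250, 50))
fit$par
```

Box constraints with `L-BFGS-B` are only one way to keep the parameters admissible; a reparametrisation (a logit for the probability, logarithms for the standard deviations) would be another.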

Let us try something a little bit more complex now. What if we assume that the underlying distributions have the same variance, namely $\sigma_1^2=\sigma_2^2=\sigma^2$?

In that case, we have to use the previous code, and make small changes,
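As a sketch of what such small changes might look like, again on simulated (hypothetical) heights and assuming Gaussian components, the two standard deviations are replaced by a single parameter:

```r
# Simulated heights in cm with a common standard deviation (hypothetical data)
set.seed(1)
n <- 1000
type <- rbinom(n, 1, 0.4)
X <- ifelse(type == 1, rnorm(n, 165, 6), rnorm(n, 178, 6))

# Negative log-likelihood with a common sigma
# theta = (p1, mu1, mu2, sigma)
neglogL_eq <- function(theta) {
  dens <- theta[1] * dnorm(X, theta[2], theta[4]) +
    (1 - theta[1]) * dnorm(X, theta[3], theta[4])
  -sum(log(pmax(dens, 1e-300)))
}

# Same box constraints as before, with one parameter fewer
fit_eq <- optim(par = c(0.5, 160, 180, 5), fn = neglogL_eq, method = "L-BFGS-B",
                lower = c(0.01, 100, 100, 0.1),
                upper = c(0.99, 250, 250, 50))
fit_eq$par
```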

This is what we can do if we cannot observe the heterogeneity factor. But wait… we actually have some information in the dataset. For instance, we have the sex of the person. Now, if we look at histograms of height per sex, and kernel-based density estimates of the height per sex, we have

So, it looks like the heights of males and the heights of females are distributed differently. Maybe we can use that variable, which was actually observed, to explain the heterogeneity in our sample. Formally, the idea here is to consider a mixture with an observable heterogeneity factor, the sex,

$$X \sim p_M\,\mathcal{N}(\mu_M,\sigma^2) + p_F\,\mathcal{N}(\mu_F,\sigma^2)$$

We now have an interpretation of what we used to call types [1] and [2] previously: male and female. And here, estimating parameters is quite simple,

We get the same estimators for the means and the variance as the ones obtained previously. So, as mentioned this morning in class, if you have a non-observable heterogeneity factor, you can use a mixture model to fit a distribution, but if you can get a proxy of that factor that is observable, then you can run a regression. But most of the time, that observable variable is just a proxy of a non-observable one…
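The regression mentioned above can be sketched as follows, on simulated data (the variable names `sex` and `height` and all numerical values are hypothetical). With an observed factor, a linear model gives the two means and the common standard deviation directly:

```r
# Simulated heights in cm, with the sex observed (hypothetical data)
set.seed(1)
n <- 500
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
height <- rnorm(n, mean = ifelse(sex == "M", 178, 165), sd = 6)

# Regression on the observed factor: the intercept is the mean for "F",
# and the coefficient on sexM is the difference between the two means
reg <- lm(height ~ sex)
mu_F <- unname(coef(reg)[1])
mu_M <- unname(coef(reg)[1] + coef(reg)[2])
sigma <- summary(reg)$sigma   # estimate of the common standard deviation
p_M <- mean(sex == "M")       # mixing probability, read off the data
c(p_M = p_M, mu_F = mu_F, mu_M = mu_M, sigma = sigma)
```

The fitted means are simply the group averages, which is why no numerical optimisation is needed once the factor is observed.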

Last week, we had a discussion with some colleagues about the fact that, in order to prepare for the SOA exams, we did not have time (so far) to mention results on extreme values in our actuarial program. I did give an introduction in my nonlife actuarial models class, but it was only an introduction, in three hours, in order to illustrate reinsurance pricing. And I told my students that if they wanted to know more about extreme values, they should start a master’s program in actuarial science and finance, since I will give a course on extremes (and copulas) next winter.

But actually, extreme values are everywhere! For instance, there is a Prudential TV commercial that has people place large, round stickers on a number line to represent the age of the oldest person they know. This forms some kind of histogram. The message is to have Prudential prepare you to have adequate money for all these years. And actually, anyone can add his or her own sticker at the Prudential website.

Patrick Honner, on his blog (http://mrhonner.com/…), mentioned this interesting representation. But the idea is not new, as mentioned in a post published three years ago. In 1932, Emil Gumbel gave a talk in France on the “âge limite“ (the limiting age). As he wrote, “on peut donc supposer que la distribution de l’âge limite – c’est à dire la probabilité que cet âge ait une valeur donnée – soit Gaussienne“ (one can therefore assume that the distribution of the limiting age, that is, the probability that this age takes a given value, is Gaussian). In 1932, not being aware of Fisher and Tippett’s work, he thought that the limiting distribution for a maximum would be Gaussian. But a few years later, he read about Fisher’s work, and also observed that “la distribution d’une valeur extrêmes peut être représentée pour un nombre suffisant d’observations par la formule doublement exponentielle, pourvu que la distribution initiale se comporte asymptotiquement comme une exponentielle. La formule devient rigoureuse si la distribution initiale est exponentielle“ (the distribution of an extreme value can be represented, for a sufficient number of observations, by the doubly exponential formula, provided that the initial distribution behaves asymptotically like an exponential; the formula becomes exact if the initial distribution is exponential), as he wrote in 1935. And in 1937, he wrote a paper on “les centennaires” (centenarians) that can also be related to the work of Bortkiewicz on rare events. One should also mention one of the most important papers in extreme value theory, published in 1974 by Balkema and de Haan, on Residual Life Time at Great Age.

In this experiment, the question is “How Old is the Oldest Person You Know?“, so we are looking at the distribution of a maximum. And from the Fisher-Tippett theorem, if we assume that the age is bounded (i.e. that there exists some finite upper limit), then the limiting distribution for the maxima (or to be more rigorous, for an affine transformation of the maxima) should be a Weibull distribution. And this is what it looks like

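> x <- seq(0,10,by=.01)  # grid of points (x is not defined in the original; this range is an assumption)
> plot(-x,dweibull(x,2.25,4),type="l",lwd=2)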

As an actuary, the only thing I know about demography is the distribution of the age at death. For instance, consider the following French life table

This is the distribution of the age at death in a given population, which is not the same as the distribution mentioned above! What we are looking for is the following: given that someone is alive, what is the distribution of his or her age? Actually, if we assume that the yearly number of births is constant over time (as well as the death probabilities), then we can easily compute the number of people of age $x$: we take everyone born (exactly) $x$ years ago, and remove all those who died at ages $0$, $1$, etc. So the function should be proportional to the probability of surviving to age $x$.
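The computation just described can be sketched as follows. The one-year death probabilities `qx` below are a hypothetical Gompertz-like curve, not the French life table, and the object names are illustrative:

```r
# Hypothetical one-year death probabilities q_x for ages 0..110
# (an illustrative Gompertz-like curve, NOT the French life table)
age <- 0:110
qx <- pmin(1, 0.0005 + 0.00003 * exp(0.1 * age))
qx[length(qx)] <- 1   # nobody survives beyond the last age

# Probability of surviving to age x: product of (1 - q_k) over ages k < x
Lx <- c(1, cumprod(1 - qx)[-length(qx)])

# With constant births and constant mortality, the age distribution of the
# living is proportional to the survival probabilities
fx <- Lx / sum(Lx)
plot(age, fx, type = "s", xlab = "age", ylab = "proportion of the living")
```

Under these assumptions the resulting distribution can never increase with age, which is one reason the constant-births hypothesis looks too crude next to a real population pyramid.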

But this assumption of a constant number of births is not that relevant. And actually, what we need is the distribution of the age within a population… This is a population pyramid, actually. The French one can be downloaded from http://www.insee.fr/fr/ppp/bases-de-donnees/….

Here, we assume that everyone knows 20 other people, randomly chosen in the entire population, and we return the age of the oldest. And we do that for 1,000 people. Here is the distribution we obtain

Which is quite close to the distribution obtained in the commercial, don’t you think? But still, it should be possible to be more accurate, since people should think of their parents, or grandparents. So I guess it could be possible to build a more accurate algorithm, to get something closer to the distribution obtained on the Prudential website. But first, let us wait to have more stickers, more observations… and then I’ll be back to play with it!
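The simulation described above (20 acquaintances per person, 1,000 people) can be sketched as follows. The population pyramid used here is a hypothetical smooth curve, so the output will not match the one obtained with the INSEE data:

```r
set.seed(1)
age <- 0:105

# Hypothetical population pyramid (weights by age); the INSEE data would
# replace this vector in a real application
pyramid <- pmax(0, 1 - (age / 105)^4)
pyramid <- pyramid / sum(pyramid)

# Each of the 1,000 people knows 20 others, drawn at random from the
# population; we keep the age of the oldest acquaintance
oldest <- replicate(1000, max(sample(age, size = 20, replace = TRUE,
                                     prob = pyramid)))
summary(oldest)
hist(oldest, main = "Age of the oldest person known", xlab = "age")
```

Drawing acquaintances independently from the whole population ignores family ties, which is exactly the limitation pointed out above.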

I will be in Amsterdam for the end of this week. I will be on the jury of the PhD defense of Julien Tomas, entitled “Quantifying Biometric Life Insurance Risks With Non-Parametric Smoothing Methods” (the thesis will probably be online soon). But before that, I will give a talk at the actuarial seminar at UvA. My visit last time was a real pleasure, and it should be the same this time too. I will give a talk this Thursday on “R for actuarial science“. The slides can be downloaded from here.

As mentioned in the Appendix of Modern Actuarial Risk Theory, “R (and S) is the ‘lingua franca’ of data analysis and statistical computing, used in academia, climate research, computer science, bioinformatics, pharmaceutical industry, customer analytics, data mining, finance and by some insurers. Apart from being stable, fast, always up-to-date and very versatile, the chief advantage of R is that it is available to everyone free of charge. It has extensive and powerful graphics abilities, and is developing rapidly, being the statistical tool of choice in many academic environments.”

R is based on the S statistical programming language developed by John Chambers at Bell Labs in the 1970s. To be more specific, R is an open-source implementation of the S language, developed by Robert Gentleman and Ross Ihaka. It is a vector-based language, which makes it extremely interesting for actuarial computations. For instance, consider some Life Tables,

Some more details can be found in the first part of the notes of the crash courses of last summer, in Meielisalp. Vectors, or matrices, are extremely convenient to work with when dealing with life contingencies. It is also possible to model prospective mortality. Here, the mortality is not only a function of the age $x$, but also of time $t$,
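To illustrate why vectors and matrices are convenient here, consider the following minimal sketch. The life table is hypothetical (a Gompertz-like curve), and the 1% yearly mortality improvement used for the prospective table is purely an assumption:

```r
# Hypothetical life table: survivors l_x at ages 0..110 (illustrative only)
age <- 0:110
qx <- pmin(1, 0.0005 + 0.00003 * exp(0.1 * age))
lx <- 100000 * c(1, cumprod(1 - qx)[-length(qx)])

# k-year survival probability of someone aged x: kpx = l_{x+k} / l_x
# (vectorised in k, thanks to vector indexing)
kpx <- function(x, k) lx[x + k + 1] / lx[x + 1]
p40 <- kpx(40, 1:10)          # survival over 1, ..., 10 years at age 40
e40 <- sum(kpx(40, 1:70))     # curtate life expectancy at age 40

# Prospective mortality: a matrix indexed by age (rows) and year (columns),
# assuming a 1% yearly improvement (hypothetical)
years <- 2000:2050
qxt <- outer(qx, 0.99^(years - 2000))

# A cohort aged 40 in 2000 follows the diagonal of that matrix
cohort_q <- qxt[cbind(41:91, seq_along(years))]
cohort_p <- cumprod(1 - cohort_q)
```

Indexing the matrix with a two-column matrix of (row, column) pairs is what makes following a cohort along the diagonal a one-liner.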

Some practitioners might be scared because legend has it that R is not as good as SAS at handling large databases. Actually, a lot of functions can be used to import datasets. The most convenient one is probably
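The listing that followed is not reproduced here. As a hedged illustration, `read.table` (with its wrappers such as `read.csv`) is one standard base R function for importing a text file; whether it is the one used in the talk is an assumption. The file and the column names below are hypothetical:

```r
# Write a small semicolon-separated file to a temporary location
# (hypothetical data, just to have something to import)
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(age = c(25, 40, 60), claims = c(0, 2, 1)),
            file = tmp, sep = ";", row.names = FALSE, quote = FALSE)

# Import it back: header = TRUE reads the column names from the first line
db <- read.table(tmp, header = TRUE, sep = ";")
str(db)
```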

Finally, R is interesting for its graphical capabilities. “If you can picture it in your head, chances are good that you can make it work in R. R makes it easy to read data, generate lines and points, and place them where you want them. It’s very flexible and super quick. When you’ve only got two or three hours until deadline, R can be brilliant”, as Amanda Cox, a graphics editor at the New York Times, said. “R is particularly valuable in deadline situations when data is scant and time is precious.”
Several cases were considered on the blog http://chartsnthings.tumblr.com/…. First, we start with a simple graph, here State Government control in the US

Then try to find a nice visual representation, e.g.

And finally, you can just print it in your favorite newspaper,

And you can get any kind of graphs,

And not only about politics,

Graphs are important. “It’s not just about producing graphics for publication. It’s about playing around and making a bunch of graphics that help you explore your data. This kind of graphical analysis is a really useful way to help you understand what you’re dealing with, because if you can’t see it, you can’t really understand it. But when you start graphing it out, you can really see what you’ve got”, as Peter Aldhous, San Francisco bureau chief of New Scientist magazine, said. Even for actuaries. “The commercial insurance underwriting process was rigorous but also quite subjective and based on intuition. R enables us to communicate our analytic results in appealing and innovative ways to non-technical audiences through rapid development lifecycles. R helps us show our clients how they can improve their processes and effectiveness by enabling our consultants to conduct analyses efficiently”, as explained by John Lucker, a Deloitte Consulting Principal who leads a team of advanced analytics professionals, in http://blog.revolutionanalytics.com/r-is-hot/. See also Andrew Gelman’s view on graphs, http://www.stat.columbia.edu/…

The Actuarial Toolkit (see http://www.actuaries.org.uk/…) stresses the interest of R, “The power of the language R lies with its functions for statistical modelling, data analysis and graphics; its ability to read and write data from various data sources; as well as the opportunity to embed R in excel or other languages like VBA. In the way SAS is good for data manipulations, R is superior for modelling and graphical output“.

Further, R is free, which can be compared with SAS at $6,000 per PC, or $28,000 per processor on a server (as mentioned on http://en.wikipedia.org/…)

It is also becoming more and more popular as a programming language. As mentioned on this month’s Transparent Language Popularity Index (see http://lang-index.sourceforge.net/), R is ranked 12th: far behind C or Java, but ahead of Matlab (22) or SAS (27). On StackOverflow (see http://stackoverflow.com/), R is also far behind C++ (399,232 occurrences) or Java (348,418), but with 21,818 occurrences, it appears ahead of Matlab (14,580) and SAS (899). As mentioned on http://r4stats.com/articles/popularity/, R is becoming more and more popular in listserv discussion traffic

It is clearly the most popular software in data analysis, as mentioned by the Rexer Analytics survey, in 2009

If we consider only statistical software, SAS is still far ahead among UK and CAS actuaries

But, as mentioned by Mike King, Quantitative Analyst, Bank of America, “I can’t think of any programming language that has such an incredible community of users. If you have a question, you can get it answered quickly by leaders in the field. That means very little downtime.” This was also mentioned by Glenn Meyers, in the Actuarial Review, “The most powerful reason for using R is the community” (in http://nytimes.com/…). For instance, http://r-bloggers.com/ has contributions from more than 425 R users.

As said by Bo Cowgill, from Google “The best thing about R is that it was developed by statisticians. The worst thing about R is that it was developed by statisticians.”

On February 15th, IFM2, the Institute of Financial Mathematics in Montréal, will organize a one-day Executive workshop on Econometric Modeling in Finance and Insurance with the R language. The event is not yet mentioned in the calendar, but the syllabus can be downloaded here. Additional details (slides and R code) will be available soon, on this blog. In the morning, there will be an introduction to the R language, and in the afternoon, we will focus on applications.
