Loan Descriptions – Can They Be Helpful When Choosing Loans? Part 1

This is the first post of a two-part series on Lending Club loan descriptions by guest writer Sam Kramer. Sam spent the last 15 years working in the finance industry, where he was exposed to financial analysis and consumer credit. Sam is married, has two children, and includes investing in Lending Club amongst his hobbies. He can be contacted on Twitter @P2P_CT.

When I first started investing in Lending Club (“LC”) about one year ago, I was drawn to the large data files and the ability to analyze the loan data. At the time, I conducted an analysis of default rates relative to loan description length. The result surprised me: loans with no description showed a lower default rate than loans with any length of description (in this analysis, default is defined as any loan which is 16 or more days late, defaulted or charged off). I did not feel comfortable forming my investment thesis around loans with no description, so I decided to test my ability to choose loans based on their description by reading a number of old descriptions. I was not happy with my performance in selecting loans by their description length, so I decided to focus on other metrics when deploying my initial investment in the LC platform.

I have given this topic a lot of thought over the past year, and now find myself revisiting loan description lengths to determine whether they might be useful when selecting loans. I have updated my findings in this chart – loan description length is calculated using Microsoft Excel, and adjusts for phrases like “Borrower 123456 added on 8/14/10>”:

The above chart demonstrates that very short loan descriptions (between 1 and 10 characters) have quite a high default rate. However, short loan descriptions (11-350 characters) have a default rate which is closer to the default rate of no description loans. Once again, no-description loans appear to have a lower-than-average default rate.

I decided to break my analysis into three areas: no descriptions, very short descriptions (1-10 characters), and all other lengths (11+ characters).
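The length calculation and the three areas above can be sketched in a few lines of Python. This is only an illustration, not the Excel workbook used for the charts: the marker pattern is an assumption based on the single example phrase quoted earlier, and the function names are hypothetical.

```python
import re

# Assumed marker format, inferred from the one example phrase in the post.
MARKER = re.compile(r"Borrower\s+\d+\s+added on\s+\d{1,2}/\d{1,2}/\d{2,4}\s*>")


def description_length(desc):
    """Length of a description after removing every marker phrase."""
    if desc is None:
        return 0
    return len(MARKER.sub("", desc).strip())


def bucket(length):
    """Map a description length to one of the three areas of the analysis."""
    if length == 0:
        return "none"
    if length <= 10:
        return "very short (1-10)"
    return "other (11+)"


# Made-up descriptions, for illustration only.
samples = [
    None,
    "Borrower 123456 added on 8/14/10> car",
    "Borrower 123456 added on 8/14/10> Paying off two credit cards.",
]
for text in samples:
    n = description_length(text)
    print(n, bucket(n))
```

Stripping surrounding whitespace before counting is a design choice in this sketch; the original spreadsheet may have counted differently.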

No description loans

I noticed that no-description loans make up a large proportion of the total population (36%). This surprised me – surely more than two-thirds of borrowers would feel the need to enter some type of loan description when submitting their loan application?

Perhaps an additional factor is at work here. The data appears to support this intuition:

The chart above shows the loan description lengths as a percentage of loans issued by quarter. The bright red bars represent the percentage of quarterly loan issuance without descriptions. This chart implies that around Q4 2009, something happened which resulted in a large proportion of loans being issued without descriptions. A closer look at the underlying data shows that starting in October 2009, a large number of loans started being issued without loan descriptions. This rapid change in the data makes me think that borrowers might not be the ones responsible for the large proportion of no description loans.
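For readers who want to reproduce this kind of quarterly view from their own download of the loan data, here is a minimal sketch. It assumes each loan has already been reduced to an issue date and a description length; that layout, the function names, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def quarter_label(d):
    """Return a sortable label such as '2009 Q4' for a date."""
    return f"{d.year} Q{(d.month - 1) // 3 + 1}"


def no_description_share(loans):
    """Share of loans with description length 0, per issuance quarter.

    loans: iterable of (issue_date, description_length) pairs.
    """
    totals = defaultdict(int)
    empty = defaultdict(int)
    for issued, length in loans:
        q = quarter_label(issued)
        totals[q] += 1
        if length == 0:
            empty[q] += 1
    return {q: empty[q] / totals[q] for q in sorted(totals)}


# Made-up rows, for illustration only.
loans = [
    (date(2009, 8, 3), 120),
    (date(2009, 9, 15), 45),
    (date(2009, 10, 20), 0),
    (date(2009, 11, 2), 0),
    (date(2009, 12, 9), 80),
]
shares = no_description_share(loans)
for quarter, share in shares.items():
    print(quarter, f"{share:.0%}")
```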

This data also made me think that my initial default analysis needs to be revisited. The loans issued in 2007 – 2009 are, for the most part, fully repaid (and have an established default rate), while the 2010 and newer loans still have a larger portion of their balances outstanding, and in all likelihood the default rates on these vintages will increase in the future. In other words, the newer loans (the only vintages where no-description loans appear in large volumes) currently have a lower default rate than the older loans. Because no-description loans are a relatively new phenomenon, they will naturally show a lower default rate than the loans with descriptions.

In order to more closely analyze the default rate on no-description loans (and to avoid the dates when there was a low instance of no-description loans), I narrowed my analysis to 36-month loans issued after January 1, 2010. The revised default rate is as follows:

This revised analysis shows a no-description default rate which is slightly higher than the average default rate for this population. The default rates on the other loan description lengths have a shape similar to Chart 1.
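The narrowed analysis can be sketched as a filter followed by a per-bucket default rate. This is a rough illustration under stated assumptions: the status labels are assumed names for "16 or more days late, defaulted or charged off", the record fields are hypothetical, and "issued after January 1, 2010" is read here as strictly after that date.

```python
from collections import defaultdict
from datetime import date

# Assumed status labels matching the post's definition of default.
BAD_STATUSES = {"Late (16-30 days)", "Late (31-120 days)", "Default", "Charged Off"}


def bucket(length):
    """Three description-length areas used in the post."""
    if length == 0:
        return "none"
    return "very short (1-10)" if length <= 10 else "other (11+)"


def default_rates(loans, term_months=36, issued_after=date(2010, 1, 1)):
    """Default rate per bucket, plus 'all', for the filtered population."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [bad, total]
    for loan in loans:
        if loan["term"] != term_months or loan["issued"] <= issued_after:
            continue  # outside the narrowed population
        is_bad = loan["status"] in BAD_STATUSES
        for key in (bucket(loan["desc_len"]), "all"):
            counts[key][0] += is_bad
            counts[key][1] += 1
    return {key: bad / total for key, (bad, total) in counts.items()}


# Made-up loans, for illustration only.
loans = [
    {"term": 36, "issued": date(2010, 3, 1), "desc_len": 0, "status": "Fully Paid"},
    {"term": 36, "issued": date(2010, 6, 15), "desc_len": 0, "status": "Charged Off"},
    {"term": 36, "issued": date(2011, 1, 10), "desc_len": 5, "status": "Late (31-120 days)"},
    {"term": 36, "issued": date(2011, 2, 20), "desc_len": 200, "status": "Current"},
    {"term": 60, "issued": date(2010, 7, 1), "desc_len": 0, "status": "Default"},
    {"term": 36, "issued": date(2009, 5, 5), "desc_len": 40, "status": "Charged Off"},
]
rates = default_rates(loans)
print(rates)
```

Comparing each bucket against the "all" entry mirrors the comparison to the population average made in the text.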

This analysis demonstrates that no-description loans do not perform better than average, but rather perform in line with the overall population. As such, I do not believe that no-description loans give insight on credit risk. Rather, investing in no-description loans is akin to ignoring loan descriptions entirely.

This is interesting, but its only practical use here is that it tells us we can ignore these loans when establishing a loan selection strategy based on descriptions.

In the second part of this series we will take a look at both the very short loan descriptions, which seem to have a much higher default rate, and the other loan description lengths.

Comments

Great stuff – look forward to the next installment. Might want to look at credit quality correlation with length of description – my suspicion is that the hidden variable may be different borrower funnels / levels of informational prompting per credit grade.

That is a good point. An initial run by loan grade shows similar patterns for B-D grade loans (long description A’s perform in-line with other A’s). However, the loan population used above (issued in 2010 or newer, 36 months) doesn’t provide a useful sample size for F and G loans (only 51 G’s are in my population).

Very interesting analysis. I wish there was an easy way to analyze the grammar of a loan description. I would like to see an analysis on the performance of loans where the description has proper grammar and punctuation versus a loan without those attributes.

Grammar analysis would be very difficult and I don’t know anyone who has even attempted such an analysis. I am sure some English major with a love of statistics and p2p lending will tackle this one day.

A question about methodology. You say you remove text a la “Borrower 123456 added on 8/14/10>” … do you also remove the text that was subsequently added? I can’t remember the last time I read an actual user generated loan description aside from the responses to lender questions. But, I am a low volume investor, so that might just be me running into the 34% of loans that have no description.

Otherwise, description length other than 0 is (obviously) confounded with whether any lender asked the borrower a question. Moreover, the actual length of the “description” then might also be a proxy for another interesting variable… number of questions responded to. In that case an interesting value might be mean response length per question as well as number of questions asked. The latter being a likely indicator of the class of investor who is putting money into the loan and the first being some indicator about the borrower.

Questions / responses are not included in the loan description field. No description loans that have questions responded to will still show up as no description.

However, in instances where borrowers make multiple loan description entries (say, on different days), the adjustment to remove “Borrower 123456 added on 8/14/10>” would only take into account the first instance of this phrase, and not subsequent instances. In this event, loan description length could be overstated by up to (33 x number of entries) characters.
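A small sketch of that effect, comparing removal of only the first marker with removal of every marker. The pattern is an assumption based on the one example phrase, the description text is made up, and the function names are hypothetical.

```python
import re

# Assumed marker format, inferred from the example phrase quoted above.
MARKER = re.compile(r"Borrower\s+\d+\s+added on\s+\d{1,2}/\d{1,2}/\d{2,4}\s*>")


def length_first_marker_removed(desc):
    """Length when only the first marker is stripped."""
    return len(MARKER.sub("", desc, count=1))


def length_all_markers_removed(desc):
    """Length when every marker is stripped."""
    return len(MARKER.sub("", desc))


# Made-up description with two entries added on different days.
desc = (
    "Borrower 123456 added on 8/14/10>Need a car."
    "Borrower 123456 added on 8/15/10>Thanks."
)
overstatement = length_first_marker_removed(desc) - length_all_markers_removed(desc)
print(overstatement)  # one 33-character marker is left in place
```

In this sketch each marker left in place adds its own length to the measured description, so the gap grows with every additional entry.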

Please correct me if I missed something, but is it not true that if you instead redid chart #3 and just made 2 categories (loans with no description and loans with some description), both categories would then fall within the margin of error of the entire sample (3.1%)? The 3.3% no description ones certainly already do.

I look at chart #3 above as it stands and see that everything falls close to the average except the 5 spikes. These 5 spikes, whose sample sizes are 206, 364, 301, 862 and 173, even when combined account for a mere 3% of the total number of loans. What am I missing?

Wow, that’s really interesting! I would NEVER loan money to anyone foolish enough to go into debt over a pool, a vacation or a wedding. (OK so beat up on me, but that is the way I feel about it – if I wouldn’t do it, I am definitely not loaning money to someone who would.) So I completely missed out on a whole bunch of loans, but maybe that’s not such a bad deal.

Someone (sorry – can’t remember who) once did an analysis of loan descriptions with words like “need,” “help,” “deserve,” and such, and found that – at least among the loans they analyzed, there was a much higher default rate when these words were used. Have you looked at any of those specific words?

This is a very interesting analysis and I find myself, along with many others, trying to figure out exactly how I can mitigate the risk and still get the highest returns possible. Can’t wait for the next installment!

@Sam Here’s another common spelling mistake: “consolodate” and “consolodation”. Between the two you will get over 300 results; combine that with 400 of “dept” and you will have plenty of notes to run analysis on and see if they have a higher default rate.

I’m glad he considered age. I find it helpful to also consider average interest rate and sample size. One of the analyses I’ve done was on “family words” (family, child, children, son, daughter, mother, father, sister, brother, wife, husband). On my pre-filtered data, I found:

1124 notes contain a family word
Of those, 76 (6.8%) are bad (anything besides current or fully paid)
13.52% average interest rate

13486 notes do not contain a family word
Of those, 682 (5.1%) are bad
13.59% average interest rate

My theory is that people use “family words” to invoke sympathy. The assumption is that the need to invoke sympathy correlates with a higher credit risk. Can a statistician chime in and confirm whether this means that notes containing a family word are 33% (6.8/5.1) more likely to be bad?

A quick simulation shows that, assuming an underlying default rate of 5.2% (the weighted average of the two default rates), one would expect that much of an absolute difference between the two groups less than 2% of the time. The actual increase in risk for loans with family words can be estimated from the value observed, but we (obviously) cannot know the true value. How to interpret the 6.8/5.1 ratio… I’ll leave that up to someone else.
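As a cross-check on the counts quoted above, here is a sketch using a pooled two-proportion z-test. Note that this is a different technique from the simulation described in the comment, swapped in because it needs only the standard library; the function name is hypothetical, and the normal approximation is an assumption that seems reasonable at these sample sizes.

```python
import math


def two_proportion_z_test(bad_a, n_a, bad_b, n_b):
    """Return (relative risk, z statistic, two-sided p-value).

    Uses the pooled rate under the null hypothesis that both groups
    share one underlying default rate.
    """
    p_a = bad_a / n_a
    p_b = bad_b / n_b
    pooled = (bad_a + bad_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_a / p_b, z, p_value


# Counts from the comment: family-word notes vs. all other notes.
relative_risk, z, p_value = two_proportion_z_test(76, 1124, 682, 13486)
print(f"relative risk {relative_risk:.2f}, z {z:.2f}, p {p_value:.3f}")
```

On these counts the relative risk comes out at roughly 1.34 and the two-sided p-value falls between 1% and 2%, which is consistent with the simulation result. The ratio is only a point estimate on a pre-filtered set of notes, so the true increase in risk could be noticeably smaller or larger.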

About Lend Academy

Lend Academy has been bringing you all the news and information about peer to peer lending since 2010. Founded by Peter Renton, Lend Academy not only has the most active news site, but also the largest online forum and the first and most popular podcast in the industry.

The Lend Academy team loves peer to peer lending and our staff have all invested their own personal money in one or more of the platforms. Lend Academy Media is part of Cardinal Rose Group which also owns LendIt, the leading industry conference, and has a majority interest in NSR Invest.