Monday, September 26, 2011

AI Series: A Classification Problem

In a previous post on Machine Lending I mentioned that I'd be taking the free Machine Learning course offered by a Stanford professor. The first lectures are now available online and I continue to think about how one would write a program to determine which loans to invest in and which to avoid.

From the lecture:
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
If my understanding is correct, Machine Lending software would have:

  • Task T = Advising the user whether or not to invest in a listing
  • Experience E = The backlog of published data about listings, their Payoff rates, etc.
  • Performance Measure P = The total return from investing in a loan (more detail in my Performance Measure post)

It would fall into the category of Supervised Learning since there is a "correct" answer on whether or not one should have invested in a listing (as measured by Performance Measure P.) And it would fall into the "Classification Problem" subset of supervised learning problems.
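To make the classification framing a little more concrete, here is a minimal sketch of one standard supervised classification technique: logistic regression trained by batch gradient descent, written in plain Python. It is not a Machine Lending implementation. The single feature (a credit grade scaled from 0 for AA to 1 for HR), the six training examples, and the labels (1 if investing would have paid off under P, 0 if not) are all made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    """Linear score: bias w[0] plus the weighted sum of the features in x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def train_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit weights by batch gradient descent on the average log-loss."""
    w = [0.0] * (len(xs[0]) + 1)
    m = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sigmoid(score(w, x)) - y  # predicted probability minus label
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / m for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Return 1 ("invest") if the estimated probability is at least 0.5, else 0 ("avoid")."""
    return 1 if sigmoid(score(w, x)) >= 0.5 else 0

# Hypothetical toy data: feature = grade index / 6 (AA=0 ... HR=6), label = 1 if the loan paid off.
xs = [[0 / 6], [1 / 6], [2 / 6], [4 / 6], [5 / 6], [6 / 6]]
ys = [1, 1, 1, 0, 0, 0]
w = train_logistic(xs, ys)
print([predict(w, x) for x in xs])  # [1, 1, 1, 0, 0, 0]
```

With real data you'd want more features (description length, delinquencies, inquiries and so on) and a set of listings held back for testing, but the shape of the problem stays the same: labeled past listings go in, and an invest/avoid call comes out.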

(Now, you might be able to change this into a regression problem if you changed the question from "Should I invest in this listing or shouldn't I?" to "How much should I invest in this listing?" But that's a topic for another post.)

Sunday, September 25, 2011

Needs Series: What We Fund

Continuing on with my Needs Series, and with ideas from the graphs of my look at Loan Description Length, let's take a look at what lenders are actually funding:

First let's look at all of the 2009 listings on Prosper (the blue line below.) We can see that a bit over 25% of listings had the word "need" in the title or description for most Prosper Scores. (AAs and As were a bit lower, but still over 20%.)

Of those listings that went on to be funded (the red line), the percentage containing "need" was close to that of all listings for AA through C scores. D and E loans had a much higher percentage of "need" than the listings overall, and then the percentage tumbled off to almost nothing for HRs.



We now look at the data for Prosper's 2010 listings:


To me the two lines look neck-and-neck here. In general, it seems that lenders are not paying attention to whether or not a listing has "need" in the title or description when they choose which loans to fund.

This may be unfortunate for lenders because it looks to me like use of the word "need" is correlated with loans which are not paid back.
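For anyone who wants to reproduce lines like these, here is a minimal sketch of the tally, assuming each listing is available as a (grade, title, description, funded) tuple. The record layout and the whole-word, case-insensitive match on "need" (so "needs" and "needed" are not counted) are my assumptions for illustration; the graphs above may have been built with a different matching rule.

```python
import re
from collections import defaultdict

# Whole-word, case-insensitive match: "Need" counts, "needed" does not.
NEED = re.compile(r"\bneed\b", re.IGNORECASE)

def need_share_by_grade(listings):
    """Return {grade: (pct_of_all_listings, pct_of_funded_listings)} mentioning "need"."""
    # Per grade: [listings, listings with "need", funded, funded with "need"]
    tallies = defaultdict(lambda: [0, 0, 0, 0])
    for grade, title, description, funded in listings:
        has_need = bool(NEED.search(title) or NEED.search(description))
        t = tallies[grade]
        t[0] += 1
        t[1] += has_need
        if funded:
            t[2] += 1
            t[3] += has_need
    return {
        grade: (100 * t[1] / t[0], 100 * t[3] / t[2] if t[2] else None)
        for grade, t in tallies.items()
    }

# Hypothetical listings, made up only to exercise the function.
listings = [
    ("A", "Need cash", "", True),
    ("A", "Car", "I need a car", False),
    ("A", "Vacation", "Fun trip", True),
    ("HR", "Help", "needed badly", True),
]
print(need_share_by_grade(listings))
```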



All Articles in the Needs Series
An Introduction
Initial Findings
Correlation Matrix
Comparing to Lending Club
What We Fund

Sunday, September 18, 2011

Prosper's Top Tips for Borrowers

On Friday Prosper blogged their Top-Tips for Borrower Success. Among the highlights is the following tip:
Write a thorough loan purpose and description.
This one rings true, especially after my post the prior Tuesday. Allow me to post the graph of my 2010 findings for loan description length again:


Borrowers: it looks like you already write longer descriptions when you have a lower Prosper rating (the blue line), which, in my opinion, is a good thing. Lenders, though, are funding loans with even higher character counts than average (the red line.) Indeed, at a B grade and lower, loans that are funded appear to have about 50 more characters than the average request, and HR grade loans are funded, on average, with even more characters than that. (The aforementioned post from Tuesday shows similar results for 2009 and Pre-2008 requests.)

The data is correlational, not causal, so I can't say that having a longer loan description will make it more likely to get funded -- but it certainly doesn't hurt. Plus, from personal experience, I'd rather fund a loan when I have a clear idea of how my money will be used (and repaid.)

Thursday, September 15, 2011

Bad And Good Words, Another Perspective

Nickel Steamroller followed up with Isepankur (I believe from his initial comment on my post Needs Series: Comparing To Lending Club) and found similar results to what we've been seeing.

Among their findings: loans mentioning payday have a very low ROI. Loans mentioning a steady job or long employment perform much better. Go take a look. Their table is even sortable, which is awesome.

Tuesday, September 13, 2011

Loan Description Length: More Recent Loans

On Sunday I showed findings that, at least for Prosper loans before 2008, lower rated borrowers tend to write longer descriptions for their listings.


From this graph I saw three things:

  1. The lower the credit rating the longer the loan description a borrower writes. (The blue line.) 
  2. Lenders tend to fund loans with higher than average character counts. (The red line.)
  3. Of those loans that are funded, the ones that went on to become Paid tended to have fewer characters than the average funded loan. (The grey line.)

Let's see if these trends hold up for more recent listings. First, we'll look at the number of characters in the listing description, by credit grade, for Listings in 2009:



And now for 2010:


Let's revisit the three conclusions:

  • The lower the credit rating the longer the loan description a borrower writes. (The blue line.) 

This seems even more true now than it did before. Before 2008 there was a peak in characters at around the D rating and then a decline. In 2009 and 2010 we see some very small drops from one credit grade to the next, but for the most part borrowers with a worse credit grade write longer descriptions than borrowers with better credit grades.
  • Lenders tend to fund loans with higher than average character counts. (The red line.)
This continues to hold true. We still don't know whether lenders are funding loans because they have longer descriptions or because long descriptions correlate with other factors that affect default rate beyond the Prosper rating (low delinquencies, few inquiries, etc.), but funded loans consistently have longer descriptions than the average listing.
  • Of those loans that are funded, the ones that went on to become Paid tended to have fewer characters than the average funded loan. (The grey line.)
This does not seem to remain true. It sure looks to me like the Paid/Current descriptions have roughly the same number of characters as the average funded listing. At this point I would posit that this conclusion from my post on Sunday is false.

Prosper Adds New Verification Stage To Listings

Prosper was unavailable for a long time last night, and now we see why. They've added a new verification stage progress indicator to their listings (at the far right.)


They've also added a tooltip describing what the stage means.


Way to go, Prosper! More information is always a good thing.

Edit: Prosper has put up a page detailing the new feature.

Sunday, September 11, 2011

Loan Description Length: By Credit Grade

Last week I took a look at how the length of a loan description relates to payoff rates, with posts about both Lending Club and Prosper.

Today I wanted to dig a bit deeper into the indicated trend: loans with longer descriptions are less likely to be Paid or Current. To begin, let's look at the trend we saw from Prosper loans initiated before 2008:



The trend is pretty clear cut. Let's step back for a minute, though, and show the number of characters in each description for each credit grade:


From this graph we see three things:
  1. The lower the credit rating the longer the loan description a borrower writes. (The blue line.) 
  2. Lenders tend to fund loans with higher than average character counts. (The red line.)
  3. Of those loans that are funded, the ones that went on to become Paid tended to have fewer characters than the average funded loan. (The grey line.)
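The three lines can be sketched as three running averages per credit grade. This assumes a hypothetical (grade, description, funded, status) record for each listing, with status set to the final loan status for funded loans and None otherwise; the grey line here only counts loans whose status is exactly "Paid".

```python
from collections import defaultdict
from statistics import mean

def mean_length_by_grade(listings):
    """Average description length per grade for the three lines in the graph.

    listings: iterable of (grade, description, funded, status) tuples, where
    status is the final loan status for funded loans and None otherwise.
    """
    groups = defaultdict(lambda: {"all": [], "funded": [], "paid": []})
    for grade, description, funded, status in listings:
        length = len(description)
        g = groups[grade]
        g["all"].append(length)           # blue line: every listing
        if funded:
            g["funded"].append(length)    # red line: funded listings
            if status == "Paid":
                g["paid"].append(length)  # grey line: funded loans that ended Paid
    return {
        grade: {line: (mean(v) if v else None) for line, v in g.items()}
        for grade, g in groups.items()
    }

# Hypothetical records, made up only to exercise the function.
data = [
    ("C", "x" * 100, True, "Paid"),
    ("C", "x" * 300, True, "Defaulted"),
    ("C", "x" * 50, False, None),
]
print(mean_length_by_grade(data))
```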

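Looking at these trends, it would seem that lower-rated borrowers write longer descriptions and that, when a credit grade is lower, lenders choose to fund loans with even longer descriptions. Because lower-rated loans are also less likely to be paid back, longer descriptions end up concentrated among the riskier loans. This very likely explains my findings from last week. I'll continue exploring on Tuesday with a post examining Prosper's 2009 and 2010 loans.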

Thursday, September 8, 2011

Loan Description Length: Prosper

Inspired by a post by Smart Peer Lending, on Tuesday I looked at how the length of a Lending Club loan's description related to whether the loan was Current or had ended with a Paid status.

What I found was the opposite of what I expected: loans with shorter descriptions appeared to be Paid or Current more often than loans with longer descriptions. In this post I'll take a look at loans from Prosper and see if the numbers agree.

Pre-2008 Loans (Raw data below.)

Amazingly, we see an almost smooth decline in the percent of loans Paid as description length increases. I also looked at loans from 2008 and later (many of which are still under way) and saw the following results:

Loans from 2008 and Later (Paid and Current) (Raw data below.)

Loans from 2008 and Later (Defaulted and Charged Off) (Raw data below.)

It's easiest to compare the Default and Charge-Off data, which, again, is lower for shorter descriptions and higher for longer descriptions. However, the Paid and Current chart shows that shorter descriptions have fewer Paid loans and more Current loans. This could mean that newer loans have shorter descriptions and older loans have longer descriptions, at least among loans since 2008.
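The bucketed percentages in these charts come down to grouping loans by description length and counting Paid statuses. Here is a minimal sketch using the pre-2008 bucket edges from the raw-data table; the (description, status) input format is a hypothetical stand-in for however the loan data is actually stored.

```python
from bisect import bisect_left
from collections import defaultdict

# Inclusive upper bounds of the pre-2008 buckets used in the raw-data table.
BOUNDS = [500, 1000, 1500, 2000, 2500, 3000]
LABELS = ["0-500", "501-1000", "1001-1500", "1501-2000",
          "2001-2500", "2501-3000", "3001+"]

def bucket(length):
    """Map a character count to its bucket label."""
    return LABELS[bisect_left(BOUNDS, length)]

def percent_paid_by_bucket(loans):
    """loans: iterable of (description, status). Returns {bucket: (total, pct_paid)}."""
    totals = defaultdict(int)
    paid = defaultdict(int)
    for description, status in loans:
        b = bucket(len(description))
        totals[b] += 1
        paid[b] += status == "Paid"
    # Keep the table's bucket order and skip empty buckets.
    return {b: (totals[b], 100 * paid[b] / totals[b]) for b in LABELS if totals[b]}

print(bucket(500), bucket(501), bucket(3001))  # 0-500 501-1000 3001+
```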


I wanted to go a little further with these sets of loans, so I divided them by credit grade, treating AA, A and B as one set and C, D and E as another (once again using completed loans made before 2008). I found the following results:

Pre-2008 Loans By Credit Grade (Raw data below.)



For both groups we see the same thing: the fewer characters in a listing, the more likely the loan was to end with a Paid status.
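The grade split can be sketched the same way, adding a grade band to the grouping key. The band names, the (grade, description, status) input format and the choice to skip HR and unrated loans are assumptions for illustration; the 1250-character cutoff matches the raw-data table.

```python
from collections import Counter

HIGH = {"AA", "A", "B"}
LOW = {"C", "D", "E"}

def paid_by_grade_band(loans, threshold=1250):
    """loans: iterable of (grade, description, status).

    Returns {(band, length_group): (total, pct_paid)} for the AA-B and C-E bands,
    where length_group is "All", "short" (fewer than threshold chars) or "long".
    """
    totals, paid = Counter(), Counter()
    for grade, description, status in loans:
        if grade in HIGH:
            band = "AA-B"
        elif grade in LOW:
            band = "C-E"
        else:
            continue  # HR and unrated loans are left out of this comparison
        length_group = "short" if len(description) < threshold else "long"
        for key in ((band, "All"), (band, length_group)):
            totals[key] += 1
            paid[key] += status == "Paid"
    return {k: (totals[k], 100 * paid[k] / totals[k]) for k in totals}
```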

Well, this is certainly an unexpected result. It's worth keeping in mind that these are all loans that were funded. It could be that lenders choose loans with short descriptions only when every other characteristic looks good.

But the more I look at this data, and the results of words like "need" and "help", the more it would seem that writing a description is more of a detriment to a borrower than not writing one.

Update: There seems to be a very strong correlation between credit grade and the count of characters in the description. Details are available in my post Loan Description Length By Credit Grade.



Pre-2008 Loans
Description | Total Loans | Paid | Current | Recovered | Never Recovered
Pre 2008 Loans, 0-500 character description | 1948 | 70% | 0% | 70.8% | 29.1%
Pre 2008 Loans, 501-1000 character description | 3788 | 64% | 0% | 65.4% | 34.6%
Pre 2008 Loans, 1001-1500 character description | 3431 | 62.7% | 0% | 64% | 35.9%
Pre 2008 Loans, 1501-2000 character description | 2359 | 60.5% | 0% | 61.8% | 38.2%
Pre 2008 Loans, 2001-2500 character description | 1782 | 55.3% | 0% | 56.9% | 43.1%
Pre 2008 Loans, 2501-3000 character description | 1279 | 54.2% | 0% | 56.5% | 43.4%
Pre 2008 Loans, 3001 and greater character description | 2703 | 51.5% | 0% | 52.5% | 47.4%

Loans from 2008 and Later
Description | Total Loans | Paid | Current | Recovered | Never Recovered
2008 And Later Loans, Description 0-749 Characters | 8077 | 24.7% | 62.3% | 24.9% | 10.5%
2008 And Later Loans, Description 750-1250 Characters | 7764 | 34.5% | 45.5% | 34.9% | 17.1%
2008 And Later Loans, Description 1250 Characters or More | 7186 | 44.3% | 28.4% | 44.9% | 24.5%

Pre-2008 Loans By Credit Grade
Description | Total Loans | Paid | Current | Recovered | Never Recovered
Pre 2008 Loans, AA, A, B All | 6078 | 75.4% | 0% | 76.7% | 23.2%
Pre 2008 Loans, AA, A, B 1249 or fewer character description | 3231 | 78.6% | 0% | 79.7% | 20.3%
Pre 2008 Loans, AA, A, B 1250 or more character description | 2788 | 71.6% | 0% | 73.3% | 26.6%
Pre 2008 Loans, C, D, E All | 8686 | 57.3% | 0% | 58.5% | 41.4%
Pre 2008 Loans, C, D, E 1249 or fewer character description | 3630 | 58.8% | 0% | 60% | 39.9%
Pre 2008 Loans, C, D, E 1250 or more character description | 4975 | 56.2% | 0% | 57.4% | 42.6%

Tuesday, September 6, 2011

Loan Description Length: Lending Club

In a post introducing their new Loan Analyzer, Smart Peer Lending writes that they've added a new Loan Description Length search filter:
Loan Desc Length : One possible feature useful for selecting loans is the length of the description field entered by the borrower. Justification being that borrowers who don't take the time to enter in anything may prove to be a higher risk.
I agree. It makes sense that borrowers who write longer descriptions are ones who care more about their loans and, therefore, are more likely to pay them back. Today I'll look at the data for loan description length on Lending Club and in a few days I'll look at the loan description length on Prosper.


Data Set: Lending Club loans made before the start of 2009

Let's begin by looking at the ROI results using Smart Peer Lending's Loan Analyzer:

(Data is presented in Table Form at the end of the post under the title Lending Club ROI.)

What's interesting is that ROI peaks in the 101-500 character description range and then drops off significantly after 500 characters, the opposite of what I'd expect. Next, I pulled data from my personal analysis tool and found:

(Data is presented in Table Form at the end of the post under the title Lending Club Percent Paid.)


Wow! The longer the description, the less likely a loan is to be Good (defined as Status = Paid or Current.) This is exactly the opposite of what I was expecting. Perhaps the longer a loan request is, the more the borrower feels the lender needs to be talked into funding a risky loan.
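Here is a minimal sketch of the Percent Good calculation, where Good means a status of Paid (shown as Fully Paid in the table at the end of the post) or Current. The exact status strings and the (description, status) input format are assumptions; Lending Club's own data files may spell statuses differently.

```python
# Assumed status strings; adjust to match the actual data file.
GOOD = {"Fully Paid", "Current"}

def lc_bucket(length):
    """Map a character count to the description-length buckets used in this post."""
    if length <= 100:
        return "0-100"
    if length <= 500:
        return "101-500"
    return "501+"

def percent_good(loans):
    """loans: iterable of (description, status). Returns {bucket: (total, pct_good)}."""
    totals, good = {}, {}
    for description, status in loans:
        b = lc_bucket(len(description))
        totals[b] = totals.get(b, 0) + 1
        good[b] = good.get(b, 0) + (status in GOOD)
    return {b: (totals[b], 100 * good[b] / totals[b]) for b in totals}
```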

On Thursday I'll post an analysis of data from Prosper to see if the same thing holds true with loans made over there. I'll be looking at a broader range of loans and breaking them down into higher-rated and lower-rated loans to see if those categories make a difference.


Update: There seems to be a very strong correlation between credit grade and the count of characters in the description. Details are available in my post Loan Description Length By Credit Grade.



Lending Club ROI
Description | Total Loans | Smart Peer Lending ROI
Pre 2009, All | 2998 | 0.72%
Pre 2009, 0-100 Character Description | 1016 | 0.83%
Pre 2009, 101-500 Character Description | 1353 | 1.04%
Pre 2009, 501 Characters And Longer | 629 | -0.16%

Lending Club Percent Paid
Description | Total Loans | Percent Good | Percent Bad | Fully Paid | Current | Charged Off | Default
Pre 2009, All | 2996 | 77.2% | 21.8% | 66.2% | 11.1% | 21.1% | 0.2%
Pre 2009, 0-100 Character Description | 1001 | 78.6% | 20.6% | 70.2% | 8.4% | 20% | 0.1%
Pre 2009, 101-500 Character Description | 1350 | 77.1% | 21.9% | 64.6% | 12.5% | 21% | 0.2%
Pre 2009, 501 Characters And Longer | 630 | 75.2% | 23.7% | 62.7% | 12.5% | 23% | 0.2%

Saturday, September 3, 2011

Bad and Good Words Revisited

In a post on the prospers.org forums, user havastat recommended looking at the listings for good and bad words by randomly dividing them into two groups and seeing if the results come out similarly. If they don't, there's a good chance that the findings were random. If they do, the findings are more likely to be meaningful.

Percent Paid (By Loan): The percent of loans containing the indicated word at least once that finished with a status of Paid.
Percent Paid (By Word): The percent of time that a loan ended with the status Paid, weighted by how often the word appears in the listing. (For example, a loan titled "Help, help, help, help!" that did not pay would count four times as much as a loan with "Help" listed only once.)
Word Count: The number of listings containing the word at least once. (Notably, this is not the total number of times the word was used; each listing counts at most once.)
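All three measures can be computed in one pass over the loans. This is a minimal sketch, assuming a hypothetical (title, description, status) tuple per loan and a simple tokenizer that lowercases the text and splits on anything other than letters, digits and apostrophes. The tokenization behind the tables below isn't described, so treat that part as a guess.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of lowercase letters, digits and apostrophes.
WORD = re.compile(r"[a-z0-9']+")

def word_stats(loans):
    """loans: iterable of (title, description, status).

    Returns {word: (pct_paid_by_loan, pct_paid_by_word, word_count)}.
    """
    listings_with, paid_listings_with = Counter(), Counter()
    uses, paid_uses = Counter(), Counter()
    for title, description, status in loans:
        counts = Counter(WORD.findall(f"{title} {description}".lower()))
        paid = status == "Paid"
        for word, n in counts.items():
            listings_with[word] += 1  # Word Count: at most once per listing
            uses[word] += n           # By Word: weighted by repetitions
            if paid:
                paid_listings_with[word] += 1
                paid_uses[word] += n
    return {
        w: (100 * paid_listings_with[w] / listings_with[w],
            100 * paid_uses[w] / uses[w],
            listings_with[w])
        for w in listings_with
    }

stats = word_stats([("Help, help", "please help", "Defaulted"), ("Help", "", "Paid")])
print(stats["help"])  # (50.0, 25.0, 2)
```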

Like in the original posts, these are words from Prosper loans that were created before 2008. My methodology is at the bottom of the post, but loans were assigned to groups randomly and there were 8728 loans in each group.

Group 1 Worst Performing Words
Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
[average Paid] | 61.1%
payday | 38.9% | 38.6% | 596
behind | 42.9% | 43.6% | 592
mother | 43.5% | 44.8% | 566
chance | 44.5% | 42.1% | 631
track | 46.8% | 45.6% | 581
son | 47.1% | 44.8% | 597
daughter | 48.1% | 46.3% | 516
child | 48.7% | 47.9% | 520
husband | 49% | 51.3% | 896
single | 49.5% | 49.7% | 707

Group 2 Worst Performing Words
Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
[average Paid] | 59.6%
payday | 37.5% | 39.2% | 595
behind | 42.4% | 41.3% | 566
chance | 43.5% | 41.5% | 575
son | 45.7% | 44.2% | 514
mother | 46.6% | 46.7% | 601
children | 47% | 45.6% | 854
daughter | 47.7% | 44.6% | 539
DELETED | 47.7% | 46.6% | 507
child | 47.8% | 46.3% | 552
30000 | 48.3% | 47.6% | 532

So, as in the original Words of Loss post, 'payday' comes in at the very bottom, and 'behind', 'chance' and family words like 'mother', 'child', etc. round out the bottom of the list for both groups.


Now let's take a look at the best performing words:


Group 1 Best Performing Words
Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
[average Paid] | 61.1%
tax | 67.1% | 66.7% | 504
early | 67.2% | 67.7% | 534
rate | 67.6% | 68.6% | 1952
term | 67.6% | 66.6% | 509
risk | 67.8% | 70.3% | 565
fund | 68.2% | 70.2% | 666
rates | 68.3% | 68.8% | 609
lender | 68.3% | 70.2% | 707
minimum | 68.4% | 68.4% | 583
investment | 69.1% | 69.3% | 679

Group 2 Best Performing Words
Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
[average Paid] | 59.6%
risk | 64.2% | 66.1% | 592
card | 64.2% | 65.9% | 2910
higher | 64.4% | 63.2% | 765
style | 64.5% | 58.8% | 968
span | 64.9% | 59.5% | 1069
don't | 65% | 64.3% | 861
rate | 65.2% | 65.3% | 1938
student | 66% | 66.3% | 1078
lender | 66.2% | 67.5% | 754
I've | 66.2% | 63.6% | 888

It's interesting that lending words appear on both of these lists, but there are fewer matches between the two groups than there were for the worst performing words. The matches are 'risk', 'rate(s)' and 'lender', and all of them are still much closer to the average Paid rate than the worst performing words are.

It may turn out that the words in a listing can tell us whether a loan is more likely to fail, but not whether it is more likely to succeed.

Methodology:

Similar to the methodology I used in the previous two studies, I began with all Pre-2008 Prosper Loans.

I then placed all the loans in a random order and assigned them to Group 1 or Group 2 sequentially. From there I built a list of all the words in the title and body of the listing for those loans, tallying the number of times the word was used in each loan.
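The random assignment can be sketched as a shuffle followed by dealing loans alternately into the two groups. Reading "sequentially" as alternating is my assumption (cutting the shuffled list in half would be just as random), and the seed is arbitrary. The 17456 used below is simply twice 8728, the group size mentioned above, standing in for the real loan records.

```python
import random

def split_into_groups(loans, seed=None):
    """Shuffle a copy of the loans, then deal them alternately into two groups."""
    shuffled = list(loans)
    random.Random(seed).shuffle(shuffled)
    return shuffled[0::2], shuffled[1::2]

group1, group2 = split_into_groups(range(17456), seed=2011)
print(len(group1), len(group2))  # 8728 8728
```

Fixing the seed makes the split reproducible, so the same two groups can be regenerated later.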

To come up with the Percent Paid (By Loan) I divided the number of loans with that word that finished with a status Paid by the number of loans with that word in total.