MLB Player Similarity - Validation

Nov 13, 2019·
Christopher Teixeira

As many of you are aware, Baseball-Reference calculates a similarity score for each player against other players. Although this is an accepted way to measure the similarity between two players, I wanted to see how my methodology compares. I ran my methodology for Hank Aaron (a player we all know, which makes the comparisons easy to judge) and compared my list against the one Baseball-Reference posted. First, let’s look at what Baseball-Reference has:

  1. Willie Mays (782)
  2. Barry Bonds (748)
  3. Frank Robinson (667)
  4. Stan Musial (666)
  5. Babe Ruth (645)
  6. Ken Griffey (629)
  7. Carl Yastrzemski (627)
  8. Rafael Palmeiro (611)
  9. Alex Rodriguez (610)
  10. Mel Ott (602)

I don’t think anyone would argue with any of these players. So which players did my methodology come up with? Here they are:

  1. Willie Mays
  2. Frank Robinson
  3. Al Kaline
  4. Ernie Banks
  5. Billy Williams
  6. Brooks Robinson
  7. Roberto Clemente
  8. Ken Boyer
  9. Norm Cash
  10. Carl Yastrzemski
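To make the comparison between the two lists concrete, here is a minimal sketch that counts how many names they share. The player names are copied from the two lists above; the variable names and the choice of Jaccard overlap as a summary are my own assumptions for illustration, not part of either methodology.

```python
# Top-10 lists for Hank Aaron, copied from the two lists above.
baseball_reference = [
    "Willie Mays", "Barry Bonds", "Frank Robinson", "Stan Musial",
    "Babe Ruth", "Ken Griffey", "Carl Yastrzemski", "Rafael Palmeiro",
    "Alex Rodriguez", "Mel Ott",
]
my_list = [
    "Willie Mays", "Frank Robinson", "Al Kaline", "Ernie Banks",
    "Billy Williams", "Brooks Robinson", "Roberto Clemente", "Ken Boyer",
    "Norm Cash", "Carl Yastrzemski",
]

mine = set(my_list)
theirs = set(baseball_reference)

# Players on both lists, kept in Baseball-Reference order.
shared = [p for p in baseball_reference if p in mine]

# Players that appear on only one of the lists.
only_baseball_reference = [p for p in baseball_reference if p not in mine]
only_mine = [p for p in my_list if p not in theirs]

# Jaccard overlap: shared names divided by all distinct names (an assumed summary metric).
jaccard = len(shared) / len(mine | theirs)

print("Shared:", shared)
print("Only Baseball-Reference:", only_baseball_reference)
print("Only mine:", only_mine)
print(f"Jaccard overlap: {jaccard:.3f}")
```

Only three names appear on both lists: Willie Mays, Frank Robinson, and Carl Yastrzemski.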

What’s interesting about my list is that it includes players who don’t seem comparable to Hank Aaron. The question then becomes: how did they make it onto the list? A quick look at the numbers shows that I included more statistics in the comparison than Baseball-Reference does. In addition, I used a weighting scheme when comparing the various statistics.
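To illustrate how a weighting scheme can change who ends up on such a list, here is a rough sketch of one common approach: standardize each statistic, then rank players by weighted Euclidean distance from the target. This is not my actual code or the Baseball-Reference formula. The function names, the statistics chosen, the weights, and the player rows are all hypothetical placeholders, and the numbers are made up rather than real career totals.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(players, stats):
    """Z-score each stat across all players so large-scale stats don't dominate."""
    z = {name: {} for name in players}
    for stat in stats:
        values = [players[name][stat] for name in players]
        mu, sd = mean(values), pstdev(values)
        for name in players:
            # A stat with no spread carries no information; treat it as 0.
            z[name][stat] = (players[name][stat] - mu) / sd if sd > 0 else 0.0
    return z


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two standardized stat lines."""
    return sqrt(sum(w * (a[s] - b[s]) ** 2 for s, w in weights.items()))


def most_similar(players, target, weights, top_n=10):
    """Return the top_n (name, distance) pairs closest to the target player."""
    z = standardize(players, list(weights))
    distances = [
        (name, weighted_distance(z[target], z[name], weights))
        for name in players
        if name != target
    ]
    return sorted(distances, key=lambda pair: pair[1])[:top_n]


# Made-up placeholder data: NOT real player statistics.
players = {
    "target":   {"hr": 40, "avg": 0.300, "sb": 10},
    "player_a": {"hr": 38, "avg": 0.295, "sb": 12},
    "player_b": {"hr": 10, "avg": 0.310, "sb": 45},
    "player_c": {"hr": 25, "avg": 0.260, "sb": 5},
}

# Hypothetical weights: a larger weight makes differences in that stat count more.
weights = {"hr": 2.0, "avg": 1.0, "sb": 0.5}

ranking = most_similar(players, "target", weights, top_n=3)
for name, dist in ranking:
    print(f"{name}: {dist:.3f}")
```

In a scheme like this, adding more statistics or raising the weight on one of them can pull in players who match the target on those dimensions even if they differ on the headline numbers, which is one plausible way unexpected names could appear on a list.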

So which one is right? I think it’s easy to say that the Baseball-Reference list seems more accurate, but I am continuing to improve this methodology and to see how each change affects the results. Stay tuned for the final version of the code and methodology.
