I am calling these my final rankings, but I might make slight alterations before the draft.

Remarks:

Dahlin is decidedly ranked #1 and Svechnikov decidedly #2, with Svechnikov a lot closer to #1 than to #3.

3rd overall is a toss-up for me, with the top 11 all being very close. Small alterations in the algorithm, in the data entered (approximate playing time, quality of linemates) or in the scouting evaluations could have placed Wahlstrom, Zadina or Dobson at #3 instead of Kotkaniemi.

The statistics are used as an indicator of performance, but where the player stands in terms of organizational depth is also weighted. The algorithm weights the two differently depending on how statistically relevant each player's numbers are.
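To make the idea of weighting the two sources differently more concrete, here is a minimal sketch. The actual formulas are not given in this thread, so the linear blend, the function name `blended_value` and the `stats_relevancy` knob are all assumptions for illustration only.

```python
def blended_value(stats_score: float, depth_score: float, stats_relevancy: float) -> float:
    """Blend a stats-based score with an organizational-depth score.

    stats_relevancy is a hypothetical 0..1 knob: 1.0 trusts the statistics
    fully, 0.0 relies only on where the player sits in his organization.
    """
    if not 0.0 <= stats_relevancy <= 1.0:
        raise ValueError("stats_relevancy must be between 0 and 1")
    return stats_relevancy * stats_score + (1.0 - stats_relevancy) * depth_score


# Made-up numbers: equal trust in the statistics and in the depth-chart role.
print(blended_value(stats_score=70.0, depth_score=50.0, stats_relevancy=0.5))  # 60.0
```

A player with a small sample of games would get a lower `stats_relevancy` in this sketch, so his role on the team would carry more of the value.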

On rare occasions, players are given a positive or negative bonus when something relevant hasn't been taken into account by the software, for example: injury proneness, attitude issues, extra international play.

"I believe that this software can be a useful tool to support the evaluation of prospects; in fact, I would even go as far as to say that it has the potential to give better results than the standard way of evaluating players. When I started creating my own rankings of draft-eligible players, I quickly realized how complex a mental process it is, considering all the variables that have to be taken into account when evaluating a player (speed, skill, shot, defensive game, compete level, age, offensive production, quality of teammates, league difficulty, ice time, height, weight, tournament play, etc., while assigning each criterion its degree of importance). Inevitably, some criteria become overvalued or undervalued. Once you start comparing two players, the mental process increases in complexity: you have to recall every relevant piece of information about each player, gauge each gap between the players and assess which accumulation of gaps outweighs the other. That added complexity inevitably creates more room for misjudgment. At that point, if two scouts are arguing about two prospects, a battle of wills takes over to a certain extent, since the objective subtleties that differentiate the players can't be analyzed in their totality. As a matter of fact, we run up against the limitations of our brain even when isolating simple, easy-to-grasp variables; even the ones that are already objectively defined for us can lead to misevaluation. Take a player's age, for example: the brain can't associate a precise value with each day of the year. Instead, for the sake of simplification, it will tend to reduce the number of values by grouping weeks, an entire month or possibly adjacent months into a single value.
This can be seen as a marginal difference that might not have a considerable impact, but those marginal differences can be found everywhere; they add up, and their sum can make a noticeable difference. This is even more true if you have a bias towards a specific player: your brain will unconsciously favor grouping the values in a manner that confirms your biased opinion. But most importantly, this software eases the process of evaluation by isolating each variable in order to come to a final value. The brain performs much better at assigning the most precise value possible to one single variable at a time than at taking everything into account at once. For those reasons, I believe that if the exercise of assigning the most exact value possible to each isolated variable is done properly with the help of software, it will give results of greater accuracy."
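The age example in the quote above can be sketched in a few lines. This is only an illustration of day-level precision versus month-level grouping; the reference date and both birth dates are made up, and the function names are hypothetical.

```python
from datetime import date


def age_in_days(birth: date, reference: date) -> int:
    """Exact age in days on the reference date."""
    return (reference - birth).days


def age_in_whole_months(birth: date, reference: date) -> int:
    """Age rounded down to whole months, the kind of grouping a person tends to do."""
    months = (reference.year - birth.year) * 12 + (reference.month - birth.month)
    if reference.day < birth.day:
        months -= 1
    return months


# Hypothetical reference date and birth dates, for illustration only.
reference = date(2018, 9, 15)
player_a = date(2000, 3, 1)
player_b = date(2000, 3, 14)

# Both players land in the same month bucket...
print(age_in_whole_months(player_a, reference), age_in_whole_months(player_b, reference))
# ...but the day count still separates them.
print(age_in_days(player_a, reference) - age_in_days(player_b, reference))
```

In this sketch the month bucket hides a 13-day gap that the day count keeps.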

"The human brain has limitations with regard to its precision (the exact value attributed to a prospect playing in a certain league at a certain point in time, considering his specific date of birth), its consistency in evaluation (giving exactly the same value to every other player in that same context, and a proportionally altered value in a slightly different context) and its processing power (the number of variables factored in precisely at the same time, looking at all that data at once). Those are three major limitations that software does not have, and I believe there is potential there to be exploited in the hockey scouting world."


Forwards entered in the software (scoring higher than 43):
- DY: Draft year. DY-1: 1 year before draft. DY-2: 2 years before draft.
- The first section (with DY, DY-1, DY-2) shows the evaluation of each prospect's season performance, calculated by the software from the data entered.
- The second section shows the percentage of each season taken into account in the final score; it varies depending on the games/tournaments played in each season.
- The third section shows the player's style of play, which is automatically generated from the scouting evaluation I entered for the player.
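As a rough illustration of how per-season evaluations and their percentages could combine into one final score, here is a minimal sketch. It assumes a plain weighted average; the real spreadsheet formulas are not shown in this thread, and the scores, percentages and the name `final_score` are invented for the example.

```python
def final_score(season_scores: dict[str, float], season_percentages: dict[str, float]) -> float:
    """Weighted average of per-season evaluations.

    season_percentages holds the share of each season in the final score
    (they would vary with games/tournaments played); they are renormalized
    here so they do not need to sum to exactly 100.
    """
    total = sum(season_percentages.values())
    if total <= 0:
        raise ValueError("at least one season must carry a positive percentage")
    weighted = sum(season_scores[s] * pct for s, pct in season_percentages.items())
    return weighted / total


# Made-up evaluations and percentages for one prospect.
scores = {"DY": 60.0, "DY-1": 50.0, "DY-2": 40.0}
percentages = {"DY": 60, "DY-1": 30, "DY-2": 10}
print(final_score(scores, percentages))  # 55.0
```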

Defensemen entered in the software (scoring higher than 43):
- DY: Draft year. DY-1: 1 year before draft. DY-2: 2 years before draft.
- The first section (with DY, DY-1, DY-2) shows the evaluation of each prospect's season performance, calculated by the software from the data entered.
- The second section shows the percentage of each season taken into account in the final score; it varies depending on the games/tournaments played in each season.

Some under the hood type questions: what do you use to generate your predictions? Did you write an application, or is this a model you work with in Python or R? I notice you've been doing this for a couple years now. Have you noticed any degradation in your model as the years have gone by, maybe because of shifting player preferences?

How did you go about choosing your non-obvious features/independent variables?

Terrific work! Really appreciated you laying out how you go about getting to the conclusions by including the human element, not just blindly going with what the analytics say as some do. I guess you believe in what you believe in, but the way you seem to go about it seems like it would yield much better results.

Everything you see right now has been done in Excel; the algorithm is probably 3 pages of pure formulas. I want to transfer it to software now (I have no experience with those coding languages, I just know maths, so I will need help with this). It is only the 2nd year I have been using this to evaluate prospects, so there is no way to notice degradation yet, but that is a valid point which I will need to keep a close eye on (for example, smaller defensemen project a lot better than they did a few years ago). The reality, though, is that I am already constantly tweaking the algorithm: each time I enter a player I ask myself whether the results make sense. If not, I look for patterns among players in similar situations, and if particular inputs do seem to lead to inaccurate evaluations, I try to fix that by altering the current formulas and/or adding new ones.

About choosing the independent variables: my goal is to mimic the way your brain would naturally evaluate prospects, but with greater precision and more processing power. If I am naturally inclined to take a variable into account when evaluating prospects, I try to add it to the software with the same proportional level of importance. Again, each time I enter a player I assess whether the independent variables I am using achieve the desired result of improving the evaluation.

So it's all table-driven, with no hard-coded factors or multipliers? It should not be that hard to code, even with something like Access as a database.
That must be one ugly formula to maintain in Excel.

Yeah, I only concentrated on finishing the rankings for this year. Excel is inefficient in many ways; the next step, ideally, is to program it as software, which I would like to be accessible for anyone to use. I don't have the extra time to work on this as of now.

These are my favourite rankings, no question. People badly underrated Kotkaniemi; IMO he looks a lot like Datsyuk and deserves to land at No. 3. Glad to see he’s being noticed this high by actual data.

If you have the data, I'd be interested in trying to create a machine learning model and seeing how it performs.

I would love to see a true ML-oriented take on the draft. I don't know how long we've had CHL data this complex, but we should be getting to the point where you could have a pretty robust training set with it.

The challenge would be developing some sort of score to serve as the output of the model. The data that ProspectsFanatic is using are great input features, but the model would have to output a score that could be used to rank the prospects.

I believe the best way to illustrate why Kotka landed there is a comparison with the most conventional 3rd overall pick, Zadina. If you are going to compare the two players' statistics, you are better off comparing Kotka's current season with Zadina's previous season, which makes Kotka 4.75 months younger instead of 7.25 months older than Zadina. What Kotka did this year in a men's league compared to what Zadina did last year isn't even close: Zadina had 25-1-1-2 in the Czech league (which is the easier league, so that should normally have somewhat mitigated the age difference). Obviously, you can't dismiss what Zadina accomplished this year, especially at the WJC20, but that still gives you an idea of why the software puts Kotka ahead of Zadina. You can also add that Kotka is already the faster skater and is taller than Zadina.
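The age arithmetic in that comparison is just a one-season (12-month) shift. Here is a tiny sketch of it; the function name `shifted_age_gap` is mine, and only the 7.25 and 4.75 month figures come from the post above.

```python
MONTHS_PER_SEASON = 12


def shifted_age_gap(gap_months: float, seasons_back: int) -> float:
    """Age gap in months (positive = older) when a player's current season is
    compared with the other player's season from `seasons_back` years earlier."""
    return gap_months - MONTHS_PER_SEASON * seasons_back


# Kotka is 7.25 months older than Zadina in the same season; against
# Zadina's previous season the gap flips to 4.75 months younger.
print(shifted_age_gap(7.25, 1))  # -4.75
```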

Data taken into account:
- Age, height, weight.
- Scouting evaluation: skating speed, edge work, shooting, puck control, offensive IQ, competitiveness, physical play, defensive play. (Those evaluations affect the statistical evaluation; both sources of data are tested against each other in different ways.)
- Last 3 seasons' stats in all leagues played: league difficulty, ice time, quality of teammates, organizational depth, (regular season + playoff) GP, G, A, PTS and, if available, A1, A2, Sh%, relative +/-. Tournament play: WJC18, WJC20, Hlinka.
(The statistics are used as an indicator of performance, but where the player stands in terms of organizational depth is also weighted. The two are weighted differently by the algorithm depending on the context of each player; for example, independently of statistics, playing on the 2nd line in the SHL in a draft-eligible year means something and can be measured as a value.)

If you ever manage to turn this into software, I'll definitely buy it.

I hoped to do something in this vein just for German junior players a few years back, but I quickly realised three things:
1) My high-school-level math skills are not nearly sufficient to really get into this.
2) Excel can be extremely frustrating to work with.
3) There is just not enough reliable data for German junior hockey out there.

So kudos for getting it this far, I am extremely impressed with your work.

Didn't have time to program it (it is automatically generated); will do it for next year.

For the overagers, did you enter Marcus Sylvegård in your software? He’s still WJC eligible, good size, and had a nice little year in the SHL last season along with strong, productive showings internationally for Sweden. I’d take him over a few of the ‘top overagers’ you’ve listed (the ones I’ve seen play).