UTK Special 4/2/20

Fatigue, AI, and D2 Baseball

Apr 02, 2020

This one’s going to get a bit technical and a bit narrow, so for those of you just here to read about injuries or figure out whether something’s going to affect your bet, I’m sorry and I’ll see you tomorrow.

For the rest of you: as many of you know, I spent the spring working with the baseball team at the University of Indianapolis, a Division II program. We went 12-3 before the season was cancelled, which was enough to know that our methods were working, but not enough to show how well they would work over the course of a full season. Even one season might not be enough.

UIndy Baseball (@UIndyBaseball): “Here's a look at the @Corey_Bates7 15 Ks from his complete game 2 hit shutout this past weekend! @BairIsaac working hard back there, stopping at-least 4 of these! #BatteryMates!”

As I’ve mentioned in the past, the team used an “opener.” (I still don’t love the term and think it needs to be reframed, along with the other roles.) The opener was only one part of the system, and it wasn’t even the focus. It was simply a way to maximize some of our resources. The side effect was letting our ‘starters’ go deeper into games, and that, I believe, would have been the far more important change.

The design of the system came from a very small machine learning project I built on data the team collected with Motus sensors. All of our pitchers used the Motus sensor in all of their workouts (or should have; admittedly, we did not have full compliance). That gave us data on where each pitcher stood in terms of both workload and fatigue, though fatigue is where the AI really came in.

Before each game, I used two systems to feed information into a model I called the Fatiguealyzer.* (Yeah, it needs a new name as well.) It took the Motus data and combined it with current stats (small sample size issue!) and with a widely used scoring system. It then ran 100,000 Monte Carlo simulations against a generic opponent. Why generic? Early in the season, I simply didn’t have opponent data I could trust. Instead, I assumed we were facing a lineup of nine perfectly average hitters from the 2019 Division II season.
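The Fatiguealyzer’s internals aren’t described here (the simulations ran in Analytica), so what follows is only a minimal Python sketch of the general technique: a Monte Carlo simulation of one pitcher facing a lineup of nine identical average hitters, with a hypothetical times-through-the-order penalty that inflates the hitters’ non-out outcomes. The rates in `AVERAGE_HITTER` are illustrative placeholders, not actual 2019 Division II averages, and runners advance station to station, which is a deliberate simplification. All function names are hypothetical.

```python
import random

# Illustrative placeholder rates for one "perfectly average" hitter.
# These are NOT real 2019 Division II averages; swap in actual league rates.
AVERAGE_HITTER = {
    "out": 0.66, "walk": 0.10, "single": 0.15,
    "double": 0.05, "triple": 0.01, "homer": 0.03,
}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def outcome_weights(times_through, tto_penalty):
    """Scale non-out outcomes by a hypothetical times-through-the-order penalty."""
    # Cap the boost so the out probability never reaches zero (no endless innings).
    boost = min(1.0 + tto_penalty * times_through, 2.0)
    weights = {k: v * boost for k, v in AVERAGE_HITTER.items() if k != "out"}
    weights["out"] = max(0.0, 1.0 - sum(weights.values()))
    return weights


def apply_walk(bases):
    """Force runners ahead of the batter; bases = [first, second, third]."""
    first, second, third = bases
    if first and second and third:
        return [True, True, True], 1
    if first and second:
        return [True, True, True], 0
    if first:
        return [True, True, third], 0
    return [True, second, third], 0


def apply_hit(bases, n):
    """Advance every runner and the batter n bases; return (bases, runs)."""
    runs = 0
    new = [False, False, False]
    for i, occupied in enumerate(bases):
        if occupied:
            if i + n >= 3:
                runs += 1
            else:
                new[i + n] = True
    if n >= 4:
        runs += 1  # the batter scores on a home run
    else:
        new[n - 1] = True
    return new, runs


def simulate_outing(innings, tto_penalty, rng):
    """Runs allowed by one pitcher over a number of innings."""
    runs = 0
    batters_faced = 0
    for _ in range(innings):
        outs = 0
        bases = [False, False, False]
        while outs < 3:
            # Nine identical hitters, so times through = batters faced // 9.
            weights = outcome_weights(batters_faced // 9, tto_penalty)
            result = rng.choices(list(weights), weights=list(weights.values()))[0]
            batters_faced += 1
            if result == "out":
                outs += 1
            elif result == "walk":
                bases, scored = apply_walk(bases)
                runs += scored
            else:
                bases, scored = apply_hit(bases, HIT_BASES[result])
                runs += scored
    return runs


def expected_runs(innings, tto_penalty, n_sims=100_000, seed=0):
    """Monte Carlo estimate of mean runs allowed across n_sims simulated outings."""
    rng = random.Random(seed)
    total = sum(simulate_outing(innings, tto_penalty, rng) for _ in range(n_sims))
    return total / n_sims
```

Comparing `expected_runs(6, 0.0)` with `expected_runs(6, 0.10)` gives a rough sense of what a given fatigue assumption costs over a six-inning outing. The default of 100,000 runs mirrors the count above, though it is slow in pure Python.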

So here’s where it gets interesting.

Our team had five “normal starters”: the pitchers who would start the three weekend games plus the two midweek games. This is a pretty standard collegiate pattern, which is why you hear “Friday starter” used as a synonym for ace. The twist was that we were going to limit the starters to five or six innings in their weekend start, then use them again midweek rather than having a bullpen.

The system didn’t think in terms of innings, however. It looked at workload (ACR, the acute:chronic workload ratio, on a forward-looking basis) and times through the order. But the second factor didn’t work, or at least didn’t work as expected.
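A forward-looking ACR check might look like the sketch below. It assumes simple rolling averages over a 7-day acute window and a 28-day chronic window, a common convention in the workload literature; Motus’s own workload metric and window definitions may differ, and the 1.3 ceiling in the hypothetical `days_over_limit` helper is an illustrative threshold, not a recommendation from this post. “Forward-looking” here means appending the planned workloads for upcoming days and checking the ratio each day would produce.

```python
from statistics import mean

ACUTE_DAYS = 7     # assumed acute window (days)
CHRONIC_DAYS = 28  # assumed chronic window (days)


def acr(daily_loads):
    """Acute:chronic ratio using the most recent days of a daily workload list."""
    if len(daily_loads) < CHRONIC_DAYS:
        raise ValueError(f"need at least {CHRONIC_DAYS} days of workload data")
    acute = mean(daily_loads[-ACUTE_DAYS:])
    chronic = mean(daily_loads[-CHRONIC_DAYS:])
    # With no chronic load there is nothing to compare against; report 0.0.
    return acute / chronic if chronic > 0 else 0.0


def projected_acr(history, planned):
    """Forward-looking ACR: the ratio on each upcoming day if the plan is followed."""
    series = list(history)
    ratios = []
    for load in planned:
        series.append(load)
        ratios.append(acr(series))
    return ratios


def days_over_limit(history, planned, ceiling=1.3):
    """Indexes of planned days whose projected ACR would exceed the ceiling."""
    return [day for day, ratio in enumerate(projected_acr(history, planned))
            if ratio > ceiling]
```

Running a proposed week (a weekend start followed by a midweek outing) through `days_over_limit` before it happens is the point of the forward-looking version: the plan can be adjusted before the ratio spikes rather than after.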

It’s now a given, backed by plenty of data, that times through the order is a major factor in the major leagues. That third time through is tough, but the fourth? That’s why teams have pushed toward bullpen brigades, most notably the Royals’ World Series team with its near-certain seventh-, eighth-, and ninth-inning roles.

At this level and with this staff, it didn’t work that way. The main difference is that the decline didn’t trend linearly. Pitchers, openers and starters alike, often struggled in their first inning, creating a curve rather than the expected line. These were new roles and some pitchers didn’t love the idea, so I think the cause was both physical (learning a new warm-up routine, or re-timing an old one) and mental (wanting their standard role).
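One way to check whether the pattern is a line or a curve is to fit both a straight line and a quadratic to the average runs allowed in each inning of an outing and compare the fits. The sketch below does that in pure Python with ordinary least squares; the function names are hypothetical. With samples this small, a quadratic will always fit at least as well as a line, so the comparison is a rough diagnostic rather than a significance test. A positive `curvature` term matches the shape described above: a rough first inning, a settled middle, then a fade.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))) / a[i][i]
    return coeffs


def sse(xs, ys, coeffs):
    """Sum of squared residuals for a fitted polynomial."""
    return sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))


def curvature_check(innings, runs_per_inning):
    """Compare linear vs quadratic fits of run rate by inning of the outing."""
    if len(set(innings)) < 3:
        raise ValueError("need at least three distinct innings to fit a curve")
    lin = polyfit(innings, runs_per_inning, 1)
    quad = polyfit(innings, runs_per_inning, 2)
    return {
        "linear_sse": sse(innings, runs_per_inning, lin),
        "quadratic_sse": sse(innings, runs_per_inning, quad),
        "curvature": quad[2],  # > 0 means U-shaped: rough start, settle, then fade
    }
```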

I do think that would have changed over the season and produced a more linear pattern. I also think the small sample size leaves far too much unknown to make any judgment about whether this approach works. My hope is that we’ll know far more next year.

What we do know is that the pitchers could handle this. At no point did any of our pitchers exceed the recommended workload, aside from early in the season, when they were being closely monitored. We had two arm injuries, and both were pre-existing. I don’t think the injury count would have increased significantly.

What remains to be tested are the patterns the machine learning system was generating. I don’t want to share them in too much detail, because they’re untested and because I’m hoping to use them, but the general idea is to go 1-5-3 in most games: the 1 is an opener going one inning, the 5 is the starter’s innings, and the final three innings are either split among three one-inning relievers or mixed and matched from the rest of the staff.

In weekday games, the pattern shifts to 1-3-3-2, where the ‘starter’ goes three innings and then hands three innings to one of the weekday starters. I’m still unsure of this one from a workload standpoint. It works in the simulations, but I’m still cautious about it in real life, and I’m leaning more toward a 1-4-2-2 system. Another consideration is that our head coach likes to use the weekend starters at the tail end of games, which could make ‘skipping’ them possible if the score dictates it.
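The pattern notation can be expanded into an inning-by-inning plan with a small helper. This is a hypothetical sketch: the role labels are illustrative placeholders, and the only rule it enforces is that a pattern covers exactly nine innings, which is a quick sanity check when comparing variants like 1-3-3-2 and 1-4-2-2.

```python
def expand_pattern(pattern, roles):
    """Turn a pattern like "1-5-3" plus one role label per chunk into a 9-inning plan."""
    chunks = [int(part) for part in pattern.split("-")]
    if len(chunks) != len(roles):
        raise ValueError("need exactly one role label per chunk of the pattern")
    if sum(chunks) != 9:
        raise ValueError(f"{pattern} covers {sum(chunks)} innings, not 9")
    plan = []
    for innings, role in zip(chunks, roles):
        plan.extend([role] * innings)  # one entry per inning, in order
    return plan


# Hypothetical role labels for the patterns described in the text.
WEEKEND = expand_pattern("1-5-3", ["opener", "starter", "bullpen"])
WEEKDAY = expand_pattern("1-3-3-2", ["opener", "starter", "piggyback", "bullpen"])
WEEKDAY_ALT = expand_pattern("1-4-2-2", ["opener", "starter", "piggyback", "bullpen"])
```

Each plan is a plain list indexed by inning (index 0 is the first inning), so counting a role’s entries gives the innings it is scheduled to cover, which could then feed a workload projection.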

There are still a lot of unknowns here, but I see enough success in this to think it will work, and I don’t think it’s limited to this level. I believe a four-man rotation with strict workload limits would work at the major league level. Piggyback/tandem rotations have been effective in the minor leagues, despite being used mostly for development purposes.

I believe that a major league team that switches to a better format for its pitching could gain one of the few asymmetric advantages left in the game. Pairing it with workload monitoring and other tools would amplify that advantage. I believe in it enough to use it on my own team, and I look forward to seeing an organization do the same.

You know where to find me when you’re ready, GMs.

* Machine learning is a bit of a buzzword now, so I’ll detail what I’m using. It’s basically an Amazon SageMaker setup, using H2O.ai to clean up a very sloppy dataset. DataRobot has a different machine learning platform that looks very intriguing. The Monte Carlo simulations were done in Lumina’s Analytica. I did this with a minimum of outside help, and I’m no tech guy, so I’m sure it could be done a lot better and more efficiently.