Data Makes The Difference In Fantasy Baseball


For Fantasy Baseball players in 2016, there is a wealth of data to mine in preparation for the upcoming season. Besides the proliferation of websites dedicated specifically to the fantasy player, there are numerous sites whose sole purpose is aggregating every imaginable statistic related to the national pastime. FanGraphs, Baseball Reference, ESPN, and MLB are some of the more accessible sources of both conventional and advanced statistics.

In Olden Times

When I started playing fantasy baseball, perhaps a few years before the administration of Hillary Clinton’s husband, the league, of which I am still a member, actually had to pay for a service to compile the week’s stats and mail them to us in paper form. Whoever was running the league that particular year would then make copies and distribute them to each player at a weekly get-together. Only then could you confirm what you suspected was happening with your team from reading the daily box scores in the newspaper.

Preparation for a new season consisted of waiting until early February for baseball magazines or, some years later, fantasy baseball magazines to come out so you could see the full season stats for each player. As a “cutting edge” fantasy player in those days, I would sit down and hand-key the previous season’s statistics into Excel, so I could sort them by position or category to gain an edge over my opponents. I was also convinced that by typing in every player in MLB, I was ingraining each stat line into my brain for instant recall at the auction.

Not Any More

Somewhere around 1997 (I remember this because I had a brand new version of Office), I found the first free Excel download of the previous season’s statistics on this up-and-coming thing called the internet. No more hand-keying the projected full starting rosters of 30 MLB teams and their statistics from the previous season. Just a few minutes’ download (sigh) and I was off and running.

So now that I’ve dragged you along on a stroll down a fossilized fantasy baseball memory lane, let’s start digging into some of the major advantages a modern, big-data fantasy player has over their early-90s, pre-tech-bubble forefathers.

Splits, Splits, and More Splits

It took years for fantasy baseball publications to catch on to the idea of printing first half and second half numbers. Fantasy players wanted the first half / second half split in order to determine whether a player with solid overall numbers might be trending down after a hot start, or whether a player with a disappointing stat line might have heated up after the All-Star break. Savvy fantasy owners could use this information to avoid a player who might not really be as good as his overall numbers, or to target a “sleeper” whose development at the plate had been masked by a poor start.
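If you keep game logs yourself, the first half / second half split is simple arithmetic. Here is a minimal Python sketch of the idea. The function name, the (date, at-bats, hits) game-log layout, the cut-off date, and the sample numbers are all made up for illustration and don't reflect how any particular site stores its data.

```python
from datetime import date


def half_season_averages(game_log, all_star_break):
    """Split a game log into halves and return the batting average for each.

    game_log: iterable of (game_date, at_bats, hits) tuples (hypothetical layout).
    all_star_break: games before this date count as first half.
    """
    totals = {"first_half": [0, 0], "second_half": [0, 0]}  # [at_bats, hits]
    for game_date, at_bats, hits in game_log:
        half = "first_half" if game_date < all_star_break else "second_half"
        totals[half][0] += at_bats
        totals[half][1] += hits
    # A half with no at-bats has no average, so report None instead of dividing by zero.
    return {half: (hits / ab if ab else None) for half, (ab, hits) in totals.items()}


# Invented game log for a hitter who started slowly and heated up late.
sample_log = [
    (date(2015, 5, 1), 4, 1),
    (date(2015, 6, 1), 4, 1),
    (date(2015, 8, 1), 4, 2),
    (date(2015, 9, 1), 4, 2),
]
splits = half_season_averages(sample_log, all_star_break=date(2015, 7, 14))
print(splits)
```

A full-season line of 6 for 16 (.375) would hide the fact that this made-up hitter doubled his average after the break, which is exactly the kind of thing the split is meant to expose.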

Today, for batters, you can find not only first half / second half splits, but also splits against right-handed and left-handed pitching – home and away splits – each month’s stats – whether a batted ball was a grounder, fly ball, line drive, or bunt – what part of the park each batted ball went to – the hitter’s numbers depending on the count – the hitter’s numbers depending on how many people are on base and which bases they occupy. And then you can find the same situational advanced stats

For pitchers, several sites offer a slightly modified mirror image of the batting splits: pitching against right-handed or left-handed batters – home and away splits – month-by-month splits – pitching in low-, medium-, and high-leverage situations – and finally the count. Technically, the “splits” don’t end there, as the standard statistics page (splits are usually on another page) also breaks down pitchers by pitch type and velocity.

Advanced Stats

While most fantasy baseball leagues play a 5×5 format – batting average, home runs, RBI, runs, and stolen bases on offense – ERA, WHIP, strikeouts, wins, and saves for pitching – there are several advanced statistics that are actually better predictors of those categories than the categories themselves.

With hitters, I usually just ignore RBI and runs. Both statistics tend to be a byproduct of lineup construction. This isn’t to say that I tank the categories, but rather that I don’t spend time studying runs and RBI. For instance, some players who are quite unskilled at getting on base often end up with high run totals just because managers stubbornly continue to adhere to the idea that fast players should bat at the top of the lineup. And, since most power hitters are routinely slotted into the third or fourth spot in the lineup, they will get their RBI.

Sabermetricians have known for some time that batting average isn’t a particularly valuable statistic in evaluating a hitter, but the mystique of the .300 hitter persists and it remains a category in fantasy baseball. The advanced statistic BABIP (batting average on balls in play) is a very good statistic to get a sense of the likelihood a player will maintain, improve upon, or regress from his previous year’s BA. Remarkably, the entire league tends to hover around .300 in BABIP. If you’re scouting a player with a substantially higher BABIP, say .340, expect regression in both BABIP and BA. If you’re scouting a player with a significantly lower BABIP, say .260, expect him to improve in both BABIP and BA.
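For the curious, BABIP is usually calculated as (H - HR) / (AB - K - HR + SF). A rough Python sketch follows. The .030 band around the league mark is an assumption for illustration only, chosen so that the .340 and .260 examples above land outside it, and the stat line is invented.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play to measure")
    return (hits - home_runs) / balls_in_play


def babip_outlook(value, league_mark=0.300, band=0.030):
    """Rough read on where BA is headed. The band width is an assumed cut-off."""
    if value >= league_mark + band:
        return "expect regression"
    if value <= league_mark - band:
        return "expect improvement"
    return "near league norm"


# Invented stat line: 150 H, 20 HR, 500 AB, 100 K, 5 SF.
player_babip = babip(hits=150, home_runs=20, at_bats=500, strikeouts=100, sac_flies=5)
print(round(player_babip, 3), babip_outlook(player_babip))
```

The cut-offs are a blunt instrument. Some hitters, particularly fast ones who hit a lot of line drives, sustain a BABIP above .300 for years, so it is worth comparing a player to his own career mark as well as to the league.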

Two good measures of power from the advanced statistic toolbox are OPS (on base plus slugging) and ISO (isolated power). It used to be that conventional wisdom in fantasy baseball was to look at the number of doubles a younger player hit and then assume that, as he matured and grew into his body, some of those doubles would become HR. Today, following upward trends in OPS and ISO can be an indicator of a breakout in the all-important home run category.
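Both numbers fall straight out of a standard batting line: OPS is on-base percentage plus slugging percentage, and ISO is slugging percentage minus batting average. Here is a small Python sketch using the standard definitions. The class name, field names, and the sample stat line are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sac_flies: int

    def average(self):
        return self.hits / self.at_bats

    def slugging(self):
        # Total bases: 1 per single, 2 per double, 3 per triple, 4 per home run.
        singles = self.hits - self.doubles - self.triples - self.home_runs
        total_bases = singles + 2 * self.doubles + 3 * self.triples + 4 * self.home_runs
        return total_bases / self.at_bats

    def on_base(self):
        # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
        reached = self.hits + self.walks + self.hit_by_pitch
        chances = self.at_bats + self.walks + self.hit_by_pitch + self.sac_flies
        return reached / chances

    def ops(self):
        return self.on_base() + self.slugging()

    def iso(self):
        # Isolated power strips the singles out of slugging: SLG - AVG.
        return self.slugging() - self.average()


# Invented stat line for illustration.
line = BattingLine(at_bats=500, hits=150, doubles=30, triples=5, home_runs=20,
                   walks=50, hit_by_pitch=5, sac_flies=5)
print(round(line.ops(), 3), round(line.iso(), 3))
```

Running the same calculation on two or three consecutive seasons for a young hitter is a quick way to see whether the power trend is pointing up.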

For pitchers, I also tend not to dwell too much on wins and saves, as they are also largely situational. I do factor in what teams pitchers play for, since better teams win more games and have more save situations, but I’m not going to pass up a potential 200 K pitcher because he plays for the Philadelphia Phillies (and they don’t have one of those).

WHIP (walks plus hits per inning pitched), in and of itself, is probably the most “advanced” statistic currently used in 5×5 roto baseball. A pitcher can control walks, if not entirely hits, and the rate at which he allows both is a good indicator of control and command.
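The arithmetic is about as simple as it gets: walks plus hits, divided by innings pitched. The one trap is that box scores write a third of an inning as ".1" and two thirds as ".2", so the innings figure has to be converted before dividing. A minimal Python sketch follows; the function names and sample numbers are made up for illustration.

```python
def innings_from_box_score(ip_text):
    """Convert box-score innings such as '200.1' (200 and 1/3) to a true number of innings."""
    whole, _, outs = ip_text.partition(".")
    outs = int(outs) if outs else 0
    if outs not in (0, 1, 2):
        raise ValueError("the digit after the decimal point must be 0, 1, or 2 outs")
    return int(whole) + outs / 3


def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (walks + hits) / innings


# Invented season line: 45 BB and 160 H over 205.0 innings.
print(whip(walks=45, hits=160, innings=innings_from_box_score("205.0")))
```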

For correlating statistics to ERA, the advanced statistics to turn to are FIP and xFIP. Fielding independent pitching and expected fielding independent pitching attempt to isolate a pitcher from his own defense and try to determine if he was “lucky” or “unlucky” with ERA. If FIP and xFIP are higher than ERA, a pitcher probably got a little lucky and his ERA should regress some. If FIP and xFIP are lower than ERA, then a pitcher may have been a little unlucky and ERA might come down in the upcoming season. I often just look at FIP and xFIP when thinking about targets for the WHIP and ERA categories.
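The commonly published formula for FIP is (13 × HR + 3 × (BB + HBP) - 2 × K) / IP plus a league constant that puts the result on the ERA scale. xFIP uses the same formula but swaps the pitcher's actual home runs for his fly balls multiplied by the league home-run-per-fly-ball rate. The Python sketch below follows those definitions. The default constant of 3.10 is only a typical ballpark value (the real one changes every season), and the 0.25 tolerance, the league rate in the example, and the stat line are assumptions made up for illustration.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant=3.10):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant (constant is assumed here)."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts) / innings + constant


def xfip(fly_balls, league_hr_per_fb, walks, hit_by_pitch, strikeouts, innings, constant=3.10):
    """Same as FIP, but with home runs replaced by fly balls times the league HR/FB rate."""
    expected_home_runs = fly_balls * league_hr_per_fb
    return fip(expected_home_runs, walks, hit_by_pitch, strikeouts, innings, constant)


def era_outlook(era, estimator, tolerance=0.25):
    """Compare ERA with FIP or xFIP. The tolerance is an arbitrary illustrative cut-off."""
    gap = estimator - era
    if gap > tolerance:
        return "ERA likely to rise"
    if gap < -tolerance:
        return "ERA likely to fall"
    return "ERA roughly in line"


# Invented season: 20 HR, 50 BB, 5 HBP, 200 K over 200 innings, with a 2.80 ERA.
pitcher_fip = fip(home_runs=20, walks=50, hit_by_pitch=5, strikeouts=200, innings=200)
print(round(pitcher_fip, 3), era_outlook(era=2.80, estimator=pitcher_fip))
```

In this made-up case FIP comes out well above the 2.80 ERA, which is the "got a little lucky" scenario described above.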

Statistically Modeled Projections

Remember that dumb non-word everyone used a while back? Guesstimate. Guess is a perfectly good word. Estimate is an equally good word. But, they mean different things. A guess implies making a choice based on very little evidence. An estimate, on the other hand, suggests that some consideration and weighing of facts has informed a decision. Back before the big data sports revolution, fantasy baseball players guessed. Now we can estimate.

Every reputable sports statistics site publishes projections for the upcoming season, with the exception of Baseball Reference, which has a more pristine mission of archiving the sport. ESPN’s Stats and Info provides predictive models for the machine that is ESPN’s fantasy sports. CBS Sports, Yahoo, RotoWire, and dozens of other fantasy-specific sites make predictions. I usually turn to FanGraphs since they publish three different sets of predictions – ZiPS, Steamer, and Fans.

What’s important to understand is that these projections aren’t simply the product of a few experts sitting down in a room and throwing out their thoughts on what might transpire in the future. These projections are based on statistical models that are then run thousands of times in computer simulations. I can tell you, that kind of expertise doesn’t exist in my league, where TV salesmen, ad men, and comic book shop owners gather once a year to choose their teams.
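To give a feel for the difference between a guess and an estimate, here is a deliberately crude Python sketch of a projection: weight the last three seasons, most recent heaviest, then pull the result toward the league average. This is a toy and not how ZiPS or Steamer work internally. The 5/4/3 weights are borrowed from the publicly described Marcel approach, while the .260 league average, the 1,200 at-bats of regression, and the sample seasons are assumptions chosen only for illustration.

```python
def weighted_average_projection(seasons, league_average=0.260,
                                weights=(5, 4, 3), regression_at_bats=1200):
    """Toy batting-average projection.

    seasons: (at_bats, hits) tuples, most recent season first.
    All default values are illustrative assumptions, not published model settings.
    """
    weighted_hits = 0.0
    weighted_at_bats = 0.0
    for (at_bats, hits), weight in zip(seasons, weights):
        weighted_hits += weight * hits
        weighted_at_bats += weight * at_bats
    # Regress toward the league by mixing in a block of league-average at-bats.
    weighted_hits += league_average * regression_at_bats
    weighted_at_bats += regression_at_bats
    return weighted_hits / weighted_at_bats


# Invented hitter: .300, .280, and .260 over the last three years, 500 AB each.
projection = weighted_average_projection([(500, 150), (500, 140), (500, 130)])
print(round(projection, 3))
```

The projection lands a little below the hitter's weighted three-year average because of the pull toward the league. The real systems layer aging curves, park effects, and much more on top of this basic idea.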

The Downside of So Much Data

The downside of having so much data at your disposal is that everyone in your league also has access to the same data. Oh, don’t get me wrong. There are still leagues where one or two participants show up with a magazine cheat sheet they bought the day before the draft or auction, but that’s the exception these days, not the rule. Today, with so much information available, anyone who is going to win their league is going to have to (cue the cliché) work smarter.

Over the next few weeks, leading up to the start of the MLB season, I’ll go over some studying tips and strategies for winning your league. Full disclosure – I play with some very knowledgeable people and I haven’t actually won this league since 2007, when I finished in a tie. With rare exception, I finish in the upper half, and I had a run of several years in the top three. Last year I finished 5th out of 11. If I’m not mistaken, there are only one current player and one former player who routinely read my column, so we’ll go in depth and have some fun.

Pitchers and catchers report February 18th.
