A couple of weeks ago, BangTheBook published an article that evaluated starting pitchers using sabermetrics rather than the traditional baseline of earned run average. ERA is both defense-dependent and subject to luck, two things that pitchers cannot really control. Traditionalists who don’t subscribe to sabermetric theory can argue that a pitcher should pitch to his defense, or make similar points, but ultimately a pitcher can control only the outcomes that don’t involve fielders.
That article, which ran on May 22, covered three pitchers due for negative regression and two pitchers due for positive regression, or, as the article termed it, progression. In some cases, the expectation is regression to the mean in a statistic like batting average on balls in play (BABIP), where most pitchers fall within a 20-point range. Extreme ground ball or fly ball pitchers will deviate from that range, but most pitchers with reasonable batted ball splits will land between .270 and .290 in BABIP.
In other cases, regression or progression is suggested by advanced statistics like FIP (fielding independent pitching), xFIP (expected fielding independent pitching, which assumes a league-average HR/FB%), and SIERA (skill-interactive ERA, a complicated formula built from batted ball types, strikeouts, and walks). Of the sabermetric statistics, xFIP and SIERA are two of the best predictors of future performance.
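For anyone who wants to check BABIP figures, the common definition is hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. Home runs and strikeouts are removed because the defense never gets a chance to make a play on them. Here is a minimal Python sketch of that formula; the stat line in the example is hypothetical and not taken from any pitcher mentioned here.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Home runs and strikeouts are excluded because fielders cannot affect them.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical pitcher line, not from the article.
value = babip(hits=60, home_runs=6, at_bats=250, strikeouts=55, sac_flies=3)
# Baseball convention drops the leading zero, e.g. ".281".
print(f"{value:.3f}".lstrip("0"))
```

This hypothetical line lands inside the .270 to .290 band described above.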
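FIP and xFIP are simple enough to compute by hand. FIP weights home runs by 13, walks plus hit batters by 3, and strikeouts by minus 2, divides by innings pitched, and adds a league constant so the result sits on the ERA scale. xFIP swaps actual home runs for fly balls multiplied by a league-average HR/FB rate. The sketch below assumes a 3.10 constant and a 10 percent league HR/FB rate purely as placeholders (both are recalculated every season), and the stat line is hypothetical. SIERA’s formula is much longer and is left out.

```python
# Assumed FIP constant: recalculated each season so league-average FIP matches
# league-average ERA. 3.10 is only a placeholder.
FIP_CONSTANT = 3.10
# Assumed league-average home runs per fly ball, used by xFIP. Placeholder value.
LEAGUE_HR_PER_FB = 0.10


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = FIP_CONSTANT) -> float:
    """Fielding independent pitching from HR, BB, HBP and K only."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb: int, bb: int, hbp: int, k: int, ip: float,
         lg_hr_per_fb: float = LEAGUE_HR_PER_FB,
         constant: float = FIP_CONSTANT) -> float:
    """FIP with actual homers replaced by fly balls times a league HR/FB rate."""
    expected_hr = fb * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical line: 60 IP, 6 HR, 70 fly balls, 20 BB, 2 HBP, 55 K.
print(f"FIP:  {fip(hr=6, bb=20, hbp=2, k=55, ip=60):.2f}")    # 3.67
print(f"xFIP: {xfip(fb=70, bb=20, hbp=2, k=55, ip=60):.2f}")  # 3.88
```

Because this hypothetical pitcher allowed fewer homers than his fly balls would predict, his xFIP comes out higher than his FIP, which is the same kind of gap discussed for several pitchers below.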
The pitchers covered in the May 22 article were Tom Koehler, Mark Buehrle, and Andre Rienzo. Koehler had a 2.25 ERA, a 4.12 FIP, a 4.53 xFIP, and a 4.65 SIERA at the time the article was written. Entering his Wednesday night start, Koehler has a 3.18 ERA, a 4.34 FIP, a 4.54 xFIP, and a 4.60 SIERA. Regression can happen over time or happen instantly. Over the three starts since the article, Koehler’s ERA has gone up nearly one full run and the metrics suggest that it will keep gradually climbing.
Buehrle continues to defy his advanced metrics, now 10-1 with a 2.10 ERA, a 3.06 FIP, a 4.07 xFIP, and a 4.34 SIERA. When the May 22 article was written, Buehrle had a 2.11 ERA. Regression has not hit yet, but it appears likely. Rienzo was a perfect 6-0 for bettors back on May 22 with a 4.00 ERA, a 5.34 FIP, a 4.82 xFIP, and a 4.75 SIERA. Rienzo took the loss in each of his last two starts and is Friday’s probable starter. His ERA has since risen to 4.26, while his FIP, xFIP, and SIERA have all dropped.
David Price and Ian Kennedy were the starters pegged for progression. At the time, Price had a 4.28 ERA with a 3.21 FIP, a 2.57 xFIP, and a 2.57 SIERA; his numbers have changed little over his two starts since. Kennedy had a 3.79 ERA with a 2.76 FIP, a 2.90 xFIP, and a 2.93 SIERA. In two starts after the article ran, Kennedy allowed two earned runs over 12 innings and lowered his ERA to 3.42.
Regression analysis is not foolproof. Just because something is suggested or forecast does not mean that it will happen. In most cases, regression will happen, though it may be a gradual process. It’s rare for it to arrive in one fell swoop with a one-inning, eight-earned-run performance, but that can happen. One concern with sabermetrics from a betting standpoint is that most stats require a large sample size to stabilize. They are long-term projections, not short-term ones, though large gaps between ERA and any of the three sabermetric stats often signal regression in the immediate future.
If a pitcher is due for regression, the smart thing would be to go against him several times in a row if regression is gradual and to back off if the regression is swift. Even if you go against a pitcher five consecutive times and he pitches well in two of those five starts, a profit is still a possibility.
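Whether that profit materializes depends on the price. Going against a pitcher usually means backing the other side, often at plus money, so a bettor can drop a couple of wagers and still finish ahead. The sketch below treats the two good starts as the two losing bets and assumes flat one-unit wagers at a hypothetical +120 on all five games; real prices change from start to start, and a pitcher throwing well does not always mean his team wins.

```python
def win_profit(american_odds: int, stake: float = 1.0) -> float:
    """Profit on a winning bet (stake not included) at American odds like +120 or -150."""
    if abs(american_odds) < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if american_odds > 0:
        return stake * american_odds / 100
    return stake * 100 / abs(american_odds)


def series_profit(results: list[bool], american_odds: int, stake: float = 1.0) -> float:
    """Net profit over flat bets at one price; True means the bet won."""
    return sum(win_profit(american_odds, stake) if won else -stake for won in results)


def break_even_rate(american_odds: int) -> float:
    """Share of bets that must win to break even at a given price (ignores pushes)."""
    return 1 / (1 + win_profit(american_odds))


# Five bets against the pitcher at a hypothetical +120; he pitches well in two (losses).
results = [True, False, True, False, True]
print(f"net: {series_profit(results, 120):+.2f} units")                 # +1.60 units
print(f"break-even win rate at +120: {break_even_rate(120):.1%}")      # 45.5%
```

At plus money the break-even rate sits below 50 percent, which is why three wins in five can still show a profit.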
Let’s look at a couple of pitchers who may experience regression in the very near future.
Jason Vargas (KC) – Jason Vargas has been a tremendous $8M investment for the Royals this season after signing a four-year, $32M contract in free agency. Vargas is 5-2 with a 3.38 ERA. Unfortunately, the weather is warming up and Kauffman Stadium becomes a pretty good hitter’s park during the summer months. Balls carry well into the deep gaps and that’s a concern for Vargas, a pitcher who has a pretty even GB/FB ratio.
Vargas’s advanced metrics suggest some regression with a 4.28 FIP, a 3.98 xFIP, and a 3.99 SIERA. Both the xFIP and SIERA are currently career bests for Vargas, which seems highly unlikely to continue given that, until this season, he had spent his entire career pitching for teams with excellent pitchers’ parks.
Look deeper than the FIP/xFIP/SIERA combo for Vargas and you’ll see other causes for concern. Vargas, who has never been a strikeout pitcher, has seen a drop in his pop-up rate. Pop-ups are essentially strikeouts for contact pitchers because they are mostly harmless. Vargas is stranding 82.1 percent of opposing baserunners, nearly 10 percent above his career average. Part of that can be attributed to a slightly higher strikeout rate and a quality Royals defense, but only one pitcher, Yu Darvish, posted a LOB% above 82 percent last season. The others above 80 percent included Hisashi Iwakuma, Julio Teheran, Clayton Kershaw, and Zack Greinke. Jeremy Hellickson was the only pitcher above 82 percent in 2012, and regression hit him like a ton of bricks.
Vargas’s batting average against and BABIP are in line with his career numbers, but, again, it’s important to realize that over the course of his American League career, Vargas pitched more than half of his games in places like Seattle, Oakland, and Anaheim. For his career, Vargas has a 4.91 road ERA and a 4.93 road FIP, compared to a 3.56 home ERA and a 3.89 home FIP. This season, his home ERA is 5.26 in 37.2 innings, while his road ERA is 1.60 in 39.1 innings. Apart from his last start, in Toronto, Vargas’s road outings have been three cold-weather April starts and starts in Seattle and Anaheim. Look for things to go south for Vargas very, very soon.
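LOB% is usually estimated from a pitcher’s line rather than tracked runner by runner. A common version, as published by FanGraphs, is (H + BB + HBP - R) / (H + BB + HBP - 1.4 × HR), where the 1.4 × HR term is meant to approximate runners, including the batter, who score on homers and so can never be stranded. Here is a short Python sketch using a made-up stat line, not Vargas’s actual numbers.

```python
def lob_pct(h: int, bb: int, hbp: int, r: int, hr: int) -> float:
    """Left-on-base percentage: (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)."""
    baserunners = h + bb + hbp
    return (baserunners - r) / (baserunners - 1.4 * hr)


# Hypothetical line: 70 hits, 20 walks, 3 hit batters, 30 runs, 8 home runs.
print(f"LOB%: {lob_pct(h=70, bb=20, hbp=3, r=30, hr=8):.1%}")  # 77.0%
```

A strand rate in the high 70s like this one is far more typical than the 82 percent range that only a handful of starters reach in a season.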
Jorge de la Rosa (COL) – Jorge de la Rosa has engineered a rather impressive season so far. In his first three starts, de la Rosa was blasted to the tune of 14 runs in 13 innings. Since then, de la Rosa has allowed 10 earned runs in his last eight starts covering 53 innings.
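ERA is earned runs per nine innings, so de la Rosa’s last eight starts (10 earned runs in 53 innings) work out to roughly a 1.70 ERA. One wrinkle when checking figures like Vargas’s 37.2 innings: the digit after the point counts outs, not tenths, so 37.2 means 37 and two-thirds innings. The helpers below are a small sketch that handles both; the function names are illustrative.

```python
def parse_innings(ip: str) -> float:
    """Convert box-score innings ("37.2" means 37 and 2/3) into true innings."""
    whole, _, outs = ip.partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + extra_outs / 3


def era(earned_runs: int, innings: float) -> float:
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


# de la Rosa's last eight starts, per the text: 10 earned runs in 53 innings.
print(f"{era(10, parse_innings('53')):.2f}")  # 1.70
# Vargas's home innings this season, per the text.
print(f"{parse_innings('37.2'):.3f} true innings")  # 37.667
```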
In that time, however, de la Rosa has posted a 34/19 K/BB ratio, which is below average, and has allowed just 41 hits. One of the improvements that has led to de la Rosa’s quality season is that his ground ball rate has gone up, which is a very good thing with a Rockies team that excels at fielding.
That being said, the Rockies are now without Gold Glove third baseman Nolan Arenado until late July. On the season, de la Rosa’s BABIP is just .235, nearly 70 points below his career average and far too low to be sustainable for a pitcher with a below-average strikeout rate. His walk rate of 9.5 percent is slightly above average, so an increase in hits will put more runners in scoring position. Part of the reason de la Rosa’s BABIP is so low is that nine of his 53 hits allowed have been home runs, which BABIP does not count, but even a .217 batting average against is pretty unsustainable, especially in an environment like Colorado. His career slash line against in Colorado is .253/.330/.405; this season, it sits at .217/.308/.391.
The predictive stats indicate some regression, though xFIP and SIERA suggest less of it than FIP does, because his HR/FB rate is a little high, which FIP counts in full but xFIP and SIERA do not. Still, he has a 3.68 ERA against a 4.74 FIP, a 4.08 xFIP, and a 4.13 SIERA. Because of his crafty left-handed style, regression may be more gradual for de la Rosa, but bettors are likely to get inflated prices on de la Rosa at home, and those could be money-making opportunities.
The flip side of the regression coin has pitchers who are due for some improvement in their performance and stats. Here is one of those individuals:
Edwin Jackson (CHC) – The Cubs have been awful and there’s no denying it. In the case of Edwin Jackson, two really bad starts skew his ERA, and his advanced metrics describe a pitcher who has actually pitched quite well. Jackson has a rather ugly 4.81 ERA, but he has a 3.21 FIP, a 3.56 xFIP, and a 3.71 SIERA. His BABIP against sits at .343, which is extremely high given that the Cubs’ defense has been around average as a group. Jackson is having the best strikeout season of his career, striking out over 22 percent of opposing hitters. His problem is that batted balls have found holes an inordinate number of times.
Jackson ranks among the league’s best, alongside names like Michael Wacha, Zack Greinke, and Max Scherzer, in a stat called Z-Contact%, or zone-contact percentage. It measures how often hitters make contact when they swing at pitches inside the strike zone, so a lower number is better for the pitcher. When pitchers can get swings and misses without relying on hitters to chase pitches out of the zone, that tends to yield positive results.
The Cubs now have some stability in the bullpen, and they entered play on Tuesday tied with the Tigers for the best starting-pitcher WAR in the league. The Cubs have also been one of the unluckiest teams by Pythagorean win-loss record, so they should show signs of improvement as a group soon, and Jackson will be a part of that.
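Z-Contact% is a plain ratio: contact (foul balls included) on swings at in-zone pitches, divided by all swings at in-zone pitches. The sketch below ranks two hypothetical pitchers, lowest first, since a lower rate means more whiffs on strikes. The names and counts are invented for illustration.

```python
def z_contact_pct(zone_swings: int, zone_contact: int) -> float:
    """Percentage of swings at in-zone pitches that produced contact (fair or foul)."""
    if zone_swings <= 0:
        raise ValueError("need at least one swing at an in-zone pitch")
    return 100 * zone_contact / zone_swings


# Hypothetical pitchers: (in-zone swings, in-zone contact).
pitchers = {
    "Pitcher A": (400, 340),
    "Pitcher B": (380, 300),
}

# Lower Z-Contact% is better for the pitcher, so sort ascending.
ranked = sorted(pitchers, key=lambda name: z_contact_pct(*pitchers[name]))
for name in ranked:
    print(f"{name}: {z_contact_pct(*pitchers[name]):.1f}%")
```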
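The Pythagorean record comes from Bill James’s formula: expected winning percentage equals runs scored squared divided by the sum of runs scored squared and runs allowed squared. Many analysts use an exponent around 1.83 instead of 2, so the sketch below makes it a parameter. The run totals and win count are hypothetical, not the Cubs’ actual figures; a team whose actual wins trail its expected wins is the usual shorthand for an unlucky one.

```python
def pythag_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored: float, runs_allowed: float, games: int,
                  exponent: float = 2.0) -> float:
    """Expected wins over a given number of games."""
    return games * pythag_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical team: 250 runs scored, 260 allowed, 24 wins in 60 games.
exp_w = expected_wins(250, 260, 60)
luck = 24 - exp_w
print(f"expected wins: {exp_w:.1f}, actual minus expected: {luck:+.1f}")
```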
And finally, one of the beauties of having so much baseball data in front of us is that we can spot injury indicators for pitchers. Back in early April, a look at pitchers to keep an eye on singled out Matt Moore, Wandy Rodriguez, CC Sabathia, and John Lackey. Moore was in the initial draft as an injury risk, but he was placed on the disabled list before the piece was posted, requiring an edit. Rodriguez dealt with a knee injury and was designated for assignment last month. Sabathia is out until July with a knee injury that was likely hurting his velocity. Lackey is the only one left standing.
This week, the spotlight turns to Ervin Santana of the Atlanta Braves. Santana sailed through his first six starts with a 1.99 ERA and a 43/10 K/BB ratio. In his last four starts, Santana has allowed 20 runs in 23 innings with a 15/10 K/BB ratio. What’s going on?
Santana possesses one of the game’s best sliders. In his first start, 25 percent of his pitches were sliders. Over the next five starts, he threw sliders no less than 33.7 percent of the time. In his last four starts, his slider percentage has been 32.3, 23.2, 27.4, and 29.9 percent. After throwing 50 percent or more of his pitches in the strike zone in each of his first four starts, Santana has since ranged from 35.7 to 48.5 percent, with two of his last four games below 40 percent.
Sliders are the most taxing pitch on a pitcher’s elbow; the torque required to throw one is more detrimental than that of any other pitch. Santana, who didn’t sign until late in the offseason because teams were worried about his medicals, seems to be staying away from the slider whenever possible. The result has been a drop in strikeouts and a spike in hits allowed, and the decline has coincided with a lack of fastball command. There’s a good chance that Santana is pitching hurt right now, and injured pitchers are always worth betting against.
For Santana, it’s a Catch-22: throw more sliders and increase the injury risk and pain, or avoid the slider and keep getting hit hard. In the short term, Santana may pitch through the pain for results, but in the long term, that makes him a pitcher to keep a very close eye on.
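The slider numbers above can be summarized in a few lines of Python. Using only the percentages given here, this sketch averages Santana’s last four starts and counts how many fell below the 33.7 percent floor he held in starts two through six.

```python
from statistics import mean

# Santana's minimum slider usage across starts two through six, per the text.
EARLIER_FLOOR = 33.7
# Slider percentage in each of his last four starts, per the text.
last_four = [32.3, 23.2, 27.4, 29.9]

avg = mean(last_four)
below_floor = sum(1 for pct in last_four if pct < EARLIER_FLOOR)
print(f"average slider usage, last four starts: {avg:.1f}%")  # 28.2%
print(f"starts below the earlier {EARLIER_FLOOR}% floor: {below_floor} of {len(last_four)}")
```

Every one of the last four starts falls under that earlier floor, which is the pattern the injury concern rests on.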