Using Pachyderm to analyze the 2016 World Chess Championship

Editor: Pachyderm is officially described as a “data lake that offers complete version control for data and leverages the container ecosystem to provide reproducible data processing.” In simpler terms, it is an application that makes it easy to store and analyze big data using containers. The following article was originally written by Daniel Whitenack, a data scientist at Pachyderm.

It’s no secret that the Pachyderm team loves chess! In fact, one of our first demonstrations of Pachyderm was a statistical analysis of chess blunders. Now that the 2016 World Chess Championship has been decided, we have updated our analysis to see what we could learn about the games between Magnus Carlsen and Sergey Karjakin.

In the following analyses, which were implemented in a Pachyderm data pipeline, we attempted to learn:

For each game in the Championship, what were the crucial moments that turned the tide for one player or the other?
Did either player show noticeable fatigue over the course of the Championship, as evidenced by an increase in blunders?
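The article does not say how a blunder or a turning point is defined, so the following is only a minimal sketch of one common approach, not the pipeline we actually ran. It assumes that each game has already been run through a chess engine to produce an evaluation after every ply (half-move), in centipawns from White's point of view, and it uses a hypothetical 300-centipawn threshold. The names `find_blunders`, `biggest_swing`, `blunder_counts` and `BLUNDER_THRESHOLD_CP` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical threshold: a move that worsens the mover's evaluation by at
# least this many centipawns is counted as a blunder (an assumption, not a
# definition taken from the original analysis).
BLUNDER_THRESHOLD_CP = 300


def find_blunders(evals: List[int],
                  threshold: int = BLUNDER_THRESHOLD_CP) -> List[Tuple[int, str, int]]:
    """Return (ply, side, drop) for every move whose evaluation drop is >= threshold.

    evals[0] is the evaluation of the starting position and evals[i] is the
    evaluation after ply i, all in centipawns from White's point of view.
    """
    blunders = []
    for ply in range(1, len(evals)):
        # Odd plies are White's moves, even plies are Black's moves.
        side = "white" if ply % 2 == 1 else "black"
        change = evals[ply] - evals[ply - 1]
        # A falling evaluation hurts White; a rising one hurts Black.
        drop = -change if side == "white" else change
        if drop >= threshold:
            blunders.append((ply, side, drop))
    return blunders


def biggest_swing(evals: List[int]) -> Tuple[int, int]:
    """Return (ply, absolute change) of the largest single-ply evaluation swing,
    used here as a simple proxy for the moment that turned the tide."""
    best_ply, best_change = 0, 0
    for ply in range(1, len(evals)):
        change = abs(evals[ply] - evals[ply - 1])
        if change > best_change:
            best_ply, best_change = ply, change
    return best_ply, best_change


def blunder_counts(games: List[List[int]]) -> List[int]:
    """Count blunders per game, in the order the games were played, so that a
    rising trend could hint at fatigue."""
    return [len(find_blunders(evals)) for evals in games]


if __name__ == "__main__":
    # Made-up evaluations for illustration only, not data from the Championship.
    game = [20, 30, 25, -320, -310, -300, 150]
    print(find_blunders(game))
    print(biggest_swing(game))
```

For the made-up game above, White's third ply and Black's sixth ply are both flagged, and the sixth ply is reported as the biggest swing. A per-ply threshold on engine evaluations is just one simple choice; a real analysis might instead weight swings by how close the position was to equal, since a 300-centipawn drop matters far less in an already lost position.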

We chose to run these analyses in Pachyderm because they required a variety of technologies (Pachyderm is language agnostic) and because it allowed us to quickly and easily commit new game data, which automatically triggered updates to our analyses. Read more about Pachyderm pipelines on the Pachyderm website, and visit our pipeline docs to learn how to run your own analyses in Pachyderm.