As a beginning data scientist, you’ll have quite a few subject areas that you need to learn (and eventually master). While you’ll certainly need to learn some math and statistics, math and stats are not the first things I recommend to most beginners.

Almost always, I recommend that people start with data visualization.

The reason is that data visualization is critical to almost every part of a data scientist’s work: reporting, analysis, and exploratory analysis (e.g., EDA prior to machine learning). You need data visualization constantly. It’s necessary for nearly every data scientist at all levels.

Furthermore, I’ve argued that at the junior levels of a data team’s job hierarchy, data visualization (when combined with data manipulation) is sufficient for being productive. If you’re a junior member of a data team, your core responsibilities may revolve exclusively around visualization (e.g., reporting and analysis).

Because it’s necessary (and in some cases, sufficient) for productivity, it’s a skill that you need to master early.

ggplot2 is the visualization tool I recommend

