Statisticians: Beware of the Datasaurus!!

Alberto Cairo created the Datasaurus, a scatterplot whose points draw a dinosaur, to show why data has to be looked at and not just summarized by its statistics. Justin Matejka and George Fitzmaurice then used it to generate the Datasaurus Dozen: 12 very different scatterplots whose statistics are almost identical to the dinosaur's, which are:

  • N: 142
  • Mean: X=54.27, Y=47.83
  • Standard deviation: X=16.77, Y=26.94
  • Correlation x-y: -0.06

The 12 data sets, whose statistics are almost identical to those above, are plotted here with the x and y means drawn as reference lines:

More information about the Datasaurus Dozen, including how the Dozen were generated and how to download the data, can be found here.

The program to create these graphs, including the data, can be downloaded as a zip file from here.
