Statisticians: Beware of the Datasaurus!!

Alberto Cairo created the Datasaurus, a scatterplot in the shape of a dinosaur, to demonstrate the need to look at the data itself and not only at its summary statistics. Justin Matejka and George Fitzmaurice then built on it to produce the Datasaurus Dozen: 12 very different scatterplots with almost identical statistics.

Plot of Datasaurus data set

  • N: 142
  • Mean: X=54.27, Y=47.83
  • Standard deviation: X=16.77, Y=26.94
  • Correlation x-y: -0.06
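The four statistics listed above can be computed for any set of (x, y) points. The following is a minimal Python sketch, not the program used for this post: the function name `summarize` and the toy points are hypothetical, and it assumes the sample standard deviation (n - 1 denominator) and the Pearson correlation, which the post does not state explicitly.

```python
from statistics import mean, stdev


def summarize(xs, ys):
    """Return N, the means, the sample standard deviations and the
    Pearson correlation for paired x and y values."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd_x, sd_y = stdev(xs), stdev(ys)
    # Pearson correlation = sample covariance / (sd_x * sd_y).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": sd_x,
        "sd_y": sd_y,
        "corr": cov / (sd_x * sd_y),
    }


# Toy points for illustration only; these are NOT the Datasaurus data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
stats = summarize(xs, ys)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Running the same function on each of the 12 data sets and rounding to two decimal places is one way to check how closely their statistics agree, while the plots show how different the data sets are.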

The 12 data sets with almost identical statistics to those above are plotted here, with the x and y means shown as reference lines:

More information about the Datasaurus Dozen, including how the Dozen were generated and how to download the data, can be found here.

The program to create these graphs, including the data, can be downloaded as a zip file from here.

Published by

Philip Holland

Owner and Administrator of Holland Numerics: Blog and Forums.
