We look at numbers and try to find patterns. We pursue leads suggested by background information, imagination, patterns perceived, and experience with other data analyses.
This need for approximation in analyzing real-world data is widely recognized, and it motivates the widespread use of probability and statistics in data analysis. This is not the only possible approach: set-theoretic methods based on the “unknown but bounded” uncertainty model are also possible, for example, although they are not nearly as popular or as well developed as statistical methods. There are even those who are unwilling to accept uncertainty in describing real-world data. One colorful example is Alfred William Lawson’s Principle of Zig-Zag-and-Swirl, discussed briefly in my book and in more detail in L.D. Henry’s biography of Lawson, Zig-Zag-and-Swirl (see the Other Interesting Books section of The Exploring Data Store at the end of this post for details). Lawson dabbled in many things, from playing and managing minor league baseball to writing a Utopian novel (the late Martin Gardner characterized it as “the worst work of fiction ever published”). He is credited with introducing the term “aircraft” into general use not long after the Wright brothers’ first flight, and he obtained the first
Experimentalists tend to regard it as a mathematical result that data values obey a Gaussian distribution, whereas mathematicians tend to regard it as an experimental result.
- Metadata (the information describing the contents of a dataset that is all too often missing, incomplete, or incorrect);
- Boxplots, modified boxplots, violinplots, and beanplots - useful tools for characterizing the range of variation of a numerical variable over different data subgroups;
- Various types of data anomalies, including outliers, inliers, missing data, and misalignment errors: what they are, how to detect them, and what to do about them;
- Interestingness measures as useful characterizations of categorical variables;
- Data transformations and the things they can do, both expected and unexpected, sometimes good and sometimes very bad;
- And anything else related to exploratory data analysis that strikes me as interesting along the way.
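
To make two of the topics above a little more concrete (boxplots and outliers), here is a minimal Python sketch of the rule that standard boxplots commonly use to flag outlying points: anything more than 1.5 interquartile ranges beyond the quartiles. The function name `boxplot_outliers`, the threshold `k`, and the two small data subgroups are made up purely for illustration, and this is only one of several possible outlier detection rules, not necessarily the one best suited to any particular dataset.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return the points lying outside [Q1 - k*IQR, Q3 + k*IQR].

    This is the usual boxplot flagging rule; k = 1.5 is the common default.
    """
    # Quartiles via linear interpolation over the observed data range.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical subgroups of one numerical variable (illustrative values only).
groups = {
    "a": [9.8, 10.1, 10.0, 9.9, 10.2, 14.5],
    "b": [5.0, 5.2, 4.9, 5.1, 5.0, 5.3],
}

for name, values in groups.items():
    print(name, boxplot_outliers(values))
```

Applying the rule separately to each subgroup mirrors what side-by-side boxplots show: a point is judged against the spread of its own group rather than against the pooled data.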