Matt Briggs, whom I quote below, found this graph on Reddit. The comments there note, correctly, that the fitted model is stupid: it does not fit reality. Since I had spent much of the afternoon looking at a paper based on data mining first childbirth versus first abortion, the idea of models trumping reality was painfully apparent.
To quote Briggs on that graph:
Problem #1: The Deadly Sin of Reification! The mortal sin of statistics. The blue line did not happen. The gray envelope did not happen. What happened were those teeny tiny black dots, dots which fade into obscurity next to the majesty of the statistical model. Reality is lost, reality is replaced. The model becomes realer than reality.
You cannot help but be drawn to the continuous sweet blue line, with its guardian gray helpers, and think to yourself “What smooth progress!” The black dots become a distraction, an impediment, even. They soon disappear.
Problem #1 leads to Rule #1: If you want to show what happened, show what happened. The model did not happen. Reality happened. Show reality. Don’t show the model.
William S. Briggs
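Briggs's rule can be put into practice directly when plotting. A minimal sketch, using entirely hypothetical data (the slope, noise level, and file name are my own assumptions, not anything from the original graph): draw the observations prominently, and if a fitted line must appear at all, keep it visually subordinate.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, renders to file
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical noisy observations -- what actually "happened".
rng = np.random.default_rng(0)
x = np.arange(20)
y = 0.5 * x + rng.normal(0, 3, size=20)

# A simple linear fit -- the model, which did not happen.
coeffs = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
# The data are drawn large, dark, and on top.
ax.scatter(x, y, color="black", zorder=3, label="what happened (data)")
# The model is drawn pale and underneath, if shown at all.
ax.plot(x, np.polyval(coeffs, x), color="lightblue",
        zorder=1, label="the model")
ax.legend()
fig.savefig("data_first.png")
```

The point is the z-order and styling: the reader's eye should land on the dots, not on a sweet blue line with guardian gray helpers.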
If you are going to use a model — and often we do, particularly when dealing with missing data — then you had better know what the assumptions within that model are, and be able to state why the model fits.
One of the frustrating things about doing statistics from the command line is that you have to understand what the command means in order to interpret its output. But this forces you to be parsimonious, and not to over-correct the facts on the ground.
Nor, as happened in the discussion today, should one concentrate on the increased odds ratio for antidepressant use some years after childbirth while ignoring correlations that were twice as strong, such as use of other psychiatric medications or seeing a psychiatrist.
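The arithmetic behind that complaint is simple. A sketch with invented 2x2 counts (these numbers are illustrative assumptions only, not the paper's data) shows how one correlate can dwarf the one being headlined:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
    a = exposed with outcome,   b = exposed without,
    c = unexposed with outcome, d = unexposed without."""
    return (a * d) / (b * c)

# Hypothetical counts, chosen only to illustrate relative strength.
antidepressant = odds_ratio(30, 70, 20, 80)   # OR = 2400/1400 ~ 1.71
other_psych_med = odds_ratio(50, 50, 20, 80)  # OR = 4000/1000 = 4.0

# The second association is more than twice as strong as the first;
# reporting only the first misrepresents the pattern in the table.
assert other_psych_med > 2 * antidepressant
```

If the headline finding is the weaker of the two associations in your own output, the write-up, not the model, is doing the distorting.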
For the model should never contradict the findings.