Farewell. The Flying Pig Has Left The Building.

Steve Hynd, August 16, 2012

After four years on the Typepad site, eight years total blogging, Newshoggers is closing its doors today. We've been coasting the last year or so, with many of us moving on to bigger projects (Hey, Eric!) or simply running out of blogging enthusiasm, and it's time to give the old flying pig a rest.

We've done okay over those eight years, though we were never quite PC enough to gain wider acceptance from the partisan "party right or wrong" crowds. We like to think we moved political conversations a little: on the ever-present wish to rush to war with Iran, on the need for a real Left that isn't licking corporatist Dem boots every cycle, on America's foreign misadventures in Afghanistan and Iraq. We like to think we made a small difference while writing under that flying pig banner. We did pretty well for a bunch with no ties to big-party apparatuses or think tanks.

Those eight years of blogging will still exist. Because we're ending this Typepad account, we've been archiving the Typepad blog here. And the original Blogger archive is still here. There will still be new content from the old 'hoggers crew too. Ron writes for The Moderate Voice, I post at The Agonist, and Eric Martin's lucid foreign policy thoughts can be read at Democracy Arsenal.

I'd like to thank all our regular commenters, readers and the other bloggers who regularly linked to our posts over the years to agree or disagree. You all made writing for 'hoggers an amazingly fun and stimulating experience.

Thank you very much.

Note: This is an archive copy of Newshoggers. Most of the pictures are gone but the words are all here. There may be some occasional new content; John may do some posts, and Ron will cross-post some of his contributions to The Moderate Voice, so check back.


----------------------------------------------------------------------------------------------------

Tuesday, June 2, 2009

P-values and theories

By Fester:



During the past couple of weeks, most of my time at work has been consumed by a massive and hideous data set that promises some really interesting questions once I have made it tractable. This is a beast of a data set: several tens of thousands of variables at multiple time points and a few million pieces of information. It has been a massive pain in the ass.



One of the annoying portions of this task is that I have to keep my co-workers away from the data set until we are ready with a detailed analysis plan. This is to make sure that we are investigating pathways that make sense. For instance, it makes no sense for us to investigate Individual Favorite Color and Individual Height; these two (made-up) variables may or may not be related in a statistically significant way, but answering that question leads us nowhere.



Another chunk of this task is deciding which p-values will serve as critical levels. This is important because if we ran a massive X-thousand-by-X-thousand correlation table and assumed a critical p-value of .05 (accepting a 5% false-positive rate on each individual test), we should see hundreds of 'significant correlations' that are due to pure chance.
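To make this concrete, here is a minimal sketch of the problem (a toy example with made-up noise data, not our actual data set; it assumes Python with numpy and scipy installed). One hundred noise-only variables yield 4,950 pairwise tests, and at p < .05 roughly five percent of them come up 'significant' by chance alone:

    # Hypothetical noise-only data -- every 'significant' hit below is spurious.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_subjects, n_vars = 200, 100                 # 100 variables -> 4,950 pairwise tests
    data = rng.normal(size=(n_subjects, n_vars))  # pure noise, no real relationships

    false_hits, n_tests = 0, 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r, p = stats.pearsonr(data[:, i], data[:, j])
            n_tests += 1
            if p < 0.05:
                false_hits += 1

    print(f"{false_hits} of {n_tests} tests 'significant' "
          f"(~{n_tests * 0.05:.0f} expected from chance alone)")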

Okay, let me back up for a second and talk a bit about p-values. There are a lot of things produced in a statistical analysis, but if you need to go quick and dirty and look at just one thing, the p-value is what you check to decide whether you need to go any further into the analysis. The p-value tells you (roughly) the probability of seeing a data distribution like the one being tested if nothing but random chance were at work. The higher the p-value, the more likely the data distribution is due to chance or sampling error. The lower the p-value, the less likely the data distribution is due to chance. This is a massive simplification, but it is close enough: the lower the p-value, the more likely that something "interesting" is going on in the data.
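As a quick illustration (again just a sketch with made-up numbers, assuming numpy and scipy), compare the p-value from two samples drawn from the same distribution against one where the means genuinely differ:

    # Sketch: reading p-values off a simple two-sample t-test.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    baseline = rng.normal(loc=0.0, size=50)
    noise    = rng.normal(loc=0.0, size=50)   # same distribution: expect a high p
    signal   = rng.normal(loc=0.8, size=50)   # genuinely shifted: expect a low p

    _, p_noise  = stats.ttest_ind(baseline, noise)
    _, p_signal = stats.ttest_ind(baseline, signal)
    print(f"noise vs. baseline:  p = {p_noise:.3f}")   # most likely well above .05
    print(f"signal vs. baseline: p = {p_signal:.4g}")  # most likely far below .05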

There are some relationships where .05 is good enough, but in most cases, p=.025 or .01 will be what it takes for the team to really feel confident that there is something actually going on and not just noise in the data set. There may be a couple of cases where we need p=.001 before we are truly confident, but those will be rare. These decisions will exclude some valid but weak results, but they will weed out plenty of bullshit results. We are tightening our significance thresholds because the data set is a monster, and we know we could generate plenty of 'significant' results that don't mean much beyond being artifacts of the data if we went with the baseline academic standard of significance at p=.05.
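The back-of-the-envelope payoff is easy to see: for a hypothetical table of 4,950 tests like the sketch above, the expected number of pure-chance 'findings' drops fast as the cutoff tightens:

    # Expected chance 'hits' at each candidate cutoff (hypothetical 4,950-test table).
    n_tests = 4950
    for cutoff in (0.05, 0.025, 0.01, 0.001):
        print(f"p < {cutoff}: ~{n_tests * cutoff:.0f} spurious results expected from noise alone")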



If I'm looking at a series of potential relationships within this data set that cannot produce a p-value of less than .125, I'm laughing, because there is nothing statistically there. I can write off the relationship as due to random chance.



Well, the right wingers cannot, as they are producing a highly misleading "analysis" asserting a statistically significant correlation that suggests the Chrysler dealerships are being closed as a Clinton retribution scheme (not an Obama scheme, but a Clinton scheme!). Besides using a tool far more complex than the job required, the regression they produced showing a "Clinton effect" had a p-value of .125, at which point they claimed initial significance.
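For perspective: under the null hypothesis, p-values are uniformly distributed, so roughly one noise-only test in eight reaches p = .125 or better entirely on its own. A quick simulation (again a sketch, same numpy/scipy assumptions as above) bears this out:

    # Sketch: fraction of pure-noise t-tests reaching p <= .125 (expect ~12.5%).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    pvals = np.array([stats.ttest_ind(rng.normal(size=50), rng.normal(size=50)).pvalue
                      for _ in range(10_000)])
    print(f"{np.mean(pvals <= 0.125):.1%} of noise-only tests hit p <= .125")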



There is nothing there, statistically. And even if the p-value for any of their hypotheses were less than or equal to .05, that would be just the start of a good explanation, or of a scandal if that is what they are chasing. The next step would be to take the statistical data as the initial scent and push forward to find a coherent and logical mechanism with tangible evidence of deliberate action. Running a few statistical tests and proclaiming victory, without a working model that produces those results with a significant p-value, is just damn lazy work.



1 comment:

  1. Jeez Fester, were you away from the internet for a while and forgot to hit the "post" button on a bunch of saved posts? Spread 'em out a little, eh!
    On an editorial note, many of us are not too current in our statistical modeling, if we ever were, and don't have the foggiest clue as to what p-values are. I can get the gist of what you mean from the context, but a brief explanation of what the term means would greatly enhance the readability of the post, IMHO.
