
How is statistical significance applied to a temperature time series?

Posted at: 2015-03-12 
For example, we may hear that a warming trend is statistically significant (or not). What exactly does that mean?

A calculated trend line has a confidence interval, that is, the range the true value likely falls in. Any set of data may be due to randomness, but the more data we have and the better it fits the line, the narrower the confidence interval. For a trend to be statistically significant at the 99% level, the data has to fit the trend line such that randomness alone would produce that trend less than 1% of the time.

I couldn't tell you now how confidence intervals and standard deviations are calculated. As I recall that was halfway into Stats 1A, so even if I now remembered the specifics it would take 8 weeks of stats education to fully explain it. My suggestion, if you really want to know, is to take a stats course. More simply, just understand the concept. A trend of 2/month +/- 3 at 99% confidence is not statistically significant: the interval runs from -1 to +5, so it includes zero, and there is a real chance the apparent trend is just randomness that happened to line up. But a trend of 2/month +/- 1 at 99% confidence is statistically significant: there is only a 1% chance that the true value is below 1/month or above 3/month, so there is very little chance the trend is random.
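To make that concrete, here is a minimal Python sketch of how a trend's 99% confidence interval might be computed and checked against zero. The monthly anomaly numbers are made up for illustration, not real data:

```python
# A minimal sketch of trend significance, using made-up monthly anomaly data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
months = np.arange(240)                                     # 20 years, monthly
anoms = 0.002 * months + rng.normal(0, 0.15, months.size)   # weak trend + noise

res = stats.linregress(months, anoms)
# 99% confidence interval for the slope: slope +/- t* x standard error
t_crit = stats.t.ppf(0.995, df=months.size - 2)
lo, hi = res.slope - t_crit * res.stderr, res.slope + t_crit * res.stderr

print(f"trend = {res.slope:.4f}/month, 99% CI = ({lo:.4f}, {hi:.4f})")
# Statistically significant at 99% only if the interval excludes zero.
print("significant at 99%:", lo > 0 or hi < 0)
```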

No measurement system is perfect, including temperature. From observations (thermometer, ice core, tree rings, etc.) a best estimate is derived with a plus-or-minus margin of error. The estimate of error is not perfect either, but it is an empirically derived measurement with a stated statistical level of confidence, so it should be without bias. If the temperature time series shows a trend, but that trend is within the margin of error, the trend is not significant and should not be relied upon.

There is a good paper on this, "Influence of Choice of Time Period on Global Surface Temperature Trend Estimates" by Liebmann et al., published in the November 2010 issue of the Bulletin of the American Meteorological Society (BAMS). I know I've linked to this paper numerous times in here, and it seems to be universally ignored.

http://www.esrl.noaa.gov/psd/people/bran...

Zippi62, it's true that if you have sudden changes of temperature occurring toward the beginning or end of a recording period, then the "mean" temperature (as determined by averaging the high and low temperatures for the day) may not be representative of the mean determined in a more continuous fashion. However, chinooks and other such events are relatively sparse in temperature records, and it is much more common to see a very strong diurnal component to temperature. Do you have evidence that semidiurnal or higher-order components are a strong source of error in long-term temperature means? Have you done a Fourier transform to look at the power spectrum of temperature? I haven't done it on temperature, but I've done it on water vapor at observation sites, and even that was dominated by the diurnal component. I imagine for temperature the dependence would be WAY stronger. Even for something like a chinook, is there a reason you believe it would systematically bias temperatures in a particular direction?
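For what it's worth, here is a rough Python sketch of the kind of power-spectrum check described above. The hourly temperatures are synthetic, with the diurnal cycle deliberately built in, so the diurnal peak it finds is an assumption of the example rather than a measurement:

```python
# Power spectrum of synthetic hourly temperatures via FFT.
import numpy as np

rng = np.random.default_rng(1)
hours = np.arange(24 * 365)                       # one year, hourly
temps = (10 * np.sin(2 * np.pi * hours / 24)      # diurnal cycle
         + 1 * np.sin(2 * np.pi * hours / 12)     # small semidiurnal component
         + rng.normal(0, 2, hours.size))          # weather noise

power = np.abs(np.fft.rfft(temps - temps.mean())) ** 2
freqs = np.fft.rfftfreq(hours.size, d=1.0)        # cycles per hour

peak = freqs[np.argmax(power)]
print(f"dominant period = {1 / peak:.1f} hours")  # ~24 h: diurnal dominates
```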

While you can obviously dream up what mathematicians refer to as "pathological" cases where averaging the high and low for a day does not yield a good mean temperature, in the real world when that is done day-after-day, year-after-year, decade-after-decade, those pathological cases will not matter.

EDIT for Raisin Caine's insult of Gary F: Raisin Caine, in a comment to Gary F, said

"Let me take a guess. You had a statistician friend help you reply. Did that statistician also tell you that you add an autoregressive variable to the model as dropping data is an anathema to statisticians?"

This is incredibly condescending, especially considering that Gary F knows statistics in earth sciences infinitely better than Raisin Caine. Why wouldn't he? Gary F has worked in the field his entire life, and Raisin Caine has done nothing in the field, and from his answers I'm quite certain he has never formally or informally studied statistics in earth science. Want an example? Here is a link to a statistics question I asked in here a few years ago. Raisin Caine could not answer it, and did not even seem to understand what I was talking about, particularly with regard to autoregressive variables, one of the things he berates Gary F about.

https://answers.yahoo.com/question/index...

In the question, bob326 understands and answers the questions in just a few lines, while Raisin Caine was just befuddled.

Most of the time it is not. Both sides consistently talk about the amount of warming seen in certain periods of time without even mentioning whether the trend is statistically significant.

When it is, they are performing linear regression on the data under the assumption that the data have not been modified.

They assume:

1.) Linearity of the data. An assumption that warmers clearly do not believe.

2.) Independence of the errors.

3.) Normality of the error distribution.

4.) Homoscedasticity of the errors.

As I said, none of the warmers seem to believe #1. The fact that they are running linear regression on smoothed data means that #2 is not a valid assumption. Changes in measurement technology over time mean that homoscedasticity (#4) is not accurate either. But the averaging does make #3 a fairly valid assumption.
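For readers who want to see what checking assumptions #2 through #4 might look like in practice, here is an illustrative Python sketch on synthetic data. It assumes the statsmodels and scipy libraries are available; the diagnostics shown are standard ones, not anything either poster specifies:

```python
# Illustrative checks of assumptions #2-#4 on a fitted linear trend.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

rng = np.random.default_rng(2)
t = np.arange(200)
y = 0.01 * t + rng.normal(0, 1, t.size)   # synthetic trend + noise

X = sm.add_constant(t)
fit = sm.OLS(y, X).fit()
resid = fit.resid

# #2 independence: Durbin-Watson near 2 suggests uncorrelated residuals
print("Durbin-Watson:", durbin_watson(resid))
# #3 normality: a large Shapiro-Wilk p-value is consistent with normal errors
print("Shapiro-Wilk p:", stats.shapiro(resid).pvalue)
# #4 homoscedasticity: a large Breusch-Pagan p-value suggests constant variance
print("Breusch-Pagan p:", het_breuschpagan(resid, X)[1])
```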

I am fairly certain that smoothing will raise the error rate (false positives), and I am thinking of researching this subject.

That being said, they could use a nonparametric approach like a rank test and still find warming over the last 100 years, so I have little doubt about the warming. My doubt is about their models.
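As a rough illustration of the rank-test alternative mentioned here: Kendall's tau between time and temperature amounts to a Mann-Kendall-style trend test (ignoring serial correction for serial correlation). The data below are synthetic stand-ins:

```python
# Rank-based trend test on synthetic annual temperatures.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
years = np.arange(1915, 2015)
temps = 0.007 * (years - 1915) + rng.normal(0, 0.1, years.size)

tau, p = stats.kendalltau(years, temps)
print(f"tau = {tau:.2f}, p = {p:.2g}")  # small p: monotonic warming trend
```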

In any other field of research 70% of your models overestimating indicates a serious problem. With warmers, if you are concerned about 95% of the models overestimating, you must be against science. Go figure.

Gary F,

What does linearity of the data mean? Look at your future temperature models. Do they look linear? Better yet, given that temps are increasing linearly by 0.15 degrees/decade, ARE YOU suggesting that the temp in 2100 will be only about 1.3 degrees warmer than now??? No??? Then you are not assuming linearity of the data.



Now, while I will agree that the assumptions of linear regression are in basic statistics books, I would also suggest that we should consider the assumptions inherent in the methods we are using.

Smoothed data. They are taking a running average of the data, meaning that your error from timepoint n is not independent from your error at timepoint n+1. They are not "filtering out extraneous signals"; they are creating running averages. Hence one cannot assume independence of the errors. Guess what? There is no way of magically filtering out noise. Further, no one tries to filter out "extraneous signals". The signal is what they are trying to find; the noise is what they wish to filter.
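A quick synthetic demonstration of that independence point: smoothing white noise with a running average leaves neighboring values strongly correlated. This is a sketch, not anyone's actual processing pipeline:

```python
# Running averages induce serial correlation in otherwise independent errors.
import numpy as np

rng = np.random.default_rng(4)
noise = rng.normal(0, 1, 1000)                               # independent errors
smooth = np.convolve(noise, np.ones(12) / 12, mode="valid")  # 12-point running mean

def lag1_autocorr(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

print("lag-1 autocorrelation, raw:     ", round(lag1_autocorr(noise), 3))   # ~0
print("lag-1 autocorrelation, smoothed:", round(lag1_autocorr(smooth), 3))  # ~0.9
```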

BTW, I already stated that #3 is helped by the averaging, so thanks for restating this, as if I didn't already say this.

Further, Gary F, you are making this entire argument AFTER I stated that you will still get the same result of warming using a nonparametric approach. So you are literally making stupid and invalid arguments to win a point that means absolutely nothing.

Once again, thank you for demonstrating why I lack faith in climate "scientists".

Gary F,

I know you think you are smart listing off a bunch of processes, but it is clear you know nothing about them when you say "filter out extraneous signals". And as for linear regression, I have already shown many examples of this, and I don't feel the need to do so again. But your lying is coming through. You first say it is easy to see that it is linear, then you say no one uses it. Now, I understand liars need a good memory, but since you have your printed word here, perhaps you can at least try to make your lies believable. I love the transition from discussing linear regression to pretending no linear regression is used, right after I showed you wrong on every point. Oooops.

I believe one would have to have an "accurate" analysis of temperature to start with for it to be statistically significant when it comes to "grading" a temperature time series. Just because we have thousands of temperature data points over time (day-to-day input all of the way back to 1880) doesn't mean they show an accurate depiction of what temperatures were before. The rapid "movement" of temperature itself has always eluded science.

Gary F - " ... It’s pretty easy to see by plotting the data and residuals ... "

Plotting data and residuals? You have the "plotting" point OK, but "plotting" points in time to establish an "average for 1 day", whether it be "highs and lows" or other data, doesn't accurately describe a temperature average at a specific point. If the high was 52F and the low was 25F at a specific data point, that doesn't mean the average for that point is 38.5F. The temperature could have held at 50F for several hours. My point is that the average temperature for that data point could be 41F over a 24-hour period, or even 34F over that same time period.

Temperature just doesn't "stand still" for us to measure and give us an accurate reading. Most of the past temperature readings are based on highs and lows, yet the average temperature for 1 specific day could vary by as much as 2F to 4F at many locations in the same time series.
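A toy example of the point being made here, with an invented 24-hour profile: the high/low midpoint and the true 24-hour mean can differ by several degrees F:

```python
# High/low midpoint vs. the integrated 24-hour mean, on a fake skewed day.
import numpy as np

temps = np.full(24, 48.0)                # stays near 48F most of the day
temps[0:6] = [30, 28, 26, 25, 27, 33]    # brief cold dip around dawn
temps[10:16] = 52                        # afternoon high

midpoint = (temps.max() + temps.min()) / 2   # (52 + 25) / 2 = 38.5F
true_mean = temps.mean()                     # integrated mean, well above 38.5F
print(f"high/low midpoint = {midpoint}F, 24-hour mean = {true_mean:.1f}F")
```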

Chinooks are a prime example of why it is so hard to accurately measure temperature. A temperature rise of 50F in a matter of minutes can seriously "bias" a temperature based on averages. The temperature may be near 0 for most of the day, but a sharp heat wave or cooling spell makes that data point impossible to use in an anomaly. "The high was 50F and the low was 0F, so the average is 25F for the day"? There's nothing conclusive in past temperature data that filters this out of the equation.

Modern measuring techniques may be getting closer to the "actual" temperature of the Planet, but past measurements might as well be thrown out simply because we can't track the movement of temperatures. Past temperature data is statistically insignificant for the most part. Who knows how far back we can "accurately depict" the global average temperature?

Pegminer - " ... in the real world when that is done day-after-day, year-after-year, decade-after-decade, those pathological cases will not matter. ... "

Who says the real world did this 30 years ago, or even 50 years ago? And I'm not even talking "only" about Chinooks. Those are extreme weather events. I'm talking about averages being above or below the stated averages for any given day back in the 1940s, 50s, 60s, or even the 70s, and the records being inaccurate about what those averages really were. Playing games with temperature anomalies is what has been happening. They do this simply because there isn't an accurate and true measurement of Earth's global average temperature. That's been my point all along.

A 0.2C or 0.3C error is huge when we are talking about a 0.74C total rise in temperatures. Accuracy in measurement, and how we have measured in the past versus how we measure now, is very relevant. If climate science can't accurately depict an "actual" global average temperature now, with all of our "modern" technological advances, then what does that say about how we did it in the past?

Simply put, I am talking about "average" temperatures on a daily basis over the extended time period of 24 hours and not just the high and the low for that day. Can you understand that? ... or are you too arrogantly fixated on your cause?

The anomalies fluctuate by 0.3C from month to month (cooling and warming) all of the time. If the Earth really does this, then what's the point? Trends mean nothing if we can't accurately and actually show them. It simply means that science has no clue what temperature the Planet "is" or "should be" at this point in time. I grant you the idea that CO2 causes warming (it gives the possibility of more energy in the atmosphere), but whether it is a minor warming over time or an extreme variation upwards has been the question all along.

Signal to noise.

Signal to noise improves with longer time frames.

It does not just have to be a comparison of slope lines, and it does not have to be just temperature vs. temperature.

Run a t-test on the temperature data before and after 1997 (or whatever) and you will get a statistically significant difference at >99.999999....% (I know because I've done it).
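For readers curious about the mechanics (the specific significance figure above is the answerer's own claim), a before/after t-test looks something like this in Python, here on synthetic stand-in anomalies:

```python
# Two-sample t-test across a chosen breakpoint year, on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
before = rng.normal(0.0, 0.1, 47)   # stand-ins for pre-1997 annual anomalies
after = rng.normal(0.4, 0.1, 18)    # stand-ins for post-1997 annual anomalies

t, p = stats.ttest_ind(before, after, equal_var=False)  # Welch's t-test
print(f"t = {t:.1f}, p = {p:.2g}")  # significance level of the difference: 1 - p
```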

======

Raisin Caine –

>>1.) Linearity of the data. An assumption that warmers clearly do not believe. <<

What is that supposed to mean? It’s pretty easy to see by plotting the data and residuals.

>>2.) independence of error. 3.) normality of the error distribution 4.) Homoscedasticity of the error. <<

These are either unimportant, unnecessary, and/or untrue unless you have too small a sample. Tell the truth – you lifted this out of some introductory textbook, huh?

>>The fact that they are running linear regression on smoothed data means that #2 is not a valid assumption.<<

What in the Wide Wide World of Sports are you talking about? What do you mean by smoothed? If you mean filtered to remove extraneous signals and trends, then you are insane. And regardless of what you mean, it has nothing to do with #2. #2 does not even exist if you have a decent sample, because the coefficient estimates approach the true coefficient values. The Central Limit Theorem solves #3. And #4 is not necessary for consistency; even if it is wildly, insanely off, you might slightly overfit the model and inflate 'r', and you will see that in the residuals anyway.
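A small simulation of the Central Limit Theorem point: even with strongly skewed (non-normal) errors, the OLS slope estimate across many synthetic samples comes out approximately normal. This is an illustration, not Gary F's own calculation:

```python
# Sampling distribution of the OLS slope under skewed errors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
t = np.arange(200)
slopes = []
for _ in range(2000):
    errors = rng.exponential(1.0, t.size) - 1.0   # skewed, mean-zero errors
    y = 0.01 * t + errors
    slopes.append(stats.linregress(t, y).slope)

# The slope estimates themselves should look approximately normal
print("skew of slope estimates:", round(stats.skew(slopes), 3))  # near 0
print("normality test p-value:", round(stats.normaltest(slopes).pvalue, 3))
```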

======

======

Raisin Caine –

Where do you get the idea that people are running regression on running means?

>>Smoothed data. They are taking a running average of the data, meaning that your error from timepoint n is not independent from your error at timepoint n+1.<<

Yeah, if anyone did that – but no one does. There are numerous mean error measures (e.g., Mean Absolute Deviation).

More importantly: what model is using linear regression, or linear anything, to project future temperatures? Climate models use nonlinear ordinary and partial differential equations.

>> Further, no one tries to filter out "extraneous signals". <<

Ever heard of autoregression?

>> There is no way of magically filtering out noise <<

I don’t know about magic, but there are numerous ways to filter out noise.

High-pass filters, low-pass filters, band-pass filters, Chebyshev filters, Fourier analysis, Laplace transforms. Read a book on signal processing.

The goal is to decompose a complex signal into its different frequency components and then remove (filter) or suppress them.
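As a concrete example of one technique from that list, here is a sketch of a Butterworth low-pass filter separating a slow trend from faster components, using scipy.signal. The series and cutoff are invented for illustration:

```python
# Low-pass filtering a synthetic series: keep the slow trend, suppress the rest.
import numpy as np
from scipy import signal

rng = np.random.default_rng(6)
t = np.arange(1000)
series = 0.005 * t + np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.5, t.size)

# 4th-order low-pass filter; keep components slower than ~100 samples/cycle
b, a = signal.butter(4, 1 / 100, btype="low", fs=1.0)
slow = signal.filtfilt(b, a, series)   # zero-phase filtering
fast = series - slow                   # the suppressed high-frequency part
```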

>>The signal is what they are trying to find,<<

You want to find THE signal from A signal made up of multiple signals.

And, thank you for once again demonstrating that Deniers just make sh-t up and claim that it is real.

======

Raisin Caine –

>>"filter out extraneous signals". <<



What is more critical than this in studying AGW?

Mean global temperature data contain multiple component signals (solar, ENSO, PDO, and other teleconnected cyclonic variables, as well as geophysical variables, and atmospheric CO2). If you want to look at any one of them, you need to control the others (because they are “extraneous”).
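One common way to "control the others" is multiple regression: include the extraneous signals as regressors so their contributions are estimated rather than ignored. The sketch below uses synthetic stand-ins for the ENSO, solar, and CO2 series, not real indices:

```python
# Regress temperature on CO2 plus ENSO and solar so those signals are controlled.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 400                                         # months
enso = rng.normal(0, 1, n)                      # stand-in ENSO index
solar = np.sin(2 * np.pi * np.arange(n) / 132)  # stand-in ~11-year solar cycle
co2 = np.linspace(340, 400, n)                  # stand-in CO2 concentration
temp = 0.01 * (co2 - 340) + 0.1 * enso + 0.05 * solar + rng.normal(0, 0.1, n)

X = sm.add_constant(np.column_stack([co2, enso, solar]))
fit = sm.OLS(temp, X).fit()
print(fit.params)  # coefficient on CO2, with ENSO and solar controlled for
```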

Whether or not you realize it, when Deniers talk about "natural cycles" like the PDO, AMO, etc., as driving global temperature, they are implicitly, at least, claiming that they have filtered out the other "extraneous" signals.

How can you claim to know what is responsible for the observed temperature data if you do not know what information the data contain?

You have again demonstrated that Deniers do not even understand the question and, so, have no solutions to offer. Hence, they simply deny AGW, which has earned them their name: Deniers.
