Any statisticians out there in TubeNet land?

Be kind. No government, state, or local politics allowed. Admin has final decision for any/all removed posts.
SRanney
3 valves
3 valves
Posts: 362
Joined: Sun May 15, 2005 6:49 pm
Location: Bozeman, MT

Any statisticians out there in TubeNet land?

Post by SRanney »

(After several fruitless days of searching for a plain-speak (read: not statistics jargon) answer, I come asking the TubeNet Freak Jury: are any of you statisticians?)

I have data from some experimental work I did this summer that looks like:
level %surv
3 0.686440678
3 0.125
3 0.873873874
4 0.461538462
4 0.569105691
4 0.85046729
6 0.549450549
6 0.49382716
6 0.123076923
6 0.720930233
6 0.22972973
6 0.93
6 0.448275862
6 0.0
6 0.601851852
where %surv is the observed proportion of larval fish that survived in each replicate at a given level (the level is dissolved oxygen in mg/L). I can calculate "normal" means and 95% confidence intervals by
mean + or - 1.96*(SD)
but in many cases, that gives me an upper CI bound > 1.0. Unfortunately, % survival > 1.0 is impossible. Is there a published method that constrains confidence intervals to some a priori bounds? I could easily just chop off the CI bands at 0 and 1, but I'm certain that isn't the same.
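
To make the problem concrete, here is a minimal Python sketch that applies the formula above to the posted data. The function name `naive_interval` and the `use_standard_error` switch are illustrative assumptions, not part of the original analysis. The switch is there because a confidence interval of the mean is conventionally built from the standard error (SD divided by the square root of n) rather than the SD itself; both versions are unbounded, so either can spill past 0 or 1.

```python
import math
import statistics

# Replicate survival proportions from the table above, keyed by DO level (mg/L).
survival = {
    3: [0.686440678, 0.125, 0.873873874],
    4: [0.461538462, 0.569105691, 0.85046729],
    6: [0.549450549, 0.49382716, 0.123076923, 0.720930233, 0.22972973,
        0.93, 0.448275862, 0.0, 0.601851852],
}

def naive_interval(values, z=1.96, use_standard_error=False):
    """Return (lower, upper) for mean +/- z * spread, with no bounding at 0 or 1."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation
    if use_standard_error:
        # Standard error of the mean: SD / sqrt(n).
        spread /= math.sqrt(len(values))
    return mean - z * spread, mean + z * spread

for level, values in survival.items():
    lo_sd, hi_sd = naive_interval(values)
    lo_se, hi_se = naive_interval(values, use_standard_error=True)
    print(f"level {level}: SD-based ({lo_sd:.3f}, {hi_sd:.3f}), "
          f"SE-based ({lo_se:.3f}, {hi_se:.3f})")
```

With only three replicates at some levels, a t critical value would normally replace 1.96, which widens the interval further.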

Any help (or citations) anyone could provide would be appreciated. I've waded through several statistical texts (including Krebs, Zar, Sokal & Rohlf) and suffered through many papers to no avail. Perhaps I've overlooked something?

Thanks -

Steven
SRanney
3 valves
3 valves
Posts: 362
Joined: Sun May 15, 2005 6:49 pm
Location: Bozeman, MT

Re: Any statisticians out there in TubeNet land?

Post by SRanney »

the elephant wrote:Good luck, sir.
You never know. I've always been amazed at the expertise that exists here on TubeNet. I just might get lucky. Otherwise, I'll continue to dig into the stats literature.
Donn
6 valves
6 valves
Posts: 5977
Joined: Fri Aug 19, 2005 3:58 pm
Location: Seattle, ☯

Re: Any statisticians out there in TubeNet land?

Post by Donn »

I don't have the knowledge of statistics to be anything but a Tubenet commentator, but ...

Consider that the survival percentage is really a measure of the number of surviving individuals at a certain point. If you had extended the observation period, the survival percentage would be lower, etc. The `population' as it were, the individual larvae each have an actual survival span that is shorter, or potentially longer, than the observation period, and might in the simplest case follow a normal distribution. I have enjoyed quite a variety of alcoholic beverages this evening and become confused when I try to contemplate the relationship between this distribution, and the set of survival percentages you're working with, but at any rate I think it is fairly reasonable for an interval statistical measure based on normal distributions, to find an upper bound survival percentage larger than 1.0, when that 1.0 is just about some arbitrary sampling moment. My facile explanation doesn't account for a lower bound less than 0.0 - I wouldn't be certain you can't just chop it off, but that seems like such a common constraint of natural distributions, that there may be some more suitable statistical measures for that kind of data?
bort
6 valves
6 valves
Posts: 11223
Joined: Wed Sep 22, 2004 11:08 pm
Location: Minneapolis, Minnesota

Re: Any statisticians out there in TubeNet land?

Post by bort »

I have a BS in Mathematics, but took exactly zero Statistics classes. Not sure how that happened. Wish I could help!
TubaRay
6 valves
6 valves
Posts: 4109
Joined: Mon Mar 22, 2004 4:24 pm
Location: San Antonio, Texas
Contact:

Re: Any statisticians out there in TubeNet land?

Post by TubaRay »

Since I have 4 sem. hrs. of statistics, I believe I qualify as an expert. Here is my expert opinion: 76.3947% of all statistics are totally fabricated.

I'm just sayin'....
Ray Grim
The TubaMeisters
San Antonio, Tx.
rocksanddirt
4 valves
4 valves
Posts: 552
Joined: Sat Mar 01, 2008 10:14 pm

Re: Any statisticians out there in TubeNet land?

Post by rocksanddirt »

Well....not an expert, but do have some familiarity with stats.

I think what you want is a confidence interval of the mean, yes?

That is a bit more complicated than the formula you posted. If you have/use MS Excel, there is a 'statistics' extension package that adds to the normal formulas present; you can use that to get better formulas.
SRanney
3 valves
3 valves
Posts: 362
Joined: Sun May 15, 2005 6:49 pm
Location: Bozeman, MT

Re: Any statisticians out there in TubeNet land?

Post by SRanney »

rocksanddirt wrote:Well....not an expert, but do have some familiarity with stats.

I think what you want is a confidence interval of the mean, yes?

That is a bit more complicated than the formula you posted. If you have/use MS Excel, there is a 'statistics' extension package that adds to the normal formulas present; you can use that to get better formulas.
Yes, a bounded (by 0 and 1) confidence interval of the mean, by level. While I'm not a statistics expert either, unless we're talking about two different things, for a normal distribution, calculating a confidence interval of the mean is as simple as the formula I provided. The confidence interval of a proportion is quite different, but as I'm calculating a mean of proportions, I think this is close to the formula I need.

I'm familiar with Excel as an analysis tool and have several additional "add-ins" added in (along with PopTools and the Analysis ToolPak), but none help me calculate a bounded CI on the mean. Any chance you could be more specific with the "statistics" package for Excel that you've mentioned? Excel generally has pretty good documentation/citations for their formulae. For statistical analysis, I generally use R.

Ultimately, these data will be used to test for differences in the means. The simplest way to test for differences would be using some form of the linear model (either regression or ANOVA), but now I'm thinking of pooling all data and eventually using a Z-test or Fisher's exact test to determine the statistical significance level (at alpha = 0.05/# of paired comparisons) between each pair of means.
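
The adjusted threshold described here (alpha divided by the number of paired comparisons) is commonly called a Bonferroni correction. A minimal Python sketch of the arithmetic, assuming the three dissolved-oxygen levels from the first post are compared pairwise; the variable names are illustrative only:

```python
from itertools import combinations

levels = [3, 4, 6]          # dissolved oxygen levels (mg/L) from the posted data
family_alpha = 0.05         # overall significance level

# Every unordered pair of levels is one comparison.
pairs = list(combinations(levels, 2))
alpha_per_test = family_alpha / len(pairs)

for a, b in pairs:
    print(f"level {a} vs level {b}: test at alpha = {alpha_per_test:.4f}")
```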

Thanks -

Steven
tbn.al
6 valves
6 valves
Posts: 3004
Joined: Thu Apr 21, 2005 6:00 pm
Location: Atlanta, Ga

Re: Any statisticians out there in TubeNet land?

Post by tbn.al »

If I ever questioned your intelligence, I apologize. I have absolutely no clue as to what you are even asking, but I thought I might quiz our resident clarinetist and actuary, Jim Brooks. He declined to make an attempt but referred it to his son who works with a bunch of stat guys. For what it is worth here are two possible answers:

In a survey of two of the statisticians here, one says to just truncate the confidence intervals at 1. For something more fancy, the other recommends:

1) Construct a confidence interval for b = log(p/(1-p)), where p is the observed proportion. In jargon, you are constructing a confidence interval for the log odds.

2) Let (b_1, b_2) be the confidence interval for the log odds. Transform your confidence interval back to a probability interval with (exp(b_1)/(1+exp(b_1)), exp(b_2)/(1+exp(b_2))). These limits will always be between 0 and 1.
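
The two steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it works from raw counts for a single pooled proportion, uses the usual large-sample standard error of the log odds, sqrt(1/survivors + 1/deaths), and the counts in the example call are hypothetical, not taken from the experiment. The function name `logit_interval` is also made up for this sketch.

```python
import math

def logit_interval(survivors, total, z=1.96):
    """Approximate CI for a proportion, built on the log-odds scale.

    Assumes 0 < survivors < total; the log odds are undefined otherwise.
    """
    deaths = total - survivors
    p = survivors / total
    b = math.log(p / (1 - p))                    # step 1: log odds
    se = math.sqrt(1 / survivors + 1 / deaths)   # large-sample SE of the log odds
    b_1, b_2 = b - z * se, b + z * se            # CI for the log odds

    def back_transform(x):
        # step 2: exp(x) / (1 + exp(x)) maps any real number into (0, 1)
        return math.exp(x) / (1 + math.exp(x))

    return back_transform(b_1), back_transform(b_2)

# Hypothetical counts, for illustration only.
lower, upper = logit_interval(survivors=81, total=118)
print(f"observed {81 / 118:.3f}, interval ({lower:.3f}, {upper:.3f})")
```

Note that a replicate with no survivors (or no deaths) cannot be handled this way without some adjustment, since the log odds are not defined at 0 or 1.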
I am fortunate to have a great job that feeds my family well, but music feeds my soul.
Rick Denney
Resident Genius
Posts: 6650
Joined: Mon Mar 22, 2004 1:18 am
Contact:

Re: Any statisticians out there in TubeNet land?

Post by Rick Denney »

Have you tested your data to be sure that it is normally distributed? If one population can never be greater than another, then the two distributions are not independent and the difference between them cannot be normally distributed. And if those differences are not normally distributed, then you can't use confidence intervals based on standard deviation, because standard deviation assumes normally distributed data.

Instead of comparing percentage survival, try comparing raw survival numbers. So, instead of writing it down as 0.90, try writing it as 900 survivors out of 1000. Then, do your statistics on the 900 number. You'll be comparing your before distribution to your after distribution, and for that I would recommend a non-parametric test like chi-square rather than a parametric test like mean analysis.

I run into it in traffic analysis when characterizing the time headway from one car to the next leaving a queue when the light turns green. That headway can never be less than about a quarter of a second because if it were, the vehicles would be touching, and two objects can't occupy the same space at the same time. So I don't assume they are normally distributed. I end up with a distribution curve that is not symmetrical (as the normal distribution is--the classic symmetrical bell curve). My "bell" is tilted--skewed--to one side. The best distribution for that data turns out to be a negative exponential distribution that is shifted by the mean.

You could also compare the distributions directly without characterizing them by just comparing their discrete shapes. Chi-square is one way, Kolmogorov-Smirnov is another, and both of these non-parametric tests are in any statistics book. They keep you from having to characterize parameters such as the mean.

I don't know your data, and my knowledge of statistics is rather limited to my own domain, so that's about as far as I can go. Just remember that the tails of the normal distribution are asymptotic, and thus those tails can be infinitely high and low at small enough likelihoods. That's why it should only be used to compare means of independent populations.

Rick "thinking there might also be a failure of heteroskedasticity here, which also undermines the assumption of normal distribution, but you didn't want stat-speak--the log conversion suggested above addresses that possible issue, but not my issue" Denney
SRanney
3 valves
3 valves
Posts: 362
Joined: Sun May 15, 2005 6:49 pm
Location: Bozeman, MT

Re: Any statisticians out there in TubeNet land?

Post by SRanney »

Al and Rick -

Thanks for your thoughts. Regarding construction of a bounded CI of the mean, I'm thinking that either A) truncation is/will be fine, no matter how much I don't think it is, or B) pooling my replicate data and constructing a CI on the percent survival of the pooled data by level. Al, I considered the first option you provided which is similar to my B above. The second I hadn't considered, but will look into. It looks like a logit transformation that I'm familiar with, but will require more reading. However, it all may be moot based upon some other discussions I've had recently. Rick, regarding normality, when dealing with proportions, unless the data are skewed toward zero or one, the assumption is that no data transformation is necessary and parametric statistics can be used.

However, that said, in consultation with a fishing acquaintance of mine (who also happens to be a professor of biostatistics at a well known university in Atlanta), I think what I'm going to do is analyze the data categorically (dead vs. alive) in each of three categories (i.e., "level") rather than equal-interval continuously (percentage/proportion). Analysis, then--rather than t-test/ANOVA/regression--will be Chi^2 or Fisher's Exact Test (which is very similar to Rick's suggestion). The argument he offered was that as soon as I moved away from the number of larvae dead vs. the number of larvae alive in a given replicate/treatment combination, I transformed my data to meet the preconceptions that most experimenters have. (In reality, 90% of the stats that fisheries biologists use are t-test/ANOVA/regression. As a result, we conceptualize our experiments to fit that simple statistics mold.) His suggestion, then, allows me to treat my data as binomial (dead or alive) instead of continuous.
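
As a rough illustration of the categorical approach described above, the Python sketch below computes a Pearson chi-square statistic for a table of alive/dead counts by level. Everything specific in it is an assumption: the counts are hypothetical (the thread only gives proportions), the function name `chi_square_alive_dead` is made up, and the p-value shortcut exp(-x/2) is used only because it is exact for 2 degrees of freedom, which is what three levels by two outcomes gives.

```python
import math

def chi_square_alive_dead(table):
    """Pearson chi-square for rows of (alive, dead) counts, one row per level.

    Returns (statistic, degrees_of_freedom, p_value); p_value is computed
    only when df == 2 and is None otherwise.
    """
    row_totals = [alive + dead for alive, dead in table]
    col_totals = [sum(row[0] for row in table), sum(row[1] for row in table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (2 - 1)
    # For 2 degrees of freedom the chi-square upper tail is exactly exp(-x/2).
    p_value = math.exp(-statistic / 2) if df == 2 else None
    return statistic, df, p_value

# Hypothetical pooled counts (alive, dead) for levels 3, 4 and 6 mg/L.
counts = [(60, 40), (65, 35), (45, 55)]
stat, df, p = chi_square_alive_dead(counts)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
```

In practice a statistics package would be used for this (and for Fisher's exact test when expected counts are small); the sketch just shows what the test does with dead vs. alive counts.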
Rick Denney wrote:Rick "... but you didn't want stat-speak-..." Denney
Stat-speak I don't mind. I find it difficult wading through much of the symbolic language used in statistics journals. That statistics language was what I was trying to avoid...

Thanks for your thoughts!

Steven
Rick Denney
Resident Genius
Posts: 6650
Joined: Mon Mar 22, 2004 1:18 am
Contact:

Re: Any statisticians out there in TubeNet land?

Post by Rick Denney »

SRanney wrote:However, that said, in consultation with a fishing acquaintance of mine (who also happens to be a professor of biostatistics at a well known university in Atlanta), I think what I'm going to do is analyze the data categorically (dead vs. alive) in each of three categories (i.e., "level") rather than equal-interval continuously (percentage/proportion). Analysis, then--rather than t-test/ANOVA/regression--will be Chi^2 or Fisher's Exact Test (which is very similar to Rick's suggestion). The argument he offered was that as soon as I moved away from the number of larvae dead vs. the number of larvae alive in a given replicate/treatment combination, I transformed my data to meet the preconceptions that most experimenters have. (In reality, 90% of the stats that fisheries biologists use are t-test/ANOVA/regression. As a result, we conceptualize our experiments to fit that simple statistics mold.) His suggestion, then, allows me to treat my data as binomial (dead or alive) instead of continuous.
His suggestion is not just similar to what I said, it's exactly what I said. My only addition was that the transformation might undermine your assumptions of normal distribution. The transformation is dividing one set of data (survivors) into another (total population). Both have to be normally distributed for the result to be normally distributed, which is why you claim you can make the assumption without a skew to zero or one, which it won't be if both are normally distributed. But my point is that there is another test, too, and that's that the two sets of data have to be independent in addition to both being normal. Since number of survivors can never exceed total, they are not independent. That is not the same problem as a lack of heteroskedasticity, which is the problem exposed by being skewed.

Rick "who feels so validated" Denney
Uncle Buck
5 valves
5 valves
Posts: 1243
Joined: Fri Aug 27, 2004 3:45 pm
Location: Salt Lake City, Utah
Contact:

Re: Any statisticians out there in TubeNet land?

Post by Uncle Buck »

4, 8, 15, 16, 23, 42
elimia
3 valves
3 valves
Posts: 359
Joined: Wed Apr 21, 2004 9:30 pm
Location: Hermitage, Tennessee

Re: Any statisticians out there in TubeNet land?

Post by elimia »

The binary comparison is a good tip, it simplifies things. I too work in natural sciences (freshwater mussels and fishes) and would agree that most of what we use are simple linear models, so I tend to jump to T-test, regression, best of fit models. A tip a professor once told me, with continuous data, is to look at the mean vs median. If they are close, you can probably bet they are normally distributed. I think Mantel test is handy at looking at group normality.
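
The mean-vs-median check is quick to try on the level 6 replicates from the first post. A minimal Python sketch, with the caveat that agreement between the two only hints at symmetry and is not a test of normality:

```python
import statistics

# Level 6 (mg/L) replicate survival proportions from the first post.
level_6 = [0.549450549, 0.49382716, 0.123076923, 0.720930233, 0.22972973,
           0.93, 0.448275862, 0.0, 0.601851852]

mean = statistics.mean(level_6)
median = statistics.median(level_6)
print(f"mean = {mean:.3f}, median = {median:.3f}, difference = {mean - median:+.3f}")
```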

A stats book I REALLY like is 'Analysis of Ecological Communities' by McCune and Grace. It is more geared to multivariate stats (species space vs environmental space concepts) but has some excellent writing on stats that is understandable. For questions like these, another book is by Robert Stoecker (sp?) - 'environmental analysis for non-scientists' or something like that. I am a scientist but it certainly helps refresh many of the concepts that I don't use everyday.

Stats are tough, it is good to have someone to consult with. Unless you are at a university or work for USGS, it is rare to find statisticians on staff.