Statistics, trees in the forest, and straights & bisexuals (etc) in the population...
Babyruthiezhubby,
babyruthiezhubby said:
Cool... it's good to see, at long last, I've established some modest credibility with you.
babyruthiezhubby said:
Baker's numbers do seem more believable to me, but in any case, they're still an estimate that can't be proven or disproven.
Actually, statistical methods for making reliable estimates are well established. Lots of people use them every day. Those who are knowledgeable about estimation, however, rarely (if ever) claim the quantities they estimate are "exact." Rather, the goal is to estimate the quantity, whatever it might be, to within an uncertainty range that is known and that establishes the estimate as sufficiently accurate for the required purpose.
Whether a given estimation method does or does not have the required accuracy can usually be established in a straightforward way. Consider the following example. Suppose you're a forester, and you want to know the percentage of each tree species in an area of forest that has multiple tree species (fir, hemlock, spruce, aspen, birch, etc).
You also want to verify the accuracy of your estimation method, because you work for a logging company and you know getting it wrong could have severe economic consequences. You might proceed as follows.
1. Define a subarea of forest small enough to count the trees.
2. Within your subarea, set aside a smaller subarea where, for now, you don't count the trees.
3. On foot, walk through your subarea (but not your smaller subarea), counting and tabulating the number of trees of each species. You find (let's say) that Douglas fir are 20%, hemlock are 25%, spruce are 30%, aspen are 20%, and birch are 5% of all trees.
4. "Using statistics," you estimate that the uncertainty of your tree percentage breakdown, when applied to an area of uncounted trees, will be +/- 3% of each figure (a relative uncertainty, not 3 percentage points). That is, the percentage of Douglas fir will be (20 +/- 0.6)%, or somewhere between 19.4% and 20.6%; the percentage of spruce will be (30 +/- 0.9)%, or somewhere between 29.1% and 30.9%; etc.
5. Now, again on foot, you count and tabulate the number of each tree species throughout your smaller previously-uncounted subarea, which is your test area. You find the percentage of Douglas fir is 19.6%, the percentage of spruce is 29.3%, etc. These are within your estimated uncertainty ranges for those species. If the percentage of each tree species in your test area falls within the uncertainty range of your estimate, you conclude your estimation method has passed the test. It's as accurate as you thought it would be, and (presumably) accurate enough for your purposes.
6. For some tree species, however, you might find the numbers in your previously-uncounted test area fall outside your estimated uncertainty ranges for those species. You would then adjust your uncertainty ranges upward... say, to +/- 4 or 5% for those species.
7. Now, you apply your method to the entire forest that's of interest to you. It's so large you can't possibly walk through it counting all the tree species, but you feel confident your estimates will be sufficiently accurate... where by "sufficiently accurate," I mean the percentage of each tree species will fall within your estimated uncertainty range... because you tested your method and *proved* its accuracy to your satisfaction.
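For anyone who'd rather see steps 3 through 6 as code, here's a rough Python sketch. It's only an illustration under assumptions: the counted percentages and the Douglas fir and spruce test-area figures come from the example above, but the hemlock, aspen, and birch test-area figures are made up, and the function names are my own. It also reads the +/- 3% as relative to each percentage, which is what the (20 +/- 0.6)% arithmetic implies.

```python
# Sketch of the holdout check in steps 3-6 (illustrative, not a real survey method).

# Step 3: percentages tabulated in the counted subarea (from the example).
counted = {"douglas fir": 20.0, "hemlock": 25.0, "spruce": 30.0,
           "aspen": 20.0, "birch": 5.0}

# Step 5: percentages found in the test area.
# Douglas fir and spruce are from the example; the rest are ASSUMED values.
test_area = {"douglas fir": 19.6, "spruce": 29.3,
             "hemlock": 25.5, "aspen": 20.4, "birch": 5.2}


def uncertainty_range(percent, relative=0.03):
    """Step 4: range of percent +/- (relative * percent)."""
    half_width = percent * relative
    return percent - half_width, percent + half_width


def check_estimates(counted, test_area, relative=0.03):
    """Steps 5-6: for each species, does the test-area figure fall in range?"""
    results = {}
    for species, percent in counted.items():
        low, high = uncertainty_range(percent, relative)
        results[species] = low <= test_area[species] <= high
    return results


if __name__ == "__main__":
    for species, ok in check_estimates(counted, test_area).items():
        print(species, "within range" if ok else "outside range: widen it")
```

With these made-up numbers birch lands outside its +/- 3% range (5.2% against 4.85% to 5.15%), which is the step 6 situation. Rerunning with `relative=0.05` for birch brings it back inside.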
Finally, suppose in addition you know your employer, the logging company, will break even if Douglas fir (a high-value tree) constitute exactly 17% of the trees throughout the forest. Any higher percentage will result in a profit; any lower percentage will result in a loss... AND, more than likely, you will be fired.
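To put numbers on that, here's a small Python sketch of the break-even comparison. The 20% estimate, the +/- 3% relative uncertainty, and the 17% break-even point are from the example; the function names are mine and purely illustrative.

```python
# Sketch: how much room is there between the estimate and the break-even point?

def lower_bound(estimate, relative):
    """Lowest percentage consistent with estimate +/- (relative * estimate)."""
    return estimate * (1 - relative)


def max_tolerable_relative_uncertainty(estimate, break_even):
    """Relative uncertainty at which the lower bound just touches break-even."""
    return (estimate - break_even) / estimate


if __name__ == "__main__":
    estimate, break_even = 20.0, 17.0
    print(lower_bound(estimate, 0.03))                       # low end of the range
    print(max_tolerable_relative_uncertainty(estimate, break_even))
```

The low end of the Douglas fir range is about 19.4%, comfortably above 17%. The relative uncertainty would have to reach roughly 15% before the range dipped to the break-even point, so even the widened +/- 4 or 5% from step 6 leaves a margin.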
This would be an example of a statistical estimation problem where economic consequences ride on the results, so there's strong motivation for using a method that's mathematically valid. It can't consist of just picking some numbers out of the sky. (In real life, the estimation method would likely be considerably more sophisticated than this simple example.)
To conclude this shaggy-dog example: the problem of estimating the percentages of straight, bisexual, lesbian, and gay people in the population is similar (in principle), although it's complicated by most people not being rooted in place... they move around. But once you get the idea, you can see it isn't impossible.
—Custer