all I know, by <a href="http://www.blogger.com/profile/14292624334717648995">haryantinovitasari</a>, 2010-12-05<br />
<h3 style="font-size: 18px; margin-bottom: 0px;">Statistical independence</h3>In <a href="http://www.fact-index.com/p/pr/probability_theory_2.html" title="Probability theory">probability theory</a>, when we assert that two <a href="http://www.fact-index.com/e/ev/event__probability_theory_.html" title="Event (probability theory)">events</a> are <strong>independent</strong>, we intuitively mean that knowing whether or not one of them occurred makes it neither more probable nor less probable that the other occurred. For example, the events "today is Tuesday" and "it rains today" are independent.<br />
Similarly, when we assert that two <a href="http://www.fact-index.com/r/ra/random_variables.html" title="Random variables">random variables</a> are independent, we intuitively mean that knowing something about the value of one of them does not yield any information about the value of the other. For instance, the height of a person and their IQ are independent random variables. Another typical example of two independent variables is given by repeating an experiment: roll a <a href="http://www.fact-index.com/d/di/dice.html" title="Dice">die</a> twice, let <em>X</em> be the number you get the first time, and <em>Y</em> the number you get the second time. These two variables are independent.<br />
<a href="" name="Independent events"></a><br />
<h3 style="font-size: 18px; margin-bottom: 0px;"><a href="" name="Independent events">Independent events</a></h3><a href="" name="Independent events">We define two events <em>E</em><sub>1</sub> and <em>E</em><sub>2</sub> of a </a><a href="http://www.fact-index.com/p/pr/probability_space.html" title="Probability space">probability space</a> to be <em>independent</em> iff<br />
<dl><dd><em>P</em>(<em>E</em><sub>1</sub> ∩ <em>E</em><sub>2</sub>) = <em>P</em>(<em>E</em><sub>1</sub>) · <em>P</em>(<em>E</em><sub>2</sub>).</dd></dl>Here <em>E</em><sub>1</sub> ∩ <em>E</em><sub>2</sub> (the <a href="http://www.fact-index.com/i/in/intersection__set_theory_.html" title="Intersection (set theory)">intersection</a> of <em>E</em><sub>1</sub> and <em>E</em><sub>2</sub>) is the event that <em>E</em><sub>1</sub> and <em>E</em><sub>2</sub> both occur; <em>P</em> denotes the probability of an event. If <em>P</em>(<em>E</em><sub>2</sub>) ≠ 0, then the independence of <em>E</em><sub>1</sub> and <em>E</em><sub>2</sub> can also be expressed with <a href="http://www.fact-index.com/c/co/conditional_probability_1.html" title="Conditional probability">conditional probabilities</a>:<br />
<dl><dd><em>P</em>(<em>E</em><sub>1</sub> | <em>E</em><sub>2</sub>) = <em>P</em>(<em>E</em><sub>1</sub>)</dd></dl>which is closer to the intuition given above: the information that <em>E</em><sub>2</sub> happened does not change our estimate of the probability of <em>E</em><sub>1</sub>. If we have more than two events, then pairwise independence is insufficient to capture the intuitive sense of independence. So a set <em>S</em> of events is said to be independent if every finite nonempty subset { <em>E</em><sub>1</sub>, ..., <em>E</em><sub><em>n</em></sub> } of <em>S</em> satisfies<br />
<br />
<dl><dd><em>P</em>(<em>E</em><sub>1</sub> ∩ ... ∩ <em>E</em><sub><em>n</em></sub>) = <em>P</em>(<em>E</em><sub>1</sub>) · ... · <em>P</em>(<em>E</em><sub><em>n</em></sub>).</dd></dl>This is called the <em>multiplication rule</em> for independent events.<a href="" name="Independent random variables"></a><br />
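As a concrete check, the multiplication rule can be verified by exhaustively enumerating a small sample space. The following Python sketch is illustrative (it is not part of the original article); the two events, "the first roll is even" and "the second roll is at least 5", are chosen arbitrarily for the two-dice experiment described above:

```python
from itertools import product
from fractions import Fraction

# Sample space: all ordered pairs of outcomes from two rolls of a fair die.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    hits = sum(1 for outcome in space if event(outcome))
    return Fraction(hits, len(space))

E1 = lambda o: o[0] % 2 == 0   # first roll is even
E2 = lambda o: o[1] >= 5       # second roll is 5 or 6

# Multiplication rule: P(E1 ∩ E2) = P(E1) · P(E2) for independent events.
p_both = prob(lambda o: E1(o) and E2(o))
assert p_both == prob(E1) * prob(E2)   # 1/6 == 1/2 · 1/3
```

Exact fractions (rather than floating-point division) make the equality test exact; any pair of events defined on different rolls will satisfy the rule, since the rolls do not influence each other.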
<h3 style="font-size: 18px; margin-bottom: 0px;"><a href="" name="Independent random variables">Independent random variables</a></h3><a href="" name="Independent random variables">We define random variables <em>X</em> and <em>Y</em> to be independent if</a><br />
<a href="" name="Independent random variables"></a><br />
<dl><dd><a href="" name="Independent random variables">Pr[(<em>X</em> in <em>A</em>) & (<em>Y</em> in <em>B</em>)] = Pr[<em>X</em> in <em>A</em>] · Pr[<em>Y</em> in <em>B</em>]</a></dd></dl><a href="" name="Independent random variables">for any </a><a href="http://www.fact-index.com/b/bo/borel_algebra.html" title="Borel algebra">Borel subsets</a> <em>A</em> and <em>B</em> of the <a href="http://www.fact-index.com/r/re/real_number.html" title="Real number">real numbers</a>. If <em>X</em> and <em>Y</em> are independent, then the <a href="http://www.fact-index.com/e/ex/expected_value.html" title="Expected value">expectation operator</a> has the nice property<br />
<dl><dd>E[<em>X</em> · <em>Y</em>] = E[<em>X</em>] · E[<em>Y</em>]</dd></dl>and for the <a href="http://www.fact-index.com/v/va/variance.html" title="Variance">variance</a> we have<br />
<dl><dd>Var(<em>X</em> + <em>Y</em>) = Var(<em>X</em>) + Var(<em>Y</em>).</dd></dl>Furthermore, if <em>X</em> and <em>Y</em> are independent and have <a href="http://www.fact-index.com/p/pr/probability_density_function.html" title="Probability density function">probability densities</a> <em>f</em><sub><em>X</em></sub>(<em>x</em>) and <em>f</em><sub><em>Y</em></sub>(<em>y</em>), then (<em>X</em>,<em>Y</em>) has the joint density<br />
<dl><dd><em>f</em><sub><em>XY</em></sub>(<em>x</em>,<em>y</em>) = <em>f</em><sub><em>X</em></sub>(<em>x</em>) <em>f</em><sub><em>Y</em></sub>(<em>y</em>).</dd><dd>More generally, a set of random variables is independent if every finite subset of them satisfies the analogous product rule: the probability that each variable falls in its respective Borel set equals the product of the individual probabilities.</dd><div><em>
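</em></div>
The expectation and variance properties above can be verified exactly for the two-dice example from the beginning of this article. The Python sketch below is illustrative (not part of the original text); it enumerates all 36 equally likely outcomes of two rolls:

```python
from itertools import product
from fractions import Fraction

faces = range(1, 7)  # a fair die; X is the first roll, Y the second
outcomes = list(product(faces, repeat=2))  # all 36 equally likely pairs

def E(f):
    """Exact expectation of f(x, y) over the joint distribution."""
    return sum(Fraction(f(x, y)) for x, y in outcomes) / len(outcomes)

EX, EY = E(lambda x, y: x), E(lambda x, y: y)  # both equal 7/2

# E[X·Y] = E[X] · E[Y] holds because X and Y are independent.
assert E(lambda x, y: x * y) == EX * EY

def Var(f, mean):
    """Exact variance of f(x, y), given its mean."""
    return E(lambda x, y: (f(x, y) - mean) ** 2)

# Var(X + Y) = Var(X) + Var(Y), again by independence.
assert Var(lambda x, y: x + y, EX + EY) == \
       Var(lambda x, y: x, EX) + Var(lambda x, y: y, EY)
```

Both identities fail in general for dependent variables (for instance, taking Y = X makes E[X·Y] = E[X²] ≠ E[X]²), so passing these checks is evidence of, not proof of, independence.
<div><em>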
</em></div></dl><h3 style="font-size: 18px; margin-bottom: 0px;">Psychological statistics (inferential statistics)</h3>Posted 2010-12-01<br /><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> With <span style="color: maroon;"><strong style="font-weight: 400;">descriptive statistics</strong></span> we condense a set of known numbers into a few simple values (either numerically or graphically) to simplify an understanding of those data. This is analogous to writing up a summary of a lengthy book. The book summary is a tool for conveying the gist of a story to others, and the mean and standard deviation of a set of numbers are tools for conveying the gist of the individual numbers (without having to specify each and every one). <span style="color: maroon;"><strong style="font-weight: 400;">Inferential statistics</strong></span>, on the other hand, is used to make claims about the populations that give rise to the data we collect. This requires that we go beyond the data available to us. Consequently, the claims we make about populations are always subject to error; hence the term "inferential statistics" and not deductive statistics.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 24px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> Inferential statistics encompasses a variety of procedures to ensure that the inferences are sound and rational, even though they may not always be correct. In short, inferential statistics enables us to make confident decisions in the face of uncertainty.</span></div><div align="center"><center><table bgcolor="#C0C0C0" border="1" cellpadding="2" style="width: 735px;"><tbody>
<tr><td width="723"><div align="center" style="margin-bottom: 0px; margin-top: 0px;"><b><span style="color: maroon; font-family: Arial; font-size: x-small;">At best, we can only be confident in our statistical assertions, but never certain of their accuracy.</span></b></div></td></tr>
</tbody></table></center></div><div style="margin-bottom: 6px; margin-top: 24px;"><a href="" name="SEC2"><span style="font-family: Arial; font-size: x-small;"><strong>Trying to Understand the True State of Affairs</strong></span></a></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> The world just happens to be a certain way, regardless of how we view it. The phrase "true state of affairs" refers to the real nature of any phenomenon of interest. In statistics, the true state of affairs refers to some quantitative property of a population. Numeric properties of populations (such as their means, standard deviations, and sizes) are called <span style="color: maroon;"><strong style="font-weight: 400;">parameters</strong></span>. Samples (or subsets) of populations also have numeric properties, but we call them <span style="color: maroon;"><strong style="font-weight: 400;">statistics</strong></span>. Thus, for the scientist using inferential statistics, population parameters represent the true state of affairs.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> We seldom know the true state of affairs. The process of inferential statistics consists of making use of the data we <i>do</i> have (observed data) to make inferences about population parameters. Unfortunately, the true state of affairs is also dependent on all of the data we <i>don't</i> have (unobserved data). Nevertheless, an important aspect of sample data is that they are actual elements from an underlying population. In this way, sample data are 'representatives' of the population that gave rise to them. 
This implies that sample data can be used to estimate population parameters.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 24px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> However, as sample data are only representatives, they are not expected to be perfect estimators. Consider that we necessarily lose information about a book when we only read a book review. Similarly, we lack information about a population when we only have access to a subset of that population. Remember that the parameters of a population (say, its mean and standard deviation) <i>are based on each and every element in that population</i>. It would be useful to have some measure of how reliable (or representative) our sample data really are. To this end, we must first consider the sampling process itself, and it is in this context that the importance of <span style="color: maroon;"><strong style="font-weight: 400;">probability theory</strong></span> and <span style="color: maroon;"><strong style="font-weight: 400;">random and independent sampling</strong></span> begins to emerge.</span></div><div align="center"><center><table bgcolor="#C0C0C0" border="1" cellpadding="2"><tbody>
<tr><td width="100%"><div align="center" style="margin-bottom: 0px; margin-top: 0px;"><span style="color: maroon; font-family: Arial; font-size: x-small;"><b>In the absence of prior knowledge about the details of some population of interest, sample data serve as our best estimate of that population.</b></span></div></td></tr>
</tbody></table></center></div><div style="margin-bottom: 6px; margin-top: 24px;"><a href="" name="SEC3"><span style="font-family: Arial; font-size: x-small;"><b>True State of Affairs + Chance = Sample Data</b></span></a></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> Some elements (say, 'heights') in a population are more frequent than others. These more frequent elements are thus over-represented in the population compared to less common elements (e.g., the heights of very short and very tall individuals). The laws of chance tell us that it is always possible to randomly select <i>any</i> element in a population, no matter how rare (or under-represented) that element may be in the population. If the element exists, then it can be sampled, plain and simple. However, the laws of probability tell us that rare elements are not expected to be sampled often, given that there are more numerous elements in that same population. It is the more numerous (or more frequent) elements that tend to be sampled each time a random and independent sample is obtained from the population.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> A sample is <span style="color: maroon;">random</span> if all elements in the population are equally eligible to be sampled, meaning that chance, and chance alone, determines which elements are included in the sample. A sample is <span style="color: maroon;">independent</span> if the chances of being sampled are not affected by which elements have already been sampled. To illustrate these two ideas, imagine that you are interested in the average age of all university students in the United States. For convenience's sake, you decide to randomly select one student from each class offered at your university this term. 
With respect to the original population of interest (all university students in the U.S.), your sample is <em>not</em> random, because only students at your university are eligible to be sampled. Your sample is also <em>not</em> independent, because once you select a student from a class, no other student in that class has a chance of being sampled. In this case, any claims you make based on your sample cannot be applied to the population you are really interested in. At best, you are only investigating the population of students at one particular university.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> When the sampling process is truly random and independent, samples are expected to reflect the most representative elements of the underlying population. But rare outcomes do occur (every now and then). A <span style="color: maroon;">rare sample</span> occurs when, just by chance, a relatively large number of the extreme (high <i>or</i> low) elements in the population end up in the sample. In other words, the percentage of extreme values in the sample is higher than the actual percentage in the population, as might be the case if you measured the heights of everyone present in the basketball locker room. Although the heights of basketball players are part of the overall population, they are likely to be over-represented in the sample, in which case the sample mean would not accurately reflect the true state of affairs. Specifically, the sample mean would be biased by the presence of too many heights from "tall" people. 
</span></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> An important consequence of random and independent sampling is that chance factors virtually guarantee that sampled data will vary in their degree of representativeness from sample to sample. Most samples will tend to be good approximations of the underlying population, and a minority of samples will provide misleading accounts of the true state of affairs--just by chance selection. The problem, of course, is that we can never know whether our particular sample is biased by the presence of too many extreme (i.e., rare) elements. But just as you probably don't expect to win the lottery, you should also not expect to be the rare individual who just happens to obtain a rare sample. It is not rational to expect an outcome that has a low probability associated with it. Hence, the logic is to assume that any particular sample mean is typical of the underlying population. This assumption is reasonable <u><em style="font-style: normal;">only</em></u> when the sampling process is random and independent; otherwise, rare samples might artificially occur too often.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> To summarize thus far, the underlying population represents the true state of affairs, which naturally affects the outcome of any particular sample. For instance, if the shortest person in the population is 4' and the tallest person is 8', then it must be the case that the mean of any sample taken from the population will fall within the range of 4 to 8 feet. There are also chance factors operating on the sampling process, which makes it very unlikely that <i>exactly</i> the same elements will be sampled each time. Thus, sample data are expected to vary across repeated sampling. 
This "sampling error" must be taken into account when making inferences about a population from sample data. </span></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><b> </b><span style="color: maroon;">Sampling error</span> refers to discrepancies between the statistics of random samples and the true population values; but this "error" is simply due to which elements in the population end up in the sample. In other words, sampling error refers to <i>natural chance factors</i>, not to errors of measurement or errors due to poorly designed and poorly executed experiments. We have control over the latter, but nature imparts a certain degree of unavoidable error.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> To illustrate the idea of sampling error, imagine that we toss a fair coin six times and obtain {HHHHHH}. We expect a fair coin to land heads 50% of the time, so what went wrong? To answer this question, we have to think about the <i>population</i> of outcomes when a fair coin is tossed six times (see Figure 1).</span></div><div align="center" style="margin-bottom: 0px; margin-top: 6px;"><span style="font-family: Arial; font-size: x-small;"><img height="334" hspace="0" src="http://www.sdecnet.com/psychology/fig1-1.jpg" vspace="6" width="336" /></span></div><div align="center" style="margin-bottom: 12px; margin-top: 0px;"><span style="color: maroon; font-family: Arial; font-size: x-small;"><b>Figure 1.</b> Sampling distribution of heads when a fair coin is tossed six times.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 24px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> It turns out there are <i><b>N</b> </i>= 64 possibilities, but only 20 contain exactly three heads and three tails. 
Nonetheless, three heads (in any order) is the most frequent element in this population; it is also the mean. In contrast, there is only one outcome containing exactly six heads, which makes it a rare (but not impossible) event. In fact, Figure 1 allows us to easily calculate the exact probability of {HHHHHH}; it is 1/64 (or .016). Likewise, the probability of three heads is 20/64 (or .313), meaning that we expect to get three heads about 1/3 of the time we toss a fair coin six times. It was because of random sampling that we failed to observe one of these more representative samples, such as {HTHHTT}, <i>not</i> because the mean of the population isn't really 3. Thus, {HHHHHH} is an example of sampling error. It is "error" in the sense that the true population mean is 3 heads, but the sample (i.e., the six tosses) yielded 6 heads, just by chance. If our sampling (coin tossing) process is fair, then we expect this rare event to occur about once every 64 times, on average.</span></div><div align="center"><center><table bgcolor="#C0C0C0" border="1" cellpadding="2"><tbody>
<tr><td><div align="center" style="margin-bottom: 0px; margin-top: 0px;"><b><span style="color: maroon; font-family: Arial; font-size: x-small;">The laws of chance combined with the true state of affairs create a natural force that is always operating on the sampling process. Consequently, the means of different samples taken from the same population are expected to vary around the 'true' mean just by chance.</span></b></div></td></tr>
</tbody></table></center></div><div style="margin-bottom: 6px; margin-top: 24px;"><a href="" name="SEC4"><span style="font-family: Arial; font-size: x-small;"><b>Sampling Distributions</b></span></a></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> A population is the collection of all possible elements that fit into some category of interest, such as "all adults living in the United States." Once we've defined a population, we need to specify <i>with respect to what?</i> For instance, all adults living in the United States <i>with respect to their height</i>. Now the population of interest has shifted from a collection of people to a collection of numbers (heights, in this case). When the elements in the population have been measured or scored in some way, it is possible to talk about <span style="color: maroon;"><strong style="font-weight: 400;">distributions</strong></span>. We can generate a distribution of anything, as long as the elements can take on values. This is precisely what we did in the coin-tossing example. First we obtained a sample of six tosses, and then we scored the sample with respect to the <i>number of heads</i>. If we had done this for all 64 possible samples and then counted the number of times each value (0 through 6) occurred, we would have ended up with the frequency distribution in Figure 1. We could also have calculated the <i>mean</i> number of heads for each sample, in which case the x-axis would have consisted of seven means ranging from 0 (0/6) to 1 (6/6), with 0.5 (3/6) in the middle. 
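</span></div>
The counts behind Figure 1 can be reproduced by listing every possible sequence of six tosses. The following Python sketch is illustrative (it is not part of the original article):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# All 2^6 = 64 equally likely sequences of six fair-coin tosses.
sequences = list(product("HT", repeat=6))

# Score each sequence by its number of heads, as in Figure 1.
counts = Counter(seq.count("H") for seq in sequences)

assert len(sequences) == 64
assert counts[3] == 20                               # exactly three heads
assert counts[6] == 1                                # the rare {HHHHHH}
assert Fraction(counts[6], 64) == Fraction(1, 64)    # about .016
assert Fraction(counts[3], 64) == Fraction(20, 64)   # about .313
```

The same enumeration, with each count divided by 64, gives the exact probability of every value on the x-axis of Figure 1.
<div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">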
This would show more clearly that the probability of heads is 0.5 (or 50%) in the population, regardless of the number of tosses.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 24px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> When the distribution of interest consists of all the unique samples of size <b><i>n</i></b> that can be drawn from a population, the resulting distribution of sample <em>means</em> is called the <span style="color: maroon;">sampling distribution of the mean</span>. Thus, a "sampling distribution" in general is a distribution of sampling outcomes, like the one depicted in Figure 1. A sampling distribution of the mean is one particular kind of a sampling distribution, one that is based on sample means. There are also sampling distributions of medians, standard deviations, and any other statistic you can think of.</span></div><div align="center"><center><table bgcolor="#C0C0C0" border="1" cellpadding="2"><tbody>
<tr><td width="100%"><div align="center" style="margin-bottom: 0px; margin-top: 0px;"><b><span style="color: maroon; font-family: Arial; font-size: x-small;">Populations, which are distributions of individual elements, give rise to sampling distributions, which describe how <u>collections</u> of elements are distributed in the population.</span></b></div></td></tr>
</tbody></table></center></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 24px;"><span style="font-family: Arial; font-size: x-small;"> It may be helpful to think of populations as <i>having</i> their own sampling distributions, because we are now making a distinction between two distributions: (a) the distribution of individual elements (the population) and (b) the distribution of all unique samples of a particular size <i>from</i> that population (the sampling distribution). [A sample is unique if no other sample in the distribution contains exactly the same elements.] Before reading on, make certain that you are comfortable with the idea that a <i><strong style="font-weight: 400;">sample</strong></i> of elements can represent a single, unique element in a distribution consisting of many other unique samples (see Table 1).</span></div><div align="center" style="margin-bottom: 3px; margin-top: 12px;"><span style="color: maroon; font-family: Arial; font-size: x-small;"><b>Table 1.</b> Basic Properties of Populations, Samples, and Sampling Distributions</span></div><div align="center"><center><table bgcolor="#FFFFFF" border="1" bordercolor="#000000" bordercolordark="#000000" bordercolorlight="#000000" cellpadding="3" cellspacing="3" style="border-collapse: collapse; width: 550px;"><tbody>
<tr><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="150"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><b>Level</b></span></div></td><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="200"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><b>Collection</b></span></div></td><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="200"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><b>Elements</b></span></div></td></tr>
<tr><td bgcolor="#FFFFFF" bordercolor="#000000" width="150"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">Population</span></div></td><td bgcolor="#FFFFFF" bordercolor="#000000" width="200"><br />
<dl><dt><span style="font-family: Arial; font-size: x-small;">All individuals<br />
(<b><i>N</i></b> = size of population)</span></dt>
</dl></td><td bgcolor="#FFFFFF" bordercolor="#000000" width="200"><br />
<dl><dt><span style="font-family: Arial; font-size: x-small;">The scores each individual receives on some attribute.</span></dt>
</dl></td></tr>
<tr><td bgcolor="#FFFFFF" bordercolor="#000000" width="150"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">Sample</span></div></td><td bgcolor="#FFFFFF" bordercolor="#000000" width="200"><br />
<dl><dt><span style="font-family: Arial; font-size: x-small;">Subset of individuals from the population.<br />
(<b><i>n</i></b> = size of sample)</span></dt>
</dl></td><td bgcolor="#FFFFFF" bordercolor="#000000" width="200"><br />
<dl><dt><span style="font-family: Arial; font-size: x-small;">The scores each individual in the sample receives on some attribute.</span></dt>
</dl></td></tr>
<tr><td bgcolor="#FFFFFF" bordercolor="#000000" width="150"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">Sampling Distribution</span></div></td><td bgcolor="#FFFFFF" bordercolor="#000000" width="200"><br />
<dl><dt><span style="font-family: Arial; font-size: x-small;">All unique samples of size <b><i>n</i></b> from the population.</span></dt>
</dl></td><td bgcolor="#FFFFFF" bordercolor="#000000" width="200"><br />
<dl><dt><span style="font-family: Arial; font-size: x-small;">The values of a statistic applied to each sample.</span></dt>
</dl></td></tr>
</tbody></table></center></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 12px;"><span style="font-family: Arial; font-size: x-small;"> Why are sampling distributions important in inferential statistics? The answer is simple: because we obtain <i>samples</i> of data when we conduct studies. If we are going to make inferences about populations based on sample data, then we need to understand the sampling properties of those samples. In inferential statistics we make use of two important properties of sampling distributions, better known as the <span style="color: maroon;">central li</span><strong style="font-weight: 400;"><span style="color: maroon;">mit theorem</span>:</strong></span></div><ol><li><div align="justify" style="margin-bottom: 6px; margin-left: 18px; margin-right: 24px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">The mean of all unique samples of size <b><i>n</i></b> (i.e., the average of all the means) is identical to the mean of the population from which those samples are drawn. This is equivalent to saying that the mean of the sampling distribution equals the mean of the original population. Thus, any claims about the mean of the sampling distribution apply to the population mean.</span></div></li>
<li><div align="justify" style="margin-bottom: 6px; margin-left: 18px; margin-right: 24px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">The shape of the sampling distribution increasingly approximates a normal curve as sample size (<b><i>n</i></b>) is increased, even if the original population is not normally distributed. [<strong>Note</strong>--If the original population is itself normally distributed, then the sampling distribution will be normally distributed even when the sample size is only one. <i>Why?</i>] </span></div></li>
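</ol>
Property 1 can be confirmed directly on a small example. The Python sketch below is illustrative (it is not part of the original article); it enumerates every unique sample of size <b><i>n</i></b> = 3 from a five-element population, the same one used in Table 2.1:

```python
from itertools import combinations
from fractions import Fraction

pop = [2, 5, 7, 3, 2]              # a small hypothetical population, N = 5
mu = Fraction(sum(pop), len(pop))  # population mean = 19/5 = 3.8

# All unique samples of size n = 3 (elements drawn without replacement;
# the two 2's are distinct individuals, so some samples repeat as multisets,
# exactly as in Table 2.1).
samples = list(combinations(pop, 3))
sample_means = [Fraction(sum(s), 3) for s in samples]

# Property 1: the mean of all the sample means equals the population mean.
grand_mean = sum(sample_means) / len(sample_means)
assert grand_mean == mu
```

The result is not an accident of this population: each element appears in the same number of samples, so averaging all the sample means just re-averages the population.
<ol>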
</ol><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">Confused? Perhaps if you <i>see</i> these properties you'll understand just how simple they really are. First let's create a small, hypothetical population of numbers:</span></div><div align="center" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><i>Pop</i> = {2, 5, 7, 3, 2}</span></div><div style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">The distribution for our hypothetical population looks like this:</span></div><div align="center" style="margin-bottom: 18px; margin-top: 6px;"><span style="font-family: Arial; font-size: x-small;"><img border="1" height="246" src="http://www.sdecnet.com/psychology/fig1-2.jpg" vspace="6" width="299" /></span></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">In this case <b><i>N</i></b> = 5 (because there are five elements in the population), and <b>µ</b> = 3.8 (the mean of the population). Property #1 says that if we gather all the unique samples of a particular size, and then calculate means for each sample, the average of those means will equal the population mean. We'll do this twice, once using <b><i>n</i></b> = 3, and again using <b><i>n</i></b> = 4. Table 2.1 lists all of the unique samples (and their means) that are possible when three elements are sampled at a time.</span></div><div align="center" style="margin-bottom: 3px; margin-top: 12px;"><span style="color: maroon; font-family: Arial; font-size: x-small;"><b>Table 2.1. 
</b>All unique samples from the hypothetical population when <b><i>n</i></b> = 3.</span></div><div align="center"><center><table bgcolor="#000000" border="0" bordercolor="#111111" bordercolordark="#000000" bordercolorlight="#000000" cellspacing="1" style="border-collapse: collapse; width: 200px;"><tbody>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><b>Sample</b></span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><b>Sample Mean</b></span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 2 3}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">2.33</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 2 5}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">3.00</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 3 5}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">3.33</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 3 5}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">3.33</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 2 7}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">3.67</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 3 7}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">4.00</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 3 7}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">4.00</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 5 7}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">4.67</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 5 7}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">4.67</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{3 5 7}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">5.00</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF" nowrap="" width="100"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><b>Grand Mean</b></span></div></td><td align="center" bgcolor="#FFFFFF" bordercolordark="#FFFFFF" bordercolorlight="#FFFFFF"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><b>3.80</b></span></div></td></tr>
</tbody></table></center></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 12px;"><span style="font-family: Arial; font-size: x-small;"> At first glance it may appear that the samples in Table 2.1 are not unique because, for example, {2 3 5} has been listed twice. However, remember that there are two 2s in the population; they are different elements that simply share the same value. Thus, Table 2.1 indicates there are 10 unique samples in the sampling distribution when <b><i>n</i></b> = 3. Notice also that the mean of the 10 sample means is 3.8. This is the same value we obtained when we calculated the mean of the five elements in the population (<b>µ</b>). Now consider Table 2.2, which lists all of the unique samples that are possible when sample size is increased to four. The first thing to notice is that the range of sampling outcomes is smaller (3.00 to 4.25 instead of 2.33 to 5.00); there is less variability. Nonetheless, the mean of the sample means is still 3.8.</span></div><div align="center" style="margin-bottom: 3px; margin-top: 12px;"><span style="color: maroon; font-family: Arial; font-size: x-small;"><b>Table 2.2.</b> All unique samples from the hypothetical population when <b><i>n</i></b> = 4. </span></div><div align="center"><center><table bgcolor="#FFFFFF" border="1" bordercolor="#000000" cellpadding="0" cellspacing="0" style="border-collapse: collapse; width: 200px;"><tbody>
<tr><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><strong>Sample</strong></span></div></td><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><strong>Sample Mean</strong></span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 2 3 5}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">3.00</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 2 3 7}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">3.50</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 2 5 7}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">4.00</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 3 5 7}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">4.25</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">{2 3 5 7}</span></div></td><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">4.25</span></div></td></tr>
<tr><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><b>Grand Mean</b></span></div></td><td align="center" bgcolor="#FFFFFF" bordercolor="#000000" valign="top" width="100%"><div style="margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"><b>3.80</b></span></div></td></tr>
</tbody></table></center></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 12px;"><span style="font-family: Arial; font-size: x-small;"> The central limit theorem also states that the sampling distribution will approximate a normal distribution if sample size is sufficiently large, even if the underlying population is not normally distributed. The hypothetical population in our example is clearly not normally distributed. For one thing, the distribution is not symmetrical around its mean, which is the most salient feature of normal distributions. But compare the shape of the population with the shape of the sampling distribution corresponding to Table 2.1: </span></div><div align="center" style="margin-bottom: 18px; margin-top: 6px;"><span style="font-family: Arial; font-size: x-small;"><img height="251" src="http://www.sdecnet.com/psychology/fig1-3.jpg" vspace="6" width="338" /></span></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">These 10 sample means are far from being normally distributed, but we can see hints of a bell curve: The distribution is peaked near the center and shorter at the tails. This distribution is also more symmetrical around its mean than the underlying population. If our hypothetical population were somewhat larger (so that more samples could be generated), the sampling distribution would be more normal. Nonetheless, we can still see the effects of the central limit theorem even in this oversimplified example. 
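Property #1 and the shrinking range in Tables 2.1 and 2.2 can be checked directly. Here is a minimal Python sketch (standard library only) that enumerates every unique sample from our hypothetical population, treating the two 2s as distinct elements just as the tables do:

```python
from itertools import combinations
from statistics import mean

pop = [2, 5, 7, 3, 2]  # N = 5, mu = 3.8; the two 2s are distinct elements

for n in (3, 4):
    # every unique sample of size n, drawn without replacement
    sample_means = [mean(s) for s in combinations(pop, n)]
    print(n, len(sample_means), round(mean(sample_means), 2),
          round(min(sample_means), 2), round(max(sample_means), 2))
```

Both grand means come out to 3.8, the population mean, and the range of sample means narrows from 2.33 to 5.00 at <i>n</i> = 3 down to 3.00 to 4.25 at <i>n</i> = 4, matching the tables.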
Most real-world populations are very large, and so their sampling distributions contain millions of sample combinations and therefore many possible values of a statistic.</span></div><div style="margin-bottom: 6px; margin-top: 24px;"><a href="" name="SEC5"><span style="font-family: Arial; font-size: x-small;"><b>The Standard Error of the Mean: A Measure of Sampling Error</b></span></a></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> Sampling distributions have a standard deviation, which describes the variability of sample means from <i>their</i> mean (which, remember, equals the population mean). There is a different sampling distribution for each value of <b><i>n</i></b>, for two reasons. First, as illustrated above, the <i>number</i> of unique samples that can be drawn from a population depends on the size of those samples. In other words, sample size determines how many elements (sample means) are in the sampling distribution to begin with. Second, as sample size increases, the <i>variability</i> among all possible sample means decreases. This must be the case, because if all the elements in the original population are sampled (i.e., if <b><i>n</i></b> = <b><i>N</i></b>), then there is only one possible sample that can be obtained (the sample <i>is</i> the population) and the variability of a single number is zero. Thus, sample size determines both the size and the variability of a sampling distribution (compare Tables 2.1 and 2.2).</span></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> The standard deviation of a sampling distribution of means is given a special name: <span style="color: maroon;">standard error of the mean</span> (abbreviated as SEM). 
It may not be obvious, but the SEM is a measure of sampling error because it describes the variability among all possible means that could be sampled in an experiment. [<em>Recall that the elements of interest are now sample <u>means</u>, not the individual scores within a sample or population</em>.] Simply put, the degree of variability in the sampling distribution bears directly on the degree to which observed results (sample means) are expected to vary just by chance. If there is a lot of variability in the sampling distribution (as is the case when the distribution consists of <em>small samples</em>), then sample means can vary greatly. On the other hand, if there is little variability in the sampling distribution (as is the case when the distribution consists of <em>large samples</em>), then sample means will tend to be very similar, and very close to the true population mean.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 24px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> At this point we can begin to address the question raised earlier, namely <i>How can we know whether our sample is representative of the underlying population?</i> Obviously it is important to avoid small samples, because their sampling distributions contain more extreme (i.e., rare) sample means, and we are more likely to obtain one of them in an experiment. Thus, we can increase our confidence in a particular sample (as being representative of the population) by increasing the number of elements included in the sample. The means of large samples tend to cluster tightly around the true population mean. Consequently, rare samples (whose means are very different from the true population mean) are less common in the sampling distribution and therefore less likely to arise just by chance. Notice that by choosing a sample size we are also determining which sampling distribution our sample will come from. 
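Because our toy population is small enough to enumerate completely, the SEM for each sample size can be computed exactly: it is simply the standard deviation of all possible sample means. A short Python sketch:

```python
from itertools import combinations
from statistics import mean, pstdev

pop = [2, 5, 7, 3, 2]  # toy population from the tables above, mu = 3.8

# The SEM for a given n is the standard deviation of *all*
# possible sample means of that size (the sampling distribution).
for n in (3, 4):
    means = [mean(s) for s in combinations(pop, n)]
    print(n, round(pstdev(means), 3))
```

The SEM drops from about 0.79 at <i>n</i> = 3 to about 0.48 at <i>n</i> = 4: the same contrast between Tables 2.1 and 2.2, summarized as a single measure of sampling error.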
Ideally, we always want to sample from the distribution with the least variability, because less variability translates into <i>more reliability!</i></span></div><div align="center"><center><table bgcolor="#C0C0C0" border="1" cellpadding="2"><tbody>
<tr><td width="100%"><div align="center" style="margin-bottom: 0px; margin-top: 0px;"><b><span style="color: maroon; font-family: Arial; font-size: x-small;">We have some control over sampling error because sample size determines the standard error (variability) in a sampling distribution.</span></b></div></td></tr>
</tbody></table></center></div><div style="margin-bottom: 6px; margin-top: 24px;"><a href="" name="SEC6"><span style="font-family: Arial; font-size: x-small;"><b>Theoretical Sampling Distributions as Statistical Models of the True State of Affairs</b></span></a></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> Unless the details of a population are known in advance, it is not possible to describe any of its sampling distributions. We would have to first measure all the elements in the population, in which case we could simply calculate the desired parameter, and then there would be no point in collecting samples. For this reason, a variety of idealized, theoretical sampling distributions have been described mathematically. The <span style="color: maroon;"><strong style="font-weight: 400;">Student-t distribution</strong></span>, for instance, is a standardized version of a theoretical sampling distribution, meaning that it can be used as a statistical <em>model</em> for many of the real sampling distributions of interest to behavioral scientists. The reason for using theoretical sampling distributions is to obtain the likelihood (or probability) of sampling a particular mean if the mean of the sampling distribution (and hence the mean of the original population) is some particular value. In practice, the population parameter must first be hypothesized, as the true state of affairs is generally unknown. This is called the <span style="color: maroon;">null hypothesis</span>.</span></div><div align="justify" style="line-height: 24px; margin-bottom: 24px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> In the coin-tossing example we were able to deduce the sampling distribution shown in Figure 1. It too is theoretical because we constructed it without tossing a single coin! 
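That theoretical distribution can be rebuilt by counting outcomes. Below is a minimal sketch; the assumption that the coin-tossing example used 6 tosses of a fair coin comes from the mention of "tossing 6 heads" in the next section, since Figure 1 itself is not reproduced here:

```python
from math import comb

# Assumed setup: 6 tosses of a fair coin. Each probability is
# (number of orderings with k heads) / (total orderings).
n = 6
for k in range(n + 1):
    p = comb(n, k) / 2**n
    print(f"P({k} heads) = {p:.4f}")
```

P(6 heads) comes out to 1/64, about 0.016 — the kind of "rare outcome" probability that the decision rule below compares against .05.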
This underscores an important point, namely that many of the populations and sampling distributions addressed in statistics are abstract; they exist in a mathematical sense. </span></div><div align="center"><center><table bgcolor="#C0C0C0" border="1" cellpadding="2"><tbody>
<tr><td width="100%"><div align="center" style="margin-bottom: 0px; margin-top: 0px;"><b><span style="color: maroon; font-family: Arial; font-size: x-small;">Theoretical sampling distributions have been generated so that researchers can estimate the probability of obtaining various sample means from a pre-specified population (real or hypothetical).</span></b></div></td></tr>
</tbody></table></center></div><div style="margin-bottom: 6px; margin-top: 24px;"><a href="" name="SEC7"><span style="font-family: Arial; font-size: x-small;"><b>Making Formal Inferences about Populations: Preview to Hypothesis Testing</b></span></a></div><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> When there are many elements in the sampling distribution, it is always possible to obtain a rare sample (i.e., one whose mean is very different from the true population mean). The probability of such an outcome occurring just by chance is determined by the particular sampling distribution specified in the null hypothesis (in much the same way that Figure 1 provided us with the probability of tossing 6 heads). When the probability (<i>P</i>) of the observed sample mean occurring by chance is really low (typically less than one in 20, e.g., <i>P</i> < .05), the researcher has an important decision to make regarding the hypothesized true state of affairs. One of two inferences can be made: </span></div><ol><li><div align="justify" style="margin-bottom: 6px; margin-left: 18px; margin-right: 24px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">The hypothesized value of the population mean is correct and a rare outcome has occurred just by chance (as in the coin-tossing example).</span></div></li>
<li><div align="justify" style="margin-bottom: 6px; margin-left: 18px; margin-right: 24px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">The true population mean is probably some other value that is more consistent with the observed data. Reject the null hypothesis in favor of some alternative hypothesis. </span></div></li>
</ol><div align="justify" style="line-height: 24px; margin-bottom: 6px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;">The rational decision is to assume #2, because the observed data (which represent direct, albeit partial, evidence of the true state of affairs) are just too unlikely if the hypothesized population is true. Thus, rather than accept the possibility that a rare event has taken place, the statistician chooses the more likely possibility: that the hypothesized sampling distribution is <u>wrong</u>. However, rare samples do occur, which is why statistical inference is always subject to error. Indeed, even when observed data are consistent with a hypothesized population, they are also consistent with many other hypothesized populations. It is for this reason that the hypothesized value of a population parameter can never be proved or disproved from sample data. We use inferential statistics to make tentative assertions about population parameters that are most consistent with the observed data. Actually, inferential statistics only helps us to <i>rule out</i> values; it doesn't tell us what the population parameters are. We have to infer the values, based on what they are likely not to be.</span></div><div align="left" style="line-height: 24px; margin-bottom: 0px; margin-top: 0px;"><span style="font-family: Arial; font-size: x-small;"> Only in the natural sciences does evidence contrary to a hypothesis lead to rejection of that hypothesis without error. In statistical reasoning there is also rejection (inference #2), but with the possibility that a rare sample has occurred by chance (sampling error). 
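The decision logic can be made concrete with a small Monte Carlo sketch. The scenario is hypothetical: suppose the null hypothesis fixes the population mean at 3.8 (our toy population from the tables), we observe a sample mean of 5.0 with <i>n</i> = 3, and we ask how often chance alone produces a mean at least that far from 3.8:

```python
import random

random.seed(1)  # reproducible sketch

# Hypothetical illustration: null hypothesis says mu = 3.8.
# How often does a random sample of n = 3 give a mean at least
# as far from 3.8 as the observed mean of 5.0?
pop = [2, 5, 7, 3, 2]
observed_dev = abs(5.0 - 3.8)
trials = 100_000
extreme = sum(
    abs(sum(random.sample(pop, 3)) / 3 - 3.8) >= observed_dev
    for _ in range(trials)
)
print(extreme / trials)  # estimated two-tailed P, about 0.2 here
```

The estimated probability is about 0.2, well above .05, so this sample would not be rare enough to reject the null hypothesis; with samples this small, deviations that large arise by chance quite often.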
This is the nature of making inferences based on random sampling.</span></div><div align="left" style="margin-bottom: 0px; margin-left: 12px; margin-top: 21px;"><span style="color: maroon;">The proper APA citation for this article is:</span></div>