Sample means

The sample mean (sample average) or empirical mean (empirical average), and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables.

The sample mean is the average value (or mean value) of a sample of numbers taken from a larger population of numbers, where "population" indicates not number of people but the entirety of relevant data, whether collected or not. A sample of 40 companies' sales from the Fortune 500 might be used for convenience instead of looking at the population, all 500 companies' sales. The sample mean is used as an estimator for the population mean, the average value in the entire population, where the estimate is more likely to be close to the population mean if the sample is large and representative. The reliability of the sample mean is estimated using the standard error, which in turn is calculated using the variance of the sample. If the sample is random, the standard error falls with the size of the sample and the sample mean's distribution approaches the normal distribution as the sample size increases.

The term "sample mean" can also be used to refer to a vector of average values when the statistician is looking at the values of several variables in the sample, e.g. the sales, profits, and employees of a sample of Fortune 500 companies. In this case, there is not just a sample variance for each variable but a sample variance-covariance matrix (or simply covariance matrix) showing also the relationship between each pair of variables. This would be a 3×3 matrix when 3 variables are being considered. The sample covariance is useful in judging the reliability of the sample means as estimators and is also useful as an estimate of the population covariance matrix.

Due to their ease of calculation and other desirable characteristics, the sample mean and sample covariance are widely used in statistics to represent the location and dispersion of the distribution of values in the sample, and to estimate the values for the population.

Definition of the sample mean

The sample mean is the average of the values of a variable in a sample, which is the sum of those values divided by the number of values. Using mathematical notation, if a sample of N observations on variable X is taken from the population, the sample mean is:

{\bar {X}}={\frac {1}{N}}\sum _{i=1}^{N}X_{i}.

Under this definition, if the sample (1, 4, 1) is taken from the population (1,1,3,4,0,2,1,0), then the sample mean is ${\bar {x}}=(1+4+1)/3=2$ , as compared to the population mean of $\mu =(1+1+3+4+0+2+1+0)/8=12/8=1.5$ . Even if a sample is random, it is rarely perfectly representative, and other samples would have other sample means even if the samples were all from the same population. The sample (2, 1, 0), for example, would have a sample mean of 1.

If the statistician is interested in K variables rather than one, each observation having a value for each of those K variables, the overall sample mean consists of K sample means for individual variables. Let $x_{ij}$ be the i^th independently drawn observation (i=1,...,N) on the j^th random variable (j=1,...,K). These observations can be arranged into N column vectors, each with K entries, with the K×1 column vector giving the i-th observations of all variables being denoted $\mathbf {x} _{i}$ (i=1,...,N).

The sample mean vector $\mathbf {\bar {x}}$ is a column vector whose j-th element ${\bar {x}}_{j}$ is the average value of the N observations of the j^th variable:

{\bar {x}}_{j}={\frac {1}{N}}\sum _{i=1}^{N}x_{ij},\quad j=1,\ldots ,K.

Thus, the sample mean vector contains the average of the observations for each variable, and is written

\mathbf {\bar {x}} ={\frac {1}{N}}\sum _{i=1}^{N}\mathbf {x} _{i}={\begin{bmatrix}{\bar {x}}_{1}\\\vdots \\{\bar {x}}_{j}\\\vdots \\{\bar {x}}_{K}\end{bmatrix}}

Definition of sample covariance

The sample covariance matrix is a K-by-K matrix $\textstyle \mathbf {Q} =\left$ with entries

q_{jk}={\frac {1}{N-1}}\sum _{i=1}^{N}\left(x_{ij}-{\bar {x}}_{j}\right)\left(x_{ik}-{\bar {x}}_{k}\right),

where $q_{jk}$ is an estimate of the covariance between the $j$ ^th variable and the $k$ ^th variable of the population underlying the data. In terms of the observation vectors, the sample covariance is

\mathbf {Q} ={1 \over {N-1}}\sum _{i=1}^{N}(\mathbf {x} _{i}.-\mathbf {\bar {x}} )(\mathbf {x} _{i}.-\mathbf {\bar {x}} )^{\mathrm {T} },

Alternatively, arranging the observation vectors as the columns of a matrix, so that

\mathbf {F} ={\begin{bmatrix}\mathbf {x} _{1}&\mathbf {x} _{2}&\dots &\mathbf {x} _{N}\end{bmatrix}}

,

which is a matrix of K rows and N columns. Here, the sample covariance matrix can be computed as

\mathbf {Q} ={\frac {1}{N-1}}(\mathbf {F} -\mathbf {\bar {x}} \,\mathbf {1} _{N}^{\mathrm {T} })(\mathbf {F} -\mathbf {\bar {x}} \,\mathbf {1} _{N}^{\mathrm {T} })^{\mathrm {T} }

,

where $\mathbf {1} _{N}$ is an N by $1$ vector of ones. If the observations are arranged as rows instead of columns, so $\mathbf {\bar {x}}$ is now a 1×K row vector and $\mathbf {M} =\mathbf {F} ^{\mathrm {T} }$

Navigácia: Veda >

Analytika
Antropológia
Aplikované vedy
Bibliometria
Dejiny vedy
Encyklopédie
Filozofia vedy
Forenzné vedy
Humanitné vedy
Knižničná veda
Kryogenika
Kryptológia
Kulturológia
Literárna veda
Medzidisciplinárne oblasti
Metódy kvantitatívnej analýzy
Metavedy
Metodika

Metodológia vedy
Náboženstvo a veda
Náučná literatúra
Podvody vo vede
Popularizácia vedy
Potravinárstvo
Prírodné vedy
Pseudoveda
Scientometria
Spoločenské vedy
Teórie
Teatrológia
Technické vedy
Technika
Terminológia
Umenie
Výskum

Veda
Veda a technika podľa štátu
Veda a technika podľa kontinentu
Veda a technika podľa roka
Veda v kozme
Vedci
Vedecká literatúra
Vedecké databázy
Vedecké experimenty
Vedecké konferencie
Vedecké metódy
Vedecké ocenenia
Vedecké organizácie
Vedecké parky
Vedeckí spisovatelia
Vzdelávanie
Záhady

Príbuzné výrazy:

Text je dostupný za podmienok Creative Commons Attribution/Share-Alike License 3.0 Unported; prípadne za ďalších podmienok.
Podrobnejšie informácie nájdete na stránke Podmienky použitia.