- statistics is the study of how to collect, organize, analyze, and interpret numerical information from data
- nominal - can't be ordered; ordinal - can be ordered but the differences may not be meaningful
- population parameter vs. sample statistic
- sampling frame, sampling error (the difference between a sample statistic, such as the sample mean, and the corresponding population parameter)
- random sampling, stratified sampling (divide the population into strata, then take a simple random sample from each stratum), systematic sampling (order the population, then pick every kth individual), cluster sampling (divide the population into clusters, often geographical areas, pick some clusters at random, then include everyone in those clusters), convenience sampling, and multi-stage sampling
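The sampling methods above can be sketched in a few lines. This is only an illustration: the population of IDs 1 to 100, the two strata, the ten "areas", and all function names are made up for the example.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Pick n individuals; every individual is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, seed=0):
    """Random start among the first k individuals, then every kth after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, seed=0):
    """Take a simple random sample from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random and keep everyone inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

# Hypothetical population: IDs 1..100
population = list(range(1, 101))
strata = {"low": population[:50], "high": population[50:]}
clusters = {f"area{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, 10)
stratified = stratified_sample(strata, 5)
clustered = cluster_sample(clusters, 2)
```

Note the difference between the last two: stratified sampling takes a few individuals from every group, while cluster sampling takes every individual from a few groups.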
- steps of sampling
- state a hypothesis
- identify the individuals of interest
- specify the variables to measure
- decide whether to use the entire population or a sample, and choose the sampling method
- address ethical concerns before data collection, e.g. privacy
- collect the data
- use descriptive or inferential statistics to test your hypothesis
- note any concerns about the data collection and analysis, and make recommendations for future studies
- randomization
- is used to assign individuals to treatment groups; helps prevent bias in selecting the members of each group
- the placebo effect: a subject who receives no real treatment responds anyway, a psychological effect of believing they are being treated
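Random assignment to treatment groups can be sketched as a shuffle followed by dealing subjects out in turn. The subject labels, the group names, and the `randomize` helper are all hypothetical.

```python
import random

def randomize(individuals, groups=("treatment", "placebo"), seed=0):
    """Shuffle the individuals, then deal them into the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment

# Hypothetical subjects
subjects = [f"subject{i}" for i in range(1, 11)]
assignment = randomize(subjects)
```

Because the shuffle decides who lands in which group, the experimenter's preferences cannot influence the group membership.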
- frequency histogram and stem-and-leaf display both show the distribution
- a histogram reveals the distribution of the data; a relative frequency (%) histogram lets you compare samples of different sizes on the same chart
- normal, left skewed (the skew is named for the tail, the light side with few values, so here the left is light), right skewed, uniform, and bimodal distributions (bimodal means two high points, like a camel with two humps)
- outlier: a value very different from the other measurements
- cumulative frequency, ogive
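A small sketch of how frequencies, relative frequencies, and cumulative frequencies (the values plotted on an ogive) relate. The data and the three classes of width 10 are invented for illustration.

```python
from itertools import accumulate

# Hypothetical measurements
data = [12, 15, 17, 21, 22, 22, 25, 28, 31, 34, 35, 39]

# Assumed classes of equal width: 10-19, 20-29, 30-39
low, width, n_classes = 10, 10, 3

# Frequency: how many values fall in each class
freq = [0] * n_classes
for x in data:
    freq[(x - low) // width] += 1

# Relative frequency: share of all values in each class
rel_freq = [f / len(data) for f in freq]

# Cumulative frequency: running total, one point per class on the ogive
cum_freq = list(accumulate(freq))
```

The last cumulative frequency always equals the number of observations, and the relative frequencies always sum to 1.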
- time series
- same variable over time
- bar charts
- can be vertical or horizontal
- Pareto charts: bar charts of event frequencies, with the bars in decreasing order of frequency
- pie charts
- circle graphs
- mutually exclusive categories: each observation falls into only one category
- frequency class
- all classes have the same width; the width can be chosen empirically
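- median and mean (and the 5% trimmed mean, which drops the lowest 5% and highest 5% of the values before averaging)
- central tendency: the relative positions of the mean, mode, and median can tell a normal distribution from a left-skewed or right-skewed one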
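To make the measures of central tendency concrete, here is a sketch on invented data with one large value in the right tail. The `trimmed_mean` helper is an assumption for this example, not a standard-library function.

```python
import statistics

def trimmed_mean(values, proportion=0.05):
    """Drop the given proportion of values from each end, then average."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    return statistics.mean(ordered[k:len(ordered) - k])

# Hypothetical data with a long right tail (the value 60)
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9, 10, 11, 12, 14, 18, 60]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
trimmed = trimmed_mean(data)  # drops the 2 and the 60

# The mean is pulled toward the long tail, so mean > median suggests right skew
right_skewed = mean > median
```

Here the mean is 9.9 while the median is 6.5, and trimming the extremes moves the average back toward the median.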
- variation: how far the data spread from the central tendency; use it to evaluate your dataset
- the range is the difference between the top (max) and bottom (min) values
- CV, coefficient of variation
- (sample standard deviation / sample mean) * 100%
- useful for comparing two different ways of measuring a lab value
- measures the spread of the data relative to the average of the data
- the higher the CV, the less stable the measurement; the values move around a lot
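As an illustration, the CV can be used to compare two measurement methods that report the same average. Both sets of readings are invented.

```python
import statistics

def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

# Hypothetical repeated readings of the same lab value by two methods
method_a = [98, 100, 102, 99, 101]
method_b = [90, 110, 95, 105, 100]

cv_a = coefficient_of_variation(method_a)
cv_b = coefficient_of_variation(method_b)
```

Both methods average 100, but method B has the larger CV (roughly 7.9% against 1.6%), so its readings are the less stable of the two.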
- Chebyshev's theorem and intervals
- within 2 standard deviations (2s) of the average: at least 75% of the x values
- within 3s: at least 88.9%
- within 4s: at least 93.8%
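The percentages above come from Chebyshev's bound, 1 - 1/k², which holds for any distribution when k > 1. A minimal sketch follows; the function names and the example mean and standard deviation are chosen only for illustration.

```python
def chebyshev_bound(k):
    """Minimum share of the data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def chebyshev_interval(mean, s, k):
    """Interval from mean - k*s to mean + k*s."""
    return (mean - k * s, mean + k * s)

bounds = {k: chebyshev_bound(k) for k in (2, 3, 4)}

# Hypothetical sample with mean 50 and standard deviation 5
interval_2s = chebyshev_interval(50, 5, 2)
```

For the hypothetical sample, at least 75% of the values must lie between 40 and 60, whatever the shape of the distribution.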
- percentiles
- box-and-whisker plots
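A box-and-whisker plot is drawn from the five-number summary: minimum, first quartile, median, third quartile, maximum. The sketch below uses invented data and Python's `statistics.quantiles` with the `inclusive` method; textbooks compute quartiles in slightly different ways, so hand-calculated values may differ.

```python
import statistics

# Hypothetical data
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Quartiles are the 25th, 50th, and 75th percentiles
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, median, q3, max(data))

# The box spans Q1 to Q3; its length is the interquartile range
iqr = q3 - q1
```

The whiskers run from the box out to the minimum and maximum, and the line inside the box marks the median.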
- r, the correlation coefficient
- |r| below 0.4: weak correlation
- |r| from 0.4 to 0.7: moderate correlation
- |r| above 0.7: strong correlation
- coefficient of determination is r squared
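A sketch of computing r and r squared by hand on invented paired data. The `describe` helper applies the cut-offs from these notes; how to treat values exactly at 0.4 or 0.7 is an assumption, since the notes do not say.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r):
    """Label the strength of r; boundary handling is an assumption."""
    size = abs(r)
    if size < 0.4:
        return "weak"
    if size <= 0.7:
        return "moderate"
    return "strong"

# Hypothetical paired measurements
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2  # coefficient of determination
label = describe(r)
```

Here r is about 0.77, a strong correlation, and r squared is 0.6, meaning about 60% of the variation in y is explained by its linear relationship with x.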
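- for a normal distribution (the empirical rule): about 68%, 95%, and 99.7% of the data fall within 1, 2, and 3 standard deviations of the mean
- z score: the number of standard deviations a value lies from the mean, z = (x - mean) / standard deviation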
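The z score and the 68%, 95%, 99.7% figures can be checked with `statistics.NormalDist` from the standard library. The example value, mean, and standard deviation are made up.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Hypothetical value 130 from a distribution with mean 100 and sd 15
z = z_score(130, mean=100, sd=15)

shares = {k: round(share_within(k), 3) for k in (1, 2, 3)}
```

A z score of 2 places the value two standard deviations above the mean, so about 95% of a normal distribution lies closer to the mean than it does.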
Probability
- P(A or B) = P(A) + P(B) - P(A and B)
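The addition rule can be verified by counting outcomes for one roll of a fair die. The events A (even number) and B (greater than 3) are chosen just for this example.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}          # one roll of a fair die
A = {x for x in outcomes if x % 2 == 0}  # even number
B = {x for x in outcomes if x > 3}       # greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

direct = prob(A | B)                         # count outcomes in A or B
by_rule = prob(A) + prob(B) - prob(A & B)    # addition rule
```

Subtracting P(A and B) corrects for the outcomes 4 and 6, which would otherwise be counted twice, once in A and once in B.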