Saturday, June 22, 2019


54. Saturday afternoon statistics


  • statistics is the study of how to collect, organize, analyze and interpret numerical information and data
  • nominal - can't be ordered; ordinal - can be ordered but the differences may not be meaningful
  • population parameter vs sample statistic
  • sampling frame, sampling error (the difference between a sample statistic, such as the sample mean, and the corresponding population parameter)
  • random sampling, stratified sampling (divide the population into strata, then take a simple random sample from each stratum), systematic sampling (order the population, then pick every kth individual), cluster sampling (divide the population into clusters, often geographical areas, randomly pick clusters, then include everyone in those clusters), convenience sampling and multi-stage sampling
  • steps of sampling
    • state a hypothesis
    • identify the individuals of interest
    • specify the variables to measure
    • decide whether to use the entire population or a sample, and choose the sampling method
    • address ethical concerns before data collection, e.g. privacy
    • collect the data
    • use descriptive or inferential statistics to test your hypothesis
    • note concerns of the data collection, analysis, and future recommendation
  • randomization
    • is used to assign individuals to treatment groups, helps prevent bias in selecting members of each group
    • the placebo effect: a subject who receives no real treatment still responds as if treated, a psychological effect
  • frequency histogram, distribution (stem and leaf)
    • a histogram reveals the distribution of the data; a relative frequency (%) histogram lets samples of different sizes be compared on the same chart
    • normal, left skewed (named for the side with the long, light tail, so here the left side is light), right skewed, uniform, bimodal distribution (meaning two high points, like a camel with two humps)
    • outlier, very different from the other measurements
    • cumulative frequency, ogive
  • time series
    • same variable over time
  • bar charts
    • can be vertical or horizontal
    • Pareto charts: bars show the frequency of events, in decreasing order
  • pie charts
    • circle graphs
    • categories are mutually exclusive; each observation falls into only one category
  • frequency class
    • all classes have the same width; the width can be chosen empirically
  • median and mean (5% trimmed mean)
  • central tendency: the relative positions of the mean, median and mode can distinguish a normal distribution from a left-skewed or right-skewed one
  • variation from the central tendency, evaluate your dataset
    • the range is the distance between the top (max) and bottom (min)
  • CV, coefficient of variation
    • sample standard deviation / sample mean * 100%
    • useful to compare two different ways of measuring a lab value
    • measures the spread of the data relative to the average of the data
    • the higher the CV, the less stable the measurement (the values move around a lot)
  • Chebyshev's theorem and intervals
    • within 2s (two standard deviations) of the average: at least 75% of the x values
    • within 3s: at least 88.9%
    • within 4s: at least 93.8%
  • percentiles
    • box-and-whisker plots
  • r correlation
    • |r| below 0.4: weak correlation
    • |r| between 0.4 and 0.7: moderate correlation
    • |r| above 0.7: strong correlation
  • coefficient of determination is r squared
  • for a normal distribution, about 68%, 95% and 99.7% of the data lie within 1, 2 and 3 standard deviations of the mean
  • z score
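
Several of the formulas in the notes above (coefficient of variation, Chebyshev's bound, z score, r and r squared) can be tried out with a short Python sketch. The function names and the two small data lists are made up for illustration; the z score formula used here, (x - mean) / standard deviation, is the standard definition and is not spelled out in the notes.

```python
import math
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation divided by the sample mean, as a percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100


def chebyshev_lower_bound(k):
    """Minimum fraction of any dataset within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2


def z_scores(data):
    """How many sample standard deviations each value lies from the sample mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - mean) / s for x in data]


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values, not real measurements.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

print("CV of hours (%):", coefficient_of_variation(hours))
for k in (2, 3, 4):
    # Chebyshev: at least 75%, 88.9% and 93.8% for k = 2, 3, 4
    print(f"within {k}s: at least {chebyshev_lower_bound(k):.1%}")
print("z scores of hours:", z_scores(hours))

r = pearson_r(hours, scores)
print("r:", r, "r squared:", r ** 2)
```

Chebyshev's bound holds for any distribution, which is why it is looser than the 68%, 95%, 99.7% figures that apply only to a normal distribution.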
Probability

  • P(A or B) = P(A) + P(B) - P(A and B)
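
The addition rule can be checked on a small example. The sketch below assumes one roll of a fair six-sided die, with A = "even" and B = "greater than 3"; the events and the `prob` helper are hypothetical, chosen only to illustrate the formula.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (an assumed example).
outcomes = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))


A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

direct = prob(A | B)                            # count the outcomes in "A or B"
by_rule = prob(A) + prob(B) - prob(A & B)       # addition rule

print(direct, by_rule)
```

Subtracting P(A and B) removes the outcomes 4 and 6, which would otherwise be counted twice because they belong to both events.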


