Elementary Statistics

Project 1

Bivariate Data, Linear Regression and Univariate Data

Objectives:
  • Sample bivariate data.
  • Fit the data to a linear model.
  • Determine appropriateness of linear fit.
  • Employ sampling techniques.
  • Analyze and graph univariate data.
Instructions:
  • As you complete each task below, check it off.
  • Answer all questions in your introduction or summary.
  • Check your course calendar for intermediate and final due dates.
  • Graphs may be constructed by hand or by computer, unless your instructor informs you otherwise. All graphs must be neat and accurate.
  • All other responses must be done on the computer.
  • Neatness and quality of explanations are used to determine your final grade.

Part I - Bivariate Data

Introduction

______ State the bivariate data you are going to study. (Here are two examples, but you may NOT use them: height vs. weight, age vs. running distance.)
______ Describe how you are going to collect the data (for instance, collect data from the web, survey students on campus).
______ Describe your sampling technique in detail. Use cluster, stratified, systematic, or simple random sampling (using a random number generator). Convenience sampling is NOT acceptable.
______ Conduct your survey. Your number of pairs must be at least 30.
______ Print out a copy of your data.
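
If you select your sample by computer, the sketch below shows one possible way to draw a simple random sample and a systematic sample in Python. The sampling frame (ID numbers 1 to 600), the function names, and the seed are all hypothetical and stand in for whatever list you actually sample from.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct items so that every subset of size k is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def systematic_sample(population, k, seed=None):
    """Pick a random starting point, then take every step-th item after it."""
    step = len(population) // k
    if step < 1:
        raise ValueError("population is smaller than the requested sample")
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start within the first step items
    return [population[start + i * step] for i in range(k)]


# Hypothetical sampling frame: ID numbers standing in for a real list.
frame = list(range(1, 601))
srs = simple_random_sample(frame, 30, seed=1)
sys_sample = systematic_sample(frame, 30, seed=1)
print(sorted(srs))
print(sys_sample)
```

Recording the seed in your write-up lets someone else reproduce the same sample, which helps when you describe your sampling technique in detail.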

Analysis

______ On a separate sheet of paper, construct a scatter plot of the data. Label and scale both axes.
______ State the least squares line and the correlation coefficient.
______ On your scatter plot, in a different color, construct the least squares line.
______ Is the correlation coefficient significant? Explain and show how you determined this.
______ Is the relationship a positive or negative one? Explain how you know.
______ Interpret the slope of the linear regression line in the context of the data in your project. Relate the explanation to your data, and quantify what the slope tells you.
______ Does the regression line seem to fit the data? Why or why not? If the data does not seem to be linear, explain if any other model seems to fit the data better.
______ Are there any outliers? If so, what are they? Show how you used the potential outlier formula from Lesson 12 (since you have bivariate data) to determine whether any pairs might be outliers.
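
The calculations in the analysis tasks above could be checked by computer along the following lines. This is a minimal sketch, not a required method: the function names and the five data pairs are made up for illustration (your project needs at least 30 pairs). The outlier check assumes that the Lesson 12 formula flags any pair whose residual is more than two standard deviations of the residuals away from the line; confirm the exact rule in your lesson. The t statistic is one common route to judging significance; compare it with a critical value for n - 2 degrees of freedom, or use the table your instructor prefers.

```python
import math


def least_squares(xs, ys):
    """Return intercept a, slope b, and correlation r for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def t_statistic(r, n):
    """Test statistic for H0: no linear correlation (undefined when |r| = 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def potential_outliers(xs, ys, a, b):
    """Assumed rule: flag pairs whose residual exceeds 2s in absolute value."""
    n = len(xs)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [(x, y) for x, y, e in zip(xs, ys, residuals) if abs(e) > 2 * s]


# Made-up data, far fewer pairs than the project requires.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r = least_squares(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.2f}")
print(f"t = {t_statistic(r, len(xs)):.2f} with {len(xs) - 2} degrees of freedom")
print("potential outliers:", potential_outliers(xs, ys, a, b))
```

The sign of the slope b (and of r) answers the positive-or-negative question, and b itself is the predicted change in y for each one-unit increase in x.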


Part II - Univariate Data

In this section, you will use the data for ONE variable only. Pick the variable that is more interesting to analyze.

Example: If your independent variable is sequential data such as year with 30 years and one piece of data per year, your x-values might be 1971, 1972, 1973, 1974, …, 2000. This would not be interesting to analyze. In that case, choose to use the dependent variable to analyze for this part of the project.

______ Summarize your data in a chart with columns showing data value, frequency, relative frequency, and cumulative relative frequency.
______ Answer the following, rounded to 2 decimal places.
  sample mean = ________ sample standard deviation = __________
  first quartile = _________ third quartile = _____________
  median = ___________ 70th percentile = _____________
  value that is 2 standard deviations above the mean = _____________
  value that is 1.5 standard deviations below the mean = _____________
______ Construct a histogram displaying your data. Group your data into 6-10 intervals of equal width. Pick regularly spaced intervals that make sense in relation to your data.

Example: Use age groups 19.5-24.5, 24.5-29.5, ... or 19.5-29.5, 29.5-39.5, 39.5-49.5. DO NOT group data by age as 20-26, 27-33, 34-40, 41-47, 48-54, 55-61.
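
If you build the histogram by computer, the grouping in the example above could be tallied as in the following sketch. The ages, the function name, and the choice of width 5 starting at 19.5 are illustrative assumptions only. Starting the intervals at a half unit means whole-number data never fall exactly on a boundary.

```python
import math


def equal_width_bins(data, width, start):
    """Count values in [start, start + width), [start + width, start + 2*width), ..."""
    if min(data) < start:
        raise ValueError("start must not exceed the smallest data value")
    n_bins = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_bins
    for value in data:
        counts[int((value - start) // width)] += 1
    edges = [start + i * width for i in range(n_bins + 1)]
    return edges, counts


# Made-up ages, far fewer values than the project requires.
ages = [20, 22, 24, 25, 29, 30, 34, 35, 41, 44]
edges, counts = equal_width_bins(ages, width=5, start=19.5)
for low, high, count in zip(edges, edges[1:], counts):
    print(f"{low}-{high}: {count}")
```

If the number of intervals falls outside the 6-10 range for your data, adjust the width and rerun the tally.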

______ In complete sentences, describe the shape of your histogram.
______ Are there any potential outliers? Which values are they? Show the work and calculations you used to apply the potential outlier formula from Chapter 2 (since you are now using univariate data) to determine which values might be outliers.
______ Construct a box plot of your data.
______ Does the middle 50% of your data appear to be concentrated together or spread out? Explain how you determined this.
______ Looking at both the histogram AND the box plot, discuss the distribution of your data. For example: how does the spread of the middle 50% of your data compare to the spread of the rest of the data represented in the box plot; how does this correspond to your description of the shape of the histogram; how does the graphical display show any outliers you may have found; does the histogram show any gaps in the data that are not visible in the box plot; are there any interesting features of your data that you should point out?
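
The numerical tasks in Part II could be checked by computer along the lines of the sketch below. The function names and the data are made up for illustration. Two assumptions are worth checking against your course materials: the quartiles and the 70th percentile use the default "exclusive" method of Python's statistics.quantiles, which may differ slightly from your calculator or textbook, and the outlier check assumes the Chapter 2 formula is the usual rule of 1.5 times the interquartile range below the first quartile or above the third quartile.

```python
import statistics
from collections import Counter


def frequency_table(data):
    """Rows of (value, frequency, relative frequency, cumulative relative frequency)."""
    n = len(data)
    rows, cumulative = [], 0
    for value, freq in sorted(Counter(data).items()):
        cumulative += freq
        rows.append((value, freq, round(freq / n, 2), round(cumulative / n, 2)))
    return rows


def summary(data):
    """Summary statistics requested in Part II, plus assumed 1.5*IQR outlier fences."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    q1, median, q3 = statistics.quantiles(data, n=4)
    p70 = statistics.quantiles(data, n=10)[6]  # 7th of 9 cut points
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean,
        "sd": sd,
        "q1": q1,
        "median": median,
        "q3": q3,
        "p70": p70,
        "two_sd_above": mean + 2 * sd,
        "one_and_half_sd_below": mean - 1.5 * sd,
        "outliers": [x for x in data if x < lower or x > upper],
    }


# Made-up data, far fewer values than the project requires.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40]
for row in frequency_table(data):
    print(row)
for name, value in summary(data).items():
    print(name, value)
```

The minimum, first quartile, median, third quartile, and maximum from this output are the five values needed to draw the box plot, and q3 - q1 measures the spread of the middle 50% of the data.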
Due Dates:


Part I, Intro: __________ (Keep a copy for your records.)
Part I, Analysis: __________ (Keep a copy for your records.)

Entire Project, typed and stapled: __________

______ 1. Cover sheet: names, class time, and name of your study.
______ 2. Part I: label the sections “Intro” and “Analysis.”
______ 3. Part II:
______ 4. Summary page containing several paragraphs written in complete sentences describing the experiment, including what you studied and how you collected your data. The summary page should also include answers to ALL the questions asked above.
______ 5. All graphs requested in the project.
______ 6. All calculations requested to support your answers to the questions about the data.
______ 7. Description: what you learned by doing this project, what challenges you had, and how you overcame them.


Include answers to ALL questions asked, even if not explicitly repeated in items 1-7 above.

 


Content Developed by Susan Dean and Barbara Illowsky, Licensed under a Creative Commons License
Published by the Sofia Open Content Initiative
© 2004 Foothill-De Anza Community College District & The William and Flora Hewlett Foundation