STAT 330

Assignment 7: SAS Assignment 1

In this document I begin by showing you examples of one-sample and two-sample procedures using SAS. I then describe several data sets which I expect you to analyze. You must use SAS. I want you to hand in: copies of the SAS commands which you submit, the SAS output you get and a short (two or three sentences) summary of the practical conclusions. Uninterpreted computer output cannot get more than 25% of the possible marks. (At the same time, without the SAS input and output you won't get anything.)

One sample tests and confidence intervals

The data for this example are taken from question 42 in chapter 7, which you should see for an explanation of the setting. I ran the following SAS code, which is in the file g:asbestos.sas; on the Macs the file is called Macintosh HD:Student Folder:asbestos.sas. (In order to duplicate this analysis on a Mac you must make copies of the files CLASS:STAT:330:asbestos.sas and CLASS:STAT:330:asbestos.dat in the folder Macintosh HD:Student Folder or on your own floppy disk; the Mac version of SAS cannot read files from the folder CLASS:STAT:330, where the instructor normally puts files for students to use.)

  options pagesize=60 linesize=80;
  data asbestos;
  /* On the Macs use instead:
     infile 'Macintosh HD:Student Folder:asbestos.dat'; */
  infile 'g:asbestos.dat';
  input comply;
  /* Subtract 200 so that a t-test of mean 0 on complyd tests mean 200. */
  complyd=comply-200;
  proc means mean std stderr t prt maxdec=2;
  run;

The words mean, std, stderr, t, and prt after means in the proc means statement request the sample mean, the sample standard deviation, the standard error of the mean, the value of the t statistic for testing the hypothesis of 0 mean and the two sided P-value for a t-test of that null hypothesis. The expression maxdec=2 limits the printed statistics to 2 decimal places.

The output from proc means is

                        The SAS System                                9
                                       12:47 Thursday, October 12, 1995

 Variable        Mean       Std Dev     Std Error          T  Prob>|T|
 ----------------------------------------------------------------------
 COMPLY        209.75        24.16         6.04         34.73    0.0001
 COMPLYD         9.75        24.16         6.04          1.61    0.1273
 ----------------------------------------------------------------------

Notice that the second line tests the hypothesis that the mean of COMPLY is actually 200. The two sided P-value is about 13%, indicating that there is only very weak evidence against this null hypothesis. To compute a 95% confidence interval take Mean ± t × Std Error, where t is the upper 2.5% critical point of the t distribution on n-1 degrees of freedom. I don't know if I can get SAS to actually do this little piece of arithmetic easily.
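
One possible way to have SAS do the confidence interval arithmetic is sketched here. It is only a sketch: it assumes the asbestos data set created above is still available, and the data set names stats and ci and the variable names xbar, se, tcrit, lower and upper are made up for illustration. The output statement of proc means saves n, the mean and the standard error, and the tinv function supplies the t critical point.

```sas
  /* Save n, the mean and the standard error of comply (nothing is printed). */
  proc means data=asbestos noprint;
    var comply;
    output out=stats n=n mean=xbar stderr=se;
  run;

  /* 95% limits: mean plus or minus t critical point times standard error. */
  data ci;
    set stats;
    tcrit = tinv(0.975, n-1);   /* upper 2.5% point of t on n-1 df */
    lower = xbar - tcrit*se;
    upper = xbar + tcrit*se;
  run;

  proc print data=ci;
    var n xbar se tcrit lower upper;
  run;
```

Some releases of proc means also accept a clm option, which prints the confidence limits directly; whether the release installed in the lab supports it is an assumption you would have to check.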

Two sample tests and confidence intervals

The data for the question about Michelson's measurements of the speed of light from Assignment 4 are in the file g:michlson.dat, which is called CLASS:STAT:330:michlson.dat on the Macs. I use proc ttest to test for no change in mean.

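  options pagesize=60 linesize=80;
  data michlson;
  /* On the Macs use instead:
     infile 'Macintosh HD:Student Folder:michlson.dat'; */
  infile 'g:michlson.dat';
  input set $ speed ;
  /* Sort by set so that proc univariate can use a by statement. */
  proc sort data=michlson;
   by set;
  /* cochran adds the Cochran approximation for unequal variances. */
  proc ttest cochran;
   class set;
  proc univariate plot normal;
   by set;
  run;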

The output is

                                 The SAS System                                1
                                                  14:31 Monday, October 16, 1995

                                TTEST PROCEDURE

Variable: SPEED

SET          N         Mean      Std Dev    Std Error      Minimum      Maximum
-------------------------------------------------------------------------------
First       20  909.0000000  104.9260391  23.46217561  650.0000000  1070.000000
Second      20  831.5000000   54.2193401  12.12381302  740.0000000   950.000000

Variances        T    Method              DF    Prob>|T|
--------------------------------------------------------
Unequal     2.9346    Satterthwaite     28.5      0.0065
                      Cochran           19.0      0.0085
Equal       2.9346                      38.0      0.0056

For H0: Variances are equal, F' = 3.75    DF = (19,19)    Prob>F' = 0.0060

Notice that the two means are clearly different and that the two variances are also clearly different. The "Unequal" lines report on tests which try to adjust for unequal variances; Satterthwaite is the technique mentioned in previous solution sets. You have to do your own arithmetic to get confidence intervals. The output of proc univariate is:

                                 The SAS System                                1
                                               10:11 Wednesday, October 25, 1995

---------------------------------- SET=First -----------------------------------

                              Univariate Procedure

Variable=SPEED

                                    Moments

                    N                20  Sum Wgts         20
                    Mean            909  Sum           18180
                    Std Dev     104.926  Variance   11009.47
                    Skewness   -0.96461  Kurtosis   0.573188
                    USS        16734800  CSS          209180
                    CV         11.54302  Std Mean   23.46218
                    T:Mean=0   38.74321  Pr>|T|       0.0001
                    Num ^= 0         20  Num > 0          20
                    M(Sign)          10  Pr>=|M|      0.0001
                    Sgn Rank        105  Pr>=|S|      0.0001
                    W:Normal   0.920264  Pr<W         0.1059


                                Quantiles(Def=5)

                     100% Max      1070       99%      1070
                      75% Q3        980       95%      1035
                      50% Med       940       90%      1000
                      25% Q1        850       10%       750
                       0% Min       650        5%       695
                                               1%       650
                     Range          420
                     Q3-Q1          130
                     Mode           980


                                    Extremes

                       Lowest    Obs     Highest    Obs
                          650(      14)      980(      12)
                          740(       2)     1000(      11)
                          760(      15)     1000(      17)
                          810(      16)     1000(      18)
                          850(       6)     1070(       4)


                Stem Leaf                     #             Boxplot
                  10 7                        1                |
                  10 000                      3                |
                   9 566888                   6             +-----+
                   9 033                      3             *--+--*
                   8 558                      3             +-----+
                   8 1                        1                |
                   7 6                        1                |
                   7 4                        1                |
                   6 5                        1                0
                     ----+----+----+----+
                 Multiply Stem.Leaf by 10**+2

                                 The SAS System                                2
                                               10:11 Wednesday, October 25, 1995

---------------------------------- SET=First -----------------------------------

                              Univariate Procedure

Variable=SPEED

                                 Normal Probability Plot
              1075+                                       +++++*
                  |                                  *+*++*
                  |                          ** *++*+
                  |                      ** ++++
               875+                  **+*+++
                  |               +*+++
                  |          ++++*
                  |      ++++ *
               675+ +++++*
                   +----+----+----+----+----+----+----+----+----+----+
                       -2        -1         0        +1        +2


                                 The SAS System                                3
                                               10:11 Wednesday, October 25, 1995

---------------------------------- SET=Second ----------------------------------

                              Univariate Procedure

Variable=SPEED

                                    Moments

                    N                20  Sum Wgts         20
                    Mean          831.5  Sum           16630
                    Std Dev    54.21934  Variance   2939.737
                    Skewness   0.692545  Kurtosis   0.328607
                    USS        13883700  CSS           55855
                    CV         6.520666  Std Mean   12.12381
                    T:Mean=0   68.58403  Pr>|T|       0.0001
                    Num ^= 0         20  Num > 0          20
                    M(Sign)          10  Pr>=|M|      0.0001
                    Sgn Rank        105  Pr>=|S|      0.0001
                    W:Normal   0.934107  Pr<W         0.1953


                                Quantiles(Def=5)

                     100% Max       950       99%       950
                      75% Q3        870       95%       945
                      50% Med       810       90%       915
                      25% Q1        805       10%       770
                       0% Min       740        5%       750
                                               1%       740
                     Range          210
                     Q3-Q1           65
                     Mode           810


                                    Extremes

                       Lowest    Obs     Highest    Obs
                          740(      14)      870(      12)
                          760(       5)      870(      20)
                          780(       3)      890(       1)
                          790(       7)      940(      16)
                          800(      18)      950(      17)


                Stem Leaf                     #             Boxplot
                   9 5                        1                |
                   9 4                        1                |
                   8 57779                    5             +-----+
                   8 011111124                9             *--+--*
                   7 689                      3                |
                   7 4                        1                |
                     ----+----+----+----+
                 Multiply Stem.Leaf by 10**+2


                                 The SAS System                                4
                                               10:11 Wednesday, October 25, 1995

---------------------------------- SET=Second ----------------------------------

                              Univariate Procedure

Variable=SPEED

                                 Normal Probability Plot
               975+                                            *  ++++
                  |                                      +*+++++++
                  |                             *+*++*+*+
                  |                  **++*+*++**
                  |          +*++*+*+++
               725+ +++++*+++
                   +----+----+----+----+----+----+----+----+----+----+
                       -2        -1         0        +1        +2


                                 The SAS System                                5
                                               10:11 Wednesday, October 25, 1995

                              Univariate Procedure
                                Schematic Plots

Variable=SPEED

                          |
                     1100 +
                          |
                          |            |
                          |            |
                     1050 +            |
                          |            |
                          |            |
                          |            |
                     1000 +            |
                          |            |
                          |         +-----+
                          |         |     |
                      950 +         |     |        |
                          |         *-----*        |
                          |         |     |        |
                          |         |  +  |        |
                      900 +         |     |        |
                          |         |     |        |
                          |         |     |     +-----+
                          |         |     |     |     |
                      850 +         +-----+     |     |
                          |            |        |  +  |
                          |            |        |     |
                          |            |        *-----*
                      800 +            |        +-----+
                          |            |           |
                          |            |           |
                          |            |           |
                      750 +            |           |
                          |            |           |
                          |
                          |
                      700 +
                          |
                          |
                          |
                      650 +            0
                           ------------+-----------+-----------
                      SET             First      Second

You will see that the normal probability plots are reasonably straight but basically horrible to look at; other packages produce better graphs easily.
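
Returning to the proc ttest output: the arithmetic for a 95% confidence interval for the difference in means, not assuming equal variances, could be done in a data step along the following lines. This is a sketch under stated assumptions: the means, standard errors and Satterthwaite degrees of freedom are typed in from the proc ttest output above, and the data set name diffci and all the variable names are made up for illustration.

```sas
  /* Sketch: 95% CI for the difference in means, unequal variances. */
  data diffci;
    diff  = 909.0 - 831.5;                          /* First mean minus Second mean */
    se    = sqrt(23.46217561**2 + 12.12381302**2);  /* std error of the difference */
    df    = 28.5;                                   /* Satterthwaite DF from proc ttest */
    tcrit = tinv(0.975, df);                        /* upper 2.5% point of t on df */
    lower = diff - tcrit*se;
    upper = diff + tcrit*se;
  run;

  proc print data=diffci;
  run;
```

As a check on the typing, diff divided by se should reproduce the T value printed on the Unequal lines of the proc ttest output.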

Two sample paired comparisons

You do this with proc means:

  options pagesize=60 linesize=80;
  data michpair;
  /* On the Macs use instead:
     infile 'Macintosh HD:Student Folder:michpair.dat'; */
  infile 'g:michpair.dat';
  input speed1 speed2 ;
    diff=speed1-speed2;
  proc means mean std stderr t prt maxdec=2;
  proc univariate plot normal;
   var speed1 diff;
  run;

The output is

                                 The SAS System                                2
                                                  14:31 Monday, October 16, 1995

   Variable          Mean       Std Dev     Std Error             T  Prob>|T|
   --------------------------------------------------------------------------
   SPEED1          909.00        104.93         23.46         38.74    0.0001
   SPEED2          831.50         54.22         12.12         68.58    0.0001
   DIFF             77.50        109.78         24.55          3.16    0.0052
   --------------------------------------------------------------------------

Only the third line, for DIFF, actually matters: it tests the hypothesis that the mean difference between the paired measurements is 0.
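
An alternative that avoids computing diff by hand is sketched here. It assumes a release of SAS in which proc ttest has a paired statement; older releases do not, and the release that produced the output above may be one of them, so treat this as something to try rather than something guaranteed to run in the lab.

```sas
  /* Sketch: paired t-test of speed1 against speed2, no diff variable needed. */
  proc ttest data=michpair;
    paired speed1*speed2;
  run;
```

If the statement is accepted, the t statistic and P-value should agree with the DIFF line from proc means.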

Your Assignment

  1. The file g:glucose.dat (or CLASS:STAT:330:glucose.dat on the Macs) contains blood glucose levels for 52 women after their first pregnancy and then their second. The following SAS commands read the file and print out the data set.
      options pagesize=60 linesize=80;
      data glucose;
        /* On the Macs use instead:
           infile 'Macintosh HD:Student Folder:glucose.dat'; */
        infile 'g:glucose.dat';
        input frstpreg scndpreg ;
      proc print;
      run;
    

    1. Get 95% confidence intervals for the first pregnancy mean, the second pregnancy mean and the difference in means.

    2. Is there a difference in blood glucose levels between the two pregnancies?

    3. Does the population look reasonably normal?

  2. For the body fat data in the introductory handout on SAS, do men and women have different average percent body fat? Do they have different population standard deviations? Are the normality assumptions adequate? (The data are in g:bodyfat.dat or, on the Macs, CLASS:STAT:330:bodyfat.dat.)

  3. The file g:iris.dat (or, on the Macs, CLASS:STAT:330:iris.dat) contains the measurements of 4 dimensions on each of 50 flowers of 2 species of iris. Read them with the statement input species $ sepallen; (the file has 3 other columns, which this statement ignores). Do Versicolor and Virginica irises have different average sepal lengths?
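
A sketch of a data step for reading the iris file, following the pattern of the glucose example. The length statement is an addition that is not in the instructions above: it assumes the species names are spelled out in full in the file. By default a character variable read with $ holds only 8 characters, so Versicolor and Virginica would be cut short (though they would still be distinguishable).

```sas
  options pagesize=60 linesize=80;
  data iris;
    /* Assumption: species names are written out in full in the file. */
    length species $ 12;
    /* On the Macs use the copy in Macintosh HD:Student Folder instead. */
    infile 'g:iris.dat';
    input species $ sepallen;
  proc print;
  run;
```

Printing the data set first lets you confirm that the species column was read as intended before running any tests.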


NOTE: If you are working on the MACS you must copy the files from the folder CLASS:STAT:330 to the folder Macintosh HD:Student Folder or to a floppy before you can use them with SAS.

DUE: Wednesday 6 November 1996




Richard Lockhart
Tuesday October 29 1996