1 ! 500.18 - STAT11 - STEPWI: STEP-WISE REGRESSION MODULE. RELEASED FOR SUBMISSION TO THE DECUS LIBRARY BY THE DEC ENGINEERING SYSTEMS GROUP AND THE EDUCATION PRODUCTS GROUP, SEPTEMBER, 1977
2 ! COPYRIGHT (C) 1973, DIGITAL EQUIPMENT CORPORATION, MAYNARD, MASSACHUSETTS
3 ! THIS SOFTWARE IS FURNISHED TO PURCHASER UNDER A LICENSE FOR USE ON A SINGLE COMPUTER SYSTEM AND CAN BE COPIED (WITH INCLUSION OF DEC'S COPYRIGHT NOTICE) ONLY FOR USE IN SUCH SYSTEM, EXCEPT AS MAY OTHERWISE BE PROVIDED IN WRITING BY DEC.
4 ! THE INFORMATION IN THIS DOCUMENT IS SUBJECT TO CHANGE WITHOUT NOTICE AND SHOULD NOT BE CONSTRUED AS A COMMITMENT BY DIGITAL EQUIPMENT CORPORATION.
5 ! DEC ASSUMES NO RESPONSIBILITY FOR USE OR RELIABILITY OF ITS SOFTWARE ON EQUIPMENT WHICH IS NOT SUPPLIED BY DEC.
6 ! THIS MODULE PERFORMS LINEAR STEP-WISE AND/OR MULTIPLE REGRESSION ANALYSIS.
7 ! AUTHOR: MICHAEL D. KNAUER; VERSION NUMBER: 001; DATE: OCTOBER, 1973
8 ! MODIFICATIONS: MAY, 1975: MODIFIED TO ACCEPT MISSING DATA; MODIFIED TO ALLOW THE USER TO PRINT AND/OR SAVE THE REDUCED DATA MATRIX
00009! MODIFICATIONS: JUNE, 1976: CTRL/C TRAP ADDED BY ARDOTH HASSLER WILSON CENTRAL STATE UNIVERSITY EDMOND, OKLAHOMA
10 ! CALLING ARGUMENTS
11 ! 1) VARIABLE NAME: F3$; RANGE OF VALUES: S00000.RWM - S99999.RWM; USE: THIS IS THE 250 ROW BY 15 COLUMN VIRTUAL MATRIX USED BY ALL STAT11 MODULES
13 ! 2) VARIABLE NAME: N; RANGE OF VALUES: 1 - 250; USE: CONTAINS THE NUMBER OF ROWS OF ACTUAL DATA IN THE VIRTUAL MATRIX FILE F3$
15 ! 3) VARIABLE NAME: M; RANGE OF VALUES: 1 - 15; USE: CONTAINS THE NUMBER OF COLUMNS OF ACTUAL DATA IN THE VIRTUAL MATRIX FILE F3$
20 ! RETURNING ARGUMENTS: NONE -- THIS MODULE DOES NOT ALTER OR ADD TO ANY OF THE FILES OR VARIABLES PASSED TO IT.
30 ! DESCRIPTION OF FUNCTION: STEPWI PERFORMS LINEAR STEP-WISE AND/OR MULTIPLE REGRESSION ANALYSIS.
31 ! FIRST, A MATRIX OF THOSE SUBJECTS HAVING ALL DATA POINTS IS BUILT (SXXXXX.REG). THEN A SQUARE MATRIX OF THE CROSS-PRODUCTS OF DEVIATIONS FROM MEANS FOR ALL VARIABLES IS COMPUTED, USING SXXXXX.REG AS INPUT.
32 ! ALL OF THE REGRESSION CALCULATIONS USE THIS CROSS-PRODUCT MATRIX RATHER THAN THE INPUT DATA MATRIX.
33 ! THE ONLY OUTPUT FROM STEPWI IS WHAT IS PRINTED ON THE TERMINAL.
34 ! THIS OUTPUT INCLUDES THE RESULTS OF THE REGRESSION CALCULATIONS AND QUESTIONS TO THE USER REGARDING WHICH OPTIONS ARE DESIRED.
35 ! AFTER THE REGRESSION IS FINISHED, STEPWI ALWAYS RETURNS (CHAINS) TO THE STATCM MODULE.
699 ! THE FIRST SECTION OF CODE RETRIEVES THE CALLING ARGUMENTS FROM COMMON, OPENS THE INPUT DATA FILE, AND INITIALIZES SEVERAL VARIABLES. THE "GOSUB 6000" INSTRUCTION INITIALIZES A CHARACTER STRING ARRAY.
700 ON ERROR GOTO 9000: GOSUB 10000: H9=.9E-38: GOSUB 6000
800 PRINT "STEP-WISE REGRESSION":PRINT
810 PRINT "ALL DATA POINTS MUST EXIST FOR THIS ANALYSIS. IF THERE": PRINT "IS A VALUE MISSING (COLUMN) FOR A SUBJECT (ROW), THAT": PRINT "SUBJECT (ROW) WILL BE DROPPED FROM THE ANALYSIS": PRINT: PRINT
900 F$=SYS(CHR$(7%)): N=VAL(MID(F$,46%,5%)): M=VAL(MID(F$,51%,5%)): F3$=MID(F$,31%,15%)
905 A9$=MID(F$,62%,9%): A9$=LEFT(A9$,INSTR(5%,A9$,"]"))
910 OPEN F3$ FOR INPUT AS FILE 1%: DIM #1%, Q(250%,15%)
960 DIM C%(15%)
1000 DIM Z(15,15),A(15),L%(15),B(15),P%(15)
1120 INPUT "DO YOU WANT TO SEE THE LAST STEP ONLY";A$: IF LEFT(A$,1%)="Y" THEN A%=1% ELSE A%=0%
1123 M%=M: N%=N: B%=0%: C%=0%
1125 L%(J%)=1% FOR J%=1% TO M%
1130 F4$=LEFT(F3$,6%)+".REG": OPEN F4$ AS FILE 2%: DIM #2%, X(250%,15%)
1530 ! IN THIS SECTION, THE USER IS ASKED TO ENTER THE NUMBER OF THE DEPENDENT VARIABLE AND THE NUMBERS OF THE VARIABLES, IF ANY, WHICH HE WANTS TO OMIT FROM THE REGRESSION.
1540 N4=0: PRINT: ON ERROR GOTO 6009
1560 PRINT "TYPE THE NUMBER OF THE COLUMN CORRESPONDING TO THE": INPUT "DEPENDENT VARIABLE (Y)";N1%
1590 IF N1%>M% OR N1%<1% THEN 1630 ELSE 1670
1630 PRINT USING A1$(1%),N1%: GOTO 1560
1670 PRINT: PRINT A1$(8%)
1680 INPUT "DO YOU WISH TO OMIT A VARIABLE FROM THE ANALYSIS";A$
1685 IF LEFT(A$,1%)="Y" GOTO 1780
1690 IF LEFT(A$,1%)="N" OR A$="" GOTO 2070
1730 PRINT: PRINT "IF YOU DO NOT WISH TO ENTER CERTAIN VARIABLES": PRINT "IN THE REGRESSION, YOU CAN STOP THEM FROM ENTERING": PRINT "THE PRESENT ANALYSIS": GOTO 1680
1780 PRINT "WHEN REQUESTED, ENTER THE COLUMN CORRESPONDING TO THE": PRINT "VARIABLE TO BE OMITTED. TYPE 0 TO TERMINATE THE REQUESTS": PRINT: P1=0: I9%=9%
1820 INPUT "ENTER A VARIABLE TO BE OMITTED";D1
1840 IF D1 = 0 THEN 1970
1850 IF D1=N1% THEN 1950
1860 IF D1 > M THEN 1930
1870 IF D1<1 THEN 1930
1880 IF INT(D1) < D1 THEN 1930
1885 IF L%(D1) < 0 THEN 1820
1890 P1 = P1 + 1 : N4 = N4 + 1 : L%(D1) = -1 : GOTO 1820
1930 PRINT "IMPOSSIBLE--VARIABLE DOES NOT EXIST. TRY AGAIN": GOTO 1820
1950 PRINT "OH NO! YOU ARE TRYING TO OMIT Y. TRY AGAIN.": GOTO 1820
1970 IF P1=0 THEN 2030
1980 IF P1>=M-1 THEN 2000
1990 GO TO 2070
2000 L%(J%)=1% FOR J%=1% TO M%
2030 PRINT: PRINT "YOU HAVE ASKED TO OMIT EITHER ALL OR NONE OF THE VARIABLES": N4=0: GOTO 1680
2070 M1=M-N4: PRINT: PRINT: V1=1: Q2=0: R3=0: N3=M1-1
2080 I9%=0%: ON ERROR GOTO 9000
2200 ! THIS SECTION EXTRACTS THOSE SUBJECTS WHO HAVE ALL OF THE DESIRED VARIABLES FOR THE ANALYSIS AND STORES THEM IN X. ALSO, MEANS ARE COMPUTED.
2210 K%=0%: L%=0%: MAT A=ZER
2220 FOR I%=1% TO N%
2230 FOR J%=1% TO M%: IF L%(J%)=-1% GOTO 2240
2235 IF Q(I%,J%)=H9 GOTO 2290
2240 NEXT J%: K%=K%+1%: L%=0%
2250 FOR J%=1% TO M%: IF L%(J%)=-1% OR J%=N1% GOTO 2280
2260 L%=L%+1%: X(K%,L%)=Q(I%,J%)
2270 A(L%)=A(L%)+X(K%,L%)
2280 NEXT J%: L%=L%+1%: X(K%,L%)=Q(I%,N1%): A(L%)=A(L%)+X(K%,L%)
2290 NEXT I%
2300 IF K%<=1% OR K%=R5 THEN 3190
3170 LET Q1=R5
3180 LET N5%=J%
3190 NEXT J%
3195 IF Q1<>0 GOTO 3200 ! IF Q1=0 THEN NO MORE STEPS ARE PERFORMED SINCE NO OTHER VARIABLES WILL CONTRIBUTE ANY MORE TO THE REDUCTION OF VARIANCE OF Y.
3197 IF A% THEN S1=S1-1: N3=S1: B%=1%: GOTO 3200
3198 PRINT: PRINT A1$(0%): GOTO 5500
3199 ! THIS SECTION COMPUTES AND PRINTS STATISTICS FOR THIS STEP AND ASKS THE USER IF HE WANTS TO ENTER THE CURRENTLY PICKED VARIABLE IN THE REGRESSION. IF HE DOESN'T WANT TO, THEN THE REGRESSION IS ENDED AT THIS POINT.
3200 L%(N5%) = 0 : R5 = Q1/Z1 : IF A% THEN 3280
3210 PRINT "VARIABLE SELECTED IS ... X";C%(N5%)
3230 PRINT USING A1$( 2 ),Q1 ! OLD IMAGE LINE #3240
3260 PRINT USING A1$( 3 ),R5 ! OLD IMAGE LINE #3270
3280 LET Q2=Q2+Q1
3290 LET R3=R3+R5
3294 IF A% THEN IF R3>=.99994 THEN C%=1%: N3=S1
3300 REM PARTIAL F NEXT
3310 J%=N9%-S1-1
3320 LET R4=(Z1-Q2)/J%
3329 IF R4<=0 THEN T1=9999.999: GOTO 3340
3330 LET T1=Q1/R4
3332 IF T1<=9999.9999 THEN 3340
3334 T1=9999.9999
3340 IF T1<0 THEN T1=0
3342 IF A% THEN 3360 ELSE PRINT USING A1$(4%),N9%-2%,T1
3360 IF S1=1 THEN 3520
3370 V1=S1-1
3380 IF A% THEN 3520 ELSE PRINT
3385 IF S1=2 THEN PRINT A1$(8%)
3390 INPUT "DO YOU WISH TO ENTER THIS VARIABLE IN THE REGRESSION";A$
3400 IF LEFT(A$,1%)="Y" GOTO 3520
3410 IF LEFT(A$,1%)="N" OR A$="" GOTO 5500
3440 PRINT
3450 PRINT "IN A GIVEN STEP, THE VARIABLE THAT REDUCES THE LARGEST"
3460 PRINT "AMOUNT OF SUM OF SQUARES IS SELECTED. IF THE REDUCTION "
3470 PRINT "INDICATED BY THE ABOVE 3 LINES IS SIGNIFICANT, ENTER THIS"
3480 PRINT "VARIABLE IN THE REGRESSION; OTHERWISE, SELECTION OF "
3490 PRINT "VARIABLES WILL TERMINATE."
3500 GO TO 3380
3520 LET M1=S1
3525 V1=S1
3530 P%(S1-1) = N5%
3535 IF A% THEN IF S1 < N3 THEN 3630
3540 PRINT
3550 PRINT USING A1$( 5 ),Q2 ! OLD IMAGE LINE #3560
3570 PRINT USING A1$( 6 ),R3,Z1 ! OLD IMAGE LINE #3580
3590 LET R5=SQR(R3)
3600 PRINT
3610 PRINT USING A1$( 7 ),R5 ! OLD IMAGE LINE #3620
3630 LET R5=S1
3639 IF R4<=0 THEN R5=9999.999: R4=0: GOTO 3645
3640 LET R5=(Q2/R5)/R4
3642 IF R5 <= 9999.9999 THEN 3645
3644 R5 = 9999.9999
3645 S5 = SQR(R4) : IF A% THEN IF S1 < N3 THEN 3700
3650 A$="F FOR ANALYSIS OF VAR. (D.F. = ## , ###) ####.####": PRINT USING A$,S1,J%,R5
3680 A$ = "STANDARD ERROR OF ESTIMATE ............... ####.####": PRINT USING A$,S5
3694 IF B% THEN GOTO 3945
3700 R4 = Z(N5%,N5%) ! THIS CODE PERFORMS A REDUCTION ON THE Z MATRIX -- SOMETHING THAT IS DONE IN EACH STEP OF THE REGRESSION.
3710 FOR J%=1% TO L%
3720 IF L%(J%)>0 GOTO 3760
3740 IF J%=N5% THEN 3780
3750 Z(J%,J%) = Z(J%,J%) + Z(N5%,J%)*Z(N5%,J%)/R4
3760 Z(N5%,J%) = Z(N5%,J%)/R4
3770 GOTO 3790
3780 Z(N5%,N5%) = 1/R4
3790 NEXT J% ! THIS SECTION COMPUTES THE REGRESSION COEFFICIENTS FOR ALL VARIABLES ENTERED INTO THE REGRESSION SO FAR.
3810 B(S1-1)=Z(N5%,L%)
3820 IF S1=1 THEN 3940
3840 FOR J% = 2 TO S1
3850 LET J3%=S1-J%
3860 LET J4%=P%(J3%)
3870 B(J3%)=Z(J4%,L%)
3880 FOR K% = 1 TO J% - 1
3890 K1% = S1 - K%
3900 K2% = P%(K1%)
3910 LET B(J3%)=B(J3%)-Z(J4%,K2%)*B(K1%)
3920 NEXT K%
3930 NEXT J% ! HERE A TABLE IS PRINTED OF CURRENT STATISTICS FOR THE VARIABLES ENTERED IN THE REGRESSION SO FAR. ALSO, THE Y-AXIS INTERCEPT (AS OF THIS STEP) IS COMPUTED AND PRINTED.
3940 IF A% THEN IF S1 < N3 THEN 3970
3945 PRINT
3950 PRINT "VARIABLE REG. COEFF. STD. ERR-COEFF.";
3960 PRINT " COMPUTED T"
3970 B1=A(L%)
3980 FOR J% = 0 TO S1 - 1
3990 LET K3%=P%(J%)
4000 LET Q1=S5*SQR(Z(K3%,K3%))
4005 IF Q1=0 THEN T1=10000: GOTO 4020
4010 LET T1=B(J%)/Q1
4020 LET B1=B1-B(J%)*A(K3%)
4025 IF A% THEN IF S1 < N3 THEN 4040
4030 A$=" ## ####.### ###.#### ####.####"
4035 PRINT USING A$,C%(K3%),B(J%),Q1,T1
4040 NEXT J% : IF A% THEN IF S1 < N3 THEN 4080
4050 PRINT
4060 A$="INTERCEPT AFTER STEP ## IS ####.####": PRINT USING A$,S1,B1
4070 ! THIS CODE PERFORMS SOME MORE REDUCTION ON THE Z MATRIX -- SOMETHING THAT IS DONE IN EACH STEP OF THE REGRESSION.
4080 FOR J%=1% TO L%
4090 IF L%(J%)<=0% THEN 4160
4100 FOR K%=1% TO L%
4120 IF K%=N5% THEN 4140
4130 LET Z(J%,K%)=Z(J%,K%)-Z(J%,N5%)*Z(N5%,K%)
4140 NEXT K%
4150 LET Z(J%,N5%)=Z(J%,N5%)/(-R4)
4160 NEXT J% ! THE NEXT LINE TRANSFERS OUT OF A MULTIPLE REGRESSION IF SWITCHES HAVE BEEN SET SAYING THAT THE REGRESSION IS AS COMPLETE AS IT'S GOING TO BE.
4169 IF B% OR C% THEN PRINT: PRINT A1$(0%): GOTO 5500
4170 NEXT S1
4185 GO TO 5500
4190 PRINT ! IF NO MORE REGRESSION IS DESIRED, THEN THIS SECTION CHAINS TO STATCM
4200 IF F9%=0% GOTO 4400
4205 F9%=0%: INPUT "DO YOU WISH TO COMPUTE MORE REGRESSION";A$
4210 IF LEFT(A$,1%)="Y" GOTO 1120
4215 IF LEFT(A$,1%)="N" OR A$="" GOTO 4300
4220 PRINT "PLEASE ANSWER EITHER YES OR NO": GOTO 4200
4300 R$=SYS(CHR$(8%)+F$): CLOSE 1%,2%: KILL F4$: CHAIN "STATCM"+A9$
4399 ! THIS SECTION GIVES THE USER THE OPTION TO SAVE OR PRINT THE DATA USED IN THIS ANALYSIS
4400 INPUT "DO YOU WISH TO PRINT THE DATA USED IN THIS ANALYSIS";A$: IF LEFT(A$,1%)<>"Y" GOTO 4500
4405 PRINT: PRINT "THE DEPENDENT VARIABLE WILL BE PRINTED LAST": PRINT
4410 C8%=1%: C9%=M9%
4420 IF C9%-C8%>5% THEN C9%=C8%+5%
4430 PRINT: PRINT "ROW";TAB(37%);"COLUMN": PRINT
4435 I%=10%
4440 FOR K%=C8% TO C9%: PRINT TAB(I%);C%(K%);: I%=I%+11%: NEXT K%: PRINT
4450 FOR I%=1% TO N9%: PRINT I%;TAB(5%);
4460 FOR J%=C8% TO C9%: PRINT USING " #####.####",X(I%,J%);: NEXT J%: PRINT
4470 NEXT I%: PRINT: PRINT
4480 IF M9%=C9% GOTO 4500
4490 C8%=C9%+1%: C9%=M9%: GOTO 4420
4500 INPUT "DO YOU WISH TO SAVE THE DATA USED IN THIS ANALYSIS";A$: IF LEFT(A$,1%)<>"Y" GOTO 4205
4510 INPUT "PLEASE TYPE IN A NEW OUTPUT FILE NAME";A$: IF LEFT(A$,3%)="KB:" OR LEFT(A$,1%)="*" THEN PRINT "STAT11 CANNOT SAVE YOUR DATA ON THE KEYBOARD": GOTO 4510
4520 ON ERROR GOTO 4600
4530 OPEN A$ AS FILE 3%
4535 ON ERROR GOTO 9000
4540 DIM #3%, C(250%,15%)
4550 FOR I%=1% TO N9%: FOR J%=1% TO M9%
4560 C(I%,J%)=X(I%,J%)
4570 NEXT J%: NEXT I%
4580 C(0%,0%)=N9%: C(1%,0%)=M9%
4590 CLOSE 2%: PRINT "OUTPUT FILE ";A$;" CREATED AS A VIRTUAL MATRIX"
4595 PRINT "WITH DIMENSION OF";N9%;" BY";M9%: PRINT: GOTO 4205
4600 IF ERR=28% THEN GOSUB 10000: RESUME 4205
4610 INPUT "DO YOU WISH TO TRY AGAIN";A$: IF LEFT(A$,1%)="Y" GOTO 4510
4620 ON ERROR GOTO 0
4630 GOTO 4205
5500 PRINT: PRINT A1$(8%) ! HERE THE USER IS ASKED IF HE WANTS A TABLE OF RESIDUALS PRINTED. IF HE DOESN'T, THEN CONTROL IS TRANSFERRED ABOVE TO THE 'MORE REGRESSION ?' QUESTION.
5510 INPUT "DO YOU WISH TO PRINT THE TABLE OF RESIDUALS";A$
5515 IF LEFT(A$,1%)="Y" GOTO 5590
5520 IF LEFT(A$,1%)="N" OR A$="" GOTO 4190
5550 PRINT: PRINT "IF YOU WISH TO PRINT Y OBSERVED, Y ESTIMATED, RESIDUAL, AND": PRINT "STANDARDIZED VALUE OF RESIDUAL FOR EACH CASE, TYPE YES": GOTO 5500
5580 ! THIS SECTION PRINTS THE TABLE OF RESIDUALS.
5590 PRINT: PRINT "OBS. NO. Y OBSERVED Y ESTIMATED RESIDUAL STD. RESID."
5610 FOR J%=1% TO N9%: E1=B1
5630 FOR K%=0% TO V1-1%: K1%=P%(K%): E1=E1+B(K%)*X(J%,K1%)
5660 NEXT K%
5670 R6=X(J%,M9%)-E1: IF S5=0 THEN S6=0: GOTO 5690
5675 S6=R6/S5
5690 A$=" ### #######.### #######.### #######.### #####.###"
5700 PRINT USING A$,J%,X(J%,L%),E1,R6,S6
5710 NEXT J%
5720 GOTO 4190 ! THIS SECTION CONTAINS SEVERAL ERROR MESSAGES AND PRINTLINES. NOTE THAT LINES 6000 - 6008 ARE EXECUTED AT THE BEGINNING OF THE PROGRAM IN ORDER TO STORE SEVERAL PRINTLINES IN THE A1$ ARRAY.
5760 PRINT: PRINT: PRINT "A VALUE OF ZERO FOR AN ELEMENT OF THE ";"DIAGONAL OF THE": PRINT "CROSS-PRODUCT MATRIX HAS BEEN COMPUTED. THIS IS BECAUSE": PRINT "VARIABLE";C%(J%);" HAS A CONSTANT VALUE FOR ALL OBSERVATIONS"
5770 PRINT "YOU MUST DELETE THIS COLUMN FROM THE MATRIX BEFORE YOU": PRINT "CAN DO STEPWISE REGRESSION"
5780 GOTO 4200
6000 DIM A1$(8%): A1$(0%)="NO OTHER VARIABLES REDUCE ANY MORE OF THE VARIANCE OF Y": A1$(8%)="TYPE 'HELP' FOR AN EXPLANATION"
6001 A1$(1)="YOUR DATA DOES NOT CONTAIN VARIABLE ##.## TYPE AGAIN, PLEASE"
6002 A1$( 2 )='SUM OF SQUARES REDUCED IN THIS STEP.... #######.####'
6003 A1$( 3 )='PROPORTION OF VARIANCE OF Y REDUCED.... ##.####'
6004 A1$( 4 )='PARTIAL F (D.F. = 1,###)............... ####.####'
6005 A1$( 5 )='CUMULATIVE SUM OF SQUARES REDUCED...... #######.####'
6006 A1$( 6 )='CUMULATIVE PROPORTION REDUCED.......... ##.#### (OF #######.####)'
6007 A1$(7%) ='MULTIPLE CORRELATION COEFFICIENT.......... ####.####'
6008 RETURN
6009 IF ERR=28% THEN GOSUB 10000: RESUME 4205
6020 PRINT "YOU TYPED NON-NUMERIC CHARACTERS FOR THE VARIABLE NUMBER" : PRINT "PLEASE TYPE ONLY NUMBERS WHEN NUMBERS ARE REQUESTED" : IF I9%=9% THEN RESUME 1820 ELSE RESUME 1560
09000 IF ERR=28% THEN GOSUB 10000: RESUME 4205
09010 ON ERROR GOTO 0
10000 V0$=SYS(CHR$(6%)+CHR$(-7%)): RETURN ! CTRL/C TRAP
32750 END