Correspondence Analysis: SAS Lecture Notes 12

Source: 九壹网

Correspondence Analysis with SAS Programs

May 2010

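I. The statistical idea behind correspondence analysis
II. Principles of correspondence analysis
III. SAS programs for correspondence analysis, with applications
IV. Exercises on correspondence analysis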

Section 1. Basic Theory of Correspondence Analysis

Correspondence analysis was proposed in 1970 by the French statistician J.-P. Benzécri.

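Correspondence analysis is a method for representing the associations in a frequency or count table graphically. In essence, it is a technique for displaying associations in a low-dimensional space. Correspondence analysis reveals the relationships between variables by analyzing the cross-tabulation (contingency table) formed from qualitative variables. It can show the differences among the categories of a single variable as well as the correspondence between the categories of different variables, and it can display the association between two variables in a single plot.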

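It is a multivariate statistical method developed from R-type and Q-type factor analysis, and is therefore also called R-Q factor analysis. Factor analysis uses a small number of common factors to capture most of the information about the objects under study, which both reduces the number of factors needed and preserves the relationships among those objects. Depending on what is being studied, factor analysis is divided into two types: R-type factor analysis examines the relationships among variables, and Q-type factor analysis examines the relationships among samples.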

Section 2. Principles of Correspondence Analysis

5. Plot the factor loadings as coordinates to obtain the correspondence analysis map.

总惯量2ni1j1pqpijpipjpipj2

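Each singular value is the square root of the corresponding inertia (eigenvalue). The inertia shows how much of the association between the two variables in the contingency table is explained by each dimension of the correspondence analysis.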
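As a minimal sketch of the total inertia formula, the Python snippet below computes the total inertia and the Pearson chi-square statistic of a two-way table. It uses the 6 x 3 health table from the example in Section 3. The function name `total_inertia` is an illustrative choice and is not part of SAS.

```python
# Counts from the Section 3 example: rows = self-rated health a..f,
# columns = self-care ability 1..3.
TABLE = [
    [129, 14, 8],
    [931, 146, 96],
    [660, 116, 74],
    [251, 104, 81],
    [11, 7, 23],
    [15, 13, 24],
]


def total_inertia(table):
    """Return (total inertia, grand total n) for a two-way table of counts."""
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]        # p_i.
    col_mass = [sum(col) / n for col in zip(*table)]  # p_.j
    inertia = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_mass[i] * col_mass[j]      # p_i. * p_.j
            inertia += (count / n - expected) ** 2 / expected
    return inertia, n


inertia, n = total_inertia(TABLE)
chi_square = n * inertia                      # Pearson chi-square = n * total inertia
dof = (len(TABLE) - 1) * (len(TABLE[0]) - 1)  # (rows - 1) * (columns - 1)
print(f"total inertia = {inertia:.5f}, chi-square = {chi_square:.3f}, df = {dof}")
```

For this table the result should agree, up to rounding, with the total inertia (0.09486), chi-square (256.418) and degrees of freedom (10) that PROC CORRESP reports for the example.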

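Section 3. SAS Programs for Correspondence Analysis

Example: self-care ability by self-rated health status (counts).

Self-rated health status: A = very good, B = good, C = fair, D = poor, E = very poor, F = no answer.

Self-care ability            A      B      C      D      E      F   Total
1 Fully independent        129    931    660    251     11     15    1997
2 Partly independent        14    146    116    104      7     13     400
3 Not independent            8     96     74     81     23     24     306
Total                      151   1173    850    436     41     52    2703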

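data ex2;
  /* zipin = self-rated health (a-f), zili = self-care ability (1-3), renshu = cell count */
  input zipin $ zili renshu;
  datalines;
a 1 129
a 2 14
a 3 8
b 1 931
b 2 146
b 3 96
c 1 660
c 2 116
c 3 74
d 1 251
d 2 104
d 3 81
e 1 11
e 2 7
e 3 23
f 1 15
f 2 13
f 3 24
;
run;

proc corresp data=ex2 all outc=result;
  tables zipin, zili;   /* rows = zipin, columns = zili */
  weight renshu;        /* each observation carries a cell count */
run;

%plotit(data=result, datatype=corresp)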

Chi-square decomposition table

The CORRESP Procedure
Inertia and Chi-Square Decomposition

Singular   Principal   Chi-                  Cumulative
Value      Inertia     Square     Percent    Percent
0.29615    0.08770     237.060      92.45      92.45
0.08463    0.00716      19.358       7.55     100.00
Total      0.09486     256.418     100.00
Degrees of Freedom = 10

The singular value is the square root of the principal inertia (eigenvalue). The inertia shows how much of the association between the two variables in the contingency table is explained by each dimension. The first dimension explains 92.45% of the total inertia.
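The decomposition can be cross-checked outside SAS. The sketch below is a pure-Python illustration, not the algorithm PROC CORRESP uses: it builds the matrix of standardized residuals S, finds the largest eigenvalue of S'S by power iteration, and takes the second principal inertia as the remainder of the total. That last step is an assumption that holds only because a 6 x 3 table has at most two nonzero dimensions. The function names are hypothetical.

```python
from math import sqrt

# Counts from the example: rows = self-rated health a..f, columns = self-care 1..3.
TABLE = [
    [129, 14, 8],
    [931, 146, 96],
    [660, 116, 74],
    [251, 104, 81],
    [11, 7, 23],
    [15, 13, 24],
]


def standardized_residuals(table):
    """S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j)."""
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]
    c = [sum(col) / n for col in zip(*table)]
    return [
        [(x / n - r[i] * c[j]) / sqrt(r[i] * c[j]) for j, x in enumerate(row)]
        for i, row in enumerate(table)
    ]


def leading_eigenvalue(s, iterations=200):
    """Largest eigenvalue of S'S, found by power iteration."""
    k = len(s[0])
    # Cross-product matrix A = S'S (k x k).
    a = [[sum(row[p] * row[q] for row in s) for q in range(k)] for p in range(k)]
    v = [1.0] + [0.0] * (k - 1)  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(a[p][q] * v[q] for q in range(k)) for p in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Av for the converged unit vector.
    return sum(v[p] * a[p][q] * v[q] for p in range(k) for q in range(k))


S = standardized_residuals(TABLE)
total = sum(x * x for row in S for x in row)  # total inertia = trace of S'S
inertia1 = leading_eigenvalue(S)
inertia2 = total - inertia1  # valid only because at most 2 dimensions exist here

for dim, lam in enumerate((inertia1, inertia2), start=1):
    print(f"Dim{dim}: singular value = {sqrt(lam):.5f}, "
          f"inertia = {lam:.5f}, percent = {100 * lam / total:.2f}")
```

The printed singular values, inertias and percentages should match the two rows of the decomposition table up to rounding.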

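Row Coordinates
        Dim1       Dim2
a    -0.2546    -0.0768
b    -0.1257    -0.0267
c    -0.0941    -0.0018
d     0.3384     0.1530
e     1.3810    -0.4086
f     1.1856    -0.1051

This is the Q-type factor loading matrix, that is, the factor loading matrix of the samples. It shows how strongly each sample loads on the first common factor (Dim1) and the second common factor (Dim2).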

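The loadings on the first factor show that a, b and c form a group with negative values while d, e and f are positive; the larger the value, the poorer the health status.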

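Column Coordinates
        Dim1       Dim2
1    -0.1590    -0.0216
2     0.2317     0.1920
3     0.7346    -0.1097

This is the R-type factor loading matrix, that is, the factor loading matrix of the variables. It shows how strongly each variable category loads on the first common factor (Dim1) and the second common factor (Dim2).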

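The first dimension reflects self-care ability: the smaller the value, the stronger the self-care ability. The plot then summarizes the relationship between the two variables jointly.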

Summary Statistics for the Column Points
     Quality      Mass    Inertia
1     1.0000    0.7388     0.2005
2     1.0000    0.1480     0.1412
3     1.0000    0.1132     0.6583

Summary Statistics for the Row Points
     Quality      Mass    Inertia
a     1.0000    0.0559     0.0417
b     1.0000    0.4340     0.0755
c     1.0000    0.3145     0.0294
d     1.0000    0.1613     0.2346
e     1.0000    0.0152     0.3317
f     1.0000    0.0192     0.2873

Mass is the marginal relative frequency, for example 0.0559 = 151/2703. Inertia is each point's relative contribution to the total inertia; e, f and d contribute the most.
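The Mass and Inertia columns for the row points can be reproduced directly from the counts. The following is a minimal sketch, assuming the usual definition of a row's inertia as its mass times the squared chi-square distance between its profile and the average profile. The name `row_summary` is illustrative only.

```python
LABELS = ["a", "b", "c", "d", "e", "f"]
# Counts from the example: rows = self-rated health a..f, columns = self-care 1..3.
TABLE = [
    [129, 14, 8],
    [931, 146, 96],
    [660, 116, 74],
    [251, 104, 81],
    [11, 7, 23],
    [15, 13, 24],
]


def row_summary(table):
    """Return a list of (mass, share of total inertia), one pair per row."""
    n = sum(sum(row) for row in table)
    col_mass = [sum(col) / n for col in zip(*table)]  # average row profile
    raw = []
    for row in table:
        row_total = sum(row)
        mass = row_total / n
        # Squared chi-square distance from the row profile to the average profile.
        dist2 = sum((x / row_total - c) ** 2 / c for x, c in zip(row, col_mass))
        raw.append((mass, mass * dist2))
    total_inertia = sum(inertia for _, inertia in raw)
    return [(mass, inertia / total_inertia) for mass, inertia in raw]


summary = row_summary(TABLE)
for label, (mass, share) in zip(LABELS, summary):
    print(f"{label}: mass = {mass:.4f}, inertia share = {share:.4f}")
```

Sorting the rows by inertia share puts e, f and d first, in line with the remark that these three categories contribute most to the total inertia.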

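The CORRESP Procedure
Squared Cosines for the Row Points
       Dim1      Dim2
a    0.9166    0.0834
b    0.9568    0.0432
c    0.9996    0.0004
d    0.8303    0.1697
e    0.9195    0.0805
f    0.9922    0.0078

Each entry is the squared cosine of the angle between the vector from the origin to the point and the corresponding coordinate axis. It shows which axis matters for that point.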
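The squared cosines can be recomputed from the row coordinates reported earlier. This sketch assumes that the two retained dimensions span the whole space (Quality = 1.0000 for every point), so the squared distance from the origin is simply Dim1² + Dim2². The helper name `squared_cosines` is hypothetical.

```python
# Row coordinates (Dim1, Dim2) as reported in the Row Coordinates table.
ROW_COORDS = {
    "a": (-0.2546, -0.0768),
    "b": (-0.1257, -0.0267),
    "c": (-0.0941, -0.0018),
    "d": (0.3384, 0.1530),
    "e": (1.3810, -0.4086),
    "f": (1.1856, -0.1051),
}


def squared_cosines(point):
    """Squared cosine between the origin-to-point vector and each axis."""
    dist2 = sum(x * x for x in point)  # squared distance from the origin
    return tuple(x * x / dist2 for x in point)


cos2 = {label: squared_cosines(xy) for label, xy in ROW_COORDS.items()}
for label, (dim1, dim2) in cos2.items():
    print(f"{label}: Dim1 = {dim1:.4f}, Dim2 = {dim2:.4f}")
```

Because the inputs are coordinates rounded to four decimals, the results may differ from the SAS table in the last digit.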

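Indices of the Coordinates that Contribute Most to Inertia for the Row Points
     Dim1    Dim2    Best
a       0       0       2
b       0       0       1
c       0       0       1
d       2       2       2
e       2       2       2
f       1       0       1

This table indicates which points best explain the inertia of each dimension. The Best column gives the dimension to which the point contributes most.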
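The Best column can be cross-checked against the partial contributions to inertia, which PROC CORRESP also tabulates for the row points. The sketch below assumes the standard definition, mass times squared coordinate divided by the inertia of that dimension, and takes Best to be the dimension with the larger share. The function name `partial_contributions` is illustrative.

```python
LABELS = ["a", "b", "c", "d", "e", "f"]
ROW_TOTALS = [151, 1173, 850, 436, 41, 52]  # row totals of the health table
# Row coordinates (Dim1, Dim2) as reported in the Row Coordinates table.
COORDS = [
    (-0.2546, -0.0768),
    (-0.1257, -0.0267),
    (-0.0941, -0.0018),
    (0.3384, 0.1530),
    (1.3810, -0.4086),
    (1.1856, -0.1051),
]


def partial_contributions(masses, coords):
    """Share of each dimension's inertia that is due to each point."""
    dims = range(len(coords[0]))
    # Inertia of a dimension = mass-weighted sum of squared coordinates.
    axis_inertia = [sum(m * xy[d] ** 2 for m, xy in zip(masses, coords)) for d in dims]
    return [[m * xy[d] ** 2 / axis_inertia[d] for d in dims]
            for m, xy in zip(masses, coords)]


n = sum(ROW_TOTALS)
masses = [t / n for t in ROW_TOTALS]
contrib = partial_contributions(masses, COORDS)
# Dimension (1-based) with the largest partial contribution for each point.
best = [max(range(len(row)), key=row.__getitem__) + 1 for row in contrib]

for label, row, b in zip(LABELS, contrib, best):
    print(f"{label}: Dim1 = {row[0]:.4f}, Dim2 = {row[1]:.4f}, best = {b}")
```

Since the coordinates are rounded, the shares can differ from the SAS output in the third or fourth decimal, but the resulting Best values should be the same.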

Partial Contributions to Inertia for the Row Points
       Dim1      Dim2
a    0.0413    0.0460
b    0.0781    0.0432
c    0.0317    0.0001
d    0.2106    0.5273
e    0.3299    0.3537
f    0.3083    0.0297

Example 2

An example from the SAS manual

How the choice of doctoral field varies by year

title "Number of Ph.D's Awarded from 1973 to 1978";

data PhD;
  /* Science is read from columns 1-19; the six yearly counts follow */
  input Science $ 1-19 y1973-y1978;
  label y1973 = '1973' y1974 = '1974' y1975 = '1975'
        y1976 = '1976' y1977 = '1977' y1978 = '1978';
  datalines;
Life Sciences       44 4303 4402 4350 4266 4361
Physical Sciences   4101 3800 3749 3572 3410 3234
Social Sciences     3354 3286 3344 3278 3137 3008
Behavioral Sciences 2444 2587 2749 2878 2960 3049
Engineering         3338 3144 2959 2791 21 2432
Mathematics         1222 1196 1149 1003 959 959
;
run;

proc corresp data=PhD out=Results short;
  var y1973-y1978;   /* columns of the table */
  id Science;        /* row labels */
run;

proc print data=Results;
run;

proc plot data=Results;
  plot Dim1*Dim2='*' $ Science;   /* label each point with its field */
run;

%plotit(data=Results, datatype=corresp, plotvars=Dim1 Dim2)

The CORRESP Procedure
Inertia and Chi-Square Decomposition

Singular   Principal   Chi-                  Cumulative
Value      Inertia     Square     Percent    Percent
0.04149    0.00172     119.400      91.34      91.34
0.010      0.00011       7.847       6.00      97.34
0.00708    0.00005       3.478       2.66     100.00
Total      0.00188     130.726     100.00
Degrees of Freedom = 15

Row Coordinates
                      Dim1       Dim2
Life Sciences      -0.0400     0.0068
Social Sciences    -0.0142    -0.0131
Engineering         0.0571    -0.0038
Mathematics         0.0515     0.0218

Column Coordinates
           Dim1       Dim2
1973     0.0513     0.0049
1974     0.0452     0.0035
1975     0.0093     0.0003
1976    -0.0200    -0.0165
1977    -0.0328    -0.0096
1978    -0.07       0.0175

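Exercises:

1. Data layout for correspondence analysis

A, B, C and D are excavation sites; P0-P6 are types of unearthed artifacts.

      P0    P1    P2    P3    P4    P5    P6
A     30    53    73    20    46    45    16
B     10     4     1     6    36     6    28
C     10    16    41     1    37    59   169
D     39     2     1     4    13    10     5

2. Mental health status by parents' socioeconomic status (1 = high, 5 = low)

                        1      2      3      4      5
Impaired               86     60     94     78     71
Mild symptoms         188    105    141     97     71
Moderate symptoms     112     65     77     54     54
Well                  121     57     72     36     21

3.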
