
JMP Features and Case Studies

Source: 九壹网

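Mouse Actions:

Responding to you anytime, anywhere

Many statistical packages respond to the user in only a limited way: the data and the results just sit there, reacting now and then to a command, and the reports that describe the results are lifeless. JMP is a dynamic data analysis system. Every window displayed on your desktop is interactive. The data and the analyses change immediately according to your needs. Everything stays live until it is closed, and JMP responds at once to wherever you click.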

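Click your spreadsheet. The data table always appears in the familiar spreadsheet form, so there is nothing to relearn.

You can bring up pop-up menus at any time to redefine properties.

Click your analysis report. Click a report title to toggle between hiding and showing it. Click the pop-up menu icon to issue commands. Click and drag a corner of a graph to resize it. Double-click a row of a report table to change its format, and double-click a graph axis to change its scale.

Click your histogram. Use the "grabber" tool to drag your histogram; as you drag left or right, the bar interval changes immediately. Dragging left widens the intervals and dragging right narrows them. You can also drag up and down to shift the display position.

3D spinning plot. With the "grabber" tool, the 3D plot follows your mouse and rotates in real time in any direction.

A toolset for special operations. There are nine tools to match the operations you may need. The "arrow" tool is for general selecting and clicking. Click anywhere with the "question mark" tool to get relevant help. Use the "brush" tool to select the data points in a rectangular region. Use the "lasso" tool to select the data points in an irregular region. Use the "grabber (hand)" tool to move objects. Use the "scissors" tool to cut and paste. Use the "crosshair" tool to locate points precisely and read exact coordinates. Use the "magnifier" to zoom a graph. Use the "text tool" to add text and annotations.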

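Interactive Graphical Exploration:

Identifying data points

If a row in a JMP data table is selected and highlighted, everything related to it is highlighted at the same time.

Linked data windows: when you select a point in a scatterplot, the point is highlighted and its label appears beside it. But that is not all. Everything related to it, such as the corresponding record in the data table, is highlighted too. Because the data in all the windows are dynamically linked, the windows act in concert. This is the same behavior as in the SAS/Insight component.

Histogram highlighting: make histograms for several variables, for example age, sex, and weight. Then click the bar in the sex histogram that represents the males. You immediately see that all the male records in the data table are highlighted, and that the male portions or points in the age histogram and in the weight-by-height scatterplot are highlighted as well.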
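One way to picture this linking is several views that all read one shared set of selected rows. The Python sketch below illustrates that idea only; it is not how JMP is implemented, and the rows, column names, and helper functions are hypothetical.

```python
# Hypothetical rows; the column names are illustrative, not from a JMP sample file.
rows = [
    {"name": "A", "sex": "M", "age": 12, "weight": 40},
    {"name": "B", "sex": "F", "age": 13, "weight": 38},
    {"name": "C", "sex": "M", "age": 14, "weight": 52},
    {"name": "D", "sex": "F", "age": 12, "weight": 35},
]

# One shared set of selected row indices plays the role of the row selection.
selected = set()

def click_bar(column, value):
    """Select every row whose `column` equals `value` (like clicking a bar)."""
    selected.clear()
    selected.update(i for i, row in enumerate(rows) if row[column] == value)

def highlighted_counts(column):
    """Per-level count of selected rows: what another histogram would shade."""
    counts = {}
    for i in selected:
        level = rows[i][column]
        counts[level] = counts.get(level, 0) + 1
    return counts

click_bar("sex", "M")
print(sorted(selected))           # row indices highlighted in the table
print(highlighted_counts("age"))  # shaded portion of each age bar
```

Because every view derives its highlighting from the same set, no view needs to know about any other.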

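Properties: once you have selected the records you need, you can assign them special properties. A record's properties are shown as icons in the first column of the data table. The icons indicate, respectively, that a point is hidden, that it is excluded from analyses, and that it is labeled.

Use the color palette and the marker selector to make your data stand out. For example, color every point whose sex is "male" red, or change the marker of every point with a weight above 70 kg to a circle. This lets you see more information in a simple two-dimensional plot.

Finally, don't forget that you can select a group of data points by holding down the mouse button and dragging, and that you can use the "magnifier" tool for continuous zooming.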

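Data Tables and Calculation:

A powerful table

JMP presents the data table as a spreadsheet so that you can always see your data. This matters. Unlike a database, it does not force you to run queries to retrieve the data. Data are the raw material of all analysis and discovery, and a spreadsheet is the simplest way to put them in front of you.

The data table window has a good interactive interface. Point and click a cell to edit it; point and click a column name to rename it. Click a pull-down menu to choose properties; point and click the header of a row or column to select it. Drag across a group of cells to select them, and you can then copy and paste them. Click and drag a column border to change its width. Double-click a column to highlight it and change its information. You can even open several windows on the same data table and view them independently. A value changed in one window changes in the other windows as well.

The Tables menu gives you seven ways to turn the original data table into a new one:

• Subset: make a subset of the selected rows and columns
• Sort: sort by the values of variables in ascending or descending order
• Stack columns: produce a long, narrow table
• Split columns: produce a short, wide table
• Transpose: swap rows and columns
• Concatenate: join tables top to bottom
• Join: join tables side by side

For example, you can use a join to match up the observations of two tables.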
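To make the reshaping commands concrete, here is a small Python sketch of Stack, Split, Concatenate, and Join on lists of dictionaries. It only mirrors the ideas; the function names, the `label` and `value` column names, and the data are assumptions made for this illustration and are not JMP's.

```python
# Hypothetical wide table: one row per subject, one column per measurement.
wide = [
    {"id": 1, "before": 10, "after": 12},
    {"id": 2, "before": 20, "after": 19},
]

def stack(table, id_col, value_cols):
    """Stack columns: one output row per (input row, stacked column)."""
    return [
        {id_col: row[id_col], "label": col, "value": row[col]}
        for row in table
        for col in value_cols
    ]

def split(table, id_col):
    """Split columns: the inverse of stack, one output row per id."""
    out = {}
    for row in table:
        out.setdefault(row[id_col], {id_col: row[id_col]})[row["label"]] = row["value"]
    return list(out.values())

def concatenate(top, bottom):
    """Concatenate: place one table under the other."""
    return top + bottom

def join(left, right, key):
    """Join: match rows side by side on a shared key column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

long = stack(wide, "id", ["before", "after"])
groups = [{"id": 1, "group": "x"}, {"id": 2, "group": "y"}]
print(len(long))                  # long and narrow: two rows per subject
print(split(long, "id") == wide)  # splitting undoes stacking
print(join(wide, groups, "id")[0])
```

Subset and Sort need no helper in this representation: a list comprehension and `sorted(table, key=...)` play those roles.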

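Group/Summary lets you group and summarize in a powerful way. Selecting a group also selects all the corresponding observations in the original data table. If you now return to the original table and run the Subset command, you get the selected group in a new data table.

The Tables menu also contains JMP's powerful design of experiments module. You choose the type of design you need and JMP builds the run table, which makes it convenient to collect and analyze the data.

The Calculator computes a new variable. Choose New Column from the Columns menu; a new-column dialog appears, in which you enter a name and set the properties. Choose Formula from the data source pop-up menu and click OK. A calculator window then appears so that you can enter a formula. To build the ratio of weight and height, first click height in the column panel, click the divide button, and then click weight. You will see the formula displayed. Click the close button; once the window is closed, the formula is evaluated automatically. If you change a value of weight, the ratio column updates automatically; if you add a new row, its ratio is computed as well. As in a spreadsheet, formulas are evaluated live.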
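The live behavior of a formula column can be sketched in Python by recomputing the value from the stored columns on every read. This is an illustration under assumptions: the `Table` class is hypothetical, the numbers are made up, and the operand order (height divided by weight) simply follows the click sequence described above.

```python
class Table:
    """A tiny table whose formula columns are recomputed on every read."""

    def __init__(self):
        self.rows = []        # stored values, one dict per row
        self.formulas = {}    # column name -> function of a row

    def add_formula(self, name, func):
        self.formulas[name] = func

    def value(self, i, column):
        row = self.rows[i]
        if column in self.formulas:
            return self.formulas[column](row)  # evaluated live, never cached
        return row[column]

t = Table()
t.rows.append({"height": 60.0, "weight": 120.0})
# Assumed operand order: height, divide, weight (the clicks described above).
t.add_formula("ratio", lambda r: r["height"] / r["weight"])

before = t.value(0, "ratio")
t.rows[0]["weight"] = 240.0                       # edit one stored value
after = t.value(0, "ratio")                       # the ratio follows the edit
t.rows.append({"height": 50.0, "weight": 100.0})  # a new row gets a ratio too
new_row = t.value(1, "ratio")
print(before, after, new_row)
```

Storing the formula rather than its results is what keeps the column correct after edits and added rows.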

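Data Analysis:

Statistics made understandable

One trouble with statistics is that it has too many methods. Large statistical packages come with thousands of pages of procedure documentation. Whether through menus or higher-level systems, they do their best to include every method you might want. On top of that, you have to string several methods together in the right order to get the report you want. If most of us cannot program our VCRs, how can we expect to approach a field as large and complex as statistics? JMP's approach is to:

• Select a unified, popular family of methods and present them graphically.
• Base the analysis on the modeling type associated with each variable.
• Organize the methods into platforms, each handling one general situation.

Modeling types are JMP's most distinctive feature. In the data table you specify how JMP treats a variable's values. There are three choices: nominal if the values are unordered categories, ordinal if the values are ordered categories, and continuous if the values are to be treated as continuous numbers. Because the property is set in the data table, you do not have to declare it later with something like a CLASSES statement. It helps determine the type of analysis.

JMP organizes methods into platforms, one for each statistical situation. Each platform can have several personalities that depend on the modeling types of the variables. For example, when you choose Fit Y by X, the platform runs one of the four different analyses shown on the help screen, according to the combination of modeling types.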
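The dispatch on modeling types can be sketched as a lookup keyed by the (Y, X) pair. The Python below is an illustration, not JMP code. The text above names only the one-way case explicitly; the other three entries are the usual pairing for this platform, and the labels are descriptive rather than JMP menu names.

```python
# Mapping used by this sketch; keys are (kind of Y, kind of X).
ANALYSES = {
    ("continuous", "continuous"): "scatterplot with regression fits",
    ("continuous", "categorical"): "one-way layout with analysis of variance",
    ("categorical", "continuous"): "logistic regression",
    ("categorical", "categorical"): "contingency table with mosaic plot",
}

def kind(modeling_type):
    """Collapse nominal and ordinal into one categorical class."""
    if modeling_type == "continuous":
        return "continuous"
    if modeling_type in ("nominal", "ordinal"):
        return "categorical"
    raise ValueError(f"unknown modeling type: {modeling_type!r}")

def fit_y_by_x(y_type, x_type):
    """Return the analysis implied by the (Y, X) modeling types."""
    return ANALYSES[(kind(y_type), kind(x_type))]

print(fit_y_by_x("continuous", "nominal"))
print(fit_y_by_x("nominal", "nominal"))
```

Keeping the modeling type with the column means the user states it once and every later analysis can read it.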

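Once the platform has produced its initial display, you can naturally dig deeper. Suppose, for example, that you want to measure the response under several treatments. You first run the Fit Y by X platform. Because the response Y is continuous and the treatment X is nominal, you get the one-way personality of the platform, which produces a vertical scatterplot for each treatment. A pop-up menu offers the appropriate means displays and the analysis of variance. You can choose Fit Quantiles to look at the distribution within each group. Suspecting that the variances are unequal, you use the Test Variances are Equal command in the same menu, which produces four tests of homogeneity of variance. Concluding that the variances differ, you turn to the Welch ANOVA report that was produced along with the homogeneity tests. You can also run nonparametric analyses from a different menu choice, or request means comparisons. Each step suggests the next.

JMP shows results graphically and integrates the graphs into the report. This contrasts with other products, in which statistics and graphs are usually kept separate. In JMP the text results and the graphs share one window and support each other.

JMP anticipates. When you test a model with categorical effects, you automatically get least squares means without asking for them. If there are replicated observations and the model is not saturated, you automatically get a lack-of-fit report without asking for it. When options apply, they are available on the spot; you do not have to go back to the analysis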

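Data Analysis:

Ready for every contingency

A categorical response is modeled through the probability of each response level. The key question is whether the probabilities of the response levels are a function of the X level or are constant across the X levels. The result is shown as a mosaic plot. The X axis is first divided in proportion to the total size of each X level. The Y axis is then divided within each X level according to the proportion of each response category. If the response proportions are the same in every X group, the divisions line up across the columns.

The chi-square test of marginal homogeneity is listed in the text report. (If both variables are regarded as responses, this test is equivalent to the test of independence.)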
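For reference, the Pearson form of this chi-square statistic compares each observed count with the count expected if the response proportions were the same at every X level. The Python sketch below computes the statistic and its degrees of freedom for a two-way table. The counts are hypothetical, and the code does not compute a p-value or reproduce JMP's report.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if response proportions were equal across X levels.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are X levels, columns are response levels.
counts = [[30, 10], [20, 20]]
stat, dof = chi_square(counts)
print(round(stat, 3), dof)
```

A table whose rows have identical proportions, such as `[[10, 20], [20, 40]]`, gives a statistic of zero, which corresponds to mosaic divisions that line up across the columns.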

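Correspondence analysis. JMP always tries to give a graphical presentation of the data. Contingency tables and mosaic plots are good at showing when the data are marginally homogeneous, but they are less useful for describing how the levels differ, especially when there are many levels.

To learn more from the data you can request a correspondence analysis plot, a technique championed by the French, which shows the relationships among the rows and columns of a contingency table. There is one point for each row and one for each column. Row points that lie close together indicate rows of the table with similar profiles. The technique is symmetric, so the similarity of column profiles shows up in the same way.

In the example, the columns are Y values from 1 to 9 that give preference ratings for four different cheeses, and X takes the values A, B, C, and D. The associations are hard to see in the mosaic plot but easy to spot in the correspondence analysis plot.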

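Formulas:

The formula workshop

Every column in a JMP data table has access to its own calculator window. You compute values for a column by building a formula in the calculator window. Formulas can contain expressions, column names, subscripted column names, iteration, arithmetic, logical, comparison, and conditional operations, and many kinds of functions, including numeric, transcendental, character, and date and time functions, as well as parameters and temporary variables. You can become an expert with formulas, moving from simple calculations to complex iterative procedures that compute probabilities.

You can edit any formula. You can cut and paste a formula into another data table, and you can also copy and paste it as a picture or as text.

The results of the statistical platforms include not only tables and graphs but also output data. In JMP you can save results in columns of the data table. In many cases you can save a formula as the definition of the column itself.

Saving prediction formulas

When you fit a model in JMP, one of the save options is to save the prediction formula. Suppose you save the formula for a fit of Y on X and a main effect with the three levels a, d, and f. A new column named Pred Formula Y appears in the data table. Double-click the top of this column and the formula picture, and you see the formula that produces the predicted values.

Calculator window

Staying up to date

Because it is a formula, it keeps its values up to date. If you change a value in a column, the formula value changes too and the prediction is recomputed. If you add rows to the data table, the formula computes predicted values for the new rows.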

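Model fitting

The prediction formulas shown in the calculator window apply to fits of continuous responses. For logistic regression with nominal or ordinal responses, you get a column with a formula for the fitted probability of each response level.

Saving principal components

For the principal components and rotated components from the spinning plot and Correlation of Y's platforms, the component scores are saved together with their formulas.

Saving discriminant scores

For a fit with multiple responses, you can save the discriminant scores together with their formulas. This includes columns for the Mahalanobis distances and posterior probabilities, and a column for the most likely level.

Copy and Paste:

Working together

People do not need one program that does everything; they need programs that work with other programs. One of the capabilities that makes modern software so powerful is that you can copy and paste data between applications. How might you use it? You can copy JMP pictures and paste them into a word processor, and sometimes you can paste them into a drawing program to touch them up.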

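Scissors

If you want to copy part of a picture rather than the whole picture, take the scissors from the Tools menu, then click and drag out a rectangle to create or extend the selection to be copied. Choose Copy, switch applications, and choose Paste (see Figure 1).

Figure 1

Figure 2

Nested objects

Drawing programs work especially well with JMP pictures because JMP groups the elements of a picture in nested form. For example, you can ungroup a picture until the line elements form a group of their own. The font and size of the labels are then easy to change. (See Figure 2.)

Text

Graphs and text go together. With Copy and Paste, graphs and text are combined into a single picture; there is no separate text. If you want only the text, use the Copy as Text command instead of the Copy command. This text can be inserted into a word processing document, and the tabs in the text can be used to build tables.

Output to a word processor

But can graphs stay pictures and text stay text? Yes. With the Journal command, the results from a window appear in an open journal window. You can scroll the JMP journal window and type or paste content into it, but it is not a full word processor. When you save the journal with the Save or Save As command, it is written in the word processing format you specify. You can then open the saved JMP journal in your word processor and continue working on it there.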

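Outlier Detection

Graphical Exploration:

They are called outliers because they are different: they lie far from most of the data. They may be erroneous data that will damage your analysis. Or they may be real phenomena waiting for you to discover and understand them and to put them to good use. Either way, you should pay attention to them.

In one dimension, they are simply extreme values and are easy to spot.

In two dimensions, outliers stick out in out-of-the-way directions. If the variables are correlated, you will see outliers that stick out in a two-dimensional direction rather than along either dimension separately. You can quantify how far such a point is off by measuring its distance from the normal data cloud. This distance is called the Mahalanobis distance.

In three dimensions, the 3D spinning plot is used to find three-dimensional outliers. If your data have more than three variables, you have to use other techniques. If your variables are all correlated, you will see that the data stretch out in certain directions and that the outliers stick out in out-of-the-way directions. So you can pick three main variables to make a 3D spinning plot and find the outliers.

In N dimensions, you can instead take the whole correlation matrix into account and compute the Mahalanobis distance of each observation, that is, its N-dimensional distance from the multivariate mean. But then every observation, including the one being measured, enters the estimates, so the measured distance depends on the point being measured, which affects the accuracy of the result. In this situation the jackknifed distance is better: the distance of each point is measured against the observations that do not include that point.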
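The difference between the two distances can be shown in two dimensions with plain Python. This is a minimal sketch under stated assumptions: the data are hypothetical, the helper names are invented for the illustration, and the jackknifed version simply re-estimates the mean and covariance without the point being measured, as the paragraph above describes.

```python
def mean_and_cov(points):
    """Sample mean and 2x2 covariance of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def distance(point, mean, cov):
    """Mahalanobis distance of `point` from `mean` under covariance `cov`."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form d' S^-1 d, with the 2x2 inverse written out.
    return ((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det) ** 0.5

def mahalanobis(points):
    """Distances measured against estimates that include every point."""
    mean, cov = mean_and_cov(points)
    return [distance(p, mean, cov) for p in points]

def jackknifed(points):
    """Each point is measured against the mean and covariance of the others."""
    out = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        mean, cov = mean_and_cov(others)
        out.append(distance(p, mean, cov))
    return out

# Hypothetical correlated data with one point (the last) off the main trend.
data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1), (6, 6.0), (3, 5.5)]
plain = mahalanobis(data)
jack = jackknifed(data)
print([round(d, 2) for d in plain])
print([round(d, 2) for d in jack])
```

In this example the last point is not extreme in x or in y alone, yet it has the largest distance under both measures, and its jackknifed distance is larger still because it no longer inflates the covariance it is measured against.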

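If you are fitting a model, you may want to know how much each observation influences the result. For this you can use a leverage plot. It shows the residual of an observation and the influence that the residual has on the model. If you want to uncover the information hidden in your data, making good use of JMP's powerful graphical tools will help a great deal.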

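The Importance of Good Experimental Design

When optimizing a set of factors, a common approach is to pick a starting value and then keep changing one factor to make the result as good as possible. But because of random or systematic influences, this approach often goes wrong, and the errors are hard to foresee. The biggest problem is that many parameters may affect the result and that the parameters are related to one another. As a result, after you have tuned parameter A and go on to tune parameter B, A is no longer at its optimum, because changing B also affects A. To bring all the parameters to their optimum you may need hundreds of trials, each making only a little progress, and you still lack an overall understanding of the parameters.

As the figure shows, suppose we use steps 1 to 5 to change factor 1 and find that the result keeps improving, until the result gets worse at step 6. So we move factor 1 back to its value at step 5 and conclude that, with factor 2 held constant, factor 1 = 0.9 is the best setting. Then, with factor 1 fixed, we use steps 7 to 10 to find the best setting of factor 2. We find that, with factor 1 fixed, factor 2 = -0.38 is the best setting.

Having done all this, have we found the settings that give the best result? No! Each best setting was found with the other factor held fixed, so it is not the global optimum. You could repeat the process again and again and perhaps creep slowly toward the answer. The alternative is to design the experiment first and estimate the whole response surface accurately with fewer trials.

A more scientific approach is to work out all the parameters that could affect the result and then select the 12 most important ones for study. JMP can help you build a screening design that needs only 16 runs. You enter the results back into JMP and immediately learn which three factors matter most. JMP then produces a response-surface design that needs only 20 runs. After that you have an overall picture of the response surface, know where the best settings are, and know how the process changes.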
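The 20-run figure is consistent with one common response-surface layout for three factors, a central composite design with 8 corner runs, 6 axial runs, and 6 center runs. Whether the text means exactly this design is an assumption. The Python sketch below generates such a design in coded units; the function name and the default of six center points are choices made for the illustration.

```python
from itertools import product

def central_composite(k, n_center, alpha=None):
    """Coded runs of a central composite design for k factors."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25          # rotatable axial distance
    corners = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            run = [0.0] * k
            run[i] = sign * alpha         # one factor at +/- alpha, rest at center
            axial.append(run)
    center = [[0.0] * k for _ in range(n_center)]
    return corners + axial + center

design = central_composite(k=3, n_center=6)
print(len(design))   # corner + axial + center runs
for run in design[:3]:
    print(run)
```

The corner runs estimate main effects and interactions, the axial runs add the curvature terms, and the repeated center runs give an estimate of pure error.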

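Designing Experiments with JMP

Experiments play a very important role throughout quality control. They are a key guarantee that product quality and the production process keep improving.

If you can measure the quality of your product quantitatively, and you can control the factors that affect it, then you can run experiments to find out how to set those factors so that product quality is optimized. Experiments are usually expensive, and everyone wants to get the most information from the fewest runs.

Statistical methods for experimental design are now well developed. With their help you can produce excellent designs that improve product quality and lower cost at the same time. Good software also steers you clear of complicated formulas and calculations and helps you choose the right experiment.

Good software promotes good experimental design. Even if you are an expert in experimental design, you need good software to assist you. And if you do not want to spend the time and resources to become an expert, then with some basic knowledge, a little guidance, and good software you can still do impressive design work. In JMP, the DOE tools offer a design choice for almost every situation.

The first step in using them is to choose the type of design. You then follow the steps to run your experiment and obtain the results. For example, you can build a two-level screening design to measure the effect of each factor on the result with as few runs as possible, or a response-surface design to optimize the parameter settings.

JMP Case Study: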

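Which Cleaning Polymer Is Best?

Basic distribution analysis and ANOVA

Revised: 7/30/1999

Sample file: Cleansing.jmp

Introduction

These data come from a company that designs and manufactures chemical solvents and detergents.

The researchers want to find out which cleaning polymer does the best job of removing coal particles stuck to the walls of a tank. Every tank starts with 500 milligrams of coal particles on its walls. The researchers will test three different polymers: A, B, and C. Polymer A is the company's current product. The researchers also record the pH in the tank to see how different pH values affect the different polymers. They have decided to use six pH levels to observe the cleaning effect.

Procedure

I. Was the data entered correctly?

The simplest approach is to run a distribution analysis (Distribution), which shows graphically and numerically whether the data are valid.

1. Choose Analyze: Distribution of Y.

2. In the dialog box that appears, select all three variables, Coal particles, pH, and Polymer, then click Add.

3. Click OK.

The first thing is to check that the data fall within the expected ranges. Because every tank started with only 500 milligrams of coal particles, the values of Coal particles should lie between 0 and 500 milligrams. We also know that pH can only lie between 0 and 14. Any value outside these ranges is an obvious error. So, how do the data look?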
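The same range check can be written down explicitly. The Python sketch below flags any value outside the limits stated above (0 to 500 mg of coal particles, pH 0 to 14). The records are hypothetical and are not the contents of Cleansing.jmp; the function name is invented for the illustration.

```python
# Expected ranges taken from the text: 0-500 mg of coal particles, pH 0-14.
LIMITS = {"Coal particles": (0, 500), "pH": (0, 14)}

def out_of_range(rows, limits=LIMITS):
    """Return (row index, column, value) for every value outside its limits."""
    problems = []
    for i, row in enumerate(rows):
        for column, (low, high) in limits.items():
            value = row[column]
            if not low <= value <= high:
                problems.append((i, column, value))
    return problems

# Hypothetical records, not the contents of Cleansing.jmp.
sample = [
    {"Coal particles": 260, "pH": 6, "Polymer": "A"},
    {"Coal particles": 510, "pH": 7, "Polymer": "B"},   # more than the 500 mg start
    {"Coal particles": 120, "pH": 15, "Polymer": "C"},  # impossible pH
]
problems = out_of_range(sample)
print(problems)
```

A histogram shows the same thing at a glance; the explicit check is useful when the table is too large to inspect by eye.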

The mosaic bar chart shows that we have each of the 3 types of polymers, and the frequency table beneath confirms that each type has 6 runs. We can also verify that each polymer has a full range of pH levels by clicking on each polymer to see where the data lie in the adjacent histogram of pH. We also note that the rows for each selected polymer are highlighted in the data table.

The Moments box is a terrific way to illustrate the help tool. The context sensitive help shows a description of the statistic with the formula used below. Try changing the tool to the ?, then clicking on a statistic in the moments box. To retrieve the pointer, click on the pointer tool on the JMP toolbar.

II. How can we determine which polymer is best?

One popular way to easily compare differences is with bar charts.

1. Select Graph: Bar/Pie Charts.

2. Select Coal particles as the Y and Polymer as the X. JMP displays a message asking us to summarize the data.

3. Click OK. The summary dialog appears and defaults to averaging Coal Particles for each Polymer.

4. Click OK.

Polymer C clearly jumps out as the winner with the lowest number of coal particles remaining in the tank.

Now, the chemists need to determine just how different C is from A. To switch from producing A to C involves a cost. If the difference is marginal, it might not be worthwhile to change manufacturing systems.

III. How can we tell how large the differences are?

To see whether the difference is significant, we can use analysis of variance, or ANOVA in Fit Y by X.

1. Select Analyze: Fit Y by X. In the dialog box, choose Coal Particles as the Y and Polymer and pH as the X.

2. Click OK.

3. Beneath the "Coal Particles by Polymer" scatterplot, choose Means, Anova/t-Test from the Analysis reveal button.

4. Beneath the "Coal Particles by pH" scatterplot, choose Fit Line from the Fitting reveal button to see the relationship.

What do the graphs tell us? From the means diamonds graph, we can see that polymer C has substantially fewer coal particles than polymer A, but it's hard to tell just how different polymers B and C are. By shift-clicking each of the polymer C points in this graph, their corresponding values are shown in the linear fit scatterplot. From the graph of Coal particles by pH, we see that there appears to be a positive relationship: as pH increases, the amount of coal left in the chamber increases.

The statistical output reinforces what we see in the graphs. The output for the differences in polymers (Prob>F) in the Analysis of Variance table shows a p-value of 0.0258 which is significant at the alpha=0.05 level. The regression output also shows evidence that there is a strong relationship between pH and Coal Particles. The Parameter Estimates table indicates a (Prob >|t|) of <.0001.
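For readers who want to see where the F ratio behind Prob>F comes from, here is a minimal Python sketch of the one-way analysis of variance. The measurements are hypothetical round numbers, not the Cleansing.jmp data, and the sketch stops at the F ratio and its degrees of freedom without computing the p-value.

```python
def one_way_anova(groups):
    """F ratio and degrees of freedom for a one-way layout."""
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical responses in arbitrary units: three polymers, six runs each.
runs = {
    "A": [4, 5, 6, 4, 5, 6],
    "B": [3, 4, 5, 3, 4, 5],
    "C": [1, 2, 3, 1, 2, 3],
}
f, df_between, df_within = one_way_anova(runs)
print(f, df_between, df_within)
```

A large F ratio means the group means differ by more than the run-to-run scatter would explain; the p-value reported by JMP is the tail probability of this ratio under the F distribution with the two degrees of freedom shown.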

IV. Do the polymers react differently with changing pH levels?

The method used to test whether the different levels of a factor change their effect with differing amounts of another variable is called analysis of covariance. We could use the Fit Model platform to perform this analysis. As an alternative, we will use the graphical and grouping capabilities of the Fit Y by X platform to visually determine whether pH affects the amount of coal removed by the different polymers.

1. Close the "Coal Particles by Polymer" results by clicking on the title above the scatterplot.

Let's first identify the groups by color and marker.

2. Select Color/Marker by Col.. from the Rows menu. Select Polymer as the variable and select the checkbox for marker, as well as for color.

3. Click OK.

4. Select Grouping Variable from the Fitting reveal button. In the dialog, select Polymer as the grouping variable.

5. Click OK.

6. Now that we have selected a grouping variable, select Fit Line from the Fitting reveal button and you'll see a different line fit for each polymer.

7. Use the Annotate tool to create the legend that appears in the yellow box on the graph. The Annotate tool is like putting a yellow sticky note on your graph. It allows you to place a text note on your graph for better documentation. The Annotate tool is located on the far right of the JMP toolbar. It appears as a letter A in a box.

To use the tool, just click on the tool and then move the cursor to the top left corner of the area you want the note. Draw a box with the left mouse button depressed. Let go of the mouse when your box is finished and type in your note. The only way to delete the note is to grab it with the mouse and drag it off the JMP window.

From the graph, we can see that polymer B has approximately as few coal particles as polymer C when at low pH levels, but as the pH levels increase, B approaches the range of polymer A. We can also see more clearly that polymer A consistently has more coal particles remaining than polymer C.

V. Conclusion

It looks like it will be a worthwhile investment switching production lines from making polymer A to making polymer C, since C is a consistent improvement over polymer A.

We also learned that polymer B may be just as good as polymer C at lower pH levels, so the researchers may be able to use that compound for different applications that don't require the broad range of adaptability.

JMP Case Study:

Which Breakfast Cereal Is Best, and Which Is Worst?

Distribution analysis and clustering

Revised: 7/30/1999

Sample data: Cereal98.jmp

Introduction

This is real data extracted from the side and back panels of cereal boxes. This example is a good illustration of how to navigate and explore using JMP. It will step through histograms and descriptive statistics, correlations and outlier detection, cluster analysis, and scatter plots. In the process we will discover what dynamic linking and statistical discovery is all about.

We will consider the data first as a consumer - with concerns about fat content, calories, and fiber per serving for instance, and secondly as a market researcher looking for undiscovered market segments.

Procedure

I. Basic Questions:

What manufacturers are represented?

Are there any "good" cereals that are simultaneously low in fat (< 2 grams) and calories (< 150) but high in fiber (>= 8 grams)? If so, which are they and who makes them?

These are questions suitably addressed by looking at the one-dimensional distributions of these variables and clicking on histogram bars corresponding to the questions asked.

JMP is a powerful discovery tool. One of the quickest ways to discovery is with the dynamic linking of the histograms. As you will see in the next few steps, all histograms (and other graphs) are linked together within a JMP data table. So click around and view the relationship of data. Get to know your data. You may find something interesting.

1. Choose Analyze: Distribution of Y.

2. In the dialog box, select four variables: Manufacturer, Calories, Fat, and Fiber, by clicking on each one separately and then clicking Add.

3. Click OK.

4. Click on the histogram for Nabisco. You will immediately see the Calories, Fat, and Fiber breakdown of Nabisco cereals.

5. Shift-click (to group more than one histogram bar) on all the histogram bars corresponding to fiber >=8. (This will select those cereals with high fiber.)

6. Notice the darker shaded area on the Calories and Fat histogram bars. Through dynamic linking we discover that some of these high fiber cereals are also low in Fat and Calories!

7. Control-click (to eliminate observations) the Calories histograms that are near or at 200. This will eliminate from view the high calorie cereals.

Shown here, in two pieces, are the histograms showing how the dynamic linking is seen. Some settings have been changed from the default settings to better display here. Histograms can be enlarged on screen by clicking in the bottom right corner of the histogram, grabbing the handle and adjusting the size.

You can see that, as in most of JMP, graphics come before the statistical text output.

8. Notice that even the data table view is dynamically linked. In the bottom left corner of the table view, it will display the number of observations that have been selected by all the clicking, shift-clicking and control-clicking you have done. As you can tell, we now have 4 observations selected.

9. Choose Tables: Subset. This captures all the data on this subset of cereals having these "good" criteria into a new data table. If you followed the previous steps, you will have a data table with 4 observations.

10. To keep the data tables in order, a good thing to do is to give the resulting data table a distinct, informative name. Select Window:Select Window Name. Type in a name, such as High in Fiber, Low in Fat & Calories.

Using this "mining" technique of shift-clicking for selection of cases and control-clicking histograms for de-selection lets us accomplish a wide variety of logical "ANDs" and "ORs" graphically without dialogs or coding. Additional questions you might be tempted to ask might be:

• What cereals does American Home (Products) make?
• What are the Kellogg cereals?
• What cereals are high in fat?
• What are all the low in fat cereals made by Kellogg?

All of these are answered with click selections or de-selections on histogram bars and a subset through dynamic linkage back to the underlying data table.
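The graphical selection logic corresponds to ordinary set operations on row indices: shift-clicking more bars takes a union, and control-clicking bars removes rows from the selection. The Python sketch below applies the thresholds quoted earlier (fat under 2 grams, calories under 150, fiber of at least 8 grams) to hypothetical cereals; the names, values, and bar boundaries are made up and are not rows of Cereal98.jmp.

```python
# Hypothetical cereals; the values are illustrative, not rows of Cereal98.jmp.
cereals = [
    {"name": "P", "calories": 60, "fat": 1, "fiber": 13},
    {"name": "Q", "calories": 200, "fat": 3, "fiber": 9},
    {"name": "R", "calories": 110, "fat": 0, "fiber": 1},
    {"name": "S", "calories": 90, "fat": 1, "fiber": 8},
]

def where(predicate):
    """Row indices that satisfy a predicate, like the rows under one bar."""
    return {i for i, row in enumerate(cereals) if predicate(row)}

# Shift-click adds bars to the selection: a union (logical OR).
selection = where(lambda r: 8 <= r["fiber"] < 10) | where(lambda r: r["fiber"] >= 10)
# Control-click removes bars from the selection: a set difference (AND NOT).
selection -= where(lambda r: r["calories"] >= 150)
selection -= where(lambda r: r["fat"] >= 2)

good = [cereals[i]["name"] for i in sorted(selection)]
print(good)
```

Taking a subset of the table then amounts to keeping only the rows whose indices remain in the selection.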

JMP is a smart, self-revealing, and instructional product.

Smart: Did you notice the report for Manufacturer was a frequency table while the report for Calories, Fat, and Fiber gave quantiles (percentages) and moments (statistics like means and standard deviations)?

JMP does the appropriate statistics based on the modeling type of the variable (nominal, ordinal, or continuous). For Distribution of Y, nominal or ordinal variables generate frequency counts. Continuous variables generate quantiles and moments. Mistakes in analysis specification are minimized with JMP.

Self-revealing: JMP displays both the graph and the report together, each reinforcing the other, just by selecting the variable once, but it has additional options available. In the bottom left corner of the window the () menu provides additional graphs and displays and the ($) menu lets you save additional statistics in the current JMP table. In addition, there are () pop-up choices within the window that will produce additional statistics. JMP reveals more and more with point and click selection choices instead of forcing you to return to main menus or dialog boxes.

Instructional: JMP has extensive context-sensitive help. In our example, selecting the (?) tool from the tools on the menu bar, placing it over the outlier box plot, and clicking once gives a picture and explanation as shown below. Likewise, placing it over the words Std Error Mean and clicking gives the formula for this statistic. With JMP, help is always just a (?) click away.

Modifying the look and the resizing of graphs is easy.

In this example, for instance, if we choose Horizontal Layout under (), the histograms appear as vertical bars. If we choose Normal Quantile Plot and Smooth Curve, these appear for each continuous variable displayed. To make the histogram for Manufacturer bigger, just click the pointer tool once, any place inside the chart, and a small stretch box appears in the very bottom-right of the chart. Place the tip of the pointer tool inside the stretch box and with the mouse down drag it to any size you want. The bottom left corner selections, pop-up menus, the (?) tool and stretch boxes are all common throughout JMP. By keeping statistics simple and visual, JMP allows you to explore, interact and better understand your data. The design is elegant. It doesn't overpower you with reports or graphs but lets you quickly click them open when needed.

II. What are the significant patterns, relationships, and outliers in this data?

1. Choose Analyze: Correlation of Y's.

2. Drag-select all the Variables from Calories through Potassium.

3. Click Add. Click OK.

4. To add to the default Correlation Table use () and successively select Correlations-Pairwise, and Scatterplot Matrix.

In this case the text report of pairwise Pearson Correlations is not nearly as revealing as the graphics that are available from the border check selection.

It is important to look at two dimensional outliers.

5. Shift-click all points on or outside the ellipses.

6. Select Rows and mark these points. Select Color to be red and Markers to be X.

7. With all outliers selected, go to Tables and choose Subset. This yields 19 cereals considered to be outliers from a two dimensional point of view.

To be more discerning, let's consider the N-dimensional outliers where N=9 which will take into account all the variables from Calories through Potassium.

8. Click on the () and Select Outlier Analysis.

9. From the () pop up choose Jackknife Distances and stretch the resulting graph of jackknifed distances.

Many of the two dimensional outliers lie on or above the dotted line, indicating that taking all 9 of the nutritional factors into account, they are still uniquely different.

10. Shift-click on the few that are significantly above the line and name these cereals by choosing Label/Unlabel under Rows.

Or, by using the lasso tool, select the cereals on or above the dotted line and choose Tables:Subset. The information is captured.

Results: Examining the subset, a fat conscious cereal eater will want to avoid 100% Natural Bran Oats & Honey. It has 9 grams of fat and supplies 30 Calories from fat.

Wheaties is a good all-around cereal but has a whopping 420 milligrams of sodium. If you are cutting down on salt intake you may want to avoid Wheaties. Special K is really only special in that it is high in protein. Fiber One and All-Bran with Extra Fiber both have very good qualities in high grams of fiber per serving and minimal calories, fat and sugars and also high values of potassium. But they are woefully lacking in complex carbohydrates.

III. From a consumer's point of view, we pose the question:

Which cereals are similar to each other and which ones are dissimilar? By using cluster analysis techniques we will find that many of the cereals kids eat are loaded with sugar; some cereals have nutritional "clones" no one expects; and some have fascinating cluster memberships.

1. Select Rows:Clear Row States to clear row states.

2. Select Analyze:Distribution of Y's and fill the Add dialog with all variables, Calories through Enriched.

3. Select Analyze:Cluster and fill the Y dialog with all variables Fat through Enriched.

4. Scroll below the dendrogram to the Cluster Distance plot at the bottom and stretch it vertically so you can see each "x" clearly.

Hint: The Cluster Distance plot has a point for each cluster join. The ordinate is the distance that was bridged to join the clusters at each step. Often there is a natural break where the distance jumps up suddenly. These breaks suggest natural cutting points to determine the number of clusters. For tight clusters choose n to be the number of clusters right before the jump.

You can get help by choosing the ? tool and clicking once on the Cluster Distance plot. In our case n=7 is a good choice indicating there are seven natural clusters of cereals.
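The rule of thumb in the hint can be written as a short Python function: find the largest jump between successive join distances and report how many clusters existed just before it. The distances below are hypothetical and are not taken from Cereal98.jmp; in practice the plot may show several candidate breaks, so the largest jump is only one reasonable choice.

```python
def clusters_before_largest_jump(join_distances):
    """Number of clusters left just before the largest jump in join distance.

    join_distances[k] is the distance bridged by the (k+1)-th join, in the
    order the joins were made, so len(join_distances) + 1 items were clustered.
    """
    n_items = len(join_distances) + 1
    jumps = [b - a for a, b in zip(join_distances, join_distances[1:])]
    k = max(range(len(jumps)), key=jumps.__getitem__)  # jump follows join k+1
    # After k+1 joins, n_items - (k+1) clusters remain.
    return n_items - (k + 1)

# Hypothetical distances for 10 items (9 joins); not taken from Cereal98.jmp.
distances = [0.2, 0.3, 0.35, 0.5, 0.6, 0.7, 2.4, 2.6, 3.0]
n = clusters_before_largest_jump(distances)
print(n)
```

Each join reduces the cluster count by one, which is why the answer is the number of items minus the number of joins made before the jump.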

5. Identify these 7 clusters by dragging the diamond side-to-side on the dendrogram (either the upper or lower diamond) until it reads as 7.

6. With 7 clusters identified we want to first color and mark the clusters and then save the cluster membership. We do this as options under () and ($).

Click on each cluster stem and the marginal distributions for each of the cereal constituents will appear in the histograms revealing the nature of each cluster.

7. Results:

Cluster 1, which contains bran cereals such as Fiber One, All-Bran with Extra Fiber, 100% Bran, and All-Bran is the cluster characterized by very high fiber and potassium and very low complex carbohydrates.

Cluster 2, which is a very big, homogeneous cluster containing many of your kids' favorites like Cap'n Crunch, Froot Loops, Trix and Honeycomb, is high in sugar and low in fiber, complex carbohydrates, and protein. These cereals offer little more than pure sugar!

Cluster 3 cereals have the most fat and calories from fat. They should be avoided if you are counting fat grams. It is a small group consisting of 100% Natural Bran Oats & Honey, Banana Nut Crunch and Cracklin' Oat Bran.

Cluster 4 is the "enriched" group. You'll get 100% of your daily requirement of vitamins and minerals with this group of Multi-Grain Cheerios, Product 19, Smart Start, Total Corn Flakes, and Total Whole Grain, but for all its promise the group is only mediocre on high fiber and low fat.

Cluster 5 is the low in sodium, high in complex carbohydrates, and high in protein group. It consists of wheat and oat cereals like Frosted Mini-Wheats, Quaker Oatmeal, and Shredded Wheat 'n'Bran.

Cluster 6, which contains many of the old traditional cereals such as Cheerios, Corn Flakes, and Rice Krispies, is low in sugar.

Cluster 7 is characterized by being high in protein and complex carbohydrates but unfortunately also high in calories from fat.

Traditional cereals like Wheaties and Grape-Nuts are in Cluster 7 along with many "fruit included" cereals such as Just Right Fruit & Nut, Low Fat Granola w raisins, and Mueslix Healthy Choice.

Here we show the dynamic linking between the dendrogram of cluster 4 and the discovery that it contains all the enriched cereals from the histogram.

8. Clones: By looking at early joins in the dendrogram we see which cereals are essentially clones (intentional or unintentional) of each other.

In cluster 1, Fiber One by General Mills, which was introduced later, is very similar in nutrient value (a clone) to All-Bran with Extra Fiber by Kellogg's.

9. The above example shows the real power of JMP as a discovery tool. Notice how we can identify the cluster in the dendrogram and discover its high fiber content in the histogram. We can then capture all the discovered cereals in a separate data table by clicking on Tables: Subset.

In cluster 2, Frosted Flakes and Honey Frosted Wheaties are clones even though one is a corn flake and the other a wheat flake. Also Lucky Charms and Frosted Cheerios are clones, as are Cap'n Crunch and Trix (different shapes but the "same" cereal).

An interesting clone in cluster 4 is Multi-Grain Cheerios and Total Whole Grain (the "same" cereal), but it comes in two different shapes and from two different companies!

In cluster 6 we have a much newer product, Complete Wheat Bran (by Kellogg's), being almost identical to a successful well-established cereal, Bran Flakes (by Post).

IV. Market Segmentation

Clones introduce the idea of market segmentation and market share penetration. Obviously Kellogg had captured the major market segment of super fiber rich cereals with All-Bran with Extra Fiber until General Mills introduced Fiber One to compete head to head with it. Likewise Post enjoyed a good market share with Bran Flakes until Kellogg introduced Complete Wheat Bran.

These decisions may have been competitor driven. A possibly better approach by companies is to be consumer driven. This is what we see evident in Nabisco's Shredded Wheat 'n'Bran cereal. The Bran in the cereal gives fiber, the wheat gives complex carbohydrates. This cereal, which incidentally has no sodium, is also high in potassium and protein, and low in calories from fat and sugar, all excellent attributes. The only problem is that a serving is 200 calories!

1. Select Analyze: Fit Y by X.

2. In the dialog box, select Fiber to be X and Complex Carbos to be Y.

3. Rescale the Y (5 to 45) and X (-2 to 20) axes, by double clicking on those areas.

4. Add tick marks and make gridlines as well.

5. Highlight, mark, and label all the cereals near the upper right corner. Note that we will not consider a combination of Almond Crunch w/Raisins with Fiber One because of its relatively high value of Fat and Calories compared to the other high complex carbohydrate cereals. For instance, Shredded Wheat has identical values for Fiber and Complex Carbos, so its combination is preferable.

6. Select the row corresponding to Almond Crunch with Raisins and choose Exclude under Rows. Shredded Wheat will appear in the Y by X Plots. We discover two in the lower right that are extra high in fiber, and four in the upper middle that are high in complex carbohydrates.

We see from the scatter plot that if we mixed Fiber One half and half with each of the labeled high complex carbohydrate cereals we would get 4 new cereals each greater in fiber and lower in calories than Shredded Wheat 'n'Bran alone.

Our contention is that if we, as consumers, can come up with cereals as good as or better than existing cereals, then the cereal manufacturers should be able to concoct better cereals to meet customer demands (Shredded Wheat 'n'Bran is a good example). The new concocted cereals are:

Shredded Wheat 'n'Bran (50%) + Fiber One (50%) name=SWnB+FO

Grape-Nuts (50%) + Fiber One (50%) name=GN+FO

Shredded Wheat 2B (50%) + Fiber One (50%) name=SW2B+FO

Shredded Wheat sp size (50%) + Fiber One (50%) name=SWSS+FO

7. Using the Cereal98 data table, choose Rows:Add Rows.

8. In the Dialog box type 4 to add 4 rows.

9. Type in our concocted cereal names and fill in the appropriate columns with the correct mean values for each "new" cereal.

10. Select Save As under File and save it as Cereal98plus4.jmp.

11. Select Analyze:Fit Y by X with X being Fiber and Y being Complex Carbos. Highlight and label the same ones as before as well as our four "new" cereals.

To show just how good these made up cereals are, consider SW2B+FO vs. Shredded Wheat 'n'Bran. If we go ahead and subset just these two we find that the "made-up" cereal beats Shredded Wheat 'n'Bran in several key categories:

• almost half the calories
• higher fiber
• lower fat
• lower sugar

V. Conclusion

Even without fruit on the top, it is possible to find cereals off the shelf that are high in fiber, potassium, and complex carbohydrates, and low in calories, fat, sodium, and calories from fat.

They can be enhanced by mixing them with other complementary cereals. Have a wonderful breakfast!
