Systat Software Newsletter
SigmaPlot Data Smoothing
SigmaPlot provides seven data smoothing algorithms that should satisfy most smoothing needs: negative exponential, loess, running average, running median, bisquare, inverse square and inverse distance. Each smoother has options that make it very flexible. For example, unequally spaced data that occurs in clumps is better analyzed using the nearest neighbor bandwidth method than a fixed bandwidth method. Outlier rejection is also available in some smoothers.
Smoothing is used to elicit trends from noisy data. The three examples in Tukey’s book “Exploratory Data Analysis” (Addison-Wesley, 1977) show the need for smoothing beautifully. The trends in the U.S. gold production from 1872 to 1956, Figure 1A, are fairly clear.
Figure 1. Data with trends that are increasingly difficult to visualize.
Loess smoothed curves for the three examples in Figure 1 are shown in Figure 2. The smoothed curves in Figures 2A and 2B make the trends in the gold and wheat data very clear. The precipitation trend shown in Figure 2C remains difficult to see in the raw data. To check the loess result, a histogram of average rainfall in ten-year intervals was computed and superimposed on the smoothed curve. The histogram agrees well with the loess smooth.
The loess smoothing parameters were varied to achieve the best visualization. A polynomial degree of one was used in all cases. A sampling proportion of 0.1 was used in Figures 2A and 2B, and 0.3 in Figure 2C. Since the data was unequally spaced along the x axis, the nearest neighbor bandwidth method was used. The default number of intervals (100) for generating the smooth curve was found to be the best. By default the curve is drawn with straight line segments between the computed curve points. Sometimes this leads to sharp corners in the smooth, so the spline interpolation line type (Smoothed (spline)) was used.
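The settings described above (degree 1, a sampling proportion, nearest neighbor bandwidth, 100 intervals) can be illustrated with a minimal Python sketch. This is not SigmaPlot's implementation: the tricube weight function, the rule for turning the sampling proportion into a neighbor count, and the names `tricube`, `loess_point` and `loess_curve` are all assumptions made for illustration.

```python
def tricube(u):
    """Tricube weight, a common loess kernel (assumed here, not SigmaPlot's documented kernel)."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_point(x, y, x0, proportion=0.1):
    """Degree-1 (local linear) regression at x0 with a nearest neighbor bandwidth."""
    n = len(x)
    # Neighbor count from the sampling proportion (at least 3 so a line can be fitted).
    k = min(n, max(3, round(proportion * n)))
    # Nearest neighbor bandwidth: distance from x0 to its k-th closest data point.
    h = sorted(abs(xi - x0) for xi in x)[k - 1]
    w = [tricube((xi - x0) / h) if h > 0 else 0.0 for xi in x]
    if sum(w) == 0.0:
        # Degenerate neighborhood (every neighbor sits exactly at distance h): equal weights.
        w = [1.0 if abs(xi - x0) <= h else 0.0 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return my  # all weighted points share one x value: fall back to the weighted mean
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    # Weighted least-squares line evaluated at x0.
    return my + (sxy / sxx) * (x0 - mx)


def loess_curve(x, y, proportion=0.1, intervals=100):
    """Evaluate the smooth at intervals + 1 equally spaced x locations."""
    lo, hi = min(x), max(x)
    step = (hi - lo) / intervals
    grid = [lo + i * step for i in range(intervals + 1)]
    return [(g, loess_point(x, y, g, proportion)) for g in grid]


if __name__ == "__main__":
    # Made-up, unequally spaced data purely for demonstration.
    xs = [0.0, 0.4, 1.1, 1.5, 2.7, 3.0, 3.2, 4.9, 5.5, 6.0]
    ys = [1.2, 1.9, 3.1, 4.2, 6.0, 7.3, 7.1, 10.8, 12.1, 12.9]
    for gx, gy in loess_curve(xs, ys, proportion=0.5, intervals=5):
        print(f"{gx:.2f}  {gy:.3f}")
```

Because the bandwidth is the distance to the k-th nearest point, the neighborhood widens automatically where the data is sparse and narrows where it is clumped, which is the reason for preferring it to a fixed bandwidth on unequally spaced data.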
Figure 2. Smoothed curves for the data in Figure 1. A ten-year average rainfall histogram is also shown.
Several of the smoothing methods, including loess, are based on local polynomial regression and the polynomial order is selectable. Increasing the order tends to include more high frequency components in the smooth. The effect of increasing the order from 1 (local linear regressions) to 2 (local quadratic regressions) is shown in Figure 3. The effect is to increase peak height magnitude and introduce additional high frequency components (wiggles) in B. A subsequent increase of the sampling proportion in C results in a smooth very much like the original for order 1 in A.
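The influence of the polynomial order can be sketched in Python with a local polynomial fit of arbitrary degree. The code below is an illustration under assumptions, not SigmaPlot's algorithm: it assumes tricube weights, a nearest neighbor bandwidth and distinct x values, and the names `solve` and `local_poly` are hypothetical.

```python
def solve(A, b):
    """Solve the small linear system A c = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * m
    for r in reversed(range(m)):
        s = sum(M[r][c] * sol[c] for c in range(r + 1, m))
        sol[r] = (M[r][m] - s) / M[r][r]
    return sol


def local_poly(x, y, x0, proportion, degree):
    """Weighted local polynomial fit of the given degree, evaluated at x0."""
    n = len(x)
    k = min(n, max(degree + 2, round(proportion * n)))
    # Nearest neighbor bandwidth (assumes distinct x values, so h > 0).
    h = sorted(abs(xi - x0) for xi in x)[k - 1]
    m = degree + 1
    A = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    for xi, yi in zip(x, y):
        u = abs(xi - x0) / h
        if u >= 1.0:
            continue                      # outside the neighborhood
        w = (1.0 - u ** 3) ** 3           # tricube weight (an assumption)
        t = xi - x0                       # center on x0 so the fit at x0 is the intercept
        powers = [t ** p for p in range(2 * m - 1)]
        for r in range(m):
            b[r] += w * powers[r] * yi
            for c in range(m):
                A[r][c] += w * powers[r + c]
    return solve(A, b)[0]


if __name__ == "__main__":
    # A noise-free parabolic peak of height 1 at x = 0.
    xs = [i / 20 - 1 for i in range(41)]
    ys = [1.0 - v ** 2 for v in xs]
    print("order 1 at the peak:", local_poly(xs, ys, 0.0, 0.3, 1))
    print("order 2 at the peak:", local_poly(xs, ys, 0.0, 0.3, 2))
```

On this peak the order 1 fit returns a value below the true height, because a straight line averages across the curvature, while the order 2 fit recovers the peak. With noisy data the same flexibility lets order 2 follow more of the rapid variation, consistent with the additional wiggles described above.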
Figure 3. Effect of increasing the regression polynomial order. The order is 1 and sampling proportion is 0.1 in A. The order is increased to 2 in B and then the sampling proportion is increased to 0.2 in C.
Three Dimensional Smoothing
This data is relatively sparse, so a large sampling proportion of 0.6 was required to avoid oscillations and spikes in the loess surface. A polynomial degree of 1 and the nearest neighbor bandwidth method were used. The Preview feature allows a quick comparison of smoothing methods on a given data set. For this data, essentially equivalent smooth surfaces were obtained with the negative exponential and bisquare methods.
"Smoother" is a generic name for a variety of techniques that can be used either to smooth a data set by removing undesired high-frequency components (locations of rapid variation, such as noise contamination), or to resample dependent variable values at other independent variable locations using the values of the data at nearby points. The smoothing methods provided in SigmaPlot operate by weighting the data in a neighborhood of the smoothing location and applying linear or non-linear methods to combine the weighted values to produce a smoothed value. These non-parametric smoothing techniques complement the parameterized curve and surface fitting facility (Regression Wizard) in SigmaPlot. For data subject to measurement error, noise and similar effects, either approach can be used to predict behavior or to estimate true values.
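The idea of combining the values in a neighborhood by a linear or a non-linear rule can be sketched in Python with the two simplest cases, a running average and a running median. This is a sketch under assumptions rather than SigmaPlot's code: it uses a fixed bandwidth with equal weights, and the function names are hypothetical.

```python
from statistics import median


def neighborhood(x, y, x0, bandwidth):
    """Return the y values whose x lies within a fixed bandwidth of the location x0."""
    return [yi for xi, yi in zip(x, y) if abs(xi - x0) <= bandwidth]


def running_average(x, y, x0, bandwidth):
    """Linear combination: the equally weighted mean of the neighborhood."""
    vals = neighborhood(x, y, x0, bandwidth)
    return sum(vals) / len(vals) if vals else float("nan")


def running_median(x, y, x0, bandwidth):
    """Non-linear combination: the median of the neighborhood, which resists outliers."""
    vals = neighborhood(x, y, x0, bandwidth)
    return median(vals) if vals else float("nan")


if __name__ == "__main__":
    # Flat made-up data with a single outlier at x = 3.
    xs = [0, 1, 2, 3, 4, 5, 6]
    ys = [1.0, 1.0, 1.0, 100.0, 1.0, 1.0, 1.0]
    print("average at x = 3:", running_average(xs, ys, 3, 1))
    print("median at x = 3: ", running_median(xs, ys, 3, 1))
    # x0 does not have to be a data location, so the same calls also resample.
    print("average at x = 2.5:", running_average(xs, ys, 2.5, 1))
```

The average is pulled strongly toward the outlier while the median ignores it. A fixed bandwidth can also leave a neighborhood empty in a gap between clumps of data (the sketch returns NaN in that case), which is one motivation for the nearest neighbor bandwidth method.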
The kernel used in the smoothing computation and the smoothing method are given in the following table.
The equations used for each kernel are: