Before starting any analysis, classify the data set as continuous or attribute; in some cases it is a blend of both types. Continuous data are variables that can be measured on a continuous scale, such as time, temperature, strength, or monetary value. A quick test is to divide the value in half and see whether it still makes sense.
Attribute, or discrete, data can be assigned to defined groups and then counted. Examples are classifications of positive and negative, location, vendors’ materials, product or process types, and scales of satisfaction such as poor, fair, good, and excellent. Once an item is classified it can be counted and the frequency of occurrence can be determined.
Another determination to make is whether the data is an input variable or an output variable. Output variables are often known as the CTQs (critical to quality characteristics) or performance measures. Input variables are what drive the resulting outcomes. We generally characterize a product, process, or service delivery outcome (the Y) as some function of the input variables X1, X2, X3, … Xn. The Y’s are driven by the X’s.
The Y outcomes can be either continuous or discrete data. Examples of continuous Y’s are cycle time, cost, and productivity. Examples of discrete Y’s are delivery performance (late or on time), invoice accuracy (accurate, not accurate), and application errors (wrong address, misspelled name, missing age, etc.).
The X inputs can also be either continuous or discrete. Examples of continuous X’s are temperature, pressure, speed, and volume. Examples of discrete X’s are process (intake, examination, treatment, and discharge), product type (A, B, C, and D), and vendor material (A, B, C, and D).
Another set of X inputs to always consider is the stratification factors. These are variables that may influence the product, process, or service delivery performance and must not be overlooked. If we capture them during data collection, we can study them to find out whether they make a difference. Examples are time of day, day of the week, month of the year, season, location, region, or shift.
Once the inputs have been sorted from the outputs and the data classified as either continuous or discrete, selecting the statistical tool comes down to answering the question, “What is it that we want to know?” Below is a list of common questions; we’ll address each one separately.
What is the baseline performance? Did the changes made to the process, product, or service delivery make a difference? Are there any relationships between the multiple input X’s and the output Y’s? If there are relationships, do they make a significant difference? That’s enough questions to be statistically dangerous, so let’s tackle them one by one.
What is the baseline performance?
Continuous Data – Plot the data in a time-based sequence using an X-MR chart (individuals and moving range control charts), or subgroup the data and use an Xbar-R chart (averages and range control charts). The centerline of the chart provides an estimate of the average of the data over time, thus establishing the baseline. The MR or R charts provide estimates of the variation over time and establish the upper and lower 3 standard deviation control limits for the X or Xbar charts. Create a Histogram of the data to view a graphical representation of its distribution, test it for normality (the p-value needs to be greater than .05), and compare it to specifications to assess capability.
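As a rough illustration of how the X-MR baseline is computed, here is a minimal sketch in plain Python. The function name `xmr_limits` and the cycle-time values are hypothetical, made up for this example. The constants 2.66 and 3.267 are the standard control chart factors for a moving range of two points.

```python
from statistics import mean

def xmr_limits(values):
    """Centerlines and 3-sigma control limits for an X (individuals) and MR chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2 (d2 = 1.128 for n = 2); 3.267 = D4 for n = 2
    return {
        "x_center": x_bar,
        "x_ucl": x_bar + 2.66 * mr_bar,
        "x_lcl": x_bar - 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,
        "mr_lcl": 0.0,
    }

cycle_times = [12.0, 14.0, 13.0, 15.0, 11.0, 13.0]  # hypothetical data, in time order
limits = xmr_limits(cycle_times)
print(limits)
```

The X centerline is the baseline average, and points outside the X limits would be candidates for special cause investigation.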
Minitab Statistical Software Tools are Variables Control Charts, Histograms, Graphical Summary, Normality Test, and Capability Study between and within.
Discrete Data – Plot the data in a time-based sequence using a P Chart (proportion defective chart), C Chart (count of defects chart), nP Chart (number defective chart, sample size n times proportion defective), or a U Chart (defects per unit chart). The centerline provides the baseline average performance. The upper and lower control limits estimate 3 standard deviations of performance above and below the average, which accounts for 99.73% of expected activity over time. You will have an estimate of the worst and best case scenarios before any improvements are made. Create a Pareto Chart to view the distribution of the categories and their frequencies of occurrence. If the control charts exhibit only normal, natural patterns of variation over time (only common cause variation, no special causes), the centerline, or average value, establishes the capability.
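The P Chart limits can be sketched the same way. This is a minimal example assuming a constant sample size per period; the function name `p_chart_limits` and the weekly counts of late deliveries are hypothetical.

```python
from math import sqrt

def p_chart_limits(defectives, sample_size):
    """Centerline and 3-sigma limits for a P chart with a constant sample size."""
    # Overall proportion defective across all samples is the centerline
    p_bar = sum(defectives) / (len(defectives) * sample_size)
    sigma = sqrt(p_bar * (1 - p_bar) / sample_size)
    # A proportion cannot fall below 0 or rise above 1, so clamp the limits
    lcl = max(0.0, p_bar - 3 * sigma)
    ucl = min(1.0, p_bar + 3 * sigma)
    return p_bar, lcl, ucl

late_deliveries = [4, 6, 5, 3, 7, 5]  # hypothetical late counts per week
center, lcl, ucl = p_chart_limits(late_deliveries, sample_size=100)
print(center, lcl, ucl)
```

With these made-up numbers the baseline is 5% late, and the lower limit is clamped at zero because the computed value would be negative.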
Minitab Statistical Software Tools are Attributes Control Charts and Pareto Analysis.
Did the changes made to the process, product, or service delivery make a difference?
Discrete X – Continuous Y – To test whether two groups (5W-30 vs. Synthetic Oil) differ in average gas mileage, use a T-Test. If there are potential environmental factors that may influence the test results, use a Paired T-Test. Plot the results on a Boxplot and evaluate the T statistics using the p-values to make a decision (p-values less than or equal to .05 signify that a difference exists with at least 95% confidence that it is true). If there is a difference, select the group with the best overall average to meet the objective.
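To make the two-group comparison concrete, the sketch below computes a two-sample T statistic and its degrees of freedom in plain Python. It uses the Welch form, which does not assume equal variances; that choice is an assumption of this example. The function name `welch_t` and the mileage figures are hypothetical. Converting the statistic to a p-value requires the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample T statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contribution of each group mean
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

conventional = [28.0, 29.0, 27.0, 30.0, 28.0]  # hypothetical mpg with 5W-30
synthetic = [31.0, 30.0, 32.0, 31.0, 33.0]     # hypothetical mpg with synthetic oil
t_stat, df = welch_t(synthetic, conventional)
print(t_stat, df)
```

A large absolute T statistic relative to its degrees of freedom corresponds to a small p-value.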
To test whether several groups (5W-30, 5W-40, 10W-30, 10W-40, or Synthetic) differ in average gas mileage, use ANOVA (analysis of variance). Randomize the order of the testing to minimize any time-dependent environmental influences on the test results. Plot the results on a Boxplot or Histogram and evaluate the F statistics using the p-values to make a decision (p-values less than or equal to .05 signify that a difference exists with at least 95% confidence that it is true). If there is a difference, select the group with the best overall average to meet the objective.
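The F statistic behind a one-way ANOVA is the ratio of between-group to within-group mean squares. The following is a minimal sketch; the function name `one_way_anova_f` and the mileage values are hypothetical, and the p-value lookup against the F distribution is again left to a statistics package.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

mileage = {  # hypothetical mpg results by oil type
    "5W-30": [28.0, 29.0, 27.0],
    "10W-30": [30.0, 31.0, 29.0],
    "Synthetic": [33.0, 32.0, 34.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(mileage.values()))
print(f_stat, df_b, df_w)
```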
In either of the above cases, to test whether the inputs cause a difference in the variation of the output, use a Test for Equal Variances (homogeneity of variance). Use the p-values to make a decision (p-values less than or equal to .05 signify that a difference exists with at least 95% confidence that it is true). If there is a difference, select the group with the lowest standard deviation.
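As a simple stand-in for a formal equal-variances test, the sketch below compares two sample variances with a plain F ratio. This is an assumption of the example, not necessarily the method a statistics package applies. The function name `variance_ratio` and the data are hypothetical.

```python
from statistics import stdev, variance

def variance_ratio(a, b):
    """Ratio of the larger sample variance to the smaller one (always >= 1)."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

group_a = [10.0, 12.0, 14.0]  # hypothetical measurements
group_b = [11.0, 12.0, 13.0]
ratio = variance_ratio(group_a, group_b)
# The group with the lower standard deviation is the more consistent one
more_consistent = "b" if stdev(group_b) < stdev(group_a) else "a"
print(ratio, more_consistent)
```

A ratio near 1 suggests similar variation; how large it must be to count as significant depends on the sample sizes.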
Minitab Statistical Software Tools are 2 Sample T-Test, Paired T-Test, ANOVA, Test for Equal Variances, Boxplot, Histogram, and Graphical Summary.
Continuous X – Continuous Y – Plot the input X versus the output Y using a Scatter Plot or, if there are multiple input X variables, use a Matrix Plot. The plot provides a graphical representation of the relationship between the variables. If it appears that a relationship may exist between one or more of the X input variables and the output Y variable, conduct a Linear Regression of one input X versus one output Y. Repeat as necessary for each X – Y relationship.
The Linear Regression Model provides an R² statistic, an F statistic, and the p-value. To be significant for a single X-Y relationship, the R² should be greater than .36 (36% of the variation in the output Y is explained by the observed changes in the input X), the F should be much greater than 1, and the p-value should be .05 or less.
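The quantities in a single-X regression can be computed by hand with least squares. Here is a minimal sketch; the function name `fit_line` and the data points are hypothetical, and the p-value is omitted because it requires the F distribution.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope, intercept, R-squared, and F statistic for one X and one Y."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_resid / ss_total
    # One regression degree of freedom, n - 2 residual degrees of freedom
    f_stat = (ss_total - ss_resid) / (ss_resid / (len(x) - 2))
    return slope, intercept, r_squared, f_stat

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical input X
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical output Y
slope, intercept, r_squared, f_stat = fit_line(x, y)
print(slope, intercept, r_squared, f_stat)
```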
Minitab Statistical Software Tools are Scatter Plot, Matrix Plot, and Fitted Line Plot.
Discrete X – Discrete Y – In this type of analysis, categories, or groups, are compared to other categories, or groups. For example, “Which cruise line had the highest customer satisfaction?” The discrete X variables are the cruise lines (RCI, Carnival, and Princess Cruise Lines). The discrete Y variables are the frequencies of responses from passengers on their satisfaction surveys by category (poor, fair, good, very good, and excellent) that relate to their vacation experience.
Conduct a cross tab table analysis, or Chi Square analysis, to evaluate whether there were differences in levels of satisfaction by passengers based on the cruise line they vacationed on. Percentages are used for the evaluation, and the Chi Square analysis provides a p-value to further quantify whether the differences are significant. The overall p-value associated with the Chi Square analysis should be .05 or less. The variables that have the largest contribution to the Chi Square statistic drive the observed differences.
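The Chi Square statistic and the per-cell contributions can be computed directly from the cross tab table. This is a minimal sketch; the function name `chi_square`, the three unnamed lines, and the response counts are all hypothetical. The p-value comes from the Chi Square distribution and is left to a statistics package.

```python
def chi_square(table):
    """Chi Square statistic, degrees of freedom, and per-cell contributions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    contributions = []
    for row, row_total in zip(table, row_totals):
        cells = []
        for observed, col_total in zip(row, col_totals):
            # Expected count if satisfaction were independent of the line
            expected = row_total * col_total / total
            cells.append((observed - expected) ** 2 / expected)
        contributions.append(cells)
    statistic = sum(sum(cells) for cells in contributions)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df, contributions

# Rows: Line A, Line B, Line C; columns: poor, good, excellent (hypothetical counts)
responses = [
    [10, 20, 30],
    [20, 20, 20],
    [30, 20, 10],
]
statistic, df, contributions = chi_square(responses)
print(statistic, df, contributions)
```

The cells with the largest contributions are the ones driving the observed difference, here the poor and excellent counts for Lines A and C.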
Minitab Statistical Software Tools are Table Analysis, Matrix Analysis, and Chi Square Analysis.
Continuous X – Discrete Y – Does the cost per gallon of fuel influence consumer satisfaction? The continuous X is the cost per gallon of fuel. The discrete Y is the consumer satisfaction rating (unhappy, indifferent, or happy). Plot the data using Dot Plots stratified on Y. The statistical method is Logistic Regression. Once again, the p-values are used to validate whether or not a significant difference exists. P-values that are .05 or less mean that we have at least 95% confidence that a significant difference exists. Use the most frequently occurring ratings to make your determination.
Minitab Statistical Software Tools are Dot Plots stratified on Y and Logistic Regression Analysis.
Are there relationships between the multiple input X’s and the output Y’s? If there are relationships, do they make a difference?
Continuous X – Continuous Y – The graphical analysis is a Matrix Scatter Plot, where multiple input X’s can be evaluated against the output Y characteristic. The statistical analysis method is multiple regression. Evaluate the scatter plots to look for relationships between the X input variables and the output Y. Also look for multicollinearity, where one input X variable is correlated with another input X variable. This is analogous to double dipping, so we identify those conflicting inputs and systematically remove them from the model.
Multiple regression is a powerful tool, but it requires proceeding with caution. Run the model with all variables included, then review the T statistics and F statistics to identify the first set of insignificant variables to remove from the model. During the second iteration of the regression model, turn on the variance inflation factors, or VIFs, which are used to quantify potential multicollinearity issues (VIFs of 5 to 10 indicate issues). Review the Matrix Plot to identify X’s related to other X’s. Remove the variables with the high VIFs and the largest p-values, but only remove one of the related X variables in a questionable pair. Review the remaining p-values and remove variables with large p-values from the model. Don’t be surprised if this process requires a few more iterations.
When the multiple regression model is finalized, all VIFs will be less than 5 and all p-values will be less than .05. The R² value should be 90% or greater. This is a significant model, and the regression equation can now be used for making predictions as long as we keep the input variables within the min and max range values that were used to create the model.
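For intuition about VIFs, consider the special case of a model with only two input X’s, where the VIF of each reduces to 1 / (1 - r²) with r the correlation between the pair. The sketch below covers only that two-predictor case, which is an assumption of this example; with more X’s, each VIF comes from regressing one X on all the others. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both inputs when the model has exactly two X variables."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

temperature = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical input X1
pressure = [2.0, 1.0, 4.0, 3.0, 5.0]     # moderately related to X1
speed = [1.1, 2.0, 3.0, 4.0, 5.1]        # nearly a copy of X1

vif_moderate = vif_two_predictors(temperature, pressure)
vif_severe = vif_two_predictors(temperature, speed)
print(vif_moderate, vif_severe)
```

In this made-up data the first pair stays under the threshold of 5, while the second pair is far above it, so one of those two inputs would be dropped.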
Minitab Statistical Software Tools are Regression Analysis, Step Wise Regression Analysis, Scatter Plots, Matrix Plots, Fitted Line Plots, Graphical Summary, and Histograms.
Discrete X and Continuous X – Continuous Y
This case requires the use of designed experiments. Discrete and continuous X’s can be used as the input variables, but the settings for them are predetermined in the design of the experiment. The analysis method is ANOVA, which was mentioned earlier.
Here is an example. The goal is to reduce the number of unpopped kernels of popping corn in a bag of popped popcorn (the output Y). Discrete X’s could be the brand of popping corn, type of oil, and shape of the popping vessel. Continuous X’s could be amount of oil, amount of popping corn, cooking time, and cooking temperature. Specific settings for each of the input X’s are selected and incorporated into the statistical experiment.
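One way to lay out such an experiment is a full factorial design, in which every combination of the chosen settings is run once in random order. The sketch below assumes a full factorial, which is only one of several possible designs. The function name `full_factorial`, the factor names, and the level values are hypothetical.

```python
from itertools import product
from random import Random

def full_factorial(factors, seed=0):
    """All combinations of factor levels, shuffled into a random run order."""
    names = list(factors)
    runs = [dict(zip(names, levels)) for levels in product(*factors.values())]
    # Randomize run order to reduce time-dependent environmental influences
    Random(seed).shuffle(runs)
    return runs

factors = {  # hypothetical settings, two levels each
    "brand": ["A", "B"],
    "oil_type": ["canola", "coconut"],
    "cook_time_s": [120, 150],
}
runs = full_factorial(factors)
for run_number, settings in enumerate(runs, start=1):
    print(run_number, settings)
```

Three factors at two levels each give eight runs; adding factors or levels multiplies the run count, which is why the settings are chosen before any popcorn is popped.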