
SAS Statistics and Operations Research News
June 27, 2018
Featured news from SAS.



Greetings from a steamy week in North Carolina. Hopefully the weather is more comfortable where you are!
A lot of people are making summer plans, so if yours include a trip to Vancouver for the Joint Statistical Meetings at the end of July, please note that SAS statisticians are teaching various courses at the event. It’s easy to add them to your registration. We will also be in the exhibition area, and we look forward to talking to customers about our statistical and econometrics software as well as our latest machine learning software with SAS® Viya®.
In this edition of the newsletter, we provide a little of everything: additional syntax for SAS/IML®, applications of the OPTMODEL procedure in SAS/OR® software, tips on using the GLIMMIX procedure in SAS/STAT® software for modeling categorical outcomes with random effects, and much more.
Here’s to a great summer.
Maura Stokes
Senior R&D Director, Statistical Applications



An oil company has a set of wells and a set of well operators. Each well has an established amount of time required for servicing. For a given planning horizon, the company wants to determine which operator should perform service on which wells, on which days, and in which order, with the goal of minimizing service time plus travel time. A frequency constraint for each well restricts the number of days between visits. The solution approach that is presented in this paper uses several features in the OPTMODEL procedure in SAS/OR software. A simple idea and a small change in the code reduced the run time from one hour to one minute.


Modeling categorical outcomes with random effects is a major use of the GLIMMIX procedure. Building, evaluating, and using the resulting model for inference, prediction, or both require many considerations. This paper, written for experienced users of SAS® statistical procedures, illustrates the nuances of the process with two examples: modeling a binary response using random effects and correlated errors and modeling a multinomial response with random effects. In addition, the paper provides answers to common questions that are received by SAS Technical Support concerning these analyses with PROC GLIMMIX. These questions cover working with events/trials data, handling bias issues in a logistic model, and overcoming convergence problems.


Feature extraction is the practice of enhancing machine learning by finding characteristics in the data that help solve a particular problem. For time series data, feature extraction can be performed using various time series analysis and decomposition techniques. In addition, features can be obtained by sequence comparison techniques such as dynamic time warping and by subsequence discovery techniques such as motif analysis. This paper surveys some of the time series feature extraction methods and demonstrates them through examples that use SAS/ETS® and SAS® Visual Forecasting software.
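As a rough illustration of the sequence comparison idea mentioned above, here is a minimal Python sketch of the textbook dynamic time warping recurrence. It is not taken from the paper and does not use SAS/ETS or SAS Visual Forecasting; the function name `dtw_distance`, the absolute-difference cost, and the toy series are all assumptions chosen for the example.

```python
import math

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping with |x - y| as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat b[j-1], repeat a[i-1], or advance both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical toy series (illustration only).
series = [0.0, 1.0, 2.0, 1.0, 0.0]
shifted = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]   # same shape, delayed by one step
flat = [2.0, 2.0, 2.0, 2.0, 2.0]           # different shape

d_shifted = dtw_distance(series, shifted)
d_flat = dtw_distance(series, flat)
print(d_shifted, d_flat)
```

In this sketch the delayed copy is at distance 0 because warping absorbs the delay, while the flat series is not. A distance like this could then serve as an input feature, for example as the distance from each series to a reference pattern.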


One of the key questions a data scientist asks when interpreting a predictive model is “How do the model inputs work?” Variable importance rankings are helpful for identifying the strongest drivers, but these rankings provide no insight into the functional relationship between the drivers and the model’s predictions. Partial dependence (PD) and individual conditional expectation (ICE) plots are visual, model-agnostic techniques that depict the functional relationships between one or more input variables and the predictions of a black-box model. ICE plots enable data scientists to drill much deeper to explore individual differences and identify subgroups and interactions between model inputs. This paper shows how PD and ICE plots can be used to gain insight from and compare machine learning models, particularly so-called black-box algorithms such as random forest, neural network, and gradient boosting. It also discusses limitations of PD plots and offers recommendations about how to generate scalable plots for big data.
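To make the relationship between the two kinds of plots concrete, the following Python sketch computes ICE curves and their average, the PD curve, for a toy model. It is an illustration under stated assumptions, not the paper's code: `ice_curves`, `partial_dependence`, and `toy_model` are hypothetical names, and the data are simulated so that the effect of the first input flips sign with the second.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Return an (n_rows, n_grid) array; row i is the ICE curve for observation i."""
    curves = np.empty((X.shape[0], len(grid)))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value        # set the feature to the grid value in every row
        curves[:, k] = predict(X_mod)    # other inputs keep their observed values
    return curves

def partial_dependence(predict, X, feature, grid):
    """PD curve: the pointwise average of the ICE curves."""
    return ice_curves(predict, X, feature, grid).mean(axis=0)

def toy_model(X):
    """Hypothetical stand-in for a black-box model: x0's effect depends on the sign of x1."""
    return X[:, 0] * np.where(X[:, 1] > 0, 1.0, -1.0)

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(-1, 1, 200),             # x0: the input being examined
    np.repeat([1.0, -1.0], 100),         # x1: two equally sized subgroups
])
grid = np.linspace(-1, 1, 5)

ice = ice_curves(toy_model, X, feature=0, grid=grid)
pd_curve = partial_dependence(toy_model, X, feature=0, grid=grid)
print(pd_curve)     # averaged effect of x0
print(ice[0])       # an observation from the first subgroup
print(ice[-1])      # an observation from the second subgroup
```

Here the PD curve is flat at zero because the two subgroups cancel, while the individual ICE curves slope up in one subgroup and down in the other. That is the kind of interaction an averaged PD plot can hide and ICE plots can expose.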




Distinguished Research Statistician Developer Rick Wicklin discusses how to order variables in a heat map or scatter plot matrix as well as how to produce calibration plots in SAS. In addition, find out how to create a butterfly plot, which is a useful way to display the distribution of a variable for two subpopulations.


This book provides useful illustrations of applying modern statistical methodology to clinical trials data. Commonly used methods are covered, including dose-escalation and dose-finding methods that are applied in Phase I and Phase II clinical trials, as well as important trial designs and analysis strategies that are employed in Phase II and Phase III clinical trials, such as multiplicity adjustment, data monitoring, and methods for handling incomplete data. This book also features recommendations from clinical trial experts and a discussion of relevant regulatory guidelines. SAS/STAT procedures that are applied include the SEQDESIGN and SEQTEST procedures as well as the GLIMMIX and GEE procedures for analyzing repeated measurements, among others. Macros are also provided for implementing randomization-based methods, performing complex multiplicity adjustments, and investigating the design and analysis of early-phase trials.





Differences among the levels of a categorical predictor in a generalized linear model (GLM) can be estimated and tested using the SLICE and LSMESTIMATE statements, the DIFF option in the LSMEANS statement, or the more general ESTIMATE statement. These statements are available in several SAS/STAT modeling procedures for GLMs, including PROC GENMOD, PROC GLIMMIX, PROC LOGISTIC, and others.
However, for GLMs that use a link function other than the identity link, these statements provide differences on the link scale rather than on the mean scale. For example, in a logistic model the differences from these statements are differences of log odds, not probabilities. Comparisons on the mean scale can be obtained using the NLMeans macro. All pairwise differences, sequential differences, differences with a control, and more general contrasts of means are available. You can also estimate and test ratios of pairs of means. This note illustrates how to use the NLMeans and NLEstimate macros to perform the estimation and testing.
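The distinction between the link scale and the mean scale can be seen with a few lines of arithmetic. The Python sketch below is only an illustration: it does not call or reproduce the NLMeans or NLEstimate macros, the log odds values are made up, and it shows point estimates only, with no standard errors or tests.

```python
import math

def inv_logit(eta):
    """Map a log odds value (link scale) to a probability (mean scale)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical predicted log odds for treatments A and B in two strata.
# These numbers are invented for illustration, not output from any SAS procedure.
log_odds = {
    "stratum 1": {"A": -2.0, "B": -1.0},
    "stratum 2": {"A": 0.0, "B": 1.0},
}

comparisons = {}
for stratum, eta in log_odds.items():
    p_a, p_b = inv_logit(eta["A"]), inv_logit(eta["B"])
    comparisons[stratum] = {
        "link_diff": eta["B"] - eta["A"],             # difference of log odds
        "odds_ratio": math.exp(eta["B"] - eta["A"]),  # exponentiated link-scale difference
        "mean_diff": p_b - p_a,                       # difference of probabilities
        "mean_ratio": p_b / p_a,                      # ratio of probabilities
    }

for stratum, c in comparisons.items():
    print(stratum, {k: round(v, 4) for k, v in c.items()})
```

In both strata the difference of log odds is 1.0, yet the difference of probabilities is about 0.15 in the first stratum and about 0.23 in the second. The same link-scale difference therefore corresponds to different mean-scale differences, which is why comparisons on the mean scale need to be computed separately.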


