SAS Statistics and Operations Research News

Posted: 6/27/2018

Featured news from SAS.

Greetings from a steamy week in North Carolina. Hopefully the weather is more comfortable where you are!

Many people are making summer plans. If yours include a trip to Vancouver for the Joint Statistical Meetings (JSM) at the end of July, note that SAS statisticians are teaching several courses there, and it’s easy to add them to your registration. We will also be in the exhibition area, and we look forward to talking with customers about our statistical and econometrics software as well as our latest machine learning software with SAS® Viya®.

In this edition of the newsletter, we provide a little of everything: a new syntax for lists in SAS/IML®, applications of the OPTMODEL procedure in SAS/OR® software, tips on using the GLIMMIX procedure in SAS/STAT® software to model categorical outcomes with random effects, and much more.

Here’s to a great summer.

Maura Stokes

Senior R&D Director, Statistical Applications

Technical Papers

Using SAS/OR to Optimize Scheduling and Routing of Service Vehicles

An oil company has a set of wells and a set of well operators, and each well requires a known amount of time for servicing. For a given planning horizon, the company wants to determine which operator should service which wells, on which days, and in which order, with the goal of minimizing total service time plus travel time. A frequency constraint for each well restricts the number of days between visits. The solution approach presented in this paper uses several features of the OPTMODEL procedure in SAS/OR software. A simple idea and a small change in the code reduced the run time from one hour to one minute.

Insights into Using the GLIMMIX Procedure to Model Categorical Outcomes with Random Effects

Modeling categorical outcomes with random effects is a major use of the GLIMMIX procedure. Building and evaluating such a model, and then using it for inference, prediction, or both, involves many considerations. This paper, written for experienced users of SAS® statistical procedures, illustrates the nuances of the process with two examples: modeling a binary response by using random effects and correlated errors, and modeling a multinomial response by using random effects. The paper also answers common questions that SAS Technical Support receives about these analyses with PROC GLIMMIX, including how to work with events/trials data, how to handle bias issues in a logistic model, and how to overcome convergence problems.

Time Series Feature Extraction

Feature extraction is the practice of enhancing machine learning by finding characteristics in the data that help solve a particular problem. For time series data, feature extraction can be performed using various time series analysis and decomposition techniques. In addition, features can be obtained by sequence comparison techniques such as dynamic time warping and by subsequence discovery techniques such as motif analysis. This paper surveys some of the time series feature extraction methods and demonstrates them through examples that use SAS/ETS® and SAS® Visual Forecasting software.
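Dynamic time warping (DTW), one of the sequence comparison techniques mentioned above, measures how similar two series are after allowing local stretching and compression of the time axis. The following is a minimal Python sketch of the classic dynamic-programming recurrence, assuming absolute difference as the local cost and no warping window; it is not the SAS/ETS or SAS Visual Forecasting implementation, and the function name dtw_distance is hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences,
    using absolute difference as the local cost and no warping window.
    (Illustrative sketch; not a SAS API.)"""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest neighboring alignment: diagonal step,
            # repeat an element of b, or repeat an element of a
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    x = [0, 1, 2, 3, 2, 1, 0]
    y = [0, 0, 1, 2, 3, 2, 1, 0]   # same shape as x, delayed by one step
    print(dtw_distance(x, y))      # 0.0: warping absorbs the delay
```

The full n-by-m table is fine for short series; practical implementations often add a warping-window constraint to limit the cost on long series.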

 

Interpreting Black-Box Machine Learning Models Using Partial Dependence and Individual Conditional Expectation Plots

One of the key questions a data scientist asks when interpreting a predictive model is “How do the model inputs work?” Variable importance rankings help identify the strongest drivers, but they provide no insight into the functional relationship between those drivers and the model’s predictions. Partial dependence (PD) and individual conditional expectation (ICE) plots are visual, model-agnostic techniques that depict the functional relationships between one or more input variables and the predictions of a black-box model. Because an ICE plot shows one curve per observation, whereas a PD plot shows only their average, ICE plots let data scientists drill deeper to explore individual differences and identify subgroups and interactions between model inputs. This paper shows how to use PD and ICE plots to gain insight from and compare machine learning models, particularly so-called black-box algorithms such as random forest, neural network, and gradient boosting. It also discusses limitations of PD plots and offers recommendations for generating scalable plots for big data.
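The relationship between the two plot types can be sketched in plain Python. In this minimal sketch, the black-box model is a hypothetical function black_box with a pure interaction between inputs x1 and x2, and ice_curves and partial_dependence are illustrative helpers, not a SAS API. Each ICE curve fixes one observation's other inputs and sweeps one input over a grid; the PD curve is the pointwise average of those ICE curves.

```python
def black_box(row):
    """Hypothetical black-box model: a pure interaction between x1 and x2."""
    return row["x1"] * row["x2"]


def ice_curves(predict, data, feature, grid):
    """One ICE curve per observation: predictions as `feature` sweeps over
    `grid` while that observation's other inputs stay fixed."""
    curves = []
    for row in data:
        curve = []
        for value in grid:
            modified = dict(row)        # copy so the original row is untouched
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, data, feature, grid):
    """PD curve: the pointwise average of the ICE curves over all observations."""
    curves = ice_curves(predict, data, feature, grid)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]


if __name__ == "__main__":
    data = [{"x1": 0.0, "x2": 1.0}, {"x1": 0.0, "x2": -1.0}]
    grid = [-1.0, 0.0, 1.0]
    print("ICE:", ice_curves(black_box, data, "x1", grid))
    print("PD: ", partial_dependence(black_box, data, "x1", grid))
```

With this toy model the PD curve for x1 is flat, while the two ICE curves slope in opposite directions: the interaction with x2 is visible only in the ICE plot.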

 

 

Technical Highlights

The DO Loop

Distinguished Research Statistician Developer Rick Wicklin discusses how to order variables in a heat map or scatter plot matrix and how to produce calibration plots in SAS. He also shows how to create a butterfly plot, a useful way to display the distribution of a variable for two subpopulations.

Analysis of Clinical Trials Using SAS: A Practical Guide, Second Edition

This book illustrates how to apply modern statistical methodology to clinical trials data. It covers commonly used methods, including dose-escalation and dose-finding methods that are applied in Phase I and Phase II clinical trials, as well as important trial designs and analysis strategies that are employed in Phase II and Phase III clinical trials, such as multiplicity adjustment, data monitoring, and methods for handling incomplete data. The book also features recommendations from clinical trial experts and a discussion of relevant regulatory guidelines. The SAS/STAT procedures it applies include the SEQDESIGN and SEQTEST procedures, the GLIMMIX and GEE procedures for analyzing repeated measurements, and others. Macros are also provided for implementing randomization-based methods, performing complex multiplicity adjustments, and investigating the design and analysis of early-phase trials.

Lights, Camera, Action!

A New Syntax for Lists in SAS/IML
By Rick Wicklin

How to Use the New Random Number Generators in SAS
By Rick Wicklin and Warren Sarle

Tech Support Points Out

Estimate and Test Differences, Ratios, or Contrasts of Means in Generalized Linear Models

Differences among the levels of a categorical predictor in a generalized linear model (GLM) can be estimated and tested using the SLICE and LSMESTIMATE statements, the DIFF option in the LSMEANS statement, or the more general ESTIMATE statement. These statements are available in several SAS/STAT modeling procedures for GLMs, including PROC GENMOD, PROC GLIMMIX, PROC LOGISTIC, and others.

However, for GLMs that use a link function other than the identity link, these statements estimate differences on the link scale, not on the mean scale. For example, in a logistic model they yield differences of log odds (that is, log odds ratios), not differences of probabilities. The NLMeans macro provides comparisons on the mean scale: all pairwise differences, sequential differences, differences with a control, and more general contrasts of means. You can also estimate and test ratios of pairs of means. This note illustrates how to use the NLMeans and NLEstimate macros to perform the estimation and testing.
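The gap between the link scale and the mean scale is easy to see with a little arithmetic. The following Python sketch assumes a hypothetical fitted logistic model with one two-level predictor (the coefficients b0 and b1 are made up for illustration) and computes only point estimates; it does not reproduce the standard errors or tests that the NLMeans macro produces.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fitted model: logit(p) = b0 + b1 * [group is B]
b0, b1 = -1.0, 0.5

eta_a = b0              # link-scale (log odds) value for level A
eta_b = b0 + b1         # link-scale (log odds) value for level B
p_a = inv_logit(eta_a)  # mean-scale value (probability) for level A
p_b = inv_logit(eta_b)  # mean-scale value (probability) for level B

link_diff = eta_b - eta_a   # difference of log odds, i.e. the log odds ratio
mean_diff = p_b - p_a       # difference of probabilities (mean scale)
mean_ratio = p_b / p_a      # ratio of probabilities (mean scale)

print(f"log odds difference:    {link_diff:.4f} (odds ratio {math.exp(link_diff):.4f})")
print(f"probability difference: {mean_diff:.4f}")
print(f"probability ratio:      {mean_ratio:.4f}")
```

Here the link-scale difference equals b1 regardless of b0, whereas the probability difference and ratio change with the baseline b0, which is why the two scales answer different questions.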

 

Talks and Tutorials

JSM 2018

SAS statistical developers are teaching the following courses and tutorials at JSM this summer:

Advanced Methods for Survival Analysis Using SAS

Generalized Additive Modeling Using SAS

Propensity Score Analysis and Causal Effect Estimation Using SAS

Survey Data Imputation and Analysis Using SAS

Practical Hierarchical Bayesian Modeling

Quick Links

Statistics & Operations Research Home
Bayesian Resources
SAS Analytics: 14.3, 14.2, 14.1, and 13.2
FASTats: Frequently Asked-For Statistics

SAS/STAT Procedures A-Z
Analytical SAS Software Video Portal
SAS/STAT Example Programs
SAS/ETS Example Programs