SAS Statistics and Operations Research News

Posted: 8/29/2018

It’s back-to-school time in North Carolina, although at SAS headquarters it feels more like an ending: we’ve just wrapped up another successful summer of internships for undergraduate and graduate students. Interns presented their work in a poster exhibition or in individual seminars, and we saw some very impressive results! Speaking of students, in case the children of SAS employees didn’t have enough schoolwork ahead of them, SAS also offered a two-week class that taught SAS® programming skills to those in high school or college. The two students I spoke with (one heading to UNC and the other to Stanford) thought it was great fun.

SAS® University Edition continued to draw interest at the SAS exhibition area at the Joint Statistical Meetings (JSM) in Vancouver a few weeks ago, where the Educational Practice staff seemed to be constantly surrounded by attendees. Several of our tutorials were also in unusually high demand this year.

One of my favorite sessions at JSM was on how to evolve the teaching of statistics and data science to make them relevant in today’s world. Panelists covered K–12, undergraduate, and graduate education, and they discussed experiences with pilot programs that centered on getting data and software into the hands of students as quickly as possible.

The SAS fall regional conferences, which are coming up soon, offer a smaller, more personal setting in which to learn about SAS directly from SAS staff and to interact with them. See Talks and Tutorials below for the list of conferences and the tutorials we are giving.

Here’s to back-to-schooling!

Senior R&D Director, Statistical Applications


Technical Papers



Causal Mediation Analysis with the CAUSALMED Procedure

Important policy and health care decisions often depend on understanding the direct and indirect (mediated) effects of a treatment on an outcome. For example, does a youth program directly reduce juvenile delinquency, or does it reduce delinquency indirectly by changing teenagers’ moral and social values? Similarly, is a particular gene directly responsible for causing lung cancer, or does it have an indirect (mediated) effect through its influence on smoking behavior? Causal mediation analysis studies the mechanisms through which a treatment exerts its causal effects, and it estimates direct and indirect effects. A treatment variable is assumed to affect an outcome variable through two pathways: a direct pathway and an indirect (mediated) pathway through a mediator variable. This paper introduces the CAUSALMED procedure, new in SAS/STAT® 14.3, for estimating various causal mediation effects from observational data in a counterfactual framework. The paper also defines these causal mediation effects and related effects in terms of counterfactual outcomes and describes the assumptions that are required for unbiased estimation. Examples illustrate the ideas behind causal mediation analysis and the applications of the CAUSALMED procedure.
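To make the counterfactual definitions concrete, here is a minimal Python sketch. It is not the CAUSALMED procedure’s estimation method: it assumes a hypothetical linear model with made-up coefficients, no treatment-mediator interaction, and no unmeasured confounding, and it computes each unit’s counterfactual outcomes Y(t, M(t′)) directly from that model instead of estimating anything from observed data. The natural direct effect (NDE) compares Y(1, M(0)) with Y(0, M(0)), the natural indirect effect (NIE) compares Y(1, M(1)) with Y(1, M(0)), and the two add up to the total effect.

```python
import random

# Hypothetical linear structural model (coefficients are made up for illustration):
#   M = ALPHA0 + ALPHA * T + e_M            (mediator model)
#   Y = BETA0 + THETA * T + BETA * M + e_Y  (outcome model, no T*M interaction)
ALPHA0, ALPHA = 1.0, 0.5
BETA0, THETA, BETA = 2.0, 0.3, 0.8


def mediator(t, e_m):
    return ALPHA0 + ALPHA * t + e_m


def outcome(t, m, e_y):
    return BETA0 + THETA * t + BETA * m + e_y


def mediation_effects(n=10_000, seed=2018):
    """Average the counterfactual contrasts over n simulated units.

    Each unit keeps the same noise terms in all of its counterfactual worlds,
    so only the treatment (and, through it, the mediator) changes.
    Returns (NDE, NIE, total effect).
    """
    rng = random.Random(seed)
    nde_sum = nie_sum = 0.0
    for _ in range(n):
        e_m, e_y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        m0, m1 = mediator(0, e_m), mediator(1, e_m)  # M(0), M(1)
        y0_m0 = outcome(0, m0, e_y)                  # Y(0, M(0))
        y1_m0 = outcome(1, m0, e_y)                  # Y(1, M(0))
        y1_m1 = outcome(1, m1, e_y)                  # Y(1, M(1))
        nde_sum += y1_m0 - y0_m0  # change T, hold the mediator at M(0)
        nie_sum += y1_m1 - y1_m0  # hold T = 1, move the mediator from M(0) to M(1)
    nde, nie = nde_sum / n, nie_sum / n
    return nde, nie, nde + nie


if __name__ == "__main__":
    nde, nie, total = mediation_effects()
    print(f"NDE={nde:.3f}  NIE={nie:.3f}  total={total:.3f}")
```

In this linear, no-interaction setting the NDE reduces to THETA and the NIE to ALPHA × BETA, the familiar product-of-coefficients result. The counterfactual definitions stay meaningful when interactions or nonlinear models break that shortcut.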


Tips and Techniques for Using the Random-Number Generators in SAS

SAS® 9.4M5 introduces new random-number generators (RNGs) and new subroutines that enable you to initialize, rewind, and use multiple random-number streams. This paper describes the new RNGs and provides tips and techniques for using random numbers effectively and efficiently in SAS. Applications of these techniques include statistical sampling, data simulation, Monte Carlo estimation, and random numbers for parallel computation.
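The idea of separate, rewindable random-number streams for parallel work can be illustrated outside SAS. The following Python sketch uses only the standard library and does not use the new SAS RNGs or subroutines. Its keying scheme (turning a seed and a stream number into a string seed for random.Random) is a hypothetical stand-in: unlike a generator designed for multiple streams, it makes no formal guarantee that streams never overlap. Each chunk of a Monte Carlo estimate of π draws from its own stream, so the result is the same regardless of the order in which chunks run, and "rewinding" a stream means re-creating it from the same key.

```python
import random


def make_stream(seed, stream_id):
    """Return a generator for one stream (hypothetical keying scheme).

    random.Random accepts a str seed and uses all of its bits, so different
    (seed, stream_id) keys give different sequences. Unlike RNGs built for
    multiple streams, this offers no formal non-overlap guarantee.
    """
    return random.Random(f"{seed}:{stream_id}")


def count_hits(seed, stream_id, n):
    """Count uniform points in the unit square that fall inside the quarter circle."""
    rng = make_stream(seed, stream_id)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(seed=123, n_chunks=4, n_per_chunk=25_000, order=None):
    """Monte Carlo estimate of pi; `order` mimics chunks running in any order."""
    ids = list(range(n_chunks)) if order is None else list(order)
    hits = {i: count_hits(seed, i, n_per_chunk) for i in ids}
    return 4.0 * sum(hits.values()) / (n_chunks * n_per_chunk)


if __name__ == "__main__":
    print(estimate_pi(), estimate_pi(order=[3, 2, 1, 0]))  # identical values
    first = make_stream(123, 0)
    draws = [first.random() for _ in range(3)]
    rewound = make_stream(123, 0)  # "rewind" stream 0 by re-creating it
    print(draws == [rewound.random() for _ in range(3)])
```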



Managing the Expense of Hyperparameter Autotuning

Determining the best values of machine learning algorithm hyperparameters for a specific data set can be a difficult and computationally expensive challenge. The recently released AUTOTUNE statement and autotune action set in SAS® Visual Data Mining and Machine Learning software automatically tune hyperparameters of modeling algorithms by using a parallel local search optimization framework to ease the challenges and expense of hyperparameter optimization. This implementation allows multiple hyperparameter configurations to be evaluated concurrently, even when data and model training must be distributed across computing resources because of the size of the data set.

With the ability to both distribute the training process and parallelize the tuning process, one challenge then becomes how to allocate the computing resources for the most efficient autotuning process. The best number of worker nodes for training a single model might not lead to the best resource usage for autotuning. To further reduce autotuning expense, early stopping of long-running hyperparameter configurations that have stagnated can free up resources for additional configurations. For big data, when the model training process is especially expensive, subsampling the data for training and validation can also reduce the tuning expense. This paper discusses the trade-offs that are associated with each of these performance-enhancing measures and demonstrates tuning results and efficiency gains for each.
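To see why stopping stagnated configurations frees resources, consider the following Python sketch. It is a hypothetical illustration, not the AUTOTUNE statement’s algorithm: each configuration is represented by a made-up list of per-iteration validation errors, the stopping rule (no improvement of at least min_delta for patience consecutive iterations) is one common early-stopping heuristic, and all configurations share a fixed budget of training iterations.

```python
# Hypothetical validation-error curves: what each configuration would report
# after each training iteration if it were allowed to run to completion.
CURVES = {
    "a": [0.50, 0.45] + [0.45] * 8,              # stagnates early
    "b": [0.50, 0.40, 0.35, 0.30] + [0.30] * 6,  # improves, then stagnates
    "c": [0.60, 0.20] + [0.20] * 8,              # best configuration
}


def train_with_early_stopping(val_errors, patience=3, min_delta=1e-3):
    """Stop once the best error has not improved by min_delta for `patience`
    consecutive iterations. Returns (best_error, iterations_used)."""
    best = float("inf")
    since_improvement = 0
    used = 0
    for err in val_errors:
        used += 1
        if err < best - min_delta:
            best = err
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                break
    return best, used


def tune(curves, budget, patience=3, min_delta=1e-3):
    """Evaluate configurations in order until the iteration budget runs out.
    Returns (best_config, best_error, configs_evaluated)."""
    spent, evaluated = 0, 0
    best_name, best_err = None, float("inf")
    for name, curve in curves.items():
        remaining = budget - spent
        if remaining <= 0:
            break
        # A configuration can never use more iterations than remain in the budget.
        err, used = train_with_early_stopping(curve[:remaining], patience, min_delta)
        spent += used
        evaluated += 1
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err, evaluated


if __name__ == "__main__":
    print(tune(CURVES, budget=15))                # with early stopping
    print(tune(CURVES, budget=15, patience=100))  # effectively no early stopping
```

With the stopping rule, all three configurations fit within a budget of 15 iterations and the best one (c) is found; with patience set so high that nothing stops early, the budget is exhausted after two configurations.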


Optimization Modeling with Python and SAS® Viya®

Python has become a popular programming language for both data analytics and mathematical optimization. With SAS Viya and its Python interface, Python programmers can use the state-of-the-art optimization solvers that SAS provides. This paper demonstrates an approach for Python programmers to naturally model their optimization problems, solve them by using SAS® Optimization solver actions, and view and interact with the results. The common tools for using the optimization solvers in SAS for these purposes are the OPTMODEL and IML procedures, but programmers more familiar with Python might find this alternative approach easier to grasp.



Regime-Switching Models: Capturing Structural Changes in Time Series

Stock market conditions, government policy changes, or even weather patterns can be regarded as stochastic processes that are driven by unobserved regimes. A powerful tool to explore these behavioral patterns is the regime-switching model (RSM) that is offered in the HMM procedure and the associated action in SAS® Econometrics software. This model, which is widely used in finance, economics, science, and engineering, has two characteristics: it allows different parameter values for different regimes, and it models the transition probabilities between regimes. These characteristics enable it to fully capture the structural changes in the time series. This paper uses two examples to illustrate how you can use RSMs to better understand the regime patterns in your data and improve your economic analysis. The first example demonstrates how regime-switching autoregression (RS-AR) models help you characterize the volatility and dynamics of stock returns. The second example examines the relationship and movement between the Japanese yen and the Thai baht by using regime-switching regression (RS-REG) models.
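The filtering idea behind a regime-switching model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the HMM procedure’s implementation: it uses a two-regime switching AR(1) model in which the intercept, AR coefficient, and innovation standard deviation depend only on the current regime; the parameter values are made up and treated as known; and the Hamilton filter computes the probability of each regime given the data observed so far. In practice the parameters would be estimated, for example by maximizing the log likelihood that the filter returns.

```python
import math
import random

# Hypothetical two-regime parameters (calm vs. volatile), chosen for illustration.
C = [0.0, 0.0]      # regime-specific intercepts
PHI = [0.2, 0.2]    # regime-specific AR(1) coefficients
SIGMA = [0.5, 2.0]  # regime-specific innovation standard deviations
P = [[0.95, 0.05],  # P[i][j] = Pr(s_t = j | s_{t-1} = i)
     [0.10, 0.90]]


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate(n, seed=1):
    """Simulate y_t = C[s] + PHI[s] * y_{t-1} + SIGMA[s] * e_t with Markov regimes s_t."""
    rng = random.Random(seed)
    states, y = [0], [0.0]
    for _ in range(1, n):
        s = 0 if rng.random() < P[states[-1]][0] else 1  # two regimes only
        states.append(s)
        y.append(C[s] + PHI[s] * y[-1] + SIGMA[s] * rng.gauss(0.0, 1.0))
    return y, states


def hamilton_filter(y, p0=(0.5, 0.5)):
    """Return filtered probabilities Pr(s_t = j | y_1..y_t) for t >= 1 and the log likelihood."""
    k = len(C)
    prob = list(p0)
    filtered, loglik = [], 0.0
    for t in range(1, len(y)):
        # Predict: regime probabilities for time t given data through t-1.
        pred = [sum(prob[i] * P[i][j] for i in range(k)) for j in range(k)]
        # Update: weight each regime by the density of y_t under that regime.
        joint = [pred[j] * normal_pdf(y[t], C[j] + PHI[j] * y[t - 1], SIGMA[j])
                 for j in range(k)]
        total = sum(joint)
        loglik += math.log(total)
        prob = [v / total for v in joint]
        filtered.append(prob)
    return filtered, loglik


if __name__ == "__main__":
    y, states = simulate(2000)
    filtered, loglik = hamilton_filter(y)
    guess = [0 if p[0] >= p[1] else 1 for p in filtered]
    acc = sum(g == s for g, s in zip(guess, states[1:])) / len(guess)
    print(f"log likelihood={loglik:.1f}, regime classification accuracy={acc:.2f}")
```

Classifying each period by its more probable regime and comparing against the simulated states gives a rough sense of how well the filter separates calm periods from volatile ones.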


Technical Highlights



Video Prize  

Filmmakers Brett Wujek and Patrick Koch, also known as Principal Data Scientist Wujek and Principal Machine Learning Developer Koch, won the Audience Appreciation Award for Best Promotional Video at the KDD conference in London this past week, for a video based on their autotuning paper. KDD is a leading machine learning and data mining conference, with more than 3,500 attendees. For some fun, watch their short piece on the need for autotuning.

And you thought YouTube was only for learning how to put your kid’s bike together!


Graphically Speaking

Distinguished Research Statistician Developer Warren Kuhfeld discusses the SGPLOT procedure, BY groups, and SG annotation.



Lights, Camera, Action!

SAS Optimization on SAS Viya Using PROC OPTMODEL
By Ed Hughes

Introducing the CAUSALMED Procedure for Causal Mediation Analysis
By Yiu-Fai Yung


The DO Loop on Steroids

Sometimes you just have to have a megadose of the DO Loop. Distinguished Research Statistician Developer Rick Wicklin teaches you about:

Using a grid search to find initial parameter values for regression models in SAS

Color cells in a mosaic plot by deviation from independence

Which variables are in the final selected model?

How to score and graph a quantile regression model in SAS



Tech Support Points Out


Usage Note 60335: Choice of Continuous Response Distribution in Log-Linked GLMs

Several response distributions are available when you fit generalized linear models (GLMs) in procedures such as PROC GENMOD and PROC GLIMMIX, including the normal, Poisson, gamma, inverse Gaussian, and Tweedie distributions. One important way in which these distributions differ is in the relationship between the mean and the variance, so when you select a distribution for modeling a response, it is important to choose one that matches the observed mean-variance relationship. This note discusses and illustrates some graphical and analytical tools for finding a suitable response distribution when it is not already known.
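One simple analytical check in this spirit is sketched below in Python; it is an illustration under stated assumptions, not necessarily the method the usage note follows, and the data are simulated rather than taken from the note. It assumes a power variance function Var(Y) = φμ^p, where p = 0 for the normal, 1 for the Poisson, 2 for the gamma, and 3 for the inverse Gaussian (Tweedie models cover other values, commonly 1 < p < 2). Taking logs gives log Var(Y) = log φ + p log μ, so regressing the log of each group’s sample variance on the log of its sample mean estimates p as the slope.

```python
import math
import random
import statistics


def log_log_slope(groups):
    """Least-squares slope of log(sample variance) on log(sample mean) across groups."""
    xs = [math.log(statistics.fmean(g)) for g in groups]
    ys = [math.log(statistics.variance(g)) for g in groups]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate_groups(kind, means=(1, 2, 4, 8, 16, 32), n=2000, seed=7):
    """Hypothetical groups of observations that share a mean, with known variance behavior."""
    rng = random.Random(seed)
    groups = []
    for mu in means:
        if kind == "gamma":
            shape = 4.0  # gammavariate(shape, scale): mean = mu, Var = mu**2 / shape -> slope near 2
            groups.append([rng.gammavariate(shape, mu / shape) for _ in range(n)])
        elif kind == "normal":
            groups.append([rng.gauss(mu, 1.0) for _ in range(n)])  # constant variance -> slope near 0
        else:
            raise ValueError(f"unknown kind: {kind}")
    return groups


if __name__ == "__main__":
    for kind in ("normal", "gamma"):
        print(kind, round(log_log_slope(simulate_groups(kind)), 2))
```

A slope near 2 points toward a gamma-like variance function, and a slope near 0 toward constant variance; plotting the same log-log points is the graphical counterpart of this check.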



Talks and Tutorials



WUSS 2018

Date: Sept. 5–7, 2018

Location: Sacramento, CA

Introduction to Mixed Models — Gordon Brown




MWSUG 2018

Date: Sept. 30 – Oct. 2, 2018

Location: Indianapolis, IN

Advanced ODS Graphics Examples — Warren Kuhfeld



SESUG 2018

Date: Oct. 14–17, 2018

Location: St. Pete Beach, FL

Modeling Longitudinal Categorical Response Data — Maura Stokes




SCSUG 2018

Date: Nov. 4–7, 2018

Location: Austin, TX

Causal Analysis with Observational Data: Methods and Applications


Quick Links




Statistics & Operations Research Home
Bayesian Resources
SAS Analytics: 14.3, 14.2, 14.1, and 13.2
FASTats: Frequently Asked-For Statistics



SAS/STAT Procedures A-Z
Analytical SAS Software Video Portal
SAS/STAT Example Programs
SAS/ETS Example Programs


COMMUNITIES: SAS Statistical Procedures | SAS/IML | SAS/OR