Grapevine Today
By the People, for the People
Boosting Rare Event Data with SMOTE in SAS Data Maker
Leveraging SAS Data Maker's synthetic data generation to overcome imbalanced datasets
Apr. 1, 2026 at 5:07pm
A SAS employee describes using SMOTE (Synthetic Minority Oversampling Technique) in SAS Data Maker to generate synthetic rare-event cases. Boosting the share of those cases in a dataset can help predictive models train better on imbalanced data.
Why it matters
Imbalanced datasets, where the target variable has far fewer positive cases than negative cases, make it hard to train accurate predictive models. SMOTE addresses this by creating new minority-class cases, each interpolated between an existing minority case and one of its nearest minority-class neighbors, instead of simply duplicating rows. This can improve model performance on the rare class.
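The interpolation step at the heart of SMOTE can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not SAS Data Maker's implementation, which the article does not show. The function name `smote`, the Euclidean distance, the default `k=5`, and choosing base cases at random are all assumptions made for the sketch.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical sketch: return n_synthetic new points, each interpolated
    between a minority case and one of its k nearest minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority cases")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute each case's k nearest minority neighbors (Euclidean distance).
    neighbors = []
    for i, p in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(p, minority[j]),
        )
        neighbors.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))         # pick a minority case at random
        base = minority[i]
        nb = minority[rng.choice(neighbors[i])]  # pick one of its near neighbors
        gap = rng.random()                       # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


# Usage: four minority cases at the corners of a unit square.
events = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_events = smote(events, n_synthetic=10, k=2)
```

Because every new point lies on a segment between two existing minority cases, the synthetic cases stay inside the region the real minority cases occupy rather than repeating them exactly.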
The details
The author first tried using SMOTE in SAS Data Maker to oversample a rare binary target variable, but the synthetic data came out just as imbalanced as the original. The reason: they had given SMOTE the full 40,000-observation dataset rather than just the 500 positive cases, so the synthetic output mirrored the original class mix. Once SMOTE received only the rare event cases, it generated 5,000 synthetic positive examples, which could then be combined with the original 40,000 cases to create a more balanced training dataset for predictive modeling.
- The author started experimenting with SAS Data Maker and its synthetic data generation capabilities last fall.
- The author needed to boost the event rate for a binary target variable in a 40,000-record dataset with only 500 positive cases (a 1.25% event rate).
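The fix described above (oversample from the event cases only, then append the synthetic rows to the original data) can be sketched in Python with made-up data shaped like the author's: 40,000 rows, 500 events. Everything here is hypothetical: the helper names `event_rate` and `interpolate_events`, the single feature `x`, and the simplified interpolation, which pairs random event rows and skips SMOTE's nearest-neighbor search. It stands in for SAS Data Maker, whose interface is not shown in the article.

```python
import random


def event_rate(rows):
    """Share of rows whose binary target is 1."""
    return sum(r["target"] for r in rows) / len(rows)


def interpolate_events(events, n, seed=0):
    """Hypothetical, simplified stand-in for SMOTE: each synthetic row lies
    between two randomly chosen event rows (nearest-neighbor search omitted).
    Every input is an event, so every output is labeled target = 1."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        a, b = rng.sample(events, 2)
        gap = rng.random()
        out.append({"x": a["x"] + gap * (b["x"] - a["x"]), "target": 1})
    return out


# Hypothetical data with the article's shape: 40,000 rows, 500 events.
rng = random.Random(42)
data = [{"x": rng.gauss(1.0 if i < 500 else 0.0, 1.0), "target": int(i < 500)}
        for i in range(40_000)]

events = [r for r in data if r["target"] == 1]  # 500 rows: the only oversampling input
synthetic = interpolate_events(events, 5_000)   # 5,000 synthetic events
training = data + synthetic                     # 45,000 rows in total

print(f"original event rate: {event_rate(data):.2%}")      # 1.25%
print(f"combined event rate: {event_rate(training):.2%}")  # 12.22%
```

With the article's counts, the event rate rises from 500 / 40,000 = 1.25% to 5,500 / 45,000 ≈ 12.2%.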
The players
SAS Data Maker
A SAS software tool that allows users to quickly and easily generate realistic synthetic data from an original dataset, using techniques like SMOTE to handle imbalanced classes.
Dan Obermiller
A friend of the author who pointed out that SMOTE should be applied only to the minority-class cases, not to the full dataset.
What they’re saying
“Don't put 40,000 training cases in if you want to oversample from only the 500 event cases. Just put in the event cases.”
— Dan Obermiller
What’s next
The author plans to attend a hands-on workshop on using SAS Data Maker, including the SMOTE technique, at the SAS Innovate conference in Grapevine, Texas, on April 29.
The takeaway
Synthetic data generation techniques like SMOTE in SAS Data Maker can be a powerful way to address imbalanced datasets and improve predictive model training, as long as SMOTE is applied to the minority-class cases alone rather than to the full dataset.

