Migrating a Credit Risk Model from SAS to Python
Project Overview
Project Name: SAS Credit Risk Model Migration
Languages: SAS → Python
Description: This guide demonstrates how to use Artemis Planning Agent to migrate a credit risk model from SAS to modern Python. The model uses real-world credit data and logistic regression for credit default prediction. This tutorial shows how to maintain model accuracy while improving code maintainability. Finally, we use the Artemis Intelligence Optimiser to generate new features and boost model performance.
Repository: turintech/sas (feat/use-kaggle-data branch)
Commit: 6e4da96990ee9822bd2e8aaf39339202c1d53e28
Where is SAS Used?
SAS has been an enterprise standard for analytics for over 40 years, powering business-critical systems across industries, but faces increasing competition from newer and more open tooling such as Python and R.
This tutorial focuses on financial services credit risk modeling, demonstrating how Artemis Planning Agent enables systematic, verifiable migration while maintaining model accuracy.
Starting Point: The SAS Model
The baseline SAS model is a production-ready credit risk scoring system for evaluating credit applicants.
Prediction Target
The model predicts the probability that a credit applicant will experience serious delinquency (90+ days past due or worse) within the next 2 years. This is a binary classification problem: default vs. no default.
Capabilities
- Complete pipeline: data loading, feature engineering, training, validation, scoring
- Risk scoring system: 300-850 FICO-like scale with risk grades (A-F)
- Lending recommendations: Approve/Review/Decline based on risk grade
- Comprehensive metrics: ROC curves, confusion matrices, decile analysis, calibration plots
- Production-ready outputs: Model coefficients, validation reports, Kaggle submission files
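The decile analysis listed among the metrics can be sketched in pandas. This is a minimal illustration rather than the project's implementation: the column names `probability` and `default_flag` follow the SAS example in this guide, the helper `decile_table` is hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd


def decile_table(scored, prob_col="probability", target_col="default_flag"):
    """Summarise observed defaults per decile of predicted probability.

    Decile 1 holds the highest predicted probabilities.
    """
    # rank(method="first") breaks ties so the deciles are (near-)equal in size
    ranked = scored[prob_col].rank(method="first", ascending=False)
    decile = pd.qcut(ranked, 10, labels=list(range(1, 11))).rename("decile")
    table = scored.groupby(decile, observed=True).agg(
        n=(target_col, "size"),
        defaults=(target_col, "sum"),
        mean_probability=(prob_col, "mean"),
    )
    table["default_rate"] = table["defaults"] / table["n"]
    # Share of all defaults captured in this decile and the riskier ones
    table["cumulative_capture"] = table["defaults"].cumsum() / table["defaults"].sum()
    return table


# Synthetic stand-in for a scored validation set (not the Kaggle data)
rng = np.random.default_rng(0)
probability = rng.beta(1, 12, size=5000)
scored = pd.DataFrame({
    "probability": probability,
    "default_flag": rng.binomial(1, probability),
})
deciles = decile_table(scored)
print(deciles)
```

A well-ranked model concentrates defaults in the first few deciles, so `cumulative_capture` should rise quickly and then flatten.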
Model Architecture
- Algorithm: Logistic regression with stepwise variable selection (PROC LOGISTIC)
- Selection criteria: p-value thresholds of 0.05 for entry and stay
Data
- Source: Kaggle "Give Me Some Credit" competition dataset
- Size: 150,000 credit applicant records
- Split: 70% training (105,000) / 30% validation (45,000)
- Features: 10 original features + 40+ engineered features
  - Core: Age, income, credit utilization, debt ratio
  - Late payments: 30-59, 60-89, and 90+ day delinquencies
  - Risk flags: High utilization, high debt, low income
  - Engineered: Financial health score, log transformations, interaction features
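To make the feature categories and the 70/30 split concrete, here is a small pandas sketch. The output column names mirror variables in the SAS model statement, but every rule in it (the 99th-percentile cap, the 0.9 utilization threshold, the form of the interaction term, the input column names, and the random split) is an assumption for illustration; the source does not state the actual definitions.

```python
import numpy as np
import pandas as pd


def add_engineered_features(df):
    """Add a few features of the kinds listed above (illustrative rules only)."""
    out = df.copy()
    # ASSUMPTION: cap at the 99th percentile; the real capping rule is not given
    cap = out["credit_utilization"].quantile(0.99)
    out["credit_utilization_capped"] = out["credit_utilization"].clip(upper=cap)
    # log1p keeps zero incomes finite
    out["log_monthly_income"] = np.log1p(out["monthly_income"])
    # ASSUMPTION: 0.9 is a placeholder threshold for "high utilization"
    out["flag_high_utilization"] = (out["credit_utilization"] > 0.9).astype(int)
    # ASSUMPTION: interaction defined as age times log income
    out["age_income_interaction"] = out["age"] * out["log_monthly_income"]
    return out


# Synthetic stand-in for the raw applicant records
rng = np.random.default_rng(0)
n = 1000
raw = pd.DataFrame({
    "credit_utilization": rng.exponential(0.3, n),
    "monthly_income": rng.lognormal(8.5, 0.6, n),
    "age": rng.integers(21, 80, n),
})
features = add_engineered_features(raw)

# 70/30 split with a fixed seed so the partition is reproducible
train = features.sample(frac=0.7, random_state=42)
valid = features.drop(train.index)
print(len(train), len(valid))
```

Reproducing the SAS partition exactly would require reusing the SAS row assignments, since a fresh random split changes which records land in validation.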
Baseline Performance
- Validation AUC-ROC: 0.8564 (excellent discrimination)
- Validation Accuracy: 93.62%
- Validation Precision: 56.90%
- Validation Recall: 20.08%
- Validation F1-Score: 0.2968
- Gini Coefficient: 0.7128
- Population Stability Index (PSI): 0.0013 (excellent stability)
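Several of these figures are linked by simple identities, which makes them easy to sanity-check after a migration: Gini is 2 × AUC − 1 and F1 is the harmonic mean of precision and recall. The sketch below checks both against the rounded values above and adds one common formulation of PSI over quantile bins. The binning scheme (10 bins from the reference sample) and the function names are assumptions; the source does not say how its PSI was computed.

```python
import numpy as np


def gini_from_auc(auc):
    return 2 * auc - 1


def f1_from_precision_recall(precision, recall):
    return 2 * precision * recall / (precision + recall)


def population_stability_index(expected, actual, n_bins=10):
    """PSI between two score samples, binned on quantiles of `expected`."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges; values beyond them fall into the first or last bin
    inner_edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]
    e_counts = np.bincount(np.searchsorted(inner_edges, expected, side="right"),
                           minlength=n_bins)
    a_counts = np.bincount(np.searchsorted(inner_edges, actual, side="right"),
                           minlength=n_bins)
    eps = 1e-6  # avoids log(0) for empty bins
    e_pct = np.clip(e_counts / e_counts.sum(), eps, None)
    a_pct = np.clip(a_counts / a_counts.sum(), eps, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))


# Consistency checks against the reported (rounded) baseline metrics
print(round(gini_from_auc(0.8564), 4))
print(round(f1_from_precision_recall(0.5690, 0.2008), 4))

# Synthetic scores drawn from the same distribution should give a PSI near zero
rng = np.random.default_rng(0)
train_scores = rng.beta(1, 12, 10000)
valid_scores = rng.beta(1, 12, 10000)
psi = population_stability_index(train_scores, valid_scores)
print(psi)
```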
Example SAS Code:
/* Model Training with Stepwise Selection */
proc logistic data=work.model_features_train descending;
    model default_flag =
        credit_utilization_capped debt_ratio_capped
        num_late_30_59 num_late_60_89 num_late_90_plus
        age monthly_income log_monthly_income
        flag_high_utilization flag_serious_delinquency
        financial_health total_risk_flags
        age_income_interaction
        / selection=stepwise slentry=0.05 slstay=0.05;
    output out=work.scored predicted=probability;
    store work.logit_model;
run;
/* Generate Risk Scores */
data work.risk_scores;
    set work.scored;
    credit_risk_score = round(600 + 250 * (1 - probability));
    if credit_risk_score >= 750 then risk_grade = 'A';
    else if credit_risk_score >= 700 then risk_grade = 'B';
    /* ... additional risk grades ... */
run;
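A possible Python counterpart to the scoring step is sketched below. One detail matters for exact parity: the SAS `ROUND` function rounds halves away from zero, while `numpy.round` rounds halves to the nearest even integer, so a hypothetical helper `sas_round` is used instead. Only the A and B cut-offs appear in the SAS excerpt, so lower grades are left blank here rather than guessed.

```python
import numpy as np
import pandas as pd


def sas_round(x):
    """Round halves away from zero, as the SAS ROUND function does."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.floor(np.abs(x) + 0.5)


def score_applicants(probability):
    """Translate default probabilities into risk scores and (partial) grades."""
    # As written in the SAS step, probabilities in [0, 1] map to scores from 850 down to 600
    score = sas_round(600 + 250 * (1 - probability)).astype(int)
    # Only grades A and B are defined in the excerpt; remaining cut-offs are elided there
    grade = np.select([score >= 750, score >= 700], ["A", "B"], default="")
    return pd.DataFrame(
        {"probability": probability, "credit_risk_score": score, "risk_grade": grade},
        index=probability.index,
    )


probabilities = pd.Series([0.0, 0.25, 0.5, 0.75, 1.0])
scores = score_applicants(probabilities)
print(scores)

# The rounding difference shows up at exact halves such as 662.5
print(int(sas_round(662.5)), int(np.round(662.5)))
```

Halves are rare with real probabilities, but a one-point score difference at a grade boundary can change a lending recommendation, so the rounding rule is worth pinning down.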
Step 1: Import and Create Migration Plan
Import your SAS repository into Artemis as a new project, then activate the Planning Agent.
The Planning Agent analyzes your SAS code and asks clarifying questions:
- Which Python libraries do you prefer? (pandas, scikit-learn, etc.)
- Should we match SAS output exactly or allow optimizations?
- What validation approach do you want?
- Are there specific business constraints?
After your input, the agent generates a structured migration plan with hierarchical tasks, success criteria, and dependencies.
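The format of the plan that Artemis produces is not shown in this guide. Purely to illustrate the idea of tasks with success criteria and dependencies, the sketch below models a plan with a hypothetical `MigrationTask` dataclass and derives a valid execution order with the standard library's `graphlib`. The task names and criteria are invented for the example.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class MigrationTask:
    """Hypothetical plan entry; not the Artemis plan schema."""
    name: str
    success_criterion: str
    depends_on: list = field(default_factory=list)


plan = [
    MigrationTask(
        "validate_against_sas",
        "Python predictions match SAS within tolerance",
        depends_on=["migrate_model_training"],
    ),
    MigrationTask(
        "migrate_data_steps",
        "pandas features equal SAS DATA step output",
    ),
    MigrationTask(
        "migrate_model_training",
        "Coefficients align with PROC LOGISTIC",
        depends_on=["migrate_data_steps"],
    ),
]


def execution_order(tasks):
    """Return task names so that every task comes after its dependencies."""
    graph = {task.name: set(task.depends_on) for task in tasks}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(plan)
print(order)
```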
Step 2: Execute Data Migration Tasks
The plan breaks migration into manageable tasks. First, translate SAS DATA steps to pandas operations.
Before (SAS):
data work.features;
    set work.raw_data;
    length debt_ratio_category $6;
    if debt_ratio < 0.3 then debt_ratio_category = 'Low';
    else if debt_ratio < 0.6 then debt_ratio_category = 'Medium';
    else debt_ratio_category = 'High';
run;
After (Python):
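import numpy as np
import pandas as pd
def engineer_features(df):
    """Translate SAS feature engineering to pandas."""
    df = df.copy()
    # Debt ratio categorization. right=False makes each bin [lower, upper),
    # matching the strict "<" comparisons in the SAS step.
    df['debt_ratio_category'] = pd.cut(
        df['debt_ratio'],
        bins=[-np.inf, 0.3, 0.6, np.inf],
        right=False,
        labels=['Low', 'Medium', 'High']
    )
    return df
# raw_data: DataFrame holding the source records (loading step not shown)
features_df = engineer_features(raw_data)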
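Bin boundaries are a common source of silent mismatches in this kind of translation, so it can help to test the cut-off values explicitly. The sketch below compares a row-by-row reference written with the same strict `<` comparisons as the SAS logic against a vectorized `pd.cut` that uses left-closed bins. Both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def categorize_reference(debt_ratio):
    """Row-by-row rule using the same strict '<' comparisons as the SAS step."""
    if debt_ratio < 0.3:
        return "Low"
    if debt_ratio < 0.6:
        return "Medium"
    return "High"


def categorize_vectorized(debt_ratio):
    """pd.cut with left-closed bins: [-inf, 0.3), [0.3, 0.6), [0.6, inf)."""
    return pd.cut(
        debt_ratio,
        bins=[-np.inf, 0.3, 0.6, np.inf],
        right=False,
        labels=["Low", "Medium", "High"],
    )


# Values on and around each cut-off
boundary_values = pd.Series([0.0, 0.29, 0.3, 0.59, 0.6, 5.0])
expected = boundary_values.map(categorize_reference)
actual = categorize_vectorized(boundary_values).astype(str)
mismatches = boundary_values[expected != actual]
print(mismatches.empty)
```

By default `pd.cut` uses right-closed bins, which would place exactly 0.3 in the lower category, so `right=False` is what keeps the boundaries aligned. Missing values need a separate decision: SAS treats a missing numeric as smaller than any number in comparisons, while `pd.cut` returns NaN.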
Step 3: Migrate Model Training
Convert SAS PROC LOGISTIC to statsmodels Logit while maintaining identical model specifications.
Before (SAS):
proc logistic data=work.features descending;
    model default_flag = debt_ratio income age;
    output out=predictions p=predicted_prob;
run;
After (Python):
import statsmodels.api as sm
# Prepare features
X = features_df[['debt_ratio', 'income', 'age']]
X = sm.add_constant(X) # Add intercept
y = features_df['default_flag']
# Train model
model = sm.Logit(y, X)
results = model.fit()
# Generate predictions
predictions = results.predict(X)
Step 4: Validation and Verification
Artemis validates that the Python model produces equivalent results to the SAS original.
Validation checks include:
- Prediction equivalence (Python predictions match SAS within tolerance)
- Performance metrics match (ROC-AUC, accuracy, F1 score)
- Feature importance comparison
- Model coefficients alignment
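The first check, prediction equivalence within a tolerance, can be expressed in a few lines of NumPy. This is a sketch of the general idea, not the check Artemis runs: the function `compare_predictions`, the tolerance of 1e-6, and the sample values are all hypothetical.

```python
import numpy as np


def compare_predictions(sas_probs, python_probs, atol=1e-6):
    """Compare two aligned probability vectors row by row."""
    sas_probs = np.asarray(sas_probs, dtype=float)
    python_probs = np.asarray(python_probs, dtype=float)
    if sas_probs.shape != python_probs.shape:
        raise ValueError("prediction vectors must have the same shape")
    abs_diff = np.abs(sas_probs - python_probs)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "n_outside_tolerance": int((abs_diff > atol).sum()),
        "equivalent": bool((abs_diff <= atol).all()),
    }


# Made-up values: the second vector differs only by tiny numerical noise
sas_probs = [0.10, 0.25, 0.90]
close_report = compare_predictions(sas_probs, [0.10, 0.2500004, 0.90])
drift_report = compare_predictions(sas_probs, [0.10, 0.26, 0.90])
print(close_report)
print(drift_report)
```

The comparison only makes sense when both vectors are sorted by the same record key, so joining on an applicant identifier before calling it is advisable.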
Model Performance Comparison:
The migration achieves functional equivalence: every metric is within 0.015 of the SAS baseline.
| Metric | SAS Baseline | Python Migration | Difference (Python - SAS) |
|---|---|---|---|
| AUC-ROC | 0.8564 | 0.8634 | +0.0070 |
| Accuracy | 0.9362 | 0.9350 | -0.0012 |
| Precision | 0.5690 | 0.5705 | +0.0015 |
| Recall | 0.2008 | 0.1938 | -0.0070 |
| F1-Score | 0.2968 | 0.2893 | -0.0075 |
| Gini Coefficient | 0.7128 | 0.7269 | +0.0141 |
This validates that the Python implementation produces equivalent results to the SAS original, maintaining model accuracy while improving code maintainability.
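The comparison can also be automated so that a regression in any metric fails loudly. The sketch below recomputes the per-metric differences from the SAS and Python columns of the table and checks them against a threshold. The 0.02 threshold is an assumed acceptance criterion chosen for illustration; the source does not state one.

```python
sas_baseline = {
    "AUC-ROC": 0.8564,
    "Accuracy": 0.9362,
    "Precision": 0.5690,
    "Recall": 0.2008,
    "F1-Score": 0.2968,
    "Gini Coefficient": 0.7128,
}
python_migration = {
    "AUC-ROC": 0.8634,
    "Accuracy": 0.9350,
    "Precision": 0.5705,
    "Recall": 0.1938,
    "F1-Score": 0.2893,
    "Gini Coefficient": 0.7269,
}

# Signed difference (Python minus SAS), rounded to the table's precision
differences = {
    metric: round(python_migration[metric] - sas_baseline[metric], 4)
    for metric in sas_baseline
}

# ASSUMPTION: 0.02 is an illustrative tolerance, not a documented criterion
TOLERANCE = 0.02
largest_gap = max(abs(diff) for diff in differences.values())
within_tolerance = largest_gap <= TOLERANCE

for metric, diff in differences.items():
    print(f"{metric:<18} {diff:+.4f}")
print("within tolerance:", within_tolerance)
```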
Results and Benefits
This migration delivers:
Technical Success:
- Functionally equivalent Python implementation
- Performance maintained or improved
- Clean, maintainable code following Python best practices
Business Impact:
- Improved maintainability with modern Python
- Integration-ready for ML infrastructure
- Team productivity gains (Python skills more widely available)
Key Capabilities Demonstrated:
- Verifiability: Automated validation at each step ensures correctness
- Observability: Real-time progress tracking and metrics throughout migration
- Optimization: Improvements beyond simple translation using Artemis Intelligence
Conclusion
Artemis Planning Agent transforms complex SAS-to-Python migrations into a systematic, verifiable process. By combining intelligent planning with automated validation and optimization, Artemis enables fast, verifiable migration while preserving model accuracy and improving code quality. Whether migrating individual models or entire model suites, Artemis makes legacy modernization efficient and reliable.