
Migrating a Credit Risk Model from SAS to Python

Project Overview

Project Name: SAS Credit Risk Model Migration

Languages: SAS → Python

Description: This guide demonstrates how to use the Artemis Planning Agent to migrate a credit risk model from SAS to modern Python. The model uses real-world credit data and logistic regression to predict credit default. The tutorial shows how to maintain model accuracy while improving code maintainability. Finally, we use the Artemis Intelligence Optimiser to generate new features and boost model performance.

Repository: turintech/sas (feat/use-kaggle-data branch)

Commit: 6e4da96990ee9822bd2e8aaf39339202c1d53e28

Where is SAS Used?

SAS has been an enterprise standard for analytics for over 40 years, powering business-critical systems across industries, but faces increasing competition from newer and more open tooling such as Python and R.

This tutorial focuses on financial services credit risk modeling, demonstrating how Artemis Planning Agent enables systematic, verifiable migration while maintaining model accuracy.

Starting Point: The SAS Model

The baseline SAS model is a production-ready credit risk scoring system for evaluating credit applicants.

Prediction Target

The model predicts the probability that a credit applicant will experience serious delinquency (90+ days past due or worse) within the next 2 years. This is a binary classification problem: default vs. no default.

Capabilities

  • Complete pipeline: data loading, feature engineering, training, validation, scoring
  • Risk scoring system: 300-850 FICO-like scale with risk grades (A-F)
  • Lending recommendations: Approve/Review/Decline based on risk grade
  • Comprehensive metrics: ROC curves, confusion matrices, decile analysis, calibration plots
  • Production-ready outputs: Model coefficients, validation reports, Kaggle submission files

Model Architecture

  • Algorithm: Logistic regression with stepwise variable selection (PROC LOGISTIC)
  • Selection criteria: p-value thresholds of 0.05 for entry and stay

Data

  • Source: Kaggle "Give Me Some Credit" competition dataset
  • Size: 150,000 credit applicant records
  • Split: 70% training (105,000) / 30% validation (45,000)
  • Features: 10 original features + 40+ engineered features
    • Core: Age, income, credit utilization, debt ratio
    • Late payments: 30-59, 60-89, and 90+ day delinquencies
    • Risk flags: High utilization, high debt, low income
    • Engineered: Financial health score, log transformations, interaction features
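
To make the feature list concrete, here is a small pandas sketch of a few engineered columns and the 70/30 split. The column names follow the SAS example in this tutorial, but the cap of 1.0, the 0.9 high-utilization threshold, the plain random split, and the seed are all assumptions made for illustration; the actual definitions live in the SAS repository.

```python
import numpy as np
import pandas as pd


def add_example_features(df, utilization_cap=1.0, high_utilization=0.9):
    """Add a few engineered columns; the cap and threshold values are assumed."""
    out = df.copy()
    out["credit_utilization_capped"] = out["credit_utilization"].clip(upper=utilization_cap)
    out["log_monthly_income"] = np.log1p(out["monthly_income"])  # log(1 + x) tolerates zero income
    out["flag_high_utilization"] = (out["credit_utilization_capped"] >= high_utilization).astype(int)
    out["age_income_interaction"] = out["age"] * out["monthly_income"]
    return out


def split_train_validation(df, train_frac=0.70, seed=42):
    """Random 70/30 split; requires a unique index. The seed is arbitrary."""
    train = df.sample(frac=train_frac, random_state=seed)
    validation = df.drop(train.index)
    return train, validation


# Tiny made-up sample, only to show the shapes involved.
raw = pd.DataFrame({
    "credit_utilization": [0.1, 0.95, 1.7, 0.4, 0.0, 0.5, 0.92, 0.3, 0.2, 0.6],
    "monthly_income": [0, 2500, 4000, 5200, 6100, 3000, 1800, 7500, 9000, 4300],
    "age": [25, 31, 44, 52, 38, 29, 61, 47, 35, 56],
})
features = add_example_features(raw)
train, validation = split_train_validation(features)
print(len(train), len(validation))
```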

Baseline Performance

  • Validation AUC-ROC: 0.8564 (excellent discrimination)
  • Validation Accuracy: 93.62%
  • Validation Precision: 56.90%
  • Validation Recall: 20.08%
  • Validation F1-Score: 0.2968
  • Gini Coefficient: 0.7128
  • Population Stability Index (PSI): 0.0013 (excellent stability)
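
Two of these metrics are easy to reproduce by hand. The Gini coefficient is derived from the AUC as Gini = 2 × AUC − 1, and the PSI compares how two score distributions fall into the same bins. The sketch below assumes decile bins taken from the reference sample and a small floor for empty bins; the binning used in the SAS model is not shown in this tutorial, so treat those choices as assumptions.

```python
import numpy as np


def gini_from_auc(auc):
    """Gini coefficient implied by an AUC-ROC value."""
    return 2.0 * auc - 1.0


def population_stability_index(expected, actual, n_bins=10):
    """PSI between two score samples, using quantile bins of `expected`."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges come from the expected (for example, training) distribution.
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]
    exp_counts = np.bincount(np.searchsorted(edges, expected, side="right"), minlength=n_bins)
    act_counts = np.bincount(np.searchsorted(edges, actual, side="right"), minlength=n_bins)
    # A small floor avoids log(0) when a bin is empty.
    exp_pct = np.clip(exp_counts / expected.size, 1e-6, None)
    act_pct = np.clip(act_counts / actual.size, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))


baseline_gini = round(gini_from_auc(0.8564), 4)  # AUC taken from the list above

rng = np.random.default_rng(0)
train_scores = rng.normal(size=5000)  # synthetic scores, for illustration only
psi_same = population_stability_index(train_scores, train_scores)
psi_shifted = population_stability_index(train_scores, train_scores + 1.0)
print(baseline_gini, psi_same, psi_shifted)
```

Identical distributions give a PSI of zero, and the value grows as the two distributions drift apart.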

Example SAS Code:

/* Model Training with Stepwise Selection */
proc logistic data=work.model_features_train descending;
    model default_flag =
        credit_utilization_capped debt_ratio_capped
        num_late_30_59 num_late_60_89 num_late_90_plus
        age monthly_income log_monthly_income
        flag_high_utilization flag_serious_delinquency
        financial_health total_risk_flags
        age_income_interaction
        / selection=stepwise slentry=0.05 slstay=0.05;

    output out=work.scored predicted=probability;
    store work.logit_model;
run;

/* Generate Risk Scores */
data work.risk_scores;
    set work.scored;
    credit_risk_score = round(600 + 250 * (1 - probability));
    if credit_risk_score >= 750 then risk_grade = 'A';
    else if credit_risk_score >= 700 then risk_grade = 'B';
    /* ... additional risk grades ... */
run;
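
For comparison, the scoring step could look roughly like this in pandas. This is a sketch under stated assumptions: the helper names `sas_round` and `add_risk_scores` are hypothetical, and only grades A and B are implemented because the remaining cut-offs are elided in the SAS excerpt, so everything else is given a placeholder label. Note that the formula as written maps probabilities in [0, 1] to scores between 600 and 850.

```python
import numpy as np
import pandas as pd


def sas_round(values):
    """Round half away from zero, as SAS ROUND does (NumPy rounds half to even)."""
    values = np.asarray(values, dtype=float)
    return np.sign(values) * np.floor(np.abs(values) + 0.5)


def add_risk_scores(scored):
    """Add credit_risk_score and risk_grade columns from a 'probability' column."""
    out = scored.copy()
    score = sas_round(600 + 250 * (1 - out["probability"].to_numpy())).astype(int)
    out["credit_risk_score"] = score
    # np.select takes the first matching condition, like the IF / ELSE IF chain.
    out["risk_grade"] = np.select(
        [score >= 750, score >= 700],
        ["A", "B"],
        default="below B",  # placeholder: the remaining grade cut-offs are elided in the source
    )
    return out


scored = pd.DataFrame({"probability": [0.0, 0.5, 1.0]})
risk_scores = add_risk_scores(scored)
print(risk_scores)
```

The custom rounding helper matters only for values that land exactly on .5, but that is precisely the kind of edge case that can cause a handful of scores to differ between SAS and Python.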

Step 1: Import and Create Migration Plan

Import your SAS repository into Artemis as a new project, then activate the Planning Agent.

The Planning Agent analyzes your SAS code and asks clarifying questions:

  • Which Python libraries do you prefer? (pandas, scikit-learn, etc.)
  • Should we match SAS output exactly or allow optimizations?
  • What validation approach do you want?
  • Are there specific business constraints?

After your input, the agent generates a structured migration plan with hierarchical tasks, success criteria, and dependencies.
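
The plan format that Artemis produces is not shown in this tutorial. Purely as an illustration of what tasks with success criteria and dependencies mean in practice, the hypothetical sketch below models three tasks and derives a valid execution order; the `MigrationTask` class, the task names, and the criteria are invented for this example.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class MigrationTask:
    """Hypothetical task record: a name, a success criterion, and prerequisites."""
    name: str
    success_criterion: str
    depends_on: list = field(default_factory=list)


tasks = [
    MigrationTask("migrate_data_steps", "pandas features match the SAS features"),
    MigrationTask(
        "migrate_model_training",
        "coefficients align with PROC LOGISTIC",
        depends_on=["migrate_data_steps"],
    ),
    MigrationTask(
        "validate_equivalence",
        "metrics match the SAS baseline within tolerance",
        depends_on=["migrate_model_training"],
    ),
]


def execution_order(tasks):
    """Return task names ordered so that every dependency runs first."""
    graph = {task.name: set(task.depends_on) for task in tasks}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(tasks)
print(order)
```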

Step 2: Execute Data Migration Tasks

The plan breaks migration into manageable tasks. First, translate SAS DATA steps to pandas operations.

Before (SAS):

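data work.features;
    set work.raw_data;
    length debt_ratio_category $6;  /* avoid truncating 'Medium' */
    if debt_ratio < 0.3 then debt_ratio_category = 'Low';
    else if debt_ratio < 0.6 then debt_ratio_category = 'Medium';
    else debt_ratio_category = 'High';
run;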

After (Python):

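import pandas as pd

def engineer_features(df):
    """Translate SAS feature engineering to pandas."""
    df = df.copy()

    # Debt ratio categorization. right=False makes the bins left-closed:
    # [-inf, 0.3), [0.3, 0.6), [0.6, inf), matching the SAS "<" comparisons.
    # Missing debt_ratio values stay NaN here.
    df['debt_ratio_category'] = pd.cut(
        df['debt_ratio'],
        bins=[float('-inf'), 0.3, 0.6, float('inf')],
        labels=['Low', 'Medium', 'High'],
        right=False,
    )

    return df

features_df = engineer_features(raw_data)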
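
Boundary values and missing data are where SAS and pandas translations most often diverge. In SAS comparisons a missing numeric value sorts below every number, so `debt_ratio < 0.3` is true for a missing value. The following sketch mirrors the threshold logic with `np.select` and makes the missing-value behaviour an explicit choice; the function name and the `missing_as_low` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def debt_ratio_category(debt_ratio, missing_as_low=True):
    """Categorize debt ratio with strict '<' thresholds at 0.3 and 0.6."""
    values = pd.Series(debt_ratio, dtype=float)
    raw = values.to_numpy()
    # The first matching condition wins, like an IF / ELSE IF chain.
    category = np.select([raw < 0.3, raw < 0.6], ["Low", "Medium"], default="High")
    result = pd.Series(category, index=values.index, dtype=object)
    # NaN fails both comparisons, so handle it explicitly:
    # 'Low' mimics SAS comparison semantics, NaN keeps the value missing.
    return result.mask(values.isna(), "Low" if missing_as_low else np.nan)


sample = [0.0, 0.29, 0.3, 0.6, np.nan]
sas_like = debt_ratio_category(sample)
keep_missing = debt_ratio_category(sample, missing_as_low=False)
print(sas_like.tolist())
```

Which behaviour is right depends on how the downstream model treats missing values, so it is worth deciding deliberately rather than inheriting it from either tool.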

Step 3: Migrate Model Training

Convert SAS PROC LOGISTIC to statsmodels Logit while maintaining identical model specifications.

Before (SAS):

/* descending models P(default_flag = 1), which is what Logit estimates in Python */
proc logistic data=work.features descending;
    model default_flag = debt_ratio income age;
    output out=predictions p=predicted_prob;
run;

After (Python):

import statsmodels.api as sm

# Prepare features
X = features_df[['debt_ratio', 'income', 'age']]
X = sm.add_constant(X) # Add intercept
y = features_df['default_flag']

# Train model
model = sm.Logit(y, X)
results = model.fit()

# Generate predictions
predictions = results.predict(X)

Step 4: Validation and Verification

Artemis validates that the Python model produces equivalent results to the SAS original.

Validation checks include:

  • Prediction equivalence (Python predictions match SAS within tolerance)
  • Performance metrics match (ROC-AUC, accuracy, F1 score)
  • Feature importance comparison
  • Model coefficients alignment
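
How Artemis implements these checks internally is not described here. As a rough sketch of what a within-tolerance comparison involves, the helper below compares two arrays of values, such as predictions or coefficients exported from SAS and their Python counterparts. The function name, the sample numbers, and the 1e-6 tolerance are assumptions chosen for illustration.

```python
import numpy as np


def check_equivalence(sas_values, python_values, atol=1e-6):
    """Compare two equally ordered arrays and report the largest absolute gap."""
    sas_values = np.asarray(sas_values, dtype=float)
    python_values = np.asarray(python_values, dtype=float)
    if sas_values.shape != python_values.shape:
        raise ValueError("inputs must have the same shape")
    max_abs_diff = float(np.max(np.abs(sas_values - python_values)))
    return {"max_abs_diff": max_abs_diff, "within_tolerance": max_abs_diff <= atol}


# Made-up predictions, aligned row by row.
sas_pred = [0.0123, 0.4567, 0.8901]
close_pred = [0.0123, 0.4567001, 0.8901]
far_pred = [0.0123, 0.4667, 0.8901]

close_report = check_equivalence(sas_pred, close_pred)
far_report = check_equivalence(sas_pred, far_pred)
print(close_report, far_report)
```

Rows must be aligned on a shared key before comparing, since SAS and pandas do not necessarily preserve the same row order.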

Model Performance Comparison:

The migration achieves functional equivalence, with all metrics closely matching:

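| Metric | SAS Baseline | Python Migration | Absolute Difference |
|---|---|---|---|
| AUC-ROC | 0.8564 | 0.8634 | 0.0070 |
| Accuracy | 0.9362 | 0.9350 | 0.0012 |
| Precision | 0.5690 | 0.5705 | 0.0015 |
| Recall | 0.2008 | 0.1938 | 0.0070 |
| F1-Score | 0.2968 | 0.2893 | 0.0075 |
| Gini Coefficient | 0.7128 | 0.7269 | 0.0141 |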
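
The gaps between the two implementations can be computed directly from the reported values. The short sketch below takes the six metric pairs from the comparison and prints the absolute difference for each; it performs only this arithmetic and does not rerun either model.

```python
# Metric values as reported in the comparison above.
sas = {
    "AUC-ROC": 0.8564,
    "Accuracy": 0.9362,
    "Precision": 0.5690,
    "Recall": 0.2008,
    "F1-Score": 0.2968,
    "Gini Coefficient": 0.7128,
}
python = {
    "AUC-ROC": 0.8634,
    "Accuracy": 0.9350,
    "Precision": 0.5705,
    "Recall": 0.1938,
    "F1-Score": 0.2893,
    "Gini Coefficient": 0.7269,
}

# Absolute difference per metric, rounded to the four decimals the metrics are reported in.
abs_diff = {name: round(abs(python[name] - sas[name]), 4) for name in sas}
largest = max(abs_diff, key=abs_diff.get)

for name, diff in abs_diff.items():
    print(f"{name}: {diff:.4f}")
print("Largest gap:", largest, abs_diff[largest])
```

The largest gap is in the Gini coefficient, which is expected because Gini is 2 × AUC − 1 and therefore doubles any difference in AUC.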

This validates that the Python implementation produces equivalent results to the SAS original, maintaining model accuracy while improving code maintainability.

Results and Benefits

This migration delivers:

Technical Success:

  • Functionally equivalent Python implementation
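  • Performance maintained: every validation metric is within about 0.015 of the SAS baseline, with AUC-ROC and Gini slightly higher and recall and F1 slightly lower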
  • Clean, maintainable code following Python best practices

Business Impact:

  • Improved maintainability with modern Python
  • Integration-ready for ML infrastructure
  • Team productivity gains (Python skills more widely available)

Key Capabilities Demonstrated:

  • Verifiability: Automated validation at each step ensures correctness
  • Observability: Real-time progress tracking and metrics throughout migration
  • Optimization: Improvements beyond simple translation using Artemis Intelligence

Conclusion

Artemis Planning Agent transforms complex SAS-to-Python migrations into a systematic, verifiable process. By combining intelligent planning with automated validation and optimization, Artemis enables fast, verifiable migration while preserving model accuracy and improving code quality. Whether migrating individual models or entire model suites, Artemis makes legacy modernization efficient and reliable.