Migrating a Credit Risk Model from SAS to Python

Videos: See the Complete Migration Workflow

Original SAS Credit Risk Model - See the baseline SAS implementation
Import and Create Migration Plan - Import your SAS project into Artemis
Generate Migration Plan - Create a structured migration plan
Execute Migration Tasks - Translate SAS to Python step by step
Validation and Verification - Verify Python matches SAS output

Project Overview

Project Name: SAS Credit Risk Model Migration

Languages: SAS → Python

Description: This guide demonstrates how to use Artemis Planning Agent to migrate a credit risk model from SAS to modern Python. The model uses real-world credit data and logistic regression for credit default prediction. This tutorial shows how to maintain model accuracy while improving code maintainability. Finally, we use the Artemis Intelligence Optimiser to generate new features and boost model performance.

Repository: turintech/sas-migration

Branches:

Initial (SAS) - Original SAS credit risk model
Final (Python) - Migrated Python implementation

Outline

This guide covers the following steps:

Migration Workflow - Understand the systematic migration process
Where is SAS Used? - Understand the context and industries relying on SAS
Starting Point: The SAS Model - Review the baseline credit risk model
Step 1: Import and Create Migration Plan - Import your SAS repository and generate a migration plan
Step 2: Execute Data Migration Tasks - Translate SAS DATA steps to pandas operations
Step 3: Migrate Model Training - Convert PROC LOGISTIC to statsmodels
Step 4: Validation and Verification - Verify Python matches SAS output
Results and Benefits - Review technical and business outcomes

Migration Workflow

The migration follows Artemis's systematic workflow:

Import & Analyze - Import your SAS repository and let Artemis analyze the codebase
Generate Plan - Answer clarifying questions to generate a structured migration plan
Execute Tasks - Translate SAS code to Python step by step
Validate Results - Verify Python output matches SAS within tolerance

Where is SAS Used?

SAS has been an enterprise standard for analytics for over 40 years and powers business-critical systems across industries. It now faces increasing competition from newer, more open tooling such as Python and R.

This tutorial focuses on financial services credit risk modeling, demonstrating how Artemis Planning Agent enables systematic, verifiable migration while maintaining model accuracy.

Starting Point: The SAS Model

The baseline SAS model is a production-ready credit risk scoring system for evaluating credit applicants.

Prediction Target

The model predicts the probability that a credit applicant will experience serious delinquency (90+ days past due or worse) within the next 2 years. This is a binary classification problem: default vs. no default.

Capabilities

Complete pipeline: data loading, feature engineering, training, validation, scoring
Risk scoring system: 300-850 FICO-like scale with risk grades (A-F)
Lending recommendations: Approve/Review/Decline based on risk grade
Comprehensive metrics: ROC curves, confusion matrices, decile analysis, calibration plots
Production-ready outputs: Model coefficients, validation reports, Kaggle submission files

Model Architecture

Algorithm: Logistic regression with stepwise variable selection (PROC LOGISTIC)
Selection criteria: p-value thresholds of 0.05 for entry and stay

Data

Source: Kaggle "Give Me Some Credit" competition dataset
Size: 150,000 credit applicant records
Split: 70% training (105,000) / 30% validation (45,000)
Features: 10 original features + 40+ engineered features
- Core: Age, income, credit utilization, debt ratio
- Late payments: 30-59, 60-89, and 90+ day delinquencies
- Risk flags: High utilization, high debt, low income
- Engineered: Financial health score, log transformations, interaction features
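
The 70/30 split can be made repeatable with a seeded shuffle. The sketch below is a dependency-free illustration rather than the repository's code: the function name `split_indices` and the seed value are assumptions. A Python shuffle should not be expected to reproduce the row assignment made by SAS, so for validation it is usually safer to export the SAS split and reuse it.

```python
import random

def split_indices(n_rows, train_fraction=0.7, seed=42):
    """Return (train_idx, valid_idx) as sorted lists of row positions."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # seeded, so the split is repeatable
    n_train = round(n_rows * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_idx, valid_idx = split_indices(150_000)
print(len(train_idx), len(valid_idx))   # 105000 45000
```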

Baseline Performance

Validation AUC-ROC: 0.8564 (excellent discrimination)
Validation Accuracy: 93.62%
Validation Precision: 56.90%
Validation Recall: 20.08%
Validation F1-Score: 0.2968
Gini Coefficient: 0.7128
Population Stability Index (PSI): 0.0013 (excellent stability)
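
Several of these figures are tied together by standard identities, which makes them easy to sanity-check after a migration: the Gini coefficient equals 2 x AUC - 1, and F1 is the harmonic mean of precision and recall. The sketch below checks the reported baseline against those identities and adds one common formulation of PSI. The function names and the example bin shares are illustrative assumptions, and the source does not say how its PSI was binned.

```python
import math

def gini_from_auc(auc):
    """Gini coefficient implied by an AUC-ROC value."""
    return 2 * auc - 1

def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def psi(expected_shares, actual_shares, eps=1e-6):
    """Population Stability Index over matching bins of population shares."""
    total = 0.0
    for e, a in zip(expected_shares, actual_shares):
        e, a = max(e, eps), max(a, eps)   # guard against log(0)
        total += (a - e) * math.log(a / e)
    return total

# Baseline figures reported above
print(round(gini_from_auc(0.8564), 4))                     # 0.7128
print(round(f1_from_precision_recall(0.5690, 0.2008), 4))  # 0.2968

# Made-up bin shares, only to show the call
print(psi([0.25, 0.25, 0.25, 0.25], [0.24, 0.26, 0.25, 0.25]))
```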

Example SAS Code:

/* Model Training with Stepwise Selection */
proc logistic data=work.model_features_train descending;
    model default_flag =
        credit_utilization_capped debt_ratio_capped
        num_late_30_59 num_late_60_89 num_late_90_plus
        age monthly_income log_monthly_income
        flag_high_utilization flag_serious_delinquency
        financial_health total_risk_flags
        age_income_interaction
        / selection=stepwise slentry=0.05 slstay=0.05;

    output out=work.scored predicted=probability;
    store work.logit_model;
run;

/* Generate Risk Scores */
data work.risk_scores;
    set work.scored;
    credit_risk_score = round(600 + 250 * (1 - probability));
    if credit_risk_score >= 750 then risk_grade = 'A';
    else if credit_risk_score >= 700 then risk_grade = 'B';
    /* ... additional risk grades ... */
run;
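
As a preview of where the migration is heading, the scoring DATA step above could look like this in plain Python. This is a sketch, not the migrated repository code: the function names are assumptions, the score formula and the A and B cut-offs are copied from the SAS shown, and the remaining grades are elided in the source, so they are left as a placeholder rather than guessed. As written, the formula yields scores from 600 to 850. SAS ROUND rounds halves away from zero, whereas Python's built-in round() rounds halves to the nearest even number, so the sketch rounds explicitly.

```python
import math

def credit_risk_score(probability):
    """Map a default probability to a score, mirroring the SAS formula."""
    raw = 600 + 250 * (1 - probability)
    return math.floor(raw + 0.5)   # round half up, like SAS ROUND for positive values

def risk_grade(score):
    """Cut-offs for A and B come from the SAS code; the rest are elided there."""
    if score >= 750:
        return "A"
    if score >= 700:
        return "B"
    return None   # placeholder: lower grades are not shown in the source

for p in (0.02, 0.5, 0.9):
    score = credit_risk_score(p)
    print(p, score, risk_grade(score))
```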

Step 1: Import and Create Migration Plan

Import your SAS repository into Artemis as a new project, then activate the Planning Agent.

The Planning Agent analyzes your SAS code and asks clarifying questions:

Which Python libraries do you prefer? (pandas, scikit-learn, etc.)
Should we match SAS output exactly or allow optimizations?
What validation approach do you want?
Are there specific business constraints?

After your input, the agent generates a structured migration plan with hierarchical tasks, success criteria, and dependencies.
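
To make "hierarchical tasks, success criteria, and dependencies" concrete, the sketch below models a plan as plain Python data and derives an execution order from the dependencies. It is purely illustrative: the class, field names, task names, and criteria are hypothetical and do not represent the format Artemis actually generates.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+

@dataclass
class MigrationTask:
    # Hypothetical structure, not the Artemis plan schema
    name: str
    success_criterion: str
    depends_on: list = field(default_factory=list)

tasks = [
    MigrationTask("migrate_data_steps",
                  "engineered features match SAS within tolerance"),
    MigrationTask("migrate_model_training",
                  "coefficients match SAS within tolerance",
                  depends_on=["migrate_data_steps"]),
    MigrationTask("validate_outputs",
                  "predictions and metrics match SAS within tolerance",
                  depends_on=["migrate_model_training"]),
]

# Map each task to its prerequisites, then derive a valid execution order.
graph = {task.name: set(task.depends_on) for task in tasks}
execution_order = list(TopologicalSorter(graph).static_order())
print(execution_order)
```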

Step 2: Execute Data Migration Tasks

The plan breaks migration into manageable tasks. First, translate SAS DATA steps to pandas operations.

Before (SAS):

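data work.features;
    set work.raw_data;
    length debt_ratio_category $6;
    /* CASE/WHEN is PROC SQL syntax; a DATA step uses IF/ELSE IF */
    if debt_ratio < 0.3 then debt_ratio_category = 'Low';
    else if debt_ratio < 0.6 then debt_ratio_category = 'Medium';
    else debt_ratio_category = 'High';
run;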

After (Python):

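import pandas as pd

def engineer_features(df):
    """Translate SAS feature engineering to pandas"""
    df = df.copy()

    # Debt ratio categorization: Low is < 0.3, Medium is [0.3, 0.6), High is >= 0.6.
    # right=False makes each bin left-closed so boundaries match the SAS comparisons,
    # and the -inf lower edge keeps zero and negative ratios in 'Low'.
    # Missing ratios become NaN here, whereas a SAS '<' comparison treats
    # a missing value as smaller than any number.
    df['debt_ratio_category'] = pd.cut(
        df['debt_ratio'],
        bins=[float('-inf'), 0.3, 0.6, float('inf')],
        labels=['Low', 'Medium', 'High'],
        right=False
    )

    return df

# raw_data is assumed to be a DataFrame already loaded with a debt_ratio column
features_df = engineer_features(raw_data)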
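
Boundary and missing values are where a translation like this most easily diverges from SAS. The sketch below is a dependency-free reference for the SAS rule (Low below 0.3, Medium below 0.6, High otherwise) that can serve as an oracle when testing the pandas version. The function name is an assumption, and None or NaN stands in for a SAS numeric missing value, which SAS treats as smaller than any number in comparisons.

```python
import math

def sas_debt_ratio_category(debt_ratio):
    """Reference implementation of the SAS rule for a single value."""
    is_missing = debt_ratio is None or math.isnan(debt_ratio)
    # A missing value satisfies 'debt_ratio < 0.3' in SAS, so it lands in 'Low'
    if is_missing or debt_ratio < 0.3:
        return "Low"
    if debt_ratio < 0.6:
        return "Medium"
    return "High"

# Values on and around the cut-offs
for value in (None, 0.0, 0.2999, 0.3, 0.5999, 0.6, 5.0):
    print(value, sas_debt_ratio_category(value))
```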

Step 3: Migrate Model Training

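Convert SAS PROC LOGISTIC to statsmodels Logit while keeping the model specification the same. statsmodels has no built-in counterpart to selection=stepwise, so this simplified example fits a fixed set of predictors.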

Before (SAS):

proc logistic data=work.features descending;
    /* DESCENDING models P(default_flag = 1); the default models the lower level */
    model default_flag = debt_ratio income age;
    output out=predictions p=predicted_prob;
run;

After (Python):

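import statsmodels.api as sm

# Prepare features
X = features_df[['debt_ratio', 'income', 'age']]
X = sm.add_constant(X)  # Add intercept (PROC LOGISTIC includes one by default)
y = features_df['default_flag']

# Train model; missing='drop' excludes rows with missing values, as PROC LOGISTIC does
model = sm.Logit(y, X, missing='drop')
results = model.fit()

# Generate predictions: P(default_flag = 1) for each row
predictions = results.predict(X)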

Step 4: Validation and Verification

Artemis validates that the Python model produces equivalent results to the SAS original.

Validation checks include:

Prediction equivalence (Python predictions match SAS within tolerance)
Performance metrics match (ROC-AUC, accuracy, F1 score)
Feature importance comparison
Model coefficients alignment
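
The first check, prediction equivalence within tolerance, can be sketched as follows. This is an illustration and not the Artemis validation code: the function name and the tolerance of 1e-6 are assumptions (the source does not state the tolerance used), and both sequences are assumed to be aligned row for row on the same records.

```python
def compare_predictions(sas_probs, python_probs, abs_tol=1e-6):
    """Return (all_within_tolerance, max_abs_diff) for two aligned sequences."""
    if len(sas_probs) != len(python_probs):
        raise ValueError("prediction vectors must have the same length")
    diffs = [abs(s - p) for s, p in zip(sas_probs, python_probs)]
    max_diff = max(diffs, default=0.0)
    return max_diff <= abs_tol, max_diff

# Made-up values, only to show the call
ok, max_diff = compare_predictions([0.10, 0.85, 0.02], [0.10, 0.8500004, 0.02])
print(ok, max_diff)
```

Reporting the maximum absolute difference alongside the pass/fail flag makes it easier to tell a numerical-precision gap from a genuine specification mismatch.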

Model Performance Comparison:

The migration achieves functional equivalence: every metric is within 0.015 of the SAS baseline.

Metric	SAS Baseline	Python Migration	Difference (Python - SAS)
AUC-ROC	0.8564	0.8634	+0.0070
Accuracy	0.9362	0.9350	-0.0012
Precision	0.5690	0.5705	+0.0015
Recall	0.2008	0.1938	-0.0070
F1-Score	0.2968	0.2893	-0.0075
Gini Coefficient	0.7128	0.7269	+0.0141

This validates that the Python implementation produces equivalent results to the SAS original, maintaining model accuracy while improving code maintainability.
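
The metric comparison can be automated with a few lines of Python. The sketch below uses the figures from the table above; the function names and the 0.02 acceptance threshold are assumptions chosen for illustration, since the source does not state the tolerance applied.

```python
metrics = {
    # metric: (SAS baseline, Python migration), as reported in the table above
    "AUC-ROC": (0.8564, 0.8634),
    "Accuracy": (0.9362, 0.9350),
    "Precision": (0.5690, 0.5705),
    "Recall": (0.2008, 0.1938),
    "F1-Score": (0.2968, 0.2893),
    "Gini Coefficient": (0.7128, 0.7269),
}

def metric_deltas(table):
    """Signed difference (Python minus SAS) per metric, rounded to 4 decimals."""
    return {name: round(py - sas, 4) for name, (sas, py) in table.items()}

def within_tolerance(table, tolerance):
    """True when every absolute difference is at most the tolerance."""
    return all(abs(delta) <= tolerance for delta in metric_deltas(table).values())

deltas = metric_deltas(metrics)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
print(within_tolerance(metrics, tolerance=0.02))
```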

Results and Benefits

This migration delivers:

Technical Success:

Functionally equivalent Python implementation
Performance maintained: every validation metric within 0.015 of the SAS baseline
Clean, maintainable code following Python best practices

Business Impact:

Improved maintainability with modern Python
Integration-ready for ML infrastructure
Team productivity gains (Python skills more widely available)

Key Capabilities Demonstrated:

Verifiability: Automated validation at each step ensures correctness
Observability: Real-time progress tracking and metrics throughout migration
Optimization: Improvements beyond simple translation using Artemis Intelligence

Conclusion

Artemis Planning Agent turns complex SAS-to-Python migrations into a systematic, verifiable process. By combining intelligent planning with automated validation and optimization, Artemis enables fast, verifiable migration while preserving model accuracy and improving code quality. Whether migrating individual models or entire model suites, Artemis makes legacy modernization efficient and reliable.
