Building a Regression Tool

This guide shows how to use the Artemis Planning Agent to build a command-line regression tool for tabular data. Starting from a basic Python template, we work through an interactive planning session in which the agent asks targeted questions to understand our requirements. The session produces a detailed, actionable plan of well-defined tasks, each with specific deliverables and success criteria. The finished tool loads data, preprocesses it (imputation, scaling, and encoding, so that both numeric and categorical features are handled), generates predictions, evaluates model performance with multiple metrics, and produces a report containing both metrics and visualizations. It also includes complete documentation and tests.

Project Overview

Goal: Create a command-line tool for linear regression on tabular CSV data

Before: Python template (acc6e7a)

After: Complete regression tool (ad00452)

Target Users: Data scientists, analysts, and engineers working with predictive modeling

Use Cases:

  • Sales and revenue prediction
  • Price estimation
  • Resource demand prediction

Planning Process

We began with a simple prompt: "I want to build a regression tool." Through a series of targeted questions, the Artemis Planning Agent then helped us clarify and refine our requirements, covering both user needs and technical specifications.

User Requirements:

  • Type of regression? → Linear regression
  • Primary interface? → Command-line tool (CLI)
  • Data format? → CSV files
  • What should the tool produce? → Predictions and a report with visualizations and metrics
  • Train model or use pretrained? → Train model and save for predictions

Technical Requirements:

  • Build from scratch or use library? → Use scikit-learn
  • How should CLI commands work? → Separate commands for train and predict
  • Report format? → HTML
  • Autodetect features/target or user specifies? → User specifies the target column

From these answers, Artemis generated a plan for building a command-line linear regression tool with two main workflows: training a model and using it for predictions. The plan is structured to deliver a fully functional train flow with report generation as quickly as possible.

The approach: Start by setting up dependencies (scikit-learn, pandas, matplotlib, jinja2), then build out the core components: create the CLI structure with separate train and predict subcommands, implement data loading from CSV files, add model training logic, enable model persistence, and generate visualizations and an HTML report.
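As a rough illustration of the "separate train and predict subcommands" structure, here is a minimal sketch. The tool itself uses Click; this sketch uses the standard library's argparse instead so that it runs without extra dependencies. The flag names are taken from the example commands later in this guide, while the function name build_parser, the help texts, and the choice to make every flag required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with separate `train` and `predict` subcommands."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Linear regression on tabular CSV data"
    )
    subcommands = parser.add_subparsers(dest="command", required=True)

    # train: fit a model on a CSV file, save it, and write an HTML report
    train = subcommands.add_parser("train", help="Train a model and write a report")
    train.add_argument("--input", required=True, help="Training CSV file")
    train.add_argument("--target", required=True, help="Name of the target column")
    train.add_argument("--output-model", required=True, help="Where to save the model")
    train.add_argument("--report", required=True, help="Where to write the HTML report")

    # predict: load a saved model and apply it to new data
    predict = subcommands.add_parser("predict", help="Predict with a saved model")
    predict.add_argument("--model", required=True, help="Saved model file")
    predict.add_argument("--input", required=True, help="CSV file with new data")
    predict.add_argument("--output", required=True, help="Where to write predictions")
    return parser


# Parse an explicit argument list instead of sys.argv to show the result.
args = build_parser().parse_args(
    ["predict", "--model", "model.joblib", "--input", "new_data.csv",
     "--output", "predictions.csv"]
)
print(args.command, args.model, args.output)
```

Keeping the two workflows as separate subcommands means each one only declares the options it needs, which is the same separation Click expresses with a command group.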

After that: We shift focus to the prediction functionality and complete the predict command, which loads the saved model and generates predictions on new data. Finally, we add tests, sample data, and documentation.
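The hand-off between the two commands relies on persistence: whatever train learns has to be written to disk in a form predict can reload. The sketch below shows one way to bundle preprocessing parameters and model weights into a single file. It uses the standard library's pickle purely for illustration; the .joblib file name in this guide's example commands suggests the tool uses joblib instead. The bundle layout, the function names, and the numbers are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_bundle(path, preprocessing, coefficients, intercept):
    """Store preprocessing parameters and model weights together in one file."""
    bundle = {
        "preprocessing": preprocessing,
        "coefficients": coefficients,
        "intercept": intercept,
    }
    with open(path, "wb") as handle:
        pickle.dump(bundle, handle)


def load_bundle(path):
    """Load a bundle written by save_bundle. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip through a temporary directory (all values are made up).
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    save_bundle(
        model_path,
        preprocessing={"sqft": {"mean": 1500.0, "std": 500.0}},
        coefficients={"sqft": 50000.0},
        intercept=250000.0,
    )
    restored = load_bundle(model_path)

print(restored["intercept"])
```

Saving the preprocessing parameters alongside the model matters because new data must be transformed with the statistics learned during training, not with statistics recomputed from the new data.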

The Planning Agent then asked us to confirm whether we wanted to proceed with this plan. At this point, we requested one adjustment: basic preprocessing, specifically imputing missing values and scaling numerical features, so that the tool handles real-world datasets more robustly. The agent incorporated our feedback and refined the plan.
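To make the two requested preprocessing steps concrete, here is a dependency-free sketch of mean imputation followed by standardization for a single numeric column. This is not the tool's code: the tool is built on scikit-learn, where the corresponding pieces are SimpleImputer and StandardScaler. The helper names fit_column and transform_column are hypothetical, and mean imputation is an assumed strategy.

```python
from statistics import fmean, pstdev


def fit_column(values):
    """Learn imputation and scaling parameters from one training column.

    `values` may contain None for missing entries.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = fmean(observed)
    # Population standard deviation; fall back to 1.0 for constant columns
    # so the division in transform_column is always defined.
    std = pstdev(observed) or 1.0
    return {"mean": mean, "std": std}


def transform_column(values, params):
    """Fill missing entries with the training mean, then standardize."""
    filled = [params["mean"] if v is None else v for v in values]
    return [(v - params["mean"]) / params["std"] for v in filled]


params = fit_column([1.0, None, 3.0])           # learned on training data
print(transform_column([1.0, None, 3.0], params))
print(transform_column([None, 4.0], params))    # new data reuses the same params
```

The split into a fit step and a transform step is the important part: the parameters are learned once from the training data and then reused unchanged at prediction time.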

Implementation

The Artemis Planning Agent generated a 12-task plan that builds the regression tool incrementally, first completing the train flow and then the prediction flow.

Throughout the implementation, Artemis offers validation after each task. Running these validations helps us verify that everything works as expected. If we identify any issues or areas for improvement, we can provide comments to the agent, which then adjusts the implementation accordingly. This iterative validation process ensures each component functions correctly before moving to the next task.

1. Add project dependencies to pyproject.toml - Set up the foundation by adding dependencies (scikit-learn, pandas, matplotlib, jinja2, click) for data loading, model training, and HTML report generation. (8957515)

2. Create CLI structure with train and predict subcommands - Built the command-line interface with separate train and predict commands using Click, defining the core structure for user interaction. (b3ff20c)

3. Implement data loading and validation - Created robust CSV data loading with validation to ensure data quality and proper format before processing. (1c948f2)

4. Implement preprocessing pipeline - Built a preprocessing pipeline that imputes missing values and standardizes features, making the tool more robust to real-world data. (0b6b566)

5. Implement model training functionality - Added the core ML pipeline using scikit-learn's LinearRegression, training the model on preprocessed data and computing evaluation metrics. (d65b52a)

6. Implement model persistence (save/load) - Enabled saving the trained model and preprocessing pipeline to disk for reuse, allowing the predict command to load the pre-trained model. (4718e85)

7. Create visualization generation module - Built the module that generates the report's charts: a scatter plot, a residuals plot, and a feature importance chart. (f466ac8)

8. Implement HTML report generation - Created HTML report generation using Jinja2 templates, producing a self-contained report with embedded visualizations. (9c17354)
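One common way to make an HTML report self-contained is to embed each chart as a base64 data URI, so the report has no external image files. The sketch below illustrates that idea under several assumptions: it uses string.Template from the standard library rather than Jinja2, it assumes the charts are PNG images, and the template, the function name render_report, and the metric values are all hypothetical.

```python
import base64
import html
from string import Template

REPORT_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Regression Report</title></head>
<body>
<h1>Regression Report</h1>
<table>
$metric_rows
</table>
<img alt="$plot_title" src="data:image/png;base64,$plot_data">
</body>
</html>
""")


def render_report(metrics, plot_title, png_bytes):
    """Render metrics and one embedded image into a single HTML string."""
    rows = "\n".join(
        f"<tr><th>{html.escape(name)}</th><td>{value:.4f}</td></tr>"
        for name, value in metrics.items()
    )
    # Inline the image bytes so the report does not depend on external files.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return REPORT_TEMPLATE.substitute(
        metric_rows=rows,
        plot_title=html.escape(plot_title),
        plot_data=encoded,
    )


# Stand-in bytes: a real report would pass the PNG produced by the plotting step.
placeholder_image = b"abc"
report = render_report({"R2": 0.91, "MAE": 3.5}, "Residuals", placeholder_image)
print(report)
```

Because the image travels inside the HTML, the report can be moved or emailed as a single file.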

First Milestone

With visualizations and the HTML report built, wiring up the train subcommand in the next task gives us a fully functional train command: it loads CSV data, trains a linear regression model, evaluates performance, generates a scatter plot, a residuals plot, and a feature importance chart, and produces an HTML report with metrics (R², MSE, RMSE, MAE).
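For reference, the four reported metrics can be computed from true and predicted values as in the sketch below. It uses only the standard library and the textbook definitions; the tool itself presumably relies on scikit-learn for these, and the function name regression_metrics is hypothetical. The sketch assumes the true values are not all identical, since R² is undefined in that case.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R², MSE, RMSE, and MAE for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(metrics)
```

MSE and RMSE penalize large errors more heavily than MAE does, which is why a single bad prediction in this example moves them further than it moves MAE.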

9. Wire up train subcommand - Integrated all training components (data loading, preprocessing, model training, evaluation, visualization, reporting) into the train command workflow. (a2316b1)
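The first stage of that workflow, data loading and validation, might look roughly like the sketch below. It reads the CSV with the standard library's csv module rather than pandas, which the tool lists among its dependencies, and the specific checks (target column present, at least one data row, numeric target values) are assumptions about what such validation could include. The function name load_rows and the sqft column are hypothetical.

```python
import csv
import io


def load_rows(csv_file, target):
    """Read CSV rows as dicts and run a few basic sanity checks."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    if target not in columns:
        raise ValueError(f"target column {target!r} not found; available: {columns}")
    rows = list(reader)
    if not rows:
        raise ValueError("CSV file has a header but no data rows")
    for index, row in enumerate(rows, start=1):
        try:
            float(row[target])
        except (TypeError, ValueError):
            # TypeError covers short rows, where DictReader fills in None.
            raise ValueError(
                f"data row {index}: target value {row[target]!r} is not numeric"
            ) from None
    return rows


# An in-memory file stands in for sample_data.csv.
sample = io.StringIO("sqft,price\n1000,200000\n1500,310000\n")
rows = load_rows(sample, target="price")
print(len(rows), rows[0])
```

Failing early with a message that names the missing column or the offending row is generally more helpful to a CLI user than an error raised later from inside model training.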

Second Milestone

Next comes the predict command, which loads the trained model, applies it to new data, and writes the predictions to a CSV file. Finally, we'll add tests, sample data, and documentation.
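The predict flow can be pictured as three steps: transform the new rows with the statistics learned during training, apply the linear model, and write the result as CSV. The following standard-library sketch illustrates those steps under stated assumptions. The real tool applies its saved scikit-learn pipeline instead, and the output layout (input columns plus a prediction column), the function names, the column names, and all numbers here are hypothetical.

```python
import csv
import io


def predict_rows(rows, feature_names, means, stds, coefficients, intercept):
    """Impute and standardize each feature, then apply the linear model."""
    predictions = []
    for row in rows:
        total = intercept
        for name in feature_names:
            raw = row.get(name, "")
            # Missing values are replaced by the training mean (assumed strategy).
            value = means[name] if raw in ("", None) else float(raw)
            total += coefficients[name] * (value - means[name]) / stds[name]
        predictions.append(total)
    return predictions


def write_predictions(rows, predictions, out_file, column="prediction"):
    """Write the input columns plus one prediction column as CSV."""
    fieldnames = list(rows[0].keys()) + [column]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for row, value in zip(rows, predictions):
        writer.writerow({**row, column: value})


# In-memory stand-ins for new_data.csv and predictions.csv; numbers are made up.
new_rows = list(csv.DictReader(io.StringIO("sqft,rooms\n1000,3\n,2\n")))
predictions = predict_rows(
    new_rows,
    feature_names=["sqft", "rooms"],
    means={"sqft": 1500.0, "rooms": 3.0},
    stds={"sqft": 500.0, "rooms": 1.0},
    coefficients={"sqft": 50000.0, "rooms": 10000.0},
    intercept=250000.0,
)
output = io.StringIO()
write_predictions(new_rows, predictions, output)
print(output.getvalue())
```

Note that the second input row has no sqft value; it is filled with the training mean before scaling, so it contributes nothing to that row's prediction.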

10. Implement prediction functionality - Built the prediction logic that loads the saved model and preprocessing pipeline, applies the transformations to new data, and writes the predictions to CSV. (a54baba)

11. Wire up predict subcommand - Connected the prediction functionality to the CLI, so both workflows now run end to end. A model is trained with uv run python main.py train --input sample_data.csv --target price --output-model model.joblib --report report.html and then applied to new data with uv run python main.py predict --model model.joblib --input new_data.csv --output predictions.csv. (1a9da27)

12. Add basic tests and documentation - Added tests and documentation to verify functionality and provide user guidance. (d95e5e6)


Final Result

The result is a complete regression tool with training, prediction, and evaluation capabilities. It handles data loading, preprocessing, and model training with scikit-learn, generates an HTML report with visualizations and metrics, and includes complete documentation and tests.