Building an Anomaly Detection Tool

This walkthrough shows how to use the Artemis Planning Agent to build a command-line tool for anomaly detection in tabular data. The starting point is a simple Python template. By answering the planner's clarifying questions over several rounds, we arrive at a specific, well-structured plan with clear validation points and success criteria. The end result is a complete tool with data loading, algorithm implementation, and visualization, along with comprehensive documentation.

Project Overview

Goal: Build a CLI tool that identifies outliers in CSV files using machine learning

Before: Python template (acc6e7a)

After: Complete anomaly detection tool (f8ad1ad)

Target Users: Data scientists and analysts working with numeric datasets

Use Cases:

  • Fraud detection in financial transactions
  • Sensor error identification in IoT data
  • Data quality assessment and validation

Planning Process

We started with a simple statement: "I want to build an anomaly detection tool."

The Artemis Planning Agent helped us think through the requirements by asking clarifying questions:

User Requirements:

  • What type of data? → General tabular data
  • Primary use case? → General-purpose tool
  • What to do with anomalies? → Visualize in dashboard/charts
  • How to provide data? → Supply path to CSV file
  • What form should it take? → Command-line tool (CLI)

Technical Requirements:

  • Specific algorithms? → Machine learning (Isolation Forest, One-Class SVM)
  • Expected data scale? → Large (hundreds of thousands of rows)
  • Configurable parameters? → Yes - via command-line flags
  • How to display charts? → Generate static image files
  • Both algorithms or one? → Both - run both and compare results
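To make these answers concrete, here is a minimal sketch of how they might map onto an argparse interface. The program name, flag names, and defaults are hypothetical illustrations, not the interface the finished tool exposes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the requirements above.

    All flag names and defaults here are assumptions made for illustration.
    """
    parser = argparse.ArgumentParser(
        prog="anomaly-detect",
        description="Detect outliers in a CSV file.",
    )
    # "How to provide data?" -> path to a CSV file
    parser.add_argument("csv_path", help="Path to the input CSV file")
    # "Both algorithms or one?" -> run both by default
    parser.add_argument(
        "--algorithm",
        choices=["isolation-forest", "one-class-svm", "both"],
        default="both",
        help="Which detector(s) to run",
    )
    # "Configurable parameters?" -> exposed as command-line flags
    parser.add_argument(
        "--contamination",
        type=float,
        default=0.05,
        help="Expected fraction of anomalous rows",
    )
    # "How to display charts?" -> static image files written to a directory
    parser.add_argument(
        "--output-dir",
        default="output",
        help="Directory for generated chart images",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["data.csv", "--contamination", "0.1"])` yields a namespace with `csv_path`, `algorithm`, `contamination`, and `output_dir` attributes.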

From these answers, Artemis generated a 9-task plan with a clear strategy: Build a working MVP first (tasks 1-6), then enhance it (tasks 7-9). This approach meant we'd have a functional tool using Isolation Forest after 6 tasks, then add the second algorithm and comparison features.

Each task in the plan comes with clear structure: specific goals and deliverables, detailed technical specifications, files to modify, and measurable success criteria. This transforms a vague request into actionable development steps with concrete validation points.

Implementation

We built the tool in two phases: first creating a working MVP, then enhancing it with comparison capabilities.

Phase 1: MVP - Working Anomaly Detection Tool (Tasks 1-6)

These six tasks created a complete, functional tool using Isolation Forest:

1. Set up project dependencies and structure - Established foundation with pandas, scikit-learn, matplotlib, and numpy for data processing, ML algorithms, and visualization. (1f17c19)

2. Implement CSV data loading and preprocessing - Built robust data pipeline with validation, encoding, and scaling optimized for large datasets. (1777042)

3. Implement Isolation Forest anomaly detection - Added first ML algorithm with configurable parameters and optimized performance for hundreds of thousands of rows. (83dfad3)

4. Implement visualization and chart generation - Created comprehensive visualization suite with PCA/t-SNE dimensionality reduction and multi-panel dashboards. (49d49d7)

5. Build CLI interface with argparse - Implemented command-line interface with parameter configuration, validation, and progress feedback. (3c7e222)

6. Integration and pipeline orchestration - Wired all components together in main.py to form the complete pipeline, with comprehensive error handling. (d078f4a)
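The core of Tasks 2 and 3 can be sketched in a few lines of pandas and scikit-learn. This is a simplified illustration, not the code in the commits above: the function names are hypothetical, non-numeric columns are dropped rather than encoded, and missing values are filled with column medians as an assumed policy.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame) -> np.ndarray:
    """Keep numeric columns, fill gaps with column medians, and standardize."""
    numeric = df.select_dtypes(include="number")
    if numeric.shape[1] == 0:
        raise ValueError("No numeric columns found in the input data")
    numeric = numeric.fillna(numeric.median())
    return StandardScaler().fit_transform(numeric)


def detect_isolation_forest(
    X: np.ndarray, contamination: float = 0.05, random_state: int = 0
) -> np.ndarray:
    """Return a boolean mask that is True for rows flagged as anomalies."""
    model = IsolationForest(
        n_estimators=100,
        contamination=contamination,
        random_state=random_state,
    )
    # fit_predict returns -1 for anomalies and 1 for normal rows.
    labels = model.fit_predict(X)
    return labels == -1
```

A typical call would be `mask = detect_isolation_forest(preprocess(pd.read_csv("data.csv")))`, where `data.csv` stands in for any input file. By default each tree is fit on a subsample of at most 256 rows (`max_samples="auto"`), which is one reason Isolation Forest copes well with hundreds of thousands of rows.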

Result after Task 6: A fully functional anomaly detection CLI tool that processes CSV files, detects outliers using Isolation Forest, and generates clear visualizations.
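For the visualization step, one plausible approach is to project the scaled features onto two principal components and write the scatter plot to a static image. The sketch below assumes a feature matrix with at least two columns and a boolean anomaly mask; the function name and styling are illustrative and not taken from the tool itself.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA


def save_anomaly_scatter(X: np.ndarray, is_anomaly: np.ndarray, out_path: str) -> str:
    """Project X to 2D with PCA and save a scatter plot highlighting anomalies."""
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    coords = PCA(n_components=2).fit_transform(X)

    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(coords[~is_anomaly, 0], coords[~is_anomaly, 1],
               s=8, c="tab:blue", label="normal")
    ax.scatter(coords[is_anomaly, 0], coords[is_anomaly, 1],
               s=20, c="tab:red", label="anomaly")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title("Detected anomalies (PCA projection)")
    ax.legend()

    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # release memory when many charts are generated
    return out_path
```

PCA is used here because it is fast on large inputs; t-SNE produces tighter clusters but is far slower, so it is usually run on a sample of the rows.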

Phase 2: Enhancement - Dual Algorithm Comparison (Tasks 7-9)

These three tasks added the second algorithm and comparison capabilities:

7. Implement One-Class SVM anomaly detection - Added second ML algorithm providing complementary detection approach with performance optimization for large-scale data. (acd5d8a)

8. Implement result comparison and aggregation - Built comparison logic with consensus detection and statistical measures of inter-algorithm agreement. (5682b45)

9. Documentation and README - Completed professional documentation with installation instructions, usage examples, algorithm guidance, and troubleshooting. (f8ad1ad)
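Task 7 targets large-scale data, where One-Class SVM training becomes expensive as the row count grows. One common way to keep it tractable is to fit on a random subsample and then score every row. The sketch below takes that approach as an assumption; it is not necessarily the optimization used in the commit above, and the function name and `max_train_rows` parameter are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def detect_one_class_svm(
    X: np.ndarray,
    nu: float = 0.05,
    max_train_rows: int = 10_000,
    random_state: int = 0,
) -> np.ndarray:
    """Return a boolean mask that is True for rows flagged as anomalies.

    X is assumed to be already scaled. The model is fit on at most
    max_train_rows randomly chosen rows, then applied to all rows.
    """
    rng = np.random.default_rng(random_state)
    if len(X) > max_train_rows:
        idx = rng.choice(len(X), size=max_train_rows, replace=False)
        train = X[idx]
    else:
        train = X

    # nu is an upper bound on the fraction of training rows treated as outliers.
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(train)
    # predict returns -1 for anomalies and 1 for normal rows.
    return model.predict(X) == -1
```

The trade-off is that a smaller training sample gives a coarser boundary around the normal data, so `max_train_rows` is worth tuning against the available time budget.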

Final Result: A tool that runs both algorithms, compares their results, identifies consensus anomalies (rows flagged by both), and reports metrics for how closely the two algorithms agree.
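As a rough illustration of what the comparison step can compute, the sketch below takes two boolean anomaly masks and derives consensus counts and two simple agreement measures. The function name and the choice of metrics (overall agreement rate and Jaccard overlap) are assumptions; the tool itself may report different statistics.

```python
import numpy as np


def compare_masks(mask_a, mask_b) -> dict:
    """Summarize how two anomaly masks (True = flagged) agree."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    both = a & b      # consensus anomalies: flagged by both algorithms
    either = a | b    # flagged by at least one algorithm
    return {
        "flagged_a": int(a.sum()),
        "flagged_b": int(b.sum()),
        "consensus": int(both.sum()),
        # Fraction of rows where both algorithms give the same label.
        "agreement_rate": float((a == b).mean()),
        # Overlap among flagged rows only; defined as 1.0 if nothing is flagged.
        "jaccard": float(both.sum() / either.sum()) if either.any() else 1.0,
    }
```

The agreement rate is dominated by the many rows both algorithms call normal, so the Jaccard overlap is usually the more informative number when anomalies are rare.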

The Artemis Planning Agent breaks down requirements into manageable tasks and guides you through building each component step by step, from initial template to complete, documented tool.