
Extract Code Snippets

Overview

Artemis can work with codebases of any size, from small projects to large enterprise applications. Extract a subset of your codebase to target Artemis on specific sections and manage the AI context effectively.

For smaller codebases, you can simply import and analyze the entire project. However, for larger codebases, you can use the filtering options to focus on specific areas of interest, such as:

  • Particular modules or components
  • Files matching certain patterns
  • Code sections with specific characteristics
  • Areas identified by performance profiling or 3rd party tools such as SonarQube or Coverity

This selective approach reduces processing time and resource usage, and focuses the analysis on the most important or problematic areas.

Select files for extraction

To start the extraction, click the Analyse button in the right corner of your codebase entry in Projects:

Analyse button

Clicking the Analyse button will take you to a page where you can choose from multiple extraction methods:

  1. Auto-extract code - The traditional method of manually selecting files and filtering content
  2. Tool-based code extraction - Extract code identified by third-party tools (e.g., SonarQube, VTune) that have reported issues
  3. Pull request review - Extract only the code differences between two branches to simulate a PR review
  4. Contributor code identification - Extract code authored by specific contributors within a specified date range

Select the extraction method that best suits your needs. The following sections detail the steps for the auto-extract code method, which is done in two sequential steps:

  1. File filtering - Select which files and folders to include in the analysis

  2. Content filtering - Further refine the selected files by analyzing their contents

NOTE: Both filters are combined, meaning Artemis will only include files which pass both filters.

Step 1: File Filtering: Select the code files you would like to analyse

Under File Filtering, you can choose code files to include or exclude. This can be done either by selecting the files / folders manually, or automatically matching with glob patterns.

Option 1: File filtering via File Explorer (manual selection).

File selection

Option 2: File filtering via Glob pattern (automatic selection). Alternatively, you can include or exclude a set of code files by file name, using Glob patterns.

Regex selection
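As a rough illustration of how include and exclude Glob patterns narrow a file list, here is a minimal Python sketch built on the standard-library fnmatch module. The paths, the patterns, and the select_files helper are hypothetical, and Artemis's own Glob dialect may differ (for example in how * treats path separators), so treat this as an approximation rather than a description of the product's matcher.

```python
from fnmatch import fnmatchcase

def select_files(paths, include, exclude=()):
    """Keep paths that match any include pattern and no exclude pattern."""
    selected = []
    for path in paths:
        included = any(fnmatchcase(path, pattern) for pattern in include)
        excluded = any(fnmatchcase(path, pattern) for pattern in exclude)
        if included and not excluded:
            selected.append(path)
    return selected

# Hypothetical project layout, for illustration only.
paths = [
    "src/core/primes.cpp",
    "src/core/primes.h",
    "src/utils/io.cpp",
    "tests/test_primes.cpp",
    "README.md",
]

# Note: in fnmatch, "*" also matches "/" characters, so "src/*.cpp"
# reaches into subfolders of src.
selected = select_files(paths, include=["src/*.cpp"], exclude=["src/utils/*"])
print(selected)
```

Trying your patterns on a short list of representative paths like this is a quick way to check that an exclude rule does not remove more than intended.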

Accepted file formats
Language | File Extensions
C++ | .cpp, .c, .h, .hpp, .cc, .hh, .cxx, .hxx, .c++, .h++, .cu, .cuh
C | .c, .h
Java | .java
Python | .py
Fortran | .f, .for, .f90, .f95, .f03, .f08, .F, .F90
JavaScript | .js, .jsx
TypeScript | .ts, .tsx
Ruby | .rb
PHP | .php
C# | .cs
Go | .go
Swift | .swift
Kotlin | .kt, .kts
Scala | .scala
Rust | .rs
Dart | .dart
R | .r, .R
Lua | .lua
Perl | .pl, .pm
SQL | .sql
Q | .q
COBOL | .cob, .cbl, .cpy
OCaml | .ml
Elixir | .ex
Text | .txt, .md, .html, .css, .scss, .vue, .xml, .json, .yaml, .yml, .ini, .log, .conf, .cfg, .tsv, .rst, .tex, .bat, .sh, .pl, .toml, .properties, .gradle, .maven, .cmd, .awk, .env, .helm, .tpl, .kubeconfig, .npmrc, .prettierrc, .eslintrc, .babelrc, .terraformrc, .tfvars, .tf, .editorconfig, .gitignore, .gitconfig, .zshrc, .bashrc, .profile, .flake8, .pylintrc, .coveragerc, .drl, .m, .jl, .vba, .bas, .cls, .frm
Unsupported | .csv
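Some extensions in the list above appear under more than one language (for example, .c and .h are listed for both C++ and C). The following Python sketch looks up every language that lists a given extension, using a small subset of the entries. The candidate_languages helper is hypothetical, and how Artemis itself resolves such overlaps is not described here.

```python
import os

# Subset of the accepted-formats list above, copied for illustration.
ACCEPTED = {
    "C++": {".cpp", ".c", ".h", ".hpp", ".cc", ".hh", ".cxx", ".hxx",
            ".c++", ".h++", ".cu", ".cuh"},
    "C": {".c", ".h"},
    "Python": {".py"},
    "Perl": {".pl", ".pm"},
}

def candidate_languages(filename):
    """Return every language in ACCEPTED that lists the file's extension."""
    extension = os.path.splitext(filename)[1]
    return sorted(lang for lang, exts in ACCEPTED.items() if extension in exts)

print(candidate_languages("primes.h"))
print(candidate_languages("report.csv"))
```

An empty result means the extension is not in the subset used here, which is also what you would see for the unsupported .csv format.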

Step 2: Content Filtering: Filter files by content and split big files

Under Content Filtering, you can further filter which files to include based on code languages.

Content filtering

You can further split big code files by class or function, where each part (function or class) will be extracted into its own code snippet. The options available are to extract by: (1) Function, (2) Class, or (3) File, where you can choose the entire code file.
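To make the three granularities concrete, here is a minimal Python sketch that splits a source string by function, by class, or keeps the whole file. It uses Python's standard ast module on Python source purely for illustration. The extract_snippets helper and the sample source are hypothetical, and Artemis's own extraction may behave differently (for instance in how it treats methods inside classes).

```python
import ast

SOURCE = '''\
def sum_primes(limit):
    return sum(n for n in range(2, limit) if all(n % d for d in range(2, n)))

class PrimeCache:
    def __init__(self):
        self.seen = {}

    def lookup(self, n):
        return self.seen.get(n)
'''

def extract_snippets(source, mode):
    """Return (name, text) pairs for mode 'function', 'class', or 'file'."""
    if mode == "file":
        return [("<file>", source)]
    wanted = {
        "function": (ast.FunctionDef, ast.AsyncFunctionDef),
        "class": (ast.ClassDef,),
    }[mode]
    snippets = []
    # ast.walk visits every node, so methods inside classes are
    # also returned in 'function' mode.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, wanted):
            snippets.append((node.name, ast.get_source_segment(source, node)))
    return snippets

for mode in ("function", "class", "file"):
    names = [name for name, _ in extract_snippets(SOURCE, mode)]
    print(mode, names)
```

Splitting by function produces more, smaller snippets; splitting by class or file keeps related code together at the cost of larger snippets.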

Click on the arrow button below Expand for content filters for further content filtering options.

Content Filtering: Further filtering options

Here is an overview of the further content filtering options:

Content filtering expand options

Further content filtering can be based on a regular expression, on size, or on a Tree-sitter structure.

Regex logic

Although the Advanced Query Options section contains only the condition, criteria, and value fields, the filter also takes into account the code language and object selected in the dropdowns at the top. For each of the three options below, we have included a complete example query.

See details of each option below:

(1) regular expression: where you can specify to filter content from files that either match or do not match a regular expression of your choosing.

Example: Consider the image below

Regex-based filtering

Here, the regex follows the logic:

Example query

Cpp Function matches regular expression \bSumPrimes\s*\(

This means that Artemis will scan the codebase for C++ functions whose code contains SumPrimes followed by an opening parenthesis, that is, a definition of or a call to the function SumPrimes.
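To see what the example pattern does and does not match, you can try it on a few sample strings before using it as a filter. The sketch below uses Python's standard re module; the sample strings are made up for illustration, and Artemis's regex engine may differ from Python's in edge cases.

```python
import re

# Same pattern as in the example query above.
PATTERN = re.compile(r"\bSumPrimes\s*\(")

# Hypothetical code fragments to test the pattern against.
candidates = {
    "definition": "long SumPrimes(int limit) { return 0; }",
    "call_with_space": "total = SumPrimes (100);",
    "longer_name": "long FastSumPrimes(int limit);",
    "no_parenthesis": "// SumPrimes is documented elsewhere",
}

results = {name: bool(PATTERN.search(text)) for name, text in candidates.items()}
for name, matched in results.items():
    print(f"{name}: {matched}")
```

The \b word boundary rejects longer identifiers such as FastSumPrimes, and \s*\( requires an opening parenthesis (optionally preceded by whitespace), so plain mentions of the name in comments are skipped.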

(2) size: where you can filter content based on the size of the selected object. The options available to filter are characters, words, or lines. Consider the example below:

Size-based filtering

Here, the filtering logic is:

Example query

Cpp Class has size >= 10 lines
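The three size units can be sketched as follows. This is a minimal Python illustration with hypothetical helper names (measure, passes_size_filter); how Artemis counts words or blank lines is an assumption here, not documented behaviour.

```python
def measure(snippet):
    """Return the size of a snippet in characters, words, and lines."""
    return {
        "characters": len(snippet),
        "words": len(snippet.split()),        # whitespace-separated tokens
        "lines": len(snippet.splitlines()),
    }

def passes_size_filter(snippet, unit, minimum):
    """True when the snippet's size in `unit` is >= `minimum`."""
    return measure(snippet)[unit] >= minimum

# Two made-up C++ classes: one of 2 lines, one of 12 lines.
small_class = "class Empty {\n};"
large_class = "\n".join(
    ["class Matrix {", "public:"]
    + [f"  int field_{i};" for i in range(9)]
    + ["};"]
)

print(passes_size_filter(small_class, "lines", 10))
print(passes_size_filter(large_class, "lines", 10))
```

With the example query above (size >= 10 lines), only the 12-line class would be kept.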

(3) treesitter structure: where you can use a Tree-sitter query to extract loop or nested_loop content from your codebase.

See the image in the example below:

Treesitter-based filtering

Here, the filtering logic is:

Example query

Cpp Function contains treesitter structure nested_loop
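The idea behind a nested_loop filter can be approximated in a few lines. The sketch below does not use Tree-sitter: it swaps in Python's standard ast module and analyses Python source, simply to show what "function contains a nested loop" means structurally. The helper names and the sample source are hypothetical.

```python
import ast

LOOPS = (ast.For, ast.AsyncFor, ast.While)

def max_loop_depth(node, depth=0):
    """Return the deepest loop nesting level found at or under `node`."""
    if isinstance(node, LOOPS):
        depth += 1
    deepest = depth
    for child in ast.iter_child_nodes(node):
        deepest = max(deepest, max_loop_depth(child, depth))
    return deepest

def functions_with_nested_loops(source):
    """Names of functions containing a loop inside another loop."""
    return [
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and max_loop_depth(node) >= 2
    ]

SOURCE = '''\
def flat(values):
    total = 0
    for v in values:
        total += v
    return total

def pairwise(values):
    pairs = []
    for a in values:
        for b in values:
            pairs.append((a, b))
    return pairs
'''

nested = functions_with_nested_loops(SOURCE)
print(nested)
```

Two loops written one after the other count as depth 1 each, so only functions with a loop inside another loop are reported. Comprehensions are not counted as loops in this sketch.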

For more information on Tree-sitter, please see the Tree-sitter library documentation.

Once you have provided your filtering criteria, click Continue to start extracting the snippets.

Content filtering: Select code snippets with an LLM query

You can also use an LLM query to select the parts of the codebase that you wish to analyse. To do this, turn on the LLM Filtering toggle, as shown in the image below:

LLM-based filtering

Optionally, specify the target count to control the number of snippets that the LLM will pick from your codebase. For instance, in the example above, the target count is 5, which means the LLM will pick the 5 most inefficient snippets of code.
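Conceptually, a target count acts like a top-N cut over ranked candidates. The Python sketch below illustrates only that idea: the snippet names, the scores, and the pick_top helper are invented for the example, and nothing here reflects how Artemis or the LLM actually ranks code.

```python
import heapq

def pick_top(scored_snippets, target_count):
    """Return the `target_count` entries with the highest score."""
    return heapq.nlargest(target_count, scored_snippets, key=lambda item: item[1])

# Hypothetical (snippet name, inefficiency score) pairs.
scored = [
    ("parse_config", 0.2),
    ("sum_primes", 0.9),
    ("render", 0.5),
    ("sort_rows", 0.7),
]

top_two = pick_top(scored, target_count=2)
print(top_two)
```

A smaller target count keeps the result set focused, while a larger one casts a wider net at the cost of more candidates to review.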

LLMs are not perfect

LLM-based filtering may lead to more false positives in code snippet selection, i.e., selection of code snippets that cannot in fact be optimised.

View code snippets

Upon completing the code analysis process, Artemis will extract the code snippets based on the criteria you have defined, and display them under the Snippets tab. See image below:

Snippets overview

Click on any of the snippets to get details of the snippet. You can also use the chat function to ask questions about the code snippet.

Snippet details

After analyzing your code and identifying snippets of interest, you can:

  • Evaluate their quality using Artemis's scoring functionality. See Code Scoring
  • (OPTIONAL) Index your code to enable better context-aware suggestions. See Code Indexing