Extract Code Snippets
Overview
Artemis can work with codebases of any size, from small projects to large enterprise applications. Extract a subset of your codebase to target Artemis on specific sections and manage the AI context effectively.
For smaller codebases, you can simply import and analyze the entire project. However, for larger codebases, you can use the filtering options to focus on specific areas of interest, such as:
- Particular modules or components
- Files matching certain patterns
- Code sections with specific characteristics
- Areas identified by performance profiling or third-party tools such as SonarQube or Coverity
This selective approach reduces processing time and resource usage, and focuses the analysis on the most important or problematic areas.
Select files for extraction
To start the extraction, click the Analyse button in the right corner of your codebase entry in Projects.
Clicking the Analyse button takes you to a page where you can choose from multiple extraction methods:
- Auto-extract code - The traditional method of manually selecting files and filtering content
- Tool-based code extraction - Extract code identified by third-party tools (e.g., SonarQube, VTune) that have reported issues
- Pull request review - Extract only the code differences between two branches to simulate a PR review
- Contributor code identification - Extract code authored by specific contributors within a specified date range
Select the extraction method that best suits your needs. The following sections detail the steps for the auto-extract code method, which is done in two sequential steps:
- File filtering - Select which files and folders to include in the analysis
- Content filtering - Further refine the selected files by analyzing their contents
NOTE: Both filters are combined, meaning Artemis will only include files which pass both filters.
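As a rough illustration of how the two filters combine, the sketch below keeps a file only when it passes both a file filter and a content filter. The function names and the concrete rules (C++ files under src/, at least three lines) are hypothetical and chosen only for the example; they are not Artemis settings.

```python
from pathlib import PurePosixPath


def passes_file_filter(path: str) -> bool:
    # Hypothetical file filter: keep only C++ sources located under src/.
    p = PurePosixPath(path)
    return p.parts[0] == "src" and p.suffix in {".cpp", ".hpp"}


def passes_content_filter(source: str) -> bool:
    # Hypothetical content filter: keep files with at least 3 lines.
    return len(source.splitlines()) >= 3


def select(files: dict[str, str]) -> list[str]:
    # A file is kept only if BOTH filters accept it (logical AND).
    return sorted(
        path
        for path, source in files.items()
        if passes_file_filter(path) and passes_content_filter(source)
    )


files = {
    "src/a.cpp": "int f() {\n  return 1;\n}\n",   # passes both filters
    "src/b.cpp": "int x;\n",                      # fails the content filter
    "docs/c.cpp": "int g() {\n  return 2;\n}\n",  # fails the file filter
}
selected = select(files)
```

Here only src/a.cpp is selected, because each of the other two files is rejected by one of the filters.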
Step 1: File Filtering: Select the code files you would like to analyse
Under File Filtering, you can choose code files to include or exclude. You can do this either by selecting files and folders manually, or by matching them automatically with glob patterns.
Option 1: File filtering via File Explorer (manual selection).
Option 2: File filtering via Glob pattern (automatic selection). Alternatively, you can include or exclude a set of code files by file name, using glob patterns.
Language | File Extensions |
---|---|
C++ | .cpp, .c, .h, .hpp, .cc, .hh, .cxx, .hxx, .c++, .h++, .cu, .cuh |
C | .c, .h |
Java | .java |
Python | .py |
Fortran | .f, .for, .f90, .f95, .f03, .f08, .F, .F90 |
JavaScript | .js, .jsx |
TypeScript | .ts, .tsx |
Ruby | .rb |
PHP | .php |
C# | .cs |
Go | .go |
Swift | .swift |
Kotlin | .kt, .kts |
Scala | .scala |
Rust | .rs |
Dart | .dart |
R | .r, .R |
Lua | .lua |
Perl | .pl, .pm |
SQL | .sql |
Q | .q |
COBOL | .cob, .cbl, .cpy |
OCaml | .ml |
Elixir | .ex |
Text | .txt, .md, .html, .css, .scss, .vue, .xml, .json, .yaml, .yml, .ini, .log, .conf, .cfg, .tsv, .rst, .tex, .bat, .sh, .pl, .toml, .properties, .gradle, .maven, .cmd, .awk, .env, .helm, .tpl, .kubeconfig, .npmrc, .prettierrc, .eslintrc, .babelrc, .terraformrc, .tfvars, .tf, .editorconfig, .gitignore, .gitconfig, .zshrc, .bashrc, .profile, .flake8, .pylintrc, .coveragerc, .drl, .m, .jl, .vba, .bas, .cls, .frm |
Unsupported | .csv |
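To get a feel for glob-based selection before applying it in Artemis, the sketch below uses Python's standard fnmatch module to include and exclude paths by pattern. The select_files helper and the sample paths are hypothetical, and the exact glob dialect Artemis uses may differ; for example, in fnmatch a * also matches directory separators.

```python
from fnmatch import fnmatchcase


def select_files(paths, include, exclude=()):
    """Keep paths matching any include pattern and no exclude pattern."""
    selected = []
    for path in paths:
        included = any(fnmatchcase(path, pattern) for pattern in include)
        excluded = any(fnmatchcase(path, pattern) for pattern in exclude)
        if included and not excluded:
            selected.append(path)
    return selected


# Hypothetical project layout, used only for illustration.
paths = [
    "src/core/solver.cpp",
    "src/core/solver.hpp",
    "tests/test_solver.cpp",
    "README.md",
]

# Include C++ sources and headers, but leave out everything under tests/.
cpp_sources = select_files(paths, include=["*.cpp", "*.hpp"], exclude=["tests/*"])
```

With these patterns, the two files under src/core are kept, while the test file and README.md are left out.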
Step 2: Content Filtering: Filter files by content and split big files
Under Content Filtering, you can further filter which files to include based on code languages.
You can also split large code files by class or function, so that each function or class is extracted into its own code snippet. The available extraction options are: (1) Function, (2) Class, or (3) File, which keeps the entire code file as a single snippet.
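The difference between the three extraction units can be sketched with Python's built-in ast module, applied here to a small Python source string. This is only an analogy for what splitting by function or class means; the extract_snippets helper is hypothetical and does not reflect how Artemis parses code internally.

```python
import ast

SOURCE = '''\
def add(a, b):
    return a + b

class Counter:
    def __init__(self):
        self.n = 0

    def bump(self):
        self.n += 1
'''


def extract_snippets(source: str, unit: str) -> list[str]:
    """Split source into snippets by "function", "class", or "file"."""
    if unit == "file":
        return [source]  # the whole file is a single snippet
    if unit == "class":
        wanted = (ast.ClassDef,)
    elif unit == "function":
        wanted = (ast.FunctionDef, ast.AsyncFunctionDef)
    else:
        raise ValueError(f"unknown unit: {unit}")
    tree = ast.parse(source)
    # Each matching node becomes its own snippet (methods count as functions).
    return [
        ast.get_source_segment(source, node)
        for node in ast.walk(tree)
        if isinstance(node, wanted)
    ]


function_snippets = extract_snippets(SOURCE, "function")
class_snippets = extract_snippets(SOURCE, "class")
file_snippets = extract_snippets(SOURCE, "file")
```

In this sketch the same file yields three snippets when split by function (add, __init__, bump), one snippet when split by class, and one snippet when extracted as a whole file.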
Click the arrow button below Expand for content filters for further content filtering options.
Content Filtering: Further filtering options
Here is an overview of the further content filtering options:
Further content filtering can be done using one of three criteria: regular expression, size, or treesitter structure.
Although the Advanced Query Options section contains only the condition, criteria, and value fields, the filtering logic also depends on the code language and object dropdowns at the top. Each of the three options below includes a complete example query.
See details of each option below:
(1) regular expression: filter content from files that either match or do not match a regular expression of your choosing.
Example: the filter reads: language Cpp, object Function, condition matches, criteria regular expression, value \bSumPrimes\s*\(
This means that Artemis will scan the codebase to find any code files containing the function SumPrimes.
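To check what the pattern from this example accepts before using it as a filter, you can try it against a few strings with Python's re module. The sample C++ lines below are invented for illustration, and Artemis's regex engine may differ from Python's in edge cases.

```python
import re

# The pattern from the example: the whole word SumPrimes, optional
# whitespace, then an opening parenthesis.
PATTERN = re.compile(r"\bSumPrimes\s*\(")

# Hypothetical C++ fragments used only to exercise the pattern.
samples = {
    "call": "long total = SumPrimes(limit);",
    "spaced_definition": "long SumPrimes (int n) { return 0; }",
    "different_name": "long MySumPrimes(int n);",
    "no_parenthesis": "// SumPrimes is defined elsewhere",
}

matches = {name: bool(PATTERN.search(code)) for name, code in samples.items()}
```

The first two samples match. MySumPrimes( does not, because \b requires a word boundary directly before SumPrimes, and the comment does not, because no opening parenthesis follows the name.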
(2) size: filter content based on the size of the selected object. Size can be measured in characters, words, or lines.
Example: the filter reads: language Cpp, object Class, condition has, criteria size, value >= 10 lines
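The three size units can be illustrated with a short Python sketch. The measure and passes_size_filter helpers are hypothetical, and the way Artemis counts words or lines (for example, how it treats blank lines) is an assumption here.

```python
def measure(snippet: str) -> dict[str, int]:
    """Return the size of a snippet in characters, words, and lines."""
    return {
        "characters": len(snippet),
        "words": len(snippet.split()),      # whitespace-separated tokens
        "lines": len(snippet.splitlines()),
    }


def passes_size_filter(snippet: str, unit: str, minimum: int) -> bool:
    """Mimic a 'has size >= minimum <unit>' condition."""
    return measure(snippet)[unit] >= minimum


# Hypothetical C++ classes: one with 3 lines, one with 12 lines.
small_class = "class P {\n  int x;\n};"
big_class = (
    "class Big {\n"
    + "".join(f"  int field{i};\n" for i in range(10))
    + "};"
)

keep_small = passes_size_filter(small_class, "lines", 10)
keep_big = passes_size_filter(big_class, "lines", 10)
```

With a threshold of 10 lines, only the larger class is kept, mirroring the example query above.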
(3) treesitter structure: use a Tree-sitter query to extract loop or nested_loop content from your codebase.
Example: the filter reads: language Cpp, object Function, condition contains, criteria treesitter structure, value nested_loop
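The idea behind a nested_loop structure filter can be sketched with a syntax tree walk. The example below uses Python's built-in ast module on Python source rather than Tree-sitter on C++, so it is an analogy only; the helper names are hypothetical and the actual Tree-sitter query Artemis runs is not shown in this guide.

```python
import ast

LOOP_NODES = (ast.For, ast.AsyncFor, ast.While)


def has_nested_loop(func: ast.AST) -> bool:
    """Return True if any loop in func contains another loop."""
    for node in ast.walk(func):
        if isinstance(node, LOOP_NODES):
            # Look for a second loop strictly inside this one.
            for inner in ast.walk(node):
                if inner is not node and isinstance(inner, LOOP_NODES):
                    return True
    return False


def functions_with_nested_loops(source: str) -> list[str]:
    """List the names of functions whose body contains a nested loop."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and has_nested_loop(node)
    ]


SOURCE = '''\
def flat(xs):
    for x in xs:
        print(x)

def grid(n):
    for i in range(n):
        for j in range(n):
            print(i, j)
'''

nested = functions_with_nested_loops(SOURCE)
```

Only grid is reported, since flat contains a single, non-nested loop.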
For more information on Tree-sitter, please see the Treesitter library documentation.
Once you have provided your filtering criteria, press Continue to start extracting the snippets.
Content filtering: Select code snippets with an LLM query
You can also use an LLM query to select the parts of the codebase that you wish to analyse. To do this, turn on the LLM Filtering toggle option.
Optionally, specify the target count to control the number of snippets that the LLM will pick from your codebase. For instance, with a target count of 5, the LLM will pick the 5 most inefficient snippets of code.
LLM-based filtering may lead to more false positives in code snippet selection, i.e., selection of code snippets that cannot in fact be optimised.
View code snippets
Once the code analysis process completes, Artemis extracts the code snippets that meet the criteria you have defined and displays them under the Snippets tab.
Click on any of the snippets to get details of the snippet. You can also use the chat function to ask questions about the code snippet.
After analyzing your code and identifying snippets of interest, you can:
- Evaluate their quality using Artemis's scoring functionality. See Code Scoring
- (OPTIONAL) Index your code to enable better context-aware suggestions. See Code Indexing