Fetch runs or experiments
Similar to displaying and filtering runs in the experiments table, you can fetch runs meeting certain criteria and choose which attributes to include as columns.
Before you start
Install neptune-fetcher:
pip install -U neptune-fetcher
Step 1: Initialize read-only project
To create a read-only project object to fetch from, use:
from neptune_fetcher import ReadOnlyProject
project = ReadOnlyProject()
If you haven't set your Neptune credentials as environment variables, you can pass the project name or API token as arguments:
project = ReadOnlyProject(
    project="team-alpha/project-x",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6...Y2MifQ==",  # your API token here
)
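The fallback between arguments and environment variables can be sketched with a small helper that passes only the values given explicitly and leaves the rest to the environment. This is a minimal sketch: `resolve_credentials` is a hypothetical helper, not part of neptune-fetcher, and the variable names `NEPTUNE_PROJECT` and `NEPTUNE_API_TOKEN` are assumed here.

```python
import os


def resolve_credentials(project=None, api_token=None, env=None):
    """Return keyword arguments for ReadOnlyProject (hypothetical helper).

    Values that are not passed explicitly are left out, on the assumption
    that they are read from NEPTUNE_PROJECT and NEPTUNE_API_TOKEN.
    """
    env = os.environ if env is None else env
    kwargs = {}

    if project is not None:
        kwargs["project"] = project
    elif "NEPTUNE_PROJECT" not in env:
        raise ValueError("Set NEPTUNE_PROJECT or pass the project name.")

    if api_token is not None:
        kwargs["api_token"] = api_token
    elif "NEPTUNE_API_TOKEN" not in env:
        raise ValueError("Set NEPTUNE_API_TOKEN or pass the API token.")

    return kwargs
```

The result could then be unpacked into the constructor, for example `ReadOnlyProject(**resolve_credentials(project="team-alpha/project-x"))`, which fails early with a clear message if a credential is missing.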
Step 2: Use fetching methods
Each fetching method has a variant for both experiments and runs:
- Fetching experiments: Only runs that represent current experiments are returned. When fetching experiments that have a history of forked or restarted runs, the historical runs are not included.
- Fetching runs: All runs, including those that no longer represent experiments, are returned.
What's the connection between a run and an experiment?
In the code, experiments are represented as runs. An experiment run has the experiment name stored in its sys/name attribute.
In the example below, a run is created as the head of the experiment gull-flying-skills:
from neptune_scale import Run
run = Run(
    experiment_name="gull-flying-skills",
    run_id="vigilant-kittiwake-1",
)
If a new run is created with the same experiment name, it becomes the new representative run for the experiment:
run = Run(
    experiment_name="gull-flying-skills",
    run_id="vigilant-kittiwake-2",
)
The vigilant-kittiwake-1 run is still accessible as part of the experiment history, but it's no longer considered an experiment.
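The difference between the two views can be illustrated with a simplified in-memory model. This sketch only assumes that the most recently created run with a given experiment name is the one representing the experiment. It is not how Neptune stores runs, it ignores forking details, and `current_experiment_heads` is a hypothetical helper.

```python
def current_experiment_heads(runs):
    """Map each experiment name to the run ID of its most recently created run.

    `runs` is a list of (experiment_name, run_id) pairs in creation order.
    """
    heads = {}
    for experiment_name, run_id in runs:
        # A later run with the same experiment name replaces the earlier head.
        heads[experiment_name] = run_id
    return heads


runs = [
    ("gull-flying-skills", "vigilant-kittiwake-1"),
    ("gull-flying-skills", "vigilant-kittiwake-2"),
]

# "Experiments" view: one representative run per experiment name.
heads = current_experiment_heads(runs)
# "Runs" view: every run, including those that are now only history.
all_run_ids = [run_id for _, run_id in runs]

print(heads)
print(all_run_ids)
```

In this model, fetching experiments corresponds to `heads`, while fetching runs corresponds to `all_run_ids`.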
Fetch metadata as data frame
To fetch experiment metadata as a pandas DataFrame, use fetch_experiments_df():
project = ReadOnlyProject()
all_experiments_df = project.fetch_experiments_df()
Filter by name or ID
You can use a regular expression to match experiment names:
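# Include only experiments whose name matches the pattern
specific_experiments_df = project.fetch_experiments_df(
    names_regex=r"astute-.+-135"
)
# Exclude experiments whose name matches the pattern
specific_experiments_df = project.fetch_experiments_df(
    names_exclude_regex=r"experiment-\d{2,4}"
)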
Neptune uses the RE2 regular expression library. For supported regex features and limitations, see the RE2 syntax guide.
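To try out a name pattern before sending it to Neptune, you could test it locally against a few sample names. The sketch below uses Python's `re` module, which is not RE2. The two patterns shown above only use syntax common to both, but features such as backreferences and lookarounds work in `re` and not in RE2. It also assumes that a pattern may match anywhere in the name. `filter_names` and the sample names are hypothetical.

```python
import re


def filter_names(names, include=None, exclude=None):
    """Keep names that match `include` (if given) and don't match `exclude` (if given)."""
    kept = []
    for name in names:
        # Assumption: the pattern may match anywhere in the name (search, not full match).
        if include is not None and not re.search(include, name):
            continue
        if exclude is not None and re.search(exclude, name):
            continue
        kept.append(name)
    return kept


names = ["astute-kittiwake-135", "astute-gull-136", "experiment-042", "experiment-7"]

included = filter_names(names, include=r"astute-.+-135")
not_excluded = filter_names(names, exclude=r"experiment-\d{2,4}")

print(included)
print(not_excluded)
```

Note that `experiment-7` survives the exclusion pattern, because `\d{2,4}` requires at least two digits.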
You can also fetch experiments by custom run ID:
# Fetch experiments with specific custom run IDs
specific_experiments_df = project.fetch_experiments_df(
    custom_ids=["astute-kittiwake-14", "bombastic-seagull-2", "regal-xeme-18"]
)
# Fetch experiments whose custom run ID matches the pattern
specific_experiments_df = project.fetch_experiments_df(
    custom_id_regex=r"[a-e]{2}_.+"
)
The custom ID refers to the identifier set with the run_id argument at experiment creation.
Filter by metadata value
To construct a custom filter, use the query argument and the Neptune Query Language:
experiments_df = project.fetch_experiments_df(
    query="(last(`accuracy`:floatSeries) > 0.88) AND (`f1`:float > 0.9)",
)
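If you assemble queries programmatically, a small string builder can help keep the backticks and parentheses balanced. The following is a minimal sketch that only reproduces the shape of the query shown above. `nql_condition` and `nql_and` are hypothetical helpers, not part of neptune-fetcher, and they cover numeric comparisons only (string values would need quoting).

```python
def nql_condition(attribute, attr_type, operator, value, aggregation=None):
    """Build one parenthesized condition, such as (`f1`:float > 0.9)."""
    field = f"`{attribute}`:{attr_type}"
    if aggregation is not None:
        # For series attributes, wrap the field in an aggregation such as last(...).
        field = f"{aggregation}({field})"
    return f"({field} {operator} {value})"


def nql_and(*conditions):
    """Join conditions with the AND operator."""
    return " AND ".join(conditions)


query = nql_and(
    nql_condition("accuracy", "floatSeries", ">", 0.88, aggregation="last"),
    nql_condition("f1", "float", ">", 0.9),
)
print(query)
```

The resulting string can be passed as the `query` argument.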
Limit columns
To limit the number of returned columns, you can:
- specify columns with the columns argument
- retrieve extra columns that match a regex pattern with the columns_regex argument
For example:
experiments_df = project.fetch_experiments_df(
    columns=["sys/modification_time", "scores/f1"],
    columns_regex=r"tree/.*",
)
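One way to picture how the two arguments interact is as a union: the explicitly listed columns plus any attribute whose name matches the pattern. The sketch below emulates that on a plain list of attribute names. `select_columns` and the attribute names are hypothetical, Python's `re` stands in for RE2, and the real method may return additional identifier columns.

```python
import re


def select_columns(available, columns=None, columns_regex=None):
    """Return the attributes that are listed explicitly or match the regex."""
    explicit = set(columns or [])
    pattern = re.compile(columns_regex) if columns_regex else None
    return [
        name
        for name in available
        if name in explicit or (pattern is not None and pattern.search(name))
    ]


available = [
    "sys/modification_time",
    "sys/creation_time",
    "scores/f1",
    "tree/depth",
    "tree/leaves",
]

selected = select_columns(
    available,
    columns=["sys/modification_time", "scores/f1"],
    columns_regex=r"tree/.*",
)
print(selected)
```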
Combine filters
If you combine multiple criteria, they're joined by the logical AND operator.
The example below returns experiments that meet the following criteria:
- The name matches the regular expression tree/.*
- The last logged accuracy value is higher than 0.9
- The logged learning_rate value is less than 0.01
Additionally, the returned data frame only includes the creation and modification times as columns.
experiments_df = project.fetch_experiments_df(
    names_regex=r"tree/.*",
    query=r'(last(`accuracy`:floatSeries) > 0.9) AND (`learning_rate`:float < 0.01)',
    columns=["sys/creation_time", "sys/modification_time"],
)
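The AND semantics can be checked against a few in-memory records. This is only a local emulation of the three criteria above, with made-up experiments and a hypothetical `meets_all_criteria` function. The actual filtering is done by Neptune.

```python
import re

# Made-up experiments: name, logged accuracy series, and learning rate.
experiments = [
    {"sys/name": "tree/oak", "accuracy": [0.85, 0.93], "learning_rate": 0.005},
    {"sys/name": "tree/pine", "accuracy": [0.95, 0.89], "learning_rate": 0.005},
    {"sys/name": "tree/elm", "accuracy": [0.91, 0.94], "learning_rate": 0.05},
    {"sys/name": "forest/oak", "accuracy": [0.92, 0.96], "learning_rate": 0.001},
]


def meets_all_criteria(experiment):
    """Return True only if every criterion holds (logical AND)."""
    return all(
        [
            re.search(r"tree/.*", experiment["sys/name"]) is not None,
            experiment["accuracy"][-1] > 0.9,  # last logged value
            experiment["learning_rate"] < 0.01,
        ]
    )


matching = [e["sys/name"] for e in experiments if meets_all_criteria(e)]
print(matching)
```

Each of the other three records fails exactly one criterion, so only `tree/oak` is kept.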
List project experiments or runs
To list the identifiers of all the experiments or runs of a project, use:
List experiment IDs as dicts:
project = ReadOnlyProject()
for experiment in project.list_experiments():
    print(experiment)
List run IDs as dicts:
project = ReadOnlyProject()
for run in project.list_runs():
    print(run)
The above methods return the identifiers as an iterator of dictionaries.
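Because the result is an iterator, it can be consumed only once. If you need the identifiers more than once, collect them into a list first. The sketch below uses a stub generator in place of `project.list_experiments()`. The dictionary keys and the ID values are assumptions for illustration, so check the keys of the dictionaries you actually receive.

```python
def list_experiments_stub():
    """Stand-in for project.list_experiments(); keys and values are assumed."""
    yield {
        "sys/id": "PROJ-2",
        "sys/custom_run_id": "vigilant-kittiwake-2",
        "sys/name": "gull-flying-skills",
    }
    yield {
        "sys/id": "PROJ-3",
        "sys/custom_run_id": "astute-kittiwake-14",
        "sys/name": "gull-diving-skills",
    }


experiments = list_experiments_stub()

# First pass: collect the custom run IDs.
custom_ids = [experiment["sys/custom_run_id"] for experiment in experiments]

# Second pass over the same iterator yields nothing, because it's exhausted.
leftover = list(experiments)

print(custom_ids)
print(leftover)
```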
To instead get the identifiers as a data frame, use:
List experiment IDs as data frame:
project = ReadOnlyProject()
df = project.fetch_experiments()
List run IDs as data frame:
project = ReadOnlyProject()
df = project.fetch_runs()
Fetch read-only experiments or runs
To download metadata from individual experiments or runs, fetch them as ReadOnlyRun objects.
For details, see Fetch metadata from a run or experiment.