A centralized repository for open-source MLRun functions, modules, and steps that can be used as reusable components in ML pipelines.
Before you begin, ensure you have the following installed:
- Python 3.11+ - Required (or let UV manage it for you)
- UV - Fast Python package manager (required)
- Git - For version control
- Make (optional) - For convenient command shortcuts
Note: This project uses UV as the baseline package manager. All dependency management and installation is done through UV.
- Clone the repository:
  git clone https://github.com/mlrun/functions.git
  cd functions
- Install dependencies:
  uv sync
Note: You might need to install extra dependencies required for your asset tests. If your asset has a requirements.txt file, install it using:
uv pip install -r path/to/your/asset/requirements.txt
The project includes a Makefile for convenient command shortcuts:
| Command | Description |
|---|---|
| `make help` | Show all available commands |
| `make sync` | Sync dependencies from lockfile |
| `make format` | Format code with Ruff |
| `make lint` | Run code linters with Ruff |
| `make test NAME=<name> [TYPE=functions]` | Run tests for a specific asset (functions/modules/steps) |
| `make cli ARGS="<args>"` | Run CLI with custom arguments |
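The NAME and TYPE arguments of make test identify an asset directory. The sketch below illustrates one plausible mapping, assuming assets live under <TYPE>/src/<NAME> as the directory layout in this README suggests, and that tests are ultimately run through the run-tests CLI command. The helper names are hypothetical, and the actual Makefile may be implemented differently.

```python
from pathlib import PurePosixPath

# Asset types named in this README; each one has its own top-level directory.
ASSET_TYPES = ("functions", "modules", "steps")


def asset_path(name: str, asset_type: str = "functions") -> PurePosixPath:
    """Return the assumed directory of an asset: <asset_type>/src/<name>."""
    if asset_type not in ASSET_TYPES:
        raise ValueError(f"unknown asset type: {asset_type!r}")
    return PurePosixPath(asset_type) / "src" / name


def run_tests_command(name: str, asset_type: str = "functions") -> list[str]:
    """Build a run-tests CLI invocation in the form documented in this README."""
    return [
        "python", "-m", "cli.cli", "run-tests",
        "-r", str(asset_path(name, asset_type)),
        "-s", "py",
        "-fn", name,
    ]


# Hypothetical usage: the command assumed to sit behind `make test NAME=aggregate`.
print(" ".join(run_tests_command("aggregate")))
```

Rejecting unknown asset types early keeps a typo in TYPE from silently pointing at a directory that does not exist.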
# Sync dependencies
make sync
# Format code
make format
# Run tests for a function (default type)
make test NAME=aggregate
# Run tests for a module
make test NAME=count_events TYPE=modules
# Run tests for a step
make test NAME=mystep TYPE=steps
# Run CLI command
make cli ARGS="generate-item-yaml function my_function"
The project includes a CLI tool for managing MLRun functions, modules, and steps.
python -m cli.cli [COMMAND] [OPTIONS]
Generate an item.yaml file from a Jinja2 template.
Syntax:
python -m cli.cli generate-item-yaml TYPE NAME
Examples:
# Generate item.yaml for a function
python -m cli.cli generate-item-yaml function aggregate
# Generate item.yaml for a module
python -m cli.cli generate-item-yaml module my_module
# Generate item.yaml for a step
python -m cli.cli generate-item-yaml step my_step
Create a function.yaml file from an item.yaml file.
Syntax:
python -m cli.cli item-to-function --item-path PATH
Example:
python -m cli.cli item-to-function --item-path functions/src/aggregate
Create an item.yaml file from a function.yaml file.
Syntax:
python -m cli.cli function-to-item --path PATH
Example:
python -m cli.cli function-to-item --path functions/src/aggregate
Run the test suite for a specific asset.
Syntax:
python -m cli.cli run-tests -r PATH -s TYPE -fn NAME
Example:
python -m cli.cli run-tests -r functions/src/aggregate -s py -fn aggregate
We welcome contributions! You can contribute the following types of assets:
- Functions - MLRun runtime functions (job or serving)
- Modules - Generic modules or model monitoring applications
- Steps - MLRun steps to be used in serving graphs
Follow these steps to contribute:
- Fork the repository on GitHub
- Create a new branch for your asset:
git checkout -b feature/my-new-item
- Add your asset under the appropriate directory:
  - Functions: functions/src/
  - Modules: modules/src/
  - Steps: steps/src/
- Follow the asset structure (see below)
- Test your asset thoroughly
- Open a pull request to the development branch
functions/src/your_function_name/
├── item.yaml # Metadata (required)
├── function.yaml # MLRun function definition (required)
├── your_function_name.py # Main code file (required)
├── your_function_name.ipynb # Demo notebook (required)
├── test_your_function_name.py # Unit tests (required)
└── requirements.txt # Test dependencies (optional)
Steps to create a function:
- Generate the item.yaml template:
  python -m cli.cli generate-item-yaml function your_function_name
- Fill in the item.yaml with:
  - kind: job or serving
  - categories: Browse MLRun Hub for existing categories
  - version, description, and other metadata
- Generate the function.yaml:
  python -m cli.cli item-to-function --item-path functions/src/your_function_name
- Implement your function in your_function_name.py:
  - Keep code well-documented (docstrings are used in the hub UI)
  - Specify function dependencies in item.yaml, not in requirements.txt
- Create a demo notebook (your_function_name.ipynb):
  - Must run end-to-end automatically
  - Demonstrate the function's usage
- Write unit tests (test_your_function_name.py):
  - Cover functionality as much as possible
  - Tests run automatically on each change
modules/src/your_module_name/
├── item.yaml # Metadata (required)
├── your_module_name.py # Main code file (required)
├── your_module_name.ipynb # Demo notebook (required)
├── test_your_module_name.py # Unit tests (required)
└── requirements.txt # Test dependencies (optional)
Steps to create a module:
- Generate the item.yaml:
  python -m cli.cli generate-item-yaml module your_module_name
- Fill in the item.yaml with:
  - kind: generic or monitoring_application
  - categories and other metadata
  - version, description, etc.
- Implement, document, and test your module
For model monitoring modules, see the MLRun model monitoring guidelines.
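The kind values listed for functions and modules above can be checked before any further files are generated. This is a minimal sketch, assuming the item.yaml content has already been loaded into a dictionary (for example with a YAML parser). validate_kind is a hypothetical helper, not part of the repository's CLI, and the allowed values are taken from this README only.

```python
# Allowed `kind` values per asset type, as listed in this README.
# Steps are omitted because the README does not list kinds for them.
ALLOWED_KINDS = {
    "function": {"job", "serving"},
    "module": {"generic", "monitoring_application"},
}


def validate_kind(asset_type: str, item: dict) -> None:
    """Raise ValueError if item['kind'] is missing or not allowed for asset_type."""
    allowed = ALLOWED_KINDS.get(asset_type)
    if allowed is None:
        raise ValueError(f"no known kinds for asset type {asset_type!r}")
    kind = item.get("kind")
    if kind not in allowed:
        raise ValueError(
            f"kind {kind!r} is not valid for a {asset_type}; "
            f"expected one of {sorted(allowed)}"
        )


# Hypothetical metadata, standing in for a parsed item.yaml.
validate_kind("function", {"kind": "job", "version": "1.0.0"})
validate_kind("module", {"kind": "monitoring_application"})
```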
- Asset follows the proper directory structure
- item.yaml is complete and accurate
- Code is well-documented with docstrings
- Demo notebook runs end-to-end without errors
- Unit tests cover the functionality
- Code is formatted with Ruff (make format)
- All tests pass locally
- PR targets the development branch
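The directory-structure item in this checklist can be verified mechanically. The sketch below is a hypothetical helper rather than part of the repository's tooling. It reports which required files from the function and module layouts above are missing from an asset directory. The layout for steps is not spelled out in this README, so it is left out.

```python
from pathlib import Path

# Required file name patterns per asset type, taken from the layouts in this
# README; "{name}" stands for the asset's directory name.
REQUIRED_FILES = {
    "functions": ["item.yaml", "function.yaml", "{name}.py", "{name}.ipynb", "test_{name}.py"],
    "modules": ["item.yaml", "{name}.py", "{name}.ipynb", "test_{name}.py"],
}


def missing_files(asset_dir: Path, asset_type: str) -> list[str]:
    """Return the required files that do not exist in asset_dir."""
    name = asset_dir.name
    expected = [pattern.format(name=name) for pattern in REQUIRED_FILES[asset_type]]
    return [filename for filename in expected if not (asset_dir / filename).is_file()]


if __name__ == "__main__":
    # Example: check an asset from the project root.
    for filename in missing_files(Path("functions/src/aggregate"), "functions"):
        print(f"missing: {filename}")
```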
Test a specific asset:
# Test a function (default)
make test NAME=aggregate
# Test a module
make test NAME=mymodule TYPE=modules
# Test a step
make test NAME=mystep TYPE=steps
# Or use the CLI directly
python -m cli.cli run-tests -r functions/src/aggregate -s py -fn aggregate
Run tests manually with pytest:
cd functions/src/aggregate
pytest test_aggregate.py -v
- Place tests in test_<asset_name>.py
- Use pytest as the testing framework
- Mock external dependencies when necessary
- Test edge cases and error conditions
- Ensure tests are reproducible
Note: Tests will be run automatically on each change in the CI pipeline
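The guideline on mocking external dependencies can be illustrated with the standard library's unittest.mock. In this sketch, fetch_row_count is a hypothetical stand-in for asset code that downloads data, and the URL is a placeholder. Only the mocking pattern is the point.

```python
import urllib.request
from unittest import mock


def fetch_row_count(url: str) -> int:
    """Hypothetical asset code: count the non-empty lines of a downloaded CSV body."""
    with urllib.request.urlopen(url) as response:
        body = response.read().decode("utf-8")
    return sum(1 for line in body.splitlines() if line.strip())


def test_fetch_row_count_with_mocked_download():
    # Canned response so the test needs no network access and is reproducible.
    fake_response = mock.MagicMock()
    fake_response.__enter__.return_value.read.return_value = b"a,b\n1,2\n\n3,4\n"

    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
        assert fetch_row_count("https://example.com/data.csv") == 3

    # The external call was made exactly once, with the expected URL.
    fake_urlopen.assert_called_once_with("https://example.com/data.csv")
```

Because the test function follows the test_ naming convention, pytest collects it without any extra configuration.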
Example test structure:
import pytest
from your_function_name import your_function


def test_basic_functionality():
    result = your_function(param1="value1")
    assert result is not None
    assert result.status == "success"


def test_error_handling():
    with pytest.raises(ValueError):
        your_function(invalid_param="bad_value")

We follow PEP 8 style guidelines with the following configuration:
- Line length: 120 characters
- Imports: Automatically sorted and organized
- Type hints: Encouraged for function signatures
- Formatter: Ruff is used for formatting and linting
Ruff - Fast Python formatter and linter:
Format code automatically:
make format
# or
uv run ruff format .
uv run ruff check --fix .
Check formatting without changes:
make lint
# or
uv run ruff format --check .
uv run ruff check .
- Docstrings are mandatory for all public hub items
- Use clear, concise descriptions
- Include parameter types and return types
- Provide usage examples when helpful
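The mandatory-docstring rule can be checked automatically. This is a minimal sketch using the standard library's ast module. undocumented_public_functions is a hypothetical helper, and treating names without a leading underscore as public is an assumption, not a rule stated by the project.

```python
import ast


def undocumented_public_functions(source: str) -> list[str]:
    """Return names of top-level public functions and classes that lack a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing


# Hypothetical module source used only to demonstrate the check.
example = '''
def train(context):
    """Train a model."""

def evaluate(context):
    pass

def _helper():
    pass
'''
print(undocumented_public_functions(example))  # prints ['evaluate']
```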
Example (function auto_trainer):
def train(
    context: MLClientCtx,
    dataset: DataItem,
    model_class: str,
    label_columns: Optional[Union[str, List[str]]] = None,
    drop_columns: List[str] = None,
    model_name: str = "model",
    tag: str = "",
    sample_set: DataItem = None,
    test_set: DataItem = None,
    train_test_split_size: float = None,
    random_state: int = None,
    labels: dict = None,
    **kwargs,
):
    """
    Training a model with the given dataset.

    example::

        import mlrun
        project = mlrun.get_or_create_project("my-project")
        project.set_function("hub://auto_trainer", "train")
        trainer_run = project.run(
            name="train",
            handler="train",
            inputs={"dataset": "./path/to/dataset.csv"},
            params={
                "model_class": "sklearn.linear_model.LogisticRegression",
                "label_columns": "label",
                "drop_columns": "id",
                "model_name": "my-model",
                "tag": "v1.0.0",
                "sample_set": "./path/to/sample_set.csv",
                "test_set": "./path/to/test_set.csv",
                "CLASS_solver": "liblinear",
            },
        )

    :param context: MLRun context
    :param dataset: The dataset to train the model on. Can be either a URI or a FeatureVector
    :param model_class: The class of the model, e.g. `sklearn.linear_model.LogisticRegression`
    :param label_columns: The target label(s) of the column(s) in the dataset. for Regression or
                          Classification tasks. Mandatory when dataset is not a FeatureVector.
    :param drop_columns: str or a list of strings that represent the columns to drop
    :param model_name: The model's name to use for storing the model artifact, default to 'model'
    :param tag: The model's tag to log with
    :param sample_set: A sample set of inputs for the model for logging its stats along the model in favour
                       of model monitoring. Can be either a URI or a FeatureVector
    :param test_set: The test set to train the model with.
    :param train_test_split_size: if test_set was provided then this argument is ignored.
                                  Should be between 0.0 and 1.0 and represent the proportion of the dataset to include
                                  in the test split. The size of the Training set is set to the complement of this
                                  value. Default = 0.2
    :param random_state: Relevant only when using train_test_split_size.
                         A random state seed to shuffle the data. For more information, see:
                         https://scikit-learn.org/stable/glossary.html#term-random_state
                         Notice that here we only pass integer values.
    :param labels: Labels to log with the model
    :param kwargs: Here you can pass keyword arguments with prefixes,
                   that will be parsed and passed to the relevant function, by the following prefixes:
                   - `CLASS_` - for the model class arguments
                   - `FIT_` - for the `fit` function arguments
                   - `TRAIN_` - for the `train` function (in xgb or lgbm train function - future)
    """
    # Implementation here

Problem: python -m cli.cli fails
Solution:
# Check if you're in the right directory
pwd # Should be the project root
# Ensure dependencies are installed
make sync
Problem: Tests fail when running locally
Solution:
# Install test dependencies if the item has a requirements.txt
cd functions/src/your_function
uv pip install -r requirements.txt
# Run tests with verbose output
pytest test_your_function.py -v -s
# Check for missing environment variables or configuration
If you encounter issues:
- Check the MLRun documentation
- Search GitHub issues
- Open a new issue with:
- Error message
- Steps to reproduce
- Environment details (OS, Python version, UV version)
- MLRun Hub UI: https://www.mlrun.org/hub/
- MLRun Documentation: https://docs.mlrun.org/
- MLRun Marketplace Repository: http://github.com/mlrun/marketplace
- MLRun Community: https://github.com/mlrun/mlrun
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
For questions and support:
- Open an issue on GitHub
- Join the MLRun community discussions
- Check the MLRun documentation
Happy Contributing! 🚀