MLflow — ML Experiment Tracking & Model Registry
MLflow tracks your machine learning experiments — hyperparameters, metrics, model artifacts — and provides a model registry where you promote models from experimentation to production. It gives your ML work the same discipline as software engineering.
What MLflow Tracks
Each ML experiment run records:
✔ Hyperparameters (learning rate, batch size, epochs...)
✔ Metrics (accuracy, loss, F1 score per epoch)
✔ Artifacts (model weights, plots, confusion matrices)
✔ Environment (Python version, dependencies)
✔ Code version (git commit hash)
Architecture on Your Cluster
ML Training Job (pod or local machine)
│ logs metrics + artifacts
▼
MLflow Tracking Server (k3s pod)
│ stores
├── Metadata → PostgreSQL
└── Artifacts → MinIO (S3-compatible)
│
▼
MLflow UI (browser)
→ Compare runs, promote models to registry
Deploy MLflow
```bash
kubectl create namespace mlflow
```

Save the manifest below as `mlflow.yaml`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow
  namespace: mlflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow
  template:
    metadata:
      labels:
        app: mlflow
    spec:
      containers:
      - name: mlflow
        # Note: the stock image may not include the PostgreSQL and S3 drivers
        # (psycopg2, boto3) — you may need a derived image that pip-installs them.
        image: ghcr.io/mlflow/mlflow:latest
        command:
        - mlflow
        - server
        - --host=0.0.0.0
        - --port=5000
        - --backend-store-uri=postgresql://mlflow:password@postgres-svc/mlflow
        - --default-artifact-root=s3://mlflow-artifacts/
        env:
        - name: MLFLOW_S3_ENDPOINT_URL
          value: http://minio.minio.svc:9000
        - name: AWS_ACCESS_KEY_ID
          value: minioadmin
        - name: AWS_SECRET_ACCESS_KEY
          value: minioadmin
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow
  namespace: mlflow
spec:
  type: LoadBalancer
  selector:
    app: mlflow
  ports:
  - port: 5000
    targetPort: 5000
```
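The manifest hardcodes the MinIO credentials, which is fine for a throwaway lab but not much beyond it. A sketch of moving them into a Secret (names illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mlflow-s3-creds
  namespace: mlflow
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minioadmin
  AWS_SECRET_ACCESS_KEY: minioadmin
```

The Deployment would then drop the two literal `AWS_*` env vars and instead reference the Secret with `envFrom: [{secretRef: {name: mlflow-s3-creds}}]`.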
```bash
kubectl apply -f mlflow.yaml
```
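The tracking server does not create the artifact bucket for you. One in-cluster way to create it, assuming MinIO at minio.minio.svc with the credentials above (Job name and `mc` image are illustrative), is a one-shot Job:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: create-mlflow-bucket
  namespace: mlflow
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: mc
        image: minio/mc
        command: ["/bin/sh", "-c"]
        args:
        - >
          mc alias set minio http://minio.minio.svc:9000 minioadmin minioadmin &&
          mc mb --ignore-existing minio/mlflow-artifacts
```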
Log an Experiment (Python)
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# With --default-artifact-root, the client uploads artifacts straight to MinIO,
# so it also needs MLFLOW_S3_ENDPOINT_URL and the AWS_* credentials in its environment.
mlflow.set_tracking_uri("http://10.0.0.202:5000")
mlflow.set_experiment("iris-classification")

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

with mlflow.start_run():
    n_estimators = 100
    mlflow.log_param("n_estimators", n_estimators)

    model = RandomForestClassifier(n_estimators=n_estimators)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "random-forest-model")
    print(f"Accuracy: {accuracy:.3f}")
```
Model Registry Workflow
1. Train multiple runs with different hyperparameters
2. Compare in MLflow UI → pick best run
3. Register model: "iris-classifier" → version 1
4. Transition: Staging → Production
5. Serving layer loads model from registry
In MLflow UI:
- Models tab → register from any run
- Set stage:
None → Staging → Production → Archived
Serve a Model
```bash
# models:/ URIs are resolved through the tracking server
export MLFLOW_TRACKING_URI=http://10.0.0.202:5000

mlflow models serve \
  -m "models:/iris-classifier/Production" \
  --host 0.0.0.0 \
  --port 8888
```
REST API:

```bash
curl http://10.0.0.202:8888/invocations \
  -H "Content-Type: application/json" \
  -d '{"dataframe_split": {"columns": ["sl","sw","pl","pw"], "data": [[5.1,3.5,1.4,0.2]]}}'
```
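The `dataframe_split` body is plain JSON; a small stdlib sketch for building it programmatically (column names match the curl example above):

```python
import json

def invocations_payload(columns, rows):
    """Build the request body for MLflow's /invocations endpoint."""
    return json.dumps({"dataframe_split": {"columns": columns, "data": rows}})

body = invocations_payload(["sl", "sw", "pl", "pw"], [[5.1, 3.5, 1.4, 0.2]])
```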
Done When
✔ MLflow tracking server Running
✔ MinIO bucket mlflow-artifacts created
✔ First experiment logged via Python client
✔ Model registered in model registry
✔ UI accessible at MetalLB IP