MLflow — ML Experiment Tracking & Model Registry
MLflow tracks your machine learning experiments — hyperparameters, metrics, model artifacts — and provides a model registry where you promote models from experimentation to production. It gives your ML work the same discipline as software engineering.
What MLflow Tracks
Each ML experiment run records:
✔ Hyperparameters (learning rate, batch size, epochs...)
✔ Metrics (accuracy, loss, F1 score per epoch)
✔ Artifacts (model weights, plots, confusion matrices)
✔ Environment (Python version, dependencies)
✔ Code version (git commit hash)
Architecture on Your Cluster
ML Training Job (pod or local machine)
│ logs metrics + artifacts
▼
MLflow Tracking Server (k3s pod)
│ stores
├── Metadata → PostgreSQL
└── Artifacts → MinIO (S3-compatible)
│
▼
MLflow UI (browser)
→ Compare runs, promote models to registry
Deploy MLflow
```bash
kubectl create namespace mlflow
```

Save the manifest below as `mlflow.yaml`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow
  namespace: mlflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow
  template:
    metadata:
      labels:
        app: mlflow
    spec:
      containers:
      - name: mlflow
        # Note: the stock image may not include the PostgreSQL and S3 drivers
        # (psycopg2, boto3) — you may need a derived image that pip-installs them.
        image: ghcr.io/mlflow/mlflow:latest
        command:
        - mlflow
        - server
        - --host=0.0.0.0
        - --port=5000
        - --backend-store-uri=postgresql://mlflow:password@postgres-svc/mlflow
        - --default-artifact-root=s3://mlflow-artifacts/
        env:
        - name: MLFLOW_S3_ENDPOINT_URL
          value: http://minio.minio.svc:9000
        - name: AWS_ACCESS_KEY_ID
          value: minioadmin
        - name: AWS_SECRET_ACCESS_KEY
          value: minioadmin
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow
  namespace: mlflow
spec:
  type: LoadBalancer
  selector:
    app: mlflow
  ports:
  - port: 5000
    targetPort: 5000
```
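The manifest hardcodes the MinIO credentials, which is fine for a throwaway lab but not much beyond it. A sketch of moving them into a Secret (names illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mlflow-s3-creds
  namespace: mlflow
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minioadmin
  AWS_SECRET_ACCESS_KEY: minioadmin
```

The Deployment would then drop the two literal `AWS_*` env vars and instead reference the Secret with `envFrom: [{secretRef: {name: mlflow-s3-creds}}]`.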
```bash
kubectl apply -f mlflow.yaml
```
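The tracking server does not create the artifact bucket for you. One in-cluster way to create it, assuming MinIO at minio.minio.svc with the credentials above (Job name and `mc` image are illustrative), is a one-shot Job:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: create-mlflow-bucket
  namespace: mlflow
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: mc
        image: minio/mc
        command: ["/bin/sh", "-c"]
        args:
        - >
          mc alias set minio http://minio.minio.svc:9000 minioadmin minioadmin &&
          mc mb --ignore-existing minio/mlflow-artifacts
```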
Log an Experiment (Python)
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# With --default-artifact-root, the client uploads artifacts straight to MinIO,
# so it also needs MLFLOW_S3_ENDPOINT_URL and the AWS_* credentials in its environment.
mlflow.set_tracking_uri("http://10.0.0.202:5000")
mlflow.set_experiment("iris-classification")

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

with mlflow.start_run():
    n_estimators = 100
    mlflow.log_param("n_estimators", n_estimators)

    model = RandomForestClassifier(n_estimators=n_estimators)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "random-forest-model")
    print(f"Accuracy: {accuracy:.3f}")
```
Model Registry Workflow
1. Train multiple runs with different hyperparameters
2. Compare in MLflow UI → pick best run
3. Register model: "iris-classifier" → version 1
4. Transition: Staging → Production
5. Serving layer loads model from registry
In MLflow UI:
- Models tab → register from any run
- Set stage:
None → Staging → Production → Archived
Serve a Model
```bash
# models:/ URIs are resolved through the tracking server
export MLFLOW_TRACKING_URI=http://10.0.0.202:5000

mlflow models serve \
  -m "models:/iris-classifier/Production" \
  --host 0.0.0.0 \
  --port 8888
```
REST API:

```bash
curl http://10.0.0.202:8888/invocations \
  -H "Content-Type: application/json" \
  -d '{"dataframe_split": {"columns": ["sl","sw","pl","pw"], "data": [[5.1,3.5,1.4,0.2]]}}'
```
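The `dataframe_split` body is plain JSON; a small stdlib sketch for building it programmatically (column names match the curl example above):

```python
import json

def invocations_payload(columns, rows):
    """Build the request body for MLflow's /invocations endpoint."""
    return json.dumps({"dataframe_split": {"columns": columns, "data": rows}})

body = invocations_payload(["sl", "sw", "pl", "pw"], [[5.1, 3.5, 1.4, 0.2]])
```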
Done When
✔ MLflow tracking server Running
✔ MinIO bucket mlflow-artifacts created
✔ First experiment logged via Python client
✔ Model registered in model registry
✔ UI accessible at MetalLB IP