6 changes: 3 additions & 3 deletions .vscode/launch.json
@@ -18,13 +18,13 @@
             // e.g "args": ["run", "--pipeline", "pipeline_name"]
         },
         {
-            "name": "Kedro Run Hyperparam",
+            "name": "Python: PyTEST",
             "type": "python",
             "request": "launch",
             "console": "integratedTerminal",
-            "module": "kedro",
+            "module": "pytest",
             "args": [
-                "run", "--pipeline", "optuna_pipeline"
+                "-sv"
             ],
             "justMyCode": false
             // Any other arguments should be passed as a comma-seperated-list
14 changes: 14 additions & 0 deletions README.md
@@ -60,6 +60,20 @@ The following parameters can be adjusted:
 - Number of classes
 - Number of qubits
 
+### Tests
+- run all tests:
+pytest
+
+- test data_processing:
+pytest -v src/tests/pipelines/data_processing/test_pipeline.py::TestDataPreparation
+
+- test training:
+pytest -v src/tests/pipelines/data_science/test_pipeline.py::TestTraining
+
+- test implementation of all optimizers:
+pytest -v src/tests/pipelines/data_science/test_pipeline.py::test_optimizer
+
+
 ## Literature :books:
 
 [1]: [Hybrid Quantum Classical Graph Neural Networks for Particle Track Reconstruction](https://arxiv.org/abs/2109.12636)\
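Note: the selections above can also be invoked programmatically via pytest's public API (a minimal sketch, not part of the diff):

```python
import pytest

# Run one of the new test classes with verbose output; pytest.main
# returns the usual exit code (0 means every selected test passed).
exit_code = pytest.main(
    ["-v", "src/tests/pipelines/data_processing/test_pipeline.py::TestDataPreparation"]
)
```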
6 changes: 3 additions & 3 deletions conf/base/catalog.yml
Reviewer comment: We can probably delete this, right?
@@ -9,9 +9,9 @@ data_science.trained_model:
   model: model
   load_args: {"n_qubits": "${n_qubits}", "n_layers": "${n_layers}", "classes": "${classes}", "data_reupload": "${data_reupload}","quant_status": "${quant_status}", "n_shots" : "${n_shots}"}
 
-# data_science.metrics:
-#   type: kedro_mlflow.io.metrics.MlflowMetricsDataSet
-#   prefix: metrics
+#data_science.metrics:
+#  type: kedro_mlflow.io.metrics.MlflowMetricsDataSet
+#  prefix: metrics
 
 data_science.metrics_fig:
   type: kedro_mlflow.io.artifacts.MlflowArtifactDataSet
6 changes: 3 additions & 3 deletions conf/base/globals.yml
Reviewer comment: This looks more like a test configuration to me. It should stay local rather than become part of the commit.
@@ -1,10 +1,10 @@
 TRAINING_SIZE: 500
 TEST_SIZE: 250
 save: True
-classes: [1,2,3,4,5]
-n_qubits: 8
+classes: [1,2]
+n_qubits: 4
 n_qubits_range_quant: [3, 6, "linear"] #make sure the lowest number is higher than n_classes
-n_layers: 5
+n_layers: 2
 n_layers_range_quant: [1, 2, "linear"]
 n_shots: 0
 data_reupload: 1.0
2 changes: 1 addition & 1 deletion conf/base/logging.yml
Reviewer comment: Same here, feel free to simply delete it, since it is only a local configuration.
@@ -35,7 +35,7 @@ loggers:
         level: INFO
 
     split_optimizer:
-        level: INFO
+        level: DEBUG
 
 root:
     handlers: [rich, info_file_handler]
4 changes: 2 additions & 2 deletions conf/base/parameters/data_processing.yml
Reviewer comment: See above.
@@ -1,9 +1,9 @@
 data_processing:
-  batch_size: 10
+  batch_size: 100
   # batch_size_range: [1, 2, 4, 10, 100]
   TRAINING_SIZE: ${TRAINING_SIZE}
   TEST_SIZE: ${TEST_SIZE}
 
   classes: ${classes}
 
   torch_seed: ${torch_seed}
2 changes: 1 addition & 1 deletion conf/base/parameters/data_science.yml
Reviewer comment: See above.
@@ -7,7 +7,7 @@ data_science:
   n_shots: ${n_shots}
   data_reupload: ${data_reupload}
   data_reupload_range_quant: ${data_reupload_range_quant}
-  epochs: 10
+  epochs: 3
   optimizer:
     # combined:
     #   name: Adam
5 changes: 1 addition & 4 deletions pyproject.toml
Reviewer comment: I'm not quite sure, but don't we need this part?
@@ -6,10 +6,7 @@ kedro_init_version = "0.18.9"
 [tool.isort]
 profile = "black"
 
-[tool.pytest.ini_options]
-addopts = """
---cov-report term-missing \
---cov src/split_optimizer -ra"""
+
 
 [tool.coverage.report]
 fail_under = 0
3 changes: 2 additions & 1 deletion src/requirements.in
@@ -8,4 +8,5 @@ mlflow
 pennylane
 torchmetrics
 optuna
-optuna_dashboard
+optuna_dashboard
+pytest
4 changes: 2 additions & 2 deletions src/split_optimizer/helpers/dataset.py
@@ -11,11 +11,11 @@
 from split_optimizer.pipelines.data_science.hybrid_model import Model
 import torchvision
 from PIL import Image
-from typing import Any, Callable, Dict, List, Optional, Tuple
+from typing import Any, Dict, Tuple
 
 
 from os.path import isfile
-from typing import Any, Union, Dict
+from typing import Any, Dict
 import torch
 from kedro.io import AbstractDataSet
 
6 changes: 4 additions & 2 deletions src/split_optimizer/pipeline_registry.py
@@ -16,13 +16,15 @@ def register_pipelines() -> dict[str, Pipeline]:
     """
     data_processing_pipeline = data_processing.create_pipeline()
     data_science_training_pipeline = data_science.create_training_pipeline()
+    post_processing_pipeline = data_science.create_postprocessing_pipeline()
     data_science_hyperparam_opt_pipeline = data_science.create_hyperparam_opt_pipeline()
 
     default_pipeline = data_processing_pipeline + data_science_training_pipeline
 
     return {
-        "__default__": data_processing_pipeline + data_science_training_pipeline,
-        "debug_pipeline": data_processing_pipeline + data_science_training_pipeline,
+        "__default__": data_processing_pipeline + data_science_training_pipeline + post_processing_pipeline,
+        "debug_pipeline": data_processing_pipeline + data_science_training_pipeline + post_processing_pipeline,
+        "test_pipeline": data_processing_pipeline + data_science_training_pipeline,
         "optuna_pipeline": data_processing_pipeline
         + data_science_hyperparam_opt_pipeline,
         "preprocessing": data_processing_pipeline,
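Note: the new `test_pipeline` registration (training without post-processing) can be exercised programmatically the same way the tests added in this PR do; a minimal sketch, assuming the project root as working directory:

```python
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

# Register the project, then run one of the pipelines registered above;
# session.run returns the pipeline's free outputs as a dictionary.
bootstrap_project(Path.cwd())
with KedroSession.create() as session:
    output = session.run(pipeline_name="test_pipeline")
```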
4 changes: 2 additions & 2 deletions src/split_optimizer/pipelines/data_science/__init__.py
@@ -3,8 +3,8 @@
 generated using Kedro 0.18.1
 """
 
-from .pipeline import create_training_pipeline, create_hyperparam_opt_pipeline
+from .pipeline import create_training_pipeline, create_postprocessing_pipeline, create_hyperparam_opt_pipeline
 
-__all__ = ["create_training_pipeline", "create_hyperparam_opt_pipeline"]
+__all__ = ["create_training_pipeline", "create_postprocessing_pipeline", "create_hyperparam_opt_pipeline"]
 
 __version__ = "0.1"
2 changes: 1 addition & 1 deletion src/split_optimizer/pipelines/data_science/instructor.py
@@ -1,6 +1,6 @@
 import torch
 import torch.nn as nn
-from typing import Dict, List
+from typing import List
 
 from torch.utils.data.dataloader import DataLoader
 from torchmetrics.functional.classification import (
4 changes: 2 additions & 2 deletions src/split_optimizer/pipelines/data_science/nodes.py
@@ -5,7 +5,7 @@
 import matplotlib.pyplot as plt
 import torch
 import torch.nn as nn
-from sklearn import metrics
+from sklearn import metrics as sk_metrics
 from typing import Dict, List
 import plotly.express as px
 import mlflow
@@ -256,7 +256,7 @@ def plot_confusionmatrix(test_output: dict, test_dataloader: DataLoader):
 
     label_predictions = test_output["pred"]
 
-    confusion_matrix = metrics.confusion_matrix(test_labels, label_predictions)
+    confusion_matrix = sk_metrics.confusion_matrix(test_labels, label_predictions)
     confusion_matrix = confusion_matrix.transpose()
     labels = [f"{l}" for l in np.unique(test_labels)]
     fig = px.imshow(
28 changes: 25 additions & 3 deletions src/split_optimizer/pipelines/data_science/pipeline.py
@@ -68,7 +68,29 @@ def create_training_pipeline(**kwargs) -> Pipeline:
                 },
                 outputs={"test_output": "test_output"},
                 name="test_model",
-            ),
+            )
+            # node(
+            #     mlflow_tracking,
+            #     inputs=["model_history", "test_output"],
+            #     outputs={"metrics": "metrics"},
+            # ),
+        ],
+        inputs={
+            "train_dataloader": "train_dataloader",
+            "test_dataloader": "test_dataloader",
+            "class_weights_train": "class_weights_train",
+        },
+        outputs={
+            "metrics": "metrics",
+            "test_output": "test_output",
+        },
+        namespace="data_science",
+    )
+
+
+def create_postprocessing_pipeline(**kwargs) -> Pipeline:
+    return pipeline(
+        [
             node(
                 plot_loss,
                 inputs={
@@ -91,9 +113,9 @@ def create_training_pipeline(**kwargs) -> Pipeline:
             # ),
         ],
         inputs={
-            "train_dataloader": "train_dataloader",
+            "metrics": "metrics",
             "test_dataloader": "test_dataloader",
-            "class_weights_train": "class_weights_train",
+            "test_output": "test_output",
         },
         outputs={},
         namespace="data_science",
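Note: declaring `metrics` and `test_output` as free outputs of the training pipeline and free inputs of the post-processing pipeline is what lets `register_pipelines` chain the two. A minimal sketch of that wiring, with hypothetical stand-in node functions (`fit`, `plot`):

```python
from kedro.pipeline import node
from kedro.pipeline.modular_pipeline import pipeline


def fit(train_data):  # hypothetical stand-in for the training nodes
    return {"loss": []}, {"pred": []}


def plot(metrics, test_output):  # hypothetical stand-in for the plot nodes
    return None


training = pipeline(
    [node(fit, inputs="train_data", outputs=["metrics", "test_output"])],
    inputs={"train_data": "train_data"},
    # Listing them under outputs= keeps the names un-namespaced, so other
    # pipelines see "metrics" rather than "data_science.metrics".
    outputs={"metrics": "metrics", "test_output": "test_output"},
    namespace="data_science",
)
post_processing = pipeline(
    [node(plot, inputs=["metrics", "test_output"], outputs=None)],
    inputs={"metrics": "metrics", "test_output": "test_output"},
    outputs={},
    namespace="data_science",
)
composed = training + post_processing  # same composition as "__default__"
```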
1 change: 0 additions & 1 deletion src/split_optimizer/pipelines/data_science/qng.py
@@ -1,6 +1,5 @@
 import torch
 import pennylane as qml
-import time
 from .metric_tensor import metric_tensor
 
 import logging
2 changes: 1 addition & 1 deletion src/split_optimizer/settings.py
@@ -19,7 +19,7 @@
 # }
 
 # Directory that holds configuration.
-# CONF_SOURCE = "conf"
+CONF_SOURCE = "conf"
 
 # Class that manages how configuration is loaded.
 # from kedro.config import OmegaConfigLoader
65 changes: 65 additions & 0 deletions src/tests/pipelines/data_processing/test_pipeline.py
@@ -0,0 +1,65 @@
from pathlib import Path
import numpy as np
import torch
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project


def run_preprocessing():
    bootstrap_project(Path.cwd())
    with KedroSession.create() as session:
        output = session.run(pipeline_name="preprocessing")

        parameters = session.load_context().config_loader["parameters"]["data_processing"]
        train_dataloader = output["train_dataloader"]
        test_dataloader = output["test_dataloader"]

    return parameters, train_dataloader, test_dataloader


class TestDataPreparation:
    parameters, train_dataloader, test_dataloader = run_preprocessing()
    _, second_train_dataloader, second_test_dataloader = run_preprocessing()

    def test_data_shape(self):
        train_data, _ = next(iter(self.train_dataloader))
        train_data_size = train_data.size()

        test_data, _ = next(iter(self.test_dataloader))
        test_data_size = test_data.size()
        test_size = self.test_dataloader.dataset.data.shape[0]

        assert np.all(
            np.array(test_data_size) == np.array([test_size, 1, 28, 28])
        ), f"test_data should have the shape [{test_size}, 1, 28, 28] but has the shape {np.array(test_data_size)}"
        assert np.all(
            np.array(train_data_size) == np.array([self.parameters["batch_size"], 1, 28, 28])
        ), f"train_data should have the shape [{self.parameters['batch_size']}, 1, 28, 28] but has the shape {np.array(train_data_size)}"

    def test_data_size(self):
        training_size = self.train_dataloader.dataset.data.shape[0]
        test_size = self.test_dataloader.dataset.data.shape[0]

        assert (
            training_size == self.parameters["TRAINING_SIZE"]
        ), f"training_size is {training_size} but should be {self.parameters['TRAINING_SIZE']}"
        assert (
            test_size == self.parameters["TEST_SIZE"]
        ), f"test_size is {test_size} but should be {self.parameters['TEST_SIZE']}"

    def test_normalization(self):
        train_data, _ = next(iter(self.train_dataloader))
        test_data, _ = next(iter(self.test_dataloader))

        assert torch.max(train_data) <= 1, "train_data is not normalized"
        assert torch.max(test_data) <= 1, "test_data is not normalized"

    def test_data_reproducibility(self):
        train_data = self.train_dataloader.dataset.data
        test_data = self.test_dataloader.dataset.data
        second_train_data = self.second_train_dataloader.dataset.data
        second_test_data = self.second_test_dataloader.dataset.data

        assert torch.all(torch.eq(train_data, second_train_data)), "data preparation pipeline is not reproducible"
        assert torch.all(torch.eq(test_data, second_test_data)), "data preparation pipeline is not reproducible"
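Note: the class attributes above invoke `run_preprocessing()` twice at collection time, before any test has started. A fixture-based variant (a sketch, not part of this PR) would defer the runs to test time and make the caching explicit:

```python
import pytest


@pytest.fixture(scope="module")
def preprocessing_output():
    # Runs the preprocessing pipeline once per test module and caches
    # the (parameters, train_dataloader, test_dataloader) tuple.
    return run_preprocessing()
```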
Empty file.
67 changes: 67 additions & 0 deletions src/tests/pipelines/data_science/test_pipeline.py
@@ -0,0 +1,67 @@
import numpy as np
from kedro.framework.session import KedroSession
import json
from pathlib import Path
from kedro.framework.startup import bootstrap_project


def run_training():
    with KedroSession.create() as session:
        session.run(pipeline_name="test_pipeline")

        data_catalog = session.load_context().config_loader["catalog"]
        metrics_fig = data_catalog["data_science.metrics_fig"]["data_set"]
        filepath = metrics_fig["filepath"]

        with open(filepath, "r") as file:
            metrics = json.load(file)

        train_loss = metrics["data"][0]["y"]
        train_accuracy = metrics["data"][1]["y"]
        val_accuracy = metrics["data"][3]["y"]
        parameters = session.load_context().config_loader["parameters"][
            "data_processing"
        ]

    return train_loss, train_accuracy, val_accuracy, parameters
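# NOTE: the indices above assume the plotly figure was saved as JSON of the
# form {"data": [{"y": [...]}, ...], "layout": {...}}, with trace 0 holding
# the training loss, trace 1 the training accuracy and trace 3 the
# validation accuracy (the trace order is an assumption, not verified here).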


class TestTraining:
    bootstrap_project(Path.cwd())
    train_loss, train_accuracy, val_accuracy, parameters = run_training()
    second_train_loss, second_train_accuracy, second_val_accuracy, _ = run_training()

    def test_training(self):
        coincidence_accuracy = 1 / len(self.parameters["classes"])
        # check if accuracy is better than the minimum coincidence case
        assert (
            self.train_accuracy[-1] > coincidence_accuracy
        ), f"train accuracy should be higher than {coincidence_accuracy}"
        assert (
            self.val_accuracy[-1] > coincidence_accuracy
        ), f"validation accuracy should be higher than {coincidence_accuracy}"

    def test_reproducibility(self):
        assert np.array_equal(
            self.second_train_loss, self.train_loss
        ), "training is not consistent"


def test_optimizer():
    bootstrap_project(Path.cwd())
    # iterate over all optimizers, running a training for each combination
    for i in ["SGD", "Adam"]:
        for p in ["Adam", "SPSA", "SGD", "NGD", "QNG"]:
            params = {
                "data_science": {
                    "optimizer": {
                        "split": {"classical": {"name": i}, "quantum": {"name": p}}
                    }
                }
            }
            # create a KedroSession and change the optimizer by passing extra_params
            with KedroSession.create(extra_params=params) as session:
                parameters = session.load_context().params["data_science"]
                optimizer = parameters["optimizer"]
                if "split" not in optimizer:
                    raise ValueError("Enable Split Optimizer in config")
                output = session.run(pipeline_name="debug_pipeline")
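Note: the nested loops run all ten optimizer combinations inside one test, so the first failing pair aborts the rest. A `pytest.mark.parametrize` variant (a sketch, not part of this PR) would report every pair separately:

```python
import pytest
from pathlib import Path
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project


@pytest.mark.parametrize("classical", ["SGD", "Adam"])
@pytest.mark.parametrize("quantum", ["Adam", "SPSA", "SGD", "NGD", "QNG"])
def test_optimizer_combination(classical, quantum):
    # Same extra_params override as in test_optimizer, one test case per pair.
    bootstrap_project(Path.cwd())
    params = {
        "data_science": {
            "optimizer": {
                "split": {"classical": {"name": classical}, "quantum": {"name": quantum}}
            }
        }
    }
    with KedroSession.create(extra_params=params) as session:
        session.run(pipeline_name="debug_pipeline")
```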