+1 vote
by
In my work, I deal with different types of data and try to develop machine learning models that learn relationships within that data. My data consists of a mix of images, tabular data, and text. I regularly run into issues because my data passes through different processing pipelines. These pipelines can take a while (up to several hours) and also rename the files. I want to guarantee reproducibility and also share my data.

Therefore, my question: Are there any tools that help machine learning practitioners to simplify their

2 Answers

0 votes
by (680 points)
 
Best answer
For organizing research data, we have developed Kadi4Mat (https://kadi.iam.kit.edu/), which helps you organize your data and use it in conjunction with different ML/AI methods (https://kadi.iam.kit.edu/kadi-ai). Check out the website for more cited work on how this can be applied to various applications.
0 votes
by (970 points)

For dealing with research software, there is the archetype Betty:
https://nfdi4ing.de/archetypes/betty/

Still, there is no best practice for managing data in the context of AI applications.
You would have to document the inputs, the software used, and the outputs of every step to make it reproducible.
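
As a rough illustration of what documenting each step could look like, here is a minimal sketch using only the Python standard library. It is not part of Betty or Kadi4Mat; the function names (`sha256_of`, `describe_files`, `record_step`) and the JSON manifest layout are hypothetical and chosen just for this example. Each step stores content hashes of its inputs and outputs, so a file can still be traced after a pipeline has renamed it, and it notes the Python version and platform as a stand-in for the software used.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_files(paths):
    """Map each file path (as a string) to the hash of its content."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def record_step(name, inputs, outputs, manifest_path, parameters=None):
    """Append one processing step to a JSON manifest (hypothetical layout)."""
    manifest_path = Path(manifest_path)
    # Load earlier steps if the manifest already exists, else start fresh.
    steps = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    entry = {
        "step": name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Assumption: interpreter version and platform stand in for
        # "the software used"; a real setup would also pin package versions.
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "parameters": parameters or {},
        "inputs": describe_files(inputs),
        "outputs": describe_files(outputs),
    }
    steps.append(entry)
    manifest_path.write_text(json.dumps(steps, indent=2))
    return entry
```

You would call something like `record_step("resize", ["raw/img_01.png"], ["out/0001.png"], "manifest.json", {"size": 256})` at the end of each pipeline stage (the paths and parameters here are made up). Because files are identified by hash rather than by name, an unchanged file that was only renamed shows up with the same digest on both sides.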

All the best
Tobias

The NFDI4Ing Q&A platform is here to empower researchers in the engineering sciences with a collaborative space to ask and answer questions about their research data management. Whether you're a seasoned expert or just starting out, this platform is designed to foster knowledge exchange and support your research journey.
NFDI4Ing is supported by DFG under project number 442146713
...