Internship Overview
- Position: Data-Science Intern – Patent Classification Project
- Company: Umicore Belgium
- Internship Period: September 2024 – December 2024
Developed an automated patent classification system for battery-related technologies. Moved from a Bag of Words (BoW) baseline to transformer-based models such as PatentBERT to improve classification accuracy and efficiency.
Project Responsibilities
- Analyzed the Bag of Words (BoW) baseline model.
- Implemented and fine-tuned PatentBERT for classification.
- Handled class imbalance and large text processing.
- Experimented with data augmentation and meta-model embeddings.
- Used MLflow for experiment tracking.
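Two of the responsibilities above, handling class imbalance and processing long texts, can be illustrated with a minimal sketch. This is not the code used in the project: the function names `inverse_frequency_weights` and `chunk_tokens`, the window sizes, and the choice of inverse-frequency weighting and sliding-window chunking are all assumptions picked for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical helper: weight each class by n_samples / (n_classes * count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def chunk_tokens(tokens, max_len=512, stride=256):
    """Hypothetical helper: split a token list into overlapping windows.

    Each window holds at most max_len tokens, and consecutive windows
    start stride tokens apart, so neighbours overlap by max_len - stride.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        # Stop once the current window reaches the end of the text.
        if start + max_len >= len(tokens):
            break
        start += stride
    return chunks


# Illustrative data only: 8 samples of class "a", 2 of class "b".
example_weights = inverse_frequency_weights(["a"] * 8 + ["b"] * 2)
example_chunks = chunk_tokens(list(range(10)), max_len=4, stride=2)
```

In a PyTorch setup, weights like these could be passed as a tensor to the `weight` argument of `torch.nn.CrossEntropyLoss`, and Hugging Face tokenizers offer comparable windowing through their `stride` and `return_overflowing_tokens` options. Whether the project used either approach is not stated here.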
Technologies Used
- Machine Learning: PatentBERT, BatteryBERT, Hugging Face Transformers
- Programming: Python, PyTorch
- Data Processing: Pandas, NumPy
- Experiment Tracking: MLflow
- Cloud Platform: Microsoft Azure
Files for Reference
The internship project includes three main files:
- Project Plan: Overview of the internship and project details.
- Thesis: Document detailing the experiments and results from the project.
- Reflection: Personal reflection on the internship experience.