DistilBERT Token Classification Model for Unit Conversion

  • Fine-tuned DistilBERT via transfer learning to build a token classification model that recognizes unit values and conversion entities in natural language text (see the fine-tuning sketch after this list).
  • Trained on a large custom dataset (maliknaik/natural_unit_conversion) of over 830,000 labeled examples of unit-conversion phrases.
  • Achieved a high F1 score on evaluation, demonstrating robust named entity recognition (NER) of unit-related entities in text.
  • Built the training pipeline with Hugging Face Transformers and PyTorch, using GPU-accelerated training on an NVIDIA Tesla P4.
  • Published the model on Hugging Face under the CC0-1.0 license, making it openly usable for research and commercial applications (see the inference example below).
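
The snippet below is a minimal sketch of this fine-tuning setup, not the project's actual training script. It assumes the dataset exposes word-level `tokens` and integer `ner_tags` columns (hypothetical field names; check the dataset card) and uses the standard Transformers pattern for aligning word-level tags to subword tokens.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("maliknaik/natural_unit_conversion")
# Assumes "ner_tags" is a Sequence(ClassLabel) feature with named labels.
labels = dataset["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(labels)
)

def tokenize_and_align(batch):
    # Tokenize pre-split words, then map word-level tags onto subword tokens,
    # masking special tokens and subword continuations with -100 so the loss
    # ignores them.
    enc = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids = enc.word_ids(batch_index=i)
        prev, aligned = None, []
        for wid in word_ids:
            aligned.append(-100 if wid is None or wid == prev else tags[wid])
            prev = wid
        enc["labels"].append(aligned)
    return enc

tokenized = dataset.map(
    tokenize_and_align, batched=True,
    remove_columns=dataset["train"].column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="unit-ner",
        per_device_train_batch_size=32,
        num_train_epochs=3,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```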
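Once published, the model can be loaded for inference with the Transformers `pipeline` API; the repository ID below is a placeholder, not the actual published name.

```python
from transformers import pipeline

# Placeholder repo ID; substitute the model actually published on Hugging Face.
ner = pipeline(
    "token-classification",
    model="maliknaik/unit-conversion-ner",
    aggregation_strategy="simple",  # merge subword tokens into whole entities
)
print(ner("Convert 5 kilometers to miles"))
```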