OCR and Image Captioning Using Florence-2 Vision Transformer Model
Author Name : Sushant Kumar Mahato, Janaki Kandasamy, Deepshik Sharma
ABSTRACT This research project explores Optical Character Recognition (OCR) and image caption generation using Microsoft’s Florence-2 Vision Transformer model. Florence-2 is pretrained at large scale on multimodal datasets, which supports high accuracy in both text recognition and visual understanding tasks. The project aims to develop an integrated system for OCR and automatic captioning, in which Florence-2 is adapted to recognize complex text layouts and to generate descriptive captions for varied visual contexts. This dual functionality matters for applications in assistive technology, document analysis, and autonomous systems, which need to interpret text and image content together. We fine-tune Florence-2 on diverse datasets, including documents, scenes with embedded text, and web images, to improve its capacity for nuanced captioning and accurate OCR. We evaluate the system with BLEU and ROUGE for captions and with character accuracy rates for OCR. Our results indicate that Florence-2’s multimodal learning approach improves OCR accuracy and the contextual richness of captions, suggesting potential applications in content accessibility and automated document processing.
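The abstract names character accuracy as the OCR metric but does not define it. The sketch below shows one common definition, assumed here rather than taken from the paper: one minus the character error rate, where the error rate is the Levenshtein edit distance between the reference text and the OCR output divided by the reference length. The function names `edit_distance` and `character_accuracy` are illustrative and do not come from the authors' code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Assumed definition: 1 - CER, clamped at 0 so that very long outputs do not go negative."""
    if not ref:
        # With an empty reference, only an empty hypothesis counts as correct.
        return 1.0 if not hyp else 0.0
    cer = edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - cer)


if __name__ == "__main__":
    # Hypothetical OCR output with one misread character.
    print(character_accuracy("Total: 42.00", "Total: 4Z.00"))
```

Normalizing by the reference length makes scores comparable across documents of different sizes. Whether the paper normalizes this way, or applies case and whitespace normalization before scoring, is not stated in the abstract.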