IJARESM, Volume 12, Issue 12 (December 2024)


OCR and Image Captioning Using Florence-2 Vision Transformer Model

Authors: Sushant Kumar Mahato, Janaki Kandasamy, Deepshik Sharma

ABSTRACT

This research project explores Optical Character Recognition (OCR) and image caption generation using Microsoft's Florence-2 Vision Transformer model. Florence-2 is pretrained at large scale on multimodal datasets, which supports high accuracy in both text recognition and visual understanding tasks. The project aims to develop an integrated system for OCR and automatic captioning, in which Florence-2 is adapted to recognize complex text layouts and to generate descriptive captions across varied visual contexts. This dual functionality matters for applications in assistive technology, document analysis, and autonomous systems that must interpret text and image content together. We fine-tune Florence-2 on diverse datasets, including documents, scenes with embedded text, and web images, to improve its OCR accuracy and the nuance of its captions. We evaluate captions with BLEU and ROUGE, and OCR output with character accuracy rate. Our results indicate that Florence-2's multimodal learning approach improves OCR accuracy and the contextual richness of captions, suggesting its potential for applications in content accessibility and automated document processing.
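The abstract names character accuracy and ROUGE as evaluation metrics but does not give their exact definitions. The following is a minimal sketch under assumed, commonly used definitions: character accuracy as one minus the character error rate (Levenshtein distance divided by reference length), and ROUGE-L as an F1 score over the longest common subsequence of whitespace-separated tokens. The function names are hypothetical and are not taken from the paper; the authors may have used different variants or library implementations.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: 1 - CER, clamped at 0, with CER = distance / len(reference)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    cer = levenshtein(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)


def lcs_length(x: list, y: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Assumed definition: ROUGE-L F1 over lowercased, whitespace-split tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative strings only; these are not data from the paper.
    print(character_accuracy("Invoice No. 4821", "Invoice No. 4B21"))
    print(rouge_l_f1("a cat sits on the mat", "a cat on the mat"))
```

Clamping character accuracy at zero is a design choice for the case where the OCR output is much longer than the reference, since the raw error rate can then exceed one.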