International Journal of All Research Education & Scientific Methods

An ISO Certified Peer-Reviewed Journal

ISSN: 2455-6211

Web Scraping and Summarization using Machine Learning Techniques

Authors: Mohammad Ibrahim ElShatarat, Hamza Ahmed Abdinoor

DOI: https://doi.org/10.56025/IJARESM.2024.1209241380

 

ABSTRACT This paper proposes a novel system that combines web scraping with automated text summarization, using modern machine learning and natural language processing (NLP) [2]. It uses the Python libraries Playwright and BeautifulSoup to scrape content from both static and interactive websites [8][11]. The extracted content is summarized with spaCy, Hugging Face Transformers (BART), and Sumy, which together provide both extractive and abstractive summaries [7][12][13]. The system is built on the Django framework as a web application in which users enter URLs and view the scraped and summarized content in real time. It also includes a compare page where users can compare the output of the different summarization libraries, and it can summarize plain text directly, without any scraping step. Asynchronous processing lets the system handle many tasks concurrently instead of waiting for each one to finish before starting the next. The performance of the summarization models is assessed using processing time, precision, recall, and F1 score [12]. The system is relevant to fields such as journalism, finance, and health, where large amounts of data need to be analyzed. Future development will focus on improving the summarization techniques and extending the system to multimedia data.
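The abstract names processing time, precision, recall and F1 score as evaluation metrics but does not say how they are computed. The following is a minimal sketch under one common assumption: that a generated summary is scored by unigram overlap against a reference summary. The function names `tokenize`, `overlap_scores` and `timed` are hypothetical and are not taken from the paper.

```python
import re
import time
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_scores(candidate: str, reference: str) -> dict[str, float]:
    """Score a candidate summary against a reference by unigram overlap.

    Assumption: precision, recall and F1 are computed over word counts,
    with each shared word counted at most as often as it appears in both.
    """
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # multiset intersection
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def timed(summarizer, text: str):
    """Run any summarizer callable and return (summary, seconds elapsed)."""
    start = time.perf_counter()
    summary = summarizer(text)
    return summary, time.perf_counter() - start


if __name__ == "__main__":
    scores = overlap_scores("the cat sat", "the cat sat on the mat")
    print(scores)  # precision 1.0, recall 0.5, f1 about 0.667
```

Any summarizer that takes a string and returns a string can be passed to `timed`, so the same helper could compare several libraries on one input.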