Web Scraping and Summarization using Machine Learning Techniques
Authors: Mohammad Ibrahim ElShatarat, Hamza Ahmed Abdinoor
DOI: https://doi.org/10.56025/IJARESM.2024.1209241380
ABSTRACT This paper proposes a novel system that combines web scraping with automated text summarization using modern machine learning and natural language processing (NLP) techniques [2]. It uses the Python libraries Playwright and BeautifulSoup to scrape content from both static and dynamic (JavaScript-rendered) websites [8][11]. The extracted text is then summarized with spaCy, Hugging Face Transformers (BART), and Sumy, which together provide both extractive and abstractive summaries [7][12][13]. The system is built as a Django web application in which users enter URLs and view the scraped and summarized content in real time. It also includes a comparison page for evaluating the different summarization libraries side by side, and users can summarize plain text directly without scraping a website. Asynchronous processing lets the system handle many tasks concurrently instead of waiting for each one to finish. The summarization models are evaluated on processing time, precision, recall, and F1 score [12]. The system is particularly relevant to fields such as journalism, finance, and healthcare, where large volumes of text must be analyzed. Future work will focus on improving the summarization techniques and extending the system to multimedia data.
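The abstract gives no implementation details, so what follows is a minimal, dependency-free sketch of the extractive half of such a pipeline. It assumes static HTML and is not the authors' code. It swaps in the standard library's `html.parser` for BeautifulSoup and a simple word-frequency sentence scorer for spaCy/Sumy. It also assumes that precision, recall and F1 are computed as unigram overlap with a human-written reference summary (ROUGE-1 style). The names `extract_text`, `summarize`, `overlap_prf` and the `STOPWORDS` list are hypothetical. Playwright rendering, BART's abstractive summaries and the Django front end are out of scope.

```python
import re
import time
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, tiny stopword list; a real system would use spaCy's or NLTK's.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "it", "of", "on", "that", "the", "this", "to", "was", "with"}


class ParagraphExtractor(HTMLParser):
    """Stand-in for BeautifulSoup: keeps only text found inside <p> tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; arrives as "&"
        self.in_p = False
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self.in_p:  # navigation, scripts, etc. outside <p> are ignored
            self._buf.append(data)


def extract_text(html: str) -> str:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.paragraphs)


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text: str, n_sentences: int = 2) -> str:
    """Extractive summary: keep the n sentences with the highest
    normalized word-frequency score, in their original order."""
    sentences = split_sentences(text)
    words = [w for w in tokenize(text) if w not in STOPWORDS]
    if not words:
        return ""
    freq = Counter(words)
    top = max(freq.values())

    def score(sentence: str) -> float:
        return sum(freq[w] / top for w in tokenize(sentence) if w not in STOPWORDS)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))


def overlap_prf(candidate: str, reference: str) -> tuple[float, float, float]:
    """Assumed metric: unigram-overlap precision, recall and F1 (ROUGE-1 style)."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped counts of shared words
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    html = ("<nav>Home | Login</nav>"
            "<p>Web scraping collects text from web pages.</p>"
            "<p>Summarization shortens the collected text. Extractive summarization "
            "selects important sentences from the text.</p>"
            "<p>The weather was pleasant.</p>")
    text = extract_text(html)
    start = time.perf_counter()
    summary = summarize(text, n_sentences=2)
    elapsed = time.perf_counter() - start  # "processing time" metric
    p, r, f = overlap_prf(summary, "Extractive summarization selects important sentences from scraped text.")
    print(summary)
    print(f"time={elapsed:.6f}s precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Summing word frequencies favors longer sentences. Sumy's other summarizers (for example LexRank, LSA or TextRank) rank sentences differently, which is the kind of difference a side-by-side comparison page can expose.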