Smart Crawler: Efficiently Gathering Deep Web Contents
Authors: Shivkanya Mane, Rajashree Galave, Sonal Lohabade
The deep web comprises the hidden content behind search interfaces on the World Wide Web. Because the volume of web information is growing rapidly, the deep web is dynamic in nature, and deep web resources are massive in number, achieving both a high harvest rate and wide coverage is increasingly challenging for ordinary search engines and general-purpose crawlers. We therefore describe a new approach: a resource discovery system called Smart Crawler, which gathers deep web content efficiently and effectively while avoiding visits to large numbers of irrelevant pages. We propose a two-stage framework, consisting of site locating and in-site searching, for discovering more content relevant to a given search topic. In the site locating stage, Smart Crawler performs site-based searching for center pages and ranks candidate websites; this ranking mechanism prioritizes websites for the given search topic. In the in-site searching stage, adaptive link ranking is used to search quickly within a website and uncover additional relevant content. To achieve wider coverage of a website, a link tree data structure is designed.
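The two-stage idea above can be sketched in Python. Everything here is an illustrative assumption rather than the paper's actual algorithm: the site-ranking score is a toy term-overlap heuristic standing in for the relevance-based site prioritization, and the `LinkTree` class only groups in-site links by their first URL path segment so that different directories of a site are all explored:

```python
from collections import defaultdict
from urllib.parse import urlparse

def rank_sites(sites, topic_terms):
    """Toy site-ranking for the site locating stage: score each
    candidate site by how many topic terms appear in its URL or
    description, then visit higher-scoring sites first."""
    def score(site):
        text = (site["url"] + " " + site.get("description", "")).lower()
        return sum(term in text for term in topic_terms)
    return sorted(sites, key=score, reverse=True)

class LinkTree:
    """Minimal link tree for the in-site searching stage: buckets
    links by their top-level URL path segment, so each round of
    crawling draws from every directory branch (wider coverage)
    instead of exhausting one directory at a time."""
    def __init__(self):
        self.branches = defaultdict(list)

    def add(self, url):
        path = urlparse(url).path.strip("/")
        branch = path.split("/")[0] if path else ""
        self.branches[branch].append(url)

    def next_links(self):
        # Yield at most one pending link per branch per round.
        for links in self.branches.values():
            if links:
                yield links.pop(0)

# Hypothetical site candidates for a "book search" topic.
sites = [
    {"url": "http://cars.example.com", "description": "used car listings"},
    {"url": "http://books.example.com", "description": "book search forms"},
]
ranked = rank_sites(sites, ["book", "search"])

tree = LinkTree()
tree.add("http://books.example.com/fiction/a")
tree.add("http://books.example.com/nonfiction/b")
round1 = list(tree.next_links())  # one link drawn from each branch
```

The branch-per-round draw in `next_links` is one simple way to realize the "wider coverage" goal: no single directory of a site can monopolize the crawl frontier.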