Apache Nutch Expert needed to diagnose the problem with partial crawling.
$10-30 USD
Closed
Posted over 9 years ago
Paid on delivery
I have an Apache Nutch 1.7 application running on Hadoop YARN 2.3.0, and I am facing two problems.
1. I have 10 URLs (domains) in my seed list, but Nutch only crawls 5 of them.
2. The 5 domains that do get crawled in item #1 are crawled only partially: only about 5 to 10% of the possible pages are fetched.
I believe this is caused by a configuration issue. I have played with different values of depth and topN when starting the crawl, but I still face the same issue. I need someone to help me pinpoint the problem.