Automatic Blog Searcher - takes input parameters, finds related blogs with high PageRank
$100-500 USD
Closed
Posted over 13 years ago
Paid on delivery
This is the functionality we want:
Navigate to a web page (e.g. [login to view URL])
The page will have the following form fields for input:
- Search Term
- Minimum PageRank
- Maximum Outbound Links
- Email
- # of links to return
- Maximum Time to Search (minutes)
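As a rough illustration, the six form fields could be captured and validated along these lines. This is only a sketch: the class name, the field names, the form keys and the validation rules (for example treating PageRank as an integer from 0 to 10) are assumptions and not part of the brief.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRequest:
    """The six form inputs from the brief; all names here are illustrative."""
    search_term: str
    min_pagerank: int
    max_outbound_links: int
    email: str
    links_to_return: int
    max_search_minutes: float

    def __post_init__(self) -> None:
        # Assumed validation rules; adjust to whatever the form should allow.
        if not self.search_term.strip():
            raise ValueError("Search Term must not be empty")
        if not 0 <= self.min_pagerank <= 10:
            raise ValueError("Minimum PageRank must be between 0 and 10")
        if self.max_outbound_links < 0:
            raise ValueError("Maximum Outbound Links must not be negative")
        if "@" not in self.email:
            raise ValueError("Email does not look like an address")
        if self.links_to_return < 1:
            raise ValueError("# of links to return must be at least 1")
        if self.max_search_minutes <= 0:
            raise ValueError("Maximum Time to Search must be positive")


def parse_form(form: dict) -> SearchRequest:
    """Convert raw string values from a submitted form (hypothetical keys)."""
    return SearchRequest(
        search_term=form["search_term"],
        min_pagerank=int(form["min_pagerank"]),
        max_outbound_links=int(form["max_outbound_links"]),
        email=form["email"],
        links_to_return=int(form["links_to_return"]),
        max_search_minutes=float(form["max_search_minutes"]),
    )
```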
The script will accept these as input and do the following:
- Go to [login to view URL] (and/or some other equivalent site which searches blogs?)
- conduct the search using the input Search Term
- step through the search results; for each result, check the Google PageRank and the number of outbound links. If the PageRank is higher than the input Minimum PageRank and the number of outbound links is lower than the input Maximum Outbound Links, add the site to an output CSV file
- output CSV file will be of the format:
URL, PageRank, Number of Outbound Links, Latest Blog Date
"Latest Blog Date" is found by navigating to the blog's home page and finding the date of the most recent post. If the script can't find one, it can leave the field blank. Blogspot, for example, makes this easy by using the class "date-header".
- the script should continue to cycle through the blog search results until the output CSV file has reached the "# of links to return" from the input criteria OR the "Maximum Time to Search" from the input criteria has passed
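The steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a finished design: `inspect` is a hypothetical callable that looks up the PageRank, outbound link count and latest post date for one URL (the brief does not say how these are obtained), the comparisons follow the wording literally ("higher than" and "lower than"), and writing a header row to the CSV is an assumption.

```python
import csv
import time
from typing import Callable, Iterable, NamedTuple, Optional


class BlogInfo(NamedTuple):
    url: str
    pagerank: int
    outbound_links: int
    latest_blog_date: str  # empty string when no date could be found


def collect_blogs(
    candidate_urls: Iterable[str],
    inspect: Callable[[str], Optional[BlogInfo]],  # hypothetical lookup helper
    min_pagerank: int,
    max_outbound_links: int,
    links_to_return: int,
    max_search_minutes: float,
    clock: Callable[[], float] = time.monotonic,
) -> list:
    """Step through search results until enough blogs are found or time runs out."""
    deadline = clock() + max_search_minutes * 60
    accepted = []
    for url in candidate_urls:
        # Stop on either limit from the input criteria.
        if len(accepted) >= links_to_return or clock() >= deadline:
            break
        info = inspect(url)
        if info is None:  # the lookup failed; skip this result
            continue
        # Literal reading of the brief: strictly higher / strictly lower.
        if info.pagerank > min_pagerank and info.outbound_links < max_outbound_links:
            accepted.append(info)
    return accepted


def write_csv(rows: Iterable[BlogInfo], out) -> None:
    """Write the accepted blogs in the column order given in the brief."""
    writer = csv.writer(out)
    writer.writerow(["URL", "PageRank", "Number of Outbound Links", "Latest Blog Date"])
    writer.writerows(rows)
```

Passing the search results in as an iterable keeps the loop independent of whichever blog search site ends up supplying them, so a generator that pages through results lazily would work here too.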
Send an email to the input Email when the file is ready, containing either the file as an attachment or a link to the file online.
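One possible way to send the file as an attachment uses Python's standard `email` and `smtplib` modules. The sender address, subject line, file name and SMTP host below are placeholders assumed for illustration; the brief does not specify any of them.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(
    sender: str,
    recipient: str,
    csv_text: str,
    filename: str = "blogs.csv",  # placeholder file name
) -> EmailMessage:
    """Build a notification email with the CSV results attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Your blog search results are ready"
    msg.set_content("The blog search has finished. The results are attached as a CSV file.")
    # Attach the CSV text as a text/csv part with a download file name.
    msg.add_attachment(csv_text, subtype="csv", filename=filename)
    return msg


def send_report(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Send the message through an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

If the file is published online instead, the same message could carry the link in its body and skip the attachment.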
The script needs to be able to handle large numbers of lookups without getting blocked by Google. For example, the script may be asked to find 100 blogs of Minimum PageRank 3, so it might have to search through many thousands of results in order to find them.
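One common technique for reducing the chance of being blocked is to retry with exponential backoff whenever a request looks throttled. The sketch below is generic and makes no claim that it is sufficient on its own: `request` and `is_blocked` are hypothetical callables supplied by the caller, and the delay values are arbitrary defaults.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    request: Callable[[], T],           # hypothetical: performs one lookup
    is_blocked: Callable[[T], bool],    # hypothetical: detects a throttled response
    max_attempts: int = 5,
    base_seconds: float = 2.0,
    cap_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> Optional[T]:
    """Retry a request, doubling the wait after each blocked response."""
    for attempt in range(max_attempts):
        response = request()
        if not is_blocked(response):
            return response
        if attempt < max_attempts - 1:
            # Wait 2s, 4s, 8s, ... up to the cap, plus up to 1s of random jitter.
            sleep(min(cap_seconds, base_seconds * 2 ** attempt) + jitter())
    return None  # still blocked after every attempt
```

For example, `call_with_backoff(fetch, lambda r: r.status == 429)` would retry a hypothetical `fetch` function while it keeps returning HTTP 429 responses.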
## Deliverables
The definition of an "outbound link" is any link on the page (i.e. an <a href= > tag) which links to a different domain or subdomain, and which does NOT have a "rel=nofollow" on it.
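A minimal sketch of this definition using only the Python standard library is shown below. It makes a few assumptions the brief does not spell out: hosts are compared exactly (so any different subdomain counts as outbound), non-HTTP links such as mailto: are ignored, and repeated links to the same target are each counted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class OutboundLinkCounter(HTMLParser):
    """Counts <a href> links to another host that are not rel=nofollow."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname or ""
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            return
        # Resolve relative links against the page URL before comparing hosts.
        target = urlparse(urljoin(self.page_url, href))
        if target.scheme not in ("http", "https"):
            return  # assumption: mailto:, javascript: etc. are not counted
        if target.hostname and target.hostname != self.page_host:
            self.count += 1


def count_outbound_links(html: str, page_url: str) -> int:
    parser = OutboundLinkCounter(page_url)
    parser.feed(html)
    parser.close()
    return parser.count
```

Given a page at `http://myblog.example.com/`, a followed link to `http://other.example.com/` would count as outbound under this reading, while a relative link such as `/about` would not.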