Build a FB Scraper using Macros, PHP cURL or anything else!
$750-1500 USD
In Progress
Posted over 10 years ago
Paid on delivery
** As FB uses Ajax to load its content, it is important that you make sure you can do this job before placing a bid. **
Summary
This project is to build a scraper for Facebook’s News Feed (“NF”) ads, using macros (i.e. iMacros) or PHP cURL.
--
Identifying NF Ads
Facebook's NF ads are the ones that appear between your friends' status updates here: [login to view URL]. The challenge is that there are several different NF ad formats, as can be seen here: [login to view URL]. What we're looking to do is identify story/update DIVs that contain the word "Sponsored" (the one attribute common to all NF ads) and then extract data from those DIVs only.
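The "find the story DIVs that contain Sponsored" step can be sketched over a saved copy of the rendered page HTML. This is a minimal sketch using Python's standard html.parser, not a description of Facebook's markup: it assumes each story is wrapped in a DIV whose class list includes a hypothetical class name ("story" by default), and it collects each matching story's text plus any image URLs inside it.

```python
from html.parser import HTMLParser


class SponsoredStoryFinder(HTMLParser):
    """Collect text and image URLs of story DIVs that contain a marker word.

    ASSUMPTION: every feed story is wrapped in a <div> whose class list
    includes story_class. "story" is a hypothetical placeholder, not
    Facebook's real markup, which changes over time.
    """

    def __init__(self, story_class="story", marker="Sponsored"):
        super().__init__()
        self.story_class = story_class
        self.marker = marker
        self.depth = 0            # current <div> nesting depth
        self.story_depth = None   # depth of the story <div> being read, if any
        self.story_text = []      # text chunks of the current story
        self.story_images = []    # <img src> values of the current story
        self.matches = []         # one dict per story that contains the marker

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            # Only keep images that sit inside a story DIV.
            if self.story_depth is not None and attributes.get("src"):
                self.story_images.append(attributes["src"])
            return
        if tag != "div":
            return
        self.depth += 1
        classes = (attributes.get("class") or "").split()
        if self.story_depth is None and self.story_class in classes:
            self.story_depth = self.depth
            self.story_text, self.story_images = [], []

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self.story_depth == self.depth:
            # The story DIV just closed: keep it only if it contains the marker.
            text = " ".join(" ".join(self.story_text).split())  # collapse whitespace
            if self.marker in text:
                self.matches.append({"text": text, "images": list(self.story_images)})
            self.story_depth = None
        self.depth -= 1

    def handle_data(self, data):
        if self.story_depth is not None:
            self.story_text.append(data)
```

A plain substring test will also match a friend's post that happens to contain the word "Sponsored", so in practice the check would likely need to target the specific label element once its markup is known.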
Here are a few sample NF ads:
[login to view URL]
[login to view URL]
[login to view URL]
We will provide you with the accounts that display these ads.
--
Logging Into Accounts
When logging in to each account, the macro will need to use that account's assigned proxy (which we will provide). When switching between accounts, the macro will need to clear all Temporary Internet Files (including cookies) to ensure that the accounts aren't linked together.
--
Extracting NF Ads
The scraper will need to keep scrolling to the bottom of the page until it encounters one of the following messages:
* Old FB Format: "Add your friends to see more of their photos and stories in your news feed."
* New FB Format: "There are no more posts to show right now."
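Because the feed is loaded via Ajax, the scrolling step amounts to "scroll, wait, re-check the page for either end-of-feed message". A minimal sketch of that loop, assuming hypothetical callbacks scroll_once and get_page_text that wrap whatever browser automation is used (iMacros, Selenium, etc.); the max_scrolls cap is an added safety assumption so the loop cannot run forever if neither message ever appears.

```python
END_OF_FEED_MESSAGES = (
    # Old FB format
    "Add your friends to see more of their photos and stories in your news feed.",
    # New FB format
    "There are no more posts to show right now.",
)


def feed_exhausted(page_text):
    """Return the end-of-feed message present in page_text, or None."""
    for message in END_OF_FEED_MESSAGES:
        if message in page_text:
            return message
    return None


def scroll_until_end(scroll_once, get_page_text, max_scrolls=200):
    """Scroll until an end-of-feed message appears or max_scrolls is reached.

    scroll_once and get_page_text are hypothetical callbacks; scroll_once
    should also wait for the Ajax request it triggers to finish loading.
    Returns (scrolls_performed, message_or_None).
    """
    scrolls = 0
    while True:
        message = feed_exhausted(get_page_text())
        if message is not None or scrolls >= max_scrolls:
            return scrolls, message
        scroll_once()
        scrolls += 1
```

Keeping the browser behind two callbacks lets the same loop be tested against a fake page, as well as driven by a real browser.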
It will then need to identify the DIVs that contain the text "Sponsored" and:
* Identify a unique parameter for each ad (to keep track of when it was first seen, when it was last seen, and how many times it was seen)
* Extract all the content within the DIV
* Click the advertiser's link and record all URL redirects
* Load and save any images displayed within the DIV
* Save which account saw the advertisement
* When the advertisement was first seen
* When the advertisement was last seen - if the same ad has been seen several times
* The number of times the advertisement was seen - if the same ad has been seen several times
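For the "record all URL redirects" item, one way to capture every hop is to subclass the standard library's urllib.request.HTTPRedirectHandler so that it logs each redirect target before following it. This is a sketch under clear limits: it only sees HTTP 3xx redirects (not JavaScript or meta-refresh ones), and because it requests the URL outside the logged-in browser session, the chain it records may differ from what an in-browser click produces. The function name follow_redirects is hypothetical.

```python
import urllib.request


class RedirectRecorder(urllib.request.HTTPRedirectHandler):
    """HTTP redirect handler that remembers each URL it is redirected to."""

    def __init__(self):
        super().__init__()
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # urllib has already resolved a relative Location header into newurl.
        self.hops.append(newurl)
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def follow_redirects(url, proxies=None, timeout=10):
    """Return [url, hop1, hop2, ...]; the last element is the final landing URL.

    proxies is an optional dict such as {"http": ..., "https": ...}; the
    default (an empty dict) ignores proxy environment variables and
    connects directly.
    """
    recorder = RedirectRecorder()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(proxies or {}), recorder
    )
    with opener.open(url, timeout=timeout):
        pass  # the body is not needed, only the chain of Location headers
    return [url] + recorder.hops
```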
However, we do not need to record social information such as the number of likes, comments, etc. All this information will need to be saved in whichever format you prefer.
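The first-seen / last-seen / times-seen bookkeeping above can be kept in a small registry keyed by the ad's unique parameter. The brief does not say what that parameter should be, so this sketch assumes a hypothetical one: a SHA-256 hash of the whitespace-normalised ad text. It also records which accounts saw each ad and exports everything as JSON, which is just one of the "whichever format you prefer" options.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AdRecord:
    ad_id: str
    content: str
    first_seen: datetime
    last_seen: datetime
    times_seen: int = 1
    accounts: set = field(default_factory=set)


def ad_key(content):
    """Hypothetical unique parameter: SHA-256 of the whitespace-normalised ad text."""
    normalised = " ".join(content.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class AdRegistry:
    """Keeps one record per unique ad and updates it on every sighting."""

    def __init__(self):
        self.ads = {}

    def record(self, content, account, seen_at):
        key = ad_key(content)
        ad = self.ads.get(key)
        if ad is None:
            ad = AdRecord(key, content, seen_at, seen_at)
            self.ads[key] = ad
        else:
            ad.times_seen += 1
            ad.first_seen = min(ad.first_seen, seen_at)
            ad.last_seen = max(ad.last_seen, seen_at)
        ad.accounts.add(account)
        return ad

    def to_json(self):
        """Serialise all records as a JSON array (one object per ad)."""
        rows = [
            {
                "ad_id": ad.ad_id,
                "content": ad.content,
                "first_seen": ad.first_seen.isoformat(),
                "last_seen": ad.last_seen.isoformat(),
                "times_seen": ad.times_seen,
                "accounts": sorted(ad.accounts),
            }
            for ad in self.ads.values()
        ]
        return json.dumps(rows, indent=2)
```

Hashing the text means a small wording change by the advertiser produces a new record; a stable ID taken from the ad's markup, if one exists, would be a more robust key.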
Dear Customer!
I am an expert PHP developer with over 6 years of experience and am very interested in working on this project. I am available to start immediately and can finish as soon as possible. My bid is for a fast, professional service that excites my customers. Please contact me via PMB to discuss the details.
Best Regards,
Zeke
Hi,
We can do this job perfectly and accurately.
Please let us know if you need any clarification.
If you would like us to share our skills and previous work,
please initiate the chat.
MY WORK
I have developed, in Java, a generic web-scraper tool called OpenMana Web Information Miner (OmanaWIM or OWIM) that can be configured to scrape any information from any website.
It can:
* log in
* process JavaScript / AJAX call results
* follow multi-level links
* submit search forms and handle pagination
* accept and process responses in XML
* download images and files
* run multi-threaded, with configurable threading
* use proxies
* apply user-specifiable filters
* deliver scraped information as JSON or XML, or post it to a database or Excel/CSV
THERE WILL BE NO NEED TO WRITE SITE-SPECIFIC CODE. IT CAN ALSO HANDLE FUTURE NEEDS. When the page or navigation structure of the website changes, there is no need to write new code; just tweak the configuration.
This tool can also work out of the box with sites that expose HTTP-based APIs or web services.
MY SOLUTION
I propose a solution in Java, built on top of my OmanaWIM tool. The solution will use the following open-source libraries:
1. Selenium WebDriver with Firefox
2. HtmlUnit
3. The Castor Framework
4. JExcel / SuperCSV / GSON
COMMERCIALS:
Deliverables:
1. A perpetual, non-exclusive, non-transferable, node-bound use licence for the OmanaWIM tool, with an executable Java application for scraping multiple websites.
2. Custom Java classes for populating a database or Excel document.
3. Configuration files
ME
1. I am a full-time freelancer with 15+ years of extensive experience in software development.