Hello Sir, I have read the project description carefully and checked the Wayback Machine to get an idea of the project requirements. We can work as a team to execute the project. My execution strategy is:
1. We will use Scrapy (a web crawling framework) and Selenium (a browser automation tool) to fetch and parse the HTML of archived sites.
2. The acquired data will be cleaned and stored in a MySQL or SQLite database.
3. We will deploy our spiders to the cloud (Scrapy Cloud, which offers a free plan) and schedule them to run automatically.
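As a rough illustration of step 2, here is a minimal Python sketch of how a fetched page could be cleaned and stored in SQLite. It uses only the standard library so it can run anywhere; in the real project, Scrapy and Selenium would do the fetching and parsing. The function names, the `pages` table, and the choice of the page title as the extracted field are my own assumptions for the example, not project requirements.

```python
import sqlite3
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def wayback_url(timestamp, original_url):
    """Build a Wayback Machine snapshot URL (timestamp is YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def clean_text(raw):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(raw.split())


def extract_title(html):
    """Return the cleaned title text of an HTML document ('' if none)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return clean_text("".join(parser.parts))


def store_page(conn, snapshot_url, title):
    """Insert or update one page row; the table layout is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "snapshot_url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages (snapshot_url, title) VALUES (?, ?)",
        (snapshot_url, title),
    )
    conn.commit()


if __name__ == "__main__":
    # Hard-coded HTML stands in for a page the spider would download.
    sample_html = "<html><head><title>  Example \n Domain </title></head></html>"
    url = wayback_url("20200101000000", "http://example.com/")
    connection = sqlite3.connect(":memory:")
    store_page(connection, url, extract_title(sample_html))
    print(connection.execute("SELECT * FROM pages").fetchall())
```

Using the snapshot URL as the primary key means a scheduled re-run updates existing rows instead of creating duplicates. The same table layout would carry over to MySQL if you prefer it.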
Before starting the project, I will need more details and, if possible, a sample file to reduce the chance of misunderstanding. I agree to all your terms and conditions, and I will sign a formal agreement with you or your institution for the project. Kindly consider my proposal and message me to discuss it.
Regards,
Hamza Zubair