Python web scraping/crawling on nofap.com

$30-250 USD

Closed
Posted almost 6 years ago

Paid on delivery
I need to save a snapshot (all HTML files) of the website [login to view URL]. It is an online forum that allows people to post and follow each other. I want to save the following information:

1. [login to view URL], saved as '[login to view URL]'.

2. All pages of all threads in each of the 23 forums on the index page. (Note that I count Porn Addiction and Porn-Induced Sexual Dysfunctions as two forums.) For example, the first forum is shown as "Rebooting - Porn Addiction Recovery". Clicking on it leads to [login to view URL]. The trailing number 2 in that link is an identifier. I want this page saved to "[login to view URL]". This forum has 583 pages of threads (posts). Save them to "[login to view URL]" all the way through "[login to view URL]". Each of these pages lists 50 threads (slightly more on the first page because of the information and announcements at the top). Each of these 50+ threads may span multiple pages as well, and I need all of those HTML pages saved too. For example, the first thread is "[login to view URL]". The trailing number 88344 is also an identifier. I want its pages saved to "[login to view URL]" through "[login to view URL]" (5 pages for this thread).

3. All user profile pages. The website ([login to view URL]) shows 156,726 members. You can enumerate all of them, from 1 to 156726, using the following link (shown for user 1): [login to view URL]. For each user profile, I need the HTML pages for the 5 tabs: "Profile Posts" (may have multiple pages; all pages are needed), "Recent Activity" (click "Show older items" at the bottom until the button disappears so that everything is captured), "Postings" (no need to fetch all of them, since all postings are captured in the previous step), "Information", and "Groups". I also want the user_id of every user listed under "Following" and "Followers". For example, user 1 is following 8 other users and is followed by 826 users. I want 2 tables (CSV or SQLite) to store the Following/Followers information, each with 2 columns. Following table: user_id, following_user_id. Followers table: user_id, follower_user_id. The Following/Followers lists show only 20 users per page, so you need to click the "more" button repeatedly to enumerate all users.

Required:

1. The program should finish running within 24 hours. Multithreading might be needed; for example, several threads can handle several forums while one thread handles the user profile pages. The shorter the run time, the better, because I plan to scrape the website on different days to see how users and posts change.

2. Since I want to scrape this website on different days, it would be great to support some form of incremental scraping. The first run would save everything; later runs would keep only the "diff"-style files needed to know what has been deleted (users, user following relationships, threads). That would save a lot of disk space, because I don't need duplicate copies of HTML files that are already saved.

3. Python 3.5+ and any other packages you find necessary.

4. The program should log in to the forum before saving the HTML files. Registration is free. Login credentials can be provided upon request.

5. The program will run on Ubuntu Linux.

6. Clear comments in the code so that I can modify it later.

7. Object-oriented design is preferred.
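The two relationship tables and the "diff" requirement can be illustrated together. The following is only a minimal sketch under stated assumptions: it uses SQLite through Python's standard `sqlite3` module, the lowercase table names `following` and `followers`, the composite primary keys, and the helper names `open_db` and `sync_relations` are all hypothetical choices, and the list of scraped IDs is assumed to come from whatever code pages through the "more" button. Only the column names are taken from the posting.

```python
import sqlite3

# Column that holds the "other" user in each table, as named in the posting.
# The lowercase table names are an assumption made for this sketch.
RELATION_COLUMNS = {
    "following": "following_user_id",
    "followers": "follower_user_id",
}


def open_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    for table, column in RELATION_COLUMNS.items():
        conn.execute(
            "CREATE TABLE IF NOT EXISTS {t} ("
            "user_id INTEGER NOT NULL, {c} INTEGER NOT NULL, "
            "PRIMARY KEY (user_id, {c}))".format(t=table, c=column)
        )
    conn.commit()
    return conn


def sync_relations(conn, table, user_id, scraped_ids):
    """Update one user's rows to match a fresh scrape.

    Returns (added, removed) as sorted lists so the caller can record
    what changed since the previous run.
    """
    # Looking the table up in the dict rejects unexpected names (KeyError),
    # which also keeps the string-formatted SQL below safe.
    column = RELATION_COLUMNS[table]
    stored = {
        row[0]
        for row in conn.execute(
            "SELECT {c} FROM {t} WHERE user_id = ?".format(c=column, t=table),
            (user_id,),
        )
    }
    scraped = set(scraped_ids)
    added = sorted(scraped - stored)
    removed = sorted(stored - scraped)
    with conn:  # commit both changes together, roll back on error
        conn.executemany(
            "INSERT INTO {t} (user_id, {c}) VALUES (?, ?)".format(t=table, c=column),
            [(user_id, other) for other in added],
        )
        conn.executemany(
            "DELETE FROM {t} WHERE user_id = ? AND {c} = ?".format(t=table, c=column),
            [(user_id, other) for other in removed],
        )
    return added, removed
```

As a usage example, calling `sync_relations(conn, "following", 1, [2, 3, 4])` on a fresh database and then `sync_relations(conn, "following", 1, [3, 4, 5])` on a later run returns `([5], [2])` for the second call: user 5 was newly followed and user 2 was dropped. Writing those returned lists to a per-run log is one possible way to keep the "diff" without storing a full copy of the tables each day.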
Project ID: 16684496

About the project

16 proposals
Remote project
Active 6 yrs ago

16 freelancers are bidding on average $259 USD for this job
Dear Sir, how are you? I am very interested in your project and am ready to start right away. I have experience in Python development and web scraping. I will work hard and do my best for you. Best regards
$155 USD in 3 days
5.0 (53 reviews)
7.0
Hi there, I just checked the project details and I'm very interested in discussing them with you. I have extensive knowledge of web scraping, and I use Python. Feel free to PM me so that we can discuss and I can share sample work! Regards, Sohan D.
$250 USD in 3 days
5.0 (211 reviews)
7.2
A proposal has not yet been provided
$30 USD in 2 days
5.0 (127 reviews)
6.9
Hello, I have good knowledge of Python web scraping/crawling for nofap.com. I have more than 5 years of experience in Python and web scraping. We have worked on several similar projects before, and on 300+ projects overall. Please check the profile reviews. I can deliver your job within your deadline. Please ping me for more discussion. I can assure 100% job satisfaction. Thanks,
$300 USD in 3 days
4.9 (33 reviews)
6.1
I can do it with Selenium, Scrapy, or Beautiful Soup in Python, whichever you prefer.
$100 USD in 3 days
4.9 (38 reviews)
5.1
Hello, I suggest implementing the crawler in Java so that it supports any OS, both Linux and Windows. The crawler will be multithreaded and will output an XLS file or a DB file, as you prefer. I invite you to discuss further over chat. Thank you in advance.
$150 USD in 3 days
4.6 (26 reviews)
4.8
I have 6 years of experience as a freelancer on the Freelancer, Upwork, Fiverr, and 99designs marketplaces. I have looked at your project and can do it easily because I have a lot of experience in graphic design, web design, web development, and programming. I could create it for you quickly, but please discuss how it should be done before I start the job.
$155 USD in 3 days
0.0 (0 reviews)
0.0
I am quite familiar with this kind of task and do similar work frequently.
$222 USD in 5 days
0.0 (0 reviews)
0.0
I checked all your requirements carefully and would be able to scrape the info. I am highly skilled in Python. Rate: $18/hr. Let us discuss and start. Sandeep
$250 USD in 5 days
0.0 (0 reviews)
0.0

About the client

United States
0.0
0
Member since Apr 12, 2018
