Extract Keyword Data from AWS Common Crawl Repository

$30-250 USD

Cancelled
Posted over 7 years ago

Paid on delivery
We are looking for a person with experience in AWS EMR and the Common Crawl (CC) data set on AWS. The person should be able to independently create an EMR job that extracts keywords and counts them per URL (per page), given a list of TLDs to search (or, alternatively, for all URLs in CC). The project has two parts: 1) Run a sample project on a limited set of URLs, for example [login to view URL], [login to view URL], msnbc.com, to prove that the cost won't be too high. 2) Generate output that allows us to look up any full URL without parameters and retrieve the list of keywords for that page. The reverse should also be possible: search for a keyword with a threshold and list all URLs where that keyword occurs more times than the specified threshold. The ability to perform these searches is included in the project. We do not need a UI to perform the searches; a command line or AWS API call is acceptable, as long as the output of each search can be saved in S3 buckets.
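To make the two lookups in part 2 concrete, here is a minimal in-memory sketch in Python. It is not the requested EMR job and does not read Common Crawl data; it only illustrates one possible shape for the output. All function names are hypothetical, the tokenizer (lowercase runs of letters and digits) is an assumption, and "URL without parameters" is interpreted here as the URL with its query string and fragment removed.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit, urlunsplit

# Assumption: a "keyword" is any lowercase run of ASCII letters or digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def normalize_url(url):
    """Drop the query string and fragment (the 'parameters') from a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


def count_keywords(text):
    """Count keyword occurrences in one page's extracted text."""
    return Counter(WORD_RE.findall(text.lower()))


def build_indexes(pages):
    """Build both lookup structures from (url, text) pairs.

    forward:  normalized URL -> Counter of keyword counts
    inverted: keyword -> {normalized URL: count}
    """
    forward = defaultdict(Counter)
    for url, text in pages:
        # Pages that differ only in parameters are merged under one key.
        forward[normalize_url(url)].update(count_keywords(text))

    inverted = defaultdict(dict)
    for url, counts in forward.items():
        for word, n in counts.items():
            inverted[word][url] = n
    return forward, inverted


def keywords_for_url(forward, url):
    """Forward lookup: keyword counts for a URL, ignoring its parameters."""
    return dict(forward.get(normalize_url(url), {}))


def urls_above_threshold(inverted, keyword, threshold):
    """Reverse lookup: URLs where keyword occurs MORE than threshold times."""
    hits = inverted.get(keyword.lower(), {})
    return sorted(url for url, n in hits.items() if n > threshold)
```

In a real run, the (url, text) pairs would come from the Common Crawl extracted-text records for the selected domains, and the two indexes would be written to S3 rather than held in memory; those steps are outside this sketch.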
Project ID: 11689728

About the project

2 proposals
Remote project
Active 8 yrs ago

2 freelancers are bidding on average $417 USD for this job
Hey there, I've got extensive experience with Amazon AWS, Google Cloud and other cloud platforms. You can confirm this by checking out my profile page, where you will see lots of AWS-related projects. I'm well versed in Linux system administration and am in the top 5% of Linux experts here. I've worked with almost all of the Amazon AWS services, including Lambda, EMR, DynamoDB, CodeDeploy, Elastic Beanstalk, and Elastic Load Balancer (ELB), using ELB with scaling groups to provide high availability by launching new instances whenever certain metrics are met. Furthermore, I've done deployments with the LAMP/Ruby stack as well as Docker. I've also used Puppet/Chef and Git for deployments. So I'm well suited for this kind of job. Please feel free to ask if you have any questions. Thanks
$500 USD in 10 days
4.8 (76 reviews)
6.0
Looking forward to discussing further details about the project and delivering it to meet your needs.
$333 USD in 6 days
4.4 (15 reviews)
4.5

About the client

San Diego, United States
0.0
0
Payment method verified
Member since Aug 3, 2011

Client Verification

Other jobs from this client

Pokki Music
$250-750 USD
Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)