I need a web scraper written for the following URL:
[login to view URL]
Click the "Swift" tab at the top center of the page. The data to be scraped is under the "Swift" tab.
All pages need to be retrieved, not just page one.
The data on this site changes and the number of pages varies, so the scraper must collect data from every available page on each run.
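As a rough illustration of the "all pages" requirement, the loop below keeps requesting pages until one comes back empty. It is only a sketch: `scrape_all_pages`, the `$fetch_page` callback, and the page cap are hypothetical names, and how pages are actually requested on the site is not specified here. Core pragmas are used so the sketch runs without CPAN modules; the deliverable itself would use Modern::Perl.

```perl
use strict;
use warnings;

# Collect rows from every page, stopping at the first page with no rows.
# $fetch_page is a hypothetical callback: given a 1-based page number it
# returns an array reference of rows (empty or undef when the page does not exist).
sub scrape_all_pages {
    my ($fetch_page, $max_pages) = @_;
    $max_pages //= 1000;    # safety cap so an unattended run cannot loop forever
    my @rows;
    for my $page (1 .. $max_pages) {
        my $page_rows = $fetch_page->($page) // [];
        last unless @$page_rows;
        push @rows, @$page_rows;
    }
    return \@rows;
}

# Stand-in fetcher with three fake pages, used only to exercise the loop.
my @fake_site = ([ 'row1', 'row2' ], [ 'row3' ], [ 'row4', 'row5' ]);
my $all = scrape_all_pages(sub { $fake_site[ $_[0] - 1 ] });
print scalar(@$all), " rows collected\n";
```

Because the page count is discovered at run time, nothing needs to change when the site grows or shrinks between runs.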
The output should be a pipe (|) delimited file with the following column mappings:
origin_city --> data located in the "Origin" column before the comma
origin_state --> the 2-letter abbreviation in the "Origin" column after the comma;
    the 5-digit zip code after the abbreviation is not needed
ship_date --> data located below the origin_city in the "Origin" column, changed to the
    YYYY-MM-DD format; the time located after the date is not needed
destination_city --> data located in the "Destination" column before the comma
destination_state --> the 2-letter abbreviation in the "Destination" column after the
    comma; the 5-digit zip code after the abbreviation is not needed
receive_date --> data located below the destination_city in the "Destination" column,
    changed to the YYYY-MM-DD format; the time located after the date is not needed
trailer_type --> data located in the "Equipment" column
load_size --> add the text "Full" to this column
weight --> data located in the "Weight" column
length --> leave blank
width --> leave blank
height --> leave blank
trip_miles --> data located in the "Miles" column
pay_rate --> leave blank
contact_phone --> 800-477-8025
contact_name --> leave blank
tarp_required --> leave blank
comment --> leave blank
load_number --> leave blank
commodity --> leave blank
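To make the city, state, and date rules above concrete, here is a minimal sketch of splitting one "Origin" or "Destination" cell. The cell layout ("City, ST 12345" on the first line, "MM/DD/YYYY HH:MM" on the second) and the helper name `parse_location_cell` are assumptions for illustration; the actual text and date format on the site may differ and would need to be checked.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "Origin" or "Destination" cell into
# (city, state, date). Assumed cell layout:
#   line 1: "City, ST 12345"
#   line 2: "MM/DD/YYYY HH:MM"
sub parse_location_cell {
    my ($cell) = @_;
    my ($place, $when) = split /\n/, $cell // '', 2;
    my ($city, $state, $date) = ('', '', '');

    # City is everything before the comma; state is the 2-letter code
    # after it. The zip code that follows is ignored.
    if (defined $place && $place =~ /^\s*([^,]+?)\s*,\s*([A-Za-z]{2})\b/) {
        ($city, $state) = ($1, uc $2);
    }

    # Reformat the date as YYYY-MM-DD and drop the time.
    if (defined $when && $when =~ m{(\d{1,2})/(\d{1,2})/(\d{4})}) {
        $date = sprintf '%04d-%02d-%02d', $3, $1, $2;
    }
    return ($city, $state, $date);
}

my ($city, $state, $date) = parse_location_cell("Phoenix, AZ 85001\n3/7/2024 08:00");
print join('|', $city, $state, $date), "\n";    # prints "Phoenix|AZ|2024-03-07"
```

Any part that cannot be parsed comes back as an empty string, which matches the rule that fields with no data are left blank.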
The first line of the output should contain all of the column headers.
Any field that contains no data should be left blank.
Please do not use words like "null" or "blank" in blank columns.
Below is a sample output of the first 5 columns using sample data:
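A sketch of how the header line and one data row could be written in the pipe-delimited format is given here. It prints all columns rather than only the first five, and the row values (Phoenix, Dallas, "Van", and so on) are invented for illustration and not taken from the site. The `format_row` helper is a hypothetical name; the column order follows the mapping list.

```perl
use strict;
use warnings;

# Column order as listed in the mapping.
my @columns = qw(
    origin_city origin_state ship_date
    destination_city destination_state receive_date
    trailer_type load_size weight length width height
    trip_miles pay_rate contact_phone contact_name
    tarp_required comment load_number commodity
);

# Build one pipe-delimited line; missing or undefined fields become
# empty strings, never "null" or "blank".
sub format_row {
    my ($row) = @_;
    return join '|', map { $row->{$_} // '' } @columns;
}

# Invented sample values, not taken from the site.
my %sample = (
    origin_city      => 'Phoenix',    origin_state      => 'AZ',
    ship_date        => '2024-03-07', destination_city  => 'Dallas',
    destination_state => 'TX',        receive_date      => '2024-03-09',
    trailer_type     => 'Van',        load_size         => 'Full',
    weight           => '42000',      trip_miles        => '1065',
    contact_phone    => '800-477-8025',
);

print join('|', @columns), "\n";    # header line first
print format_row(\%sample), "\n";
```

Columns marked "leave blank" simply have no key in the row hash, so they appear as two adjacent pipes in the output.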
The deliverable will be a Perl .pl file that must run on
Ubuntu Linux and must use Modern::Perl. The Perl .pl file
should be called '[login to view URL]' and the output file should be
called '[login to view URL]'
It will be scheduled in cron to run unattended every 15 minutes.
Please specify what language/OS/modules you plan to use.
Also, please include the word "raccoon" in your bid so I know that
you read this description.