Applying text algorithm again, part1

Note: The first three payments/milestones below fall under “Applying text algorithm again, part1”, and the last three under “Applying text algorithm again, part2”.

The aim of this project is to improve a classification+clustering method developed in a previous project by addressing some technical issues that were identified.

1. The team is going to send the freelancer a list of irrelevant words.

2. As agreed at the end of the previous project, the same overall method should be carried out, with the same tools. As before, we are going to try three different levels of tolerance for the algorithm. We are going to try five versions of algorithm+clustering to deal with the irrelevant word problem:

i. considering only words which occur in less than 10% of entries to create the algorithm and clustering

ii. considering only words which occur in less than XX% of entries (another cut-off chosen after the results from i.) to create the algorithm and clustering

iii. ignoring words from the list of irrelevant words #1 to create the algorithm and clustering

iv. ignoring words from the list of irrelevant words #2 to create the algorithm and clustering

v. a version combining the use of the best list of irrelevant words and the best frequency cut-off

(In all versions we use the tool to fix typos. For each version we test three tolerance levels.)
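
The word-filtering rules in versions i. to v. could be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool from the previous project: the function names (tokenize, document_frequency, build_vocabulary), the whitespace tokenizer, and the toy entries are all hypothetical, and the typo-fixing and tolerance steps are left out.

```python
from collections import Counter

def tokenize(entry):
    """Lowercase and split on whitespace (a stand-in for the real preprocessing)."""
    return entry.lower().split()

def document_frequency(entries):
    """Fraction of entries in which each word occurs at least once."""
    counts = Counter()
    for entry in entries:
        counts.update(set(tokenize(entry)))  # count each word once per entry
    n = len(entries)
    return {word: c / n for word, c in counts.items()}

def build_vocabulary(entries, max_df=None, irrelevant_words=()):
    """Keep words occurring in less than max_df of entries and not in the irrelevant list."""
    df = document_frequency(entries)
    blocked = {w.lower() for w in irrelevant_words}
    return {
        word
        for word, freq in df.items()
        if (max_df is None or freq < max_df) and word not in blocked
    }

# Hypothetical toy entries, only to show the three kinds of filtering.
entries = [
    "the pump failed",
    "the valve leaked",
    "the pump leaked oil",
    "the sensor failed",
]
vocab_cutoff = build_vocabulary(entries, max_df=0.6)              # like i. and ii.
vocab_list = build_vocabulary(entries, irrelevant_words=["the"])  # like iii. and iv.
vocab_both = build_vocabulary(entries, max_df=0.6, irrelevant_words=["oil"])  # like v.
```

With a 10% cut-off, max_df would be 0.10; the 0.6 used here is chosen only so that the four toy entries show a visible effect.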

3. In each case, compute the silhouette score both on the predicted codes and on the clusters.
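
As a rough illustration of step 3, the silhouette score can be computed once with the cluster labels and once with the predicted codes as the grouping. The sketch below is a plain-Python version that assumes entries are already represented as numeric vectors and that Euclidean distance is acceptable; the vectors and labels are made up, and the project's actual tools may compute this differently.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least two groups)."""
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two distinct labels")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in groups[label] if j != i]
        if not own:
            scores.append(0.0)  # usual convention: a singleton group scores 0
            continue
        # a: mean distance to the other members of the point's own group
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other group
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical entry vectors with two alternative groupings of the same entries.
vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
cluster_labels = [0, 0, 1, 1]
predicted_codes = ["A", "B", "A", "B"]

score_on_clusters = silhouette_score(vectors, cluster_labels)
score_on_codes = silhouette_score(vectors, predicted_codes)
```

In this toy case the cluster grouping matches the geometry and scores close to 1, while the deliberately mismatched codes score below 0.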

4. The team will assess the results using the list of irrelevant words #1 and, if necessary, modify the list so that the algorithm and clustering can be re-run (version iv., using the list of irrelevant words #2).

5. Once the algorithm and clustering are finalized: assign a predicted code to each cluster by comparing the "mean cluster sentence" with all code descriptions (from the initial learning dataset + additional codebook) and choosing the best-matching code description.
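
One possible reading of step 5 is sketched below. It assumes the "mean cluster sentence" is the average bag-of-words vector of the entries in a cluster and that "best matching" means highest cosine similarity; both are assumptions, as are the function names, the code labels, and the example texts.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Word counts for one text, using a simple whitespace tokenizer."""
    return Counter(text.lower().split())

def mean_vector(texts):
    """Average word-count vector of a cluster (the assumed 'mean cluster sentence')."""
    total = Counter()
    for text in texts:
        total.update(bag_of_words(text))
    return {word: count / len(texts) for word, count in total.items()}

def cosine(u, v):
    """Cosine similarity between two sparse word vectors."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def assign_codes(clusters, code_descriptions):
    """Map each cluster id to the code whose description is most similar to the cluster mean."""
    assigned = {}
    for cluster_id, texts in clusters.items():
        centroid = mean_vector(texts)
        assigned[cluster_id] = max(
            code_descriptions,
            key=lambda code: cosine(centroid, bag_of_words(code_descriptions[code])),
        )
    return assigned

# Hypothetical clusters and codebook entries.
clusters = {
    0: ["pump failed", "pump stopped working"],
    1: ["valve leaked oil", "oil leaked"],
}
code_descriptions = {
    "C01": "pump failure or pump stopped",
    "C02": "oil leak from valve",
}
predicted = assign_codes(clusters, code_descriptions)
```

If the final algorithm uses a filtered vocabulary, the same filtering would presumably be applied to both the entries and the code descriptions before comparing them.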

6. The project ends when the algorithm and the clustering perform in a satisfactory way.

The team will then receive from the freelancer the code/tools allowing them to re-run the exact same algorithm and clustering in the future and adjust them if necessary.

Payment plan:

First: $278 for 2i, including algorithm results, clustering results, silhouette score, and the predicted code assigned to each cluster

Second: same, for 2ii

Third: same, for 2iii

Fourth: same, for 2iv

Fifth: same, for 2v

Sixth: $280 for all code/tools necessary to re-run and adjust the algorithm, clustering, silhouette score calculation, and assignment of codes to clusters

Skills: Python, R Programming Language


About the Employer:
( 4 reviews ) New York, United States

Project ID: #12583580
