Cancelled

Applying text algorithm again

The aim of this project is to improve a classification and clustering method developed in a previous project, by addressing technical issues identified in that project.

1. The team is going to send the freelancer a list of irrelevant words.

2. As agreed at the end of the previous project, the same overall method should be carried out, with the same tools. As before, we are going to try three different levels of tolerance for the algorithm. We are going to try five versions of the algorithm and clustering to deal with the irrelevant-word problem:

i. considering only words which occur in less than 10% of entries to create the algorithm and clustering

ii. considering only words which occur in less than XX% of entries (another cut-off chosen after the results from iii.) to create the algorithm and clustering

iii. ignoring words from the list of irrelevant words #1 to create the algorithm and clustering

iv. ignoring words from the list of irrelevant words #2 to create the algorithm and clustering

v. a version combining the use of the best list of irrelevant words and the best frequency cut-off

(In all versions we use the tool to fix typos. For each version we test three tolerance levels.)
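As a rough illustration of versions i. to v., the sketch below filters tokenized entries either by a document-frequency cut-off, by a list of irrelevant words, or by both. It is only a sketch in plain Python and is not the tool used in the previous project: the function names, the toy entries, and the cut-off values are assumptions chosen for readability (the brief's 10% cut-off would correspond to `max_df=0.10` on the real data).

```python
from collections import Counter

def document_frequency(docs):
    """Return {word: fraction of entries that contain the word}."""
    counts = Counter()
    for tokens in docs:
        counts.update(set(tokens))  # count each word once per entry
    n = len(docs)
    return {word: c / n for word, c in counts.items()}

def filter_tokens(docs, max_df=None, irrelevant=None):
    """Keep words occurring in less than max_df of entries and not in the irrelevant list."""
    df = document_frequency(docs)
    irrelevant = set(irrelevant or ())
    return [
        [w for w in tokens
         if (max_df is None or df[w] < max_df) and w not in irrelevant]
        for tokens in docs
    ]

# Hypothetical toy entries (not project data).
entries = [
    "the pump is broken".split(),
    "the water pump leaks".split(),
    "no access to the clinic".split(),
    "the clinic is too far".split(),
]

by_cutoff = filter_tokens(entries, max_df=0.5)                    # versions i. / ii.
by_list = filter_tokens(entries, irrelevant={"the", "is", "to"})  # versions iii. / iv.
combined = filter_tokens(entries, max_df=0.75,
                         irrelevant={"is", "to"})                 # version v.
```

Each filtered set of entries would then be passed to the same classification and clustering steps, once per tolerance level.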

3. In each case, compute the silhouette score on both the predicted codes and the clusters.
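For reference, a minimal plain-Python sketch of the silhouette score is given below, so that the same function can be applied once with cluster labels and once with predicted codes. Euclidean distance and the toy points are assumptions; the project may use a different distance or a library implementation.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette_score(points, labels, dist=euclidean):
    """Mean silhouette over all points; singleton groups contribute 0."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    if len(groups) < 2:
        raise ValueError("the silhouette score needs at least two groups")
    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # usual convention: s(i) = 0 for a singleton group
        # a: mean distance to the other members of the same group
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other group
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in groups.items() if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Hypothetical 2-D feature vectors for four entries.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
cluster_ids = [0, 0, 1, 1]              # labels from the clustering
predicted_codes = ["A", "B", "A", "B"]  # labels from the classifier

score_clusters = silhouette_score(points, cluster_ids)
score_codes = silhouette_score(points, predicted_codes)
```

In this toy case the cluster labels separate the points well (score close to 1), while the predicted codes cut across the two groups (negative score).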

4. The team will assess the results using the list of irrelevant words #1 and, if necessary, modify the list so that the algorithm and clustering can be re-run (version iv., using the list of irrelevant words #2).

5. Once the algorithm and clustering are finalized: assign a predicted code to each cluster by comparing the "mean cluster sentence" with all code descriptions (from the initial learning dataset + additional codebook) and choosing the best-matching code description.
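One possible reading of step 5 is sketched below. It assumes that the "mean cluster sentence" is the average bag-of-words vector of the entries in a cluster and that "best matching" means highest cosine similarity. Both are assumptions rather than the method agreed in the previous project, and the cluster texts and code labels (`W01`, `H02`) are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words counts for one text."""
    return Counter(text.lower().split())

def mean_vector(texts):
    """Average bag-of-words vector of a cluster (assumed 'mean cluster sentence')."""
    total = Counter()
    for t in texts:
        total.update(bow(t))
    return {w: c / len(texts) for w, c in total.items()}

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def assign_codes(clusters, code_descriptions):
    """Map each cluster id to the code whose description is most similar."""
    assigned = {}
    for cid, texts in clusters.items():
        centroid = mean_vector(texts)
        assigned[cid] = max(
            code_descriptions,
            key=lambda code: cosine(centroid, bow(code_descriptions[code])),
        )
    return assigned

# Hypothetical clusters and codebook entries.
clusters = {
    0: ["water pump broken", "pump leaks water"],
    1: ["clinic too far", "no access clinic"],
}
code_descriptions = {
    "W01": "water supply and pump problems",
    "H02": "access to health clinic",
}

assigned = assign_codes(clusters, code_descriptions)
```

The same word filtering used for the clustering would normally be applied to the code descriptions before the comparison, so that irrelevant words do not drive the match.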

6. The project ends when the algorithm and the clustering perform in a satisfactory way.

The team will then receive from the freelancer the codes/tool allowing them to re-run the exact same algorithm and clustering in the future and adjust them if necessary.

Skills: Python, R Programming Language


About the Employer:
( 4 reviews ) New York, United States

Project ID: #12573990

17 freelancers are bidding on average $2114 for this job

lkhelladi

Hello, I'd be glad to continue implementing the text analysis tool for you . Looking forward to chat with you soon for more details. Best regards,

$1500 USD in 10 days
(40 Reviews)
5.4
tudiptechnology

Hi, Let me keep this really short as i am sure you would be swamped with proposals :)! We have been developing/maintaining various web applications in Python Django. Mostly these applications are hosted on clouds ...

$2500 USD in 20 days
(2 Reviews)
4.8
$1500 USD in 10 days
(30 Reviews)
4.8
szymszteinsl

Ready to work!

$1500 USD in 14 days
(5 Reviews)
4.1
winnow1

I have the following experience in Text Mining: Term Frequency (TF) & Inverse Document Frequency (IDF) calculations Text classification and applying others ML algorithms (clustering, association etc.) Feature extrac ...

$3000 USD in 30 days
(1 Review)
3.7
yamaf555

Hello sir, Hope you doing well, i read your project description so pls come technical discussion then we understand negotiate cost, timeline then we move proceed further , also i show my past work when we discuss.. tha ...

$1666 USD in 20 days
(5 Reviews)
3.7
$2000 USD in 20 days
(12 Reviews)
3.7
abhijitbuet

Hi, I'm Abhijit Mondal from Bangladesh and my background is in Computer Science and Engineering at Bangladesh University of Engineering and Technology. I have done my major in artificial intelligence and completed ne ...

$1500 USD in 30 days
(7 Reviews)
3.2
aki003iitr

4 years of experience in data science. Data science and analytics professional with excellent coding skills in R and Python. - Proficient in R, Python, SQL, Matlab; Hands on experience with VBA & Tableau - Statisti ...

$1555 USD in 30 days
(4 Reviews)
3.7
$1578 USD in 15 days
(2 Reviews)
3.3
$1777 USD in 30 days
(2 Reviews)
3.2
suyashdhoot

Hi I am a very experienced statistician and analyst. I have completed several PhD level thesis projects involving advanced statistical analysis of data. I have worked with data from several companies and have done proj ...

$3000 USD in 30 days
(3 Reviews)
2.8
AleenaIlyas

Hello, I have read what you exactly need, however I would like to ask you a few questions. I do work smart and do not rest until I get the job done. Please feel free to ping me anytime so we can have a detailed discus ...

$2500 USD in 30 days
(1 Review)
2.4
$1555 USD in 30 days
(6 Reviews)
2.4
denjohx

Hi my name is Dennis, I am partner I represent [url removed, login to view], We have a multi-disciplinary personnel ready to work in any project related software development. Among our recent projects we can mention: - http ...

$2500 USD in 10 days
(1 Review)
3.5
expertdevteam

Hello Sir, We are an Indian development company here. we have checked your posted details here and want more clarification in it, so message us to discuss on it more then we will able to move on it. Thanks

$1518 USD in 30 days
(0 Reviews)
0.0
prashushinde9

Hello, We have accomplished 90% of the project which is similar of your requirement. All we need 10% customization as per your requirement set and specifications. I want to discuss in personal chat in order to explore ...

$3092 USD in 50 days
(0 Reviews)
0.0
urgentprogrammer

Hello, First, may you please provide the previous code? It will help me understand what you need. Also, a lot of what you requested already exists in speech recognition libraries. Has anyone attempted a solution ...

$2500 USD in 30 days
(0 Reviews)
0.0
chyconsl

Hi, I have read and understood the project outline and will gladly offer an outstanding service. Please give me a chance. A trial will convince you. Looking forward to work with you.

$1666 USD in 30 days
(0 Reviews)
0.0
dinovoloder

Hi, Dino here, I would be interested in discussing this project with you. Thanks for the consideration, I hope to hear from you [url removed, login to view] check my profile. [url removed, login to view] [url removed, login to view] ...

$2500 USD in 30 days
(0 Reviews)
0.0