The aim of this project is to improve a classification + clustering method developed in a previous project by addressing technical issues identified in that project.
1. The team is going to send the freelancer a list of irrelevant words.
2. As agreed at the end of the previous project, the same overall method will be carried out with the same tools. As before, we will try three different tolerance levels for the algorithm. We will try five versions of algorithm + clustering to deal with the irrelevant-word problem:
i. considering only words which occur in less than 10% of entries to create the algorithm and clustering
ii. considering only words which occur in less than XX% of entries (another cut-off chosen after the results from iii.) to create the algorithm and clustering
iii. ignoring words from the list of irrelevant words #1 to create the algorithm and clustering
iv. ignoring words from the list of irrelevant words #2 to create the algorithm and clustering
v. a version combining the best list of irrelevant words with the best frequency cut-off
(In all versions we use the tool to fix typos. For each version we test three tolerance levels.)
3. In each case, compute the silhouette score on both the predicted codes and the clusters.
4. The team will assess the results obtained with list of irrelevant words #1 and, if necessary, modify the list so that the algorithm and clustering can be re-run (version iv., using list of irrelevant words #2).
5. Once the algorithm and clustering are finalized, assign a predicted code to each cluster by comparing the "mean cluster sentence" with all code descriptions (from the initial learning dataset + the additional codebook) and choosing the best-matching code description.
6. The project ends when the algorithm and the clustering perform satisfactorily.
The team will then receive from the freelancer the code/tools needed to re-run the exact same algorithm and clustering in the future and to adjust them if necessary.
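The frequency cut-offs and irrelevant-word lists in step 2 amount to pruning the vocabulary before the algorithm and clustering are built. Below is a minimal sketch, assuming the entries are already tokenized and typo-corrected. The helper names `document_frequency` and `filter_entries` are hypothetical and not part of the existing tool. A word is kept only if it is not on the irrelevant-word list and, when a cut-off is given, occurs in fewer than `max_df` of the entries.

```python
from collections import Counter


def document_frequency(entries):
    """Return, for each word, the fraction of entries in which it occurs.

    Each word is counted at most once per entry.
    """
    counts = Counter()
    for tokens in entries:
        counts.update(set(tokens))
    n = len(entries)
    return {word: count / n for word, count in counts.items()}


def filter_entries(entries, max_df=None, irrelevant=None):
    """Remove irrelevant words from tokenized entries (hypothetical helper).

    max_df:     keep only words occurring in fewer than this fraction of entries
                (e.g. 0.10 for version i.); None disables the cut-off.
    irrelevant: iterable of words to ignore (list #1 or #2); None disables it.
    Setting both corresponds to version v.
    """
    df = document_frequency(entries)
    stop = set(irrelevant or ())

    def keep(word):
        if word in stop:
            return False
        if max_df is not None and df[word] >= max_df:
            return False
        return True

    return [[w for w in tokens if keep(w)] for tokens in entries]


entries = [["the", "red", "car"], ["the", "blue", "car"], ["the", "bike"]]
print(filter_entries(entries, max_df=0.5))          # [['red'], ['blue'], ['bike']]
print(filter_entries(entries, irrelevant=["the"]))  # [['red', 'car'], ['blue', 'car'], ['bike']]
```

The toy example uses a 50% cut-off only because, with three entries, every word occurs in at least a third of them; on the real data, version i. would use `max_df=0.10`.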
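Step 3 asks for the silhouette score on both the predicted codes and the clusters. In practice this is likely computed with an existing library call such as scikit-learn's `sklearn.metrics.silhouette_score(X, labels)`. The pure-Python sketch below only illustrates what the score measures. It assumes each entry has already been turned into a numeric vector (the vectorization used by the actual method is not specified here) and uses Euclidean distance. The same function is called once with the predicted codes as labels and once with the cluster ids.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance.

    labels may be predicted codes or cluster ids. Points in a singleton cluster
    get s = 0, matching scikit-learn's convention. Needs at least 2 labels.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least 2 distinct labels")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own group
        # (the point itself is in `own` but contributes a distance of 0)
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other group
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette_score(points, [0, 0, 1, 1]), 3))  # well separated: close to 1
print(round(silhouette_score(points, [0, 1, 0, 1]), 3))  # poor grouping: negative
```

Scores range from -1 to 1. Comparing the value for predicted codes with the value for clusters shows which labelling groups the entries more coherently.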
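Step 5 can be read as a nearest-neighbour match between each cluster's centre and the code descriptions. Below is a minimal sketch under several assumptions that the brief does not state: entries and code descriptions are embedded as vectors by the same (unspecified) method, the "mean cluster sentence" is the component-wise mean of the cluster's entry vectors, and cosine similarity is the matching criterion. `assign_codes` and the other helper names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors (the "mean cluster sentence")."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_codes(cluster_vectors, code_vectors):
    """Map each cluster id to the code whose description is closest to the cluster mean.

    cluster_vectors: {cluster_id: [vector of each entry in that cluster]}
    code_vectors:    {code: vector of its description}, covering both the
                     initial learning dataset and the additional codebook
    """
    assignment = {}
    for cluster_id, vectors in cluster_vectors.items():
        centroid = mean_vector(vectors)
        assignment[cluster_id] = max(
            code_vectors, key=lambda code: cosine(centroid, code_vectors[code])
        )
    return assignment


clusters = {"c1": [[1.0, 0.1], [0.9, 0.0]], "c2": [[0.0, 1.0], [0.1, 0.8]]}
codes = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
print(assign_codes(clusters, codes))  # {'c1': 'A', 'c2': 'B'}
```

Cosine similarity is used here because it ignores vector length, so long and short descriptions are compared on direction only. Another similarity measure could be substituted in `cosine` without changing the rest of the sketch.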