Please read the project and see the attachments before responding
1 Introduction
This project requires you to explore classication algorithms on a real world dataset, and write a
report explaining your experimental results. The language of implementation is to be C++. The other requirements are that your program be able to interpret the data format specied below, and
be able to classify instances and produce interesting statistics such as accuracy, false positive rate,
false negative rate, etc. You are free to construct whatever user interface for your program, but
you must fully document your interface.
2 Algorithm
Your algorithm should be based on the classfication algorithms: KNN, Decision Tree, SVM, Naive Bayes, Logistic Regression.
Usually a straight forward implementation of one method will not lead to satisfactory perfor-
mance. Your algorithm can be a combination of methods and should incorporate one or more
data mining techniques when the situation arises. These techniques include (and certainly
not limited to):
{ Handling imbalanced dataset
{ Proper imputation methods for missing values
{ Dierent treatment of various type of features: continuous, discrete, categorical, etc.
3 Data
You'll be examining the behavior of your model on a dataset from the UCI machine learning lab.
The dataset is represented in a standard format, consisting of 3 les. The rst le, [login to view URL],
describes the categories and features of the dataset. It also has some empirical results for your ref-
erence. The other two les are [login to view URL] and [login to view URL], containing the
actual data instances, formatted at one instance per line, as follows:
1
F1
1 ; F2
1 ; : : : ; Fk
1 ; label1
F1
2 ; F2
2 ; : : : ; Fk
2 ; label2
...
F1
n; F2
n; : : : ; Fk
n ; labeln
where Fj
i , labeli (i = 1; : : : ; n; j = 1; : : : ; k) represent the value of the jth feature and class category
for the ith instance respectively.
The data you will be examining was extracted from the census bureau database. Each instance
contains an individual's educational, demographic and family information. Prediction task is to
determine whether a person makes over 50K a year. You should use [login to view URL] to
train your classier and use [login to view URL] to evaluate the performance of your learning
algorithm.
4 Your Mission...
Deliverables for this project are:
Code to implement the classication algorithm for the data le formats given above
A README le, with simple, clear instructions on how to compile and run your
code
Testing statistics for the application of your learning algorithm. At a minimum you should
provide training set accuracy, test set accuracy
A discussion of data mining techniques employed in your algorithm
A report analyzing the behavior of your algorithm on the dataset, including any unusual or
anomalous (in your opinion) behavior
150usd for this project
Relevant Skills and Experience
I am very proficient in c and c++. I have 16 years c++ developing experience now, and have worked for more than 7 years. My work is online game developing, and mainly focus on server side.
Proposed Milestones
$150 USD - Finish this project
Let’s chat and discuss about price and work. 12 years experience in same filed you will get 100% correct work. No need to pay upfront money first check work quality and if you satisfied then pay.
Relevant Skills and Experience
12 years experience
Proposed Milestones
$50 USD - I will complete your work as per the guidelines and will deliver you according to time deadline.