
Data science python - implement feature selection from large CSV

$50-140 USD

In Progress
Posted over 4 years ago

Paid on delivery
Implement feature selection for a very high-dimensional dataset. You need to implement one function in Python. The inputs to the function are:

1. string - path to a CSV file
2. integer - n_dimensions - desired number of output dimensions

The function needs to select the best n_dimensions features by **iteratively** fitting a RandomForest classifier and selecting features based on feature_importances_. See: [login to view URL]

IMPORTANT:

1. The CSV is very large: it might be >5 GB, with >1M dimensions. Therefore, your main task in this project is to find the best chunk size for reading the data from the CSV and training the RandomForest classifier, so that the function works in any circumstances, even with very large input files. For example, if the number of output dimensions is 10K and the number of input dimensions is 1M, the optimal way might be to fit 20 RandomForests, each on 50K dimensions, and select 10K from each. Then merge the 200K dimensions selected in the first iteration and fit another 4 RandomForests, each on 50K, again selecting 10K from each. In the last iteration, fit one RandomForest on the remaining 40K and select the final 10K.
2. How many iterations to run and how many RandomForests to fit could be extra parameters to the function. I also expect you to research and recommend the best parameters based on execution time and memory usage. The function needs to run well on the client side, which is normally a Windows 10 PC with 4-8 GB RAM and 2-4 cores. Reading the input into NumPy arrays should be done smartly, without loading the entire CSV into memory (maybe NumPy has the required functionality built in).
3. The function should support both parallel and serial modes. In parallel mode, the function should use N cores of the PC by fitting several RandomForests in parallel, each on a separate core. In serial mode, the function should fit the RandomForests one by one.
4. The function should run in a Python 3.5 environment on Windows. You must test it on Windows!
5. The function should print its progress to the console.
6. I will send an example CSV to coders with good experience.
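The round arithmetic in the 1M-to-10K example can be checked with a small planning helper. This is only a sketch under stated assumptions: the name `plan_rounds` and the `group_size` parameter (features per forest) are hypothetical, not part of the posting, and the requirement that `group_size` be at least twice `n_dimensions` is an added assumption that guarantees each round shrinks the feature set. It uses `str.format` rather than f-strings so it stays compatible with Python 3.5.

```python
def plan_rounds(n_features, n_dimensions, group_size):
    """Plan the selection rounds (hypothetical helper, not part of the posting).

    Returns a list of (features_in, n_forests, features_kept) tuples,
    one tuple per round.
    """
    if n_dimensions < 1:
        raise ValueError("n_dimensions must be at least 1")
    if group_size < 2 * n_dimensions:
        # Assumption: keeping at most half of each full group guarantees
        # that every round shrinks the feature set, so the loop terminates.
        raise ValueError("group_size must be at least 2 * n_dimensions")

    rounds = []
    remaining = n_features
    while remaining > n_dimensions:
        full_groups, leftover = divmod(remaining, group_size)
        n_forests = full_groups + (1 if leftover else 0)
        # Each full group keeps n_dimensions features; a smaller final
        # group cannot keep more features than it contains.
        kept = full_groups * n_dimensions + min(leftover, n_dimensions)
        rounds.append((remaining, n_forests, kept))
        remaining = kept
    return rounds


# The example from the posting: 1M inputs, 10K outputs, 50K features per forest.
schedule = plan_rounds(1000000, 10000, 50000)
for i, (n_in, n_forests, n_kept) in enumerate(schedule, 1):
    print("round {}: {} features -> {} forest(s) -> keep {}".format(
        i, n_in, n_forests, n_kept))
# round 1: 1000000 features -> 20 forest(s) -> keep 200000
# round 2: 200000 features -> 4 forest(s) -> keep 40000
# round 3: 40000 features -> 1 forest(s) -> keep 10000
```

Within a round the forests are independent of each other, so the same schedule would serve both modes: fit them one by one in serial mode, or hand one group to each core in parallel mode. The right `group_size` still has to be found by measuring memory use and run time on the target machine.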
Project ID: 23258052

About the project

17 proposals
Remote project
Active 4 yrs ago

17 freelancers are bidding on average $109 USD for this job
I am a Python data science expert with experience in classification and partitioning, neural networks, and association rules. I am also an Oracle Certified Professional (OCP) with experience in Oracle, MySQL, SQL Server, and MongoDB. I can help you with your requirements. Please initiate an online chat so we can proceed with the discussion.
$140 USD in 7 days
4.9 (48 reviews)
6.0
Hello there, I have read through your project description and can help you complete this project. I look forward to hearing from you. Please contact me by PM for details.
$150 USD in 7 days
4.6 (53 reviews)
6.1
Hello, I have an MSc in mathematics (specifically probability and statistics). I have extensive experience in ACM/ICPC and a deep understanding of algorithms. I attended ACM regional contests several times and won medals there. I also won medals in the IMO (International Mathematical Olympiad), and I have experience in online algorithm contests such as CodeChef and HackerRank. Consultation is also welcome. I have extensive experience with mathematical and algorithmic problems and can help you get insights into the data you described. Skills: Master of Mathematics; algorithms (experience in ACM/ICPC); artificial intelligence (AI); machine learning (neural networks); deep learning; C++/C#/MATLAB/R/Python/Java. I will be more than excited to provide you with a quality solution and earn your respect, confidence, and trust.
$50 USD in 1 day
4.6 (20 reviews)
5.3
*** Feature Selection using PCA or SVC in Python *** Thank you for your attention. I read the project description with interest. I strongly believe I am the right candidate. I meet all the requirements you mentioned, including Python, and I can work for you full time. Please check my profile and past reviews. Let's move forward and get outstanding results for you. All the best. From HongYue Jin.
$100 USD in 7 days
5.0 (15 reviews)
4.6
I am a machine learning expert. Languages: C++, C#, Python, Qt, MATLAB, Java. Skills: machine learning, deep learning, image processing (OpenCV, OpenGL...), video codec processing (H.264/H.265, MPEG-4, YUY2...), databases (MySQL, Access, Excel, MSSQL...), project reversing, multi-threading, system management.
$100 USD in 7 days
5.0 (3 reviews)
4.1
Dear sir, I read your project description very carefully. I have extensive experience in machine learning and Python, so your project is very interesting to me. I'm confident I can handle your project and am very eager to join it. If you give me a chance, I'll do my best to provide a wonderful result. I believe this will be a good starting point for a business relationship between us. Looking forward to your response. Thank you.
$140 USD in 7 days
5.0 (3 reviews)
2.9
Hello, I am very happy to bid on your project. I have read your proposal and checked the attached files, and I am very interested in your project. I have several years of experience in machine learning, Python, and C#. I am sure I can complete this project on time with good results. I am always available, so please feel free to contact me anytime. Thanks. Best regards.
$100 USD in 3 days
5.0 (1 review)
2.6
Hi, I am a data scientist and Python expert with 5 years of experience. Please message me to discuss.
$120 USD in 5 days
5.0 (1 review)
0.5
Hello, I have reviewed your project requirements and would welcome the opportunity to assist you. I have strong expertise and can complete this project within the agreed time frame, as it suits my skills. I provide quality work and support, and I assure you of the same in the future. Please start a chat for further discussion, and I will show you my completed projects and relevant experience. If I am offline, please leave a message with your available times and I will reach you ASAP. Thank you.
$95 USD in 2 days
0.0 (0 reviews)
0.0
Greetings, I am an expert data scientist and have been programming in Python for more than two years. I assure you that I can do your project and deliver a high-quality outcome. I went through your description, and this is something I can manage for you. Hope to hear from you soon. Navid.
$95 USD in 3 days
5.0 (1 review)
0.1
I have worked in feature engineering: for example, how to select the best features for a model, how to handle missing values, and how to handle categorical values.
$95 USD in 10 days
0.0 (0 reviews)
0.0
I am very interested in data science and artificial intelligence, especially as they relate to Microsoft Excel and big data. I have been in this field for 7 years. I enjoy working in organizations and am currently in the field of human resources.
$166 USD in 3 days
0.0 (0 reviews)
0.0
Hi there, I can implement the function in Python. Please share the example CSV. I'm a full-stack developer with extensive knowledge of Python and 7+ years of experience. Here's a sample of backend/DevOps tasks I can help you with:

- Design and implement REST APIs in Flask/Postgres, documented with Swagger, with GraphQL support (using graphene)
- Dockerise applications and deploy them to AWS ECS, setting up load balancing and auto scaling
- Move infrastructure to code using Terraform
- Integrate apps with AWS EC2, RDS, S3, ElastiCache ...
- Build CI/CD pipelines for testing and automation, recommending best practices like semantic versioning and changelog automation
- Implement monitoring/logging for Datadog and ElastiCache, including custom instrumentation for APM
- Automate existing workflows in Python

Let's connect to discuss the details. Regards, Mishal
$140 USD in 7 days
0.0 (0 reviews)
0.0
Dear sir, I am a Python expert with a lot of experience processing data in Python, and I am also very familiar with ML. I've carefully read your description and I am sure I can help you. I hope we can meet and talk. Thank you.
$50 USD in 7 days
0.0 (0 reviews)
0.0

About the client

Bergisch Gladbach, Germany
5.0
271
Payment method verified
Member since Aug 27, 2004

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)