What is K-means?
The "K" in K-means clustering is the number of clusters the user is interested in. In other words, the user sets how many clusters the algorithm should produce.
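To make the role of K concrete, here is a minimal sketch of the standard K-means iteration (often called Lloyd's algorithm) in plain Python. It is not what RapidMiner runs internally. The function name `kmeans`, the choice to seed the centroids with the first k points, and the sample points are all assumptions made for illustration.

```python
import math

def kmeans(points, k, max_iter=100):
    """Group `points` (equal-length numeric tuples) into k clusters."""
    # Assumption: use the first k points as starting centroids so the
    # result is deterministic. Real tools usually pick random starts.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Stop once neither the labels nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids

# Made-up points forming two obvious groups; K is set to 2 by the user.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Changing `k` changes how many groups come back, which is the only choice the user has to make up front.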
What data are we going to use?
We're going to use a made-up data set that lists the applicants and their attributes. The attribute names are all self-explanatory. It has 546 rows and 4 columns.
Goal
We would like to cluster applicants using their physical and mental attributes in order to make it easier to pick the right people.
TL;DR
RapidMiner Process
- Look for the Read CSV operator in the Operator panel
- Drag it to the Process panel
- Connect the operator's Out to Res
- Click on Import Configuration Wizard in the Parameters panel
- Look for our example file, click it, and then click Next
- In the Data Wizard window, make sure you pick Comma in the Column Separation section, since we are using a comma-separated values (.CSV) file, then click Next twice
- Make sure the data types are all correct and then click Finish
- Look for the Select Attributes operator and drag it to the Process panel
- In the Parameters panel, click on Attribute Filter type dropdown menu and pick Single so that we can isolate only one attribute in the data set
- In the Attribute dropdown menu, pick Training Course to tell RM that it's the attribute we want to isolate
- Click on the Invert Selection checkbox to indicate that we want to exclude Training Course from the actual calculation
- Next, look for the Set Role operator and drag it to the right of Select Attributes
- Connect the operators via their Exa nodes
- In the Parameters panel, click on the Attribute Name dropdown menu and pick Applicant
- Click the Target Role dropdown menu and pick ID. This will make the Applicant variable an identifier of all our observations so we do not just get anonymous results
- Look for the Normalize operator and drag it to the right of the Set Role operator. Connect them via their Exa nodes. We do not need to modify any of the parameters for now. Normalization is used so that no particular attribute will over-influence the clustering
- Look for the K-means operator and drag it to the right of the Normalize operator. Connect the two via their Exa ports. Connect the K-means operator's Clu port to the Process panel's Res port to the right
- Press F11 on your keyboard to Run Process. Once it is done, RapidMiner will automatically switch to Results View
- You can look at the Cluster Model Description to find out the number of observations per cluster
- You can check the names of the people per cluster in the Folder View. This is the reason why we designated the ID role to Applicant using the Set Role operator earlier
- You can also check the means of the centroids per cluster in the Centroid Table
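For readers who prefer code, the data-preparation part of the process above (Read CSV, Select Attributes with Invert Selection, Set Role, Normalize) can be sketched in plain Python roughly as follows. This is an illustration, not an export of the RapidMiner process. Only the column names Applicant and Training Course come from the walkthrough. The columns "Physical" and "Mental", the three sample rows, the function name `prepare`, and the use of z-score normalization with the sample standard deviation are all assumptions.

```python
import csv
import io
import statistics

# Hypothetical sample data. Only "Applicant" and "Training Course" are
# column names from the walkthrough; everything else here is made up.
SAMPLE_CSV = """Applicant,Physical,Mental,Training Course
Ann,60,7,A
Ben,80,5,B
Cat,100,9,A
"""

def prepare(csv_text, id_column="Applicant", excluded=("Training Course",)):
    """Return (ids, normalized_columns) ready for clustering."""
    # Read CSV: parse comma-separated text into one dict per row.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Set Role: keep the ID column aside so results are not anonymous.
    ids = [row[id_column] for row in rows]
    # Select Attributes + Invert Selection: drop the excluded columns.
    features = [c for c in rows[0] if c != id_column and c not in excluded]
    normalized = {}
    for name in features:
        values = [float(row[name]) for row in rows]
        mean = statistics.mean(values)
        # Assumption: z-score with the sample standard deviation.
        # A constant column falls back to 1.0 to avoid dividing by zero.
        std = statistics.stdev(values) or 1.0
        # Normalize: put every attribute on the same scale so that none
        # of them over-influences the distance calculation.
        normalized[name] = [(v - mean) / std for v in values]
    return ids, normalized

ids, normalized = prepare(SAMPLE_CSV)
print(ids)
print(normalized)
```

Before normalization the made-up "Physical" values span 40 units while "Mental" spans only 4, so distances would be driven almost entirely by "Physical". After normalization both columns are on the same scale.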