RapidMiner Tutorial: K-means Clustering

What is K-means?

An unsupervised learning technique that groups similar observations in a data set into clusters.

The "K" in K-means clustering is the number of clusters the algorithm produces. The user sets this number before running the algorithm.
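
To make the role of K concrete, here is a minimal sketch of the standard K-means loop (Lloyd's algorithm) in plain Python. It is not RapidMiner's implementation: the toy points are invented, and seeding the centroids with the first k points is an assumption made only to keep the run deterministic.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster equal-length numeric tuples into k groups."""
    # Assumption: seed centroids with the first k points for determinism;
    # real tools usually pick random or k-means++ starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return labels, centroids

# Invented 2-D points forming two obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Changing `k` changes how many groups come back, which is exactly the choice the user makes in the tool.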

What data are we going to use?

We're going to use a made-up data set that lists applicants and their attributes. The attribute names are all self-explanatory. It has 546 rows and 4 columns.

Goal

We would like to cluster applicants using their physical and mental attributes in order to make it easier to pick the right people.


RapidMiner Process

  1. Look for the Read CSV operator in the Operator panel
  2. Drag it to the Process panel 
  3. Connect the operator's Out to Res
  4. Click on Import Configuration Wizard in the Parameters panel
  5. Look for our example file, click it, and then click Next
  6. In the Data Wizard window, pick Comma in the Column Separation section, since we are using a comma-separated values (CSV) file, and then click Next twice
  7. Make sure the data types are all correct and then click Finish
  8. Look for the Select Attributes operator
  9. Drag it to the Process panel and connect the Read CSV operator's Out port to its Exa port
  10. In the Parameters panel, click on Attribute Filter type dropdown menu and pick Single so that we can isolate only one attribute in the data set
  11. In the Attribute dropdown menu, pick Training Course to tell RapidMiner that it's the attribute we want to isolate
  12. Click on the Invert Selection checkbox to indicate that we want to exclude Training Course from the clustering and keep all the other attributes
  13. Next, look for the Set Role operator and drag it to the right of Select Attributes
  14. Connect the operators via their Exa nodes
  15. In the Parameters panel, click on the Attribute Name dropdown menu and pick Applicant
  16. Click the Target Role dropdown menu and pick ID. This makes the Applicant attribute the identifier of each observation, so the results are labeled by name instead of being anonymous
  17. Look for the Normalize operator and drag it to the right of the Set Role operator. Connect them via their Exa nodes. We do not need to modify any of the parameters for now. Normalization puts the attributes on a comparable scale, so that an attribute with a large numeric range does not dominate the distance calculation and over-influence the clustering
  18. Look for the K-means operator and drag it to the right of the Normalize operator. Connect the two via their Exa ports. Connect the K-means operator's Clu port to the Process panel's Res port to the right
  19. Press F11 on your keyboard to Run Process. Once it is done, RapidMiner will automatically switch to Results View
  20. You can look at the Cluster Model Description to find out the number of observations per cluster
  21. You can check the names of the people in each cluster in the Folder View. This is why we assigned the ID role to Applicant using the Set Role operator earlier
  22. You can also check each cluster's centroid, that is, the mean of every attribute within the cluster, in the Centroid Table
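
For readers who want to see what the preprocessing in steps 8 to 17 does to the data, here is a rough Python sketch of the same idea. It is not what RapidMiner runs internally. The sample rows and the `Strength` and `Aptitude` column names are hypothetical (only `Applicant` and `Training Course` come from the tutorial), and the z-score transformation with a sample standard deviation is an assumption, so check the Normalize operator's method parameter for what your version actually applies.

```python
import csv
import io
import statistics

# Hypothetical sample; the real file has 546 rows and its own column names.
RAW = """Applicant,Strength,Aptitude,Training Course
Ana,50,7,A
Ben,70,5,B
Cy,90,9,A
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Select Attributes (single attribute, inverted): drop "Training Course".
# Set Role: "Applicant" becomes the ID, so it labels rows but is not clustered on.
features = [name for name in rows[0] if name not in ("Applicant", "Training Course")]
ids = [row["Applicant"] for row in rows]
table = [[float(row[name]) for name in features] for row in rows]

def z_transform(table):
    """Rescale every column to mean 0 and standard deviation 1."""
    columns = list(zip(*table))
    means = [statistics.mean(col) for col in columns]
    # Assumption: sample standard deviation; a constant column (stdev 0)
    # would need special handling that this sketch leaves out.
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in table
    ]

# ID -> normalized feature vector, ready for a clustering step.
normalized = dict(zip(ids, z_transform(table)))
for applicant, vector in normalized.items():
    print(applicant, vector)
```

Before normalization the hypothetical `Strength` values differ by tens while `Aptitude` differs by single digits, so distances would be driven almost entirely by `Strength`. After the transformation both columns contribute on the same scale.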
