Partner links

R statistical language logo

K Means Clustering is an unsupervised learning algorithm that tries to cluster data based on their similarity.

Unsupervised learning means that there is no outcome to be predicted, and the algorithm just tries to find patterns in the data. In k means clustering, we have the specify the number of clusters we want the data to be grouped into. The algorithm randomly assigns each observation to a cluster, and finds the centroid of each cluster.

Then, the algorithm iterates through two steps:

  • Reassign data points to the cluster whose centroid is closest
  • Calculate new centroid of each cluster

These two steps are repeated till the within-cluster variation cannot be reduced any further. The within-cluster variation is calculated as the sum of the euclidean distance between the data points and their respective cluster centroids.

Exploring the data
The iris dataset contains data about sepal length, sepal width, petal length, and petal width of flowers of different species. Let us see what it looks like: Continue reading

R statistical language logo

Share:

Share on facebook
Facebook
Share on twitter
Twitter
Share on pinterest
Pinterest
Share on linkedin
LinkedIn

Partner links

Newsletter: Subscribe for updates

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments

Get the latest

On social media

Security distros

Hacker
Linux distros for hacking and pentesting

Crypto mining OS

Bitcoin
Distros for mining bitcoin and other cryptocurrencies

Crypto hardware

MSI GeForce GTX 1070
Installing Nvidia GTX 1070 GPU drivers on Ubuntu

Disk guide

LVM
Beginner's guide to disks & disk partitions in Linux

Bash guide

Bash shell terminal
How to set the PATH variable in Bash
Categories
Archives
0
Hya, what do you think? Please comment.x
()
x