==Solutions to Homework Task 2==
Spoiler Alert! Again, accept the challenge and try on your own first! :)
Until now, we have been looking at points on a 2D plane. We know how to calculate the distance between them and how to find the centroid of a point cloud. From now on, let's call these point clouds classes. With this tool set at hand, we can already code a simple classifier. For example, we could detect whether a given point (P_new) is closer to the point cloud we call class_0 or to class_1. We find out by calculating the distances from P_new to the class centroids c_0 and c_1, as shown in the picture below.

[[File:2d_classify.png|800px]]

In this case the distance d1 is smaller than d0, so our point belongs to class 1.
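To make the decision rule concrete, here is a minimal standalone sketch of the two-class case from the picture. It assumes a hypothetical <code>Point2D</code> struct with public <code>x</code> and <code>y</code> members (the course's own <code>Point2D</code> class may look different) and plain Euclidean distance; <code>euclideanDistance</code> and <code>classifyTwoClasses</code> are illustrative names only.

<syntaxhighlight lang="c++">
#include <cassert>
#include <cmath>

// Hypothetical minimal stand-in for the course's Point2D class.
struct Point2D {
    float x;
    float y;
};

// Euclidean distance between two points on the plane.
float euclideanDistance(const Point2D &a, const Point2D &b)
{
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy);
}

// Two-class decision from the picture: compare d0 = |P_new - c_0| with
// d1 = |P_new - c_1| and pick the class whose centroid is closer.
// Ties are (arbitrarily) assigned to class 0.
int classifyTwoClasses(const Point2D &p_new, const Point2D &c_0, const Point2D &c_1)
{
    // identical centroids would make the decision meaningless
    assert(euclideanDistance(c_0, c_1) > 0.0f && "centroids should be distinct");
    float d0 = euclideanDistance(p_new, c_0);
    float d1 = euclideanDistance(p_new, c_1);
    return (d1 < d0) ? 1 : 0;
}
</syntaxhighlight>

For example, with c_0 = (0, 0) and c_1 = (3, 3), the point (2.5, 2.5) lies about 3.54 away from c_0 but only about 0.71 away from c_1, so it is assigned to class 1.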
===Why care about points on an X-Y plane?!===
Imagine the points are not just drawings on a piece of paper, but actual measurements of real-world objects. So instead of giving the axes in the figure arbitrary names like 'X' and 'Y', we can give them meaningful measures of a short sound recording: 'bass' and 'treble'.
</syntaxhighlight>

Our classifier is written in a class called "KMeans" because it most closely resembles a k-means classifier. (Strictly speaking, it is a nearest-centroid classifier: k-means is an unsupervised clustering method that iteratively re-estimates its centroids, whereas we compute each centroid once from labeled points.)

So inside "KMeans.cpp" you will find the guts of our classifier. When the KMeans classifier is instantiated (i.e. its constructor is called), we feed it the points and the class labels, and it immediately calculates the centroid of each class.
<syntaxhighlight lang="c++">
KMeans::KMeans(vector<Point2D> &points, vector<int> &labels, int n_classes)
{
    _n_classes = n_classes;
    for(int c = 0; c < n_classes; c++)   // traverse once for every class
    {
        _centroids.push_back(Point2D(0,0));
        int numPoints = 0;
        for(size_t i = 0; i < points.size(); i++)
        {
            if(labels[i] == c)   // see if point in points belongs to class c
            {
                _centroids[c] = _centroids[c] + points[i];
                numPoints = numPoints + 1;
            }
        }
        // all points in "points" that belong to class c are added up;
        // now divide by the number of points in class c
        // (skip empty classes to avoid dividing by zero)
        if(numPoints > 0)
        {
            _centroids[c] = _centroids[c]/numPoints;
        }
    }
}
</syntaxhighlight>
To do so, it traverses the vector of points once for every class. In each pass it only looks at points of that class, sums them up, counts how many there were, and finally divides the sum by that count. We end up with one centroid per class, stored in a vector called "_centroids", which is a private member variable of our KMeans class.
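The constructor walks the point vector once per class. As a standalone sketch, the same centroids can also be computed in a single pass by keeping a running sum and count for every class and dividing at the end. This single-pass variant is not the course's code: it assumes a hypothetical plain <code>Point2D</code> struct with <code>x</code>/<code>y</code> members, and <code>computeCentroids</code> is an illustrative name.

<syntaxhighlight lang="c++">
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal stand-in for the course's Point2D class.
struct Point2D {
    float x;
    float y;
};

// Single-pass centroid computation: accumulate per-class sums and counts
// in one traversal, then divide. Classes without any points stay at (0, 0).
std::vector<Point2D> computeCentroids(const std::vector<Point2D> &points,
                                      const std::vector<int> &labels,
                                      int n_classes)
{
    assert(n_classes > 0);
    assert(points.size() == labels.size());   // one label per point

    std::vector<Point2D> sums(n_classes, Point2D{0.0f, 0.0f});
    std::vector<int> counts(n_classes, 0);

    for (std::size_t i = 0; i < points.size(); i++) {
        int c = labels[i];
        assert(c >= 0 && c < n_classes);     // label must name a valid class
        sums[c].x += points[i].x;
        sums[c].y += points[i].y;
        counts[c]++;
    }
    for (int c = 0; c < n_classes; c++) {
        if (counts[c] > 0) {                 // avoid dividing by zero
            sums[c].x /= counts[c];
            sums[c].y /= counts[c];
        }
    }
    return sums;                             // the sums now hold the centroids
}
</syntaxhighlight>

Both approaches produce the same centroids; the single pass touches every point once instead of once per class, which only starts to matter with many classes or large point clouds. Leaving empty classes at (0, 0) is an arbitrary choice for this sketch.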
Now, when we classify a new point, the centroids are already known, so we just need to calculate the distance from the new point to every centroid and pick the closest one.
<syntaxhighlight lang="c++">
int KMeans::classify(Point2D newPoint)
{
    // start with the largest representable float as "a biiig distance"
    // (needs #include <limits> at the top of KMeans.cpp)
    float min_distance = std::numeric_limits<float>::max();
    int class_label = -1;   // and an invalid class label
    for(int c = 0; c < _n_classes; c++)
    {
        float distance = _centroids[c].getDistance(newPoint);
        if(distance < min_distance)   // closer than the best so far?
        {
            min_distance = distance;
            class_label = c;
        }
    }
    return class_label;
}
</syntaxhighlight>
The idea is to start with a very large distance and only keep a centroid as the new candidate if its distance is smaller than the smallest distance recorded so far. Once every centroid has been checked, we are left with the closest centroid and the minimum distance, and we return that centroid's class label.
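The same minimum search can be sketched as a standalone function. This sketch assumes a hypothetical <code>Point2D</code> struct and passes the centroids in explicitly instead of reading the private <code>_centroids</code> member; <code>classifyNearest</code> and <code>squaredDistance</code> are illustrative names. It also compares squared distances, a common optimization that is not part of the original code.

<syntaxhighlight lang="c++">
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical minimal stand-in for the course's Point2D class.
struct Point2D {
    float x;
    float y;
};

// Squared Euclidean distance (no square root needed for comparisons).
float squaredDistance(const Point2D &a, const Point2D &b)
{
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Nearest-centroid classification: returns the index of the closest centroid.
int classifyNearest(const std::vector<Point2D> &centroids, const Point2D &p)
{
    assert(!centroids.empty() && "need at least one centroid");
    float min_distance = std::numeric_limits<float>::max();  // "biiig" start value
    int class_label = -1;                                    // invalid until a centroid wins
    for (std::size_t c = 0; c < centroids.size(); c++) {
        float d = squaredDistance(centroids[c], p);
        if (d < min_distance) {   // strictly smaller: ties keep the earlier class
            min_distance = d;
            class_label = static_cast<int>(c);
        }
    }
    return class_label;
}
</syntaxhighlight>

Because the square root is monotonic, the centroid with the smallest squared distance is also the one with the smallest distance, so the returned label matches a version that uses the true distance.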
=Homework=
Have fun, and if you're stuck, write a mail, post a message in the forum, or, probably fastest, post in our Signal group!

[https://signal.group/#CjQKIGr9R1a7Znwe1Ca0pmbx3rvHHzDaS1c_LnmgwwVk1KoSEhB6aHUiokel3vtGVoTErtfB join our signal group!!!]

Best wishes,

Clemens