Receive new articles directly in your inbox!! Register your email address
When analysing data a key solution to represent how it is organised is to cluster the data into segments
Clustering data into segments of equal width allows for a clearer comparison among different groups highlighting aspects that might not be clear in a row dataset
In order to organise data in different segments it is necessary to follow the following steps
Identify the minimum and maximum values in the data. Determine the size of the range containing the data by subtracting the minimum value from the maximum value
Determine the bins. Decide on the number of segments in which you want to split
Allocate each value in its respective bin
This post will show how to cluster the grades of 10 different students in 4 segments: Top, Mid High, Mid Low and Bottom
The code will assume that the students are already sorted in descending order by grade
A separate post details how to sort a list using the Bubble Sort algorithm.
The below picture shows the key steps that will be performed by the code
List of student names (indicated by the letters) and grade achieved by each student. The Min and Max grades are identified (as the list is sorted in descending order the are the first and last values of the list)
The Min is subtracted from the Max. The result is the width of the range containing all the grades. The range is divided by the number of segments in which you want to cluster the data In this case we will split the range in 4 segments. The result is the width of each segment
The width of the segment is deducted from the Max value. The result represents the top value of the next segment and the bottom value of the current one. The subtraction is repeated for a total of 4 times as this is the number of buckets we want to create. To each student
The end result that will be achieved by the code is illustrated in the below picture
Next to the grade of each student it will be reported in which of the four buckets does the student belong
Summary of content
The code assumes that there is a sheet called Grades containing a list of student names in column D and their respective grades in column E
The first step is to define the variables that will be used throughout the code
Sub buckets()
'i' is the variable that will be used for the loop; "size" is the variables that will identify the last used row of column D
'MAxG' and 'MinG' will be identifying the Max and the Min grade respectively
'bucket' is the variable that will be identifying the width of the segment
'segment1' is the variable that will host the number resulting when subtracting 'bucket' from 'MaxG'
'segment2' is the variable that will host the number resulting when subtracting 'bucket' from 'segment1'
'segment3' is the variable that will host the number resulting when subtracting 'bucket' from 'segment2'
Dim i As Integer
Dim size As Integer, MaxG As Integer, MinG As Integer
Dim bucket As Integer
Dim segment1 As Integer, segment2 As Integer, segment3 As Integer
The sheet containing the data is selected
The last row of used range is identified and column F is cleared from eventual data
Sheets("Grades").Select
Columns("F:F").Select
Selection.ClearContents
size = Range("d1").End(xlDown).Row
The buckets are created. The code creates 4 buckets: Top will be flagging the highest grades, MidHigh the grades below the Top ones, MidLow the grades below MidHigh and Bottom the lowest grades
The highest grade is identified As the code assumes that the data is sorted in descending order based on grades, the maximum grade is located on first row of column E
In the same way the last grade is located on the last used row of column E
The width of the segment is calculated by dividing by for the difference between 'MaxG' and 'MinG' The result is stored in the variable 'bucket'
MaxG = Cells(1, 5).Value
MinG = Cells(size, 5).Value
bucket = (MaxG - MinG) / 4
Each segment i calculated by subtracting 'bucket' starting from 'MaxG' and then from the results obtained
segment1 = MaxG - bucket
segment2 = segment1 - bucket
segment3 = segment2 - bucket
The code loops through the list of grades inserting in column F the segment to which the grade belongs
variables j and i, used to identify the rows of the variables, are set to 1
For i = 1 To size
If Cells(i, 5) > segment1 Then
Cells(i, 6).Value = "Top"
ElseIf Cells(i, 5) > segment2 Then
Cells(i, 6).Value = "Mid High"
ElseIf Cells(i, 5) > segment3 Then
Cells(i, 6).Value = "Mid Low"
Else
Cells(i, 6).Value = "Bottom"
End If
Next i
End Sub
For convenience the full code is reported below
Sub buckets()
Dim i As Integer
Dim size As Integer, MaxG As Integer, MinG As Integer
Dim bucket As Integer
Dim segment1 As Integer, segment2 As Integer, segment3 As Integer
Sheets("Grades").Select
Columns("F:F").Select
Selection.ClearContents
size = Range("d1").End(xlDown).Row
MaxG = Cells(1, 5).Value
MinG = Cells(size, 5).Value
bucket = (MaxG - MinG) / 4
segment1 = MaxG - bucket
segment2 = segment1 - bucket
segment3 = segment2 - bucket
For i = 1 To size
If Cells(i, 5) > segment1 Then
Cells(i, 6).Value = "Top"
ElseIf Cells(i, 5) > segment2 Then
Cells(i, 6).Value = "Mid High"
ElseIf Cells(i, 5) > segment3 Then
Cells(i, 6).Value = "Mid Low"
Else
Cells(i, 6).Value = "Bottom"
End If
Next i
End Sub