Receive new articles directly in your inbox!! Register your email address
When cleaning data, a crucial step is often to remove duplicate entries from a list, ensuring its uniqueness and accuracy.
The removal of duplicates is typically an intermediate step in a more comprehensive code aimed at preparing data for analysis.
This post will guide you through the process of eliminating duplicates from a list of items.The code will iterate through a list of numbers, adding only unique items to a new list, effectively filtering out any duplicates
A prerequisite of this code is that the starting list of numbers is already sorted.
A separate post details how to sort a list using the Bubble Sort algorithm.
The code will execute the following steps:
The original list will be assigned to a variable.
A second, empty variable will be initialized, mirroring the dimensions of the first.
The code will iterate through the first variable, comparing each element with the one preceding it. If they differ, the element will be transferred to the second variable. If they are identical, the code will proceed without copying.
Finally, the second variable, now containing all unique elements, will be transferred to the spreadsheet.
Summary of content
The code assumes that there is a sheet called Duplicate containing a list of numbers in column E
The first step is to define the variables that will be used throughout the code
Sub eliminateduplicate()
'start' is the variable that will host the original list. 'finish' is the second variable where the unique values will be copied
Dim start () As Variant, finish () as Variant
i and j are the variables that will be used to identify the rows of 'start' and 'finish' respectively
Dim i As Integer, j As Integer '
'size' is the variable that will identify the last row of both variables
Dim size As Integer
Following the definition of the variables the range of the first list is identified and placed in the start variable
the code selects the sheet containing the original list
Sheets("Duplicate").select
the last row of the used range in column E is identified
size = Range("E1").End(xlDown).Row
the range in column E containing the list goes from row 1 to row 'size'
this range is placed in the variable start
start = Range("E1:E" & size).Value
Column G is cleared of its content. In column G a range , with the same size of the one considered in column E, is then placed in the variable 'finish'
Columns("G:G").Select
Selection.ClearContents
A range with the same size as the one assigned to the variable 'start' is identified in column G using the 'size' variable. This range is then assigned to the variable 'finish'
finish = Range("G1:G" & size).Value
The first row of the variable 'start' is placed in the variable 'finish'. Being the first item of the list , it is unique
The loop starts at row 2 of 'start' and compares each item with the previous one. The item is copied to 'finish' in case it is different from the previous one
variables j and i, used to identify the rows of the variables, are set to 1
j = 1
i = 1
the first item of 'start' is placed at the first row of 'finish'. Variable j is increased by 1 to identify the next row in which the next item will be copied
finish(j, 1) = start(i, 1)
j = j + 1
The loop starts at row 2 and compares the item in 'start' with the previous one
For i = 2 To size
If start(i, 1) <> start(i - 1, 1) Then 'each item in variable start is compared with previous
finish(j, 1) = start(i, 1)'if different the current item is copied in variable finish
j = j + 1'variable j is increased by 1 to be used for next unique item
Else: End If
Next i
The variable 'finish' is copied in column G.
Variables 'start' and 'finish' are erased in order ot free up space in the memory of the PC
Range("g1:g" & size).Value = finish
Erase start
Erase finish
End sub
For convenience the full code is reported below
Sub eliminateduplicate()
Dim start() As Variant, finish() As Variant
Dim i As Integer, j As Integer
Dim size As Integer
Sheets("Duplicate").Select
size = Range("E1").End(xlDown).Row
start = Range("E1:E" & size).Value
Columns("G:G").Select
Selection.ClearContents
finish = Range("G1:G" & size).Value
j = 1
i = 1
finish(j, 1) = start(i, 1)
j = j + 1
For i = 2 To size
If start(i, 1) <> start(i - 1, 1) Then
finish(j, 1) = start(i, 1)
j = j + 1
Else: End If
Next i
Range("g1:g" & size).Value = finish
Erase start
Erase finish
End Sub