Receive new articles directly in your inbox!! Register your email address
How VBA can analyse data at incredible speed
Excel's primary mission includes data analysis.
Typically, data resides within objects like ranges and worksheets.
Another approach involves storing data in variables that you create. A variable serves as a designated storage location within your computer's memory, capable of holding a diverse range of data.
One notable advantage of utilizing variables is the potential speed increase in data analysis.
This post aims to demonstrate how employing VBA (Visual Basic for Applications) to interact with data stored in a variable significantly enhances processing speed compared to retaining it within a worksheet.
Upon completing the analysis, the data is returned to the worksheet.
This post will guide you through the process of importing data from worksheets to a variable.
The second part involves analyzing the data while it's stored in the variable. After completing the analysis, the data will be returned to the worksheet.
The key steps of this process include:
Define the variables
Select the data and import it into the variable
Analyze the data
Return the outcome to the worksheet
This post is the first of a series diving into the incredible power of VBA in analysing data
You can find the next post of the series at this link
Summary of content
In a workbook we have a sheet called "Data" In each row of column A there are 20.000 numbers as indicated in the picture
The task for the code is to sum the 20.000 numbers. Obviously this can be done with the sum function in Excel, but the aim is to show the improvement in speed while looping through the values stored in a variable
First step is to define the variable that will be hosting the data. The code will use the variant type for the variable
Variant variables dynamically adjust to match the data being stored, optimizing memory usage and enhancing code efficiency
The variable in which the data will be copied is called Host
Dim Host () as Variant
Additional variables are created to contain the number of the last used row in column A, the running total that will be calculated when looping through the variable and the time required to run through the variable
Dim lstRow as Long
Dim runTot as Long
Dim startTime as Long
Dim endTime as Long
Dim elapsedTime as Long
The full range containing the data needs to be identified
The data sheet is selected and then the last used line in column A is identified
Sheets("Data").select
lstRow = Cells(Rows.Count,"A").End(xlUp).Row
The range containing the data can now be clearly identified and stored in the variable Host
The value of all cells belonging to the identified range are assigned to the variable by using the equal sign.
In this way the data and its layout will be perfectly mirrored in the variable
Host = range(cells(1,1),cells(lstRow,1)).value
The third variable is set to 0
runTot = 0
The analysis part will consist in looping through the full list of numbers and calculate the running total.
This will be done using a For Loop
The timer function will be used in order to calculate the total number of seconds elapsed between the start and the end time of the loop
It is important to note that the running total is calculated when looping through the variable Host; therefore non reference is made to the worksheet cells
startTime = Timer
For i = 1 To lstRow
runTot = runTot + Host(i, 1)
Next i
endTime = Timer
elapsedTime = endTime - startTime
The variable elapsedTime reports the total seconds required by the code to loop through the data stored in the variable and calculate the running total
The results of the running total and elapsed time are written in cells B1 and C2 respectively
Range("b1").Value = elapsedTime
Range("c2").Value = runTot
The result is reported in the picture below. Total sum is 99957 and elapsed time is less than 1 millisecond
The code has been run without storing the data in the variable but ling directly through the cells.
In this case the total elapsed time has been reported in cell B2 showing that it took more than 8 millisecond
Therefore using the variables has proven to be more than 8 times faster than refering directly to the cells
For your reference the full code is reported below
Sub variables()
Dim Host () As Variant
Dim lstRow As Long
Dim runTot As Long
Sheets("Data").Select
lstRow = Cells(Rows.Count, "A").End(xlUp).Row
Host = Range(Cells(1, 1), Cells(lstRow, 1)).Value
runTot = 0
startTime = Timer
For i = 1 To lstRow
runTot = runTot + Host(i, 1)
Next i
endTime = Timer
elapsedTime = endTime - startTime
Range("b1").Value = elapsedTime
Range("b3").Value = runTot
End Sub