Can someone suggest how I can improve the performance of my nested For Each loops? This logic accepts a CSV-style string parameter and converts it into a data table. We add some calculated columns so we can update specific fields later in the process. Just from running the application in Visual Studio (on a VM, which may contribute to some of the lag), I've noticed that my logic for updating the TotalHours and PTOHours fields runs very slowly whenever a large amount of data is parsed into this data table.
After these columns are updated, we perform more calculations, update the columns as needed, then pass the data table back to be converted into a CSV text file with the updated fields.
I can’t help but wonder if there’s a better approach to updating my data than nested loops. One thought I had was to create a separate data table for the manipulation and work only with the records we care about, based on my Select filter clauses. But I’m not sure how to communicate the changes back to my original data table so it can be parsed correctly into the import file.
    Dim dtTime As DataTable = MyClass.dtFromCSV(TimeCSV)
    Dim ProrateRows = dtTime.Select("PayID = 'SALARY' and PayType = 'Salary' and [Time Off Name] = ''")
    If ProrateRows.Count > 0 Then
        Dim TotalHours As New DataColumn("TotalHours", GetType(Decimal)) With {.DefaultValue = 0.00}
        Dim PTOHours As New DataColumn("PTOHours", GetType(Decimal)) With {.DefaultValue = 0.00}
        Dim PayWeek As New DataColumn("PayWeek", GetType(String))
        Dim RealDate As New DataColumn("RealDate", GetType(Date))
        dtTime.Columns.Add(TotalHours) ' total hours per employee for this pay period
        dtTime.Columns.Add(PTOHours) ' total PTO hours per employee for this pay period
        dtTime.Columns.Add(RealDate) ' date field for data comparisons
        dtTime.Columns.Add(PayWeek) ' the beginning week of this pay period
        ' weekly or biweekly pay period?
        Dim PayPeriodStart As Date = Convert.ToDateTime(StartPayDate)
        Dim PayPeriodEnd As Date = Convert.ToDateTime(EndPayDate)
        Dim PayPeriodSpan As Long = DateDiff("w", StartPayDate, EndPayDate) ' count number of weeks between pay period
        Dim PayWeekDate As Date = DateAdd("d", 6, PayPeriodStart).ToString() ' calculate the start of the 1st pay week, if biweekly
        Dim PayWeek2Date As Date = DateAdd("d", 7, PayWeekDate).ToString()
        ' assign record dates & pay period weeks; used for grouping hours distinctly
        For Each timeRow As DataRow In ProrateRows
            timeRow.Item("RealDate") = Date.Parse(timeRow.Item("Date"))
            timeRow.Item("PayWeek") = IIf(timeRow.Item("RealDate") <= PayWeekDate, PayWeekDate.ToString("MM/dd/yyyy"), PayWeek2Date.ToString("MM/dd/yyyy"))
        Next
        ' group all salaried employee hours for this pay period; if bi-weekly, group by pay week & username
        Dim HoursGrouped = (From row In dtTime.Select("PayID = 'SALARY' and PayType = 'Salary' and [Time Off Name] = ''")
            Group By groupTime = New With
            {
                Key .Username = row.Field(Of String)("Username"),
                Key .PayWeek = row.Field(Of String)("PayWeek")
            } Into Group
            Select New With
            {
                .Grp = groupTime,
                .SumUnits = Group.Sum(Function(x) Decimal.Parse(x.Field(Of String)("Units")))
            }).OrderBy(Function(z) z.Grp.Username)
        ' PTO hours; need to subtract these from salary hours
        Dim PTOHoursGrouped = (From row In dtTime.Select("PayID = 'SALARY' and PayType = 'Salary' and [Time Off Name] <> ''")
            Group By groupTimePTO = New With
            {
                Key .Username = row.Field(Of String)("Username"),
                Key .PayWeek = row.Field(Of String)("PayWeek")
            } Into Group
            Select New With
            {
                .GrpPTO = groupTimePTO,
                .SumUnitsPTO = Group.Sum(Function(x) Decimal.Parse(x.Field(Of String)("Units")))
            }).OrderBy(Function(z) z.GrpPTO.Username)
        ' jump out of this logic if we can't find any salary employee records; this shouldn't happen, but just in case
        If HoursGrouped.Any() Then
            For Each HoursRow In HoursGrouped ' this logic is slow for large data sets?
                Dim User As String = HoursRow.Grp.Username.ToString()
                For Each DataTimeRow As DataRow In dtTime.Select("Username = '" & User & "'")
                    For Each GroupedRow In HoursGrouped.Where(Function(x) x.Grp.Username = User)
                        DataTimeRow.Item("TotalHours") = GroupedRow.SumUnits ' update the total hours for this user
                    Next
                    For Each PTORow In PTOHoursGrouped.Where(Function(x) x.GrpPTO.Username = User)
                        DataTimeRow.Item("PTOHours") = PTORow.SumUnitsPTO ' update the PTO hours for this user
                    Next
                Next
            Next
            'call the new function here; trying to reduce repeated code - instead we'll pass in specific select clauses based on number of pay periods
            While PayPeriodSpan <> -1
                Dim SelectClause As String = IIf(PayPeriodSpan >= 1, "TotalHours > 40 And [Time Off Name] = '' And RealDate >= '" & PayWeekDate & "'", "TotalHours > 40 And [Time Off Name] = '' And RealDate < '" & PayWeekDate & "'")
                If Not ProrateUnits(dtTime, SelectClause) Then
                    Return False
                    Exit While
                End If
                PayPeriodSpan = (PayPeriodSpan - 1)
            End While
            ' after assigning new units, remove the calculated columns and convert string to be uploaded as the new time import file
            dtTime.Columns.Remove(TotalHours)
            dtTime.Columns.Remove(PTOHours)
            dtTime.Columns.Remove(RealDate)
            dtTime.Columns.Remove(PayWeek)
            Dim sb As New System.Text.StringBuilder()
            For Each timeRow As DataRow In dtTime.Rows
                Dim fields = timeRow.ItemArray.[Select](Function(field) """" + field.ToString().Replace("""", """""") + """").ToArray()
                sb.AppendLine(String.Join(",", fields))
            Next
            UpdatedCSV = sb.ToString()
        End If
    End If
    dtTime.Dispose()
    dtTime = Nothing
I’ve tried using a ‘For I as integer to datatable.Rows.Count’ loop instead, but that approach iterates through every single record. My goal is to iterate through only the filtered records (i.e., the records we care about), and I couldn’t figure out how to loop over a filtered data table rather than the whole thing.
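To make the goal concrete, here is a minimal sketch of the kind of filtered loop described above. It relies on `DataTable.Select` returning the table's own `DataRow` objects rather than copies, so an edit made inside the loop lands in the original table. The module name, sample rows, and reduced column set are hypothetical stand-ins for the real import data. The per-user totals are built once into a `Dictionary`, so each row needs one lookup instead of an inner loop. It ignores the pay-week grouping for brevity.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Globalization

Module FilteredUpdateSketch
    Sub Main()
        ' Hypothetical sample data standing in for the parsed CSV.
        Dim dt As New DataTable()
        dt.Columns.Add("Username", GetType(String))
        dt.Columns.Add("PayID", GetType(String))
        dt.Columns.Add("Units", GetType(String))
        dt.Columns.Add("TotalHours", GetType(Decimal))
        dt.Rows.Add("alice", "SALARY", "24.5", 0D)
        dt.Rows.Add("alice", "SALARY", "20", 0D)
        dt.Rows.Add("bob", "HOURLY", "8", 0D)

        ' Only the rows we care about; these are the table's own rows, not copies.
        Dim salaryRows As DataRow() = dt.Select("PayID = 'SALARY'")

        ' Pass 1: total the units per user in a single sweep.
        Dim totals As New Dictionary(Of String, Decimal)()
        For Each r As DataRow In salaryRows
            Dim user As String = r.Field(Of String)("Username")
            Dim units As Decimal = Decimal.Parse(r.Field(Of String)("Units"), CultureInfo.InvariantCulture)
            Dim running As Decimal = 0D
            totals.TryGetValue(user, running) ' leaves running at 0 when the user is new
            totals(user) = running + units
        Next

        ' Pass 2: write each total back with one dictionary lookup per row.
        For Each r As DataRow In salaryRows
            r("TotalHours") = totals(r.Field(Of String)("Username"))
        Next

        ' Reading from dt.Rows shows the filtered edits are visible in the original table.
        For Each r As DataRow In dt.Rows
            Console.WriteLine($"{r("Username")}: {r("TotalHours")}")
        Next
    End Sub
End Module
```

Both passes touch each filtered row once, so the work grows linearly with the row count instead of repeating a `Select` and a `Where` per user and per row.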
My second thought was to copy the data table into a new table for the data manipulation. But it is unclear to me how to communicate any changed rows back to the original data table so the records can be parsed correctly into the import file.
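A rough sketch of that copy-and-write-back idea follows, under the assumption that the working copy keeps its rows in the same order as the filtered source array, so the two can be matched by index. The module name, sample data, and the placeholder calculation are hypothetical. `CopyToDataTable` throws an `InvalidOperationException` when the source has no rows, hence the length check.

```vbnet
Imports System
Imports System.Data

Module CopyAndWriteBackSketch
    Sub Main()
        ' Hypothetical sample data standing in for the parsed CSV.
        Dim dt As New DataTable()
        dt.Columns.Add("Username", GetType(String))
        dt.Columns.Add("PayID", GetType(String))
        dt.Columns.Add("TotalHours", GetType(Decimal))
        dt.Rows.Add("alice", "SALARY", 0D)
        dt.Rows.Add("bob", "HOURLY", 0D)
        dt.Rows.Add("carol", "SALARY", 0D)

        ' References to the original rows that match the filter.
        Dim sourceRows As DataRow() = dt.Select("PayID = 'SALARY'")
        If sourceRows.Length = 0 Then Return

        ' Detached working copy; changes here do not touch dt.
        Dim work As DataTable = sourceRows.CopyToDataTable()

        ' Placeholder manipulation on the copy (stands in for the real calculations).
        For Each r As DataRow In work.Rows
            r("TotalHours") = 40D
        Next

        ' Write back by position: work.Rows(i) was copied from sourceRows(i).
        For i As Integer = 0 To work.Rows.Count - 1
            sourceRows(i)("TotalHours") = work.Rows(i)("TotalHours")
        Next

        For Each r As DataRow In dt.Rows
            Console.WriteLine($"{r("Username")}: {r("TotalHours")}")
        Next
    End Sub
End Module
```

Since the rows returned by `Select` already point into the original table, the copy may be unnecessary unless the intermediate work must stay isolated from the original.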