Friday, January 25, 2013

A simple way for data cleaning in VBA Excel

In data analysis data cleaning is the act of detecting and either removing or correcting inaccurate records from a record set. In case data is fetched from a Data Base Relational Systems, we're talking about incorrect or inaccurate records from a table.  For instance, in Excel 2007+ you can fetch data from a DBMS such as SQL Server in the Get External Data group. The following step is removing or correcting inaccurate records. A typical way to do it is scanning the Excel data sheet following from the top and from left to rigth, processing only columns storing data validity information. Hence, you need a VBA function such as the following:

clean("FromSheet", "ToSheet", CellCondition, Condition, True)

i.e. a VBA function with a signature like

Function clean(FromSheet As String, ToSheet As String, CellCondition As Variant, Condition As Variant, Caption As Boolean) As Long   


A typical safe way to implement such a action is copying in a new sheet the clean data. That works especially in case you refresh data sheet periodically from an external data source. Here's the VBA code. 


    Set wsI = Sheets(FromSheet)
    Set wsO = Sheets(ToSheet)

    LastRow = wsI.Range("A" & Rows.Count).End(xlUp).Row
   
    j = 1
    With wsI
        For i = 1 To LastRow
           
            ok = False
             For N = LBound(CellCondition) To UBound(CellCondition)
                 If Trim(.Range(CellCondition(N) & i).Value) = Condition(N) Then
                     ok = True
                 End If
             Next N
       
            If Caption And i = 1 Then
                ok = True
            End If
           
            If ok Then
                wsI.Rows(i).Copy wsO.Rows(j)
                j = j + 1
            End If
           
        Next i
    End With