Is it possible for Excel to recognize a pattern of comma separated numbers in a cell and remove duplicates of that pattern?

Question

Its an easy task for the human eye, but it will take me 2 weeks to do this if I do it manually.

I'm using a UDF that I got from this site to add the TextJoin functionality to my Excel 2011. Works great, but I've encountered a problem on a new spreadsheet that I'm working with that has nearly 50,000 lines. The problem, is that I can't just simply remove duplicates, because there are some instances where the data repeats, and I need to capture that. However, I located another column with unique data that correlates to the duplicate data. By looking at these 2nd data column I can easily tell what is a duplicate and what needs to be included.

I created a simplified spreadsheet example, because its much easier to show that explain with words.

Source Data:

Acct#   Lname,Fname Date    Data#1  Data#2
42  Doe, John   1/1/17  10001   1001
42  Doe, John   1/1/17  10001   1001
42  Doe, John   1/1/17  30003   1001
42  Doe, John   1/1/17  10001   1002
42  Doe, John   1/1/17  10001   1002
42  Doe, John   1/1/17  30003   1002
70  Smith, Jane 2/1/17  10001   2001
70  Smith, Jane 2/1/17  20002   2001
70  Smith, Jane 2/1/17  30003   2001
70  Smith, Jane 2/1/17  10001   2002
70  Smith, Jane 2/1/17  20002   2002
70  Smith, Jane 2/1/17  30003   2002
70  Smith, Jane 2/1/17  10001   2003
70  Smith, Jane 2/1/17  20002   2003
70  Smith, Jane 2/1/17  30003   2003
93  Blow, Joe   1/1/17  10001   3001
93  Blow, Joe   1/1/17  20002   3001
93  Blow, Joe   1/1/17  30003   3001
93  Blow, Joe   1/1/17  10001   3002
93  Blow, Joe   1/1/17  20002   3002
93  Blow, Joe   1/1/17  30003   3002
177 Bryant, Kobe    2/1/17  10001   4001
177 Bryant, Kobe    2/1/17  30003   4001
177 Bryant, Kobe    2/1/17  30003   4001
177 Bryant, Kobe    2/1/17  10001   4002
177 Bryant, Kobe    2/1/17  30003   4002
177 Bryant, Kobe    2/1/17  30003   4002
177 Bryant, Kobe    2/1/17  10001   4003
177 Bryant, Kobe    2/1/17  30003   4003
177 Bryant, Kobe    2/1/17  30003   4003

OUTPUT DATA:

Acct#   (Lname, Fname)  Date    Data#1  Data#2
42  Doe, John   1/1/17  10001, 10001, 30003, 10001, 10001, 30003    
70  Smith, Jane 2/1/17  10001, 20002, 30003, 10001, 20002, 30003, 10001, 20002, 30003   
93  Blow ,Joe   1/1/17  10001, 20002, 30003, 10001, 20002, 30003    
177 Bryant, Kobe    2/1/17  10001, 30003, 30003, 10001, 30003, 30003, 10001, 30003, 30003   

I don't know how to put the excel spreadsheet in my question so here is a screenshot of what I currently have, and the desired output (I'm just focusing on the Data#1 column right now).

enter image description here

As requested, this is the VBA code I'm using to extract the Data#1:

Function TEXTJOIN(delim As String, skipblank As Boolean, arr)
    Dim d As Long
    Dim c As Long
    Dim arr2()
    Dim t As Long, y As Long
    t = -1
    y = -1
    If TypeName(arr) = "Range" Then
        arr2 = arr.Value
    Else
        arr2 = arr
    End If
    On Error Resume Next
    t = UBound(arr2, 2)
    y = UBound(arr2, 1)
    On Error GoTo 0

    If t >= 0 And y >= 0 Then
        For c = LBound(arr2, 1) To UBound(arr2, 1)
            For d = LBound(arr2, 1) To UBound(arr2, 2)
                If arr2(c, d) <> "" Or Not skipblank Then
                    TEXTJOIN = TEXTJOIN & arr2(c, d) & delim
                End If
            Next d
        Next c
    Else
        For c = LBound(arr2) To UBound(arr2)
            If arr2(c) <> "" Or Not skipblank Then
                TEXTJOIN = TEXTJOIN & arr2(c) & delim
            End If
        Next c
    End If
        TEXTJOIN = Left(TEXTJOIN, Len(TEXTJOIN) - Len(delim))
    End Function

I tried the suggested subroutine and I got this output:

enter image description here


Show source
| excel-vba   | user-defined-functions   | excel-2011   2017-08-08 23:08 1 Answers

Answers to Is it possible for Excel to recognize a pattern of comma separated numbers in a cell and remove duplicates of that pattern? ( 1 )

  1. 2017-08-09 18:08

    Here is some code that at least works with your sample data, though I suspect how well it works will drop off severely as soon as the data changes even a little bit.

    Option Explicit
    Sub PatternFilter()
    
        Dim ws As Worksheet
    
        Dim index1_col As String
        Dim index2_col As String
        Dim data1_col As String
        Dim data2_col As String
        Dim lastrow As Long
        Dim lastentryrow As Long
        Dim outputline As Long
        Dim iter1 As Long
        Dim iter2 As Long
        Dim datastring As String
    
        Set ws = ThisWorkbook.Sheets("Sheet1")
    
        index1_col = "A" ' the column with the unique identifier, such as name or acct#
        index2_col = "C" ' another column which can either be unique
        data1_col = "D" ' data1 column
        data2_col = "E" ' data2 column
    
        lastrow = ws.Range(index1_col & ws.Rows.Count).End(xlUp).Row ' so we know where to stop
        outputline = 2 ' just an incrementor to keep putting data on unique lines
    
        For iter1 = 2 To lastrow Step 1
            datastring = "" ' reset our output string
            For iter2 = iter1 + 1 To lastrow Step 1 ' a for loop to find the last row in each pattern set
                If (ws.Range(index1_col & iter2).Value <> ws.Range(index1_col & iter2 + 1).Value) _
                        Or (ws.Range(index2_col & iter2).Value <> ws.Range(index2_col & iter2 + 1).Value) Then
                    lastentryrow = iter2
                    Exit For
                End If
            Next
            For iter2 = iter1 To lastentryrow Step 1 ' a for loop to collect all of the data1 pattern
                If ws.Range(data2_col & iter2).Value <> ws.Range(data2_col & iter2 + 1).Value Then
                    datastring = datastring & "," & ws.Range(data1_col & iter2).Value
                    Exit For
                End If
                If datastring = "" Then
                    datastring = ws.Range(data1_col & iter2).Value
                Else
                    datastring = datastring & "," & ws.Range(data1_col & iter2).Value
                End If
            Next
            ws.Range("I" & outputline).Value = datastring ' save the data1 pattern
            datastring = "" ' reset the output string
            For iter2 = iter1 To lastentryrow Step 1 ' a for loop to collect all of the data2 pattern
                If ws.Range(data2_col & iter2).Value <> ws.Range(data2_col & iter2 - 1).Value Then
                    If datastring = "" Then
                        datastring = ws.Range(data2_col & iter2).Value
                    Else
                        datastring = datastring & "," & ws.Range(data2_col & iter2).Value
                    End If
                End If
            Next
            ws.Range("J" & outputline).Value = datastring ' save the data2 pattern
            ws.Range("G" & outputline).Value = ws.Range(index1_col & iter1).Value ' put the unique identifier with the data so we know who the data belongs to
            ws.Range("H" & outputline).Value = ws.Range(index2_col & iter1).Value
            outputline = outputline + 1 ' increment the output line to avoid overwriting existing data
            iter1 = lastentryrow ' set this to last entry line (for loop will increment to the first line of the next entry for us)
        Next
    
    End Sub
    

    Of course, this can be manipulated to output to wherever you want, including a new sheet, etc. Let me know if there is anything in this you don't understand or that does not work for you

    *EDIT: A second index column was added per OP request

Leave a reply to - Is it possible for Excel to recognize a pattern of comma separated numbers in a cell and remove duplicates of that pattern?

◀ Go back