Algorithm - find all permutations of string a in string b

Question

Say we have string a = "abc" string b = "abcdcabaabccbaa"

Find location of all permutations of a in b. I am trying to find an effective algorithm for this.

Pseudo code:

sort string a // O(a loga)

for windows of length a in b  // O(b)?
   sort that window of b      // O(~a loga)?
   compare to a
   if equal
      save the index

So would this be a correct algorithm? Run time would be around O(aloga + ba loga) ~= O(a loga b)? How efficient would this be? Possibly way to reduce to O(a*b) or better?


Show source
| sorting   | permutation   | algorithm   2017-01-06 23:01 3 Answers

Answers ( 3 )

  1. 2017-01-06 23:01

    sorting is very expensive, and doesn't use the fact you move along b with a sliding window.

    I would use a comparison method that is location agnostic (since any permutation is valid) - assign each letter a prime number, and each string will be the multiplication of its letter values.

    this way, as you go over b, each step requires just dividing by the letter you remove from he left, and multiplying with the next letter.

    You also need to convince yourself that this indeed matches uniquely for each string and covers all permutations - this comes from the uniqueness of prime decomposition. Also note that on larger strings the numbers get big so you may need some library for large numbers

  2. 2017-01-06 23:01

    Write a function strcount() to count the number of occurrences of character ch in a string or sub-sring str.

    Then just pass through the search string.

      for(i=0;i<haystacklenN-NeedleN+1;i++)
      {
        for(j=0;j<needleN;j++)
           if(strcount(haystack + i, Nneedle, needle[j]) != strcount(needles, needlesN, needle[j])
            break
      }
      if(j == needleN)
             /* found a permuatation */
    
  3. 2017-01-07 00:01

    There is no need to hash, you can just count frequencies on your sliding window, and check if it matches. Assuming the size of your alphabet is s, you get a very simple O(s(n + m)) algorithm.

    // a = [1 .. m] and b = [1 .. n] are the input
    cnta = [1 .. s] array initialized to 0
    cntb = [1 .. s] array initialized to 0
    // nb_matches = the number of i s.t. cnta[i] = cntb[i]
    // thus the current subword = a iff. nb_matches = s
    nb_matches = s
    
    for i = 1 to m:
        if cntb[a[i]] = 0: nb_matches -= 1
        cntb[a[i]] += 1
    
    ans = 0
    for i = 1 to n:
        if cntb[b[i]] = cnta[b[i]]: nb_matches -= 1
        cntb[b[i]] += 1
        if nb_matches = s: ans += 1
        if cntb[b[i]] = cnta[b[i]]: nb_matches += 1
        if i - m + 1 >= 1:
            if cntb[b[i - m + 1]] = cnta[b[i - m + 1]]: nb_matches -= 1
            cntb[b[i - m + 1]] += 1
            if cntb[b[i - m + 1]] = cnta[b[i - m + 1]]: nb_matches += 1
            cntb[b[i - m + 1]] -= 1
    return ans
    
◀ Go back