Sunday, June 12, 2011

Printing all permutations of a string containing duplicate characters (C/C++)


This is a follow up post to my earlier post on printing all possible permutations of a string (also called Anagrams) using the C++ Standard Library. The earlier post is located here. This post should have come on this blog about 3 years ago. I'm no longer interested in solving problems like this but I wanted to put this up for the millions of people who get this problem in interviews and the like and to retain the same for posterity.

So, previously we've discussed how to print all possible permutations of a string without duplicates using the standard C++ library functions. That's all fine and dandy if we want to use the standard libraries – especially next_permutation, but what happens if we have to write our own code to do so?

In this blog post I'll present recursive solutions to the problem of printing all the possible permutations of a string. I'll even take up the case where the input string contains repeated letters later on in this post. The resultant function will be able to print all possible distinct permutations of a string containing duplicate characters.

Printing all permutations of a string containing no duplicate characters

Code 1: The recursive solution for a string with no duplicate characters in C/C++

char * full_string;
void permute(char * str, int length) {
    if(length == 0) {
        printf(“%s\n”, full_string);
        return;
    } else {
        for(int i = 0; i < length; ++i) {
            swap(str[0], str[i]);
            permute(str+1, length-1);
            swap(str[0], str[i]);
        }
    }
}

Call like: permute(full_string, strlen(full_string));



To understand how this works, just look at the string “ABC”.

Let us assume that we have a magic function permute2 that generates all possible permutations of a given string of length 2.

Then, the permutations problem for the string “ABC” can then be broken down as:


These permute2 values themselves can be broken down to smaller subproblems.



Here, it is obvious that permute1(any char) = char itself. This is the base case of this recursion. Unrolling the recursion for the 3 required values yields:

permute2(“BC”) = { “BC”, “CB” }
permute2(“AC”) = { “AC”, “CA”}
permute2(“AB”) = {“AB”, “BA”}

So the resultant sets of strings are:
permute3(“ABC”) = {“ABC”, “ACB”} U {“BAC”, “BCA”} U {“CAB”, “CBA”}

which is the set of all permutations of the string “ABC”. It is obvious to see that we are in fact just choosing the starting prefix of the permutation and then requesting the permute function to run on a smaller subproblem of permuting a smaller string.

To generalize, we can state that we can get all the permutations of a string using the general recurrence:



This formula basically means that you select a character sk from the string S of length n to represent the first character of the permutation, find all the permutations of the smaller string formed by removing sk from S and extend the permutations thus obtained by prefixing sk.

This is essentially what the code above is implementing. The two swap statements perform the work of selecting an element sk from the string S and replacing it back after permuting the smaller string



Printing all distinct permutations of a string containing duplicate characters

To achieve this, we just modify the above code a little. To avoid duplicates, we need to enforce the constraint that no character will be selected twice to place in the prefix before calling the subproblems. So, the resulting recursion gets modified as:


This simply means that at every stage of the recursion, pick only distinct characters. So, how do we do that? We pick the sk in ascending order.

Code 2: C/C++ code for finding all distinct permutations of a string with duplicate characters in it

char* full_string;
int counter = 0;
void permute(char *str, int length) {
    if(length == 0) {
        printf("%s\n", full_string);
        ++counter;
        return;
    } else {
        // Find the smallest char in the string set it to be the first character. Solve the subproblem for the smaller string.
        char *smallest = min_element(str, str + length);
        iter_swap(str, smallest);
        permute(str+1, length-1);
        
        // Look for the smallest element strictly greater than the first element of the current string
        char *smallest_greater = str + length;
        for(char *i = str+1; i != str+length; ++i)
            if(*i > *str && (smallest_greater == str + length || *i < *smallest_greater))
                smallest_greater = i;
        while(smallest_greater != str+length) {
            // If such an element is found, swap it into the first slot and recurse
            iter_swap(str, smallest_greater);
            permute(str+1, length-1);

            // Repeat the loop if possible for the next greater character
            smallest_greater = str + length;
            for(char *i = str+1; i != str+length; ++i)
                if(*i > *str && (smallest_greater == str + length || *i < *smallest_greater))
                    smallest_greater = i;

        }
    }
}



Code Explanation:
The fact that we are selecting the character to put at the beginning of the string str in strict ascending order gives us 2 major guarantees:

  1. No character will be selected more than once. Hence the distinct selection requirement is satisfied.
  2. All distinct characters in the string str will be selected. Hence, all the permutations will end up being generated

A small bonus of selecting the sk character in this fashion is that the permutations generated in the output will be in lexicographical order (dictionary order). This makes it easy to verify the correctness of the program.
This code uses two functions that you might not be aware of :
The functions iter_swap and min_element are provided by the standard C++ library. They are really simple and you could code your own versions easily.

  • iter_swap – it just swaps the elements pointed to by the respective pointers without changing the pointers themselves. So, it's basically equivalent to the function:
    void iter_swap(char *ptr1, char *ptr2) {
      char tmp = *ptr1; *ptr1 = *ptr2;
      *ptr2 = tmp;
    }
  • min_element – finds the location of the minimum element that exists in the given range. In this case, it is the char* pointer pointing to the location of that element. It can be implemented like:
    char * min_element(char *start, char *end) { // end is 1 beyond the last valid element
      if(start == end) return end; // empty range
      char *min_pos = start;
      for(char *iter = start+1; iter != end; ++iter) {
        if(*iter < *min_pos) min_pos = iter;
      }
      return min_pos;
    }

Of course, the standard library implements these as template functions so that they can be used with a variety of datatypes.

Code 3: The iterative solution.

The C++ standard library is an elegant repository of algorithms. You can also use the iterative next_permutation function to generate all the permutations of a string which might contain duplicates. Mark Nelson has done a superb job of explaining how next_permutation works at http://marknelson.us/2002/03/01/next-permutation/ - do take the time to go through his complete blog post. It is the best description of this algorithm till date. Sample code on how to use that function has already been provided in my previous post.

Code for this and many other algorithms can be found at the github archive: http://github.com/swagatata/ds_and_algos
However, all this is a work in progress.