Sunday, June 12, 2011

Printing all permutations of a string containing duplicate characters (C/C++)


This is a follow up post to my earlier post on printing all possible permutations of a string (also called Anagrams) using the C++ Standard Library. The earlier post is located here. This post should have come on this blog about 3 years ago. I'm no longer interested in solving problems like this but I wanted to put this up for the millions of people who get this problem in interviews and the like and to retain the same for posterity.

So, previously we've discussed how to print all possible permutations of a string without duplicates using the standard C++ library functions. That's all fine and dandy if we want to use the standard libraries – especially next_permutation, but what happens if we have to write our own code to do so?

In this blog post I'll present recursive solutions to the problem of printing all the possible permutations of a string. I'll even take up the case where the input string contains repeated letters later on in this post. The resultant function will be able to print all possible distinct permutations of a string containing duplicate characters.

Printing all permutations of a string containing no duplicate characters

Code 1: The recursive solution for a string with no duplicate characters in C/C++

char * full_string;
void permute(char * str, int length) {
    if(length == 0) {
        printf(“%s\n”, full_string);
        return;
    } else {
        for(int i = 0; i < length; ++i) {
            swap(str[0], str[i]);
            permute(str+1, length-1);
            swap(str[0], str[i]);
        }
    }
}

Call like: permute(full_string, strlen(full_string));



To understand how this works, just look at the string “ABC”.

Let us assume that we have a magic function permute2 that generates all possible permutations of a given string of length 2.

Then, the permutations problem for the string “ABC” can then be broken down as:


These permute2 values themselves can be broken down to smaller subproblems.



Here, it is obvious that permute1(any char) = char itself. This is the base case of this recursion. Unrolling the recursion for the 3 required values yields:

permute2(“BC”) = { “BC”, “CB” }
permute2(“AC”) = { “AC”, “CA”}
permute2(“AB”) = {“AB”, “BA”}

So the resultant sets of strings are:
permute3(“ABC”) = {“ABC”, “ACB”} U {“BAC”, “BCA”} U {“CAB”, “CBA”}

which is the set of all permutations of the string “ABC”. It is obvious to see that we are in fact just choosing the starting prefix of the permutation and then requesting the permute function to run on a smaller subproblem of permuting a smaller string.

To generalize, we can state that we can get all the permutations of a string using the general recurrence:



This formula basically means that you select a character sk from the string S of length n to represent the first character of the permutation, find all the permutations of the smaller string formed by removing sk from S and extend the permutations thus obtained by prefixing sk.

This is essentially what the code above is implementing. The two swap statements perform the work of selecting an element sk from the string S and replacing it back after permuting the smaller string



Printing all distinct permutations of a string containing duplicate characters

To achieve this, we just modify the above code a little. To avoid duplicates, we need to enforce the constraint that no character will be selected twice to place in the prefix before calling the subproblems. So, the resulting recursion gets modified as:


This simply means that at every stage of the recursion, pick only distinct characters. So, how do we do that? We pick the sk in ascending order.

Code 2: C/C++ code for finding all distinct permutations of a string with duplicate characters in it

char* full_string;
int counter = 0;
void permute(char *str, int length) {
    if(length == 0) {
        printf("%s\n", full_string);
        ++counter;
        return;
    } else {
        // Find the smallest char in the string set it to be the first character. Solve the subproblem for the smaller string.
        char *smallest = min_element(str, str + length);
        iter_swap(str, smallest);
        permute(str+1, length-1);
        
        // Look for the smallest element strictly greater than the first element of the current string
        char *smallest_greater = str + length;
        for(char *i = str+1; i != str+length; ++i)
            if(*i > *str && (smallest_greater == str + length || *i < *smallest_greater))
                smallest_greater = i;
        while(smallest_greater != str+length) {
            // If such an element is found, swap it into the first slot and recurse
            iter_swap(str, smallest_greater);
            permute(str+1, length-1);

            // Repeat the loop if possible for the next greater character
            smallest_greater = str + length;
            for(char *i = str+1; i != str+length; ++i)
                if(*i > *str && (smallest_greater == str + length || *i < *smallest_greater))
                    smallest_greater = i;

        }
    }
}



Code Explanation:
The fact that we are selecting the character to put at the beginning of the string str in strict ascending order gives us 2 major guarantees:

  1. No character will be selected more than once. Hence the distinct selection requirement is satisfied.
  2. All distinct characters in the string str will be selected. Hence, all the permutations will end up being generated

A small bonus of selecting the sk character in this fashion is that the permutations generated in the output will be in lexicographical order (dictionary order). This makes it easy to verify the correctness of the program.
This code uses two functions that you might not be aware of :
The functions iter_swap and min_element are provided by the standard C++ library. They are really simple and you could code your own versions easily.

  • iter_swap – it just swaps the elements pointed to by the respective pointers without changing the pointers themselves. So, it's basically equivalent to the function:
    void iter_swap(char *ptr1, char *ptr2) {
      char tmp = *ptr1; *ptr1 = *ptr2;
      *ptr2 = tmp;
    }
  • min_element – finds the location of the minimum element that exists in the given range. In this case, it is the char* pointer pointing to the location of that element. It can be implemented like:
    char * min_element(char *start, char *end) { // end is 1 beyond the last valid element
      if(start == end) return end; // empty range
      char *min_pos = start;
      for(char *iter = start+1; iter != end; ++iter) {
        if(*iter < *min_pos) min_pos = iter;
      }
      return min_pos;
    }

Of course, the standard library implements these as template functions so that they can be used with a variety of datatypes.

Code 3: The iterative solution.

The C++ standard library is an elegant repository of algorithms. You can also use the iterative next_permutation function to generate all the permutations of a string which might contain duplicates. Mark Nelson has done a superb job of explaining how next_permutation works at http://marknelson.us/2002/03/01/next-permutation/ - do take the time to go through his complete blog post. It is the best description of this algorithm till date. Sample code on how to use that function has already been provided in my previous post.

Code for this and many other algorithms can be found at the github archive: http://github.com/swagatata/ds_and_algos
However, all this is a work in progress.


12 comments:

  1. which is faster? recursive or iterative?

    ReplyDelete
    Replies
    1. Anonymous11:19 PM

      when it is between iterative and recursive the running code is always the same...but recursion makes the program simpler and shorter...

      Delete
    2. "Running code is always the same..."
      Actually, that is not true. Compilers optimize both differently and in many cases, the recursive formulation turns out to be slower by having more instructions to execute (eg. non-tail recursive calls cannot be optimized away to loops in many cases leading to needless push and pop instructions).

      However, in certain cases, the recursive formulation turns out to be faster if the compiler can take advantage of immutability of data. Referentially transparent languages provide a much greater opportunity for optimizations across function call boundaries leading to faster code in large codebases. Haskell has a number of these benchmarks and there is anecdotal evidence too. So generalizing identical cost of recursive and iterative formulations is actually incorrect.

      Regarding "recursion makes the program simpler and shorter...": This cannot be generalized too. This is probably true only for a certain class of stack based algorithms. Consider a program to do a breadth first search of a tree. If this complete program was to be written without the use of iteration (and hence without in-place queues), you'd have a nightmare of managing state across recursive invocations and best of luck explaining why that program even works to anybody. :) (This is not to say that you won't be able to do it, just that it is needlessly complicated).

      So, in short, the answer is - it depends. My gut feel is that the iterative version is going to be faster as the recursive calls to permute are not tail call optimizable leading to inefficiencies.

      @Anon/Swagat: If your views differ, I'd be happy to hear your rationale here.

      Delete
  2. Nice one again sir. ;)

    ReplyDelete
  3. Sir I am Mr. sudhansu Sekhar Nayak. I read this code carefully and understand this code.It is very nice. Thanks......

    ReplyDelete
  4. Sir I am Mr.sudhansu sekhar nayak(odisha).This Code is very interesting and it is very Nice.Thanks..

    ReplyDelete
  5. Sir I am Mr.sudhansu sekhar nayak(odisha).This Code is very interesting and it is very Nice.Thanks..

    ReplyDelete
  6. I think this would be a smaller solution..


    void permute(char s[], int m, int n)
    {
    if(m == n - 1)
    {
    puts(s);
    }

    int a[26] = {0};

    for(int i = m; i < n; i++)
    {
    if(a[s[i] - 'a'] == 1)
    continue;
    a[s[i]-'a'] = 1;
    swap(&s[i], &s[m]);
    permute(s, m+1, n);
    swap(&s[i], &s[m]);
    }
    }

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. Not only does your code fail for strings containing numbers, it does not generate all permutations for an input string that contains pairs of duplicates or more:

      S = baab

      will never output 'baab' since the recursive case will hit the continue statement twice and will never reach the base case.

      Also, your solution takes O(2**sizeof(char)) extra memory and is not extensible to situations where the elements being permuted are not simple strings.

      Delete
  7. Hi, thats an interesting solution, Divye.
    How abt using visit counts to identify unique chars and altering the basic recursive solution :

    char str[SIZE];
    bool visit[SIZE][CSIZE];

    void clear(int l){
    int i,j;
    for(i=0; i<=l ; i++)
    for(j=0 ; j<CSIZE ; j++)
    visit[i][j]=false;
    }

    void permuteDup(char* s, int l){
    if(l==1)
    cout<<str<<endl;
    else{
    for(int i=0 ; i<l ; i++){

    if(visit[l-1][(int)s[i]]==false){
    visit[l-1][(int)s[i]]=true;

    SWAP(s[0],s[i],tmp);
    permuteDup(s+1,l-1);
    SWAP(s[0],s[i],tmp);
    clear(l-2);
    }
    }
    }
    }

    Extra space needed to store counts is needed O(string_length).
    Do let me know if you think it will fail on some interesting cases.

    ReplyDelete
  8. Sir your recursive solution is elegant.So i found a code on net in C#regarding string permutation could you help me out on this one?What i don't get is how this non tail recursive call works?the order of execution is also confusing me The code is posted below
    using System;


    class Permute
    {

    public void swap (ref char a,ref char b)
    {
    char c;
    if(a==b)
    return;
    c = a;
    a = b;
    b = c;
    }

    public void Set_Permutation(char[] list)
    {
    int arrayLength=list.Length-1;
    Permutation_Method(list,0,arrayLength);
    }

    public void Permutation_Method (char[] list,int k,int m)
    {
    // k intial index passed
    // m last index passed
    int i;

    if(k == m) // ---------- What is this Condition ?
    {
    Console.Write(list);
    Console.WriteLine(" ");
    }

    for(i = k; i <= m;i++) // why i=k ?
    {


    swap(ref list[k], ref list[i]); // what does this Swap do ?

    //recursive call
    Permutation_Method (list, k+1, m);

    swap(ref list[k], ref list[i]); // what does this Swap do ?
    }


    }




    static void Main()
    {

    Permute objPermutation =new Permute();
    char[]a=new char[20];
    a = Console.ReadLine().ToCharArray();
    /*calling the permute*/
    objPermutation.Set_Permutation(a);
    Console.ReadLine();
    }

    }

    ReplyDelete