samedi 28 mars 2015

Algorithm to truncate string and eliminate duplicates (case insensitive)

I have a set of strings that meet the following constraints



  • Case sensitive

  • Max character length of 10


I want to convert these strings so that following new constraints are valid (instead of the previous constraints)



  • Case insensitive

  • Max character length of 5


Suppose the initial set of strings are as follows



city, City, cIty, ciTy, citY, CIty, cITy, ciTY, CITy, cITY, CITY


I have a partial algorithm that maps these strings to the following



cit, cit1, cit2, cit3, cit4, cit5, cit6, cit7, cit8, cit9, cit10


This is done by using the following logic



  • Consider the first string as the common prefix

  • Count the number of matches in the rest of the strings (case insensitive match). In the current case it is 10

  • Determine the number of characters required for the suffix. In the current case since I need to generate suffices for 1 through 10, I need to reserve 2 characters for the suffix

  • Truncate the common prefix to (Max characters - Number of characters for suffix). In the current case it is (5 - 2) i.e 3 characters

  • Generate strings by concatenating the truncated common prefix and the suffix


Using the above I am able to map the old set of strings to the new set and satisfy the new constraints.


However my algorithm breaks if the original set itself had any string that gets generated by the algorithm


Suppose the initial set of strings was



city, cit1, cit2, City, cIty, ciTy, citY, CIty, cITy, ciTY, CITy, cITY, CITY,


In this case, since cit1 and cit2 already exist in the initial set, the algorithm breaks (since it generates duplicate cit1 and cit2)


Is there any way I can recursively handle this ?


Aucun commentaire:

Enregistrer un commentaire