C++ Wildcard Pattern Matching

 www.partow.net  .: Home :.   .: Links :.   .: Search :.   .: Contact :. 


Description

The C++ wildcard pattern matching library is a simple to use and very efficient wildcard matching/globbing library. The library supports the following wildcard options:

WildcardMeaning
*Match zero or more characters
?Match exactly one character





Wildcard Pattern Matching Algorithm

The following is a generic implementation of an iterative algorithm for wildcard pattern matching (aka globbing) taking into account the wildcards as described above. The algorithm is wholly iterator based and as such can be used to match patterns over sequences of any element type that is trivially comparable [C++ code: globmatch.hpp] :

template <typename Compare,
          typename Iterator,
          typename ValueType = typename std::iterator_traits<Iterator>::value_type>
inline bool match_impl(const Iterator pattern_begin,
                       const Iterator pattern_end  ,
                       const Iterator data_begin   ,
                       const Iterator data_end     ,
                       const ValueType zero_or_more,
                       const ValueType exactly_one )
{
   typedef typename std::iterator_traits<Iterator>::value_type type;

   const Iterator null_itr(0);

   Iterator p_itr  = pattern_begin;
   Iterator d_itr  = data_begin;
   Iterator np_itr = null_itr;
   Iterator nd_itr = null_itr;

   for ( ; ; )
   {
      if (pattern_end != p_itr)
      {
         const type c = *(p_itr);

         if ((data_end != d_itr) && (Compare::cmp(c,*(d_itr)) || (exactly_one == c)))
         {
            ++d_itr;
            ++p_itr;
            continue;
         }
         else if (zero_or_more == c)
         {
            while ((pattern_end != p_itr) && (zero_or_more == *(p_itr)))
            {
               ++p_itr;
            }

            const type d = *(p_itr);

            while ((data_end != d_itr) && !(Compare::cmp(d,*(d_itr)) || (exactly_one == d)))
            {
               ++d_itr;
            }

            // set backtrack iterators
            np_itr = p_itr - 1;
            nd_itr = d_itr + 1;

            continue;
         }
      }
      else if (data_end == d_itr)
         return true;

      if ((data_end == d_itr) || (null_itr == nd_itr))
          return false;

      p_itr = np_itr;
      d_itr = nd_itr;
   }

   return true;
}

Note: Compare is a policy that is used to compare the current c from the pattern with the next data element. Examples of such policies include case sensitive/insensitive matching, the following are some possible implementations:

struct cs_match
{
   static inline bool cmp(const char c0, const char c1)
   {
      return (c0 == c1);
   }
};

struct cis_match
{
   static inline bool cmp(const char c0, const char c1)
   {
      return (std::tolower(c0) == std::tolower(c1));
   }
};



Wildcard Pattern Matching Finite State Machine Diagram

The following is an FSM diagram of the above denoted algorithm. Where state S0 is the initial state, and state S12 is the final state.

Wildcard Matching Algorithm Finite State Machine Diagram - Copyright Arash Partow



Wildcard Pattern Matching Examples


Example 01 - Basic string matching

The following example demonstrates the use of the glob::match free function in testing if the string data matches the pattern string pattern, where the defaults of * and . are assumed for match_one_or_more and match_exatcly_one respectively. [source: wildcardmatch_example01.cpp]

#include <cstdio>
#include <string>

#include "globmatch.hpp"

int main()
{
   const std::string data    = "How now brown cow!";
   const std::string pattern = "How*bro.n*cow.";

   if (glob::match(data, pattern))
   {
      printf("'%s' matches pattern '%s'\n",
             data.c_str(),
             pattern.c_str());
   }

   return 0;
}
 


Example 02 - Redefining wildcard characters

The following example demonstrates the use of the glob::match free function in testing if the string data matches the pattern string pattern, where the wildcard characters are explicitly defined as being * and ? for match_one_or_more and match_exatcly_one respectively. [source: wildcardmatch_example02.cpp]

#include <cstdio>
#include <string>

#include "globmatch.hpp"

int main()
{
   const std::string data    = "How now brown cow!";
   const std::string pattern = "How*bro?n*cow?";

   if (glob::match(data, pattern, '*', '?'))
   {
      printf("'%s' matches pattern '%s'\n",
             data.c_str(),
             pattern.c_str());
   }

   return 0;
}
 


Example 03 - Case-insensitive string matching

The following example demonstrates the use of the glob::match free function in testing if the string data matches the pattern string pattern. Furthermore a case-insensitive comparator policy is used and the defaults of * and . are assumed for match_one_or_more and match_exatcly_one respectively. [source: wildcardmatch_example03.cpp]

#include <cstdio>
#include <string>

#include "globmatch.hpp"

int main()
{
   const std::string data    = "How now brown cow!";
   const std::string pattern = "hOw*BrO.n*CoW.";

   if (glob::match<glob::details::cis_match>(data, pattern))
   {
      printf("'%s' matches pattern '%s'\n",
             data.c_str(),
             pattern.c_str());
   }

   return 0;
}
 


Example 04 - Case-insensitive and redefined wildcard character string matching

The following example demonstrates the use of the glob::match free function in testing if the string data matches the pattern string pattern. Furthermore a case-insensitive comparator policy is used and where the wildcard characters are explicitly defined as being * and ? for match_one_or_more and match_exatcly_one respectively. [source: wildcardmatch_example04.cpp]

#include <cstdio>
#include <string>

#include "globmatch.hpp"

int main()
{
   const std::string data    = "How now brown cow!";
   const std::string pattern = "hOw*BrO?n*CoW?";

   if (glob::match<glob::details::cis_match>(data, pattern, '*', '?'))
   {
      printf("'%s' matches pattern '%s'\n",
             data.c_str(),
             pattern.c_str());
   }

   return 0;
}
 


Example 05 - Integer sequence wildcard matching

The following example demonstrates the use of the glob::match free function in testing if the std::vector of type int data matches the given pattern, where the wildcard symbols are explicitly defined as being the integer values of 0 and -1 for match_one_or_more and match_exatcly_one respectively. [source: wildcardmatch_example05.cpp]

#include <cstdio>
#include <vector>

#include "globmatch.hpp"

int main()
{
   using T = int;

   const std::vector<T> data    = { 1, 2,  3, 9, 8, 7, 4, 5, 6 };
   const std::vector<T> pattern = { 1, 2, -1, 9, 0, 5, 6 };

   const T zero_or_more = T( 0);
   const T exactly_one  = T(-1);

   const auto result = glob::details::match_impl<glob::details::general_match<int>>
   (
      std::begin(pattern), std::end(pattern),
      std::begin(data   ), std::end(data   ),
      zero_or_more,
      exactly_one
   );

   if (result)
   {
      printf("'data' vector matches 'pattern' vector\n");
   }

   return 0;
}
 


Downloads

License

Free use of this code is permitted under the guidelines and in accordance with the MIT License.

Compatibility

The C++ Wildcard Pattern Matching implementation is compatible with the following C++ compilers:

  • GNU Compiler Collection (3.5+)
  • Clang/LLVM (1.1+)
  • Microsoft Visual Studio C++ Compiler (7.1+)
  • Intel® C++ Compiler (8.x+)
  • AMD Optimizing C++ Compiler (1.2+)
  • Nvidia C++ Compiler (19.x+)
  • PGI C++ (10.x+)
  • IBM XL C/C++ (9.x+)
  • C++ Builder (XE4+)


Updates

  • 20111023: Updated to C++11, fixes to support cbegin/cend iterators and improved bounds checking
  • 20080108: Added comparison policy functionality
  • 20030226: Added loops for skip zero_or_more and skip to next pattern element
  • 20010524: Added user defined wildcard characters
  • 20010517: Initial release



© Arash Partow. All Rights Reserved.