0 Comments

When running user interface that accepts search criteria where you have to build and run query against database, would it be wonderful to support rules similar to Google advanced search rules?

Search for an exact word or phrase

"search"

Use quotes to search for an exact word or set of words on a web page. This is helpful when searching for song lyrics or a line from a book. But only use this if you're looking for an exact word or phrase, otherwise you'll exclude many helpful results by mistake.

"imagine all the people"

Exclude a word

-search

Add a dash (-) before a word or site to exclude all results that include that word. This is especially useful for words with multiple meanings, like Jaguar the car brand and jaguar the animal. 

jaguar speed –car

Regular Expressions would be perfect for this task.

And even though I am no expert on regular expressions I can certainly try.

If you think about it, the underlying task here is to take a search sentence and break it down into a list of tokens which contains individual words or phrases. One list for positive criteria with every word or phrase that began with either a ‘+’ character in front or empty space. Another list of negative tokens, this would be the words or phrases prefixed with ‘-‘ character. Anything in double quotes should be considered as single token.

To build a regular expression for matching positive and negative search criteria you must use negative look behind to see if your search keyword started with ‘-‘ character or ‘+’ character. The problem is when you use look-behind with long search expressions in double quotes it becomes a challenge to ignore everything inside the double quotes. For example for negative search you might match the word positive even though it is inside the double quotes just because it has ‘-‘ character in front of it:

+”this is a -positive search

And vice versa can match something inside negative search criteria:

-"this is negative"

So, instead of writing more complex regular expression I would cheat, i.e. replace every double quoted phrase with a temporary token first. Would use the following function:

private static Dictionary<string string> TextInDoubleQuotesToTokenList(string data, out string outputData) 
{ 
    var items = new Regex("\"[^\"]*\""); 
    outputData = data; 
    var key = 0; 
    var matches = items.Matches(data).Cast<object>()
        .ToDictionary(match => Token + key++, match => match == null || string.IsNullOrEmpty(match.ToString()) ? 
        string.Empty : 
        match.ToString().Replace("\"", "")); 
    var count = 0; 
    foreach (var match in matches) 
    { 
        outputData = data.Replace(match.Value, Token + count++); 
    } 
    return matches; 
}

This function will replace any text in double quotes with temporary token. For example:

“this is a positive” –“this is negative” one two –three

Will be replaced with:

@1 –@2 one two –three

Function also returns back dictionary of replaced strings where the key is the token like @1 or @2.

This way you do not have to worry about any characters inside the double quotes. After matching all the negative and positive criteria replace the tokens with original strings from the dictionary and you are done.

Here is what positive match function looks like:

private static List GetPositiveSearchCriteria(string data)
{
    string outputData;
    var tokens = TextInDoubleQuotesToTokenList(data, out outputData);

    var positiveSearch = new List();            
    try
    {
        var exp = new Regex("((?<=(\"(?[^\\s\"+-][^\"]+))(?=\")|(?[\\w][^,\\s\\-\\+\"]+)(?![\\w\"])", RegexOptions.CultureInvariant);
        var matchList = exp.Matches(outputData);

        if (matchList.Count == 0)
        {
            return null;
        }

        for (var i = 0; i < matchList.Count; i++)
        {
            if (!matchList[i].Success)
                continue;

            var index = i;

            var key = matchList[index].Groups["PosSearch"].Value;
            if (string.IsNullOrEmpty(key))
                continue;

            positiveSearch.Add(tokens.ContainsKey(key) ? tokens[key] : key);
        }
    }
    catch(ArgumentException)
    {
        return null;
    }

    return positiveSearch;
}

And negative:

 

private static List GetNegativeSearchCriteria(string data)
{
    string outputData;
    var tokens = TextInDoubleQuotesToTokenList(data, out outputData);

    var negativeSearch = new List();
    try
    {
        var exp = new Regex("(?<=[\\-]\"?)((?<=\")(?[^\"]+)|(?[^\\s,\\+\\-\"]+))", RegexOptions.IgnorePatternWhitespace);
        var matchList = exp.Matches(outputData);

        if (matchList.Count == 0)
        {
            return null;
        }

        for (var i = 0; i < matchList.Count; i++)
        {
            if (!matchList[i].Success)
                continue;

            var index = i;

            var key = matchList[index].Groups["NegSearch"].Value;
            if (string.IsNullOrEmpty(key))
                continue;

            negativeSearch.Add(tokens.ContainsKey(key) ? tokens[key] : key);
        }
    }
    catch(ArgumentException)
    {
        return null;
    }

    return negativeSearch;
}

This is it. Once you have both lists of positive and negative criteria you can run SQL in and not in or if you are using Linq to SQL or Entity Framework can use contains function

var positive = GetPositiveSearchCriteria(searchDetails);
var negative = GetNegativeSearchCriteria(searchDetails);

if (positive != null)
{
	searchItems = from p in searchItems
  	where positive.Contains(p.ColumnName)
    select p;
}

if (negative != null)
{
	searchItems = from n in searchItems
  	where !negative.Contains(n.ColumnName)
    select n;
}

Cheers.