0 Comments

When running user interface that accepts search criteria where you have to build and run query against database, would it be wonderful to support rules similar to Google advanced search rules?

Search for an exact word or phrase

"search"

Use quotes to search for an exact word or set of words on a web page. This is helpful when searching for song lyrics or a line from a book. But only use this if you're looking for an exact word or phrase, otherwise you'll exclude many helpful results by mistake.

"imagine all the people"

Exclude a word

-search

Add a dash (-) before a word or site to exclude all results that include that word. This is especially useful for words with multiple meanings, like Jaguar the car brand and jaguar the animal. 

jaguar speed –car

Regular Expressions would be perfect for this task.

And even though I am no expert on regular expressions I can certainly try.

If you think about it, the underlying task here is to take a search sentence and break it down into a list of tokens which contains individual words or phrases. One list for positive criteria with every word or phrase that began with either a ‘+’ character in front or empty space. Another list of negative tokens, this would be the words or phrases prefixed with ‘-‘ character. Anything in double quotes should be considered as single token.

To build a regular expression for matching positive and negative search criteria you must use negative look behind to see if your search keyword started with ‘-‘ character or ‘+’ character. The problem is when you use look-behind with long search expressions in double quotes it becomes a challenge to ignore everything inside the double quotes. For example for negative search you might match the word positive even though it is inside the double quotes just because it has ‘-‘ character in front of it:

+”this is a -positive search

And vice versa can match something inside negative search criteria:

-"this is negative"

So, instead of writing more complex regular expression I would cheat, i.e. replace every double quoted phrase with a temporary token first. Would use the following function:

private static Dictionary<string string> TextInDoubleQuotesToTokenList(string data, out string outputData) 
{ 
    var items = new Regex("\"[^\"]*\""); 
    outputData = data; 
    var key = 0; 
    var matches = items.Matches(data).Cast<object>()
        .ToDictionary(match => Token + key++, match => match == null || string.IsNullOrEmpty(match.ToString()) ? 
        string.Empty : 
        match.ToString().Replace("\"", "")); 
    var count = 0; 
    foreach (var match in matches) 
    { 
        outputData = data.Replace(match.Value, Token + count++); 
    } 
    return matches; 
}

This function will replace any text in double quotes with temporary token. For example:

“this is a positive” –“this is negative” one two –three

Will be replaced with:

@1 –@2 one two –three

Function also returns back dictionary of replaced strings where the key is the token like @1 or @2.

This way you do not have to worry about any characters inside the double quotes. After matching all the negative and positive criteria replace the tokens with original strings from the dictionary and you are done.

Here is what positive match function looks like:

private static List GetPositiveSearchCriteria(string data)
{
    string outputData;
    var tokens = TextInDoubleQuotesToTokenList(data, out outputData);

    var positiveSearch = new List();            
    try
    {
        var exp = new Regex("((?<=(\"(?[^\\s\"+-][^\"]+))(?=\")|(?[\\w][^,\\s\\-\\+\"]+)(?![\\w\"])", RegexOptions.CultureInvariant);
        var matchList = exp.Matches(outputData);

        if (matchList.Count == 0)
        {
            return null;
        }

        for (var i = 0; i < matchList.Count; i++)
        {
            if (!matchList[i].Success)
                continue;

            var index = i;

            var key = matchList[index].Groups["PosSearch"].Value;
            if (string.IsNullOrEmpty(key))
                continue;

            positiveSearch.Add(tokens.ContainsKey(key) ? tokens[key] : key);
        }
    }
    catch(ArgumentException)
    {
        return null;
    }

    return positiveSearch;
}

And negative:

 

private static List GetNegativeSearchCriteria(string data)
{
    string outputData;
    var tokens = TextInDoubleQuotesToTokenList(data, out outputData);

    var negativeSearch = new List();
    try
    {
        var exp = new Regex("(?<=[\\-]\"?)((?<=\")(?[^\"]+)|(?[^\\s,\\+\\-\"]+))", RegexOptions.IgnorePatternWhitespace);
        var matchList = exp.Matches(outputData);

        if (matchList.Count == 0)
        {
            return null;
        }

        for (var i = 0; i < matchList.Count; i++)
        {
            if (!matchList[i].Success)
                continue;

            var index = i;

            var key = matchList[index].Groups["NegSearch"].Value;
            if (string.IsNullOrEmpty(key))
                continue;

            negativeSearch.Add(tokens.ContainsKey(key) ? tokens[key] : key);
        }
    }
    catch(ArgumentException)
    {
        return null;
    }

    return negativeSearch;
}

This is it. Once you have both lists of positive and negative criteria you can run SQL in and not in or if you are using Linq to SQL or Entity Framework can use contains function

var positive = GetPositiveSearchCriteria(searchDetails);
var negative = GetNegativeSearchCriteria(searchDetails);

if (positive != null)
{
	searchItems = from p in searchItems
  	where positive.Contains(p.ColumnName)
    select p;
}

if (negative != null)
{
	searchItems = from n in searchItems
  	where !negative.Contains(n.ColumnName)
    select n;
}

Cheers.

0 Comments

Exceptions is very fundamental concept in CLR and every developer uses them extensively, but I am still surprised how many times I see incorrect use of exceptions during code reviews. For example:

When is it appropriate to have an empty catch block? Too many times I see some variation of this:

try
{
    PerformOperation();
}
catch
{                    
    // Do nothing
}

In my opinion – this is always bad. At the very least there has to be some logging framework in place like log4net, or Enterprise Library or custom logging solution, or tracing. Besides, you cannot possibly anticipate every type of exception that PerformOperation() function might throw, so empty catch block is the equivalent of sweeping issues under the rug, which is careless and even dangerous.

How about this code?

try
{
    PerformOperation();
}
catch (Exception ex)
{                    
    throw ex;
}

Also very sloppy and absolutely not recommended. Rethrowing exception this way resets the starting point of the exception, so you loose all the valuable stack trace information that caused that exception in the first place. The appropriate way is to execute throw; And even then if exception becomes unhandled Windows will reset the stack’s starting point, so tough luck Sad smile. If this bothers you you can do the following suggestion straight from Richter’s book:

 

var trySucceeds = false;
try
{
    // method code goes here
    trySucceeds = true;
}
finally
{
    if (!trySucceeds)
    {
       // catch code goes here
    }
}

So, going back to the previous issue, here is the question: is the code below correct?

try
{
    PerformOperation();
}
catch (Exception)
{                    
    throw;
}

 

And the answer is – not really.

Even though the stack trace is preserved (possibly) and exception is re-thrown. There is no logging statement, there is no code to reset data into its original state, therefore there is no value in having try / catch block around that function. Let that exception bubble up the stack. Thus try / catch block in this instance is totally unnecessary.

This is important because it is unusual to have a single layer under your client code or UI. Usually there would be some dependency on a service layer or a database layer, so you call a function with try / catch that calls a function with try / catch that calls another one etc. And if every one of these layers catches everything … – you get the picture.

For example. Let’s say you have your logging framework in place. And let’s say the requirement is not to show any exceptions to the user, or there is no user, because you have some process calling your code so you do need to catch everything. So you write something like:

// Some logic on UI thread calling WCF service 
try
{
      var result = _serviceModel.CallService();
      // ...
}
catch (Exception ex)
{
     ProcessException(ex);
}

// then somewhere in ServiceModel class ... 
public string CallService ()
{
     try
     {
          OpenChannel();
          return Client.GetData();
     }
     catch(Exception ex)
     {
          LogException(ex);
          throw;
     }
     finally
     {
          CloseChannel();
     }
}

So “catch all” statements are bad but nested “catch all” statements are even worst.

 

My point is: that in case when you absolutely have to have “catch all” statement, then keep it on the very top level, everything else either catch a specific exception or let it propagate up the stack. So this should look like:

// Some logic on UI thread calling WCF service 
try
{
    var result = _serviceModel.CallService();
}

// hopefully the service is wrapping exceptions into specific faults. 
catch (FaultException ex)
{
     ProcessFault(ex);
}
catch (Exception ex)
{
     ProcessException(ex);
}

The lower code should be either this:

public string CallService ()
{
     try
     {
          OpenChannel();
          return Client.GetData();
     }
     finally
     {
          CloseChannel();
     }
}

Or this:

public string CallService ()
{
     try
     {
          OpenChannel();
          return Client.GetData();
     }
     catch(CommunicationException ex)
     {
          LogException(ex);
          return null;
     }
     finally
     {
          CloseChannel();
     }
}

There is a difference. When you catching specific exceptions you anticipate the failure, so there is no reason to throw it again. So catching everything statement should be reserved for unexpected faults only.

Same thing when making database calls, i.e. only catch specific exceptions like DbException or SQLxception and let the others bubble up.

Bottom line: catching everything is bad, catching everything inside the dependent layers of your code is worst, and having empty catch statement is just evil.

Cheers.