Fuzzy logic in TSQL

Pain: Fuzzy logic in TSQL

For those who have worked as SQL developer must have gone through the tough period of working with manually entered data.
 
A certain company works with manual data entry for most of its early life and when IT enters (drumroll), they feel so happy to dump all those excel spreadsheets to create Master Data.

 
There is a famous saying “To err is human”
But for manual data entry, to err everywhere is truth. 

 
So when the IT guy (I wanted to add a gif here showing big specs, scrambled formals, atleast 3 pizza boxes on desk, eyes stretched out with no sleep in last 48 hours but still trying to find out the bug)
 
So when the IT guy, tries to match multiple rows of “similar” data, with no valid reference whatsoever, the problem arises.
 
I know this can be done very well in SSIS by Fuzzy Lookup but not everyone is acquainted with SSIS nor want to leave the warmth of stored procedure

Dream: Google “Did you mean” functionality

I dont think there is any another awesome, incredible, excellent solution to “similar” word problem in the world. Building Google’s “Did you mean” functionality is a task of lifetime

Fix: Use TSQL inbuilt function SOUNDEX()

This is one of the high potential but very less popular function of TSQL. It uses a very unique algorithm briefly explained here .

Lets see this amazing function in operation

We will make a quick table.
CREATE TABLE [dbo].[tblCompanyName]([Name] [nvarchar](50) NOT NULL).

Lets add a company called “ABC Holdings INC” in multiple ways to illustrate data entry problems.
INSERT INTO TBLCOMPANYNAME VALUES(‘ABC HOLDINGS’),(‘ABC HONDING LTD’),(‘ABC HOLDINGS INC’),(‘ABC HOLDING inc’),(‘abc HAOLDING INC’).

Now that same IT guy has challenge to assign all these rows with unique company code.

Lets see how our traditional method will work out here.

SELECT NAME FROM TBLCOMPANYNAME WHERE NAME LIKE ‘%ABC HOLDING%’.
Result

NAME

————————————————–

ABC HOLDINGS

ABC HOLDINGS INC

ABC HOLDING inc

(3 row(s) affected)

Lets do the same thing with SOUNDEX function.
SELECT NAME FROM TBLCOMPANYNAME WHERE SOUNDEX(NAME) = SOUNDEX(‘ABC HOLDINGS’).
Result

NAME

————————————————–

ABC HOLDINGS

ABC HONDING LTD

ABC HOLDINGS INC

ABC HOLDING inc

abc HAOLDING INC

(5 row(s) affected)

 

TADA

Needless to say LIKE operator in our traditional method is processor intensive too

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s