Marketing Research Roundtable  

Marketing Research Roundtable > This Is Research Stuff > General Research Discussion

  #1  
06-25-2008, 11:02 AM
woaini
Apprentice
 
Join Date: Jun 2008
Posts: 5
Data validation tool?

Hi Gurus, I am wondering whether anybody knows what data validation tools are currently available other than the SPSS Data Preparation module. We want to find something that can automate our data QA process, i.e., identify and flag cases that have out-of-range values, unlabeled values, or the wrong missing-data settings. We have a large number of variables to check on a frequent basis, so such a tool would really help. Our data is in SPSS format, but we'd like to explore products outside the SPSS world as well. I'd appreciate any information you could share. Thanks in advance!
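The value-level checks described here (out-of-range values, unlabeled values, declared missing codes) can be driven from a per-variable spec. The sketch below is a minimal, hypothetical Python illustration, not a description of how the SPSS Data Preparation module works: the `SPEC` layout, variable names and codes are all assumptions, and in practice the spec would be built from the data dictionary.

```python
from typing import Any

# Hypothetical spec: valid range, labeled codes and declared missing codes per variable.
# An empty "labels" dict means the variable is continuous and is not label-checked.
SPEC: dict[str, dict[str, Any]] = {
    "gender": {"min": 1, "max": 2, "labels": {1: "Male", 2: "Female"}, "missing": {9}},
    "age": {"min": 18, "max": 99, "labels": {}, "missing": {-1}},
    "region": {"min": 1, "max": 5,
               "labels": {1: "North", 2: "South", 3: "East", 4: "West"}, "missing": {9}},
}


def check_values(records: list[dict[str, Any]],
                 spec: dict[str, dict[str, Any]]) -> list[tuple[int, str, Any, str]]:
    """Return (record index, variable, value, problem) for every flagged value."""
    flags = []
    for i, rec in enumerate(records):
        for var, rules in spec.items():
            value = rec.get(var)
            if value is None or value in rules["missing"]:
                continue  # blank or a declared missing code: not range-checked here
            if not rules["min"] <= value <= rules["max"]:
                flags.append((i, var, value, "out of range"))
            elif rules["labels"] and value not in rules["labels"]:
                # In range, but the dictionary has no label for this code
                flags.append((i, var, value, "unlabeled value"))
    return flags


records = [
    {"gender": 1, "age": 34, "region": 2},
    {"gender": 3, "age": 17, "region": 5},
    {"gender": 9, "age": -1, "region": 1},
]
for flag in check_values(records, SPEC):
    print(flag)
```

Keeping the rules in data rather than in code means the same function can be run unchanged over hundreds of variables each time a new file arrives.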
  #2  
06-25-2008, 11:50 AM
phil_hearn
Troubadour
 
Join Date: Sep 2004
Location: Kuala Lumpur (Malaysia), Pattaya (Thailand) and Tonbridge (UK)
Posts: 717

There are several types of data validation. I am not sure whether you have a specific preference or not. I am also not sure about the level of sophistication you want.

1) Double entry - this means that a different person keys the data a second time so that it can be compared, then cleaned
2) Verification - this is where a different person keys the data a second time but can choose whether to override the first person's data or use the original
3) Batch data checking - where the checks are specified in some way then run on a batch of data - the output would be a report that allows the user to correct data
4) Interactive data checking - where the checks are specified in some way so that errors are displayed interactively for correction
5) Checking & fixing - errors are reported, changes are made through some interface and then the data is re-checked, usually automatically
6) Any of the above but with tools to make global fixes on all appropriate records satisfying a particular condition
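The comparison step in option 1 can be sketched as follows. This is a minimal Python illustration, assuming both keyings are available as lists of records that share a respondent ID; the `resp_id` key and the field names are hypothetical.

```python
def compare_double_entry(first, second, key="resp_id"):
    """List every difference between two independent keyings of the same forms.

    Returns (id, field, first_value, second_value) tuples, sorted by id and field.
    A record keyed in only one pass is reported with the field "<record>".
    """
    by_id_1 = {rec[key]: rec for rec in first}
    by_id_2 = {rec[key]: rec for rec in second}
    diffs = []
    for rid in sorted(by_id_1.keys() | by_id_2.keys()):
        rec1, rec2 = by_id_1.get(rid), by_id_2.get(rid)
        if rec1 is None or rec2 is None:
            diffs.append((rid, "<record>",
                          "present" if rec1 is not None else "missing",
                          "present" if rec2 is not None else "missing"))
            continue
        for field in sorted(rec1.keys() | rec2.keys()):
            if rec1.get(field) != rec2.get(field):
                diffs.append((rid, field, rec1.get(field), rec2.get(field)))
    return diffs


first_pass = [{"resp_id": 1, "q1": 3, "q2": 5}, {"resp_id": 2, "q1": 1, "q2": 2}]
second_pass = [{"resp_id": 1, "q1": 3, "q2": 4}, {"resp_id": 3, "q1": 2, "q2": 2}]
for diff in compare_double_entry(first_pass, second_pass):
    print(diff)
```

Each mismatch would then be resolved against the source questionnaire, which is the "then cleaned" part of double entry.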

There are different packages available for all of these solutions. Options 1 and 2 usually come with the data entry software, whereas options 3-5 are usually driven by a cleaning spec, typically written in some language or command-driven format that specifies the checks. Fixes can then be made either directly on the data or through some other interface.
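A batch check driven by a cleaning spec (option 3) can be sketched like this. In this minimal Python illustration the "spec" is just a hypothetical list of (description, test) pairs, where each test returns True when a record fails; real packages use their own spec languages, and the rules and field names here are invented.

```python
# Hypothetical cleaning spec: each test returns True when the record FAILS the check.
RULES = [
    ("Q2 must be blank unless Q1 == 1 (routing)",
     lambda r: r.get("q1") != 1 and r.get("q2") is not None),
    ("Q2 is required when Q1 == 1",
     lambda r: r.get("q1") == 1 and r.get("q2") is None),
    ("Age must be 18 or over",
     lambda r: r.get("age") is not None and r["age"] < 18),
]


def run_batch(records, rules, key="resp_id"):
    """Run every rule over every record; return a report of (id, failed rule) pairs."""
    report = []
    for rec in records:
        for description, fails in rules:
            if fails(rec):
                report.append((rec[key], description))
    return report


data = [
    {"resp_id": 101, "q1": 1, "q2": 4, "age": 25},
    {"resp_id": 102, "q1": 2, "q2": 3, "age": 40},
    {"resp_id": 103, "q1": 1, "q2": None, "age": 16},
]
for rid, problem in run_batch(data, RULES):
    print(rid, problem)
```

The report is what the user works from to correct the data; re-running the same rules after corrections gives the automatic re-check described in option 5.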

My preference is to leave data in its originally entered state, putting fixes into Excel, so that an easily readable audit control can report what caused the error, who made the error, who fixed the error and whether the fix caused other errors. I prefer this approach because it is easy to undo anything that is wrongly corrected. However, you need software capable of reading Excel spreadsheets and merging/overwriting the data to do this.
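A sketch of keeping fixes separate from the original data, under stated assumptions: the Excel template has been exported to CSV (reading .xlsx directly would need a third-party library), and its columns (`resp_id`, `field`, `old_value`, `new_value`, `reason`, `fixed_by`) are hypothetical. Fixes are applied to a copy, so the original stays in its entered state, and each fix is logged as applied or rejected.

```python
import copy
import csv
import io

# Hypothetical fix sheet, as exported from an Excel template to CSV.
FIXES_CSV = """resp_id,field,old_value,new_value,reason,fixed_by
102,q2,3,,Q2 asked only if Q1 = 1,AB
103,age,16,61,Digits transposed on keying,CD
104,q1,2,1,Checked against questionnaire,AB
"""


def _parse(text):
    """Convert a cell to int where possible; an empty cell means 'set to blank'."""
    if text == "":
        return None
    try:
        return int(text)
    except ValueError:
        return text


def apply_fixes(original, fixes_csv, key="resp_id"):
    """Apply fixes to a deep copy of the data; the original records are never modified.

    A fix is applied only if the record exists and its current value still equals
    old_value, so stale fixes are logged as rejected instead of silently overwriting.
    """
    cleaned = copy.deepcopy(original)
    by_id = {rec[key]: rec for rec in cleaned}
    audit = []
    for fix in csv.DictReader(io.StringIO(fixes_csv)):
        rec = by_id.get(_parse(fix["resp_id"]))
        if rec is None or rec.get(fix["field"]) != _parse(fix["old_value"]):
            audit.append({**fix, "status": "rejected"})
            continue
        rec[fix["field"]] = _parse(fix["new_value"])
        audit.append({**fix, "status": "applied"})
    return cleaned, audit


data = [
    {"resp_id": 102, "q1": 2, "q2": 3, "age": 40},
    {"resp_id": 103, "q1": 1, "q2": 2, "age": 16},
]
cleaned, audit = apply_fixes(data, FIXES_CSV)
for entry in audit:
    print(entry["resp_id"], entry["field"], entry["status"], entry["fixed_by"])
```

Undoing a wrong correction is then just deleting its row from the sheet and re-running, and the cleaned copy can be re-checked to see whether any fix introduced new errors.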
  #3  
06-27-2008, 06:20 PM
woaini
Apprentice
 
Join Date: Jun 2008
Posts: 5

Quote:
Originally Posted by phil_hearn
1) Double entry - this means that a different person keys the data a second time so that it can be compared, then cleaned
2) Verification - this is where a different person keys the data a second time but can choose whether to override the first person's data or use the original

Phil, the data we receive is already double keyed and verified.

Quote:
Originally Posted by phil_hearn
3) Batch data checking - where the checks are specified in some way then run on a batch of data - the output would be a report that allows the user to correct data
4) Interactive data checking - where the checks are specified in some way so that errors are displayed interactively for correction

We have logic checks at a later processing phase.

We produce our final data in SPSS and need to run QA on it. Every time, we produce many tables to check that there are no out-of-range values, that all data values are labeled, that the missing-value settings are correct for all 500 variables, and that all variables themselves are labeled. It is a very manual process. We know that the SPSS Data Preparation module can automate part of our QA. We wonder whether there are other similar products on the market that might be better and more cost-efficient. If you know of such a product, I'd appreciate it if you could let me know. Thanks very much.
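The dictionary-level part of this QA (every variable labeled, missing-value settings correct) can be sketched as below. This minimal Python illustration assumes the variable metadata has already been exported from the .sav file into plain Python structures; the `METADATA` layout and the `EXPECTED_MISSING` standard are hypothetical, not the format of any particular SPSS reader.

```python
# Hypothetical exported dictionary: variable label and declared missing codes.
METADATA = {
    "q1": {"label": "Overall satisfaction", "missing": [9]},
    "q2": {"label": "", "missing": [9]},
    "age": {"label": "Age in years", "missing": []},
}

# Hypothetical QA standard: the missing-value codes each variable should declare.
EXPECTED_MISSING = {"q1": [9], "q2": [9], "age": [-1]}


def check_dictionary(metadata, expected_missing):
    """Flag variables with no label or with missing codes that differ from the standard."""
    problems = []
    for var in sorted(metadata):
        info = metadata[var]
        if not info.get("label", "").strip():
            problems.append((var, "no variable label"))
        declared = sorted(info.get("missing", []))
        wanted = sorted(expected_missing.get(var, []))
        if declared != wanted:
            problems.append((var, f"missing values {declared}, expected {wanted}"))
    return problems


for var, problem in check_dictionary(METADATA, EXPECTED_MISSING):
    print(var, problem)
```

Run over all 500 variables, a list like this replaces reading tables by eye: only the variables that need attention are printed.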
  #4  
06-28-2008, 01:07 AM
phil_hearn
Troubadour
 
Join Date: Sep 2004
Location: Kuala Lumpur (Malaysia), Pattaya (Thailand) and Tonbridge (UK)
Posts: 717

Woaini

It sounds as though you need a flexible and powerful data cleaning tool, so that you know about any data errors and any logical inconsistencies. The product we sell is MRDCL, a full data editing, cleaning, reporting and tabulation package. There are a few competitors in this market but, to be honest, not many. I am not aware of any powerful packages that do only 'high-level' data cleaning. You may feel that my view is biased, but as the operator of a successful data processing bureau where data cleaning is a key part of our work, I consider the following tools important:

1) Powerful tools to find errors
2) Ability to run edits on batches or individual records
3) Easy ways to correct data (I like my staff to specify changes in Excel using an easy-to-use template)
4) Ways to have an audit trail of each record, so that you can easily see what has been changed
5) Automated reports on the level of data changes
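Points 4 and 5 can be illustrated with a simple change log. This minimal Python sketch assumes each change is recorded as one entry holding the record ID, field, old and new values and who made it; the log layout and values are hypothetical and do not represent MRDCL's own format.

```python
from collections import Counter

# Hypothetical audit log: one entry per change made during cleaning.
CHANGE_LOG = [
    {"resp_id": 102, "field": "q2", "old": 3, "new": None, "by": "AB"},
    {"resp_id": 103, "field": "age", "old": 16, "new": 61, "by": "CD"},
    {"resp_id": 103, "field": "q2", "old": None, "new": 2, "by": "CD"},
    {"resp_id": 110, "field": "q2", "old": 7, "new": 1, "by": "AB"},
]


def history(log, resp_id):
    """Audit trail for one record: every change to it, in the order it was made."""
    return [(e["field"], e["old"], e["new"], e["by"]) for e in log if e["resp_id"] == resp_id]


def change_report(log, total_records):
    """Summary of how much the data was changed, for an automated report."""
    records_changed = {e["resp_id"] for e in log}
    return {
        "changes": len(log),
        "records_changed": len(records_changed),
        "pct_records_changed": round(100 * len(records_changed) / total_records, 1),
        "changes_by_field": dict(Counter(e["field"] for e in log)),
        "changes_by_person": dict(Counter(e["by"] for e in log)),
    }


print(history(CHANGE_LOG, 103))
print(change_report(CHANGE_LOG, total_records=200))
```

A field with an unusually high change count in such a report often points to a questionnaire or keying problem rather than to isolated errors.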

Please contact me directly if you want more info.
phil.hearn@mrdcsoftware.com
  #5  
06-28-2008, 06:28 AM
Statman
Duke
 
Join Date: Sep 2004
Location: Florida, USA
Posts: 1,074

Are you looking outside SPSS for cost reasons, or for another application to do the validation?

If the former, that may be difficult, since the alternatives are pretty much competitively priced, all else being equal.

If the latter, my question is: why? What version of SPSS are you running? Since about V14 or V15, its data module has been greatly enhanced, with support for multiple layers of cleaning options (I am speaking of the Base system and Client system, not the Data Entry Builder).
__________________
WMB
Statistical Services
SPSS Beta Site

mailto:info.statman@earthlink.net
http://home.earthlink.net/~info.statman
=======================================
  #6  
01-30-2010, 06:44 AM
bulat
Apprentice
 
Join Date: Jan 2010
Posts: 1
Open Source Data Validation Option

Hello,

Try Flat File Checker, which supports the most commonly used data validation rules:
  • Unique constraint on one or more fields
  • Comparison of values
  • Required fields
  • Relational links
  • Back link to the database
  • And more...

The beauty of Flat File Checker is that you can create a data specification through an intuitive GUI and then execute it from the command line.
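Three of the rule types listed above can be illustrated generically. The functions below are not Flat File Checker's API or specification format; they are a hypothetical Python sketch of a unique constraint on two fields, a required field, and a relational link to a second file, applied to made-up CSV data. Reported row numbers are file line numbers, counting the header as line 1.

```python
import csv
import io
from collections import Counter

# Hypothetical input files, inlined so the example is self-contained.
RESPONDENTS = """resp_id,wave,region_code
1,1,N
2,1,S
2,1,S
3,1,
4,1,X
"""
REGIONS = """region_code,name
N,North
S,South
"""


def read(text):
    return list(csv.DictReader(io.StringIO(text)))


def check_unique(rows, fields):
    """Unique constraint on one or more fields: flag every row that shares a key."""
    def key(r):
        return tuple(r[f] for f in fields)
    counts = Counter(key(r) for r in rows)
    return [(line, "duplicate " + "+".join(fields))
            for line, r in enumerate(rows, start=2) if counts[key(r)] > 1]


def check_required(rows, field):
    """Required field: flag rows where the field is empty."""
    return [(line, f"{field} is required")
            for line, r in enumerate(rows, start=2) if not r[field].strip()]


def check_link(rows, field, parent_rows, parent_field):
    """Relational link: every non-empty value must exist in the lookup file."""
    valid = {p[parent_field] for p in parent_rows}
    return [(line, f"{field} {r[field]!r} not in lookup file")
            for line, r in enumerate(rows, start=2) if r[field] and r[field] not in valid]


rows = read(RESPONDENTS)
problems = sorted(
    check_unique(rows, ["resp_id", "wave"])
    + check_required(rows, "region_code")
    + check_link(rows, "region_code", read(REGIONS), "region_code")
)
for problem in problems:
    print(problem)
```

Keeping each rule type as a separate function mirrors the idea of a declarative spec: the rules to run, and the fields they apply to, are listed once at the bottom.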


All times are GMT -5.

