#1
Hi Gurus, I am wondering whether anybody knows what data validation tools are currently available other than the SPSS Data Preparation module. We want to find something that can automate our data QA process, i.e., identify and flag cases that have out-of-range or unlabeled values or that use the wrong missing data setting. We have a large number of variables to check on a frequent basis, so such a tool would really help. Our data is in SPSS format, but we'd like to explore products outside the SPSS world as well. I'd appreciate any information you could share. Thanks in advance!
|
|
#2
There are several types of data validation. I am not sure whether you have a specific preference or not. I am also not sure about the level of sophistication you want.
1) Double entry - a different person keys the data a second time so that the two versions can be compared and then cleaned
2) Verification - a different person keys the data a second time but can choose whether to override the first person's data or keep the original
3) Batch data checking - the checks are specified in some way and then run on a batch of data; the output is a report that allows the user to correct the data
4) Interactive data checking - the checks are specified in some way so that errors are displayed interactively for correction
5) Checking & fixing - errors are reported, changes are made through some interface and then the data is re-checked, usually automatically
6) Any of the above, but with tools to make global fixes on all records satisfying a particular condition

There are different packages available for all of these solutions. Options 1 and 2 usually come with the data entry software, whereas options 3-5 are usually driven by some cleaning spec, probably using a language or command-driven approach to specify the checks. Fixes can then be made on the real data or through some other interface.

My preference is to leave the data in its originally entered state and put the fixes into Excel, so that an easily readable audit control can report what caused the error, who made the error, who fixed the error and whether the fix caused other errors. I prefer this approach because it is easy to undo anything that is wrongly corrected. However, to do this you need software capable of reading Excel spreadsheets and merging/overwriting the data.
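To make the last point a bit more concrete, here is a minimal sketch in Python of keeping the entered data untouched and applying fixes from a separate table. It is only an illustration, not any particular package's method: the variable names, the column names in the fix rows (`variable`, `new_value`, `fixed_by`, `reason`) and the function `apply_fixes` are all made up, and in practice the fix rows would be read from the spreadsheet rather than typed in.

```python
import copy

# Records as originally keyed; these are never modified in place.
# All ids, variables and values are invented for illustration.
original = [
    {"id": 1, "age": 34, "gender": 1},
    {"id": 2, "age": 340, "gender": 2},
    {"id": 3, "age": 27, "gender": 9},
]

# Fixes as they might be exported from a spreadsheet: one row per correction.
fixes = [
    {"id": 2, "variable": "age", "new_value": 34,
     "fixed_by": "JS", "reason": "keying error"},
    {"id": 3, "variable": "gender", "new_value": 2,
     "fixed_by": "JS", "reason": "checked questionnaire"},
]


def apply_fixes(records, fixes):
    """Return (cleaned copy, audit log); the input records are left untouched."""
    cleaned = {r["id"]: copy.deepcopy(r) for r in records}
    audit = []
    for fix in fixes:
        record = cleaned[fix["id"]]
        # Log the old value before overwriting so the change can be traced.
        audit.append({
            "id": fix["id"],
            "variable": fix["variable"],
            "old_value": record[fix["variable"]],
            "new_value": fix["new_value"],
            "fixed_by": fix["fixed_by"],
            "reason": fix["reason"],
        })
        record[fix["variable"]] = fix["new_value"]
    return list(cleaned.values()), audit


cleaned, audit = apply_fixes(original, fixes)
for entry in audit:
    print(entry)
```

Undoing a wrong correction is then just a matter of deleting that row from the fix table and re-running, since the original records were never changed.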
|
#3
Quote:
|
|
#4
Woanini,

It sounds as though you need a flexible and powerful data cleaning tool, so that you know about any data errors and any logical inconsistencies. The product we sell is MRDCL, which is a full data editing/cleaning/reporting and tabulation package. There are a small number of competitors in this market, but, to be honest, not many. I am not aware of any powerful packages that do only 'high level' data cleaning.

Although you may feel that my view is biased, as the operator of a successful data processing bureau where data cleaning is a key aspect of our work, I would say there are several tools that are important:

1) Powerful tools to find errors
2) Ability to run edits on batches or individual records
3) Easy ways to correct data (I like my staff to specify changes in Excel using an easy-to-use template)
4) Ways to have an audit trail of each record, so that you can easily see what has been changed
5) Automated reports on the level of data changes

Please contact me directly if you want more info. phil.hearn@mrdcsoftware.com
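As a rough illustration of points 1 and 2 (finding errors and running edits on a batch), here is a small sketch in Python. This is not MRDCL or SPSS syntax; the spec layout, the variable names and the function `check_batch` are assumptions made up for the example. It flags out-of-range values and unlabelled codes, and skips codes that are declared as missing.

```python
from collections import Counter

# Hypothetical cleaning spec: allowed range, labelled codes, declared missing codes.
spec = {
    "age": {"min": 18, "max": 99, "missing": {-9}},
    "gender": {"labels": {1: "Male", 2: "Female"}, "missing": {9}},
}

# A small invented batch of records.
records = [
    {"id": 1, "age": 34, "gender": 1},
    {"id": 2, "age": 340, "gender": 2},
    {"id": 3, "age": -9, "gender": 3},
]


def check_batch(records, spec):
    """Return a list of (record id, variable, value, problem) tuples."""
    errors = []
    for rec in records:
        for var, rule in spec.items():
            value = rec.get(var)
            if value in rule.get("missing", set()):
                continue  # declared missing code, not an error
            if value is None:
                errors.append((rec["id"], var, value, "no value"))
            elif "min" in rule and not rule["min"] <= value <= rule["max"]:
                errors.append((rec["id"], var, value, "out of range"))
            elif "labels" in rule and value not in rule["labels"]:
                errors.append((rec["id"], var, value, "unlabelled code"))
    return errors


errors = check_batch(records, spec)
for err in errors:
    print(err)

# Simple per-variable count of problems found in the batch.
summary = Counter(var for _, var, _, _ in errors)
print(dict(summary))
```

The same function works on a single record by passing a one-item list, and the error list could be written to a spreadsheet for whoever specifies the corrections.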
|
#5
Are you looking outside SPSS for cost reasons, or because you want another application to do the validation?
If the former, that might be difficult, since your options are all pretty much competitively priced, all else being equal. If the latter, then my question is: why? What version of SPSS are you running? Since about V14 or V15 its data module has been greatly enhanced, with the ability to apply multiple layers of cleaning options (I am speaking of the Base system, Client system and not the Data Entry Builder).
__________________
WMB Statistical Services SPSS Beta Site mailto:info.statman@earthlink.net http://home.earthlink.net/~info.statman ======================================= |
|
#6
Hello,
Try Flat File Checker, which has all of the most commonly used data validation rules:
The beauty of Flat File Checker is that you can create a data specification through an intuitive GUI and then execute it from the command line.