Marketing Research Roundtable  

  #1  
Old 04-24-2006, 07:21 AM
karanth karanth is offline
Apprentice
 
Join Date: Apr 2006
Location: Bangalore, India.
Posts: 12
Data considerations in cluster analysis

Hi Everybody,

SPSS offers three clustering procedures: TwoStep cluster analysis, K-means cluster analysis, and hierarchical clustering. When the sample is large, which method is appropriate when

1. The variables are ordinal (e.g., a 5-point Likert scale)?
2. The variables are binary (e.g., Yes/No data)?
3. The variables are continuous?

Also, how should I decide on the standardization method when the variables are of mixed type, e.g., a combination of binary and ordinal data?

In other words, can somebody comment on the data considerations in using cluster analysis?

Thanks for the help.
Karanth.

Last edited by karanth; 04-24-2006 at 07:24 AM.
  #2  
Old 04-24-2006, 08:39 AM
Carlos Michelsen Carlos Michelsen is offline
Troubadour
 
Join Date: Sep 2004
Location: London
Posts: 698

For starters, if you've got a large sample (1,500+), you would probably want to start with K-means rather than hierarchical clustering, unless you've got quite a lot of processing power at your disposal.
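
The post does not spell out why sample size matters. One common reason, offered here as an assumption rather than as Carlos's own argument, is that hierarchical clustering typically works from the distance between every pair of cases, and that count grows roughly with the square of the sample size. A minimal Python sketch of the arithmetic (the function name is made up for illustration):

```python
def pairwise_distance_count(n_cases: int) -> int:
    """Number of distinct case pairs, i.e. entries in one triangle of the distance matrix."""
    return n_cases * (n_cases - 1) // 2


# Illustrative sample sizes only; 1,500 is the threshold mentioned in the post.
for n in (150, 1_500, 15_000):
    print(f"{n:>6} cases -> {pairwise_distance_count(n):>12,} pairwise distances")
```

K-means, by contrast, only needs the distance from each case to each of the k cluster centres on every pass, which is why it tends to scale better to large samples.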

Your other option, depending on the SPSS version you've got, is TwoStep clustering, which accepts both categorical and continuous variables and even suggests the best cluster solution. I tried it once, but the solution it gave (2 clusters) was not that brilliant.

That leaves you with K-means; just be sure to standardize your variables (Z-scores) before including them in the procedure.
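
For readers who want to see what the Z-score step does, here is a minimal Python sketch using only the standard library. The data values and the function name `zscore_columns` are invented for illustration, and the sample standard deviation (n - 1 denominator) is an assumption; check which denominator your software uses.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each variable (column) to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [
        # A constant column has sd == 0; map it to 0.0 instead of dividing by zero.
        [(value - m) / sd if sd > 0 else 0.0 for value, m, sd in zip(row, means, sds)]
        for row in rows
    ]


# Hypothetical cases: age in years, income in dollars, a 1-5 Likert rating.
data = [
    [25, 30_000, 2],
    [40, 85_000, 4],
    [33, 52_000, 5],
    [58, 61_000, 1],
]
standardized = zscore_columns(data)
for row in standardized:
    print([round(value, 2) for value in row])
```

Without this step, a variable measured in large units (income here) would dominate the Euclidean distances that K-means relies on.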
  #3  
Old 04-24-2006, 01:56 PM
Statman Statman is offline
Duke
 
Join Date: Sep 2004
Location: Florida, USA
Posts: 1,074

Carlos gave a good abbreviated description, but I strongly suggest you simply go to Help > Topics in SPSS and search for "cluster". This should give you all you need and more. (I am running V14, but I don't think much has changed since, perhaps, V12.)

BTW, Carlos, I am not sure that K-means "requires" using Z-scores, only that the data be interval or ratio?
__________________
WMB
Statistical Services
SPSS Beta Site

mailto:info.statman@earthlink.net
http://home.earthlink.net/~info.statman
=======================================

Last edited by Statman; 04-24-2006 at 01:59 PM.
  #4  
Old 04-25-2006, 08:59 AM
karanth karanth is offline
Apprentice
 
Join Date: Apr 2006
Location: Bangalore, India.
Posts: 12

Hello,

Do I need to standardize the variables if all of them are of the same data type? For example, if all of my data are on a 5-point Likert scale, do I have to standardize the variables? Isn't standardization needed only when the data types are mixed?

Also, I see that K-means is applied to interval- and ratio-scale data. Is it applicable to ordinal data? Or to binary data?

Thanks,
Karanth
  #5  
Old 04-25-2006, 01:29 PM
Philip Moore Philip Moore is offline
Knight
 
Join Date: Sep 2004
Location: Centreville, VA
Posts: 384
A little trick

One thing that always gets me with cluster analysis using Likert-scale inputs is those two dang clusters that always seem to form: respondents who tend to use the top end of the scale most of the time, and respondents who tend to use the bottom end of the scale most of the time. This little trick has been successful for me.

Rather than normalizing on the average score for the survey question, I first like to normalize on the average score for the respondent. This assumes that each individual respondent has their own internal reference point rather than that defined by the scale. Then all I'm clustering on is how the respondent considers each measure relative to their internal reference point.

I know I'm bad, but sometimes you just have to trick the respondents into being useful despite their best efforts to thwart us.
  #6  
Old 04-25-2006, 03:42 PM
Statman Statman is offline
Duke
 
Join Date: Sep 2004
Location: Florida, USA
Posts: 1,074

So right, Philip, wrt the respondent, and an interesting normalization.

BTW Philip, are the scales now "scale," still ordinal, or interval? [Refer back to the thread on measurement scales]

S
  #7  
Old 04-26-2006, 02:14 PM
Philip Moore Philip Moore is offline
Knight
 
Join Date: Sep 2004
Location: Centreville, VA
Posts: 384
transformation

Quote:
Originally Posted by Statman
Philip, are the scales now "scale," still ordinal or interval? [Refer back to the thread on measurement scales]

S

Good question,

Before the transformation, the survey measures are ordinal Likert values.

After the transformation, each measure is the number of standard deviations by which the rating differs from the respondent's own mean across all of their Likert ratings, and so is interval data.

I'm thinking at this point some of the readers may be lost without an example so here's what we're talking about.

Lots of surveys have question series like this:

How strongly do you agree or disagree with the following statements, 7=strongly agree, 1=strongly disagree, 4 = neither/neutral

___ This forum is fabulous
___ I want people to know I use this forum
___ Scott Spain is a studd
___ Statman's posts are informative
___ Philip's posts are confusing
___ This forum is better than other forums like it

etc

Often, with this type of series you will get respondent data that looks like:

case 1: 6,7,6,7,7,5
case 2: 7,7,5,5,6,7
case 3: 4,5,4,4,5,5
case 4: 2,3,2,2,2,3

Clearly the variation across the measures is more a function of how the respondents use the scale than a function of actual variation across the measures. When I encounter this pattern in data, I control for the scale-use tendencies of each respondent by calculating the mean and standard deviation of all of that respondent's answers on the same question type. I then normalize each measure by subtracting the scale-use mean from the response and dividing by the scale-use standard deviation. So the questions where the respondent deviates most from their scale-use mean have the largest values (positive or negative).

These normalized values usually give me a much more robust and meaningful cluster solution.
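
The within-respondent normalization described above can be sketched in a few lines of Python, applied to the four example cases. This is a rough illustration, not Philip's actual procedure: the function name is made up, the choice of the sample standard deviation is an assumption (the post does not say which denominator is used), and so is the handling of respondents who give the same answer to every item.

```python
from statistics import mean, stdev


def standardize_within_respondent(ratings):
    """Express each rating as standard deviations from this respondent's own mean."""
    scale_use_mean = mean(ratings)
    scale_use_sd = stdev(ratings)  # assumption: sample SD (n - 1 denominator)
    if scale_use_sd == 0:
        # Assumption: a respondent who gives identical answers carries no relative signal.
        return [0.0] * len(ratings)
    return [(r - scale_use_mean) / scale_use_sd for r in ratings]


# The four example respondents from the post (7-point agreement scale, six items).
cases = {
    "case 1": [6, 7, 6, 7, 7, 5],
    "case 2": [7, 7, 5, 5, 6, 7],
    "case 3": [4, 5, 4, 4, 5, 5],
    "case 4": [2, 3, 2, 2, 2, 3],
}

standardized = {
    label: standardize_within_respondent(ratings) for label, ratings in cases.items()
}
for label, values in standardized.items():
    print(label, [round(v, 2) for v in values])
```

Here the "scale-use mean" and "scale-use standard deviation" are simply the mean and standard deviation of one respondent's own row of answers. After the transformation every respondent has mean 0 and standard deviation 1, so the clustering sees only which items each person rated high or low relative to their own habits.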
  #8  
Old 09-28-2006, 04:03 PM
ppal ppal is offline
Apprentice
 
Join Date: Sep 2006
Posts: 3

Philip,
Could you please clarify/explain a bit more what you mean by "scale-use mean" and "scale-use standard deviation"? Thanks.