Conjoint Analysis is the most powerful and important family of analytic techniques in all of marketing research. How else can you answer, all at once, questions of such strategic and tactical magnitude?
Conjoint Analysis is only the best method, however, if you do it right. Part of the fun of Conjoint Analysis is that it has many moving parts. But the more complex the approach, the more ways to go wrong.
It has been said that the term “Conjoint” was derived from two words: “considered jointly.” I cannot confirm that this is true but it does illustrate the fundamental idea behind Conjoint Analysis. In Conjoint Analysis, products or services are described by sets of attribute values or levels and respondents’ purchase interest is then measured. Thus, a respondent might be shown a red Ford pick-up with a V-8 engine priced at $20,000. He or she must “consider jointly” all of the attributes describing that pick-up when deciding how interested he or she is in purchasing the vehicle.
The primary purpose of Conjoint Analysis is to model human behavior, usually purchase behavior. By measuring purchase interest in a “complete” product or service, Conjoint Analysis captures the essential dilemma of market choice: the “perfect” product is seldom available but lesser alternatives are. By forcing respondents to trade-off competing values and needs, Conjoint Analysis is able to uncover purchase motivations that the respondent may be unwilling to admit to and, sometimes, may even be unaware that he or she has.
Conjoint Analysis addresses big issues with very specific answers. As a result, when Conjoint Analysis fails, it often fails spectacularly. Nonsense conclusions such as “doubling price will double sales” do not sit well with experienced marketers. Study disasters contribute not only to the poor reputation of Conjoint Analysis within some organizations but also to the reputation of the marketing research department in general.
Conjoint failures are generally the result of researchers who do not know how to, or at least fail to, properly design their Conjoint studies and/or correctly interpret the output. Powerful, user-friendly software gives us opportunities to make mistakes we may not even be aware of.
If you want to do Conjoint Analysis right, be aware of the ways you can do it wrong.
Conjoint Analysis is an ever-growing family of techniques that can be broken into three main branches: Ratings-based Conjoint, Choice-based Conjoint and hybrid techniques.
For this article, I will not include self-explicated scaling as a stand-alone Conjoint technique since it does not force respondents to make trade-offs. You can refer to Larry Gibson’s article in the Winter 2001 issue of Marketing Research for an enthusiastic discussion of the virtues of self-explicated scaling.
The first step in doing Conjoint right is to pick the most appropriate method for your particular objectives and circumstances. In principle, the right technique will be the one that most closely mimics your marketplace dynamics. In practice, that will most often be Choice-based Conjoint. Choice-based Conjoint offers respondents a series of choice sets, generally two to five alternative products. Respondents can pick any of the available alternatives or even elect not to buy, if none of the alternatives in that choice set are sufficiently attractive. This format closely mimics buying environments in markets with competition.
Ratings-based Conjoint involves monadically rating individual product alternatives or pairwise rating two product alternatives simultaneously. No-buy options are not easily accommodated in Ratings-based Conjoint. Ratings-based Conjoint may be more appropriate for non-competitive markets, such as oligopolies, monopolies or emerging categories.
Hybrid techniques, approaches which combine self-explicated scaling with either Ratings-based Conjoint or Choice-based Conjoint, are generally most appropriate when a large number of attributes must be included. ACA (Adaptive Conjoint Analysis) is the best-known and most widely used example of a hybrid technique.
Both Ratings-based Conjoint and Choice-based Conjoint can be conducted as full-profile or partial-profile studies. Full-profile tasks involve one level from every attribute in the study. If there are six attributes in your full-profile study, then each product alternative will have six attribute levels which define it.
Partial-profile tasks involve a subset of the total set of attributes. If there are six attributes in your partial-profile study, then each product alternative may have two or three attribute levels which define it.
Full-profile studies should ideally contain no more than six attributes. The critical issue is to define products that are simple enough to be understood by respondents. If your attributes are extremely complex and unfamiliar, perhaps six is too many. If your attributes are extremely simple and familiar, perhaps you may be able to include more than six.
Partial-profile designs can include up to 50 or more attributes. Partial-profile designs, a relatively recent development in Conjoint Analysis, typically compete with hybrid designs when a large number of attributes needs to be included.
Full-profile designs are generally preferred over partial-profile designs when the number of attributes is sufficiently small because full-profile designs can accommodate interaction terms more easily, require less sample and are more familiar to most market researchers. Full-profile designs are generally preferred over hybrid designs when the number of attributes is sufficiently small because hybrid designs usually cannot accommodate interaction terms and are considered to employ a less natural question format.
A potential concern for any approach that accommodates a large number of attributes is Attribute Additivity (AA). Seldom mentioned in the literature, AA is the phenomenon in which a large number of less important attributes overwhelms one or two extremely important ones through sheer numbers. For example, a feature-rich product may have more total utility than a low-priced one simply because the small utility weights of the various product features, when summed, exceed the utility weight of the price attribute. There is currently no consensus on the “right” way to address this problem. One possible approach is to limit, at the individual level, the number of attributes included in model simulations to the six most important. This is consistent with the rationale for limiting the number of attributes in a Conjoint task to six.
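To make the additivity arithmetic concrete, here is a minimal sketch with invented utility values: seven minor features, each worth a small positive utility, collectively outweigh the single most important attribute, price.

```python
# Invented part-worth utilities for a single respondent (illustration only).
small_feature_utils = [0.3] * 7   # seven minor features, each mildly attractive
price_util_high = -1.0            # strong dislike of the high price
price_util_low = 1.0              # strong preference for the low price

# Feature-rich product at the high price vs. a bare-bones product at the low price.
feature_rich_total = sum(small_feature_utils) + price_util_high  # 2.1 - 1.0 = 1.1
low_priced_total = price_util_low                                # 1.0

# The many small utilities swamp the single largest one: the AA problem.
print(feature_rich_total > low_priced_total)  # True
```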
Once market research objectives are clearly defined, attributes and levels must be specified in such a way that the subsequent analysis can address the objectives. If one objective is to understand the impact of the introduction of a new brand into your category, for example, it is essential that brand be an attribute in your study and the new brand be a level within the brand attribute.
There are two attribute-related issues that you must be aware of which continue to be problematic: the Number of Levels (NOL) effect and the attribute range effect.
NOL is the phenomenon that attribute importance is affected by the number of levels specified in the design. For example, if Price has two levels, $6 and $12, in one study and Price has four levels, $6, $8, $10, and $12, in another study that is exactly the same as the first (except for the Price levels), Price in the second study will be more important than it was in the first. Other than attempting to keep the number of levels of all attributes as close to one another as is practical, there is no known solution to this problem. ACA, however, does suffer substantially less from NOL than other techniques.
Similarly, attribute range also affects attribute importance. If, in the second study above, Price only had two levels, but those levels were $6 and $24, Price would again show more importance in the second study. The best we can do here is to define the minimum range of attribute levels necessary to realistically address the research objectives for each attribute in the study.
Conjoint studies, with the notable exception of ACA, require an experimental design to determine the appropriate set of product combinations for testing. Commercial software today offers powerful flexibility in study design and can be surprisingly easy to use. Often, design software provides diagnostic information with which the researcher can evaluate the design. However, to ensure your design is viable, designs of any complexity should be tested with synthetic (or other) data prior to fielding.
One design issue to note involves attribute specification. Numerical attributes, such as price, can be defined as part-worth attributes or vector attributes. If defined as a part-worth attribute, each level within price would receive its own utility weight. If defined as a vector attribute, one utility weight would be calculated for the attribute as a whole and would then be multiplied by each level value to determine the utility weight by level. Part-worth attributes require more information to estimate but vector attributes assume linearity. The best approach is to define all attributes as part-worth attributes so that you are free to model non-linear relationships. Price, for example, is often non-linear.
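As an illustration (utility values invented), the two specifications differ in what they can express: a part-worth attribute can capture a kink in the price response, while a vector attribute forces every equal price step to carry the same utility change.

```python
prices = [6, 8, 10, 12]  # illustrative price levels, in dollars

# Part-worth specification: each level receives its own utility weight,
# so a non-linear shape (a sharp drop above $8) can be captured.
part_worth = {6: 0.9, 8: 0.7, 10: 0.1, 12: -1.7}

# Vector specification: one utility weight (a slope) multiplied by each
# level value, which assumes a straight-line relationship.
slope = -0.2
vector = {p: slope * p for p in prices}

# Every $2 step costs the same utility under the vector specification...
steps_vector = [vector[b] - vector[a] for a, b in zip(prices, prices[1:])]
# ...but the part-worth specification is free to bend.
steps_part_worth = [part_worth[b] - part_worth[a] for a, b in zip(prices, prices[1:])]
```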
There are three types of Conjoint questions that should be included in any Conjoint exercise: warm-up tasks, estimation tasks and holdout tasks.
Studies have shown that respondents take a while to “get it”; their responses do not stabilize until they’ve completed a few tasks. Two to four warm-up tasks should be included at the beginning of the Conjoint exercise to educate respondents and familiarize them with the exercise at hand.
As an added safeguard, task order should be randomized whenever possible.
Holdout tasks are tasks that will not be included in the utility estimation process. They are “held out” of the analysis and used to validate the model after utility weights have been estimated. Even if your study is a Ratings-based Conjoint study, your holdout tasks should be choice-based to make model validation more meaningful.
As a practical matter, it is often the case that clients have very specific scenarios that they are interested in testing. These scenarios can be specified in the holdout tasks, with no compromise to the study design. The holdout tasks can then serve the dual purposes of validating the model and providing “hard” data that some clients will find more credible than model simulations.
Another practical suggestion is that holdout tasks should be designed so that responses are not flat across alternatives. This will make validating the model easier (see model validation and tuning below).
For Choice-based Conjoint, studies have shown that as many as 20 or more tasks can be given to respondents without degradation of data quality. Of course, that number is largely dependent on the number of attributes displayed, the familiarity of respondents with the category and terms, the level of involvement the respondent has with the category, the length of the questionnaire prior to the Conjoint section and numerous other factors.
If you want a Conjoint study that works, be brief. This is a surprisingly difficult standard to meet. Most choice-based studies that I have designed have worked well with as few as 10 tasks. Add in two warm-up tasks and two holdout tasks and you’re already up to 14, at a minimum.
Sample size is another important question with no clear answer. There is little literature on the impact of sample size on Conjoint model error but current evidence suggests that models can be reliably estimated with samples as low as 75, regardless of type of Conjoint technique employed. However, keep in mind that 75 is the minimum size of any analytic cell you might want to examine. Thus, if you had a market with five regions and you wished to model each region separately, you would need a sample of 375 (5 times 75). If you wanted to model males and females separately within each region, your minimum sample size would be twice that, or 750.
Although numerous technical pitfalls exist, the most common error in commercial Conjoint studies is probably asking respondents questions they are unable to answer accurately. If respondents do not understand terms and concepts, if they are confused by product descriptions that are too complex and lengthy or if they become uninterested or tired due to questionnaire length, your analysis will suffer.
As with all survey questions, it’s critical to ask questions your respondents are capable of answering. To make them capable, be sure that all attributes and levels are clearly defined prior to the Conjoint exercise. Often, a glossary of terms reviewed by the respondent prior to the Conjoint exercise and available as a reference throughout the exercise can be very helpful. Visually organize the Conjoint tasks to assist the respondent in quickly understanding the choices before him or her. Do not include so many attributes in each product alternative that only a chess champion could keep them straight. Always pretest Conjoint studies to confirm that the study you have so carefully designed is implementable. Statistical diagnostics will not tell you if humans can or cannot comprehend the questions you are about to put before them.
There is an essential size problem that all designers of Conjoint studies face. If the model to be estimated is fairly complex, it will require a great deal of information to estimate, particularly at the disaggregate (individual) level. Experienced researchers know that this information can be extracted in a variety of ways.
We’ve already discussed most of these approaches, but experimental design deserves a brief revisit. Commercially available design software is extremely powerful. But to use its power completely, you must also employ either a computer or the Web. Computer-assisted interviews and Web-based interviews both allow each respondent to receive a set of Conjoint tasks unique to him or her, a feature generally impractical with paper-and-pencil studies. This facility greatly enhances the design efficiency of your study. Thus, using individualized interviews may allow you to use fewer tasks, a smaller sample or perhaps simply to complete a difficult and ambitious study successfully.
Once data have been collected, the researcher is faced with another set of options and choices. Historically, Ratings-based Conjoint utilities have been estimated using OLS regression at the individual respondent level and Choice-based Conjoint utilities have been estimated using logit regression at the aggregate (total sample) level. Hierarchical Bayes (HB) modeling, introduced by Allenby, Arora and Ginter in 1995, has changed all that.
In general, disaggregate models are preferred over aggregate models. There are several reasons for this but the primary reason is that aggregate models don’t capture heterogeneity. As a simple illustration, consider a sample given choices between Coke and Pepsi. If half the sample loves Coke and hates Pepsi and the other half loves Pepsi and hates Coke, an aggregate model will show the total sample indifferent to brand. The Coke lovers and the Pepsi lovers cancel each other out. In a disaggregate model, brand will appear to be extremely important since all the Coke lovers will exhibit large utilities for Coke and all the Pepsi lovers will exhibit large utilities for Pepsi.
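The Coke/Pepsi illustration can be sketched numerically (utilities invented): averaged across the sample, brand utility nets to zero, while every individual shows a large brand effect.

```python
# Invented brand utilities for four respondents:
# two Coke lovers followed by two Pepsi lovers.
coke_utils = [2.0, 2.0, -2.0, -2.0]
pepsi_utils = [-2.0, -2.0, 2.0, 2.0]

# Aggregate model: averaging hides the split -- brand looks unimportant.
agg_coke = sum(coke_utils) / len(coke_utils)     # 0.0
agg_pepsi = sum(pepsi_utils) / len(pepsi_utils)  # 0.0

# Disaggregate view: each respondent's brand spread is large.
individual_spreads = [abs(c - p) for c, p in zip(coke_utils, pepsi_utils)]  # all 4.0
```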
Choice-based Conjoint has historically been preferred over Ratings-based Conjoint because of its more natural question format, its ability to handle interaction terms and its ability to easily model the no-buy option. Its biggest drawback has been its inability to generate disaggregate models. HB now makes individual-level utility estimation possible for Choice-based Conjoint data.
It has also been shown that HB estimates are superior to OLS regression estimates for Ratings-based Conjoint.
The primary drawback to HB estimation is that it is computationally intensive. Computation time can run from 30 minutes to 30 hours, depending on the sample size, the number of parameters being estimated and the power of the computer running the calculations. However, in general, the advantages of HB far outweigh this one disadvantage.
Very recent research (February 2002 Journal of Marketing Research) suggests that finite mixture models can estimate individual-level choice utilities as well as HB can. However, HB models have proven to be extremely robust, and recently introduced user-friendly HB software eliminates any excuse for not using this breakthrough technique.
In some software packages, constraints can be included in the estimation routine that force certain attribute levels to always be equal to or higher than other levels. For example, you may feel strongly that consumers truly would prefer to buy your product at a lower price. Therefore, you know a priori that the utility of the lowest price level should be greater than or equal to that of every higher price level. You can constrain your utility estimates to conform to this relationship. It has been shown that constraints tend to improve holdout prediction accuracy. However, a word of caution about constraints: the goal of most research is to learn how the market works, not to confirm what we already know about how the market works. Sometimes surprises are not bad research; they are insight. I prefer letting the data run free as often as possible. If necessary, the data can always be rerun using constraints.
Once utilities have been estimated, preferably at the individual level using HB, simulations can be run. The principal simulation methods are First Choice, Share of Preference (with and without correction for IIA) and Randomized First Choice.
First Choice models are only available for disaggregate data and follow the maximum utility rule. That is, if three products are included in a scenario, each individual is assumed to pick the product for which his or her total utility is highest. This approach often suffers from volatility, i.e., minor changes in product configurations can result in unrealistically large shifts in preference shares.
Share of Preference models can be run against either aggregate or disaggregate data. These models distribute preference proportional to each product’s total utility. If, for example, in an aggregate model of two products, product A had total utility of 10 and product B had total utility of 20, product A would have 33% share of preference (10/(10+20)) and product B would have 67% share of preference (20/(10+20)).
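The two rules can be sketched as follows, reproducing the product A/product B arithmetic above. This is a simplified sketch: it divides raw utility totals as in the example, whereas commercial Share of Preference simulators typically apply an exponential (logit) transform to utilities first.

```python
def first_choice(product_utils):
    """Maximum-utility rule: each respondent picks the highest-utility product."""
    return max(range(len(product_utils)), key=lambda i: product_utils[i])

def share_of_preference(product_utils):
    """Distribute preference in proportion to each product's total utility."""
    total = sum(product_utils)
    return [u / total for u in product_utils]

utils = [10.0, 20.0]                 # product A = 10, product B = 20
winner = first_choice(utils)         # index 1 -> product B takes the choice
shares = share_of_preference(utils)  # [1/3, 2/3] -> 33% and 67%
```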
Share of Preference models are less volatile than First Choice models but are subject to the IIA bias (Independence of Irrelevant Alternatives), a.k.a. the red bus/blue bus problem. If two products are very similar, such as a red bus and a blue bus in a transportation alternatives study, their net share is over-estimated. In effect, there is double counting. Share of Preference models with correction attempt to adjust for the IIA bias. First Choice models are not subject to IIA bias.
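A tiny sketch of the red bus/blue bus double count (utilities invented): splitting one bus into two identical buses inflates the net bus share from one-half to two-thirds, even though nothing about the market has changed.

```python
def share_of_preference(utils):
    # Simplified: preference proportional to raw total utility.
    total = sum(utils)
    return [u / total for u in utils]

car, bus = 10.0, 10.0

# One car vs. one bus: each gets half the preference.
two_way = share_of_preference([car, bus])         # [0.5, 0.5]

# Paint the bus two colors: the two near-identical alternatives are
# double counted, and net bus share jumps to 2/3 -- the IIA bias.
three_way = share_of_preference([car, bus, bus])  # [1/3, 1/3, 1/3]
net_bus_share = three_way[1] + three_way[2]       # 2/3
```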
The best approach is a recently developed technique named Randomized First Choice (RFC). Initially conceived by Bryan Orme (1998) and further developed by Orme, Huber and Miller (1999), RFC exhibits much less IIA bias than Share of Preference models and is less volatile than First Choice models. It has the additional advantage of offering several ways to tune the model for increased accuracy.
Regardless of the simulation technique selected, the model should be validated and tuned. Market scenarios should be defined and simulated that replicate the choices available in each holdout task. The model predictions of choices should be compared to the actual choices made by respondents.
For disaggregate models, there are two measures of model accuracy, hit rates and Mean Absolute Error (MAE). For aggregate models, only MAE is appropriate.
Hit rates are calculated by comparing the choice predicted for an individual respondent by the model (using the maximum utility rule) to the actual choice made by the respondent. When the model correctly predicts the respondent’s choice, it is counted as a hit. The total number of hits divided by total sample size equals the hit rate.
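A minimal hit-rate calculation (holdout choices invented for five respondents):

```python
def hit_rate(predicted, actual):
    """Fraction of respondents whose actual holdout choice matches the
    model's maximum-utility prediction."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Choices indexed 0-3 (three products plus no-buy), invented for illustration.
predicted_choices = [0, 1, 2, 1, 3]
actual_choices = [0, 1, 2, 0, 3]
rate = hit_rate(predicted_choices, actual_choices)  # 4 hits / 5 = 0.8
```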
MAE is defined as the sum of the absolute differences between predicted share of preference and actual share of preference across all products in a holdout task, divided by the number of products in the task.
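The same definition in code (shares invented for a four-option holdout task):

```python
def mean_absolute_error(predicted_shares, actual_shares):
    """Average absolute gap, in share points, between predicted and actual
    shares of preference across the products in a holdout task."""
    gaps = [abs(p - a) for p, a in zip(predicted_shares, actual_shares)]
    return sum(gaps) / len(gaps)

predicted = [30.0, 25.0, 25.0, 20.0]  # model's simulated shares (percent)
actual = [35.0, 20.0, 25.0, 20.0]     # respondents' actual holdout shares
mae = mean_absolute_error(predicted, actual)  # (5 + 5 + 0 + 0) / 4 = 2.5 points
```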
Initial hit rates and MAE (prior to model tuning) can be compared to hit rates and MAE from a random model to give the researcher a feel for how successfully the model has been able to capture and model respondent choices.
For example, if there are four choices available in a holdout task, say three products and no-buy, a random model could be expected to have a hit rate of 25% (1/4). If your initial model has a hit rate of 65%, you can feel somewhat assured that your model performs better than random.
Similarly, MAEs for a random model can be calculated by subtracting 25% from the percent of respondents who picked each of the four options, summing the absolute value of the differences and dividing by four. If your random model has an MAE of 12 and your model has an MAE of 4, again you can feel somewhat reassured.
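For instance (actual shares invented), a random model that spreads choices evenly across four options yields a benchmark MAE like this:

```python
# Invented actual choice shares (percent) for a four-option holdout task.
actual_shares = [40.0, 30.0, 20.0, 10.0]
random_shares = [25.0] * 4  # a random model picks each option equally often

gaps = [abs(r - a) for r, a in zip(random_shares, actual_shares)]  # 15, 5, 5, 15
mae_random = sum(gaps) / len(gaps)  # 10.0 points -- the bar your model must beat
```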
It is for this analysis that you want to construct holdout tasks that are likely to have unequal preference across alternatives. In general, hit rates above 60% and MAEs below 5 points will reflect a reasonably good fitting model.
Once initial hit rate and MAE calculations have been examined, model tuning may be appropriate. Share of Preference and RFC models can be tuned to maximize hit rates and minimize MAE. Tuning the model will increase its accuracy and, therefore, managerial utility.
In some rare and fortuitous instances, actual market data can be used to tune the model, rather than holdout tasks.
Although there are so many exceptions that the word “right” loses much of its meaning, I would generalize the “right” method of doing Conjoint Analysis as follows: use full-profile Choice-based Conjoint when the number of attributes allows, estimate utilities at the individual level with HB, simulate with Randomized First Choice and validate and tune the model against choice-based holdout tasks.
In 1990, Batsell and Elmer wrote “The introduction in 1971 by Green and Rao of Conjoint Analysis marked a significant step in the evolution of marketing research from art to science.” I agree. With a heritage in both psychometrics and econometrics, no marketing research technique comes close to offering either the managerial power or the economic efficiency of Conjoint Analysis.
But Conjoint Analysis is an increasingly complex family of techniques. Many difficult decisions await the conscientious researcher, often with no clear-cut, “right” answer. Conjoint Analysis has pushed marketing research much closer to a science. But it is still an art.
The diligent researcher will be aware of both the possible pitfalls and the available antidotes. The reward far outweighs the effort.