
Comment on Kilroy and Williams

This paper, to my mind, epitomizes the Sawtooth Software Conference: it has real-world applicability, it is thorough and rigorous in its analysis, and it is presented in such a straightforward and clear manner that it is accessible to almost any reader. It is a useful and interesting paper that raises many important issues.

The paper discusses two main topics:

  • Sources of the ACA price effect
  • A proposed adjustment to eliminate the ACA price effect

I will make a few comments on both these topics.

Sources of the ACA Price Effect

It would be desirable to minimize, or ideally eliminate, the ACA price effect by removing as much of the source of the effect as possible before making any post hoc adjustments.

One source of the effect identified by Kilroy and Williams is attribute additivity. Because an ACA study may include a large number of attributes, the sum of many attributes, each with fairly low utility, can overwhelm a price attribute with a fairly large utility. For example, a product with nine attributes each with a level utility of .2 (and a price level utility of .2) will have a greater total utility (2.0) than a product with a price utility of 1.5 (and nine attributes with level utilities of 0).
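The additivity arithmetic can be made concrete with a small sketch. The level utilities below are the hypothetical values from the example, not data from the Kilroy and Williams study:

```python
# Hypothetical level utilities illustrating the additivity example above.
# Product A: nine non-price attributes worth 0.2 each, plus a price level worth 0.2.
# Product B: a price level worth 1.5, with the nine other attributes worth 0.0.
product_a = [0.2] * 9 + [0.2]
product_b = [0.0] * 9 + [1.5]

# Total utility is the simple sum of level utilities (the main-effects assumption).
total_a = round(sum(product_a), 10)  # 2.0
total_b = round(sum(product_b), 10)  # 1.5

# The nine small utilities overwhelm the much larger price utility.
print(total_a > total_b)
```

The point is purely arithmetic: ten modest utilities of 0.2 sum to 2.0, which exceeds the single large price utility of 1.5.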

Attribute additivity can be a serious problem that will affect any trade-off method that attempts to accommodate a large number of attributes. One approach to counteract this effect is to limit the number of attributes included in the calculation of total utility (in the model simulation stage) for each individual to that individual’s top six most important attributes. That is, if three products are being modeled simultaneously in a market share model and 10 attributes are included in product specification, for each product and each individual, include only those top six (out of the total 10) attributes in the calculation of total utility for that individual.

The rationale would be similar to that of limiting the number of attributes in a full-profile exercise to six: respondents cannot consider more than six attributes at a time when making a purchase decision. By limiting the number of attributes to six in the simulator, the attribute additivity problem would be diminished and the purchase decision process may be more accurately modeled.
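The top-six simulator rule suggested above can be sketched as follows. This is my proposed adjustment, not part of the authors' method; the importance measure (here, taken per attribute) and the utility values are illustrative assumptions:

```python
def total_utility(level_utils, importances, top_n=6):
    """Sum level utilities over only this respondent's top_n most important
    attributes, ignoring the rest (the suggested simulator adjustment)."""
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    top = ranked[:top_n]
    return sum(level_utils[i] for i in top)

# Ten attributes for one hypothetical respondent: price (index 0) is very
# important; the other nine each contribute a small utility of 0.2.
importances = [1.5] + [0.2] * 9
levels      = [1.5] + [0.2] * 9

# With all ten attributes, total utility is 1.5 + 9 * 0.2 = 3.3; restricted to
# the top six, it is 1.5 + 5 * 0.2 = 2.5, so the small attributes can no
# longer pile up without limit.
print(round(total_utility(levels, importances), 10))
```

In a market share simulation, this truncation would be applied per respondent and per product before comparing total utilities.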

Another source of the ACA price effect identified by Kilroy and Williams is attribute independence. Some attributes may interact with others, violating the main effects assumption of ACA. For example, in the importance ratings section of ACA, a respondent may be asked how important the color red is versus the color blue in the context of new cars. Their true opinions, however, may depend on what type of car the color is applied to. They may prefer red on a high-priced car (such as a sports car) and blue on a lower priced car (such as a family car).

This is an extremely serious problem for all trade-off methodologies that involve some form of direct questioning or self-explicated scaling, not just ACA. The larger question that should be raised is whether self-explicated scaling is appropriate for all types of attributes. Kilroy and Williams have identified a problem with attributes that are dependent on other attributes, i.e., interact with other attributes. But can we determine if there are other types of attributes that are also inappropriate for self-explicated scaling? Are there other, operationally convenient ways to characterize inappropriate attributes? This issue deserves additional attention in the literature and I am very happy that Kilroy and Williams have raised it here.

Kilroy and Williams also cite attribute framing as a potential source of the ACA price effect. Without knowing what attributes are coming next, a respondent might give the strongest importance rating to the one presented first. For example, if the price attribute follows any other attribute, i.e., is not the first attribute to be rated in the importance section, then it may be rated as only equally important to another attribute that the respondent, in reality, does not feel is as important as price.

A simple antidote to this problem would be to review all attributes with the respondent prior to conducting the importance rating exercise. I believe that in most commercial applications this would be feasible, with the possible exception of telephone surveys.

Proposed Adjustment to Eliminate the ACA Price Effect

Given an ACA price effect, the authors have developed a surprisingly straightforward method for recalibrating the price utility to more accurately reflect the magnitude of respondent price sensitivity.

Their approach is to adjust each respondent’s price utility so that predicted choice optimally matches a set of choice-based holdouts. They segment the sample population into three groups: those that need no price adjustment (that is, those whose predicted choices closely match their holdout choices), those that need some adjustment and those that need a lot of adjustment. They chose not to recalibrate price utility at the individual level due to the high incidence of reversals commonly found in conjoint data.
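The segment-level idea can be sketched in a few lines. The authors' actual hit-rate cutoffs and adjustment scalars are not reported here, so the three groups and multipliers below are illustrative assumptions only:

```python
def price_scalar(holdout_hit_rate):
    """Assign a price-utility multiplier by how closely a respondent's
    predicted choices match their choice-based holdouts.
    The cutoffs and scalars are hypothetical, not the authors' values."""
    if holdout_hit_rate >= 0.8:    # predictions already match: no adjustment
        return 1.0
    elif holdout_hit_rate >= 0.5:  # moderate mismatch: some adjustment
        return 1.5
    else:                          # large mismatch: a lot of adjustment
        return 2.5

def recalibrate(price_utilities, hit_rates):
    """Scale each respondent's price utility by their segment's scalar."""
    return [u * price_scalar(h) for u, h in zip(price_utilities, hit_rates)]

# Three hypothetical respondents, each with a raw price utility of -1.0.
print(recalibrate([-1.0, -1.0, -1.0], [0.9, 0.6, 0.3]))  # [-1.0, -1.5, -2.5]
```

In the paper the scalars are chosen so that predicted choice optimally matches the holdouts within each segment, rather than being fixed in advance as they are in this sketch.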

Reversals in conjoint data are commonplace, often involving up to 40% of the total sample. Like attribute independence, this issue is not unique to ACA. If there are reversals in the data set, it appears to me that they can be caused by one of only four factors:

  • The data accurately reflect the respondent’s values (and we simply are unwilling to understand or accept the possibility)
  • Within attribute level variance is large due to the respondent being confused or fatigued when answering the survey, causing unstable level utility estimates
  • Within attribute level variance is large due to limited sample size or experimental design issues, causing unstable level utility estimates
  • There is an anomaly in the utility estimation algorithm

Whatever the cause of the reversals, recalibrating at the segment level does not avoid the problem, it just ignores it. The reversal respondents are included in the three segments and receive the same adjustment as the other respondents. I have no simple solution to this significant problem.

I suspect that a good percentage of what we typically call reversals is simply an accurate reflection of human behavior. Humans are clearly and frequently irrational in their buying behavior (and in every other aspect of their lives as well). Rather than attempt to force respondents to be rational just so that our models perform better, I suggest we look for ways to better model their sometimes irrational behavior. I also suspect that much of the reversal data is due to confused or tired respondents. Making the interview as simple and brief as possible may help minimize reversals.

I would like to see more research into the causes of reversals and possible ways to handle them in conjoint data sets without constraining respondent answers to conform to our assumptions.

The choice-based holdouts on which the price utility recalibration is based varied five of the 10 attributes and held five constant. The authors correctly point out that the recalibration scalars may be affected by the number of attributes included in the holdouts. For example, if only two attributes are included in the holdouts, the price recalibration scalar will most likely be smaller than if eight attributes are included because the attribute additivity problem will be greater with eight attributes than with two.

This appears to me to be a very serious problem with their proposed approach because the ACA simulator is designed to include all 10 attributes. Thus, one could recalibrate price to optimally predict holdouts and still underestimate price in the simulator. One possible solution would be to include all attributes in the holdout exercise but more often than not there would be too many attributes in the study to make this approach practical.

The suggestion made earlier for addressing the attribute additivity problem also appears to be a potential solution to this holdout-attribute problem. If six attributes are included in the holdout exercise and the simulator selects the top six attributes per person to calculate total utility, the recalibration scalar will be near the appropriate magnitude, as long as the net relative importance of the six attributes in the holdout exercise is similar in magnitude to the net relative importance of the six attributes selected in the simulator.

The authors make a very strong case for selecting price as the most important attribute to recalibrate. I would, in general, strongly agree. However, I suspect that there may be other attributes that would also be excellent candidates for recalibration, depending on the issues at hand. Although brand was not included in the Kilroy and Williams study, it is commonly included in conjoint studies and would be a strong candidate for recalibration because of the obvious lack of attribute independence coupled with its typically high degree of importance. It would be interesting to explore possible ways to simultaneously recalibrate two or more attributes, using the Kilroy and Williams approach.

It would also be interesting if guidelines could be developed to assist the practitioner in determining:

  • The ideal number of holdout tasks needed for recalibration
  • Which product configurations to include in the holdout tasks
  • How many products to include in the holdout tasks


The choice-based holdout tasks have appeal for numerous reasons:

  • They fill the client's need for realistic alternatives
  • They increase the model's credibility in the client's eyes
  • They are a powerful presentation and communications tool for unsophisticated audiences
  • They are cheaper and simpler than dual conjoint

The Kilroy and Williams method for removing the ACA price effect is:

  • A useful technique
  • A sound approach
  • Easy to understand
  • Relatively easy to apply

Overall, I found this paper very useful and interesting. The authors raise many important issues and provide a practical solution to a significant shortcoming in many ACA models.