Cohort analysis

Cohort analysis is a kind of behavioral analytics that breaks the data in a data set into related groups before analysis. These groups, or cohorts, usually share common characteristics or experiences within a defined time span.[1][2] Cohort analysis allows a company to "see patterns clearly across the life-cycle of a customer (or user), rather than slicing across all customers blindly without accounting for the natural cycle that a customer undergoes."[3] By seeing these patterns over time, a company can adapt and tailor its service to those specific cohorts. Although cohort analysis is sometimes associated with a cohort study, the two are different and should not be treated as the same thing. Cohort analysis is specifically the analysis of cohorts with regard to big data and business analytics, whereas in a cohort study, data is broken down into similar groups.


The goal of business analytics is to analyze and present actionable information.[4] Large, undifferentiated datasets may include a variety of user types and time periods. Cohort analysis analyzes the users of each cohort separately. In cohort analysis, "each new group [cohort] provides the opportunity to start with a fresh set of users,"[5] allowing the company to look at only the data that is relevant to the current query and act on it.

For example, in e-commerce, customers who signed up in the last two weeks and who made a purchase may constitute a cohort. For software, users who signed up after a certain upgrade, or who use certain features of the platform, may constitute a cohort.
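The first cohort definition above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (`signup`, `purchases`), the helper `recent_buyers` and the sample dates are all assumptions, not part of any particular analytics product.

```python
from datetime import date, timedelta

# Hypothetical user records; the field names are assumptions for illustration.
users = [
    {"id": 1, "signup": date(2024, 3, 1), "purchases": 2},
    {"id": 2, "signup": date(2024, 3, 10), "purchases": 0},
    {"id": 3, "signup": date(2024, 1, 5), "purchases": 4},
    {"id": 4, "signup": date(2024, 3, 12), "purchases": 1},
]


def recent_buyers(users, today, window_days=14):
    """Return ids of users who signed up within the window and made a purchase."""
    cutoff = today - timedelta(days=window_days)
    return [
        u["id"]
        for u in users
        if cutoff <= u["signup"] <= today and u["purchases"] > 0
    ]


# Cohort: signed up in the two weeks before 14 March 2024 and bought something.
cohort = recent_buyers(users, today=date(2024, 3, 14))
print(cohort)
```

Here users 1 and 4 fall into the cohort; user 2 signed up recently but never purchased, and user 3 purchased but signed up too long ago.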

Consider a cohort analysis of gamers on a certain platform. Expert gamers (cohort 1) will care more about advanced features and lag time than new sign-ups (cohort 2). With these two cohorts determined and the analysis run, the gaming company would be presented with a visual representation of the data specific to each cohort. It could then see that a slight lag in load times has been translating into a significant loss of revenue from advanced gamers, while new sign-ups have not even noticed the lag. Had the company simply looked at its overall revenue reports for all customers, it would not have been able to see the differences between these two cohorts. Cohort analysis allows a company to pick up on patterns and trends and make the changes necessary to keep both advanced and new gamers happy.[citation needed]
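A rough sketch of why the overall report hides the problem is given below. The revenue figures are invented purely for illustration, and the names `records`, `revenue_by` and `pct_change` are hypothetical helpers rather than anything from the cited sources.

```python
from collections import defaultdict

# Hypothetical revenue records: (cohort, period, revenue). All numbers are made up.
records = [
    ("expert", "before_lag", 900.0),
    ("expert", "after_lag", 600.0),
    ("new", "before_lag", 100.0),
    ("new", "after_lag", 150.0),
]


def revenue_by(records, key):
    """Sum revenue under whatever grouping key the caller supplies."""
    totals = defaultdict(float)
    for cohort, period, revenue in records:
        totals[key(cohort, period)] += revenue
    return dict(totals)


def pct_change(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100


# Aggregate view: one number per period, cohorts mixed together.
overall = revenue_by(records, lambda cohort, period: period)
# Cohort view: the same data, split by cohort first.
per_cohort = revenue_by(records, lambda cohort, period: (cohort, period))

print("overall:", pct_change(overall["before_lag"], overall["after_lag"]))
for cohort in ("expert", "new"):
    change = pct_change(
        per_cohort[(cohort, "before_lag")], per_cohort[(cohort, "after_lag")]
    )
    print(cohort, round(change, 1))
```

With these assumed numbers the aggregate shows a single 25% drop, while the split view shows that the entire loss comes from the expert cohort and that revenue from new sign-ups is still growing.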

Deep actionable cohort analytics

"An actionable metric is one that ties specific and repeatable actions to observed results [like user registration, or checkout]. The opposite of actionable metrics are vanity metrics (like web hits or number of downloads) which only serve to document the current state of the product but offer no insight into how we got here or what to do next."[6] Without actionable analytics, information may not have any practical application; it may simply be a non-actionable vanity metric. A company may want to know how many people are on its site, but that number on its own does not indicate what to do next. For a metric to be actionable, it needs to relate a "repeatable action to [an] observed result".[6]
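One way to picture the distinction is to compute both kinds of number from the same event log. The sketch below is an interpretation, not a definition taken from the quoted source: the event names, the log itself and the choice of "share of registered users who check out" as the actionable metric are all assumptions.

```python
# Hypothetical event log: each row is (user_id, event). All data is made up.
events = [
    (1, "visit"), (1, "register"), (1, "checkout"),
    (2, "visit"),
    (3, "visit"), (3, "register"),
    (4, "visit"), (4, "register"), (4, "checkout"),
]


def users_with(events, name):
    """Return the set of users who triggered the named event at least once."""
    return {user for user, event in events if event == name}


# Vanity-style metric: a raw count that describes the current state only.
web_hits = sum(1 for _, event in events if event == "visit")

# Actionable-style metric: links an action (registering) to a result (checkout).
registered = users_with(events, "register")
checked_out = users_with(events, "checkout")
checkout_rate = len(checked_out & registered) / len(registered)

print(web_hits, round(checkout_rate, 2))
```

The hit count says only that four visits occurred, whereas the checkout rate can be compared across cohorts or before and after a change to see whether that change had an effect.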

Performing cohort analysis

Cohort analysis has four main stages, followed by a check of the results:[7]

  • Determine what question you want to answer. The point of the analysis is to produce actionable information that can be used to improve business, product, user experience, turnover, etc. To ensure that happens, it is important that the right question is asked. In the gaming example above, the company was unsure why it was losing revenue as lag time increased, despite the fact that users were still signing up and playing games.
  • Define the metrics that will help you answer the question. A proper cohort analysis requires the identification of an event, such as a user checking out, and specific properties, like how much the user paid. The gaming example measured a customer's willingness to buy gaming credits based on how much lag time there was on the site.
  • Define the specific cohorts that are relevant. In creating a cohort, one must either analyze all the users and target them or perform attribute contribution in order to find the relevant differences between them, ultimately to discover and explain their behavior as a specific cohort. The above example splits users into "basic" and "advanced" users, as each group differs in actions, pricing structure sensitivities, and usage levels.
  • Perform the cohort analysis. The analysis above was done using data visualization, which allowed the gaming company to realize that its revenues were falling because its higher-paying advanced users were not using the system as the lag time increased. Since the advanced users accounted for such a large portion of the company's revenue, the additional basic user signups were not covering the financial losses from losing the advanced users. In order to fix this, the company improved its lag times and began catering more to its advanced users.
  • Test results. Make sure the results make sense.
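The stages above can be sketched end to end for the gaming example. Everything in this sketch is assumed for illustration: the session records, the field names, the 200 ms threshold separating "low" from "high" lag, and the choice of mean credits per session as the metric.

```python
from collections import defaultdict
from statistics import mean

# Question: does lag reduce credit purchases, and for which users?
# Metric: credits bought per session (the event) with the lag the user saw (a property).
# Hypothetical session records; every field name and value is illustrative.
sessions = [
    {"user": "a", "cohort": "advanced", "lag_ms": 50, "credits": 40},
    {"user": "b", "cohort": "advanced", "lag_ms": 60, "credits": 50},
    {"user": "a", "cohort": "advanced", "lag_ms": 300, "credits": 10},
    {"user": "b", "cohort": "advanced", "lag_ms": 320, "credits": 0},
    {"user": "c", "cohort": "basic", "lag_ms": 55, "credits": 5},
    {"user": "d", "cohort": "basic", "lag_ms": 310, "credits": 5},
]

LAG_THRESHOLD_MS = 200  # assumed cut-off between "low" and "high" lag


def bucket_sessions(sessions):
    """Group credits by (cohort, lag bucket); cohorts are 'basic' and 'advanced'."""
    buckets = defaultdict(list)
    for s in sessions:
        lag = "high" if s["lag_ms"] >= LAG_THRESHOLD_MS else "low"
        buckets[(s["cohort"], lag)].append(s["credits"])
    return buckets


# Perform the analysis: mean credits per session in each bucket.
buckets = bucket_sessions(sessions)
spend = {key: mean(values) for key, values in buckets.items()}

# Test the results: every session must land in exactly one bucket.
assert sum(len(values) for values in buckets.values()) == len(sessions)

for (cohort, lag), avg in sorted(spend.items()):
    print(f"{cohort:<9} {lag:<5} {avg:.1f}")
```

With this made-up data, advanced users buy far fewer credits per session under high lag, while basic users spend the same either way, which mirrors the pattern described in the example.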

References


  1. Behrooz Omidvar-Tehrani; Sihem Amer-Yahia; Laks VS Lakshmanan. Cohort representation and exploration. Turin, Italy: IEEE Conference on Data Science and Advanced Analytics (DSAA) 2018.
  2. Dawei Jiang; Qingchao Cai; Gang Chen; H. V. Jagadish; Beng Chin Ooi; Kian-Lee Tan; Anthony K. H. Tung. Cohort Query Processing (PDF). Proceedings of the VLDB Endowment, Volume 10, Number 1, October 2016.
  3. Alistair Croll; Benjamin Yoskovitz (15 April 2013). Lean Analytics: Use Data to Build a Better Startup Faster. Sebastopol, CA: O'Reilly. ISBN 978-1449335670.
  4. Aukeman, Mark. "Cohort Analysis — understanding your customers".
  5. Balogh, Jonathon (24 March 2012). "Introduction to Cohort Analysis for Startups".
  6. Maurya, Ash (14 July 2010). "3 Rules to Actionable Metrics in a Lean Startup".
  7. James Torio; Rishabh Dayal (12 February 2013). "Using Cohort Analysis to Optimize Customer Experience". UX Magazine.

Further reading