
Unsupervised brand categorization

Hundreds to thousands of brands and companies exist across various industries. Today these are manually categorized and grouped under industry classification schemes such as NAICS, SIC, and NIC. These schemes are predetermined and don’t reflect the changing nature of a business or brand.


Can we create a more dynamic categorization based on the journey of a brand or company, using a “Code+AI” driven machine?


Here is a demonstration of a ground-zero approach towards that. The key is to do it in an “unsupervised” machine learning way, without expecting users to provide keywords or a taxonomy. We ran the machine on approximately 500 brand names. For each brand, its journey, history, and details are programmatically picked from Wikipedia, which we treat as the ground truth for facts.


This content is then vectorized. Vectorizing means representing every word in a text document based on the context (surrounding words and sentences) in which it is used. As a result, the journey of a brand is encapsulated in a mathematical vector, which makes it conducive to grouping. Below is the vector space represented as a network of these brands. (Note: brand names are not visible here, but zoomed-in sections are shown later.)
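As an aside on the mechanics, the vectorization step can be sketched roughly as follows. The article does not say which vectorizer the machine uses, so this sketch swaps in plain TF-IDF (word weights, not context-aware embeddings) purely for illustration. The brand names and texts are hypothetical placeholders, not Wikipedia content.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in texts; the article pulls the real text from Wikipedia.
docs = {
    "BrandA": "car maker founded to build engines and cars",
    "BrandB": "car company known for engines and trucks",
    "BrandC": "soft drink sold worldwide with sports sponsorship",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF vector (a dict) per brand."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        total = max(len(toks), 1)
        vec = {
            t: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[name] = {t: w / norm for t, w in vec.items()}
    return vectors

vectors = tfidf_vectors(docs)
```

Words that appear in fewer brand texts get a higher weight, so "maker" counts for more than "car" in the BrandA vector above. A context-aware embedding model would replace `tfidf_vectors` while leaving the rest of the pipeline unchanged.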



Each node is a brand name, and each edge represents the degree of intersection between the “journey/history vectors” of two brands. Think of it as what is common in the journeys of two brands. Each of the roughly 500 brands is compared with every other brand.
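A minimal sketch of this pairwise comparison, assuming cosine similarity as the measure of "degree of intersection" (the article does not name the measure). The vectors, the `min_weight` cutoff, and the brand names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical sparse vectors (term -> weight), one per brand.
vectors = {
    "BrandA": {"car": 0.6, "engine": 0.8},
    "BrandB": {"car": 0.8, "engine": 0.6},
    "BrandC": {"drink": 1.0},
}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_edges(vectors, min_weight=0.1):
    """Compare every brand with every other; keep edges above min_weight."""
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        weight = cosine(vectors[a], vectors[b])
        if weight >= min_weight:
            edges.append((a, b, weight))
    return edges

edges = build_edges(vectors)
```

For n brands this performs n(n-1)/2 comparisons, which is about 125,000 pairs for 500 brands.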


Brands with similar history, content, or journey reside closer together. The figure below shows a zoomed-in part of the vector space. One can see that the machine automatically recognizes automotive brands.



These vectors are then grouped mathematically into numerous categories akin to an industry classification. The figure below shows 27 categories.
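The article does not state which grouping algorithm produces these categories. As one simple illustration, the sketch below treats every edge at or above a similarity threshold as a link and takes the connected components as categories (a form of single-linkage clustering). The threshold, edge weights, and brand names are all assumptions made for the example.

```python
def cluster(nodes, edges, threshold=0.5):
    """Group nodes into connected components over edges >= threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk up to the root of n's group, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, weight in edges:
        if weight >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical similarity edges: (brand, brand, weight).
nodes = ["BrandA", "BrandB", "BrandC", "BrandD", "BrandE"]
edges = [
    ("BrandA", "BrandB", 0.9),
    ("BrandB", "BrandC", 0.7),
    ("BrandC", "BrandD", 0.2),  # too weak to link the two groups
    ("BrandD", "BrandE", 0.8),
]
categories = cluster(nodes, edges)
```

Raising the threshold splits the brands into more, tighter categories; lowering it merges them. Algorithms such as k-means or graph community detection would be natural alternatives.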


Can we automatically name these categories?


The intersection of journey vector spaces within a category, minus the intersection of journey vector spaces across categories, provides unique category-level insights. Basically, it tells us which factors are common within a cluster but different from the others. Below are a few examples generated automatically by the machine.
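To make the "within minus across" idea concrete, here is a deliberately simplified sketch that works on sets of words instead of vectors. For each category it takes the terms shared by all of its brands, then removes any term that is also shared within another category. The category names, brand names, and texts are hypothetical, and each category is assumed to hold at least one brand.

```python
import re

def terms(text):
    """Set of lowercase alphabetic words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def category_terms(categories):
    """categories maps category -> {brand: text}; returns distinctive terms."""
    # Terms shared by every brand inside each category.
    common = {
        cat: set.intersection(*(terms(text) for text in brands.values()))
        for cat, brands in categories.items()
    }
    labels = {}
    for cat, shared in common.items():
        # Terms that are also common within some other category.
        others = set().union(*(s for c, s in common.items() if c != cat))
        labels[cat] = sorted(shared - others)
    return labels

# Hypothetical categories and texts.
categories = {
    "cat1": {
        "BrandA": "a car maker with global sales",
        "BrandB": "a global car brand",
    },
    "cat2": {
        "BrandC": "a global drink brand",
        "BrandD": "a drink sold with global sponsorship",
    },
}
labels = category_terms(categories)
```

Here "global" is common to both categories and drops out, leaving "car" and "drink" as the distinguishing terms. A weighted version over the actual vectors would rank terms instead of applying a hard set difference.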




It categorizes certain brands in line with traditional classification, such as diaper brands, but interestingly also provides new insights: Coca-Cola, for example, falls in Category 10, which groups drink brands with major global sponsorships. It is a different way to look at where brands stack up.


The categorization can go deeper, especially where a bigger mix of brands is found and more insights are needed. Here is an example of the sub-categorization of a specific category that contains various kinds of device-maker brands.



Apple owns one of the biggest secure mobile consumer networks in the world. With traditional approaches, we would never think of comparing Apple as a brand with Huawei and Broadcom, who play in a similar space.


How is this categorization dynamic?


One can choose different parts of the content to arrive at different categories. For example, one could use only the “History” section of Wikipedia to compare historical journeys, or the “Product” section to categorize based on products. The other way would be to change the content source itself. Categorizing brands from a customer standpoint, for example, would require the content to be customer feedback. Below is a sample of mobile phones displayed as a brand network using customer feedback, which can be further categorized into groups as illustrated in this article.



The exercise can be repeated frequently to keep the categorizations up to date and reflect any changes in brand perception or company business.


Traditional manual approaches to classification do not offer the dynamism that AI-driven categorization can. Are we in for a new era of classification and categorization in company analysis, earnings analysis, brand mapping, patent analytics, research, and more?

