The average adult in the USA spends approximately 6 hours a day on the internet (Source: KPCB Internet Trends 2018).
We are constantly looking at the internet. What do we find?
400+ hours of video are uploaded to YouTube every minute
2 million+ Apps exist on iOS
65 billion WhatsApp messages are exchanged every day by approximately 1 billion daily active users
Every minute on Facebook: 510,000 comments are posted, 293,000 statuses are updated, and 136,000 photos are uploaded (Source: The Social Skinny)
You must be thinking this is all mass social media: "If I am a core-sciences researcher, for example, this doesn't affect me?" You are not spared either: 1.4 million research papers have been uploaded to arXiv.org, and the current run rate is 12,595 uploads per month
Add to all this the 12 billion sensors out there churning data. 47 zettabytes of information are expected to be generated by 2020. Just to make it clear, this is 47 zettabytes:
47,000,000,000,000,000,000,000 bytes
Only 16% of this is expected to be structured and searchable. A lot of it, of course, will be fake information, a problem we already struggle with today.
The zettabytes will hit us, and we won't know how to manage them. What will you find?
Let's take the simplest of things: deciding to go to a restaurant. Most restaurants these days have 3,000+ reviews and thousands of Instagram images. 48% of Yelp reviews are 5 stars, and those can't be differentiated from one another; 68% of reviews have 4+ stars (Source: https://www.yelp.com/factsheet). And 72% carry a recommended rating. Clutter.
Nothing helps me decide unless I wade through the comments.
The internet and data journey we are traversing can't be reversed. The only way is forward, with zettabytes more data added annually. We need to de-clutter.
What if there was a way to summarize every news story, every research topic, every restaurant journey (menu, comments), every movie's reception, every technology
a simpler way to understand the gist and have a starting point
a way to understand the flow of information, with the ability to "click" into the details when needed
a way to visually see this without searching through 1000s of articles/papers
a path to de-clutter
This issue has been bugging me, and here is my attempt to de-clutter. The chart below summarizes a restaurant's journey based on 4,000+ comments left by customers.
Fig 1: Restaurant journey flow
Data holds the truth, and a machine can crunch this mountain of data. Teaching it how to do so using AI techniques brings in the intelligence needed to de-clutter.
There are 56 menu items (food) in the restaurant, and Figure 1 shows the items that have consistently mattered: clam chowder, crab cakes, and mixed grill salmon shrimp. The flow runs across time and clearly shows that the service is great, as is the view (of the harbor). The AI system also finds out that the restaurant is on Pier 39 in San Francisco. It's touristy, per the machine. Now the decision is yours. And you avoided the 4,000+ comments.
One can get details about the service, including the fact that the waitresses are much appreciated and that two specific waiters, Samir and Tyler, have generated a lot of positive sentiment over time. Wouldn't you want to know these subtle aspects before going to a restaurant, without all the hard work of reading through the reviews?
What you see is called a Sankey chart, which mechanical engineers will be familiar with. It is used to depict flows and was originally devised to depict energy flow in a steam engine. I have used the same idea to depict information flows in today's age. The principle of a Sankey diagram is that the width of each arrow is proportional to the flow quantity, and that principle is maintained here. Visually, one can see what information matters, and since it is plotted over time, one can see what information consistently matters. One can also see which topics grow and which topics merge or split, and observe that certain topics didn't exist until a specific point in time.
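The proportional-width principle is simple to state in code. Here is a minimal sketch in pure Python (the topic names and counts are invented for illustration): count how comment threads move between topics in two adjacent time periods, then normalize the counts into link widths.

```python
from collections import Counter

# Hypothetical topic assignments: each pair means a comment thread that
# was about `src` in month 1 is about `dst` in month 2.
transitions = [
    ("clam chowder", "clam chowder"),
    ("clam chowder", "clam chowder"),
    ("crab cakes", "clam chowder"),
    ("crab cakes", "crab cakes"),
    ("service", "service"),
]

counts = Counter(transitions)
total = sum(counts.values())

# Sankey principle: each link's width is proportional to its flow quantity.
links = [
    {"source": src, "target": dst, "width": cnt / total}
    for (src, dst), cnt in counts.items()
]

for link in sorted(links, key=lambda l: -l["width"]):
    print(f'{link["source"]:>14} -> {link["target"]:<14} width={link["width"]:.2f}')
```

The widths sum to 1, so the thickest band ("clam chowder" persisting month over month) is immediately visible, which is exactly what the chart conveys at a glance.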
The topics themselves are generated using an unsupervised machine-learning system. The inputs to this AI machine are text (restaurant comments in this case) sorted by time, and the number of topics to generate (20 topics in Figure 1). Everything else the AI system auto-generates, including the Sankey chart. There is a handy Python library for making Sankey diagrams, so the whole process of learning and plotting is unsupervised and automatic, with no manual intervention.
Can we do the same for movies? I ran it on one of my all-time favorites, "The Dark Knight". 3,700 comments from the time of its release were fed to the AI machine. Below (Figure 2) is what the AI machine generates.
Figure 2: The Dark Knight Movie
The machine understands that this is a superhero movie set in Gotham City. The movie has consistently been recommended and has excellent reviews right up to this week (June 25th, 2018), which itself speaks volumes. A striking insight is that a lot of people have consistently given it a 10-out-of-10 rating over the last 10 years. People still watch it and comment positively, and that information is better than any single star-rating number. A must-watch. One can see nuances like the Christopher Nolan and Hans Zimmer combination coming out, which alludes to the soundtrack. This suddenly makes it possible to search based on such aspects. All the above data lives in a vector space, so a query like "show me a movie with a similar director-musician soundtrack combination" is possible. The Joker character gets a lot of kudos. The machine also recognizes that there were sequels, and one can see from the flow that The Dark Knight trilogy comes up. The AI learns the existence of follow-on movies by itself.
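The post does not specify how the vectors are built, but the query idea can be sketched with plain cosine similarity, assuming each movie already has a topic vector (the movies, dimensions, and numbers below are all hypothetical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical topic vectors; dimensions could be, e.g.,
# (nolan-zimmer soundtrack, superhero, crime, comedy).
movies = {
    "The Dark Knight": [0.9, 0.9, 0.8, 0.0],
    "Interstellar":    [0.9, 0.0, 0.1, 0.0],
    "Iron Man":        [0.1, 0.9, 0.3, 0.4],
}

# "Show me a movie with a similar director-musician soundtrack combination."
query = [1.0, 0.0, 0.0, 0.0]
ranked = sorted(movies, key=lambda m: cosine(movies[m], query), reverse=True)
print(ranked)
```

Because everything is a vector, the same machinery answers queries the original star ratings never could.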
Since movie information, unlike a restaurant's, is much more time-bound, especially around the release date, one can look at the initial topic flow more closely. The machine says Heath Ledger got an Oscar, which he did, for the Joker character identified earlier. The machine recommends seeing it in IMAX. It learns that this is a comic-book crime adaptation and compares it with Spiderman, Iron Man, and Hulk. The AI even recommends Christopher Nolan's Memento as one to look out for.
The machine summarizes, recommends, pinpoints nuances, and makes the data searchable in completely different ways.
As you will realize, the AI machine doesn't care about the nature of the text. Let's look at the chart below. A big topic of discussion across the world is migration (humans moving to different countries). Where does one even start with such a topic? Why not ask the AI system to plot it? Below (Figure 3) is what the machine sees happening with respect to migration.
Figure 3: Migration
The machine crunches 3,000 newspaper articles from the last 2.5 years and tells us the following seems to be happening in the world today with reference to human migration: the unaccompanied-children issue (UK earlier, now USA), Libya-Italy (the boat crisis), the Syria crisis (with Greece, Turkey, etc.), Syrian refugee resettlement and its impact on Chancellor Angela Merkel in Germany, the Calais camp in France, the UK Brexit impact on EU citizen migration, the Australian detention centers on Manus and Nauru, and of course the multiple President Trump bans. A very crisp, auto-generated view of how migration is affecting the world today and the top issues to look for. One can delve into details by going backward in the topic flow to find subtopics, like the Australian migrants issue where a citizenship test in English was introduced. One can also observe how issues migrate from one into another. For example, David Cameron's brake on EU migrants moves into the broader EU-UK-Brexit issue of how it would impact migrants.
Some more examples. Moving from a single topic across many countries to a single country's (India's) technology landscape: the machine generates the chart below (Figure 4) when fed with 6 months of data.
Figure 4: Technology News
Clearly, fundraising has increased. The main topics are e-commerce, food delivery, cab aggregators like Uber, and electric vehicles. One can see the Facebook-Cambridge Analytica issue cropping up. In addition, local aspects like Aadhaar and government taxes on angel-investor income show up. One can see the flow of information from the Myntra-Jabong deal (fashion commerce) into a broader e-commerce topic. People familiar with the country's landscape can spot nuances like the Snapdeal-SoftBank fundraising issue at board level, which evolves into a broader fundraising topic. A thing to note is that the time frame of analysis can be changed (daily, monthly, annually) based on the volume of text available. The AI machine has learned what is happening in the technology landscape of India.
One can delve deeper into specific technologies. If one has to start research on a particular topic, invest in a particular technology, or write an article on some technology aspect, where does one start? Let's pick the topic of Unmanned Aerial Vehicles (UAVs); the AI machine generates the chart below (Figure 5).
Figure 5: UAVs
The machine auto-generates the current state of research in UAVs, including technical details:
"Control algorithms" to maneuver and control a single UAV or a fleet of them are key. It is amazing to see details like OLSR routing protocols being pointed to. A quick Google search will tell you OLSR stands for Optimized Link State Routing Protocol, an IP routing protocol optimized for mobile ad-hoc networks, including various wireless ad-hoc networks. When multiple UAVs fly, they do form an ad-hoc wireless network, as their speeds and trajectories vary and connectivity shifts between the ground and other UAV stations. The machine educated me on this without my having to crunch 100s of research papers
"Quadrotor control dynamics" continues to be a key research topic given the various form factors of UAVs. Google tells us a quadrotor is a UAV lifted and steered by four rotors. These multiple rotors each experience different, uncertain conditions when flying through, say, a storm, and their control is a key area
UAVs fly in the air, and their connection to the ground and to each other is crucial. "UAV network network connectivity" is a key research area, especially given the spread of cellular networks. UAVs are considered "aerial" mobile users, just as we are "land" mobile users. The combination forms a future-generation three-dimensional (3D) heterogeneous wireless network with coexisting aerial and ground users. Wow! The machine pointed to this, and a quick glance at the documents behind it let me know what it is all about
Related to this is the topic of "UAV trajectory energy power". A quick glance at research-paper abstracts tells me there is a trade-off between a UAV's function (e.g., video streaming), its trajectory, and the energy it consumes to stay wirelessly connected with other UAVs and the ground. Optimizing this connectivity-energy versus trajectory trade-off is a major research topic
"Coverage users probability" points to research on using multiple UAVs to solve a problem, for example in an emergency-response scenario. Think of this as figuring out the minimum number of UAVs needed to guarantee a target coverage probability for a given geographical area. A great indicator of the use cases to come, given that research is a lead indicator of what will get productized
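As a back-of-the-envelope sketch of that last idea (my own simplification, not taken from the papers the machine found): if each UAV independently covers a user with probability p, then n UAVs achieve coverage probability 1 - (1 - p)^n, so the minimum fleet size for a target t is the smallest n with 1 - (1 - p)^n >= t.

```python
import math

def min_uavs(p: float, target: float) -> int:
    """Smallest fleet size n such that 1 - (1 - p)**n >= target,
    assuming each UAV independently covers a user with probability p."""
    # (1 - p)**n <= 1 - target  =>  n >= log(1 - target) / log(1 - p)
    return math.ceil(math.log(1 - target) / math.log(1 - p))

# E.g., with 30% per-UAV coverage, 95% target coverage needs 9 UAVs.
print(min_uavs(0.3, 0.95))
```

Real papers optimize placement and trajectories too, but the independence assumption gives a feel for why fleet sizing is a probability question.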
That's a great summary of cutting-edge UAV topics that can benefit
a journalist wanting to write on the topic
a venture capitalist wanting to invest in a UAV startup
a researcher wanting to identify where to research
a multinational corporation wanting to figure out R&D/ new growth investments
Most importantly, these decisions can be taken with less human bias, as the machine works directly from the data. And since we crunch 1000s of articles/papers and focus on generating the broader flows, the approach is less prone to fake news: a few fake articles thrown in will not alter the trend.
Next in the de-cluttering approach:
Integrating this output with the auto-generation of reports (Auto-generate reports via coding & AI) and technology adoption maps (Auto-generate Technology Adoption Maps) will make this an end-to-end platform. Each topic flow can be the input to an auto-generated report. Example: UAV control algorithms can be the input to an auto-generated report on companies, funding, patents, etc. in that area. End-to-end automation using AI. My previous blogs explain the auto-generation of reports using code & AI
The UI of the Sankey charts can be simplified and beautified; currently it is the direct output of the code
By using thresholds, the number of flows can be reduced for better readability
Instead of the chart, the explanatory paragraphs I wrote above could be generated directly
Other AI systems that process the same text can be combined with this output, for example embedding sentiment in the topic flow
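The thresholding idea above is a one-liner in practice. A sketch, with the link structure invented for illustration: keep only the links that carry more than a given fraction of the total flow.

```python
# Hypothetical Sankey links: (source topic, target topic, flow quantity).
links = [
    ("service", "service", 120),
    ("clam chowder", "clam chowder", 90),
    ("crab cakes", "clam chowder", 8),
    ("parking", "service", 3),
]

total = sum(flow for _, _, flow in links)
threshold = 0.05  # drop links carrying under 5% of the total flow

readable = [link for link in links if link[2] / total >= threshold]
print(readable)
```

Raising the threshold trades completeness for readability, which is exactly the knob a cluttered chart needs.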
2. Next for the world
I wanted to leave you with what the AI machine sees as the future of the world. Top trends auto-generated by AI & code (Figure 6)
Figure 6: Future Trends
The machine generates the below as future trends. Judge for yourself whether AI gets a good sense of what is driving our future and whether it can replace white-collar jobs:
AI (Artificial Intelligence)
Energy-Wind & Solar power
Climate change & carbon
Cryptocurrency & blockchain
Space — Moon, Mars, NASA, and SpaceX
Life in Universe (Earth, Mars and other planets) including Dark Matter
Electric Vehicles with a highlight on Tesla
Brain & patient activity research
Stem cells & cancer (including gene CRISPR)
Internet Net Neutrality
User Data (emphasis on Facebook)
Data holds the truth. Code+AI can unlock it.
In the world of AI… data matters, intelligence matters.
Note: Companies interested in implementing the above IP in their organization/offerings can contact me via LinkedIn