Introduction
In my previous post, I explored how to use NotebookLM to create personal stories. I focused on the professional story built from your resume, but the same approach works for other storytelling, such as answering a leadership question or adapting your story to a given situation. NotebookLM's capabilities are not limited to personal storytelling, though. It is also a great learning tool for creating summaries and study plans for complex subjects.
For NotebookLM to perform all these tasks seamlessly, it needs a product design that can handle various tasks uniformly. This means the product needs to create a meta-structure for these problems.
Meta Structure to generate stories/summaries
All audio stories that NotebookLM produces have three features:
Summary of the input
Context around the summary of the input
A conversational piece (like a podcast) of the summary in the context
Other features, such as the Table of Contents and the FAQ, are different versions of the summary. Since the audio output is the most used feature, this post focuses only on it.
Technical Design for Audio Output
Now that we have the meta-structure that needs to be generated from the input, we can draw the technical design (using the information from the podcast) as follows:
If there is one thing to take away from this design, it is that at least three GenAI models are involved in this product.
Gemini 1.5 Pro: This model takes the various inputs and generates the summaries.
Content Studio (probably a more advanced version of Gemini Pro): This LLM draws on broader knowledge to generate the context for the summary. It is one of the most critical steps in the process, since the model has to create both the context and the text for the audio output while avoiding hallucination. NotebookLM owes most of its success to this studio's work.
Text2Speech (probably a lighter version of Gemini): Google has quite a few models for this step, and which one is used here is anyone’s guess. Since this step runs in batch mode, the model does not need to be very sophisticated, except for voice quality, which has to be human-like and natural-sounding.
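To make the three-model design easier to picture, here is a minimal Python sketch of such a one-way pipeline. It is my own illustration, not NotebookLM's actual code: the names `AudioStory`, `run_pipeline`, and the three `stub_*` functions are hypothetical, and the stubs only stand in for the real models so the example runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AudioStory:
    """The three-part meta-structure plus the final audio (hypothetical type)."""
    summary: str   # summary of the input
    context: str   # context around the summary
    script: str    # conversational, podcast-style text
    audio: bytes   # synthesized speech


def run_pipeline(
    sources: List[str],
    summarize: Callable[[List[str]], str],
    write_script: Callable[[str], Tuple[str, str]],
    synthesize: Callable[[str], bytes],
) -> AudioStory:
    """Run the three stages once, in order, with no step going back."""
    summary = summarize(sources)             # stage 1: summarization model
    context, script = write_script(summary)  # stage 2: "Content Studio" role
    audio = synthesize(script)               # stage 3: text-to-speech
    return AudioStory(summary, context, script, audio)


# Stub stages (assumptions): placeholders for the real models.
def stub_summarize(sources: List[str]) -> str:
    # Keep only the first sentence of each source.
    return " ".join(s.split(".")[0].strip() + "." for s in sources)


def stub_write_script(summary: str) -> Tuple[str, str]:
    context = f"Background for: {summary}"
    script = f"Host A: {summary}\nHost B: {context}"
    return context, script


def stub_synthesize(script: str) -> bytes:
    # A real system would return audio; here we just encode the text.
    return script.encode("utf-8")


if __name__ == "__main__":
    story = run_pipeline(
        ["Alpha is first. More detail.", "Beta is second."],
        stub_summarize,
        stub_write_script,
        stub_synthesize,
    )
    print(story.script)
```

Because each stage is passed in as a plain function, any one of them could be swapped for a real model client without touching the flow itself.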
Observations / Limitations
Single-step uniflow: NotebookLM generates only one summary, and in a standard form at that, which leads me to believe that the whole flow is unidirectional from start to finish. Quality agents might be present inside each step, but once the flow moves from one step to the next, it does not go back.
Limited summary lengths: Most of the summaries run between 5 and 15 minutes, even for huge documents, which makes me believe there is a limit on the output window of the Content Studio.
Limited output types: Apart from the audio summary, you only get text. Even documents that could be enriched with images are rendered as simple text. This points to another limitation of the Content Studio.
Lack of balanced view: The professional summary NotebookLM generated for my resume was very positive. This positive tone is excellent for telling stories. However, other summary tasks will need more varied views (e.g., what the team achieved in a quarter, or a resume review for the hiring manager).
Only limited voices and formats: The output has only two voices and types; this lack of variety can become monotonous for listeners.
Limited inputs: NotebookLM only takes PDFs, text (websites, files), and video/audio from YouTube. This suggests an encoding limitation in Gemini 1.5 Pro.
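To put the summary-length observation above into numbers, here is a rough back-of-the-envelope calculation in Python. Both constants are my assumptions, not figures from Google: a conversational speaking rate of about 150 words per minute, and about 1.3 tokens per English word. The function name `script_tokens` is hypothetical.

```python
# Assumed constants (not from NotebookLM): adjust to taste.
WORDS_PER_MINUTE = 150   # typical conversational speaking rate
TOKENS_PER_WORD = 1.3    # rough rule of thumb for English text


def script_tokens(minutes: float,
                  words_per_minute: int = WORDS_PER_MINUTE,
                  tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Estimate how many tokens of script an audio piece of this length needs."""
    return round(minutes * words_per_minute * tokens_per_word)


if __name__ == "__main__":
    for minutes in (5, 15):
        print(f"{minutes} min -> about {script_tokens(minutes)} tokens")
```

Under these assumptions, a 5 to 15 minute episode corresponds to roughly 1,000 to 3,000 tokens of script, which gives a sense of how much text the script-writing step produces per episode.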
Conclusion
NotebookLM in its beta form is a significant first step for a product with incredible use cases. Its meta-structure is defined so that it can be extended to multiple use cases. In the next installment of this exploration of NotebookLM, I will discuss the technical improvements the product should make to become more flexible, extensible, and usable. Happy experimenting!