Friday, March 28, 2025

How Gemini Deep Research Works

Google's Gemini ecosystem has expanded its capabilities with the introduction of Gemini Deep Research, a sophisticated feature designed to revolutionize how users conduct in-depth investigations online. Moving beyond the limitations of traditional search engines, Deep Research acts as a virtual research assistant, autonomously navigating the vast expanse of the internet to synthesize complex information into coherent and insightful reports. This AI-powered tool promises to significantly enhance research efficiency and provide valuable insights across diverse domains for professionals, researchers, and individuals seeking a deeper understanding of complex subjects.

Gemini Deep Research

Unpacking Gemini Deep Research: Your Personal AI Research Partner

Gemini Deep Research is integrated within the Gemini Apps, offering users a specialized feature for comprehensive and real-time research on virtually any topic. It operates as a personal AI research assistant, going beyond basic question-answering to automate web browsing, information analysis, and knowledge synthesis. The core objective is to significantly reduce the time and effort typically associated with in-depth research, empowering users to gain a thorough understanding of complex subjects much faster than with conventional methods.

Unlike traditional search methods that require users to manually navigate numerous tabs and piece together information, Deep Research streamlines this process autonomously. It navigates and analyzes potentially hundreds of websites, thoughtfully processes the gathered information, and generates insightful, multi-page reports. Many reports also offer an Audio Overview feature, enhancing accessibility by allowing users to stay informed while multitasking. This combination of autonomous research and accessible output formats sets Gemini Deep Research apart from standard chatbots.

The Mechanics of Deep Research: From Prompt to Insightful Report

Engaging with Gemini Deep Research is designed to be intuitive and is accessible through the Gemini web and mobile apps. The process begins with the user entering a clear and straightforward research prompt. The system understands natural language, so no specialized prompting techniques are needed.

Upon receiving a prompt, Gemini Deep Research generates a detailed research plan tailored to the specific topic. Importantly, users have the opportunity to review and modify this plan before the research begins, allowing for targeted investigation aligned with their specific objectives. Users can suggest alterations and provide additional instructions using natural language.

Once the plan is finalized, Deep Research autonomously searches and deeply browses the web for relevant and up-to-date information, potentially analyzing hundreds of websites. Transparency is maintained through options like "Sites browsed," which lists the utilized websites, and "Show thinking," which reveals the AI's steps.

A crucial aspect is the AI's ability to engage in iterative reasoning and thoughtful analysis of the gathered information. It continuously evaluates findings, identifies key themes and patterns, and employs multiple passes of self-critique to enhance the clarity, accuracy, and detail of the final report.

The culmination is the generation of comprehensive and customized research reports within minutes, depending on the topic's complexity. These reports often include an Audio Overview and can be easily exported to Google Docs, preserving formatting and citations. Clear citations and direct links to original sources are always included, ensuring transparency and facilitating easy verification.

Under the Hood: Powering Deep Research

Gemini Deep Research harnesses the power of Google's advanced Gemini models. Initially powered by Gemini 1.5 Pro, known for its ability to process large amounts of information, Deep Research was subsequently upgraded to the Gemini 2.0 Flash Thinking Experimental model. This "thinking model" enhances reasoning by breaking down complex problems into smaller steps, leading to more accurate and insightful responses.

At its core, Deep Research operates as an agentic system, autonomously breaking down complex problems into actionable steps based on a detailed, multi-step research plan. This planning is iterative, with the model constantly evaluating gathered information.

Given the long-running nature of research tasks involving numerous model calls, Google has developed a novel asynchronous task manager. This system maintains a shared state, enabling graceful error recovery without restarting the entire process and allowing users to return to results at their convenience.
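
Google has not published this task manager's implementation, but the checkpoint-and-resume idea it describes can be illustrated with a minimal sketch. Everything below (the file name, step names, and the execute_step placeholder) is hypothetical; it only shows how shared state lets a long-running plan recover from a failed step without redoing completed work.

```python
import asyncio
import json
import pathlib

STATE_FILE = pathlib.Path("research_state.json")  # hypothetical checkpoint location


def load_state():
    """Resume from the last checkpoint if one exists, otherwise start fresh."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"completed": [], "results": {}}


def save_state(state):
    """Persist shared state so a failed step can be retried without restarting the plan."""
    STATE_FILE.write_text(json.dumps(state))


async def execute_step(step):
    await asyncio.sleep(0.1)  # placeholder for a long-running model or browsing call
    return f"findings for {step}"


async def run_plan(steps):
    state = load_state()
    for step in steps:
        if step in state["completed"]:
            continue  # graceful recovery: skip work that already succeeded
        try:
            state["results"][step] = await execute_step(step)
            state["completed"].append(step)
            save_state(state)
        except Exception:
            save_state(state)  # keep partial progress so the task can resume later
            raise


asyncio.run(run_plan(["plan", "browse", "analyze", "report"]))
```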

To manage the extensive information processed during a research session, Deep Research leverages Gemini's large context window (up to 1 million tokens for Gemini Advanced users). This is complemented by Retrieval-Augmented Generation (RAG), allowing the system to effectively "remember" information learned during a session, becoming increasingly context-aware.
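
The post does not detail the retrieval machinery, but the general RAG pattern is well known: findings gathered earlier in the session are embedded, the most relevant ones are retrieved for each new question, and only those are placed back into the prompt. The sketch below is purely illustrative; the embed function is a stand-in, not Gemini's embedding model.

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    """Stand-in embedding; a real system would call an embedding model instead."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)


class SessionMemory:
    """Stores findings from earlier in the session and retrieves the most relevant ones."""

    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, text: str):
        self.texts.append(text)
        self.vectors.append(embed(text))

    def retrieve(self, query: str, k: int = 3):
        q = embed(query)
        scores = [float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v)) for v in self.vectors]
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        return [self.texts[i] for i in top]


memory = SessionMemory()
memory.add("Site A: the market grew 12% in 2024.")
memory.add("Site B: a key regulation changed in 2023.")
context = memory.retrieve("How fast is the market growing?")
prompt = "Answer using this context:\n" + "\n".join(context)
```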

The Gemini models are trained on a massive and diverse multimodal and multilingual dataset. This includes web documents, code, images, audio, and video. Instruction tuning and human preference data ensure the models effectively follow complex instructions and align with human expectations for quality. Gemini 1.5 Pro utilizes a sparse Mixture-of-Experts (MoE) architecture for increased efficiency and scalability.
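
The MoE idea can be shown with a toy routing sketch: a small router scores the experts for each token and only the top-k experts actually run, which is where the efficiency gain comes from. The dimensions, softmax router, and expert count below are illustrative, not Gemini 1.5 Pro's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

router = rng.standard_normal((d_model, n_experts))                  # routing weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]


def moe_layer(token: np.ndarray) -> np.ndarray:
    logits = token @ router
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                             # softmax over experts
    chosen = np.argsort(probs)[-top_k:]                              # only the top-k experts execute
    return sum(probs[i] * (token @ experts[i]) for i in chosen)      # weighted mix of their outputs


output = moe_layer(rng.standard_normal(d_model))
```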

Diverse Applications Across Industries and Research

Gemini Deep Research offers a wide range of applications, demonstrating its versatility.

  • Business Intelligence and Market Analysis: Competitive analysis, due diligence, identifying market trends.
  • Academic and Scientific Research: Literature reviews, summarizing research papers, hypothesis generation.
  • Healthcare and Medical Research: Assisting in radiology reports, summarizing health information, answering clinical questions, analyzing medical images and genomic data.
  • Finance and Investment Analysis: Examining market capitalization, identifying investment opportunities, flagging potential risks, analyzing financial reports.
  • Education: Lesson planning, grant writing, creating assessment materials, supporting student research and understanding.

Real-world examples include planning home renovations, researching vehicles, analyzing business propositions, benchmarking marketing campaigns, analyzing economic downturns, researching product manufacturing, exploring interstellar travel possibilities, researching game trends, assisting in coding, and conducting biographical analysis. Industry-specific uses include accounting associations analyzing tax reforms, professional development identifying skill gaps, regulatory bodies assessing the impact of new regulations, and healthcare streamlining radiology reports and summarizing patient histories.

The utility of Deep Research is further enhanced by its integration with other Google tools like Google Docs and NotebookLM, facilitating editing, collaboration, and in-depth data analysis. The Audio Overview feature provides added accessibility.

Navigating the Competitive Landscape

Comparisons with other AI platforms highlight Gemini Deep Research's unique strengths.

  • Gemini Deep Research vs. ChatGPT: Gemini excels in research-intensive tasks and image analysis, focusing on verifiable facts. ChatGPT is noted for creative writing and contextual explanations. User experience preferences vary.
  • Gemini Deep Research vs. Grok: Grok is designed for real-time data analysis and IT operations, with strong integration with the X platform. Gemini offers broader research applications and handles diverse data types.
  • Gemini Deep Research vs. DeepSeek: DeepSeek is strong in generating structured and technically detailed responses, particularly for programming and technical content. Gemini has shown superior overall versatility and accuracy across a wider range of prompts and offers native multimodal support.

Table 1: Comparison of Gemini Deep Research with Other AI Platforms (a detailed side-by-side comparison across key features).

| Feature | Gemini Deep Research | ChatGPT Deep Research | Grok | DeepSeek |
| --- | --- | --- | --- | --- |
| Multimodal Input | Yes (Text, Images, Audio, Video) | Yes (Text, Images, PDFs) | No (Primarily Text) | No (Primarily Text) |
| Real-time Search | Yes (Uses Google Search) | Yes (Uses Bing) | Yes (Real-time data analysis, integrates with X) | Yes |
| Citation Support | Yes (Inline and Works Cited) | Yes (Inline and Separate List) | Yes | Yes |
| Planning | Yes (User-Reviewable Plan) | Yes | No Explicit Planning Mentioned | No Explicit Planning Mentioned |
| Reasoning | Advanced (Iterative, Self-Critique) | Advanced | Strong (Focus on real-time data) | Strong (Technical Reasoning) |
| Strengths | Research-heavy tasks, Image Analysis, Google Ecosystem Integration | Creative Writing, Contextual Explanations, Structured Output | Real-time Data Analysis, Social Media Analysis, IT Operations | Structured Technical Responses, Coding, Cost-Effectiveness |
| Weaknesses | May lack diverse perspectives, Cannot bypass paywalls | Occasional Inaccuracies, Subscription Fee for Full Access | Less Depth in Some Areas, Limited Visuals | Primarily Text-Based, Limited Public Information |
| Key Use Cases | Business Intelligence, Academic Research, Healthcare, Finance, Education | Content Creation, Brainstorming, Academic Projects, Business Research | Marketing, Financial Planning, Social Media Management, IT Automation | Programming, Math, Scientific Research, Technical Documentation |
| Pricing (Approx.) | Free (Limited), Paid (with Gemini Advanced) | Paid (with ChatGPT Plus) | Paid (with Grok Premium+) | Free (for some models), Paid (for advanced models) |


The Future Trajectory: Impact and Anticipated Enhancements

Gemini Deep Research has the potential to fundamentally transform research across various disciplines by automating information gathering, analysis, and synthesis, leading to significant increases in efficiency and productivity. It represents a step towards a future where AI actively collaborates in the research lifecycle.

Future developments aim to provide users with greater control over the browsing process and expand information sources beyond the open web. Continuous improvements in quality and efficiency are expected with the integration of newer Gemini models. Deeper integration with other Google applications will enable more personalized and context-aware responses. Features like Audio Overview and personalization based on search history indicate a trend towards a more integrated and user-centric research experience.

Democratizing In-Depth Analysis

Gemini Deep Research is a powerful and evolving tool offering a sophisticated approach to information retrieval and analysis. Its core capabilities in autonomous web searching, iterative reasoning, and comprehensive report generation have the potential to significantly enhance research efficiency across numerous industries and academic fields. By providing user control and delivering well-cited, synthesized information, Gemini Deep Research empowers users to gain deeper insights and make more informed decisions. As the technology advances, its role in the future of research and knowledge discovery is poised to become increasingly significant, democratizing access to in-depth analysis and accelerating the pace of innovation.

Wednesday, March 26, 2025

What is Vibe Coding? Exploring AI-Assisted Software Development

A new approach to software development, known as "vibe coding," has begun to emerge that promises to make creating software easier. People prompt AI systems in natural language to describe the code they want, which means they can author software without a background in programming. While this new technique generates excitement, it also raises important questions about code quality, security, and the future of software engineers.

The term "vibe coding" was coined in early 2025 by Andrej Karpathy, a co-founder of OpenAI. In vibe coding, users describe the desired functionality of software in natural language to large language models (LLMs) trained for coding. This is fundamentally different from writing code manually, and it may open software development to a much wider audience; as Karpathy jokes, "the hottest new programming language is English." LLMs are increasingly capable of understanding basic requests and following them closely when generating code. Karpathy admits that when he uses LLMs to code, the process consists of "see[ing] stuff, say[ing] stuff, run[ning] stuff, and copy-paste stuff, and it mostly works."

vibe coding

Vibe coding typically follows a repeating cycle between the user and an AI coding assistant. The user states instructions or goals in plain language, which form a prompt; for example, a user might ask an AI to "Create a simple web page that displays the current weather for a city entered by the user." The AI then converts that request into code, working like a sophisticated "autocomplete." Once initial code is produced, the user reviews it and gives the AI further input describing refinements or fixes, and this back-and-forth continues until the user is satisfied. Even for a simple task like writing a Python function to sort a list of names alphabetically, a basic natural-language prompt can yield working code without the author typing any of it (a sketch of such output follows below). Proponents of vibe coding cite several potential benefits. It promotes speed and efficiency in software development by automating boilerplate and repetitive tasks. It lowers the barrier to software creation for people with little or no coding knowledge. It can also accelerate rapid prototyping and experimentation, tightening user feedback loops for iterating on and refining ideas. People in non-technical roles may likewise be able to produce prototypes and, in doing so, deepen their appreciation of the underlying systems.
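
To make the sorting example above concrete, here is the kind of code a prompt such as "write a Python function that sorts a list of names alphabetically" might plausibly return. This is an illustrative sketch, not output from any particular assistant.

```python
def sort_names(names):
    """Return a new list with the names sorted alphabetically, ignoring case."""
    return sorted(names, key=str.lower)


print(sort_names(["carol", "Alice", "bob"]))  # ['Alice', 'bob', 'carol']
```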

However, this emerging trend has limitations and has drawn criticism. Code quality and maintainability are common concerns: AI-generated code may be less clean or efficient than code written by a human and can devolve into "spaghetti code." Finding and fixing bugs in code one did not write can be a burden even for experienced developers, and more so for users without a solid grasp of programming principles. A significant concern is the security vulnerabilities that can slip into code that has not been rigorously reviewed by an experienced developer. Within the experienced developer community, skepticism exists that vibe coding bypasses the fundamental principles of software engineering required to write solid, scalable software. There is also concern that over-reliance on AI will deskill new developers and prevent them from building the problem-solving skills essential to growth and confidence. In one anecdotal example, an AI coding assistant even declined to generate code and instead suggested the user write it themselves, highlighting the limits of these tools.

Responses from the online programming community range from excitement about greater accessibility to strong pushback over code quality and security. Expert assessments reflect the same tension. Karpathy sees the approach as intuitive, while Rachel Wolan, CPO at Webflow, calls it fast and flexible but lacking in custom design, noting that it could augment rather than replace developers. David Gewirtz of ZDNET views it as a way for developers to increase their productivity, but sees only limited room for shortcut coding because major projects will still involve complex, manually written code. AI researcher Simon Willison argues that if the AI-generated code gets a full human review and any misunderstandings are corrected, then it is simply using AI as a "typing assistant" rather than vibe coding. Several products coming to market aim to make vibe coding simpler. Cursor is an AI code editor that integrates AI directly into the editing experience. Replit, an online coding platform, has integrated AI assistance, and according to its CEO a significant percentage of Replit users rely on AI features without writing any manual code. GitHub Copilot acts as an AI pair programmer, completing code and offering a chat feature for writing code from natural-language requests. Even general-purpose LLMs like ChatGPT and Claude can be used for vibe coding by generating code snippets from natural-language prompts. Windsurf is another AI-driven code editor aiming for a more automated, streamlined experience.

Although it may appear focused solely on generating functionally workable code, vibe coding also touches on notions such as aesthetic programming. Aesthetic programming treats coding as a form of critical and aesthetic inquiry that deepens our understanding of coding as a set of processes intertwined with human meaning-making. It is also associated with creative coding, where the result is primarily expressive rather than functional, often aiming to create particular "vibes" through visual and interactive work. The accessibility of vibe coding could lower the barrier for artists to begin exploring code. Furthermore, the AI-assisted nature of vibe coding changes the emotional experience of coding. It may reduce the frustration usually associated with learning to read and use complex (and sometimes seemingly arbitrary) syntax, yet new anxieties may emerge from lacking a deep understanding of the generated code or the ability to debug it oneself. Pride may also shift from writing code to successfully directing an AI. The literature on coding has likewise raised concerns about cognitive offloading and an erosion of everyday coding knowledge if AI assistance is leaned on too heavily.

To sum up, vibe coding is a genuine shift in software development, with exciting promise and real challenges. It is poised to democratize creation and raise productivity, but there will remain a need for core programming fundamentals and for the judgment of experienced developers to produce software that is secure, maintainable, and robust. In the long run, vibe coding will likely take a hybrid form: AI tools amplifying human capability, with the ease and "vibe" of natural-language development reconciled with the rigor of professional software engineering.

Thursday, March 20, 2025

Unleash Creativity with Gemini 2.0 Flash Native Image Generation

The landscape of artificial intelligence continues to evolve at a breathtaking pace, and at the forefront of this innovation is Google's Gemini family of models. Recently, Google has expanded the capabilities of Gemini 2.0 Flash, introducing an exciting experimental feature: native image generation. This development marks a significant step towards more integrated and contextually aware AI applications, directly embedding visual creation within a powerful multimodal model. In this post, we'll delve into the intricacies of this new capability, exploring its potential, technical underpinnings, and the journey ahead.

Introduction to Gemini 2.0 Flash

Gemini 2.0 Flash is a part of Google's cutting-edge Gemini family of large language models, designed for speed and efficiency while retaining robust multimodal understanding. It distinguishes itself by combining multimodal input processing, enhanced reasoning, and natural language understanding. Traditionally, generating images often required separate, specialized models. However, Gemini 2.0 Flash's native image generation signifies a deeper integration, allowing a single model to output both text and images seamlessly. This experimental offering, currently accessible to developers via Google AI Studio and the Gemini API, underscores Google's commitment to pushing the boundaries of AI and soliciting real-world feedback to shape future advancements.

Gemini Flash Image Generation
Screengrab from Google AI Studio

Native Image Generation: Painting Pictures with Language

The core of this exciting update is the experimental native image generation capability. This feature empowers developers to generate images directly from textual descriptions using Gemini 2.0 Flash. Activated through the Gemini API by specifying responseModalities to include "Image" in the generation configuration, this functionality allows users to provide simple or complex text prompts and receive corresponding visual outputs.
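
A minimal sketch of such a request, assuming the Python google-genai client SDK; exact field names, model IDs, and response structure may differ as the experimental API evolves.

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # assumes an API key from Google AI Studio

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents="A watercolor painting of a lighthouse at dusk",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The response may interleave text parts and inline image parts.
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:  # image bytes returned inline
        with open("lighthouse.png", "wb") as f:
            f.write(part.inline_data.data)
```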

Beyond basic text-to-image creation, Gemini 2.0 Flash shines in its ability to perform conversational image editing. This allows for iterative refinement of images through natural language dialogue, where the model maintains context across multiple turns. For instance, a user can upload an image and then ask to change the color of an object, or add new elements, making the editing process more intuitive and accessible.

Another remarkable aspect is the model's capacity for interwoven text and image outputs. This enables the generation of content where text and relevant visuals are seamlessly integrated, such as illustrated recipes or step-by-step guides. Moreover, Gemini 2.0 Flash leverages its world knowledge and enhanced reasoning to create more accurate and realistic imagery, understanding the relationships between different concepts. Finally, internal benchmarks suggest that Gemini 2.0 Flash demonstrates stronger text rendering capabilities compared to other leading models, making it suitable for creating advertisements or social media posts with embedded text.

Technical Insights: Under the Hood

To access these image generation capabilities, developers interact with the Gemini API, specifying the model code gemini-2.0-flash-exp-image-generation or using the alias gemini-2.0-flash-exp. The Gemini API offers SDKs in various programming languages, including Python (using the google-generativeai library) and Node.js (@google-ai/generativelanguage), simplifying the integration process. Direct API calls via RESTful endpoints are also supported. For image editing, the image is typically uploaded as part of the content, often using base64 encoding.

Interestingly, while Gemini 2.0 Flash manages the overall multimodal interaction, the underlying image generation leverages the capabilities of Imagen 3. This allows for some control over the generated images through parameters such as number_of_images (1-4), aspect_ratio (e.g., "1:1", "3:4"), and person_generation (allowing or blocking the generation of images with people). Developers can experiment with this feature in both Google AI Studio and Vertex AI.
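
Those parameters correspond to an Imagen-style image generation call. Below is a hedged sketch, again assuming the google-genai SDK; the model ID and the exact accepted values for person_generation are illustrative and may differ.

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

result = client.models.generate_images(
    model="imagen-3.0-generate-002",      # illustrative Imagen 3 model ID
    prompt="A minimalist poster of a red bicycle",
    config=types.GenerateImagesConfig(
        number_of_images=2,               # 1-4 images per request
        aspect_ratio="3:4",
        person_generation="DONT_ALLOW",   # block images of people; value name may vary
    ),
)

for i, generated in enumerate(result.generated_images):
    with open(f"poster_{i}.png", "wb") as f:
        f.write(generated.image.image_bytes)
```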

To promote transparency and address the issue of content provenance, all images generated by Gemini 2.0 Flash Experimental include a SynthID watermark, an imperceptible digital marker identifying the image as AI-generated. Images created within Google AI Studio also include a visible watermark.

Use Cases and Benefits: Painting a World of Possibilities

The experimental native image generation in Gemini 2.0 Flash unlocks a plethora of exciting use cases across various domains.

  • Creative Industries: Imagine generating consistent illustrations for children's books or creating dynamic visuals that evolve with the narrative in interactive stories. The ability to perform conversational image editing can revolutionize workflows for graphic designers and marketing teams, allowing for rapid iteration and exploration of visual ideas.

  • Marketing and Advertising: Crafting engaging social media posts and advertisements with integrated, well-rendered text becomes significantly easier. Consistent character and setting generation can be invaluable for branding and storytelling across campaigns.
  • Education: Creating illustrated educational materials, such as recipes with accompanying visuals or step-by-step guides, can enhance learning and engagement. The ability to visualize concepts through AI-generated images can be particularly beneficial for complex topics.
  • Accessibility: As demonstrated in the sources, Gemini 2.0 Flash can be used for accessibility design testing, visualizing modifications like wheelchair ramps in existing spaces based on textual descriptions.
  • Prototyping and Visualization: In fields like product design and interior design, the conversational image editing capabilities allow for rapid prototyping of variations and visualization of different concepts through simple natural language commands.

The primary benefit of Gemini 2.0 Flash's native image generation lies in its integrated and intuitive workflow. By combining text and image generation within a single model, it streamlines development and opens doors to more natural and interactive user experiences, potentially reducing the need for multiple specialized tools. The conversational editing feature democratizes image manipulation, making it accessible to users without deep technical expertise.

Challenges and Limitations: Navigating the Experimental Stage

Despite its impressive capabilities, the experimental nature of Gemini 2.0 Flash's image generation comes with certain limitations and challenges.

  • Language Support: The model currently performs optimally with prompts in a limited set of languages, including English, Spanish (Mexico), Japanese, Chinese, and Hindi.
  • Input Modalities: Currently, the image generation functionality does not support audio or video inputs.
  • Generation Uncertainty: The model might occasionally output only text when an image is requested, requiring explicit phrasing in the prompt. Premature halting of the generation process has also been reported.
  • Response Completion Issues: Some users have experienced incomplete responses, requiring multiple attempts.
  • "Content is not permitted" Errors: Frustratingly, users have reported these errors even for seemingly harmless prompts, particularly when editing Japanese anime-style images or family photographs.
  • Inconsistencies in Generated Images: Issues such as disjointed lighting and shadows have been observed, affecting the overall quality.
  • Watermark Removal: Worryingly, there have been reports of users being able to remove the SynthID watermarks within the AI Studio environment, raising ethical and copyright concerns.
  • Bias Concerns: Initial releases of the broader Gemini model family faced criticism regarding biases in image generation, including historically inaccurate depictions and alleged refusals to generate images of certain demographics. While Google has pledged to address these issues, it remains an ongoing challenge.

These limitations highlight that Gemini 2.0 Flash image generation is still in its experimental phase and may not always meet expectations. Developers should be aware of these potential inconsistencies when considering its integration into applications.

Future Prospects

Looking ahead, Google has indicated plans for the broader availability of Gemini 2.0 Flash and its various features. The expectation is that capabilities like native image output will eventually transition from experimental to general availability. Continuous enhancements are expected in areas such as image quality, text rendering accuracy, and the sophistication of conversational editing.

The future may also bring more advanced image manipulation features, including AI-powered retouching and more nuanced scene editing. Furthermore, Google is actively working on integrating the Gemini 2.0 model family into its diverse range of products and platforms, potentially including Search, Android Studio, Chrome DevTools, and Firebase. The development of the Multimodal Live API also holds significant promise for real-time applications that can process and respond to audio and video streams, opening up new interactive experiences.

The evolution of Gemini 2.0 Flash suggests a strategic priority for expanding its capabilities and accessibility within Google's broader AI ecosystem, making advanced AI-driven visual creation more readily available to developers and users alike.

Embrace the Creative Frontier

Gemini 2.0 Flash's experimental native image generation represents a compelling leap forward in AI, offering a unique blend of multimodal understanding and visual creation. Its ability to generate images from text, engage in conversational editing, and seamlessly integrate visuals with textual content opens up a vast landscape of creative and practical applications.

While still in its experimental phase with existing limitations, the potential of this technology is undeniable. As Google continues to refine and expand its capabilities, Gemini 2.0 Flash is poised to become a powerful tool for developers and creators across various industries. We encourage you to explore the experimental features in Google AI Studio and via the Gemini API, contribute your feedback, and be part of shaping the future of AI-driven visual creativity. The journey of bridging the gap between imagination and visual realization has just taken an exciting new turn.

Tuesday, March 11, 2025

How Reinforcement Learning Can Unlock True AI Agents

Artificial intelligence is on the cusp of a significant evolution, moving beyond helpful chatbots and insightful reasoners towards truly autonomous agents capable of tackling complex tasks with minimal human intervention. While current methods rely heavily on meticulously engineered pipelines and prompt tuning, a growing consensus suggests that reinforcement learning (RL) will be the key to unlocking the next level of AI agency. Will Brown, a Machine Learning Researcher at Morgan Stanley, recently shared his perspective on this transformative trend, highlighting the potential of RL to imbue AI systems with the ability to learn and improve through trial and error.

Today's advanced Large Language Models (LLMs) excel as chatbots, engaging in conversational interactions, and as reasoners, adept at question answering and interactive problem-solving. Models like OpenAI's o1 and o3, along with recent reasoning-focused Grok and Gemini releases, demonstrate remarkable capabilities in longer-form thinking. However, the journey towards true agents – systems that can independently take actions and manage longer, more intricate tasks – is still in its early stages.

Bridging the Gap: From Pipelines to Autonomous Agents

Currently, achieving agent-like behavior often involves chaining together multiple calls to underlying LLMs, supplemented by techniques like prompt engineering, tool calling, and human oversight. While these "pipelines" or "workflows" have yielded "pretty good" results, they typically possess a low degree of autonomy, demanding substantial engineering effort to define decision trees and refine prompts. Successful applications often feature tight feedback loops with user interaction, enabling relatively quick iterations.

The emergence of more autonomous agents, such as Devin, Operator, and OpenAI's Deep Research, hints at the future. These systems can engage in longer, more sustained tasks, sometimes involving numerous tool calls. The prevailing question is how to foster the development of more such autonomous entities. While awaiting inherently more capable base models is one perspective, Brown emphasizes the significance of the traditional reinforcement learning paradigm.

The Power of Trial and Error: Reinforcement Learning for Agents

At its core, reinforcement learning involves an agent interacting with an environment to achieve a specific goal, learning through repeated interactions and feedback. This contrasts with current practices where desired behaviors are often hardcoded through prompt engineering or learned from static datasets. RL offers a pathway to continuously improve an agent's performance based on numerical reward signals that guide it towards better strategies for problem-solving.

The recent excitement surrounding DeepSeek's release of the R1 model and its accompanying paper underscores the power of RL. This work provided the first detailed public explanation of how models like OpenAI's o1 achieve sophisticated reasoning abilities. The key, it turns out, was reinforcement learning: feeding the model questions, evaluating the correctness of its answers, and providing feedback to encourage successful approaches. Notably, the long chains of thought observed in such models emerged as a learned strategy, not through explicit programming. The GRPO algorithm, utilized by DeepSeek, exemplifies this concept: for a given prompt, multiple completions are sampled, scored, and the model is then trained to favor higher-scoring outputs.
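
Conceptually, the group-relative step can be sketched in a few lines: sample a group of completions for each prompt, score them, and normalize the scores within the group so that above-average completions are reinforced. This toy sketch omits the policy-gradient loss, probability ratios, clipping, and KL penalty of the full algorithm; it only illustrates the group-relative scoring idea.

```python
import numpy as np


def group_relative_advantages(rewards):
    """GRPO's core trick: advantage = (reward - group mean) / group std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)


def grpo_step(prompt, policy_sample, reward_fn, group_size=8):
    completions = [policy_sample(prompt) for _ in range(group_size)]  # sample a group
    rewards = [reward_fn(prompt, c) for c in completions]             # score each completion
    advantages = group_relative_advantages(rewards)
    # In real training, each completion's log-probabilities would be weighted by its
    # advantage inside the loss; here we simply return the scored pairs.
    return list(zip(completions, advantages))


pairs = grpo_step(
    "What is 2 + 2?",
    policy_sample=lambda p: np.random.choice(["4", "5", "four"]),     # stand-in sampler
    reward_fn=lambda p, c: 1.0 if c in ("4", "four") else 0.0,        # stand-in grader
)
```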

Rubric Engineering: Crafting Effective Reward Systems

While the application of RL to single-turn reasoner models has shown promise, the next frontier lies in extending these principles to more complex, multi-step agent systems. OpenAI's Deep Research, powered by end-to-end reinforcement learning involving potentially hundreds of tool calls, demonstrates the potential, albeit with limitations in out-of-distribution tasks.

A critical aspect of implementing RL for agents is the design of effective reward systems and environments. Brown's personal experience experimenting with a small language model and the GRPO algorithm highlighted the potential of "rubric engineering". Similar to prompt engineering, rubric engineering involves creatively designing reward functions that guide the model's learning process. These rubrics can go beyond simple right/wrong evaluations, awarding points for intermediate achievements like adhering to specific formats or demonstrating partial understanding. The simplicity and accessibility of Brown's initial single-file implementation sparked considerable interest, emphasizing the community's eagerness to explore these techniques.
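
A rubric-style reward function might look like the sketch below: instead of a single right/wrong signal, points are awarded for format compliance and partial correctness. The tags and point values are illustrative assumptions, not Brown's actual rubric.

```python
import re


def rubric_reward(completion: str, expected_answer: str) -> float:
    """Score a completion against a simple rubric; each satisfied check adds partial credit."""
    score = 0.0
    if "<think>" in completion and "</think>" in completion:
        score += 0.25                                   # followed the reasoning format
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if answer:
        score += 0.25                                   # produced a parseable answer block
        if answer.group(1).strip() == expected_answer:
            score += 0.5                                # and the answer is correct
    return score


print(rubric_reward("<think>2 + 2 = 4</think><answer>4</answer>", "4"))  # 1.0
print(rubric_reward("The answer is 4", "4"))                             # 0.0
```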

Open Source Innovation and the Future of AI Engineering

Recognizing the need for more robust tools, Brown has been developing an open-source framework for conducting RL within multi-step environments. This framework aims to leverage existing agent frameworks, allowing developers to define interaction protocols and reward structures without needing to delve into the intricacies of model weights or tokenization.

Looking ahead, Brown envisions a future where AI engineering in the RL era will build upon the skills and knowledge gained in recent years. The challenges of constructing effective environments and rubrics are akin to those of building robust evaluation metrics and crafting insightful prompts. The need for good monitoring tools and a thriving ecosystem of supporting platforms and services will remain crucial. While questions remain about the cost, scalability, and generalizability of RL-driven agents, the potential to unlock truly autonomous and innovative AI systems makes further exploration in this domain essential. The journey towards a future powered by intelligent agents learning through trial and error has just begun, promising a new era of possibilities for artificial intelligence.

Monday, March 10, 2025

Supreme Court Approves Electronic Notarization Rules

Manila, Philippines – In a significant stride towards modernizing the country's legal framework, the Supreme Court (SC) has approved the Rules on Electronic Notarization (E-Notarization Rules), a move poised to revolutionize how notarial services are accessed and delivered across the Philippines. This landmark decision, formalized under A.M. No. 24-10-14-SC, aims to enhance efficiency and broaden access to justice by leveraging technology for the notarization of electronic documents. The new rules supplement the existing 2004 Rules on Notarial Practice and represent a key achievement under the Supreme Court's Strategic Plan for Judicial Innovations (SPJI) 2022-2027.

SC E-Notarization

The E-Notarization Rules, approved En Banc on February 4, 2025, pave the way for the commissioning of Electronic Notaries Public (ENPs). Unlike traditional notaries whose authority is limited to specific geographic areas, ENPs will be authorized to perform notarial acts for individuals located anywhere within the Philippines and, in certain circumstances, even abroad. This expanded jurisdiction directly addresses a key limitation of the 2004 Notarial Rules, promising greater convenience and accessibility, particularly for citizens in remote or underserved regions.

Under the updated framework, three distinct forms of electronic notarization will be permissible:

  • In-Person Electronic Notarization (IEN): This method requires both the principals and any witnesses to be physically present with the ENP, utilizing an accredited Electronic Notarization Facility (ENF) within the same location.
  • Remote Electronic Notarization (REN): This allows principals and witnesses to connect with the ENP virtually through secure videoconferencing via an accredited ENF.
  • A hybrid approach combining elements of both IEN and REN.

The E-Notarization Rules place a strong emphasis on security and data privacy. To ensure the integrity of the process, the rules mandate the implementation of Multi-Factor Authentication (MFA), incorporating technologies such as facial recognition, biometrics, and one-time passwords, in line with regulations set by the Bangko Sentral ng Pilipinas. Furthermore, the electronic notarial book, which serves as the chronological record of all electronic notarial acts, will be safeguarded against tampering. All data stored within the Electronic Notarization Facilities (ENFs) will also be protected in accordance with the provisions of the Data Privacy Act.

It is important to note that the E-Notarization Rules will exclusively apply to electronic documents in Portable Document Format (PDF) or Portable Document Format Archival (PDF/A). Traditional paper documents bearing handwritten signatures, notarial wills, and depositions will continue to be governed by the 2004 Notarial Rules.

The E-Notarization Rules are set to take effect 15 days following their publication on March 9, 2025. This effectivity will trigger a transitional period that includes the establishment of the Office of the Electronic Notary Administrator (ENA). The ENA will be the central body responsible for the commissioning and supervision of ENPs, as well as the accreditation of ENFs. Additionally, a nationwide repository for electronically notarized documents, known as the SC Central Notarial Database, will be established during this transitional phase.

The Supreme Court's move towards electronic notarization builds upon its earlier experience with remote notarization during the COVID-19 pandemic. In 2020, the SC introduced the Interim Rules on Remote Notarization of Paper Documents as a temporary measure to address the urgent need for legal services amidst community quarantines. Recognizing the long-term benefits of digital solutions, the SC formed a Technical Working Group (TWG) to develop a permanent framework for electronic notarization. This TWG, chaired by Associate Justice Alfredo Benjamin S. Caguioa with Associate Justice Ramon Paul L. Hernando as Vice-Chairperson, conducted extensive studies, benchmarked best practices from other jurisdictions, and engaged in consultations with various stakeholders, including government agencies and technology experts. Their efforts culminated in the finalization of the E-Notarization Rules, marking a significant milestone in the Philippine judiciary's digital transformation journey.


The full text of the E-Notarization Rules and the Guidelines on the Accreditation of Electronic Notarization Facility Providers can be accessed on the Supreme Court's website. This progressive step by the Supreme Court is expected to enhance the efficiency of legal transactions, reduce costs associated with traditional notarization, and ultimately improve access to justice for all Filipinos.

This article is based on information from the Supreme Court of the Philippines, particularly a press release from the Office of Associate Justice Alfredo Benjamin S. Caguioa and the SC Office of the Spokesperson.

Tuesday, March 4, 2025

A Reflective Experience as a Resource Speaker: Inspiring Future Tech Professionals with AI Insights

I recently had the privilege of stepping into the vibrant halls of La Consolacion University Philippines - Main Campus in Malolos, Bulacan, as a resource speaker for Grade 11 and 12 students specializing in Technical-Vocational-Livelihood (TVL) Computer Programming and Computer System Servicing. The event, titled Equipping Future Tech Professionals: A Work Immersion Seminar-Workshop on AI, Cybersecurity, and Robotics, was held at the St. Augustine Building - AVR. While the broader seminar covered various topics, I delivered a focused talk on Artificial Intelligence, specifically Generative AI (GenAI), aiming to introduce these young minds to one of the most transformative technologies of our time. Reflecting on the experience, I am filled with gratitude, inspiration, and a renewed sense of purpose.

Jose Nies


Why Generative AI?

Generative AI is not just a buzzword—it’s a paradigm shift. As industries worldwide embrace AI-driven solutions, it’s crucial for the next generation of tech professionals to understand its potential and implications. I chose this topic because it’s timely, relevant, and aligns with the rapid adoption and advancements of AI across industries. My preparation was rooted in my own learning journey, particularly through the Generative AI Learning Path on Google Cloud Skills Boost. This foundational knowledge, combined with my passion for emerging technologies, fueled my desire to share insights with these students.

Key Insights Shared

During my talk, I provided a foundational overview about Generative AI. Here are the key points I delivered:

  1. Market Insights: I shared data on the exponential growth of the Generative AI market, its enterprise adoption, and its industry-specific applications in fields such as healthcare, finance, marketing, and education.
  2. AI Fundamentals: I broke down the concepts of Artificial Intelligence, Machine Learning, Deep Learning, and Generative AI, ensuring students understood the distinctions and connections.
  3. Core Components: We explored the building blocks of Generative AI, including Machine Learning Models, Neural Networks, and the art of Prompt Engineering.
  4. Key Technologies: I introduced them to Large Language Models (LLMs), Transformer Architectures, and Generative Adversarial Networks (GANs).
  5. Ethical Challenges: We discussed the importance of addressing bias, hallucinations, privacy concerns, and copyright issues in AI development and usage.
  6. The Future of AI: I emphasized that while challenges exist, the opportunities for innovation and efficiency far outweigh the risks.
TVL Students

Student Engagement: A Heartening Experience

The students’ enthusiasm was noticeable. Over the course of two hours, they listened attentively, asked thoughtful questions, and engaged deeply with the topic. Their curiosity was evident in the Q&A session, where they posed questions ranging from the technical aspects of AI to its societal impact. One student asked, “How can we ensure AI is used ethically in the future?” Another wondered, “What skills should we focus on to stay relevant in the age of AI?” These questions reflected their genuine interest and desire to understand the technology shaping their future.

Memorable Moments

The Q&A session was undoubtedly the highlight of the seminar. It was heartening to see students actively participating and thinking critically about the implications of AI. I encouraged them not to fear AI but to embrace it as a tool to augment their capabilities. I shared that upskilling is key—learning how to leverage AI effectively will set them apart in the tech landscape.

One moment that stood out was when a student asked, “Will AI take away our jobs?” I responded by emphasizing that while AI will surely automate certain tasks, it will also create new opportunities. The goal is to adapt, learn, and grow alongside the technology.

The Impact I Hope to Leave

My greatest hope is that I sparked curiosity and inspired these students to explore AI further. I encouraged them to take advantage of online learning platforms, experiment with AI tools, and stay informed about emerging trends. AI is reshaping the way we live and work, and understanding it is no longer optional—it’s essential.

Personal Takeaways

This experience was both humbling and motivating. Being invited to share my knowledge with such an eager audience was an honor. It reinforced my belief in the importance of continuous learning and the power of education to transform lives. It also inspired me to deepen my own understanding of AI and cybersecurity so I can contribute even more meaningfully in future speaking engagements.

A Message to Educators, Students, and Future Tech Professionals

To educators: Embrace the responsibility of preparing students for a future dominated by AI. Integrate emerging technologies into your curricula and foster a culture of curiosity and innovation.

To students: The future belongs to those who are willing to learn and adapt. Generative AI is not just a tool—it’s a gateway to endless possibilities. Equip yourselves with the skills and knowledge to harness its potential.

To future cybersecurity professionals: As AI evolves, so do the challenges of securing it. Understanding AI’s intricacies will be critical in safeguarding our digital future.

Generative AI is more than a technological advancement—it’s a catalyst for change and a technology that will reshape the way we live and work. By embracing it, we can shape a future that is innovative, inclusive, and transformative. My experience at La Consolacion University Philippines was a reminder of the power of knowledge-sharing and the boundless potential of the next generation. Together, we can build a future where technology serves humanity in the most meaningful ways.