Software Engineering for AI Chain

Beyond fancy prompts: AI-powered software engineering infrastructure for AI

Software Engineering for AI Chain: Vision and Goals

Software engineering emerged as a discipline in response to the growing complexity of software systems and the need to manage them more effectively. Before that, software development was largely an ad-hoc and informal process, often carried out by individual programmers. This is precisely the current state of prompt programming. AI chain engineering (software engineering for AI chain) encompasses, but goes far beyond, writing flashy prompts for foundation models, just as software engineering is much more than coding.

Vision: Revolutionize Software Landscape through Generative AI

We are currently transforming the capabilities of foundation models into AI assistants, which presents two practical opportunities: (1) allowing everyone (not just AI or software engineers) to create personalized intelligent agents, and (2) enabling people to share and hire intelligent agent services. Together, these will allow a broader audience to participate in, and benefit from, the AI wave.

To seize these opportunities, we are committed to developing a systematic AI chain methodology, which we call promptmanship; a corresponding development and deployment environment (the AI chain IDE); and AI services in our AI chain marketplace. Our vision sets us apart from related concepts and techniques, such as ChatGPT, AutoGPT, LangChain, no-code AI, and prompt engineering, in terms of human-AI collaborative intelligence, low requirements for computing and programming skills, and a systematic framework for AI4SE4AI.

Software engineering has developed many effective methodologies, engineering practices, and tools for Software 1.0 and Software 2.0, and many successful ideas and experiences are transferable to AI chain engineering. However, because of the emergent capabilities of foundation models and the significant shift they bring to human-AI interaction (see AI 2.0 and Software 3.0 and why it is significant), we need to extend "old-fashioned" SE methodologies and best practices to the new programming paradigm, which is driven by foundation models, human-oriented, and based on natural language.

On the one hand, large language models encode rich world knowledge and have strong conversational abilities. We can leverage these models to help AI chain engineers acquire task-specific knowledge, thinking methods, and workflows, and thereby understand and solve problems more accurately. At the same time, the outputs of large language models are uncertain, and the models can make errors and hallucinate. We need to develop mechanical sympathy for these models and adopt effective engineering methods and prompt designs that mitigate the impact of such problems.

On the other hand, large language models have profoundly changed who can develop AI services and what kinds of AI services can be made available. This requires a shift from code-centric development methods and tools to human-oriented ones that let individuals, including those with limited technical backgrounds, focus on problem-solving: articulating and clarifying task requirements more effectively, and designing, building, and evaluating AI chains more intuitively, rather than performing data engineering and writing code.

Goal #1: Promptmanship

Generalize and organize prompt engineering best practices from a software engineering perspective, place prompt engineering in a broader software engineering context, and fill in important software engineering methods that have been overlooked, thereby establishing a systematic AI chain methodology.

Since the groundbreaking GPT-3 paper, a large body of work has focused on prompt engineering to improve task performance or accomplish more challenging tasks. Performance on many tasks where GPT-3 did poorly in that paper can be improved significantly with appropriate prompt formatting. Several important prompt design concepts, such as Chain of Thought, self-ask, and self-consistency, have been proposed and proven effective; they serve as reusable prompt constructs for recurring prompting problems.
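To illustrate what a reusable prompt construct looks like, the self-consistency idea can be sketched as follows: sample several Chain-of-Thought completions for the same prompt, extract each final answer, and return the majority answer. This is a minimal sketch under stated assumptions: the model call is replaced by a hypothetical `sample` callable (with a real model it would need nonzero sampling temperature so the reasoning paths differ), and the convention that each completion ends with an `Answer:` line is an assumption of this sketch, not part of any specific API.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Optional

# Hypothetical sampler type: takes a prompt, returns one sampled completion.
Sampler = Callable[[str], str]

# Chain-of-Thought style trigger appended to the question (illustrative only).
COT_PREFIX = "Let's think step by step."


def extract_answer(completion: str) -> Optional[str]:
    """Return the value on the last 'Answer:' line (an assumed output convention)."""
    for line in reversed(completion.strip().splitlines()):
        line = line.strip()
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None  # this sample contained no parsable answer


def self_consistency(question: str, sample: Sampler, n_samples: int = 5) -> Optional[str]:
    """Sample several reasoning paths and return the most common final answer."""
    prompt = f"{question}\n{COT_PREFIX}"
    answers = [extract_answer(sample(prompt)) for _ in range(n_samples)]
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Usage: a fake sampler that cycles through canned completions stands in for a real model.
canned = cycle([
    "5 + 3 = 8, and 8 - 2 = 6.\nAnswer: 6",
    "5 + 3 = 9, and 9 - 2 = 7.\nAnswer: 7",  # a faulty reasoning path
    "Start with 5, add 3 to get 8, remove 2.\nAnswer: 6",
])


def fake_sample(prompt: str) -> str:
    return next(canned)


result = self_consistency("I had 5 apples, bought 3, and ate 2. How many are left?", fake_sample)
print(result)  # 6
```

With five samples the fake model yields the answers 6, 7, 6, 6, 7, so the faulty path is outvoted. Voting over final answers rather than over full reasoning texts is what lets different but equally valid reasoning paths reinforce each other.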

However, as task complexity increases, optimizing a single prompt has diminishing impact, because models struggle to handle overly complex, multi-step tasks in a single generation. People naturally apply task decomposition, a basic principle of computational thinking, to break a complex task into several relatively simple steps that large language models can solve effectively. Many works have employed this principle, but most of them rely on task-specific decomposition structures rather than task-agnostic decomposition strategies. Currently, most prompt engineering research focuses on performance improvement and pays little attention to software engineering concerns such as the modularity, composability, debuggability, and reusability of AI functionalities.
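To make these software engineering concerns concrete, here is a minimal sketch of a decomposed task expressed as modular AI chain steps. It assumes a hypothetical `Model` callable in place of a real foundation model API, and the `Step` and `Chain` classes are hypothetical illustrations, not the Prompt Sapper implementation or any framework's API. Each step is a self-contained prompt template that can be reused across chains; the chain runs the steps in order, passes outputs through a shared context, and records a trace of intermediate outputs for debugging.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model signature: prompt in, generated text out.
Model = Callable[[str], str]


@dataclass
class Step:
    """A reusable AI chain worker: a named prompt template and the key its output is stored under."""
    name: str
    template: str     # str.format placeholders, e.g. "{topic}"
    output_key: str   # key for this step's result in the shared context

    def run(self, model: Model, context: Dict[str, str]) -> str:
        # Only the template is formatted; extra context keys are ignored by str.format.
        prompt = self.template.format(**context)
        return model(prompt)


@dataclass
class Chain:
    """A sequential AI chain that records every intermediate output for debugging."""
    steps: List[Step]
    trace: List[Dict[str, str]] = field(default_factory=list)

    def run(self, model: Model, **inputs: str) -> Dict[str, str]:
        context = dict(inputs)
        self.trace.clear()
        for step in self.steps:
            output = step.run(model, context)
            context[step.output_key] = output
            self.trace.append({"step": step.name, "output": output})
        return context


# Usage: a fake model that wraps its prompt in angle brackets stands in for a real model.
def fake_model(prompt: str) -> str:
    return f"<{prompt}>"


outline = Step("outline", "Outline a short post about {topic}.", "outline")
draft = Step("draft", "Write a post following this outline: {outline}", "draft")
chain = Chain([outline, draft])

result = chain.run(fake_model, topic="AI chains")
print(result["draft"])
```

Because steps communicate only through named context keys, the same `outline` step could be reused in a different chain, and inspecting `chain.trace` shows which step produced a bad intermediate result, which is much harder to diagnose inside one monolithic prompt.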

Goal #2: AI Chain IDE

Develop an AI chain integrated development environment (IDE) that supports the full life cycle of AI chain development. Because the AI chain methodology is built into the IDE, people naturally apply AI chain best practices throughout development, enabling those with limited AI and programming skills to develop high-quality AI services based on foundation models.

Chatbots such as ChatGPT and the Stable Diffusion Slack bot are the simplest and most intuitive tools for interacting with foundation models. Through conversations with large models, people can acquire a lot of knowledge (though they should watch out for hallucinations) and explore the models' capabilities and the prompts that work well with them. Because chatbots themselves cannot automate tasks, people need to write programs that reuse the effective prompts discovered through these conversations and call the foundation model APIs. Although frameworks such as LangChain, Dust, and Primer encapsulate the raw APIs and support AI chain programming, they still require basic programming skills, which undercuts the democratizing effect of foundation models.

Researchers have attempted to lower the barrier to prototyping AI functionality through visual programming, as in PromptChainer. Although visual programming achieves this goal, it does not address the challenges of task decomposition, prompt design, evaluation, and management when creating AI functionality based on foundation models. In response to these challenges, a number of scattered tools have been developed, such as PromptSource, Promptable, Dyno, GPTTools, Interactive Composition Explorer, Metaprompt, Visual Prompt Builder, and AI Test Kitchen, but these tools are standalone and do not support a coherent AI chain methodology.

Goal #3: AI services ecosystem

Develop the underlying technology to support the emerging AI services marketplace, while researching responsible AI chain engineering techniques that enhance the transparency, accountability, and security of the AI services ecosystem and supply chain.

We anticipate that the combination of foundation models and our AI chain IDE will usher in a wave of democratized AI services development, especially in service personalization, vertical markets, and intelligent middleware. This will lead to a multitude of diverse and innovative AI services that will transform how we learn, work, and live, much as mobile applications did over a decade ago.

AI 2.0, which emerged from foundation models, empowers people who are not AI or software experts to conceive AI service use cases directly and to personalize AI prototypes. Index Ventures believes the adoption of AI will change the software value chain. At the application layer, they believe that over time business models will shift to capture more of the customer-specific value unlocked by AI, and they envision a world where customers pay based on the level of customization or personalization within the product. Prompt Sapper offers great customizability. For example, Wen Xiao Jie, Qing Xiao Xie, and Chun Xiao Xie are all writing assistants, but they target different writing scenarios and offer different user experiences and services.

In vertical markets, we expect the emergence of domain-specific AI services that connect foundation models with end users and offer more convenient and affordable service customization. For example, Alivia is a ChatGPT-like product designed specifically for marketing. It handles the entire loop of marketing operations, from content production, management, review, and publishing to data analysis and operational optimization. In fact, Alivia integrates many AI services similar to our AI chain showcases. OpenAI customer stories report several GPT-based applications across education, finance, gaming, and customer management. GPT-3 Demo collects more than 800 apps and use cases (and growing) in more than 210 categories, based on GPT and other generative AI models. However, many of the listed demos stop at the ideation stage. Prompt Sapper can turn many of these great ideas into reality.

As these AI services proliferate, we look forward to a thriving AI services marketplace, similar to today’s app marketplaces, where people can share, assemble, and trade AI services based on foundation models, or even hire AI chain development talent. Over time, this marketplace will evolve into an ecosystem of AI services around foundation models, much as Salesforce differentiated itself through a great product experience built around a database. Of course, this will bring new software engineering challenges, especially in responsible AI and AI services supply chain management. Addressing these challenges is on our active research agenda as we extend promptmanship and the Sapper IDE.