绘小诗 (PoeticVisions)

Author: Guojie Yin

Put on the AI-powered glasses: a poetic journey across time and space is ready to start!

1 APPLICATION SCENARIO AND END USERS

Let me begin by exploring the usage scenarios, the targeted end users, and their requirements for PoeticVisions.

1.1 Create Cultural and Creative Products
Collaborate with historical landmarks (such as the Red Cliff, Yueyang Tower, and the Forbidden City) or the homes of famous figures (such as the Du Fu Thatched Cottage) to launch cultural and creative commemorative products (such as wall decorations, bookmarks, postcards, and calendars).

1.2 Produce Educational Products

1.3 End Users and Their Requirements

2 EXPLORATIONS WITH LLM

Now, we describe how we learned the task knowledge from the LLM, evaluated the effectiveness of our experimental prompts, and summarized the task challenges.

2.1 Discourse With LLM

By interacting with large models, we have summarized two key points to keep in mind when creating a painting:


2.2 Prompt Experimentation

Directly translating ancient poetry with a large language model may not be enough to produce a good artistic interpretation: the translation can be stiff, and the model may simply copy the parts it cannot understand. Interpreting the poem several times with the model and then summarizing the results gives a more detailed scene description than direct translation does.

Once a style (e.g., ink wash painting) is selected for the ancient poetry, the large language model can enhance the expressive power of the painting created by the text-to-image model by providing dimensions such as artistic conception, modeling, line, and color based on the content of the poetry.

This approach results in a more charming and vivid painting for the poetry.

2.3 Challenges

The main challenge lies in the connection between each worker. In the chain design, the feedback from the previous worker needs to be used as the input for the next worker. To ensure that the feedback format matches the prompt of the next worker, we need to add a series of restrictions to the prompt of the previous worker to constrain the format of the feedback. At the same time, we also need to take corresponding measures to minimize the random fluctuations in the answers from the previous worker.
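
One way to handle both concerns is to check each worker's answer against the format the next worker expects, and to ask again when the answer drifts out of format. The following is a minimal sketch under assumptions, not the project's actual implementation: `call_llm` is a hypothetical stand-in for whatever model call a worker makes, and the single-line, comma-separated phrase format is chosen only for illustration.

```python
import re
from typing import Callable, List

# Hypothetical format: one line of comma-separated short phrases.
PHRASE_LIST = re.compile(r"^[^,\n]+(,[^,\n]+)*$")


def parse_phrases(text: str) -> List[str]:
    """Return the phrases in `text`, or raise ValueError if the format is wrong."""
    text = text.strip()
    if not PHRASE_LIST.match(text):
        raise ValueError(f"unexpected worker output: {text!r}")
    return [p.strip() for p in text.split(",") if p.strip()]


def run_worker(call_llm: Callable[[str], str], prompt: str, max_tries: int = 3) -> List[str]:
    """Call a worker, retrying when its answer does not match the expected format."""
    last_error = None
    for _ in range(max_tries):
        try:
            return parse_phrases(call_llm(prompt))
        except ValueError as err:
            last_error = err  # the answer fluctuated out of format; ask again
    raise RuntimeError(f"worker failed after {max_tries} tries") from last_error
```

Format restrictions in the prompt reduce how often the retry path is taken; the check only guards the hand-off between workers.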

3 AI-CHAIN DESIGN

Figure 1: AI Chain of PoeticVisions

Next, I will introduce my AI chain design and its major features.
PoeticVisions can be broken down into two functional modules: the Poet Analysis Module and the Painting Generation Module.

3.1 Poet Analysis Module

Given a poem as input, the poet analysis module parses it and provides the style and imagery information needed to generate a painting.

It can be further divided into two sub-modules: Imagery Extraction Module and Style Construction Module.

The poetry is input into the imagery extraction module and the style construction module, which return the imagery and style information, respectively.

3.2 Imagery Extraction Module

The imagery extraction module can be broken down into four Software 3.0 workers: Artistic Conception Generation Worker, Artistic Conception Reconstruction Worker, Artistic Conception Decomposition Worker and Imagery Filtering Worker.

Once the poem is input, three artistic conception generation workers (i.e., a prompt committee) each independently generate an artistic conception and provide it to the artistic conception reconstruction worker.

The artistic conception reconstruction worker (a composer) analyzes and summarizes the received artistic conceptions, generating an improved new artistic conception and providing it to the artistic conception decomposition worker.

The artistic conception decomposition worker (an input rewriter) decomposes the artistic conception into a series of concise phrases and provides them to the imagery filtering worker.

The imagery filtering worker (another input rewriter) selects the phrases that are suitable for painting and returns them as the imagery information.
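
The four workers above can be sketched as a simple function chain. This is only an illustration under assumptions: `llm` is a hypothetical callable that takes a prompt and returns the model's text, and the prompt wording and the one-phrase-per-line answer format are invented for the example rather than taken from the project.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: takes a prompt, returns the model's text


def extract_imagery(poem: str, llm: LLM, committee_size: int = 3) -> List[str]:
    """Chain the four imagery workers; every prompt here is illustrative only."""
    # 1. Prompt committee: independent interpretations of the same poem.
    conceptions = [
        llm(f"Describe the artistic conception of this poem:\n{poem}")
        for _ in range(committee_size)
    ]
    # 2. Composer: merge the interpretations into one improved conception.
    joined = "\n---\n".join(conceptions)
    conception = llm(f"Summarize these interpretations into one scene:\n{joined}")
    # 3. Input rewriter: decompose the conception into concise phrases.
    phrases = llm(f"Rewrite as short phrases, one per line:\n{conception}")
    # 4. Input rewriter: keep only the phrases that can be painted.
    kept = llm(f"Keep only the phrases suitable for painting, one per line:\n{phrases}")
    return [line.strip() for line in kept.splitlines() if line.strip()]
```

Each step consumes only the previous step's text, which is why the answer format of every worker has to match what the next prompt expects.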

3.3 Style Construction Module

The style construction module can be broken down into three Software 3.0 workers: Painting Style Selection Worker, Atmosphere Analysis Worker and Visual Enhancement Worker.

Once the poem is input, the painting style selection worker (a selection commander) analyzes its content, selects the most suitable painting style, and provides it to the atmosphere analysis worker.

The atmosphere analysis worker (an information inquirer) produces three adjectives that describe the characteristics of the painting style, and then provides these adjectives together with the received painting style to the visual enhancement worker.

The visual enhancement worker (a content creator) combines the painting style with the poem, proposes two dimensions that can enhance the visual effect, and provides two adjectives for each dimension. Finally, it returns the painting style, the received style characteristics, and the two dimensions together as the style information.
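
The data flow of the style construction module can be sketched as follows. This is a minimal sketch under assumptions: `llm` is a hypothetical prompt-in, text-out callable, the `Style` container and `split_items` helper are names introduced here for illustration, and the prompts and answer formats (comma-separated adjectives, `dimension: adjective, adjective` lines) are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # hypothetical model call: prompt in, text out


@dataclass
class Style:
    painting_style: str               # e.g. "ink wash painting"
    characteristics: List[str]        # three adjectives for the style
    dimensions: Dict[str, List[str]]  # dimension name -> two adjectives


def split_items(text: str) -> List[str]:
    """Split a comma-separated answer into trimmed, non-empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def build_style(poem: str, llm: LLM) -> Style:
    """Chain the three style workers; prompts and answer formats are illustrative."""
    # Selection commander: pick one painting style for the poem.
    painting_style = llm(f"Choose one painting style for this poem:\n{poem}").strip()
    # Information inquirer: three adjectives describing that style.
    characteristics = split_items(
        llm(f"Give three adjectives, comma-separated, describing {painting_style}.")
    )[:3]
    # Content creator: two dimensions, each with two adjectives,
    # assumed to come back as lines like "color: muted, cool".
    answer = llm(
        f"For the poem below, painted as {painting_style}, give two visual "
        f"dimensions, one per line, as 'dimension: adjective, adjective'.\n{poem}"
    )
    dimensions: Dict[str, List[str]] = {}
    for line in answer.splitlines():
        name, sep, adjectives = line.partition(":")
        if sep:  # skip lines that do not follow the assumed format
            dimensions[name.strip()] = split_items(adjectives)[:2]
    return Style(painting_style, characteristics, dimensions)
```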

3.4 Painting Generation Module

This module generates a painting based on the received imagery and style information.
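
As a rough illustration of this step, the imagery phrases and style information could be flattened into a single prompt for a text-to-image model. The function below is a sketch under assumptions: the name `build_painting_prompt`, its parameters, and the comma-separated prompt layout are all hypothetical, and the call to the image model itself is left out because the source does not say which model is used.

```python
from typing import Dict, List


def build_painting_prompt(
    imagery: List[str],
    painting_style: str,
    characteristics: List[str],
    dimensions: Dict[str, List[str]],
) -> str:
    """Flatten imagery and style into one comma-separated text-to-image prompt."""
    parts = [painting_style]
    parts.extend(characteristics)
    parts.extend(imagery)
    for name, adjectives in dimensions.items():
        # e.g. "muted and cool color"
        parts.append(f"{' and '.join(adjectives)} {name}")
    return ", ".join(parts)


prompt = build_painting_prompt(
    imagery=["bright moon", "quiet river"],
    painting_style="ink wash painting",
    characteristics=["fluid", "misty"],
    dimensions={"color": ["muted", "cool"]},
)
print(prompt)
# ink wash painting, fluid, misty, bright moon, quiet river, muted and cool color
```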

3.5 Supplementary Content

Two major constraints are necessary in the prompts. I implemented them as an output formatter and a context control.

4 RAPID PROTOTYPING PROCESS

Finally, let me describe how my AI chain design and implementation have evolved and improved through the rapid prototyping process, and discuss some potential future enhancements.

4.1 Original Version

There were three issues with the prompt design in earlier versions, each with a corresponding solution.

4.2 Problem Resolution

In the current version, the program can properly handle poems that directly express artistic conceptions.

4.3 Future Improvement

For poetry that uses more abstract rhetorical techniques (such as exaggeration or metaphor) or that contains hidden imagery, PoeticVisions either cannot accurately find the corresponding imagery or misses some of the imagery.

In the subsequent versions, we will focus on optimizing these two aspects.

5 APPLICATION USAGE

usage

Choose a poem with an artistic conception that you like, and PoeticVisions will automatically generate a painting for you based on the content of the poem.

6 SAPPER STRUCTURE AND USER EXPERIENCE

6.1 Poet Analysis Module

usage

6.2 Imagery Extraction Module

usage

6.3 Style Construction Module

usage

6.4 Painting Generation Module

usage

6.5 User Experience

Sapper is a revolutionary tool that helps programmers overcome entry barriers and rapidly bring their ideas to life. "PoeticVisions" serves as a perfect example of how Sapper can be used to quickly realize creative ideas. Even though I lacked any formal training in drawing, Sapper's interaction with me enabled it to identify my needs and offer essential insights on program design from a professional artist's perspective. The outcome was impressive, to say the least. Sapper has introduced a new era of minimalist programming, enabling us to code on the go with ease.