How to Send Your Images Through AI

This chatbot analyzes user-submitted images and answers questions about them. The example here focuses on bicycles, but you can choose any topic. The bot guides users through uploading an image and generates its reply from the image and the accompanying text input.

The core functionality of the chatbot is analyzing images, bicycle images in this example, with one of OpenAI’s vision-capable models (gpt-4o-mini in the request shown later) to produce meaningful insights. This step-by-step breakdown shows how the bot processes and responds to images using the AI capabilities integrated in Pingstreams.

1. Set Up Basic Intent Structure

Intents or Blocks are the foundational units that define the bot’s responses and actions. For this bot:

A welcome intent/block triggers an initial greeting and a prompt to upload an image, with the message “Insert an image, please”
A default fallback intent/block normally handles unrecognized inputs with prompts like “Can you rephrase your question?”. In this bot, it instead prompts the user to write what they want to know about the image they’ve uploaded

Basic Intent Structure

2. Image Analysis Using Vision API and GPT Integration

Vision API Intent

This is the primary feature where the bot analyzes the uploaded images and responds with insights.

Steps in Image Analysis:

Image Request: Users are instructed to upload an image. The bot pauses with a message like “Analyzing…”.

Web Request Setup: Select the POST method and enter the API endpoint URL:

https://api.openai.com/v1/chat/completions

Web Request Setup

Prompt Design: Select the “Body” option and insert the JSON-structured prompt:

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "{{last_user_text}}"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "{{last_user_message}}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}
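
For readers who want to try the same request outside the Design Studio, the body above can also be assembled in code. The following is a minimal Python sketch, not part of Pingstreams: the function name build_vision_body and its parameters are illustrative assumptions, and the two arguments stand in for whatever the {{last_user_text}} and {{last_user_message}} placeholders resolve to at runtime.

```python
import json


def build_vision_body(user_text, image_url, model="gpt-4o-mini", max_tokens=300):
    """Return the Chat Completions request body shown above as a dict."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # The user's question about the image.
                    {"type": "text", "text": user_text},
                    # The publicly reachable URL of the uploaded image.
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        # Upper bound on the length of the model's reply.
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    body = build_vision_body(
        "What kind of bike is this?", "https://example.com/bike.jpg"
    )
    print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with json.dumps means quotes or line breaks in the user's text are escaped correctly.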

Authorization: First add your OpenAI key in the Globals section of your bot (you’ll find this as the third icon on the far right of the Pingstreams Design Studio). Then insert that attribute in the Authorization field of the web request. OpenAI expects the header value in the form Bearer followed by the key.

Result Processing: The API’s response is stored in a variable (result), which the bot formats and relays back to the user as a detailed bike description.
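
A Chat Completions response places the model's reply under choices[0].message.content. The Python sketch below shows how that text could be pulled out of the stored result. The helper name extract_description and the sample payload are illustrative assumptions, and the sample is trimmed to the fields the helper reads.

```python
def extract_description(result):
    """Return the assistant's text from a Chat Completions response dict.

    Returns an empty string if the expected fields are missing.
    """
    choices = result.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    content = message.get("content")
    return content.strip() if isinstance(content, str) else ""


# Trimmed, made-up sample of the response shape (not real API output).
sample_result = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": " A red road bike with drop handlebars. ",
            }
        }
    ]
}

print(extract_description(sample_result))
```

Returning an empty string instead of raising lets the calling step decide how to handle a response that has no usable text.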

Result Processing Configuration

3. Sending the Image to OpenAI’s Vision API

API Call Configuration

The bot constructs an API request to OpenAI’s Chat Completions endpoint using a vision-capable model (gpt-4o-mini in this example). The request sends both a text prompt and the image URL to the model for processing.

API Request Structure:

The bot first extracts the URL of the uploaded image and then incorporates this URL into the API request
The prompt given to the model is crucial. In this case, it’s set to “Describe this bike in detail, including color, frame style, and any visible features.” A well-designed prompt helps the model return a comprehensive description
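
One way to combine a fixed prompt with free-form user questions is to fall back to the fixed prompt when the user sends an image without any text. The Python sketch below illustrates that idea only and does not describe how the bot in this tutorial is wired. The names DEFAULT_PROMPT and choose_prompt are hypothetical.

```python
# The fixed prompt quoted above, used when the user supplies no question.
DEFAULT_PROMPT = (
    "Describe this bike in detail, including color, frame style, "
    "and any visible features."
)


def choose_prompt(user_text):
    """Use the user's question if there is one, otherwise the default prompt."""
    text = (user_text or "").strip()
    return text if text else DEFAULT_PROMPT
```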

4. Handling the Response from the Vision API

Data Extraction: The API’s response includes detailed descriptions of the bicycle, such as frame type, color, and any notable features.

Storing the Result: The bot stores the response in a variable (e.g., result) to ensure the data is readily accessible for further steps.

Error Intent: If an error occurs during the API call, the bot captures it in the error attribute and displays it to the user.

User Feedback: The bot formats the response and sends it back to the user as a detailed description, ensuring the information is clear and user-friendly.

Optional Follow-ups: After displaying the results, the bot asks if the user wants to upload another image.

Response Handling

5. Integration with Pingstreams and OpenAI API

The image analysis functionality depends on the integration between Pingstreams and OpenAI’s API. This section covers the key aspects of setting up and using the API securely.

A. Obtaining and Securing API Keys

API Key for OpenAI: The bot requires an API key to access OpenAI’s API. You create this key in your OpenAI account, and it should be stored securely in Pingstreams (for example, in the Globals section described earlier) or in a secure environment variable.

Authorization: When making requests to OpenAI’s API, the bot includes the API key in the request headers to authenticate each call.
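
To make the header format concrete, here is a minimal Python sketch that builds an authenticated request without sending it. It uses only the standard library. The function name build_request and the constant OPENAI_CHAT_URL are illustrative assumptions. The endpoint URL is the one given earlier in this guide, and the Bearer scheme is what OpenAI's API expects.

```python
import json
import urllib.request

OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"


def build_request(api_key, body):
    """Build (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        OPENAI_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The key goes in the Authorization header, never in the body.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(build_request(key, body), timeout=30) as resp:
#     result = json.load(resp)
```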

B. Configuring the API Requests in Pingstreams

Pingstreams allows for custom API integrations by configuring webhook requests in the bot’s setup. In this case:

Webhook Settings: The bot’s webhook is configured to route each user image upload request to OpenAI’s Vision API endpoint
API Endpoint: The API endpoint for OpenAI’s Vision model is specified in Pingstreams, allowing each request to be seamlessly routed
Testing the API Call: Once configured, initial tests are conducted to ensure the bot successfully sends requests and receives the desired image descriptions in response

C. Handling the API Response

The bot’s code processes the JSON response from the OpenAI API, extracting and formatting the description data to make it user-friendly.

This approach ensures that users receive a clear and relevant answer based on the image they submitted.

D. Structuring the JSON Configuration for API Requests

Each API request is structured in JSON format, specifying the model, prompt, image URL, and optional settings like max_tokens to control the length of the response.

Example JSON for API Request:

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in detail"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "{{image_url}}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}

This structure allows the chatbot to send specific prompts and receive detailed responses tailored to the user’s query.

E. Error Handling and Fallbacks

API Error Handling: If the API fails or returns an error, the bot is configured to catch these errors and respond with an informative message to the user.

Fallback Responses: If the response does not meet the expected criteria (e.g., if it’s too vague or irrelevant), the bot can prompt the user to try rephrasing their request or uploading a clearer image.
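
The two cases above can be sketched as a single function that turns an API response into the message shown to the user. This is a Python illustration under stated assumptions: the function name reply_for, the two message constants, and the length-based "too vague" check are all hypothetical. What the sketch does rely on is that OpenAI error responses carry a top-level "error" object.

```python
GENERIC_ERROR = "Sorry, I couldn't analyze that image. Please try again."
RETRY_HINT = (
    "I couldn't describe that image. "
    "Try a clearer photo or rephrase your question."
)


def reply_for(result, min_length=20):
    """Turn an API response dict into the message shown to the user."""
    # Error responses contain an "error" object instead of "choices".
    if result.get("error"):
        # Show a friendly message rather than the raw API error.
        return GENERIC_ERROR

    choices = result.get("choices") or []
    content = ""
    if choices:
        content = (choices[0].get("message") or {}).get("content") or ""
    content = content.strip()

    # Hypothetical quality check: treat very short answers as too vague.
    if len(content) < min_length:
        return RETRY_HINT
    return content
```

A length threshold is a crude stand-in for "too vague or irrelevant". A real bot might instead check for specific keywords or ask the user whether the answer helped.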

F. Data Security and Compliance

Storing User Data: All API interactions should comply with data protection standards. User images and messages should be securely stored, anonymized if necessary, and deleted after a specified period to protect user privacy.

Using Environment Variables: API keys and other sensitive data are stored in environment variables or Pingstreams’ secure backend settings to prevent unauthorized access.
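
If you call the API from your own code rather than from the Design Studio, reading the key from an environment variable might look like the Python sketch below. The function name load_api_key and the variable name OPENAI_API_KEY are assumptions chosen for illustration. Use whatever name your deployment defines.

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Read the API key from the environment and fail loudly if it is missing."""
    # Accept a mapping for testing; default to the real process environment.
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set")
    return key
```

Failing at startup when the key is missing is usually easier to diagnose than an authentication error on the first user request.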

Use Cases and Applications

This image analysis chatbot structure can be adapted for various applications:

Product identification and recommendations
Quality control and inspection processes
Medical image preliminary analysis (with appropriate disclaimers)
Real estate property assessment
Vehicle inspection and damage assessment
Artwork analysis and description
Fashion and clothing style identification
Food and recipe suggestions based on ingredients

Benefits of Image AI Integration

Implementing image analysis in Pingstreams provides:

Enhanced user experience with visual interactions
Automated processing of visual content
Scalable image analysis without manual intervention
Detailed insights from visual data
Multi-modal support combining text and images
Real-time responses to image queries

This chatbot structure efficiently guides users through image-based inquiries, responds with relevant information using AI, and ensures a smooth user experience through predefined fallback and error responses. This setup can be adapted to various applications requiring image analysis and detailed feedback.

For more advanced image analysis features or custom implementations, contact the Pingstreams support team.