Create Your First Visual Fashion Agent Using AOAI and AI Search - Search Product Catalog Images

Search Product Catalog Images Using Azure AI Search and Azure OpenAI with LangChain

In the ever-evolving landscape of retail, businesses are continually seeking innovative solutions to streamline their operations and enhance customer experiences. One such breakthrough is the implementation of artificial intelligence (AI) to search product catalog images efficiently. This transformative technology not only simplifies the search process but also empowers businesses to provide personalized and seamless shopping experiences for their customers.


The Need for AI in Product Catalog Image Search: Traditional methods of searching through product catalogs involve manual tagging and categorization, which can be time-consuming and prone to human error. As the volume of products in a catalog grows, managing and searching for specific items becomes a daunting task. AI, particularly computer vision, addresses these challenges by automating the recognition and categorization of products in images.

Key Features of AI-Powered Product Catalog Image Search:

  1. Object Recognition and Tagging: AI algorithms can identify and tag objects within images, providing accurate and consistent categorization of products. This reduces the reliance on manual tagging, ensuring that products are correctly labeled in the catalog.
  2. Visual Similarity Search: AI enables visual similarity search, allowing users to find products based on visual attributes rather than relying solely on text-based queries. This feature is especially valuable for customers who may struggle to describe a product in words but can easily recognize it visually (a minimal similarity-scoring sketch follows this list).
  3. Enhanced Product Discovery: By understanding the visual characteristics of products, AI facilitates a more sophisticated recommendation system. Customers can discover related or complementary items, leading to increased cross-selling opportunities and a more engaging shopping experience.
  4. Improved Accuracy and Efficiency: AI-powered image recognition is highly accurate and can process large volumes of images in a fraction of the time it would take a human. This efficiency not only reduces operational costs but also enhances the speed at which customers can find and purchase products.
  5. Integration with E-Commerce Platforms: AI-driven image search can seamlessly integrate with existing e-commerce platforms, making it easy for businesses to adopt this technology without major disruptions. This integration allows for a smoother transition and ensures that the AI-enhanced search becomes an integral part of the overall shopping experience.
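
To make the visual similarity idea concrete, here is a minimal sketch (not from the original post, and using hypothetical random vectors as stand-ins for real embeddings) of how closeness between two embedding vectors is typically scored with cosine similarity. The Computer Vision APIs used later in this walkthrough return 1024-dimensional vectors in a shared text/image space:

import numpy as np

def cosine_similarity(a, b):
    """Score how alike two embedding vectors are (1.0 = same direction)."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins for a query embedding and two product embeddings
query_vec = np.random.rand(1024)
product_vecs = [np.random.rand(1024) for _ in range(2)]

# The catalog image whose embedding points most nearly the same way wins
best = max(product_vecs, key=lambda v: cosine_similarity(query_vec, v))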

Now let's implement this with Azure OpenAI.

First, you need to import some libraries:


import base64
import io
import json
import math
import mimetypes
import os
import random
import re
import sys
import time

import matplotlib.pyplot as plt
import numpy as np
import openai
import requests

import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech import (
    AudioDataStream,
    SpeechConfig,
    SpeechSynthesizer,
    SpeechSynthesisOutputFormat,
)
from azure.cognitiveservices.speech.audio import AudioOutputConfig
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient, SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)
from azure.search.documents.models import VectorizedQuery, VectorizableTextQuery
from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions

from datetime import datetime, timedelta
from dotenv import load_dotenv
from io import BytesIO
from IPython.display import Audio
from PIL import Image
from tenacity import (
    Retrying,
    retry_if_exception_type,
    wait_random_exponential,
    stop_after_attempt,
)


Initialize the environment variables for your:

  • Azure OpenAI endpoint
  • Azure Computer Vision endpoint
  • Azure AI Search endpoint
  • Azure Blob Storage connection string
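
For reference, the azure.env file loaded below might look like this (placeholder values only; substitute your own endpoints and keys):

AZURE_OPENAI_ENDPOINT=https://<your-aoai-resource>.openai.azure.com/
AZURE_API_VERSION=2023-12-01-preview
AZURE_OPENAI_KEY=<your-aoai-key>
ACS_ENDPOINT=https://<your-search-service>.search.windows.net
ACS_KEY=<your-search-admin-key>
ACV_ENDPOINT=https://<your-vision-resource>.cognitiveservices.azure.com
ACV_KEY=<your-vision-key>
BLOB_CONNECTION_STRING=<your-storage-connection-string>
CONTAINER_NAME=<your-container-name>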


load_dotenv("azure.env")
# Azure Open AI
openai_api_type = "azure"
openai_api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai_api_version = os.getenv("AZURE_API_VERSION")
openai_api_key = os.getenv("AZURE_OPENAI_KEY")

# Azure Cognitive Search
acs_endpoint = os.getenv("ACS_ENDPOINT")
acs_key = os.getenv("ACS_KEY")

# Azure Computer Vision 4
acv_key = os.getenv("ACV_KEY")
acv_endpoint = os.getenv("ACV_ENDPOINT")

blob_connection_string = os.getenv("BLOB_CONNECTION_STRING")
container_name = os.getenv("CONTAINER_NAME")

# Azure Cognitive Search index name to create
index_name = "azure-fashion-demo"

# Azure Cognitive Search api version
api_version = "2023-02-01-preview"


Now let's create a function that generates a text embedding using the Computer Vision 4.0 API:


def text_embedding(prompt):
    """
    Text embedding using Azure Computer Vision 4.0
    """
    version = "?api-version=" + api_version + "&modelVersion=latest"
    vec_txt_url = f"{acv_endpoint}/computervision/retrieval:vectorizeText{version}"
    headers = {"Content-type": "application/json", "Ocp-Apim-Subscription-Key": acv_key}
    payload = {"text": prompt}
    response = requests.post(vec_txt_url, json=payload, headers=headers)

    if response.status_code == 200:
        text_emb = response.json().get("vector")
        return text_emb
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None
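
As a quick sanity check (hypothetical prompt), the Vectorize Text API returns a 1024-dimensional vector, which matches the index dimension we configure later:

emb = text_embedding("floral summer dress")  # hypothetical prompt
if emb:
    print(len(emb))  # expected: 1024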


Next, let's create a function that generates an image embedding using the same Vision API:


def image_embedding(image_path):
    """
    Image embedding using Azure Computer Vision 4.0
    """
    url = f"{acv_endpoint}/computervision/retrieval:vectorizeImage"
    params = {"api-version": api_version, "modelVersion": "latest"}
    mime_type, _ = mimetypes.guess_type(image_path)
    headers = {
        "Content-Type": mime_type,
        "Ocp-Apim-Subscription-Key": acv_key
    }
    # Retry transient HTTP failures with exponential backoff
    for attempt in Retrying(
        retry=retry_if_exception_type(requests.HTTPError),
        wait=wait_random_exponential(min=15, max=60),
        stop=stop_after_attempt(15)
    ):
        with attempt:
            with open(image_path, 'rb') as image_data:
                response = requests.post(url, params=params, headers=headers, data=image_data)
                if response.status_code != 200:
                    response.raise_for_status()
    vector = response.json()["vector"]
    return vector
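
You can test it the same way on a single product image (hypothetical file name). Text and image vectors live in the same embedding space, which is what makes prompt-to-image search possible:

vec = image_embedding(os.path.join("images", "example.jpg"))  # hypothetical file
print(len(vec))  # expected: 1024, same space as the text embeddings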


Next, we need a function that takes a text prompt as input and searches Azure AI Search for the most relevant images. Here the buy-now link is a dummy SAS URL that can be replaced with the actual product URL.


def prompt_search(prompt, topn=5, disp=False):
    """
    Azure AI Search visual search using a text prompt
    """
    results_list = []
    # Initialize the Azure AI Search and Blob Storage clients
    search_client = SearchClient(acs_endpoint, index_name, AzureKeyCredential(acs_key))
    blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)
    container_client = blob_service_client.get_container_client(container_name)
    # Perform a vector search with the embedded prompt
    vector_query = VectorizedQuery(vector=text_embedding(prompt), k_nearest_neighbors=topn, fields="image_vector")
    response = search_client.search(
        search_text=prompt, vector_queries=[vector_query], select=["description"], top=topn
    )
    for nb, result in enumerate(response, 1):
        blob_name = result["description"] + ".jpg"
        blob_client = container_client.get_blob_client(blob_name)
        # Generate a short-lived, read-only SAS URL for the matching product image
        sas_token = generate_blob_sas(
            blob_service_client.account_name,
            container_name,
            blob_name,
            account_key=blob_client.credential.account_key,
            permission=BlobSasPermissions(read=True),
            expiry=datetime.utcnow() + timedelta(hours=1),
        )
        sas_url = blob_client.url + "?" + sas_token
        results_list.append({
            "buy_now_link": sas_url,  # dummy link; replace with the real product URL
            "description": result["description"],
            "product_image_url": sas_url,
        })
    return results_list
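
Once the index is populated (the ingestion steps follow), a quick smoke test might look like this (hypothetical query):

results = prompt_search("white sneakers", topn=3)  # hypothetical query
for r in results:
    print(r["description"], r["buy_now_link"])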


Let's ingest some product images into Azure Blob Storage. The idea is that we have a folder called images containing all the product images. We create a container and upload every image from the folder into it.


EMBEDDINGS_DIR = "embeddings"
os.makedirs(EMBEDDINGS_DIR, exist_ok=True)
image_directory = os.path.join('images')
embedding_directory = os.path.join('embeddings')
output_json_file = os.path.join(embedding_directory, 'output.jsonl')

# Create the container (if needed) and upload every image in the folder
blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)
container_client = blob_service_client.get_container_client(container_name)
if not container_client.exists():
    container_client.create_container()

for root, dirs, files in os.walk(image_directory):
    for file in files:
        local_file_path = os.path.join(root, file)
        blob_name = os.path.relpath(local_file_path, image_directory)
        blob_client = container_client.get_blob_client(blob_name)
        with open(local_file_path, "rb") as data:
            blob_client.upload_blob(data, overwrite=True)


Next, we create embeddings for the product images and store them locally in the embeddings directory as a JSONL file. Note that we use only two pieces of metadata, id and description; you can extend this with more metadata such as price, a buy-now link, and so on (a sketch follows the code below).


with open(output_json_file, 'w') as outfile:
    for idx, image_path in enumerate(os.listdir(image_directory)):
        try:
            vector = image_embedding(os.path.join(image_directory, image_path))
        except Exception as e:
            # Skip images that fail to embed; a null vector cannot be indexed
            print(f"Error processing image at index {idx}: {e}")
            continue

        # Use the file name (without extension) as the product description
        filename, _ = os.path.splitext(os.path.basename(image_path))
        result = {
            "id": f'{idx}',
            "image_vector": vector,
            "description": filename
        }

        outfile.write(json.dumps(result))
        outfile.write('\n')
        outfile.flush()

print(f"Results are saved to {output_json_file}")
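
As mentioned above, each record carries only id, image_vector, and description. Here is a hedged sketch of what an extended record could look like, with hypothetical price_inr and buy_now_link fields (each extra field would also need a matching field in the index definition below):

result = {
    "id": f"{idx}",
    "image_vector": vector,
    "description": filename,
    "price_inr": "1499",  # hypothetical metadata field
    "buy_now_link": f"https://example.com/products/{filename}",  # hypothetical metadata field
}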


Now that we have created the local embeddings file, we can upload it into Azure AI Search. Before that, let's create an index.


from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SimpleField,
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SearchIndex
)
credential = AzureKeyCredential(acs_key)
# Create a search index 
index_client = SearchIndexClient(endpoint=acs_endpoint, credential=credential)  
fields = [  
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),  
    SearchField(name="description", type=SearchFieldDataType.String, sortable=True, filterable=True, facetable=True),  
    SearchField(
        name="image_vector",  
        hidden=True,
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single), 
        searchable=True,
        vector_search_dimensions=1024,  
        vector_search_profile_name="myHnswProfile"
    ),  
]  
  
# Configure the vector search configuration  
vector_search = VectorSearch(  
    algorithms=[  
        HnswAlgorithmConfiguration(  
            name="myHnsw"
        )
    ],  
   profiles=[  
        VectorSearchProfile(  
            name="myHnswProfile",  
            algorithm_configuration_name="myHnsw",
        )
    ],  
)  
  
# Create the search index with the vector search configuration  
index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)  
result = index_client.create_or_update_index(index)  
print(f"{result.name} created")


Once the index is created, you can upload the locally stored embeddings file into it as documents.


from azure.search.documents import SearchClient
import json

data = []
with open(output_json_file, 'r') as file:
    for line in file:
        # Remove leading/trailing whitespace and parse JSON
        json_data = json.loads(line.strip())
        data.append(json_data)

search_client = SearchClient(endpoint=acs_endpoint, index_name=index_name, credential=credential)
results = search_client.upload_documents(data)
for result in results:
    print(f'Indexed {result.key} with status code {result.status_code}')
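
upload_documents accepts at most 1000 documents per call, so for a larger catalog you may want to upload in batches; a minimal sketch:

BATCH_SIZE = 1000  # service limit per upload_documents call
for i in range(0, len(data), BATCH_SIZE):
    for result in search_client.upload_documents(data[i:i + BATCH_SIZE]):
        print(f'Indexed {result.key} with status code {result.status_code}')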


Congratulations, you are now ready to implement your agent using Azure OpenAI.

Let's create a tool called image search that the agent will use. Save the class below as custom_tool.py; it imports the prompt_search helper from a util module containing the functions defined earlier.


from typing import Optional
from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.tools import BaseTool
from util import prompt_search

class ImageSearchResults(BaseTool):
    """Tool that queries the Fashion Image Search API and gets back json."""

    name: str = "image_search_results_json"
    description: str = (
        "A wrapper around Image Search. "
        "Useful for when you need search fashion images related to cloth , shoe etc"
        "Input should be a search query. Output is a JSON array of the query results"
    )
    num_results: int = 4

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool."""
        return str(prompt_search(prompt = query, topn=self.num_results))
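
You can exercise the tool on its own before wiring it into the agent (hypothetical query):

tool = ImageSearchResults(num_results=3)
print(tool.run("red summer dress"))  # returns the search results as a string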


Here we will use LangChain to implement our fashion agent, Luca.


from langchain_core.prompts.chat import (
    BaseMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    PromptTemplate,
)
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.runnables import Runnable, RunnablePassthrough
from langchain_core.utils.function_calling import convert_to_openai_function
from langchain.agents.output_parsers.openai_functions import (
    OpenAIFunctionsAgentOutputParser,
)
from langchain.agents.format_scratchpad.openai_functions import (
    format_to_openai_function_messages,
)
from langchain.agents import AgentExecutor
from langchain_openai import AzureChatOpenAI
from langchain_core.runnables import RunnableConfig

from custom_tool import ImageSearchResults
import openai


Let's initialize our LLM:


from langchain_openai import AzureChatOpenAI
llm = AzureChatOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2023-12-01-preview",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    model="gpt-4-turbo",
)
llm.invoke([HumanMessage(content="Hi")])  # quick connectivity check
prefix = """You are Luca, a helpful fashion agent who helps people navigate and buy products online.

Note:

- Always show prices in INR
- Always encourage the user to buy from the buy now link provided"""
suffix = ""


Let's attach the tool we created. Here we use LCEL (LangChain Expression Language) to implement our agent:


tools = [ImageSearchResults(num_results=5)]
llm_with_tools = llm.bind(
    functions=[convert_to_openai_function(t) for t in tools]
)
messages = [
    SystemMessage(content=prefix),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    HumanMessagePromptTemplate.from_template("{input}"),
    AIMessage(content=suffix),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
]
input_variables = ["input", "agent_scratchpad"]
prompt = ChatPromptTemplate(input_variables=input_variables, messages=messages)
agent = (
    RunnablePassthrough.assign(
        agent_scratchpad=lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        )
    )
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)
# Wrap the agent in an executor that runs the tool-calling loop
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)


Congratulations! You are ready to test your agent.


response = agent_executor.invoke(
    {
        "input": "I am looking for some summer dress as I am travelling to new Delhi",
        "chat_history": [
            HumanMessage(content="hi! my name is bob"),
            AIMessage(content="Hello Bob! How can I assist you today?"),
        ],
    }
)



Hurray! You are now ready to deploy this agent to an enterprise app with a polished UI.


Here is the reference GitHub repo with all the code artifacts:


https://github.com/monuminu/AOAI_Samples/tree/main/content_product_tagging


If you found this helpful, please like this post and follow me for more such content.

