AI Models

All AI inference runs on Cloudflare Workers AI via the AI binding. Models are accessed through the @cf/ namespace. The product service routes through the Cloudflare AI Gateway (crow-ai-gateway) for caching and observability.

Model Inventory

| Model | Use Case | Service | Notes |
| --- | --- | --- | --- |
| @cf/meta/llama-3.1-8b-instruct | Product text extraction | core-product-service | Fast structured extraction from crawled HTML |
| @cf/unum/uform-gen2-qwen-500m | Product image captioning | core-product-service | Lightweight vision-language model |
| @cf/baai/bge-base-en-v1.5 | Product embeddings (768 dims) | core-product-service | Vectorize index: crow-products |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | Chat multi-agent system | core-chat-service, bff-chat-service | Tool-calling agentic loop |
| @cf/google/gemma-3-12b-it | CCTV frame analysis | cctv-ingest-service | Multimodal vision analysis |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | CCTV interaction synthesis | core-interaction-service | Matches observations to products |
| @cf/baai/bge-m3 | QnA embeddings (1024 dims) | bff-qna-service | Vectorize index: crow-qna-index |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | Pattern analysis | core-pattern-service | Behavioral pattern detection |

Product Extraction -- @cf/meta/llama-3.1-8b-instruct

Justification: The 8B parameter model provides sufficient reasoning for structured extraction tasks while keeping inference latency low. It runs within the Workers AI free tier for development and is cost-effective at scale. The model ID is set via the AI_MODEL environment variable, and calls are routed through the AI Gateway crow-ai-gateway.
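A minimal sketch of what the extraction call might look like from a Worker. The prompt wording, the `buildExtractionMessages` helper, and the JSON field list are illustrative assumptions, not the service's actual schema:

```typescript
// Illustrative sketch: structured product extraction via the Workers AI binding.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Build the message array sent to @cf/meta/llama-3.1-8b-instruct.
export function buildExtractionMessages(html: string): ChatMessage[] {
  return [
    {
      role: "system",
      content:
        "Extract product data from the HTML below. " +
        'Reply with JSON: {"title": string, "price": string, "description": string}.',
    },
    { role: "user", content: html },
  ];
}

// Inside a Worker handler (env.AI is the Workers AI binding):
//   const result = await env.AI.run(
//     env.AI_MODEL, // "@cf/meta/llama-3.1-8b-instruct"
//     { messages: buildExtractionMessages(html) },
//     { gateway: { id: "crow-ai-gateway" } },
//   );
```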

Product Image Vision -- @cf/unum/uform-gen2-qwen-500m

Justification: A lightweight 500M vision-language model optimized for image captioning. Fast inference makes it suitable for batch processing hundreds of product images per crawl job. Generates descriptions of features, colors, materials, and style that augment vector embeddings.
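A sketch of a captioning call, assuming the model takes the raw image as an array of byte values (the common Workers AI image-input shape); the prompt wording and helper name are assumptions:

```typescript
// Illustrative sketch: caption a product image with @cf/unum/uform-gen2-qwen-500m.
// Convert a fetched image's ArrayBuffer into the byte array the model expects.
export function toByteArray(buf: ArrayBuffer): number[] {
  return [...new Uint8Array(buf)];
}

// Inside a Worker handler:
//   const res = await fetch(imageUrl);
//   const caption = await env.AI.run("@cf/unum/uform-gen2-qwen-500m", {
//     image: toByteArray(await res.arrayBuffer()),
//     prompt: "Describe this product's features, colors, materials, and style.",
//     max_tokens: 256,
//   });
```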

Product Embeddings -- @cf/baai/bge-base-en-v1.5 (768 dimensions)

Justification: BGE-base is a high-quality English embedding model that balances accuracy and speed. The 768-dimension output is well-supported by Cloudflare Vectorize. Embeddings combine product title, description, and AI-generated image descriptions.
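The combination step might be sketched as follows; the field names and joining strategy are assumptions (the source only states that title, description, and AI-generated image descriptions are combined):

```typescript
// Illustrative sketch: compose the text that gets embedded for each product.
export function buildEmbeddingText(p: {
  title: string;
  description: string;
  imageDescriptions: string[];
}): string {
  return [p.title, p.description, ...p.imageDescriptions]
    .filter((s) => s.trim().length > 0)
    .join("\n");
}

// Inside a Worker handler:
//   const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
//     text: [buildEmbeddingText(product)],
//   });
//   // data[0] is a 768-dim vector; upsert into the crow-products index
//   await env.VECTORIZE.upsert([{ id: product.id, values: data[0] }]);
```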

Chat MAS -- @cf/meta/llama-3.3-70b-instruct-fp8-fast

Justification: The 70B model (FP8 quantized for speed) provides strong reasoning, tool-calling, and multi-turn conversation capabilities. The -fast variant prioritizes throughput.

Tools: search_products, get_interactions, get_patterns. The agentic loop runs at most 5 iterations.
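The bounded loop can be sketched as below. The tool names come from this page; the turn and tool-call shapes are assumptions, with the model and tool executors injected so the control flow stands on its own:

```typescript
// Illustrative sketch of the bounded agentic loop (max 5 iterations).
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ModelTurn = { response?: string; tool_calls?: ToolCall[] };

export async function runAgentLoop(
  callModel: (history: unknown[]) => Promise<ModelTurn>,
  callTool: (c: ToolCall) => Promise<unknown>,
  history: unknown[],
  maxIterations = 5,
): Promise<string> {
  for (let i = 0; i < maxIterations; i++) {
    const turn = await callModel(history);
    // No tool calls: the model has produced its final answer.
    if (!turn.tool_calls || turn.tool_calls.length === 0) {
      return turn.response ?? "";
    }
    // Execute each requested tool (search_products, get_interactions,
    // get_patterns) and feed the result back into the conversation.
    for (const call of turn.tool_calls) {
      const result = await callTool(call);
      history.push({ role: "tool", name: call.name, content: JSON.stringify(result) });
    }
  }
  return "Reached iteration limit.";
}
```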

CCTV Vision -- @cf/google/gemma-3-12b-it

Justification: Gemma 3 12B is a multimodal model capable of vision+text reasoning. Its 12B size provides strong scene understanding while remaining fast enough for near-real-time frame analysis. Frames are sent as base64-encoded images.

Analysis focus: customer behavior, people count, product interactions, movement patterns, dwell zones.
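A sketch of how a frame might be submitted, assuming a data-URL multimodal message shape; the message structure and prompt wording are assumptions:

```typescript
// Illustrative sketch: send a base64-encoded CCTV frame to @cf/google/gemma-3-12b-it.
type FramePart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

export function buildFrameMessages(frameBase64: string) {
  const content: FramePart[] = [
    {
      type: "text",
      text:
        "Analyze this retail CCTV frame. Report: customer behavior, " +
        "people count, product interactions, movement patterns, dwell zones.",
    },
    {
      type: "image_url",
      image_url: { url: `data:image/jpeg;base64,${frameBase64}` },
    },
  ];
  return [{ role: "user", content }];
}

// Inside a Worker handler:
//   const analysis = await env.AI.run("@cf/google/gemma-3-12b-it", {
//     messages: buildFrameMessages(frameBase64),
//   });
```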

CCTV Interaction Synthesis -- @cf/meta/llama-3.3-70b-instruct-fp8-fast

Justification: Complex multi-source synthesis combining vision analysis outputs with product catalog data requires the 70B model's reasoning capabilities.
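One way the two sources might be merged into a single synthesis prompt; the structure and wording below are assumptions:

```typescript
// Illustrative sketch: combine CCTV observations with catalog entries into one
// prompt for @cf/meta/llama-3.3-70b-instruct-fp8-fast.
export function buildSynthesisPrompt(
  observations: string[],
  products: { id: string; title: string }[],
): string {
  const catalog = products.map((p) => `- [${p.id}] ${p.title}`).join("\n");
  const observed = observations.map((o) => `- ${o}`).join("\n");
  return (
    "Match each observation to a product ID from the catalog, or 'none'.\n\n" +
    `Observations:\n${observed}\n\n` +
    `Catalog:\n${catalog}`
  );
}
```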

QnA Embeddings -- @cf/baai/bge-m3 (1024 dimensions)

Justification: BGE-M3 supports multilingual content and produces 1024-dimensional embeddings with strong cross-lingual retrieval performance. Higher dimensionality than the product embeddings gives better discrimination in knowledge base content. A cron trigger refreshes the index every 12 hours.
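The refresh step might look like the sketch below. The batch size, the `batch` helper, and the variable names are assumptions; Workers AI returns 1024-dim vectors for bge-m3:

```typescript
// Illustrative sketch: batch QnA texts for the 12-hourly embedding refresh.
export function batch<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Inside a scheduled() handler:
//   for (const texts of batch(qnaTexts, 100)) {
//     const { data } = await env.AI.run("@cf/baai/bge-m3", { text: texts });
//     // data[i] is a 1024-dim vector; upsert into crow-qna-index
//   }
```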

Vectorize Indexes

| Index Name | Dimensions | Model | Service |
| --- | --- | --- | --- |
| crow-products / crow-products-dev | 768 | bge-base-en-v1.5 | product |
| crow-qna-index / crow-qna-index-dev | 1024 | bge-m3 | bff-qna |
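A sketch of the query side: embed the query with the same model as the index, then query Vectorize. The `VECTORIZE` binding name, `topK`, and the score-threshold helper are assumptions:

```typescript
// Illustrative sketch: keep only Vectorize matches above a similarity threshold.
export function filterMatches(
  matches: { id: string; score: number }[],
  minScore: number,
): string[] {
  return matches.filter((m) => m.score >= minScore).map((m) => m.id);
}

// Inside a Worker handler (query model must match the index's embedding model):
//   const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });
//   const result = await env.VECTORIZE.query(data[0], { topK: 5 });
//   const ids = filterMatches(result.matches, 0.7);
```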

AI Gateway Configuration

Product service calls route through crow-ai-gateway:

  • Caching -- Reduces redundant calls for identical prompts
  • Analytics -- Tracks usage, latency, and costs per model
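The gateway is selected per call via the third argument to env.AI.run; the cacheTtl value and helper below are illustrative assumptions:

```typescript
// Illustrative sketch: route a Workers AI call through the crow-ai-gateway.
export function gatewayOptions(skipCache = false) {
  return { gateway: { id: "crow-ai-gateway", skipCache, cacheTtl: 3600 } };
}

// Usage inside a Worker handler:
//   await env.AI.run(model, inputs, gatewayOptions());
//   await env.AI.run(model, inputs, gatewayOptions(true)); // bypass the cache
```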