# AI Models
All AI inference runs on Cloudflare Workers AI via the AI binding. Models are accessed through the @cf/ namespace. The product service routes through the Cloudflare AI Gateway (crow-ai-gateway) for caching and observability.
## Model Inventory
| Model | Use Case | Service | Notes |
|---|---|---|---|
| @cf/meta/llama-3.1-8b-instruct | Product text extraction | core-product-service | Fast structured extraction from crawled HTML |
| @cf/unum/uform-gen2-qwen-500m | Product image captioning | core-product-service | Lightweight vision-language model |
| @cf/baai/bge-base-en-v1.5 | Product embeddings (768 dims) | core-product-service | Vectorize index: crow-products |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | Chat multi-agent system | core-chat-service, bff-chat-service | Tool-calling agentic loop |
| @cf/google/gemma-3-12b-it | CCTV frame analysis | cctv-ingest-service | Multimodal vision analysis |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | CCTV interaction synthesis | core-interaction-service | Matches observations to products |
| @cf/baai/bge-m3 | QnA embeddings (1024 dims) | bff-qna-service | Vectorize index: crow-qna-index |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | Pattern analysis | core-pattern-service | Behavioral pattern detection |
### Product Extraction -- @cf/meta/llama-3.1-8b-instruct
Justification: The 8B-parameter model provides sufficient reasoning for structured extraction tasks while keeping inference latency low. It runs within the Workers AI free tier for development and is cost-effective at scale. The model ID is set via the AI_MODEL environment variable, and calls are routed through AI Gateway ID crow-ai-gateway.
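A minimal sketch of the extraction call, assuming the standard Workers AI binding; the prompt wording, JSON schema, and the Env shape here are illustrative, not the service's actual code:

```typescript
interface Env {
  AI: { run(model: string, inputs: unknown): Promise<any> };
  AI_MODEL: string; // e.g. "@cf/meta/llama-3.1-8b-instruct"
}

// Pure helper: build the extraction prompt from crawled HTML.
function buildExtractionMessages(html: string) {
  return [
    { role: "system", content: "Extract product fields as JSON: {title, price, description}. Respond with JSON only." },
    { role: "user", content: html.slice(0, 6000) }, // keep within the model's context budget
  ];
}

async function extractProduct(env: Env, html: string) {
  const result = await env.AI.run(env.AI_MODEL, {
    messages: buildExtractionMessages(html),
  });
  return JSON.parse(result.response); // model is instructed to emit JSON only
}
```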
### Product Image Vision -- @cf/unum/uform-gen2-qwen-500m
Justification: A lightweight 500M vision-language model optimized for image captioning. Fast inference makes it suitable for batch processing hundreds of product images per crawl job. Generates descriptions of features, colors, materials, and style that augment vector embeddings.
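A per-image captioning call might look like the sketch below. The { image, prompt } input shape follows the common Workers AI vision pattern, and the prompt text and output-field fallback are assumptions:

```typescript
const CAPTION_PROMPT =
  "Describe this product's features, colors, materials, and style.";

async function captionImage(
  ai: { run: (model: string, inputs: unknown) => Promise<any> },
  imageBytes: Uint8Array,
): Promise<string> {
  const result = await ai.run("@cf/unum/uform-gen2-qwen-500m", {
    image: Array.from(imageBytes), // vision models take a plain number array
    prompt: CAPTION_PROMPT,
    max_tokens: 256,
  });
  // Output field name varies by model; fall back defensively.
  return result.description ?? result.response;
}
```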
### Product Embeddings -- @cf/baai/bge-base-en-v1.5 (768 dimensions)
Justification: BGE-base is a high-quality English embedding model that balances accuracy and speed. The 768-dimension output is well-supported by Cloudflare Vectorize. Embeddings combine product title, description, and AI-generated image descriptions.
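The combine-then-embed-then-upsert flow can be sketched as follows; the binding name PRODUCTS_INDEX, the ProductDoc shape, and the metadata fields are assumptions:

```typescript
interface ProductDoc {
  title: string;
  description: string;
  imageCaptions: string[]; // AI-generated descriptions from the vision model
}

// Pure helper: one text blob per product, combining all signal sources.
function embeddingText(p: ProductDoc): string {
  return [p.title, p.description, ...p.imageCaptions].filter(Boolean).join("\n");
}

async function indexProduct(env: any, id: string, p: ProductDoc): Promise<void> {
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [embeddingText(p)], // data is number[][], 768 dims per row
  });
  await env.PRODUCTS_INDEX.upsert([
    { id, values: data[0], metadata: { title: p.title } },
  ]);
}
```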
### Chat Multi-Agent System -- @cf/meta/llama-3.3-70b-instruct-fp8-fast
Justification: The 70B model (FP8 quantized for speed) provides strong reasoning, tool-calling, and multi-turn conversation capabilities. The -fast variant prioritizes throughput.
Tools: search_products, get_interactions, get_patterns. The agentic loop runs at most 5 iterations.
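The loop described above can be sketched as below. The tool-result message shape, the fallback text, and the dispatch signature are illustrative; only the model ID, the tool names, and the 5-iteration cap come from this document:

```typescript
const MAX_ITERATIONS = 5; // cap from the service description above

type ToolCall = { name: string; arguments: Record<string, unknown> };

async function agentLoop(
  ai: { run: (model: string, inputs: unknown) => Promise<any> },
  messages: Array<Record<string, unknown>>,
  tools: unknown[], // definitions for search_products, get_interactions, get_patterns
  dispatch: (name: string, args: Record<string, unknown>) => Promise<unknown>,
): Promise<string> {
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const result = await ai.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", { messages, tools });
    const calls: ToolCall[] = result.tool_calls ?? [];
    if (calls.length === 0) return result.response; // final answer, no tools needed
    for (const call of calls) {
      const output = await dispatch(call.name, call.arguments);
      // Feed tool results back so the next iteration can use them.
      messages.push({ role: "tool", name: call.name, content: JSON.stringify(output) });
    }
  }
  return "Unable to complete the request within the iteration limit.";
}
```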
### CCTV Vision -- @cf/google/gemma-3-12b-it
Justification: Gemma 3 12B is a multimodal model capable of vision+text reasoning. Its 12B size provides strong scene understanding while remaining fast enough for near-real-time frame analysis. Frames are sent as base64-encoded images.
Analysis focus: customer behavior, people count, product interactions, movement patterns, dwell zones.
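A per-frame call could look like this sketch. The base64 encoding matches the description above, but the exact multimodal input shape for @cf/google/gemma-3-12b-it is an assumption here and should be checked against the model's schema:

```typescript
// Encode raw frame bytes as base64 (chunk-free loop avoids spread limits).
function frameToBase64(frame: Uint8Array): string {
  let binary = "";
  for (const byte of frame) binary += String.fromCharCode(byte);
  return btoa(binary);
}

const ANALYSIS_PROMPT =
  "Report: customer behavior, people count, product interactions, movement patterns, dwell zones.";

async function analyzeFrame(
  ai: { run: (model: string, inputs: unknown) => Promise<any> },
  frame: Uint8Array,
) {
  // Input shape is an assumption; Workers AI vision models vary in how
  // they accept images (byte arrays vs. base64 strings).
  return ai.run("@cf/google/gemma-3-12b-it", {
    prompt: ANALYSIS_PROMPT,
    image: frameToBase64(frame),
  });
}
```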
### CCTV Interaction Synthesis -- @cf/meta/llama-3.3-70b-instruct-fp8-fast
Justification: Complex multi-source synthesis combining vision analysis outputs with product catalog data requires the 70B model's reasoning capabilities.
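One way to sketch the multi-source prompt construction; the prompt wording and product shape are hypothetical, not the service's actual template:

```typescript
// Build a synthesis prompt from a vision-analysis observation plus
// candidate catalog entries, asking the model to match them up.
function buildSynthesisPrompt(
  observation: string,
  products: Array<{ id: string; title: string }>,
): string {
  const catalog = products.map((p) => `- ${p.id}: ${p.title}`).join("\n");
  return [
    "Observation from CCTV analysis:",
    observation,
    "",
    "Product catalog candidates:",
    catalog,
    "",
    'Match each observed interaction to a product id, or answer "none".',
  ].join("\n");
}
```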
### QnA Embeddings -- @cf/baai/bge-m3 (1024 dimensions)
Justification: BGE-M3 supports multilingual content and produces 1024-dimensional embeddings with strong cross-lingual retrieval performance. Its higher dimensionality relative to the product embeddings gives better discrimination across knowledge-base content. A cron trigger refreshes the index every 12 hours.
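The 12-hour refresh can be expressed as a Workers cron trigger; a minimal sketch, assuming the service is configured via wrangler.toml (the actual schedule expression and handler wiring may differ):

```toml
# Illustrative wrangler.toml fragment for bff-qna-service; the refresh
# logic itself would live in the Worker's scheduled() handler.
[triggers]
crons = ["0 */12 * * *"]  # every 12 hours, on the hour
```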
## Vectorize Indexes
| Index Name | Dimensions | Model | Service |
|---|---|---|---|
| crow-products / crow-products-dev | 768 | bge-base-en-v1.5 | product |
| crow-qna-index / crow-qna-index-dev | 1024 | bge-m3 | bff-qna |
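For reference, indexes with these names and dimensions could be created with wrangler; the cosine metric here is an assumption, not stated in this document:

```shell
# Illustrative creation commands (prod variants; repeat for the -dev names).
npx wrangler vectorize create crow-products --dimensions=768 --metric=cosine
npx wrangler vectorize create crow-qna-index --dimensions=1024 --metric=cosine
```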
## AI Gateway Configuration
Product service calls route through crow-ai-gateway:
- Caching -- Reduces redundant calls for identical prompts
- Analytics -- Tracks usage, latency, and costs per model
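Routing through the gateway can be done with the AI binding's gateway option; a minimal sketch, where only the gateway ID crow-ai-gateway comes from this document:

```typescript
// Pass the gateway ID as a third argument so the call is cached and
// logged by AI Gateway instead of hitting Workers AI directly.
async function runViaGateway(
  ai: { run: (model: string, inputs: unknown, opts?: unknown) => Promise<any> },
  model: string,
  inputs: unknown,
) {
  return ai.run(model, inputs, {
    gateway: { id: "crow-ai-gateway", skipCache: false },
  });
}
```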