# AI Models
All AI inference runs on Cloudflare Workers AI via the AI binding. Models are accessed through the @cf/ namespace. The product service routes through the Cloudflare AI Gateway (crow-ai-gateway) for caching and observability.
## Model Inventory
| Model | Use Case | Service | Notes |
|---|---|---|---|
| @cf/meta/llama-3.1-8b-instruct | Product text extraction | core-product-service | Fast structured extraction from crawled HTML |
| @cf/unum/uform-gen2-qwen-500m | Product image captioning | core-product-service | Lightweight vision-language model |
| @cf/baai/bge-base-en-v1.5 | Product embeddings (768 dims) | core-product-service | Vectorize index: crow-products |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | Chat multi-agent system | core-chat-service, bff-chat-service | Tool-calling agentic loop |
| @cf/google/gemma-3-12b-it | CCTV frame analysis | cctv-ingest-service | Multimodal vision analysis |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | CCTV interaction synthesis | core-interaction-service | Matches observations to products |
| @cf/baai/bge-m3 | QnA embeddings (1024 dims) | bff-qna-service | Vectorize index: crow-qna-index |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | Pattern analysis | core-pattern-service | Behavioral pattern detection |
### Product Extraction -- @cf/meta/llama-3.1-8b-instruct
Justification: The 8B-parameter model provides sufficient reasoning for structured extraction tasks while keeping inference latency low. It runs within the Workers AI free tier for development and is cost-effective at scale. The model ID is set via the AI_MODEL environment variable, and calls are routed through AI Gateway ID crow-ai-gateway.
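A minimal sketch of the extraction call, assuming the standard Workers AI binding; the prompt wording, JSON schema, and the Env shape here are illustrative, not the service's actual code:

```typescript
interface Env {
  AI: { run(model: string, inputs: unknown): Promise<any> };
  AI_MODEL: string; // e.g. "@cf/meta/llama-3.1-8b-instruct"
}

// Pure helper: build the extraction prompt from crawled HTML.
function buildExtractionMessages(html: string) {
  return [
    { role: "system", content: "Extract product fields as JSON: {title, price, description}. Respond with JSON only." },
    { role: "user", content: html.slice(0, 6000) }, // keep within the model's context budget
  ];
}

async function extractProduct(env: Env, html: string) {
  const result = await env.AI.run(env.AI_MODEL, {
    messages: buildExtractionMessages(html),
  });
  return JSON.parse(result.response); // model is instructed to emit JSON only
}
```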
### Product Image Vision -- @cf/unum/uform-gen2-qwen-500m
Justification: A lightweight 500M vision-language model optimized for image captioning. Fast inference makes it suitable for batch processing hundreds of product images per crawl job. Generates descriptions of features, colors, materials, and style that augment vector embeddings.
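A per-image captioning call might look like the sketch below. The { image, prompt } input shape follows the common Workers AI vision pattern, and the prompt text and output-field fallback are assumptions:

```typescript
const CAPTION_PROMPT =
  "Describe this product's features, colors, materials, and style.";

async function captionImage(
  ai: { run: (model: string, inputs: unknown) => Promise<any> },
  imageBytes: Uint8Array,
): Promise<string> {
  const result = await ai.run("@cf/unum/uform-gen2-qwen-500m", {
    image: Array.from(imageBytes), // vision models take a plain number array
    prompt: CAPTION_PROMPT,
    max_tokens: 256,
  });
  // Output field name varies by model; fall back defensively.
  return result.description ?? result.response;
}
```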
### Product Embeddings -- @cf/baai/bge-base-en-v1.5 (768 dimensions)
Justification: BGE-base is a high-quality English embedding model that balances accuracy and speed. The 768-dimension output is well-supported by Cloudflare Vectorize. Embeddings combine product title, description, and AI-generated image descriptions.
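The combine-then-embed-then-upsert flow can be sketched as follows; the binding name PRODUCTS_INDEX, the ProductDoc shape, and the metadata fields are assumptions:

```typescript
interface ProductDoc {
  title: string;
  description: string;
  imageCaptions: string[]; // AI-generated descriptions from the vision model
}

// Pure helper: one text blob per product, combining all signal sources.
function embeddingText(p: ProductDoc): string {
  return [p.title, p.description, ...p.imageCaptions].filter(Boolean).join("\n");
}

async function indexProduct(env: any, id: string, p: ProductDoc): Promise<void> {
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [embeddingText(p)], // data is number[][], 768 dims per row
  });
  await env.PRODUCTS_INDEX.upsert([
    { id, values: data[0], metadata: { title: p.title } },
  ]);
}
```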
### Chat Multi-Agent System -- @cf/meta/llama-3.3-70b-instruct-fp8-fast
Justification: The 70B model (FP8 quantized for speed) provides strong reasoning, tool-calling, and multi-turn conversation capabilities. The -fast variant prioritizes throughput.
Tools: search_products, get_interactions, get_patterns. The agentic loop runs at most 5 iterations.
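The loop described above can be sketched as below. The tool-result message shape, the fallback text, and the dispatch signature are illustrative; only the model ID, the tool names, and the 5-iteration cap come from this document:

```typescript
const MAX_ITERATIONS = 5; // cap from the service description above

type ToolCall = { name: string; arguments: Record<string, unknown> };

async function agentLoop(
  ai: { run: (model: string, inputs: unknown) => Promise<any> },
  messages: Array<Record<string, unknown>>,
  tools: unknown[], // definitions for search_products, get_interactions, get_patterns
  dispatch: (name: string, args: Record<string, unknown>) => Promise<unknown>,
): Promise<string> {
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const result = await ai.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", { messages, tools });
    const calls: ToolCall[] = result.tool_calls ?? [];
    if (calls.length === 0) return result.response; // final answer, no tools needed
    for (const call of calls) {
      const output = await dispatch(call.name, call.arguments);
      // Feed tool results back so the next iteration can use them.
      messages.push({ role: "tool", name: call.name, content: JSON.stringify(output) });
    }
  }
  return "Unable to complete the request within the iteration limit.";
}
```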
### CCTV Vision -- @cf/google/gemma-3-12b-it
Justification: Gemma 3 12B is a multimodal model capable of vision+text reasoning. Its 12B size provides strong scene understanding while remaining fast enough for near-real-time frame analysis. Frames are sent as base64-encoded images.
Analysis focus: customer behavior, people count, product interactions, movement patterns, dwell zones.
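A per-frame call could look like this sketch. The base64 encoding matches the description above, but the exact multimodal input shape for @cf/google/gemma-3-12b-it is an assumption here and should be checked against the model's schema:

```typescript
// Encode raw frame bytes as base64 (chunk-free loop avoids spread limits).
function frameToBase64(frame: Uint8Array): string {
  let binary = "";
  for (const byte of frame) binary += String.fromCharCode(byte);
  return btoa(binary);
}

const ANALYSIS_PROMPT =
  "Report: customer behavior, people count, product interactions, movement patterns, dwell zones.";

async function analyzeFrame(
  ai: { run: (model: string, inputs: unknown) => Promise<any> },
  frame: Uint8Array,
) {
  // Input shape is an assumption; Workers AI vision models vary in how
  // they accept images (byte arrays vs. base64 strings).
  return ai.run("@cf/google/gemma-3-12b-it", {
    prompt: ANALYSIS_PROMPT,
    image: frameToBase64(frame),
  });
}
```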
### CCTV Interaction Synthesis -- @cf/meta/llama-3.3-70b-instruct-fp8-fast
Justification: Complex multi-source synthesis combining vision analysis outputs with product catalog data requires the 70B model's reasoning capabilities.
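One way to sketch the multi-source prompt construction; the prompt wording and product shape are hypothetical, not the service's actual template:

```typescript
// Build a synthesis prompt from a vision-analysis observation plus
// candidate catalog entries, asking the model to match them up.
function buildSynthesisPrompt(
  observation: string,
  products: Array<{ id: string; title: string }>,
): string {
  const catalog = products.map((p) => `- ${p.id}: ${p.title}`).join("\n");
  return [
    "Observation from CCTV analysis:",
    observation,
    "",
    "Product catalog candidates:",
    catalog,
    "",
    'Match each observed interaction to a product id, or answer "none".',
  ].join("\n");
}
```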
### QnA Embeddings -- @cf/baai/bge-m3 (1024 dimensions)
Justification: BGE-M3 supports multilingual content and produces 1024-dimensional embeddings with strong cross-lingual retrieval performance. Its higher dimensionality relative to the product embeddings gives better discrimination across knowledge-base content. A cron trigger refreshes the index every 12 hours.
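The 12-hour refresh can be expressed as a Workers cron trigger; a minimal sketch, assuming the service is configured via wrangler.toml (the actual schedule expression and handler wiring may differ):

```toml
# Illustrative wrangler.toml fragment for bff-qna-service; the refresh
# logic itself would live in the Worker's scheduled() handler.
[triggers]
crons = ["0 */12 * * *"]  # every 12 hours, on the hour
```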
## Vectorize Indexes
| Index Name | Dimensions | Model | Service |
|---|---|---|---|
| crow-products / crow-products-dev | 768 | bge-base-en-v1.5 | product |
| crow-qna-index / crow-qna-index-dev | 1024 | bge-m3 | bff-qna |
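For reference, indexes with these names and dimensions could be created with wrangler; the cosine metric here is an assumption, not stated in this document:

```shell
# Illustrative creation commands (prod variants; repeat for the -dev names).
npx wrangler vectorize create crow-products --dimensions=768 --metric=cosine
npx wrangler vectorize create crow-qna-index --dimensions=1024 --metric=cosine
```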
## AI Gateway Configuration
Product service calls route through crow-ai-gateway:
- Caching -- Reduces redundant calls for identical prompts
- Analytics -- Tracks usage, latency, and costs per model
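Routing through the gateway can be done with the AI binding's gateway option; a minimal sketch, where only the gateway ID crow-ai-gateway comes from this document:

```typescript
// Pass the gateway ID as a third argument so the call is cached and
// logged by AI Gateway instead of hitting Workers AI directly.
async function runViaGateway(
  ai: { run: (model: string, inputs: unknown, opts?: unknown) => Promise<any> },
  model: string,
  inputs: unknown,
) {
  return ai.run(model, inputs, {
    gateway: { id: "crow-ai-gateway", skipCache: false },
  });
}
```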