Social Media Component

Overview

The Social Media component is CROW's Phase 2 offering, deployed after the Web component but before CCTV. It performs intelligent web searching and extraction of social media data, analyzing company social media posts, comments, and engagement to track public sentiment and brand perception.

Why Social is Phase 2

The Social component leverages the core infrastructure built for the Web component:

Uses same Interaction and Pattern services
Leverages established queue architecture
Builds on existing Organization and Product context
Requires less infrastructure than CCTV
Provides valuable insights while CCTV is developed

Architecture

Two-Part Approach

The Social component operates through two distinct methods:

1. AI-Generated Web Search

Process:

Query Generation: Gemini AI generates search queries about the organization
Web Search Execution: Queries executed via search APIs
Link Extraction: Search results parsed for unique URLs
Duplicate Detection: Already-seen links filtered out
Web Extraction: New links scraped via Browser Rendering
Queue Dispatch: Extracted content sent to Interaction Queue
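The link extraction and duplicate detection steps above can be sketched as a small pure function. This is a minimal illustration, not the social-worker implementation: the `SearchHit` shape and the in-memory `seen` set are assumptions (the service tracks seen links in D1).

```typescript
// Hypothetical shape of one search hit; real provider responses differ.
interface SearchHit {
  url: string;
  title: string;
}

// Return URLs from `hits` that are not in `seen`, each at most once,
// in the order they first appear.
function selectNewLinks(hits: SearchHit[], seen: Set<string>): string[] {
  const fresh: string[] = [];
  const queued = new Set<string>();
  for (const hit of hits) {
    const url = hit.url.trim();
    if (url === "" || seen.has(url) || queued.has(url)) continue;
    queued.add(url);
    fresh.push(url);
  }
  return fresh;
}
```

Only the URLs returned here would go on to web extraction; everything else has either been processed before or is a repeat within the same result set.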

Query Examples:

"[Company Name] customer reviews"
"[Product Name] social media mentions"
"[Brand] latest news"
"[Company] twitter discussions"

Technology:

Gemini AI via Vercel AI SDK for query generation
Tavily for AI-optimized web search and content extraction
Cloudflare Browser Rendering for extraction

Multi-Agent Search System

The AI-generated web search uses a multi-agent orchestration system:

Agent	Role
Search Planner	Generates and refines search queries for brand/products
Search Executor	Runs queries via web search provider, collects candidate links
Content Extractor	Retrieves and parses discovered content
Standardizer	Maps fields into required schema, validates attributes
Deduplication	Removes duplicate content across sources

The Search Planner applies advanced search techniques including Boolean operators, site-specific targeting, and time-based filtering. Using multiple search engines improves coverage and reduces the risk of missing relevant content.
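As an illustration of the techniques the Search Planner applies, the sketch below composes a query from a quoted brand phrase, OR-joined keywords, a `site:` filter, and an `after:` date filter. The `QueryOptions` type and `buildQuery` function are hypothetical, and the operators follow common search-engine conventions; whether a given search provider honors each one is an assumption to verify.

```typescript
// Hypothetical options for one planned query.
interface QueryOptions {
  brand: string;
  keywords: string[]; // alternatives, joined with OR
  site?: string; // restrict to one domain, e.g. "reddit.com"
  after?: string; // only results after this date, e.g. "2024-01-01"
}

// Compose a single query string from the options.
function buildQuery(opts: QueryOptions): string {
  const parts: string[] = [`"${opts.brand}"`]; // exact-phrase match on the brand
  if (opts.keywords.length > 0) {
    parts.push(`(${opts.keywords.join(" OR ")})`); // Boolean OR group
  }
  if (opts.site) parts.push(`site:${opts.site}`); // site-specific targeting
  if (opts.after) parts.push(`after:${opts.after}`); // time-based filtering
  return parts.join(" ");
}

const query = buildQuery({
  brand: "Acme",
  keywords: ["reviews", "complaints"],
  site: "reddit.com",
  after: "2024-01-01",
});
// query === '"Acme" (reviews OR complaints) site:reddit.com after:2024-01-01'
```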

2. Social Media Link Scraping

Process:

Configuration: User provides social media account links during setup
Direct Scraping: Cron job triggers scraping of provided accounts
Content Extraction: Posts, comments, engagement data extracted
Platform APIs: Use official APIs where available
Web Extraction: Fallback to Browser Rendering when needed
Queue Dispatch: Content sent to Interaction Queue
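The API-first, browser-fallback order described above might look like the following. The `Fetcher` type and `fetchPosts` function are hypothetical names for illustration; real platform clients and the Browser Rendering call would be supplied by the caller.

```typescript
// A fetcher takes an account URL and resolves to raw post payloads.
type Fetcher = (accountUrl: string) => Promise<string[]>;

// Try the official API when one is configured; if it is missing or throws
// (outage, rate limit, revoked token), fall back to rendering the page.
async function fetchPosts(
  accountUrl: string,
  viaApi: Fetcher | undefined,
  viaBrowser: Fetcher,
): Promise<{ method: "api" | "browser"; posts: string[] }> {
  if (viaApi) {
    try {
      return { method: "api", posts: await viaApi(accountUrl) };
    } catch {
      // Fall through to the browser path below.
    }
  }
  return { method: "browser", posts: await viaBrowser(accountUrl) };
}
```

Returning the `method` alongside the posts lets later stages record which path produced each item, which helps when diagnosing extraction failures.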

Supported Platforms:

Twitter/X
LinkedIn
Facebook
Instagram
Reddit
TikTok
YouTube

Session Management

Session Definition

Social Media Session: One cron job trigger execution = one session.

Each scheduled cron run processes and analyzes content as a complete session, then sends it to the Interaction Service.

Cron Schedule

Configurable scheduling options:

Daily: Once per day (default)
Hourly: Every hour (high-volume brands)
Custom: Specific times or intervals
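The scheduling options map naturally onto standard five-field cron expressions, which Cloudflare Cron Triggers evaluate in UTC. The sketch below assumes a hypothetical `Schedule` type; the hour for the daily run is an illustrative parameter, not a documented default.

```typescript
// Hypothetical representation of the three scheduling options.
type Schedule =
  | { kind: "daily"; hourUtc: number }
  | { kind: "hourly" }
  | { kind: "custom"; cron: string };

// Convert a schedule into a cron expression:
// minute hour day-of-month month day-of-week.
function toCron(schedule: Schedule): string {
  switch (schedule.kind) {
    case "hourly":
      return "0 * * * *"; // top of every hour
    case "daily":
      if (!Number.isInteger(schedule.hourUtc) || schedule.hourUtc < 0 || schedule.hourUtc > 23) {
        throw new RangeError("hourUtc must be an integer from 0 to 23");
      }
      return `0 ${schedule.hourUtc} * * *`; // once per day at the given UTC hour
    case "custom":
      return schedule.cron.trim(); // passed through unchanged
  }
}

// toCron({ kind: "daily", hourUtc: 6 }) returns "0 6 * * *"
```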

Session Processing

Cron Triggers: Scheduled execution starts
Both Paths Execute: AI search AND direct scraping run
Content Aggregation: All extracted content combined
Session Creation: Results compiled into session
Queue Dispatch: Complete session sent to Interaction Queue
Interaction Processing: Standard pipeline processes session
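The aggregation and session creation steps can be sketched as below. The `ContentItem` and `Session` shapes, the session ID format, and the rule that a directly scraped item replaces a search item with the same URL are all assumptions made for illustration.

```typescript
interface ContentItem {
  url: string;
  source: "search" | "direct";
  text: string;
}

interface Session {
  sessionId: string;
  orgId: string;
  startedAt: string; // ISO 8601
  items: ContentItem[];
}

// Combine the output of both paths into one session for a single cron run.
function buildSession(
  orgId: string,
  startedAt: Date,
  searchItems: ContentItem[],
  directItems: ContentItem[],
): Session {
  const byUrl = new Map<string, ContentItem>();
  // Direct items are applied last, so they win when both paths found a URL.
  for (const item of searchItems.concat(directItems)) {
    byUrl.set(item.url, item);
  }
  const iso = startedAt.toISOString();
  return {
    sessionId: `${orgId}:${iso}`, // one session per org per cron trigger
    orgId,
    startedAt: iso,
    items: Array.from(byUrl.values()),
  };
}
```

The resulting object is what would be serialized and sent to the Interaction Queue as a single message.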

Social Worker

Service Details

Service Name: social-worker

Technology: TypeScript on Cloudflare Workers

Deployment: Edge-deployed, cron-triggered

Responsibilities

Cron-based Execution: Runs on schedule
AI Query Generation: Creates search queries using Gemini
Web Search: Executes queries and processes results
Link Management: Tracks seen/unseen links in D1
Direct Scraping: Fetches from provided social media links
Web Extraction: Uses Browser Rendering for content
Duplicate Detection: Prevents reprocessing same content
Queue Dispatch: Sends sessions to processing

Technology Stack

TypeScript on Cloudflare Workers
Gemini AI via Vercel AI SDK
Cloudflare Browser Rendering
Cloudflare AI Gateway for LLM routing
D1 for link tracking
Cron Triggers for scheduling

Configuration Setup

During signup, users configure the Social component in three steps:

1. Social Media Links

Users provide URLs to their social media accounts:

Twitter/X: https://twitter.com/company
LinkedIn: https://linkedin.com/company/company-name
Facebook: https://facebook.com/company-page
Instagram: https://instagram.com/company
Reddit: https://reddit.com/r/company

2. API Key Generation

A unique API key is generated specifically for social data:

Scoped to organization
Used for authenticated requests
Managed from dashboard
Revocable at any time

3. Schedule Configuration

Users can configure:

Scraping frequency (daily, hourly, custom)
Specific time windows
Platform priorities
Content type preferences

Data Extraction

Extracted Data Points

For Each Social Post:

Post text/caption
Author information
Post timestamp
Engagement metrics (likes, shares, comments)
Media URLs (images, videos)
Hashtags and mentions
Post URL/ID

For Comments:

Comment text
Author information
Comment timestamp
Engagement metrics
Reply relationships
Sentiment indicators

For Engagement:

Total reach/impressions
Engagement rate
Audience demographics
Trending topics
Sentiment distribution
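The data points listed above could be carried in types like the following, with a small check of the kind the Standardizer agent performs before content is queued. Field names and validation rules are assumptions for illustration; the actual schema is not specified here.

```typescript
interface Engagement {
  likes: number;
  shares: number;
  comments: number;
}

interface SocialPost {
  id: string;
  url: string;
  text: string;
  author: string;
  postedAt: string; // ISO 8601 timestamp
  engagement: Engagement;
  mediaUrls: string[];
  hashtags: string[];
  mentions: string[];
}

interface SocialComment {
  postId: string;
  text: string;
  author: string;
  postedAt: string;
  parentCommentId?: string; // set when the comment is a reply
}

// Return a list of problems; an empty list means the post passed.
function validatePost(post: Partial<SocialPost>): string[] {
  const problems: string[] = [];
  if (!post.id) problems.push("id is required");
  if (!post.url) problems.push("url is required");
  if (typeof post.text !== "string") problems.push("text must be a string");
  if (!post.postedAt || Number.isNaN(Date.parse(post.postedAt))) {
    problems.push("postedAt must be a parseable timestamp");
  }
  const e = post.engagement;
  if (!e || [e.likes, e.shares, e.comments].some((n) => !Number.isFinite(n) || n < 0)) {
    problems.push("engagement counts must be non-negative numbers");
  }
  return problems;
}
```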

Content Types

Text posts and updates
Images and videos
Stories and ephemeral content (where accessible)
Comments and replies
Shares and retweets
Reactions and likes
Reviews and ratings

Duplicate Detection

Link Tracking

D1 Database Schema:

CREATE TABLE seen_links (
    url TEXT PRIMARY KEY,
    org_id TEXT NOT NULL,
    first_seen DATETIME DEFAULT CURRENT_TIMESTAMP,
    last_checked DATETIME,
    content_hash TEXT,
    source TEXT
);
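A lookup against this table from a Worker might look like the sketch below. It uses the D1 prepared-statement calls `prepare`, `bind`, `first`, and `run`; the `D1Like` and `D1Stmt` interfaces are minimal stand-ins declared here so the example is self-contained (a real Worker would take its types from the Workers type definitions), and `markIfNew` is a hypothetical helper. The lookup keys on `url` alone because the schema above makes it the primary key.

```typescript
// Minimal structural types covering only the D1 calls used here.
interface D1Stmt {
  bind(...values: unknown[]): D1Stmt;
  first<T = unknown>(): Promise<T | null>;
  run(): Promise<unknown>;
}
interface D1Like {
  prepare(sql: string): D1Stmt;
}

// Returns true if the URL was not seen before (and records it);
// returns false if it was already tracked (and refreshes last_checked).
async function markIfNew(
  db: D1Like,
  orgId: string,
  url: string,
  source: "search" | "direct",
): Promise<boolean> {
  const existing = await db
    .prepare("SELECT url FROM seen_links WHERE url = ?")
    .bind(url)
    .first();
  if (existing !== null) {
    await db
      .prepare("UPDATE seen_links SET last_checked = CURRENT_TIMESTAMP WHERE url = ?")
      .bind(url)
      .run();
    return false;
  }
  await db
    .prepare("INSERT INTO seen_links (url, org_id, source) VALUES (?, ?, ?)")
    .bind(url, orgId, source)
    .run();
  return true;
}
```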

Detection Strategy

URL Normalization: Standardize URLs before checking
Hash Comparison: Content hash for duplicate content detection
Timestamp Tracking: When link was first seen
Source Tracking: Origin of the link (search vs direct)
Efficient Queries: Indexed lookups for fast checking
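URL normalization is what makes the link lookup reliable: two links that differ only in tracking parameters, fragment, parameter order, or a trailing slash should map to the same key. The sketch below uses the standard `URL` class; the list of tracking parameters and the exact rules are assumptions, not the service's documented behavior.

```typescript
// Query parameters that identify a campaign or click, not the content.
const TRACKING_PARAMS = [
  "utm_source",
  "utm_medium",
  "utm_campaign",
  "utm_term",
  "utm_content",
  "fbclid",
  "gclid",
];

// Produce a canonical form of `raw` suitable for use as a lookup key.
function normalizeUrl(raw: string): string {
  const url = new URL(raw); // parsing lowercases the scheme and host
  url.hash = ""; // fragments do not change the fetched content
  for (const param of TRACKING_PARAMS) {
    url.searchParams.delete(param);
  }
  url.searchParams.sort(); // stable parameter order
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1); // drop trailing slash
  }
  return url.toString();
}

// normalizeUrl("HTTPS://Example.com/Post/?utm_source=x&b=2&a=1#top")
// returns "https://example.com/Post?a=1&b=2"
```

Path case is preserved because paths are case-sensitive on many sites. For the content hash, one option in a Worker is a SHA-256 digest of the extracted text via the Web Crypto API.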

Update Detection

For direct social media scraping:

New posts always processed
Updated posts (edits) re-processed
Deleted posts marked accordingly
Engagement updates tracked
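One way to derive these four outcomes is to compare the previous scrape of an account with the current one by post ID and content hash. The types and function below are hypothetical. The sketch also assumes both snapshots cover the same window of posts, since a post that simply scrolled out of range would otherwise be misreported as deleted.

```typescript
interface Snapshot {
  postId: string;
  contentHash: string;
}

interface Change {
  postId: string;
  status: "new" | "updated" | "unchanged" | "deleted";
}

// Classify every post that appears in either snapshot.
function diffPosts(previous: Snapshot[], current: Snapshot[]): Change[] {
  const previousHashes = new Map<string, string>();
  for (const post of previous) {
    previousHashes.set(post.postId, post.contentHash);
  }

  const changes: Change[] = [];
  const currentIds = new Set<string>();
  for (const post of current) {
    currentIds.add(post.postId);
    const oldHash = previousHashes.get(post.postId);
    if (oldHash === undefined) {
      changes.push({ postId: post.postId, status: "new" });
    } else if (oldHash !== post.contentHash) {
      changes.push({ postId: post.postId, status: "updated" }); // edited
    } else {
      changes.push({ postId: post.postId, status: "unchanged" });
    }
  }
  for (const post of previous) {
    if (!currentIds.has(post.postId)) {
      changes.push({ postId: post.postId, status: "deleted" });
    }
  }
  return changes;
}
```

Engagement counts would be compared separately, so that a change in likes does not register as an edit to the post itself.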

Processing Flow

Complete Flow Diagram

Analytics and Insights

Generated Insights

The Social component generates insights about:

Brand Sentiment:

Overall sentiment trends
Platform-specific sentiment
Sentiment by product/topic
Sentiment change over time

Engagement Patterns:

Best performing content types
Optimal posting times
Audience engagement levels
Viral content identification

Topic Analysis:

Trending topics related to brand
Customer pain points
Feature requests
Competitive mentions

Audience Insights:

Demographics and interests
Geographic distribution
Influence and reach
Community growth

Pattern Recognition

Pattern Service identifies long-term trends:

Seasonal sentiment changes
Campaign effectiveness
Crisis detection and response
Brand perception evolution
Competitive positioning shifts

Integration with Other Components

Social insights combine with Web and CCTV data to provide comprehensive customer understanding:

Web + Social (Current & Planned):

Correlate social mentions with website traffic
Identify social-driven conversions
Track referral effectiveness
Measure social ROI
Pattern Service identifies cross-component patterns automatically

Social + CCTV (Future Capability):

Connect online buzz to in-store traffic
Measure campaign impact on foot traffic
Identify social-to-physical customer journeys
Requires CCTV Phase 3 deployment

Complete View (Vision):

Omnichannel customer understanding across all touchpoints
Full funnel attribution with multi-touch analytics
Cross-channel pattern recognition via Pattern Service
Holistic brand health monitoring with unified insights

The Pattern Service automatically discovers correlations between components as data accumulates. As more components are deployed, cross-component insights become richer and more actionable.

Rate Limiting and Compliance

Responsible Scraping

Robots.txt Compliance: Respects robots.txt directives
Rate Limiting: Configurable delays between requests
Domain Limits: Per-domain request limits
Backoff Strategy: Automatic backoff on errors
User Agent: Proper identification as CROW scraper
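The rate limiting, per-domain limits, and backoff items above can be illustrated with two small helpers. Both are sketches under assumptions: the backoff is exponential with a cap (the text only says "automatic backoff"), and the default values and the `DomainThrottle` class are hypothetical.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
// capped at `maxMs`.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// Enforces a minimum gap between requests to the same hostname.
class DomainThrottle {
  private minGapMs: number;
  private nextSlot = new Map<string, number>();

  constructor(minGapMs: number) {
    this.minGapMs = minGapMs;
  }

  // Reserve the next request slot for `url` and return how long the caller
  // should wait (in ms) before sending it. `nowMs` is the current time.
  reserve(url: string, nowMs: number): number {
    const host = new URL(url).hostname;
    const last = this.nextSlot.get(host);
    const slot = last === undefined ? nowMs : Math.max(nowMs, last + this.minGapMs);
    this.nextSlot.set(host, slot);
    return slot - nowMs;
  }
}

// backoffDelayMs(0) returns 1000, backoffDelayMs(3) returns 8000,
// backoffDelayMs(10) returns 60000 (capped)
```

In practice a random jitter is often added to the backoff delay so that many failing jobs do not retry at the same instant; it is left out here to keep the function deterministic.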

API Usage

For platforms with APIs:

Official APIs Preferred: Use APIs when available
Rate Limits: Respect platform rate limits
Token Management: Secure token storage
Quota Monitoring: Track API usage

Terms of Service

Compliance with platform ToS
Ethical scraping practices
Public data only
No personal data collection
Transparent usage

Performance Characteristics

Execution Time

Single Cron Run: 5-30 minutes depending on volume
Query Generation: < 10 seconds
Per Link Scraping: 2-5 seconds
Session Processing: < 10 minutes

Scalability

Links per Session: Up to 1000 links
Platforms: Unlimited platforms
Organizations: Scales with Cloudflare
Concurrent Jobs: Multiple orgs processed in parallel

Reliability

Retry Logic: Automatic retries on failure
Error Handling: Graceful degradation
Partial Results: Partial sessions saved
Monitoring: Execution tracking and alerts

Troubleshooting

Common Issues

No New Content:

Verify social media links are correct
Check if accounts are public
Review cron schedule configuration
Confirm API keys are valid

Duplicate Content:

Check duplicate detection is enabled
Review link normalization logic
Verify content hashing

Rate Limiting:

Adjust scraping frequency
Increase delays between requests
Check platform-specific limits
Consider API upgrades

Extraction Failures:

Verify Browser Rendering quota
Check website structure changes
Review extraction selectors
Test individual URLs

Future Enhancements

Planned Features

Real-time Social Listening: WebSocket-based live monitoring
Advanced Sentiment Analysis: Emotion detection, sarcasm recognition
Influencer Identification: Key influencers and brand advocates
Competitor Analysis: Automated competitive intelligence
Crisis Detection: Early warning system for PR issues
Response Suggestions: AI-generated response recommendations
Automated Reporting: Scheduled social media reports

Under Consideration

Video Content Analysis: Analyze video posts and stories
Image Recognition: Visual brand mentions
Hashtag Campaigns: Campaign tracking and analytics
Social Commerce: Purchase intent and conversion tracking

Related Documentation

System Architecture - Social component architecture
User Signup Flow - Social configuration during signup
Dashboard Features - Social settings and management
Website Interaction Tracking - Web component for comparison
CCTV Component - CCTV component for comparison
