
Social Media Component

Overview

The Social Media component is CROW's Phase 2 offering, deployed after the Web component but before CCTV. It combines AI-driven web search with direct extraction from a company's social media accounts. It analyzes posts, comments, and engagement to track public sentiment and brand perception.

Why Social is Phase 2

The Social component leverages the core infrastructure built for the Web component:

  • Uses same Interaction and Pattern services
  • Leverages established queue architecture
  • Builds on existing Organization and Product context
  • Requires less infrastructure than CCTV
  • Provides valuable insights while CCTV is developed

Architecture

Two-Part Approach

The Social component operates through two distinct methods:

1. AI-Generated Web Search

Process:

  1. Query Generation: Gemini AI generates search queries about the organization
  2. Web Search Execution: Queries executed via search APIs
  3. Link Extraction: Search results parsed for unique URLs
  4. Duplicate Detection: Already-seen links filtered out
  5. Web Extraction: New links scraped via Browser Rendering
  6. Queue Dispatch: Extracted content sent to Interaction Queue
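The six steps above can be sketched as one function with injected dependencies. This is a minimal TypeScript sketch, not the social-worker's actual code. The `SearchDeps` interface and every name in it are hypothetical stand-ins for the Gemini, search API, D1, Browser Rendering, and Interaction Queue integrations.

```typescript
// Hypothetical dependency interface. The real worker would wire these to
// Gemini (queries), a search API such as Tavily, D1 link tracking,
// Browser Rendering (extraction), and the Interaction Queue.
interface SearchDeps {
  generateQueries(orgName: string): Promise<string[]>;
  search(query: string): Promise<{ url: string }[]>;
  isSeen(url: string): Promise<boolean>;
  markSeen(url: string): Promise<void>;
  extract(url: string): Promise<string>;
  dispatch(item: { url: string; content: string }): Promise<void>;
}

// Runs steps 1-6 for one organization; returns how many new links were dispatched.
async function runAiSearch(orgName: string, deps: SearchDeps): Promise<number> {
  const queries = await deps.generateQueries(orgName);                   // 1. query generation
  const results = await Promise.all(queries.map((q) => deps.search(q))); // 2. search execution
  const unique = new Set(results.flat().map((r) => r.url));              // 3. unique URLs only
  let dispatched = 0;
  for (const url of unique) {
    if (await deps.isSeen(url)) continue;                                // 4. skip already-seen links
    const content = await deps.extract(url);                             // 5. web extraction
    await deps.dispatch({ url, content });                               // 6. queue dispatch
    await deps.markSeen(url); // mark only after a successful dispatch
    dispatched++;
  }
  return dispatched;
}
```

Because a link is marked as seen only after dispatch succeeds, a failed extraction is retried on the next cron run instead of being skipped.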

Query Examples:

  • "[Company Name] customer reviews"
  • "[Product Name] social media mentions"
  • "[Brand] latest news"
  • "[Company] twitter discussions"

Technology:

  • Gemini AI via Vercel AI SDK for query generation
  • Tavily for AI-optimized web search and content extraction
  • Cloudflare Browser Rendering for extraction

Multi-Agent Search System

The AI-generated web search uses a multi-agent orchestration system:

Agent | Role
Search Planner | Generates and refines search queries for the brand and its products
Search Executor | Runs queries via the web search provider and collects candidate links
Content Extractor | Retrieves and parses discovered content
Standardizer | Maps fields into the required schema and validates attributes
Deduplication | Removes duplicate content across sources
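To illustrate the Standardizer role, the sketch below maps a loosely structured extraction result onto a fixed schema. It rejects records that are missing required attributes. The `StandardizedPost` fields and the raw field names (`text`, `caption`, `timestamp`, and so on) are assumptions chosen for the example, not CROW's actual schema.

```typescript
// Hypothetical target schema based on the post data points this component extracts.
interface StandardizedPost {
  url: string;
  text: string;
  author: string;
  postedAt: string; // ISO 8601
  likes: number;
  shares: number;
  comments: number;
}

type Raw = Record<string, unknown>;

// Returns a standardized post, or null when required attributes are missing or invalid.
function standardize(raw: Raw): StandardizedPost | null {
  const str = (k: string) => (typeof raw[k] === "string" ? (raw[k] as string) : undefined);
  // Counts may arrive as strings; anything non-numeric or negative becomes 0
  const num = (k: string) => {
    const v = Number(raw[k] ?? 0);
    return Number.isFinite(v) && v >= 0 ? v : 0;
  };
  const url = str("url");
  const text = str("text") ?? str("caption"); // platforms name the body field differently
  const posted = str("timestamp");
  if (!url || !text || !posted) return null;
  const date = new Date(posted);
  if (Number.isNaN(date.getTime())) return null; // reject unparseable timestamps
  return {
    url,
    text,
    author: str("author") ?? "unknown",
    postedAt: date.toISOString(),
    likes: num("likes"),
    shares: num("shares"),
    comments: num("comments"),
  };
}
```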

The Search Planner applies advanced search techniques including Boolean operators, site-specific targeting, and time-based filtering. Using multiple search engines improves coverage and reduces the risk of missing relevant content.
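Here is a minimal sketch of how a planner might combine these techniques. It assumes Google-style operators (`OR`, `site:`, `after:`). Operator support varies by search provider, and some APIs expose time filtering as a request parameter rather than query syntax. `PlanInput` and `buildQueries` are hypothetical names.

```typescript
// Hypothetical planner input; field names are illustrative.
interface PlanInput {
  brand: string;
  products: string[];
  sites: string[];    // e.g. ["reddit.com", "x.com"]
  sinceDate?: string; // YYYY-MM-DD
}

// Wraps a term in quotes for exact-phrase matching, dropping embedded quotes
const quote = (s: string) => `"${s.replace(/"/g, "")}"`;

function buildQueries(p: PlanInput): string[] {
  // Boolean OR groups the brand with its products
  const base = `(${[p.brand, ...p.products].map(quote).join(" OR ")})`;
  // Time-based filtering via the after: operator (provider-dependent)
  const time = p.sinceDate ? ` after:${p.sinceDate}` : "";
  // One site-restricted query per target platform
  const siteQueries = p.sites.map((s) => `${base} site:${s}${time}`);
  // Plus one open query aimed at reviews and complaints
  return [`${base} (review OR complaint)${time}`, ...siteQueries];
}
```

For `{ brand: "Acme", products: ["Rocket"], sites: ["reddit.com"] }`, this yields one open query and one query restricted to reddit.com.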

2. Direct Social Media Scraping

Process:

  1. Configuration: User provides social media account links during setup
  2. Direct Scraping: Cron job triggers scraping of provided accounts
  3. Content Extraction: Posts, comments, engagement data extracted
  4. Platform APIs: Use official APIs where available
  5. Web Extraction: Fallback to Browser Rendering when needed
  6. Queue Dispatch: Content sent to Interaction Queue
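Steps 4 and 5 describe an API-first order with a browser fallback, which could look like this. The `Fetcher` type, the per-platform client map, and `renderFallback` are hypothetical. A real implementation would wrap each platform's official API client and a Browser Rendering call.

```typescript
type Platform = "twitter" | "linkedin" | "facebook" | "instagram" | "reddit" | "tiktok" | "youtube";

interface RawPost { id: string; text: string; }

// Hypothetical fetcher signature shared by API clients and the browser fallback
type Fetcher = (accountUrl: string) => Promise<RawPost[]>;

async function fetchAccount(
  platform: Platform,
  accountUrl: string,
  apiClients: Partial<Record<Platform, Fetcher>>, // only platforms with a configured official API
  renderFallback: Fetcher,                        // Browser Rendering scraper
): Promise<{ posts: RawPost[]; via: "api" | "browser" }> {
  const api = apiClients[platform];
  if (api) {
    try {
      return { posts: await api(accountUrl), via: "api" };
    } catch {
      // API failure (quota, auth, outage): fall through to browser extraction
    }
  }
  return { posts: await renderFallback(accountUrl), via: "browser" };
}
```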

Supported Platforms:

  • Twitter/X
  • LinkedIn
  • Facebook
  • Instagram
  • Reddit
  • TikTok
  • YouTube

Session Management

Session Definition

Social Media Session: One cron job trigger execution = one session.

Each scheduled cron run processes and analyzes content as a complete session, then sends it to the Interaction Service.

Cron Schedule

Configurable scheduling options:

  • Daily: Once per day (default)
  • Hourly: Every hour (high-volume brands)
  • Custom: Specific times or intervals
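Cloudflare Cron Triggers take five-field cron expressions and evaluate them in UTC, so each scheduling option reduces to a single expression. The `Schedule` type below is a hypothetical representation of the user's choice. The check on custom expressions is a basic field-count test, not a full cron parser.

```typescript
// Hypothetical schedule shape mirroring the three options above.
type Schedule =
  | { kind: "daily"; hourUtc?: number }
  | { kind: "hourly" }
  | { kind: "custom"; cron: string };

// Converts a schedule into a five-field cron expression (minute hour day month weekday), in UTC.
function toCron(s: Schedule): string {
  switch (s.kind) {
    case "daily": {
      const h = s.hourUtc ?? 0; // default: midnight UTC
      if (!Number.isInteger(h) || h < 0 || h > 23) throw new Error(`invalid hour: ${h}`);
      return `0 ${h} * * *`;
    }
    case "hourly":
      return "0 * * * *";
    case "custom": {
      const fields = s.cron.trim().split(/\s+/);
      if (fields.length !== 5) throw new Error(`expected 5 cron fields, got ${fields.length}`);
      return fields.join(" ");
    }
    default:
      throw new Error("unknown schedule kind");
  }
}
```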

Session Processing

  1. Cron Triggers: Scheduled execution starts
  2. Both Paths Execute: AI search AND direct scraping run
  3. Content Aggregation: All extracted content combined
  4. Session Creation: Results compiled into session
  5. Queue Dispatch: Complete session sent to Interaction Queue
  6. Interaction Processing: Standard pipeline processes session
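Here is a sketch of steps 2 to 4, assuming each path is exposed as an async function. `Promise.allSettled` runs both paths concurrently, so a failure in one path still yields a session with the other path's results. `ContentItem`, `SocialSession`, and `buildSession` are hypothetical names. The real Interaction Queue message format is not specified here.

```typescript
interface ContentItem { url: string; text: string; source: "search" | "direct"; }

interface SocialSession {
  orgId: string;
  startedAt: string;
  items: ContentItem[];
  partial: boolean; // true when at least one path failed
}

async function buildSession(
  orgId: string,
  aiSearch: () => Promise<ContentItem[]>,
  directScrape: () => Promise<ContentItem[]>,
  now: Date = new Date(),
): Promise<SocialSession> {
  // Step 2: both paths run concurrently; one failure does not discard the other's results
  const results = await Promise.allSettled([aiSearch(), directScrape()]);
  // Step 3: aggregate whatever succeeded
  const items: ContentItem[] = [];
  let partial = false;
  for (const r of results) {
    if (r.status === "fulfilled") items.push(...r.value);
    else partial = true;
  }
  // Step 4: compile the session
  return { orgId, startedAt: now.toISOString(), items, partial };
}
```

If the Interaction Queue is a Cloudflare Queues producer binding, step 5 would be a single `send(session)` call on that binding. The binding name is not given in this document.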

Social Worker

Service Details

Service Name: social-worker

Technology: TypeScript on Cloudflare Workers

Deployment: Edge-deployed, cron-triggered
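Cloudflare invokes a cron-triggered Worker through the `scheduled(controller, env, ctx)` handler on its default export. To stay self-contained, the sketch declares minimal local stand-ins for the runtime types. `makeWorker`, `listOrgs`, and `runOrg` are hypothetical helpers, not part of the Workers API.

```typescript
// Minimal stand-ins for the Workers runtime types (normally from @cloudflare/workers-types).
interface ScheduledEventLike { cron: string; scheduledTime: number; }
interface ExecCtxLike { waitUntil(p: Promise<unknown>): void; }

// Hypothetical: processes one organization's session for this cron tick.
type RunOrg = (orgId: string, scheduledTime: number) => Promise<void>;

function makeWorker(listOrgs: () => Promise<string[]>, runOrg: RunOrg) {
  return {
    async scheduled(event: ScheduledEventLike, _env: unknown, ctx: ExecCtxLike): Promise<void> {
      const orgs = await listOrgs();
      // Organizations run in parallel. waitUntil keeps the invocation alive until all
      // sessions settle, and allSettled stops one org's failure from affecting the others.
      ctx.waitUntil(Promise.allSettled(orgs.map((id) => runOrg(id, event.scheduledTime))));
    },
  };
}
```

In a deployed Worker, the object returned by `makeWorker` would be the module's `export default`.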

Responsibilities

  • Cron-based Execution: Runs on schedule
  • AI Query Generation: Creates search queries using Gemini
  • Web Search: Executes queries and processes results
  • Link Management: Tracks seen/unseen links in D1
  • Direct Scraping: Fetches from provided social media links
  • Web Extraction: Uses Browser Rendering for content
  • Duplicate Detection: Prevents reprocessing same content
  • Queue Dispatch: Sends sessions to processing

Technology Stack

  • TypeScript on Cloudflare Workers
  • Gemini AI via Vercel AI SDK
  • Cloudflare Browser Rendering
  • Cloudflare AI Gateway for LLM routing
  • D1 for link tracking
  • Cron Triggers for scheduling

Configuration Setup

During signup, users configuring the Social component provide the following.

1. Social Media Links

Users provide URLs to their social media accounts:

Twitter/X: https://twitter.com/company
LinkedIn: https://linkedin.com/company/company-name
Facebook: https://facebook.com/company-page
Instagram: https://instagram.com/company
Reddit: https://reddit.com/r/company
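Submitted account URLs could be validated by mapping each hostname to a supported platform. This sketch makes several assumptions: input must be HTTPS, `x.com` counts as Twitter/X, `www.` and `m.` prefixes are stripped, and a bare homepage is rejected. `detectPlatform` is a hypothetical helper.

```typescript
type Platform = "twitter" | "linkedin" | "facebook" | "instagram" | "reddit" | "tiktok" | "youtube";

// Hostname → platform lookup for the supported platforms
const HOSTS = new Map<string, Platform>([
  ["twitter.com", "twitter"], ["x.com", "twitter"],
  ["linkedin.com", "linkedin"], ["facebook.com", "facebook"],
  ["instagram.com", "instagram"], ["reddit.com", "reddit"],
  ["tiktok.com", "tiktok"], ["youtube.com", "youtube"],
]);

// Returns the platform for a submitted account URL, or null if it is not accepted
function detectPlatform(input: string): Platform | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // not a parseable URL
  }
  if (url.protocol !== "https:") return null;
  // Require a path so a bare homepage is not accepted as an account
  if (url.pathname === "/" || url.pathname === "") return null;
  const host = url.hostname.toLowerCase().replace(/^(www\.|m\.)/, "");
  return HOSTS.get(host) ?? null;
}
```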

2. API Key Generation

A unique API key is generated specifically for social data:

  • Scoped to organization
  • Used for authenticated requests
  • Managed from dashboard
  • Revocable at any time
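One way to meet these properties assumes keys are stored hashed rather than in plain text. The key is shown to the user once, and only its SHA-256 digest is persisted alongside the organization ID and a revocation flag. The `crow_social_` prefix, `StoredKey`, and both function names are hypothetical. The code uses the Web Crypto API available in Workers and recent Node.js versions.

```typescript
// Hypothetical persisted record: never the plaintext key
interface StoredKey { orgId: string; hash: string; revoked: boolean; }

const toHex = (bytes: Uint8Array) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

async function sha256Hex(s: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(s));
  return toHex(new Uint8Array(digest));
}

// Issues a new org-scoped key; the caller shows `key` once and stores only `record`
async function issueKey(orgId: string): Promise<{ key: string; record: StoredKey }> {
  const bytes = crypto.getRandomValues(new Uint8Array(32)); // 256 random bits
  const key = "crow_social_" + toHex(bytes);
  return { key, record: { orgId, hash: await sha256Hex(key), revoked: false } };
}

// Valid only for the key's own organization and only while not revoked
async function verifyKey(key: string, orgId: string, store: StoredKey[]): Promise<boolean> {
  const hash = await sha256Hex(key);
  return store.some((r) => r.hash === hash && r.orgId === orgId && !r.revoked);
}
```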

3. Schedule Configuration

Users can configure:

  • Scraping frequency (daily, hourly, custom)
  • Specific time windows
  • Platform priorities
  • Content type preferences

Data Extraction

Extracted Data Points

For Each Social Post:

  • Post text/caption
  • Author information
  • Post timestamp
  • Engagement metrics (likes, shares, comments)
  • Media URLs (images, videos)
  • Hashtags and mentions
  • Post URL/ID

For Comments:

  • Comment text
  • Author information
  • Comment timestamp
  • Engagement metrics
  • Reply relationships
  • Sentiment indicators

For Engagement:

  • Total reach/impressions
  • Engagement rate
  • Audience demographics
  • Trending topics
  • Sentiment distribution
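The data points above could be typed as follows. The field names are illustrative assumptions. `engagementRate` uses one common definition: interactions divided by impressions, as a percentage. Platforms and analytics tools define engagement rate in different ways.

```typescript
interface Engagement {
  likes: number;
  shares: number;
  comments: number;
  impressions?: number; // not every platform exposes reach/impressions
}

interface SocialPost {
  id: string;
  url: string;
  text: string;
  author: string;
  postedAt: string; // ISO 8601
  engagement: Engagement;
  mediaUrls: string[];
  hashtags: string[];
  mentions: string[];
}

interface SocialComment {
  id: string;
  postId: string;
  parentId?: string; // set when the comment is a reply to another comment
  text: string;
  author: string;
  postedAt: string;
  engagement: Engagement;
}

// Interactions per 100 impressions; null when impressions are unknown or zero
function engagementRate(e: Engagement): number | null {
  if (!e.impressions) return null;
  return ((e.likes + e.shares + e.comments) * 100) / e.impressions;
}
```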

Content Types

  • Text posts and updates
  • Images and videos
  • Stories and ephemeral content (where accessible)
  • Comments and replies
  • Shares and retweets
  • Reactions and likes
  • Reviews and ratings

Duplicate Detection

D1 Database Schema:

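-- Tracks every link the social-worker has processed, per organization
CREATE TABLE seen_links (
  url TEXT NOT NULL,                              -- normalized URL
  org_id TEXT NOT NULL,                           -- owning organization
  first_seen DATETIME DEFAULT CURRENT_TIMESTAMP,  -- when the link was first seen
  last_checked DATETIME,                          -- last time the link was re-checked
  content_hash TEXT,                              -- hash of the extracted content
  source TEXT,                                    -- origin: 'search' or 'direct'
  PRIMARY KEY (org_id, url)                       -- the same URL can be tracked by several orgs
);

-- Supports content-hash duplicate checks within an organization
CREATE INDEX idx_seen_links_org_hash ON seen_links (org_id, content_hash);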
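From a Worker, lookups against this table go through the D1 binding's `prepare(...).bind(...)` API. To stay self-contained, the sketch declares a minimal structural type covering only `first()` and `run()`. `hasSeen` and `recordLink` are hypothetical helpers. Both filter on `org_id` and `url` together, and `INSERT OR IGNORE` leaves an existing row untouched.

```typescript
// Minimal structural stand-in for the D1 binding (D1Database in @cloudflare/workers-types)
interface D1Like {
  prepare(sql: string): {
    bind(...values: unknown[]): {
      first<T = Record<string, unknown>>(): Promise<T | null>;
      run(): Promise<unknown>;
    };
  };
}

// True if this organization has already recorded the (normalized) URL
async function hasSeen(db: D1Like, orgId: string, url: string): Promise<boolean> {
  const row = await db
    .prepare("SELECT 1 AS hit FROM seen_links WHERE org_id = ?1 AND url = ?2")
    .bind(orgId, url)
    .first();
  return row !== null;
}

// Records a link on first sight; first_seen comes from the column default
async function recordLink(
  db: D1Like, orgId: string, url: string, hash: string, source: "search" | "direct",
): Promise<void> {
  await db
    .prepare("INSERT OR IGNORE INTO seen_links (url, org_id, content_hash, source) VALUES (?1, ?2, ?3, ?4)")
    .bind(url, orgId, hash, source)
    .run();
}
```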

Detection Strategy

  1. URL Normalization: Standardize URLs before checking
  2. Hash Comparison: Content hash for duplicate content detection
  3. Timestamp Tracking: When link was first seen
  4. Source Tracking: Origin of the link (search vs direct)
  5. Efficient Queries: Indexed lookups for fast checking
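Here is a sketch of steps 1 and 2. The document specifies neither the normalization rules nor the hashing scheme, so both are assumptions. The sketch drops fragments, a `www.` prefix, common tracking parameters, and trailing slashes, and it sorts query parameters. It hashes whitespace-collapsed text with SHA-256 via Web Crypto.

```typescript
// Query parameters treated as tracking noise (assumed list)
const TRACKING_PARAMS = /^(utm_|fbclid$|gclid$|igshid$)/;

function normalizeUrl(input: string): string {
  const url = new URL(input);
  url.hash = ""; // fragments do not identify a different page
  url.hostname = url.hostname.toLowerCase().replace(/^www\./, "");
  for (const key of Array.from(url.searchParams.keys())) {
    if (TRACKING_PARAMS.test(key)) url.searchParams.delete(key);
  }
  url.searchParams.sort(); // stable parameter order
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1); // drop a trailing slash
  }
  return url.toString();
}

// SHA-256 over whitespace-collapsed text, so formatting-only differences hash equally
async function contentHash(text: string): Promise<string> {
  const normalized = text.replace(/\s+/g, " ").trim();
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(normalized));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}
```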

Update Detection

For direct social media scraping:

  • New posts always processed
  • Updated posts (edits) re-processed
  • Deleted posts marked accordingly
  • Engagement updates tracked
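The four cases can be derived by comparing the stored snapshot of an account's posts with the current scrape. The `PostSnapshot` shape and `diffPosts` are hypothetical. When a post was both edited and gained engagement, the edit takes precedence.

```typescript
// Hypothetical stored/current view of a post
interface PostSnapshot { id: string; contentHash: string; likes: number; shares: number; comments: number; }

type Change =
  | { kind: "new"; id: string }
  | { kind: "edited"; id: string }
  | { kind: "engagement"; id: string }
  | { kind: "deleted"; id: string };

function diffPosts(previous: PostSnapshot[], current: PostSnapshot[]): Change[] {
  const prev = new Map<string, PostSnapshot>();
  for (const p of previous) prev.set(p.id, p);
  const changes: Change[] = [];
  for (const post of current) {
    const old = prev.get(post.id);
    if (!old) {
      changes.push({ kind: "new", id: post.id });
    } else if (old.contentHash !== post.contentHash) {
      changes.push({ kind: "edited", id: post.id }); // edit wins over engagement change
    } else if (old.likes !== post.likes || old.shares !== post.shares || old.comments !== post.comments) {
      changes.push({ kind: "engagement", id: post.id });
    }
    prev.delete(post.id);
  }
  // Posts present before but absent from this scrape
  for (const id of prev.keys()) changes.push({ kind: "deleted", id });
  return changes;
}
```

A scrape usually covers only recent posts. A post missing from the current results should therefore be marked deleted only if it falls inside the scraped window, and the sketch does not make that check.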

Processing Flow

[Figure: Complete Flow Diagram]

Analytics and Insights

Generated Insights

The Social component generates insights about:

Brand Sentiment:

  • Overall sentiment trends
  • Platform-specific sentiment
  • Sentiment by product/topic
  • Sentiment change over time

Engagement Patterns:

  • Best performing content types
  • Optimal posting times
  • Audience engagement levels
  • Viral content identification

Topic Analysis:

  • Trending topics related to brand
  • Customer pain points
  • Feature requests
  • Competitive mentions

Audience Insights:

  • Demographics and interests
  • Geographic distribution
  • Influence and reach
  • Community growth

Pattern Recognition

Pattern Service identifies long-term trends:

  • Seasonal sentiment changes
  • Campaign effectiveness
  • Crisis detection and response
  • Brand perception evolution
  • Competitive positioning shifts

Integration with Other Components

Social insights combine with Web and CCTV data to provide comprehensive customer understanding:

Web + Social (Current & Planned):

  • Correlate social mentions with website traffic
  • Identify social-driven conversions
  • Track referral effectiveness
  • Measure social ROI
  • Pattern Service identifies cross-component patterns automatically

Social + CCTV (Future Capability):

  • Connect online buzz to in-store traffic
  • Measure campaign impact on foot traffic
  • Identify social-to-physical customer journeys
  • Requires CCTV Phase 3 deployment

Complete View (Vision):

  • Omnichannel customer understanding across all touchpoints
  • Full funnel attribution with multi-touch analytics
  • Cross-channel pattern recognition via Pattern Service
  • Holistic brand health monitoring with unified insights

The Pattern Service automatically discovers correlations between components as data accumulates. As more components are deployed, cross-component insights become richer and more actionable.

Rate Limiting and Compliance

Responsible Scraping

  • Robots.txt Compliance: Respects robots.txt directives
  • Rate Limiting: Configurable delays between requests
  • Domain Limits: Per-domain request limits
  • Backoff Strategy: Automatic backoff on errors
  • User Agent: Identifies requests as coming from the CROW scraper
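Backoff and per-domain limits could be combined as below. Retries use exponential backoff with "full jitter", and a throttle spaces out requests to the same hostname. The document only says these values are configurable, so all delay values and names are placeholder assumptions.

```typescript
// Full-jitter backoff: random delay in [0, min(cap, base * 2^attempt))
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000, rand: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

// Enforces a minimum gap between requests to the same hostname
class DomainThrottle {
  private readonly nextAllowed = new Map<string, number>();
  private readonly minGapMs: number;
  constructor(minGapMs: number) {
    this.minGapMs = minGapMs;
  }

  // Returns how long the caller must wait before requesting `url`, and reserves that slot
  reserve(url: string, now: number): number {
    const host = new URL(url).hostname;
    const at = Math.max(now, this.nextAllowed.get(host) ?? 0);
    this.nextAllowed.set(host, at + this.minGapMs);
    return at - now;
  }
}

async function fetchWithRetry(
  doFetch: () => Promise<{ status: number }>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<{ status: number }> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    // Retry only on rate limiting (429) and server errors (5xx)
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt + 1 >= maxAttempts) return res;
    await sleep(backoffDelayMs(attempt));
  }
}
```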

API Usage

For platforms with APIs:

  • Official APIs Preferred: Use APIs when available
  • Rate Limits: Respect platform rate limits
  • Token Management: Secure token storage
  • Quota Monitoring: Track API usage

Terms of Service

  • Compliance with platform ToS
  • Ethical scraping practices
  • Public data only
  • No personal data collection
  • Transparent usage

Performance Characteristics

Execution Time

  • Single Cron Run: 5-30 minutes depending on volume
  • Query Generation: < 10 seconds
  • Per Link Scraping: 2-5 seconds
  • Session Processing: < 10 minutes

Scalability

  • Links per Session: Up to 1000 links
  • Platforms: Unlimited platforms
  • Organizations: Scales with Cloudflare
  • Concurrent Jobs: Multiple orgs processed in parallel

Reliability

  • Retry Logic: Automatic retries on failure
  • Error Handling: Graceful degradation
  • Partial Results: Partial sessions saved
  • Monitoring: Execution tracking and alerts

Troubleshooting

Common Issues

No New Content:

  • Verify social media links are correct
  • Check if accounts are public
  • Review cron schedule configuration
  • Confirm API keys are valid

Duplicate Content:

  • Check that duplicate detection is enabled
  • Review link normalization logic
  • Verify content hashing

Rate Limiting:

  • Adjust scraping frequency
  • Increase delays between requests
  • Check platform-specific limits
  • Consider API upgrades

Extraction Failures:

  • Verify Browser Rendering quota
  • Check website structure changes
  • Review extraction selectors
  • Test individual URLs

Future Enhancements

Planned Features

  • Real-time Social Listening: WebSocket-based live monitoring
  • Advanced Sentiment Analysis: Emotion detection, sarcasm recognition
  • Influencer Identification: Key influencers and brand advocates
  • Competitor Analysis: Automated competitive intelligence
  • Crisis Detection: Early warning system for PR issues
  • Response Suggestions: AI-generated response recommendations
  • Automated Reporting: Scheduled social media reports

Under Consideration

  • Video Content Analysis: Analyze video posts and stories
  • Image Recognition: Visual brand mentions
  • Hashtag Campaigns: Campaign tracking and analytics
  • Social Commerce: Purchase intent and conversion tracking