Social Media Component
Overview
The Social Media component is CROW's Phase 2 offering, deployed after the Web component and before CCTV. It combines AI-driven web search with direct scraping of social media accounts to collect a company's posts, comments, and engagement data, which are analyzed to track public sentiment and brand perception.
Why Social is Phase 2
The Social component leverages the core infrastructure built for the Web component:
- Uses the same Interaction and Pattern services
- Leverages the established queue architecture
- Builds on the existing Organization and Product context
- Requires less infrastructure than CCTV
- Provides valuable insights while CCTV is being developed
Architecture
Two-Part Approach
The Social component operates through two distinct methods:
1. AI-Generated Web Search
Process:
- Query Generation: Gemini AI generates search queries about the organization
- Web Search Execution: Queries executed via search APIs
- Link Extraction: Search results parsed for unique URLs
- Duplicate Detection: Already-seen links filtered out
- Web Extraction: New links scraped via Browser Rendering
- Queue Dispatch: Extracted content sent to Interaction Queue
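The Link Extraction and Duplicate Detection steps amount to a set difference: collect every URL across all search results, drop repeats, and drop anything already recorded for the organization. A minimal sketch, assuming a hypothetical `SearchResult` shape and an in-memory set of already-seen URLs (in practice these would come from the D1 link-tracking table, after URL normalization):

```typescript
// Hypothetical shape of one executed query and the URLs it returned.
interface SearchResult {
  query: string;
  urls: string[];
}

// Returns URLs that appear in any result set, de-duplicated across queries
// and filtered against links already recorded for this organization.
function collectNewLinks(results: SearchResult[], seen: ReadonlySet<string>): string[] {
  const fresh = new Set<string>(); // a Set keeps first-insertion order and drops repeats
  for (const result of results) {
    for (const url of result.urls) {
      if (!seen.has(url)) fresh.add(url);
    }
  }
  return Array.from(fresh);
}
```

For example, with result sets `["u1", "u2"]` and `["u2", "u3"]` and `u1` already seen, `collectNewLinks` returns `["u2", "u3"]`; only those links would go on to Browser Rendering.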
Query Examples:
- "[Company Name] customer reviews"
- "[Product Name] social media mentions"
- "[Brand] latest news"
- "[Company] twitter discussions"
Technology:
- Gemini AI via Vercel AI SDK for query generation
- Tavily for AI-optimized web search and content extraction
- Cloudflare Browser Rendering for extraction
Multi-Agent Search System
The AI-generated web search uses a multi-agent orchestration system:
| Agent | Role |
|---|---|
| Search Planner | Generates and refines search queries for brand/products |
| Search Executor | Runs queries via web search provider, collects candidate links |
| Content Extractor | Retrieves and parses discovered content |
| Standardizer | Maps fields into required schema, validates attributes |
| Deduplication | Removes duplicate content across sources |
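Read top to bottom, the agent table describes a pipeline in which each agent consumes the previous agent's output. The sketch below assumes the agents run in the order listed and uses hypothetical, synchronous stage signatures for brevity; real agents would be asynchronous and call Gemini, the search provider, and Browser Rendering.

```typescript
// Hypothetical record types passed between agents.
interface ExtractedItem { url: string; text: string; }
interface StandardItem { url: string; text: string; source: "search"; }

// One method per agent in the table (names are illustrative).
interface Agents {
  plan(brand: string): string[];                       // Search Planner: queries
  execute(queries: string[]): string[];                // Search Executor: candidate links
  extract(urls: string[]): ExtractedItem[];            // Content Extractor: parsed content
  standardize(items: ExtractedItem[]): StandardItem[]; // Standardizer: schema mapping
  dedupe(items: StandardItem[]): StandardItem[];       // Deduplication: cross-source duplicates
}

// Chains the agents so each stage receives the previous stage's output.
function runSearchAgents(brand: string, agents: Agents): StandardItem[] {
  const queries = agents.plan(brand);
  const links = agents.execute(queries);
  const raw = agents.extract(links);
  return agents.dedupe(agents.standardize(raw));
}
```

Keeping each agent behind an interface like this lets individual agents be swapped (for example, a different search provider in the Executor) without touching the rest of the chain.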
The Search Planner applies advanced search techniques including Boolean operators, site-specific targeting, and time-based filtering. Using multiple search engines improves coverage and reduces the risk of missing relevant content.
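As an illustration of these techniques, the helper below builds queries with an exact-phrase brand match, OR-ed topic terms, and optional `site:` targeting. The function name and parameters are hypothetical. The quote, `OR`, and `site:` syntax follows widely used search conventions, but operator support varies by provider, and recency is modeled as a separate field because search APIs commonly expose time filtering as a request parameter rather than query syntax.

```typescript
// A planned query plus its time filter (passed to the provider separately; assumed).
interface PlannedQuery {
  query: string;
  maxAgeDays: number;
}

// Expands a brand into one general query plus one site-targeted query per site.
function planQueries(brand: string, sites: string[], topics: string[], maxAgeDays: number): PlannedQuery[] {
  // Boolean OR across topic terms, e.g. ` (review OR complaint)`.
  const topicClause = topics.length > 0 ? ` (${topics.join(" OR ")})` : "";
  // Quoted brand name for an exact-phrase match.
  const base = `"${brand}"${topicClause}`;
  const queries = [base, ...sites.map((site) => `${base} site:${site}`)];
  return queries.map((query) => ({ query, maxAgeDays }));
}
```

For example, `planQueries("Acme", ["reddit.com"], ["review", "complaint"], 7)` yields `"Acme" (review OR complaint)` and `"Acme" (review OR complaint) site:reddit.com`, each with `maxAgeDays: 7`.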
2. Social Media Link Scraping
Process:
- Configuration: User provides social media account links during setup
- Direct Scraping: Cron job triggers scraping of provided accounts
- Content Extraction: Posts, comments, engagement data extracted
- Platform APIs: Official APIs are used where available
- Web Extraction: Falls back to Browser Rendering when needed
- Queue Dispatch: Content sent to Interaction Queue
Supported Platforms:
- Twitter/X
- TikTok
- YouTube
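Direct scraping first needs to know which platform a configured link belongs to and whether an official API client is available for it. A minimal sketch, assuming the hostnames below identify the three supported platforms and that API availability is known as a set of platforms with configured credentials (both assumptions); `detectPlatform` and `chooseStrategy` are hypothetical helpers.

```typescript
type Platform = "twitter" | "tiktok" | "youtube";

// Assumed hostname-to-platform mapping for the supported platforms.
const HOSTS: Record<string, Platform> = {
  "twitter.com": "twitter",
  "x.com": "twitter",
  "tiktok.com": "tiktok",
  "youtube.com": "youtube",
  "youtu.be": "youtube",
};

// Maps an account URL to a platform, ignoring "www." and "m." prefixes.
function detectPlatform(accountUrl: string): Platform | null {
  const host = new URL(accountUrl).hostname.toLowerCase().replace(/^(www\.|m\.)/, "");
  return HOSTS[host] ?? null;
}

type Strategy = { kind: "api"; platform: Platform } | { kind: "browser" };

// Prefers an official API client when one is configured for the platform,
// otherwise falls back to Browser Rendering.
function chooseStrategy(accountUrl: string, apiClients: ReadonlySet<Platform>): Strategy {
  const platform = detectPlatform(accountUrl);
  if (platform !== null && apiClients.has(platform)) return { kind: "api", platform };
  return { kind: "browser" };
}
```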
Session Management
Session Definition
Social Media Session: One cron trigger execution equals one session.
Each scheduled cron run collects and analyzes its content as one complete session, then sends it through the Interaction Queue to the Interaction Service.
Cron Schedule
Configurable scheduling options:
- Daily: Once per day (default)
- Hourly: Every hour (high-volume brands)
- Custom: Specific times or intervals
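Because schedules are configured per organization while a Worker's cron triggers are fixed in its deployment configuration, one plausible design (an assumption, not confirmed here) is a frequent cron trigger that checks which organizations are due on each run. The sketch models daily, hourly, and interval-based custom schedules; specific clock times are left out for brevity, and `isDue` is a hypothetical helper.

```typescript
type Schedule =
  | { kind: "daily" }
  | { kind: "hourly" }
  | { kind: "custom"; intervalMinutes: number };

const MINUTE_MS = 60 * 1000;

// Length of one scheduling interval in milliseconds.
function intervalMs(schedule: Schedule): number {
  switch (schedule.kind) {
    case "daily":
      return 24 * 60 * MINUTE_MS;
    case "hourly":
      return 60 * MINUTE_MS;
    case "custom":
      return schedule.intervalMinutes * MINUTE_MS;
  }
}

// An organization is due if it has never run or its interval has elapsed.
function isDue(schedule: Schedule, lastRunMs: number | null, nowMs: number): boolean {
  return lastRunMs === null || nowMs - lastRunMs >= intervalMs(schedule);
}
```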
Session Processing
- Cron Triggers: Scheduled execution starts
- Both Paths Execute: AI web search and direct scraping both run
- Content Aggregation: All extracted content combined
- Session Creation: Results compiled into session
- Queue Dispatch: Complete session sent to Interaction Queue
- Interaction Processing: Standard pipeline processes session
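Content aggregation and session creation can be sketched as a merge of both paths' output into one session object. The `SocialSession` shape and the rule that direct-scraping items take precedence when both paths return the same URL are assumptions for illustration.

```typescript
// Hypothetical item and session shapes.
interface ContentItem {
  url: string;
  text: string;
  source: "search" | "direct";
}

interface SocialSession {
  orgId: string;
  cron: string;      // cron expression that triggered this run
  startedAt: string; // ISO 8601 timestamp of the trigger
  items: ContentItem[];
}

// Merges both paths into one session, keeping the first item seen per URL.
// Direct items are listed first, so they win over search duplicates.
function buildSession(
  orgId: string,
  cron: string,
  startedAtMs: number,
  direct: ContentItem[],
  search: ContentItem[],
): SocialSession {
  const byUrl = new Map<string, ContentItem>();
  for (const item of [...direct, ...search]) {
    if (!byUrl.has(item.url)) byUrl.set(item.url, item);
  }
  return {
    orgId,
    cron,
    startedAt: new Date(startedAtMs).toISOString(),
    items: Array.from(byUrl.values()),
  };
}
```

In a Worker, the resulting object would then be sent through a Cloudflare Queues producer binding, e.g. `await env.INTERACTION_QUEUE.send(session)`, where `INTERACTION_QUEUE` is a hypothetical binding name. Queues caps the size of a single message (128 KB at the time of writing), so large sessions may need to be split or stored separately and referenced by ID.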
Social Worker
Service Details
Service Name: social-worker
Technology: TypeScript on Cloudflare Workers
Deployment: Edge-deployed, cron-triggered
Responsibilities
- Cron-based Execution: Runs on schedule
- AI Query Generation: Creates search queries using Gemini
- Web Search: Executes queries and processes results
- Link Management: Tracks seen/unseen links in D1
- Direct Scraping: Fetches from provided social media links
- Web Extraction: Uses Browser Rendering for content
- Duplicate Detection: Prevents reprocessing of the same content
- Queue Dispatch: Sends completed sessions to the Interaction Queue
Technology Stack
- TypeScript on Cloudflare Workers
- Gemini AI via Vercel AI SDK
- Cloudflare Browser Rendering
- Cloudflare AI Gateway for LLM routing
- D1 for link tracking
- Cron Triggers for scheduling
Configuration Setup
During signup, users configuring the Social component provide:
1. Social Media Links
Users provide URLs to their social media accounts:
- Twitter/X: https://twitter.com/company
- LinkedIn: https://linkedin.com/company/company-name
- Facebook: https://facebook.com/company-page
- Instagram: https://instagram.com/company
- Reddit: https://reddit.com/r/company
2. API Key Generation
A unique API key is generated specifically for social data:
- Scoped to organization
- Used for authenticated requests
- Managed from the dashboard
- Revocable at any time
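These properties (organization scope, authenticated use, revocation) can be met without storing the plaintext key. A sketch, assuming a hypothetical `crow_social_` prefix, a SHA-256 hash as the stored value, and the Web Crypto API, which is available as a global in Cloudflare Workers and recent Node.js releases; `issueKey` and `authorize` are illustrative names.

```typescript
// What would be persisted per key (hypothetical shape); the plaintext is never stored.
interface StoredKey {
  orgId: string;
  keyHash: string;
  revoked: boolean;
}

function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

async function sha256Hex(value: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(value));
  return toHex(new Uint8Array(digest));
}

// Generates 32 random bytes as the secret; returns the plaintext key
// (shown to the user once) and the record to persist.
async function issueKey(orgId: string): Promise<{ key: string; record: StoredKey }> {
  const key = "crow_social_" + toHex(crypto.getRandomValues(new Uint8Array(32)));
  return { key, record: { orgId, keyHash: await sha256Hex(key), revoked: false } };
}

// Authorizes only the key's own organization, and only while the key is unrevoked.
async function authorize(key: string, orgId: string, record: StoredKey): Promise<boolean> {
  if (record.revoked || record.orgId !== orgId) return false;
  return record.keyHash === (await sha256Hex(key));
}
```

Revoking a key from the dashboard then only needs to flip `revoked` on the stored record; no key material has to be recovered.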
3. Schedule Configuration
Users can configure:
- Scraping frequency (daily, hourly, custom)
- Specific time windows
- Platform priorities
- Content type preferences
Data Extraction
Extracted Data Points
For Each Social Post:
- Post text/caption
- Author information
- Post timestamp
- Engagement metrics (likes, shares, comments)
- Media URLs (images, videos)
- Hashtags and mentions
- Post URL/ID
For Comments:
- Comment text
- Author information
- Comment timestamp
- Engagement metrics
- Reply relationships
- Sentiment indicators
For Engagement:
- Total reach/impressions
- Engagement rate
- Audience demographics
- Trending topics
- Sentiment distribution
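The data points above map naturally onto typed records. The field names below are hypothetical, and the engagement-rate helper uses one common definition (interactions divided by impressions); platforms and analytics tools define the metric differently, for example dividing by follower count instead.

```typescript
// Hypothetical record types for extracted social data.
interface EngagementMetrics {
  likes: number;
  shares: number;
  comments: number;
  impressions?: number; // not every platform exposes reach or impressions
}

interface SocialPost {
  id: string;
  url: string;
  text: string;
  author: string;
  postedAt: string; // ISO 8601
  metrics: EngagementMetrics;
  mediaUrls: string[];
  hashtags: string[];
  mentions: string[];
}

interface SocialComment {
  id: string;
  postId: string;
  parentId: string | null; // reply relationship; null for top-level comments
  text: string;
  author: string;
  postedAt: string;
  likes: number;
}

// Engagement rate as (likes + shares + comments) / impressions (assumed definition).
// Returns null when impressions are missing or zero, since the rate is undefined.
function engagementRate(m: EngagementMetrics): number | null {
  if (!m.impressions) return null;
  return (m.likes + m.shares + m.comments) / m.impressions;
}
```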
Content Types
- Text posts and updates
- Images and videos
- Stories and ephemeral content (where accessible)
- Comments and replies
- Shares and retweets
- Reactions and likes
- Reviews and ratings
Duplicate Detection
Link Tracking
D1 Database Schema:
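```sql
-- One row per (organization, URL): a link already seen by one organization
-- must still count as new for another, so the key includes org_id.
CREATE TABLE seen_links (
  url TEXT NOT NULL,              -- normalized URL
  org_id TEXT NOT NULL,
  first_seen DATETIME DEFAULT CURRENT_TIMESTAMP,
  last_checked DATETIME,
  content_hash TEXT,              -- hash of extracted content, for change detection
  source TEXT,                    -- origin of the link: 'search' or 'direct'
  PRIMARY KEY (org_id, url)       -- also serves as the index for duplicate lookups
);
```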
Detection Strategy
- URL Normalization: Standardize URLs before checking
- Hash Comparison: Content hash for duplicate content detection
- Timestamp Tracking: Records when each link was first seen
- Source Tracking: Origin of the link (search vs direct)
- Efficient Queries: Indexed lookups for fast checking
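URL normalization and content hashing can be sketched as two small pure functions plus a classification step. Which query parameters count as tracking noise (here `utm_*` and `fbclid`), stripping a leading `www.`, and the use of 32-bit FNV-1a as the content hash are all illustrative assumptions; a cryptographic hash such as SHA-256 via `crypto.subtle.digest` would reduce collision risk at the cost of asynchronous code.

```typescript
// Normalizes a URL before the seen_links lookup (rules are assumptions).
function normalizeUrl(raw: string): string {
  const url = new URL(raw); // the parser already lowercases the scheme and host
  url.hash = "";
  url.hostname = url.hostname.replace(/^www\./, "");
  for (const key of Array.from(url.searchParams.keys())) {
    if (key.startsWith("utm_") || key === "fbclid") url.searchParams.delete(key);
  }
  url.searchParams.sort(); // stable parameter order
  if (url.searchParams.toString() === "") url.search = ""; // drop a dangling "?"
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1); // "/post/1/" and "/post/1" compare equal
  }
  return url.toString();
}

// 32-bit FNV-1a over UTF-16 code units of whitespace-collapsed, lowercased text.
function contentHash(text: string): string {
  const normalized = text.trim().replace(/\s+/g, " ").toLowerCase();
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < normalized.length; i++) {
    hash ^= normalized.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32-bit
  }
  return hash.toString(16).padStart(8, "0");
}

type LinkStatus = "new" | "unchanged" | "changed";

// previousHash is the stored content_hash, or null if the URL was never seen.
function classifyLink(previousHash: string | null, text: string): LinkStatus {
  if (previousHash === null) return "new";
  return previousHash === contentHash(text) ? "unchanged" : "changed";
}
```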
Update Detection
For direct social media scraping:
- New posts always processed
- Updated posts (edits) re-processed
- Deleted posts marked accordingly
- Engagement updates tracked
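Update detection amounts to diffing the current scrape of an account against the previous one. A sketch with a hypothetical `PostSnapshot` shape, where an edit is detected by a changed text hash and an engagement update by changed counters:

```typescript
// Hypothetical per-post snapshot stored after each scrape.
interface PostSnapshot {
  id: string;
  textHash: string;
  likes: number;
  shares: number;
  comments: number;
}

interface PostChanges {
  created: string[];
  edited: string[];            // re-processed in full (may also include engagement changes)
  deleted: string[];
  engagementChanged: string[]; // text unchanged, counters changed
}

function diffSnapshots(previous: PostSnapshot[], current: PostSnapshot[]): PostChanges {
  const before = new Map(previous.map((p) => [p.id, p] as const));
  const changes: PostChanges = { created: [], edited: [], deleted: [], engagementChanged: [] };
  for (const post of current) {
    const old = before.get(post.id);
    if (!old) {
      changes.created.push(post.id);
    } else if (old.textHash !== post.textHash) {
      changes.edited.push(post.id);
    } else if (old.likes !== post.likes || old.shares !== post.shares || old.comments !== post.comments) {
      changes.engagementChanged.push(post.id);
    }
    before.delete(post.id);
  }
  // Posts left over were present before but are missing now. A post can also
  // vanish because it fell outside the scraped window, so a real implementation
  // would confirm deletion before marking it.
  changes.deleted = Array.from(before.keys());
  return changes;
}
```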
Processing Flow
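Complete Flow Diagram
Cron Trigger → AI-Generated Web Search + Social Media Link Scraping → Duplicate Detection (D1) → Content Aggregation → Session Creation → Interaction Queue → Interaction Service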
Analytics and Insights
Generated Insights
The Social component generates insights about:
Brand Sentiment:
- Overall sentiment trends
- Platform-specific sentiment
- Sentiment by product/topic
- Sentiment change over time
Engagement Patterns:
- Best performing content types
- Optimal posting times
- Audience engagement levels
- Viral content identification
Topic Analysis:
- Trending topics related to brand
- Customer pain points
- Feature requests
- Competitive mentions
Audience Insights:
- Demographics and interests
- Geographic distribution
- Influence and reach
- Community growth
Pattern Recognition
The Pattern Service identifies long-term trends:
- Seasonal sentiment changes
- Campaign effectiveness
- Crisis detection and response
- Brand perception evolution
- Competitive positioning shifts
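Crisis detection and seasonal changes both reduce to noticing that recent sentiment differs from a baseline. The function below shows one simple way to do that, comparing the mean of the most recent window with the window before it. It is an illustration, not the Pattern Service's actual method, and the score range, window size, and threshold are assumptions.

```typescript
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

interface ShiftResult {
  baseline: number;
  recent: number;
  delta: number;
  flagged: boolean;
}

// dailyScores: one sentiment score per day, oldest first, assumed in [-1, 1].
// Returns null when there is not enough history for two full windows.
function detectSentimentShift(dailyScores: number[], windowDays: number, threshold: number): ShiftResult | null {
  if (windowDays < 1 || dailyScores.length < 2 * windowDays) return null;
  const recentScores = dailyScores.slice(-windowDays);
  const baselineScores = dailyScores.slice(-2 * windowDays, -windowDays);
  const baseline = mean(baselineScores);
  const recent = mean(recentScores);
  const delta = recent - baseline;
  return { baseline, recent, delta, flagged: Math.abs(delta) >= threshold };
}
```

A sharp negative `delta` over a short window is the kind of signal crisis detection would look for, while seasonal analysis would compare much longer windows.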
Integration with Other Components
Social insights combine with Web and CCTV data to provide comprehensive customer understanding:
Web + Social (Current & Planned):
- Correlate social mentions with website traffic
- Identify social-driven conversions
- Track referral effectiveness
- Measure social ROI
- Pattern Service identifies cross-component patterns automatically
Social + CCTV (Future Capability):
- Connect online buzz to in-store traffic
- Measure campaign impact on foot traffic
- Identify social-to-physical customer journeys
- Requires CCTV Phase 3 deployment
Complete View (Vision):
- Omnichannel customer understanding across all touchpoints
- Full funnel attribution with multi-touch analytics
- Cross-channel pattern recognition via Pattern Service
- Holistic brand health monitoring with unified insights
The Pattern Service automatically discovers correlations between components as data accumulates. As more components are deployed, cross-component insights become richer and more actionable.
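Correlating social mentions with website traffic can start with something as simple as a Pearson correlation over aligned daily series. This is a hedged illustration of the idea rather than the Pattern Service's method: a strong correlation does not show that social activity causes traffic, and real analysis would also consider time lags between the two series.

```typescript
// Pearson correlation of two equally long series, e.g. daily social mentions
// and daily website sessions. Returns null when the correlation is undefined.
function pearson(xs: number[], ys: number[]): number | null {
  const n = xs.length;
  if (n !== ys.length || n < 2) return null;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let vx = 0;
  let vy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    cov += dx * dy;
    vx += dx * dx;
    vy += dy * dy;
  }
  if (vx === 0 || vy === 0) return null; // a constant series has no defined correlation
  return cov / Math.sqrt(vx * vy);
}
```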
Rate Limiting and Compliance
Responsible Scraping
- Robots.txt Compliance: Respects robots.txt directives
- Rate Limiting: Configurable delays between requests
- Domain Limits: Per-domain request limits
- Backoff Strategy: Automatic backoff on errors
- User Agent: Requests identify themselves as coming from the CROW scraper
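Backoff and per-domain limits can be sketched as follows. The backoff uses exponential growth with full jitter (a random delay between zero and an exponentially growing ceiling); the base, cap, and per-domain interval values are illustrative, and `DomainLimiter` is a hypothetical helper rather than part of any Cloudflare API.

```typescript
// Full-jitter exponential backoff: random delay in [0, min(cap, base * 2^attempt)).
// `random` is injectable so the behavior can be tested deterministically.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000, random: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Enforces a minimum spacing between requests to the same host.
class DomainLimiter {
  private readonly nextAllowed = new Map<string, number>();
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns how long to wait before requesting this URL, and reserves that slot
  // so concurrent callers for the same host are spaced out.
  reserve(url: string, nowMs: number): number {
    const host = new URL(url).hostname;
    const start = Math.max(nowMs, this.nextAllowed.get(host) ?? 0);
    this.nextAllowed.set(host, start + this.minIntervalMs);
    return start - nowMs;
  }
}
```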
API Usage
For platforms with APIs:
- Official APIs Preferred: Use APIs when available
- Rate Limits: Respect platform rate limits
- Token Management: Secure token storage
- Quota Monitoring: Track API usage
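Quota monitoring can be as simple as counting calls per platform within a fixed time window and refusing calls once the limit is reached. The class and its limits are hypothetical; real platforms publish their own limits and reset rules, which an implementation would need to mirror.

```typescript
interface QuotaWindow {
  start: number;
  used: number;
}

// Fixed-window call counter per platform (illustrative).
class QuotaTracker {
  private readonly windows = new Map<string, QuotaWindow>();
  private readonly limits: Record<string, number>;
  private readonly windowMs: number;

  constructor(limits: Record<string, number>, windowMs: number) {
    this.limits = limits;
    this.windowMs = windowMs;
  }

  // Returns the active window, starting a fresh one if the previous one expired.
  private current(platform: string, nowMs: number): QuotaWindow {
    let w = this.windows.get(platform);
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      w = { start: nowMs, used: 0 };
      this.windows.set(platform, w);
    }
    return w;
  }

  // Records one call if it fits; returns false when the quota is exhausted
  // or the platform has no configured limit.
  tryConsume(platform: string, nowMs: number): boolean {
    const limit = this.limits[platform];
    if (limit === undefined) return false;
    const w = this.current(platform, nowMs);
    if (w.used >= limit) return false;
    w.used += 1;
    return true;
  }

  remaining(platform: string, nowMs: number): number {
    const limit = this.limits[platform] ?? 0;
    return Math.max(0, limit - this.current(platform, nowMs).used);
  }
}
```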
Terms of Service
- Compliance with platform ToS
- Ethical scraping practices
- Public data only
- No personal data collection
- Transparent usage
Performance Characteristics
Execution Time
- Single Cron Run: 5-30 minutes depending on volume
- Query Generation: < 10 seconds
- Per Link Scraping: 2-5 seconds
- Session Processing: < 10 minutes
Scalability
- Links per Session: Up to 1000 links
- Platforms: Unlimited platforms
- Organizations: Scales with Cloudflare
- Concurrent Jobs: Multiple orgs processed in parallel
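The execution-time figures imply parallelism: scraping 1000 links one at a time at 2 to 5 seconds each takes 2,000 to 5,000 seconds (roughly 33 to 83 minutes), longer than the 5 to 30 minute range for a single cron run. The hypothetical helper below computes the minimum concurrency for a time budget, assuming per-link time stays constant as concurrency grows (rate limits and quotas would weaken that assumption).

```typescript
// Minimum number of parallel extractions needed to finish within the budget.
function requiredConcurrency(links: number, secondsPerLink: number, budgetSeconds: number): number {
  return Math.ceil((links * secondsPerLink) / budgetSeconds);
}

// 1000 links at 5 s each within 30 minutes: ceil(5000 / 1800) = 3
const worstCase = requiredConcurrency(1000, 5, 30 * 60);
// 1000 links at 2 s each within 5 minutes: ceil(2000 / 300) = 7
const fastCase = requiredConcurrency(1000, 2, 5 * 60);
```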
Reliability
- Retry Logic: Automatic retries on failure
- Error Handling: Graceful degradation
- Partial Results: Partial sessions saved
- Monitoring: Execution tracking and alerts
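Saving partial sessions maps naturally onto `Promise.allSettled`, which waits for every extraction and reports each outcome instead of rejecting on the first failure. A sketch with a hypothetical `extract` callback standing in for Browser Rendering or a platform API client:

```typescript
interface PartialOutcome<T> {
  succeeded: T[];
  failed: { url: string; reason: string }[];
}

// Extracts every URL, keeps successful results, and records failures
// (for retries or alerts) instead of discarding the whole session.
async function extractAll<T>(urls: string[], extract: (url: string) => Promise<T>): Promise<PartialOutcome<T>> {
  const settled = await Promise.allSettled(urls.map((url) => extract(url)));
  const outcome: PartialOutcome<T> = { succeeded: [], failed: [] };
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      outcome.succeeded.push(result.value);
    } else {
      outcome.failed.push({ url: urls[i], reason: String(result.reason) });
    }
  });
  return outcome;
}
```

As written, all extractions start at once; for sessions approaching 1000 links, this would be combined with a concurrency limit and the per-domain delays described under Responsible Scraping.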
Troubleshooting
Common Issues
No New Content:
- Verify social media links are correct
- Check if accounts are public
- Review cron schedule configuration
- Confirm API keys are valid
Duplicate Content:
- Check that duplicate detection is enabled
- Review link normalization logic
- Verify content hashing
Rate Limiting:
- Adjust scraping frequency
- Increase delays between requests
- Check platform-specific limits
- Consider API upgrades
Extraction Failures:
- Verify Browser Rendering quota
- Check for changes in website structure
- Review extraction selectors
- Test individual URLs
Future Enhancements
Planned Features
- Real-time Social Listening: WebSocket-based live monitoring
- Advanced Sentiment Analysis: Emotion detection, sarcasm recognition
- Influencer Identification: Key influencers and brand advocates
- Competitor Analysis: Automated competitive intelligence
- Crisis Detection: Early warning system for PR issues
- Response Suggestions: AI-generated response recommendations
- Automated Reporting: Scheduled social media reports
Under Consideration
- Video Content Analysis: Analyze video posts and stories
- Image Recognition: Visual brand mentions
- Hashtag Campaigns: Campaign tracking and analytics
- Social Commerce: Purchase intent and conversion tracking
Related Documentation
- System Architecture - Social component architecture
- User Signup Flow - Social configuration during signup
- Dashboard Features - Social settings and management
- Website Interaction Tracking - Web component for comparison
- CCTV Component - CCTV component for comparison