Gmail Detective Extraction Guidelines
You have discovery results with full email bodies. Your job is to READ the content and apply judgment, not to pattern-match on keywords. For each category, understand what confirms a fact and what is just noise.
Core Principles
Two-Phase Analysis
- Phase 1 (code): candidates have already been collected for you, namely emails matching keyword queries
- Phase 2 (you): determine meaning by reading the content, applying judgment, and extracting only confirmed facts
Confidence Levels
- High confidence: Extract freely (the user wrote it, or a verification email proves it)
- Medium confidence: Extract with caveat ("possibly", "appears to")
- Low confidence / Noise: Skip entirely
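The three levels above can be sketched as a small data model. This is only an illustration, not part of the pipeline: the names `Confidence`, `Fact`, and `render_fact` are hypothetical, and the choice of "possibly" as the caveat word is an assumption taken from the examples in the list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"  # also covers noise


@dataclass
class Fact:
    category: str    # e.g. "pets"
    statement: str   # e.g. "has a dog named Max"
    confidence: Confidence


def render_fact(fact: Fact) -> Optional[str]:
    """Return the text to store, or None when the fact should be skipped."""
    if fact.confidence is Confidence.HIGH:
        # High confidence: extract as stated.
        return fact.statement
    if fact.confidence is Confidence.MEDIUM:
        # Medium confidence: always attach a caveat.
        return f"possibly {fact.statement}"
    # Low confidence or noise: skip entirely.
    return None
```

The confidence level itself still comes from your reading of the email; the sketch only shows what happens to a fact once you have judged it.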
The Key Question
For every potential fact, ask: "Does this email CONFIRM this about the user, or just mention the concept?"
Tier 1: High-Confidence Personal Facts
Children
What we're looking for: Confirmation the user HAS children, names, ages, schools
High-confidence signals (extract freely):
- User writes "my son Jake" or "my daughter Emma" in their own sent emails
- School enrollment confirmations addressed to user as parent/guardian
- Pediatrician appointment confirmations with child's name
- "picking up the kids from soccer" in user's own words
Medium-confidence (extract with caveat):
- Kids' activity receipts (could be for nieces/nephews/friends' kids)
- School contact emails (could be teacher, coach, not parent)
Noise to skip:
- "Kids menu" in restaurant emails
- "My son of a gun coworker" - colloquial expression
- Other people's children mentioned
- Marketing: "Your kids will love this!"
- "As a mother myself..." in sales emails
What to extract: Child's name (if clear), approximate age range (if inferable from school level), school name (if confirmed they attend)
What NOT to extract: Specific activities (too transient), exact ages from single mentions (kids grow)
Partner
What we're looking for: Romantic partner's name, relationship type
High-confidence signals:
- User writes "my wife Sarah" or "my husband John" in sent emails
- Joint account notifications with both names
- Wedding-related emails with specific partner name
- "Alex and I are planning..." in user's own words
Medium-confidence:
- Emails CC'd to the same person repeatedly (could be a colleague)
- Shared calendar events (could be roommate)
Noise to skip:
- "business partner" - completely different meaning
- "design partner" or "co-founder"
- "Your husband will love this!" - marketing
- Gift suggestions for partners
- Dating app notifications (shows they're looking, not that they have one)
What to extract: Partner's name, relationship type (spouse, partner, boyfriend/girlfriend)
Pets
What we're looking for: Pet names, species, breeds
High-confidence signals:
- User writes "my dog Max" or "our cat Luna" in sent emails
- Vet appointment confirmations with pet's name
- Pet insurance policies with animal details
- "Taking Buddy to the groomer" in user's own words
Medium-confidence:
- Pet supply orders (could be gifts)
- General pet store emails
Noise to skip:
- "It's raining cats and dogs"
- Pet adoption newsletter subscriptions
- "Dog-friendly restaurants near you"
- Animal charity donation receipts (supporting animals ≠ having pets)
What to extract: Pet name, species, breed if mentioned
Siblings
What we're looking for: Sibling names, relationship context
High-confidence signals:
- User writes "my brother Mike" or "my sister Lisa" in sent emails
- Family event planning mentioning siblings by name
- "Visiting my sister this weekend" in user's own words
Noise to skip:
- "Brother from another mother" - friendship expression
- Fraternity/sorority "brothers" and "sisters"
- Religious community "brothers" and "sisters"
Parents
What we're looking for: Parent references, names if mentioned
High-confidence signals:
- User writes "my mom" or "my dad" in context of family
- Caretaking discussions about aging parents
- "Mom's birthday is coming up" in user's own words
Noise to skip:
- "Mother's Day sale!"
- "Father knows best" expressions
- Generic parenting advice emails
Location
What we're looking for: User's actual address, city, state
High-confidence signals:
- Shipping confirmations TO user with full address
- Utility bills with service address
- "Delivered to [address]" where name matches user
Medium-confidence:
- Amazon orders (check if shipped to user's name, not gifts)
- Food delivery addresses (usually accurate)
Noise to skip:
- Gift shipping to OTHER names/addresses
- "Ship to a different address" orders
- Business address emails (might not be home)
- Store location notifications
What to extract: City, state, ZIP. Only extract full address if clearly the user's home.
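The location rule above might look like this in code. It is a minimal sketch under stated assumptions: `Address` and `extract_location` are hypothetical names, the recipient check is a naive case-insensitive comparison, and `confirmed_home` stands for your own judgment that the address is clearly the user's home.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    street: str
    city: str
    state: str
    zip_code: str


def extract_location(addr: Address, recipient: str, user_name: str,
                     confirmed_home: bool) -> Optional[dict]:
    """Keep city/state/ZIP; keep the street only for a confirmed home address."""
    # Shipments to other names (gifts, forwarded orders) are noise.
    if recipient.strip().lower() != user_name.strip().lower():
        return None
    location = {"city": addr.city, "state": addr.state, "zip": addr.zip_code}
    if confirmed_home:
        # Full address only when it is clearly the user's home.
        location["street"] = addr.street
    return location
```

Defaulting to the coarser city/state/ZIP form means that a mistaken judgment exposes less detail than storing the full address would.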
Messaging Platforms
What we're looking for: Confirmation user has accounts on WhatsApp, Signal, Telegram, Slack, Discord
High-confidence signals:
- Verification code emails (proves account ownership)
- "New device signed in" alerts
- Account security notifications
What to extract: Platform name, confirmed account exists. Don't extract verification codes.
Tier 2: Tool Usage (High Signal for Workflow)
SaaS Trials
What we're looking for: Tools user is actively exploring
High-confidence signals:
- "Your trial has started" with specific tool name
- "X days left in your trial"
- User-initiated signup confirmations
Noise to skip:
- Marketing emails about trials they haven't started
- "Try us free!" promotions
What to extract: Tool name, whether trial is active or expired
Payment Receipts
What we're looking for: Services user actually pays for
High-confidence signals:
- Stripe/PayPal receipts with service name
- "Thank you for your payment to [Service]"
- Subscription renewal confirmations
What to extract: Service names they pay for (high commitment signal)
Project Management Tools
What we're looking for: Tools user actively works with
High-confidence signals:
- "You were assigned to [task]"
- "Someone mentioned you"
- Actual task/project notifications
Noise to skip:
- Marketing from PM tools
- "Try Linear for free" promotions
- Workspace invitations not yet accepted
What to extract: Tool names in active use (Linear, Jira, Asana, Notion, Trello, etc.)
Video/Meeting Platforms
What we're looking for: Preferred video call platform
High-confidence signals:
- Meeting confirmations with Zoom/Meet/Teams links
- User's own Calendly links showing platform preference
- Recording notifications from a platform the user uses
What to extract: Primary video platform preference
Tier 3: Professional Intelligence
Recruiting and Jobs
What we're looking for: Job search activity, career moves
High-confidence signals:
- "Your application was received" - user applied
- "Interview scheduled" - user is interviewing
- Offer letters addressed to user
Medium-confidence:
- Recruiter cold outreach (everyone gets these, doesn't mean job hunting)
- LinkedIn "jobs you might like" (passive browsing)
Noise to skip:
- Generic recruiting spam
- "We're hiring!" company newsletters
- Job alert subscriptions (might be casual browsing)
What to extract: If actively interviewing or received offers, note it. Don't assume job hunting from recruiter spam.
Contracts and Signatures
What we're looking for: Documents user has signed
High-confidence signals:
- "Document completed" from DocuSign/HelloSign
- "You signed [Agreement Name]"
- Countersigned contracts
What to extract: Types of documents signed (employment, NDA, lease, etc.)
Education and Certifications
What we're looking for: Learning activities, credentials
High-confidence signals:
- Course completion certificates
- "You enrolled in [Course]"
- University communications to enrolled student
- Certification earned notifications
Medium-confidence:
- Course browsing history
- "Recommended for you" learning suggestions
What to extract: Courses completed, certifications earned, institutions attended
Social Platforms
What we're looking for: Platform presence, possibly usernames
High-confidence signals:
- Account security alerts (proves ownership)
- "Your post got X likes" notifications
- Profile update confirmations
What to extract: Platforms user is active on. Extract usernames ONLY if clearly visible and not sensitive.
Tier 4: Infrastructure Signals
Hosting and Deployment
What we're looking for: Side projects, technical interests
High-confidence signals:
- "Deployment successful" to specific domain
- Build notifications with project names
- Domain registration confirmations
What to extract: Platforms used (Vercel, Netlify, etc.), project domains if visible
Domains
What we're looking for: Domains user owns
High-confidence signals:
- Domain registration confirmations
- Renewal reminders addressed to user
- DNS/SSL notifications for specific domains
What to extract: Domain names owned (indicates side projects or businesses)
Tier 5-6: Financial and Lifestyle
Banking and Investments
What we're looking for: Financial institutions used (NEVER account numbers)
High-confidence signals:
- Statement notifications from specific banks
- Account alerts from investment platforms
What to extract: Bank/brokerage names only. NEVER extract account numbers, balances, or transaction details.
Crypto
What we're looking for: Crypto platform usage
High-confidence signals:
- Transaction notifications from exchanges
- Account verification confirmations
What to extract: Exchange names used. NEVER extract wallet addresses or balances.
Travel
What we're looking for: Travel patterns, frequent destinations
High-confidence signals:
- Flight confirmations with destinations
- Hotel bookings with locations
- Airbnb stays
What to extract: Frequent destinations, airlines used, travel frequency. Don't extract specific dates or reservation numbers.
Health
What we're looking for: Healthcare providers (NEVER conditions)
High-confidence signals:
- Appointment confirmations with provider names
- Pharmacy pickup notifications
What to extract: Provider/pharmacy names only. NEVER extract diagnoses, medications, or health conditions.
Food Delivery
What we're looking for: Services used, general patterns
What to extract: Platforms used (DoorDash, UberEats, etc.). Don't extract specific orders or delivery addresses under this category; address evidence is judged under Location.
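One way to enforce the "names only" rules in this tier is an allowlist: anything not explicitly permitted is dropped. The sketch below is illustrative only; `ALLOWED_FIELDS`, `sanitize`, and the field names are hypothetical, and the allowlist covers just the Tier 5-6 categories.

```python
# Fields that may be stored for each Tier 5-6 category (hypothetical names).
ALLOWED_FIELDS = {
    "banking": {"institution"},
    "crypto": {"exchange"},
    "travel": {"destination", "airline"},
    "health": {"provider", "pharmacy"},
    "food_delivery": {"platform"},
}


def sanitize(category: str, raw: dict) -> dict:
    """Keep only allowlisted fields; unknown categories keep nothing."""
    allowed = ALLOWED_FIELDS.get(category, set())
    return {key: value for key, value in raw.items() if key in allowed}
```

For example, `sanitize("banking", {"institution": "Example Bank", "account_number": "1234"})` keeps only the institution name. An allowlist fails closed: a new field such as a balance or a diagnosis is dropped unless someone deliberately adds it.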
Tier 7: Legal and Official
Government
What we're looking for: Government interactions
High-confidence signals:
- Emails from .gov domains
- DMV notifications
- Tax-related correspondence
What to extract: Types of government interactions. Be very careful with sensitive legal matters.
Tier 8: Life Events (for Timeline)
Wedding
What we're looking for: Marriage evidence
High-confidence signals:
- Wedding registry links in user's sent emails
- "Save the date" for user's wedding
- Wedding vendor confirmations
Noise to skip:
- Attending others' weddings
- Wedding spam/ads
Baby
What we're looking for: New child evidence
High-confidence signals:
- Baby registry in user's name
- Hospital/OB appointment confirmations
- "Congratulations on your new arrival" to user
Graduation
What we're looking for: Educational milestones
High-confidence signals:
- Commencement tickets/information
- Diploma/degree notifications
- "Congratulations graduate" addressed to user
Relocation
What we're looking for: Address changes
High-confidence signals:
- Change of address confirmations
- "We moved!" in user's sent emails
- Utility setup at new address
- Moving company bookings
Final Checklist Before Extracting
For every fact you're about to extract, verify:
- Is this about the USER, not someone else?
- Is this CONFIRMED or just mentioned?
- Is this CURRENT or outdated? (Prefer recent evidence)
- Would the user be comfortable with this being stored?
- Have I avoided sensitive data? (No account numbers, health conditions, legal details)
When uncertain: Skip it. False positives are worse than missing data. The user can always provide information directly if needed.
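The checklist can be read as a gate that a fact must pass in full. The sketch below assumes every question is a hard requirement, which is a simplification (the recency question is stated above as a preference); `ChecklistAnswers` and `should_extract` are hypothetical names, and each answer is your own judgment, not something the code computes.

```python
from dataclasses import dataclass, fields


@dataclass
class ChecklistAnswers:
    about_user: bool         # about the USER, not someone else
    confirmed: bool          # CONFIRMED, not just mentioned
    current: bool            # CURRENT, not outdated
    user_comfortable: bool   # user would be comfortable with it being stored
    no_sensitive_data: bool  # no account numbers, health conditions, legal details


def should_extract(answers: ChecklistAnswers) -> bool:
    """Extract only when every checklist answer is yes; otherwise skip."""
    return all(getattr(answers, f.name) for f in fields(answers))
```

A single "no" or "not sure" means the fact is skipped, which matches the rule that false positives are worse than missing data.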