Gmail Detective Extraction Guidelines

You have discovery results that include full email bodies. Your job is to READ the content and apply judgment, not to pattern-match on keywords. For each category, understand what confirms a fact and what is just noise.

Core Principles

Two-Phase Analysis

  1. Code collected candidates - emails matching keyword queries
  2. You determine meaning - read content, apply judgment, extract confirmed facts

Confidence Levels

  • High confidence: Extract freely (user wrote it, verification proves it)
  • Medium confidence: Extract with caveat ("possibly", "appears to")
  • Low confidence / Noise: Skip entirely
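The three levels can be made explicit as a gate between judgment and output. This is a minimal sketch: `Confidence`, `Candidate`, and `render_fact` are hypothetical names, and the "possibly" prefix is only one way to phrase the caveat. The code applies the rule; assigning the level still requires reading the email.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    HIGH = "high"      # user wrote it, or verification proves it
    MEDIUM = "medium"  # plausible, but another explanation exists
    LOW = "low"        # noise


@dataclass
class Candidate:
    statement: str  # phrased to follow "User", e.g. "has a dog named Max"
    confidence: Confidence


def render_fact(candidate: Candidate) -> Optional[str]:
    """Keep high-confidence facts, hedge medium ones, drop the rest."""
    if candidate.confidence is Confidence.HIGH:
        return f"User {candidate.statement}"
    if candidate.confidence is Confidence.MEDIUM:
        return f"User possibly {candidate.statement}"
    return None  # low confidence / noise: skip entirely
```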

The Key Question

For every potential fact, ask: "Does this email CONFIRM this about the user, or just mention the concept?"
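Part of the key question can be answered from message metadata before reading the body: did the user write this email, was it sent to the user, or neither? The sketch below assumes messages are dicts shaped like Gmail API message resources (a `labelIds` list, which carries the `SENT` system label for sent mail, and `payload.headers` as name/value pairs); `header`, `addresses`, and `provenance` are hypothetical helpers. Provenance narrows the question without settling it: a marketing email addressed to the user is still noise.

```python
from email.utils import getaddresses
from typing import Set


def header(message: dict, name: str) -> str:
    """Return the first header with this name (case-insensitive), or ""."""
    for h in message.get("payload", {}).get("headers", []):
        if h.get("name", "").lower() == name.lower():
            return h.get("value", "")
    return ""


def addresses(message: dict, *names: str) -> Set[str]:
    """Lower-cased email addresses found in the given headers."""
    # Skip missing headers so empty values are never passed to the parser.
    values = [v for v in (header(message, n) for n in names) if v]
    if not values:
        return set()
    return {addr.lower() for _, addr in getaddresses(values) if addr}


def provenance(message: dict, user_email: str) -> str:
    """'authored' (user's own words), 'addressed' (sent to user), or 'other'."""
    user = user_email.lower()
    if "SENT" in message.get("labelIds", []) or user in addresses(message, "From"):
        return "authored"
    if user in addresses(message, "To", "Cc"):
        return "addressed"
    return "other"
```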


Tier 1: High-Confidence Personal Facts

Children

What we're looking for: Confirmation that the user HAS children, plus their names, ages, and schools

High-confidence signals (extract freely):

  • User writes "my son Jake" or "my daughter Emma" in their own sent emails
  • School enrollment confirmations addressed to user as parent/guardian
  • Pediatrician appointment confirmations with child's name
  • "picking up the kids from soccer" in user's own words

Medium-confidence (extract with caveat):

  • Kids' activity receipts (could be for nieces/nephews/friends' kids)
  • School contact emails (the user could be a teacher or coach, not a parent)

Noise to skip:

  • "Kids menu" in restaurant emails
  • "My son of a gun coworker" - colloquial expression
  • Other people's children mentioned
  • Marketing: "Your kids will love this!"
  • "As a mother myself..." in sales emails

What to extract: Child's name (if clear), approximate age range (if inferable from school level), school name (if confirmed they attend)

What NOT to extract: Specific activities (too transient), exact ages from single mentions (kids grow)
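Storing a range instead of an exact age keeps the fact valid longer. A minimal sketch, assuming US school levels; the age bounds are approximate, and `SCHOOL_LEVEL_AGES` and `approximate_age_range` are hypothetical. The level itself should come from a confirmed enrollment email, not a guess.

```python
from typing import Optional, Tuple

# Approximate US age bounds per school level (assumption; other systems differ).
SCHOOL_LEVEL_AGES = {
    "preschool": (3, 5),
    "elementary": (5, 11),
    "middle": (11, 14),
    "high": (14, 18),
}


def approximate_age_range(school_level: str) -> Optional[Tuple[int, int]]:
    """Coarse (min, max) age range for a confirmed school level, else None."""
    return SCHOOL_LEVEL_AGES.get(school_level.strip().lower())
```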

Partner

What we're looking for: Romantic partner's name, relationship type

High-confidence signals:

  • User writes "my wife Sarah" or "my husband John" in sent emails
  • Joint account notifications with both names
  • Wedding-related emails with specific partner name
  • "Alex and I are planning..." in user's own words

Medium-confidence:

  • Emails CC'd to the same person repeatedly (could be a colleague)
  • Shared calendar events (could be a roommate)

Noise to skip:

  • "business partner" - completely different meaning
  • "design partner" or "co-founder"
  • "Your husband will love this!" - marketing
  • Gift suggestions for partners
  • Dating app notifications (shows they're looking, not that they have one)

What to extract: Partner's name, relationship type (spouse, partner, boyfriend/girlfriend)

Pets

What we're looking for: Pet names, species, breeds

High-confidence signals:

  • User writes "my dog Max" or "our cat Luna" in sent emails
  • Vet appointment confirmations with pet's name
  • Pet insurance policies with animal details
  • "Taking Buddy to the groomer" in user's own words

Medium-confidence:

  • Pet supply orders (could be gifts)
  • General pet store emails

Noise to skip:

  • "It's raining cats and dogs"
  • Pet adoption newsletter subscriptions
  • "Dog-friendly restaurants near you"
  • Animal charity donation receipts (supporting animals ≠ having pets)

What to extract: Pet name, species, breed if mentioned

Siblings

What we're looking for: Sibling names, relationship context

High-confidence signals:

  • User writes "my brother Mike" or "my sister Lisa" in sent emails
  • Family event planning mentioning siblings by name
  • "Visiting my sister this weekend" in user's own words

Noise to skip:

  • "Brother from another mother" - friendship expression
  • Fraternity/sorority "brothers" and "sisters"
  • Religious community "brothers" and "sisters"

Parents

What we're looking for: Parent references, names if mentioned

High-confidence signals:

  • User writes "my mom" or "my dad" in a family context
  • Caretaking discussions about aging parents
  • "Mom's birthday is coming up" in user's own words

Noise to skip:

  • "Mother's Day sale!"
  • "Father knows best" expressions
  • Generic parenting advice emails

Location

What we're looking for: User's actual address, city, state

High-confidence signals:

  • Shipping confirmations TO user with full address
  • Utility bills with service address
  • "Delivered to [address]" where name matches user

Medium-confidence:

  • Amazon orders (check that the order shipped to the user's name and is not a gift)
  • Food delivery addresses (usually accurate)

Noise to skip:

  • Gift shipping to OTHER names/addresses
  • "Ship to a different address" orders
  • Business address emails (might not be home)
  • Store location notifications

What to extract: City, state, ZIP. Only extract full address if clearly the user's home.
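The name-match rule for shipping confirmations can be expressed as a small filter. This sketch assumes the address fields were already read out of the confirmation email; `ShippingAddress` and `location_from_shipment` are hypothetical. Exact name matching is deliberately strict: a nickname or maiden name fails the check, which costs recall but avoids recording a gift recipient's city as the user's.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ShippingAddress:
    recipient_name: str
    street: str
    city: str
    state: str
    zip_code: str


def normalize_name(name: str) -> str:
    """Case- and whitespace-insensitive form used for comparison."""
    return " ".join(name.lower().split())


def location_from_shipment(
    addr: ShippingAddress, user_name: str, confirmed_home: bool = False
) -> Optional[Dict[str, str]]:
    """City/state/ZIP if the package went to the user; street only if home is confirmed."""
    if normalize_name(addr.recipient_name) != normalize_name(user_name):
        return None  # shipped to someone else (likely a gift): skip
    location = {"city": addr.city, "state": addr.state, "zip": addr.zip_code}
    if confirmed_home:
        location["street"] = addr.street
    return location
```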

Messaging Platforms

What we're looking for: Confirmation user has accounts on WhatsApp, Signal, Telegram, Slack, Discord

High-confidence signals:

  • Verification code emails (proves account ownership)
  • "New device signed in" alerts
  • Account security notifications

What to extract: Platform name, confirmed account exists. Don't extract verification codes.
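Because verification emails carry one-time codes, a last-line check can refuse any extracted value that still contains one. A sketch, assuming codes look like 4 to 8 digits or two groups of 3 digits separated by a space or hyphen; `OTP_PATTERN` and `safe_platform_fact` are hypothetical and will not catch every code format.

```python
import re
from typing import Dict, Optional, Union

# Assumed one-time-code shapes: 4-8 digits, or "123 456" / "123-456".
OTP_PATTERN = re.compile(r"(?<!\d)(?:\d{4,8}|\d{3}[- ]\d{3})(?!\d)")


def safe_platform_fact(platform: str) -> Optional[Dict[str, Union[str, bool]]]:
    """Record only that an account exists; refuse values carrying a code."""
    if OTP_PATTERN.search(platform):
        return None  # a verification code leaked into the value: do not store
    return {"platform": platform.strip(), "account_confirmed": True}
```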


Tier 2: Tool Usage (High Signal for Workflow)

SaaS Trials

What we're looking for: Tools user is actively exploring

High-confidence signals:

  • "Your trial has started" with specific tool name
  • "X days left in your trial"
  • User-initiated signup confirmations

Noise to skip:

  • Marketing emails about trials they haven't started
  • "Try us free!" promotions

What to extract: Tool name, whether trial is active or expired
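Whether a trial is still active can be estimated from the date of an "X days left" email. A minimal sketch, assuming the day count has already been read from the email; `trial_status` is hypothetical, and treating the final day as still active is an assumption, since vendors count trial days differently. A later "trial ended" notice or payment receipt should take precedence over this estimate.

```python
from datetime import date, timedelta


def trial_status(email_date: date, days_left: int, today: date) -> str:
    """Estimate 'active' or 'expired' from an "X days left in your trial" email."""
    ends_on = email_date + timedelta(days=days_left)
    # Assumption: the trial still counts as active on its final day.
    return "active" if today <= ends_on else "expired"
```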

Payment Receipts

What we're looking for: Services user actually pays for

High-confidence signals:

  • Stripe/PayPal receipts with service name
  • "Thank you for your payment to [Service]"
  • Subscription renewal confirmations

What to extract: Service names they pay for (high commitment signal)

Project Management Tools

What we're looking for: Tools user actively works with

High-confidence signals:

  • "You were assigned to [task]"
  • "Someone mentioned you"
  • Actual task/project notifications

Noise to skip:

  • Marketing from PM tools
  • "Try Linear for free" promotions
  • Workspace invitations not yet accepted

What to extract: Tool names in active use (Linear, Jira, Asana, Notion, Trello, etc.)

Video/Meeting Platforms

What we're looking for: Preferred video call platform

High-confidence signals:

  • Meeting confirmations with Zoom/Meet/Teams links
  • User's own Calendly links showing platform preference
  • Recording notifications sent to the user by the platform

What to extract: Primary video platform preference
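A simple proxy for platform preference is counting which platform's links appear in meeting confirmations. This sketch assumes the links were already pulled from confirmation emails; the `PLATFORM_HOSTS` table covers only the platforms named above and is an assumption to extend, and `platform_for_link` and `primary_video_platform` are hypothetical. Meetings hosted by other people count too, so a stricter version would weight meetings the user organized.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlparse

# Host suffixes for common meeting links (assumed; extend as needed).
PLATFORM_HOSTS = {
    "zoom.us": "Zoom",
    "meet.google.com": "Google Meet",
    "teams.microsoft.com": "Microsoft Teams",
    "teams.live.com": "Microsoft Teams",
}


def platform_for_link(url: str) -> Optional[str]:
    """Map a meeting URL to a platform by exact host or subdomain match."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, platform in PLATFORM_HOSTS.items():
        if host == suffix or host.endswith("." + suffix):
            return platform
    return None


def primary_video_platform(meeting_links: Iterable[str]) -> Optional[str]:
    """Most frequent platform among recognized links, or None if none match."""
    counts = Counter(p for p in map(platform_for_link, meeting_links) if p)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Matching on the host suffix (rather than searching the whole URL for "zoom") keeps look-alike domains such as `evilzoom.us` from counting.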


Tier 3: Professional Intelligence

Recruiting and Jobs

What we're looking for: Job search activity, career moves

High-confidence signals:

  • "Your application was received" - user applied
  • "Interview scheduled" - user is interviewing
  • Offer letters addressed to user

Medium-confidence:

  • Recruiter cold outreach (everyone gets these; they don't mean the user is job hunting)
  • LinkedIn "jobs you might like" (passive browsing)

Noise to skip:

  • Generic recruiting spam
  • "We're hiring!" company newsletters
  • Job alert subscriptions (might be casual browsing)

What to extract: If actively interviewing or received offers, note it. Don't assume job hunting from recruiter spam.

Contracts and Signatures

What we're looking for: Documents user has signed

High-confidence signals:

  • "Document completed" from DocuSign/HelloSign
  • "You signed [Agreement Name]"
  • Countersigned contracts

What to extract: Types of documents signed (employment, NDA, lease, etc.)

Education and Certifications

What we're looking for: Learning activities, credentials

High-confidence signals:

  • Course completion certificates
  • "You enrolled in [Course]"
  • University communications to enrolled student
  • Certification earned notifications

Medium-confidence:

  • Course browse history
  • "Recommended for you" learning suggestions

What to extract: Courses completed, certifications earned, institutions attended

Social Platforms

What we're looking for: Platform presence, possibly usernames

High-confidence signals:

  • Account security alerts (proves ownership)
  • "Your post got X likes" notifications
  • Profile update confirmations

What to extract: Platforms user is active on. Extract usernames ONLY if clearly visible and not sensitive.


Tier 4: Infrastructure Signals

Hosting and Deployment

What we're looking for: Side projects, technical interests

High-confidence signals:

  • "Deployment successful" to specific domain
  • Build notifications with project names
  • Domain registration confirmations

What to extract: Platforms used (Vercel, Netlify, etc.), project domains if visible

Domains

What we're looking for: Domains user owns

High-confidence signals:

  • Domain registration confirmations
  • Renewal reminders addressed to user
  • DNS/SSL notifications for specific domains

What to extract: Domain names owned (indicates side projects or businesses)


Tier 5-6: Financial and Lifestyle

Banking and Investments

What we're looking for: Financial institutions used (NEVER account numbers)

High-confidence signals:

  • Statement notifications from specific banks
  • Account alerts from investment platforms

What to extract: Bank/brokerage names only. NEVER extract account numbers, balances, or transaction details.

Crypto

What we're looking for: Crypto platform usage

High-confidence signals:

  • Transaction notifications from exchanges
  • Account verification confirmations

What to extract: Exchange names used. NEVER extract wallet addresses or balances.
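The banking and crypto rules allow institution names and nothing else. As a safety net, every value can pass a guard before storage. The patterns below are heuristics (8+ digit runs, masked tails such as `****1234`, Ethereum-style `0x` addresses, Bitcoin bech32 addresses, currency amounts), `NEVER_STORE` and `safe_institution_name` are hypothetical, and no regex can spot transaction details written in prose, so this supplements reading rather than replacing it.

```python
import re
from typing import Optional

# Heuristic patterns for values that must never be stored (not exhaustive).
NEVER_STORE = [
    re.compile(r"\d(?:[ -]?\d){7,}"),                     # 8+ digits: account/card numbers
    re.compile(r"[*•]{2,}\s?\d{2,}"),                     # masked tails like ****1234
    re.compile(r"\b0x[0-9a-fA-F]{40}\b"),                 # Ethereum-style addresses
    re.compile(r"\bbc1[02-9ac-hj-np-z]{11,71}\b", re.I),  # Bitcoin bech32 addresses
    re.compile(r"[$€£]\s?\d"),                            # currency amounts / balances
]


def safe_institution_name(value: str) -> Optional[str]:
    """Return the cleaned institution name, or None if it carries sensitive data."""
    if any(pattern.search(value) for pattern in NEVER_STORE):
        return None
    return value.strip()
```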

Travel

What we're looking for: Travel patterns, frequent destinations

High-confidence signals:

  • Flight confirmations with destinations
  • Hotel bookings with locations
  • Airbnb stays

What to extract: Frequent destinations, airlines used, travel frequency. Don't extract specific dates or reservation numbers.
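Frequent destinations and airlines can be aggregated without keeping dates or reservation numbers. In this sketch, `Booking` has no fields for either, so they cannot be stored by accident; `Booking`, the `min_visits` threshold, and `travel_summary` are hypothetical. Both counts are per booking rather than per trip, so a flight plus a hotel for one visit counts twice; a real pass might group bookings into trips first.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Booking:
    kind: str          # "flight", "hotel", or "airbnb"
    destination: str   # city as read from the confirmation
    carrier: str = ""  # airline for flights; empty otherwise
    # No date or reservation-number fields, so neither can be stored.


def travel_summary(bookings: Iterable[Booking], min_visits: int = 2) -> dict:
    """Frequent destinations, airlines by use, and a rough booking count."""
    bookings = list(bookings)
    destinations = Counter(b.destination for b in bookings)
    airlines = Counter(b.carrier for b in bookings if b.kind == "flight" and b.carrier)
    return {
        "frequent_destinations": [d for d, n in destinations.most_common() if n >= min_visits],
        "airlines": [a for a, _ in airlines.most_common()],
        "booking_count": len(bookings),
    }
```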

Health

What we're looking for: Healthcare providers (NEVER conditions)

High-confidence signals:

  • Appointment confirmations with provider names
  • Pharmacy pickup notifications

What to extract: Provider/pharmacy names only. NEVER extract diagnoses, medications, or health conditions.

Food Delivery

What we're looking for: Services used, general patterns

What to extract: Platforms used (DoorDash, UberEats, etc.). Don't extract specific orders or addresses.


Tier 7: Legal and Official

Government

What we're looking for: Government interactions

High-confidence signals:

  • Emails from .gov domains
  • DMV notifications
  • Tax-related correspondence

What to extract: Types of government interactions. Be very careful with sensitive legal matters.


Tier 8: Life Events (for Timeline)

Wedding

What we're looking for: Marriage evidence

High-confidence signals:

  • Wedding registry links in user's sent emails
  • "Save the date" for user's wedding
  • Wedding vendor confirmations

Noise to skip:

  • Attending others' weddings
  • Wedding spam/ads

Baby

What we're looking for: New child evidence

High-confidence signals:

  • Baby registry in user's name
  • Hospital/OB appointment confirmations
  • "Congratulations on your new arrival" to user

Graduation

What we're looking for: Educational milestones

High-confidence signals:

  • Commencement tickets/information
  • Diploma/degree notifications
  • "Congratulations graduate" addressed to user

Relocation

What we're looking for: Address changes

High-confidence signals:

  • Change of address confirmations
  • "We moved!" in user's sent emails
  • Utility setup at new address
  • Moving company bookings
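For relocations, "prefer recent evidence" means the newest confirmed address wins and older ones become timeline entries. A sketch, assuming each observation comes from a high-confidence signal (shipping to the user, a utility bill, a change-of-address confirmation) and has been reduced to city and state; `AddressObservation` and `relocation_timeline` are hypothetical. The recorded date is when a new city first appears in email, which approximates the move date but is not the same thing.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class AddressObservation:
    seen_on: date  # date of the confirming email
    city: str
    state: str


def relocation_timeline(observations: List[AddressObservation]) -> List[Tuple[date, str]]:
    """(first_seen, "City, ST") entries, oldest first; the last entry is current."""
    moves: List[Tuple[date, str]] = []
    for obs in sorted(observations, key=lambda o: o.seen_on):
        place = f"{obs.city}, {obs.state}"
        if not moves or moves[-1][1] != place:
            moves.append((obs.seen_on, place))  # a new city starts a new entry
    return moves
```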

Final Checklist Before Extracting

For every fact you're about to extract, verify:

  1. Is this about the USER, not someone else?
  2. Is this CONFIRMED or just mentioned?
  3. Is this CURRENT or outdated? (Prefer recent evidence)
  4. Would the user be comfortable with this being stored?
  5. Have I avoided sensitive data? (No account numbers, health conditions, legal details)

When uncertain: Skip it. False positives are worse than missing data. The user can always provide information directly if needed.
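The checklist can be encoded so that uncertainty fails closed. In this sketch, each question takes `True`, `False`, or `None` for "not sure", and only five explicit yeses allow extraction; `ChecklistAnswers` and `should_extract` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChecklistAnswers:
    # None means "not sure", which the gate treats as a failing answer.
    about_user: Optional[bool]
    confirmed: Optional[bool]
    current: Optional[bool]
    user_comfortable: Optional[bool]
    free_of_sensitive_data: Optional[bool]


def should_extract(answers: ChecklistAnswers) -> bool:
    """Extract only if every checklist answer is an explicit yes."""
    return all(
        value is True
        for value in (
            answers.about_user,
            answers.confirmed,
            answers.current,
            answers.user_comfortable,
            answers.free_of_sensitive_data,
        )
    )
```

Treating `None` like `False` mirrors the rule that false positives are worse than missing data.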