How to prevent duplicates during imports (with a simple workflow)
Duplicate records are the silent killer of CRM data quality. They cause reporting errors, waste sales time, and create customer confusion when different reps contact the same person.
Most duplicates are created during imports. You think you're cleanly bringing in historical data, but six months later you're dealing with "Acme Inc," "Acme, Inc.," "ACME," and "Acme Corporation" all as separate company records.
Here's the workflow I use to prevent duplicates during every import project.
Step 1: Understand your CRM's duplicate detection
Before importing anything, know how your CRM identifies duplicates. Different platforms use different logic.
Pipedrive duplicate detection:
- Checks email for contacts
- Checks company name for organizations
- Checks phone number as fallback
- Match must be exact (case-insensitive)
Attio duplicate detection:
- More flexible matching
- Can match on multiple fields
- Uses fuzzy matching for names
- Configurable duplicate rules
Critical insight: Most CRMs look for exact matches. "Acme Inc" and "Acme, Inc." are different strings, so they will create separate records.
Step 2: Export and audit your import data
Get all your data into a spreadsheet before trying to import. This lets you clean it systematically.
Common data sources:
- Spreadsheets from various team members
- Old CRM exports
- Email contact lists
- Event attendee lists
- Form submissions
Combine all sources into a master spreadsheet. Add a "Source" column so you know where each record came from.
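If your exports are CSV files, you can script the combining step. Here's a minimal Python sketch using only the standard library. It assumes each source is a CSV with a header row. The function names, and using the file name as the "Source" label, are my own illustrative choices:

```python
import csv
from pathlib import Path


def tag_rows(source, lines):
    """Parse CSV lines (an open file or a list of strings) and tag each row with its source."""
    return [dict(row, Source=source) for row in csv.DictReader(lines)]


def combine_csv_files(paths):
    """Read several CSV exports into one list of rows, labelled by file name."""
    combined = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            combined.extend(tag_rows(Path(path).stem, f))
    return combined


def write_master(rows, out_path):
    """Write all rows to one CSV; the header is the union of every source's columns."""
    fieldnames = []
    for row in rows:
        for column in row:
            if column not in fieldnames:
                fieldnames.append(column)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")  # missing columns -> ""
        writer.writeheader()
        writer.writerows(rows)
```

For example, write_master(combine_csv_files(["old_crm.csv", "event_attendees.csv"]), "master.csv") produces one file with a Source column (the file names here are hypothetical).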
Step 3: Deduplicate within your import file
Before even thinking about the CRM, remove duplicates in your spreadsheet.
For contacts/people:
- Sort by email address
- Identify exact duplicate emails
- Merge data from duplicate rows (combine all non-empty fields)
- Delete duplicate rows
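The sort, merge, and delete routine for contacts can also be scripted. This is a minimal Python sketch, assuming each row is a dict with an "Email" column (a hypothetical column name, so adjust it to your file). Instead of sorting, it keys on the trimmed, lowercased email and fills empty fields from later duplicate rows. When two rows disagree on a non-empty value, the first one wins, so conflicting values still deserve a manual look:

```python
def dedupe_by_email(rows):
    """Collapse rows sharing an email; the first non-empty value for each field wins.

    Returns (deduplicated_rows, rows_without_email).
    """
    by_email = {}
    no_email = []
    for row in rows:
        email = (row.get("Email") or "").strip().lower()
        if not email:
            no_email.append(row)  # can't match on email; review these by hand
            continue
        if email not in by_email:
            by_email[email] = dict(row, Email=email)  # store the cleaned email
            continue
        kept = by_email[email]
        for field, value in row.items():
            if value and not kept.get(field):
                kept[field] = value  # fill a gap from the duplicate row
    return list(by_email.values()), no_email
```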
For companies:
- Sort by company name
- Standardize legal suffixes so variants like "Inc," "Inc.," and "Incorporated" collapse to one form
- Look for similar names that should be merged
- Research any unclear cases
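For spotting similar company names, one option is to compare a simplified "match key" for each name. This minimal Python sketch uses difflib from the standard library. The suffix list and the 0.85 similarity threshold are arbitrary starting points I picked for illustration, not what any CRM uses. It compares every pair, so it's fine for a few thousand names but gets slow beyond that. Every pair it returns still needs the manual research described above:

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Example list of legal suffixes to ignore when comparing names; extend as needed.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "corp", "corporation", "co"}


def company_key(name):
    """Lowercase, drop punctuation, and strip trailing legal suffixes: "Acme, Inc." -> "acme"."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def similar_pairs(names, threshold=0.85):
    """Return (name_a, name_b, score) for every pair whose keys look alike, for manual review."""
    pairs = []
    for a, b in combinations(names, 2):
        key_a, key_b = company_key(a), company_key(b)
        score = 1.0 if key_a == key_b else SequenceMatcher(None, key_a, key_b).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs
```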
Pro tip: Use Excel's or Google Sheets' "Remove Duplicates" feature as a starting point, but manually review the results. Automated deduplication often misses things or incorrectly merges distinct entities.
Step 4: Standardize formatting
Inconsistent formatting creates false duplicates even when the data is the same.
Company names:
- Remove extra spaces
- Standardize "Inc." vs "Incorporated" vs "Inc"
- Decide on comma usage: "Acme, Inc." vs "Acme Inc."
- Consistent capitalization
- Remove parenthetical suffixes unless critical
Example transformations:
- "acme corp. (formerly Beta)" → "Acme Corp"
- "ACME CORPORATION" → "Acme Corporation"
- "Acme  Inc." (two spaces) → "Acme Inc."
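Turning these rules into a function means every row gets the same treatment. Here's a minimal Python sketch. The SUFFIX_STYLE table is a hypothetical house style, chosen so the three example transformations above come out as listed. Swap in whatever forms you decide on. word.capitalize() will mangle acronyms like "IBM", so skim the output:

```python
import re

# Hypothetical house style: suffix variants (lowercased) mapped to one canonical form.
SUFFIX_STYLE = {
    "inc": "Inc.",
    "inc.": "Inc.",
    "incorporated": "Inc.",
    "corp": "Corp",
    "corp.": "Corp",
    "llc": "LLC",
    "llc.": "LLC",
}


def standardize_company_name(raw):
    """Apply the formatting rules: no parentheticals, one comma style, single spaces, consistent case."""
    # Drop parenthetical suffixes such as "(formerly Beta)".
    name = re.sub(r"\([^)]*\)", " ", raw)
    # Drop a comma directly before a legal suffix: "Acme, Inc." -> "Acme Inc."
    name = re.sub(r",(?=\s*(inc|incorporated|corp|llc)\b)", "", name, flags=re.IGNORECASE)
    words = name.split()  # split() with no argument also collapses repeated spaces
    out = []
    for word in words:
        canonical = SUFFIX_STYLE.get(word.lower())
        out.append(canonical if canonical else word.capitalize())
    return " ".join(out)
```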
Email addresses:
- Convert to lowercase
- Trim whitespace
- Flag or remove addresses with an invalid format
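A minimal Python sketch of those three email rules follows. The pattern is a deliberately loose check (something@something.tld, no spaces), not a full validator, so it catches obvious junk rather than every bad address. Lowercasing is a practical convention. The email standard technically allows the part before the @ to be case-sensitive, but in practice addresses are treated case-insensitively:

```python
import re

# Loose sanity check: text@text.text with no spaces and exactly one "@".
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def clean_email(raw):
    """Trim and lowercase an address; return None if it fails the basic format check."""
    email = (raw or "").strip().lower()
    return email if EMAIL_PATTERN.fullmatch(email) else None
```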
Phone numbers:
- Choose one format (e.g., +1-555-123-4567)
- Strip spaces, dots, dashes, and parentheses, then apply that one format to every number
- Include the country code on every number (essential for international ones)
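For North American numbers, the +1-555-123-4567 format can be produced with a few lines of Python. This sketch assumes every number uses country code 1. Anything it can't confidently format (international numbers, extensions) comes back as None for manual review. For genuinely international data, the third-party phonenumbers library (a Python port of Google's libphonenumber) is a better fit than hand-written rules:

```python
import re


def format_nanp_phone(raw):
    """Format a North American number as +1-555-123-4567; return None if it doesn't fit."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 10:
        digits = "1" + digits  # assume a missing country code means +1
    if len(digits) == 11 and digits.startswith("1"):
        return f"+1-{digits[1:4]}-{digits[4:7]}-{digits[7:]}"
    return None  # international, extension, or malformed: review by hand
```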
Step 5: Check against existing CRM data
Export a list of existing companies and contacts from your CRM. Compare against your import file to identify potential duplicates.
Simple approach:
- Export existing CRM data (companies and contacts)
- Create a column in your import file: "Already in CRM?"
- Use VLOOKUP or similar to check if email/company exists
- Flag matches for review
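If you'd rather script the lookup than use VLOOKUP, the same check is a set lookup in Python. This minimal sketch assumes both files are loaded as lists of dicts that share a key column ("Email" by default, a hypothetical name). It uses the case-insensitive exact comparison described in Step 1:

```python
def norm(value):
    """Trim and lowercase so the lookup is case-insensitive."""
    return (value or "").strip().lower()


def flag_existing(import_rows, crm_rows, key="Email"):
    """Add an "Already in CRM?" column: "Yes" if the key value exists in the CRM export."""
    existing = {norm(row.get(key)) for row in crm_rows}
    existing.discard("")  # blank values should never count as a match
    for row in import_rows:
        row["Already in CRM?"] = "Yes" if norm(row.get(key)) in existing else "No"
    return import_rows
```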
More thorough approach:
- Use fuzzy matching tools to catch near-duplicates
- Check multiple fields (company name + location, person name + company)
- Review all flagged matches manually
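As one example of multi-field fuzzy matching, this minimal Python sketch scores "person name + company" against each CRM record with the standard library's difflib. It's a generic string-similarity measure, not the algorithm any particular CRM or fuzzy-matching tool uses. The "Name" and "Company" column names and the 0.9 threshold are assumptions to tune against your own data:

```python
from difflib import SequenceMatcher


def person_key(row):
    """Combine name and company into one lowercase string for comparison."""
    return f"{row.get('Name', '')} {row.get('Company', '')}".strip().lower()


def likely_matches(import_row, crm_rows, threshold=0.9):
    """Return (score, crm_row) pairs at or above the threshold, best match first."""
    target = person_key(import_row)
    hits = []
    for crm_row in crm_rows:
        score = SequenceMatcher(None, target, person_key(crm_row)).ratio()
        if score >= threshold:
            hits.append((score, crm_row))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits
```

With these settings, "Jon Smith" at "Acme" scores about 0.97 against "John Smith" at "Acme", which is exactly the kind of near-duplicate an exact VLOOKUP misses.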
For flagged records, decide:
- Skip import (already have this record)
- Update existing record (import has newer/better data)
- Import as new (false positive, actually distinct)
Step 6: Create a matching strategy
Before importing, document your matching rules. This helps when reviewing post-import.
Example matching strategy:
Contacts:
- Primary match: Email address (exact, case-insensitive)
- Secondary match: Name + Company (both must match)
- Action if match found: Update existing record
Companies:
- Primary match: Company name (standardized format)
- Secondary match: Domain name
- Action if match found: Skip import
Share this with your team so everyone understands the logic.
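A documented strategy is easier to apply consistently when it's also executable. Here's a minimal Python sketch of the example rules above. It assumes rows are dicts with hypothetical "Email", "Name", "Company", and "Domain" columns, and that company names were already standardized in Step 4. It returns the action to take plus the matched CRM record:

```python
def norm(value):
    """Trim and lowercase so comparisons are case-insensitive."""
    return (value or "").strip().lower()


def match_contact(row, crm_contacts):
    """Contacts: email first, then name + company (both must match). A match means update."""
    email = norm(row.get("Email"))
    if email:
        for existing in crm_contacts:
            if norm(existing.get("Email")) == email:
                return "update", existing
    name, company = norm(row.get("Name")), norm(row.get("Company"))
    if name and company:
        for existing in crm_contacts:
            if norm(existing.get("Name")) == name and norm(existing.get("Company")) == company:
                return "update", existing
    return "create", None


def match_company(row, crm_companies):
    """Companies: standardized name first, then domain. A match means skip."""
    name, domain = norm(row.get("Name")), norm(row.get("Domain"))
    if name:
        for existing in crm_companies:
            if norm(existing.get("Name")) == name:
                return "skip", existing
    if domain:
        for existing in crm_companies:
            if norm(existing.get("Domain")) == domain:
                return "skip", existing
    return "create", None
```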
Step 7: Stage your import
Never import everything at once. Use a staged approach to catch issues early.
Import stages:
Stage 1: Test batch (10-20 records)
- Import a small, diverse sample
- Check for duplicates manually
- Verify data maps to correct fields
- Confirm matching rules work as expected
Stage 2: Pilot batch (100-200 records)
- Larger sample across different record types
- Run duplicate detection reports
- Get team feedback on data quality
Stage 3: Full import
- Only proceed if Stages 1-2 had zero or near-zero duplicates
- Import in chunks (500-1000 at a time)
- Run duplicate checks between chunks
Stage 4: Post-import cleanup
- Run full duplicate detection
- Manually review any duplicates created
- Merge or delete as needed
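If you prepare batch files yourself, the staging plan is easy to generate. This minimal Python sketch uses the upper end of each range above as defaults (20, 200, 1000). It shuffles with a fixed seed so the test and pilot batches mix records from different sources rather than just taking the first rows of the file. A random shuffle doesn't guarantee diversity, though. If that matters, sample evenly by the Source column instead:

```python
import random


def plan_batches(records, test_size=20, pilot_size=200, chunk_size=1000, seed=42):
    """Split records into a test batch, a pilot batch, and fixed-size chunks for the full import."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same plan every run
    test = shuffled[:test_size]
    pilot = shuffled[test_size:test_size + pilot_size]
    rest = shuffled[test_size + pilot_size:]
    chunks = [rest[i:i + chunk_size] for i in range(0, len(rest), chunk_size)]
    return test, pilot, chunks
```

For 2,500 records this gives a 20-record test batch, a 200-record pilot, and full-import chunks of 1,000, 1,000, and 280.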
Step 8: Use your CRM's deduplication tools
After import, use built-in duplicate detection to catch anything you missed.
Pipedrive:
- Go to Contacts/Companies
- Use "Duplicates" feature
- Review and merge suggested duplicates
- Can set up rules for automatic merging (use carefully)
Attio:
- Configure duplicate detection rules
- Review suggested merges
- Merge records with merge conflict resolution
- Set up ongoing duplicate prevention rules
Important: Always review suggested merges. Automated merging can incorrectly combine distinct entities, especially with common names.
Step 9: Set up ongoing duplicate prevention
Import isn't the only way duplicates get created. Prevent future duplicates with these rules.
CRM-level rules:
- Block creation of contacts with duplicate emails
- Warn when company name is similar to existing
- Require email validation before saving
- Use enrichment to auto-populate data (reduces manual entry errors)
Process-level rules:
- Require reps to search before creating new records
- Use form integrations that check for existing records
- Run regular duplicate audits (monthly or quarterly)
Team training:
- Show reps how to search effectively
- Explain why duplicates matter
- Make it easy to report suspected duplicates
Common import scenarios and solutions
Scenario: Merging from multiple CRMs
You have data in two different systems and need to consolidate.
Solution:
- Export from both systems
- Identify a unique identifier (usually email or company domain)
- Merge data in spreadsheet before importing
- Use the CRM with better data as "master"
- Import the merged dataset
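The merge-with-a-master step can be sketched in Python. This version assumes both exports are lists of dicts that were already deduplicated on their own (Step 3), and it joins them on a shared identifier ("Email" by default, a hypothetical column name). The master system's values win. Gaps are filled from the other system. Where both systems hold different values, the master's value is kept and the disagreement is logged, so you can decide which is more current:

```python
def norm(value):
    """Trim and lowercase an identifier so both systems' values compare equal."""
    return (value or "").strip().lower()


def merge_systems(master_rows, other_rows, key="Email"):
    """Merge two exports on a shared identifier, letting the master system win.

    Returns (merged_rows, conflicts, no_key_rows).
    """
    merged, conflicts, no_key = {}, [], []
    for row in master_rows:
        identifier = norm(row.get(key))
        if identifier:
            merged.setdefault(identifier, dict(row))
        else:
            no_key.append(row)
    for row in other_rows:
        identifier = norm(row.get(key))
        if not identifier:
            no_key.append(row)  # can't be matched automatically; review by hand
        elif identifier not in merged:
            merged[identifier] = dict(row)  # only the other system has this record
        else:
            kept = merged[identifier]
            for field, value in row.items():
                if field == key or not value:
                    continue
                if not kept.get(field):
                    kept[field] = value  # fill a gap the master left empty
                elif kept[field] != value:
                    # Both systems have a value and they differ: keep the master's, log it.
                    conflicts.append((identifier, field, kept[field], value))
    return list(merged.values()), conflicts, no_key
```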
Scenario: Customer data + prospect data
You have existing customers in one system and prospects in another.
Solution:
- Identify customers who are also in prospect list
- Decide which data is more current/accurate
- Create a "Status" field to mark each record as customer or prospect
- Import customers first, then prospects
- Use email matching to prevent customer duplicates
Scenario: International data with name variations
Same companies/people have different naming conventions in different regions.
Solution:
- Create a "Legal Name" field for the official company name
- Use a "Trading As" or "DBA" field for regional variations
- Choose one primary name for CRM
- Document variations in notes field
- Use domain name as tiebreaker for matching
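To use the domain as a tiebreaker, you first need a comparable domain value from whatever you have: a website URL, a bare domain, or a contact's email address. Here's a minimal Python sketch using the standard library's urllib.parse. One caveat: it only treats the "www." prefix as noise. Regional sites on different domains (acme.de vs. acme.com) still produce different keys, so those pairs need the Legal Name field or a manual check:

```python
from urllib.parse import urlparse


def domain_key(value):
    """Reduce a website URL or email address to a bare lowercase domain for matching."""
    value = (value or "").strip().lower()
    if "@" in value and "://" not in value:
        return value.rsplit("@", 1)[1]  # email address: keep the part after "@"
    if "://" not in value:
        value = "http://" + value  # urlparse needs a scheme to find the host
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host
```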
Tools to help with deduplication
While manual review is best, these tools speed up the process:
OpenRefine: Free tool for cleaning messy data, finding duplicates, and standardizing formats.
Google Sheets Remove Duplicates: Built-in feature for basic deduplication.
Excel Power Query: Advanced data transformation and deduplication.
Duplicate detection extensions: Many CRMs have marketplace apps for enhanced duplicate detection.
Enrichment services: Tools like Clearbit or Apollo that auto-populate data reduce manual entry errors.
What to do if you already have duplicates
If you've already imported data and have duplicate problems:
Quick triage:
- Export all records
- Sort by email/company name
- Identify obvious duplicates (exact email matches)
- Merge these first
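For the quick triage, a short script can list which existing records share an email, so you can merge them inside the CRM. This minimal Python sketch assumes an export with hypothetical "ID" and "Email" columns. The same counts also give a duplicate rate you can track against the targets in the Measuring success section below:

```python
from collections import defaultdict


def duplicate_groups(rows, key="Email", id_field="ID"):
    """Group record IDs that share the same normalized key; only groups of 2+ are returned."""
    groups = defaultdict(list)
    for row in rows:
        value = (row.get(key) or "").strip().lower()
        if value:
            groups[value].append(row.get(id_field))
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


def duplicate_rate(rows, key="Email"):
    """Share of records that are extra copies (every record in a group beyond the first)."""
    groups = duplicate_groups(rows, key=key)
    extra = sum(len(ids) - 1 for ids in groups.values())
    return extra / len(rows) if rows else 0.0
```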
Systematic cleanup:
- Run CRM duplicate detection report
- Review 10-20 duplicates per day
- Document merge decisions
- Track common patterns to prevent future duplicates
Prevent making it worse:
- Stop all imports until duplicate rules are fixed
- Train team on searching before creating new records
- Set up validation rules to block obvious duplicates
Measuring success
You'll know your import process is working when:
During import:
- Test batches show zero duplicates
- Full import creates minimal new duplicates (<1% of imported records)
- No data is lost or incorrectly merged
After import:
- CRM duplicate reports show minimal matches
- Team can find records easily (not multiple versions)
- Reports are accurate (not inflated by duplicates)
Long-term:
- New duplicate creation rate is low (<5 per month)
- Existing duplicates decrease over time
- Team spends less time on manual deduplication
Final checklist
Before your next import:
- [ ] Understand your CRM's duplicate detection logic
- [ ] Export and combine all data sources
- [ ] Remove duplicates within import file
- [ ] Standardize all formatting
- [ ] Check against existing CRM data
- [ ] Document matching strategy
- [ ] Import test batch and review
- [ ] Set up ongoing duplicate prevention
- [ ] Train team on search-before-create
The extra upfront work prevents months or years of cleanup later. Clean data is the foundation of a useful CRM, so start there.
Need help with your CRM?
If you're dealing with messy data, manual processes, or a CRM that doesn't fit your team, let's talk.