Define “Dirty” Data
Dirty data is defined as inaccurate, incomplete or erroneous data, especially in a computer system or database.
In Salesforce, dirty data can manifest in a number of different ways, including:
- Duplicate records (e.g. two leads with the same information, a contact and a lead with the same information, etc.)
- Incomplete records (e.g. a lead without an email address or phone number)
- Inaccurate records (e.g. an opportunity with an inaccurate close date)
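To make two of these categories concrete, here is a minimal Python sketch that flags incomplete and duplicate records in a small in-memory list. It is only an illustration: the sample records, field names, and the rule "same email means duplicate" are assumptions, not how Salesforce itself detects duplicates.

```python
from collections import defaultdict

# Hypothetical sample data; field names are illustrative, not Salesforce API names.
records = [
    {"id": "L-1", "object": "Lead", "name": "Ann Lee", "email": "ann@example.com", "phone": ""},
    {"id": "C-1", "object": "Contact", "name": "Ann Lee", "email": "Ann@Example.com", "phone": "555-0100"},
    {"id": "L-2", "object": "Lead", "name": "Bo Chan", "email": "", "phone": ""},
]

def find_incomplete(rows):
    """Return ids of records that have neither an email address nor a phone number."""
    return [r["id"] for r in rows if not r["email"].strip() and not r["phone"].strip()]

def find_duplicates(rows):
    """Group ids of records sharing the same case-insensitive email, across objects."""
    by_email = defaultdict(list)
    for r in rows:
        email = r["email"].strip().lower()
        if email:  # blank emails are handled by the incomplete check, not here
            by_email[email].append(r["id"])
    return [ids for ids in by_email.values() if len(ids) > 1]

incomplete = find_incomplete(records)
duplicates = find_duplicates(records)
print("Incomplete:", incomplete)
print("Duplicates:", duplicates)
```

Note that the lead and the contact are grouped together even though they live on different objects, which is the "contact and a lead with the same information" case from the list above.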
Define “Data Quality”
Data quality refers to the usability and accuracy of data (technical definition here). Dirty data is of poor quality. Completely de-duplicated, properly formatted, fully populated, accurate data is considered clean and of high quality.
Maintaining Good Data Quality
The standard tools used to create records (import wizards, Data Loader, web-to-lead) are not designed to thoroughly manage data quality:
- Import wizards have limited criteria to match duplicate records (e.g. name, email address) on import.
- Duplicate matching occurs only on one object (e.g. import leads will not match against contact records as well).
- Web-to-lead does not perform any duplicate matching.
Here are a few guidelines that can greatly increase data quality in your org:
- Make sure your import files are clean (free of duplicates, properly formatted, etc.) prior to importing data. Admittedly, this is not always possible or practical.
- Use the leads object to store lower quality or unverified data. Only leads that are qualified and of high data quality should be converted to accounts and contacts.
- Train users to search for existing leads/contacts prior to creating a new lead/contact.
- Train users to search for duplicate records prior to working with unverified data (e.g. web-to-lead submissions).
- Use Data.com Duplicate Management (free as of Spring ’15) or third party tools to prevent duplicate records from being created.
- Use required fields (either via field configuration or page layout), validation rules (look at the REGEX function for complex formatting requirements, such as phone numbers), filtered lookups, and other tools and features to ensure data is entered completely and formatted properly.
- Cleanse existing dirty data with Data.com Clean (additional license fee as of Spring ’15), another third party tool, or manual cleanup.
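The REGEX-based formatting check mentioned in the guidelines can be prototyped outside Salesforce before you commit to a pattern. The sketch below is Python, not validation rule syntax. The phone format `(555) 123-4567`, the function names, and the error message are all assumptions chosen for illustration.

```python
import re

# Assumed target format for illustration only: US-style "(555) 123-4567".
PHONE_PATTERN = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

def is_valid_phone(value: str) -> bool:
    """True when the whole value matches the assumed phone format."""
    return PHONE_PATTERN.fullmatch(value) is not None

def validation_error(phone: str):
    """Mimic a validation rule: return an error message when it fires, else None.

    A blank phone is allowed here; requiring the field is a separate concern.
    """
    if phone and not is_valid_phone(phone):
        return "Phone must be in the format (555) 123-4567."
    return None

for sample in ["(555) 123-4567", "555-123-4567", ""]:
    print(repr(sample), "->", validation_error(sample))
```

Testing a handful of good and bad values like this makes it easier to spot a pattern that is too strict (rejecting international numbers, for example) before users run into it.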
Let’s look at a few examples:
Symptom | Potential Cause | Potential Solutions |
---|---|---|
Duplicate lead records. | End user did not search for an existing lead prior to creating a new lead. | 1. Train users to search for duplicates prior to creating new leads. 2. Configure Data.com Duplicate Management Rules. 3. Use a third party solution. |
Duplicate lead records. | Web-to-lead created a duplicate lead entry. | 1. Train users to search for duplicates prior to creating new leads. 2. Configure Data.com Duplicate Management Rules. 3. Use a third party solution. |
Contact is duplicated as a lead record. | Lead import wizard did not match an existing contact with the same email address. | 1. Train users to search for duplicates on imported leads. 2. Configure Data.com Duplicate Management Rules. 3. Use a third party solution. |
A lead record exists with no contact information. | Nothing prevents a user from creating a lead with no contact information. | 1. Make the email and/or phone field required on lead page layout(s). 2. Use a validation rule to ensure that either phone or email is populated. |
Manually Cleansing Duplicate Records
Salesforce provides several manual tools to merge duplicate records:
- Leads: Find Duplicates button
- Accounts: Merge Accounts (from the account tab)
- Contacts: Merge Contacts button (from an account record)
Data.com Duplicate Management
Data.com Duplicate Management identifies duplicate records through the following:
Matching Rules specify which fields are evaluated to determine if a duplicate is detected (e.g. First Name, Last Name, Email).  You can leverage the standard rules for some objects (lead, contact, account), or create your own rules if you want to create your own logic or reference custom objects.  Matching rules can specify fuzzy or exact field matches.
Standard Contact and Lead Matching Rule
[Should / Medium / Salesforce.com]
Understanding Matching Rules
[Should / 5m / Salesforce.com]
Duplicate Rules allow the administrator to specify the matching rule(s) (above) that should be evaluated when a record is created or modified, and what should occur as a result (allow or block the action).
Take Control of Duplicates
[Should / 3m / Salesforce.com]
Once activated, here is what an example duplicate rule looks like from an end user’s perspective. In this example, the user is attempting to create a new lead, but Data.com has found 2 existing duplicate leads and 2 existing duplicate contacts and will block the creation of this record (the save button issues this error):
To configure this rule, the administrator would create a new duplicate rule (the numbers below correspond to the screenshot):
- The object determines when the rule will be evaluated. In this example, the rule will be evaluated when a lead is created or edited.
- If we also wanted to prevent duplicate contacts from being created or edited, then we would create a second duplicate rule for the contact object.
- Determine what you want to occur when a duplicate is detected upon edit and creation: block or allow.  This example shows that we would block new duplicate leads from being created, but warn users when a duplicate is detected when an existing lead record is modified.
- Under matching rules, the administrator can specify which objects and matching rule(s) are used to identify duplicate records. In this example, we are using the standard matching rule for leads and contacts.
- The matching rule specifies the individual fields that are compared and methodology for comparison (e.g. exact or fuzzy match).
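The split between a matching rule (which fields, exact or fuzzy) and a duplicate rule (block or allow) can be sketched in a few lines of Python. This is a conceptual illustration only: the similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, the field names, and the function names are assumptions and do not reflect the algorithm the standard matching rules actually use.

```python
from difflib import SequenceMatcher

def exact_match(a: str, b: str) -> bool:
    """Exact comparison, ignoring case and surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()

def fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy comparison: similarity ratio at or above an assumed threshold."""
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return ratio >= threshold

def is_duplicate(new: dict, existing: dict) -> bool:
    """Hypothetical matching rule: fuzzy on first and last name, exact on email."""
    return (
        fuzzy_match(new["first"], existing["first"])
        and fuzzy_match(new["last"], existing["last"])
        and exact_match(new["email"], existing["email"])
    )

def evaluate_duplicate_rule(new: dict, existing_records: list, action: str = "block"):
    """Hypothetical duplicate rule: return (save_allowed, matches).

    action="block" rejects the save when matches exist;
    action="allow" lets the save through and returns the matches as a warning.
    """
    matches = [r for r in existing_records if is_duplicate(new, r)]
    save_allowed = not matches or action == "allow"
    return save_allowed, matches

existing = [{"first": "Jonathan", "last": "Smith", "email": "jsmith@example.com"}]
new_lead = {"first": "Jonathon", "last": "Smith", "email": "JSmith@example.com"}

print(evaluate_duplicate_rule(new_lead, existing, action="block"))
print(evaluate_duplicate_rule(new_lead, existing, action="allow"))
```

With the "block" action the misspelled "Jonathon" is still caught by the fuzzy name comparison and the save is rejected. With "allow" the same match is reported but the save goes through, which mirrors the block-on-create, warn-on-edit setup described above.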
Data Quality Tools (Third Party)
There is a wide range of products to manage data quality within Salesforce. Here are a few of the more popular options:
Name | Vendor | Type | Price | Description |
---|---|---|---|---|
Demand Tools | CRM Fusion | PC Application | Paid | Demand Tools is arguably the industry leading data cleansing tool for Salesforce. |
People Import | CRM Fusion | PC Application | Paid | People Import is designed to import leads/contacts into Salesforce without creating duplicate records (matches against existing leads and contacts). |
Dupe Blocker | CRM Fusion | AppExchange Package | Paid | Dupe Blocker blocks the creation of duplicate leads/contacts in Salesforce in real-time. |
Various Products | RingLead | AppExchange Package(s) | Free/Paid | Whereas most other data quality software is scenario-driven (meaning that the administrator must define what qualifies as a duplicate), RingLead maintains a unique matching algorithm. This makes RingLead potentially a good option for an organization that wants something that "just works". |
DupeCatcher | Symphonic Source | AppExchange Package | Free | DupeCatcher is a great free option for preventing duplicate leads/contacts in Salesforce in real-time. |
Cloudingo | Symphonic Source | AppExchange Package | Paid | Cloudingo is a full, on-demand data quality suite. |
this one appears to be a dead link –
Managing Duplicate Records in Salesforce with Duplicate Rules
[Should / 3m / Salesforce.com]
I just experienced the same thing.
Thanks, updated
Hi John,
Quick question: in the Duplicate Rule edit page above, under Record-Level Security, there are two options,
one to enforce sharing rules and the other to bypass sharing rules.
Can you please explain the difference with an example? As I understand it, a duplicate rule would ultimately be enforced only on those records the user has access to.
What is the purpose of the two options?
Hope you reply soon
Thanks in advance!
San.
Ignoring record security would mean that you’d be running dedup rules against ALL records in the system (even the records the user couldn’t view)
Hi john,
If a lead comes in from the web, how will this duplicate rule work? Will the lead be rejected?
It looks like yes it is possible: https://success.salesforce.com/answers?id=90630000000wlaSAAQ
Best to add an exception for web to lead entries and then force a dedup by the user after the record is saved
Regarding this question, how does it relate to the statement “Web-to-lead does not perform any duplicate matching”?
Hi John,
Will the duplicate rule trigger if the user does not have access to the original record? I presume yes?
That’s a really good question and I can’t seem to find an answer in the documentation.
I did test this scenario out and found the following:
-enabled the standard lead matching (matches lead to lead and lead to contact) duplicate check
-created a duplicate lead with an admin account: got an error referencing the duplicate lead
-created a duplicate lead with a user account: was able to create the lead (could not view the duplicate lead from this account)
-went back to the admin account and edited the new duplicate lead. with no field changes, can save the record without triggering the duplicate rule
-make a field change to the duplicate record (as an admin), and it will trigger the duplicate error.
Definitely something to explore further if you’re implementing a data duplication prevention strategy.
John:
I created a duplicate rule for Contacts and another for Account, exactly as shown above.
I can log in as myself (Admin) or as a regular user and am still able to create a duplicate new record or change an existing one, from both Contacts and Accounts. What am I missing? Both duplicate rules are activated.
Thanks
Are you populating the entire record? Check the details around the standard matching alg.- you may not be issuing a match.
Oops, I tried to create a new record by typing only 1st and last name.
Now that you mentioned, I just took an existing Contact, tried to clone it and got the warning. Thanks a bunch!
FYI….took the exam this morning. Killed.
Thanks for everything. Get it together on the new developer stuff dude!
Awesome congrats Andrew!
Yeah, the Lightning UI and related stuff will definitely make it into the site once released – it’s not actually released yet (Winter ’16 is around the corner). Glad the site helped!
Are there a lot of multiple-choice questions on validation and data quality in the exam? I am taking my exam on the 26th.
I passed last June 28th and don’t recall ANY questions like that. Keep in mind that SF has a “bank” of over 1000 questions and you can get a very tough one like I had on May 18th. Good luck and let us know.
The “train users to search for existing records” guideline is really only as good (bad?) as the record level security they have and/or role hierarchy enforcement. Right? If they can’t read or see the record, it won’t turn up in search.
Correct – that definitely needs to be considered
A little edit:
“Only lead that are qualified…”
should read
“Only leads that are qualified…” or “Only lead data that are [or is, depending on your singular/plural preference 🙂 ] qualified…”
Thanks, updated
Needs editing:
Data quality refers the usability and accuracy of data
should read:
Data quality refers to the usability and accuracy of data
Thanks Kevin – appreciate the many finds!
Hi John,
You have done a great job by providing this content; it’s extremely useful in my ADM201 exam preparation. After reading this material I have gained confidence in my SF knowledge. I have my test on 21st Nov, and I look forward to the results. :P
BTW, which would you recommend of the two Data Quality Tool options, RingLead or Demand Tools?
Puja
Cloudingo is another option that is growing in popularity from what I’ve seen. I actually haven’t used any of them extensively myself so I’m afraid I can’t provide a recommendation.
Good luck on the exam!
Thank You John!
“DupeCatcher is a great free option for preventing duplicate lead/contacts in Salesforce in real-time”
To my knowledge, DupeCatcher does not prevent duplicate leads/contacts in real time. You need to save the record first before you get a dupe message. I thought that only RingLead (paid version) shows a dupe message while you type (real time).
Interesting point of clarification – “real time” in this case is intended to mean that it prevents the record from being saved, rather than allowing a duplicate to be saved and then performing cleanup after the fact. Agreed, RingLead is the solution I’ve seen that includes a type-ahead style real-time duplicate finder.
minor typo: “Train users to search for duplicate records as prior to working with unverified data (e.g. web-to-lead submissions).”
remove the “as”.
Thx, fixed!
Thanks roger ! very helpful.
“Use a validation rule to ensure that either phone or email is populated.”
—-> an interesting exercise!
to accomplish this, one possible solution is:
1. Go to App Setup > Leads > Validation Rules
2. Click New
3. Rule Name: “Require_either_email_or_phone”
4. Check ‘Active’
5. Enter ‘Error Condition Formula’:
AND(
ISBLANK(Phone),
ISBLANK(Email))
6. Type on ‘Error Message’:
“Either a phone number or an email address is required for every lead!”
7. Click Save
8. Test on any Lead
A minor point, but the formula should use the OR() function, not AND().
If you use OR, then you’ll get an error if EITHER phone OR email is BLANK. The rule we’re trying to build in this example should only error if BOTH phone AND email are BLANK. So, Roger’s formula is correct.
Boolean logic is such fun! 🙂
Roger’s use case specifically says “or” in the requirement. To me, that would be an OR() function in the error condition formula.
Otherwise, requirement, and error message, should both be “Both a phone number and an email is required for every lead!”
Boolean logic is either fun, or not, but it’s never both. 🙂
I think Scott is right on this one – if you used an OR statement, then the validation rule would fire every time EITHER email OR phone was blank. In short it would make both fields required – which you could just as easily accomplish by making them required on the page layout. The AND actually makes it an OR in a practical sense. Counter intuitive eh 🙂
Awesome!!..i just tested in my org..good to know…
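For anyone still puzzling over the AND versus OR discussion above, a quick truth table settles it. The sketch below is Python rather than formula syntax, and `is_blank` is only a rough stand-in for ISBLANK (an assumption for illustration). The key point is that an error condition formula describes when the save should FAIL, not when it should succeed.

```python
def is_blank(value):
    """Rough stand-in for ISBLANK: None or whitespace-only text counts as blank."""
    return value is None or value.strip() == ""

def error_with_and(phone, email):
    # Fires only when BOTH fields are blank, i.e. "either phone or email is required".
    return is_blank(phone) and is_blank(email)

def error_with_or(phone, email):
    # Fires when EITHER field is blank, which effectively makes both fields required.
    return is_blank(phone) or is_blank(email)

cases = [
    ("555-0100", "a@example.com"),  # both populated
    ("555-0100", ""),               # phone only
    ("", "a@example.com"),          # email only
    ("", ""),                       # neither
]

print("phone      email            AND-error  OR-error")
for phone, email in cases:
    print(f"{phone!r:10} {email!r:16} {error_with_and(phone, email)!s:10} {error_with_or(phone, email)}")
```

Only the AND version lets a lead through with just a phone or just an email, which matches the stated requirement. This is De Morgan's law in action: "phone OR email is populated" is the negation of "phone is blank AND email is blank".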