Data Hygiene

Calling all data hoarders: 4 reasons why you should purge data

The most common recurring recommendation I make to customers is to purge data from their databases. As the years go by, it’s easy for the data to pile up. Just like we hold on to those jeans in our closet in the hope of losing ten pounds, we often hold on to data sets for “just in case” scenarios that don’t materialize. Although letting go can be tough, there are four key reasons that can make the purge process worth it:

1. Performance

Excessive amounts of data slow down regular update processes, the building of data aggregations, analytics, ad-hoc queries, reporting, dashboards, and campaigns. I’ve seen many business intelligence and campaign platforms reach their maximum performance limits, frustrating technology teams and end users.

2. Cost

Although archive storage for snapshots or old files is generally inexpensive, active block storage hardware can be pretty pricey. In addition to storage costs, many customer data platforms, email platforms, and campaign segmentation tools charge on a per-profile or per-email basis. This means that you are likely being charged for individuals who are unmarketable or have incomplete records. You may also find yourself in need of a new database server to handle the higher data volumes, along with an increase in database software license costs.

3. Clutter

What is that table for again? Poor documentation and a lack of cleanup processes can result in copious mystery tables. Single-use or temporary tables add up quickly, and similar tables or views are often created for different audiences. When tables are not deleted on a regular basis, the result can be analysis paralysis, where teams are afraid to purge tables ‘just in case’ some person or process might be using the data.

4. Quality

We find that the oldest data typically has the most quality problems. Legacy systems typically have fewer checkpoints and best practices built in, which means many of these records aren’t marketable or useful in any way. We also find legacy address data that has not recently undergone NCOA (National Change of Address) processing, with addresses so out of date that an updated address cannot be found. Additionally, bad data often leads to bad merges and duplicate customer records.

Tips for cleaning out old or legacy data

  • Gather all of your database documentation. This includes things like your database schema, ER diagrams, and data dictionary.
  • Identify the data you want to clean. What data is no longer needed or used? What data is outdated or inaccurate? It’s important to have a clear understanding of what data you want to remove before you start the cleaning process. For each database table, ask the following questions:
      • Is this table necessary?
      • Is the data in this table accurate?
      • Is the data in this table useful?
      • Is this table/data redundant?
      • And when in doubt: Does this table/data spark joy? (Thanks Marie Kondo)
  • Develop a plan for cleaning the data. This plan should include how you will identify and remove the old or legacy data, as well as how you will test the remaining data to confirm that it is still accurate and complete once the cleaning is done.
  • Consider opportunities to aggregate data. Instead of keeping 25 years of transactions to calculate customer lifetime value, create a process to aggregate this value on a regular basis. This keeps your queries running quickly while still enabling you to analyze customer trends and behavior effectively.
  • Back up your data before you start cleaning. This will help you to recover your data if something goes wrong during the cleaning process.
  • Clean the data in a safe and controlled environment. This means avoiding making changes to the data directly in production systems. Instead, create a copy of the data to work on in a sandbox environment.
  • Document the changes you make to the data. This will help you to track what data was removed and why, and it will also make it easier to roll back the changes if necessary.
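
As an illustration of the aggregation tip above, here is a minimal sketch in Python using the standard-library sqlite3 module. The table names (transactions, customer_ltv), the column names, and the use of ISO-formatted date strings are all assumptions made for the example, not a description of any particular platform. The idea is simply to fold old transaction detail into a per-customer summary and then remove the detail rows in the same transaction.

```python
import sqlite3


def rollup_lifetime_value(conn: sqlite3.Connection, cutoff_date: str) -> int:
    """Fold transactions dated before cutoff_date into a per-customer summary,
    then delete those detail rows. Returns the number of detail rows purged.

    Assumes a hypothetical table transactions(customer_id, order_date, amount)
    with order_date stored as 'YYYY-MM-DD' text, so string comparison works.
    """
    with conn:  # the rollup and the delete commit together or roll back together
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS customer_ltv (
                customer_id    INTEGER PRIMARY KEY,
                lifetime_value REAL    NOT NULL,
                order_count    INTEGER NOT NULL
            )
            """
        )
        # Add the old transactions to any totals saved by earlier runs.
        conn.execute(
            """
            INSERT INTO customer_ltv (customer_id, lifetime_value, order_count)
            SELECT customer_id, SUM(amount), COUNT(*)
            FROM transactions
            WHERE order_date < ?
            GROUP BY customer_id
            ON CONFLICT(customer_id) DO UPDATE SET
                lifetime_value = customer_ltv.lifetime_value + excluded.lifetime_value,
                order_count    = customer_ltv.order_count + excluded.order_count
            """,
            (cutoff_date,),
        )
        cur = conn.execute(
            "DELETE FROM transactions WHERE order_date < ?", (cutoff_date,)
        )
        return cur.rowcount
```

Note that in this sketch the summary table holds only the rolled-up history, so a full lifetime value would be the summary plus whatever detail remains in the transactions table. As with any purge, it would be run against a backed-up copy first.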

Examples of data that should be purged

  • Customer data: Delete duplicate customer records and customer records for customers who have not been active in a long time.
  • Product data: Delete product and hierarchy data for products that are no longer sold.
  • Website data: Web data has a short shelf life. Delete website activity that is beyond its useful life for retargeting purposes.
  • Email activity data: Email response data has little use beyond 18 months.
  • Financial data: Delete old financial data that is no longer needed for compliance purposes.
  • Operational data: Delete log data and other operational data that is no longer needed.
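
Before deleting anything, it can help to size the problem. The sketch below, again in Python with sqlite3, counts how many rows fall outside a retention window for each table without changing any data. The table names, column names, and the six-month web window are hypothetical; only the 18-month figure for email activity comes from the list above.

```python
import sqlite3
from datetime import date


def months_ago(today: date, months: int) -> date:
    """Return the date `months` calendar months before `today`.
    The day is clamped to 28 so the result is always a valid date."""
    total = today.year * 12 + (today.month - 1) - months
    return date(total // 12, total % 12 + 1, min(today.day, 28))


# Hypothetical retention rules: table -> (date column, months to keep).
RETENTION_RULES = {
    "email_activity": ("event_date", 18),  # 18-month guideline from the list above
    "web_activity": ("event_date", 6),     # assumed window, adjust to your needs
}


def count_purge_candidates(conn: sqlite3.Connection, today: date,
                           rules: dict = RETENTION_RULES) -> dict:
    """Dry run: count rows older than each table's retention window.
    Dates are assumed to be stored as 'YYYY-MM-DD' text."""
    counts = {}
    for table, (date_col, keep_months) in rules.items():
        cutoff = months_ago(today, keep_months).isoformat()
        # Table and column names come from the trusted rules dict above,
        # never from user input; only the cutoff is bound as a parameter.
        row = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {date_col} < ?", (cutoff,)
        ).fetchone()
        counts[table] = row[0]
    return counts
```

A report like this gives stakeholders a concrete number to review before any rows are removed.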

Here are some additional pointers to guide you through the purge process

  • Use a data quality tool. There are a number of data quality tools available that can help you to identify and remove old or legacy data. These tools can be especially helpful if you have a large dataset to clean.
  • Archive old data instead of deleting it. In some cases, it may be necessary to retain old data for compliance or audit reasons. Instead of deleting the data, you can archive it to a separate storage system. This will make the data less accessible, but it will still be available if needed.
  • Save key reports, counts, or summaries of legacy data if you think you might need it in some unlikely scenario. A summary takes up significantly less space, and it is a much more efficient way to get your hands on the total dollars of footwear sales from back in 2008.
  • Automate the cleaning process. If you need to clean old or legacy data on a regular basis, you can automate the process using a script or job scheduler. This will save you time and effort in the long run.
  • Remove unused user accounts. If you have user accounts that are no longer used, you should remove them. This will help to improve security and reduce the risk of unauthorized access to your system.
  • Delete old files. If you have files that are no longer needed, you should delete them. This will free up disk space and improve the performance of your system.
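
The archive and automation pointers above can be combined into one small routine. The following is a rough sketch in Python with sqlite3, not a definitive implementation: the function name, the "_archive" table suffix, and the date format are assumptions, and for simplicity the archive table lives in the same database, whereas a real archive would normally sit on separate, cheaper storage.

```python
import sqlite3


def archive_then_delete(conn: sqlite3.Connection, table: str,
                        date_col: str, cutoff: str) -> int:
    """Copy rows older than `cutoff` into <table>_archive, then delete them
    from the live table. Returns the number of rows moved.

    `table` and `date_col` must be trusted identifiers (they are placed
    directly into the SQL text); `cutoff` is a 'YYYY-MM-DD' string.
    """
    archive = f"{table}_archive"
    # Create an empty archive table with the same columns if it is missing.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {archive} AS SELECT * FROM {table} WHERE 0"
    )
    with conn:  # the copy and the delete commit together or roll back together
        copied = conn.execute(
            f"INSERT INTO {archive} SELECT * FROM {table} WHERE {date_col} < ?",
            (cutoff,),
        ).rowcount
        deleted = conn.execute(
            f"DELETE FROM {table} WHERE {date_col} < ?", (cutoff,)
        ).rowcount
        if copied != deleted:
            # Raising inside the `with` block rolls back both statements.
            raise RuntimeError(
                f"{table}: copied {copied} rows but deleted {deleted}"
            )
    return deleted
```

A script like this could be called from a job scheduler on a regular cadence, with the returned count written to a log so there is a record of what was moved and when.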

Cleaning out old or legacy data can be a complex and time-consuming process, but it is important to do so regularly to ensure that your data is accurate, complete, and secure. By following the tips above, you can make the cleaning process more efficient and effective.

Looking for more information about data management and other tips about maintaining a healthy database? Check out our ebook:

Navigating the modern data terrain: your expert-guided journey
Christy McGrath
VP Solution Management

Christy brings over 15 years of experience in data analysis, direct marketing, strategy, requirements management, systems implementation, and project management. She specializes in customer and prospect marketing database, campaign, reporting, and email solution implementations. Christy has a proven track record of helping clients implement complex marketing programs, including service reminders, welcomes, retention, upsell, and winback campaigns. She also provides solution architecture support and best practice recommendations to her clients, and leads a team of Solution Managers, Business Systems Analysts, Email Technology Leads, and Email Deliverability Analysts. Her experience spans the automotive, retail, telecommunications, non-profit, healthcare, insurance, and defense industries. She holds a Master of Business Administration and a Master of Finance degree from Boston College.