Data Scrubbing: A Central Tenet of Settlement Administration
Every business runs on data. Our organization is data-driven, and it shows at every layer of our work.
Whether it’s sales numbers, inventory, employee information, or financial records, data is central to every aspect of settlement administration.
Your success depends on the quality of this data, and so does ours. Settlement administration relies on a sophisticated, robust set of procedures that allow large volumes of data to be cleansed, parsed, segmented, catalogued, refined, and reworked into a usable form. These processes and workflows are collectively called ‘data scrubbing’.
Under this umbrella term sit several specific sub-concepts, including de-duplication, consolidation, validation, enrichment, formatting, extraction, standardization, normalization, and refactoring.
Kyle Heyne, Database Manager
Each process within a data scrub project is explained below.
Extraction usually happens in the early phase of a project: specific datasets are pulled from corporate databases using SQL or other database connector technologies. We do this as needed, since some situations call for a more consultative approach to data handling and formatting.
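As a rough illustration of extraction, the sketch below pulls the rows that fall inside a class period using a parameterized SQL query. It uses Python's built-in sqlite3 module with an in-memory database so it can run anywhere. The table name purchases, its columns, and the function extract_class_members are hypothetical stand-ins, not the layout of any real client system.

```python
import sqlite3


def extract_class_members(conn, start_date, end_date):
    """Return (customer_id, full_name, purchase_date) rows inside the class period."""
    # Placeholders (?) keep the dates out of the SQL text itself.
    query = (
        "SELECT customer_id, full_name, purchase_date "
        "FROM purchases "
        "WHERE purchase_date BETWEEN ? AND ? "
        "ORDER BY customer_id"
    )
    return conn.execute(query, (start_date, end_date)).fetchall()


# Hypothetical sample data standing in for a corporate database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, full_name TEXT, purchase_date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "Ada Example", "2019-03-01"),
        (2, "Bo Sample", "2021-07-15"),
        (3, "Cy Placeholder", "2019-11-30"),
    ],
)

rows = extract_class_members(conn, "2019-01-01", "2019-12-31")
```

Dates are stored here as ISO 8601 text (YYYY-MM-DD) so that a plain BETWEEN comparison sorts correctly. A production source would more likely use a native date type and its own connector.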
Cleaning typically means removing erroneous or invalid characters and correcting position errors or column misalignments. It can extend to replacing or changing values so they match the data expected for the scenario.
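A minimal sketch of the cleaning step might look like the following. It strips control characters, collapses stray whitespace, and rejects rows whose column count does not match the expected layout. The helper names clean_field and clean_row are illustrative assumptions, and the rules shown are a small sample of what a real project would need.

```python
import re
import unicodedata

_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
_WHITESPACE_RUN = re.compile(r"\s+")


def clean_field(value):
    """Remove invalid characters and tidy whitespace in a single field."""
    # NFKC folds compatibility characters (e.g. a non-breaking space) to plain forms.
    text = unicodedata.normalize("NFKC", value)
    # Replace control characters (tabs, NULs, line breaks) with a space.
    text = _CONTROL_CHARS.sub(" ", text)
    # Collapse runs of whitespace and trim the ends.
    return _WHITESPACE_RUN.sub(" ", text).strip()


def clean_row(row, expected_columns):
    """Clean every field and flag rows whose columns are misaligned."""
    if len(row) != expected_columns:
        raise ValueError(
            f"expected {expected_columns} columns, found {len(row)}: {row!r}"
        )
    return [clean_field(value) for value in row]


cleaned = clean_row(["  John\tSmith\x00 ", "Ann\u00a0Lee", " 123 Main  St "], 3)
```

Raising an error on a misaligned row, rather than guessing which column shifted, is a deliberate choice in this sketch: such rows usually need a person to look at them.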
Consolidation takes data from different sources and formats and streamlines it into a single master file that is easy to read. This includes reformatting large, often unstructured or improperly formatted data sets into an organized, legible arrangement of columns and rows.
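One way to picture consolidation is as a mapping from each source's column names onto a single master layout. The sketch below assumes two hypothetical sources, labeled crm and billing, that deliver CSV text with different headers and column orders. The names MASTER_COLUMNS, COLUMN_MAPS, and consolidate are invented for this example.

```python
import csv
import io

# The master layout every source is mapped into (hypothetical).
MASTER_COLUMNS = ["first_name", "last_name", "email"]

# Hypothetical per-source header mappings: source column -> master column.
COLUMN_MAPS = {
    "crm": {"FirstName": "first_name", "LastName": "last_name", "Email": "email"},
    "billing": {"fname": "first_name", "lname": "last_name", "email_addr": "email"},
}


def consolidate(sources):
    """Merge {source_name: csv_text} into one list of master-layout rows."""
    master = []
    for name, text in sources.items():
        mapping = COLUMN_MAPS[name]
        for record in csv.DictReader(io.StringIO(text)):
            # Start from empty master columns so every row has the same shape.
            row = {column: "" for column in MASTER_COLUMNS}
            for source_column, value in record.items():
                if source_column in mapping:
                    row[mapping[source_column]] = (value or "").strip()
            row["source"] = name  # keep provenance for later review
            master.append(row)
    return master


crm_csv = "FirstName,LastName,Email\nAda,Example,ada@example.com\n"
billing_csv = "lname,fname,email_addr\nSample, Bo ,bo@example.com\n"
master = consolidate({"crm": crm_csv, "billing": billing_csv})
```

Keeping a source column on each row makes it possible to trace any record in the master file back to the file it came from.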
Enrichment adds data from an external source to enhance the quality of the existing data. One example is supplying the location, contact information, historical records, and previous addresses for a list of class members.
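In code, enrichment often amounts to a keyed lookup against the external source. The sketch below assumes the external data has already been loaded into a dictionary keyed by a hypothetical member_id. The field names and the enrich function are illustrative only, and the rule of never overwriting a value the client supplied is an assumption of this example.

```python
def enrich(members, external_lookup):
    """Fill in missing fields on each member from an external lookup table."""
    enriched = []
    for member in members:
        merged = dict(member)  # copy so the original record is untouched
        extra = external_lookup.get(member["member_id"], {})
        for key, value in extra.items():
            # Only fill gaps; existing non-empty values are left alone.
            if not merged.get(key):
                merged[key] = value
        enriched.append(merged)
    return enriched


# Hypothetical class list and external data.
members = [
    {"member_id": "M001", "name": "Ada Example", "phone": "555-0100"},
    {"member_id": "M002", "name": "Bo Sample", "phone": ""},
    {"member_id": "M003", "name": "Cy Placeholder", "phone": ""},
]
external_lookup = {
    "M001": {"phone": "555-0199", "current_address": "1 Main St"},
    "M002": {"phone": "555-0142", "current_address": "22 Oak Ave"},
}

enriched = enrich(members, external_lookup)
```

Members with no match in the external source, such as M003 here, pass through unchanged.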
Validation checks the legitimacy of the data. Often we receive a list of Social Security numbers that requires formal validation. We can compare those numbers against the Social Security Administration database to confirm they are legitimate.
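Before any formal verification, a list can be screened with a simple structural check. The sketch below is only that kind of pre-check: it does not contact the Social Security Administration or confirm that a number was ever issued. It applies the published structural rules that an area number cannot be 000, 666, or in the 900 range, a group number cannot be 00, and a serial number cannot be 0000. The function name is_plausible_ssn is a hypothetical choice for this example.

```python
import re

# Nine digits, with or without hyphens after the third and fifth digit.
_SSN_PATTERN = re.compile(r"(\d{3})-?(\d{2})-?(\d{4})")


def is_plausible_ssn(value):
    """Return True if the value is structurally possible as an SSN."""
    match = _SSN_PATTERN.fullmatch(value.strip())
    if match is None:
        return False
    area, group, serial = match.groups()
    # Area numbers 000, 666, and 900-999 are never assigned.
    if area in ("000", "666") or area.startswith("9"):
        return False
    # Group 00 and serial 0000 are never assigned.
    return group != "00" and serial != "0000"


candidates = ["123-45-6789", "000-12-3456", "666-12-3456", "900-12-3456", "12-345-6789"]
plausible = [ssn for ssn in candidates if is_plausible_ssn(ssn)]
```

A check like this catches typos and placeholder values early, so that only well-formed numbers move on to formal validation.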
Normalizing data typically means refactoring and organizing the information into preset, preconfigured buckets or categories. For example, normalizing vehicle data forces each record into a series of data points such as Year, Make, Model, and Trim. These four data points are considered the fully normalized form of a vehicle's data.
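The vehicle example can be sketched as a small parser that forces a free-text description into the four data points. This is a simplified illustration under several assumptions: the KNOWN_MAKES table is a tiny hypothetical sample, the first leftover word is treated as the model, and everything after it is treated as the trim and upper-cased. Real vehicle data would need a far larger reference table and handling for multi-word makes and models.

```python
# Hypothetical reference table: lower-case token -> canonical make.
KNOWN_MAKES = {"honda": "Honda", "ford": "Ford", "toyota": "Toyota"}


def normalize_vehicle(raw):
    """Split a free-text vehicle description into Year, Make, Model, and Trim."""
    tokens = raw.replace(",", " ").split()
    year = ""
    make = ""
    remainder = []
    for token in tokens:
        if not year and token.isdigit() and len(token) == 4:
            year = token  # first four-digit number is taken as the year
        elif not make and token.lower() in KNOWN_MAKES:
            make = KNOWN_MAKES[token.lower()]
        else:
            remainder.append(token)
    # Assumption: first leftover token is the model, the rest is the trim.
    model = remainder[0].title() if remainder else ""
    trim = " ".join(remainder[1:]).upper()
    return {"Year": year, "Make": make, "Model": model, "Trim": trim}


first = normalize_vehicle("2015 honda civic ex-l")
second = normalize_vehicle("FORD, F-150 2018 xlt")
```

Whatever order the source values arrive in, each record leaves this step with the same four named fields, which is what makes the data comparable across sources.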