Test data generation tools are critical to the success of your software project. They generate the sample data used to execute test cases. But which is best? The answer depends on the unique needs of your business and DevOps team.
What is test data?
As the name implies, test data is data that DevOps teams use to test their applications in development. Developers need to ensure they are testing their applications under conditions that closely approximate actual production environments.
For example, a DevOps team would want to simulate a high volume of users performing a variety of actions to ensure the application will be able to sustain such traffic. Testing the application with realistic data makes the development process more robust and helps developers catch errors that could come back to haunt them once the app is in production.
In some cases, there may already be real data available for testing. For example, if developers are working on an update to an existing application, there may be a wealth of data they can use to test the features of the new version. But for brand new applications, no such data exists yet.
That’s where test data generator tools come into play. They help developers create sensible data sets that mirror realistic data, and DevOps teams turn to them when no existing data is available.
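To make the idea concrete, here is a minimal sketch of what a test data generator does at its simplest, using only Python's standard library. The name lists, field names, and the `make_users` helper are hypothetical and chosen for illustration; the tools covered in this article draw on far larger dictionaries and many more data types.

```python
import random

# Hypothetical sample values; a real tool would use much larger dictionaries.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Elena"]
LAST_NAMES = ["Ng", "Okafor", "Patel", "Rossi", "Smith"]


def make_user(user_id):
    """Build one realistic-looking user record."""
    first = random.choice(FIRST_NAMES)
    last = random.choice(LAST_NAMES)
    return {
        "id": user_id,
        "name": f"{first} {last}",
        # The id is part of the address, so every email is unique.
        "email": f"{first}.{last}.{user_id}@example.com".lower(),
        "age": random.randint(18, 90),
    }


def make_users(count):
    """Build `count` user records with ids 1..count."""
    return [make_user(i) for i in range(1, count + 1)]


users = make_users(1000)
```

Even a sketch like this is enough to populate a table or drive a simple load test, although it offers none of the masking, referential integrity, or format support that dedicated tools provide.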
Top test data generator tools comparison
Here are our top picks for test data generator tools:
Broadcom Test Data Manager
Broadcom’s Test Data Manager tool provides the capability to quickly locate, secure, design, create, and provision ‘fit for purpose’ test data. This helps optimize test cycles so DevOps teams can deliver applications faster. Test Data Manager can also enhance the quality of production data by filling gaps in test data coverage, thus creating all the data needed to cover continuous testing requirements.
Organizations can use Test Data Manager to find data and match it to specific tests, then provision that data automatically, on demand and in parallel. Some organizations have reported a 90 to 95 percent reduction in the time taken to provision high-quality test data.
Key differentiators
- Rated as the top champion in a recent Bloor Research report.
- Has the highest practitioner rating on Gartner Peer Insights for test data masking solutions.
- Can build a data model from heterogeneous data sources and scan for PII.
- Provides secure masked test data to application teams.
- Generates synthetic test data to increase test data coverage.
- Performs PII audits to ensure compliance with industry regulations.
- Centrally stores data as a reusable asset.
- Provides self-service forms to find, view, analyze, and observe test data.
DTM Data Generator
DTM Data Generator produces high-quality, realistic test data. It automatically creates data values or schema objects such as views, procedures, tables, and triggers. Use cases include test database population, performance analysis, QA testing, and load testing.
Key differentiators
- Supports IBM DB2, MySQL, Firebird, Oracle, and Microsoft SQL Server.
- Offers 15 methods to fill in fields with random and repeatable data models.
- Includes 70 built-in functions and an expression processor to define complex test data with dependencies, relationships, and internal structure.
- Desktop formats include SQLite, Microsoft Access, Excel, and DBF.
- Supports unified database interfaces (ODBC, OLE DB) and native Oracle Call Interface.
- Supports a rich set of external data sources such as databases, CSV/text files, Excel spreadsheets, XML documents, JSON files, Access files, web resources, and user-defined scripts.
Generatedata.com
Generatedata.com is an open-source project that can be downloaded from GitHub. It requires no developer experience to set up and configure, so it’s extremely user-friendly. This web tool is written in PHP, JavaScript, and MySQL. It allows you to quickly generate large volumes of custom data (up to 5,000 records at a time) in a variety of formats for use in testing software and populating databases.
Key differentiators
- Easy-to-use interface.
- Allows you to preview what you’re generating while you’re building it.
- Includes 30+ types of data to generate such as names, emails, countries, and others.
- Offers 10+ generation formats including JSON, CSV, XML, and SQL.
- Interconnects data like related country, region, city, etc.
- Saves data sets with a user account.
- Offers an online demo to provide a sense of what the script does, what features it offers, and how it works.
EMS Data Generator
EMS Data Generator is a tool for simulating a database production environment. It can be used to populate multiple tables with test data concurrently and create custom tables and fields for test data generation. EMS Data Generator also allows you to preview the generated data and edit it in the SQL script without executing queries on the server.
Key differentiators
- Offers discrete editions for SQL Server, Oracle, MySQL, InterBase, DB2, PostgreSQL and others.
- Allows you to generate SQL Server test data using generation templates.
- Provides automatic control over referential integrity for linked tables.
- Offers customization options for the data generation process.
- Offers different generation types for each field, including list generation, random generation, and generation into two or more fields simultaneously.
- Includes a command-line utility to generate data using the template file.
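Referential integrity for linked tables is easier to see in code. The sketch below is not how EMS Data Generator works internally; it is a hypothetical illustration in plain Python, with made-up table and column names, of the underlying idea: generate parent rows first, then draw every foreign key in the child table from the keys that actually exist.

```python
import random


def generate_linked_tables(num_customers, num_orders, seed=0):
    """Generate a parent table and a child table with valid foreign keys."""
    rng = random.Random(seed)

    # Parent rows are created first so their keys are known.
    customers = [
        {"customer_id": i, "name": f"Customer {i}"}
        for i in range(1, num_customers + 1)
    ]
    customer_ids = [c["customer_id"] for c in customers]

    # Each child row picks its foreign key from the existing parent keys,
    # so no order can point at a customer that does not exist.
    orders = [
        {
            "order_id": i,
            "customer_id": rng.choice(customer_ids),
            "total": round(rng.uniform(5, 250), 2),
        }
        for i in range(1, num_orders + 1)
    ]
    return customers, orders


customers, orders = generate_linked_tables(num_customers=3, num_orders=10)

# Sanity check: collect any orders whose customer_id has no matching parent.
valid_ids = {c["customer_id"] for c in customers}
orphans = [o for o in orders if o["customer_id"] not in valid_ids]
```

Here `orphans` comes out empty by construction. A database-aware tool does the equivalent work by reading the foreign key constraints from the schema instead of relying on hand-written ordering.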
Mockaroo
Mockaroo can design mock APIs, which lets teams develop the UI and the API in parallel and deliver applications faster. It is designed to simplify the development process by allowing non-programmers to quickly and easily download large amounts of randomly generated test data based on specific requirements.
Key differentiators
- Can generate up to 1,000 rows of realistic test data in CSV, JSON, SQL, and Excel formats.
- Available as a Docker image to deploy in a private cloud.
- Those with Google accounts can download random data programmatically.
- Does not require programming.
- Supports Base64 image URL type and repeating XML elements.
- Applies formulas to any data type, sets custom frequencies for lists, and restricts locations to specific countries.
Redgate SQL Data Generator
Redgate’s SQL Data Generator creates realistic data rapidly. It provides generators based on table and column names, data types, field length, and other constraints. This test data tool supports inter-column dependency, command-line data generation, and automatic data conversion when your sources have different data types.
Key differentiators
- Manual control for creating foreign key data.
- Seeded random data generation creates the same collection of data every time.
- Imports data from existing sources, and optionally disables triggers and constraints to avoid interfering with database logic.
- Generates test data at row level.
- Offers over 60 built-in generators with sensible configuration options.
- Lets you write custom generators in Python.
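The seeded generation mentioned in the list above is a general technique, not something specific to one product. The following sketch shows the principle with Python's standard `random.Random` class; it makes no claim about Redgate's implementation, and the `generate_rows` function and its column names are hypothetical.

```python
import random


def generate_rows(seed, count):
    """Return `count` pseudo-random order rows derived from `seed`."""
    # A private generator keeps the sequence independent of global state.
    rng = random.Random(seed)
    return [
        {
            "order_id": i,
            "quantity": rng.randint(1, 20),
            "price": round(rng.uniform(1, 500), 2),
        }
        for i in range(count)
    ]


# Two runs with the same seed produce identical data sets.
first_run = generate_rows(seed=2024, count=5)
second_run = generate_rows(seed=2024, count=5)
```

Reproducible data matters when a test fails: rerunning with the same seed recreates the exact rows that triggered the failure, which is much harder to do with data that changes on every run.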
DATPROF Privacy
DATPROF Privacy masks test data and generates synthetic data. This ensures that customer data is protected, but software teams can still use it as representative test data without violating privacy or security rules. Data masking is part of some compliance requirements, so this is an especially good tool for organizations in heavily regulated industries like healthcare and financial services.
Key differentiators
- Preserves data characteristics.
- High performance on large data sets.
- Consistent over multiple applications and databases.
- XML and CSV file support.
- Synthetic data generation.
- Masks privacy-sensitive data and maintains compliance with regulations such as GDPR, PCI, and HIPAA.
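Two of the properties listed above, preserving data characteristics and staying consistent across applications and databases, can be illustrated with deterministic masking. This is a minimal sketch under stated assumptions and not DATPROF's method: it uses a keyed hash (HMAC-SHA256 from Python's standard library) so that the same input always maps to the same masked value, and the `mask_email` helper and the key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real deployment would load a secret
# from a secure store and never hard-code it.
SECRET_KEY = b"replace-with-a-real-secret"


def mask_email(email, key=SECRET_KEY):
    """Replace an email address with a stable, non-reversible stand-in."""
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    # The result still looks like an email, so format validation keeps working.
    return f"user_{digest[:12]}@example.com"


# The same address masks to the same value wherever it appears, so joins
# between masked tables or databases still line up.
masked_in_crm = mask_email("Jane.Doe@corp.example")
masked_in_billing = mask_email("jane.doe@corp.example")
masked_other = mask_email("john.roe@corp.example")
```

Because the mapping depends on the key, someone without it cannot rebuild the lookup table simply by hashing candidate addresses, yet every system that masks with the same key produces matching values.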
DataMasque
Similarly, the DataMasque app helps reduce compliance exposure by masking data before it’s used for application development. It takes actual customer or application database information and masks the sensitive details while keeping the data usable. All data transmissions between DataMasque and data sources are encrypted, so your data isn’t at risk when it’s in transit.
Key differentiators
- Available on AWS and Cohesity marketplaces.
- Accelerates application development with high fidelity data without compromising data sovereignty.
- Safeguards sensitive information from malicious actors without rendering data unusable for consumers.
- Avoids the use of simplistic masking techniques like data shuffling that offer inadequate protection.
- Generates realistic, diverse, and functional datasets representative of production data.
How to choose the right test data generator
There is no single test data generator that works for every data set and application. The task is to match the generator to the workloads involved. During selection, it is therefore best to try several tools and adopt the one that fits best.
Some solutions only work with specific databases and platforms. They also generate data sets in different ways depending on how their algorithms are written. Some tools provide worthwhile sample databases for testing, while others do not. Finding the right fit often takes some trial and error.
Feature-wise, some platforms are basic and only provide a data set. Others add security, data import and export, reporting, and other functions in addition to basic test data generation capabilities.
In some cases, it may be wise to talk to stakeholders on your team who may be involved in similar development projects. They may be able to offer advice on the tools they find most valuable.