Privacy-safe test data on demand

Generate realistic synthetic datasets from a simple JSON schema

Define column types, ranges, and categories. Get a statistically coherent dataset instantly: no real user data, no compliance risk.

Get started free View API docs

Why Synthetic Dataset Generator?

Real data has compliance risk. Synthetic data has none.

🗂️

Schema-driven generation

Describe your dataset as a JSON schema with column names, types, and value ranges. No data science expertise required.

🔢

Core column types supported

Integer, float, boolean, and category columns, each with configurable ranges, distributions, or value lists.

📊

Up to 100k rows

Generate small test fixtures or large-scale training datasets. Configurable row count up to 100,000 per request.

🌱

Reproducible with seeds

Pass a random seed and get the exact same dataset every time, which is essential for reproducible ML experiments.
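The idea behind seeding can be sketched locally with Python's standard `random` module. This illustrates seeded generation in general and is not the service's implementation; `sample_ages` is a hypothetical helper.

```python
import random

def sample_ages(seed: int, n: int = 5) -> list[int]:
    """Hypothetical helper: draw n integers from a dedicated, seeded generator."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    return [rng.randint(18, 90) for _ in range(n)]

first = sample_ages(seed=42)
second = sample_ages(seed=42)
assert first == second  # same seed, same values
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the result independent of any other code that draws random numbers.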

📥

Instant CSV export

Download your dataset as CSV in one click, or pipe the base64-encoded output directly into your ML pipeline.

🛡️

GDPR-safe by design

No real personal data ever enters the system. Fully synthetic output means zero compliance exposure.

How it works

Three steps to a clean, shareable dataset

1

Define your schema

Specify column names, types (int, float, bool, category), and value constraints in a simple JSON object.
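As a rough illustration, a schema covering the four column types might look like the sketch below, built as a Python dict and serialized to JSON. The key names (`rows`, `seed`, `columns`, `min`, `max`, `precision`, `values`) are assumptions made for this example; check the API docs for the real field names.

```python
import json

# Hypothetical schema layout: every key name here is an assumption
# for illustration, not the documented request format.
schema = {
    "rows": 1000,
    "seed": 42,
    "columns": [
        {"name": "age", "type": "int", "min": 18, "max": 90},
        {"name": "basket_total", "type": "float", "min": 0.0, "max": 500.0, "precision": 2},
        {"name": "is_subscribed", "type": "bool"},
        {"name": "plan", "type": "category", "values": ["free", "pro"]},
    ],
}

payload = json.dumps(schema, indent=2)  # the JSON text you would submit
print(payload)
```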

2

API generates statistically valid rows

Our engine samples values according to your spec (uniformly distributed numerics within your bounds, weighted categories), ensuring statistical coherence across the full dataset.
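To make the sampling step concrete, here is a minimal local sketch of per-type sampling using only Python's standard library. It is not the engine's actual code: the `generate_rows` function, the column dict keys, and the optional `weights` key are all assumptions for illustration.

```python
import random

def generate_rows(columns, n_rows, seed=None):
    """Sketch: sample one value per column per row from a seeded RNG."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = {}
        for col in columns:
            kind = col["type"]
            if kind == "int":
                # Uniform over the inclusive range [min, max]
                row[col["name"]] = rng.randint(col["min"], col["max"])
            elif kind == "float":
                value = rng.uniform(col["min"], col["max"])
                row[col["name"]] = round(value, col.get("precision", 2))
            elif kind == "bool":
                row[col["name"]] = rng.random() < 0.5
            elif kind == "category":
                weights = col.get("weights")  # None means equal weights
                row[col["name"]] = rng.choices(col["values"], weights=weights, k=1)[0]
            else:
                raise ValueError(f"unsupported column type: {kind}")
        rows.append(row)
    return rows

columns = [
    {"name": "age", "type": "int", "min": 18, "max": 90},
    {"name": "plan", "type": "category", "values": ["free", "pro"], "weights": [0.8, 0.2]},
]
rows = generate_rows(columns, n_rows=3, seed=42)
```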

3

Download CSV or pipe to your test environment

Get the dataset as a downloadable CSV file or consume the base64-encoded API response directly in your CI/CD pipeline.
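Consuming a base64-encoded CSV in a pipeline step could look roughly like this. The response shape, including the `csv_base64` field name, is an assumption; the sample payload is built locally so the snippet runs without network access.

```python
import base64
import csv
import io

def decode_csv(csv_base64: str) -> list[dict]:
    """Decode base64 text into CSV and parse it into one dict per row."""
    csv_text = base64.b64decode(csv_base64).decode("utf-8")
    return list(csv.DictReader(io.StringIO(csv_text)))

# Stand-in for an API response body; the "csv_base64" field name is an assumption.
sample_csv = "age,plan\n34,free\n51,pro\n"
response = {"csv_base64": base64.b64encode(sample_csv.encode("utf-8")).decode("ascii")}

rows = decode_csv(response["csv_base64"])
```

Note that `csv.DictReader` returns every value as a string, so numeric columns need an explicit conversion before use in tests.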

Try it now

No account needed. Define a schema and generate your first dataset in seconds.

Open the tool →

Simple pricing

Start free. Scale when you need to.

Free
$0/month
  • 20 API calls/day
  • Up to 1,000 rows
  • JSON + CSV output
  • Community support
Get started
Pro
$99/month
  • 10,000 API calls/day
  • Up to 100,000 rows
  • Priority support
  • SLA 99.9%
  • Custom distributions
Contact us

FAQ

Is generated data statistically realistic?

Yes. Values are drawn from configurable distributions (uniform for numerics, weighted sampling for categories). The result is coherent across all columns and suitable for model training and integration testing.

What column types are supported?

We support int (with min/max), float (with min/max and decimal precision), bool, and category (with a list of possible string values). More types coming soon.
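Before sending a schema, it can help to check each column locally against the rules listed above. The sketch below is a hypothetical client-side check, not part of the product; the dict keys (`type`, `min`, `max`, `values`) are assumed names.

```python
ALLOWED_TYPES = {"int", "float", "bool", "category"}

def validate_column(col: dict) -> list[str]:
    """Return a list of problems found in one column definition (empty if none)."""
    kind = col.get("type")
    if kind not in ALLOWED_TYPES:
        return [f"unknown type: {kind!r}"]
    errors = []
    if kind in ("int", "float"):
        if "min" not in col or "max" not in col:
            errors.append("numeric columns need min and max")
        elif col["min"] > col["max"]:
            errors.append("min must not exceed max")
    if kind == "category" and not col.get("values"):
        errors.append("category columns need a non-empty values list")
    return errors

print(validate_column({"name": "age", "type": "int", "min": 90, "max": 18}))
```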

How many rows can I generate per request?

Free tier: up to 1,000 rows. Starter: up to 50,000 rows. Pro: up to 100,000 rows. For larger datasets, batch multiple requests and concatenate the results.
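Concatenating batches mostly comes down to keeping the CSV header once. A minimal sketch, assuming each batch arrives as CSV text with an identical header row (`concat_csv` is a hypothetical helper):

```python
import csv
import io

def concat_csv(chunks: list[str]) -> str:
    """Join CSV texts that share a header, writing the header only once."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for chunk in chunks:
        rows = list(csv.reader(io.StringIO(chunk)))
        if not rows:
            continue  # skip empty batches
        if header is None:
            header = rows[0]
            writer.writerow(header)
        elif rows[0] != header:
            raise ValueError("chunks have different headers")
        writer.writerows(rows[1:])
    return out.getvalue()

combined = concat_csv(["age,plan\n34,free\n", "age,plan\n51,pro\n"])
```

If you pass a seed, it is probably worth varying it per batch: since the same seed is described as returning the same dataset, reusing one seed would presumably yield duplicate rows across batches.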

Is this GDPR-safe to use for development?

Completely. No real personal data is involved at any stage. The output is fully synthetic and carries no legal or compliance obligations under GDPR, CCPA, or HIPAA.