Privacy-safe test data on demand

Generate realistic synthetic datasets from a simple JSON schema

Define column types, ranges, and categories. Get a statistically coherent dataset instantly — no real user data, no compliance risk.

Get started free | View API docs

Why Synthetic Dataset Generator?

Real data has compliance risk. Synthetic data has none.

🗂️

Schema-driven generation

Describe your dataset as a JSON schema with column names, types, and value ranges. No data science expertise required.

🔢

Four core column types

Integer, float, boolean, and category columns — each with configurable ranges, distributions, or value lists.

📊

Up to 100k rows

Generate small test fixtures or large-scale training datasets, with a configurable row count of up to 100,000 per request on the Pro plan.

🌱

Reproducible with seeds

Pass the same seed with the same schema and get the exact same dataset every time, which is essential for reproducible ML experiments.
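A minimal local illustration of the principle, assuming the engine draws values from a seeded pseudo-random generator. The service's internals are not documented here; `sample_ages` is a hypothetical helper built on Python's standard `random.Random`.

```python
import random


def sample_ages(seed: int, n: int) -> list[int]:
    """Hypothetical helper: draw n integers in [18, 65] from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator; leaves the global random state untouched
    return [rng.randint(18, 65) for _ in range(n)]


first = sample_ages(seed=42, n=5)
second = sample_ages(seed=42, n=5)

print(first == second)  # True: same seed, same values
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the result independent of any other code that happens to consume random numbers.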

📥

Instant CSV export

Download your dataset as CSV in one click, or pipe the base64-encoded output directly into your ML pipeline.
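A sketch of consuming the base64-encoded output in memory, assuming the payload decodes to UTF-8 CSV text with a header row. The sample string is built locally for illustration and is not real API output.

```python
import base64
import csv
import io


def decode_csv_rows(encoded: str) -> list[dict[str, str]]:
    """Decode a base64 CSV payload into a list of row dicts keyed by header name."""
    text = base64.b64decode(encoded).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Locally built stand-in for an API payload (illustrative only).
sample = base64.b64encode(b"age,plan\n34,pro\n51,free\n").decode("ascii")

rows = decode_csv_rows(sample)
print(rows[0]["age"], rows[0]["plan"])  # 34 pro
```

Note that `csv.DictReader` returns every value as a string, so numeric columns need an explicit conversion before use in a model.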

🛡️

GDPR-safe by design

No real personal data ever enters the system. Fully synthetic output means zero compliance exposure.

How it works

Three steps to a clean, shareable dataset

Define your schema

Specify column names, types (int, float, bool, category), and value constraints in a simple JSON object.
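A sketch of what such a schema might look like, written as a Python dict and serialized with the standard `json` module. The field names (`rows`, `seed`, `columns`, `name`, `type`, `min`, `max`, `precision`, `values`, `weights`) are assumptions for illustration, not the documented API format.

```python
import json

# Hypothetical request body: field names are assumptions, not the documented API.
request_body = {
    "rows": 1000,
    "seed": 42,
    "columns": [
        {"name": "age", "type": "int", "min": 18, "max": 90},
        {"name": "income", "type": "float", "min": 0.0, "max": 250000.0, "precision": 2},
        {"name": "is_subscriber", "type": "bool"},
        {
            "name": "plan",
            "type": "category",
            "values": ["free", "starter", "pro"],
            "weights": [0.7, 0.2, 0.1],  # assumed field for weighted categories
        },
    ],
}

payload = json.dumps(request_body, indent=2)
print(payload)
```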

API generates rows that follow your spec

Our engine samples each value according to your spec (uniform draws within bounds for numeric columns, weighted sampling for categories), so every row respects the types, ranges, and category lists you defined.
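A local sketch of this kind of sampling, not the service's engine. It assumes each column is drawn independently from one seeded generator: uniform within bounds for `int` and `float`, a 50/50 split for `bool` (an assumption), and weighted choice for `category`. The column format (`type`, `min`, `max`, `precision`, `values`, `weights`) is hypothetical.

```python
import random
from typing import Any


def sample_value(col: dict[str, Any], rng: random.Random) -> Any:
    """Draw one value for a column spec (hypothetical format)."""
    kind = col["type"]
    if kind == "int":
        return rng.randint(col["min"], col["max"])  # both bounds inclusive
    if kind == "float":
        value = rng.uniform(col["min"], col["max"])
        return round(value, col.get("precision", 2))  # default precision of 2 is an assumption
    if kind == "bool":
        return rng.random() < 0.5  # assumed 50/50 split
    if kind == "category":
        # weights=None gives every value equal probability
        return rng.choices(col["values"], weights=col.get("weights"), k=1)[0]
    raise ValueError(f"unsupported column type: {kind!r}")


def generate_rows(columns: list[dict[str, Any]], n_rows: int, seed: int) -> list[dict[str, Any]]:
    """Generate n_rows rows, sampling every column independently from one seeded RNG."""
    rng = random.Random(seed)
    return [{col["name"]: sample_value(col, rng) for col in columns} for _ in range(n_rows)]


columns = [
    {"name": "age", "type": "int", "min": 18, "max": 90},
    {"name": "income", "type": "float", "min": 0.0, "max": 1000.0, "precision": 2},
    {"name": "active", "type": "bool"},
    {"name": "plan", "type": "category", "values": ["free", "pro"], "weights": [0.8, 0.2]},
]
rows = generate_rows(columns, n_rows=3, seed=1)
for row in rows:
    print(row)
```

Drawing every column from a single seeded generator keeps the output reproducible. Because this sketch samples columns independently, it produces no correlations between columns.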

Download CSV or pipe to your test environment

Get the dataset as a downloadable CSV file or consume the base64-encoded API response directly in your CI/CD pipeline.
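A sketch of the CI step, assuming the JSON response carries the CSV in a field named `csv_base64` (a hypothetical name; the API docs define the real one). The helper writes the decoded bytes to a fixture path that later test steps can read.

```python
import base64
import json
import tempfile
from pathlib import Path


def save_fixture(response_json: str, dest: Path) -> Path:
    """Decode the hypothetical `csv_base64` field of a response and write it to dest."""
    payload = json.loads(response_json)
    csv_bytes = base64.b64decode(payload["csv_base64"])
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(csv_bytes)  # raw bytes, so line endings are preserved as sent
    return dest


# Locally built stand-in for an API response (illustrative only).
fake_response = json.dumps(
    {"csv_base64": base64.b64encode(b"id,score\n1,0.5\n2,0.9\n").decode("ascii")}
)

with tempfile.TemporaryDirectory() as tmp:
    path = save_fixture(fake_response, Path(tmp) / "fixtures" / "users.csv")
    print(path.read_text())
```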

Simple pricing

Start free. Scale when you need to.

Free

$0/month

20 API calls/day
Up to 1,000 rows per request
JSON + CSV output
Community support

Get started

Starter

$29/month

500 API calls/day
Up to 50,000 rows per request
Seed-based reproducibility
Email support
Usage dashboard

Start 14-day trial

Pro

$99/month

10,000 API calls/day
Up to 100,000 rows per request
Priority support
SLA 99.9%
Custom distributions

FAQ

Is generated data statistically realistic?

Yes, within the limits of your schema. Values are drawn from configurable distributions (uniform for numerics, weighted sampling for categories), and every column respects the types and constraints you defined. The result is suitable for model training and integration testing.

What column types are supported?

We support int (with min/max), float (with min/max and decimal precision), bool, and category (with a list of possible string values). More types coming soon.
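A client-side check of these four column types can catch schema mistakes before a request is sent. This is a hypothetical helper written against an assumed field layout (`name`, `type`, `min`/`max`, `precision`, `values`); the service performs its own validation, which may differ.

```python
from typing import Any


def validate_column(col: dict[str, Any]) -> list[str]:
    """Return a list of problems with one column spec (an empty list means it looks valid)."""
    errors: list[str] = []
    name, kind = col.get("name"), col.get("type")
    if not name:
        errors.append("missing column name")
    if kind in ("int", "float"):
        lo, hi = col.get("min"), col.get("max")
        if lo is None or hi is None:
            errors.append(f"{name}: {kind} needs min and max")
        elif lo > hi:
            errors.append(f"{name}: min {lo} is greater than max {hi}")
        if kind == "float" and "precision" in col and not isinstance(col["precision"], int):
            errors.append(f"{name}: precision must be an integer")
    elif kind == "category":
        values = col.get("values")
        if not values or not all(isinstance(v, str) for v in values):
            errors.append(f"{name}: category needs a non-empty list of strings")
    elif kind != "bool":
        errors.append(f"{name}: unsupported type {kind!r}")
    return errors


print(validate_column({"name": "age", "type": "int", "min": 90, "max": 18}))
# ['age: min 90 is greater than max 18']
print(validate_column({"name": "plan", "type": "category", "values": ["free", "pro"]}))
# []
```

Returning a list of messages instead of raising on the first problem lets a caller report every mistake in a schema at once.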

How many rows can I generate per request?

Free tier: up to 1,000 rows. Starter: up to 50,000 rows. Pro: up to 100,000 rows. For larger datasets, send multiple requests and concatenate the results.
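One way to plan such a batch, sketched under stated assumptions: each request gets its own seed so batches are not copies of one another, and the whole plan stays reproducible from one base seed. `plan_batches` and its tier table are hypothetical, built from the per-request row limits in this answer.

```python
import math

# Per-request row limits from the pricing tiers; the dict itself is illustrative.
MAX_ROWS_PER_REQUEST = {"free": 1_000, "starter": 50_000, "pro": 100_000}


def plan_batches(total_rows: int, tier: str, base_seed: int) -> list[tuple[int, int]]:
    """Split total_rows into (seed, rows) requests that respect the tier's per-request cap.

    Each batch gets a distinct seed (base_seed + i) so no batch repeats another,
    while the full plan can be regenerated from base_seed alone.
    """
    cap = MAX_ROWS_PER_REQUEST[tier]
    n_batches = math.ceil(total_rows / cap)
    plan = []
    remaining = total_rows
    for i in range(n_batches):
        rows = min(cap, remaining)
        plan.append((base_seed + i, rows))
        remaining -= rows
    return plan


print(plan_batches(250_000, "pro", base_seed=42))
# [(42, 100000), (43, 100000), (44, 50000)]
```

Daily quotas matter too: a 250,000-row Pro dataset takes three calls out of 10,000 per day, while 20,000 rows on the Free tier would use all 20 daily calls.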

Is this GDPR-safe to use for development?

Yes. No real personal data is involved at any stage: every value is sampled from the schema you define, so the output describes no real individuals and contains no personal data within the scope of GDPR, CCPA, or HIPAA.