Define column types, ranges, and categories. Get a statistically coherent dataset instantly, with no real user data and no compliance risk.
Real data has compliance risk. Synthetic data has none.
Describe your dataset as a JSON schema with column names, types, and value ranges. No data science expertise required.
Integer, float, boolean, and category columns, each with configurable ranges, distributions, or value lists.
Generate small test fixtures or large-scale training datasets. Configurable row count up to 100,000 per request.
Pass a random seed and get the exact same dataset every time, which is essential for reproducible ML experiments.
Download your dataset as CSV in one click, or pipe the base64-encoded output directly into your ML pipeline.
No real personal data ever enters the system. Fully synthetic output means zero compliance exposure.
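As a rough illustration of the features above, here is a minimal Python sketch of what a schema and a base64-encoded CSV response might look like on the client side. The schema field names (`rows`, `seed`, `columns`, `type`, `min`, `max`, `precision`, `values`) are assumptions made for this example, not the documented format, and the "response" is a small CSV encoded locally as a stand-in for real API output.

```python
import base64
import csv
import io
import json

# Hypothetical schema layout. Every key name here is an illustrative
# assumption; check the actual API reference for the real field names.
schema = {
    "rows": 3,
    "seed": 42,
    "columns": [
        {"name": "age", "type": "int", "min": 18, "max": 90},
        {"name": "score", "type": "float", "min": 0.0, "max": 1.0, "precision": 2},
        {"name": "active", "type": "bool"},
        {"name": "plan", "type": "category", "values": ["free", "starter", "pro"]},
    ],
}
request_body = json.dumps(schema)  # JSON text you would send to the service


def decode_csv_payload(b64_payload: str) -> list[dict[str, str]]:
    """Decode a base64-encoded CSV string into a list of row dictionaries."""
    text = base64.b64decode(b64_payload).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for an API response: a tiny CSV encoded locally for demonstration.
sample_csv = "age,score,active,plan\n34,0.72,True,pro\n51,0.13,False,free\n"
sample_payload = base64.b64encode(sample_csv.encode("utf-8")).decode("ascii")

rows = decode_csv_payload(sample_payload)
print(rows[0]["plan"])  # prints: pro
```

Note that `csv.DictReader` returns every value as a string, so numeric and boolean columns would still need to be converted before use in a pipeline.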
Three steps to a clean, shareable dataset
Specify column names, types (int, float, bool, category), and value constraints in a simple JSON object.
Our engine samples values according to your spec (uniform, bounded integers, weighted categories), ensuring statistical coherence across the full dataset.
Get the dataset as a downloadable CSV file or consume the base64-encoded API response directly in your CI/CD pipeline.
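To make the sampling step more concrete, the sketch below shows one way a seeded generator for these four column types could work locally. It is an illustration under stated assumptions, not the service's actual engine: the function name `generate_rows`, the column keys, and the optional `weights` key are all hypothetical.

```python
import random


def generate_rows(columns, n_rows, seed=None):
    """Sample n_rows rows from a list of column specs (hypothetical format)."""
    rng = random.Random(seed)  # private generator, so a seed fully fixes the output
    rows = []
    for _ in range(n_rows):
        row = {}
        for col in columns:
            kind = col["type"]
            if kind == "int":
                # Uniform integer, both bounds inclusive
                row[col["name"]] = rng.randint(col["min"], col["max"])
            elif kind == "float":
                # Uniform float, rounded to the requested decimal precision
                value = rng.uniform(col["min"], col["max"])
                row[col["name"]] = round(value, col.get("precision", 2))
            elif kind == "bool":
                row[col["name"]] = rng.random() < 0.5
            elif kind == "category":
                # weights=None means every value is equally likely
                weights = col.get("weights")
                row[col["name"]] = rng.choices(col["values"], weights=weights, k=1)[0]
            else:
                raise ValueError(f"unsupported column type: {kind}")
        rows.append(row)
    return rows


columns = [
    {"name": "age", "type": "int", "min": 18, "max": 90},
    {"name": "score", "type": "float", "min": 0.0, "max": 1.0, "precision": 2},
    {"name": "active", "type": "bool"},
    {"name": "plan", "type": "category", "values": ["free", "pro"], "weights": [3, 1]},
]

first = generate_rows(columns, 5, seed=42)
second = generate_rows(columns, 5, seed=42)
print(first == second)  # prints: True
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the output independent of any other random calls in the same program, which is what makes the same seed reproduce the same rows.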
No account needed. Define a schema and generate your first dataset in seconds.
Start free. Scale when you need to.
Yes. Values are drawn from configurable distributions (uniform for numerics, weighted sampling for categories). The result is coherent across all columns and suitable for model training and integration testing.
We support int (with min/max), float (with min/max and decimal precision), bool, and category (with a list of possible string values). More types coming soon.
Free tier: up to 1,000 rows per request. Starter: up to 50,000. Pro: up to 100,000. For larger datasets, batch multiple requests and concatenate the results.
Completely. No real personal data is involved at any stage. The output is fully synthetic and carries no legal or compliance obligations under GDPR, CCPA, or HIPAA.
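For datasets above the per-request row limit, the batch-and-concatenate approach mentioned in the row-limit answer could be sketched as follows. The helper names `plan_batches` and `concat_csv` are hypothetical, and the sketch assumes every batch comes back as CSV text with the same header row; the actual request call is left out because its shape is not specified here.

```python
import csv
import io


def plan_batches(total_rows, max_per_request):
    """Split a total row count into request sizes no larger than the limit."""
    if total_rows < 0 or max_per_request <= 0:
        raise ValueError("total_rows must be >= 0 and max_per_request must be > 0")
    full, rest = divmod(total_rows, max_per_request)
    return [max_per_request] * full + ([rest] if rest else [])


def concat_csv(parts):
    """Join CSV texts that share a header, writing the header only once."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for part in parts:
        reader = csv.reader(io.StringIO(part))
        part_header = next(reader, None)
        if part_header is None:
            continue  # skip empty parts
        if header is None:
            header = part_header
            writer.writerow(header)
        elif part_header != header:
            raise ValueError("CSV parts have different headers")
        writer.writerows(reader)
    return out.getvalue()


print(plan_batches(250_000, 100_000))  # prints: [100000, 100000, 50000]

# Stand-ins for two decoded batch responses
merged = concat_csv(["age,plan\n34,pro\n", "age,plan\n51,free\n"])
print(merged.count("\n"))  # prints: 3
```

Since the same seed reproduces the same dataset, each batch would presumably need its own seed (for example, a base seed plus the batch index); otherwise the batches could contain identical rows.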