What does MinusX do?
A process to model your entire company's data
Why Most 'Talk to Data' Falls Short
Most 'Talk to Data solutions' fails because they lack the right context. Without clear data models, AI systems are forced to guess at ambiguous business terms, leading to inconsistent answers across your organization. Complex queries become error-prone, and tribal knowledge stays siloed within individual teams rather than being codified into a shared understanding.
Unlike coding, where the cost of wrong code can be reviewed and fixed, the users of data are often not proficient in SQL or data modeling. A system that is 90% accurate is not good enough for data as errors compound. This is why most AI solutions fail to deliver on their promises.
To get accurate AI-powered data insights, you need well-modeled data — a clear semantic layer where business entities, metrics, and relationships are unambiguous. This is where dimensional modeling comes in: properly structured facts and dimensions that represent your business logic consistently.
Building this semantic layer manually takes months of data engineering work. MinusX does it in hours, clarifying and codifying business logic from your team's collective knowledge—extracting it from people's heads and transforming it into an automated system that allows everyone in your organization to get a reliable 'Talk to data' experience.
Steps 1 - 2: Input & Clarification
MinusX first connects to your data warehouse and learns your business logic through an interactive process.
Submit Business Questions + SQL
Provide at least 30 business questions with their "Gold SQL" queries. This is the foundation for understanding your business logic.
Business Questions + Gold SQL
Navigate through example business questions and their corresponding "Gold SQL" queries.
What is our revenue by customer cohort with month-over-month retention?
WITH customer_cohorts AS ( SELECT user_id, DATE_TRUNC('month', MIN(created_at)) as cohort_month, MIN(created_at) as first_purchase_date FROM orders WHERE status = 'completed' GROUP BY user_id ), monthly_activity AS ( SELECT o.user_id, DATE_TRUNC('month', o.created_at) as activity_month, SUM(o.amount) as monthly_revenue, COUNT(DISTINCT o.id) as order_count, AVG(o.amount) as avg_order_value FROM orders o WHERE o.status = 'completed' GROUP BY o.user_id, DATE_TRUNC('month', o.created_at) ), cohort_analysis AS ( SELECT cc.cohort_month, ma.activity_month, EXTRACT(MONTH FROM AGE(ma.activity_month, cc.cohort_month)) as months_since_first, COUNT(DISTINCT ma.user_id) as active_customers, SUM(ma.monthly_revenue) as cohort_revenue, AVG(ma.order_count) as avg_orders_per_customer, AVG(ma.avg_order_value) as avg_order_value FROM customer_cohorts cc JOIN monthly_activity ma ON cc.user_id = ma.user_id GROUP BY cc.cohort_month, ma.activity_month ) SELECT cohort_month, activity_month, months_since_first, active_customers, cohort_revenue, avg_orders_per_customer, avg_order_value, cohort_revenue::FLOAT / NULLIF(LAG(cohort_revenue) OVER (PARTITION BY cohort_month ORDER BY activity_month), 0) - 1 as mom_revenue_growth FROM cohort_analysis ORDER BY cohort_month DESC, activity_month DESC;
💡 Why "Gold SQL"? These 100% correct queries act as ground truth. By providing 30+ accurate questions with SQL, you teach MinusX your exact business definitions, ensuring the AI-generated dimensional model captures your logic precisely.
AI Clarification Process
Our model analyzes your SQL to identify ambiguities and engages in an interactive dialogue to clarify business logic.
AI Clarification Process
MinusX asks SQL-focused questions to resolve ambiguities. Interact with the dialog below.
Q1: I notice you calculate 'profit' differently across your queries. To ensure consistency in the dimensional model, please provide the SQL expression for calculating profit:
Or type your custom SQL expression above
🔄 Iterative: This continues for 15-30 questions until MinusX has complete understanding of your business logic, definitions, and SQL patterns.
Steps 3 - 4: ETL Generation & Dimensional Modeling
MinusX then generates a complete ETL pipeline and dimensional model that runs in your warehouse.
Automated ETL Generation
SQL files organized as a dependency tree (like dbt/Quary) that transform your source tables into a dimensional model.
SQL Dependency Tree (DAG)
Click nodes to view SQL transformations. Data flows from source → intermediate → final.
Dimensional Modeling
Properly structured facts and dimensions that represent your queries efficiently — the semantic layer for accurate AI.
Star Schema: Facts & Dimensions
The final dimensional model follows star schema design — a central fact table surrounded by dimension tables. Click any table to view its columns.
⭐ Star Schema Benefits: Optimized for analytical queries, easy to understand, fast aggregations, and minimal joins. Your 30+ business questions now run 10-100x faster using this model.
Before MinusX
WITH customer_orders AS ( SELECT c.customer_id, c.customer_name, c.email, c.tier, o.order_id, o.order_date, o.status, o.subtotal, o.discount_amount, o.tax_amount FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id WHERE c.is_active = true ), order_details AS ( SELECT co.customer_id, co.order_id, oi.product_id, oi.quantity, oi.unit_price, p.category, p.subcategory, (oi.quantity * oi.unit_price) as line_total FROM customer_orders co LEFT JOIN order_items oi ON co.order_id = oi.order_id LEFT JOIN products p ON oi.product_id = p.product_id WHERE co.status = 'completed' ), customer_metrics AS ( SELECT co.customer_id, COUNT(DISTINCT co.order_id) as order_count, SUM(co.subtotal - co.discount_amount + co.tax_amount) as total_revenue, AVG(co.subtotal - co.discount_amount + co.tax_amount) as avg_order_value, MAX(co.order_date) as last_order_date FROM customer_orders co WHERE co.status = 'completed' AND co.order_date >= '2024-01-01' GROUP BY co.customer_id ) SELECT co.customer_name, co.email, co.tier, cm.order_count, cm.total_revenue, cm.avg_order_value, cm.last_order_date, COALESCE(cat_summary.categories, 0) as categories_purchased FROM customer_orders co INNER JOIN customer_metrics cm ON co.customer_id = cm.customer_id LEFT JOIN ( SELECT customer_id, COUNT(DISTINCT category) as categories FROM order_details GROUP BY customer_id ) cat_summary ON co.customer_id = cat_summary.customer_id WHERE co.customer_id IS NOT NULL GROUP BY co.customer_name, co.email, co.tier, cm.order_count, cm.total_revenue, cm.avg_order_value, cm.last_order_date, cat_summary.categories ORDER BY cm.total_revenue DESC
Multiple CTEs, nested subqueries, complex joins - repeated across every similar query
After MinusX
SELECT c.customer_name, c.email, c.tier, f.order_count, f.total_revenue, f.avg_order_value, f.last_order_date, f.categories_purchased FROM fact_orders f JOIN dim_customers c ON f.customer_id = c.customer_id WHERE f.status = 'completed' AND f.order_date >= '2024-01-01'
Simplified joins, pre-computed metrics, consistent logic
The Result: True company-wide data accessibility
With a well-modeled semantic layer in place, along with ground-truth business questions expressed using this layer, MinusX AI agents can now answer any business question accurately — from simple reporting to deep research queries — enabling your entire organization to get reliable insights anytime.
Accurate Answers
AI understands your business context and delivers consistent, reliable insights
Blazing Fast
Optimized dimensional queries return results in seconds, not minutes
Your Infrastructure
All data stays in your warehouse. Full control, zero vendor lock-in
Expandable
Ask new business questions anytime to extend your semantic layer as your needs evolve, or extend the version-controlled SQL files yourself
Backward Compatible
All transformations run in a new schema. Your existing data and queries remain untouched
Legible Queries
Well-modeled data means ALL queries become more readable, maintainable, and easier to debug
See It In Action
Ask any question directly and get instant, accurate answers powered by your dimensional model
Ready to Build Your Semantic Layer?
Let us show you how MinusX can transform your data warehouse into a true talk-to-data experience in hours, not months.