Great Expectations is the leading open source library for fighting technical debt in data pipelines, through data validation, profiling, and documentation.
We are super active in the Great Expectations public Slack channel. Please join us there with questions, feedback, or just to talk shop about data.
A little history: Great Expectations started as a side project in 2017. We launched publicly at Strata in March 2018. Since then, the project has steadily gained popularity, but it's never been a full-time gig.
We’ve gone heads-down to transition Great Expectations from a side project into a truly scalable open source community. We'll be unveiling and sharing more later this fall. In the meantime, if you'd like a preview of the next generation of tools for beating pipeline debt, please reach out on Slack.
The Great Expectations community has spoken and we are taking action.
We've spoken with dozens of data teams to learn what they want from their data validation framework. Although infrastructure choices lead to thousands of different data pipeline configurations, almost all deployment patterns for data testing fall into a few specific categories. Within these categories, many data teams have been building in-house versions of the same components and business logic.
Version 0.8.0 of Great Expectations will support production deployment out of the box. Instead of building these components for yourself over weeks or months, you will be able to add production-ready validation to your pipeline in a day. This “Expectations on rails” framework plays nicely with other data engineering tools, respects your existing namespaces, and is designed for extensibility.
Maintaining data documentation is time-consuming, thankless, and crucial. As a result, many data systems suffer from “documentation rot,” where data documentation is chronically outdated, incomplete, and therefore only semi-trusted.
Great Expectations’ compile-to-docs feature flips the normal workflow by allowing teams to compile their Expectation test suites into clean, human-readable data documentation. Since documentation is compiled from tests and tests are run regularly, your documentation is guaranteed to never go stale.
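To make the compile-to-docs idea concrete, here is a toy sketch in plain Python. It is not the Great Expectations API: the dictionary layout and the `check`, `describe`, and `compile_docs` helpers are hypothetical names invented for illustration. The point is only that one definition can drive both the test and the documentation.

```python
# Toy illustration only: this is NOT the Great Expectations API.
# Each expectation is a plain dict; the same dict drives the test and the docs.


def check(expectation, rows):
    """Return True if every row satisfies the expectation."""
    values = [row[expectation["column"]] for row in rows]
    if expectation["type"] == "not_null":
        return all(v is not None for v in values)
    if expectation["type"] == "between":
        low, high = expectation["min"], expectation["max"]
        return all(low <= v <= high for v in values)
    raise ValueError(f"unknown expectation type: {expectation['type']}")


def describe(expectation):
    """Render the same expectation as a human-readable sentence."""
    column = expectation["column"]
    if expectation["type"] == "not_null":
        return f"Values in `{column}` must never be null."
    if expectation["type"] == "between":
        low, high = expectation["min"], expectation["max"]
        return f"Values in `{column}` must be between {low} and {high}."
    raise ValueError(f"unknown expectation type: {expectation['type']}")


def compile_docs(suite):
    """Turn a whole suite into a bulleted documentation snippet."""
    return "\n".join(f"- {describe(e)}" for e in suite)


# Hypothetical suite and sample data.
suite = [
    {"type": "not_null", "column": "user_id"},
    {"type": "between", "column": "age", "min": 0, "max": 120},
]
rows = [{"user_id": 1, "age": 34}, {"user_id": 2, "age": 51}]

results = [check(e, rows) for e in suite]  # run the suite as tests
docs = compile_docs(suite)                 # render the suite as docs
print(docs)
```

Because `describe` reads the same dictionaries that `check` runs, changing an expectation updates the test and the documentation together.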
Data exploration workflows have always been hard to scale. Data analysts spend hours staring and poking at data. They learn a lot, but those learnings are hard to capture, share, and replicate.
Great Expectations allows teams to persist learnings from data exploration as tests and documentation. We then turbocharge the process by providing data profiling algorithms, checklists, and UI widgets for the capture-share-replicate dev loop.
Starting in v0.8.0, plugins are first-class citizens in Great Expectations. Every component of the framework is designed to be extensible: Expectations, storage, profilers, renderers for documentation, actions taken after validation, etc.
We’ll be releasing “official” plugin libraries, and—more importantly—publishing guides on how to build and share your own. Many alpha users of the new framework are excited to share plugins. Each one takes a clever hack for thinking or communicating about data, and captures it in a tidy, shareable package.
We're very excited to see what the data community comes up with once we make this capability widely available.
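As a rough picture of what a shareable plugin could look like, here is a minimal registry sketch in plain Python. It is an assumption-laden illustration, not the Great Expectations plugin interface: `EXPECTATIONS`, `register_expectation`, `validate`, and both check functions are hypothetical names.

```python
# Hypothetical sketch of a plugin registry; NOT the Great Expectations plugin API.

EXPECTATIONS = {}  # maps an expectation name to the function that checks it


def register_expectation(name):
    """Decorator that adds a check function to the registry under `name`."""
    def decorator(func):
        EXPECTATIONS[name] = func
        return func
    return decorator


# A built-in check.
@register_expectation("values_to_be_unique")
def values_to_be_unique(values):
    return len(set(values)) == len(values)


# A "plugin" is just more code that registers its own check the same way.
@register_expectation("values_to_look_like_zip_codes")
def values_to_look_like_zip_codes(values):
    return all(
        isinstance(v, str) and len(v) == 5 and v.isdigit() for v in values
    )


def validate(name, values):
    """Look up a registered check by name and run it."""
    return EXPECTATIONS[name](values)


print(validate("values_to_be_unique", [1, 2, 3]))
print(validate("values_to_look_like_zip_codes", ["94110", "1000"]))
```

The registry pattern keeps the core small: the framework only needs to know how to look up a name, and anyone can contribute a new check without touching the core code.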
We don't like to brag, but we don't mind letting our users do it for us. Here are a few nice things folks have said about Great Expectations.
Beau Cronin, Partner at The Data Guild
“Speaking as a long-time data person, this is one of the most exciting products I’ve ever seen. Something I’ve wanted for years without knowing I wanted it. Great Expectations is going to dramatically reshape the way data teams work together.”
Matt Gee, CEO at BrightHive
“BrightHive combines and aggregates data across many state and local government entities. The data is so messy and inconsistent that we couldn’t do what we need to do without a tool like Great Expectations. This product literally makes the rest of my business possible.”
Kamla Kasichainula, Data Engineer at Calm
“Now that we're using Great Expectations, we get notified ahead of time when data does not look right, giving us time to investigate and alert data users before they find out themselves.”
Elizabeth Sander, Sr. Data Scientist at Civis Analytics
“I wanted to set up my project in a way that someone else would be able to understand what I expected the data to be. GE was really useful for documenting my expectations about the data in a structured way.”
Gerry Manoim, Engineer at Quantopian
“Great Expectations got everybody on the same page on what data should look like. There is a lot less mental overhead of every team member tracking that.”
Getting started is super easy. Install the latest release with pip:
$ pip install great_expectations
Or clone the repository and install from source:
$ git clone https://github.com/great-expectations/great_expectations.git
$ pip install great_expectations/