The Rise of Text-to-SQL in GenAI
Generative AI is often celebrated for its ability to create content, but within engineering teams at enterprises and software companies, a different use case is stealing the spotlight: Text-to-SQL (see work at Pinterest). Simply put, it’s the power to translate natural language requests into SQL queries.
But why is this such a big deal? And is it just a passing trend, or is it here to stay?
While I’m not about to dive into a history lesson, let’s take a moment to explore how we got here and why it matters.
SQL: The Language of Data
In the early days of the digital world, the need to store information—data—quickly became apparent. In 1970, Edgar F. Codd, a computer scientist at IBM, published a paper that laid the groundwork for the introduction of relational databases, revolutionizing how we store data:
Database > Data
But how would we access this data? Enter SQL. In the 1970s, Donald D. Chamberlin and Raymond F. Boyce, also from IBM, developed the early versions of SQL to interact with these databases.
With this, we had a clear path to the data:
SQL > Database (Relational) > Data
🔑 50 years later, SQL remains the language of data!
Everybody Wants to Benefit from Data
Back when knowledge was stored on physical media, anyone who understood the language of that medium could theoretically access the information. It is as simple as knowing English in order to read an English book.
The same principle applies to data: if you "speak" SQL, you can read the data:
User > SQL > Database > Data
But wait—what if not everyone knows SQL?
While direct SQL queries are often the go-to method for technical users, business applications have been developed to make data accessible to a broader audience. Take Martech applications for marketers, for example.
These applications offer drag-and-drop interfaces that translate user requests into SQL queries:
User > Drag-and-drop > SQL > Database > Data
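To make that translation step concrete, here is a minimal sketch of what might sit behind such an interface. Everything in it is an assumption for illustration: the `customers` table, its columns, the `ALLOWED` whitelist, and the `build_query` helper are hypothetical and not taken from any particular Martech product.

```python
import sqlite3

# Hypothetical whitelist: the tables and columns the UI lets a user pick from.
ALLOWED = {"customers": {"name", "country", "signup_year"}}
OPERATORS = {"=", ">", "<", ">=", "<="}


def build_query(table, columns, filters):
    """Turn a UI selection (table, columns, filters) into parameterized SQL."""
    if table not in ALLOWED:
        raise ValueError(f"unknown table: {table}")
    for col in list(columns) + [f[0] for f in filters]:
        if col not in ALLOWED[table]:
            raise ValueError(f"unknown column: {col}")

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    clauses = []
    for col, op, value in filters:
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        # Values travel as parameters, never as part of the SQL string.
        clauses.append(f"{col} {op} ?")
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Example: the user dragged "name" and "country" and added two filters.
sql, params = build_query(
    "customers",
    ["name", "country"],
    [("country", "=", "FR"), ("signup_year", ">=", 2020)],
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT, signup_year INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Alice", "FR", 2021), ("Bob", "US", 2022), ("Chloe", "FR", 2018)],
)
rows = conn.execute(sql, params).fetchall()
```

The user never sees the SQL, but the database still receives it: the interface only changes who writes the query.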
Text-to-SQL: The Obvious Next Step
As of 2024, the typical pattern still holds:
User > Drag-and-drop [optional] > SQL > Database > Data
But with the rise of generative AI, a new question emerged: could we simplify this interface to democratize data access even further?
User > Natural Language Request > SQL > Database > Data
This translation step, Natural Language Request > SQL, is what we call Text-to-SQL.
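As a rough sketch of that flow, the snippet below wires a question to a database through a translation function. The model call is deliberately replaced by a stub: `translate_to_sql` is a hypothetical placeholder that returns canned queries, and the `orders` table and `is_read_only` check are assumptions made for the example. A real system would send the question and the schema to a language model at that point.

```python
import sqlite3

# Hypothetical schema, used only for this illustration.
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"


def translate_to_sql(question, schema):
    """Stand-in for the model call: maps known questions to canned SQL."""
    canned = {
        "how many orders do we have?": "SELECT COUNT(*) FROM orders",
        "what is the total amount per customer?": (
            "SELECT customer, SUM(amount) FROM orders "
            "GROUP BY customer ORDER BY customer"
        ),
    }
    key = question.strip().lower()
    if key not in canned:
        raise ValueError(f"cannot translate: {question!r}")
    return canned[key]


def is_read_only(sql):
    """Very rough guard: accept a single statement that starts with SELECT."""
    statement = sql.strip().rstrip(";")
    return statement.lower().startswith("select") and ";" not in statement


def ask(conn, question):
    """User > Natural Language Request > SQL > Database > Data."""
    sql = translate_to_sql(question, SCHEMA)
    if not is_read_only(sql):
        raise ValueError("generated SQL is not a single SELECT statement")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)],
)

count = ask(conn, "How many orders do we have?")
totals = ask(conn, "What is the total amount per customer?")
```

Checking the generated query before running it is a design choice worth keeping in any version of this: the SQL comes from a model, not from a developer, so it deserves the same suspicion as user input.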
Text-to-SQL: Obvious, But for How Long?
Most of us share one assumption: that SQL is, and always will be, the language of data.
But here’s the thing—data has evolved dramatically. While we grew up with SQL as the go-to for data, the landscape has changed. Traditional databases and data warehouses have been joined by data lakes, which store semi-structured and unstructured data. Now, organizations are beginning to deploy vector databases to support their generative AI use cases.
💡 So, what if SQL, originally designed for relational data models, isn’t the language of data in the future?
User > Natural Language Request > ? > Database > Data
SQL’s Staying Power: A Question of Time and Innovation?
Here we are in 2024, and believe it or not, there are still jobs for Fortran developers, even though IBM (once again) created the language back in the 1950s.
History suggests that SQL has plenty of life left; it could be decades before enterprises fully transition away from legacy systems and relational databases.
🔑 But here’s the catch: while engineers are busy integrating generative AI into existing workflows, I can’t help but wonder if the future holds disruptive innovations that challenge the idea of SQL remaining the language of data!