I watched the good LlamaIndex video about Advanced RAG and Text-to-SQL (or Text-to-Pandas). But you know, it made me realize that quite a lot of enterprise data is now designed to be acquired not by querying a database directly but by calling remote APIs: RESTful APIs, microservices, Lambdas, SOAP web services, and others. These services perform the queries for the data, often at the business level of abstraction, and they hide the database architecture details. Whether SQL, a columnar DB, a vector DB, a graph DB, or a CSV or Parquet file sits underneath is unimportant; we really just want to query our Financial or Customer data, and how it is accessed should be immaterial.
What would it take to further generalize Text-to-SQL (or Text-to-Pandas) into a Text-to-Data approach that can also interrogate API endpoints and API catalogs, and so get at the important enterprise data that they "contain"?
One possible answer is to "bypass" the remote APIs and go straight to their data storage. But this is not a great idea, because the APIs typically add value by presenting the data at a higher semantic level to the applications that use them. They also have role-based security baked in, which correctly prevents uncontrolled, unauthorized access to entire databases.
By now, large enterprises such as financial institutions have quite large collections of RESTful services and web services. Imagine using AI, in a way not unlike Text-to-SQL, to find the best API to call for a specific data query. This could be hugely helpful, especially when your company already has hundreds or thousands of these APIs.
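To make the "find the best API" idea concrete, here is a minimal sketch of ranking entries in an API catalog against a natural-language question. Everything in it is hypothetical: the `ApiEntry` type, the three catalog entries and their paths, and the `rank_apis` helper. It also swaps in plain word overlap as a crude stand-in for the embedding-based retrieval a real system would more likely use; the point is only that API selection can be framed as retrieval over catalog descriptions, much as Text-to-SQL retrieves table schemas.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiEntry:
    """One entry in a hypothetical enterprise API catalog."""
    name: str
    method: str
    path: str
    description: str


# Hypothetical catalog; a real one might be generated from OpenAPI specs.
CATALOG = [
    ApiEntry("get_customer", "GET", "/customers/{customer_id}",
             "Fetch a customer's first name, last name and contact details by customer id"),
    ApiEntry("list_accounts", "GET", "/customers/{customer_id}/accounts",
             "List the financial accounts and balances held by a customer"),
    ApiEntry("quarterly_revenue", "GET", "/finance/revenue/{quarter}",
             "Report total revenue for a fiscal quarter"),
]


def tokens(text: str) -> set[str]:
    """Lowercased word tokens (a crude stand-in for an embedding model)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_apis(question: str, catalog: list[ApiEntry], top_k: int = 2) -> list[ApiEntry]:
    """Return the top_k entries whose name + description share the most words with the question."""
    q = tokens(question)

    def score(entry: ApiEntry) -> int:
        return len(q & tokens(entry.name.replace("_", " ") + " " + entry.description))

    # sorted() is stable, so ties keep catalog order.
    return sorted(catalog, key=score, reverse=True)[:top_k]


if __name__ == "__main__":
    best = rank_apis("What is the last name of customer 42?", CATALOG)[0]
    print(best.method, best.path)  # GET /customers/{customer_id}
```

In a fuller design, the top few candidates (with their parameter schemas) could be handed to an LLM to choose the endpoint and fill in the arguments, the same way a Text-to-SQL prompt carries table schemas.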
One possible way to think about getting data from an API is to add an abstraction layer so that API queries are expressed in SQL. For example, a request that is really served by a RESTful Customers API could simply be expressed like this:
SELECT FName, LName FROM Customers WHERE customer_id = 42
Because this SQL is only a facade (a public interface), the "driver layer" merely translates it into an HTTP GET on a RESTful URL; it is not necessarily run as a SQL statement against an actual row-oriented (tabular) database, since any kind of storage, not just a SQL database, could sit behind the Customers REST endpoint.
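A minimal sketch of such a driver layer, assuming a deliberately tiny SQL subset (`SELECT cols FROM table WHERE key = literal`, i.e. the query above written in standard SQL as `... WHERE customer_id = 42`). The base URL `https://api.example.com`, the `ROUTES` table, and the `translate`/`project`/`run` helpers are all hypothetical. `translate` only builds the HTTP request; `run` would actually call the endpoint and keep the SELECTed fields, so it needs a real, reachable API.

```python
import json
import re
import urllib.parse
import urllib.request

# Hypothetical base URL and routing table: which REST path serves a "table"
# filtered on a given key. In practice this could come from an API catalog.
BASE_URL = "https://api.example.com"
ROUTES = {("customers", "customer_id"): "/customers/{value}"}

# Tiny SQL subset only: SELECT col[, col...] FROM table WHERE key = 'text' | number
SQL_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)\s+"
    r"WHERE\s+(?P<key>\w+)\s*=\s*(?P<value>'[^']*'|\d+)\s*;?\s*$",
    re.IGNORECASE,
)


def translate(sql: str) -> tuple[str, str, list[str]]:
    """Translate a facade SQL query into (HTTP method, URL, columns to keep)."""
    m = SQL_PATTERN.match(sql)
    if m is None:
        raise ValueError(f"unsupported query: {sql!r}")
    table, key = m["table"].lower(), m["key"].lower()
    route = ROUTES.get((table, key))
    if route is None:
        raise ValueError(f"no API route for {table}.{key}")
    value = m["value"].strip("'")
    path = route.format(value=urllib.parse.quote(value, safe=""))
    columns = [c.strip() for c in m["cols"].split(",")]
    return "GET", BASE_URL + path, columns


def project(record: dict, columns: list[str]) -> dict:
    """Keep only the SELECTed fields from the API's JSON response."""
    return {c: record[c] for c in columns}


def run(sql: str) -> dict:
    """Call the REST endpoint behind the facade and project the result."""
    method, url, columns = translate(sql)
    req = urllib.request.Request(url, method=method, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return project(json.load(resp), columns)


if __name__ == "__main__":
    print(translate("SELECT FName, LName FROM Customers WHERE customer_id = 42"))
    # ('GET', 'https://api.example.com/customers/42', ['FName', 'LName'])
```

Note that the driver never touches storage: whatever S3 bucket or database sits behind `/customers/{id}` stays the service's concern, and a real `run` would forward the caller's own credentials (omitted here) so the API's access controls still apply.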
Notice also that the REST service may be reading from any of a variety of storage systems here, typically S3, Bigtable, an RDBMS, a vector DB, Parquet files, CSV, and so on. The abstraction simplifies what the agent (human or computer) needs to do to get the data: the AI agent, like the human software engineer, just needs to call the correct REST APIs, and importantly this means neither needs to understand how to directly manipulate all these different storage formats.
Nor do they need to understand how to directly operate the role-based access control / security system in order to act in compliance. The architectural value of using an API instead of direct database access is that the enterprise has carefully designed and developed the API, so it can be assumed to already encapsulate the business semantics, as well as infrastructure such as telemetry, analytics, authorization, and the compliance behaviors that the enterprise deems important.