Saturday, January 6, 2024

Running SQL on Nested JSON


When we surveyed the market, we saw the need for a solution that could perform fast SQL queries on fluid JSON data, including arrays and nested objects.

The Problem of SQL on JSON

Some form of ETL to transform JSON into tables in SQL databases may be workable for basic JSON data with fixed fields that are known up front. However, JSON with nested objects or new fields that “can spring up every 2-4 weeks,” as the original Stack Overflow poster put it, is impossible to handle in such a rigid manner.
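To make that rigidity concrete, here is a minimal Python sketch of a fixed-schema ETL step. The `COLUMNS` list and the `to_row` helper are hypothetical names chosen for illustration; they do not come from any particular ETL tool.

```python
import json

# Hypothetical fixed schema, decided up front by the ETL job.
COLUMNS = ["my-field", "my-other-field"]

def to_row(doc, columns=COLUMNS):
    """Map a JSON document onto a fixed list of scalar columns.

    Raises ValueError when a field is unknown or holds a nested value,
    which is what forces a pipeline change whenever the JSON evolves.
    """
    unknown = sorted(set(doc) - set(columns))
    if unknown:
        raise ValueError(f"fields not in schema: {unknown}")
    row = []
    for name in columns:
        value = doc.get(name)
        if isinstance(value, (dict, list)):
            raise ValueError(f"nested value in column {name!r}")
        row.append(value)
    return tuple(row)

flat = json.loads('{"my-field": "doc1", "my-other-field": "some text"}')
print(to_row(flat))
```

A document with only known, scalar fields maps cleanly to a row. As soon as a field turns into an array or a new key shows up, the mapping fails and the schema and pipeline have to be revised by hand.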

Relational databases offer alternative approaches to accommodate more complex JSON data. SQL Server stores JSON in varchar columns, while Postgres and MySQL have JSON data types. In these scenarios, users can ingest JSON data without conversion to SQL fields, but take a performance hit when querying the data because these columns support minimal indexing at best.

SQL on Nested JSON Using Rockset

With lots of fields that change, get added or removed, etc., it can be rather cumbersome to maintain ETL pipelines. Rockset was designed to help with this problem by indexing all fields in JSON documents, along with all type information, and exposing a SQL API on top of it.

For example, with a Rockset collection named new_collection, I can start by adding a single document to an empty collection that looks like:

{
    "my-field": "doc1",
    "my-other-field": "some text"
}

… and then query it.

rockset> select "my-field", "my-other-field" 
         from new_collection;

+------------+------------------+
| my-field   | my-other-field   |
|------------+------------------|
| doc1       | some text        |
+------------+------------------+

Now, if a new JSON document comes in with some new fields (maybe with some arrays, nested JSON objects, etc.), I can still query it with SQL.

{
    "my-field": "doc2",
    "my-other-field":[
        {
            "c1": "this",
            "c2": "field",
            "c3": "has",
            "c4": "changed"
        }
    ]
}

I add that to the same collection and can query it just as before.

rockset> select "my-field", "my-other-field" 
         from new_collection;

+------------+---------------------------------------------------------------+
| my-field   | my-other-field                                                |
|------------+---------------------------------------------------------------|
| doc1       | some text                                                     |
| doc2       | [{'c1': 'this', 'c2': 'field', 'c3': 'has', 'c4': 'changed'}] |
+------------+---------------------------------------------------------------+

I can further flatten nested JSON objects and array fields at query time and construct the table I want to get to, without having to do any transformations beforehand.

rockset> select mof.* 
         from new_collection, unnest(new_collection."my-other-field") as mof;

+------+-------+------+---------+
| c1   | c2    | c3   | c4      |
|------+-------+------+---------|
| this | field | has  | changed |
+------+-------+------+---------+
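For readers who think in code rather than SQL, the flattening step can be approximated in plain Python. This is only an in-memory analogy under stated assumptions, not how Rockset implements unnest: the `unnest` function below is a hypothetical helper, and skipping documents whose field is not an array is an assumption made to match the single-row result shown above.

```python
def unnest(docs, field):
    """Yield one flat row (a dict) per object found in doc[field].

    Documents where the field is missing or is not an array are skipped,
    so a scalar value such as doc1's string produces no rows.
    """
    for doc in docs:
        value = doc.get(field)
        if not isinstance(value, list):
            continue
        for element in value:
            if isinstance(element, dict):
                yield dict(element)

# The two documents added to new_collection so far.
docs = [
    {"my-field": "doc1", "my-other-field": "some text"},
    {"my-field": "doc2", "my-other-field": [
        {"c1": "this", "c2": "field", "c3": "has", "c4": "changed"},
    ]},
]

for row in unnest(docs, "my-other-field"):
    print(row)
```

Each array element becomes its own row with columns c1 through c4, which is the shape the `mof.*` projection returns.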

In addition to this, there is strong type information stored, which means I won't get tripped up by having mixed types, etc. Adding a third document:

{
    "my-field": "doc3",
    "my-other-field":[
        {
            "c1": "unexpected",
            "c2": 99,
            "c3": 100,
            "c4": 101
        }
    ]
}

It still adds my document as expected.

rockset> select mof.* 
         from new_collection, unnest(new_collection."my-other-field") as mof;

+------------+-------+------+---------+
| c1         | c2    | c3   | c4      |
|------------+-------+------+---------|
| unexpected | 99    | 100  | 101     |
| this       | field | has  | changed |
+------------+-------+------+---------+

… and the fields are strongly typed.

rockset> select typeof(mof.c2) 
         from new_collection, unnest(new_collection."my-other-field") as mof;

+-----------+
| ?typeof   |
|-----------|
| int       |
| string    |
+-----------+
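The idea of per-value typing, as opposed to one declared type per column, can be sketched in Python as follows. The `type_name` helper is hypothetical; the labels "int" and "string" mirror the output above, while the remaining labels are assumptions for illustration rather than Rockset's documented type names.

```python
def type_name(value):
    """Return a type label for a single JSON value."""
    if value is None:
        return "null"
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

# The c2 values from doc3 and doc2, in the order the query returned them.
c2_values = [99, "field"]
print([type_name(v) for v in c2_values])
```

Because the type travels with each value, the same field can hold an integer in one document and a string in another without any schema change.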

If being able to run SQL on complex JSON, without any ETL, data pipelines, or fixed schema, sounds interesting to you, you should give Rockset a try.
