I am trying to work with JSON files validation using Apache Drill. I am working with both linear and nested JSON data. I have stored the JSON file in Drill tmp storage dfs/tmp/employee_src_reg_json.json. The JSON data inside the file looks like this:
{
"employee_reg": [
{
"EmployeeID": 1,
"FirstName": "John",
"LastName": "Doe",
"Age": 35,
"Salary": 60000,
"Department": "Engineering",
"Experience": 8
},
{
"EmployeeID": 2,
"FirstName": "Jane",
"LastName": "",
"Age": 28,
"Salary": 50000,
"Department": "Marketing",
"Experience": 5
},
{
"EmployeeID": 3,
"FirstName": "Michael",
"LastName": "Johnson",
"Age": 40,
"Salary": 70000,
"Department": "Finance",
"Experience": 12
},
{
"EmployeeID": 4,
"FirstName": "Emily",
"LastName": "Williams",
"Age": 32,
"Salary": 65000,
"Department": "Human Resources",
"Experience": 9
}
]
}
I need a query to get the column names in Apache Drill. I tried the following queries but did not get the column names:
1)DESCRIBE dfs/tmp/employee_src_reg_json.json – this gives no data available.
2)DESCRIBE (SELECT * FROM (SELECT FLATTEN(t.employee_reg) AS emp FROM dfs.tmp.employee_src_reg_json.json t) LIMIT 1) – this gives the first JSON array values but not the column names.
3)(SELECT FLATTEN(t.employee_reg) AS flatdata FROM dfs.tmp.employee_src_reg_json.json t) – this gives something like:
00-00 Screen
00-01 Project(flatdata=[$0])
00-02 Flatten(flattenField=[$0])
00-03 Project(flatdata=[$0])
00-04 Scan(table=[[dfs, tmp, employee_src_reg_json.json]], groupscan=[EasyGroupScan [selectionRoot=file:/tmp/employee_src_reg_json.json, numFiles=1, columns=[employee_reg], files=[file:/tmp/employee_src_reg_json.json], usedMetastore=false, limit=-1, formatConfig=JSONFormatConfig [extensions=[json]]]])
{
"head" : {
"version" : 1,
"generator" : {
"type" : "ExplainHandler",
"info" : ""
}
}
4)SHOW COLUMNS IN (SELECT FLATTEN(t.employee_reg) FROM dfs.tmp.employee_src_reg_json.json t);
How can I get the column names from the JSON file?
Sarmila Mohanraj is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.