As per (poor) google’s official QUERY documentation here we can filter our tables by referencing individual columns, for example like:
=QUERY(A:I; "select A,B where D > 9000"; 1)
this may be sufficient for small-scale projects where we either use simple or more advanced queries to extract data we need, however, in case we would have a massive lookup table (like 50+ columns) and we would need to filter-down multiple smaller tables in rapid succession (with no intention to store them) is there some way how we could query more easily? for example, directly by column headers / labels (avoiding remembering column numbers/positions and triple checking if 47th column is really the right one)
quick search reveals there was already an attempt to achieve this with BETTERQUERY, but I would be interested in a simpler alternative (for less tech-savvy populus)
is there some alternative way to query, maybe with some universal formula that could translate various instances and variations, like the example below, with ease?
=QUERY(A:I; "select <name>,<email> where <sales> > 9000"; 1)
officially, there are two sources of documentation:
-
mentioned QUERY function/formula
-
and QUERY language reference
unfortunately, both of them are poor, and QUERY formula is a function probably with the most undocumented features (bugs) out there. there are only two variants of syntax:
-
if range is an active reference, the syntax of 2nd QUERY parameter accepts column letters as input:
=QUERY(A:C; "select B,A"; )
-
if range is an construct (of arrays or another formula), the syntax of 2nd QUERY parameter accepts only
Col
references as input:=QUERY({A:C}, "select Col2,Col1", )
=QUERY({A:A, B:C}, "select Col2,Col1", )
=QUERY({IMPORTRANGE("id", "Sheet1!A:C")}, "select Col2,Col1", )
=QUERY(INDIRECT("Sheet1!A:C"), "select Col2,Col1", )
for locale differences (when to use instead of
,
) and how to convert any formula syntax to non-english semicolon ;
version, refer to this answer: /a/73767720/5632629
fun fact: since 2024 we are allowed (this was previously not possible) to use Col
references without wrapping simple range reference into array brackets { }
.
side notes for QUERY rookies:
-
the 2nd QUERY parameter has its own ecosystem. in there, do not use semicolons
;
nor backslashesas column/item separators – only commas
,
=QUERY(A:C; "select B,C"; 0)
-
it’s a good practice to not forget the 3rd QUERY parameter, even if you don’t need it. you can use 0 or leave it blank like:
=QUERY(A:C; "select B,C"; )
-
and if your dataset is mixed (eg. column contains numeric values formatted as numbers and text strings formatted as plain text) QUERY contains a bug (not a feature) manifesting in a particular way – QUERY will take over and make self-assumption, that you don’t need the minority of that column’s data, and it will show you only the majority of dataset. in 99% of cases, this is not a desired outcome that the user wishes, so in order to not “loose” the data, this needs to be combated either via internal formatting of the particular column or more drastically by forceful conversion of whole or partial range into plain text (in various ways) based on needed scenario. for example:
=ARRAYFORMULA(QUERY({A:A, TO_TEXT(B:B), C:C}), "select Col2,Col3", 1))
now to answer the question – yes, there is an alternative how to select columns based on the table headers (of course dynamically, not by hard-coding it into formula). this setup will require some helper fields for sake of convenience and speed.
this is an example of TURBOQUERY concept, version 1.0 (feel free to edit it in ways to fit your needs. as of now, no plans in motion to release v2.0 so if you significantly improve it you are welcome to share it)
Sheet1:
Sheet2:
the only formula from B12:
=ARRAYFORMULA(IFERROR(LET(xx, Sheet1!A:Z,
select, B3:B9, where, C3:C9, logic, D3:D9, sortby, E3:E9, order, F3:F9,
r, REGEXREPLACE(where, "(?i)(?:^(.?not )|(?:^( )|(?:^not )|(<|>|=|!|is (?:not )?null|matches|contains|starts|ends).*", ), QUERY({xx},
"select "&IF(SUM(LEN(select))=0, "*", TEXTJOIN(",", 1, IFNA("Col"&MATCH(select, INDEX(xx, 1), ))))&
" where 1=1 "&IF(SUM(LEN(where))=0,,TRIM(TEXTJOIN(, 1, IF((logic="")*(where<>""),
" AND ", " "&logic&" ")&IFNA(REGEXEXTRACT(where, "(not|( not|not|("))&IFNA(" Col"&MATCH(TRIM(r), INDEX(xx, 1), ))&" "®EXEXTRACT(where, r&"(.*)"))))&IF(SUM(LEN(sortby))=0,,
" order by "&TEXTJOIN(",", 1, IFNA("Col"&MATCH(sortby, INDEX(xx, 1), ))&IF((sortby<>"")*(order<>""), " "&order&" ", ) )), 1)), "syntax error"))
data validation:
- ranges B3:B9 and E3:E9 are dropdowns fed from Sheet1!1:1 row (headers)
- range D4:D9 is a dropdown list with
OR
item only - range F3:F9 is a dropdown list with
DESC
item only
spreadsheet demo copy
what is supported (also included in notes):
-
supports SELECT parameter
-
supports WHERE parameter
-
supports OR/AND logic gates
-
supports SORT BY parameter
-
supports ORDER BY parameter
-
SELECT fields can be left empty to get the whole table
-
SELECT fields can accept only valid header names
-
header names can be case-insensitive
-
WHERE fields can be left empty if no conditions are needed
-
WHERE fields can accept only valid header names followed (or preceded) by valid syntax conditions
-
WHERE fields support these conditions:
>
<
>=
<=
!=
<>
=
not
(preceded)
is null
is not null
matches
(supports regex)
contains
starts with
ends with
-
conditions can be case-insensitive
-
parentheses are supported too for OR hierarchy, but opening
(
need to be followed by space -
numbers formatted as plain text and text needs to be single-quoted as by standard
-
supports regex commands after
matches
condition -
within LOGIC fields, AND logic is set by default
-
within LOGIC fields, override with OR logic is possible if needed
-
LOGIC fields are bound with WHERE fields and offset by 1 row down
-
if WHERE fields contain only one entry OR logic should not be used
-
SORT BY fields can be left empty if no sorting is needed
-
SORT BY fields can accept only valid header names
-
header names can be case-insensitive
-
within ORDER fields, ASC-ending order is set by default
-
within ORDER fields, override with DESC-ending order is possible if needed
-
ORDER fields are bound with SORT BY fields
suggested improvements for v2.0:
- better regex patterns within REGEXREPLACE part of the formula
- extended support for more conditions (like
like
,date
…) - extended support for more clauses (like
limit
,offset
,format
,pivot
,label
,group by
) - adding support for arithmetic (
+
,-
,×
,÷
) capabilities - adding support for aggregating capabilities (
avg
,max
,min
,count
,sum
) - adding support for scalar functions (like
year
,month
,day
,upper
,lower
…) - adding more fail-safes for exotic header names and much more
- everything is possible!
few words about formula:
not really much to say. it is a simple, lightweight formula that takes input fields and auto-assembles the query string dynamically based on a pre-defined ruleset and a few fail safes. nothing special. the main idea is to build a generator. row with headers is transformed into dropdown menu, and then simple MATCH fx looks for the correct column of invoked selection, and transforms it into Col
number reference. this way the end-user is able to generate various tables/reports without constantly fiddling with QUERY formula itself, and due to its “friendliness” even user with no formulae skill will be happy and able to interact with a dataset (win-win with grandma-proof stamp). few other generator examples:
- QUERY formula generation
- array of IMPORTRANGE generation
- dynamic sheet name generation
- alternative sheet name generation
- simple range generation
- simple string generation