I want to set up a query in BigQuery that, each time it runs, applies a machine learning model only to the elements it has not processed before.
For example, if I use object tables to track files in Cloud Storage, how do I make sure that each time I run ML.PROCESS_DOCUMENT I’m not running it against documents I have already processed?
Gemini suggested something like the following, but I’m not sure the ML.PROCESS_DOCUMENT function can be used that way:
SELECT
    document_id,
    ML.PROCESS_DOCUMENT(
        MODEL `path_to_model_name`,
        (
            SELECT *
            FROM `path_to_storage`
            WHERE document_id = t.document_id
        )
    ) AS processed_document
FROM (
    SELECT document_id
    FROM `path_to_storage`
    WHERE
        NOT EXISTS (
            SELECT 1
            FROM `parsed_table`
            WHERE document_id = t.document_id
        )
        AND NOT EXISTS (
            SELECT 1
            FROM `parsed_table$__STREAMING_BUFFER`
            WHERE document_id = t.document_id
        )
) AS t;
When I run it, I get the following error:
Syntax error: Expected ")" but got identifier path_to_model_name at [4:15]