I have a dataset and a table in Google BigQuery (BQ). For the dataset, I can add a description; for the table, I can add a description and column policy tags to control column-level access (I am ignoring the “Labels” and “Tags” that can be attached to any BQ resource).
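For concreteness, the BQ-side metadata I mean here (table description plus column policy tags) maps onto fields of the table resource in the BigQuery REST API. Below is a minimal sketch that only builds a `tables.patch`-style request body with the standard library; it does not call any API. The helper name, project, taxonomy, policy tag IDs, and column names are all hypothetical placeholders.

```python
import json


def build_table_patch_body(description, columns, column_policy_tags):
    """Build a dict shaped like a BigQuery `tables.patch` request body.

    columns: list of (name, type) tuples describing the full schema.
    column_policy_tags: dict mapping column name -> policy tag resource name.
    """
    fields = []
    for name, field_type in columns:
        field = {"name": name, "type": field_type}
        tag = column_policy_tags.get(name)
        if tag is not None:
            # Column-level access control: the policy tag goes on the field.
            field["policyTags"] = {"names": [tag]}
        fields.append(field)
    return {"description": description, "schema": {"fields": fields}}


# Hypothetical placeholder values, for illustration only.
POLICY_TAG = (
    "projects/my-project/locations/us/taxonomies/1234567890/policyTags/9876543210"
)

body = build_table_patch_body(
    description="Customer master table",
    columns=[("customer_id", "STRING"), ("email", "STRING")],
    column_policy_tags={"email": POLICY_TAG},
)
print(json.dumps(body, indent=2))
```

Only the `email` column carries a `policyTags` entry in the resulting body; the other column is left untagged.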
Next, in Dataplex, I created a lake and a zone, and then attached the previous BQ dataset to the zone.
Then I searched for the BQ table on the “Search” page under “Discover” in Dataplex. Two results come up, one with “System” as “BIGQUERY” and one with “System” as “DATAPLEX”. Comparing the two results, I found the following:
- The one with System as BIGQUERY refers to the BQ table, whereas the one with System as DATAPLEX refers to the entity created for the table in the Dataplex zone.
- For the one with System as BIGQUERY, I can add an Overview and a Steward, and attach Tags using Tag templates. For its columns, I can attach tags and add business terms. For the one with System as DATAPLEX, I cannot add an Overview or a Steward, but I can attach Tags (using Tag templates) and Attributes. For its columns, I can only add Column Attributes.
My understanding is that the entry with System as BIGQUERY is Data Catalog metadata (the URL contains the string …entryGroups/@bigquery/entries/…), whereas the one with System as DATAPLEX is a Dataplex entry. For the same table, I was also able to add different metadata through the Data Catalog entry and through the Dataplex entry. The system accepts this without complaint: metadata added in Data Catalog does not surface in the Dataplex entry and vice versa, and metadata from neither surfaces in the BQ UI.
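This is how I told the two entries apart from their URLs. A minimal sketch, assuming the entry group appears in the URL path as `entryGroups/<group>/entries/<entry>` (which is what I see for the BIGQUERY result); the function names and the example URLs are hypothetical, and the host is a placeholder rather than the real console address.

```python
import re
from urllib.parse import unquote

# Matches the entry group and entry segments of a catalog-style entry path,
# e.g. ".../entryGroups/@bigquery/entries/<entry-id>".
_ENTRY_PATH = re.compile(r"entryGroups/(?P<group>[^/]+)/entries/(?P<entry>[^/?#]+)")


def entry_group_of(url):
    """Return the entry group ID found in an entry URL, or None if absent."""
    # Decode first, since "@" may show up percent-encoded as "%40".
    match = _ENTRY_PATH.search(unquote(url))
    return match.group("group") if match else None


def is_bigquery_catalog_entry(url):
    """True if the URL points at an entry in the `@bigquery` entry group."""
    return entry_group_of(url) == "@bigquery"


# Hypothetical example URLs (placeholder host and IDs).
bq_url = (
    "https://console.example.invalid/projects/my-project/locations/us/"
    "entryGroups/@bigquery/entries/abc123"
)
other_url = "https://console.example.invalid/projects/my-project/lakes/my-lake"

print(is_bigquery_catalog_entry(bq_url))     # True
print(is_bigquery_catalog_entry(other_url))  # False
```

The check only says whether a URL carries the `@bigquery` entry group; it makes no claim about what the URL of the DATAPLEX result looks like.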
Is the above behavior expected? It seems that there are three sources of metadata for the same table: one in BQ, one in Data Catalog, and one in the Dataplex entry, all independent of each other (although the Data Catalog and Dataplex metadata are each a superset of the BQ metadata).