Use XLA auto-clustering in a specific layer of a model
I have a tf.keras model with multiple layers. One of the layers performs many small linear algebra operations, which leads to low, fragmented GPU utilization (I can see this in the Nsight Systems profiling report). I want to use XLA to fuse these operations in that layer and obtain a speedup.
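To make the goal concrete, here is a minimal sketch of what I have in mind, assuming explicit compilation with `tf.function(jit_compile=True)` (called `experimental_compile` before TF 2.5) is an acceptable way to scope XLA to one layer. As far as I understand, auto-clustering itself is switched on globally (for example through `TF_XLA_FLAGS=--tf_xla_auto_jit=2`), so this uses explicit compilation instead. `SmallLinalgLayer`, `_xla_block`, and the operations inside it are hypothetical stand-ins for my real layer.

```python
import numpy as np
import tensorflow as tf


class SmallLinalgLayer(tf.keras.layers.Layer):
    """Hypothetical stand-in for the layer with many small linalg ops."""

    # Only this method is compiled with XLA; the rest of the model
    # keeps running through the regular TensorFlow executor.
    @tf.function(jit_compile=True)
    def _xla_block(self, x):
        gram = tf.matmul(x, x, transpose_b=True)   # (batch, n, n)
        mixed = tf.tanh(gram) + 0.5 * gram         # elementwise ops XLA may fuse
        return tf.reduce_sum(mixed, axis=-1)       # (batch, n)

    def call(self, inputs):
        return self._xla_block(inputs)


def reference(x):
    """Same computation without XLA, used only to check the result."""
    gram = tf.matmul(x, x, transpose_b=True)
    return tf.reduce_sum(tf.tanh(gram) + 0.5 * gram, axis=-1)


x = tf.random.normal((8, 4, 4), seed=0)
layer = SmallLinalgLayer()
out = layer(x)
```

The layer can be placed in a model like any other Keras layer. Whether the compiled block is actually faster depends on the shapes and operations involved, so it would still need to be checked in the profiler.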