this question came up while translating a pretty long stream from SPSS modeler to R, but the general question is also outside of this context relevant.
Are there best practices or tips on how to create long and complex sql queries in R?
Since SPSS modeler streams consist of many operations chained after each other my first go to were either chaining common table expressions (CTE) in SQL to mimic what SPSS modeler does or using dbplyr to write piped dplyr code. Both options are possible, but not very convenient / nice to work with.
Writing long and chained CTEs in R is cumbersome and annoying since the query has to go into a string which disables syntax highlithing or one hast to copy and paste the query multiple times. Additionally, with very long CTE chains they can become hard to debug and alter.
Using dbplyr doesn’t seem like the best solution either since every table has to explicitely be defined in the beginning with code like tablename_r = tbl(connection, I("schema.table"))
instead of just using it within the query as one would when writing plain SQL. Additionally, one of the benefits of using piped dplyr code is to easily debug and change individual steps. This doesn’t properly work here since one wants to perform as much computation on the database as possible for time and performance reasons.
Do you have a third option you could recommend for this type of scenario? Am I missing a good way to properly create long SQL queries with multiple source tables in R?
Thanks!