In R, is it possible to write a function such that SQL queries can be correctly formatted?
Is it possible to write a function which takes unformatted SQL code and using a series of looped ifelse statements (i.e. after seeing a special keyword, e.g. a new CTE, subquery, GROUP, CASE …. puts the next lines of SQL code on the next line with the correct indents)?
After much work, I got a small proof of concept to somewhat work:
<code>format_sql <- function(sql) {
keywords <- c("SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY")
indent <- " " # Define the indentation (four spaces in this case)
for (i in seq_along(keywords)) {
sql <- gsub(paste0("\b", keywords[i], "\b"), paste0("n", strrep(indent, i), keywords[i]), sql, ignore.case = TRUE)
}
cat(sql)
}
sql_query <- "SELECT column1, column2 FROM table WHERE column1 > 10 GROUP BY column2 ORDER BY column1 DESC"
</code>
<code>format_sql <- function(sql) {
keywords <- c("SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY")
indent <- " " # Define the indentation (four spaces in this case)
for (i in seq_along(keywords)) {
sql <- gsub(paste0("\b", keywords[i], "\b"), paste0("n", strrep(indent, i), keywords[i]), sql, ignore.case = TRUE)
}
cat(sql)
}
sql_query <- "SELECT column1, column2 FROM table WHERE column1 > 10 GROUP BY column2 ORDER BY column1 DESC"
</code>
format_sql <- function(sql) {
keywords <- c("SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY")
indent <- " " # Define the indentation (four spaces in this case)
for (i in seq_along(keywords)) {
sql <- gsub(paste0("\b", keywords[i], "\b"), paste0("n", strrep(indent, i), keywords[i]), sql, ignore.case = TRUE)
}
cat(sql)
}
sql_query <- "SELECT column1, column2 FROM table WHERE column1 > 10 GROUP BY column2 ORDER BY column1 DESC"
The output looks like this:
<code> format_sql(sql_query)
> format_sql(sql_query)
SELECT column1, column2
FROM table
WHERE column1 > 10
GROUP BY column2
ORDER BY column1 DESC
</code>
<code> format_sql(sql_query)
> format_sql(sql_query)
SELECT column1, column2
FROM table
WHERE column1 > 10
GROUP BY column2
ORDER BY column1 DESC
</code>
format_sql(sql_query)
> format_sql(sql_query)
SELECT column1, column2
FROM table
WHERE column1 > 10
GROUP BY column2
ORDER BY column1 DESC
Can someone please show me how to do this?
Thanks!
References:
- Is there a way to auto format SQL in R?
- Can I gracefully include formatted SQL strings in an R script?