Let’s say I have a (large) list of (large) dicts, for instance:
my_list = [
    {
        "foo": "bar",
        "foobar": "barfoo",
        "something": 1,
        "useless_bool": True
    },
    {
        "foo": "rab",
        "foobar": "oofrab",
        "something": 1,
        "useless_str": "different_value"
    },
    ...
]
This list contains thousands of dicts, and each dict can have hundreds of keys. I also have a Jinja template, for instance:
jinja_template = jinja2.Environment().from_string(
    "{{ foo }} - {{ foobar }} - {{ something }}"
)
For each dict in my list, I want to add a new key containing the rendered Jinja template (built from the existing keys) and remove all other keys, to end up with something like:
[
    {
        "new_key": "bar - barfoo - 1"
    },
    {
        "new_key": "rab - oofrab - 1"
    },
    ...
]
So far I made a (dumb) loop to do so:
import jinja2

new_message_template = jinja2.Environment().from_string("{{ foo }} - {{ foobar }} - {{ something }}")

for item in my_list:
    # Render the new value
    rendered_value = new_message_template.render(**item)
    # Remove all existing keys
    item.clear()
    # Set the new key/value
    item["new_key"] = rendered_value
This works.
But considering that this list contains thousands of dicts, each with hundreds of keys, is there a more optimized way to perform this operation in terms of performance and execution time?
A couple of things:
- jinja2.Environment().from_string("{{ foo }} - {{ foobar }} - {{ something }}")
  This expression is currently being ignored. I'm assuming it is supposed to be assigned to new_message_template.
- You are mutating the dictionaries in place. Why? Just create a new list of transformed dicts.
- Jinja2 template rendering can be a candidate for thread-based concurrency when it is I/O bound (for example, when templates or values are loaded from disk or over the network), because Python threads run concurrently while waiting for I/O operations to complete. Rendering a small in-memory template like this one is mostly CPU work, though, so under CPython's GIL the speedup may be limited; it is worth timing both approaches. You can use ThreadPoolExecutor:
import jinja2
from concurrent.futures import ThreadPoolExecutor

new_message_template = jinja2.Environment().from_string("{{ foo }} - {{ foobar }} - {{ something }}")

def process_item(item):
    rendered_value = new_message_template.render(**item)
    return {"new_key": rendered_value}

def process_data(data):
    with ThreadPoolExecutor() as executor:
        result = list(executor.map(process_item, data))
    return result

# Example data
my_list = [
    {"foo": "value1", "foobar": "value2", "something": "value3"},
    {"foo": "hello", "foobar": "world", "something": "!"}
]
new_list = process_data(my_list)
print(new_list)
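Whether threads actually help depends on the workload, so it may be worth timing a plain list comprehension against the thread pool on data shaped like yours. Below is a minimal sketch of such a comparison. The function names (render_item, transform_sequential, transform_threaded) and the synthetic data sizes are assumptions made for illustration, not part of the question. Both variants build a new list and leave the input dicts untouched.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import jinja2

# Compile the template once, outside any loop.
template = jinja2.Environment().from_string(
    "{{ foo }} - {{ foobar }} - {{ something }}"
)

def render_item(item):
    # render() accepts a mapping directly, so ** unpacking is not required.
    return {"new_key": template.render(item)}

def transform_sequential(items):
    # Plain list comprehension: no mutation of the input dicts.
    return [render_item(item) for item in items]

def transform_threaded(items):
    # Same work, dispatched to a thread pool; map() preserves input order.
    with ThreadPoolExecutor() as executor:
        return list(executor.map(render_item, items))

# Synthetic data: 5000 dicts with roughly 200 keys each (assumed sizes).
sample = [
    {"foo": f"foo{i}", "foobar": f"bar{i}", "something": i,
     **{f"extra_{k}": k for k in range(200)}}
    for i in range(5000)
]

for fn in (transform_sequential, transform_threaded):
    start = time.perf_counter()
    result = fn(sample)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: {elapsed:.3f}s for {len(result)} items")
```

The timings will vary by machine and Python version, so treat the printed numbers as a local measurement rather than a general result.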