I had an array of 30,011 strings, each no more than 20 characters long. The script that generated the strings also created a 30,011 x 30,011 ndarray Dist of zeroes and ran for 1.5 hours to populate the lower triangular half with text distances between each pair of strings (450+ million distances).
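To make the scale concrete, here is a minimal sketch of this kind of population loop. It is not the actual script: `text_distance` is a hypothetical stand-in (built on `difflib.SequenceMatcher`) for whatever distance was really used, and the byte count assumes the default `float64` dtype of `np.zeros`, which the post does not confirm.

```python
import numpy as np
from difflib import SequenceMatcher


def text_distance(a, b):
    # Hypothetical stand-in for the real distance function (not shown in the post).
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def lower_triangle_distances(strings):
    """Fill only the strictly lower triangle of an n x n array of zeroes."""
    n = len(strings)
    dist = np.zeros((n, n))  # float64 by default
    for i in range(1, n):
        for j in range(i):
            dist[i, j] = text_distance(strings[i], strings[j])
    return dist


# Size arithmetic for the real case (no large allocation happens here).
n = 30011
n_pairs = n * (n - 1) // 2   # number of distinct pairs
full_bytes = n * n * 8       # full square array, assuming float64

# Tiny demonstration on three strings.
demo = lower_triangle_distances(["abc", "abd", "xyz"])
print(n_pairs, full_bytes)
print(demo)
```

Under the float64 assumption the full square array occupies roughly 7.2 GB, which is likely the reason the Variable Explorer refuses to fetch or save it.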
When I tried to view the array in the Variable Explorer, I was told “The variable is too big to be retrieved”.
Thereafter, the console didn’t present any results in response to commands, though I was always presented with a prompt for the next command. The first 5 of the following lines are the last progress messages from calculating the distances, followed by the commands I entered:
Distance 449250000 / 450315055 ( 99.76348669932877 %) 5689 seconds.
Distance 449500000 / 450315055 ( 99.81900338641798 %) 5692 seconds.
Distance 449750000 / 450315055 ( 99.8745200735072 %) 5695 seconds.
Distance 450000000 / 450315055 ( 99.93003676059642 %) 5698 seconds.
Distance 450250000 / 450315055 ( 99.98555344768565 %) 5701 seconds.
In [2]: os.get_cwd()
In [3]: help(os.get_cwd)
In [4]: os.getcwd()
In [5]: Dist[2,2]
In [6]: np.savez_compressed('C:/cygwin64/home/User.Name/prj/ProjectName/WorkingFolder/Dist',Dist=Dist)
The last command was an attempt to save the array that took so long to calculate. Signs that it worked included a long delay during which the Task Manager showed high CPU usage by Python, as well as the subsequent existence of a 317 MB file Dist.npz.
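A more direct check than file size would be to read the archive back in a fresh session. The sketch below round-trips a small stand-in array through `np.savez_compressed` and `np.load`; the temporary path and the 4 x 4 array are assumptions for illustration only, not the real file or data.

```python
import os
import tempfile

import numpy as np

# Small stand-in for Dist: only the strictly lower triangle is non-zero.
small = np.tril(np.arange(16, dtype=float).reshape(4, 4), k=-1)

# savez_compressed appends ".npz" when the name lacks that extension.
path = os.path.join(tempfile.mkdtemp(), "Dist")
np.savez_compressed(path, Dist=small)

with np.load(path + ".npz") as archive:
    names = archive.files        # array names stored in the archive
    restored = archive["Dist"]   # decompresses this array into memory

print(names)
print(np.array_equal(restored, small))
```

Listing `archive.files` is cheap, whereas indexing `archive["Dist"]` on the real file would decompress the whole array back into memory.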
When I was presented with the prompt again, I confirmed that still no output was shown in response to further commands.
In [7]: 'Dog'
In [8]: Dist
In [9]: dfShipName # A dataframe
I also tried to save the variables in the Variable Explorer using the Save icon, but was told “Data is too big to be saved”.
I next clicked on the trash can icon to remove the variables, but they remained listed, even though the console displayed the message “Removing all variables”.
Nothing happened when I double-clicked any dataframe or scalar variable, but after much delay, I was presented with a series of popup messages saying that the variable is too big to be retrieved. This remained so even after entering del Dist at the console (Dist is by far the biggest variable, eclipsing all others).
In summary, the Console stopped presenting output and the Variable Explorer seems unresponsive.
Is there any way to avoid putting Spyder into this state when I’ve created a large array? Is there nothing that can be done to restore proper functionality and state other than restarting Spyder and stopping execution prior to generation of the big array? I am using Spyder 5.4.3 as installed by Anaconda on Windows 10.