I think it’s very simple but I can’t figure it out.
In a Python program that plots scientific meteorological data, I need to exclude some files from a list of text files.
I have hundreds of files downloaded from a datalogger in the field. Sometimes the LTE link is degraded and, instead of a normal text file with a normal header, I get something that looks like HTML (I download the files through a URL API, so I guess an unreadable file results in some kind of 404 error page being downloaded instead).
Here is a sample of a normal file with its header:
"TOA5","2693","CR300","2693","CR300.Std.11.00","CPU:20240411_modem_steynard.CR300","4956","meteo"
"TIMESTAMP","RECORD","T107_C_010_Avg","T107_C_150_Avg","T107_C_300_Avg","AirTC_Avg","RH","WS_ms_S_WVT","WindDir_D1_WVT","WindDir_SD1_WVT","SlrW_Avg","SlrkJ_Tot","Rain_mm_Tot"
"TS","RN","Deg C","Deg C","Deg C","Deg C","%","meters/second","Deg","Deg","W/m^2","kJ/m^2","mm"
"","","Avg","Avg","Avg","Avg","Smp","WVc","WVc","WVc","Avg","Tot","Tot"
"2024-04-18 00:00:00",788,6.492,"NAN",9.66,-1.336,86.2,0,0,0,0,0,0
"2024-04-18 00:10:00",789,6.446,"NAN",9.6,-1.314,88.1,0,0,0,0,0,0
"2024-04-18 00:20:00",790,6.411,"NAN",9.54,-1.379,91.1,0,0,0,0,0,0
"2024-04-18 00:30:00",791,6.373,"NAN",9.48,-1.433,90.1,0,0,0,0,0,0
"2024-04-18 00:40:00",792,6.343,"NAN",9.42,-1.553,90.2,0,0,0,0,0,0
And a sample of a ‘false’ file:
<html>
<head>
<script language='JavaScript'>var BrowserDetect = {
init: function () {
this.browser = this.searchString(this.dataBrowser) || "An unknown browser";
this.version = this.searchVersion(navigator.userAgent)
|| this.searchVersion(navigator.appVersion)
|| "an unknown version";
this.OS = this.searchString(this.dataOS) || "an unknown OS";
It’s easy to spot with a ‘du -sh’ bash command, for example, because false files are 4 kB and a normal file is approx. 12 kB.
Here are the lines I’m using in Python to retrieve all the files into a list:
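Besides size, the two samples above also differ in their very first line, so a content check might be an alternative. This is only a rough sketch: it assumes every valid file starts with the "TOA5" tag as in my sample, and the function name looks_like_toa5 is just something I made up for illustration.

```python
def looks_like_toa5(path):
    """Return True if the first line of the file starts with the "TOA5" tag.

    Assumption: every valid datalogger file begins with "TOA5" (quotes
    included), as in the sample above. An HTML error page starts with <html>.
    """
    # errors="replace" avoids a crash if the file contains odd bytes
    with open(path, "r", errors="replace") as f:
        first_line = f.readline()
    return first_line.startswith('"TOA5"')
```

Something like looks_like_toa5('/TEMPORARY/STMTO/some_file.dat') would then tell a real data file from an error page, where some_file.dat is a placeholder name.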
import glob
import os
path = '/TEMPORARY/STMTO/'
all_files = glob.glob(os.path.join(path, "*.dat"))
How can I exclude files smaller than 5 kB from all_files?
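To make it concrete, here is a minimal sketch of the kind of filter I have in mind, based on os.path.getsize. The 5 * 1024 byte threshold, the helper name files_at_least and the variable good_files are my own assumptions, not something I have validated on the full data set.

```python
import glob
import os

MIN_SIZE_BYTES = 5 * 1024  # assumed threshold: 5 kB counted as 5 * 1024 bytes


def files_at_least(paths, min_size=MIN_SIZE_BYTES):
    """Return only the paths whose file size is at least min_size bytes."""
    return [p for p in paths if os.path.getsize(p) >= min_size]


path = '/TEMPORARY/STMTO/'
all_files = glob.glob(os.path.join(path, "*.dat"))

# Keep the files that are large enough to be real data files
good_files = files_at_least(all_files)
```

With the sizes I see (about 4 kB for false files, about 12 kB for normal ones), good_files should then contain only the normal files, though a threshold based purely on size could misclassify a very short but valid file.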