I am trying to extract most of the information on a government website (CFIA-CFIT Part I and Part II) into a table in Excel. The table should have three columns: ID, Name, and Details. On the page, the ID and Name are in bold, while the details are not bold and begin with “is the…”, as in the sample table below.
Any guidance would be greatly appreciated, as entering this data manually would be extremely inefficient. This information is also likely to be updated, so I need to be able to recreate or update the table periodically.
I have tried three approaches: saving the whole page as a PDF and importing that into Excel, importing the data directly from the web link, and removing all tabs between sentences so that the text becomes one large string. I can get the data into Excel, but I have not managed to turn it into a table in the format shown below.
ID | Name | Details |
---|---|---|
1-100-001 | Alfalfa-grass hay sun-cured ground (or alfalfa-grass meal) | is the product that consists….. |
1-100-002 | Alfalfa hay sun-cured ground (or sun-cured alfalfa meal) | is the product that consists…. |
1-100…. | Alfalfa leaves…. | is the product…. |
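One way to keep the bold information is to parse the page's HTML rather than a PDF or plain-text copy, because in HTML the bold runs survive as `<strong>` or `<b>` tags. The following is a minimal sketch using only Python's standard library (`html.parser`, `re`, `csv`). It assumes, hypothetically, that each entry is a single `<p>` element whose bold part holds "ID Name" and whose remaining text is the detail; the actual CFIA markup has not been checked here, so the tag names may need adjusting (for example, if the page uses `<dt>`/`<dd>` pairs instead). `FeedTermParser`, `parse_entries`, and `write_csv` are illustrative names, and the ID format `d-ddd-ddd` is inferred from the sample rows.

```python
import csv
import re
from html.parser import HTMLParser

# ID format inferred from the sample rows (e.g. 1-100-001); an assumption.
ID_RE = re.compile(r"^(\d-\d{3}-\d{3})\s+(.*\S)$")


class FeedTermParser(HTMLParser):
    """Collect the bold and non-bold text of each <p> separately.

    Hypothetical markup: one entry per <p>, with "ID Name" inside
    <strong> or <b> and the detail text following it.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_p = False
        self._bold_depth = 0
        self._bold = []
        self._plain = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_p:  # an unclosed <p> ends when the next one opens
                self._finish_paragraph()
            self._in_p = True
            self._bold_depth = 0
            self._bold, self._plain = [], []
        elif tag in ("strong", "b") and self._in_p:
            self._bold_depth += 1

    def handle_endtag(self, tag):
        if tag in ("strong", "b") and self._bold_depth:
            self._bold_depth -= 1
        elif tag == "p" and self._in_p:
            self._finish_paragraph()

    def handle_data(self, data):
        if self._in_p:
            # Route text to the bold or plain buffer depending on nesting.
            (self._bold if self._bold_depth else self._plain).append(data)

    def close(self):
        super().close()
        if self._in_p:  # flush a final <p> that was never closed
            self._finish_paragraph()

    def _finish_paragraph(self):
        self._in_p = False
        bold = " ".join("".join(self._bold).split())  # collapse whitespace
        detail = " ".join("".join(self._plain).split())
        match = ID_RE.match(bold)
        # Keep only paragraphs whose bold text starts with an ID.
        if match and detail:
            self.rows.append((match.group(1), match.group(2), detail))


def parse_entries(html):
    """Return a list of (ID, Name, Details) tuples found in the HTML."""
    parser = FeedTermParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(rows, path):
    """Write the rows with a header; utf-8-sig lets Excel detect UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "Name", "Details"])
        writer.writerows(rows)
```

To fetch the page itself, `urllib.request.urlopen(url).read().decode("utf-8")` can supply the HTML string, assuming the page is served as UTF-8 (the URL is not reproduced here). Re-running the script regenerates the CSV, which Excel opens directly, so a periodic update becomes a single command. CSV keeps the sketch dependency-free; `openpyxl` or pandas `DataFrame.to_excel` would produce a native `.xlsx` workbook instead.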
The farthest I got was with the single large string (all tabs removed). I tried to import this block of text into Excel, but it lost the bold formatting, which I had planned to use to separate the columns.
This approach also failed, and I am at an impasse.
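If only the flattened text is available, the bold formatting may not be strictly necessary: in the sample rows, every entry starts with an ID of the form `1-100-001`, and every detail starts with "is the". A regular expression can split on those two markers. This is a sketch under those assumptions (`ENTRY_RE` and `split_flat_text` are hypothetical names). It will mis-split if a name itself contains "is the" or if a detail contains something that looks like an ID followed by whitespace, so spot-checking the output is advisable.

```python
import re

# Hypothetical pattern built from the sample rows: an ID such as 1-100-001,
# then the name, then a detail that starts with "is the".
ENTRY_RE = re.compile(
    r"(?P<id>\d-\d{3}-\d{3})\s+"      # ID
    r"(?P<name>.+?)\s+"               # shortest name before "is the"
    r"(?P<detail>is the\b.*?)"        # detail text
    r"(?=\s*\d-\d{3}-\d{3}\s|\s*$)",  # stop at the next ID or end of text
    re.DOTALL,
)


def split_flat_text(text):
    """Split one long string of entries into (ID, Name, Details) tuples."""
    rows = []
    for m in ENTRY_RE.finditer(text):
        name = " ".join(m.group("name").split())  # collapse tabs/newlines
        detail = " ".join(m.group("detail").split())
        rows.append((m.group("id"), name, detail))
    return rows
```

Any text before the first ID (page headings, introductions) is skipped by `finditer`. The resulting tuples can be written to a CSV file with `csv.writer` and opened in Excel.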