I have a long file containing thousands of lines; a couple of samples are shown below:
```latex
\begin{align*}
H_0 & : \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 \\
H_1 & : \text{Some text} \\
H_2 & : \text{More text...} \\
\end{align*}
\begin{table}[htb]
\centering
\begin{tabular}{cc}
Mean & = & the mean value $\mu$ \\
Median & = & the median value $median$ \\
Mode & = & the mode value $mode$ \\
\end{tabular}
\end{table}
```
The objective is to turn `\begin{align*}...\end{align*}` into:
```xml
<md>
<mrow>H_0 & : \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5</mrow>
<mrow>H_1 & : \text{Some text}</mrow>
<mrow>H_2 & : \text{More text...}</mrow>
</md>
```
and `\begin{table}[htb]...\end{table}` into:
```xml
<table>
<tabular halign="center">
<row header="yes" bottom="minor">
<cell>Mean</cell>
<cell>=</cell>
<cell>the mean value $\mu$</cell>
</row>
<row>
<cell>Median</cell>
<cell>=</cell>
<cell>the median value $median$</cell>
</row>
<row>
<cell>Mode</cell>
<cell>=</cell>
<cell>the mode value $mode$</cell>
</row>
</tabular>
</table>
```
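A possible approach for the table part, sketched under assumptions that go beyond the question: the hypothetical function `table_to_pretext` finds each `table` environment, reads the first `tabular` inside it, splits rows on `\\` and cells on `&`, marks the first row with `header="yes" bottom="minor"` as in the target above, and sets `halign` only when every column in the spec uses the same `l`/`c`/`r` letter. It drops `\centering` and does not handle `\hline`, `\multicolumn`, or nested tables.

```python
import re


def table_to_pretext(latex: str) -> str:
    """Convert every LaTeX table environment to a PreTeXt <table> (illustrative)."""
    pattern = re.compile(
        r'\\begin\{table\}(?:\[[^\]]*\])?(.*?)\\end\{table\}', re.DOTALL)
    align = {'c': 'center', 'l': 'left', 'r': 'right'}

    def convert(match: re.Match) -> str:
        tab = re.search(r'\\begin\{tabular\}\{([^}]*)\}(.*?)\\end\{tabular\}',
                        match.group(1), re.DOTALL)
        if tab is None:
            return match.group(0)  # no tabular inside: leave the block untouched
        colspec, body = tab.group(1), tab.group(2)
        # Assumption: a uniform column spec such as {cc} maps to halign
        halign = ''
        if len(set(colspec)) == 1 and colspec[0] in align:
            halign = f' halign="{align[colspec[0]]}"'
        lines = ['<table>', f'<tabular{halign}>']
        # Rows end with \\; drop the empty piece after the last one
        rows = [row for row in body.split('\\\\') if row.strip()]
        for n, row in enumerate(rows):
            # Assumption: the first row is the header, as in the target output
            attrs = ' header="yes" bottom="minor"' if n == 0 else ''
            lines.append(f'<row{attrs}>')
            for cell in row.split('&'):
                lines.append(f'<cell>{cell.strip()}</cell>')
            lines.append('</row>')
        lines += ['</tabular>', '</table>']
        return '\n'.join(lines)

    # A callable replacement is inserted verbatim, so nothing needs escaping
    return pattern.sub(convert, latex)
```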
I am trying to get the `\begin{align*}` conversion working first and haven't started on `\begin{table}` yet. I have written a script, but it doesn't work as expected. I believe the problem is my use of `re.escape(...)`, because the output contains too many unnecessary characters. I want to eliminate the extra `\` characters and also remove `\begin{align*}` and `\end{align*}` during the process. Any assistance is appreciated!
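The suspicion about `re.escape(...)` can be checked in isolation. `re.escape` is the right tool for the *pattern*, but `re.sub` also processes backslash escapes in a string *replacement*, and unknown escapes such as `\{` are kept as-is, so escaping the replacement leaks backslashes into the output. A minimal sketch with made-up sample strings:

```python
import re

text = r"x \begin{align*}{a}\end{align*} y"
old = r"\begin{align*}{a}\end{align*}"
new = "<md>{a}</md>"

# re.escape is correct for the pattern: the LaTeX is then matched literally.
pattern = re.escape(old)

# Escaping the replacement is the bug: re.sub processes backslash escapes in a
# string replacement and keeps unknown ones such as \{ unchanged, so the
# backslashes added by re.escape end up in the result.
bad = re.sub(pattern, re.escape(new), text)

# A function replacement is inserted verbatim, with no escape processing.
good = re.sub(pattern, lambda m: new, text)

print(bad)
print(good)  # x <md>{a}</md> y
```

Since the text being replaced is a literal string anyway, plain `result.replace(original[i], modified[i])` would also avoid regex escaping altogether.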
This is what my script currently produces:

```
<md><mrow>\begin{align*}
H_0 & : \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 </mrow><mrow>
H_1 & : \text{Some text} </mrow><mrow>
H_2 & : \text{More text...} </mrow>
\end{align*}</md>
\begin{table}[htb]
\centering
\begin{tabular}{cc}
Mean & = & the mean value $\mu$ \\
Median & = & the median value $median$ \\
Mode & = & the mode value $mode$ \\
\end{tabular}
\end{table}
```
This is my script:

```python
import re

# Read the whole LaTeX source into one string
with open("sample.txt", "r") as my_file:
    data: str = my_file.read()

result: str = data
# Every \begin{align*} ... \end{align*} block: one copy as found, one to edit
original = re.findall(r'\\begin\{align\*\}[\s\S]*\\end\{align\*\}', data)
modified = re.findall(r'\\begin\{align\*\}[\s\S]*\\end\{align\*\}', data)
for i in range(len(modified)):
    # prepend the first <mrow> of the <md> tag
    modified[i] = '<mrow>' + modified[i]
    # replace each LaTeX line break \\ with a closing </mrow> and an opening <mrow>
    modified[i] = modified[i].replace(r'\\', '</mrow><mrow>')
    # wrap everything in the math display environment
    modified[i] = '<md>' + modified[i] + '</md>'
    # remove the last <mrow>, as it is an extra
    modified[i] = modified[i][::-1].replace('<mrow>'[::-1], '', 1)[::-1]
    result = re.sub(re.escape(original[i]), re.escape(modified[i]), result)
    # print(modified[i])
    # print(original[i])
print(result)
```