Here, I scraped the English Premier League table from 2021/22 to 2024 season using Selenium and stored it in table_data
which looks like this:
[['1',
'Arsenal',
'33 23 5 5 77 26 51 74',
'D',
'W',
'W',
'L',
'W',
'2',
'Liverpool',
'33 22 8 3 75 32 43 74',
'W',
'W',
'D',
'L',
'W',
'3',
'Manchester City',
'32 22 7 3 76 32 44 73',
'D',
'D',
'W',
'W',
'W',
'4',
'Aston Villa',
'34 20 6 8 71 50 21 66',
'W',
'L',
'D',
'W',
'W',
'5',
'Tottenham Hotspur',
'32 18 6 8 65 49 16 60',
'L',
'W',
'D',
'W',
'L',
'6',
'Newcastle United',
'32 15 5 12 69 52 17 50',
'L',
'W',
'D',
'W',
'W',
'7',
'Manchester United',
'32 15 5 12 47 48 -1 50',
'W',
'D',
'L',
'D',
'D',
'8',
'West Ham United',
'34 13 9 12 54 63 -9 48',
'L',
'D',
'W',
'L',
'L',
'9',
'Chelsea',
'31 13 8 10 61 52 9 47',
'W',
'D',
'W',
'D',
'W',
'10',
'Brighton And Hove Albion',
'32 11 11 10 52 50 2 44',
'W',
'L',
'D',
'L',
'D',
'11',
'Wolverhampton Wanderers',
'33 12 7 14 46 53 -7 43',
'L',
'D',
'L',
'D',
'L',
'12',
'Fulham',
'34 12 6 16 50 54 -4 42',
'D',
'L',
'L',
'W',
'L',
'13',
'Bournemouth',
'33 11 9 13 48 60 -12 42',
'W',
'W',
'L',
'D',
'L',
'14',
'Crystal Palace',
'33 9 9 15 42 56 -14 36',
'D',
'L',
'L',
'W',
'W',
'15',
'Brentford',
'34 9 8 17 52 59 -7 35',
'D',
'D',
'D',
'W',
'W',
'16',
'Everton',
'33 10 8 15 34 48 -14 30',
'L',
'D',
'W',
'L',
'W',
'17',
'Nottingham Forest',
'34 7 9 18 42 60 -18 26',
'D',
'W',
'L',
'D',
'L',
'18',
'Luton Town',
'34 6 7 21 47 75 -28 25',
'L',
'L',
'W',
'L',
'L',
'19',
'Burnley',
'34 5 8 21 37 69 -32 23',
'D',
'D',
'L',
'D',
'W',
'20',
'Sheffield United',
'33 3 7 23 31 88 -57 16',
'D',
'L',
'D',
'L',
'L'],
['1',
'Manchester City',
'38 28 5 5 94 33 61 89',
'W',
'W',
'W',
'D',
'L',
'2',
'Arsenal',
'38 26 6 6 88 43 45 84',
'W',
'W',
'L',
'L',
'W',
'3',
'Manchester United',
'38 23 6 9 58 43 15 75',
'L',
'W',
'W',
'W',
'W',
'4',
'Newcastle United',
'38 19 14 5 68 33 35 71',
'L',
'D',
'W',
'D',
'D',
'5',
'Liverpool',
'38 19 10 9 75 47 28 67',
'W',
'W',
'W',
'D',
'D',
'6',
'Brighton And Hove Albion',
'38 18 8 12 72 53 19 62',
'W',
'L',
'W',
'D',
'L',
'7',
'Aston Villa',
'38 18 7 13 51 46 5 61',
'L',
'L',
'W',
'D',
'W',
'8',
'Tottenham Hotspur',
'38 18 6 14 70 63 7 60',
'L',
'W',
'L',
'L',
'W',
'9',
'Brentford',
'38 15 14 9 58 46 12 59',
'W',
'L',
'W',
'W',
'W',
'10',
'Fulham',
'38 15 7 16 55 53 2 52',
'L',
'W',
'W',
'D',
'L',
'11',
'Crystal Palace',
'38 11 12 15 40 49 -9 45',
'W',
'L',
'W',
'D',
'D',
'12',
'Chelsea',
'38 11 11 16 38 47 -9 44',
'W',
'D',
'L',
'L',
'D',
'13',
'Wolverhampton Wanderers',
'38 11 8 19 31 58 -27 41',
'L',
'W',
'L',
'D',
'L',
'14',
'West Ham United',
'38 11 7 20 42 55 -13 40',
'L',
'W',
'L',
'W',
'L',
'15',
'Bournemouth',
'38 11 6 21 37 71 -34 39',
'W',
'L',
'L',
'L',
'L',
'16',
'Nottingham Forest',
'38 9 11 18 38 68 -30 38',
'L',
'W',
'D',
'W',
'D',
'17',
'Everton',
'38 8 12 18 34 57 -23 36',
'D',
'W',
'L',
'D',
'W',
'18',
'Leicester City',
'38 9 7 22 51 68 -17 34',
'D',
'L',
'L',
'D',
'W',
'19',
'Leeds United',
'38 7 10 21 48 78 -30 31',
'L',
'L',
'D',
'L',
'L',
'20',
'Southampton',
'38 6 7 25 36 73 -37 25',
'L',
'L',
'L',
'L',
'D'],
['1',
'Manchester City',
'38 29 6 3 99 26 73 93',
'W',
'W',
'W',
'D',
'W',
'2',
'Liverpool',
'38 28 8 2 94 26 68 92',
'W',
'D',
'W',
'W',
'W',
'3',
'Chelsea',
'38 21 11 6 76 33 43 74',
'L',
'D',
'W',
'D',
'W',
'4',
'Tottenham Hotspur',
'38 22 5 11 69 40 29 71',
'W',
'D',
'W',
'W',
'W',
'5',
'Arsenal',
'38 22 3 13 61 48 13 69',
'W',
'W',
'L',
'L',
'W',
'6',
'Manchester United',
'38 16 10 12 57 57 0 58',
'L',
'D',
'W',
'L',
'L',
'7',
'West Ham United',
'38 16 8 14 60 51 9 56',
'L',
'L',
'W',
'D',
'L',
'8',
'Leicester City',
'38 14 10 14 62 59 3 52',
'L',
'W',
'W',
'D',
'W',
'9',
'Brighton And Hove Albion',
'38 12 15 11 42 44 -2 51',
'D',
'W',
'W',
'D',
'W',
'10',
'Wolverhampton Wanderers',
'38 15 6 17 38 43 -5 51',
'L',
'D',
'L',
'D',
'L',
'11',
'Newcastle United',
'38 13 10 15 44 62 -18 49',
'W',
'L',
'L',
'W',
'W',
'12',
'Crystal Palace',
'38 11 15 12 50 46 4 48',
'W',
'W',
'D',
'L',
'W',
'13',
'Brentford',
'38 13 7 18 48 56 -8 46',
'D',
'L',
'W',
'W',
'L',
'14',
'Aston Villa',
'38 13 6 19 52 54 -2 45',
'W',
'L',
'D',
'D',
'L',
'15',
'Southampton',
'38 9 13 16 43 67 -24 40',
'D',
'L',
'L',
'L',
'L',
'16',
'Everton',
'38 11 6 21 43 66 -23 39',
'W',
'D',
'L',
'W',
'L',
'17',
'Leeds United',
'38 9 11 18 42 79 -37 38',
'L',
'L',
'L',
'D',
'W',
'18',
'Burnley',
'38 7 14 17 34 53 -19 35',
'W',
'L',
'L',
'D',
'L',
'19',
'Watford',
'38 6 5 27 34 77 -43 23',
'L',
'L',
'D',
'L',
'L',
'20',
'Norwich City',
'38 5 7 26 23 84 -61 22',
'L',
'L',
'L',
'D',
'L']]
I stored this in a Dataframe, however the output of that DataFrame looks likes this here. It is storing each season as 1 row instead of going down vertically, thats why it has 3 rows, and 160 columns. Ideally I want it to look just like the EPL Table like this ideal table.
I have tried transposing but it does not work, I’m guessing that I need to map the rows and columns myself, but I am quite new to pandas and hope someone can help.
4
If you can’t shape it upfront (when scraping), here is one way of some post-processing :
from itertools import chain, batched # >=3.12
df = (
pd.DataFrame(batched(chain.from_iterable(table_data), 8))
# @ouroboros1's variant
# pd.DataFrame(np.array(table_data).reshape(-1, int(20*8/20)))
.pipe(
lambda raw: pd.concat(
[
raw.iloc[:, :2], # club's pos
raw[2].str.split(expand=True).astype(int), # game details
raw.iloc[:, 3:].agg(" ".join, axis=1), # form
],
axis=1,
)
)
.set_axis(
[
"Position", "Club", "Played",
"Won", "Draw", "Lost", "GF", "GA", "GD",
"Points", "Form",
],
axis=1,
)
)
NB: You may want to add an identifier (here: 1
, 2
and 3
) to distinguish ranking times.
Output :
Position Club Played Won Draw Lost GF GA GD Points Form
0 1 Arsenal 33 23 5 5 77 26 51 74 D W W L W
1 2 Liverpool 33 22 8 3 75 32 43 74 W W D L W
2 3 Manchester City 32 22 7 3 76 32 44 73 D D W W W
3 4 Aston Villa 34 20 6 8 71 50 21 66 W L D W W
.. ... ... ... ... ... ... .. .. .. ... ...
56 17 Leeds United 38 9 11 18 42 79 -37 38 L L L D W
57 18 Burnley 38 7 14 17 34 53 -19 35 W L L D L
58 19 Watford 38 6 5 27 34 77 -43 23 L L D L L
59 20 Norwich City 38 5 7 26 23 84 -61 22 L L L D L
[60 rows x 11 columns]
3
You need to fix your input first. Notice that the stats from Played all the way to points are all in the same string. This isn’t the prettiest code but it gets the job done
import pandas as pd
def fix_season(list_output):
stats_counter = 0
current_team = []
fixed_output = []
for idx in range(len(list_output)):
element = list_output[idx]
# fix formatting on stats
if stats_counter==2:
current_team.extend(element.split(" "))
stats_counter += 1
# Appends team to the final season table
elif stats_counter == 8:
fixed_output.append(current_team)
stats_counter = 1
current_team = [element]
# Add Stat to current team
else:
current_team.append(element)
stats_counter += 1
return fixed_output
season1 = fix_season(output[0])
result = pd.DataFrame(season1)
print(result)
Frankly, I think you should change code which you used for scraping and format it during scraping.
Maybe you should use more complex rules to scrape table by table and row by row instead of scraping all at once.
Now you have two different problems.
- many rows in one row
- values as single string in single row
and you should resolve every problem separatelly using different code. You can’t do this in one command.
First doesn’t need pandas.DataFrame
but you can split every row to many rows using for
-loop with slice row[i:i+8]
data = [[ ... your data ... ]]
new_data = []
for row in data:
for i in range(0, len(row), 8):
part = row[i:i+8]
new_data.apped(part)
print(part)
Result:
['1', 'Arsenal', '33 23 5 5 77 26 51 74', 'D', 'W', 'W', 'L', 'W']
['2', 'Liverpool', '33 22 8 3 75 32 43 74', 'W', 'W', 'D', 'L', 'W']
['3', 'Manchester City', '32 22 7 3 76 32 44 73', 'D', 'D', 'W', 'W', 'W']
['4', 'Aston Villa', '34 20 6 8 71 50 21 66', 'W', 'L', 'D', 'W', 'W']
['5', 'Tottenham Hotspur', '32 18 6 8 65 49 16 60', 'L', 'W', 'D', 'W', 'L']
['6', 'Newcastle United', '32 15 5 12 69 52 17 50', 'L', 'W', 'D', 'W', 'W']
['7', 'Manchester United', '32 15 5 12 47 48 -1 50', 'W', 'D', 'L', 'D', 'D']
['8', 'West Ham United', '34 13 9 12 54 63 -9 48', 'L', 'D', 'W', 'L', 'L']
['9', 'Chelsea', '31 13 8 10 61 52 9 47', 'W', 'D', 'W', 'D', 'W']
['10', 'Brighton And Hove Albion', '32 11 11 10 52 50 2 44', 'W', 'L', 'D', 'L', 'D']
['11', 'Wolverhampton Wanderers', '33 12 7 14 46 53 -7 43', 'L', 'D', 'L', 'D', 'L']
['12', 'Fulham', '34 12 6 16 50 54 -4 42', 'D', 'L', 'L', 'W', 'L']
['13', 'Bournemouth', '33 11 9 13 48 60 -12 42', 'W', 'W', 'L', 'D', 'L']
['14', 'Crystal Palace', '33 9 9 15 42 56 -14 36', 'D', 'L', 'L', 'W', 'W']
['15', 'Brentford', '34 9 8 17 52 59 -7 35', 'D', 'D', 'D', 'W', 'W']
['16', 'Everton', '33 10 8 15 34 48 -14 30', 'L', 'D', 'W', 'L', 'W']
['17', 'Nottingham Forest', '34 7 9 18 42 60 -18 26', 'D', 'W', 'L', 'D', 'L']
['18', 'Luton Town', '34 6 7 21 47 75 -28 25', 'L', 'L', 'W', 'L', 'L']
['19', 'Burnley', '34 5 8 21 37 69 -32 23', 'D', 'D', 'L', 'D', 'W']
['20', 'Sheffield United', '33 3 7 23 31 88 -57 16', 'D', 'L', 'D', 'L', 'L']
['1', 'Manchester City', '38 28 5 5 94 33 61 89', 'W', 'W', 'W', 'D', 'L']
['2', 'Arsenal', '38 26 6 6 88 43 45 84', 'W', 'W', 'L', 'L', 'W']
['3', 'Manchester United', '38 23 6 9 58 43 15 75', 'L', 'W', 'W', 'W', 'W']
['4', 'Newcastle United', '38 19 14 5 68 33 35 71', 'L', 'D', 'W', 'D', 'D']
['5', 'Liverpool', '38 19 10 9 75 47 28 67', 'W', 'W', 'W', 'D', 'D']
['6', 'Brighton And Hove Albion', '38 18 8 12 72 53 19 62', 'W', 'L', 'W', 'D', 'L']
['7', 'Aston Villa', '38 18 7 13 51 46 5 61', 'L', 'L', 'W', 'D', 'W']
['8', 'Tottenham Hotspur', '38 18 6 14 70 63 7 60', 'L', 'W', 'L', 'L', 'W']
['9', 'Brentford', '38 15 14 9 58 46 12 59', 'W', 'L', 'W', 'W', 'W']
['10', 'Fulham', '38 15 7 16 55 53 2 52', 'L', 'W', 'W', 'D', 'L']
['11', 'Crystal Palace', '38 11 12 15 40 49 -9 45', 'W', 'L', 'W', 'D', 'D']
['12', 'Chelsea', '38 11 11 16 38 47 -9 44', 'W', 'D', 'L', 'L', 'D']
['13', 'Wolverhampton Wanderers', '38 11 8 19 31 58 -27 41', 'L', 'W', 'L', 'D', 'L']
['14', 'West Ham United', '38 11 7 20 42 55 -13 40', 'L', 'W', 'L', 'W', 'L']
['15', 'Bournemouth', '38 11 6 21 37 71 -34 39', 'W', 'L', 'L', 'L', 'L']
['16', 'Nottingham Forest', '38 9 11 18 38 68 -30 38', 'L', 'W', 'D', 'W', 'D']
['17', 'Everton', '38 8 12 18 34 57 -23 36', 'D', 'W', 'L', 'D', 'W']
['18', 'Leicester City', '38 9 7 22 51 68 -17 34', 'D', 'L', 'L', 'D', 'W']
['19', 'Leeds United', '38 7 10 21 48 78 -30 31', 'L', 'L', 'D', 'L', 'L']
['20', 'Southampton', '38 6 7 25 36 73 -37 25', 'L', 'L', 'L', 'L', 'D']
['1', 'Manchester City', '38 29 6 3 99 26 73 93', 'W', 'W', 'W', 'D', 'W']
['2', 'Liverpool', '38 28 8 2 94 26 68 92', 'W', 'D', 'W', 'W', 'W']
['3', 'Chelsea', '38 21 11 6 76 33 43 74', 'L', 'D', 'W', 'D', 'W']
['4', 'Tottenham Hotspur', '38 22 5 11 69 40 29 71', 'W', 'D', 'W', 'W', 'W']
['5', 'Arsenal', '38 22 3 13 61 48 13 69', 'W', 'W', 'L', 'L', 'W']
['6', 'Manchester United', '38 16 10 12 57 57 0 58', 'L', 'D', 'W', 'L', 'L']
['7', 'West Ham United', '38 16 8 14 60 51 9 56', 'L', 'L', 'W', 'D', 'L']
['8', 'Leicester City', '38 14 10 14 62 59 3 52', 'L', 'W', 'W', 'D', 'W']
['9', 'Brighton And Hove Albion', '38 12 15 11 42 44 -2 51', 'D', 'W', 'W', 'D', 'W']
['10', 'Wolverhampton Wanderers', '38 15 6 17 38 43 -5 51', 'L', 'D', 'L', 'D', 'L']
['11', 'Newcastle United', '38 13 10 15 44 62 -18 49', 'W', 'L', 'L', 'W', 'W']
['12', 'Crystal Palace', '38 11 15 12 50 46 4 48', 'W', 'W', 'D', 'L', 'W']
['13', 'Brentford', '38 13 7 18 48 56 -8 46', 'D', 'L', 'W', 'W', 'L']
['14', 'Aston Villa', '38 13 6 19 52 54 -2 45', 'W', 'L', 'D', 'D', 'L']
['15', 'Southampton', '38 9 13 16 43 67 -24 40', 'D', 'L', 'L', 'L', 'L']
['16', 'Everton', '38 11 6 21 43 66 -23 39', 'W', 'D', 'L', 'W', 'L']
['17', 'Leeds United', '38 9 11 18 42 79 -37 38', 'L', 'L', 'L', 'D', 'W']
['18', 'Burnley', '38 7 14 17 34 53 -19 35', 'W', 'L', 'L', 'D', 'L']
['19', 'Watford', '38 6 5 27 34 77 -43 23', 'L', 'L', 'D', 'L', 'L']
['20', 'Norwich City', '38 5 7 26 23 84 -61 22', 'L', 'L', 'L', 'D', 'L']
Second problem may need to use pandas.DataFrame
It can use .str.split(' ')
to convert string to list.
Later it can convert it to new DataFrame
with values in separated columns and join it to original DataFrame
import pandas as pd
df1 = pd.DataFrame(new_data)
#print(df1)
# convert string to list
df1.iloc[:,2] = df1.iloc[:,2].str.split(' ')
#print(df1)
# create new dataframe with values in separated columns
df2 = pd.DataFrame(df1.iloc[:,2].to_list(), columns=['Played', 'Won', 'Drawn', 'Lost', 'GF', 'GA', 'GD', 'Points'])
#print(df2)
# add new columns to old dataframe
df = df1.join(df2)
# remove old column with list
df.drop(columns=[2], inplace=True)
print(df)
Result:
0 1 3 4 5 6 7 Played Won Drawn Lost GF GA GD Points
0 1 Arsenal D W W L W 33 23 5 5 77 26 51 74
1 2 Liverpool W W D L W 33 22 8 3 75 32 43 74
2 3 Manchester City D D W W W 32 22 7 3 76 32 44 73
3 4 Aston Villa W L D W W 34 20 6 8 71 50 21 66
4 5 Tottenham Hotspur L W D W L 32 18 6 8 65 49 16 60
5 6 Newcastle United L W D W W 32 15 5 12 69 52 17 50
6 7 Manchester United W D L D D 32 15 5 12 47 48 -1 50
7 8 West Ham United L D W L L 34 13 9 12 54 63 -9 48
8 9 Chelsea W D W D W 31 13 8 10 61 52 9 47
9 10 Brighton And Hove Albion W L D L D 32 11 11 10 52 50 2 44
10 11 Wolverhampton Wanderers L D L D L 33 12 7 14 46 53 -7 43
11 12 Fulham D L L W L 34 12 6 16 50 54 -4 42
12 13 Bournemouth W W L D L 33 11 9 13 48 60 -12 42
13 14 Crystal Palace D L L W W 33 9 9 15 42 56 -14 36
14 15 Brentford D D D W W 34 9 8 17 52 59 -7 35
15 16 Everton L D W L W 33 10 8 15 34 48 -14 30
16 17 Nottingham Forest D W L D L 34 7 9 18 42 60 -18 26
17 18 Luton Town L L W L L 34 6 7 21 47 75 -28 25
18 19 Burnley D D L D W 34 5 8 21 37 69 -32 23
19 20 Sheffield United D L D L L 33 3 7 23 31 88 -57 16
20 1 Manchester City W W W D L 38 28 5 5 94 33 61 89
21 2 Arsenal W W L L W 38 26 6 6 88 43 45 84
22 3 Manchester United L W W W W 38 23 6 9 58 43 15 75
23 4 Newcastle United L D W D D 38 19 14 5 68 33 35 71
24 5 Liverpool W W W D D 38 19 10 9 75 47 28 67
25 6 Brighton And Hove Albion W L W D L 38 18 8 12 72 53 19 62
26 7 Aston Villa L L W D W 38 18 7 13 51 46 5 61
27 8 Tottenham Hotspur L W L L W 38 18 6 14 70 63 7 60
28 9 Brentford W L W W W 38 15 14 9 58 46 12 59
29 10 Fulham L W W D L 38 15 7 16 55 53 2 52
30 11 Crystal Palace W L W D D 38 11 12 15 40 49 -9 45
31 12 Chelsea W D L L D 38 11 11 16 38 47 -9 44
32 13 Wolverhampton Wanderers L W L D L 38 11 8 19 31 58 -27 41
33 14 West Ham United L W L W L 38 11 7 20 42 55 -13 40
34 15 Bournemouth W L L L L 38 11 6 21 37 71 -34 39
35 16 Nottingham Forest L W D W D 38 9 11 18 38 68 -30 38
36 17 Everton D W L D W 38 8 12 18 34 57 -23 36
37 18 Leicester City D L L D W 38 9 7 22 51 68 -17 34
38 19 Leeds United L L D L L 38 7 10 21 48 78 -30 31
39 20 Southampton L L L L D 38 6 7 25 36 73 -37 25
40 1 Manchester City W W W D W 38 29 6 3 99 26 73 93
41 2 Liverpool W D W W W 38 28 8 2 94 26 68 92
42 3 Chelsea L D W D W 38 21 11 6 76 33 43 74
43 4 Tottenham Hotspur W D W W W 38 22 5 11 69 40 29 71
44 5 Arsenal W W L L W 38 22 3 13 61 48 13 69
45 6 Manchester United L D W L L 38 16 10 12 57 57 0 58
46 7 West Ham United L L W D L 38 16 8 14 60 51 9 56
47 8 Leicester City L W W D W 38 14 10 14 62 59 3 52
48 9 Brighton And Hove Albion D W W D W 38 12 15 11 42 44 -2 51
49 10 Wolverhampton Wanderers L D L D L 38 15 6 17 38 43 -5 51
50 11 Newcastle United W L L W W 38 13 10 15 44 62 -18 49
51 12 Crystal Palace W W D L W 38 11 15 12 50 46 4 48
52 13 Brentford D L W W L 38 13 7 18 48 56 -8 46
53 14 Aston Villa W L D D L 38 13 6 19 52 54 -2 45
54 15 Southampton D L L L L 38 9 13 16 43 67 -24 40
55 16 Everton W D L W L 38 11 6 21 43 66 -23 39
56 17 Leeds United L L L D W 38 9 11 18 42 79 -37 38
57 18 Burnley W L L D L 38 7 14 17 34 53 -19 35
58 19 Watford L L D L L 38 6 5 27 34 77 -43 23
59 20 Norwich City L L L D L 38 5 7 26 23 84 -61 22
Now you can only rename columns and put them in different order.