I am new to scRNA-seq analysis and cell type deconvolution. I am trying to create a ‘pseudobulk’ matrix (rows = genes, columns = cell types, entries = TPM) from my scRNA-seq Seurat object using the AggregateExpression() function. Here is the code I ran (the Seurat object is ‘seurat_mes’):
# subset the Seurat object to include only the desired cell types
cell_types_mes <- levels(Idents(seurat_mes)) # specify the cell types of interest
subset_seurat_mes <- subset(seurat_mes, idents = cell_types_mes)
# compute aggregated gene expression
agg_expr_mes <- AggregateExpression(object = subset_seurat_mes, group.by = "ident")
And this is an example of the output:
> head(agg_expr_mes)
$RNA
21831 x 7 sparse Matrix of class "dgCMatrix"
OBP OB2 LMP MALP OB1 EMP Ocy
Xkr4 . 4 . 8 1 2 .
Mrpl15 21436 11822 45940 16528 6254 1034 4892
Lypla1 10559 5201 14328 7747 3847 434 2951
Gm37988 33 16 66 29 16 . 12
Tcea1 24986 19689 45397 19175 9076 1304 7329
Rgs20 41 19 315 292 7 9 3
Atp6v1h 11372 6902 18714 10471 3986 708 3586
Rb1cc1 25209 27960 34488 18034 14421 1293 12483
4732440D04Rik 853 832 1637 1328 627 68 512
Alkal1 131 1115 1167 28 58 19 202
St18 6 5 18 6 5 1 3
Pcmtd1 13911 14229 25702 21717 6554 790 6656
Gm26901 500 370 272 57 202 33 182
Sntg1 59 21 157 8 14 2 38
Rrs1 5952 2186 15834 6587 1822 409 1322
Adhfe1 36 68 302 1762 21 13 16
Mybl1 208 308 1987 372 74 54 97
Vcpip1 8145 5788 13180 8216 3080 440 2576
1700034P13Rik 51 50 77 37 18 3 18
Sgk3 5478 2059 2704 1620 2083 147 3977
Mcmdc2 415 143 295 149 139 6 190
Snhg6 23559 25226 46590 12932 8266 1356 4859
Tcf24 18 17 63 32 10 1 2
Ppp1r42 3 5 16 8 1 3 5
Cops5 27548 15058 49163 23342 9126 1150 7969
Cspp1 5577 4739 10492 6614 2222 362 2130
Arfgef1 14343 9306 18801 9697 5389 684 3766
Cpa6 1528 1064 2236 473 1089 64 707
Prex2 32 32 133 100 11 6 14
A830018L16Rik 53 107 21 12 30 . 9
Sulf1 1138 1041 5277 4538 444 106 757
Slco5a1 130 28 189 33 63 2 16
Prdm14 10 3 21 20 1 . 1
Ncoa2 3812 2279 7627 6328 1710 286 1492
Tram1 92978 48607 61504 18886 27390 2164 19399
Lactb2 4798 1927 9215 4806 1273 249 1254
Eya1 2297 2835 6760 5282 1345 248 777
Msc 5 1 38 139 5 3 1
Trpa1 50 30 41 7 14 3 2
Terf1 2247 2067 7547 2457 1082 162 963
Sbspon 21 32 228 47 12 6 14
4930444P10Rik 30 8 35 6 7 3 8
Rpl7 841034 776479 1318311 361088 314515 30655 175153
Rdh10 3017 2629 5017 3076 2075 177 1581
Stau2 5094 4813 8546 3754 2356 223 2428
Ube2w 15812 11442 27872 14854 5174 743 4810
Eloc 60814 35468 96947 37518 21481 2491 17303
D030040B21Rik 935 541 834 276 85 13 182
Tmem70 12286 7651 24846 9891 3593 573 3322
Ly96 3488 1810 7902 4565 1294 181 1131
Jph1 3947 3287 3128 579 2187 147 1071
Gdap1 29 12 160 91 5 2 11
Pi15 340 74 1899 4471 471 53 53
Gm28154 . . 4 24 1 . .
Crispld1 230 61 119 40 234 6 383
Pkhd1 3 1 6 10 . 3 .
Il17f 8 5 30 3 2 2 1
Mcm3 1877 765 13336 1870 645 300 210
Paqr8 276 158 1118 704 74 29 50
Efhc1 238 182 583 300 57 7 87
Tram2 8588 8719 9503 3670 3356 344 2773
Tmem14a 2340 1335 4879 3126 817 83 719
Gsta3 33 32 69 162 9 2 5
Kcnq5 18 100 86 8 6 2 21
Rims1 20 10 8 6 . . 1
Gm29107 124 125 261 222 72 7 65
Ogfrl1 2476 1408 7567 2966 942 216 852
B3gat2 326 212 327 256 206 40 128
Smap1 52503 30265 52346 29865 21670 1388 16426
Sdhaf4 27177 13247 28494 15850 9559 768 8623
Fam135a 6524 5099 8518 3553 2619 318 1700
..............................
........suppressing 21689 rows in show(); maybe adjust options(max.print=, width=)
..............................
OBP OB2 LMP MALP OB1 EMP Ocy
Gm30489 . . 1 . . . .
1700061E18Rik . . 1 1 . . .
Gm15813 . 1 . . 2 . .
Gm34557 1 1 . . . . .
Gm46416 . 3 1 . . . .
4930455J16Rik . . 2 . . . .
Cts8 1 . 1 . 1 . .
Gm49345 . . 1 2 . . .
Gm8016 . . 1 2 1 . .
Irx4 1 1 1 . 1 . .
CT009718.2 2 . . . . . 1
1700119I11Rik 1 . . . 1 . 1
Gm31452 . . 1 1 . . 1
Gm15324 1 . 2 . 1 . .
Gm48512 1 . . . 2 . .
Gm48311 1 . 1 . . . .
Gm47705 . . . . . . .
Gm48613 . . 1 . . . .
Gm34611 . . 2 . . . .
Gm26709 1 . 1 . . . .
Vsx2 . 1 1 . . . .
Rtl1 . . . . 2 . .
Gm40576 . . . 1 . . .
Gm15996 . . . 2 . . .
9230109A22Rik . 1 1 . . . .
4930592A05Rik 1 1 . . . . .
4930413F20Rik 1 . . 1 1 . .
Trhr 2 . 1 . . . .
Gm30159 . . 2 . . . .
Ly6h . . . 2 . . .
Gm20420 . . . 1 2 . .
Dnajb7 1 . . 1 . . .
Gm29331 . . 1 1 . . .
Gm10337 . . 1 . . . 1
Sec14l5 . 1 . . . . 1
Gsc2 2 . . . . . 1
Gm21987 . . 2 1 . . .
Slc9c1 . 1 1 . . . .
D16Ertd519e . 3 . . . . .
AC117775.1 2 1 . . . . .
AC122413.2 2 . . . . . .
Vmn2r99 . 1 1 . . . .
Gm15947 2 . . . 1 . .
Gm26693 1 . 1 . . . .
Olfr118 1 1 1 . . . 2
Olfr127 . . 1 1 . . 1
Vmn2r118 . . 1 . . 1 .
Gm10190 . 2 . . . . .
Tmem247 . 2 1 . . . .
Gm10549 1 . . . 1 . .
Pcdha3 1 1 . . . . .
1700017D01Rik . . 1 . . . .
Foxb2 . . . . 2 . 2
Wnt8b 1 . 1 . . . .
Olfr351 2 . 1 . . . .
Sertm1 28 3 21 . 2 2 3
Foxe3 4 . . . . . .
Nkx3-2 . . 2 . . . .
Gm17112 2 . . . . . .
Rprl1 5 . 3 . 1 . 1
Tex36 . . 3 . . . .
Gm47343 1 . 3 . . . .
Olfr728 2 . 1 . . . .
Trav14d-3-dv8 2 . . . 3 . .
Gm39325 1 . 1 . . . 1
Gm11479 1 . . 2 . . .
Olfr466 1 . 1 1 . . .
Gm21761 2 . 2 . . . 2
Gm20767 4 . 3 . . . .
AC122375.2 . . 3 1 . . .
Pxt1 1 . . 2 . . .
I am struggling to interpret this output. Specifically: (A) are the matrix entries raw counts or a normalized value such as TPM, and (B) what is the difference between the “upper matrix” (with larger numbers) and the “lower matrix” (with smaller numbers and dots)? Ultimately, I need the entries expressed in TPM, so any suggestions would be much appreciated.
Thank you SO much from an aspiring scRNA-seq data analyst!
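For reference, here is a minimal sketch of the per-million scaling asked about above. It assumes the aggregated values are summed raw counts (the integer entries suggest this, but it is not confirmed) and that gene length can be ignored, as is commonly assumed for UMI-based 3' protocols, in which case counts per million (CPM) is often used as a TPM-like value. The toy matrix reuses three genes and two columns from the printed output; `pseudobulk_counts`, `counts_to_cpm`, and `pseudobulk_cpm` are hypothetical names, not part of Seurat.

```r
# Toy stand-in for the aggregated matrix: three genes and two cell types
# copied from the printed output above (ASSUMED to be summed raw counts).
pseudobulk_counts <- matrix(
  c(21436, 10559, 24986,   # OBP column
    11822,  5201, 19689),  # OB2 column
  nrow = 3,
  dimnames = list(c("Mrpl15", "Lypla1", "Tcea1"), c("OBP", "OB2"))
)

# Hypothetical helper: divide each column by its total and multiply by 1e6,
# so every cell-type column is expressed in counts per million (CPM).
counts_to_cpm <- function(counts) {
  lib_sizes <- colSums(counts)
  sweep(counts, 2, lib_sizes, "/") * 1e6
}

pseudobulk_cpm <- counts_to_cpm(pseudobulk_counts)

# Each column sums to one million, up to floating-point rounding.
print(colSums(pseudobulk_cpm))
```

On the real data the input would presumably be a dense copy of the full matrix, for example as.matrix(agg_expr_mes$RNA), so that the column totals cover all 21831 genes rather than this three-gene subset. If the protocol is full-length rather than UMI-based, a gene-length correction would be needed before the values could be called TPM.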