Is batching an option when analysing sequences with TraMineR?

I’m a communication scientist and a total newbie with TraMineR and sequence analysis. I have a relatively large dataset containing the app usage of study participants. My aim is to identify sequences of app categories used in succession.

The original dataset looks like this:

Participant ID	Session ID	Category of used Apps	Start Time (in unix-time)	End Time (in unix-time)
0001	0001_1	Communication	1614868224	1614868236
0001	0001_1	Social Media	1614868236	1614868265
0002	0002_1	Games	1614868265	1614868320
…	…	…	…	…

Accordingly, I have two levels of analysis: (1) the participants and (2) the sessions.

In the first step, my aim is to identify sequences of app categories used in succession. A session is a coherent usage sequence between switching the smartphone screen on and off. The dataset comprises just under 400 participants, each with around 2,000 to 5,000 sessions (about 1.4 million sessions for the whole dataset).
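To make the goal concrete, here is a minimal base R sketch of what "app categories used in succession" could look like per session. The column names `session`, `app_category` and `begin` are assumptions matching the code further down, and the toy values are invented for illustration, not taken from the real dataset.

```r
# Toy spell data mirroring the table above (illustrative values only).
spells <- data.frame(
  session      = c("0001_1", "0001_1", "0002_1", "0002_1", "0003_1", "0003_1"),
  app_category = c("Communication", "Social Media", "Games", "Communication",
                   "Communication", "Social Media"),
  begin        = c(1614868224, 1614868236, 1614868265, 1614868320,
                   1614868400, 1614868410),
  stringsAsFactors = FALSE
)

# Put the spells in chronological order within each session.
spells <- spells[order(spells$session, spells$begin), ]

# One string per session: the categories in their order of use.
patterns <- tapply(spells$app_category, spells$session, paste, collapse = " > ")

# How often each distinct pattern occurs across sessions.
pattern_counts <- sort(table(patterns), decreasing = TRUE)
print(pattern_counts)
```

This ignores how long each app was used and only keeps the order, which may or may not be what the analysis needs.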

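This is my code so far:

library(TraMineR)

# Short state codes (1..k), with the app categories as labels
labels = seqstatl(sample$app_category)
states = seq_along(labels)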

session_seq = seqdef(data = sample, 
                     var = c("session", "begin", "end", "app_category"), 
                     informat = "SPELL",
                     states = states,
                     labels = labels,
                     process = FALSE)

print(session_seq[1:15, ], format = "SPS")


# Using the transition rates between states observed in the sequence data
cost = seqsubm(session_seq, method = "TRATE", with.missing = TRUE)

# compute the distances using the matrix and the default indel cost of 1
session_seq_OM = seqdist(session_seq, method = "OM", sm = cost, with.missing = TRUE)
# --> Function crashed due to lack of RAM
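When sequences are expanded to one position per time unit, they can become very long, and OM has to align every pair of them. A minimal sketch of one possible way to shorten them, assuming the order of categories matters more than exact durations: collapse each sequence to its distinct successive states with `seqdss()` and ask `seqdist()` for a compact `dist` object via `full.matrix = FALSE`. The toy data and the state codes `C`, `S`, `G` are invented for illustration; `seqcost()` is used here in place of `seqsubm()`.

```r
library(TraMineR)

# Toy STS data: one row per session, one column per time unit.
toy <- data.frame(
  t1 = c("C", "C", "G"),
  t2 = c("C", "S", "G"),
  t3 = c("S", "S", "C"),
  t4 = c("S", "S", "C")
)
toy_seq <- seqdef(toy)

# Keep only the distinct successive states (C-C-S-S becomes C-S).
toy_dss <- seqdss(toy_seq)

# Substitution costs from the observed transition rates.
costs <- seqcost(toy_dss, method = "TRATE")

# OM distances, returned as a 'dist' object instead of a full square matrix.
d <- seqdist(toy_dss, method = "OM", indel = 1, sm = costs$sm,
             full.matrix = FALSE)
print(d)
```

Whether dropping the duration information is acceptable depends on the research question, so this is only one option to consider.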

I have already made first attempts with subsamples, and my question concerns the computing resources required. Even a subsample needs a relatively large amount of computing power. Is it possible to make the calculation more resource-efficient? Is it an option to split the dataset and later merge the sequence distances (batching), or will this distort my results?
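To illustrate what merging batched distances would involve, here is a minimal base R sketch. It uses a simple Hamming distance instead of OM so that it runs without TraMineR; the helpers `hamming()` and `cross_dist()` and the toy sequences are hypothetical. It shows that a merged matrix equals the full one only if the cross-batch blocks are computed as well.

```r
# Hypothetical helper: Hamming distance between two equal-length state vectors.
hamming <- function(a, b) sum(a != b)

# Pairwise distances between the rows of x and the rows of y.
cross_dist <- function(x, y, f = hamming) {
  outer(seq_len(nrow(x)), seq_len(nrow(y)),
        Vectorize(function(i, j) f(x[i, ], y[j, ])))
}

set.seed(1)
seqs <- matrix(sample(c("C", "S", "G"), 6 * 5, replace = TRUE), nrow = 6)

# Reference: all pairs computed in one go.
full <- cross_dist(seqs, seqs)

# Two batches of three sequences each.
batches <- split(seq_len(nrow(seqs)), rep(1:2, each = 3))

# Batching with every block, including the cross-batch ones.
merged <- matrix(NA_real_, nrow(seqs), nrow(seqs))
for (rows in batches) {
  for (cols in batches) {
    merged[rows, cols] <- cross_dist(seqs[rows, , drop = FALSE],
                                     seqs[cols, , drop = FALSE])
  }
}

# Batching within batches only leaves the cross-batch cells unknown.
within_only <- matrix(NA_real_, nrow(seqs), nrow(seqs))
for (rows in batches) {
  within_only[rows, rows] <- cross_dist(seqs[rows, , drop = FALSE],
                                        seqs[rows, , drop = FALSE])
}

print(all(merged == full))
print(sum(is.na(within_only)))
```

In this sketch the block-wise result is identical to the full one, but the total number of pairwise comparisons is unchanged, so batching spreads the work rather than reducing it. Data-dependent costs such as `"TRATE"` would presumably have to be estimated once on the full data, since costs estimated per batch could differ between batches.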

I have already created the sequence object in STS format (530 objects and 1222844 variables) for a subset of the dataset (the mobile sessions of one participant, n ≈ 4,000; the structure is as described above) and then tried to calculate the sequence distances with "OM". However, the calculation was aborted for lack of RAM, even on a machine with 1 TB of RAM.
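A rough back-of-the-envelope calculation in base R, using only the figures mentioned above and assuming 8 bytes per stored distance. The last line assumes the textbook OM dynamic program, whose work grows with the product of the two sequence lengths; TraMineR's actual implementation may handle memory differently.

```r
bytes_per_double <- 8

# All sessions (~1.4 million): full square matrix vs. lower triangle only.
n_all <- 1.4e6
full_matrix_tb <- n_all^2 * bytes_per_double / 1e12
dist_object_tb <- n_all * (n_all - 1) / 2 * bytes_per_double / 1e12

# One participant (~4000 sessions): full square matrix.
n_one <- 4000
one_matrix_mb <- n_one^2 * bytes_per_double / 1e6

# Alignment cells for a single pair of sequences with 1222844 positions each.
seq_length <- 1222844
cells_per_pair <- seq_length^2

print(c(full_matrix_tb = full_matrix_tb,
        dist_object_tb = dist_object_tb,
        one_matrix_mb  = one_matrix_mb,
        cells_per_pair = cells_per_pair))
```

Under these assumptions, the distance matrix for a single participant is small (about 128 MB), while the matrix for all sessions would need several terabytes. For the single-participant case, the sequence length therefore looks like the more likely bottleneck than the number of sequences.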

I am also happy to receive further reading suggestions. The TraMineR User Guide has already helped me a lot.

Tags: r, traminer, sequence-analysis
