Parsing a large XML file with XmlPullParser or a SAX parser on Android causes lag

I have the following problem: in my Android TV app I can add (and later update) EPG sources as .xml, .gz or .xz files (.gz and .xz are decompressed to .xml). The user adds a URL for a file, which is then downloaded, parsed, and saved to the ObjectBox database.
I tried both XmlPullParser and a SAX parser and everything worked fine. For an XML file of about 50 MB and 700,000 lines (350 channels and about 80,000 programs) it took:

XmlPullParser -> 50 seconds on the emulator, 1 min 30 sec directly on my TV
SAX parser -> 55 seconds on the emulator, 1 min 50 sec directly on my TV

I would have preferred it to be a bit faster, but it was OK. Then I realized that if I update the EPG source (download the XML again, parse it, and add the new EPG data to the ObjectBox database) and navigate in my app in the meantime,

  1. it takes much longer (several minutes for both XmlPullParser and the SAX parser)

  2. the app begins to lag while I use it, and on my TV it also crashed after some time, probably for memory reasons. If I updated the EPG source without doing anything else in my app, that didn't happen.

I noticed two things when investigating with the Profiler.

  1. While parsing (especially the programs), the garbage collector is called very often, 20-40 times in 5 seconds.
  2. When the process is finished, the Java part in the memory profiler jumps up to 200 MB and it takes some time before that memory is garbage collected.

I am not sure, but I read that the garbage collector being called constantly could cause the lag in my app. So I tried to minimize object creation, but somehow it didn't change anything (or maybe I didn't do it correctly). I also tested the process without creating the EpgDataOB database objects, so no EPG data was added to the database. I could still see the many garbage collector calls in the Profiler, so my parsing code must be the problem.

The only things that helped were adding a delay of 100 ms after each parsed program (obviously not a real solution, as it stretches the processing time to hours) or reducing the batch size (which also increases the processing time: with a batch size of 500 the processing time on the emulator is 2 min 10 sec and the garbage collector is called about 6-10 times in 5 seconds; with a batch size of 100 it takes nearly 3 min on the emulator and the GC is called 4-5 times in 5 seconds).

I’ll post both my versions.

XmlPullParser

Repository code:

    var currentChannel: Channel? = null
    var epgDataBatch = mutableListOf<EpgDataOB>()
    val batchSize = 10000

    suspend fun parseXmlStream(
        inputStream: InputStream,
        epgSourceId: Long,
        maxDays: Int,
        minDays: Int,
        sourceUrl: String
    ): Resource<String> = withContext(Dispatchers.Default) {
        try {
            val thisEpgSource = epgSourceBox.get(epgSourceId)
            val factory = XmlPullParserFactory.newInstance()
            val parser = factory.newPullParser()
            parser.setInput(inputStream, null)
            var eventType = parser.eventType
          
            while (eventType != XmlPullParser.END_DOCUMENT) {
                when (eventType) {
                    XmlPullParser.START_TAG -> {
                        when (parser.name) {
                            "channel" -> {
                                parseChannel(parser, thisEpgSource)
                            }
                            "programme" -> {
                                parseProgram(parser, thisEpgSource)
                            }
                        }
                    }
                }
                eventType = parser.next()
            }
            if (epgDataBatch.isNotEmpty()) {
                epgDataBox.put(epgDataBatch)
            }

            assignEpgDataToChannels(thisEpgSource)

            _epgProcessState.value = ExternEpgProcessState.Success
            Resource.Success("OK")
        } catch (e: Exception) {
            Log.d("ERROR PARSING", "Error parsing XML: ${e.message}")
            _epgProcessState.value = ExternEpgProcessState.Error("Error parsing XML: ${e.message}")
            Resource.Error("Error parsing XML: ${e.message}")
        } finally {
            withContext(Dispatchers.IO) {
                inputStream.close()
            }
        }
    }

    private fun resetChannel() {
        currentChannel = Channel("", mutableListOf(), mutableListOf(), "")
    }

    private fun parseChannel(parser: XmlPullParser, thisEpgSource: EpgSource) {
        resetChannel()
        currentChannel?.id = parser.getAttributeValue(null, "id")

        while (parser.next() != XmlPullParser.END_TAG) {
            if (parser.eventType == XmlPullParser.START_TAG) {
                when (parser.name) {
                    "display-name" -> currentChannel?.displayName = mutableListOf(parser.nextText())
                    "icon" -> currentChannel?.icon = mutableListOf(parser.getAttributeValue(null, "src"))
                    "url" -> currentChannel?.url = parser.nextText()
                }
            }
        }

        val channelInDB = epgChannelBox.query(EpgSourceChannel_.chEpgId.equal("${thisEpgSource.id}_${currentChannel?.id}")).build().findUnique()
        if (channelInDB == null) {
            val epgChannelToAdd = EpgSourceChannel(
                0,
                "${thisEpgSource.id}_${currentChannel?.id}",
                currentChannel?.id ?: "",
                currentChannel?.icon,
                currentChannel?.displayName?.firstOrNull() ?: "",
                thisEpgSource.id,
                currentChannel?.displayName ?: mutableListOf(),
                true
            )
            epgChannelBox.put(epgChannelToAdd)
        } else {
            channelInDB.display_name = currentChannel?.displayName ?: channelInDB.display_name
            channelInDB.icon = currentChannel?.icon
            channelInDB.name = currentChannel?.displayName?.firstOrNull() ?: channelInDB.name
            epgChannelBox.put(channelInDB)
        }
    }

    private fun parseProgram(parser: XmlPullParser, thisEpgSource: EpgSource) {

        val start = SimpleDateFormat("yyyyMMddHHmmss Z", Locale.getDefault())
            .parse(parser.getAttributeValue(null, "start"))?.time ?: -1

        val stop = SimpleDateFormat("yyyyMMddHHmmss Z", Locale.getDefault())
            .parse(parser.getAttributeValue(null, "stop"))?.time ?: -1

        val channel = parser.getAttributeValue(null, "channel")

        val isAnUpdate = if (isUpdating) {
            epgDataBox.query(EpgDataOB_.idByAccountData.equal("${channel}_${start}_${thisEpgSource.id}")).build().findUnique() != null
        } else {
            false
        }

        if (!isAnUpdate) {
            val newEpgData = EpgDataOB(
                id = 0, 
                idByAccountData = "${channel}_${start}_${thisEpgSource.id}",
                epgId = channel ?: "",
                chId = channel ?: "",
                datum = SimpleDateFormat("yyyy-MM-dd", Locale.getDefault()).format(start),
                name = "",
                sub_title = "",
                descr = "",
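                // Non-null lists, as in the SAX version; with null the ?.add(...) calls below silently drop every value.
                category = mutableListOf(),
                director = mutableListOf(),
                actor = mutableListOf(),
                date = "",
                country = mutableListOf(),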
                showIcon = "",
                episode_num = "",
                rating = "",
                startTimestamp = start,
                stopTimestamp = stop,
                mark_archive = null,
                accountData = thisEpgSource.url,
                epgSourceId = thisEpgSource.id.toInt(),
                epChId = "${thisEpgSource.id}_${channel}"
            )
     
            while (parser.next() != XmlPullParser.END_TAG) {
                if (parser.eventType == XmlPullParser.START_TAG) {
                    when (parser.name) {
                        "title" -> newEpgData.name = parser.nextText()
                        "sub-title" -> newEpgData.sub_title = parser.nextText()
                        "desc" -> newEpgData.descr = parser.nextText()
                        "director" -> newEpgData.director?.add(parser.nextText())
                        "actor" -> newEpgData.actor?.add(parser.nextText())
                        "date" -> newEpgData.date = parser.nextText()
                        "category" -> newEpgData.category?.add(parser.nextText())
                        "country" -> newEpgData.country?.add(parser.nextText())
                        "episode-num" -> newEpgData.episode_num = parser.nextText()
                        "value" -> newEpgData.rating = parser.nextText()
                        "icon" -> newEpgData.showIcon = parser.getAttributeValue(null, "src") ?: ""
                    }
                }
            }

            epgDataBatch.add(newEpgData)
            if (epgDataBatch.size >= batchSize) {
                epgDataBox.put(epgDataBatch)
                epgDataBatch.clear()
            }
        }
    }

    private fun assignEpgDataToChannels(thisEpgSource: EpgSource) {
        epgChannelBox.query(EpgSourceChannel_.epgSourceId.equal(thisEpgSource.id)).build().find().forEach { epgChannel ->
            epgChannel.epgSource.target = thisEpgSource
            epgChannel.epgDataList.addAll(epgDataBox.query(EpgDataOB_.epChId.equal(epgChannel.chEpgId)).build().find())
            epgChannelBox.put(epgChannel)
        }
        epgDataBatch.clear()
    }
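
One allocation hot spot that is easy to remove: parseProgram above creates up to three SimpleDateFormat objects per programme, so roughly 240,000 for 80,000 programmes. Below is a minimal sketch of building the formatters once and reusing them. XmltvDates, parseMillis and dayOf are hypothetical names, Locale.US and the time-zone parameter are assumptions, and whether this alone noticeably reduces the GC activity has not been measured here.

```kotlin
import java.text.ParseException
import java.text.SimpleDateFormat
import java.util.Date
import java.util.Locale
import java.util.TimeZone

// Hypothetical helper, not part of the original code: it owns the formatters
// so they are built once per parse run instead of once per <programme>.
// SimpleDateFormat is not thread-safe, so use one instance per parsing thread.
class XmltvDates(zone: TimeZone = TimeZone.getDefault()) {
    private val stamp = SimpleDateFormat("yyyyMMddHHmmss Z", Locale.US)
    private val day = SimpleDateFormat("yyyy-MM-dd", Locale.US).apply { timeZone = zone }

    // Returns epoch milliseconds, or -1 if the attribute is missing or malformed.
    fun parseMillis(value: String?): Long {
        if (value == null) return -1L
        return try {
            stamp.parse(value)?.time ?: -1L
        } catch (e: ParseException) {
            -1L
        }
    }

    // Formats epoch milliseconds as yyyy-MM-dd in the configured time zone.
    fun dayOf(millis: Long): String = day.format(Date(millis))
}
```

In the parser this would be created once before the parsing loop, and parseProgram would call parseMillis for the start and stop attributes and dayOf(start) for datum. Unlike the original code, a malformed date yields -1 here instead of throwing.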

Sax Parser

Repository code:

suspend fun parseXmlStream(
        inputStream: InputStream,
        epgSourceId: Long,
        maxDays: Int,
        minDays: Int,
        sourceUrl: String
    ): Resource<String> = withContext(Dispatchers.Default) {
        try {
            val thisEpgSource = epgSourceBox.get(epgSourceId)
            inputStream.use { input ->
                val saxParserFactory = SAXParserFactory.newInstance()
                val saxParser = saxParserFactory.newSAXParser()
                val handler = EpgSaxHandler(thisEpgSource.id, maxDays, minDays, thisEpgSource.url, isUpdating)
                saxParser.parse(input, handler)
                if (handler.epgDataBatch.isNotEmpty()) {
                    epgDataBox.put(handler.epgDataBatch)
                    handler.epgDataBatch.clear()
                }
                _epgProcessState.value = ExternEpgProcessState.Success
                return@withContext Resource.Success("OK")
            }
        } catch (e: Exception) {
            Log.e("ERROR PARSING", "${e.message}")
            _epgProcessState.value = ExternEpgProcessState.Error("Error parsing XML: ${e.message}")
            return@withContext Resource.Error("Error parsing XML: ${e.message}")
        }
    }

Handler:

class EpgSaxHandler(
    private val epgSourceId: Long,
    private val maxDays: Int,
    private val minDays: Int,
    private val sourceUrl: String,
    private val isUpdating: Boolean
) : DefaultHandler() {

    private val epgSourceBox: Box<EpgSource>
    private val epgChannelBox: Box<EpgSourceChannel>
    private val epgDataBox: Box<EpgDataOB>


    init {
        val store = ObjectBox.store
        epgSourceBox = store.boxFor(EpgSource::class.java)
        epgChannelBox = store.boxFor(EpgSourceChannel::class.java)
        epgDataBox = store.boxFor(EpgDataOB::class.java)
    }

    var epgDataBatch = mutableListOf<EpgDataOB>()
    private val batchSize = 10000
    private var currentElement = ""
    private var currentChannel: Channel? = null
    private var currentProgram: EpgDataOB? = null
    private var stringBuilder = StringBuilder()


    override fun startElement(uri: String?, localName: String?, qName: String?, attributes: Attributes?) {
        currentElement = qName ?: ""
        when (qName) {
            "channel" -> {
                val id = attributes?.getValue("id") ?: ""
                currentChannel = Channel(id, mutableListOf(), mutableListOf(), "")
            }
            "programme" -> {

                val start = SimpleDateFormat("yyyyMMddHHmmss Z", Locale.getDefault())
                    .parse(attributes?.getValue("start") ?: "")?.time ?: -1

                val stop = SimpleDateFormat("yyyyMMddHHmmss Z", Locale.getDefault())
                    .parse(attributes?.getValue("stop") ?: "")?.time ?: -1

                val channel = attributes?.getValue("channel") ?: ""

                if (isUpdating) {
                    val existingProgram = epgDataBox.query(EpgDataOB_.idByAccountData.equal("${channel}_${start}_${epgSourceId}",)).build().findUnique()
                    if (existingProgram != null) {
                        currentProgram = null
                        return
                    }
                }
                currentProgram = EpgDataOB(
                    id = 0,
                    idByAccountData = "${channel}_${start}_${epgSourceId}",
                    epgId = channel,
                    chId = channel,
                    datum = SimpleDateFormat("yyyy-MM-dd", Locale.getDefault()).format(start),
                    name = "",
                    sub_title = "",
                    descr = "",
                    category = mutableListOf(),
                    director = mutableListOf(),
                    actor = mutableListOf(),
                    date = "",
                    country = mutableListOf(),
                    showIcon = "",
                    episode_num = "",
                    rating = "",
                    startTimestamp = start,
                    stopTimestamp = stop,
                    mark_archive = null,
                    accountData = sourceUrl,
                    epgSourceId = epgSourceId.toInt(),
                    epChId = "${epgSourceId}_$channel"
                )
            }
            "icon" -> {
                val src = attributes?.getValue("src") ?: ""
                currentChannel?.icon?.add(src)
                currentProgram?.showIcon = src
            }
            "desc", "title", "sub-title", "episode-num", "rating", "country", "director", "actor", "date", "display-name" -> {
                stringBuilder = StringBuilder()
            }
        }
    }

    override fun characters(ch: CharArray?, start: Int, length: Int) {
        ch?.let {
            stringBuilder.append(it, start, length)
        }
    }

    override fun endElement(uri: String?, localName: String?, qName: String?) {
        when (qName) {
            "channel" -> {
                currentChannel?.let { channel ->
                    val channelInDB = epgChannelBox.query(EpgSourceChannel_.chEpgId.equal("${epgSourceId}_${channel.id}")).build().findUnique()
                    if (channelInDB == null) {
                        val newChannel = EpgSourceChannel(
                            id = 0,
                            chEpgId = "${epgSourceId}_${channel.id}",
                            chId = channel.id,
                            icon = channel.icon,
                            display_name = channel.displayName,
                            name = channel.displayName.firstOrNull() ?: "",
                            epgSourceId = epgSourceId,
                            isExternalEpg = true
                        )
                        epgChannelBox.put(newChannel)
                    } else {
                        channelInDB.display_name = channel.displayName
                        channelInDB.icon = channel.icon
                        channelInDB.name = channel.displayName.firstOrNull() ?: channelInDB.name
                        epgChannelBox.put(channelInDB)
                    }
                }
                currentChannel = null
            }
            "programme" -> {
                currentProgram?.let { program ->
                    addEpgDataToBatch(program)
                }
                currentProgram = null
            }
            "desc" -> {
                currentProgram?.descr = stringBuilder.toString()
            }
            "title" -> {
                currentProgram?.name = stringBuilder.toString()
            }
            "sub-title" -> {
                currentProgram?.sub_title = stringBuilder.toString()
            }
            "episode-num" -> {
                currentProgram?.episode_num = stringBuilder.toString()
            }
            "rating" -> {
                currentProgram?.rating = stringBuilder.toString()
            }
            "country" -> {
                currentProgram?.country?.add(stringBuilder.toString())
            }
            "director" -> {
                currentProgram?.director?.add(stringBuilder.toString())
            }
            "actor" -> {
                currentProgram?.actor?.add(stringBuilder.toString())
            }
            "date" -> {
                currentProgram?.date = stringBuilder.toString()
            }
            "display-name" -> {
                currentChannel?.displayName?.add(stringBuilder.toString())
            }
        }
        currentElement = ""
    }



    private fun addEpgDataToBatch(epgData: EpgDataOB) {
        epgDataBatch.add(epgData)
        if (epgDataBatch.size >= batchSize) {
            processEpgDataBatch()
        }
    }

    private fun processEpgDataBatch() {
        if (epgDataBatch.isNotEmpty()) {
            epgDataBox.put(epgDataBatch)
            epgDataBatch.clear()
        }
    }
}
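
A further small saving in the handler above, sketched under assumptions: startElement allocates a new StringBuilder for every text element, and characters() also appends the whitespace between elements. The simplified stand-in below (TitleHandler and parseTitles are hypothetical names, and it only collects title texts) resets one buffer with setLength(0) and appends only while a wanted element is open.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler

// Simplified stand-in for EpgSaxHandler: it only collects <title> texts,
// but shows the text buffer being reset instead of reallocated.
class TitleHandler : DefaultHandler() {
    val titles = mutableListOf<String>()
    private val text = StringBuilder()
    private var capture = false

    override fun startElement(uri: String?, localName: String?, qName: String?, attributes: Attributes?) {
        if (qName == "title") {
            text.setLength(0) // keep the backing array, drop the old content
            capture = true
        }
    }

    override fun characters(ch: CharArray?, start: Int, length: Int) {
        // Skip whitespace and text that belongs to elements we do not store.
        if (capture && ch != null) text.append(ch, start, length)
    }

    override fun endElement(uri: String?, localName: String?, qName: String?) {
        if (qName == "title") {
            titles.add(text.toString())
            capture = false
        }
    }
}

// Parses an in-memory document; the app would pass its InputStream instead.
fun parseTitles(xml: String): List<String> {
    val handler = TitleHandler()
    SAXParserFactory.newInstance().newSAXParser()
        .parse(ByteArrayInputStream(xml.toByteArray(Charsets.UTF_8)), handler)
    return handler.titles
}
```

The same pattern would apply to the other text elements handled in startElement; the toString() call per stored value is still needed, so this removes only the buffer allocations.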

So I am looking for a fast way to parse the XML data and insert it into the database without lag or crashes in my app 🙂
Is there something wrong in my code that causes the lag? Or is it simply not possible without slowing down the parsing and the database inserts?

If any other code is needed, I can post it.
Here is what the memory profiler looks like while parsing the programs with XmlPullParser:

UPDATE:

Memory usage & GC -> only parsing, no database usage
I used the data classes Channel and Programme to parse the data into, and always reused the same channel/programme instance:

Memory usage & GC -> parsing and creating EpgDataOB objects (no DB inserts)

Memory usage & GC -> parsing and adding data to the database (DB = last 10 seconds)

Memory usage & GC -> parsing, adding data to the DB, and managing the relation between each EPG channel and its list of EpgData with:

    private fun addEpgDataToDatabase() {
        GlobalScope.launch {
            withContext(Dispatchers.IO) {
                epgDataBatch.chunked(15000).forEach { batch ->
                    epgDataBox.put(batch)
                    epgChannelBatch.forEach { epgChannel ->
                        epgChannel.epgDataList.addAll(batch.filter { it.epChId == epgChannel.chEpgId })
                    }
                    Log.d("EPGPARSING ADD TO DB", "OK")
                    delay(500)
                }
                epgDataBatch.clear()
            }
        }
    }

New code for putting the parsed data into the database (also tested 3 times on the TV; it runs much better than the code in my question). Adding the whole epgDataBatch (a mutableListOf) to the database with a single put is even a little faster.

    private fun addEpgDataToDatabase() {
        epgDataBatch.chunked(30000).forEach { batch ->
            epgDataBox.store.runInTx {
                epgDataBox.put(batch)
                epgDataBox.closeThreadResources()
            }
        }
        addEpgDataToChannel()
    }

    private fun addEpgDataToChannel() {
        epgChannelBox.store.runInTx {
            for (epgCh in epgChannelBatch) {
                epgCh.epgDataList.addAll(epgDataBatch.filter { it.epChId == epgCh.chEpgId })
            }
            epgChannelBox.put(epgChannelBatch)
            epgChannelBox.closeThreadResources()
        }
        epgChannelBatch.clear()
        epgDataBatch.clear()
    }
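
One thing that may be worth changing in addEpgDataToChannel: the filter call scans the whole epgDataBatch once per channel, which for 350 channels and 80,000 programmes is about 28 million comparisons plus one temporary list per channel. A minimal sketch of grouping the programmes once and looking each channel up in a map follows. Programme, ChannelEntry and assignProgrammes are hypothetical stand-ins for EpgDataOB, EpgSourceChannel and the loop above.

```kotlin
// Hypothetical minimal stand-ins for EpgDataOB and EpgSourceChannel.
data class Programme(val epChId: String, val name: String)

class ChannelEntry(val chEpgId: String) {
    val epgDataList = mutableListOf<Programme>()
}

// One pass over the programmes builds the index; each channel then does a
// single map lookup instead of filtering the whole list.
fun assignProgrammes(channels: List<ChannelEntry>, programmes: List<Programme>) {
    val byChannel: Map<String, List<Programme>> = programmes.groupBy { it.epChId }
    for (channel in channels) {
        byChannel[channel.chEpgId]?.let { channel.epgDataList.addAll(it) }
    }
}
```

groupBy keeps the programmes of each channel in their original order, so the resulting lists match what the filter version produces.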

Answer (score 8):

Database inserts can be costly if you do a lot of them, as when you insert your parsed XML data one data object after another (see the ObjectBox docs).

This is because each put runs in its own implicit transaction, which uses blocking I/O and file locks to write the database to disk.

Thus you can speed up parsing by speeding up the database inserts.

You can batch up the data in a list and put (insert) it all in one go, so that only one transaction is used; this costs more memory but is faster.
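
As a rough sketch of that batching idea, assuming a hypothetical BatchBuffer helper (not an ObjectBox class): items are collected in memory and handed to a flush callback in chunks, so the database sees one put per chunk rather than one per object. In the app the callback would be something like { epgDataBox.put(it) }.

```kotlin
// Hypothetical helper: collects items and hands them to `flush` in chunks.
class BatchBuffer<T>(private val batchSize: Int, private val flush: (List<T>) -> Unit) {
    init {
        require(batchSize > 0) { "batchSize must be positive" }
    }

    private val items = ArrayList<T>(batchSize)

    // Adds one item and flushes as soon as a full batch has accumulated.
    fun add(item: T) {
        items.add(item)
        if (items.size >= batchSize) drain()
    }

    // Call once after parsing so a final partial batch is not lost.
    fun drain() {
        if (items.isEmpty()) return
        flush(ArrayList(items)) // pass a copy, then reuse the buffer
        items.clear()
    }
}
```

The batch size is the memory versus speed knob: larger batches mean fewer transactions but more objects held in memory at once.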

Alternatively, ObjectBox has BoxStore.runInTx(), which takes a Runnable and lets you do multiple puts in a single transaction.
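
To illustrate the difference without needing the library, here is a toy in-memory model. FakeStore is a hypothetical stand-in, not ObjectBox code; it only counts commits, mimicking the behaviour described above, where a put outside runInTx commits on its own and puts inside runInTx share one commit.

```kotlin
// Hypothetical in-memory stand-in for a store, used only to count commits.
class FakeStore {
    var commits = 0
        private set
    val rows = mutableListOf<String>()
    private var inTx = false

    // Outside runInTx every put is its own (implicit) transaction.
    fun put(row: String) {
        rows.add(row)
        if (!inTx) commits++
    }

    // All puts made by `block` share a single commit.
    fun runInTx(block: Runnable) {
        inTx = true
        try {
            block.run()
        } finally {
            inTx = false
        }
        commits++ // reached only if the block did not throw
    }
}
```

With the real library the corresponding call has the shape used in the question's update, store.runInTx { box.put(batch) }, and the cost saved per avoided commit is the disk write and locking described above.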

ObjectBox seems to want you to avoid simply beginning a transaction at the start of the XML parsing and ending it when the parsing has finished. It does have an internal low-level method to do this.

Note that this also applies to other file-based databases such as SQLite.
