I'm new to Zig and currently learning it. When I use std.mem.splitAny, splitSequence, or tokenize to split a console input string with a space as the delimiter, it returns garbage. So I tried to write my own Unicode split function. Here it is:

    const std = @import("std");
    const testing = std.testing;

    fn split_utf8(allocator: std.mem.Allocator, in: []const u8, delimiter: []const u8) !std.ArrayList([]const u8) {
        var list = std.ArrayList([]const u8).init(allocator);
        // Start with one empty element that the loop grows.
        try list.append("");
        var utf8 = (try std.unicode.Utf8View.init(in)).iterator();
        while (utf8.nextCodepointSlice()) |codepoint| {
            // On a delimiter, start a new empty element.
            if (std.mem.eql(u8, codepoint, delimiter)) {
                try list.append("");
            }
            // Replace the last element with last ++ codepoint.
            const last: []const u8 = list.pop();
            defer allocator.free(last);
            const new: []const u8 = std.mem.concat(allocator, u8, &[_][]const u8{ last, codepoint }) catch "Something went really wrong";
            try list.append(new);
        }
        return list;
    }

    test "Split test" {
        var expected = std.ArrayList([]const u8).init(std.testing.allocator);
        defer expected.deinit();
        try expected.append("F");
        try expected.append("1");
        try expected.append("100");
        try expected.append("????click");
        const actual = try split_utf8(std.testing.allocator, "F 1 100 ????click", " ");
        defer actual.deinit();
        try testing.expectEqual(expected.items.len, actual.items.len);
        // further testing
    }

But I always get an error saying that the memory allocated by std.mem.concat is leaked.
I’ve tried the following things:
- adding defer allocator.free(new) after the concat (crashes)
- in the test, after the assertion, looping over actual.items and freeing each element (crashes)
I'm pretty lost. I'd like to know:
- Is there a better way to split Unicode strings?
- If there is no better way, how can I fix the split_utf8 function? Building a list of growing items is likely to come up again in the future.
Thanks!
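
On the first question, here is a minimal sketch that may help. It assumes that the Zig version providing splitAny and splitSequence also provides std.mem.splitScalar. The space delimiter is a single ASCII byte, and ASCII bytes never occur inside a multi-byte UTF-8 sequence, so a byte-wise split should leave every code point intact. The input string and the \u{1F5B1} code point are stand-ins chosen for illustration, not the original console input.

```zig
const std = @import("std");

test "split UTF-8 input on an ASCII space" {
    // Hypothetical input: the last token starts with a multi-byte code point.
    const input = "F 1 100 \u{1F5B1}click";

    // splitScalar compares single bytes; the iterator yields slices of `input`.
    var parts = std.mem.splitScalar(u8, input, ' ');

    try std.testing.expectEqualStrings("F", parts.next().?);
    try std.testing.expectEqualStrings("1", parts.next().?);
    try std.testing.expectEqualStrings("100", parts.next().?);
    try std.testing.expectEqualStrings("\u{1F5B1}click", parts.next().?);

    // The iterator returns null once the input is exhausted.
    try std.testing.expect(parts.next() == null);
}
```

The slices returned by next() point into input rather than owning memory, so nothing has to be freed, but they are only valid while the input buffer is alive and unmodified. If the console line is read into a buffer that is reused or goes out of scope before the parts are used, that could be one source of garbage output. This is a guess, not a diagnosis.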