First question: is there a document that describes the rules how MariaDB builds the fulltext index words (for InnoDB)?
I set innodb_ft_min_token_size=1
to include also one letter words in the index. But this does not seem to work for words like “D’Artagnan”: the “d” letter is not found in information_schema.INNODB_FT_INDEX_TABLE
. It works for “D Artagnan”.
So I assume that single letters that are followed by a quote are ignored when building the index?
You could also post the link to the source code file, but I don’t know whether my C knowledge is sufficient 😉
Second question: the doc (https://mariadb.com/kb/en/information-schema-innodb_ft_index_table-table/) states that information_schema.INNODB_FT_INDEX_TABLE.doc_id
matches to the id column of the original table: “DOC_ID: Document ID of the newly added row, either an appropriate ID column or an internal InnoDB value.” But this did not happen for me – the doc_id values do not match my ID column. I tried the datatypes “int” and “bigint”.
Here is a script that demonstrates my question:
create table FulltextTest
(
textid int not null,
text varchar(254) not null
) ENGINE=INNODB CHARACTER SET utf8;
alter table FulltextTest add primary key FulltextTest_PK (textid);
insert into FulltextTest (textid, text) values (100, 'D''Artagnan, Lucas');
insert into FulltextTest (textid, text) values (200, 'De''Artagnan, Lucas');
insert into FulltextTest (textid, text) values (201, 'D Artagnan, Lucas');
insert into FulltextTest (textid, text) values (202, 'L''Artagnan, Lucas');
insert into FulltextTest (textid, text) values (203, 'M''Artagnan, Lucas');
alter table FulltextTest add FULLTEXT index FulltextTest_IXFULL1(text);
SET GLOBAL innodb_ft_aux_table='<insert_database_name>/fulltexttest';
select * from information_schema.INNODB_FT_INDEX_TABLE order by doc_id, position;
SET GLOBAL innodb_ft_aux_table=null;
drop table FulltextTest;
The select on information_schema.INNODB_FT_INDEX_TABLE
returns this:
WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
---|---|---|---|---|---|
artagnan | 2 | 6 | 5 | 2 | 2 |
lucas | 2 | 6 | 5 | 2 | 12 |
de | 3 | 3 | 1 | 3 | 0 |
artagnan | 2 | 6 | 5 | 3 | 3 |
lucas | 2 | 6 | 5 | 3 | 13 |
d | 4 | 4 | 1 | 4 | 0 |
artagnan | 2 | 6 | 5 | 4 | 2 |
lucas | 2 | 6 | 5 | 4 | 12 |
artagnan | 2 | 6 | 5 | 5 | 2 |
lucas | 2 | 6 | 5 | 5 | 12 |
artagnan | 2 | 6 | 5 | 6 | 2 |
lucas | 2 | 6 | 5 | 6 | 12 |
The first two lines correspond to the first row of my sample data and so on.
0
To answer part one of my question myself: this was changed by https://jira.mariadb.org/browse/MDEV-20797
So, I have to eliminate single letters before an apostrophe while splitting the search string into word fragments.
Before, my search was like this:
select * From FulltextTest where
MATCH(Text) against ('+d*, +artagnan*' IN BOOLEAN MODE) > 0;
It should be:
select * From FulltextTest where
MATCH(Text) against ('+artagnan*' IN BOOLEAN MODE) > 0;
To find the “D'” part, I have to combine it with a “like” query (as suggested here: MariaDB fulltext search with special chars and “word starts with”):
select * From FulltextTest where
MATCH(Text) against ('+artagnan*' IN BOOLEAN MODE) > 0 and
Text like '%D''Artagnan%';