2025-09-04 17:31:31 +02:00
2025-08-28 15:57:36 +02:00
2025-09-04 17:09:05 +02:00
2025-08-24 22:00:02 +02:00
2025-08-24 22:01:44 +02:00
2025-08-19 17:23:02 +02:00
2025-09-04 17:31:31 +02:00
2025-08-19 17:35:32 +02:00
2025-08-19 17:23:02 +02:00
2025-08-19 17:23:02 +02:00
2025-08-19 17:23:02 +02:00
2025-08-19 17:23:02 +02:00
2025-09-04 17:09:05 +02:00
2025-08-19 17:23:02 +02:00
2025-08-19 17:23:02 +02:00
2025-08-19 17:23:02 +02:00
2025-08-19 17:23:02 +02:00
2025-08-19 17:23:02 +02:00
2025-08-19 17:23:02 +02:00
2025-08-19 17:23:02 +02:00

How to use

Run python3 phrase_index.py with any of the parameters listed below:

  -h, --help            show this help message and exit
  --db DB               Database file name
  --dataset DATASET     ir_datasets name (e.g., cranfield, msmarco-passage)
  --stopwords STOPWORDS Stopwords to use (english, none)
  --mode MODE           Indexing mode (duckdb, phrases)
  --min-freq MIN_FREQ   Minimum frequency for phrases (only for mode "phrases")
  --min-pmi MIN_PMI     Minimum PMI for phrases (only for mode "phrases")

Helper scripts

  • ./auto_phrase.sh and ./auto_zoekeend.sh can be used to automatically index, search and evaluate the results and store it in a results directory. auto_phrase uses phrase_index.py, while auto_zoekeend uses ze_index.py.

  • ./batch_phrase.sh can be used to create the results using multiple different variables in one go.

  • And display_results.sh can be used to display the evaluation metrics of all previous results. (So MAP, CiP, dictionary size, terms size, number of phrases, AVGDL and SUMDF)

Description
No description provided
Readme 337 KiB
Languages
Python 93.8%
Shell 6.2%