mirror of https://github.com/ArthurIdema/Zoekeend-Phrase-Indexing.git (synced 2026-02-09 01:22:23 +00:00)
Added descriptions to files
parent 872a13a394
commit d1d7eb517b
.gitignore: 7 changes (vendored)
@ -11,7 +11,12 @@ plot*
*lock*
*.db
*.ciff
*.csv
combined*.csv
comparison*.csv
no*.csv
output*.csv
p*.csv
results*.csv
*.sync*
*.log
/trec_eval/
@ -15,4 +15,9 @@ Run `python3 phrase_index.py` with any of the parameters listed below:
- `./batch_phrase.sh` can be used to create the results using multiple different variables in one go.

- And display_results.sh can be used to display the evaluation metrics of all previous results. (So MAP, CiP, dictionary size, terms size, number of phrases, AVGDL and SUMDF)
- And `display_results.sh` can be used to display the evaluation metrics of all previous results. (So MAP, CiP, dictionary size, terms size, number of phrases, AVGDL and SUMDF)

### Statistical Analysis and Comparison
- **[compare_phrases_vs_duckdb.py](compare_phrases_vs_duckdb.py)** - Performs two-tailed pairwise sign test comparing MAP (Mean Average Precision) results between phrase-based and baseline approaches. Uses min_pmi=24 as baseline. Requires scipy for statistical testing.

- **[compare_postings_cost_vs_duckdb.py](compare_postings_cost_vs_duckdb.py)** - Similar to above but compares Cost in Postings (CiP) metric instead of MAP. Evaluates computational efficiency of different indexing approaches.
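The two-tailed pairwise sign test described above can be sketched as follows. This is a minimal standard-library illustration, not the code from `compare_phrases_vs_duckdb.py`: the repository's scripts import `scipy.stats.binomtest`, while this sketch computes the exact binomial tail directly. The function name `sign_test` and the per-query values are hypothetical, and dropping ties is an assumption about how equal scores are handled.

```python
from math import comb


def sign_test(baseline, candidate):
    """Two-tailed paired sign test (hypothetical helper).

    Counts the queries where the candidate beats or loses to the baseline,
    drops ties (an assumption), and returns (wins, losses, p_value) under
    the null hypothesis that wins and losses are equally likely.
    """
    wins = sum(c > b for b, c in zip(baseline, candidate))
    losses = sum(c < b for b, c in zip(baseline, candidate))
    n = wins + losses
    if n == 0:
        # No informative pairs: nothing speaks against the null hypothesis.
        return wins, losses, 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail for a two-tailed test, capped at 1.
    return wins, losses, min(1.0, 2 * tail)


# Made-up per-query scores, for illustration only (not results from the repo).
baseline_ap = [0.20, 0.35, 0.10, 0.50, 0.42, 0.30, 0.25, 0.60]
phrase_ap = [0.25, 0.40, 0.15, 0.55, 0.40, 0.30, 0.31, 0.66]

wins, losses, p_value = sign_test(baseline_ap, phrase_ap)
print(wins, losses, p_value)
```

The same function would apply to Cost in Postings by passing per-query CiP values instead, keeping in mind that for a cost metric a lower value is the better outcome.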
@ -1,4 +1,5 @@
#!/bin/bash
# This script can be used to run search and evaluation over existing databases in a results directory
# Usage: ./batch_search_eval.sh <results_dir> <queries_dir> <qrels_file>

if [ "$#" -ne 3 ]; then
@ -1,5 +1,6 @@
import pandas as pd
from pathlib import Path
# This script is a two tailed pairwise sign test comparing MAP results against a baseline with min_pmi=24

try:
    from scipy.stats import binomtest
@ -1,5 +1,6 @@
import pandas as pd
from pathlib import Path
# This script is a two tailed pairwise sign test comparing Cost in Postings against a baseline with min_pmi=24

try:
    from scipy.stats import binomtest
@ -1,6 +1,6 @@
#!/bin/bash
set -e

# This script can be used to run an automated CIFF indexing, searching and evaluation process (with bigrams)
# Settings
DB="database.db"
OUT="results.txt"
@ -1,4 +1,5 @@
#!/bin/bash
# This script can be used to run an automated zoekeend indexing, searching and evaluation process (no bigrams)
set -e

# Settings
@ -1,4 +1,6 @@
#!/bin/bash
# This script can be used to run a batch of phrase indexing experiments with varying parameters
# Like the minimum frequency and minimum PMI thresholds, to use stopwords or not etc.
set -e

DB_BASE="database"
@ -1,5 +1,5 @@
#!/bin/bash

# This script can be used to display results (CiP, MAP, etc.) from a batch of experiments, given a folder
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"