mirror of
https://github.com/ArthurIdema/Zoekeend-Phrase-Indexing.git
synced 2026-02-09 01:22:23 +00:00
Added descriptions to files
parent 872a13a394
commit d1d7eb517b
7
.gitignore
vendored
@@ -11,7 +11,12 @@ plot*
 *lock*
 *.db
 *.ciff
-*.csv
+combined*.csv
+comparison*.csv
+no*.csv
+output*.csv
+p*.csv
+results*.csv
 *.sync*
 *.log
 /trec_eval/
@@ -15,4 +15,9 @@ Run `python3 phrase_index.py` with any of the parameters listed below:
 
 - `./batch_phrase.sh` can be used to create the results using multiple different variables in one go.
 
-- And display_results.sh can be used to display the evaluation metrics of all previous results. (So MAP, CiP, dictionary size, terms size, number of phrases, AVGDL and SUMDF)
+- And `display_results.sh` can be used to display the evaluation metrics of all previous results. (So MAP, CiP, dictionary size, terms size, number of phrases, AVGDL and SUMDF)
+
+### Statistical Analysis and Comparison
+- **[compare_phrases_vs_duckdb.py](compare_phrases_vs_duckdb.py)** - Performs two-tailed pairwise sign test comparing MAP (Mean Average Precision) results between phrase-based and baseline approaches. Uses min_pmi=24 as baseline. Requires scipy for statistical testing.
+
+- **[compare_postings_cost_vs_duckdb.py](compare_postings_cost_vs_duckdb.py)** - Similar to above but compares Cost in Postings (CiP) metric instead of MAP. Evaluates computational efficiency of different indexing approaches.
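The two-tailed pairwise sign test named in the README text above can be sketched roughly as follows. This is a minimal illustration, not the code from `compare_phrases_vs_duckdb.py`: the function name `sign_test` and the per-query scores are hypothetical, and the exact binomial p-value is computed with the standard library as a stand-in for `scipy.stats.binomtest`, which the scripts in this commit import.

```python
from math import comb


def sign_test(baseline, candidate):
    """Two-tailed paired sign test (hypothetical helper, ties are dropped).

    Returns (wins, losses, p_value), where a win means the candidate
    scored strictly higher than the baseline on the same query.
    """
    wins = sum(c > b for b, c in zip(baseline, candidate))
    losses = sum(c < b for b, c in zip(baseline, candidate))
    n = wins + losses
    if n == 0:
        # Every pair is tied, so there is no evidence of a difference.
        return wins, losses, 1.0
    k = min(wins, losses)
    # Under H0 wins ~ Binomial(n, 0.5); double the smaller tail, cap at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


# Made-up per-query average precision values, for illustration only.
baseline_ap = [0.30, 0.42, 0.10, 0.55, 0.21, 0.40, 0.33, 0.18]
phrase_ap = [0.35, 0.47, 0.12, 0.60, 0.25, 0.40, 0.36, 0.16]

wins, losses, p_value = sign_test(baseline_ap, phrase_ap)
print(f"wins={wins} losses={losses} p={p_value:.4f}")
```

The sign test only looks at the direction of each per-query difference, not its size, so it makes no assumption about how the scores are distributed. The same helper would apply to Cost in Postings, with the caveat that lower cost is the better outcome there.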
@@ -1,4 +1,5 @@
 #!/bin/bash
+# This script can be used to run search and evaluation over existing databases in a results directory
 # Usage: ./batch_search_eval.sh <results_dir> <queries_dir> <qrels_file>
 
 if [ "$#" -ne 3 ]; then
@@ -1,5 +1,6 @@
 import pandas as pd
 from pathlib import Path
+# This script is a two tailed pairwise sign test comparing MAP results against a baseline with min_pmi=24
 
 try:
     from scipy.stats import binomtest
@@ -1,5 +1,6 @@
 import pandas as pd
 from pathlib import Path
+# This script is a two tailed pairwise sign test comparing Cost in Postings against a baseline with min_pmi=24
 
 try:
     from scipy.stats import binomtest
@@ -1,6 +1,6 @@
 #!/bin/bash
 set -e
-
+# This script can be used to run an automated CIFF indexing, searching and evaluation process (with bigrams)
 # Settings
 DB="database.db"
 OUT="results.txt"
@@ -1,4 +1,5 @@
 #!/bin/bash
+# This script can be used to run an automated zoekeend indexing, searching and evaluation process (no bigrams)
 set -e
 
 # Settings
@@ -1,4 +1,6 @@
 #!/bin/bash
+# This script can be used to run a batch of phrase indexing experiments with varying parameters
+# Like the minimum frequency and minimum PMI thresholds, to use stopwords or not etc.
 set -e
 
 DB_BASE="database"
@@ -1,5 +1,5 @@
 #!/bin/bash
-
+# This script can be used to display results (CiP, MAP, etc.) from a batch of experiments, given a folder
 set -euo pipefail
 
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"