GitHub - DylanJoo/relevation_fintext: Information Retrieval Relevance Judging System

1. Rebuild the schema and database

rm db/relevation.db
python manage.py migrate --run-syncdb

2. Prepare collections (for retrieved documents)

For efficiency, this system would read the file of document once a time. Thus, each file has only one line (a paragraph) in a file; the filename refers to document identifier (docID).

The collection (references) we use is from 10K corpus (companies in SP 500 and available in 2018-2023.)
Please download from here or cfda server:/home/ythsiao/output. Then, you can use split_corpus.py to pool the corpus, to meet the reference/document format.
The documents will output in the documents folder.

mkdir -p documents/
for file in raw_data/*/10-K/*jsonl;do
    python3 preprocess/split_corpus.py \
        --input_jsonl $file \   # the jsonl with entire document.
        --output_dir documents/ # all documents would output here.
done

3. Prepare queries (for empirical evaluation)

The evaluation queries have already been stored in raw_query: the 10K of 5 companies in 2020 and 2022.

AAPL/10-K/20201030_10-K_320193.jsonl
AAPL/10-K/20221028_10-K_320193.jsonl
HLT/10-K/20200211_10-K_1585689.jsonl
HLT/10-K/20220216_10-K_1585689.jsonl
....

All the queries are from MD&A paragraphs, the other MD&A can be downloaded from here or cfda server: /home/ythsiao/mda.

4. Start server

python manage.py runserver

5. Upload queries (for task1) and results (for task2)

See the example query and qresults in this repo: query.jsonl and qresult.txt.

Annotation results

You are good to go now. Once the judgements are done, you can find the download button in the first annotation page (Queries @ the navigation bar). They include the judged queries and judged reference-to-target relevance.

Example query:

{
    "id": "20200220_10-K_1090727_part2_item7_para1",
    "text": "Management's discussion and analysis of financial condition and results of operations Overview Highlights of our annual results follow: Yea....",
    "highlight": "",
    "category": {"0": 0, "1": 0, "2": 1, "3": 0, "4": 0},
    "topic": {"1": 1, "2": 0, "3": 0, "4": 1, "5": 0, "6": 0}
}

Example reference judgements:

20181105_10-K_320193_part2_item7_para1 Q0 20181105_10-K_320193_part1_item2_para1 1
20181105_10-K_320193_part2_item7_para1 Q0 20181105_10-K_320193_part1_item2_para2 0
20181105_10-K_320193_part2_item7_para1 Q0 20181105_10-K_320193_part1_item2_para3 2

Name		Name	Last commit message	Last commit date
Latest commit History 103 Commits
bootstrap_toolkit		bootstrap_toolkit
db		db
example		example
judgementapp		judgementapp
preprocess		preprocess
raw_query		raw_query
relevation		relevation
.DS_Store		.DS_Store
.gitignore		.gitignore
License.txt		License.txt
README.md		README.md
README_orig.md		README_orig.md
export_judgements.sh		export_judgements.sh
export_query.sh		export_query.sh
manage.py		manage.py
requirements.txt		requirements.txt
split_corpus.sh		split_corpus.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

1. Rebuild the schema and database

2. Prepare collections (for retrieved documents)

3. Prepare queries (for empirical evaluation)

4. Start server

5. Upload queries (for task1) and results (for task2)

Annotation results

About

Releases

Packages

Languages

License

DylanJoo/relevation_fintext

Folders and files

Latest commit

History

Repository files navigation

1. Rebuild the schema and database

2. Prepare collections (for retrieved documents)

3. Prepare queries (for empirical evaluation)

4. Start server

5. Upload queries (for task1) and results (for task2)

Annotation results

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages