8000 GitHub - zengjichuan/Topic_Disc: Source code and dataset of "Joint Modeling of Topics and Discourse in Microblog Conversations"
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Source code and dataset of "Joint Modeling of Topics and Discourse in Microblog Conversations"

License

Notifications You must be signed in to change notification settings

zengjichuan/Topic_Disc

Repository files navigation

TopicDics

TopicDisc is an unsupervised framework for jointly modeling topic content and discourse behavior in microblog conversations.

More details can be referred to:

Jichuan Zeng, Jing Li, Yulan He, Cuiyun Gao, Michael Lyu, Irwin King. What You Say and How You Say it: Joint Modeling of Topics andDiscourse in Microblog Conversations. TACL 2019.

Input Data

TREC-11 and TWT-16 Dataset are in data/twitter-conv/. We remove the text of each tweet according to Twitter's policy. One needs to download the tweets with the given tweet ID by Twitter API. All the Tweets text should be put in text_lst field of twitter_m.json accordingly.

For example:

"meta_lst": [
        {
            "id": "816147858076241920"
        },
        {
            "id": "816175837737123841"
        },
        {
            "id": "816190727323459584"
        },
        {
            "id": "816192396945924096"
        }
            ],
"text_lst": [
        "<hash> When you realize TransCult thinks Radical Feminism is more \"degenerate\" than the KKK &amp; NeoNazis.\u2026 <url>", 
        "<men> Kind of interesting trans cult is using KKK/Nazi rhetoric now -- worried that their bullying is only getting them so far? \ud83d\ude02", 
        "<men> <men> LOL. That's so funny.  This must really tap into some primitive fear. Witches zooming off to cavort with satan\ud83d\ude02", 
        "<men> <men> &lt;cackles&gt; \ud83d\ude02"
]

Usage

After download all the tweets, save the data file as 'twitter.json'.

You can run the main code as:

$ python topic_disc.py

You can visualize the topic and discourse distribution among words by enabling --output_vis True.

$ python topic_disc.py --output_vis True
$ cd vis/
$ python vis_atten.py   # you should put the gen_sample.txt to this folder.

The visualization looks like this:

About

Source code and dataset of "Joint Modeling of Topics and Discourse in Microblog Conversations"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
0