Releases · jboynyc/textnets

The Corpus class now handles missing data (#13).
Support for more corpus languages. If no statistical language model is available, Corpus tries to use a basic ("blank") model.
Improved documentation around dependencies and language support.
Added tests.

Textnet.plot and ProjectedTextnet.plot now accept arguments to selectively suppress node or edge labels. node_label_filter and edge_label_filter each optionally take a function that is applied to every node or edge, respectively. Only nodes or edges for which the function returns True are labeled in the plot. For example, node_label_filter=lambda n: n.degree() > 2 ensures that only nodes with a degree greater than 2 are labeled.
Noun phrases created by Corpus.noun_phrases() can now be normalized (lemmatized) by passing normalize=True (defaults to False).
Corpus now has a useful string representation.
Documentation updates, particularly to show the label filter functionality.
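
The label filter behavior in this release can be illustrated with a minimal, library-free sketch. It is not the textnets implementation: the Node class and the visible_labels helper are hypothetical stand-ins, used only to show how a predicate such as lambda n: n.degree() > 2 decides which labels survive.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a graph vertex (not a textnets or igraph class)."""
    name: str
    neighbors: int

    def degree(self) -> int:
        return self.neighbors


def visible_labels(nodes, label_filter=None):
    """Return one label per node, or None where the filter rejects the node."""
    if label_filter is None:
        # No filter given: every node keeps its label.
        return [n.name for n in nodes]
    return [n.name if label_filter(n) else None for n in nodes]


nodes = [Node("moon", 5), Node("apollo", 3), Node("dust", 1)]
labels = visible_labels(nodes, label_filter=lambda n: n.degree() > 2)
print(labels)
```

In this sketch, every node stays in the list and only its label is blanked out, which matches the stated purpose of suppressing labels rather than removing nodes from the plot.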

Python 3.7 compatibility is here, closing issue #8.
New circular_layout option for Textnet.plot. This is based on "Tidier Drawings" and looks very nice for some bipartite graphs.
String representation of Textnet instances now gives helpful information.
Updated documentation to note changed Python version requirement.

ProjectedTextnet.plot now takes an argument, alpha, that allows for pruning the graph in order to visualize the "backbone." This is useful when working with hairball graphs, which are common when creating textnets. Right now, it uses Serrano et al.'s disparity filter. That means that edges with an alpha value greater than the one specified are discarded, so lower values mean more extreme pruning.
Language models can now be specified using a short ISO language code.
Bipartite networks can now be plotted using a layered layout (Sugiyama). Simply pass sugiyama_layout=True to Textnet.plot.
Incremental improvements to documentation.
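
To make the alpha argument more concrete, here is a rough pure-Python sketch of a disparity filter for an undirected weighted graph. It is an illustration only, not the code textnets uses: the function names disparity_alphas and backbone are hypothetical, the graph is assumed to have no self-loops or duplicate edges, and an edge's alpha is assumed to be the smaller of the two values computed at its endpoints. For a node with degree k and total edge weight s, an edge of weight w gets alpha = (1 - w/s)^(k-1).

```python
from collections import defaultdict


def disparity_alphas(edges):
    """Map each undirected edge (u, v) to its disparity-filter alpha value."""
    strength = defaultdict(float)  # sum of edge weights at each node
    degree = defaultdict(int)      # number of edges at each node
    for u, v, w in edges:
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    def alpha_at(node, w):
        k = degree[node]
        if k < 2:
            # A degree-1 node cannot judge significance; defer to the other end.
            return 1.0
        return (1.0 - w / strength[node]) ** (k - 1)

    # Assumption: an edge counts as significant if either endpoint says so.
    return {(u, v): min(alpha_at(u, w), alpha_at(v, w)) for u, v, w in edges}


def backbone(edges, alpha):
    """Keep edges whose alpha is at most the threshold; discard the rest."""
    alphas = disparity_alphas(edges)
    return [(u, v, w) for u, v, w in edges if alphas[(u, v)] <= alpha]


edges = [("a", "b", 10.0), ("a", "c", 1.0), ("a", "d", 1.0)]
alphas = disparity_alphas(edges)
print(backbone(edges, alpha=0.05))
```

In this toy star graph the heavy edge between a and b has a small alpha and survives a strict threshold, while the two light edges are pruned. Raising the threshold toward 1 keeps more edges.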

Documented TextnetBase methods to output lists of nodes ranked by various centrality measures: top_betweenness and several more.
Added top_cluster_nodes to output list of top nodes per cluster found via community detection. This is useful when trying to interpret such clusters as themes/topics (in the projected word-to-word graph) or as groupings (in the document-to-document graph).
Small additions to documentation.

Lots of changes, some of them breaking, but overall just providing nicer abstractions over the underlying pandas and igraph stuff.

Introduced TextnetBase and ProjectedTextnet classes, and made Textnet a descendant of the former.
Improved code modularity to make it easier to add features.
Corpus is now based on a Series rather than a DataFrame.
Added methods for creating an instance of Corpus: from_df, from_csv, from_sql.
Expanded and improved documentation.
Added bibliography to documentation using a Sphinx bibtex plugin.
A first contributor!

See what textnets can now do in this demo notebook.

The package also has more documentation, more unit tests, and should be easier to install.
