[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

Microsoft Azure Cognitive Services

The Microsoft Azure Cognitive Services API uses machine learning to find insights and relationships in text. The procedures in this chapter act as a wrapper around calls to this API to extract entities and key phrases and provide sentiment analysis from text stored as node properties.

Each procedure has two modes:

  • Stream - returns a map constructed from the JSON returned from the API

Procedure Overview

The procedures are described below:

Qualified Name Type Release

apoc.nlp.azure.entities.graph

Procedure

APOC Core

apoc.nlp.azure.entities.stream

Procedure

APOC Core

apoc.nlp.azure.keyPhrases.graph

Procedure

APOC Core

apoc.nlp.azure.keyPhrases.stream

Procedure

APOC Core

apoc.nlp.azure.sentiment.graph

Procedure

APOC Core

apoc.nlp.azure.sentiment.stream

Procedure

APOC Core

At the moment, Microsoft Azure Cognitive Services API supports text input in more than 10 languages. For better results, make sure that your text is one of the supported languages by Cognitive Services.

Entity Extraction

The entity extraction procedures (apoc.nlp.azure.entities.*) are wrappers around the Entities end point of the Azure Text Analytics API. This API method returns a list of known entities and general named entities ("Person", "Location", "Organization" etc) in a given document.

The procedures are described below:

signature

apoc.nlp.azure.entities.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

apoc.nlp.azure.entities.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)

The procedure supports the following config parameters:

Table 1. Config parameters
name type default description

key

String

null

Microsoft.CognitiveServicesTextAnalytics API Key

url

String

null

Microsoft.CognitiveServicesTextAnalytics Endpoint

nodeProperty

String

text

The property on the provided node that contains the unstructured text to be analyzed

In addition, apoc.nlp.azure.entities.graph supports the following config parameters:

Table 2. Config parameters
name type default description

scoreCutoff

Double

0.0

Lower limit for the score of an entity to be present in the graph. Value must be between 0 and 1.

Score is an indicator of the level of confidence that Amazon Comprehend has in the accuracy of the detection.

write

Boolean

false

persist the graph of entities

writeRelationshipType

String

ENTITY

relationship type for relationships from source node to entity nodes

writeRelationshipProperty

String

score

relationship property for relationships from source node to entity nodes

Streaming mode
CALL apoc.nlp.azure.entities.stream(source:Node or List<Node>, {
  key: String,
  url: String,
  nodeProperty: String
})
YIELD value
Graph mode
CALL apoc.nlp.azure.entities.graph(source:Node or List<Node>, {
  key: String,
  url: String,
  nodeProperty: String,
  scoreCutoff: Double,
  writeRelationshipType: String,
  writeRelationshipProperty: String,
  write: Boolean
})
YIELD graph

Key Phrases

The key phrase procedures (apoc.nlp.azure.keyPhrases.*) are wrappers around the Key Phrases end point of the Azure Text Analytics API. A key phrase is a key talking point in the input text.

The procedure is described below:

signature

apoc.nlp.azure.keyPhrases.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

apoc.nlp.azure.keyPhrases.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)

The procedure support the following config parameters:

Table 3. Config parameters
name type default description

key

String

null

Microsoft.CognitiveServicesTextAnalytics API Key

url

String

null

Microsoft.CognitiveServicesTextAnalytics Endpoint

nodeProperty

String

text

The property on the provided node that contains the unstructured text to be analyzed

In addition, apoc.nlp.azure.keyPhrases.graph supports the following config parameters:

Table 4. Config parameters
name type default description

write

Boolean

false

persist the graph of key phrases

writeRelationshipType

String

KEY_PHRASE

relationship type for relationships from source node to key phrase nodes

Streaming mode
CALL apoc.nlp.azure.keyPhrases.stream(source:Node or List<Node>, {
  key: String,
  url: String,
  nodeProperty: String
})
YIELD value
Graph mode
CALL apoc.nlp.azure.keyPhrases.graph(source:Node or List<Node>, {
  key: String,
  url: String,
  nodeProperty: String,
  writeRelationshipType: String,
  write: Boolean
})
YIELD graph

Sentiment

The sentiment procedures (apoc.nlp.azure.sentiment.*) are wrappers around the Sentiment end point of the Azure Text Analytics API. The API returns a numeric score between 0 and 1. Scores close to 1 indicate positive sentiment, while scores close to 0 indicate negative sentiment. A score of 0.5 indicates the lack of sentiment (e.g. a factoid statement).

The procedures are described below:

signature

apoc.nlp.azure.sentiment.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

apoc.nlp.azure.sentiment.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)

The procedures support the following config parameters:

Table 5. Config parameters
name type default description

key

String

null

Microsoft.CognitiveServicesTextAnalytics API Key

url

String

null

Microsoft.CognitiveServicesTextAnalytics Endpoint

nodeProperty

String

text

The property on the provided node that contains the unstructured text to be analyzed

In addition, apoc.nlp.azure.sentiment.graph supports the following config parameters:

Table 6. Config parameters
name type default description

write

Boolean

false

persist the graph of sentiment

Streaming mode
CALL apoc.nlp.azure.sentiment.stream(source:Node or List<Node>, {
  key: String,
  url: String,
  nodeProperty: String
})
YIELD value
Graph mode
CALL apoc.nlp.azure.sentiment.graph(source:Node or List<Node>, {
  key: String,
  url: String,
  nodeProperty: String,
  write: Boolean
})
YIELD graph

Install Dependencies

The NLP procedures have dependencies on Kotlin and client libraries that are not included in the APOC Library.

These dependencies are included in apoc-nlp-dependencies-4.3.0.12.jar, which can be downloaded from the releases page. Once that file is downloaded, it should be placed in the plugins directory and the Neo4j Server restarted.

Setting up API Key and URL

We can generate an API key and URL by following the instructions in the Quickstart: Use the Text Analytics client library article. Once we’ve done that, we should be able to see a page listing our credentials, similar to the screenshot below:

azure text analytics keys
Figure 1. Azure Text Analytics credentials

In this case our API URL is https://neo4j-nlp-text-analytics.cognitiveservices.azure.com/, and we can use either of the hidden keys.

Let’s populate and execute the following commands to create parameters that contains these details.

The following define the apiKey and apiSecret parameters
:param apiKey => ("<api-key-here>");
:param apiUrl => ("<api-url-here>");

Alternatively we can add these credentials to apoc.conf and retrieve them using the static value storage functions. See Static Value Storage

apoc.conf
apoc.static.azure.apiKey=<api-key-here>
apoc.static.azure.apiUrl=<api-url-here>
The following retrieves AWS credentials from apoc.conf
RETURN apoc.static.getAll("azure") AS azure;
Table 7. Results
azure

{apiKey: "<api-key-here>", apiUrl: "<api-url-here>"}

Examples

The examples in this section are based on the following sample graph:

CREATE (:Article {
  uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/",
  body: "These days I’m rarely more than a few feet away from my Nintendo Switch and I play board games, card games and role playing games with friends at least once or twice a week. I’ve even organised lunch-time Mario Kart 8 tournaments between the Neo4j European offices!"
});

CREATE (:Article {
  uri: "https://en.wikipedia.org/wiki/Nintendo_Switch",
  body: "The Nintendo Switch is a video game console developed by Nintendo, released worldwide in most regions on March 3, 2017. It is a hybrid console that can be used as a home console and portable device. The Nintendo Switch was unveiled on October 20, 2016. Nintendo offers a Joy-Con Wheel, a small steering wheel-like unit that a Joy-Con can slot into, allowing it to be used for racing games such as Mario Kart 8."
});

Entity Extraction

Let’s start by extracting the entities from one of the Article nodes. The text that we want to analyze is stored in the body property of the node, so we’ll need to specify that via the nodeProperty configuration parameter.

The following streams the entities for the Pokemon article
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.entities.stream(a, {
  key: $apiKey,
  url: $apiUrl,
  nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
RETURN entity;
Table 8. Results
entity

{name: "Nintendo Switch", wikipediaId: "Nintendo Switch", type: "Other", matches: [{length: 15, text: "Nintendo Switch", wikipediaScore: 0.8339868065025469, offset: 56}], bingId: "b3d617ef-81fc-4188-9a2b-a5cf1f8534b5", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Nintendo_Switch"}

{name: "Nintendo Switch", type: "Organization", matches: [{length: 15, entityTypeScore: 0.94, text: "Nintendo Switch", offset: 56}]}

{name: "Oberon Media", wikipediaId: "Oberon Media", type: "Organization", matches: [{length: 6, text: "I play", wikipediaScore: 0.032446316016667254, offset: 76}], bingId: "166f6e0f-33b7-8707-bb8b-5a932c498333", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Oberon_Media"}

{name: "a week", subType: "Duration", type: "DateTime", matches: [{length: 6, entityTypeScore: 0.8, text: "a week", offset: 166}]}

{name: "Mario Kart 8", wikipediaId: "Mario Kart 8", type: "Other", matches: [{length: 12, text: "Mario Kart 8", wikipediaScore: 0.7802000593632747, offset: 205}], bingId: "ce6f55ec-d3d7-032a-0bf8-15ad3e8df3f4", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Mario_Kart_8"}

{name: "Mario Kart", type: "Organization", matches: [{length: 10, entityTypeScore: 0.72, text: "Mario Kart", offset: 205}]}

{name: "8", subType: "Number", type: "Quantity", matches: [{length: 1, entityTypeScore: 0.8, text: "8", offset: 216}]}

{name: "Neo4j", wikipediaId: "Neo4j", type: "Other", matches: [{length: 5, text: "Neo4j", wikipediaScore: 0.8150388253887939, offset: 242}], bingId: "bc2f436b-8edd-6ba6-b2d3-69901348d653", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Neo4j"}

{name: "Europe", wikipediaId: "Europe", type: "Location", matches: [{length: 8, text: "European", wikipediaScore: 0.00591759926701263, offset: 248}], bingId: "501457aa-5b70-cfba-cfd8-be882b4bac1e", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Europe"}

We get back 9 different entities, although we can see that some of them are referring to the same things, albeit with different type values. We could then apply a Cypher statement that creates one node per entity and an ENTITY relationship from each of those nodes back to the Article node.

The following streams the entities for the Pokemon article and then creates nodes for each entity
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.entities.stream(a, {
  key: $apiKey,
  url: $apiUrl,
  nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
WITH a, entity.name AS entity, collect(entity.type) AS types
MERGE (e:Entity {name: entity})
SET e.type = types
MERGE (a)-[:ENTITY]->(e);

Alternatively we can use the graph mode to automatically create the entity graph. As well as having the Entity label, each entity node will have another label based on the value of the type property. By default, a virtual graph is returned.

The following returns a virtual graph of entities for the Pokemon and Nintendo Switch articles
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.azure.entities.graph(articles, {
  key: $apiKey,
  url: $apiUrl,
  nodeProperty: "body",
  writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g

We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph.

apoc.nlp.azure.entities multiple.graph
Figure 2. Pokemon and Nintendo Switch entities graph

On this visualization we can also see the score for each entity node. This score represents the level of confidence that the API has in its detection of the entity. We can specify a minimum cut off value for the score using the scoreCutoff property.

The following returns a virtual graph of entities with a score >= 0.7 for the Pokemon and Nintendo Switch articles
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.azure.entities.graph(articles, {
  key: $apiKey,
  url: $apiUrl,
  nodeProperty: "body",
  scoreCutoff: 0.7,
  writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g

We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph with confidence >= 0.7.

apoc.nlp.azure.entities multiple.graph cutoff
Figure 3. Pokemon and Nintendo Switch entities graph with confidence >= 0.7

If we’re happy with this graph and would like to persist it in Neo4j, we can do this by specifying the write: true configuration.

The following creates a HAS_ENTITY relationship from the article to each entity
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.azure.entities.graph(articles, {
  key: $apiKey,
  url: $apiUrl,
  nodeProperty: "body",
  scoreCutoff: 0.7,
  writeRelationshipType: "HAS_ENTITY",
  writeRelationshipProperty: "azureEntityScore",
  write: true
})
YIELD graph AS g
RETURN g;

We can then write a query to return the entities that have been created.

The following returns articles and their entities
MATCH (article:Article)
RETURN article.uri AS article,
       [(article)-[r:HAS_ENTITY]->(e:Entity) | {text: e.text, score: r.azureEntityScore}] AS entities;
Table 9. Results
article entities

"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"

[{score: 0.72, text: "Mario Kart"}, {score: 0.7802000593632747, text: "Mario Kart 8"}, {score: 0.8, text: "8"}, {score: 0.8, text: "a week"}, {score: 0.94, text: "Nintendo Switch"}, {score: 0.8150388253887939, text: "Neo4j"}]

"https://en.wikipedia.org/wiki/Nintendo_Switch"

[{score: 0.9023679924293266, text: "Joy-Con"}, {score: 0.98, text: "Nintendo"}, {score: 0.8, text: "March 3, 2017"}, {score: 0.9355623498560008, text: "Nintendo Switch"}, {score: 0.92, text: "Mario Kart"}, {score: 0.8, text: "8"}, {score: 0.8863202650046607, text: "Mario Kart 8"}, {score: 0.8, text: "October 20, 2016"}]

Key Phrases

Let’s now extract the key phrases from the Article node. The text that we want to analyze is stored in the body property of the node, so we’ll need to specify that via the nodeProperty configuration parameter.

The following streams the key phrases for the Pokemon article
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.keyPhrases.stream(a, {
  key: $apiKey,
  url: $apiUrl,
  nodeProperty: "body"
})
YIELD value
UNWIND value.keyPhrases AS keyPhrase
RETURN keyPhrase;
Table 10. Results
keyPhrase

"board games"

"card games"

"tournaments"

"role"

"organised lunch-time Mario Kart"

"Neo4j European offices"

"Nintendo Switch"

"friends"

"feet"

"days"

Alternatively we can use the graph mode to automatically create a key phrase graph. One node with the KeyPhrase label will be created for each key phrase extracted.

By default, a virtual graph is returned, but the graph can be persisted by specifying the write: true configuration.

The following returns a graph of key phrases for the Pokemon article
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.keyPhrases.graph(a, {
  key: $apiKey,
  url: $apiUrl,
  nodeProperty: "body",
  writeRelationshipType: "KEY_PHRASE",
  write: true
})
YIELD graph AS g
RETURN g;

We can see a Neo4j Browser visualization of the virtual graph in Pokemon key phrases graph.

apoc.nlp.azure.keyPhrases.graph
Figure 4. Pokemon key phrases graph

We can then write a query to return the key phrases that have been created.

The following returns articles and their entities
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
RETURN a.uri AS article,
       [(a)-[:KEY_PHRASE]->(k:KeyPhrase) | k.text] AS keyPhrases;
Table 11. Results
article keyPhrases

"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"

["card games", "board games", "friends", "feet", "Nintendo Switch", "days", "organised lunch-time Mario Kart", "tournaments", "Neo4j European offices", "role"]

Sentiment

Let’s now extract the sentiment for the Article node. The text that we want to analyze is stored in the body property of the node, so we’ll need to specify that via the nodeProperty configuration parameter.

The following streams the key phrases for the Pokemon article
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.sentiment.stream(a, {
  key: $apiKey,
  url: $apiUrl,
  nodeProperty: "body"
})
YIELD value
RETURN value;
Table 12. Results
value

{score: 0.5, id: "0"}

Alternatively we can use the graph mode to automatically store the sentiment and its score.

By default, a virtual graph is returned, but the graph can be persisted by specifying the write: true configuration. The sentiment score is stored in the sentimentScore property.

The following returns a graph with the sentiment for the Pokemon article
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.sentiment.graph(a, {
  key: $apiKey,
  url: $apiUrl,
  nodeProperty: "body",
  write: true
})
YIELD graph AS g
UNWIND g.nodes AS node
RETURN node {.uri, .sentimentScore} AS node;
Table 13. Results
node

{uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/", sentimentScore: 0.5}