[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

Google Cloud Platform (GCP)

Google Cloud Platform’s Natural Language API lets users derive insights from unstructured text using Google machine learning. The procedures in this chapter act as a wrapper around calls to this API to extract entities, categories, or sentiment from text stored as node properties.

Each procedure has two modes:

  • Stream - returns a map constructed from the JSON returned from the API

  • Graph - creates a graph or virtual graph based on the values returned by the API

The procedures described in this chapter make API calls and subsequent updates to the database on the calling thread. If we want to make parallel requests to the API and avoid out of memory errors from keeping too much transaction state in memory while running procedures that write to the database, see Batching Requests.

At the moment, GCP Natural Language API supports text input in more than 10 languages. For better results, make sure that your text is one of the supported languages by Natural Language API. If we input a text of an unsupported language, you might get a "HTTP response code: 400" error.

Procedure Overview

The procedures are described below:

Qualified Name Type Release

apoc.nlp.gcp.entities.graph

Creates a (virtual) entity graph for provided text

Procedure

Apoc Extended

apoc.nlp.gcp.entities.stream

Returns a stream of entities for provided text

Procedure

Apoc Extended

apoc.nlp.gcp.classify.graph

Classifies a document into categories.

Procedure

Apoc Extended

apoc.nlp.gcp.classify.stream

Classifies a document into categories.

Procedure

Apoc Extended

Entity Extraction

The entity extraction procedures (apoc.nlp.gcp.entities.*) are wrappers around the documents.analyzeEntities method of the Google Natural Language API. This API method finds named entities (currently proper names and common nouns) in the text along with entity types, salience, mentions for each entity, and other properties.

The procedures are described below:

signature

apoc.nlp.gcp.entities.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)

apoc.nlp.gcp.entities.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

The procedures support the following config parameters:

Table 1. Config parameters
name type default description

key

String

null

API Key for Google Natural Language API

nodeProperty

String

text

The property on the provided node that contains the unstructured text to be analyzed

In addition, apoc.nlp.gcp.entities.graph supports the following config parameters:

Table 2. Config parameters
name type default description

scoreCutoff

Double

0.0

Lower limit for the salience score of an entity to be present in the graph. Value must be between 0 and 1.

Salience is an indicator of the importance or centrality of that entity to the entire document text. Scores closer to 0 are less salient, while scores closer to 1.0 are highly salient.

write

Boolean

false

persist the graph of entities

writeRelationshipType

String

ENTITY

relationship type for relationships from source node to entity nodes

writeRelationshipProperty

String

score

relationship property for relationships from source node to entity nodes

Streaming mode
CALL apoc.nlp.gcp.entities.stream(source:Node or List<Node>, {
  key: String,
  nodeProperty: String
})
YIELD value
Graph mode
CALL apoc.nlp.gcp.entities.graph(source:Node or List<Node>, {
  key: String,
  nodeProperty: String,
  scoreCutoff: Double,
  writeRelationshipType: String,
  writeRelationshipProperty: String,
  write: Boolean
})
YIELD graph

Classification

The entity extraction procedures (apoc.nlp.gcp.classify.*) are wrappers around the documents.classifyText method of the Google Natural Language API. This API method classifies a document into categories.

The procedures are described below:

signature

apoc.nlp.gcp.classify.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)

apoc.nlp.gcp.classify.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

The procedures support the following config parameters:

Table 3. Config parameters
name type default description

key

String

null

API Key for Google Natural Language API

nodeProperty

String

text

The property on the provided node that contains the unstructured text to be analyzed

In addition, apoc.nlp.gcp.classify.graph supports the following config parameters:

Table 4. Config parameters
name type default description

scoreCutoff

Double

0.0

Lower limit for the confidence score of a category to be present in the graph. Value must be between 0 and 1.

Confidence is a number representing how certain the classifier is that this category represents the given text.

write

Boolean

false

persist the graph of entities

writeRelationshipType

String

CATEGORY

relationship type for relationships from source node to category nodes

writeRelationshipProperty

String

score

relationship property for relationships from source node to category nodes

Streaming mode
CALL apoc.nlp.gcp.classify.stream(source:Node or List<Node>, {
  key: String,
  nodeProperty: String
})
YIELD value
Graph mode
CALL apoc.nlp.gcp.classify.graph(source:Node or List<Node>, {
  key: String,
  nodeProperty: String,
  scoreCutoff: Double,
  writeRelationshipType: String,
  writeRelationshipProperty: String,
  write: Boolean
})
YIELD graph

Install Dependencies

The NLP procedures have dependencies on Kotlin and client libraries that are not included in the APOC Extended library.

These dependencies are included in apoc-nlp-dependencies-5.21.0-all.jar, which can be downloaded from the releases page. Once that file is downloaded, it should be placed in the plugins directory and the Neo4j Server restarted.

Setting up API Key

We can generate an API Key that has access to the Cloud Natural Language API by going to console.cloud.google.com/apis/credentials. Once we’ve created a key, we can populate and execute the following command to create a parameter that contains these details.

The following defines the apiKey parameter
:param apiKey => ("<api-key-here>")

Alternatively we can add these credentials to apoc.conf and load them using the static value storage functions.

apoc.conf
apoc.static.gcp.apiKey=<api-key-here>
The following retrieves GCP credentials from apoc.conf
RETURN apoc.static.getAll("gcp") AS gcp;
Table 5. Results
gcp

{apiKey: "<api-key-here>"}

Batching Requests

Batching requests to the GCP API and the processing of results can be done using Periodic Iterate. This approach is useful if we want to make parallel requests to the GCP API and reduce the amount of transaction state kept in memory while running procedures that write to the database.

The following creates an entity graph in batches of 25 nodes
CALL apoc.periodic.iterate("
  MATCH (n)
  WITH collect(n) as total
  CALL apoc.coll.partition(total, 25)
  YIELD value as nodes
  RETURN nodes", "
  CALL apoc.nlp.gcp.entities.graph(nodes, {
    key: $apiKey,
    nodeProperty: 'body',
    writeRelationshipType: 'GCP_ENTITY',
    write:true
  })
  YIELD graph
  RETURN distinct 'done'", {
    batchSize: 1,
    params: { apiKey: $apiKey }
  }
);

Examples

The examples in this section are based on the following sample graph:

CREATE (:Article {
  uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/",
  body: "These days I’m rarely more than a few feet away from my Nintendo Switch and I play board games, card games and role playing games with friends at least once or twice a week. I’ve even organised lunch-time Mario Kart 8 tournaments between the Neo4j European offices!"
});

CREATE (:Article {
  uri: "https://en.wikipedia.org/wiki/Nintendo_Switch",
  body: "The Nintendo Switch is a video game console developed by Nintendo, released worldwide in most regions on March 3, 2017. It is a hybrid console that can be used as a home console and portable device. The Nintendo Switch was unveiled on October 20, 2016. Nintendo offers a Joy-Con Wheel, a small steering wheel-like unit that a Joy-Con can slot into, allowing it to be used for racing games such as Mario Kart 8."
});

Entity Extraction

Let’s start by extracting the entities from the Article node. The text that we want to analyze is stored in the body property of the node, so we’ll need to specify that via the nodeProperty configuration parameter.

The following streams the entities for the Pokemon article
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.entities.stream(a, {
  key: $apiKey,
  nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
RETURN entity;
Table 6. Results
entity

{name: "card games", salience: 0.17967656, metadata: {}, type: "CONSUMER_GOOD", mentions: [{type: "COMMON", text: {content: "card games", beginOffset: -1}}]}

{name: "role playing games", salience: 0.16441391, metadata: {}, type: "OTHER", mentions: [{type: "COMMON", text: {content: "role playing games", beginOffset: -1}}]}

{name: "Switch", salience: 0.143287, metadata: {}, type: "OTHER", mentions: [{type: "COMMON", text: {content: "Switch", beginOffset: -1}}]}

{name: "friends", salience: 0.13336793, metadata: {}, type: "PERSON", mentions: [{type: "COMMON", text: {content: "friends", beginOffset: -1}}]}

{name: "Nintendo", salience: 0.12601112, metadata: {mid: "/g/1ymzszlpz"}, type: "ORGANIZATION", mentions: [{type: "PROPER", text: {content: "Nintendo", beginOffset: -1}}]}

{name: "board games", salience: 0.08861496, metadata: {}, type: "CONSUMER_GOOD", mentions: [{type: "COMMON", text: {content: "board games", beginOffset: -1}}]}

{name: "tournaments", salience: 0.0603245, metadata: {}, type: "EVENT", mentions: [{type: "COMMON", text: {content: "tournaments", beginOffset: -1}}]}

{name: "offices", salience: 0.034420907, metadata: {}, type: "LOCATION", mentions: [{type: "COMMON", text: {content: "offices", beginOffset: -1}}]}

{name: "Mario Kart 8", salience: 0.029095741, metadata: {wikipedia_url: "https://en.wikipedia.org/wiki/Mario_Kart_8", mid: "/m/0119mf7q"}, type: "PERSON", mentions: [{type: "PROPER", text: {content: "Mario Kart 8", beginOffset: -1}}]}

{name: "European", salience: 0.020393685, metadata: {mid: "/m/02j9z", wikipedia_url: "https://en.wikipedia.org/wiki/Europe"}, type: "LOCATION", mentions: [{type: "PROPER", text: {content: "European", beginOffset: -1}}]}

{name: "Neo4j", salience: 0.020393685, metadata: {mid: "/m/0b76t3s", wikipedia_url: "https://en.wikipedia.org/wiki/Neo4j"}, type: "ORGANIZATION", mentions: [{type: "PROPER", text: {content: "Neo4j", beginOffset: -1}}]}

{name: "8", salience: 0, metadata: {value: "8"}, type: "NUMBER", mentions: [{type: "TYPE_UNKNOWN", text: {content: "8", beginOffset: -1}}]}

We get back 12 different entities. We could then apply a Cypher statement that creates one node per entity and an ENTITY relationship from each of those nodes back to the Article node.

The following streams the entities for the Pokemon article and then creates nodes for each entity
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.entities.stream(a, {
  key: $apiKey,
  nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
MERGE (e:Entity {name: entity.name})
SET e.type = entity.type
MERGE (a)-[:ENTITY]->(e)

Alternatively we can use the graph mode to automatically create the entity graph. As well as having the Entity label, each entity node will have another label based on the value of the type property. By default a virtual graph is returned.

The following returns a virtual graph of entities for the Pokemon article
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.entities.graph(a, {
  key: $apiKey,
  nodeProperty: "body",
  writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g;

We can see a Neo4j Browser visualization of the virtual graph in Pokemon entities graph.

apoc.nlp.gcp.entities.graph
Figure 1. Pokemon entities graph

We can compute the entities for multiple nodes by passing a list of nodes to the procedure.

The following returns a virtual graph of entities for the Pokemon and Nintendo Switch articles
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.gcp.entities.graph(articles, {
  key: $apiKey,
  nodeProperty: "body",
  writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g;

We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph.

apoc.nlp.gcp.entities multiple.graph
Figure 2. Pokemon and Nintendo Switch entities graph

On this visualization we can also see the score for each entity node. This score represents importance of that entity in the entire document. We can specify a minimum cut off value for the score using the scoreCutoff property.

The following returns a virtual graph of entities for the Pokemon and Nintendo Switch articles
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.gcp.entities.graph(articles, {
  key: $apiKey,
  nodeProperty: "body",
  writeRelationshipType: "ENTITY",
  scoreCutoff: 0.01
})
YIELD graph AS g
RETURN g;

We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph with importance >= 0.01.

apoc.nlp.gcp.entities multiple.graph cutoff
Figure 3. Pokemon and Nintendo Switch entities graph with importance >= 0.01

If we’re happy with this graph and would like to persist it in Neo4j, we can do this by specifying the write: true configuration.

The following creates a HAS_ENTITY relationship from the article to each entity
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.gcp.entities.graph(articles, {
  key: $apiKey,
  nodeProperty: "body",
  scoreCutoff: 0.01,
  writeRelationshipType: "HAS_ENTITY",
  writeRelationshipProperty: "gcpEntityScore",
  write: true
})
YIELD graph AS g
RETURN g;

We can then write a query to return the entities that have been created.

The following returns articles and their entities
MATCH (article:Article)
RETURN article.uri AS article,
       [(article)-[r:HAS_ENTITY]->(e) | {entity: e.text, score: r.gcpEntityScore}] AS entities;
Table 7. Results
article entities

"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"

[{score: 0.020393685, entity: "Neo4j"}, {score: 0.034420907, entity: "offices"}, {score: 0.0603245, entity: "tournaments"}, {score: 0.020393685, entity: "European"}, {score: 0.029095741, entity: "Mario Kart 8"}, {score: 0.12601112, entity: "Nintendo"}, {score: 0.13336793, entity: "friends"}, {score: 0.08861496, entity: "board games"}, {score: 0.143287, entity: "Switch"}, {score: 0.16441391, entity: "role playing games"}, {score: 0.17967656, entity: "card games"}]

"https://en.wikipedia.org/wiki/Nintendo_Switch"

[{score: 0.76108575, entity: "Nintendo Switch"}, {score: 0.07424594, entity: "Nintendo"}, {score: 0.015900765, entity: "home console"}, {score: 0.012772448, entity: "device"}, {score: 0.038113687, entity: "regions"}, {score: 0.07299799, entity: "Joy-Con Wheel"}]

Classification

Now let’s extract categories from the Article node. The text that we want to analyze is stored in the body property of the node, so we’ll need to specify that via the nodeProperty configuration parameter.

The following streams the categories for the Pokemon article
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.classify.stream(a, {
  key: $apiKey,
  nodeProperty: "body"
})
YIELD value
UNWIND value.categories AS category
RETURN category;
Table 8. Results
category

{name: "/Games", confidence: 0.91}

We get back only one category We could then apply a Cypher statement that creates one node per category and a CATEGORY relationship from each of those nodes back to the Article node.

The following streams the categories for the Pokemon article and then creates nodes for each category
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.classify.stream(a, {
  key: $apiKey,
  nodeProperty: "body"
})
YIELD value
UNWIND value.categories AS category
MERGE (c:Category {name: category.name})
MERGE (a)-[:CATEGORY]->(c)

Alternatively we can use the graph mode to automatically create the category graph. As well as having the Category label, each category node will have another label based on the value of the type property. By default, a virtual graph is returned.

The following returns a virtual graph of categories for the Pokemon article
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.classify.graph(a, {
  key: $apiKey,
  nodeProperty: "body",
  writeRelationshipType: "CATEGORY"
})
YIELD graph AS g
RETURN g;

We can see a Neo4j Browser visualization of the virtual graph in Pokemon categories graph.

apoc.nlp.gcp.classify.graph
Figure 4. Pokemon categories graph
The following creates a HAS_CATEGORY relationship from the article to each entity
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.gcp.classify.graph(articles, {
  key: $apiKey,
  nodeProperty: "body",
  writeRelationshipType: "HAS_CATEGORY",
  writeRelationshipProperty: "gcpCategoryScore",
  write: true
})
YIELD graph AS g
RETURN g;

We can then write a query to return the entities that have been created.

The following returns articles and their entities
MATCH (article:Article)
RETURN article.uri AS article,
       [(article)-[r:HAS_CATEGORY]->(c) | {category: c.text, score: r.gcpCategoryScore}] AS categories;
Table 9. Results
article categories

"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"

[{category: "/Games", score: 0.91}]

"https://en.wikipedia.org/wiki/Nintendo_Switch"

[{category: "/Computers & Electronics/Consumer Electronics/Game Systems & Consoles", score: 0.99}, {category: "/Games/Computer & Video Games", score: 0.99}]