[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

Load CSV

Many existing applications and data integrations use CSV as the minimal denominator format. CSV files contain text with delimiters (most often comma, but also tab (TSV) and colon (DSV)) separating columns and newlines for rows. Fields are possibly quoted to handle stray quotes, newlines, and the use of the delimiter within a field.

In Cypher it is supported by LOAD CSV and with the neo4j-import (neo4j-admin import) tool for bulk imports. The existing LOAD CSV works ok for most uses, but has a few features missing, that apoc.load.csv and apoc.load.xls add.

  • provide a line number

  • provide both a map and a list representation of each line

  • automatic data conversion (including split into arrays)

  • option to keep the original string formatted values

  • ignoring fields (makes it easier to assign a full line as properties)

  • headerless files

  • replacing certain values with null

The APOC procedures also support reading compressed files.

The data conversion is useful for setting properties directly, but for computation within Cypher it’s problematic as Cypher doesn’t know the type of map values so they default to Any.

To use them correctly, you’ll have to indicate their type to Cypher by using the built-in (e.g. toInteger) conversion functions on the value.

For reading from files you’ll have to enable the config option:

apoc.import.file.enabled=true

By default file paths are global, for paths relative to the import directory set:

apoc.import.file.use_neo4j_config=true

Examples for apoc.load.csv

test.csv
name,age,beverage
Selma,9,Soda
Rana,12,Tea;Milk
Selina,19,Cola
CALL apoc.load.csv('test.csv')
YIELD lineNo, map, list
RETURN *;
Table 1. Results
lineNo list map

0

["Selma", "9", "Soda"]

{name: "Selma", age: "9", beverage: "Soda"}

1

["Rana", "12", "Tea;Milk"]

{name: "Rana", age: "12", beverage: "Tea;Milk"}

2

["Selina", "19", "Cola"]

{name: "Selina", age: "19", beverage: "Cola"}

Configuration Options

Besides the file you can pass in a config map:

name default description

skip

none

skip result rows

limit

none

limit result rows

header

true

indicates if file has a header

sep

','

separator string or 'TAB'

quoteChar

'"'

the char to use for quoted elements

arraySep

';'

array separator

ignore

[]

which columns to ignore

nullValues

[]

which values to treat as null, e.g. ['na',false]

mapping

{}

per field mapping, entry key is field name, .e.g {years:{…​.} see below

failOnError

boolean

true

Table 2. mapping config for each field in the mapping entry
name default description

type

none

'int', 'string' etc.

array

false

indicates if field is an array

arraySep

';'

separator for array

name

none

rename field

ignore

false

ignore/remove this field

nullValues

[]

which values to treat as null, e.g. ['na',false]

CALL apoc.load.csv('test.csv', {skip:1, limit:1, header:true, ignore:['name'],
   mapping:{
     age: {type:'int'},
     beverage: {array:true, arraySep:';', name:'drinks'}
   }
})
YIELD lineNo, map, list
RETURN *;
Table 3. Results
lineNo list map

1

[12,["Tea","Milk"]]

{"age":12,"drinks":["Tea","Milk"]}

Given the following csv:

testMultiCharSep.csv
nameSEPARage
SelmaSEPAR8
RanaSEPAR11
SelinaSEPAR18

we can execute:

CALL apoc.load.csv('testMultiCharSep.csv', {sep:'SEPAR'})
YIELD lineNo, map, list
RETURN *;
Table 4. Results
lineNo list map

0

["Selma", "9"]

{name: "Selma", age: "9"}

1

["Rana", "12"]

{name: "Rana", age: "12"}

2

["Selina", "19"]

{name: "Selina", age: "19"}

Transaction Batching

To handle large files, CALL …​ IN TRANSACTIONS can be used together with LOAD CSV, you’ll have to watch out though for Eager operations which might break that behavior.

In apoc you can combine any data source with apoc.periodic.iterate to achieve the same.

CALL apoc.periodic.iterate('
CALL apoc.load.csv({url}) yield map as row return row
','
CREATE (p:Person) SET p = row
', {batchSize:10000, iterateList:true, parallel:true});
Please note that the parallel operation only works well for non-conflicting updates otherwise you might run into deadlocks.

To make these data structures available to Cypher, you can use apoc.load.xml. It takes a file or http URL and parses the XML into a map data structure.

See the following usage-examples for the procedures.

Error handling

You can use failOnError configuration to handle the result in case of incorrect url or csv. For example, with the help of the apoc.when procedure, you can return nothingToList and nothingToMap as list and map result, with incorrect url:

CALL apoc.load.csv("MY_CSV_URL", {failOnError:false})
YIELD list, map
WITH list, map
call apoc.do.when(list = [], "return 'nothingToList' as list, 'nothingToMap' as map", "return list, map", {list: list, map: map})
YIELD value
RETURN value["list"], value["map"]