A lightweight dataframe handling package for php.
The php-dataframe package is not a data science package, it just tries copy the main features of pandas' DataFrame.
PHP version 7.2 or newer to develop using php-dataframe.
Use composer to install PhpSpreadsheet into your project:
composer require nagyatka/php-dataframe
>>> $df = new DataFrame([[0,1],[2,3]]);
>>> print($df);
|0 |1 |
============================
0 |0 |1 |
1 |2 |3 |
Shape: 2x2
>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"]);
>>> print($df);
|a |b |
============================
0 |0 |1 |
1 |2 |3 |
Shape: 2x2
>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"], ["e", "f"]);
>>> print($df);
|a |b |
============================
e |0 |1 |
f |2 |3 |
Shape: 2x2
A DataFrame stores a 2D php array and the associated column names and indices.
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print_r($df->values);
Array
(
[0] => Array
(
[a] => 0
[b] => 1
[c] => 2
)
[1] => Array
(
[a] => 3
[b] => 4
[c] => 5
)
)
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print("Number of rows: " . $df->shape[0]);
Number of rows: 2
>>> print("Number of columns: " . $df->shape[1]);
Number of columns: 3
>>> print_r($df->getColumnNames());
Array
(
[0] => a
[1] => b
[2] => c
)
>>> $df->setColumnNames(["x", "y", "z"]);
>>> print($df);
|x |y |z |
=======================================
e |0 |1 |2 |
f |3 |4 |5 |
Shape: 2x3
>>> print_r($df->getIndices());
Array
(
[0] => e
[1] => f
)
>>> $df->setIndices(["p", "q"]);
>>> print($df);
|x |y |z |
=======================================
p |0 |1 |2 |
q |3 |4 |5 |
Shape: 2x3
Selecting column "a"
, the DataFrame object returns with a Series object.
>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"], ["e", "f"]);
>>> print($df["a"]);
Series(Name=a, length=2){[
e: 0,
f: 2,
]}
Selecting a column by column's index also works
>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"], ["e", "f"]);
>>> print($df[0]);
Series(Name=a, length=2){[
e: 0,
f: 2,
]}
Because php does not support array object as key, the cols
helper function needs to be used to select multi columns.
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df[\PHPDataFrame\cols(["b", "c"])]);
|b |c |
============================
e |1 |2 |
f |4 |5 |
Shape: 2x2
You can also use column indices to get a sub DataFrame:
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df[\PHPDataFrame\cols([0,1])]);
|a |b |
============================
e |0 |1 |
f |3 |4 |
Shape: 2x2
You can select one row of a DataFrame using iloc
operator.
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df->iloc["e"]);
Series(Index=e, Length=3){[
a: 0,
b: 1,
c: 2,
]}
Of course, numeric indices can be used as well.
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df->iloc[0]);
Series(Index=0, Length=3){[
a: 0,
b: 1,
c: 2,
]}
Because php does not support array object as key, the inds
helper function needs to be used to select multi columns.
>>> $df = new DataFrame([[0,1,2], [3,4,5], [6,7,8]], ["a", "b", "c"], ["e", "f", "g"]);
>>> print($df->iloc[\PHPDataFrame\inds(["g","f"])]);
|a |b |c |
=======================================
g |6 |7 |8 |
f |3 |4 |5 |
Shape: 2x3
>>> print($df->iloc[\PHPDataFrame\inds([0,1])]);
|a |b |c |
=======================================
e |0 |1 |2 |
f |3 |4 |5 |
Shape: 2x3
Since the index values don't have to be unique, an iloc
operation with a label index can results in multiple rows.
>>> $df = new DataFrame([[0,1,2], [3,4,5], [6,7,8]], ["a", "b", "c"], ["e", "f", "e"]);
>>> print($df->iloc["e"]);
|a |b |c |
=======================================
e |0 |1 |2 |
e |6 |7 |8 |
Shape: 2x3