cudf.crosstab

cudf.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=None, normalize=False)

Compute a simple cross tabulation of two (or more) factors. By default, this computes a frequency table of the factors; if an array of values and an aggregation function are passed, the values are aggregated with that function instead.
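To illustrate the default (frequency) behavior, here is a minimal pure-Python sketch of what a cross tabulation counts. It is not cudf's GPU implementation; the helper name frequency_table is hypothetical, and multiple column factors are represented as tuple keys, standing in for the column MultiIndex of the real DataFrame result.

```python
from collections import Counter


def frequency_table(rows, *cols):
    """Count co-occurrences of one row factor and one or more column factors.

    Returns a dict mapping each row label to {column key: count}, where the
    column key is a tuple with one element per column factor. Combinations
    that never occur are filled with 0, like the zero cell of a crosstab.
    """
    col_keys = list(zip(*cols))              # one tuple per observation
    counts = Counter(zip(rows, col_keys))    # (row, column key) -> count
    row_labels = sorted(set(rows))
    col_labels = sorted(set(col_keys))
    return {
        r: {c: counts.get((r, c), 0) for c in col_labels}
        for r in row_labels
    }


a = ["foo", "foo", "foo", "foo", "bar", "bar",
     "bar", "bar", "foo", "foo", "foo"]
b = ["one", "one", "one", "two", "one", "one",
     "one", "two", "two", "two", "one"]
c = ["dull", "dull", "shiny", "dull", "dull", "shiny",
     "shiny", "dull", "shiny", "shiny", "shiny"]

table = frequency_table(a, b, c)
for row, cells in table.items():
    print(row, [cells[k] for k in sorted(cells)])
```

Using the same data as the Examples section, the per-row counts agree with the cudf output shown there.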

Parameters

index : array-like, Series, or list of arrays/Series
    Values to group by in the rows.
columns : array-like, Series, or list of arrays/Series
    Values to group by in the columns.
values : array-like, optional
    Array of values to aggregate according to the factors. Requires aggfunc to be specified.
rownames : list of str, default None
    If passed, must match the number of row arrays passed.
colnames : list of str, default None
    If passed, must match the number of column arrays passed.
aggfunc : function, optional
    If specified, values must be specified as well.
margins
    Not supported.
margins_name
    Not supported.
dropna
    Not supported.
normalize
    Not supported.
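When values and aggfunc are passed, each cell holds aggfunc applied to the values that fall into that row/column combination rather than a count, and the two parameters must be given together. The sketch below illustrates those semantics in pure Python for a single row factor and a single column factor. It is not cudf's implementation: the helper name aggregate_table is hypothetical, aggfunc is assumed to receive a plain list of values per cell, and empty cells are left as None here rather than whatever null representation cudf uses.

```python
from collections import defaultdict


def aggregate_table(rows, cols, values=None, aggfunc=None):
    """Cross-tabulate one row factor against one column factor.

    Without values/aggfunc, each cell is a count of observations.
    With both, each cell is aggfunc applied to the list of values in it.
    Passing only one of values/aggfunc raises ValueError.
    """
    if (values is None) != (aggfunc is None):
        raise ValueError("values and aggfunc must be specified together")

    # Group each observation's value (or a placeholder) by its (row, col) cell.
    cells = defaultdict(list)
    for i, key in enumerate(zip(rows, cols)):
        cells[key].append(values[i] if values is not None else None)

    table = {}
    for r in sorted(set(rows)):
        table[r] = {}
        for c in sorted(set(cols)):
            group = cells.get((r, c))  # None if the combination never occurs
            if values is None:
                # Frequency mode: count observations, 0 for empty cells.
                table[r][c] = len(group) if group else 0
            else:
                # Aggregation mode: apply aggfunc; empty cells stay None.
                table[r][c] = aggfunc(group) if group else None
    return table


a = ["foo", "foo", "bar", "bar", "foo"]
b = ["one", "two", "one", "one", "one"]
v = [10, 20, 30, 40, 50]

print(aggregate_table(a, b))
print(aggregate_table(a, b, values=v, aggfunc=sum))
```

Here sum stands in for any aggregation function; the pairing check reflects the documented rule that values requires aggfunc and vice versa.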

Returns

DataFrame
    Cross tabulation of the data.

Examples

>>> import cudf
>>> a = cudf.Series(["foo", "foo", "foo", "foo", "bar", "bar",
...               "bar", "bar", "foo", "foo", "foo"], dtype=object)
>>> b = cudf.Series(["one", "one", "one", "two", "one", "one",
...               "one", "two", "two", "two", "one"], dtype=object)
>>> c = cudf.Series(["dull", "dull", "shiny", "dull", "dull", "shiny",
...               "shiny", "dull", "shiny", "shiny", "shiny"],
...              dtype=object)
>>> cudf.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
b   one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2