REGISTER_CPU_OPERATOR(
    SparseLengthsSumFused4BitRowwiseFakeFP16NNPI,
    SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, /*with_weights=*/false>);
OPERATOR_SCHEMA(SparseLengthsSumFused4BitRowwiseFakeFP16NNPI)
    .NumInputs(3)
    .NumOutputs(1)
    .ValueKeyLengthInputFillers(
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, false>::DATA,
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, false>::INDICES,
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, false>::LENGTHS)
    .SetDoc(R"DOC(
Performs the same operation as SparseLengthsSum, but operating on
4-bit rowwise quantized matrices with fused storage (where each row
stores the quantized values, followed by a 2-byte scale and a 2-byte bias).
)DOC")
    .Input(0, "DATA",
        "uint8 tensor obtained with operator FloatToFused4BitRowwiseQuantized")
    .Input(1, "INDICES",
        "Integer vector containing indices of the first "
        "dimension of DATA for the slices that are being aggregated")
    .Input(2, "LENGTHS",
        "Vector with the same sum of elements as the first dimension of DATA")
    .Output(0, "output", "output");
NO_GRADIENT(SparseLengthsSumFused4BitRowwiseFakeFP16NNPI);
REGISTER_CPU_OPERATOR(
    SparseLengthsSumFused4BitRowwiseFakeFP16EmbeddingOnly,
    SparseLengthsFused4BitRowwiseFakeFP16Op<
        CPUContext,
        /*with_weights=*/false,
        /*use_fp16_for_embedding_only=*/true>);
OPERATOR_SCHEMA(SparseLengthsSumFused4BitRowwiseFakeFP16EmbeddingOnly)
    .NumInputs(3)
    .NumOutputs(1)
    .ValueKeyLengthInputFillers(
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, false>::DATA,
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, false>::INDICES,
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, false>::LENGTHS)
    .SetDoc(R"DOC(
Performs the same operation as SparseLengthsSum, but operating on
4-bit rowwise quantized matrices with fused storage (where each row
stores the quantized values, followed by a 2-byte scale and a 2-byte bias).
Converts only the embedding entries using fake fp16.
)DOC")
    .Input(0, "DATA",
        "uint8 tensor obtained with operator FloatToFused4BitRowwiseQuantized")
    .Input(1, "INDICES",
        "Integer vector containing indices of the first "
        "dimension of DATA for the slices that are being aggregated")
    .Input(2, "LENGTHS",
        "Vector with the same sum of elements as the first dimension of DATA")
    .Output(0, "output", "output");
NO_GRADIENT(SparseLengthsSumFused4BitRowwiseFakeFP16EmbeddingOnly);
REGISTER_CPU_OPERATOR(
    SparseLengthsWeightedSumFused4BitRowwiseFakeFP16NNPI,
    SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, /*with_weights=*/true>);
OPERATOR_SCHEMA(SparseLengthsWeightedSumFused4BitRowwiseFakeFP16NNPI)
    .NumInputs(4)
    .NumOutputs(1)
    .WeightedValueKeyLengthInputFillers(
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, true>::DATA,
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, true>::INDICES,
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, true>::LENGTHS,
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, true>::WEIGHTS)
    .SetDoc(R"DOC(
Performs the same operation as SparseLengthsWeightedSum,
but operating on 4-bit rowwise quantized matrices with fused storage
(where each row stores the quantized values, followed by a 2-byte scale
and a 2-byte bias).
)DOC")
    .Input(0, "DATA",
        "uint8 tensor obtained with operator FloatToFused4BitRowwiseQuantized")
    .Input(1, "INDICES",
        "Integer vector containing indices of the first "
        "dimension of DATA for the slices that are being aggregated")
    .Input(2, "LENGTHS",
        "Vector with the same sum of elements as the first dimension of DATA")
    .Input(3, "WEIGHTS",
        "Vector of weights to scale rows of DATA with before reduction")
    .Output(0, "output", "output");
NO_GRADIENT(SparseLengthsWeightedSumFused4BitRowwiseFakeFP16NNPI);
REGISTER_CPU_OPERATOR(
    SparseLengthsWeightedSumFused4BitRowwiseFakeFP16EmbeddingOnly,
    SparseLengthsFused4BitRowwiseFakeFP16Op<
        CPUContext,
        /*with_weights=*/true,
        /*use_fp16_for_embedding_only=*/true>);
OPERATOR_SCHEMA(SparseLengthsWeightedSumFused4BitRowwiseFakeFP16EmbeddingOnly)
    .NumInputs(4)
    .NumOutputs(1)
    .WeightedValueKeyLengthInputFillers(
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, true>::DATA,
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, true>::INDICES,
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, true>::LENGTHS,
        SparseLengthsFused4BitRowwiseFakeFP16Op<CPUContext, true>::WEIGHTS)
    .SetDoc(R"DOC(
Performs the same operation as SparseLengthsWeightedSum,
but operating on 4-bit rowwise quantized matrices with fused storage
(where each row stores the quantized values, followed by a 2-byte scale
and a 2-byte bias).
Converts only the embedding entries using fake fp16.
)DOC")
    .Input(0, "DATA",
        "uint8 tensor obtained with operator FloatToFused4BitRowwiseQuantized")
    .Input(1, "INDICES",
        "Integer vector containing indices of the first "
        "dimension of DATA for the slices that are being aggregated")
    .Input(2, "LENGTHS",
        "Vector with the same sum of elements as the first dimension of DATA")
    .Input(3, "WEIGHTS",
        "Vector of weights to scale rows of DATA with before reduction")
    .Output(0, "output", "output");
NO_GRADIENT(SparseLengthsWeightedSumFused4BitRowwiseFakeFP16EmbeddingOnly);