Signature | Description | Parameters |
---|---|---|
template<typename F, typename ... Ts> DataFrame bucketize(F &&func, const IndexType &bucket_interval) const; | Bucketizes the data and index into intervals of bucket_interval, based on index values, and calls the functor for each bucket. The result of each bucket is stored in a new DataFrame with the same shape, which is returned. Every data bucket is guaranteed to be as wide as bucket_interval. This means some data items at the end may not be included in the new bucketized DataFrame. The index of each bucket will be the last index in the original DataFrame that is less than bucket_interval away from the previous bucket. NOTE: The DataFrame must already be sorted by index. | F: Type of the functor to be applied to columns to bucketize. Ts: The list of types for all columns. A type should be specified only once. bucket_interval: The bucket interval, in the unit of a single index value. For example, if the index is in minutes, bucket_interval is in minutes, and so on. already_sorted: If the DataFrame is already sorted by index, this will save the expensive sort operation. |
template<typename F, typename ... Ts> std::future<DataFrame> bucketize_async(F &&func, const IndexType &bucket_interval) const; | Same as bucketize() above, but executed asynchronously. | |
template<typename F, typename ... Ts> void self_bucketize(F &&func, const IndexType &bucket_interval); | Exactly the same as bucketize() above, except that it stores the result in the DataFrame itself and returns void. After the call returns, the original data is lost and replaced with the bucketized data. | |

    // Index values, plus a copy to load as a regular column
    // (ulgvec2 itself is moved into the DataFrame below).
    std::vector<unsigned long>  ulgvec2 =
        { 123450, 123451, 123452, 123450, 123455, 123450, 123449,
          123448, 123451, 123452, 123452, 123450, 123455, 123450,
          123454, 123453, 123456, 123457, 123458, 123459, 123460,
          123441, 123442, 123432, 123433, 123434, 123435, 123436 };
    std::vector<unsigned long>  xulgvec2 = ulgvec2;
    std::vector<int>            intvec2 =
        { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
          15, 20, 22, 23, 24, 25, 30, 33, 34, 35, 36, 40, 45, 46 };
    std::vector<double>         xdblvec2 =
        { 1.2345, 2.2345, 3.2345, 4.2345, 5.2345, 3.0, 0.9999,
          10.0, 4.25, 0.009, 1.111, 8.0, 2.2222, 3.3333,
          11.0, 5.25, 1.009, 2.111, 9.0, 3.2222, 4.3333,
          12.0, 6.25, 2.009, 3.111, 10.0, 4.2222, 5.3333 };
    std::vector<double>         dblvec22 =
        { 0.998, 0.3456, 0.056, 0.15678, 0.00345, 0.923, 0.06743,
          0.1, 0.0056, 0.07865, -0.9999, 0.0111, 0.1002, -0.8888,
          0.14, 0.0456, 0.078654, -0.8999, 0.01119, 0.8002, -0.9888,
          0.2, 0.1056, 0.87865, -0.6999, 0.4111, 0.1902, -0.4888 };
    std::vector<std::string>    strvec2 =
        { "4% of something", "Description 4/5", "This is bad",
          "3.4% of GDP", "Market drops", "Market pulls back",
          "$15 increase", "Running fast", "C++14 development",
          "Some explanation", "More strings", "Bonds vs. Equities",
          "Almost done", "Here comes the sun", "XXXX1", "XXXX04",
          "XXXX2", "XXXX3", "XXXX4", "XXXX4", "XXXX5", "XXXX6",
          "XXXX7", "XXXX10", "XXXX11", "XXXX01", "XXXX02", "XXXX03" };

    // Load the index and all columns into the DataFrame, then print it.
    MyDataFrame dfx;

    dfx.load_data(std::move(ulgvec2),
                  std::make_pair("xint_col", intvec2),
                  std::make_pair("dbl_col", xdblvec2),
                  std::make_pair("dbl_col_2", dblvec22),
                  std::make_pair("str_col", strvec2),
                  std::make_pair("ul_col", xulgvec2));
    dfx.write<std::ostream, int, unsigned long, double, std::string>(std::cout);

    // Group by the index.
    const MyDataFrame   dfxx =
        dfx.groupby<GroupbySum, unsigned long, int, unsigned long, std::string, double>(GroupbySum());

    dfxx.write<std::ostream, int, unsigned long, double, std::string>(std::cout);

    // Group by a string column.
    const MyDataFrame   dfxx2 =
        dfx.groupby<GroupbySum, std::string, int, unsigned long, std::string, double>(GroupbySum(), "str_col");

    dfxx2.write<std::ostream, int, unsigned long, double, std::string>(std::cout);

    // Group by a double column, asynchronously.
    std::future<MyDataFrame>    gb_fut =
        dfx.groupby_async<GroupbySum, double, int, unsigned long, std::string, double>(GroupbySum(), "dbl_col_2");
    const MyDataFrame           dfxx3 = gb_fut.get();

    dfxx3.write<std::ostream, int, unsigned long, double, std::string>(std::cout);

    std::cout << "\nTesting Bucketize() ..." << std::endl;

    // Bucketize asynchronously with an interval of 4 index units.
    const MyDataFrame::IndexType    interval = 4;
    std::future<MyDataFrame>        b_fut =
        dfx.bucketize_async<GroupbySum, int, unsigned long, std::string, double>(GroupbySum(), interval);
    const MyDataFrame               buck_df = b_fut.get();

    buck_df.write<std::ostream, int, unsigned long, double, std::string>(std::cout, true);
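
To make the bucketing rule in the table more concrete, here is a minimal standalone sketch of one way to read it. It uses plain `std::vector`s instead of a DataFrame, and the helper `bucketize_sum` is a hypothetical name for illustration only; it is not part of the DataFrame API and is not the library's implementation. It assumes the index is already sorted, closes a bucket once the next index value is at least `bucket_interval` away from the bucket's first index value, labels the bucket with its last index value, and drops trailing rows that never fill a whole interval.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the DataFrame API): sums `values` per
// bucket and returns (bucket index, bucket sum) pairs.
// Assumes `index` is sorted in ascending order.
template<typename I, typename V>
std::vector<std::pair<I, V>>
bucketize_sum(const std::vector<I> &index,
              const std::vector<V> &values,
              const I &bucket_interval)
{
    assert(index.size() == values.size());

    std::vector<std::pair<I, V>>    result;
    std::size_t                     start = 0;  // first row of the open bucket

    for (std::size_t i = 1; i < index.size(); ++i)  {
        // Row i is a full interval away from the bucket's first row,
        // so rows [start, i) form a complete bucket.
        if (index[i] - index[start] >= bucket_interval)  {
            V   sum { };

            for (std::size_t j = start; j < i; ++j)
                sum += values[j];

            // Label the bucket with the last index value it contains.
            result.emplace_back(index[i - 1], sum);
            start = i;
        }
    }

    // Rows from `start` onward never filled a whole interval: they are dropped.
    return result;
}
```

With an index of 0 through 9, values 1 through 10, and an interval of 4, this sketch yields two buckets, labelled 3 and 7, and leaves out the last two rows because they do not span a full interval.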