Module bytehub.cloud
Classes
class CloudFeatureStore (connection_string='https://api.bytehub.ai', enable_transforms=True)
-
Cloud Feature Store
Connects to a hosted feature store via REST API.
When specifying features for create_feature, update_feature, etc., either:
- pass namespace and name as separate arguments; or
- specify name in the format "my-namespace/my-feature".
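The two naming styles can be reconciled with a small helper. This is a minimal sketch of the convention described above, not ByteHub's own parsing code: `split_feature_name` is a hypothetical name, and raising an error on a missing or conflicting namespace is an assumption.

```python
def split_feature_name(name, namespace=None):
    """Return (namespace, name) from either calling style.

    Hypothetical helper illustrating the documented convention; it is not
    part of the bytehub package.
    """
    if "/" in name:
        # "my-namespace/my-feature" style: the prefix is the namespace.
        prefix, _, short_name = name.partition("/")
        if namespace is not None and namespace != prefix:
            raise ValueError(f"namespace {namespace!r} conflicts with {prefix!r}")
        return prefix, short_name
    if namespace is None:
        raise ValueError("no namespace given and none found in the feature name")
    return namespace, name


# Both calling styles identify the same feature.
print(split_feature_name("my-namespace/my-feature"))
print(split_feature_name("my-feature", namespace="my-namespace"))
```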
Args
connection_string
:str
- URL of ByteHub Cloud, e.g. https://api.bytehub.ai.
enable_transforms
:bool
, optional- whether to allow execution of pickled functions stored in the feature store. Required for feature transforms, but should only be enabled if you trust the feature store and the transforms that have been saved to it.
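The warning on enable_transforms follows from how Python pickles behave in general: loading a pickle can call arbitrary functions. The sketch below uses only the standard library and is not ByteHub code; `Payload` is a hypothetical stand-in for an untrusted stored transform.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle runs code when it is loaded."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" instead of the object's state.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes executes eval(); a hostile payload could run anything.
result = pickle.loads(blob)
print(result)
```

Setting enable_transforms=False avoids executing stored functions, at the cost of losing feature transforms.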
Ancestors
- bytehub._base.BaseFeatureStore
- abc.ABC
Methods
def clean_namespace(self, name)
-
Removes any data that is not associated with features in the namespace. Run this to free up disk space after deleting features.
Args
name
:str
- namespace to clean
def clone_feature(self, name, namespace=None, **kwargs)
-
Create a new feature by cloning an existing one.
Args
name
:str
- name of the feature.
namespace
:str
, optional- namespace which should hold this feature.
from_name
:str
- the name of the existing feature to copy from.
from_namespace
:str
, optional- namespace of the existing feature.
def create_feature(self, name, namespace=None, **kwargs)
-
Create a new feature in the feature store.
Args
name
:str
- name of the feature
namespace
:str
, optional- namespace which should hold this feature.
description
:str
, optional- description for this feature.
partition
:str
, optional- partitioning of stored timeseries (default: "date").
serialized
:bool
, optional- if True, converts values to JSON strings before saving, which can help in situations where the format/schema of the data changes over time.
transform
:str
, optional- pickled function code for feature transforms.
meta
:dict
, optional- key/value pairs of metadata.
def create_namespace(self, name, **kwargs)
-
Create a new namespace in the feature store.
Args
name
:str
- name of the namespace.
description
:str
, optional- description for this namespace.
url
:str
- url of data store.
storage_options
:dict
, optional- storage options, e.g. access credentials.
backend
:str
, optional- storage backend, see bytehub._storage.available_backends, defaults to "pandas".
meta
:dict
, optional- key/value pairs of metadata.
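A namespace has to exist before features can be stored in it, so the two create calls are typically used together. The sketch below assumes `fs` is an already connected store exposing the methods documented on this page; `bootstrap_store` and all argument values are placeholders for illustration.

```python
def bootstrap_store(fs, namespace, url, feature):
    """Create a namespace and one feature inside it.

    `fs` is assumed to be a connected store object, e.g. a CloudFeatureStore.
    """
    fs.create_namespace(
        namespace,
        url=url,  # location of the data store
        description="Example namespace",
        meta={"owner": "data-team"},  # placeholder metadata
    )
    fs.create_feature(
        feature,
        namespace=namespace,
        description="Example feature",
        partition="date",  # the documented default
        serialized=False,
    )


# Usage (placeholder values, requires the bytehub package and access to the service):
# from bytehub.cloud import CloudFeatureStore
# fs = CloudFeatureStore("https://api.bytehub.ai")
# bootstrap_store(fs, "tutorial", "s3://my-bucket/tutorial", "numbers")
```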
def delete_feature(self, name, namespace=None, delete_data=False)
-
Delete a feature from the feature store.
Args
name
:str
- name of feature to delete.
namespace
:str
, optional- namespace, if not included in feature name.
delete_data
:bool
, optional- if set to True, deletes the underlying stored data for this feature; otherwise the default behaviour is to delete the feature store metadata but leave the stored timeseries values intact.
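Deleting with the default delete_data=False leaves stored values behind, which clean_namespace can later reclaim. A minimal sketch of that two-step flow, assuming `fs` is a connected store; `retire_feature` is a hypothetical helper name.

```python
def retire_feature(fs, name, namespace):
    """Delete a feature's metadata, then reclaim its storage."""
    # Step 1: remove the feature definition but leave the timeseries in place
    # (the documented default, delete_data=False).
    fs.delete_feature(name, namespace=namespace, delete_data=False)
    # Step 2: drop data no longer associated with any feature in the namespace.
    fs.clean_namespace(namespace)
```

Passing delete_data=True in the first call instead asks the store to delete the stored values in the same step.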
def delete_namespace(self, name)
-
Delete a namespace from the feature store.
Args
name
- namespace to be deleted.
def last(self, features)
-
Fetch the last value of one or more features.
Args
features
:Union[str, list, pd.DataFrame]
- feature or features to fetch.
Returns
dict
- dictionary of name, last value pairs.
def list_features(self, **kwargs)
-
List features in the feature store.
Search by namespace, name and/or regex query.
Args
name
:str
, optional- name of feature to filter by.
namespace
:str
, optional- namespace to filter by.
regex
:str
, optional- regex filter on name.
friendly
:bool
, optional- simplify output for user.
Returns
pd.DataFrame
- DataFrame of features and metadata.
def list_namespaces(self, **kwargs)
-
List namespaces in the feature store.
Search by name or regex query.
Args
name
:str
, optional- name of namespace to filter by.
namespace
:str
, optional- same as name.
regex
:str
, optional- regex filter on name.
Returns
pd.DataFrame
- DataFrame of namespaces and metadata.
def load_dataframe(self, features, from_date=None, to_date=None, freq=None, time_travel=None)
-
Load a DataFrame of feature values from the feature store.
Args
features
:Union[str, list, pd.DataFrame]
- name of the feature to load, or a list/DataFrame of feature namespaces/names.
from_date
:datetime
, optional- start date to load timeseries from, defaults to everything.
to_date
:datetime
, optional- end date to load timeseries to, defaults to everything.
freq
:str
, optional- frequency interval at which feature values should be sampled.
time_travel
:str
, optional- timedelta string, indicating that time-travel should be applied to the returned timeseries values, useful in forecasting applications.
Returns
Union[pd.DataFrame, dask.DataFrame]
- depending on which backend was specified in the feature store.
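The date arguments make it straightforward to load a fixed window of history. The sketch below assumes `fs` is a connected store; `load_window` and its parameters are hypothetical, and the accepted values for freq depend on the store.

```python
from datetime import datetime, timedelta


def load_window(fs, features, to_date, days=7, freq=None):
    """Load the `days` days of feature values that end at `to_date`."""
    from_date = to_date - timedelta(days=days)
    return fs.load_dataframe(
        features,
        from_date=from_date,
        to_date=to_date,
        freq=freq,  # None is the documented default
    )


# Usage (placeholder feature name):
# df = load_window(fs, ["tutorial/numbers"], datetime(2021, 1, 8), days=7)
```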
def save_dataframe(self, df, name=None, namespace=None)
-
Save a DataFrame of feature values to the feature store.
Args
df
:pd.DataFrame
- DataFrame of feature values.
Must have a time column or DateTimeIndex of time values. Optionally include a created_time column (defaults to utcnow() if omitted). For a single feature: a value column, or a column header of feature namespace/name. For multiple features, name the columns using namespace/name.
name
:str
, optional- name of feature, if not included in DataFrame column name.
namespace
:str
, optional- namespace, if not included in DataFrame column name.
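The column rules above can be checked before calling save_dataframe. This is a hypothetical pre-flight check written against the description on this page, not code from bytehub; it works on a plain list of column names so it runs without pandas.

```python
RESERVED = {"time", "created_time"}


def feature_columns(columns, has_datetime_index=False):
    """Return the feature columns of a frame laid out as described above."""
    columns = list(columns)
    if "time" not in columns and not has_datetime_index:
        raise ValueError("need a 'time' column or a DateTimeIndex")
    data = [c for c in columns if c not in RESERVED]
    if not data:
        raise ValueError("no feature columns found")
    if len(data) > 1:
        # Multiple features: every column must be named namespace/name.
        bad = [c for c in data if "/" not in c]
        if bad:
            raise ValueError(f"columns need namespace/name headers: {bad}")
    return data


# Single feature stored in a "value" column.
print(feature_columns(["time", "value"]))
# Several features, each column named namespace/name.
print(feature_columns(["time", "created_time", "tutorial/a", "tutorial/b"]))
```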
def transform(self, name, namespace=None, from_features=[])
-
Decorator for creating/updating virtual (transformed) features. Use this on a function that accepts a dataframe input and returns an output dataframe of transformed values.
Args
name
:str
- feature to update.
namespace
:str
, optional- namespace, if not included in feature name.
from_features
:list
- list of existing features that serve as inputs to this transform.
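A sketch of how the decorator might be applied, assuming `fs` is a connected store with transforms enabled. The feature names and the `register_squared` wrapper are placeholders for illustration.

```python
def register_squared(fs, source="tutorial/numbers", target="tutorial/squared"):
    """Register `target` as a transformed version of `source`."""

    @fs.transform(target, from_features=[source])
    def squared(df):
        # Receives a dataframe of the source feature values and returns
        # a dataframe of transformed values.
        return df ** 2
```

Because transforms are saved to the feature store as pickled functions, loading them back requires enable_transforms=True on the store.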
def update_feature(self, name, namespace=None, **kwargs)
-
Update a feature in the feature store.
Args
name
:str
- feature to update.
namespace
:str
, optional- namespace, if not included in feature name.
description
:str
, optional- updated description.
transform
:str
, optional- pickled function code for feature transforms.
meta
:dict
, optional- updated key/value pairs of metadata.
To remove metadata, update using {"key_to_remove": None}.
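The effect of a meta update can be modelled in plain Python. The sketch assumes that updates are merged into the existing metadata, which this page does not state explicitly; only the removal of keys set to None is documented. `apply_meta_update` is a hypothetical helper, not part of bytehub.

```python
def apply_meta_update(current, update):
    """Return the metadata that results from applying `update` to `current`.

    Assumed semantics: keys are merged, and a value of None removes the key.
    """
    merged = dict(current)
    for key, value in update.items():
        if value is None:
            merged.pop(key, None)  # {"key_to_remove": None} deletes the key
        else:
            merged[key] = value
    return merged


before = {"unit": "kW", "source": "meter-1"}
after = apply_meta_update(before, {"source": None, "owner": "ops"})
print(after)
```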
def update_namespace(self, name, **kwargs)
-
Update a namespace in the feature store.
Args
name
:str
- namespace to update.
description
:str
, optional- updated description.
storage_options
:dict
, optional- updated storage options.
meta
:dict
, optional- updated key/value pairs of metadata.
To remove metadata, update using {"key_to_remove": None}.