Research in machine learning keeps producing new algorithms and techniques. Development teams must continually learn and sharpen their skills, and in turn develop new approaches of their own. Even a method like feature engineering, which has been around for decades, is constantly being updated.
Feature engineering is as old as data science itself, yet it is increasingly neglected. It involves transforming domain-specific data into vectors that a model can work with.
To solve a problem effectively with feature engineering, one must be an expert in the particular domain and understand what affects the target variable. That is why many developers call feature engineering an art, one that requires years of experience in ML development.
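As a rough illustration of what "transforming domain-specific data into vectors" can look like, here is a minimal Python sketch. The record schema (`timestamp`, `genre`, `seconds_played`, `track_length`), the genre list, and the function name `engineer_features` are all assumptions made up for this example. The point is only that each feature encodes a piece of domain knowledge, such as "time of day is cyclical" or "finishing a track matters more than raw seconds".

```python
import math
from datetime import datetime

# Hypothetical genre vocabulary, chosen only for this illustration.
GENRES = ["rock", "jazz", "pop"]


def engineer_features(record):
    """Turn one raw listening event (assumed schema) into a numeric vector."""
    ts = datetime.fromisoformat(record["timestamp"])

    # Hour of day is cyclical (23:00 is close to 00:00), so encode it
    # as a point on a circle instead of a raw integer.
    hour_angle = 2 * math.pi * ts.hour / 24

    # One-hot encode the categorical genre field.
    genre_one_hot = [1.0 if record["genre"] == g else 0.0 for g in GENRES]

    return [
        math.sin(hour_angle),
        math.cos(hour_angle),
        1.0 if ts.weekday() >= 5 else 0.0,  # weekend flag (Sat=5, Sun=6)
        record["seconds_played"] / record["track_length"],  # completion ratio
        *genre_one_hot,
    ]


if __name__ == "__main__":
    event = {
        "timestamp": "2024-03-09T18:00:00",
        "genre": "jazz",
        "seconds_played": 90,
        "track_length": 180,
    }
    print(engineer_features(event))
```

Which of these features actually helps depends on the target variable, and that is exactly where domain expertise comes in.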
Today, many IT companies are developing new tools that make it easier for users to create and select features.
Generalized data is also becoming a commodity, and cloud-based machine learning services (MLaaS) such as Amazon ML and Google AutoML now allow even less experienced team members to run models and get predictions within minutes. As a result, companies that build organizational competence in collecting their own data, or in constructing it through feature engineering, are gaining momentum. Simply collecting data and building models is no longer enough.
Use cases of feature engineering fall into two categories:
- Modeling for prediction accuracy, which is the default when the goal is a production system;
- Modeling for interpretability, when the model should be easy to interpret so that you can gain a better understanding of the problem.
Now, with all this in mind, let’s see how some major companies use feature engineering.
Amazon SageMaker Feature Store is a fully managed, purpose-built repository for storing, sharing, and managing machine learning (ML) model features. Features are the inputs to ML models, used during both training and inference. In an application that recommends music, for example, features might include song ratings, listening time, and listener demographics.
For developers, this makes it possible to store features that are used repeatedly by multiple teams, and the quality of those features is critical to high model accuracy. SageMaker Feature Store provides a secure, unified repository for feature use throughout the ML lifecycle.
Robusta, a feature automation platform, focuses on associative and commutative aggregation functions. These functions are widely used in ML applications, sometimes accounting for more than 80% of the signals in a model. The associative and commutative properties are very convenient for distributed systems, because partial results computed on different machines can be combined in any order or grouping. Many companies in the industry have similar systems, such as Airbnb's Zipline and Uber's Palette.
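To make the associative and commutative idea concrete, here is a small Python sketch of mergeable partial aggregates. It is not taken from Robusta, Zipline, or Palette. The `Agg` class, its fields, and the `aggregate` helper are hypothetical names used only to show why such functions suit distributed computation: each shard computes its own partial result, and the results can then be merged in any order.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Agg:
    """Partial aggregate over some values (hypothetical illustration)."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    @staticmethod
    def of(value):
        # Aggregate of a single observation.
        return Agg(1, value, value)

    def merge(self, other):
        # count/sum/max are associative and commutative, so merge order
        # and grouping do not matter. (Floating-point sums are only
        # approximately associative; the demo values below are exact.)
        return Agg(
            self.count + other.count,
            self.total + other.total,
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        # The mean is derived from mergeable parts (sum and count).
        return self.total / self.count if self.count else 0.0


def aggregate(values):
    """Fold a shard's values into one partial aggregate."""
    return reduce(Agg.merge, (Agg.of(v) for v in values), Agg())


if __name__ == "__main__":
    # Listening times in seconds, split across three hypothetical shards.
    a = aggregate([30.0, 120.0])
    b = aggregate([45.0])
    c = aggregate([200.0, 5.0])

    left = a.merge(b).merge(c)
    right = c.merge(a.merge(b))
    print(left == right, left.count, left.mean, left.maximum)
```

Note that the mean itself is not merged directly. Keeping the sum and count and deriving the mean at read time is the usual way to make a non-mergeable statistic fit this pattern.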
Google is probably the most prominent IT company in the world, so it is no surprise that it uses feature engineering. The Google Translate team, for example, has more training data than it can use. Instead of tweaking the model, the team achieved big wins by using the best features in its data.
The Google Brain project on diabetic retinopathy also used a neural network architecture known as Inception to detect disease by classifying images. The team did not tweak the models. Instead, they created a dataset of 120,000 examples labeled by ophthalmologists.
Using feature engineering to turn data into proprietary features is a way to create meaningful models faster and at lower cost. This can be valuable support and give an organization a competitive advantage. For other current AI methods, see this review of top ML trends for 2024.