Symbolic Regression (evopt.derive)
The derive module provides symbolic regression for discovering equations from results data. It is built on Miles Cranmer’s PySR symbolic regression engine.
Classes
- class evopt.derive.Derive(evolve_dir_path: str, target_variable: str, parameters: list[str], save_dir: str | None = None, binary_operators: str | None = None, unary_operators: str | None = None, n_iterations: int = 100, population_size: int = 32, max_size: int = 20)
Bases: object
Symbolic regression model for equation discovery from data.
This class provides methods to discover mathematical equations that describe the relationship between input parameters and a target variable using symbolic regression with the PySR library.
- evolve_dir_path
Path to the directory containing results data.
- Type:
str
- target_variable
The variable to be predicted.
- Type:
str
- parameters
Input parameters to use for prediction.
- Type:
list[str]
- save_dir
Directory to save equations and output.
- Type:
str
- binary_operators
Binary operators for symbolic regression.
- Type:
list
- unary_operators
Unary operators for symbolic regression.
- Type:
list
- n_iterations
Number of iterations for regression.
- Type:
int
- population_size
Population size for genetic algorithm.
- Type:
int
- max_size
Maximum size of generated equations.
- Type:
int
- results_csv_path
Path to the results CSV file.
- Type:
str
- y_pred
Predicted values from the model.
- Type:
DataFrame
- best_equation
Best equation found by symbolic regression.
- Type:
sympy.Expr
- sympymappings
Custom operator mappings for SymPy.
- Type:
dict
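Custom operators are written as strings such as "inv(x)=1/x", and PySR needs a SymPy equivalent for each one (its `extra_sympy_mappings` argument), which is presumably what sympymappings holds. The following is a minimal sketch of how such a string could be split into a name and a definition. The helper `split_custom_operator` is hypothetical and is not part of evopt.

```python
from __future__ import annotations


def split_custom_operator(spec: str) -> tuple[str, str | None]:
    """Split an operator spec into (name, definition).

    Built-in operators such as "sin" have no definition, so None is returned.
    Custom operators such as "inv(x)=1/x" return the name and the right-hand side.
    """
    if "=" not in spec:
        return spec.strip(), None
    head, body = spec.split("=", 1)
    # Drop the argument list "(x)" to keep only the operator name.
    name = head.split("(", 1)[0].strip()
    return name, body.strip()


unary_operators = ["sin", "exp", "inv(x)=1/x"]

# Keep only the operators that carry their own definition.
custom = {
    name: body
    for name, body in map(split_custom_operator, unary_operators)
    if body is not None
}
print(custom)  # {'inv': '1/x'}
```

Each name collected this way would still need a callable SymPy counterpart (for example `lambda x: 1 / x`) before it can be used to convert a discovered equation into a sympy.Expr.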
Model Configuration
- __init__(evolve_dir_path: str, target_variable: str, parameters: list[str], save_dir: str | None = None, binary_operators: str | None = None, unary_operators: str | None = None, n_iterations: int = 100, population_size: int = 32, max_size: int = 20)
Initialize the Derive class for symbolic regression.
- Parameters:
evolve_dir_path (str) – Path to directory containing the results.csv file.
target_variable (str) – Target variable to predict.
parameters (list[str]) – List of parameter names to use as predictors.
save_dir (str, optional) – Directory to save equations. If None, uses ‘equations’ subdirectory in evolve_dir_path. Defaults to None.
binary_operators (list, optional) – Binary operators for symbolic regression. Defaults to [“+”, “-”, “*”, “/”, “^”].
unary_operators (list, optional) – Unary operators for symbolic regression. Can include custom operators in format “inv(x)=1/x”. Defaults to [“sin”, “exp”, “log”].
n_iterations (int, optional) – Number of iterations. Defaults to 100.
population_size (int, optional) – Population size. Defaults to 32.
max_size (int, optional) – Maximum size of equations. Defaults to 20.
- Raises:
FileNotFoundError – If results.csv file doesn’t exist in evolve_dir_path.
Example
>>> derive_model = Derive(
...     evolve_dir_path="path/to/data",
...     target_variable="density",
...     parameters=["temperature", "pressure"]
... )
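Because the constructor raises FileNotFoundError when results.csv is missing, it can help to check the directory before building the model. The sketch below reproduces that check with the standard library only. The function `find_results_csv` and the sample CSV contents are illustrative assumptions, not evopt code.

```python
import tempfile
from pathlib import Path


def find_results_csv(evolve_dir_path: str) -> Path:
    """Return the path to results.csv, or raise if it is absent."""
    csv_path = Path(evolve_dir_path) / "results.csv"
    if not csv_path.is_file():
        raise FileNotFoundError(f"No results.csv found in {evolve_dir_path}")
    return csv_path


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory triggers the same kind of error the constructor documents.
    try:
        find_results_csv(tmp)
    except FileNotFoundError as err:
        print("missing:", err)

    # After writing a (made-up) results file, the lookup succeeds.
    (Path(tmp) / "results.csv").write_text(
        "temperature,pressure,density\n300,1.0,1.2\n"
    )
    print(find_results_csv(tmp).name)
```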
Regression & Prediction
- fit()
Fit symbolic regression model to discover equations.
This method configures and runs the PySR symbolic regression algorithm to discover mathematical relationships between parameters and target variable. It sets constraints on operations and stores the best equation found.
- Returns:
None. The fitted model is stored in self.model and the best equation found in self.best_equation.
- Return type:
None
Example
>>> model = Derive(evolve_dir_path="data", target_variable="y", parameters=["x1", "x2"])
>>> model.fit()
>>> print(model.best_equation)
x1 + 2.5*x2
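To make the configuration step more concrete, the sketch below shows one plausible way the constructor arguments could be translated into keyword arguments for PySR's `PySRRegressor`. The keyword names `niterations`, `population_size`, `maxsize`, `binary_operators` and `unary_operators` are real PySR parameters, but the mapping itself and the helper `pysr_kwargs` are assumptions about what fit() does internally. The operator constraints mentioned above are not shown.

```python
def pysr_kwargs(
    n_iterations=100,
    population_size=32,
    max_size=20,
    binary_operators=None,
    unary_operators=None,
):
    """Build a keyword dictionary for PySRRegressor (hypothetical helper)."""
    return {
        "niterations": n_iterations,
        "population_size": population_size,
        "maxsize": max_size,
        # Fall back to the documented defaults when no operators are given.
        "binary_operators": binary_operators or ["+", "-", "*", "/", "^"],
        "unary_operators": unary_operators or ["sin", "exp", "log"],
    }


kwargs = pysr_kwargs(n_iterations=50, unary_operators=["exp"])
print(kwargs["niterations"], kwargs["unary_operators"])

# With PySR installed, the dictionary could then be used as:
#     from pysr import PySRRegressor
#     regressor = PySRRegressor(**kwargs)
#     regressor.fit(X, y)
```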
- predict(x=None, index: int | None = None)
Generate predictions using the discovered equation.
- Parameters:
x (DataFrame, optional) – Input data to use for prediction. If None, uses the training data. Defaults to None.
index (int, optional) – Index of the equation to use for prediction. If None, uses the best equation. Defaults to None.
- Returns:
Predicted values from the model.
- Return type:
DataFrame
Example
>>> model.fit()
>>> model.predict()
>>> print(model.y_pred.head())
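The role of the index argument can be illustrated with a small stand-in for the list of candidate equations. This is only a sketch: the candidate table, the losses, and the rule that "best" means lowest loss are all assumptions made for illustration, and the real method returns a DataFrame rather than a list.

```python
# Made-up candidate equations, ordered from simplest to most complex.
candidates = [
    {"equation": "x1", "loss": 4.0, "fn": lambda x1, x2: x1},
    {"equation": "x1 + x2", "loss": 1.5, "fn": lambda x1, x2: x1 + x2},
    {"equation": "x1 + 2.5*x2", "loss": 0.1, "fn": lambda x1, x2: x1 + 2.5 * x2},
]


def predict(rows, index=None):
    """Evaluate one candidate on each (x1, x2) row."""
    if index is None:
        # Assumption: the best equation is the one with the lowest loss.
        chosen = min(candidates, key=lambda c: c["loss"])
    else:
        chosen = candidates[index]
    return [chosen["fn"](*row) for row in rows]


rows = [(1.0, 2.0), (0.0, 4.0)]
print(predict(rows))           # [6.0, 10.0]
print(predict(rows, index=0))  # [1.0, 0.0]
```

Passing an explicit index is useful when a simpler equation is preferred over the best-scoring one.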
Visualization
- parity_plot(index: int | None = None, point_colour: str = 'black', alpha: float = 0.5, title: str | None = None, save_figures: bool = True, show: bool = True, save_ext: str = '.png', save_dir: str | None = None)
Create a parity plot of actual versus predicted values of the target variable.
This function creates a parity plot comparing actual values with predictions from the symbolic regression model, showing how well the discovered equation fits the data.
- Parameters:
index (int, optional) – Index of the equation to use for prediction. If None, uses the best equation. Defaults to None.
point_colour (str, optional) – Color of scatter points. Defaults to “black”.
alpha (float, optional) – Transparency of points. Defaults to 0.5.
title (str, optional) – Plot title. If None, uses default title. Defaults to None.
save_figures (bool, optional) – Whether to save figure to disk. Defaults to True.
show (bool, optional) – Whether to display the figure. Defaults to True.
save_ext (str, optional) – File extension for saved figure. Defaults to “.png”.
save_dir (str, optional) – Directory to save figures. If None, uses ‘figures’ subdirectory in evolve_dir_path. Defaults to None.
- Returns:
The axis object containing the plot.
- Return type:
matplotlib.axes.Axes
- Raises:
ValueError – If save_ext is not one of ‘png’, ‘jpg’, ‘jpeg’, ‘pdf’, or ‘svg’.
Example
>>> model.fit()
>>> model.predict()
>>> model.parity_plot(
...     point_colour="blue",
...     alpha=0.7,
...     title="Model Performance",
...     save_figures=True
... )
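A parity plot shows fit quality visually: points on the diagonal are perfect predictions. The same information can be summarized numerically. The sketch below computes the coefficient of determination (R²) and the root-mean-square error from actual and predicted values. The function `parity_metrics` and the sample numbers are illustrative, and nothing here implies that parity_plot reports these metrics itself.

```python
import math


def parity_metrics(actual, predicted):
    """Return R^2 and RMSE for paired actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual sum of squares: squared distance from the parity line.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {"r2": 1.0 - ss_res / ss_tot, "rmse": math.sqrt(ss_res / n)}


actual = [1.0, 2.0, 3.0, 4.0]

# Predictions that lie exactly on the diagonal.
print(parity_metrics(actual, actual))  # {'r2': 1.0, 'rmse': 0.0}

# Predictions shifted by a constant offset of 0.5.
shifted = [a + 0.5 for a in actual]
print(parity_metrics(actual, shifted))
```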