Symbolic Regression (evopt.derive)

The derive module provides symbolic regression capabilities for discovering equations from results data. It is built on top of Miles Cranmer’s PySR symbolic regression engine.

Classes

class evopt.derive.Derive(evolve_dir_path: str, target_variable: str, parameters: list[str], save_dir: str | None = None, binary_operators: str | None = None, unary_operators: str | None = None, n_iterations: int = 100, population_size: int = 32, max_size: int = 20)

Bases: object

Symbolic regression model for equation discovery from data.

This class provides methods to discover mathematical equations that describe the relationship between input parameters and a target variable using symbolic regression with the PySR library.
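To make the idea concrete, the following is a toy sketch of symbolic regression. It exhaustively enumerates small expressions over the input parameters and keeps the one with the lowest mean squared error. This is not how PySR works: PySR evolves expression trees with a genetic algorithm and trades accuracy off against complexity. The names enumerate_expressions and fit_toy are hypothetical and are not part of evopt.

```python
import itertools


def enumerate_expressions(variables, ops=("+", "-", "*"), depth=2):
    """Return every expression string over `variables` up to `depth` levels of nesting."""
    exprs = list(variables)
    for _ in range(depth):
        # Combine every pair of existing expressions with every binary operator
        exprs += [f"({a} {op} {b})"
                  for a, b in itertools.product(exprs, repeat=2)
                  for op in ops]
    return exprs


def fit_toy(rows, target, variables, depth=2):
    """Return (mse, expression) for the best expression; shorter strings win ties."""
    best_key, best_expr = None, None
    for expr in enumerate_expressions(variables, depth=depth):
        code = compile(expr, "<expr>", "eval")
        # Evaluate with no builtins; only the row's columns are visible as names
        sq_errors = [(eval(code, {"__builtins__": {}}, row) - row[target]) ** 2
                     for row in rows]
        key = (sum(sq_errors) / len(rows), len(expr))
        if best_key is None or key < best_key:
            best_key, best_expr = key, expr
    return best_key[0], best_expr
```

On data generated from y = x1*x2 + x1, the search recovers an exact expression such as ((x1 * x2) + x1). The number of candidates grows explosively with depth. This is why practical tools such as PySR search stochastically rather than exhaustively.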

evolve_dir_path

Path to the directory containing results data.

Type:

str

target_variable

The variable to be predicted.

Type:

str

parameters

Input parameters to use for prediction.

Type:

list[str]

save_dir

Directory to save equations and output.

Type:

str

binary_operators

Binary operators for symbolic regression.

Type:

list

unary_operators

Unary operators for symbolic regression.

Type:

list

n_iterations

Number of iterations for regression.

Type:

int

population_size

Population size for genetic algorithm.

Type:

int

max_size

Maximum size of generated equations.

Type:

int

results_csv_path

Path to the results CSV file.

Type:

str

y_pred

Predicted values from the model.

Type:

DataFrame

best_equation

Best equation found by symbolic regression.

Type:

sympy.Expr

sympymappings

Custom operator mappings for SymPy.

Type:

dict

Model Configuration

__init__(evolve_dir_path: str, target_variable: str, parameters: list[str], save_dir: str | None = None, binary_operators: str | None = None, unary_operators: str | None = None, n_iterations: int = 100, population_size: int = 32, max_size: int = 20)

Initialize the Derive class for symbolic regression.

Parameters:
  • evolve_dir_path (str) – Path to directory containing the results.csv file.

  • target_variable (str) – Target variable to predict.

  • parameters (list[str]) – List of parameter names to use as predictors.

  • save_dir (str, optional) – Directory to save equations. If None, an ‘equations’ subdirectory of evolve_dir_path is used. Defaults to None.

  • binary_operators (list, optional) – Binary operators for symbolic regression. Defaults to [“+”, “-”, “*”, “/”, “^”].

  • unary_operators (list, optional) – Unary operators for symbolic regression. Can include custom operators in the format “inv(x)=1/x”. Defaults to [“sin”, “exp”, “log”].

  • n_iterations (int, optional) – Number of iterations of the symbolic regression search. Defaults to 100.

  • population_size (int, optional) – Population size for the genetic algorithm. Defaults to 32.

  • max_size (int, optional) – Maximum size of equations. Defaults to 20.
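Custom unary operators follow PySR’s “name(x)=expression” convention. For SymPy export, PySR also needs a matching Python callable for each custom name (its extra_sympy_mappings argument), which is presumably what the sympymappings attribute holds. The sketch below shows one way to turn such a string into a name and a callable. parse_custom_unary is a hypothetical helper, not part of evopt. The rewrite of “^” to “**” assumes the body uses Julia-style power notation, as PySR operator definitions do.

```python
import re

# Matches custom operator specs such as "inv(x)=1/x" or "inv(x) = 1/x"
_CUSTOM_OP = re.compile(r"^\s*([A-Za-z_]\w*)\s*\(\s*([A-Za-z_]\w*)\s*\)\s*=\s*(.+?)\s*$")


def parse_custom_unary(spec):
    """Split a custom unary operator spec into (name, python_callable).

    Returns None for plain built-in operator names such as "sin".
    """
    match = _CUSTOM_OP.match(spec)
    if match is None:
        return None
    name, arg, body = match.groups()
    # "^" is exponentiation in Julia but bitwise XOR in Python
    body = body.replace("^", "**")
    code = compile(f"lambda {arg}: {body}", f"<{name}>", "eval")
    # Build the lambda with no builtins available to the operator body
    func = eval(code, {"__builtins__": {}})
    return name, func
```

Collecting the results into a dict of {name: callable} gives the shape of mapping PySR expects for SymPy conversion.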

Raises:

FileNotFoundError – If no results.csv file exists in evolve_dir_path.

Example

>>> derive_model = Derive(
...     evolve_dir_path="path/to/data",
...     target_variable="density",
...     parameters=["temperature", "pressure"]
... )
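The path handling described above can be sketched as follows. The sketch assumes that results.csv sits directly inside evolve_dir_path and that save_dir is only resolved, not created. resolve_derive_paths is a hypothetical helper for illustration, and the real constructor may differ.

```python
import os


def resolve_derive_paths(evolve_dir_path, save_dir=None):
    """Return (results_csv_path, save_dir) following the documented defaults."""
    results_csv_path = os.path.join(evolve_dir_path, "results.csv")
    # The constructor is documented to raise if results.csv is missing
    if not os.path.isfile(results_csv_path):
        raise FileNotFoundError(f"No results.csv found in {evolve_dir_path!r}")
    # Default save location: an 'equations' subdirectory of the run directory
    if save_dir is None:
        save_dir = os.path.join(evolve_dir_path, "equations")
    return results_csv_path, save_dir
```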

Regression & Prediction

fit()

Fit symbolic regression model to discover equations.

This method configures and runs the PySR symbolic regression algorithm to discover mathematical relationships between the parameters and the target variable. It applies constraints to the operators and stores the best equation found.

Returns:

None. The fitted model and best equation are stored in the self.model and self.best_equation attributes.

Return type:

None
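fit() stores a single best equation, but PySR actually produces a front of equations of increasing complexity. Under PySR’s default model_selection="best" criterion, each equation is scored by how much its log-loss drops per unit of added complexity. The highest-scoring equation among those whose loss is within 1.5 times the lowest loss is then chosen. Whether Derive relies on that default is an assumption. pick_best is a hypothetical, standard-library-only sketch of the criterion.

```python
import math


def pick_best(front):
    """Return the index of the 'best' equation on a complexity/loss front.

    front: list of (complexity, loss) pairs sorted by increasing complexity,
    with strictly positive losses (required by the logarithm).
    """
    # Score: drop in log-loss per unit of added complexity vs. the previous entry
    scores = [0.0]
    for (c0, l0), (c1, l1) in zip(front, front[1:]):
        scores.append(-(math.log(l1) - math.log(l0)) / (c1 - c0))
    # Only consider equations whose loss is within 1.5x of the most accurate one
    threshold = 1.5 * min(loss for _, loss in front)
    candidates = [i for i, (_, loss) in enumerate(front) if loss <= threshold]
    return max(candidates, key=lambda i: scores[i])
```

For the front [(1, 10.0), (3, 1.0), (5, 0.9), (7, 0.89)], the complexity-3 equation (index 1) wins. It cuts the loss tenfold, while the larger equations barely improve on it.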

Example

>>> model = Derive(evolve_dir_path="data", target_variable="y", parameters=["x1", "x2"])
>>> model.fit()
>>> print(model.best_equation)
x1 + 2.5*x2

predict(x=None, index: int | None = None)

Generate predictions using the discovered equation.

Parameters:
  • x (DataFrame, optional) – Input data to use for prediction. If None, uses the training data. Defaults to None.

  • index (int, optional) – Index of the equation to use for prediction. If None, uses the best equation. Defaults to None.

Returns:

Predicted values from the model.

Return type:

DataFrame

Example

>>> model.fit()
>>> model.predict()
>>> print(model.y_pred.head())
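The index argument selects an equation from the discovered set instead of the best one. It appears to mirror the index argument of PySR’s PySRRegressor.predict. The standard-library-only sketch below shows the idea on plain expression strings and row dictionaries, standing in for the fitted model and the DataFrame input. predict_rows and its best_index argument are hypothetical.

```python
def predict_rows(equations, rows, index=None, best_index=0):
    """Evaluate one equation (a Python expression string) on each row.

    equations: list of expression strings, e.g. taken from a hall of fame.
    rows: list of dicts mapping parameter names to values.
    index: which equation to use; None falls back to best_index.
    """
    expr = equations[best_index if index is None else index]
    code = compile(expr, "<equation>", "eval")
    # Evaluate with no builtins; only the row's parameters are visible as names
    return [eval(code, {"__builtins__": {}}, dict(row)) for row in rows]
```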

Visualization

parity_plot(index: int | None = None, point_colour: str = 'black', alpha: float = 0.5, title: str | None = None, save_figures: bool = True, show: bool = True, save_ext: str = '.png', save_dir: str | None = None)

Create a parity plot comparing the target variable with the predicted values.

This method plots actual values against predictions from the symbolic regression model, showing how well the discovered equation fits the data.

Parameters:
  • index (int, optional) – Index of the equation to use for prediction. If None, uses the best equation. Defaults to None.

  • point_colour (str, optional) – Color of scatter points. Defaults to “black”.

  • alpha (float, optional) – Transparency of points. Defaults to 0.5.

  • title (str, optional) – Plot title. If None, uses default title. Defaults to None.

  • save_figures (bool, optional) – Whether to save figure to disk. Defaults to True.

  • show (bool, optional) – Whether to display the figure. Defaults to True.

  • save_ext (str, optional) – File extension for saved figure. Defaults to “.png”.

  • save_dir (str, optional) – Directory to save figures. If None, a ‘figures’ subdirectory of evolve_dir_path is used. Defaults to None.

Returns:

The axis object containing the plot.

Return type:

matplotlib.axes.Axes

Raises:

ValueError – If save_ext is not one of ‘png’, ‘jpg’, ‘jpeg’, ‘pdf’, or ‘svg’.
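The extension check and default directory could look like the sketch below. The documented default is ".png", while the allowed values are listed without a dot. The sketch assumes a leading dot is accepted and stripped, which the source does not confirm. figure_path is a hypothetical helper.

```python
import os

ALLOWED_EXTS = {"png", "jpg", "jpeg", "pdf", "svg"}


def figure_path(evolve_dir_path, name, save_ext=".png", save_dir=None):
    """Build the output path for a figure, validating the extension."""
    # Assumption: accept the extension with or without a leading dot
    ext = save_ext.lower().lstrip(".")
    if ext not in ALLOWED_EXTS:
        raise ValueError(f"save_ext must be one of {sorted(ALLOWED_EXTS)}, got {save_ext!r}")
    # Default location: a 'figures' subdirectory of the run directory
    if save_dir is None:
        save_dir = os.path.join(evolve_dir_path, "figures")
    return os.path.join(save_dir, f"{name}.{ext}")
```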

Example

>>> model.fit()
>>> model.predict()
>>> model.parity_plot(
...     point_colour="blue",
...     alpha=0.7,
...     title="Model Performance",
...     save_figures=True
... )
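A parity plot places actual values on one axis and predictions on the other, so a perfect fit lies on the line y = x. The sketch below computes the shared axis limits for that reference line, together with R² and RMSE, two numbers commonly reported next to such a plot. parity_stats is hypothetical, and the source does not say that parity_plot computes these metrics.

```python
import math


def parity_stats(y_true, y_pred):
    """Return R², RMSE and shared axis limits for a parity plot."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined when all actual values are identical
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    rmse = math.sqrt(ss_res / n)
    # Shared limits so the y = x reference line spans both axes
    lo = min(min(y_true), min(y_pred))
    hi = max(max(y_true), max(y_pred))
    return {"r2": r2, "rmse": rmse, "limits": (lo, hi)}
```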