Is Drupal State Really Environment Specific?

December 6, 2023

By: Kyle Einecker

I deployed to Stage and Prod broke

Picture this: you just wrapped up a sprint for your website. All the testing went smoothly, your configuration is in order, the devs are happy, the project manager is happy, and all lights are green to push your changes up to stage on time and start UAT. And yet you are nervous. You are nervous because every time you've pushed to stage your prod site has broken, and you don't know why. But work has to go out, so a stage deployment has to be done. On the scheduled Wednesday afternoon you push to stage and, sure enough, several hours later you get reports that the DAM integration on prod is no longer working.

This is a situation I found myself in recently: deployments to stage broke prod. It's confusing, right? Surely this can't be the case and it's just a coincidence. I was skeptical as well, but after ruling out the other likely suspects (admins making changes on prod, a misbehaving cron) and scouring the error logs without finding anything, I had to accept that my stage deployments were indeed causing the DAM integration on prod to break. The kicker? The integration continued working fine on stage after the deployment.

Digging deeper

My deployment process is the bog-standard deployment process for modern Drupal sites. I start by building a code artifact in my GitHub action, sync the prod database and files down to stage, check out the artifact on the stage environment, and run drush updb, drush cr, and drush cim. And there we go: a fully functioning stage environment ready for testing the latest and greatest. I've been using this general approach for years and it's always worked great, so why is it failing me now?

My first hint that something in the deployment process was causing the prod stability issues came when I did what I call a "code only" stage deployment. Knowing that stage deployments broke the prod DAM integration, I expected to hear shortly from someone that the DAM needed to be reauthenticated. But the email never came. Curious. Perhaps just a fluke? I tried another code only deploy and again nothing broke. Then I did a full stage deployment and, sure enough, prod broke. The difference between the two is that a code only deploy updates the code while leaving the current stage database and files intact, whereas a full deploy first refreshes the stage environment with the latest database and files from prod and then updates the code.
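To make the difference between the two deployment types concrete, here is a minimal sketch of both as one shell function. The names run, deploy_stage, sync_prod_to_stage, and checkout_artifact are hypothetical placeholders for whatever your host provides; only the drush steps come from the process described above. It defaults to a dry run that prints the steps instead of executing them.

```shell
# Print each step; only execute it when DRY_RUN=0 (defaults to a dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# deploy_stage <code-only|full>
# sync_prod_to_stage and checkout_artifact are hypothetical placeholders
# for host-specific commands.
deploy_stage() {
  mode="$1"
  if [ "$mode" = "full" ]; then
    # Only a full deploy replaces the stage database and files with prod's.
    run sync_prod_to_stage
  fi
  run checkout_artifact
  run drush updb -y
  run drush cr
  run drush cim -y
}
```

Running deploy_stage code-only prints four steps, while deploy_stage full prints the same four with the sync step in front. That single extra step is the only difference between the deploy that broke prod and the one that did not.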

With that bit of troubleshooting insight, it became clear the root of the issue was something with the database sync and the DAM module so I started diving deeper into the DAM module code.

What is State

Before we get to the root cause and ultimate solution, let's talk about Drupal's State API. If you look it up on Drupal.org, you will find this description along with some helpful tips on how to use and interact with state.

State information is stored in the database and has the following characteristics:

  • It is not meant to be exported.
  • It is specific to an individual environment.
  • It is not to be deployed to another environment.
  • All state information is lost when you reset the database.

Examples of state information are CSRF tokens and tracking the time something occurred, such as the last time cron was run ('system.cron_last'). All these are information that is specific to an environment and has no use in deployment.

In short, state is a database key-value store where you can put things that shouldn't be tracked in Drupal's configuration. Importantly, items in state should not be assumed to exist. State values are meant to be temporary: if a value gets reset, the code that uses it should be able to recreate it. The description on Drupal.org is a bit confusing in my opinion, specifically the following three lines:

  • It is specific to an individual environment.
  • It is not to be deployed to another environment.
  • All state information is lost when you reset the database.

Reading these three lines, you may think they are warnings about how state behaves or how Drupal manages state. They aren't. They are guidance on how you, as a module developer or site maintainer, should manage state. When you dive into Drupal state, you'll find that there are no controls or systems that enforce state being specific to an environment. State lives in the database, so anything that copies the database copies state along with it.
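One way to see this for yourself is to read the same state key from two environments. The sketch below assumes two Drush site aliases, @prod and @stage, which are hypothetical names for illustration; compare_state is likewise a made-up helper. It deliberately prints only whether the values match, never the values themselves.

```shell
# compare_state <state-key>
# @prod and @stage are hypothetical Drush site aliases.
compare_state() {
  key="$1"
  prod_value="$(drush @prod state:get "$key")"
  stage_value="$(drush @stage state:get "$key")"
  if [ "$prod_value" = "$stage_value" ]; then
    echo "shared: $key has the same value on prod and stage"
  else
    echo "distinct: $key differs between prod and stage"
  fi
}
```

For example, running compare_state system.cron_last right after a database sync should report the value as shared, and it should only become distinct once cron has run again on one of the environments.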

Off to the races

Back to my broken deployments. At this point, I knew deployments were breaking due to a database sync and that the problem was limited to the DAM module on the site. So I went and looked at the code in the DAM module, starting with event subscribers and the .module file, as these areas are the most likely to contain integration-type code. Sure enough, right there in the .module file, dam_module_cron() was calling a service to manage the DAM API tokens, and in that service I found code like this:

    $access_token = $this->state->get('custom_access_token');
    $refresh_token = $this->state->get('custom_refresh_token');

I knew that state is stored in the database, I knew that prod only broke after a deployment that synced the database, and now I had code showing me that auth tokens are stored in state. I had found the root cause of the instability. Every full deployment copied prod's access and refresh tokens into stage's database, which created a race condition because both stage and prod were connecting to the same DAM environment with the same tokens. Whichever environment used the refresh token first stayed connected to the DAM and the other one broke, presumably because the DAM stops accepting a refresh token once it has been used.
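The race can be reproduced with a toy model. This sketch contacts no real DAM. It assumes the service accepts each refresh token exactly once and then rotates it, which is my reading of the behavior rather than something confirmed from the DAM's documentation. The names refresh and simulate_race are invented for the illustration.

```shell
# Toy model only: no real DAM is contacted.
# Assumption: the service accepts each refresh token once, then rotates it.
server_token="token-1"
rotations=1

# refresh <token>: succeeds and rotates the server-side token if <token> is
# current; fails if another client already used it.
refresh() {
  if [ "$1" != "$server_token" ]; then
    return 1
  fi
  rotations=$((rotations + 1))
  server_token="token-$rotations"
}

# simulate_race <first-env> <second-env>
simulate_race() {
  server_token="token-1"
  rotations=1
  # The database sync left both environments holding this same value.
  synced_token="token-1"
  for env in "$1" "$2"; do
    if refresh "$synced_token"; then
      echo "$env: refreshed, still connected"
    else
      echo "$env: refresh rejected, integration broken"
    fi
  done
}
```

Calling simulate_race stage prod reports stage as still connected and prod as broken, which matches what I saw. Swapping the arguments flips the outcome, which is why the failure depends entirely on whose cron happens to run first.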

In systems where you can connect each web environment to a corresponding service environment, this wouldn't be an issue. Unfortunately, most modern Drupal development approaches call for at least three environments (dev, stage, and prod), and not all services give you a prod and two lower environments, or a way to create multiple credentials for the same environment. In situations where the number of web environments doesn't match the number of service environments, it's very important to understand how connecting the service to multiple environments could impact the integration.

An easy fix

With the issue identified, the fix was incredibly straightforward. All that needed to be done was to clear the state key that stores the refresh token on the lower environment during each deployment, right after the database sync copies prod's token down and before anything on stage has a chance to use it. This guarantees that prod continues to function, and if there are DAM issues they are limited to lower environments. And because this is state, the DAM module in the lower environments should be able to handle fetching a new refresh token.

    # stage-deploy.sh

    # host specific sync database command

    # Remove the refresh token that was just copied down from prod.
    # (drush sset with the value null would store the string "null" instead.)
    drush sdel custom_refresh_token

    drush updb
    drush cim

Taking the fix a step further

For services that can be authenticated automatically, the easy fix is the best fix. But I've worked with several services in the past where authenticating means logging into Drupal, manually clicking a button to be redirected to a login page, and entering the integration user's credentials. In that case the easy fix is going to be really annoying, because after every deployment someone has to log in and reauthenticate the integration. For these situations, I've developed an approach that uses a custom drush command to write the necessary state values to the file system before a database sync and then restore them afterward.

    /**
     * Preserve and restore state values.
     *
     * @param array $options
     *   An associative array of options whose values come from cli, aliases, config, etc.
     * @option restore
     *   If set then state values will be restored to the saved values.
     * @usage preserve_state
     *   Preserve state by saving it to a private file.
     * @usage preserve_state --restore
     *   Restore state by retrieving it from the private file.
     *
     * @command preserve_state
     */
    public function state($options = ['restore' => FALSE]) {
      // Needs "use Drupal\Core\File\FileSystemInterface;" at the top of the file.
      $fileSystem = \Drupal::service('file_system');
      // Requires the private file path to be configured for this site.
      $filePath = 'private://preserved-state.txt';
      $state = \Drupal::state();

      // State keys to carry across the database sync.
      $trackedState = [
        'custom_access_token',
        'custom_refresh_token',
      ];

      if (!$options['restore']) {
        $stateValues = [];
        foreach ($trackedState as $name) {
          $stateValues[$name] = $state->get($name);
        }

        // On Drupal 10.3+ this constant is deprecated in favor of FileExists::Replace.
        $fileSystem->saveData(serialize($stateValues), $filePath, FileSystemInterface::EXISTS_REPLACE);
        $this->logger()->success(dt('Successfully saved state to @path', ['@path' => $filePath]));
      }
      else {
        if (file_exists($filePath)) {
          $savedValues = file_get_contents($filePath);
          // The saved values are plain strings, so refuse to instantiate objects.
          $savedValues = unserialize($savedValues, ['allowed_classes' => FALSE]);

          foreach ($savedValues as $name => $value) {
            $state->set($name, $value);
          }
          $this->logger()->success(dt('Successfully restored state from @path', ['@path' => $filePath]));
        }
        else {
          $this->logger()->warning(dt('Tried to restore state from @path but the file does not exist.', ['@path' => $filePath]));
        }
      }
    }

This is a simple drush command that stores a set list of serialized state values in a preserved-state.txt file saved to the private files directory. It can also restore the values in preserved-state.txt to the environment's state. The preserve step needs to be the first thing run in a deployment because we need to capture the working credentials before refreshing the database. Then, after the database is refreshed, the restore step should be the first thing run so that update hooks that depend on the integration use the right credentials and environment.

    # stage-deploy.sh

    drush preserve_state

    # host specific sync database command

    drush preserve_state --restore

    drush updb
    drush cim
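Because the ordering matters so much, it can be worth making the script stop if the preserve step fails, so the database is never refreshed without a saved copy of the credentials. The following is a sketch under that assumption. deploy_preserving_state is a hypothetical wrapper name, and the host-specific sync command is passed in as arguments because it differs per host.

```shell
# deploy_preserving_state <sync command...>
# The sync command is host specific, so it is passed in as arguments.
deploy_preserving_state() {
  # Stop before touching the database if the credentials could not be saved.
  if ! drush preserve_state; then
    echo "preserve_state failed; skipping database sync" >&2
    return 1
  fi

  # Run the host-specific database sync.
  "$@" || return 1

  # Restore first so update hooks see this environment's own credentials.
  drush preserve_state --restore || return 1
  drush updb -y && drush cim -y
}
```

A call would look like deploy_preserving_state followed by your host's sync command and its arguments. If the first drush call returns a non-zero status, the function returns before the sync ever runs.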

Wrapping it all up

The takeaway from this story is not to overlook how services are integrated into your site. Whether the integration is a custom implementation or a popular contrib module, it's important that, as the owner of a site, you know which environments are connected to which services.

There are also serious security considerations. Knowing where sensitive credentials are stored is important to controlling who has access to them. If you are syncing your production database down to lower environments you could be inadvertently exposing production credentials through unknown state variables.
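A quick way to start that audit is to list the state keys whose names hint at credentials. The sketch below is a name-based heuristic only: filter_sensitive_keys is a hypothetical helper, the pattern is not exhaustive, and a key with an innocent name can still hold a secret.

```shell
# Reads state key names on stdin (one per line) and prints the ones whose
# names hint at credentials. The pattern is a heuristic, not a complete list.
filter_sensitive_keys() {
  grep -Ei 'token|secret|password|api_?key|credential' || true
}

# Usage on a bootstrapped site, assuming the default key_value table with no
# table prefix:
#   drush sql:query "SELECT name FROM key_value WHERE collection = 'state'" \
#     | filter_sensitive_keys
```

On the site from this story, a listing like that would have surfaced custom_access_token and custom_refresh_token, the two keys behind the broken deployments.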

