Fixing common issues when reindexing data on Solr

I recently set up a legacy project that runs on Rails 4 and Solr 5. It uses the sunspot_rails gem to talk to the Solr server. Because the project is quite old and depends on deprecated libraries, I decided to run it locally with Docker. Everything seemed fine until I started reindexing the data. I searched online for solutions, but everyone seemed to have a different problem and nobody had exactly the same issues as me. It took me several hours to sort them out.

Here are the issues I ran into and how I resolved them. I hope this saves someone's day.

1. Reindexing raises an exception with no specific error message, and nothing shows up in the Solr logs

This usually means sunspot is not pointed at the correct Solr endpoint, which is also why Solr logs nothing: the request never reaches it. Here is the original content of sunspot.yml in my project:

development:
  solr:
    solr_home: solr
    hostname: localhost
    port: 8982
    path: /solr/development

This config does not work because:

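+ I'm using Docker, so the hostname must be the name of the Solr service defined in docker-compose.yml, not localhost. Inside the Rails container, localhost refers to that container itself, not to the Solr container. In my case, the service name is solr.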

+ My Solr is not running on port 8982; it runs on port 8983, Solr's default. Check which port your Solr actually listens on and make sure the same port is used everywhere.

Here is the final, working sunspot.yml:

development:
  solr:
    solr_home: solr
    hostname: solr
    port: 8983
    path: /solr/development
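To confirm the endpoint before reindexing, a small Ruby sketch like the one below can build the core URL from sunspot.yml and try to reach it. Both method names are hypothetical (they are not part of sunspot), the URL builder assumes http when no scheme key is set and that the file contains no ERB, and the ping assumes your core exposes Solr's /admin/ping handler.

```ruby
require "yaml"
require "net/http"
require "uri"

# Hypothetical helper: builds the core URL sunspot should target from the
# sunspot.yml content. Assumes "http" when the yml sets no scheme.
def solr_core_url(yaml_text, env = "development")
  solr = YAML.safe_load(yaml_text).fetch(env).fetch("solr")
  scheme = solr.fetch("scheme", "http")
  "#{scheme}://#{solr.fetch('hostname')}:#{solr.fetch('port')}#{solr.fetch('path')}"
end

# Hypothetical helper: true if the core answers on /admin/ping.
# Assumes the ping handler is enabled for the core; any network error
# (unknown host, connection refused) is reported as false.
def solr_reachable?(core_url)
  response = Net::HTTP.get_response(URI("#{core_url}/admin/ping"))
  response.is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end
```

Run it from inside the Rails container, for example solr_reachable?(solr_core_url(File.read("config/sunspot.yml"))), since the solr hostname only resolves on the Docker network.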

2. "Illegal character ..." in the Solr server log

This is caused by the protocol sunspot uses when sending requests to Solr. When you see the exception, check the scheme of the URI: is it http or https? In my case, the "illegal character ..." error appeared because sunspot was connecting over https while my Solr only serves plain http, so Solr received TLS handshake bytes that it could not parse as an HTTP request.

To fix it, make sure you do not set scheme: https in sunspot.yml. Also check whether an environment variable such as SOLR_URL or WEBSOLR_URL is set with a different scheme, since sunspot_rails can take the connection settings from it instead of from sunspot.yml.
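To find every place that asks for https, a check like the sketch below can help. It is a hypothetical helper, not part of sunspot: it takes the solr hash from sunspot.yml for the current environment plus the process environment, and reports which of them uses the https scheme. Only the two variable names mentioned above are checked.

```ruby
require "uri"

# Hypothetical helper: lists every source that would make sunspot use https.
# `solr_config` is the "solr" hash from sunspot.yml for the current env;
# `env` defaults to the real process environment.
def https_sources(solr_config, env = ENV)
  sources = []
  sources << "sunspot.yml scheme" if solr_config["scheme"].to_s.downcase == "https"
  %w[SOLR_URL WEBSOLR_URL].each do |var|
    value = env[var]
    next if value.nil? || value.empty?
    sources << var if URI(value).scheme == "https"
  end
  sources
end
```

An empty array means neither sunspot.yml nor these variables request https.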

3. Undefined field "type" (or "class_name")

This happens because the default schema Solr creates for a new core does not define the fields sunspot relies on, such as type and class_name. In my case, I took the Sunspot schema for Solr 5 from here: https://github.com/sunspot/sunspot/blob/99f2a0d0945e4e6e2f81352c2af0effa6b71d121/sunspot_solr/solr/solr/configsets/sunspot/conf/schema.xml, copied its content and pasted it into the managed-schema file of my core. If your core is named development, the file is at <your_solr_data_dir>/development/conf/managed-schema.

After replacing the file content, restart your Solr container so the core loads the new schema; reindexing should then work.
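Before restarting, you can check that managed-schema actually defines the fields from the error message. The sketch below is a hypothetical helper using Ruby's REXML; it assumes type and class_name are declared as plain field elements, as they are in the Sunspot schema linked above, and ignores dynamic fields.

```ruby
require "rexml/document"

# Field names taken from the "Undefined field" errors.
REQUIRED_SUNSPOT_FIELDS = %w[type class_name].freeze

# Hypothetical helper: returns the required field names that the schema
# XML does not declare as <field name="..."> anywhere in the document.
def missing_sunspot_fields(schema_xml)
  doc = REXML::Document.new(schema_xml)
  declared = REXML::XPath.match(doc, "//field").map { |f| f.attributes["name"] }
  REQUIRED_SUNSPOT_FIELDS - declared
end
```

After the fix, missing_sunspot_fields(File.read("<your_solr_data_dir>/development/conf/managed-schema")) should return an empty array.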

Tips

To test, you can index a single model from the Rails console. Pick a model that is declared searchable (it has a searchable block) and run:

Model.index

For example, for my Store model I simply run:

Store.index

I often find this approach makes debugging easier than running the full reindex rake task (rake sunspot:reindex), because any error is tied to a single model.
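If you want to go through several models this way, a small console helper can index them one at a time and report which one fails. This is a hypothetical helper, not part of sunspot_rails; it only relies on each model responding to index, as Store does above. It uses begin/rescue inside the block so it also runs on the older Rubies that Rails 4 projects tend to use.

```ruby
# Hypothetical console helper: indexes each model separately so a failure
# is reported against a specific model instead of the whole reindex.
# Returns { "ModelName" => "ErrorClass: message" } for models that failed.
def index_models_one_by_one(models, out: $stdout)
  failures = {}
  models.each do |model|
    begin
      model.index
      out.puts "#{model.name}: ok"
    rescue StandardError => e
      failures[model.name] = "#{e.class}: #{e.message}"
      out.puts "#{model.name}: FAILED (#{failures[model.name]})"
    end
  end
  failures
end
```

In the console, call it with your own models, e.g. index_models_one_by_one([Store]). To cover every searchable model, I believe sunspot registers them in Sunspot.searchable, which in development is only fully populated after Rails.application.eager_load!; treat that as an assumption and check it against your sunspot version.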