An analysis of two vast databases of compounds has revealed a curious repeating pattern in the way matter is put together.

Of more than 80,000 electronic structures of experimental and predicted materials studied, a whopping 60 percent have a basic structural unit containing a number of atoms that is a multiple of four.

What's strange is that the research team that discovered this pattern couldn't work out why it happens. For now, all we know is that it's real and observable; it simply evades explanation.

"Through an extensive investigation, in this work we highlight and analyze the anomalous abundance of inorganic compounds whose primitive unit cell contains a number of atoms that is a multiple of four, a property that we name rule of four," write a team led by physicist Elena Gazzarrini, formerly of the Swiss Federal Institute of Technology Lausanne in Switzerland, now at CERN.

"The study provides a starting point for future investigations on the rule's emergence, given that a fully satisfactory explanation of such anomalous distribution is as yet lacking."

One of the most basic questions about the Universe around us is why some properties are more abundant than others. Why is there more matter than antimatter? Why are the building blocks of life left-handed? And why do materials behave the way they do?

This last one is of keen interest in materials science, which seeks to understand the properties and behavior of different combinations of atoms to help develop and refine them. But it's a difficult question to approach, because atoms can come together in such a vast variety of ways.

This is why Gazzarrini and her colleagues became intrigued when they noticed a pattern emerging in two databases of materials: the Materials Project (MP) database and the Materials Cloud 3-dimensional crystal structures 'source' database (MC3Dsource). In both, the majority of inorganic compounds have a primitive unit cell (the smallest unit that repeats to build up the crystal structure) containing a number of atoms that is a multiple of four.
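To make the counting concrete, here is a minimal Python sketch of the test the rule describes: flag a structure if its primitive cell holds a multiple of four atoms, then measure what share of a collection passes. The atom counts below are hypothetical and not drawn from MP or MC3Dsource, and the function names are illustrative, not part of either database's tooling. For comparison, if atom counts were spread evenly, roughly one structure in four would pass by chance.

```python
from collections.abc import Iterable


def obeys_rule_of_four(n_atoms: int) -> bool:
    """Hypothetical helper: True if a primitive cell's atom count is a multiple of four."""
    if n_atoms <= 0:
        raise ValueError("atom count must be positive")
    return n_atoms % 4 == 0


def rule_of_four_fraction(counts: Iterable[int]) -> float:
    """Hypothetical helper: share of structures whose primitive cell obeys the rule."""
    counts = list(counts)
    if not counts:
        raise ValueError("need at least one structure")
    return sum(obeys_rule_of_four(n) for n in counts) / len(counts)


# Made-up primitive-cell atom counts, for illustration only.
hypothetical_counts = [4, 8, 5, 12, 6, 16, 3, 8]
print(rule_of_four_fraction(hypothetical_counts))  # 0.625
```

Against the one-in-four baseline, a share like the roughly 60 percent reported for the real databases is what makes the pattern stand out.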

This was concerning, because in principle there was no obvious reason for any one kind of structure to dominate these databases. Such a dominant pattern could point to a flaw somewhere in the data, an error that had gone unnoticed.

"A first intuitive reason could come from the fact that when a conventional unit cell (a larger cell than the primitive one, representing the full symmetry of the crystal) is transformed into a primitive cell, the number of atoms is typically reduced by four times," Gazzarrini explains. "The first question we asked was whether the software used to 'primitivize' the unit cell had done it correctly, and the answer was yes."
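The intuition behind that check can be sketched in a few lines of Python. A conventional cell contains a fixed number of lattice points depending on its centring (one for primitive P, two for body-centred I or base-centred A/B/C, four for face-centred F, and three for a rhombohedral R lattice in its hexagonal setting), and dividing its atom count by that number gives the primitive count. Because a face-centred conventional cell always holds four times as many atoms as its primitive cell, a structure accidentally left unprimitivized would always show a multiple of four. The dictionary and function below are hypothetical illustrations; real tools such as spglib or pymatgen find the primitive cell from the full crystal structure, not from a centring symbol.

```python
# Lattice points per conventional cell for each centring symbol.
# R is given for the hexagonal (obverse) setting of a rhombohedral lattice.
LATTICE_POINTS = {"P": 1, "A": 2, "B": 2, "C": 2, "I": 2, "F": 4, "R": 3}


def primitive_atom_count(conventional_atoms: int, centring: str) -> int:
    """Hypothetical helper: atoms in the primitive cell, given the atom count
    of the conventional cell and its lattice-centring symbol."""
    points = LATTICE_POINTS[centring]
    if conventional_atoms % points != 0:
        # Each lattice point carries an identical copy of the atomic basis,
        # so a valid conventional atom count must divide evenly.
        raise ValueError("atom count inconsistent with lattice centring")
    return conventional_atoms // points


# Rock salt (NaCl): face-centred conventional cell of 8 atoms -> 2 atoms.
print(primitive_atom_count(8, "F"))  # 2
# Body-centred iron: conventional cell of 2 atoms -> 1 atom.
print(primitive_atom_count(2, "I"))  # 1
```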

Once they had ruled out obvious errors, they had to dig deeper and look for other patterns that might explain the rule of four.

One possible explanation was silicon, which can bond to four other atoms. If all the rule-of-four compounds contained silicon, that would solve the mystery… but not all of these materials had silicon in them. Nor did the formation energies of the rule-of-four compounds follow any telling pattern.

The next step was to build a more powerful algorithm, with the help of engineer Rose Cernosky of the University of Wisconsin. The algorithm grouped the compounds according to similarities in their atomic properties. Once again, there was no discernible pattern.

No matter what the team tried, nothing stuck. The pattern is real and doesn't appear to be an error, but no other property could reliably predict whether a compound would follow the rule of four, at least not when a person was doing the predicting.

When the team ran the data through a machine learning algorithm, however, it predicted whether a compound would obey the rule of four with up to 87 percent accuracy. This suggests that the rule-of-four compounds share something we're missing, a feature that could help explain what produces the pattern.

Studying patterns in materials may be difficult for now, but the findings suggest that, with ever more powerful computational techniques, we may be able to start making some fascinating headway.

The team's research has been published in npj Computational Materials.