Materials Platform
for Data Science
based on the PAULING FILE experimental inorganic database

Dear physicists, chemists, and engineers! We are happy to welcome you to the MPDS. The MPDS provides materials data manually extracted by our experts from scientific publications. Even today this task cannot be automated, so we have been processing scientific articles in physics, chemistry, materials science etc. literally by hand since 1993. All these data are now available online at the MPDS!

§1. Introductory examples

Fig. 1. MPDS platform: plotting the results of the search as a matrix.


Fig. 2. MPDS platform: using the graph to refine the search.


Fig. 3. MPDS platform: complex search and redirection to the source article.


The PAULING FILE is a relational database for materials scientists, grouping crystallographic data, phase diagrams, and physical properties of inorganic crystalline substances within a single framework. It focuses on experimental observations, and the data are processed from the original publications, covering the world scientific literature from 1891 to the present date. Each individual crystal structure, phase diagram, or physical property database entry originates from a particular publication.

Today the PAULING FILE project is relatively well known. On the order of a thousand publications already refer to it. Its foundations, database design, and some data-centric observations are published e.g. in the works Villars 2004, Villars 2008, Xu 2011, Kong 2012, and Villars 2013.

The MPDS is not the only product based on the PAULING FILE. Others include SpringerMaterials, NIMS AtomWork, MedeA etc. More information can be found at the PAULING FILE website.

§3. MPDS platform

§3.1. Overview

The MPDS platform is an online edition of the PAULING FILE materials database. All the data are presented through two online interfaces: a browser-based graphical user interface (GUI) and an application programming interface (API). The GUI is described here, whereas the programmatic usage is covered in the API section.

Full access to all the data in all the supported formats (CIF, PDF, PNG, BIBTEX etc.) requires a subscription. Free access is also possible, although limited. In addition, some parts of the data are open-access. In particular, these are: (a) cell parameters - temperature diagrams and cell parameters - pressure diagrams, (b) all data for compounds containing both Ag and K, (c) all data for binary compounds of oxygen.
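The two composition-based open-access rules, (b) and (c), can be expressed as a small check. This is only an illustrative sketch: the function name is hypothetical, the input is assumed to be a list of element symbols, and rule (a) is not modeled because it depends on the type of entry rather than on the composition. It is not how the MPDS itself decides on access.

```python
def is_open_access(elements):
    """Return True if a compound with these elements falls under rule (b) or (c).

    Hypothetical helper: it only looks at the set of distinct element symbols.
    """
    distinct = set(elements)
    # Rule (b): the compound contains both Ag and K (other elements are allowed).
    contains_ag_and_k = {"Ag", "K"} <= distinct
    # Rule (c): exactly two distinct elements, one of which is oxygen.
    binary_with_oxygen = len(distinct) == 2 and "O" in distinct
    return contains_ag_and_k or binary_with_oxygen


print(is_open_access(["Ag", "K", "O"]))  # Ag and K together
print(is_open_access(["Ti", "O"]))       # binary compound of oxygen
print(is_open_access(["Cd", "O", "S"]))  # neither rule applies
```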

§3.2. Search criteria and modes

The data at the MPDS platform can be searched by 12 criteria: 8 in physics or chemistry (materials classes, physical properties, chemical elements, chemical formulae, space groups, crystal systems, prototypes, and atomic environments) and 4 in bibliography (publication author, years, journal, and DOI). There are two search modes: simple and advanced.

In the simple mode all the search terms can be typed into a single input field (see Fig. 4). Here the 5 most frequently used criteria are supported: materials classes, physical properties, chemical elements, chemical formulae, and crystal systems. Each typed keyword is recognized automatically and attributed to the matching criterion.

Fig. 4. Simple (one input field) mode of search.

In the advanced mode each of the 12 search criteria has its own input field. To use it, either click the middle search menu button, or click the criteria boxes shown at the right of the results pages. Let us get acquainted with the meaning and proper usage of each search criterion.

§3.3. Materials classes

This category collects various materials classes, ranging from technical terms to physical categories, chemical names, element counts, periodic table groups, some isotope names etc. There are many auxiliary terms applicable only to specific domains, e.g. cell-only, disordered, and non-disordered are valid for the crystalline structures (S-entries). Another example: the term ab initio refers to data taken from theoretical first-principles modeling papers. Moreover, the majority of the known mineral names are supported, e.g. perovskite, baddeleyite, stishovite, yeelimite etc. Five special (arity) classes unary, binary, ternary, quaternary, and quinary restrict the number of distinct elements in the results. The most frequently occurring terms are collected below alphabetically.

Listing 1. The most frequently occurring materials classes.

§3.4. Physical properties

All the supported physical properties are given by the MPDS hierarchy. A search for a higher-level property also returns the results for all its subordinate properties. In addition, even more general terms like permittivity or pressure are supported: all the physical properties containing these terms in their names will be found.

Some of the physical properties in the hierarchy support numerical searches. For that, the exact name of the property should be used together with a less-than or greater-than sign and the numerical value of interest (in SI units). Example: isothermal bulk modulus > 300 (GPa is assumed).
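To make the shape of such a query concrete, here is a minimal sketch that splits it into the property name, the comparison sign, and the number. The pattern and the function names are assumptions made for illustration only; the actual parsing rules of the MPDS are not documented here.

```python
import re

# Assumed grammar: <property name> <sign> <number>, where the sign is < or >.
QUERY_PATTERN = re.compile(
    r"^\s*(?P<name>.+?)\s*(?P<sign>[<>])\s*(?P<value>-?\d+(?:\.\d+)?)\s*$"
)


def parse_numeric_query(query):
    """Split a numerical query into (property name, sign, threshold)."""
    match = QUERY_PATTERN.match(query)
    if match is None:
        raise ValueError(f"not a numerical query: {query!r}")
    return match["name"], match["sign"], float(match["value"])


def satisfies(value, sign, threshold):
    """Check a single property value against the parsed condition."""
    return value > threshold if sign == ">" else value < threshold


name, sign, threshold = parse_numeric_query("isothermal bulk modulus > 300")
print(name, sign, threshold)
print(satisfies(350.0, sign, threshold))
```

The property name is kept as one string because, as stated above, it has to match the exact name used in the hierarchy.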

§3.5. Chemical elements

Chemical elements can be typed as names or symbols (e.g. copper or Cu). They can be combined arbitrarily in searches, using spaces, commas, or dashes as separators. By default, the typed elements are the minimum required set, so the results may contain additional elements: e.g. the results for Cd-O-S may contain not only Cd, O, and S, but also Tl, H, N, K etc. To restrict the element count, one of the arity materials classes unary, binary, ternary, quaternary, or quinary should be added, e.g. Cd-O-S ternary.
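The matching rule above can be sketched as a subset test plus an optional count restriction. The code below is an illustration under simplifying assumptions: it handles element symbols only (not names like copper), and the function names are hypothetical.

```python
import re

ARITY = {"unary": 1, "binary": 2, "ternary": 3, "quaternary": 4, "quinary": 5}


def parse_element_query(query):
    """Split a query into a set of element symbols and an optional arity."""
    tokens = [t for t in re.split(r"[\s,\-]+", query.strip()) if t]
    elements, arity = set(), None
    for token in tokens:
        if token.lower() in ARITY:
            arity = ARITY[token.lower()]
        else:
            elements.add(token)
    return elements, arity


def matches(compound_elements, query):
    """True if the compound has all queried elements and the requested arity."""
    elements, arity = parse_element_query(query)
    compound = set(compound_elements)
    if not elements <= compound:  # every queried element must be present
        return False
    return arity is None or len(compound) == arity


print(matches(["Cd", "O", "S", "H"], "Cd-O-S"))          # extra elements allowed
print(matches(["Cd", "O", "S", "H"], "Cd-O-S ternary"))  # four elements, not three
print(matches(["Cd", "O", "S"], "Cd, O, S ternary"))
```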

§3.6. Chemical formulae

The order of elements in a typed chemical formula does not matter. The results, however, will show the chemical formulae with the standard order of elements (according to their electronegativity). The 1000 most frequently occurring chemical formulae are listed below alphabetically.

Listing 2. 1000 most frequently occurring chemical formulae.
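That the order of elements in a formula is irrelevant can be illustrated by comparing compositions instead of strings. The sketch below is an assumption-laden simplification: it handles only flat formulae made of element symbols with optional integer or decimal indices (no brackets), and it does not reproduce the standard element order used in the MPDS results.

```python
import re
from collections import Counter

# An element symbol followed by an optional integer or decimal index.
ELEMENT_WITH_INDEX = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def composition(formula):
    """Map each element symbol to its index; a missing index counts as 1."""
    counts = Counter()
    for symbol, index in ELEMENT_WITH_INDEX.findall(formula):
        counts[symbol] += float(index) if index else 1.0
    return dict(counts)


def same_formula(a, b):
    """Two formulae are equivalent if their compositions coincide."""
    return composition(a) == composition(b)


print(composition("Ba2Cu3YO6.3"))
print(same_formula("CaTiO3", "TiO3Ca"))
```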

§3.7. Crystal systems and space groups

Seven crystal systems and 230 space groups are fully supported. A space group can be specified by its number or its international short symbol. The full list of crystal systems and space groups can be found e.g. in Wikipedia. Note that crystal systems, space groups, and prototype systems (see below) are mutually exclusive, i.e. they cannot be combined in a single search query.
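One way to see why a space group and a crystal system need not be combined is that the space group number already determines the crystal system. The sketch below uses the standard number ranges of the 230 space groups; the function name is hypothetical and the code is not part of the MPDS.

```python
import bisect

# Last space group number of each crystal system, in ascending order:
# 1-2, 3-15, 16-74, 75-142, 143-167, 168-194, 195-230.
_LAST_GROUP = [2, 15, 74, 142, 167, 194, 230]
_SYSTEMS = [
    "triclinic", "monoclinic", "orthorhombic", "tetragonal",
    "trigonal", "hexagonal", "cubic",
]


def crystal_system(space_group_number):
    """Return the crystal system for a space group number from 1 to 230."""
    if not 1 <= space_group_number <= 230:
        raise ValueError("space group number must be between 1 and 230")
    return _SYSTEMS[bisect.bisect_left(_LAST_GROUP, space_group_number)]


for number in (225, 123, 167, 168):
    print(number, crystal_system(number))
```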

§3.8. Prototype systems

Prototype systems are supported in two notations: Strukturbericht and PAULING FILE. The first notation is an old crystallographic classification system still sometimes used in the scientific literature (see the listing below). The second notation is a combination of the chemical formula, Pearson symbol, and space group number. For instance, the most common prototype in the world literature is NaCl cF8 225, with about 40 000 hits. Other important structural prototypes are e.g. cubic perovskite CaTiO3 cP5 221, zincblende ZnS cF8 216, superconducting cuprate Ba2Cu3YO6.3 tP14 123 etc.

Listing 3. All used Strukturbericht symbols.
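The PAULING FILE notation of a prototype can be taken apart mechanically. The sketch below assumes the three parts are separated by spaces and that the Pearson symbol consists of a lowercase crystal family letter (a, m, o, t, h, c), an uppercase centering letter, and the number of atoms in the unit cell. The class and function names are hypothetical, and no check is made that the parts are mutually consistent.

```python
import re
from typing import NamedTuple


class Prototype(NamedTuple):
    formula: str
    family: str          # a, m, o, t, h, or c
    centering: str       # e.g. P, I, F, R
    atoms_per_cell: int
    space_group: int


PEARSON = re.compile(r"^([amothc])([PSABCIFR])(\d+)$")


def parse_prototype(text):
    """Parse a string like 'NaCl cF8 225' into its components."""
    formula, pearson, number = text.split()
    match = PEARSON.match(pearson)
    if match is None:
        raise ValueError(f"unrecognized Pearson symbol: {pearson!r}")
    space_group = int(number)
    if not 1 <= space_group <= 230:
        raise ValueError("space group number must be between 1 and 230")
    return Prototype(formula, match[1], match[2], int(match[3]), space_group)


print(parse_prototype("NaCl cF8 225"))
print(parse_prototype("Ba2Cu3YO6.3 tP14 123"))
```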

§3.9. Atomic environments

The atomic environments in the crystalline structures are described as polyhedra (e.g. TiO6 or HgX12). It is possible to search the entire MPDS data by the type and the atomic composition of these polyhedra. The most frequently occurring polyhedral types are shown below, sorted by the number of vertices (i.e. the coordination number of the central atom):

Listing 4. MPDS data polyhedral types.

When the polyhedron atoms are typed, the first chemical symbol is taken as the center of the polyhedron. It makes no sense to put a numerical index next to it. The following chemical symbols are treated as ligands, and here numerical indices are properly supported. The center and the ligand atoms may be separated by a space or a minus sign. The X symbol stands for any chemical element.
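These input rules can be sketched as a small parser. It is an illustration only: the function names are hypothetical, and rejecting an index on the central atom with an error is an assumption of this sketch, not documented MPDS behavior.

```python
import re

# An element symbol (or X) followed by an optional integer index.
SYMBOL_WITH_INDEX = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_polyhedron(text):
    """Return (center, ligands), where ligands maps symbol -> count."""
    # Spaces and minus signs between symbols are simply skipped by findall.
    parts = SYMBOL_WITH_INDEX.findall(text)
    if not parts:
        raise ValueError(f"no chemical symbols found in {text!r}")
    center, center_index = parts[0]
    if center_index:
        raise ValueError("the central atom must not carry a numerical index")
    ligands = {}
    for symbol, index in parts[1:]:
        ligands[symbol] = ligands.get(symbol, 0) + (int(index) if index else 1)
    return center, ligands


def ligand_matches(query_symbol, actual_symbol):
    """X in the query stands for any chemical element."""
    return query_symbol == "X" or query_symbol == actual_symbol


print(parse_polyhedron("TiO6"))
print(parse_polyhedron("Hg-X12"))
print(parse_polyhedron("Ti O6"))
```

The sum of the ligand counts gives the number of vertices of the polyhedron, i.e. the coordination number of the central atom.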

§3.10. Bibliography

Since all the MPDS data were manually excerpted from peer-reviewed articles, they are searchable by the corresponding authors' names, publication years, journal issues, pages, and DOIs. This information can also be used for citing. Citing the MPDS itself is desirable but not obligatory, as all the data have their own publishers' DOIs or at least journal issues with pages.

This tutorial is a work in progress. We thank the reader for the time and interest! Any questions or feedback are very welcome and greatly appreciated.
