Big Data for Nano-electronics

Author: Dr. Stephen Church

This research project studies new material structures that are built on the sub-micron scale: at their largest, they are a factor of ten smaller than the width of a human hair. The structures function as devices that either detect or emit light and are likely to form the basis of the next generation of nanotechnology, acting as a platform to manipulate light for applications such as communications and photonic computing.

The dimensions of these devices are critical to their performance; however, there remain issues with device yield and size variation that must be addressed before the devices can reach their industrial potential. This project has accelerated progress on these issues by developing a high-speed, high-throughput approach to characterising hundreds of thousands of individual nanostructures, applying multiple experimental techniques to each device. For each sample, this approach produces a vast, multidimensional dataset that can be investigated to provide previously inaccessible insights into device performance and to establish a route to optimisation.

Applying open research practices

The principles of open data and open code are fundamental to this research. In practice, this includes the following:

  • The full, multidimensional datasets that underpin our publications have been published online on figshare. These include both the raw experimental data and the extracted physical parameters of each structure. This allows other researchers to verify the results in the publications and to use the data for their own investigations.

  • Because it can be difficult to get to grips with such a large dataset, we also provide analysis code on figshare and GitHub in the form of a Jupyter Notebook. This code gives users an example of how to extract the dataset, plot the data and study correlations between different parameters. We also share our in-house developed hardware drivers.

  • To facilitate sharing of our research, preprints of all papers associated with the project are made freely available on arXiv.
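
As a rough illustration of the kind of correlation study described in the list above, the sketch below computes pairwise correlations between per-structure parameters. It is not the project's notebook: the data are synthetic, and the parameter names (length_um, emission_nm, intensity) and the relationship between them are assumptions made purely for demonstration.

```python
import numpy as np

# Synthetic stand-in for extracted per-structure parameters.
# All names, values and relationships below are hypothetical.
rng = np.random.default_rng(42)
n_structures = 10_000

length_um = rng.normal(3.0, 0.5, n_structures)
# Assumed toy relationship: emission wavelength shifts with length, plus noise.
emission_nm = 850.0 + 4.0 * (length_um - 3.0) + rng.normal(0.0, 1.0, n_structures)
# Assumed to be independent of the other two parameters.
intensity = rng.lognormal(0.0, 0.3, n_structures)

params = {
    "length_um": length_um,
    "emission_nm": emission_nm,
    "intensity": intensity,
}
names = list(params)

# Pearson correlation matrix: one row per parameter, one column per structure.
corr = np.corrcoef(np.vstack([params[name] for name in names]))

# Print each unique pair of parameters with its correlation coefficient.
for i, a in enumerate(names):
    for j in range(i + 1, len(names)):
        print(f"{a} vs {names[j]}: r = {corr[i, j]:+.2f}")
```

With a real dataset, the synthetic arrays would be replaced by the parameters loaded from the published files; the correlation step itself would stay the same.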

Overcoming challenges

It can be challenging to get to grips with our large datasets, and this is doubly so for external researchers who were not involved in collecting or structuring the data themselves. The major challenge was therefore to format the datasets so that they are as compressed, accessible and understandable as possible. This was addressed by storing the datasets in the widely used HDF5 (.h5) file format, providing the metadata needed to understand the content and, most importantly, providing example code that explains the structure of the dataset.
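
To make this concrete, here is a minimal sketch of an .h5 file that carries its own metadata, together with a short helper that lists its contents. It assumes the h5py and NumPy packages are installed. The layout (one group per structure, with raw and parameters subgroups), the attribute names and the values are all hypothetical and do not reflect the structure of the published datasets.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical layout: one group per nanostructure holding raw data and
# extracted parameters. Names and values are illustrative only.
rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "example_sample.h5")

with h5py.File(path, "w") as f:
    # File-level metadata describing what the file contains.
    f.attrs["description"] = "Synthetic example, not real measurement data"
    for i in range(3):
        grp = f.create_group(f"structure_{i:06d}")
        # gzip compression keeps large raw arrays compact on disk.
        spectrum = grp.create_dataset(
            "raw/spectrum", data=rng.random(512), compression="gzip"
        )
        spectrum.attrs["units"] = "counts"
        length = grp.create_dataset(
            "parameters/length_um", data=rng.uniform(1.0, 5.0)
        )
        length.attrs["units"] = "micrometres"


def summarise(h5_path):
    """Return {dataset path: (shape, units)} for every dataset in the file."""
    summary = {}

    def visit(name, obj):
        # Groups are skipped; only datasets carry a shape and units here.
        if isinstance(obj, h5py.Dataset):
            summary[name] = (obj.shape, obj.attrs.get("units", "unknown"))

    with h5py.File(h5_path, "r") as f:
        f.visititems(visit)
    return summary


for name, (shape, units) in summarise(path).items():
    print(f"{name}: shape={shape}, units={units}")
```

Because the units travel with each dataset as attributes, someone opening the file for the first time can discover both the structure and the meaning of the data without separate documentation.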

Benefits of using these open research practices

We have found that having openly accessible datasets directly increases engagement in our work when in discussion with colleagues and when presenting at conferences, particularly because we can provide direct links to the data that we draw our conclusions from. This approach is accepted as best practice in the field of advanced materials, as it allows the most efficient use of experimental data and facilitates transparency in research.

A further advantage is how the open datasets synergise with our publications: traffic to one of these drives interest in the other, which establishes our research portfolio.  

Top tip

In open-data experimental projects, it is crucial to plan out the structure of the datasets before measurements are performed. It can be a massive time sink to modify the structure after the fact.