Full-chain modeling and performance analysis of integral imaging three-dimensional display system

Characterizing the performance of the full chain is essential for optimizing the design of an integral imaging three-dimensional (3D) display system. In this paper, the acquisition and display of a 3D scene are treated as a single, complete light field information transmission process. We establish a full-chain performance characterization model of an integral imaging 3D display system that uses the 3D voxel, the image depth, and the field of view of the reconstructed images as the 3D display quality evaluation indicators. Unlike most previous studies, which rely on an ideal integral imaging model, the proposed model accounts for the diffraction effect and optical aberration of the microlens array, the sampling effect of the detector, 3D image data scaling, and the human visual system, and can therefore accurately describe the actual 3D light field transmission and convergence characteristics. The relationships between the key parameters of an integral imaging 3D display system and the 3D display quality evaluation indicators are analyzed and discussed through simulation experiments. The results should be helpful for the optimized design of high-quality integral imaging 3D display systems.


Introduction
Integral imaging, proposed by Lippmann in 1908, is a promising 3D display technique because it provides full-color, full-parallax, continuous-viewing 3D images without the need for special glasses. Integral imaging uses a microlens array (MLA) to capture the light field from the 3D scene and another MLA to reconstruct the 3D light field for the observers [1][2][3][4]. In recent years, with the development of precision manufacturing technology at the micro- and nanoscale, the 3D display quality of integral imaging systems has improved greatly, bringing the industrialization of integral imaging technology within reach. The 3D display quality, and the trade-offs among image resolution [5], field of view [6], and image depth [7], are determined by the structural characteristics of an integral imaging 3D display system, including the capture and display MLAs, the detector, the display device, and the observers. When designing a high-quality integral imaging 3D display system for specific commercial applications, the system performance must be predicted accurately so that appropriate engineering design choices can be made.
Much of the literature on the performance characterization of integral imaging 3D display systems concentrates on either the capture process or the display process; such single-process indicators cannot characterize the system performance comprehensively. Kawakita et al. [8] analyzed the relationship between the reconstructed 3D image quality and the geometric distortion in the elemental image array (EIA) caused by the projection lens. Kavehvash et al. [9] presented a 3D resolvability concept, comprising lateral and axial resolvability, derived from the sampling properties of light rays on a particular image plane. Cho et al. [10] used the volume of the reconstructed voxel, defined as the product of the depth resolution and the square of the lateral resolution, as the performance metric and optimized an integral imaging system with a brute-force search under fixed resource constraints. Wu et al. [11] studied 3D spatial resolution based on the reconstructed 3D space and analyzed the effect of microlens parameter accuracy on the reconstructed position error. Zhou et al. [12] proposed an approximate voxel model for integral imaging, but they considered only an ideal capture process and neglected the diffraction effect of the MLA, the sampling effect of the detector, and the discretization of the display pixels. Young-Min Kim et al. [13] proposed a practical method to analyze the expressible depth range of an integral imaging system based on the image blur at defocused depths caused by overlaps among voxels in both the real and focused modes. Nevertheless, these works mainly discuss, through theoretical modeling, the influence of system parameters on a single performance indicator (imaging resolution, depth range, or field of view); they do not address the global parameter optimization of an integral imaging 3D display system from the perspective of full-chain light field transmission. Hong Hua et al. [14,15] described a generalized framework to model the image formation process of existing light field display methods and presented a systematic method to simulate the retinal image and the accommodation response rendered by a light field display, which inspired the full-chain performance characterization of integral imaging 3D display systems.
To characterize the performance of an integral imaging 3D display system comprehensively, we treat the capture and display processes as a whole and establish a full-chain performance characterization model that accounts for the main factors degrading 3D display quality: the diffraction effect and optical aberration of the MLA, the sampling effect of the detector, the pixel discretization of the display device, 3D image data scaling, and the human visual system. The model is based on ray tracing and sampling theory and is computed in a MATLAB simulation environment. The results provide theoretical support for the optimized design of high-quality integral imaging 3D display systems. The rest of the paper is organized as follows. The Methods section introduces the degradation mechanism of 3D display quality and the principle of the full-chain performance characterization model, which considers most of the factors that degrade the 3D display quality of an integral imaging system. Numerical simulations and analysis of the results are given in the Results and Discussion section. The last section concludes the paper.

Methods
Quality degradation mechanism of an integral imaging 3D display system
A typical integral imaging system consists of two parts: the capture process and the display process. In the capture process, a camera equipped with an MLA records the EIA, which views the 3D object from different perspectives. In the display process, the captured EIA is back-projected to reconstruct the 3D images, either optically or computationally. In most cases the parameters of the capture and display processes are not fully symmetrical, so a light field conversion step is needed. The actual imaging process that determines the 3D display quality is complicated, and every step, including capture, light field conversion, and light field display, degrades the 3D display quality. Figure 1 shows the full-chain light field transmission process of an integral imaging 3D display system.
The degradation factors in the capture process include the diffraction effect, the optical aberration and distortion of the MLA, and the sampling effect of the recording device. The light field conversion process involves the pseudoscopic effect, 3D image data scaling, and the parameter mismatch between the capture and display devices. Integral imaging systems suffer from the pseudoscopic problem, in which the reconstructed 3D images are depth-reversed. Moreover, the parameters of the 3D display system usually do not match those of the capture system, so generating an EIA with appropriate display parameters from the originally captured EIA is also an important issue. This paper considers 3D image data scaling, a special case of parameter mismatch in which the display parameters are the capture parameters enlarged by the same magnification factor.
In the display process, the pixel discretization of the display device (such as an LCD screen) and the diffraction effect, optical aberration, and distortion of the MLA need to be considered. Finally, the reconstructed 3D object is received by the human visual system. Thus, the quality of the retinal image received by the human eye should guide the optimized design of an integral imaging 3D display system.

Full-chain modeling of the integral imaging 3D display system
Following the full-chain view of optical information transmission in an integral imaging 3D display system, this section quantitatively describes the degradation factors of 3D images, such as detector sampling, diffraction, geometric distortion, and the scaling of the physical pixel size, and establishes the full-chain performance characterization model that uses the 3D voxel, field of view, and image depth as the 3D display quality indicators.

A. The capture process
In the theoretical model, light from a 3D point source passes through the MLA and is recorded by the image detector to form the EIA. The corresponding pixels in each elemental image carry the information of the 3D point from different directions, producing a spatial energy distribution of the light field that contains the 3D point spread information. Figure 2(a) illustrates the capture process of an integral imaging 3D display system. For simplicity, a one-dimensional MLA is assumed, because the model extends easily to a two-dimensional MLA. We assume that the 3D scene is a point source and that the pixels of the recording detector are arranged on a rectangular grid. The focal length of each microlens is f, g is the distance between the MLA and the EIA, and l is the distance from the 3D scene to the MLA. The model assumes that the object point is recorded at the in-focus distance given by the Gaussian lens law, 1/g + 1/l = 1/f. If the object point is not at this distance, the captured EIA is severely blurred and the resulting 3D display is very poor. The parameters p and c denote the pitch of the MLA and the pixel size of the image detector, respectively. To describe light propagation in the image space, we set up a coordinate system x-y-z in which the z-axis coincides with the optical axis of the central microlens and the origin is located at the center of the EIA.
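As a minimal sketch (not the authors' MATLAB code), the in-focus condition can be evaluated directly from the Gaussian lens law. The function names below are hypothetical, lengths are in millimetres, and the sample values f = 3.0 mm and g = 3.4 mm are taken from the simulation grid used later in the paper.

```python
def in_focus_object_distance(f_mm: float, g_mm: float) -> float:
    """Object distance l satisfying the Gaussian lens law 1/g + 1/l = 1/f.

    A real, finite l exists only when the EIA lies beyond the focal plane (g > f).
    """
    if g_mm <= f_mm:
        raise ValueError("g must exceed f for a real in-focus object distance")
    return f_mm * g_mm / (g_mm - f_mm)


def in_focus_gap(f_mm: float, l_mm: float) -> float:
    """Gap g between the MLA and the EIA for a given object distance l (needs l > f)."""
    if l_mm <= f_mm:
        raise ValueError("l must exceed f for a real in-focus gap")
    return f_mm * l_mm / (l_mm - f_mm)


l = in_focus_object_distance(3.0, 3.4)
print(f"l = {l:.2f} mm")                     # l = 25.50 mm
print(f"g = {in_focus_gap(3.0, l):.2f} mm")  # g = 3.40 mm (round trip)
```

Because the two functions are inverses of each other, either g or l can be treated as the free design variable.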
Assume that the MLA has the same size as the detector array and contains a sufficient number of microlenses, and let [−w, w] be the range of microlenses through which the 3D scene can be imaged and recorded by the detector. When the chief ray emitted by the 3D object point through the center of a microlens lands exactly on the edge pixel of the corresponding elemental image, the maximum index w of the microlenses that can participate in 3D imaging satisfies, by similar triangles, wp/l = p/(2g). Thus, to keep the light rays from leaving the area behind the corresponding microlenses, w is calculated as
$$w = \left\lfloor \frac{l}{2g} \right\rfloor,$$
where ⌊x⌋ denotes the greatest integer less than or equal to x. Suppose the MLA contains N × M microlenses, and let (m, n), with m, n ∈ [−w, w], 2n + 1 ≤ N, and 2m + 1 ≤ M, denote the indexes of the microlenses and elemental images in the x- and y-directions, respectively. The (m, n)th elemental images within this range are the ones that contain corresponding pixels of the 3D scene. If the MLA has no diffraction or distortion, the chief rays emitted from a point source A(x, y, l) pass through the (m, n)th microlens and converge on the corresponding point $(x_A^m, y_A^n)$ of the EIA. With the (m, n)th microlens centered at (mp, np), the corresponding point recorded by the detector follows from similar triangles as
$$x_A^m = mp + \frac{g}{l}\,(mp - x), \qquad y_A^n = np + \frac{g}{l}\,(np - y).$$
Considering the diffraction effect of the MLA, each corresponding point spreads into an Airy spot of radius R = 1.22λg/D, where D denotes the diameter of a microlens. The Airy radius is the half-width of the central lobe of the point spread function (PSF). In addition, we consider typical geometric distortions of the MLA, such as barrel and pincushion distortion, shown in Fig. 2(b), whose relative magnitude is proportional to the square of the image height.
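The three quantities introduced in this paragraph (w, the chief-ray landing point, and the Airy radius) can be evaluated with a few lines of Python. This is a one-dimensional sketch under ideal geometric optics; the chief-ray formula uses the sign convention stated in its docstring, which is an assumption, and the function names are hypothetical. The example also checks the similar-triangle condition numerically: for an on-axis point with g = 3.4 mm and l = 25.5 mm, the chief ray through microlens w = 3 stays inside its elemental image, whereas the ray through microlens 4 does not.

```python
import math


def max_critical_index(l_mm: float, g_mm: float) -> int:
    """w = floor(l / (2 g)), from the similar-triangle condition w p / l = p / (2 g)."""
    return math.floor(l_mm / (2.0 * g_mm))


def chief_ray_point(x_mm: float, m: int, p_mm: float, g_mm: float, l_mm: float) -> float:
    """EIA coordinate hit by the chief ray from object coordinate x through the
    centre (m * p) of microlens m. Assumed convention: object a distance l in
    front of the MLA, EIA a distance g behind it, so the point is inverted
    about the microlens centre."""
    centre = m * p_mm
    return centre + (centre - x_mm) * g_mm / l_mm


def airy_radius(wavelength_mm: float, g_mm: float, d_mm: float) -> float:
    """Radius of the Airy disc, R = 1.22 * lambda * g / D."""
    return 1.22 * wavelength_mm * g_mm / d_mm


p, g, l = 1.0, 3.4, 25.5               # mm
w = max_critical_index(l, g)           # 3
for m in (w, w + 1):
    offset = chief_ray_point(0.0, m, p, g, l) - m * p
    # prints "3 0.4 True" then "4 0.5333 False"
    print(m, round(offset, 4), abs(offset) <= p / 2)
# D taken equal to the pitch p here (assumption); result is about 0.00228 mm,
# smaller than the 0.005 mm detector pixel used later in the paper
print(airy_radius(5.5e-4, g, p))
```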
According to Seidel aberration theory, the distortions $\delta_x(x_A^m, y_A^n)$ and $\delta_y(x_A^m, y_A^n)$ depend on the offset of each corresponding point from the center of its microlens and are scaled by the Seidel distortion coefficient $c_d$. When $c_d$ is positive, the EIA shows positive (pincushion) distortion; when $c_d$ is negative, it shows negative (barrel) distortion. In the actual imaging process, the diffraction and distortion effects of the MLA coexist, so the light passing through the (m, n)th microlens lands in a spot of radius R centered at the distorted point $(x_A^m + \delta_x, y_A^n + \delta_y)$. In practice, however, the recording detector has a limited number of discrete pixels. To avoid crosstalk, each elemental image must have the same size as each microlens in the capture process, so the number of pixels K in each elemental image satisfies Kc = p. The actual capture position range for each microlens is pixelated as depicted in Fig. 2(c), using the ceiling operator ⌈x⌉ (the smallest integer greater than or equal to x) and the minimum and maximum operators min() and max(). The corresponding pixels of the (m, n)th elemental image are recorded by the detector from pixel $k^{(m,n)}_{\min,x}$ to $k^{(m,n)}_{\max,x}$ in the x-direction and from $k^{(m,n)}_{\min,y}$ to $k^{(m,n)}_{\max,y}$ in the y-direction. When a 3D point source is captured, the EIA is therefore an array of point spread spots.
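The distortion and pixelation steps can be sketched in one dimension as below. Two choices here are assumptions rather than the paper's formulas: the distortion uses the standard third-order radial form (displacement c_d r² times the offset r from the microlens centre, so the relative distortion grows with r²), and the pixel range uses 0-based floor indices local to each elemental image instead of the ⌈·⌉ convention described above. Both helpers are hypothetical.

```python
import math


def distorted_point(x_mm: float, centre_mm: float, c_d: float) -> float:
    """Assumed third-order radial distortion about a microlens centre (1-D).

    The displacement is c_d * r**2 * r with r = x - centre, so the relative
    distortion grows with r**2. c_d > 0 pushes points outward (pincushion),
    c_d < 0 pulls them inward (barrel)."""
    r = x_mm - centre_mm
    return x_mm + c_d * r ** 2 * r


def covered_pixels(x_mm: float, m: int, p_mm: float, c_mm: float, r_spot_mm: float):
    """0-based pixel range (j_min, j_max) of elemental image m covered by a spot
    of radius r_spot centred at x, clipped to the K = p / c pixels behind
    microlens m. Returns None if the spot misses the elemental image."""
    k = round(p_mm / c_mm)                  # pixels per elemental image (K c = p)
    u = x_mm - (m * p_mm - p_mm / 2.0)      # position from the image's left edge
    j_min = max(0, math.floor((u - r_spot_mm) / c_mm))
    j_max = min(k - 1, math.floor((u + r_spot_mm) / c_mm))
    return (j_min, j_max) if j_min <= j_max else None


# On-axis point imaged by microlens m = 3 (p = 1 mm), detector pixel c = 0.005 mm,
# Airy radius about 0.00228 mm
x_ideal = 3.4
x_pin = distorted_point(x_ideal, 3.0, c_d=0.1)          # hypothetical c_d in mm^-2
print(covered_pixels(x_ideal, 3, 1.0, 0.005, 0.00228))  # (179, 180)
print(covered_pixels(x_pin, 3, 1.0, 0.005, 0.00228))    # (180, 181)
```

A small positive c_d shifts the recorded spot outward by about one pixel here, which is the kind of sampling-level error the full-chain model is meant to capture.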

B. the display process
Essentially, the optical display process is the inverse of the capture process. The display device, placed near the focal plane of the MLA, displays the EIA, and the MLA gathers the light field information to form a reconstructed 3D scene in the image space. When a 3D object is captured by an MLA and displayed by the same MLA in an ideal integral imaging 3D display system, the reconstructed image is located exactly where the original object was. However, the parameters of the display process usually do not match those of the capture process. Owing to the limitations of display manufacturing technology, the pixel size of the display device is generally much larger than that of the recording detector. When devices of a different specification, such as a different MLA, are used, the reconstructed image is located at a position different from that of the original object.
Moreover, the reconstructed image is scaled in both the lateral and longitudinal directions. To derive viewing parameters such as the 3D voxel, field of view, and image depth range, geometric optics can be used as an approximation for simplicity, because the pixel size of current display devices is large enough that diffraction in the display process can be neglected.
In the display process, as shown in Fig. 3, the captured EIA is displayed on the display device and reconstructed through the MLA. The x-y-z coordinate system has its x-y plane on the display device plane and its origin at the center of the EIA. Let the pixel size of the display screen be $c_r$, and let M denote the magnification between $c_r$ and c, that is, $M = c_r/c$. The distance between the display device and the display MLA is $g_r$, and the distance between the display microlens and the reconstructed image plane is $l_r$. $g_r$ and $l_r$ satisfy the Gaussian lens law $1/g_r + 1/l_r = 1/f_r$, where $f_r$ is the focal length of each display microlens.
This paper considers 3D image data scaling, in which the display parameters are rescaled in the lateral and longitudinal directions by the factor M. The display parameters are related to the capture parameters by
$$c_r = Mc, \qquad p_r = Mp, \qquad g_r = Mg, \qquad f_r = Mf.$$
The corresponding pixels of the original 3D point overlap in the image space and produce a spatial energy distribution of the light field. The viewer cannot resolve this overlapping region unless the signal-to-noise ratio (SNR) is sufficiently high. We define the energy-overlap region with the highest SNR as the full-chain 3D voxel, which is approximated as a cuboid for convenience. The full-chain voxel is characterized by the coordinate $Z_r$ of the reconstructed 3D point, the lateral size $H_r$ of the 3D voxel, and the longitudinal size $D_r$ of the 3D voxel. The longitudinal size $D_r$ is determined by the corresponding points of the (j, h)th elemental image projected through the (j, h)th microlens, where (j, h) is, in general, the largest index of the elemental images that can contribute to the full-chain 3D voxel. In other words, the longitudinal size of the voxel is determined by the projection of the corresponding pixels in the marginal elemental images, whereas the lateral size is determined by the projection of the corresponding pixels of the central elemental image onto the reconstructed image plane. The lateral and longitudinal resolutions are defined as the reciprocals of $H_r$ and $D_r$, respectively. Therefore, they are mainly affected by the number of microlenses in the MLA, the microlens aperture, and the resolution of the display device.
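Assuming that, under the uniform scaling described above, every capture length is multiplied by M in the display chain ($c_r = Mc$, $p_r = Mp$, $g_r = Mg$, $f_r = Mf$), a short derivation from the display-side Gaussian lens law shows where the reconstructed point lands and why the paraxial field of view does not depend on M. This is a worked check under that assumption, not an equation quoted from the paper.

```latex
% Assumed uniform scaling: c_r = Mc, p_r = Mp, g_r = Mg, f_r = Mf
\begin{aligned}
\frac{1}{l_r} &= \frac{1}{f_r} - \frac{1}{g_r}
  = \frac{1}{M}\left(\frac{1}{f} - \frac{1}{g}\right) = \frac{1}{M l}
  \quad\Longrightarrow\quad l_r = M l, \\
\theta &= 2\arctan\frac{p_r}{2 g_r} = 2\arctan\frac{M p}{2 M g} = 2\arctan\frac{p}{2 g}.
\end{aligned}
```

Under this assumption the reconstructed point sits M times farther from the display MLA than the object sat from the capture MLA, while the field of view equals that of the capture geometry.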
The 3D display resolution can be improved by increasing the number of microlenses through space-time multiplexing, or by using computational integral imaging reconstruction to overcome the resolution limit of the display device. The ultimate goal of 3D display is to present high-quality 3D content to the human visual system. According to the theory of human visual perception, if the angle subtended at the pupil of the human eye by the lateral size of the voxel is less than 1′, the voxel satisfies the resolution limit of the human eye and the system can be regarded as a perfect 3D display. We take the median of the optimal viewing distance range [16,17] as the optimal viewing distance $d_{OVD}$, which is expressed in terms of the lateral voxel size and $\alpha_e = 2.9 \times 10^{-4}$ rad, the angular resolution limit of the eye (about 1′). This paper uses the full-chain 3D voxel, the observable image depth range, and the field of view as the 3D display quality evaluation indicators. In the paraxial approximation, the field of view is $\theta = 2\arctan\left(p_r/(2g_r)\right)$. All depths at which the full-chain voxel remains resolvable are regarded as tolerable defocused depths, and the observable image depth range is the set of these depths.
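The paraxial field-of-view formula and the 1′ acuity limit can be checked numerically. The viewing-distance helper below is a hypothetical small-angle estimate (the distance at which a voxel of lateral size H_r subtends α_e); it is not the paper's expression for d_OVD, which additionally takes the median of the optimal viewing distance range from [16,17].

```python
import math

ONE_ARCMIN_RAD = math.radians(1.0 / 60.0)   # about 2.909e-4 rad, cf. alpha_e = 2.9e-4 rad


def field_of_view_deg(p_r_mm: float, g_r_mm: float) -> float:
    """Paraxial field of view theta = 2 arctan(p_r / (2 g_r)), in degrees."""
    return math.degrees(2.0 * math.atan(p_r_mm / (2.0 * g_r_mm)))


def acuity_distance_mm(h_r_mm: float, alpha_e: float = 2.9e-4) -> float:
    """Hypothetical estimate: distance at which a voxel of lateral size h_r
    subtends the angle alpha_e (radians). Beyond this distance the voxel is
    smaller than the eye's resolution limit."""
    return h_r_mm / math.tan(alpha_e)


print(round(ONE_ARCMIN_RAD, 7))        # 0.0002909
print(field_of_view_deg(2.0, 1.0))     # about 90 degrees
print(acuity_distance_mm(0.87))        # about 3000 mm
```

Note that scaling p_r and g_r by the same factor leaves the field of view unchanged, whereas the acuity distance grows linearly with the lateral voxel size.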

Results and discussion
To examine the trade-offs among the key system parameters, simulation experiments are carried out with typical system design parameters based on the full-chain performance characterization model. The point-array EIAs and the 3D display quality evaluation indicators, including the 3D voxel, image depth range, and field of view, are calculated to analyze how different system design parameters affect 3D display quality.

A. Generation of EIA
The parameters of the capture process are set as follows.
The MLA consists of closely packed square microlenses, each 1 mm × 1 mm. The 3D object point is located on the optical axis of the central microlens, the detector pixel size is 0.005 mm, and the mean wavelength is 5.5 × 10^−4 mm. The pixel intensities of the acquired elemental images are normalized to 1, regardless of differences in gray level. The MLA is assumed to contain N microlenses in one dimension; the central microlens is labeled the 0th microlens, and the maximum critical index w (N ≥ 2w + 1) denotes the farthest microlens that can participate in 3D imaging. Figure 4 shows how w varies with the focal length and object distance. Figure 5 shows the point-array EIAs for different system parameters. The focal lengths of the first to fourth rows are f = 3.0 mm, 3.1 mm, 3.2 mm, and 3.3 mm, and the object distances of the first to fourth columns are g = 3.4 mm, 3.6 mm, 3.8 mm, and 4.0 mm, respectively. It follows that the size of the reconstructed 3D voxel changes as the number of microlenses increases but stays unchanged once the number of microlenses reaches the maximum critical value. Because the number of microlenses participating in 3D imaging is fixed for given system parameters, the light field imaged by microlenses beyond w is not recorded by the detector. Therefore, only the effective microlenses within the maximum critical value w need to be considered when studying the light field convergence characteristics of the EIA.
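Under ideal geometric optics, combining the Gaussian lens law with the similar-triangle condition for w gives l/(2g) = f/(2(g − f)), so w can be tabulated over the focal lengths and object distances of Fig. 5. The helper below is hypothetical and ignores diffraction, distortion, and sampling, so it only indicates the trend behind Fig. 4 and does not reproduce that figure. A small tolerance keeps exact-integer ratios from being floored down by floating-point rounding.

```python
import math


def critical_index(f_mm: float, g_mm: float, eps: float = 1e-9) -> int:
    """w = floor(l / (2 g)) with l = f g / (g - f) from the Gaussian lens law."""
    l_mm = f_mm * g_mm / (g_mm - f_mm)
    return math.floor(l_mm / (2.0 * g_mm) + eps)


focal_lengths = [3.0, 3.1, 3.2, 3.3]   # rows of Fig. 5 (mm)
gaps = [3.4, 3.6, 3.8, 4.0]            # columns of Fig. 5 (mm)
for f in focal_lengths:
    print(f, [critical_index(f, g) for g in gaps])
```

In this idealized table, w shrinks as g moves away from f and grows with f at fixed g, which matches the statement that fewer microlenses take part in imaging as f decreases.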

B. Analysis of 3D display quality
Effects of the pixel size on the 3D display quality
The detector pixel size c affects the captured EIA and therefore indirectly affects the final 3D display quality of an integral imaging system, whereas the display pixel size $c_r$ directly affects its 3D display characteristics. This section discusses the effects of the detector and display pixel sizes on 3D display quality under two conditions: first, the detector pixel size c varies while the display pixel size $c_r$ remains constant; second, the pixel sizes of the detector and the display device vary simultaneously. The ideal voxel model proposed in reference [12] is used for comparison. The lateral and longitudinal voxel sizes calculated by this comparison method are labeled "2014 lat_vol" and "2014 dep_vol", respectively, and those of the full-chain 3D voxel calculated by the proposed model are labeled "lat_vol" and "dep_vol", respectively.
When c varies from 0.002 mm to 0.05 mm and $c_r$ remains fixed at 0.05 mm, the system field of view remains unchanged at 15.39°. Figure 6 shows how the lateral and longitudinal voxel sizes change with the detector pixel size. Because the comparison method assumes an ideal capture process that ignores the diffraction effect of the MLA, geometric distortion, and the discrete sampling effect, its lateral and longitudinal voxel sizes are smaller than those of the proposed method. Reducing the detector pixel size alone, with the other system parameters unchanged, neither decreases the 3D voxel nor improves the 3D display resolution significantly. Figure 7 shows the variation of the lateral size, longitudinal size, and image depth when the display and detector pixel sizes are equal and increase simultaneously from 0.002 mm to 0.05 mm; DOF denotes the image depth in the figure. The lateral size, longitudinal size, and image depth all increase as the detector and display pixel sizes increase together, and the longitudinal size changes more than the lateral size. If an integral imaging 3D display system requires a lateral size smaller than 0.5 mm, the detector and display pixel sizes must be less than 0.03 mm, corresponding to region B in Fig. 7. If the longitudinal voxel size must be less than 1.5 mm, the detector and display pixel sizes cannot exceed 0.027 mm, corresponding to region A in Fig. 7. If the image depth must be greater than 10 mm, the detector and display pixel sizes must be greater than 0.019 mm, corresponding to region C in Fig. 7.
The range of system parameters satisfying all the performance requirements is region D, the overlap of regions A, B, and C. That is, when the detector and display pixel sizes lie in region D, from 0.019 mm to 0.027 mm, the integral imaging 3D display system achieves a lateral size below 0.5 mm, a longitudinal size below 1.5 mm, and a depth of field above 10 mm. According to Formula (10), the optimal viewing distance for the human visual system increases with the lateral voxel size. For mobile phone displays, whose optimal viewing distance is generally 150 mm to 400 mm, the display pixel size should be kept within 0.006 mm; otherwise, the image will appear distorted or grainy. For long-distance 3D displays such as 3D cinema screens, LCD TVs, or billboards, the display pixel size requirement can be relaxed: a display pixel size within 0.05 mm ensures that the human visual system perceives continuous 3D images at 3000 mm. In this case, the parameter optimization of an integral imaging 3D display system should focus on image depth and field of view.
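The selection of region D is an interval intersection. The sketch below uses the bounds quoted in the text and treats the simulated range of 0.002 mm to 0.05 mm as the outer limits; open versus closed endpoints are ignored, and the helper name is hypothetical.

```python
def intersect(intervals):
    """Intersection of closed intervals (lo, hi); None if they do not overlap."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return (lo, hi) if lo <= hi else None


# Pixel-size bounds (mm) for equal detector and display pixels, from the text
region_a = (0.002, 0.027)   # longitudinal voxel size below 1.5 mm
region_b = (0.002, 0.030)   # lateral voxel size below 0.5 mm
region_c = (0.019, 0.050)   # image depth above 10 mm
print(intersect([region_a, region_b, region_c]))   # (0.019, 0.027)
```

Adding a further requirement, for example a bound derived from a target viewing distance, amounts to appending one more interval; an empty result signals that the requirements conflict.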
Effects of the number and aperture size of the MLA on 3D display quality
Let N denote the total number of microlenses in the MLA; then n = (N − 1)/2 is the number of microlenses from the central microlens to the edge microlens. Figure 8 shows how the lateral and longitudinal voxel sizes vary with n for different focal lengths f. Both sizes decrease as the number of microlenses increases; the rate of change gradually decreases, and the sizes remain constant once the number of microlenses reaches a critical value. This is mainly because, when g is kept constant, the number of microlenses and elemental images that can participate in 3D imaging decreases as f decreases. More microlenses can reduce the voxel size and increase the display resolution, but the number of microlenses that can participate in the 3D display has an upper limit for fixed system parameters. Microlenses beyond this limit do not contribute to the light field energy distribution in the image space and therefore do not affect the lateral or longitudinal size of the reconstructed 3D voxel. Figure 9 shows the lateral size, image depth, and field of view when the microlens aperture size p varies from 0.5 mm to 3 mm; FOV denotes the field of view in Fig. 9. As p increases from 0.5 mm to 3 mm, the field of view gradually increases from about 8° to 44°. Assuming the aperture size equals the spacing of adjacent microlenses, the field of view is approximately proportional to the microlens aperture size. As p increases, the image depth decreases from 50 mm to 8 mm and the lateral size decreases from 0.8 mm to 0.55 mm, so the trade-off between the different performance evaluation indicators must be considered.
A smaller lateral size means a higher lateral resolution and better 3D display performance. Therefore, when designing an integral imaging 3D display system, the range of system parameters can first be determined from the image depth requirement; the microlens aperture size and the number of microlenses participating in 3D imaging can then be made as large as possible within these fixed system parameters.
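To see how closely the field of view tracks the aperture size, the paraxial formula θ = 2 arctan(p/(2g)) can be swept over the 0.5 mm to 3 mm range of Fig. 9 and compared with its strictly proportional small-angle limit p/g. The gap g = 3.7 mm is hypothetical; it is chosen only because it gives roughly the 15.39° quoted earlier for p = 1 mm.

```python
import math


def fov_deg(p_mm: float, g_mm: float) -> float:
    """Paraxial field of view 2 arctan(p / (2 g)) in degrees."""
    return math.degrees(2.0 * math.atan(p_mm / (2.0 * g_mm)))


g = 3.7  # hypothetical MLA-to-display gap (mm), not given in the text
for p in (0.5, 1.0, 2.0, 3.0):
    exact = fov_deg(p, g)
    proportional = math.degrees(p / g)   # small-angle limit, strictly proportional to p
    print(f"p = {p:.1f} mm: FOV = {exact:5.2f} deg, small-angle = {proportional:5.2f} deg")
```

With these assumptions the exact value runs from about 7.7° to 44°, while the proportional approximation overshoots by roughly 2° at p = 3 mm, so "approximately proportional" is the safer reading.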
Effects of object distance and focal length on 3D display quality
Figure 10(a) shows the effects of the object distance and focal length on the lateral voxel size. For a constant focal length f, the lateral size decreases as g increases until it stabilizes. For a constant object distance g, the lateral size increases nonlinearly with f, and the differences among the lateral sizes for different values of f decrease as g increases. Figure 10(b) shows the effects of the object distance and focal length on the longitudinal voxel size, which resemble those on the lateral size in some respects. In addition, once f is reduced to a critical value, the longitudinal voxel size fluctuates as g increases, without a clear increasing or decreasing trend. Figure 10(c) shows the effects of the object distance and focal length on the depth of field, which are similar to those on the lateral size. The field of view decreases from 16.2° to 13.6° as g increases. Figures 10(a), (b), and (c) show that the object distance and focal length affect the different evaluation indicators of an integral imaging 3D display system in similar ways: as |g − f| increases, the rate of change of the 3D display quality indicators decreases. That is, the greater the difference between the object distance and the focal length, the smaller the effect of either parameter on 3D display quality. Once |g − f| reaches a critical value, the performance metrics of the integral imaging system no longer change with these parameters.
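One plausible reading of this saturation comes from the Gaussian lens law alone. Under ideal in-focus geometry, ignoring diffraction, distortion, and sampling (so this is not the full-chain model), the in-focus object distance, its sensitivities to g and f, and the critical index w are:

```latex
% Ideal in-focus geometry only (Gaussian lens law 1/g + 1/l = 1/f)
\begin{aligned}
l &= \frac{f g}{g - f}, \\
\frac{\partial l}{\partial g} &= -\frac{f^{2}}{(g - f)^{2}}, \qquad
\frac{\partial l}{\partial f} = \frac{g^{2}}{(g - f)^{2}}, \\
w &= \left\lfloor \frac{l}{2g} \right\rfloor = \left\lfloor \frac{f}{2(g - f)} \right\rfloor .
\end{aligned}
```

Both sensitivities fall off as 1/(g − f)², so, at least for the in-focus geometry, equal changes in g or f matter less and less as |g − f| grows.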
When optimizing an integral imaging 3D display system, the object distance can first be set according to the application environment. The voxel size and image depth depend nonlinearly on the object distance, while the field of view varies approximately linearly with it. The chosen object distance therefore determines the optimized parameters that the 3D display system can achieve. Next, provided the image depth and field-of-view requirements are met, increasing the aperture size and number of microlenses while reducing the pixel size and focal length can improve 3D display quality.

Conclusions
The trade-offs among the viewing parameters of an integral imaging system require the 3D display performance to be characterized comprehensively and the 3D image acquisition and display processes to be optimized together. In this paper, a full-chain performance characterization model of an integral imaging 3D display system is established, using the 3D voxel, the image depth range, and the field of view as the 3D display quality evaluation indicators. By accounting for the diffraction effect and optical aberration of the MLA, the sampling effect of the detector, system parameter mismatch, 3D image data scaling, and the human visual system, the model can accurately characterize the actual 3D light field transmission and convergence characteristics. The relationships between the key system parameters and the 3D display quality evaluation indicators are analyzed and discussed through simulation experiments. The results can provide theoretical guidance for the optimized design of integral imaging 3D display systems.

Availability of data and materials
Details of the data are provided in the manuscript.
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.