### Quality degradation mechanism of an integral imaging 3D display system

A typical integral imaging system consists of two parts: the capture process and the display process. In the capture process, the EIA recording the 3D object from different perspectives is captured by a camera with an MLA. In the display process, the captured EIA is back-projected to reconstruct the 3D images, either optically or computationally. In most cases, the parameters of the capture process and the display process are not completely symmetrical, so a light field conversion process is needed. The actual imaging process of an integral imaging 3D display system, which determines the 3D display quality, is complicated. All the steps, including the capture process, the light field conversion, and the light field display process, lead to degradation of the 3D display quality.

Figure 1 shows the full-chain light field transmission process of an integral imaging 3D display system. The degradation factors in the capture process include the diffraction effect, the optical aberration and distortion of the MLA, and the sampling effect of the recording device. The light field conversion process includes the pseudoscopic effect, 3D image data scaling, and the parameter mismatch between the capture and display devices. Integral imaging systems suffer from the pseudoscopic problem, in which the reconstructed 3D images are depth-reversed. Meanwhile, the parameters of the 3D display system usually do not match those of the capture system, so how to create an EIA with appropriate display parameters from the originally captured EIA is also an important issue. This paper discusses 3D image data scaling, a special case of parameter mismatch in which the display parameters are enlarged from the capture parameters by a common magnification factor. In the display process, the pixel discretization of the display device (such as an LCD screen), the diffraction effect, and the optical aberration and distortion of the MLA need to be considered. Last but not least, the reconstructed 3D object is received by the human visual system. Thus, the quality of the retinal image received by the human eyes should be used to guide the optimized design of an integral imaging 3D display system.

### Full-chain modeling of the integral imaging 3D display system

Based on the idea of the full-chain optical information transmission process in an integral imaging 3D display system, the degradation factors of 3D images, such as detector sampling, the diffraction effect, geometric distortion, and pixel physical size scaling, will be quantitatively described. The full-chain performance characterization model, using the 3D voxel, field of view, and image depth as the 3D display quality indicators, will be established in this section.

### A. The capture process

In the theoretical modeling, a 3D point source passes through the MLA and is recorded by the image detector to obtain the EIA. The corresponding pixels in each elemental image contain the 3D point information from different directions and lead to a spatial energy distribution of the light field with 3D point spread information. Figure 2(a) illustrates the capture process of an integral imaging 3D display system. A one-dimensional MLA is assumed for simplicity because the model can easily be extended to a two-dimensional MLA. We assume that the 3D scene is a point source and that the recording detector is arranged in a rectangular grid. The focal length of each microlens is *f*, and *g* is the distance between the MLA and the EIA. The object distance from the 3D scene to the MLA is *l*. The premise of this manuscript is that the object point is recorded at the optimal distance, given by the Gauss lens law 1/*g* + 1/*l* = 1/*f*. If an object point is not at the optimal distance, the captured elemental image array will be severely blurred, eventually resulting in a very poor 3D display. The parameters *p* and *c* represent the pitch of the MLA and the pixel size of the image detector, respectively. To describe the light propagation in the image space, we set up a coordinate system *x*-*y*-*z* in which the *z*-axis coincides with the optical axis of the central microlens and the origin is located at the center of the EIA.

Assuming that the MLA is the same size as the detector array and contains a sufficient number of microlenses, the range of microlenses through which the 3D scene can be imaged and recorded by the detector is [−*w*, *w*]. When the light ray emitted by the 3D object point through the center of a microlens is just recorded by the edge pixel of the corresponding elemental image, the maximum number of microlenses that can participate in 3D imaging satisfies the similar-triangle relation *wp*/*l* = *p*/(2*g*). Thus, to keep the light rays from leaving the area behind the corresponding microlenses, *w* can be calculated by the following equation:

$$ w=\left\lfloor \frac{l}{2g}\right\rfloor =\left\lfloor \frac{f}{2\left(g-f\right)}\right\rfloor $$

(1)

where the symbol ⌊*x*⌋ denotes the greatest integer less than or equal to the parameter *x*.
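As a quick numerical check, Eq. (1) can be evaluated directly. The sketch below is a minimal Python illustration; the parameter values (*f* = 3 mm, *g* = 3.3 mm) and the function name are hypothetical, chosen only for the example.

```python
import math

def max_microlens_index(g, f):
    """Eq. (1): w = floor(l / (2g)) = floor(f / (2(g - f))).
    Uses the in-focus condition 1/g + 1/l = 1/f, i.e. l = g*f / (g - f)."""
    return math.floor(f / (2.0 * (g - f)))

# Hypothetical parameters (mm): f = 3.0, g = 3.3, so l = 33.0.
w = max_microlens_index(g=3.3, f=3.0)  # microlenses with indices in [-w, w] contribute
```

With these values l/(2g) = 33/6.6 = 5, so the five microlenses on each side of the axis (plus the central one) can record the point.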

Suppose the MLA contains *N* × *M* microlenses, and the parameters (*m*, *n*), which satisfy (*m*, *n*) ∊ [−*w*, *w*], 2*n* + 1 ≤ *N*, and 2*m* + 1 ≤ *M*, are the indexes of the microlenses and elemental images in the *x*-direction and *y*-direction, respectively. The (*m*, *n*)-th elemental image is the critical one that contains the corresponding pixels for the 3D scene. If the MLA has no diffraction and no distortion, the principal light rays emitted from a point source A(*x*, *y*, *l*) will pass through the (*m*, *n*)-th microlens and converge on the corresponding point \( \left({x}_A^m,{y}_A^n\right) \) of the EIA. The corresponding point recorded by the detector can be calculated by the following equation:

$$ \left\{\begin{array}{l}{x}_A^m= mp\left(1+\frac{g}{l}\right)-\frac{g}{l}x\\ {}{y}_A^n= np\left(1+\frac{g}{l}\right)-\frac{g}{l}y\end{array}\right. $$

(2)
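Equation (2) is a straightforward central projection and can be sketched in a few lines; the function name and the parameter values (continuing the hypothetical *p* = 1 mm, *g* = 3.3 mm, *l* = 33 mm example) are ours, not the paper's.

```python
def corresponding_point(m, n, x, y, p, g, l):
    """Eq. (2): ideal (diffraction- and distortion-free) image point of
    the source A(x, y, l) behind the (m, n)-th microlens."""
    x_a = m * p * (1.0 + g / l) - (g / l) * x
    y_a = n * p * (1.0 + g / l) - (g / l) * y
    return x_a, y_a

# An on-axis point imaged through the m = 1 microlens lands at m*p*(1 + g/l).
x_a, y_a = corresponding_point(1, 0, 0.0, 0.0, p=1.0, g=3.3, l=33.0)
```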

Considering the diffraction effect of the MLA, each corresponding point spreads over the Airy spot radius calculated by *R* = 1.22*λg*/*D*, where the parameter *D* denotes the diameter of a microlens. The Airy spot radius is the half-width of the central lobe of the point spread function (PSF). Besides, we discuss some typical geometric distortions of the MLA, such as the barrel and pincushion distortions shown in Fig. 2(b), whose displacements are proportional to the third power of the image height. According to the Seidel aberration theory, the distortion expressions \( {\delta}_x\left({x}_A^m,{y}_A^n\right) \) and \( {\delta}_y\left({x}_A^m,{y}_A^n\right) \) are measured relative to the center of each microlens and can be written as the following formulas:

$$ \left\{\begin{array}{l}{\delta}_x\left({x}_A^m,{y}_A^n\right)={c}_d\cdot \left({x}_A^m- mp\right)\cdot \left({\left({x}_A^m- mp\right)}^2+{\left({y}_A^n- np\right)}^2\right)\\ {}{\delta}_y\left({x}_A^m,{y}_A^n\right)={c}_d\cdot \left({y}_A^n- np\right)\cdot \left({\left({x}_A^m- mp\right)}^2+{\left({y}_A^n- np\right)}^2\right)\end{array}\right. $$

(3)

where *c*_{d} is the Seidel distortion coefficient. When *c*_{d} is greater than zero, the system produces a positively distorted EIA (pincushion distortion), while *c*_{d} less than zero produces a negatively distorted EIA (barrel distortion). In the actual imaging process, the diffraction effect and the distortion effect of the MLA coexist. Thus, the light ray passing through the (*m*, *n*)-th microlens can be expressed by:

$$ \left\{\begin{array}{l}{x}_A^{m,n}= mp\left(1+\frac{g}{l}\right)-\frac{g}{l}x\pm R\pm {\delta}_x\left({x}_A^m,{y}_A^n\right)\\ {}{y}_A^{m,n}= np\left(1+\frac{g}{l}\right)-\frac{g}{l}y\pm R\pm {\delta}_y\left({x}_A^m,{y}_A^n\right)\end{array}\right. $$

(4)
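Equations (3) and (4) combine the Airy radius with the Seidel offsets. A minimal sketch of the x-axis case follows; we read the ± signs in Eq. (4) as the extreme positions the recorded spot can reach, and all parameter values (λ = 550 nm, *D* = 1 mm, etc.) and function names are hypothetical.

```python
def airy_radius(lam, g, D):
    """R = 1.22 * lambda * g / D: half-width of the central PSF lobe."""
    return 1.22 * lam * g / D

def seidel_distortion(x_a, y_a, m, n, p, c_d):
    """Eq. (3): distortion offsets about the (m, n)-th microlens center.
    c_d > 0 gives pincushion distortion, c_d < 0 gives barrel distortion."""
    dx, dy = x_a - m * p, y_a - n * p
    r2 = dx * dx + dy * dy
    return c_d * dx * r2, c_d * dy * r2

def recorded_span_x(m, x, p, g, l, lam, D, c_d, y_a=0.0, n=0):
    """Eq. (4), x-axis: the ideal point +/- the Airy radius +/- the
    distortion offset, taken here as the recorded extent of the spot."""
    x_a = m * p * (1.0 + g / l) - (g / l) * x
    r = airy_radius(lam, g, D)
    d_x, _ = seidel_distortion(x_a, y_a, m, n, p, c_d)
    return x_a - r - abs(d_x), x_a + r + abs(d_x)

# Units: mm throughout; 550 nm = 0.55e-3 mm.
lo, hi = recorded_span_x(1, 0.0, 1.0, 3.3, 33.0, 0.55e-3, 1.0, 0.0)
```

With *c*_{d} = 0 the span width reduces to 2*R*, the pure diffraction blur.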

However, in general, the recording detector has a limited, discrete number of pixels. To avoid crosstalk, the size of each elemental image must be the same as that of each microlens in the capture process. Thus, the pixel number *K* of each elemental image satisfies *Kc* = *p*. The pixelation of the actual capture position range for each microlens is implemented as depicted in Fig. 2(c) and can be accomplished as follows:

$$ {\displaystyle \begin{array}{l}{k_{\mathrm{min}}^{\left(m,n\right)}}_x=\left\lfloor \frac{\min \left({x}_A^{m,n}\right)- mp+\frac{g}{l}x}{c}\right\rfloor +\frac{mp-\frac{g}{l}x}{c}\\ {}{k_{\mathrm{max}}^{\left(m,n\right)}}_x=\left\lceil \frac{\max \left({x}_A^{m,n}\right)- mp+\frac{g}{l}x}{c}\right\rceil +\frac{mp-\frac{g}{l}x}{c}\\ {}{k_{\mathrm{min}}^{\left(m,n\right)}}_y=\left\lfloor \frac{\min \left({y}_A^{m,n}\right)- np+\frac{g}{l}y}{c}\right\rfloor +\frac{np-\frac{g}{l}y}{c}\\ {}{k_{\mathrm{max}}^{\left(m,n\right)}}_y=\left\lceil \frac{\max \left({y}_A^{m,n}\right)- np+\frac{g}{l}y}{c}\right\rceil +\frac{np-\frac{g}{l}y}{c}\end{array}} $$

(5)

where the symbol ⌈*x*⌉ denotes the smallest integer greater than or equal to the parameter *x*, and min() and max() are the minimum and maximum operators. The corresponding pixels of the (*m*, *n*)-th elemental image are recorded by the detector from pixel number \( {k_{\mathrm{min}}^{\left(m,n\right)}}_x \) to \( {k_{\mathrm{max}}^{\left(m,n\right)}}_x \) in the *x*-direction and from \( {k_{\mathrm{min}}^{\left(m,n\right)}}_y \) to \( {k_{\mathrm{max}}^{\left(m,n\right)}}_y \) in the *y*-direction. If a 3D point source is captured, the elemental image array will be a point spread spot array.
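The floor/ceil pixelation of Eq. (5) can be sketched for one axis as below; the re-centring offset outside the rounding matches the (mp − (g/l)x)/c term in the equation, and the function name and sample numbers are hypothetical.

```python
import math

def pixel_index_range(span, m, p, c, g_over_l, coord):
    """Eq. (5), one axis: pixel-index bounds covering the recorded span.
    `span` is (min, max) of x_A^{m,n}; the chief-ray offset
    (m*p - (g/l)*coord)/c is added back outside the floor/ceil."""
    offset = (m * p - g_over_l * coord) / c
    k_min = math.floor((span[0] - m * p + g_over_l * coord) / c) + offset
    k_max = math.ceil((span[1] - m * p + g_over_l * coord) / c) + offset
    return k_min, k_max

# A +/- 2.1 um spot on a 5 um pixel grid under the central (m = 0) lens
# spills into the pixels on either side of the chief ray.
k_lo, k_hi = pixel_index_range((-0.0021, 0.0021), 0, 1.0, 0.005, 0.1, 0.0)
```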

### B. The display process

Essentially, the optical display process is the inverse of the capture process. The display device placed at the focal plane of the MLA displays the EIA, and the MLA gathers the light field information to reconstruct the 3D scene in image space. When a 3D object is captured by the MLA and displayed by the same MLA in an ideal integral imaging 3D display system, the reconstructed image is located at the exact position of the original object. However, the parameters of the display process usually do not match those of the capture process. Owing to the limitations of display manufacturing technology, the pixel size of the display device is always much larger than that of the recording detector. When devices of different specifications, such as a different MLA, are used, the reconstructed image will be located at a position different from that of the original object. Moreover, there are scaling changes along the lateral and longitudinal directions. For deriving the viewing parameters of the system performance, such as the 3D voxel, field of view, and image depth range, geometric optics can be used as an approximation for simplicity, because the pixel size of display devices in common use is large enough to neglect diffraction in the display process.

In the display process, as shown in Fig. 3, the captured EIA is displayed on the display device and reconstructed through the MLA. The *x*-*y*-*z* coordinate system is set up with the *x*-*y* plane on the display device and its origin at the center of the EIA. Assume that the pixel size of the display screen is *c*_{r} and that *M* represents the magnification between *c*_{r} and *c*, that is, *M* = *c*_{r}/*c*. The distance between the display device and the display MLA is *g*_{r}, and the distance between the display microlens and the reconstructed image plane is *l*_{r}. The relationship between *g*_{r} and *l*_{r} satisfies the Gauss lens law 1/*g*_{r} + 1/*l*_{r} = 1/*f*_{r}, where *f*_{r} is the focal length of each display microlens.

This paper discusses the 3D image data scaling in which the display parameters are rescaled in the lateral and longitudinal directions by the factor *M*. The relationship between the display parameters and the capture parameters is as follows:

$$ \left\{\begin{array}{l}{f}_r=M\times f\\ {}{g}_r=M\times g\\ {}{p}_r=M\times p\\ {}{D}_r=M\times D\\ {}{l}_r=M\times l\end{array}\right. $$

(6)
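Because every length in Eq. (6) scales by the same factor *M*, the Gauss lens law is automatically preserved on the display side; a minimal sketch (parameter values hypothetical, dictionary layout ours):

```python
def scale_parameters(capture, M):
    """Eq. (6): rescale every capture-side length (f, g, p, D, l)
    by the pixel-size magnification M = c_r / c."""
    return {name: M * value for name, value in capture.items()}

# Hypothetical capture parameters (mm) satisfying 1/g + 1/l = 1/f.
display = scale_parameters({"f": 3.0, "g": 3.3, "p": 1.0, "D": 1.0, "l": 33.0}, M=4.0)
# 1/g_r + 1/l_r = 1/f_r still holds, since multiplying all lengths by M
# divides each reciprocal by the same M.
```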

The corresponding pixels of the original 3D point overlap in image space and lead to a spatial energy distribution of the light field. This overlapping region cannot be resolved by the viewer unless the signal-to-noise ratio (SNR) is high enough. We define the energy-overlap region with the highest SNR as the full-chain 3D voxel, which is approximated as a cuboid for convenience and simplification. The depth coordinate of the reconstructed 3D point *Z*_{r}, the lateral size of the 3D voxel *H*_{r}, and the longitudinal size of the 3D voxel *D*_{r} can be defined by the following formulas:

$$ {D}_r={D}^{j,h}=\min \left\{\mathrm{abs}\left(\frac{mp_r\cdot {g}_r}{c_r\cdot {k_{\mathrm{min}}^{\left(m,n\right)}}_x-{mp}_r}-\frac{mp_r\cdot {g}_r}{c_r\cdot {k_{\mathrm{max}}^{\left(m,n\right)}}_x-{mp}_r}\right),m,n=1,\dots, w\right\} $$

(7)

$$ {Z}_r=\frac{\left({D}_{\mathrm{max}}^{j,h}+{D}_{\mathrm{min}}^{j,h}\right)}{2} $$

(8)

$$ {H}_r=\frac{c_r\cdot {k_{\mathrm{max}}^{\left(0,0\right)}}_x-{c}_r\cdot {k_{\mathrm{min}}^{\left(0,0\right)}}_x}{g_r}{Z}_r $$

(9)

where the longitudinal size *D*_{r} is determined by the corresponding points of the (*j*, *h*)-th elemental image passing through the (*j*, *h*)-th microlens. In general, the index (*j*, *h*) is the maximum index of the elemental images that can contribute to the full-chain 3D voxel. That is to say, the longitudinal size of the voxel is decided by the projection of the corresponding pixels in the marginal elemental images, while the lateral size is decided by the projection of the corresponding pixels of the central elemental image on the reconstructed image plane. The lateral and longitudinal resolutions can be defined as the reciprocals of the lateral size *H*_{r} and the longitudinal size *D*_{r}, respectively. Therefore, they are mainly affected by the number of microlenses in the MLA, the microlens aperture, and the resolution of the display device. Increasing the number of microlenses by space-time multiplexing or using computational integral imaging reconstruction to overcome the limitation of display device resolution can improve the 3D display resolution.
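Under one reading of Eqs. (7)–(9), in which the depth span projected through the most marginal contributing lens sets both *D*_{r} and *Z*_{r}, the voxel metrics can be sketched as follows. The pixel bounds, parameter values, and function name are hypothetical, and the one-axis simplification is ours.

```python
def voxel_metrics(k_bounds, p_r, g_r, c_r):
    """Sketch of Eqs. (7)-(9), one axis.
    k_bounds maps microlens index m -> (k_min, k_max) pixel range from
    Eq. (5); index 0 is the central elemental image (lateral size)."""
    spans = {}
    for m, (k_min, k_max) in k_bounds.items():
        if m == 0:
            continue  # Eq. (7) runs over the off-axis lenses m = 1..w
        z1 = m * p_r * g_r / (c_r * k_min - m * p_r)
        z2 = m * p_r * g_r / (c_r * k_max - m * p_r)
        spans[m] = (z1, z2)
    # Eq. (7): longitudinal voxel size = smallest projected depth span
    j = min(spans, key=lambda i: abs(spans[i][0] - spans[i][1]))
    d_r = abs(spans[j][0] - spans[j][1])
    # Eq. (8): reconstruction depth = midpoint of that span
    z_r = (spans[j][0] + spans[j][1]) / 2.0
    # Eq. (9): lateral size from the central elemental image
    k0_min, k0_max = k_bounds[0]
    h_r = (c_r * k0_max - c_r * k0_min) / g_r * z_r
    return d_r, z_r, h_r
```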

The ultimate goal of 3D display is to present high-quality 3D content to the human visual system. According to the theory of human visual perception, if the angle subtended at the pupil of the human eye by the lateral size of the voxel is less than 1′, the voxel satisfies the resolution limit of the human eye and the system can be regarded as a perfect 3D display. We choose the median of the optimal viewing distance range [16, 17] as the optimal viewing distance. The optimal viewing distance *d*_{OVD} can be expressed as:

$$ {d}_{OVD}=\frac{5{\mathrm{H}}_r}{8\tan \left({\alpha}_e/2\right)} $$

(10)

where *α*_{e} = 2.9 × 10^{−4}rad.
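Equation (10) is a one-liner once *α*_{e} is fixed; a minimal sketch (function name ours, *H*_{r} value hypothetical):

```python
import math

ALPHA_E = 2.9e-4  # human-eye angular resolution limit (~1 arcmin), rad

def optimal_viewing_distance(h_r):
    """Eq. (10): d_OVD = 5*H_r / (8*tan(alpha_e / 2)), the median of the
    optimal viewing distance range."""
    return 5.0 * h_r / (8.0 * math.tan(ALPHA_E / 2.0))

# A 1 mm lateral voxel puts the optimal viewing distance at ~4.3 m.
d_ovd = optimal_viewing_distance(1.0)
```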

This paper uses the full-chain 3D voxel, the observable image depth range, and the field of view as the 3D display quality evaluation indicators. According to the paraxial optics theorem, the field of view can be expressed as *θ* = 2 arctan(*p*_{r}/(2*g*_{r})). All depths at which the full-chain voxel remains resolvable can be regarded as tolerable defocused depths, and the whole observable image depth range is the set of those depths. The equations of the observable image depth range are given by:

$$ \left\{\begin{array}{l}{z}_{\mathrm{min}}=\frac{g_r{f}_r{D}_r}{\left({g}_r-{f}_r\right){D}_r+{f}_r{p}_r}\\ {}{z}_{\mathrm{max}}=\frac{g_r{f}_r{D}_r}{\left({g}_r-{f}_r\right){D}_r-{f}_r{p}_r}\\ {}\Delta z={z}_{\mathrm{max}}-{z}_{\mathrm{min}}=\frac{2{g}_r{p}_r{f_r}^2{D}_r}{{\left({g}_r-{f}_r\right)}^2{D_r}^2-{f_r}^2{p_r}^2}\end{array}\right. $$

(11)
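The depth bounds of Eq. (11) follow directly from the two fractions; a minimal sketch (function name and parameter values hypothetical, continuing the earlier example numbers):

```python
def observable_depth_range(g_r, f_r, p_r, d_r):
    """Eq. (11): nearest/farthest planes still resolved at the full-chain
    voxel size D_r, plus the total observable depth range."""
    a = (g_r - f_r) * d_r
    b = f_r * p_r
    z_min = g_r * f_r * d_r / (a + b)
    z_max = g_r * f_r * d_r / (a - b)
    dz = z_max - z_min  # equals 2*g_r*p_r*f_r**2*d_r / (a**2 - b**2)
    return z_min, z_max, dz

z_min, z_max, dz = observable_depth_range(g_r=3.3, f_r=3.0, p_r=1.0, d_r=13.75)
```

Note that *z*_{max} is only finite when (*g*_{r} − *f*_{r})*D*_{r} > *f*_{r}*p*_{r}; otherwise the far bound extends to infinity.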