Binary defocusing technique based on complementary decoding with unconstrained dual projectors

The binary defocusing technique can effectively break the limitation of hardware speed and has been widely used in real-time three-dimensional (3D) reconstruction. In addition, fusion techniques can reduce the number of captured images for a 3D scene, which helps to improve real-time performance. Unfortunately, it is difficult for the binary defocusing technique and fusion techniques to work simultaneously. To this end, our research established a novel system framework consisting of dual projectors and a camera, where the position and posture of the dual projectors are not strictly constrained and the two projectors can adjust their defocusing levels independently. Based on this, this paper proposes a complementary decoding method with unconstrained dual projectors. The core idea is that low-resolution information is employed for high-resolution phase unwrapping. For this purpose, we developed a low-resolution depth extraction strategy based on periodic space-time coding patterns and a method for extending the low-resolution fringe order to the high-resolution fringe order. Finally, experimental results demonstrate the performance of the proposed method, which requires only three images for a 3D scene and offers strong robustness, expansibility, and ease of implementation.


Introduction
Fringe projection profilometry (FPP) plays an important role in academic and applied fields, such as product inspection, reverse engineering, and computer animation, because of its advantages of being high precision, full field, and nondestructive [1][2][3][4][5]. A series of fringe patterns generated by a computer is cast onto an object by a projector. Then, a camera acquires the patterns distorted by the object surface geometry, and the three-dimensional (3D) profile of the object is obtained by the reconstruction algorithms [6]. As technical requirements develop, real-time performance is increasingly important [7]. Generally, in order to enhance the real-time capability, we need either to increase the hardware speed or to decrease the number of captured images required to reconstruct a 3D scene.
Recently, with the development of projector technology, the refresh rate of specialized projectors can exceed 10,000 frames/s in 1-bit mode. This capability benefits from the digital-mirror-device (DMD) technique developed by Texas Instruments (TI). For example, the TI D4100 projector can reach 3.255 × 10^4 frames/s in binary mode with 1024 × 768 resolution [4]. Based on this, researchers have proposed the binary defocusing technique (BDT), which breaks the limitation of the maximum refresh rate of commercial projectors. Around 2009, Lei et al. [8] developed flexible 3D reconstruction with projector defocusing, which successfully achieved a breakthrough in rate. Nevertheless, the quality is unsatisfactory due to the influence of high-order harmonics. In 2010, researchers introduced pulse width modulation (PWM) techniques into BDT [9]. The method alleviates the influence of high-order harmonics, but it has two defects: (1) the fringe width limits its improvement, and (2) the one-dimensional (1D) fringe of the PWM technique does not suit the two-dimensional nature of BDT. In 2012, Lohry et al. [10] proposed to approximate the triangular waveform by modifying 2 × 2 pixels so that the cast patterns are better suited to BDT. More recently, researchers introduced dithering techniques into BDT in order to significantly improve the quality of the cast fringe patterns. Because dithering techniques represent grayscale images with binary images for printing and processing, they yield fringe patterns of higher quality than the previous 1D defocusing methods. The relevant research mainly includes the Bayer method [11], the error-diffusion method [12] and the iteration method [13]. In conclusion, the speed of the projector has been greatly improved by BDT, so the hardware speed limitation now falls on the camera. For the camera, if its refresh rate is increased, the real-time performance can be enhanced to some extent. However, the refresh rate is an inherent property of the camera. Hence, the more effective means is to reduce the number of captured images needed for a 3D scene reconstruction.
Decoding a single captured frame from spatial information alone usually yields unsatisfactory resolution and precision. To address this problem, some researchers considered decoding with multiple images, for example, phase-shifting measurement profilometry (PMP). In PMP, the three-step phase-shifting (3-PS) algorithm in particular is fast, since it captures the minimum number of phase-shifting (PS) patterns needed to reconstruct a 3D scene. However, the algorithm inevitably needs to unwrap the wrapped phase. Generally, there are two types of phase unwrapping methods: spatial phase unwrapping and temporal phase unwrapping. Spatial phase unwrapping does not require additional decoding images, but it fails when the reconstructed surfaces are abrupt, missing, or discrete. Hence, temporal phase unwrapping is still necessary, and fusion strategies are introduced to reduce the number of captured images. Normally, the fusion strategies can be divided into pattern fusion and multi-device fusion. In pattern fusion, there are two ways: fusion based on a single channel (i.e., gray) [14] and fusion based on multiple channels (i.e., color) [15]. The gray fusion is a superposition of patterns with different spatial frequencies or angles, and the color fusion exploits the independence of the color channels. It is noted that both have some defects, such as low resolution, low contrast or color influence. More importantly, they are not suitable for BDT, because it is impossible for a single projector to have multiple defocusing levels at the same time. In multi-device fusion, an additional device is introduced. Zhang et al. [16] proposed a fusion framework which consists of a Kinect and a PMP system. The framework used the low-resolution depth information of the Kinect to guide the unwrapping of the wrapped phase. The method captures three images to reconstruct a 3D scene, but the system rate is limited by the use of the Kinect. We also proposed a reconstruction method for the binary defocusing technique based on complementary decoding with dual projectors [17], which includes two strategies: the strict-alignment method and the loose-alignment method. These two strategies impose some requirements on the position and posture of the dual projectors, which lead to a lack of flexibility and high algorithmic complexity.
In this paper, we present a novel binary defocusing technique based on complementary decoding with unconstrained dual projectors. A novel system framework with dual projectors and a camera is built. In our research, the position and posture of the introduced projectors are not strictly constrained, and the common projection area only needs to cover the measured volume. The core idea is that low-resolution information is employed for high-resolution phase unwrapping. The low-resolution information is mainly the low-resolution order of fringe in PMP, which is obtained from the low-resolution depth relation. The dual projectors each have their own calibration relationship with the single camera, and the whole system shares the same calibration reference. Therefore, the relation can be expressed as an equation between the low-resolution depth calculated by the binary PMP model and the low-resolution depth computed by our proposed low-resolution depth extraction strategy based on periodic space-time coding patterns (hereafter called LDES). In practice, the low-resolution order is obtained by optimizing this relation. Moreover, we also propose a strategy for extending the low-resolution order to the high-resolution order. Finally, the high-resolution unwrapped phase is easily obtained from the high-resolution order. The experimental results are presented to demonstrate the success of the proposed method, which requires only three captured images for a 3D scene and has strong robustness, flexibility, and ease of implementation, since the constraint on the position and posture of the two introduced projectors is avoided.
The organization of this paper is as follows. In Section 2, the principle of the proposed method is given in detail, which includes the theoretical system framework and the complementary decoding method with unconstrained dual projectors. In Section 3, the experiments used to evaluate the proposed method are reported. Section 4 concludes this paper.

Principle
In this section, we theoretically present the BDT based on complementary decoding with unconstrained dual projectors, including the strategy from the low-resolution order to the high-resolution order of fringe.

Theoretical system framework
Figure 1 shows the schematic diagram of the system framework. The framework mainly includes a multiband camera and dual projectors. The dual projectors cast patterns in two different wavebands, each either defocused or focused. In our research, we select the Light-Crafter 3000 evaluation module of Texas Instruments (hereafter called EVM; see the red dotted circle in Fig. 1) as the projector, since it offers good programmability and synchronization. The two EVMs are placed side by side and cast blue and red patterns, respectively. Next, the patterns are fused in the measured volume (see the green dotted circle in Fig. 1). A color camera substitutes for the multiband camera because it has three channels: red, green, and blue. The color camera captures the fusion images distorted by the object surface geometry. Then, the captured fusion images are separated and the 3D profile of the object is reconstructed. Here, the position and posture of the two EVMs are not strictly constrained, but it is noted that the common projection area should cover the measurement area.
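As an illustration of the separation step, the following minimal sketch splits a captured fusion image into its red-channel and blue-channel images. It is only a sketch under stated assumptions: the image is represented as nested lists of (r, g, b) tuples, crosstalk between the color channels is ignored, and the function name `separate_fusion_image` is hypothetical rather than part of the described system.

```python
def separate_fusion_image(rgb_image):
    """Split an RGB fusion image into red- and blue-channel images.

    rgb_image: list of rows, each row a list of (r, g, b) tuples.
    Returns (red_image, blue_image) as lists of rows of intensities.
    Assumption: channel crosstalk is negligible, so each projector's
    pattern is recovered from one color channel alone.
    """
    red_image = [[pixel[0] for pixel in row] for row in rgb_image]
    blue_image = [[pixel[2] for pixel in row] for row in rgb_image]
    return red_image, blue_image


# Toy 2 x 2 fusion image used only to exercise the function.
fusion = [
    [(200, 10, 30), (0, 5, 255)],
    [(120, 0, 0), (60, 20, 90)],
]
red, blue = separate_fusion_image(fusion)
```

In a real system some compensation for the spectral overlap of the camera's color filters would probably be needed before decoding.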

Complementary decoding method with unconstrained dual EVMs
The proposed method requires only three captured images for a 3D scene reconstruction, regardless of the relative position and posture of the two EVMs. The core idea is that low-resolution information is employed for high-resolution phase unwrapping. Moreover, there are two important points to notice: (1) the common projection area should cover the measurement area; (2) the dual EVMs each have their own calibration relationship with the single camera, but the whole system shares the same calibration reference. Suppose that the points P(x_low, y_low) are the low-resolution pixels; then the following relation holds in theory:

$$D_{PMP}(x_{low}, y_{low}) = D_{LDES}(x_{low}, y_{low}) \tag{1}$$

where D_PMP(x_low, y_low) = Cal[n(x_low, y_low)] is the low-resolution depth calculated by the binary PMP model from the fringe order n(x_low, y_low), and D_LDES(x_low, y_low) is the low-resolution depth computed by the LDES. In practice, Eq. (1) needs to be transformed into an optimization objective function, which can be expressed as:

$$\hat{n}(x_{low}, y_{low}) = \arg\min_{n} \left| \mathrm{Cal}[n(x_{low}, y_{low})] - D_{LDES}(x_{low}, y_{low}) \right| \tag{2}$$

The formula above means finding the optimal n(x_low, y_low) so that D_PMP(x_low, y_low) and D_LDES(x_low, y_low) are as close as possible. Then, for high-resolution phase unwrapping, the optimal n(x_low, y_low) needs to be extended to n(x_high, y_high).
Section 2.B.1 illustrates the operator Cal[•], which can be regarded as the binary PMP model. Section 2.B.2 proposes the low-resolution depth extraction strategy based on periodic space-time coding patterns, which is used to solve for D_LDES(x_low, y_low). Section 2.B.3 explains the strategy from n(x_low, y_low) to n(x_high, y_high), which is used for high-resolution phase unwrapping.
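The search for the optimal low-resolution order at a single point can be sketched as a brute-force comparison over candidate integer orders. This is a minimal sketch, not the authors' implementation: it assumes the standard relations that the unwrapped phase equals the wrapped phase plus 2πn and that the relative phase is taken against a reference-plane phase, and it takes the phase-to-depth mapping as a caller-supplied function. The names `best_fringe_order` and `toy_phase_to_depth`, and the linear toy mapping, are hypothetical.

```python
import math


def best_fringe_order(wrapped_phase, reference_phase, depth_ldes,
                      phase_to_depth, candidate_orders):
    """Return the order n whose PMP depth is closest to the LDES depth."""
    best_order, best_error = None, math.inf
    for n in candidate_orders:
        unwrapped = wrapped_phase + 2.0 * math.pi * n
        depth_pmp = phase_to_depth(unwrapped - reference_phase)
        error = abs(depth_pmp - depth_ldes)
        if error < best_error:
            best_order, best_error = n, error
    return best_order


def toy_phase_to_depth(delta_phi):
    # Hypothetical linear stand-in (2 mm per radian) for the calibrated
    # nonlinear phase-to-height mapping, which is not reproduced here.
    return 2.0 * delta_phi


order = best_fringe_order(
    wrapped_phase=0.5,
    reference_phase=0.0,
    depth_ldes=38.0,
    phase_to_depth=toy_phase_to_depth,
    candidate_orders=range(0, 10),
)
```

Because neighboring orders differ by a full 2π in phase, the coarse LDES depth only needs to be accurate to within about half of the corresponding depth step for the correct order to win the comparison.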

Illustration of operator Cal[•]
The operator Cal[•] describes how the wrapped phase φ(x_low, y_low) is converted to the depth D_PMP(x_low, y_low), and its procedure can be described as a five-step process (see Fig. 2, where (x_low, y_low) is omitted to simplify the representation). The five steps are illustrated in detail below.
Step 1: The binary PMP model based on the 3-PS algorithm is introduced, since it requires the minimum number of phase-shifting patterns [13]. For the algorithm with a phase shift of 2π/3, the fringe patterns can be mathematically expressed as

$$\begin{aligned} I_1(x_{low}, y_{low}) &= a + b\cos\left[\varphi(x_{low}, y_{low}) - 2\pi/3\right] \\ I_2(x_{low}, y_{low}) &= a + b\cos\left[\varphi(x_{low}, y_{low})\right] \\ I_3(x_{low}, y_{low}) &= a + b\cos\left[\varphi(x_{low}, y_{low}) + 2\pi/3\right] \end{aligned} \tag{3}$$

where I is the intensity, a is the average intensity, b is the intensity modulation and φ(x_low, y_low) is the wrapped phase. As shown in Fig. 2, I_1, I_2 and I_3 are the 3-PS patterns, which are projected into the measured volume.
Step 2: The camera captures the reflected images and the wrapped phase is calculated through the following equation:

$$\varphi(x_{low}, y_{low}) = \tan^{-1}\left[\frac{\sqrt{3}\,(I_1 - I_3)}{2I_2 - I_1 - I_3}\right] \tag{4}$$

In Fig. 2, φ denotes the wrapped phase map of the captured images.
Step 3: Eq. (4) provides a phase ranging over [−π, +π] with 2π discontinuities. That is to say, we need to unwrap φ(x_low, y_low), i.e., obtain the true value of n(x_low, y_low):

$$\Phi(x_{low}, y_{low}) = \varphi(x_{low}, y_{low}) + 2\pi\, n(x_{low}, y_{low}) \tag{5}$$

where Φ(x_low, y_low) is the unwrapped phase. In Fig. 2, Φ denotes the unwrapped phase map of the captured images.
Step 4: The calibration model requires the unwrapped phase relative to the reference plane, i.e., the relation can be expressed as

$$\Delta\Phi(x_{low}, y_{low}) = \Phi(x_{low}, y_{low}) - \Phi_0(x_{low}, y_{low}) \tag{6}$$

where Φ_0(x_low, y_low) is the unwrapped phase map of the reference plane.
Step 5: The nonlinear phase-to-height mapping [18] is introduced to obtain the depth D_PMP(x_low, y_low) from the relative unwrapped phase ΔΦ(x_low, y_low) [Eq. (7)], where α, β and γ represent the calibration parameters, which are obtained by moving a flat board over known distances.
In conclusion, the procedure from Step 1 to Step 5 represents the operator Cal[•]. Eq. (7) establishes the relationship between ΔΦ(x_low, y_low) and D_PMP(x_low, y_low). Moreover, based on Eqs. (5) and (6), ΔΦ(x_low, y_low) is related to n(x_low, y_low) and φ(x_low, y_low), and φ(x_low, y_low) can be uniquely solved through Eq. (4). Therefore, the operator Cal[•] can also be described as the relationship between D_PMP(x_low, y_low) and n(x_low, y_low), i.e., D_PMP(x_low, y_low) = Cal[n(x_low, y_low)].
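Steps 1 to 4 can be sketched for a single pixel as follows. This is a minimal sketch under stated assumptions: it uses the standard three-step formulas with shifts of −2π/3, 0 and +2π/3, treats the defocused binary patterns as ideal sinusoids, and leaves out the calibrated phase-to-height mapping of Step 5. All function names are hypothetical.

```python
import math


def three_step_patterns(phase, a=0.5, b=0.5):
    """Ideal 3-PS intensities for one pixel (shift of 2*pi/3)."""
    shift = 2.0 * math.pi / 3.0
    return (
        a + b * math.cos(phase - shift),
        a + b * math.cos(phase),
        a + b * math.cos(phase + shift),
    )


def wrapped_phase(i1, i2, i3):
    """Wrapped phase from three captured intensities.

    atan2 is used instead of a plain arctangent so that the result
    covers the full (-pi, pi] range.
    """
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)


def relative_unwrapped_phase(phi, order, reference_phase):
    """Unwrap with a known fringe order, then subtract the reference."""
    return phi + 2.0 * math.pi * order - reference_phase


# Round trip: a true phase of 1.2 rad is recovered from its patterns.
i1, i2, i3 = three_step_patterns(1.2)
phi = wrapped_phase(i1, i2, i3)
```

The average intensity a and the modulation b cancel in the ratio, which is why the wrapped phase does not depend on them.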

Solution of depth D LDES (x low , y low )
For the depth D_LDES(x_low, y_low), we propose a low-resolution depth extraction strategy based on periodic space-time coding patterns (hereafter called LDES). In the description of Cal[•] above, the 3-PS algorithm is introduced, i.e., one of the two EVMs casts three PS patterns. Therefore, in order to balance the number of projected patterns of the dual EVMs, the other EVM also projects three patterns. That is, three patterns should be designed for the LDES. As shown in Fig. 3, the LDES has three parts: Encoding, Decoding and Calibration. Encoding generates three LDES patterns, which are projected into the measured volume; Decoding recovers the encoded information from the captured LDES images; Calibration establishes the mapping relationship from the decoded information to depth.
Encoding
Figure 4 shows a group of single-period structures, which are the basis for designing the LDES patterns. In Fig. 4, the sub-figures (a), (b) and (c) show a synchronization structure and two coding structures. For the synchronization structure, the fringe edges are used to determine the position of the low-resolution points. According to the shift from black to white or from white to black, the code word can take two states: [0, 1] or [1, 0]. Similarly, for each coding structure, there can be four states: [0, 0], [0, 1], [1, 0] or [1, 1], where [0, 0] and [1, 1] indicate that both adjacent pixels are black and white, respectively. For example, in Fig. 4, the red dotted line crosses the boundary from white to black, so the code word can be expressed as [1, 0]. Meanwhile, the red dotted line also crosses the coding structures A and B, whose code words are [0, 1] and [0, 1]. Combining these three code words, the full code word can be written as [1, 0, 0, 1, 0, 1], which corresponds to the encoded value 37 when converted from binary to decimal. Overall, the synchronization structure has two states and each coding structure has four states. So, a group of single-period structures has 32 encoded states (i.e., 32 = 2 × 4 × 4). The encoding table is shown in Table 1, which can also be used for decoding.
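The conversion from code words to the encoded value can be sketched as follows. This is a minimal illustration that concatenates the synchronization word and the two coding words and reads the result as a binary number; it enumerates the 32 possible states but does not reproduce the actual layout of Table 1. The function and constant names are hypothetical.

```python
from itertools import product

SYNC_WORDS = [(0, 1), (1, 0)]                     # two edge directions
CODING_WORDS = [(0, 0), (0, 1), (1, 0), (1, 1)]   # four states each


def encoded_value(sync_word, word_a, word_b):
    """Concatenate three 2-bit code words and convert to decimal."""
    bits = list(sync_word) + list(word_a) + list(word_b)
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


# Worked example from the text: [1, 0] + [0, 1] + [0, 1] -> 37.
example = encoded_value((1, 0), (0, 1), (0, 1))

# All 2 x 4 x 4 = 32 combinations yield distinct values.
all_values = [
    encoded_value(s, a, b)
    for s, a, b in product(SYNC_WORDS, CODING_WORDS, CODING_WORDS)
]
```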
Generally, if a single period covers the full measurement area, the resolution will be low. Hence, the projected LDES patterns exploit periodicity by tiling the single-period structures horizontally. However, points with the same encoded value in different periods may interfere with each other. To this end, Ref. [19] presented a criterion [Eq. (8)] that determines the appropriate number of periods, where Δ is the number of pixels in a stripe, n_t is the number of encoded values in a single period, N_x is the horizontal resolution of the camera, d_h and d are the measurement depth and the working distance, and p is the number of periods. When the number of periods p satisfies Eq. (8), there is no confusion in the decoding process, i.e., the maximum value of p is crucial. In our experiment, the system parameters are approximately d_h ≈ 150 mm, d ≈ 800 mm, N_x = 640 pixels, and n_t = 32. Based on Eq. (8), the parameter p should satisfy p < 5.33, and we set p = 4.
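Decoding
The three designed LDES patterns are projected into the measured volume. Then, the camera captures the reflected images. Next, the computer decodes them, and the procedure mainly includes the following steps.
Step 1: For the captured synchronization image, the gradient is calculated, and the edges are determined according to the gradient. The pixels on the edges are the low-resolution points P(x_low, y_low). If the gradient is greater than zero, the point P(x_low, y_low) is on a positive edge, i.e., the code word is [0, 1]. On the contrary, if the gradient is less than zero, the point P(x_low, y_low) is on a negative edge, i.e., the code word is [1, 0].
Step 2: The two captured coding images are converted into binary images. In order to enhance robustness, a local neighborhood of the point P(x_low, y_low) is considered for decoding. The length of the local neighborhood is set to five, i.e., there is a binary list L_p = [p_−2, p_−1, p_0, p_1, p_2], where p_0 represents the value (0 or 1) of the point P(x_low, y_low). Similarly, p_−2, p_−1, p_1 and p_2 correspond to the values of the points P(x_low−2, y_low−2), P(x_low−1, y_low−1), P(x_low+1, y_low+1) and P(x_low+2, y_low+2), respectively. The decoding rules are based on the gradient of this list, grad[p_−2, p_−1, p_0, p_1, p_2] [Eq. (9)]: (a) if the gradient of the list is less than zero, the code word of the point P(x_low, y_low) is [1, 0]; (b) if the gradient of the list is greater than zero, the code word of the point P(x_low, y_low) is [0, 1]; (c) if the values from p_−2 to p_2 are all zero, the code word of the point P(x_low, y_low) is [0, 0]; (d) if the values from p_−2 to p_2 are all one, the code word of the point P(x_low, y_low) is [1, 1].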
Step 3: By combining the code words from Step 1 and Step 2 and looking them up in Table 1, the unique encoded value of the point P(x_low, y_low) is obtained.
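Rules (a) to (d) for one coding image can be sketched as below. The exact definition of the gradient of the five-element list is not reproduced in this text, so the sketch assumes it is the net change from the leftmost to the rightmost element; lists that are neither uniform nor show a net change are reported as undecodable. The function name `decode_neighborhood` is hypothetical.

```python
def decode_neighborhood(values):
    """Decode a binary neighborhood [p_-2, p_-1, p_0, p_1, p_2].

    Returns a 2-bit code word as a tuple, or None if undecodable.
    Assumption: the 'gradient of the list' is taken as the net change
    values[-1] - values[0].
    """
    if len(values) != 5:
        raise ValueError("expected a neighborhood of length five")
    if all(v == 0 for v in values):
        return (0, 0)          # rule (c): uniformly black
    if all(v == 1 for v in values):
        return (1, 1)          # rule (d): uniformly white
    net_change = values[-1] - values[0]
    if net_change < 0:
        return (1, 0)          # rule (a): white-to-black transition
    if net_change > 0:
        return (0, 1)          # rule (b): black-to-white transition
    return None                # ambiguous, e.g. an isolated noisy pixel


word = decode_neighborhood([1, 1, 1, 0, 0])
```

Checking the uniform cases first keeps them from being confused with a zero net change, which would otherwise look identical.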
Calibration
After decoding, we build a mapping relationship between the relative distance ΔD_p(x_low, y_low) and the depth D_LDES(x_low, y_low), where ΔD_p(x_low, y_low) stands for the relative distance between two points with the same encoded value. The relationship can be represented as a quadratic polynomial:

$$D_{LDES}(x_{low}, y_{low}) = \sigma\,\Delta D_p(x_{low}, y_{low})^2 + \varepsilon\,\Delta D_p(x_{low}, y_{low}) + \tau \tag{10}$$

where σ, ε and τ are the calibration parameters. Here, it is important to note that Eqs. (7) and (10) should be calibrated at the same time and adopt the same calibration reference. When a flat board is moved over known distances, the dual EVMs simultaneously project three PS patterns and three LDES patterns onto the flat board surface. Then, the fusion patterns are captured and separated into two kinds of calibration images. Finally, based on the obtained calibration images, Eqs. (7) and (10) are established, respectively.
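Fitting the quadratic calibration polynomial from board measurements can be sketched with an ordinary least-squares fit. This is a minimal sketch rather than the calibration code used in the experiments: the data are synthetic, the solver is a small Gauss-Jordan elimination written out to stay dependency-free, and matching the returned coefficients (c2, c1, c0) to σ, ε and τ in that order is an assumption.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c2*x^2 + c1*x + c0; returns (c2, c1, c0)."""
    # Power sums needed by the 3 x 3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix with unknowns ordered (c2, c1, c0).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * b for a, b in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Synthetic calibration data generated from 0.01*x^2 + 0.5*x + 3.
relative_distances = [10.0, 20.0, 30.0, 40.0, 50.0]
board_depths = [0.01 * x * x + 0.5 * x + 3.0 for x in relative_distances]
coefficients = fit_quadratic(relative_distances, board_depths)
```

At least three distinct board positions are needed for the fit to be determined; using more positions averages out measurement noise.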
Strategy from n(x_low, y_low) to n(x_high, y_high)
Optimizing Eq. (2) yields the low-resolution order n(x_low, y_low). In order to unwrap the wrapped phase φ(x_high, y_high), n(x_low, y_low) needs to be extended to n(x_high, y_high), where n(x_high, y_high) represents the high-resolution order of fringe. Normally, there are two different kinds of phase discontinuity in a wrapped phase map. (a) Because of the properties of tan⁻¹, the wrapped phase ranges from −π to +π and contains 2π discontinuities (the first discontinuity for short). (b) Due to isolated objects or abrupt changes, the wrapped phase contains discontinuities of uncertain size (the second discontinuity for short). Fortunately, the first discontinuity is easily handled by the spatial phase unwrapping method. Except for the two types of discontinuity above, the phase in the other regions is smooth and continuous. Therefore, we propose a strategy that addresses only the second discontinuity. In this strategy, the wrapped phase map is segmented into several connected regions, within which the low-resolution orders n(x_low, y_low) are extended to the high-resolution orders n(x_high, y_high).

Intervals segmentation
The main idea of the intervals segmentation is that the connected intervals are segmented row by row to form a flag map. Here, the criterion for the connected intervals in each row is given by Eq. (11), where TH is a predefined threshold value determined by the measured objects. In practice, for convenient recording, we build a flag map. Firstly, the flag of the leftmost pixel in each row is set to zero. Secondly, each row is scanned from left to right, and the criterion [see Eq. (11)] is verified for each pixel. If the currently scanned pixel is an invalid pixel (e.g., in the non-object region) or violates the criterion, its flag is the flag of the previous pixel plus one. Otherwise, its flag is the same as that of the previous pixel. Figure 5 shows an example of a fan object. Figure 5(a) and (b) are the wrapped phase map and its corresponding flag map, in which three sections are marked: A_1-A_2, B_1-B_2 and C_1-C_2. Figure 5(c) shows the flag curves of the three sections. As can be seen from the three curves, when the flag curve is sloping, the corresponding wrapped phase is in an invalid interval or contains the second discontinuity; when the flag curve is flat, the corresponding wrapped phase contains only the first discontinuity. Moreover, it is important to point out that, because there may be multiple n(x_low, y_low) in a connected interval, several candidate high-resolution orders n(x_high, y_high) may be obtained for a pixel, which causes confusion in the order extension. Hence, the candidate that occurs most often is adopted as the high-resolution order n(x_high, y_high), which is used to calculate the unwrapped phase Φ(x_high, y_high). Finally, if no low-resolution order n(x_low, y_low) is found in a connected interval, the interval has very few pixels, and the region should be ignored.
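The row scan that builds the flag map can be sketched as follows. Since the exact connectivity criterion of Eq. (11) is not reproduced here, the sketch assumes a hypothetical one: two neighboring pixels are connected when their phase difference, folded into [−π, π) so that 2π wraps are ignored, is smaller in magnitude than the threshold TH. It also treats a pixel that follows an invalid pixel as violating the criterion. The function names are illustrative only.

```python
import math


def wrapped_difference(current, previous):
    """Phase difference folded into [-pi, pi), so 2*pi jumps vanish."""
    diff = current - previous
    return (diff + math.pi) % (2.0 * math.pi) - math.pi


def flag_row(phases, valid, threshold):
    """Build the flags of one row of the wrapped phase map.

    phases: wrapped phases of the row, left to right.
    valid: booleans marking pixels that belong to the object.
    threshold: the predefined value TH (assumed criterion, see above).
    """
    flags = [0]  # the leftmost pixel always gets flag zero
    for x in range(1, len(phases)):
        connected = (
            valid[x]
            and valid[x - 1]
            and abs(wrapped_difference(phases[x], phases[x - 1])) < threshold
        )
        flags.append(flags[-1] if connected else flags[-1] + 1)
    return flags


# A jump of 2.2 rad (second discontinuity) and one invalid pixel.
row_phases = [0.1, 0.2, 0.3, 2.5, 2.6, 0.0, 0.1]
row_valid = [True, True, True, True, True, False, True]
row_flags = flag_row(row_phases, row_valid, threshold=1.0)
```

Pixels that share a flag value form one connected interval; the flag keeps climbing through invalid regions, which matches the sloping segments of the flag curves.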

Experimental procedure
Step 1: system configuration. The binary PS patterns and the designed LDES patterns are loaded into the two EVMs, respectively. The EVM that projects the binary PS patterns is properly adjusted to be defocused, and the other EVM, which casts the LDES patterns, projects in focus. In addition, the output frequency of the dual-channel signal generator is configured at 200 Hz; its CH1 channel triggers the dual EVMs and its CH2 channel triggers the camera.
Step 2: system calibration. The flat surface moves from 0 to 150 mm in intervals of 5 mm. At each position, the patterns in the two wavebands are projected and reflected. Meanwhile, the fusion patterns are captured and separated into the PS images and the LDES images. Then, according to the relative phase ΔΦ and the plate moving distances, the model parameters α, β and γ in Eq. (7) are solved. Similarly, according to the relative distance ΔD_p and the plate moving distances, the calibration parameters σ, ε and τ in Eq. (10) are obtained.
Step 3: practical measurement. The camera captures the three fusion patterns, which are distorted by the object surface geometry. Next, the computer separates them into the PS images and the LDES images. Then, the PS images are used to calculate the low-resolution wrapped phase φ(x_low, y_low), and meanwhile Cal[n(x_low, y_low)] is established.

Reconstruction of static and dynamic objects
A step and a mask are reconstructed as the static objects. Figure 6 illustrates the procedure from the fusion images to the 3D profile of the step object. The first column of sub-figures shows the three captured fusion images; the second column shows the six separated patterns; the third column shows the wrapped phase map and the low-resolution LDES depth; the last column shows the unwrapped phase map. Similarly, we also reconstructed a mask, as shown in Fig. 7, which includes the wrapped phase map, the flag map, the low-resolution points map, the high-resolution orders map, the unwrapped phase map, and the depth map. In summary, from the unwrapped phase maps and the depth maps of the two static objects, we can clearly see that the proposed method is effective and robust when only three images are captured.
For the reconstruction of dynamic objects, a slowly rotating fan, changing gestures, and a fluttering tissue were used. Two moments are selected for each object to obtain the depth maps, as shown in Fig. 8. In the sub-figures of the gestures, there are some small voids in the reconstructed surface, since these regions may contain no low-resolution orders n(x_low, y_low). Fortunately, these voids do not cause error propagation in phase unwrapping. For the other two dynamic objects (fan and tissue), the phase unwrapping and the depth both perform well. Again, the effectiveness of our proposed method is verified, and it shows a certain capability for dynamic reconstruction.

Conclusion
This paper has presented a novel BDT based on complementary decoding with unconstrained dual projectors. A system framework with dual projectors and a camera is built, where the position and posture of the introduced projectors are not strictly constrained. The core idea is that low-resolution information is employed for high-resolution phase unwrapping. For this purpose, we explained in detail the operator Cal[•] from wrapped phase to depth at low resolution, the proposed LDES, and the strategy from the low-resolution order to the high-resolution order of fringe. Finally, the experimental results show that the proposed method requires only three captured images for a 3D scene and has strong robustness, flexibility, and ease of implementation, without any constraint on the position and posture of the two introduced projectors.

Fig. 1 Schematic diagram of the system framework. The green dotted circle shows the two-waveband projection of the unconstrained dual EVMs and the capture of the fusion images with a color camera. The enlargement of the red dotted circle shows a photograph of the EVM

Fig. 2 Procedure of the operator Cal[•]. Step 1: Generating 3-PS patterns; Step 2: Solving the wrapped phase; Step 3: Computing the unwrapped phase; Step 4: Obtaining the relative unwrapped phase; Step 5: Establishing the relationship between the relative unwrapped phase and depth

Fig. 3 Illustration of the LDES. Encoding: Generating three LDES patterns; Decoding: Solving code words from the captured LDES images; Calibration: Establishing the relationship between the relative distance and depth

Fig. 4 A group of single-period structures. a Synchronization structure: Determining the position of low-resolution points and containing two code word states. b Coding structure A: Containing four code word states. c Coding structure B: Containing four code word states

Li and Zhang, Journal of the European Optical Society-Rapid Publications (2021) 17:14

Order extension
Through Sec. 2.B.1 and Sec. 2.B.2, the low-resolution orders n(x_low, y_low) of the points P(x_low, y_low) are identified. Because the wrapped phase regions corresponding to the flat intervals in the flag map contain only the first phase discontinuity, the low-resolution orders n(x_low, y_low) can be extended to the high-resolution orders n(x_high, y_high) with the spatial phase unwrapping method. Figure 5(d) shows the pseudo-color picture of the high-resolution orders map, and Fig. 5(e) shows its unwrapped phase map.
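The order extension inside one connected interval, together with the rule of keeping the most frequent candidate, can be sketched as follows. This is a minimal one-dimensional sketch under stated assumptions: the interval contains only 2π wraps, the spatial unwrapping counts those wraps relative to the first pixel, and every low-resolution point then votes for the order of that first pixel. The function name `extend_orders` and the input layout are hypothetical.

```python
import math
from collections import Counter


def extend_orders(phases, low_res_orders):
    """Extend sparse fringe orders across one connected interval.

    phases: wrapped phases of the interval's pixels, left to right.
    low_res_orders: dict mapping a pixel index inside the interval to
        the fringe order found at that low-resolution point.
    Returns the fringe order of every pixel, or None when the interval
    holds no low-resolution point and should be ignored.
    """
    if not low_res_orders:
        return None
    # Spatial unwrapping: count 2*pi wraps relative to the first pixel.
    offsets = [0]
    for x in range(1, len(phases)):
        step = phases[x] - phases[x - 1]
        wraps = -round(step / (2.0 * math.pi))
        offsets.append(offsets[-1] + wraps)
    # Each low-resolution point votes for the order of the first pixel.
    votes = Counter(n - offsets[x] for x, n in low_res_orders.items())
    base = votes.most_common(1)[0][0]
    return [base + k for k in offsets]


# The phase wraps once between index 1 and 2; one vote is an outlier.
interval_phases = [2.9, 3.1, -3.1, -2.9, -2.7]
orders = extend_orders(interval_phases, {0: 4, 3: 5, 4: 7})
```

Voting on the order of a common reference pixel, rather than on each pixel separately, lets a single wrong low-resolution order be outvoted across the whole interval.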

Fig. 5 Example of a fan object. a Wrapped phase map. b Corresponding flag map. c Flag curves of the three sections. d Pseudo-color map of the high-resolution orders (numbers denote orders). e Unwrapped phase map

Fig. 6 Reconstruction of a step object. The first column of sub-figures shows the three captured fusion images; the second column shows the six separated patterns; the third column shows the wrapped phase map and the low-resolution LDES depth; the last column shows the unwrapped phase map

Fig. 7 Reconstruction of a mask object. a Wrapped phase map. b Flag map. c Low-resolution points map. d High-resolution orders map. e Unwrapped phase map. f Depth map

Fig. 8 Depth maps of dynamic objects. a Rotating hand-held fan (about 60 revolutions per minute). b Changing gesture (about 120 clenches per minute). c Fluttering tissue (about 120 shakes per minute)

Table 1 Encoding table of the 32 encoded states for a group of single-period structures