
I am trying to use OpenCV, version 4.1.0, from Python to convert a planar YUV 4:2:0 image to RGB, and I am struggling to understand how to format the array to pass to the cvtColor function. I have all 3 channels as separate arrays and am trying to merge them for use with cv2.cvtColor . I am using cv2.cvtColor(yuv_array, cv2.COLOR_YUV420p2RGB) . I understand that the yuv_array should be 1.5x as tall as the original image (that's what a yuv array produced by cvtColor with cv2.COLOR_RGB2YUV_YV12 looks like), with the Y channel in the top part of the array and the U and V components in the bottom half.
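For reference, a minimal sketch of the buffer shape cvtColor expects for this layout, assuming the 640x360 frame size used in the code below:

```python
import numpy as np

# Hypothetical frame dimensions matching the code below.
h, w = 360, 640

# For planar YUV 4:2:0, cvtColor expects a single 2-D uint8 array that is
# 1.5x the image height: h rows of Y on top, then h//2 extra rows holding
# the subsampled U and V planes.
yuv = np.empty((h * 3 // 2, w), dtype=np.uint8)

print(yuv.shape)    # (540, 640)
print(yuv.nbytes)   # 345600 == w * h * 3 // 2
```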

I cannot seem to figure out how the U and V channels should be formatted within the bottom of this array. I've tried interleaving them and just putting them both in there back-to-back. With both methods, I've tried putting U first then V and also the other way around. All methods lead to artifacts in the resulting image. Here is my code and an example image:

import os
import errno
import numpy as np
import cv2
fifo_names = ["/tmp/fifos/y_fifo", "/tmp/fifos/u_fifo", "/tmp/fifos/v_fifo"]
#teardown; delete fifos
import signal, sys
def cleanup_exit(signal, frame):
    print ("cleaning up!")
    for fifo in fifo_names:
        os.remove(fifo)
    sys.exit(0)
signal.signal(signal.SIGINT, cleanup_exit)
signal.signal(signal.SIGTERM, cleanup_exit)
#make fifos
for fifo in fifo_names:
    try:
        os.mkfifo(fifo)
    except OSError as oe:
        if oe.errno == errno.EEXIST:
            os.remove(fifo)
            os.mkfifo(fifo)
        else:
            raise
#make individual np arrays to store Y,U, and V channels
#we know the image size beforehand -- 640x360 pixels
yuv_data = []
frame_size = []
fullsize = (360, 640)
halfsize = (180, 320)
for i in range(len(fifo_names)):
    if (i == 0):
        size = fullsize
    else:
        size = halfsize
    yuv_data.append(np.empty(size, dtype=np.uint8))
    frame_size.append(size)
#make array that holds all yuv data for display with cv2
all_yuv_data = np.empty((fullsize[0] + halfsize[0], fullsize[1]), dtype=np.uint8) 
#continuously read yuv images from fifos
print("waiting for fifo to be written to...")
while True:
    for i in range(len(fifo_names)):
        fifo = fifo_names[i]
        with open(fifo, 'rb') as f:
            print("FIFO %s opened" % (fifo))
            all_data = b''
            while True:
                data = f.read()
                print("read from %s, len: %d" % (fifo,len(data)))
                if len(data) == 0: #then the fifo has been closed
                    break
                else:
                    all_data += data
            yuv_data[i] = np.frombuffer(all_data, dtype=np.uint8).reshape(frame_size[i])
    #stick all yuv data in one buffer, interleaving columns
    all_yuv_data[0:fullsize[0],0:fullsize[1]] = yuv_data[0]
    all_yuv_data[fullsize[0]:,0:fullsize[1]:2] = yuv_data[1]
    all_yuv_data[fullsize[0]:,1:fullsize[1]:2] = yuv_data[2]
    #show each yuv channel individually
    cv2.imshow('y', yuv_data[0])
    cv2.imshow('u', yuv_data[1])
    cv2.imshow('v', yuv_data[2])
    #convert yuv to rgb and display it
    rgb = cv2.cvtColor(all_yuv_data, cv2.COLOR_YUV420p2RGB)
    cv2.imshow('rgb', rgb)
    cv2.waitKey(1)

The above code is trying to interleave the U and V information column-wise.

I have also tried using the following to place the U and V channel information into the all_yuv_data array:

    #try back-to-back
    all_yuv_data[0:fullsize[0],0:fullsize[1]] = yuv_data[0]
    all_yuv_data[fullsize[0]:,0:halfsize[1]] = yuv_data[1]
    all_yuv_data[fullsize[0]:,halfsize[1]:] = yuv_data[2]

The image is a frame of video obtained with libav from another program. The frame is of format AV_PIX_FMT_YUV420P, described as "planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)".
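The per-plane byte counts follow directly from that description; a quick arithmetic sketch for this frame size:

```python
# Per-plane byte counts for a 640x360 AV_PIX_FMT_YUV420P frame:
# one Cr and one Cb sample per 2x2 block of Y samples, 8 bits per sample.
w, h = 640, 360
y_size = w * h                 # 230400 bytes
u_size = (w // 2) * (h // 2)   # 57600 bytes
v_size = u_size                # 57600 bytes
total = y_size + u_size + v_size
print(total)                   # 345600 bytes == w * h * 12 // 8 (12 bpp)
```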

Here are the yuv channels for a sample image shown in grayscale:

Y Channel:

U Channel:

V Channel:

and the corresponding RGB conversion (this was from using the above interleaving method, similar artifacts are seen when using the 'back-to-back' method):

RGB Image With Artifacts:

How should I be placing the u and v channel information in all_yuv_data?

Edited by Mark Setchell after this point

I believe the expected result is:

Where you have provided a combined screen-grab of the YUV channels, it would be more useful if you provided three actual separate PNG images rather than a screen-grab. – Mark Setchell Mar 17, 2020 at 19:50
@MarkSetchell Thank you for the feedback, I've replaced the screengrabs with actual images. – founta Mar 17, 2020 at 20:16
I have added what I think is the expected result - please say if I am mistaken and I will reverse my edit back out. Thank you. – Mark Setchell Mar 17, 2020 at 20:28
@MarkSetchell Yes, that is the result I am expecting, thank you for adding it. How did you obtain it? – founta Mar 17, 2020 at 20:31
I'm quicker with ImageMagick than OpenCV :-) And I like to have something to work towards. – Mark Setchell Mar 17, 2020 at 20:33

In case the YUV standard matches the OpenCV COLOR_YUV2BGR_I420 conversion formula, you may read the frame as one chunk, reshape it to height*1.5 rows, and apply the conversion.

The following code sample:

  • Builds an input in YUV420 format and writes it to an in-memory stream (instead of a fifo).
  • Reads the frame from the stream and converts it to BGR using COLOR_YUV2BGR_I420.
    The colors come out incorrect...
  • Repeats the process by reading Y, U and V separately, resizing U and V, and using the COLOR_YCrCb2BGR conversion.
    Note: OpenCV works in BGR color format (not RGB).
  • Here is the code:

    import cv2
    import numpy as np
    import io
    # Building the input:
    ###############################################################################
    img = cv2.imread('GrandKingdom.jpg')
    #yuv = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
    #y, u, v = cv2.split(yuv)
    # Convert BGR to YCrCb ("full range" YCrCb JPEG, also known as YCC),
    # where Y, U and V all span [0, 255] (the default JPEG color space).
    yvu = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    y, v, u = cv2.split(yvu)
    # Downsample U and V (apply 420 format).
    u = cv2.resize(u, (u.shape[1]//2, u.shape[0]//2))
    v = cv2.resize(v, (v.shape[1]//2, v.shape[0]//2))
    # Open In-memory bytes streams (instead of using fifo)
    f = io.BytesIO()
    # Write Y, U and V to the "streams".
    f.write(y.tobytes())
    f.write(u.tobytes())
    f.write(v.tobytes())
    f.seek(0)
    ###############################################################################
    # Read YUV420 (I420 planar format) and convert to BGR
    ###############################################################################
    data = f.read(y.size*3//2)  # Read one frame (number of bytes is width*height*1.5).
    # Reshape data to numpy array with height*1.5 rows
    yuv_data = np.frombuffer(data, np.uint8).reshape(y.shape[0]*3//2, y.shape[1])
    # Convert YUV to BGR
    bgr = cv2.cvtColor(yuv_data, cv2.COLOR_YUV2BGR_I420)
    # How should the u and v channel information be placed in all_yuv_data?
    # -------------------------------------------------------------------------------
    # Example: place the channels one after the other (for a single frame)
    f.seek(0)
    y0 = f.read(y.size)
    u0 = f.read(y.size//4)
    v0 = f.read(y.size//4)
    yuv_data = y0 + u0 + v0
    yuv_data = np.frombuffer(yuv_data, np.uint8).reshape(y.shape[0]*3//2, y.shape[1])
    bgr = cv2.cvtColor(yuv_data, cv2.COLOR_YUV2BGR_I420)
    ###############################################################################
    # Display result:
    cv2.imshow("bgr incorrect colors", bgr)
    ###############################################################################
    f.seek(0)
    y = np.frombuffer(f.read(y.size), dtype=np.uint8).reshape((y.shape[0], y.shape[1]))  # Read Y plane and reshape to height x width
    u = np.frombuffer(f.read(y.size//4), dtype=np.uint8).reshape((y.shape[0]//2, y.shape[1]//2))  # Read U plane and reshape to (height//2) x (width//2)
    v = np.frombuffer(f.read(y.size//4), dtype=np.uint8).reshape((y.shape[0]//2, y.shape[1]//2))  # Read V plane and reshape to (height//2) x (width//2)
    # Resize u and v color channels to be the same size as y
    u = cv2.resize(u, (y.shape[1], y.shape[0]))
    v = cv2.resize(v, (y.shape[1], y.shape[0]))
    yvu = cv2.merge((y, v, u)) # Stack planes to 3D matrix (use Y,V,U ordering)
    bgr = cv2.cvtColor(yvu, cv2.COLOR_YCrCb2BGR)
    ###############################################################################
    # Display result:
    cv2.imshow("bgr", bgr)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    The u and v channel information stored in the bottom of yuv_array in the call cv2.cvtColor(yuv_array, cv2.COLOR_YUV420p2RGB) is expected to be formatted as follows:

  • The upper half of the extra rows added to the bottom of yuv_array is filled with the u information. The rows are interleaved: the first row of u goes in the left half of the row just under the y channel information, the second row of u goes in the right half of that same row of yuv_data, and so on.
  • The v channel data follows the same pattern, but in the bottom half of the extra rows added to yuv_array.
  • Here is the concatenation code that resulted in the expected image as posted by MarkSetchnell when placed in the original program:

        #place y channel into buffer
        all_yuv_data[0:fullsize[0],0:fullsize[1]] = yuv_data[0]
        #formatted as interleaved u rows on top, (half on left, half on right)
        #and interleaved v rows on bottom
        all_yuv_data[fullsize[0]:fullsize[0]+halfsize[0]//2, :] = yuv_data[1].reshape(-1, fullsize[1])
        all_yuv_data[fullsize[0]+halfsize[0]//2:,:] = yuv_data[2].reshape(-1, fullsize[1])
        #convert to rgb
        rgb = cv2.cvtColor(all_yuv_data, cv2.COLOR_YUV420p2RGB)
    

    Here is a grayscale image of all_yuv_data in an attempt at clarity:

    and the result after calling cv2.cvtColor(all_yuv_data, cv2.COLOR_YUV420p2RGB):

    You can make this much easier to read by doing all_yuv_data[fullsize[0]:fullsize[0]+halfsize[0], :] = yuv_data[1].reshape(-1, fullsize[1]) and similar for the last channel. All you really care about is the order of the underlying linear buffer. – Mad Physicist Mar 17, 2020 at 22:48
    As you can probably see, the colors are not accurate. There are just too many color formats... – Rotem Mar 17, 2020 at 22:48
    @Rotem Oh hey, you're right. I didn't even notice that my colors were off. Thanks for pointing that out. – founta Mar 17, 2020 at 23:13
    @MadPhysicist Alright, thank you. I assume you mean all_yuv_data[fullsize[0]:fullsize[0]+halfsize[0]//2, :] = yuv_data[1].reshape(-1, fullsize[1]) – founta Mar 17, 2020 at 23:25
    Yes, you're right. What I was thinking in the back of my head was actually to use a view into the buffer, like small = all_yuv_data[fullsize[0]:, :].reshape(2, -1, halfsize[1]); small[0] = yuv_data[1]; small[1] = yuv_data[2]. This is the magic of numpy views when you don't mess with contiguity. If you're worried about reshapes making a copy along the way, you can make the view explicitly by making a new array on the same buffer with the appropriate offsets and strides. – Mad Physicist Mar 18, 2020 at 3:03
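    A comment above suggests writing both chroma planes through a NumPy view of the buffer's tail. A minimal sketch of that idea, using the question's 640x360 frame size and synthetic plane data for illustration:

    ```python
    import numpy as np

    # Sizes from the question: 640x360 frame, 4:2:0 chroma subsampling.
    fullsize = (360, 640)
    halfsize = (180, 320)

    # Synthetic planes so the writes are easy to verify.
    y = np.zeros(fullsize, dtype=np.uint8)
    u = np.full(halfsize, 110, dtype=np.uint8)
    v = np.full(halfsize, 240, dtype=np.uint8)

    all_yuv_data = np.empty((fullsize[0] + halfsize[0], fullsize[1]), dtype=np.uint8)
    all_yuv_data[:fullsize[0], :] = y

    # View the chroma region as two back-to-back planes in the underlying
    # linear buffer; assigning into the view writes through to all_yuv_data.
    small = all_yuv_data[fullsize[0]:, :].reshape(2, -1, halfsize[1])
    small[0] = u  # U plane occupies rows 360..449 of all_yuv_data
    small[1] = v  # V plane occupies rows 450..539

    print(all_yuv_data[360, 0], all_yuv_data[450, 0])  # 110 240
    ```

    Because the tail slice is contiguous, the reshape returns a view rather than a copy, so no separate concatenation step is needed.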
