Super PINTO
42.4K posts

Super PINTO
@PINTO03091
Hobby Programmer. Caffe, Tensorflow, NCS, RaspberryPi, Latte Panda, ROS, DL, TPU, OpenVINO. Intel Software Innovator. The remarks are my personal opinion.
Aichi, Japan. Joined October 2018
97 Following · 14K Followers

@_artursemh Check out my post from yesterday. I'm planning to try it with the next generation of models.

@Kumar4Vruddhi47 Yes. This is because I want to prevent data ownership issues from arising and release it as open-source software with an easy-to-use license.

@fng_z This model doesn't confuse left and right even in first-person view. The purple square represents the right, and the green square represents the left. However, I think there is still a lack of test patterns and training data.
x.com/PINTO03091/sta…
Super PINTO @PINTO03091
@pzoltowski I'm creating data to destroy all past models.

@PINTO03091 Sorry if this was asked before — if I feed your model a first-person POV video (seeing the user’s own body/hands), will it still label body parts correctly, or can the unusual / mirror-like perspective flip left-right labels?

I created everything manually. Outsourcing would have increased the cost of reviewing the finished product. I wanted to create it myself from the start, incorporating my own biases. In the process of generating and verifying the model step by step, I revised the data creation criteria more than three times and redid the annotations many times. If I had reviewed someone else's work, my criticisms would have been so detailed that the worker would have gone crazy. This isn't about monetary or time costs. I'm only interested in creating the highest quality data.

@PINTO03091 Where do you obtain the training data from? Isn't it cheaper to ask data annotation companies to do this for you, as they have cheap labour?

@PINTO03091 Half-width kana is the correct form (`・ω・´)ゞ
The origin was explained in the YouTube video below.
#スタックチャン #Stackchan #M5Stack
youtube.com/watch?v=Bb18uC…


@pengadaptasian All joint areas of the human body were created using only bounding boxes (cx, cy, w, h). Instance masks were created using a custom-built CNN via a different path than the bounding boxes, and then synthesized using a data loader during training.
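For readers unfamiliar with the (cx, cy, w, h) box format mentioned above, here is a minimal sketch of how it relates to corner coordinates. The function names are hypothetical, and whether the author's values are in pixels or normalized to the image size is not stated; pixel units are assumed here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(cx: float, cy: float, w: float, h: float) -> Box:
    """Convert a center-format box to corner format (x1, y1, x2, y2)."""
    half_w, half_h = w / 2.0, h / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def xyxy_to_cxcywh(x1: float, y1: float, x2: float, y2: float) -> Box:
    """Convert a corner-format box back to center format (cx, cy, w, h)."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1)


# Illustrative values only (assumed pixel units), not taken from the dataset.
corners = cxcywh_to_xyxy(240.0, 180.0, 40.0, 20.0)
print(corners)  # (220.0, 170.0, 260.0, 190.0)
```

The center format keeps a joint's location and extent separate, which is presumably convenient when boxes and masks are produced by different pipelines and merged later.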

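@PINTO03091 I'm trying to build a dataset with a VLM, but it isn't going the way I want at all.
Even when I write code with an LLM, I end up thinking "Fine, I'll do it myself!", so maybe I'm just not cut out for AI.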

@airkatakana @giffmana @skalskip92 In my case, the original image resolution needed to be less than VGA. VLM couldn't handle that.
Width x Height = 480x360
Left: Original image at 1x magnification
Right: Image enlarged 11 times
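To make the scale of that enlargement concrete, here is a small sketch of an integer-factor upscale. The tweet does not say how the image was enlarged, so nearest-neighbor repetition is an assumption, and both helper names are hypothetical.

```python
from typing import List, Tuple


def upscaled_size(width: int, height: int, factor: int) -> Tuple[int, int]:
    """Return the pixel dimensions after enlarging by an integer factor."""
    return width * factor, height * factor


def upscale_nearest(pixels: List[List[int]], factor: int) -> List[List[int]]:
    """Enlarge a 2D pixel grid by repeating each pixel factor x factor times."""
    out: List[List[int]] = []
    for row in pixels:
        # Repeat every pixel horizontally, then repeat the widened row vertically.
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


print(upscaled_size(480, 360, 11))  # (5280, 3960)
```

An 11x enlargement of a 480x360 image adds no new detail; every source pixel simply covers an 11x11 block.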

