[12:14:25] Hello folks!
[12:21:37] DrTrigon, DrTrigon_ jayvdb Hi
[12:21:51] your ISP ok?
[12:22:40] It does work right now
[12:22:47] Not sure how stable it is going to be
[12:22:56] do you want to do IRC only today ?
[12:23:10] That would be good
[12:23:22] Meeting tomorrow would also be ok for me - in case of "emergency"
[12:23:40] Nope let's do it today
[12:23:57] gooood! ;) Do we want to start right now?
[12:24:35] sure
[12:25:10] jayvdb: ^^^
[12:25:39] ok
[12:25:55] I can find time tomorrow evening if we fail tonight
[12:26:12] cool!
[12:26:27] So, I saw DrTrigon's comments on the -newimages log
[12:26:44] I've incorporated it in the script (About showing the unique categories, and mentioning number of categories)
[12:26:54] The new script is running right now, should get over soonish
[12:27:19] I've also added various cmd line arguments like -limitsize to limit the file size, etc as we had discussed a few weeks back
[12:27:26] coool! Like to see other cats than file types...
[12:27:54] .
[12:27:54] nod
[12:28:24] So, I wanted to know if we should focus on adding more analysis methods to increase the percentage of files being categorized right now
[12:28:43] Or should I focus on testing more to make stuff more stable to show at WikiConferenceIndia
[12:29:15] (both please ;))) jayvdb, your opinion?
[12:29:35] I agree. more automatic categories
[12:29:58] we need leaf categories, which means better analysis of the media
[12:30:15] Alright, so I will implement the OSM Nominatim API key stuff next to get Location suggestions from images.
[12:30:25] How many weeks to the conf? 3, 4?
[12:30:38] 3
[12:31:09] But keeping last week for cleaning up, documenting stuff, etc.
[12:31:18] +1
[12:31:29] (bug fixing)
[12:31:48] Ping me if I can help with that
[12:31:54] DrTrigon_, sure
[12:32:06] Is that ok for you?
[12:32:09] DrTrigon_, Thanks a lot for all the docker support !
[12:32:21] Sure!! Was not tooo much ;))
[12:32:23] nod. great to see Docker done
[12:32:41] So, what other easy automatic categories can be done ?
[12:33:31] I've spent some time on Monochromatic images, but haven't been able to make it robust yet ... Need to work on that.
[12:33:45] nod. (wanted to mention that)
[12:34:05] What is the issue?
[12:34:52] AbdealiJK__, well, we should look at your logs of new files, to find groups of media that could be detected with more categories
[12:35:02] I downloaded a few images for testing, and they seemed to have blues and other colors which were not very expected. And there was a wide variation in the blueness (even though it wasn't very visible by eye)
[12:35:27] dark blue = black ?
[12:36:16] https://commons.wikimedia.org/wiki/User:AbdealiJKTravis/logs/newimages
[12:36:48] DrTrigon_, Probably - but the blue as compared to Red and Green was a very high percentage (Like 50% + pixels were blue) Don't remember exact numbers right now
[12:36:51] Need to spend more time debugging
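A minimal sketch of one way to make a monochrome check tolerant of dark pixels with a slight blue cast, as discussed above. It measures chroma (the gap between the largest and smallest channel) instead of comparing channels directly, so a near-black pixel such as (10, 10, 30) counts as gray. The function names and both thresholds are assumptions for illustration, not what the script does, and a sepia-toned image would still need separate handling.

```python
def chroma(pixel):
    """Gap between the strongest and weakest channel of an (r, g, b) pixel."""
    return max(pixel) - min(pixel)


def colored_fraction(pixels, tolerance=24):
    """Fraction of pixels whose chroma exceeds the tolerance (0-255 scale)."""
    if not pixels:
        return 0.0
    colored = sum(1 for pixel in pixels if chroma(pixel) > tolerance)
    return colored / len(pixels)


def looks_monochrome(pixels, tolerance=24, max_colored=0.02):
    """True when almost no pixel is noticeably colored.

    Both thresholds are placeholder values that would need tuning
    against real files.
    """
    return colored_fraction(pixels, tolerance) <= max_colored
```

With these placeholder thresholds, a dark bluish pixel is treated as gray while a saturated red one is not.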
[12:37:18] nod.
[12:37:30] what is the thing at the bottom of these images: https://commons.wikimedia.org/wiki/File:View_from_the_carriage_resting_place_at_the_summit,_looking_n.-e_(NYPL_b11708073-G91F219_013B).tiff
[12:37:33] Also, a note that Category:Icon based on size (16x16, 32x32, 64x64, etc.) is present now. Although no images in -newimages were icons, so it's not seen
[12:37:36] gradient
[12:38:19] https://commons.wikimedia.org/wiki/Category:Robert_N._Dennis_collection_of_stereoscopic_views
[12:39:02] I am not sure what that gradient is.
[12:39:12] https://commons.wikimedia.org/wiki/File:WiknicNYC_2016-29_cut_jeh.jpg currently only adding "Human faces" , but if there are more than one, it is a group of people
[12:40:15] https://commons.wikimedia.org/wiki/Category:Groups_of_people
[12:40:30] AbdealiJK__: Could you try to do a black-white/monochrome detection on https://upload.wikimedia.org/wikipedia/commons/c/c3/El%C5%91nyomul%C3%A1s_a_Bug_foly%C3%B3n%C3%A1l._Fortepan_52229.jpg at some point and tell me the result, please?
[12:40:34] jayvdb, So, if >=3 people I can add "Category:Groups of people" ?
[12:40:41] seems sensible
[12:41:15] don't know the threshold anymore but basically YES
[12:41:47] jayvdb: both cats together? faces and group of people?
[12:41:50] DrTrigon_, Added that image to my ToDo. Will ping when done
[12:41:58] cool!
[12:41:58] yes?
[12:42:34] maybe there are different categories for faces depending on how large/focused the face is?
[12:42:53] do we add both cats or does group of people replace human faces?
[12:43:00] https://commons.wikimedia.org/wiki/Category:Faces_in_profile
[12:43:07] jayvdb, What about Category:Robert_N._Dennis_collection_of_stereoscopic_views ?
[12:43:43] AbdealiJK__, that strip at the bottom looks like it is an important 'thing', and very easy to detect
[12:44:03] "Images containing a thingamabob on the bottom"
[12:44:10] ask the uploaders what it is?
[12:44:13] they can ask the library
[12:44:18] and we find out its real name
[12:45:07] can we detect big smiles ? https://commons.wikimedia.org/wiki/Category:Happy_faces
[12:45:11] jayvdb, Alright
[12:45:42] DrTrigon_, I think groups and face are independent categories and both can be added
[12:45:52] nice!
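The rule agreed above (faces and groups are independent, and both can be added) can be sketched as follows. The threshold of 3 comes from the discussion; the function name is hypothetical, and the face count is assumed to come from whatever detector is in use.

```python
def people_categories(face_count, group_threshold=3):
    """Map a detected face count to category names.

    "Human faces" and "Groups of people" are treated as independent,
    so an image with enough faces receives both.
    """
    categories = []
    if face_count >= 1:
        categories.append("Human faces")
    if face_count >= group_threshold:
        categories.append("Groups of people")
    return categories
```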
[12:46:16] guess happy faces might be a hard one
[12:46:16] jayvdb, Nope - smile detection is really really bad. There's a haarcascade for it. But it's very falsey
[12:46:30] hmmm....
[12:46:40] ...is there a cascade for teeth?
[12:46:58] we could try that on detected faces, like eye, nose mouth etc.
[12:47:02] DrTrigon_, Nope, there's a cascade for mouth and smile IIRC
[12:47:14] https://commons.wikimedia.org/wiki/File:Reyes_Nazar%C3%ADes.png - currently "line drawing" but could be https://commons.wikimedia.org/wiki/Category:Family_trees
[12:47:19] ohhh ... location maps
[12:47:55] https://commons.wikimedia.org/wiki/File:Armenia_adm_location_map.svg
[12:48:02] I do not think it's possible to automatically categorize that as Family trees.
[12:48:17] we upload lots of location maps, and they are fairly distinctive , even in colors used
[12:48:58] map detection may be from color histogram... IF they all use exactly the same style...
[12:49:04] jayvdb, Do they use a similar color scheme ?
[12:49:06] https://commons.wikimedia.org/wiki/Category:Transparent_background
[12:49:10] or train a cascade?
[12:49:13] AbdealiJK__, most of them do , yes
[12:50:00] graphics + specific color histogram = map?
[12:50:34] Nice, I was trying to find Category:Transparent_background but was unable to :|
[12:50:37] (colors: yellowish, gray and blue, with a bit of black)
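A rough sketch of the "graphics + specific color histogram = map" idea: measure how much of the image falls near a small palette. The RGB values below are placeholder guesses for "yellowish, gray and blue, with a bit of black" and are assumptions, as are the function names and thresholds; the real map colors would have to be sampled from actual location maps.

```python
# Placeholder palette (assumed values, not taken from any real map).
MAP_PALETTE = [
    (254, 254, 233),  # yellowish
    (224, 224, 224),  # gray
    (198, 236, 255),  # blue
    (0, 0, 0),        # black
]


def palette_coverage(pixels, palette, tolerance=30):
    """Fraction of (r, g, b) pixels within `tolerance` per channel of a palette color."""
    if not pixels:
        return 0.0

    def near(pixel, color):
        return all(abs(p - c) <= tolerance for p, c in zip(pixel, color))

    matched = sum(1 for pixel in pixels if any(near(pixel, c) for c in palette))
    return matched / len(pixels)


def looks_like_location_map(pixels, palette=MAP_PALETTE, min_coverage=0.9):
    """True when nearly all pixels fall close to the assumed map palette."""
    return palette_coverage(pixels, palette) >= min_coverage
```

This only makes sense for files already classified as graphics, since a photograph could match the palette by coincidence.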
[12:50:59] https://commons.wikimedia.org/wiki/File:PHOTOS_INSIDE_THE_CLASSROOM_UPDATED006.jpg - this is a good one to test for smiling
[12:51:09] (but if smiling is impossible, don't bother)
[12:52:38] balls : https://commons.wikimedia.org/wiki/File:Alyssa_Naeher_Cleveland.jpg ?
[12:52:53] round objects ... ;-)
[12:53:09] circle detection?
[12:53:16] https://commons.wikimedia.org/wiki/Category:Spherical_objects
[12:53:22] Circle detection is very robust :)
[12:53:53] But even a large number of faces could be detected as circles
[12:53:55] ok, detect a circle that has a diameter of at least half the image
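The size filter suggested here might look like the sketch below. It assumes the circle detector returns (x, y, radius) tuples in pixels, and it reads "half the image" as half of the shorter side, which is an interpretation rather than something settled in the discussion.

```python
def large_circles(circles, width, height, min_ratio=0.5):
    """Keep circles whose diameter is at least `min_ratio` of the shorter image side.

    `circles` is an iterable of (x, y, radius) tuples in pixels.
    """
    min_diameter = min_ratio * min(width, height)
    return [(x, y, r) for (x, y, r) in circles if 2 * r >= min_diameter]
```

Small circles, such as faces picked up by the circle detector, are dropped by this filter.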
[12:53:59] wow - https://commons.wikimedia.org/wiki/File:Argentina,_administrative_divisions_-_Nmbrs_-_colored_(%2Bclaims).svg was added to human faces
[12:54:39] yea... you can basically exclude that for svgs, right?
[12:54:48] except on embedded stuff maybe...
[12:54:49] .
[12:54:57] if it is a line drawing, and a human face, it is not a human face, it is a different category, and possibly even a caricature
[12:54:59] Nod - dlib detected a face in it ... There will be some falses
[12:55:29] nod. what was the score and size?
[12:55:54] Face (dlib) #1
[12:55:54] Score: 0.047
[12:55:54] Bounding Box: Left:141, Top:1421, Width:73, Height:73
[12:55:54] Other features: Eyes (2), Mouth, Nose
[12:55:54] Face (haarcascade) #1
[12:55:57] Bounding Box: Left:605, Top:183, Width:145, Height:145
[12:56:18] small (compared to image) and bad score...
[12:56:27] should be fixable ;))
[12:56:58] For line drawings, like https://commons.wikimedia.org/wiki/File:PIT-CNT.svg , add "major color" categories - you had a library for that, but I objected to it being used for photographs, but it might work very well with line drawings
[12:56:59] strange that both did a false...
[12:57:26] agree
[12:57:59] do we have color segmentation already?
[12:58:22] The score isn't very bad ... There are quite some images with correctly detected faces at 0.04x score
[12:58:29] jayvdb, I think I can do color categories for any flat images. I.e. images with ONLY 4-5 colors they would normally be logos
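A minimal sketch of the "flat image" test described here: if a handful of exact colors cover almost all pixels, report those colors so they can be turned into color categories. The limit of 5 colors follows the message above; the 95% coverage threshold and the function name are assumptions.

```python
from collections import Counter


def flat_image_colors(pixels, max_colors=5, min_coverage=0.95):
    """Return the dominant colors if the image is 'flat', otherwise None.

    An image counts as flat when its `max_colors` most frequent exact
    colors cover at least `min_coverage` of all pixels.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        return None
    top = counts.most_common(max_colors)
    covered = sum(count for _, count in top)
    if covered / total >= min_coverage:
        return [color for color, _ in top]
    return None
```

Anti-aliased edges introduce many near-duplicate colors, so real files would probably need the pixels quantized before counting.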
[12:58:47] DrTrigon_, No, never did it because it was not needed (yet)
[12:59:01] AbdealiJK__: Faces...
[12:59:12] ...you can e.g. exclude small ones, since ...
[12:59:22] https://commons.wikimedia.org/wiki/File:Kit_body_monaco1617a.png - see https://es.wikipedia.org/wiki/Association_Sportive_de_Monaco_Football_Club
[12:59:30] ...they are either wrong, or it needs to have a lot of them ...
[12:59:44] ... to be a group photo of e.g. a concert.
[12:59:58] Color/Maps: ...
[13:00:03] those pixels and filenames are 100% give aways that they are football kit images
[13:00:18] ... if we could have color segmentation back, we could also use it for map detection
[13:00:20] ...
[13:00:40] ... a lot of segments, but all the same colors, few fluctuations on the segments.
[13:01:11] better example https://en.wikipedia.org/wiki/Manchester_United_F.C.
[13:01:55] https://commons.wikimedia.org/wiki/File:Kit_shorts.svg
[13:02:16] https://commons.wikimedia.org/wiki/Category:Football_kit_templates
[13:02:49] hmm, LOOKS easy ... ;)
[13:02:54] an obscured ball : https://commons.wikimedia.org/wiki/File:Whitney_Engen_Cleveland.jpg
[13:03:57] another human face : https://commons.wikimedia.org/wiki/File:Chile,_administrative_divisions_-_Nmbrs_-_colored.svg
[13:04:09] another incorrectly categorised human face : https://commons.wikimedia.org/wiki/File:Chile,_administrative_divisions_-_Nmbrs_-_colored.svg
[13:04:34] we need map detection to just exclude them
[13:04:59] https://commons.wikimedia.org/wiki/File:Jannis_knight.jpg - it finds the face ... I wonder if we can detect hair color at the edges of the bounding box
[13:05:02] AbdealiJK__: What do you think is possible? (I guess map and circle)
[13:05:34] jayvdb: maybe with segments too.
[13:06:25] the oval here : https://commons.wikimedia.org/wiki/File:%D7%91%D7%A8_%D7%91%D7%95%D7%A8%D7%95%D7%9B%D7%95%D7%91,_%D7%91%D7%98%D7%95%D7%A8%D7%95%D7%A0%D7%98%D7%95,_1915.jpg
[13:06:45] if we can detect the shapes of frames, we can say it is a framed photograph
[13:06:56] it also makes it very likely to be a human if the frame is oval
[13:06:57] AbdealiJK__: still there?
[13:07:09] Yes, I am
[13:07:16] Too much information overload. Processing ...
[13:07:21] ;))
[13:07:32] hair color again: https://commons.wikimedia.org/wiki/File:Janet_Tamaro_Gracie_Award_Acceptance_Speech.jpg
[13:08:29] huge group of people (there must be another category for this) : https://commons.wikimedia.org/wiki/File:Women_2015_2016.jpg
[13:08:56] you could at least suggest it is a team photograph
[13:09:07] huge group > 10 persons ?
[13:09:19] https://commons.wikimedia.org/wiki/Category:Team_photographs
[13:09:26] 16 ?
[13:09:51] most teams with reserves, coaches, etc are at least 16
[13:09:52] There's a very interesting https://commons.wikimedia.org/wiki/Category:People_by_quantity
[13:09:53] and the faces have to be aligned somehow...
[13:09:57] .
[13:10:19] medals https://commons.wikimedia.org/wiki/File:Wrwc_2014_jules_and_kim.jpg
[13:10:21] ► People in bed by number‎ ;))))
[13:10:25] hahaha
[13:11:29] so.... we can basically add 'No people' to anything?
[13:11:51] (except faces of course)
[13:11:54] I found that weird too.
[13:12:29] here is a weird case for line drawing colors: https://commons.wikimedia.org/wiki/File:Sudamerica_Rugby(en).png - lots of colors without much meaning
[13:12:56] you could probably add 6 color categories to that one ;-)
[13:13:10] :))) brilliant!
[13:13:18] https://commons.wikimedia.org/wiki/File:Aby.jpg
[13:13:23] sunglasses ^
[13:13:29] they are being detected as a face
[13:13:40] so the 'face' detection library must know about glasses
[13:14:11] * DrTrigon_ wondering about glasses haarcascades
[13:14:13] Yes, it does
[13:14:21] nice
[13:14:40] so that should be really an easy one...?
[13:14:55] nod
[13:15:13] But I am not sure how accurate the sunglass detection is alone
[13:15:15] Will try
[13:15:16] crests : https://commons.wikimedia.org/wiki/File:Burnaby_Lake_RC_Blue_on_White.jpg
[13:15:23] https://commons.wikimedia.org/wiki/File:Sudamerica_Rugby(en).png - what were you trying to say ?
[13:15:43] flag detection : https://commons.wikimedia.org/wiki/File:Identification_Flag_Thai_Army_Battalion_(Artillery).svg
[13:15:45] https://commons.wikimedia.org/wiki/Category:People_with_glasses
[13:16:39] Ok, I think we need to pause
[13:16:41] *phew*
[13:16:42] flag and map detection could go together with different thresholds
[13:17:00] nod
[13:17:05] another bad face detection : https://commons.wikimedia.org/wiki/File:JUNTOSSOMOSRUGBY.png
[13:17:34] these hexagons are chemical drawings : https://commons.wikimedia.org/wiki/File:Oxogestone_phenpropionate.svg
[13:17:56] so... AbdealiJK__ do you want to continue reporting after this very nice brainstorming (thanks jayvdb!)
[13:18:31] jayvdb, In https://commons.wikimedia.org/wiki/File:JUNTOSSOMOSRUGBY.png although the faces were detected, the category isn't added
[13:18:36] As it's not reliable enough
[13:19:09] nod. but the detection is bad
[13:19:30] it is easy to exclude media like that, as they are line drawings
[13:19:58] not sure how reliable line detection is either
[13:20:15] actually I would like to change "line detection" to "graphics"
[13:20:26] DrTrigon_, I've done that in the new script which is currently running
[13:20:27] as that is more general for now
[13:20:32] perfect
[13:20:53] but I would really vote for e.g. excluding single faces smaller than 70 px
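The proposed exclusion could be sketched like this. It assumes each detection is a dict with "width" and "height" in pixels (mirroring the bounding box fields in the report pasted earlier); the dict layout and the function name are hypothetical. Only a lone small face is dropped, since many small faces may be a genuine group photo.

```python
def drop_lone_small_face(faces, min_size=70):
    """Discard the detection when it is the only face and smaller than `min_size` px.

    `faces` is a list of dicts with "width" and "height" keys (assumed layout).
    Images with several faces are left untouched.
    """
    if len(faces) == 1:
        face = faces[0]
        if min(face["width"], face["height"]) < min_size:
            return []
    return faces
```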
[13:20:58] the most common group of that /newimages is football kit
[13:21:01] I am thinking.... that I can take SVG Images like https://commons.wikimedia.org/wiki/File:Oxogestone_phenpropionate.svg and detect the number of times H, O, C are written in it
[13:21:13] you can detect them as football kit and detect the club colors
[13:21:24] AbdealiJK__, +1
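A sketch of counting atom labels in an SVG using only the standard library. It assumes the labels are stored as real text elements; files whose text has been converted to paths would yield nothing. The function name is hypothetical, and the plain character count is deliberately naive (a "Cl" label would also count as a C).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_atom_labels(svg_source, symbols=("H", "O", "C")):
    """Count how often each symbol appears inside <text> elements of an SVG string."""
    root = ET.fromstring(svg_source)
    counts = Counter({symbol: 0 for symbol in symbols})
    for element in root.iter():
        # Strip the XML namespace, e.g. "{http://www.w3.org/2000/svg}text".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name != "text":
            continue
        # itertext() also collects text nested in <tspan> children.
        label = "".join(element.itertext())
        for symbol in symbols:
            counts[symbol] += label.count(symbol)
    return dict(counts)
```

A file with several such labels and little else would be a candidate for a chemistry-related category.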
[13:21:36] yes! add text recognition generally
[13:21:50] e.g. tesseract from pip
[13:22:29] Tesseract is very bulky
[13:22:37] better alternative?
[13:22:39] Unless theres a case to add categories using that let's not add it ?
[13:22:44] (for now)
[13:23:01] DrTrigon_, There is no better alternative though
[13:23:13] ;))
[13:23:42] https://commons.wikimedia.org/wiki/Category:Texts
[13:24:12] Nod, alright
[13:24:32] https://commons.wikimedia.org/wiki/Category:Text_logos
[13:24:58] ...if it's easy and simple - do not waste too much time in it
[13:25:13] alright
[13:25:13] you could also add poppler or equivalent to look at pdfs
[13:25:31] How do you propose to figure out football kits ?
[13:25:48] that is a goood question?
[13:25:55] how reliable is line detection?
[13:26:19] DrTrigon_, I do not have any specific number for that
[13:26:24] I don't think tesseract is needed here. we need to detect letters/glyphs, not sentences
[13:26:55] All football kits for left/right sleeve seem to be 31 x 59. I wonder if that is some standardised size
[13:27:13] (poppler extracts from pdfs)
[13:27:19] there are simpler tools for finding if an image has digits and arabic letters in it
[13:27:45] AbdealiJK__, yes, all of our football club articles have almost identical images in them
[13:28:09] so it's more like icon detection
[13:28:10] the football articles have an infobox template, which automates everything if the images exist
[13:28:34] Nice.
[13:29:05] I can also do some basic shape detection to verify on top if we find there are false positives. But that size seems sufficiently standardizes
[13:29:09] standardized*
[13:29:12] the filenames are even regulated
[13:29:50] nice!!
[13:29:58] sound like a plan!
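The plan for sleeve images can be sketched as a filename plus size check. The "Kit_" prefix is taken from the example files linked above and 31 x 59 from the observed sleeve size; treating every sleeve file as following both is an assumption, and the function name is hypothetical.

```python
def looks_like_kit_sleeve(filename, width, height):
    """Heuristic: a 'Kit_' file of exactly 31 x 59 px is probably a football kit sleeve."""
    # Accept titles with or without the "File:" namespace prefix.
    name = filename[len("File:"):] if filename.startswith("File:") else filename
    return name.startswith("Kit_") and (width, height) == (31, 59)
```

Shape detection could be layered on top if this turns out to produce false positives.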
[13:30:04] AbdealiJK__: you mentioned in the report 'Bulk test' you had issues with deleted files...
[13:30:19] DrTrigon_, Yep
[13:30:23] ... what about delaying analysis of new files by 5 to 30 mins...?
[13:30:40] (such that users can correct mistakes first)
[13:31:16] Not sure if that's going to matter much. Because those errors happened even after 600th file I analyzed which would probably have been older
[13:31:29] But it does make sense to postpone the analysis
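Postponing the analysis could be as simple as skipping files younger than some minimum age. The sketch below assumes each file's upload time is available as a timezone-aware datetime; the function name and the 30-minute default (the upper end of the range suggested above) are placeholders.

```python
from datetime import datetime, timedelta, timezone


def ready_for_analysis(uploaded_at, now=None, min_age=timedelta(minutes=30)):
    """True once the file has existed for at least `min_age`.

    `uploaded_at` must be timezone-aware; `now` defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - uploaded_at >= min_age
```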
[13:31:44] maybe you get fewer issues then
[13:31:58] Right now, I've put the existence check and the download on the same line with `and` - this has reduced those issues a lot
[13:32:01] there will always be such cases
[13:32:20] ...
[13:32:20] right
[13:32:27] what about ...
[13:32:41] ... not checking for existence at all and just try to download directly?
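The "just try to download" idea avoids the gap between checking and downloading in which a file can be deleted. A minimal sketch, where `download`, `analyse` and the `FileGone` exception are all hypothetical stand-ins for whatever the script and its library actually provide:

```python
class FileGone(Exception):
    """Hypothetical error raised by `download` when the file no longer exists."""


def analyse_if_available(title, download, analyse):
    """Download and analyse a file, returning None if it has been deleted.

    No separate existence check is made, so there is no window between
    'check' and 'download' in which a deletion could slip through.
    """
    try:
        data = download(title)
    except FileGone:
        return None
    return analyse(data)
```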
[13:32:54]