Documentary "Why Do We Talk?", Episode 5: Collecting Data
My focus and my aim was to capture the phase up to the two-word utterance, and that could happen anywhere between the second and third birthday. It turned out my son was an early talker, so by the time his second birthday arrived we had the main data set we wanted.

By the time recording was complete, more than 240,000 hours of information and 16 million words had been collected.

It's a lot of data, but in its raw form it's useless, and so the challenge this now sets up for us is: how do you start extracting the right kind of metadata, transcripts of who said what, annotations of where those people were, annotations of how they're moving, and the relationships that they were in as they were speaking? And these are the tools that we are now building to analyse the raw data, and from that, we're starting to see some early insights into the patterns of language development.
Source: http://www.tingroom.com/lesson/jlpwhjh/530438.html