CoBoL's Blog: AI: Innovation and Safeguards Against Abuse

14/05/2019

AI: Innovation and Safeguards Against Abuse


Su Kuan-pin (蘇冠賓)

Professor of Psychiatry and Neuroscience, China Medical University

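Whether it is everyday shopping habits or remarks posted on social media, everything we do leaves a trace. China is set to roll out its "social credit system" nationwide. This monitoring system, built on AI-based algorithmic decision-making, can convert citizens' every word and deed into a score. During its trial phase it has already blocked as many as 23 million purchases of travel tickets, denying people the freedom to travel.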

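I remember assembling radios, electric motors and even an Apple II computer in secondary-school science labs. We understood clearly how those devices worked, and interested students could write a few lines of code to perform specific functions. Today, however, the complexity of computers and applications far exceeds what an ordinary person can grasp. Inside the black box of an algorithm, as programs are updated and evolve from one generation to the next, the computations that bear on social bias, human rights violations and even human health eventually become "invisible": no one can understand them, and when worrying outcomes appear, no one even has to take responsibility.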

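In an article in the latest issue (May 2019) of Nature, one of the world's leading scientific journals, Harvard law professor Yochai Benkler set out his concerns about the current development of artificial intelligence (AI). He notes that technology companies (such as Google, Apple and YouTube) play a pivotal role in AI research and innovation, one that increasingly overrides that of all governments and nonprofit organizations. When companies steer the future direction of AI, they will inevitably use the data and influence they already hold to assess or define, in ways that favor themselves, "how their own systems affect society and ethics", and then write those judgments into the principles their algorithms follow. In the foreseeable future, algorithmic-decision systems will reach every corner of our lives: health care and insurance; finance and transportation; national defense, policing, news, politics and advertising, among others. If all of these computations are designed from the outset around the interests of a particular company or group, the algorithms will inevitably drift away from the public interest. Because algorithms are trained by machine learning on existing data, future systems may "perpetuate injustice" unless people consciously design safeguards against abuse.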

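Yet whenever regulation or abuse prevention is involved, government agencies often end up on the side of "blocking technological development and social progress". For example, to win the votes of taxi drivers or vulnerable workers, they obstruct Uber or automation. Sadly, the technologies that government agencies are able to understand and block are those that are already mature, stable and harmless to the future. As for the harm that AI development may do to human rights or fairness, politicians are often unable to understand it, let alone draw up far-sighted safeguards.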

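Taiwan has a strong foundation in technology and education, and good prospects in AI. The government should avoid always casting itself as an adversary and should instead guide companies to stay at "the balance point between self-interest and the interests of others". It must break its habit of "handling everything through laws and regulation" and turn instead to a spirit of "drawing on the humanities, reason, data and science". For example, government agencies should deliberately fund universities and research institutions to carry out independent research on the impact of future AI technology in every field. This should not be the job of only the Ministry of Science and Technology, the Ministry of Economic Affairs, the Ministry of Health and Welfare and the Ministry of Education; it should equally involve the Ministry of Culture, the Ministry of the Interior, the Ministry of Foreign Affairs, the Ministry of National Defense, the Ministry of Justice and every other ministry. In addition, the government should in future work across companies and ministries to determine how it can reasonably insist that companies share enough data to keep corporate AI development from running out of control and causing unpredictable consequences.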

https://talk.ltn.com.tw/article/paper/1288431

References:
1. Yochai Benkler. Don’t let industry write the rules for AI. Nature 2019; 569: 161. (https://www.nature.com/articles/d41586-019-01413-1)
2. Yuval Noah Harari. Chapter 3: Liberty. In: 21 Lessons for the 21st Century. Spiegel & Grau (2018). ISBN 9780525512172.
3. China plans to fully roll out its big-data-based "social credit system" next year (https://www.facebook.com/ptsworldnews/videos/317783272252179/)
4. One Month, 500,000 Face Scans: How China Is Using A.I. to Profile a Minority. NYTimes, April 14, 2019 (https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html?)
5. San Francisco Bans Facial Recognition Technology. NYTimes, May 14, 2019 (https://www.nytimes.com/2019/05/14/us/facial-recognition-ban-san-francisco.html)

6 comments:

cobol 15/05/2019, 09:24
One Month, 500,000 Face Scans: How China Is Using A.I. to Profile a Minority
NYTimes April 14, 2019
https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html?

The Chinese government has drawn wide international condemnation for its harsh crackdown on ethnic Muslims in its western region, including holding as many as a million of them in detention camps.

Now, documents and interviews show that the authorities are also using a vast, secret system of advanced facial recognition technology to track and control the Uighurs, a largely Muslim minority. It is the first known example of a government intentionally using artificial intelligence for racial profiling, experts said.

The facial recognition technology, which is integrated into China’s rapidly expanding networks of surveillance cameras, looks exclusively for Uighurs based on their appearance and keeps records of their comings and goings for search and review. The practice makes China a pioneer in applying next-generation technology to watch its people, potentially ushering in a new era of automated racism.

The technology and its use to keep tabs on China’s 11 million Uighurs were described by five people with direct knowledge of the systems, who requested anonymity because they feared retribution. The New York Times also reviewed databases used by the police, government procurement documents and advertising materials distributed by the A.I. companies that make the systems.

Chinese authorities already maintain a vast surveillance net, including tracking people’s DNA, in the western region of Xinjiang, which many Uighurs call home. But the scope of the new systems, previously unreported, extends that monitoring into many other corners of the country.

Shoppers lined up for identification checks outside the Kashgar Bazaar last fall. Members of the largely Muslim Uighur minority have been under Chinese surveillance and persecution for years. (Photo: Paul Mozur)

The police are now using facial recognition technology to target Uighurs in wealthy eastern cities like Hangzhou and Wenzhou and across the coastal province of Fujian, said two of the people. Law enforcement in the central Chinese city of Sanmenxia, along the Yellow River, ran a system that over the course of a month this year screened whether residents were Uighurs 500,000 times.

Police documents show demand for such capabilities is spreading. Almost two dozen police departments in 16 different provinces and regions across China sought such technology beginning in 2018, according to procurement documents. Law enforcement from the central province of Shaanxi, for example, aimed to acquire a smart camera system last year that “should support facial recognition to identify Uighur/non-Uighur attributes.”
cobol 15/05/2019, 09:29
San Francisco Bans Facial Recognition Technology
https://www.nytimes.com/2019/05/14/us/facial-recognition-ban-san-francisco.html
May 14, 2019

Attendees interacting with a facial recognition demonstration at this year’s CES in Las Vegas. (Photo: Joe Buglewicz for The New York Times)

SAN FRANCISCO — San Francisco, long at the heart of the technology revolution, took a stand against potential abuse on Tuesday by banning the use of facial recognition software by the police and other agencies.

The action, which came in an 8-to-1 vote by the Board of Supervisors, makes San Francisco the first major American city to block a tool that many police forces are turning to in the search for both small-time criminal suspects and perpetrators of mass carnage.

The authorities used the technology to help identify the suspect in the mass shooting at an Annapolis, Md., newspaper last June. But civil liberty groups have expressed unease about the technology’s potential abuse by government amid fears that it may shove the United States in the direction of an overly oppressive surveillance state.

Aaron Peskin, the city supervisor who sponsored the bill, said that it sent a particularly strong message to the nation, coming from a city transformed by tech.

“I think part of San Francisco being the real and perceived headquarters for all things tech also comes with a responsibility for its local legislators,” Mr. Peskin said. “We have an outsize responsibility to regulate the excesses of technology precisely because they are headquartered here.”

But critics said that rather than focusing on bans, the city should find ways to craft regulations that acknowledge the usefulness of face recognition. “It is ridiculous to deny the value of this technology in securing airports and border installations,” said Jonathan Turley, a constitutional law expert at George Washington University. “It is hard to deny that there is a public safety value to this technology.”

There will be an obligatory second vote next week, but it is seen as a formality.

Similar bans are under consideration in Oakland and in Somerville, Mass., outside of Boston. In Massachusetts, a bill in the State Legislature would put a moratorium on facial recognition and other remote biometric surveillance systems. On Capitol Hill, a bill introduced last month would ban users of commercial face recognition technology from collecting and sharing data for identifying or tracking consumers without their consent, although it does not address the government’s uses of the technology.

Matt Cagle, a lawyer with the A.C.L.U. of Northern California, on Tuesday summed up the broad concerns of facial recognition: The technology, he said, “provides government with unprecedented power to track people going about their daily lives. That’s incompatible with a healthy democracy.”

The San Francisco proposal, he added, “is really forward-looking and looks to prevent the unleashing of this dangerous technology against the public.”

In one form or another, facial recognition is already being used in many American airports and big stadiums, and by a number of other police departments. The pop star Taylor Swift has reportedly incorporated the technology at one of her shows, using it to help identify stalkers.

The facial recognition fight in San Francisco is largely theoretical — the police department does not currently deploy such technology, and it is only in use at the international airport and ports that are under federal jurisdiction and are not impacted by the legislation.

cobol 22/05/2019, 17:10
Preventing AI abuse requires cooperation
(Poorly translated by the Taipei Times)
Published on Taipei Times Wed, May 22, 2019:
http://www.taipeitimes.com/News/editorials/archives/2019/05/22/2003715563
Copyright © 1999-2019 The Taipei Times. All rights reserved.

In junior-high school science class, we assembled radios, motors and even an Apple II computer. We understood the principles behind how those devices worked and interested students could even write code in different programming languages to perform specific functions.
However, the complexity of computers and applications has advanced far beyond ordinary people’s understanding.
As programs continue to evolve, the computational processes behind social prejudice, human rights violations and even human health are to disappear in the black box of algorithms. No one will be able to understand them and once things begin to go awry, there will be no one to take responsibility.
In an article in this month’s issue of Nature, Harvard University law professor Yochai Benkler put forward concerns about the development of artificial intelligence (AI).
In AI research, development and innovation, technology companies such as Google and Apple play a decisive role and prevail over many governments and nonprofit companies, Benkler wrote.
As businesses direct the development of AI, it is unavoidable that they would use their own data and influence in ways that are beneficial to themselves as they determine the effects of their business systems on society and morals, and then incorporate that into their programs.
In the foreseeable future, algorithms are to influence every aspect of everyday life, such as health, insurance, finance, transportation, national defense, law and order, news, politics, advertising and so on.
If all these algorithms are designed based on the interests of certain businesses or groups, they will move away from the public interest.
As machine learning algorithms are based on existing data, future systems could become permanently unfair unless people design fraud prevention measures.
However, most of the time when a government is involved in management or prevention of abuse, it sides with those who want to block technological and social progress.
For example, to win votes from taxi drivers and disadvantaged groups, politicians have been blocking Uber and automation.
Tragically, the technologies that politicians are able to understand and block are the ones that are mature, stable and pose no threat. When it comes to AI’s possible threat to human rights and fairness, politicians are incapable of understanding the implications, let alone create measures to prevent abuse.
Taiwan has solid foundations in science, technology and education, and the development of AI presents a good opportunity.
If the government does not want to oppose scientific and technological development, and wants to guide companies to maintain a balance between their own and others’ interests, it must stop imposing laws and instead rely on the humanities, reason, data and science.
For example, government agencies should subsidize independent research by universities and research institutions on the effects of AI technology.
This should not only be the responsibility of the Ministry of Science and Technology, Ministry of Economic Affairs, Ministry of Health and Welfare, and Ministry of Education, but also involve the Ministry of Culture, Ministry of the Interior, Ministry of Foreign Affairs, Ministry of National Defense, Ministry of Justice and others.
The government should also conduct cross-industry and cross-departmental discussions on how to regulate businesses so they share enough data to prevent abusive development of AI.
Su Kuan-pin is a professor and director of China Medical University’s College of Medicine and Mind-Body Interface Research Center.
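cobol 01/10/2019, 16:58
FDA issues regulatory guidance on AI software for clinical decision support
2019.09.27, 環球生技雜誌 / compiled and translated by reporter 吳培安

On September 26 (US time), the US Food and Drug Administration (FDA) issued updated guidance clarifying how strictly it will regulate the various forms of clinical decision support (CDS) software. The agency also updated its guidance on a range of medical technology products, including mobile medical apps, general wellness and low-risk devices, off-the-shelf commercial software used by manufacturers, data systems, medical image storage and communication devices, to bring them in line with the 21st Century Cures Act, enacted by the US Congress in December 2016.

The FDA defines CDS as software that provides physicians, patients or caregivers with knowledge and patient-specific information, intelligently filtered or presented at appropriate times, to enhance health and health care. The FDA will set the level of oversight according to a product's level of risk.

The new document finalizes a draft released in late 2017 that was intended to strengthen the safety requirements for products involving electronic devices. The FDA said its oversight under this guidance will focus on higher-risk software used in urgent situations, and on programs, such as those using machine-learning algorithms, whose underlying logic may not be adequately explained to the user.

Examples the FDA gives include software that, without explaining its rationale, flags a hospitalized patient with type 1 diabetes as being at high risk of an acute cardiac event and in need of surgery; or an algorithm trained on electronic medical records and geographic data that automatically identifies which patients with cold symptoms are suspected cases of seasonal influenza.

The FDA also stressed that one goal of its regulatory framework is to avoid impeding low-risk but highly beneficial medical software, such as programs that, in non-urgent situations, can inform patients without a physician being consulted, and whose advice patients can review for themselves before deciding what to do. Some of these may even outperform, and could replace, certain traditional medical devices, an idea that is also supported by the 21st Century Cures Act.

FDA Principal Deputy Commissioner Amy Abernethy said that mobile apps intended to encourage users to maintain a healthy lifestyle mostly fall outside the FDA's oversight.

In a statement, Abernethy said that patients, their families and health care professionals are increasingly embracing digital health technology, from ever-simpler blood glucose reporting to smartwatches that can detect arrhythmias. The FDA believes it is important for the development of digital health technology that the regulatory framework take technological advances into account.

Reference:

https://www.fiercebiotech.com/medtech/fda-delivers-regulatory-guidance-ai-software-and-clinical-decisionmaking-aids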
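cobol 09/02/2020, 19:47
How startups are conquering AI epidemic prediction, the field where Google stumbled
硅谷洞察 · 6 February 2020, 12:06
https://www.chainnews.com/zh-hant/articles/998525765905.htm

Predicting the unknown has always been a power humans have longed for. Going far back, there are the I Ching and its eight trigrams and the Tui Bei Tu written by Tang-dynasty Taoists, both familiar to Chinese readers, as well as astrology and the tarot cards that became popular in the Middle Ages in the West. More recently, the public frenzy and commercial carnival set off by the Mayan "2012 doomsday" prophecy are still fresh in our memory.

The age of "asking about ghosts and gods rather than the common people" is over. We are now at ease with deterministic, empirical and even probabilistic predictions about the physical world and the economy. But are humans still helpless when it comes to predictions of the kind described by the "butterfly effect": highly complex, with enormous numbers of variables and huge volumes of data?

The answer is no.

Recently, the outbreak of the novel coronavirus in Wuhan, China, has drawn close attention from the World Health Organization and health agencies around the world. Wired magazine reported that "BlueDot, a Canadian company, was the first to predict and announce the outbreak in Wuhan through its AI monitoring platform", a story that was widely picked up by media in China. This seems to be the result we most want to see when it comes to "predicting the future": with accumulated big data and AI inference, humans appear able to fathom "the will of heaven", uncovering causal patterns hidden deep in the chaos and so trying to save the world before disaster strikes.

Today we start from infectious-disease prediction and look at how AI has moved, step by step, toward "uncanny foresight".

Google's GFT keeps crying "wolf":

A rhapsody on flu big data

Predicting infectious diseases with AI is clearly not exclusive to BlueDot. As early as 2008, Google, one of today's AI heavyweights, made a not very successful attempt.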


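In 2008 Google launched Google Flu Trends (GFT), a system for predicting influenza trends. GFT made its name in 2009: a few weeks before the H1N1 outbreak in the United States, Google engineers published a paper in Nature and, using the massive search data Google had accumulated, successfully predicted the spread of H1N1 across the country. To analyze flu trends by region, Google used several billion search records and evaluated 450 million different mathematical models to construct a flu prediction index. Its results had a correlation of up to 97% with official data from the US Centers for Disease Control and Prevention (CDC), yet were available a full two weeks earlier than the CDC's. In an epidemic, time is life and speed is wealth; if GFT could have kept up this "foresight", it would clearly have given society a head start in bringing outbreaks under control.

The prophetic myth did not last long, however. In 2014 GFT was back in the media spotlight, this time for its poor performance. That year researchers published "The Parable of Google Flu: Traps in Big Data Analysis" in Science, pointing out that in 2009 GFT had failed to predict the non-seasonal A-H1N1 influenza. In the 108 weeks from August 2011 to August 2013, GFT's estimate was higher than the flu prevalence reported by the CDC in 100 of them. By how much? In the 2011-2012 season, GFT's predicted prevalence was more than 1.5 times the CDC-reported value; by the 2012-2013 season, it was more than twice the CDC-reported value.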
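The two measures quoted above (the share of weeks in which the estimate was too high, and the ratio of predicted to reported prevalence) are simple to compute. The following is a minimal sketch; the function name and the weekly numbers are hypothetical illustrations, not GFT or CDC data.

```python
from typing import Sequence


def overestimation_summary(predicted: Sequence[float],
                           reported: Sequence[float]) -> dict:
    """Summarise how often and by how much predictions exceed reported values."""
    if len(predicted) != len(reported) or not predicted:
        raise ValueError("need two non-empty series of equal length")
    # Count the weeks in which the prediction was above the reported value.
    weeks_over = sum(p > r for p, r in zip(predicted, reported))
    return {
        "weeks": len(predicted),
        "weeks_over": weeks_over,
        "share_over": weeks_over / len(predicted),
        # Ratio of total predicted to total reported prevalence.
        "ratio": sum(predicted) / sum(reported),
    }


# Hypothetical weekly prevalence values (percent), for illustration only.
predicted = [3.0, 4.5, 6.0, 2.0]
reported = [2.0, 3.0, 3.0, 2.5]
summary = overestimation_summary(predicted, reported)

# The figure quoted in the article: 100 of 108 weeks overestimated.
article_share = 100 / 108  # roughly 0.93
```

On the article's own figures, GFT overestimated in about 93% of the weeks in that period.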


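(Chart from "The Parable of Google Flu: Traps in Big Data Analysis", Science, 2014)

Although GFT adjusted its algorithm in 2013, and responded that the main culprit for the deviation was heavy media coverage of GFT, which had changed people's search behavior, GFT's predicted flu prevalence for the 2013-2014 season was still 1.3 times the CDC-reported value. The systematic error the researchers had identified earlier also persisted; in other words, GFT was still "crying wolf".

What factors did GFT miss that left the prediction system in such a predicament?

According to the researchers' analysis, the large systematic error in GFT's big-data analysis points to the following possible problems in how it collected features and evaluated results:

1. Big data hubris

"Big data hubris" refers to the premise adopted by Google's engineers that the big data obtained from users' search keywords amounted to a complete data set on influenza and could fully replace traditional data collection (sampling and statistics) rather than supplement it. In other words, GFT assumed that "the collected user search data" corresponded completely to "the population affected by a given flu outbreak". This "arrogant" premise ignored the fact that a huge volume of data is not the same as complete and accurate data, so the sample that predicted successfully in 2009 could not cover the new data characteristics that emerged in later years. Because of the same "conceit", GFT also does not appear to have considered bringing in professional health care data or expert experience, nor did it "clean" or "de-noise" the user search data. The result was persistent overestimation of disease prevalence that it was unable to correct.

2. Evolution of the search engine

The search engine itself does not stand still. After 2011 Google introduced "recommended related search terms", the related-search feature we are all familiar with today.

For flu-related search terms, for example, it offered a list of related searches on flu treatment, and after 2012 it also recommended related diagnostic terms. The researchers' analysis suggests these changes may have artificially inflated some searches and led Google to overestimate prevalence. For instance, when a user searched for "sore throat", Google would suggest terms such as "sore throat and fever" or "how to treat a sore throat". Users might click on these out of curiosity or for other reasons, so the keywords recorded did not reflect what they originally intended, which affected the accuracy of the data GFT collected.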


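Users' search behavior in turn affects GFT's predictions. Media coverage of a flu epidemic, for example, increases the number of searches for flu-related terms, which then affects GFT's prediction. This is like the "uncertainty principle" that the quantum physicist Heisenberg described in quantum mechanics: "to measure is to interfere". In the noisy world of search engines, saturated with media reports and users' subjective input, a similar paradox holds: "to predict is to interfere". Search-engine users' behavior is not entirely spontaneous. Media reports, trending topics on social media, search-engine suggestions and even big-data recommendations all shape what users think, producing concentrated bursts of particular searches.

Why were GFT's predictions always too high? Following this line of reasoning, once the epidemic index published by GFT rises, it immediately triggers media coverage, which leads to more related searches, which in turn reinforces GFT's assessment of the outbreak. No matter how the algorithm is adjusted, the "uncertain" result cannot be changed.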
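The feedback loop described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of GFT: the function, the linear "media effect" and every parameter value (`media_gain`, `baseline`, `search_rate`) are hypothetical. Each week, a published index above the baseline triggers extra, media-driven searches, and a naive estimator that counts every search as a case then reports an inflated value.

```python
def simulate_feedback(true_cases, search_rate=1.0, media_gain=0.3, baseline=10.0):
    """Return weekly estimates made from search volume alone.

    All parameters are illustrative assumptions:
    - search_rate: searches generated per real case
    - media_gain: extra searches per unit of last week's published
      estimate above the baseline (the media effect)
    """
    estimates = []
    previous = baseline
    for cases in true_cases:
        genuine = search_rate * cases
        # Coverage of last week's high index drives additional searches.
        media_driven = media_gain * max(previous - baseline, 0.0)
        searches = genuine + media_driven
        # Naive estimator: treats every search as evidence of a case.
        estimate = searches / search_rate
        estimates.append(estimate)
        previous = estimate
    return estimates


true_cases = [10.0, 20.0, 20.0, 20.0, 20.0]
with_feedback = simulate_feedback(true_cases)
without_feedback = simulate_feedback(true_cases, media_gain=0.0)
```

In this toy setting the estimate settles above the true level (about 24 against a true 20) once the index has risen, while the run with `media_gain=0.0` tracks the true series exactly. The overshoot comes from the loop itself, so re-tuning the estimator alone does not remove it.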

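3. Correlation, not causation

The researchers point out that the root problem with GFT is that Google's engineers did not know what causal link, if any, existed between search keywords and the spread of flu; they looked only at statistical correlations in the data. Overvaluing "correlation" while ignoring "causation" leads to inaccurate data. Take the term "flu": if searches for it surge over some period, it may be because a film or song titled "Flu" has been released, not necessarily because flu is actually breaking out.

Although outsiders have long hoped Google would publish GFT's algorithm, Google chose not to. This has led many researchers to question whether the data can be reproduced, or whether commercial considerations are involved. They want big search data to be combined with traditional statistics ("small data") to build a deeper and more accurate understanding of human behavior.

Clearly, Google did not take this advice to heart. GFT was finally taken offline in 2015. Google nevertheless continues to collect the relevant user search data, providing it only to the US CDC and some research institutions.

Why BlueDot was the first to predict successfully:

A concerto of AI algorithms and human analysis

As is well known, Google was already investing in artificial intelligence at the time; it acquired DeepMind in 2014 but kept it operating independently. Google also stopped paying much attention to GFT, so it never considered adding AI to GFT's algorithmic model and instead let GFT be "euthanized".

At almost the same time, the BlueDot we see today was born.

BlueDot is an automated epidemic surveillance system built by infectious-disease specialist Kamran Khan. It tracks outbreaks of more than 100 infectious diseases by analyzing about 100,000 articles in 65 languages every day, using this targeted data collection to look for clues to the outbreak and spread of potential epidemics. BlueDot has consistently used natural language processing (NLP) and machine learning (ML) to train this "automated disease surveillance platform", which allows it to identify and exclude irrelevant "noise" in the data. For example, the system can tell whether an article is about an anthrax outbreak in Mongolia or merely a reunion of Anthrax, the heavy metal band formed in 1981. GFT, by contrast, simply treated users who made "flu"-related searches as possible flu patients, so it clearly swept in too many unrelated users and overestimated prevalence. This ability to screen key data is BlueDot's advantage over GFT.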
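To make the idea of screening out "noise" concrete, here is a deliberately simple, rule-based sketch of the anthrax example. It is not BlueDot's method (the article says only that the platform uses NLP and machine learning); the function name and both cue-word lists are hypothetical, and a real system would learn such signals from labeled data.

```python
import re

# Hypothetical cue words, chosen only for this illustration.
DISEASE_CUES = {"outbreak", "cases", "infection", "infected",
                "livestock", "quarantine", "symptoms"}
BAND_CUES = {"band", "album", "tour", "concert",
             "reunion", "metal", "guitarist"}


def classify_anthrax_mention(text: str) -> str:
    """Label a headline that may mention anthrax using context words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if "anthrax" not in words:
        return "irrelevant"
    disease_score = len(words & DISEASE_CUES)
    band_score = len(words & BAND_CUES)
    if disease_score > band_score:
        return "disease"
    if band_score > disease_score:
        return "noise"
    # No clear signal either way: leave it to a human analyst.
    return "needs_review"


outbreak = classify_anthrax_mention(
    "Anthrax outbreak in Mongolia: 12 cases confirmed among herders")
band = classify_anthrax_mention(
    "Heavy metal band Anthrax announces reunion tour")
unclear = classify_anthrax_mention("Anthrax in the news")
```

A keyword counter with no context check would score all three headlines as evidence of disease, which is the kind of overcounting the paragraph attributes to GFT.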

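In predicting the novel coronavirus outbreak, for example, Khan said BlueDot located the source of outbreak information by searching foreign-language news reports, animal and plant disease networks and official announcements. The platform's algorithm does not use social media posts, however, because that data is too messy and prone to even more "noise".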
cobol 09/02/2020, 19:48
How startups are conquering AI epidemic prediction, the field where Google stumbled (Part 2)
硅谷洞察 · 6 February 2020, 12:06
https://www.chainnews.com/zh-hant/articles/998525765905.htm


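To predict how a virus spreads after an outbreak, BlueDot prefers to draw on global airline ticketing data, which makes it easier to see where and when infected residents are travelling. In early January BlueDot also correctly predicted that, within days of the outbreak in Wuhan, the novel coronavirus would spread from Wuhan to Beijing, Bangkok, Seoul and Taipei.

The coronavirus outbreak was not BlueDot's first success. In 2016, by building an AI model of the transmission routes of the Zika virus in Brazil, BlueDot successfully predicted the appearance of Zika in Florida six months in advance. This means BlueDot's AI monitoring can even predict the geographic trajectory along which an epidemic spreads.

From failure to success: what exactly are the differences between BlueDot and Google's GFT?

1. Differences in prediction technology

The previously dominant approach to predictive analysis relied on a family of data-mining techniques. Among them, "regression" methods from mathematical statistics were used frequently, including multiple linear regression, polynomial regression and multivariable logistic regression. In essence these are forms of curve fitting: each model predicts a "conditional mean". This is also the technical principle behind the prediction algorithm GFT used.

Before machine learning, multiple regression analysis offered an effective way of handling varied conditions: it tries to find the result that minimizes prediction error and maximizes "goodness of fit". But fitting historical data ever more closely does not guarantee accurate predictions of future data, and this leads to what is called "overfitting".

According to an analysis by Shen Yan (沈豔), a professor at Peking University's National School of Development, in the article "The Glory and Traps of Big Data Analysis: Starting with Google Flu Trends", GFT did suffer from "overfitting". In 2009 GFT could see all of the CDC data for 2007-2008, and the criterion by which it used training and validation data to find the best model was to fit the CDC data as closely as possible, whatever the cost. This is why the 2014 Science paper notes that, in predicting flu prevalence for 2007-2008, GFT dropped some seemingly odd search terms and used the remaining 50 million search terms to fit 1,152 data points. After 2009, the data GFT had to predict involved many more unknown variables, including its own predictions, which fed back into the data. However GFT was adjusted, it still faced the overfitting problem, so systematic error in the system as a whole was unavoidable.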
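The risk of matching a very large pool of candidate terms against a small number of data points can be shown with pure noise. The sketch below is an illustration only, not a reconstruction of GFT's procedure: the function name, sample sizes and seed are arbitrary assumptions. It generates random "search term" series with no relation to the target, keeps the one that correlates best on a short training window, and then checks it on held-out data.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_noise_predictor(n_terms=2000, n_train=20, n_test=200, seed=0):
    """Pick the random series that best matches a random target in training.

    Returns (absolute training correlation, absolute test correlation).
    Every series is independent noise, so any match is coincidental.
    """
    rng = random.Random(seed)
    n = n_train + n_test
    target = [rng.gauss(0, 1) for _ in range(n)]
    best_train, best_series = -1.0, None
    for _ in range(n_terms):
        series = [rng.gauss(0, 1) for _ in range(n)]
        r = abs(pearson(series[:n_train], target[:n_train]))
        if r > best_train:
            best_train, best_series = r, series
    test_r = abs(pearson(best_series[n_train:], target[n_train:]))
    return best_train, test_r


train_r, test_r = best_noise_predictor()
```

With thousands of candidates and only 20 training points, the winning series looks strongly correlated in training even though it carries no information, and the correlation largely vanishes on the held-out points. Selecting among 50 million terms to fit 1,152 points is exposed to the same effect unless the selection is validated on data the model has not seen.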

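BlueDot took a different strategy: it combines medical and public-health expertise with AI and big-data analytics to track and predict how epidemics are distributed and spread around the world, and to offer the best response.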


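BlueDot relies mainly on natural language processing and machine learning to improve the effectiveness of its monitoring engine. The growth in computing power and the rise of machine learning in recent years have fundamentally changed statistical prediction. The main development is deep learning (neural networks), which uses "backpropagation" to train, receive feedback and learn from data continuously, acquiring "knowledge". Through this self-learning the prediction model is continually optimized and its accuracy improves. The historical data fed in before training therefore becomes critical: a sufficiently rich set of data with features is the basis on which a prediction model is trained, and cleaned, high-quality data with properly extracted and labeled features is the key to whether prediction succeeds.

2. Differences in prediction mode

Unlike GFT, which handed the prediction process entirely to the output of big-data algorithms, BlueDot does not leave prediction entirely to its AI monitoring system. Once the data has been filtered, BlueDot passes it to human analysts. This is the difference between the "correlation" thinking of GFT's big-data analysis and BlueDot's "expert-experience" mode of prediction. The big data its AI analyzes is drawn from selected websites (health care, and health and disease news) and platforms (such as airline ticketing). The alerts the AI produces must also be re-analyzed by epidemiologists, who confirm whether they are sound and assess whether the outbreak information can be released to the public at the earliest opportunity.

Of course, these cases are not yet enough to show that BlueDot has fully succeeded at predicting epidemics. First, could the AI's trained model carry biases of its own? For example, to avoid missing an outbreak, might it exaggerate the severity of an epidemic and so "cry wolf" once again? Second, is the data the monitoring model evaluates valid? BlueDot, for instance, uses social media data sparingly in order to avoid excessive "noise".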


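Fortunately, as a professional health-services platform, BlueDot pays more attention than GFT did to the accuracy of its monitoring results. After all, professional epidemiologists are the ones who finally release these forecasts, and their accuracy directly affects the platform's reputation and commercial value. This also means BlueDot still has to face the test of balancing commercial profit against public responsibility and openness of information.

AI prediction of epidemic outbreaks is only the overture…

"Was the first warning about the Wuhan coronavirus issued by artificial intelligence?" That media headline surprised many people. In today's globalized world, an outbreak in any one place can reach every corner of the globe in a short time, so the time to detection and the efficiency of early warning become the keys to prevention. If AI can serve as a better early-warning mechanism for epidemics, it would be a worthwhile option for the World Health Organization (WHO) and national health authorities in their prevention systems.

This raises the question of how such institutions should come to trust the epidemic forecasts AI provides. In future, AI epidemic-prediction platforms will also have to provide infection risk levels, along with assessments of the economic and political risks the spread of a disease may cause, to help the authorities make sounder decisions. All of this will still take time. In building rapid-response prevention mechanisms, these organizations should now put such AI monitoring systems on the agenda.

It is fair to say that AI's successful early prediction of this outbreak is a bright spot in humanity's response to this global crisis. We hope this AI-assisted campaign of epidemic prevention and control is only the prelude to a protracted struggle, with many more possibilities to come: AI identification of the pathogens of major infectious diseases; AI early-warning mechanisms built on data about endemic areas and seasonal patterns of major infectious diseases; AI-assisted optimization of the allocation of medical supplies after an outbreak; and so on. We will wait and see.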
