Shahan Ali Memon

University of Washington | Carnegie Mellon University | New York University

Present: I am a second-year PhD student at the Information School at the University of Washington, advised by Jevin D. West. My research is funded by the Center for an Informed Public and falls broadly under Computational Social Science.
I study how AI is changing the institution of science, an emerging field I call the Science of AI-mediated Science. I am interested in the question: Can AI do science? Can we create mini versions of AI agents that are homines silici, computational analogues of us humans, who can autonomously interact and collaborate with each other and do research like a PhD student does (minus the coffee dependency, hopefully ;))? Basically, I'm trying to figure out if an AI system could someday take over my job. I also study scientific collaboration because, yes, I can have varied interests, and frankly, it's fascinating to study! Teamwork and diverse perspectives fuel so much of what drives innovation in science.
I also focus on AI alignment and AI ethics. It might seem like a bit of a mixed bag, but trust me, it's all connected. How AI works, its sensemaking process, and the values embedded within it are all essential factors in developing better AI systems.

Past: I did my Master's at the Language Technologies Institute (LTI) at Carnegie Mellon University in the lovely (and chilly) Pittsburgh, where I was lucky to be mentored by Bhiksha Raj and Rita Singh on various directed research projects in the field of Speech Processing and Voice Forensics. I also completed a Master's thesis on "Characterizing Misinformed Online Health Communities" under the guidance of Kathleen M. Carley and David R. Mortensen. My Bachelor's was in Computer Science from Carnegie Mellon University in Qatar, where I was mentored by Ingmar Weber and Saquib Razak for my undergraduate thesis on "Lifestyle Disease Surveillance Using Population Search Behaviour." Before starting my PhD, I also worked as a Research Associate at New York University in Abu Dhabi. This experience resulted in several important projects of my career in the field of Science of Science, in collaboration with Bedoor AlShebli, Kinga R. Makovi, Talal Rahwan, and Wifag Adnan.

Future: I have been fortunate enough to find amazing and caring advisors and collaborators who took a leap of faith to support and guide me, collectively shaped my interests, and gave me opportunities to tackle interesting questions. I try to optimize for long-term collaborations and mentorship. I also often think about the implications of my research, and hope to eventually be able to make some difference in the world, a world where impact is not just measured by citations (though, hey, I wouldn't say no to a few of those either!).

I'm always open to collaboration and discussions. So don't hesitate to reach out if you would like to collaborate with me, or just talk about life.

Prospective PhD applicants, especially those from underrepresented backgrounds, are more than welcome to email me about questions related to the application process or PhD experience. :)


Ongoing Research

From job titles to jawlines: How AI imagines your face from your CV

Shahan Ali Memon, Soham De, Riyan Mujtaba, Sungha Kang, Nic Weber, Bedoor AlShebli, Jaime Snyder, Jevin D. West
In Preparation 

In this study, we investigate how AI systems interpret identity and competence markers in professional Curricula Vitae (CVs) and transform structured documents into photo-realistic headshots. Our objective is to qualitatively analyze how AI systems, such as ChatGPT, process markers such as gender, experience, job titles, education, and skills, and translate them into fully realized portraits. We find that the AI system frequently generates masculine representations, regardless of the CV holder's actual gender, and often resorts to stereotypical associations, such as academics who "wear glasses" or appear "kind and approachable." Through this visual elicitation exercise, we examine which traits AI selects or emphasizes in its generated portraits and how biases propagate between models in AI systems.


Can LLM-based AI agents automate science?

Shahan Ali Memon, James Koppel, Tom Hope, Jevin D. West
In Preparation 
Quoted by Nature News

"AI Scientist" is a multi-agent AI system built using large language models intended to automate scientific research. The preprint associated with this system has recently attracted significant attention across social and news media, positioning it among many similar AI-driven systems under development. In this commentary, we critically examine the AI Scientist, focusing on issues such as bias, plagiarism, and hallucinations in its generated papers and reviews. Additionally, we address the system's methodological shortcomings. We further examine key technological challenges with recent academic efforts in this area, highlight their important social, ethical, and epistemological implications, and provide an agenda for future research and communication in this area.


Science of AI-mediated science

Shahan Ali Memon, Jevin D. West
Accepted for presentation at ICSSI (2024)
In Preparation 
Slides (ICSSI)

The various processes of scientific research are increasingly being influenced by artificial intelligence (AI). Innovative tools are emerging to assist scholars in tasks such as hypothesis generation, literature review, data collection, experimentation, and writing. As these AI-driven technologies are integrated into research practices, they are fundamentally transforming the nature of scientific inquiry and the knowledge it produces. This evolving landscape has given rise to a new field of study known as the "Science of AI-mediated Science," which examines the impact of AI on the methodology and outcomes of scientific research.


Characterizing the effect of retractions on publishing careers

Shahan Ali Memon, Kinga Makovi*, Bedoor AlShebli*
Accepted for presentation at ICSSI (2023) **Best Paper Award**
Accepted for presentation at Frontiers of Network Science Workshop (2022 & 2023)
Accepted for presentation at IC2S2 (2022)
Under Review 
Recording (IC2S2) Slides (IC2S2) Slides (ICSSI)
Preprint

Retracting academic papers is a fundamental tool for social control in the academy, and in the vast majority of cases happens only under the most extreme circumstances: when the science behind papers or the integrity of authors comes into question. While retractions do not completely erase papers from the academic record, they can have important implications for retracted scientists and their careers. In this project, we aim to uncover whether retracted authors (RQ1) retain fewer collaborators, (RQ2) gain fewer new collaborators, (RQ3) close fewer triads, and (RQ4) get penalized for public retractions, compared with their matched non-retracted counterparts.


Where postdoctoral journeys lead

Yueran Duan, Shahan Ali Memon, Bedoor AlShebli*, Qing Guan, Petter Holme*, Talal Rahwan
Under Review 
Preprint Twitter thread

Postdoctoral training is commonly recognized as a challenging and intense period in one's career, where many talented PhD graduates encounter unforeseen circumstances that can impact their academic aspirations. Utilizing a specialized data set encompassing academic publications and career trajectories, we aim to comprehensively map out the varied outcomes of postdoctoral experiences.


Publications

China and the U.S. produce more impactful AI research when collaborating together

Bedoor AlShebli*, Shahan Ali Memon, James A. Evans, Talal Rahwan*
Nature Scientific Reports (2024; IF:3.80) 
Accepted for presentation at ICSSI (2023)
Accepted as a poster at IC2S2 (2023)
Paper Slides (ICSSI) Poster (IC2S2) Twitter thread

Fierce geopolitical tensions between China and the U.S. have led to policies that discourage cross-border collaboration and migration in the field of Artificial Intelligence. Despite this, we analyze a dataset of 363,000 AI scientists and 5,400,000 papers showing that China and the U.S. have been leading the field since 2000 in terms of impact, novelty, productivity, and workforce. Significant bidirectional migration is observed, with both countries being primary destinations for one another. Collaborations between the two countries, while increasing, still represent a small fraction of their total productivity. Yet, we show that the two countries produce more impactful research when collaborating together, suggesting that promoting cross-border collaboration and migration could benefit the field of AI.


Search Engines Post-ChatGPT: How Generative Artificial Intelligence Could Make Search Less Reliable

Shahan Ali Memon, Jevin D. West
Center for an Informed Public Rapid Research Blog 
Presented at the Center for an Informed Public Meeting
Selected as required reading for the IMT 589 "Problematic Information" class at the University of Washington.
Commentary Preprint Slides (CIP Meeting)

In this commentary, we discuss the evolving nature of search engines, as they begin to generate, index, and distribute content created by generative artificial intelligence (GenAI). Our discussion highlights challenges in the early stages of GenAI integration, particularly around factual inconsistencies and biases. We discuss how output from GenAI carries an unwarranted sense of credibility while decreasing transparency and sourcing ability. Furthermore, search engines are already answering queries with error-laden, generated content, further blurring the provenance of information and impacting the integrity of the information ecosystem. We argue that all these factors could reduce the reliability of search engines. Finally, we summarize some of the active research directions and open questions.


Perceptions of FIFA men’s world cup 2022 host nation Qatar in the Twittersphere

Susan Dun, Hatim Rachdi, Shahan Ali Memon, Yelena Mejova, Ingmar Weber
International Journal of Sport Communication (2022; IF:1.59; Q-Index:Q2) 
Accepted for presentation at the NCA 105th Annual Convention (2019)
Paper

We assessed the discussion around FIFA World Cup 2022 in the Twittersphere to shed some light on whether Qatar’s nation-branding and soft power attempts are reflected in public perceptions.


COVID-19 vaccine perceptions in the initial phases of US vaccine roll-out: an observational study on Reddit

Navin Kumar, Isabel Corpus, Meher Hans, Nikhil Harle, Nan Yang, Curtis McDonald, Shinpei Nakamura Sakai, Kamila A Janmohamed, Weiming Tang, Jason L Schwartz, S Mo Jones-Jang, Koustuv Saha, Shahan Ali Memon, Chris Bauch, Munmun De Choudhury, Orestis Papakyriakopoulos, Joseph D Tucker, Abhay Goyal, Aman Tyagi, Kaveh Khoshnood, Saad Omer
BMC Public Health (2022; IF:3.98) 
Paper

The purpose of this analysis was to detail the behavior of top Reddit users, posts’ relationship with events early in the vaccine timeline, and the relationship between subreddits that shared COVID-19 vaccine posts. Research questions are as follows: What is the behavior of top Reddit users in regards to COVID-19 vaccines (RQ1)? What are Reddit posts’ relationship with events early in the vaccine timeline (RQ2)? What is the relationship between subreddits that shared COVID-19 vaccine posts (RQ3)?


Hierarchical routing mixture of experts

Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh
25th International Conference on Pattern Recognition (ICPR 2020) 
Paper Slides

In regression tasks, the data distribution is often too complex to be fitted by a single model. In contrast, partition-based models are developed where data is divided and fitted by local models. These models partition the input space and do not leverage the input-output dependency of multimodal-distributed data, and strong local models are needed to make good predictions. Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts.


Characterizing COVID-19 misinformation communities using a novel Twitter dataset

Shahan Ali Memon, Kathleen M. Carley
International Workshop on Mining Actionable Insights from Social Networks (MAISoN) (in conj. with CIKM 2020) 
Funded by Center for Machine Learning and Health (CMLH)
Paper Slides Data Codebook Recording

From conspiracy theories to fake cures and fake treatments, COVID-19 has become a hot-bed for the spread of misinformation online. It is more important than ever to identify methods to debunk and correct false information online. In this paper, we present a methodology and analyses to characterize the two competing COVID-19 misinformation communities online: (i) misinformed users or users who are actively posting misinformation, and (ii) informed users or users who are actively spreading true information, or calling out misinformation. The goals of this study are two-fold: (i) collecting a diverse, annotated COVID-19 Twitter dataset that can be used by the research community to conduct meaningful analysis; and (ii) characterizing the two target communities in terms of their network structure, linguistic patterns, and their membership in other communities.


Characterizing sociolinguistic variation in the competing vaccination communities

Shahan Ali Memon, Aman Tyagi, David R. Mortensen, Kathleen M. Carley
International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation (SBP-BRiMS 2020) 
Funded by Center for Machine Learning and Health (CMLH)
Paper Slides

Public health practitioners and policy makers grapple with the challenge of devising effective message-based interventions for debunking public health misinformation in cyber communities. Framing and personalization of the message is one of the key features for devising a persuasive messaging strategy. For effective health communication, it is imperative to focus on preference-based framing where the preferences of the target sub-community are taken into consideration. To achieve that, it is important to understand and hence characterize the target sub-communities in terms of their social interactions. In the context of health-related misinformation, vaccination remains the most prevalent topic of discord. Hence, in this paper, we conduct a sociolinguistic analysis of the two competing vaccination communities on Twitter: pro-vaxxers or individuals who believe in the effectiveness of vaccinations, and anti-vaxxers or individuals who are opposed to vaccinations.


The phonetic bases of vocal expressed emotion: natural versus acted

Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh
Annual Conference of the International Speech Communication Association (INTERSPEECH 2020) 
Paper

Can vocal emotions be emulated? This question has been a recurrent concern of the speech community, and has also been vigorously investigated. It has been fueled further by its link to the issue of validity of acted emotion databases. Much of the speech and vocal emotion research has relied on acted emotion databases as valid proxies for studying natural emotions. To create models that generalize to natural settings, it is crucial to work with valid prototypes -- ones that can be assumed to reliably represent natural emotions. More concretely, it is important to study emulated emotions against natural emotions in terms of their physiological, and psychological concomitants. In this paper, we present an on-scale systematic study of the differences between natural and acted vocal emotions.


Lifestyle disease surveillance using population search behavior: feasibility study

Shahan Ali Memon, Saquib Razak, Ingmar Weber
Journal of Medical Internet Research (2020; IF:7.08) 
Accepted for presentation at the Population Association of America (PAA 2021)
Accepted for presentation at CMU Qatar Meeting of the Minds (MoM 2017)
Paper Slides (PAA) Poster (MoM) Code Recording (PAA)
Slides for Google Trends Denormalization

As the process of producing official health statistics for lifestyle diseases is slow, researchers have explored using Web search data as a proxy for lifestyle disease surveillance. Existing studies, however, are prone to at least one of the following issues: ad-hoc keyword selection, overfitting, insufficient predictive evaluation, lack of generalization, and failure to compare against trivial baselines. The aims of this study were to (1) employ a corrective approach improving previous methods; (2) study the key limitations in using Google Trends for lifestyle disease surveillance; and (3) test the generalizability of our methodology to other countries beyond the United States.


Neural regression trees

Shahan Ali Memon*, Wenbo Zhao*, Bhiksha Raj, Rita Singh
International Joint Conference on Neural Networks (IJCNN 2019) 
Paper Slides

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one. Current approaches for RvC use ad-hoc discretization strategies and are suboptimal. We propose a neural regression tree model for RvC. In this model, we employ a joint optimization framework where we learn optimal discretization thresholds while simultaneously optimizing the features for each node in the tree.


Detecting gender differences in perception of emotion in crowdsourced data

Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh
arXiv 
Paper Slides Recording

Do men and women perceive emotions differently? Popular convictions place women as more emotionally perceptive than men. Empirical findings, however, remain inconclusive. Most prior studies focus on visual modalities. In addition, almost all of the studies are limited to experiments within controlled environments. The generalizability and scalability of these studies have not been sufficiently established. In this paper, we study the differences in perception of emotion between genders from speech data in the wild, annotated through crowdsourcing. While we limit ourselves to a single modality (i.e. speech), our framework is applicable to studies of emotion perception from all such loosely annotated data in general. Our paper addresses multiple serious challenges related to making statistically viable conclusions from crowdsourced data. Overall, the contributions of this paper are twofold: a reliable novel framework for perceptual studies from crowdsourced data; and the demonstration of statistically significant differences in speech-based emotion perception between genders.


Public perception of a country: exploring tweets about Qatar

Shahan Ali Memon, Rohith Krishnan Pillai, Susan Dun, Yelena Mejova, Ingmar Weber
International ACM Conference on Web Science (WebSci 2017) 
Accepted for presentation at 2016 Qatar Foundation Annual Research Conference (QFARC)
Accepted for presentation at CMU Qatar Meeting of the Minds (MoM 2017)
Paper Poster Abstract

Is it possible to "hack" an image of an international entity by driving international and domestic media? Here, we present an image/brand monitoring tool for a country, Qatar, which presents an overview of the contexts and references to media in which it is mentioned on social media. Tracking dozens of languages, this tool allows a global understanding of the perceptions and concerns Twitter users associate with Qatar, and which mainstream media may be driving these sentiments.


News

  • [Sep 30 2023] Transitioned from my role as a research associate at NYU Abu Dhabi to pursue a Ph.D. in Information Science
  • [Jul 18-20 2023] Attended and presented our work on "U.S. and China produce more impactful AI research when collaborating together" as a poster at IC2S2 2023 in Copenhagen.
  • [Jun 28 2023] Our paper on "Characterizing the effect of retractions on scientific careers" won "Best Student Paper Award" at ICSSI 2023.
  • [Jun 26-28 2023] Attended and presented our work on "Characterizing the effect of retractions on scientific careers" at the Networks Workshop at ICSSI 2023 at Northwestern University in Evanston.
  • [May 18-19 2023] Attended and presented our work on "Exploring the impact of retractions on academic reputation" at the Networks Workshop at NYU main campus.
  • [Jul 19-22 2022] Attended and presented our work on scientific retractions at IC2S2 2022 in Chicago
  • [May 18 2022] Attended and presented our work on scientific retractions at the Networks Workshop at NYU Abu Dhabi
  • [Apr 25 2022] Our submission on "Characterizing the effect of scientific retractions on collaboration networks" got accepted at IC2S2 2022.
  • [May 6 2021] Attended and presented at PAA 2021 (Remote)
  • [Feb 2 2021] Our JMIR paper on lifestyle disease surveillance got accepted for presentation at the Population Association of America (PAA) 2021.
  • [Oct 1 2020] Joined NYU Abu Dhabi as a Research Associate
  • [Aug 6 2020] Defended my Master's Thesis on Characterizing Misinformed Online Health Communities.
  • [May 17 2020] Graduated from CMU LTI with MSc. in Language Technologies
  • [Oct 21 2019] Presented our work on Speech Emotion Recognition from Voice in the Wild at SEI Research Review 2019
  • [Aug 26 2019] Started master's in language technologies at CMU LTI
  • [Apr 9 2019] Won the Center for Machine Learning and Health (CMLH) Fellowship in Digital Health
  • [Jul 29 2017] Joined CMU LTI as a Research Scholar
  • [May 1 2017] Graduated from CMUQ with BSc. in Computer Science
  • [Aug 15 2013] Arrived @CMUQ to study Computer Science

Teaching

  • 2024 Autumn, Teaching Assistant and Guest Lecturer, Calling Bullshit: Data Reasoning In A Digital World with Professor Jevin D. West and Professor Carl Bergstrom
  • 2024 Spring, Teaching Assistant, Information Management Capstone with Professor Richard Sturman
  • 2024 Winter, Teaching Assistant, Information Management Capstone with Professor Richard Sturman
  • 2021 Spring, Course Design Assistant, Applied Data Science for Social Scientists (in Python) with Professor Bedoor AlShebli
  • 2021 Spring, Teaching and Course Design Assistant, Computational Forensics & AI with Professor Rita Singh
  • 2020 Spring, Teaching and Course Design Assistant, Computational Forensics & AI with Professor Rita Singh
  • 2015 Spring, Teaching Assistant, Interpretation & Argument with Professor Silvia Pessoa
  • 2014 Spring, Programming Peer Tutor, Academic Resource Center (ARC), CMU Qatar
  • 2014 Fall, English Language Instructor, Language Bridges Program, CMU Qatar
  • 2009 Summer, Instructor, Taleem-e-Balighan (lit: Education for Adults) program, Ladies Club School Hyderabad

Awards

  • 2024 CIP Innovation Fund $5k
  • 2024 eScience Azure Credits $8k
  • 2023 iSchool Student of the Month Award
  • 2023 CIP Graduate Fellowship
  • 2023 Best Paper Award at ICSSI
  • 2023 ICSSI Travel Grant
  • 2020 SBP-BRiMS Graduate Student Scholarship
  • 2019 Center for Machine Learning and Health Fellowship Winner
  • 2018 Finalist for Best Overall at HackPrinceton
  • 2018 Finalist for Best Design at HackPrinceton
  • 2017 College Honors for Undergraduate Research Thesis
  • 2017 Outstanding Service to the Computer Science Community
  • 2017 Audience Choice Award at NYUAD Hackathon for Social Good
  • 2017 Senior Student Leadership Awards
  • 2015 IMPAQT Cultural Ambassador
  • 2013 Dean's List at National University of Computer & Emerging Sciences
  • 2012 Second Position at Speed Programming Competition
  • 2012 Dean's List at National University of Computer & Emerging Sciences
  • 2009 Sixth Position among 30k+ students in District Hyderabad in 10th Grade

Service

Journal Reviewer: eLife, 2023
Journal Reviewer: SAGE Communication & Sport, 2023
Journal Reviewer: Elsevier's Information Processing and Management Journal, 2021
Journal Reviewer: Journal of Medical Internet Research, 2020
Conference Reviewer: IEEE International Conference on Machine Learning and Applications, 2020
Moderator: SBP-BRiMS, 2020
Peer Health Advocate: Mental Health Advocate, Student Health Services, Carnegie Mellon University, 2016-2017
Co-founder/Co-designer: CMU Qatar Mindfulness Room, 2016-2017
Board Member: Academic Review Board and University Disciplinary Committee (ARB-UDC), Carnegie Mellon University, 2015-2017


Press