OpenAnalytics

156 posts

@OpenAnalytics

Data Analysis End to End. Powered by Open Source Technology.

Belgium · Joined October 2008
200 Following · 705 Followers
OpenAnalytics @OpenAnalytics
RDepot is enterprise-grade #opensource software to manage #rstats and #Python repositories. In v2.5.1, packages built for a specific Linux distribution, architecture, and R series can now be stored for blazingly fast installation. More goodies at oa.eu/x1xJTo
OpenAnalytics @OpenAnalytics
Crane is a new #opensource product to host #datascience artifacts: reports, documentation sites, or packages and libraries. It is part of our FOSS suite for building data science platforms and plays well with ShinyProxy and RDepot. Read more at oa.eu/aPxUcm #rstats #python
OpenAnalytics @OpenAnalytics
@jhngrant @swardley The 'tiny subset' I mentioned does not refer to the harmful content (to be addressed through a data commons with contributor agreements), but rather to a subset of the data types that are used in ML. If you predict response to drugs based on omics, you use omics data (not a corpus).
John Grant @jhngrant
@OpenAnalytics @swardley Dismissing harmful content as a 'tiny subset' ignores perverse incentives. Open developers may face greater legal risks for transparency, while those who hide data avoid scrutiny. This undermines responsible openness in AI development.
Simon Wardley @swardley
Just reading the @OpenSourceOrg definition of Open Source AI - opensource.org/deepdive/draft… It hints at how the symbolic instructions are not just code but include the training data; however, it is not explicit enough on this, and the opt-outs are concerning ... 1/2
OpenAnalytics @OpenAnalytics
@swardley @jhngrant Indeed. Collective and intentional management as a shared resource is possible for this type of data: no need to scrape the ocean floor to catch fish (and throw all else away). Also, this type of data is a tiny subset of all data types one can train ML models on; AI > ML > LLMs.
OpenAnalytics @OpenAnalytics
@jhngrant @swardley Many thanks for the interesting reference. Illegal content should not be in anyone's possession and therefore should not end up in a training dataset. This seems a problem to be solved upstream of Open Source AI. It does mean you cannot take shortcuts by using unguided crawling.
John Grant @jhngrant
@OpenAnalytics @swardley By "problematic content" I'm referring to illegal & harmful material like CSAM. It has been found in some training sets. Who governs filtering, who is liable, how we balance transparency of open source training data with ethical and legal considerations? purl.stanford.edu/kh752sm9123
OpenAnalytics @OpenAnalytics
@jhngrant @swardley What do you mean by problematic content in training data? Can you give a concrete example?
John Grant @jhngrant
@OpenAnalytics @swardley What's your position on the tension between complete transparency and filtering problematic content in training data? Who governs this filtering process - individual orgs, industry groups, government regulators, or independent auditors?
OpenAnalytics @OpenAnalytics
@jhngrant @swardley We mentioned being practitioners of open source data science (navigating the challenges of both open source and ML) since you mentioned being an open source advocate ('trust me, I am an open source expert'), but we are happy to go over these challenges one by one.
John Grant @jhngrant
@OpenAnalytics @swardley "Trust us, we're experts" doesn't engage with the practical and legal concerns I've consistently raised. The open source model you advocate has worked well for software, but recent developments in ML introduce new real challenges that can't be ignored.
OpenAnalytics @OpenAnalytics
@jhngrant @swardley We are a #DataScience consultancy specialized in #OpenSource data science and have been knee-deep in the practical realities of #AI development for 15+ years. Based on that experience, we do not see any valid arguments to dilute the meaning of open source in an AI context.
John Grant @jhngrant
@OpenAnalytics @swardley There's a tension between the ideals of open source and the practical realities of AI development. I'm an open source advocate, but in the context of ML, we need to find a middle ground that promotes openness while addressing legitimate concerns around data access.
OpenAnalytics @OpenAnalytics
@jhngrant @swardley There is nothing ideological about it. There is a clear definition of open source that refers to 4 freedoms. If you apply these 4 freedoms to AI, then you need the training data to be able to meaningfully understand and modify (2 of the 4 freedoms) the resulting models.
John Grant @jhngrant
@OpenAnalytics @swardley Demanding that all training data be public to qualify as 'open source' is an ideological stance, not a scientific or rational one. It ignores real-world constraints. Openness in AI exists on a spectrum; it will never be a rigid requirement.
OpenAnalytics @OpenAnalytics
@jhngrant @swardley We agree, of course, but not having access to the training data severely reduces the freedom to understand and to modify. You cannot reasonably call something 'open source AI' if it does not respect the basic four freedoms that define the idea of open source.
John Grant @jhngrant
@OpenAnalytics @swardley Arguing that allowing descriptions of training data will inevitably lead to developers only releasing descriptions of their code is a slippery slope. AI development will require principles that balance transparency with legal and ethical constraints, not extreme hypotheticals.
OpenAnalytics @OpenAnalytics
@jhngrant @swardley Start from the four freedoms (among others, the freedom to study and to modify) and you can only end up needing the training data. Otherwise both of these (study and modify) are severely limited. 1/2
John Grant @jhngrant
@swardley Open source principles aim for transparency and collaboration. For AI, this goal likely requires new approaches beyond simply publishing all training data. 2/2
OpenAnalytics @OpenAnalytics
A selection of our athletes running for #UNICHIR. BENISUR - UNICHIR provides high-quality surgical and obstetric care to almost two million people in Beni, Eastern #Congo. If you want to learn more or make a donation: oa.eu/hOgm1U
OpenAnalytics @OpenAnalytics
The expiry dates on pharmaceutical products are the result of extensive research. We just published a Bayesian methodology to predict shelf life in a more robust and interpretable way. Here's the paper: oa.eu/soBMjH #Bayesian #Statistics #Pharmaceutical