OpenAnalytics

156 posts

@OpenAnalytics

Data Analysis End to End. Powered by Open Source Technology.

Belgium · Joined October 2008

200 Following · 705 Followers

OpenAnalytics@OpenAnalytics·4 Nov

ShinyProxy is the solution to deploy data science apps (#Shiny, #Dash, #Streamlit, #Jupyter and more). v3.2.1 sees improvements to the ECS backend. Also, it contains multiple fixes for XSS vulnerabilities. Check out the release notes at oa.eu/laE4lS #opensource

OpenAnalytics@OpenAnalytics·25 Feb

#opensource #rstats pipeline and #shiny app to build high-quality molecular databases and lay the data foundation for #ai models in medicinal chemistry. 🔗 Explore the article here: oa.eu/mk67B7 💻 Try out the prototype of the customized app: oa.eu/9xzXlS

OpenAnalytics@OpenAnalytics·19 Feb

RDepot is enterprise-grade #opensource software to manage #rstats and #Python repositories. In v2.5.1 packages built for a specific Linux distribution, architecture and R series can now be stored for blazingly fast installation. More goodies at oa.eu/x1xJTo

OpenAnalytics@OpenAnalytics·14 Feb

Today is #ilovefs Day. All our services and free software products stand on the shoulders of giants. Thank you #rstats, #Python and #Julia. Thank you #Rust, #Kotlin, #C++ and #Java. Thank you #Kubernetes, #Singularity and #Docker. Learn more: oa.eu/v78aFW

OpenAnalytics@OpenAnalytics·4 Feb

Crane is a new #opensource product to host #datascience artifacts: reports, documentation sites, or packages and libraries. Part of our FOSS suite to build data science platforms and playing well with ShinyProxy and RDepot. Read more at oa.eu/aPxUcm #rstats #python

OpenAnalytics@OpenAnalytics·23 Oct

@jhngrant @swardley The 'tiny subset' I mention does not refer to the harmful content (to be addressed through a data commons with contributor agreements), but rather to a subset of the data types that are used in ML. If you predict response to drugs based on omics, you use omics data (not a corpus)

John Grant@jhngrant·23 Oct

@OpenAnalytics @swardley Dismissing harmful content as a 'tiny subset' ignores perverse incentives. Open developers may face greater legal risks for transparency, while those who hide data avoid scrutiny. This undermines responsible openness in AI development.

Simon Wardley@swardley·17 Oct

Just reading the @OpenSourceOrg definition of Open Source AI - opensource.org/deepdive/draft… It hints at how the symbolic instructions are not just code but include the training data however it is not explicit enough on this and the opt-outs are concerning ... 1/2

OpenAnalytics@OpenAnalytics·23 Oct

@swardley @jhngrant Indeed. Collective and intentional management as a shared resource is possible for this type of data: no need to scrape the ocean floor to catch fish (and throw all else away). Also, this type of data is a tiny subset of all data types one can train ML models on; AI > ML > LLMs.

Simon Wardley@swardley·23 Oct

@jhngrant @OpenAnalytics As I'm sure @OpenAnalytics will explain to you, this is why we have contributor agreements in open source. The same should apply to all symbolic instructions.

OpenAnalytics@OpenAnalytics·23 Oct

@jhngrant @swardley Many thanks for the interesting reference. Illegal content should not be in anyone's possession and therefore should not end up in a training dataset. This seems a problem to be solved upstream of Open Source AI. It does mean you cannot take shortcuts by using unguided crawling.

John Grant@jhngrant·22 Oct

@OpenAnalytics @swardley By "problematic content" I'm referring to illegal & harmful material like CSAM. It has been found in some training sets. Who governs filtering, who is liable, how we balance transparency of open source training data with ethical and legal considerations? purl.stanford.edu/kh752sm9123

OpenAnalytics@OpenAnalytics·22 Oct

@jhngrant @swardley What do you mean with problematic content in training data? Can you give a concrete example?

John Grant@jhngrant·22 Oct

@OpenAnalytics @swardley What's your position on the tension between complete transparency and filtering problematic content in training data? Who governs this filtering process - individual orgs, industry groups, government regulators, or independent auditors?

OpenAnalytics@OpenAnalytics·22 Oct

@jhngrant @swardley We mentioned being practitioners of open source data science (navigating the challenges of both open source and ML) since you mentioned being an open source advocate ('trust me, I am an open source expert'), but happy to go over these challenges 1 by 1.

John Grant@jhngrant·22 Oct

@OpenAnalytics @swardley "Trust us, we're experts" doesn't engage with the practical and legal concerns I've consistently raised. The open source model you advocate has worked well for software, but recent developments in ML introduce new real challenges that can't be ignored.

OpenAnalytics@OpenAnalytics·22 Oct

@jhngrant @swardley We are a #DataScience consultancy specialized in #OpenSource data science and are knee-deep into the practical realities of #AI development (since 15+ years). Based on that experience, we do not see any valid arguments to dilute the meaning of open source in an AI context.

John Grant@jhngrant·22 Oct

@OpenAnalytics @swardley There's a tension between the ideals of open source and the practical realities of AI development. I'm an open source advocate, but in the context of ML, we need to find a middle ground that promotes openness while addressing legitimate concerns around data access.

OpenAnalytics@OpenAnalytics·22 Oct

@jhngrant @swardley There is nothing ideological about it. There is a clear definition of open source that refers to 4 freedoms. If you apply these 4 freedoms to AI, then you need the training data to be able to meaningfully understand and modify (2 of the 4 freedoms) the resulting models.

John Grant@jhngrant·22 Oct

@OpenAnalytics @swardley Demanding that all training data be public to qualify as 'open source' is an ideological stance, not a scientific or rational one. It ignores real-world constraints. Openness in AI exists on a spectrum, it will never be a rigid requirement.

OpenAnalytics@OpenAnalytics·22 Oct

@jhngrant @swardley We agree, of course, but not having access to the training data severely reduces the freedom to understand and to modify. You can not reasonably call something 'open source AI' if it does not respect the basic four freedoms that define the idea of open source.

John Grant@jhngrant·22 Oct

@OpenAnalytics @swardley Arguing that allowing descriptions of training data will inevitably lead to developers only releasing descriptions of their code is a slippery slope. AI development will require principles that balance transparency with legal and ethical constraints, not extreme hypotheticals.

OpenAnalytics@OpenAnalytics·22 Oct

@jhngrant @swardley Start from the four freedoms (a.o. freedom to study and to modify) and you can only end up with needing the training data. Otherwise both of these (modify and study) are severely limited. 1/2

John Grant@jhngrant·18 Oct

@swardley Open source principles aim for transparency and collaboration. For AI, this goal likely requires new approaches beyond simply publishing all training data. 2/2

OpenAnalytics retweeted

Thiago Britto-Borges@tbrittoborges·7 May

Shinyproxy 3.1.0 release: github.com/openanalytics/… many new features, my highlights are container pre-initialization and support for IPC. Container sharing may also be interesting, but I need to test it. @OpenAnalytics

OpenAnalytics@OpenAnalytics·7 May

ShinyProxy 3.1.0. Instant access to your apps through container pre-initialization. Container sharing for massive deployments. #AWS ECS backend and more. Blog post at oa.eu/Uod85E and release notes at oa.eu/9OO2AV. #opensource #datascience #python #rstats

OpenAnalytics@OpenAnalytics·30 Nis

A selection of our athletes running for #UNICHIR. BENISUR - UNICHIR provides high-quality surgical and obstetric care to almost two million people in Beni, Eastern #Congo. If you want to learn more or make a donation: oa.eu/hOgm1U

OpenAnalytics@OpenAnalytics·20 Mar

We hosted the Belgian #punkrock band PARKS for the shooting of the music video for their song Big Frog. oa.eu/ap2g2A Happy #WorldFrogDay !

OpenAnalytics@OpenAnalytics·5 Mar

The expiry dates on pharmaceutical products are the result of extensive research. We just published a Bayesian methodology to predict shelf life in a more robust and interpretable way. Here's the paper: oa.eu/soBMjH #Bayesian #Statistics #Pharmaceutical
