Environment and packages
In [39]:
# activate the project environment
using Pkg
Pkg.activate("env_julia_cairomakie")
Activating project at `c:\Users\ricco\Desktop\demo\env_julia_cairomakie`
In [40]:
# list the packages installed in this environment
Pkg.status()
Status `C:\Users\ricco\Desktop\demo\env_julia_cairomakie\Project.toml`
  [13f3f980] CairoMakie v0.15.11
  [a93c6f00] DataFrames v1.8.2
  [38e38edf] GLM v1.9.5
  [7073ff75] IJulia v1.34.4
  [fdbf4ff8] XLSX v0.11.10
Data import and inspection
In [41]:
# packages
import DataFrames as DFR
import XLSX
# read the Excel file into a DataFrame
df = DFR.DataFrame(XLSX.readtable("./data_crm.xlsx"))
# summary statistics for each column
DFR.describe(df)
13×7 DataFrame
| Row | variable | mean | min | median | max | nmissing | eltype |
|---|---|---|---|---|---|---|---|
| | Symbol | Union… | Any | Union… | Any | Int64 | DataType |
| 1 | Age | 41.56 | 21 | 42.0 | 70 | 0 | Int64 |
| 2 | Revenu | 46686.4 | 18000 | 46385.5 | 79318 | 0 | Int64 |
| 3 | ScoreCredit | 549.882 | 300 | 546.5 | 850 | 0 | Int64 |
| 4 | Anciennete | 7.178 | 0 | 5.0 | 19 | 0 | Int64 |
| 5 | Montant | 10357.1 | 3000 | 10369.0 | 24878 | 0 | Int64 |
| 6 | Endettement | 22.2222 | 5.0 | 21.55 | 66.1 | 0 | Float64 |
| 7 | SitFamiliale | | Celibataire | | Marie | 0 | String |
| 8 | TypeContTravail | | CDD | | Interim | 0 | String |
| 9 | SecteurTravail | | Commerce | | Tech | 0 | String |
| 10 | HistoIncidents | 0.408 | 0 | 0.0 | 3 | 0 | Int64 |
| 11 | SituImmobilier | | Non | | Oui | 0 | String |
| 12 | CanalAcquis | | Agence | | Web | 0 | String |
| 13 | CreditOK | | Non | | Oui | 0 | String |
In [42]:
# number of observations
n = DFR.nrow(df)
println(n)
500
Charts - One variable
In [43]:
# import the plotting library under an alias
import CairoMakie as CM
Distribution - Quantitative variable
In [44]:
# histogram
CM.hist(df.Revenu)
In [45]:
# finer control over the axes:
# create a Figure() and an Axis explicitly
fig = CM.Figure()
ax = CM.Axis(fig[1,1],
xticklabelsize = 10,
xtickformat = "{:.0f}")
CM.hist!(ax,df.Revenu) # the ! suffix adds the plot to an existing axis
fig
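What a histogram displays can be sketched in a few lines of Base Julia: split the range of the data into equal-width bins and count the observations that fall in each one. This is an illustrative sketch, not Makie's implementation; the helper `bin_counts` and the toy data are hypothetical, and Makie's `hist` picks its own bins (adjustable through the `bins` keyword).

```julia
# Hypothetical helper: count observations in `nbins` equal-width bins over [min, max].
function bin_counts(x::AbstractVector{<:Real}, nbins::Integer)
    lo, hi = extrema(x)
    width = (hi - lo) / nbins
    counts = zeros(Int, nbins)
    for xi in x
        # bin index of xi; the maximum value is placed in the last bin
        k = width == 0 ? 1 : min(nbins, floor(Int, (xi - lo) / width) + 1)
        counts[k] += 1
    end
    edges = [lo + i * width for i in 0:nbins]   # nbins + 1 bin boundaries
    return edges, counts
end

# toy data, for illustration only
edges, counts = bin_counts([1.0, 2.0, 2.5, 3.0, 4.0], 3)
println(edges)    # [1.0, 2.0, 3.0, 4.0]
println(counts)   # [1, 2, 2]
```

Bins are half-open on the right except the last one, so the maximum is always counted.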
In [46]:
# two histograms side by side
# the figure layout is split into two columns
fig = CM.Figure()
ax_1 = CM.Axis(fig[1,1], xticklabelsize = 10, xtickformat = "{:.0f}", title = "Revenu")
ax_2 = CM.Axis(fig[1,2], xticklabelsize = 10, xtickformat = "{:.2f}", title = "Endettement")
CM.hist!(ax_1,df.Revenu)
CM.hist!(ax_2,df.Endettement, color = :green)
fig
In [47]:
# density plot
CM.density(df.Revenu)
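`density` draws a kernel density estimate. The sketch below evaluates a Gaussian kernel estimate directly, using the normal-reference bandwidth h = 1.06·σ·n^(-1/5). The bandwidth rule and the names `normal_ref_bw` and `gaussian_kde` are assumptions made for illustration; Makie may choose its default bandwidth differently.

```julia
using Statistics

# Normal-reference rule of thumb for the bandwidth (an assumption, see text).
normal_ref_bw(x) = 1.06 * std(x) * length(x)^(-1 / 5)

# Gaussian kernel density estimate of sample `x`, evaluated at point `t`.
function gaussian_kde(x::AbstractVector{<:Real}, t::Real; h::Real = normal_ref_bw(x))
    s = sum(exp(-0.5 * ((t - xi) / h)^2) for xi in x)
    return s / (length(x) * h * sqrt(2π))
end

# toy sample, for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0]
grid = -10:0.01:14
f = [gaussian_kde(x, t) for t in grid]
# a density integrates to 1: a Riemann sum over a wide grid should be close to 1
println(sum(f) * step(grid))
```

A larger `h` smooths the curve more; a smaller one follows individual observations more closely.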
In [48]:
# boxplot
# boxplot always expects a category (x) argument
# workaround: pass a constant vector created with fill()
fig = CM.Figure()
ax = CM.Axis(fig[1,1], yticklabelsize = 10, ytickformat = "{:.0f}")
CM.boxplot!(ax,fill(1,n),df.Revenu)
fig
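A boxplot summarises a variable by its quartiles. The whiskers and outliers are set by a multiple of the interquartile range (IQR), and in Makie this multiplier appears to be the `range` attribute. The sketch below reproduces the classic Tukey rule with a multiplier of 1.5 (an assumption) and uses the default quantile definition of `Statistics.quantile`, which may differ slightly from Makie's. `box_stats` is a hypothetical helper.

```julia
using Statistics

# Hypothetical helper: Tukey-style boxplot statistics.
# Whiskers extend to the most extreme points within k × IQR of the box;
# points beyond those fences are reported as outliers.
function box_stats(x::AbstractVector{<:Real}; k::Real = 1.5)
    q1, med, q3 = quantile(x, [0.25, 0.5, 0.75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = filter(v -> lo_fence <= v <= hi_fence, x)
    outliers = filter(v -> v < lo_fence || v > hi_fence, x)
    return (q1 = q1, median = med, q3 = q3,
            whisker_low = minimum(inside), whisker_high = maximum(inside),
            outliers = outliers)
end

# toy data with one obvious outlier, for illustration only
s = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
println(s)
```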
In [49]:
# violin
CM.violin(fill(1,n),df.Revenu)
Line
In [50]:
import Statistics
# line: sorted credit scores against their rank
fig = CM.Figure()
ax = CM.Axis(fig[1,1])
CM.lines!(ax,1:n,sort(df.ScoreCredit))
# horizontal reference line at the median
CM.hlines!(ax,[Statistics.median(df.ScoreCredit)],color=:green)
fig
Distribution - Qualitative variable
In [51]:
# barplot
# contract type: count the observations in each category
v = DFR.combine(DFR.groupby(df,:TypeContTravail),DFR.nrow => :effectif)
v
3×2 DataFrame
| Row | TypeContTravail | effectif |
|---|---|---|
| | String | Int64 |
| 1 | CDI | 308 |
| 2 | Interim | 62 |
| 3 | CDD | 130 |
In [52]:
# plotting string categories directly raises an error
CM.barplot(v.TypeContTravail,v.effectif)
ArgumentError:
Conversion failed for Makie.BarPlot (With conversion trait Makie.PointBased()) with args:
Tuple{Vector{String}, Vector{Int64}}
Got converted to: Tuple{Vector{String}, Vector{Int64}}
Makie.BarPlot requires to convert to argument types Tuple{AbstractVector{<:Union{GeometryBasics.Point2, GeometryBasics.Point3}}}, which convert_arguments didn't succeed in.
To fix this overload convert_arguments(P, args...) for Makie.BarPlot or Makie.PointBased() and return an object of type Tuple{AbstractVector{<:Union{GeometryBasics.Point2, GeometryBasics.Point3}}}.`
In [53]:
# the fix: use integer positions, one per category,
# and pass the category names as tick labels
fig = CM.Figure()
ax = CM.Axis(fig[1,1],xticks=(1:DFR.nrow(v),v.TypeContTravail))
CM.barplot!(ax,1:DFR.nrow(v),v.effectif)
fig
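The workaround comes down to mapping each category to an integer position and passing a `(positions, labels)` tuple as `xticks`. The Base Julia sketch below builds these pieces from the counts in the table above. It also sorts the bars by decreasing frequency, a common presentation choice. The variable names are hypothetical.

```julia
# counts per contract type, copied from the table above
categories = ["CDI", "Interim", "CDD"]
counts = [308, 62, 130]

# one integer position per bar, and the (values, labels) tuple used for xticks
positions = collect(1:length(categories))
ticks = (positions, categories)

# optional: order the bars by decreasing count
order = sortperm(counts; rev = true)
sorted_ticks = (positions, categories[order])
sorted_counts = counts[order]
println(sorted_ticks)
println(sorted_counts)
```

In the notebook, these would be used as `xticks = sorted_ticks` in `CM.Axis` and `CM.barplot!(ax, positions, sorted_counts)`.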
In [54]:
# pie chart
# getting the category labels to appear is laborious: the legend entries are built by hand
couleurs = [:red,:green,:blue]
fig, ax, plt = CM.pie(v.effectif,
color=couleurs,
label=[v.TypeContTravail[i] =>
(; color = c) for (i,c) in enumerate(couleurs)])
leg = CM.Legend(fig[1,2],ax)
fig
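A pie chart encodes each category's share of the total as an angle. The short sketch below computes the shares, the slice angles and percentage labels from the counts above. The label format is an arbitrary choice, and strings like these could serve as legend entries.

```julia
categories = ["CDI", "Interim", "CDD"]
counts = [308, 62, 130]                 # from the table above

shares = counts ./ sum(counts)          # proportions, summing to 1
angles = 2π .* shares                   # slice angles in radians, summing to 2π
labels = ["$(c) ($(round(100 * s; digits = 1))%)" for (c, s) in zip(categories, shares)]
println(labels)
```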
Charts - Two or more variables
Scatter plot and regression
In [55]:
# scatter plot
CM.scatter(df.Revenu, df.ScoreCredit)
In [56]:
# simple linear regression
import GLM
reg = GLM.lm(@GLM.formula(ScoreCredit ~ Revenu),df)
reg
StatsModels.TableRegressionModel{GLM.LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
ScoreCredit ~ 1 + Revenu
Coefficients:
──────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────────────────
(Intercept) 163.715 8.61687 19.00 <1e-60 146.785 180.645
Revenu 0.00827151 0.000178807 46.26 <1e-99 0.0079202 0.00862282
──────────────────────────────────────────────────────────────────────────────────
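The coefficients above are ordinary least squares estimates. As a cross-check, they can be obtained by solving the least-squares problem for the design matrix [1 x] with the backslash operator. The GLM output indicates a Cholesky-based solver (`DensePredChol`), whereas `\` uses a QR factorisation for a non-square matrix; both should give the same solution for well-conditioned data. With a single regressor, the slope also equals cov(x, y) / var(x). The helper `ols` and the toy data are hypothetical.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper: OLS fit of y = b0 + b1 * x, returns [b0, b1].
function ols(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    X = hcat(ones(length(x)), x)   # design matrix: intercept column + regressor
    return X \ y                   # least-squares solution (QR for non-square X)
end

# toy data lying exactly on y = 2 + 3x, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = 2.0 .+ 3.0 .* x
b = ols(x, y)
slope = cov(x, y) / var(x)         # closed-form slope for a single regressor
intercept = mean(y) - slope * mean(x)
println(b, " ", (intercept, slope))
```

Applied to the notebook data, `ols(df.Revenu, df.ScoreCredit)` should match the `Coef.` column up to floating-point rounding.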
In [57]:
# extract the coefficients
coef = GLM.coef(reg)
# scatter plot with the regression line
# coef[1] -> intercept
# coef[2] -> slope
fig = CM.Figure()
ax = CM.Axis(fig[1,1])
CM.scatter!(ax,df.Revenu, df.ScoreCredit,markersize=5,color=:gray)
CM.ablines!(coef[1],coef[2],color=:green,linewidth=3)
fig
In [58]:
# regression residuals (observed minus fitted values)
residus = df.ScoreCredit .- (coef[2] .* df.Revenu .+ coef[1])
residus
500-element Vector{Float64}:
-26.06676234854865
17.958659019122706
82.82511678547291
-35.087736739002594
-106.78339641893342
-37.838393356479514
18.62830610364324
-16.627174221447945
8.654318699053533
-36.916080663215325
⋮
49.09502117179716
-47.455261789385645
87.81615472120325
-30.63603175714354
-14.234405644479466
-90.10887856518855
-48.28684422195295
-75.33971699678727
-9.9718518722633
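Two properties of OLS residuals can be checked numerically. With an intercept, they sum to zero, and they are uncorrelated with the regressor and with the fitted values. They are, however, positively correlated with the observed response, which is why residual plots are usually drawn against fitted values. A minimal sketch on hypothetical toy data; with GLM, `GLM.residuals(reg)` and `GLM.predict(reg)` should return the residuals and fitted values directly.

```julia
using LinearAlgebra, Statistics

# toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

X = hcat(ones(length(x)), x)
b = X \ y                 # [intercept, slope]
fitted = X * b
resid = y .- fitted       # observed minus fitted

println(sum(resid))           # ≈ 0 (intercept in the model)
println(cor(resid, fitted))   # ≈ 0 (orthogonal to the fitted values)
println(cor(resid, y))        # > 0: residuals depend on the observed response
```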
In [59]:
# residual plot (residuals against the observed ScoreCredit)
fig = CM.Figure()
ax = CM.Axis(fig[1,1])
CM.scatter!(ax,df.ScoreCredit,residus,markersize=5,color=:gray)
CM.hlines!(ax,[0],linewidth=2,color=:blue)
fig
Other scatter plots
In [60]:
# scatter plot with one colour per CreditOK group
les_oui = (df.CreditOK .== "Oui")
les_non = (df.CreditOK .== "Non")
fig = CM.Figure()
ax = CM.Axis(fig[1,1])
CM.scatter!(ax,df.Revenu[les_oui],df.ScoreCredit[les_oui],color=:blue)
CM.scatter!(ax,df.Revenu[les_non],df.ScoreCredit[les_non],color=:red)
fig
In [61]:
# scatter plot with point size driven by Montant
# sizes are first rescaled to the [0, 1] interval
montant = df.Montant
taille = (montant .- minimum(montant)) ./ (maximum(montant) - minimum(montant))
# plot
CM.scatter(df.Revenu,df.ScoreCredit,markersize=taille*15)
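One side effect of the [0, 1] rescaling above is that the smallest Montant gets a marker size of 0 and disappears from the plot. Below is a minimal sketch of a safer mapping. It assumes minimum and maximum marker sizes of 2 and 15 (arbitrary choices) and guards against a constant vector. `rescale01` and `marker_sizes` are hypothetical names.

```julia
# Hypothetical helper: min-max rescaling to [0, 1].
function rescale01(x::AbstractVector{<:Real})
    lo, hi = extrema(x)
    hi == lo && return fill(0.5, length(x))   # constant input: avoid division by zero
    return (x .- lo) ./ (hi - lo)
end

# Hypothetical helper: map values to marker sizes in [smin, smax],
# so the smallest value still gets a visible marker.
marker_sizes(x; smin = 2.0, smax = 15.0) = smin .+ rescale01(x) .* (smax - smin)

println(marker_sizes([1, 2, 3]))   # [2.0, 8.5, 15.0]
```

In the notebook, this would be used as `markersize = marker_sizes(df.Montant)`.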
Conditional distribution
In [62]:
# conditional boxplot: passing the string categories directly does not work
CM.boxplot(df.CreditOK, df.Revenu)
Failed to resolve arg1:
[ComputeEdge] arg1 = compute_identity((outlier_points, ), changed, cached)
@ C:\Users\ricco\.julia\packages\ComputePipeline\E2l50\src\ComputePipeline.jl:743
[ComputeEdge] outlier_points, outlier_indices, q1s, q5s = (::MapFunctionWrapper(#1372))((y, groups, quantiles, range, show_outliers, orientation, ), changed, cached)
@ unknown method location
[ComputeEdge] x, y = (::MapFunctionWrapper(#_register_argument_conversions!##8))((converted, ), changed, cached)
@ C:\Users\ricco\.julia\packages\Makie\WKgwk\src\compute-plots.jl:533
with edge inputs:
converted = ((["Non", "Non", "Oui", "Oui", "Non", "Oui", "Oui", "Non", "Oui", "Oui" … "Oui", "Non", "Oui", "Oui", "Non", "Non", "Non", "Non", "Oui", "Oui"], [63997, 41628, 51195, 57592, 57797, 48011, 50977, 54393, 41423, 55516 … 46698, 18762, 51108, 70056, 29852, 54950, 44175, 44801, 52303, 61084]),)
Triggered by update of:
dim_convert_2, arg1, dim_convert_1 or arg2
Due to ERROR: Result needs to have same length. Found: ((["Non", "Non", "Oui", "Oui", "Non", "Oui", "Oui", "Non", "Oui", "Oui" … "Oui", "Non", "Oui", "Oui", "Non", "Non", "Non", "Non", "Oui", "Oui"], [63997, 41628, 51195, 57592, 57797, 48011, 50977, 54393, 41423, 55516 … 46698, 18762, 51108, 70056, 29852, 54950, 44175, 44801, 52303, 61084]),), for func C:\Users\ricco\.julia\packages\ComputePipeline\E2l50\src\ComputePipeline.jl:1017
In [63]:
# boxplot of Revenu by credit decision => the categories have to be encoded by hand
# unique values, sorted
labels = sort(unique(df.CreditOK))
println(labels)
# map each value to its position in `labels` (1 = "Non", 2 = "Oui")
codes = Int.(indexin(df.CreditOK,labels))
println(codes)
# then draw the figure, with the labels as tick names
fig = CM.Figure()
ax = CM.Axis(fig[1,1],xlabel="Credit OK",xticks=(1:length(labels),labels))
CM.boxplot!(ax,codes, df.Revenu)
fig
["Non", "Oui"] [1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 2, 2]
In [64]:
# same approach for the violin plot
fig = CM.Figure()
ax = CM.Axis(fig[1,1],xlabel="Credit OK",xticks=(1:length(labels),labels))
CM.violin!(ax,codes, df.Revenu)
fig
In [65]:
# conditional densities
fig = CM.Figure()
ax = CM.Axis(fig[1,1])
# density for the "Oui" group
CM.density!(ax,df.Revenu[df.CreditOK .== "Oui"])
# the "Non" group, with some transparency (alpha) so both curves remain visible
CM.density!(ax,df.Revenu[df.CreditOK .== "Non"],alpha=0.4,color=:green)
fig
In [66]:
# another view: one histogram per group, stacked vertically with a common x range
min_rev = minimum(df.Revenu)
max_rev = maximum(df.Revenu)
# histograms
fig = CM.Figure()
ax_1 = CM.Axis(fig[1,1],title="Oui")
ax_2 = CM.Axis(fig[2,1],title="Non")
CM.xlims!(ax_1,min_rev,max_rev)
CM.xlims!(ax_2,min_rev,max_rev)
CM.hist!(ax_1,df.Revenu[df.CreditOK .== "Oui"])
CM.hist!(ax_2,df.Revenu[df.CreditOK .== "Non"],color=:green)
fig
In [67]:
# shift between the two empirical cumulative distribution functions
fig = CM.Figure()
ax = CM.Axis(fig[1,1],title="Empirical Cumulative Distribution Function")
CM.ecdfplot!(ax,df.Revenu[df.CreditOK .== "Oui"],color=:blue,label="Oui")
CM.ecdfplot!(ax,df.Revenu[df.CreditOK .== "Non"],color=:green,label="Non")
CM.Legend(fig[1,2],ax)
fig
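`ecdfplot` draws the empirical cumulative distribution function F(t), the share of observations less than or equal to t. At level 0.5, the horizontal gap between the two curves roughly corresponds to the difference between the group medians computed in the next cell. Below is a minimal sketch with hypothetical helper names. The second version sorts once and then uses binary search, which is cheaper when F is evaluated at many points.

```julia
# Direct definition: proportion of observations ≤ t.
ecdf_at(x::AbstractVector{<:Real}, t::Real) = count(<=(t), x) / length(x)

# Faster when evaluated many times: sort once, then binary search.
function ecdf_fun(x::AbstractVector{<:Real})
    s = sort(x)
    n = length(s)
    return t -> searchsortedlast(s, t) / n   # number of values ≤ t, divided by n
end

# toy data, for illustration only
x = [3, 1, 4, 1, 5]
F = ecdf_fun(x)
println([ecdf_at(x, t) for t in 0:5])
println([F(t) for t in 0:5])
```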
In [68]:
# medians
med_oui = Statistics.median(df.Revenu[df.CreditOK .== "Oui"])
med_non = Statistics.median(df.Revenu[df.CreditOK .== "Non"])
println("Mediane(oui) = $med_oui ; Mediane(non) = $med_non")
Mediane(oui) = 53172.0 ; Mediane(non) = 39765.0
In [69]:
# another way to compare the two empirical distributions:
# a QQ plot (quantiles of the "Oui" group against those of the "Non" group)
fig = CM.Figure()
ax = CM.Axis(fig[1,1])
CM.xlims!(ax,min_rev,max_rev)
CM.ylims!(ax,min_rev,max_rev)
CM.qqplot!(ax,df.Revenu[df.CreditOK .== "Oui"],df.Revenu[df.CreditOK .== "Non"])
# red cross at the pair of medians
CM.scatter!(ax,[med_oui],[med_non],color=:red,markersize=25,marker=:cross)
# identity line: identical distributions would lie along it
CM.ablines!(ax,0,1,color=:green)
fig
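A two-sample QQ plot pairs the quantiles of both samples at the same probability levels. The red cross above is the pair obtained at level 0.5. If the two distributions differed only by a shift and a change of scale, the points would lie on a straight line. The sketch below computes such pairs with `Statistics.quantile` at evenly spaced levels. This choice of levels is an assumption, and Makie's `qqplot` may use a different one. `qq_points` is a hypothetical name.

```julia
using Statistics

# Hypothetical helper: quantile pairs of two samples at m evenly spaced levels.
function qq_points(x::AbstractVector{<:Real}, y::AbstractVector{<:Real};
                   m::Integer = min(length(x), length(y)))
    p = range(0, 1; length = m)
    return quantile(x, p), quantile(y, p)
end

# toy samples: y is an exact affine transform of x, for illustration only
x = [4.0, 1.0, 3.0, 5.0, 2.0]
y = 2 .* x .+ 1
qx, qy = qq_points(x, y)
println(qx)
println(qy)   # lies on the line qy = 2qx + 1
```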
Heatmap (1) - Correlations
In [70]:
# numeric variables only
X = DFR.select(df,names(df,DFR.Number))
names(X)
7-element Vector{String}:
"Age"
"Revenu"
"ScoreCredit"
"Anciennete"
"Montant"
"Endettement"
"HistoIncidents"
In [71]:
# compute the (Pearson) correlation matrix
cor_mat = Statistics.cor(Matrix(X))
cor_mat
7×7 Matrix{Float64}:
1.0 -0.0809775 -0.0922087 -0.0766463 … 0.0424002 -0.0506387
-0.0809775 1.0 0.900675 0.103156 -0.0546184 0.0194893
-0.0922087 0.900675 1.0 0.347076 -0.0345155 -0.249708
-0.0766463 0.103156 0.347076 1.0 0.0262538 0.0177185
-0.0142236 0.547024 0.488969 0.0508609 0.695975 0.00864319
0.0424002 -0.0546184 -0.0345155 0.0262538 … 1.0 -0.0422676
-0.0506387 0.0194893 -0.249708 0.0177185 -0.0422676 1.0
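Pearson's correlation matrix can be computed by standardising each column (mean 0, standard deviation 1) and forming Z'Z / (n - 1). Each entry is then the average product of standardised values, for example 0.900675 for Revenu and ScoreCredit above. The sketch below uses a hypothetical helper `cor_matrix` and checks it against `Statistics.cor` on toy data.

```julia
using Statistics

# Hypothetical helper: Pearson correlation matrix of the columns of M.
function cor_matrix(M::AbstractMatrix{<:Real})
    n = size(M, 1)
    Z = (M .- mean(M; dims = 1)) ./ std(M; dims = 1)   # standardise each column
    return (Z' * Z) ./ (n - 1)
end

# toy data (4 observations, 3 variables), for illustration only
M = [1.0 2.0 0.5;
     2.0 4.1 0.1;
     3.0 5.9 0.9;
     4.0 8.2 0.3]
R = cor_matrix(M)
println(R ≈ cor(M))   # true
```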
In [72]:
# number of variables
p = size(cor_mat, 1)
p
7
In [73]:
# heatmap
fig = CM.Figure()
ax = CM.Axis(fig[1, 1],
xticks = (1:p,names(X)),
yticks = (1:p,names(X)),
xticklabelrotation = π / 2,
    yreversed=true) # reverse the y axis so the first variable is at the top, as in the matrix
echelle = CM.heatmap!(ax, cor_mat, colormap = :coolwarm, colorrange=(-1,+1))
CM.Colorbar(fig[1,2],echelle)
fig
Heatmap (2) - Conditional means
In [74]:
# mean income (Revenu) by marital status (SitFamiliale)
# and employment sector (SecteurTravail)
res = DFR.combine(DFR.groupby(df,[:SitFamiliale,:SecteurTravail]),:Revenu => Statistics.mean => :RevenuMoyen)
res
15×3 DataFrame
| Row | SitFamiliale | SecteurTravail | RevenuMoyen |
|---|---|---|---|
| | String | String | Float64 |
| 1 | Marie | Finance | 61761.8 |
| 2 | Marie | Tech | 50050.0 |
| 3 | Celibataire | Industrie | 41829.0 |
| 4 | Divorce | Finance | 62541.7 |
| 5 | Celibataire | Sante | 46546.1 |
| 6 | Celibataire | Finance | 59360.1 |
| 7 | Celibataire | Commerce | 34074.0 |
| 8 | Marie | Industrie | 40676.4 |
| 9 | Celibataire | Tech | 50042.3 |
| 10 | Divorce | Tech | 53098.1 |
| 11 | Divorce | Commerce | 39013.5 |
| 12 | Marie | Sante | 45476.0 |
| 13 | Marie | Commerce | 33392.7 |
| 14 | Divorce | Industrie | 45437.0 |
| 15 | Divorce | Sante | 46028.0 |
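`groupby` followed by `combine` computes one mean per combination of the grouping columns. The same split-apply-combine logic can be sketched in Base Julia with a dictionary keyed by tuples. `group_means` is a hypothetical helper, and the toy values are made up for illustration.

```julia
using Statistics

# Hypothetical helper: mean of `vals` for each distinct key (split-apply-combine).
function group_means(ks::AbstractVector, vals::AbstractVector{<:Real})
    groups = Dict{eltype(ks), Vector{Float64}}()
    for (k, v) in zip(ks, vals)
        push!(get!(groups, k, Float64[]), v)   # split: collect values per key
    end
    return Dict(k => mean(vs) for (k, vs) in groups)   # apply + combine
end

# toy data: keys are (SitFamiliale, SecteurTravail) pairs, values are made up
ks = [("Marie", "Tech"), ("Marie", "Tech"), ("Divorce", "Sante"), ("Marie", "Tech")]
vals = [10, 20, 40, 30]
m = group_means(ks, vals)
println(m[("Marie", "Tech")])     # 20.0
println(m[("Divorce", "Sante")])  # 40.0
```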
In [75]:
# reshape as a cross-table: one row per marital status, one column per sector
res_tab = DFR.unstack(res,:SitFamiliale,:SecteurTravail,:RevenuMoyen)
res_tab
3×6 DataFrame
| Row | SitFamiliale | Finance | Tech | Industrie | Sante | Commerce |
|---|---|---|---|---|---|---|
| | String | Float64? | Float64? | Float64? | Float64? | Float64? |
| 1 | Marie | 61761.8 | 50050.0 | 40676.4 | 45476.0 | 33392.7 |
| 2 | Celibataire | 59360.1 | 50042.3 | 41829.0 | 46546.1 | 34074.0 |
| 3 | Divorce | 62541.7 | 53098.1 | 45437.0 | 46028.0 | 39013.5 |
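`unstack` turns the long table (one row per pair) into a wide one. A hand-written pivot makes the mechanics visible: one row per level of the first key, one column per level of the second, and `missing` for any combination absent from the long table. This is why the value columns are typed `Float64?`, i.e. `Union{Missing, Float64}`. `pivot_table` is a hypothetical helper. It keeps levels in order of first appearance, which is consistent with the row and column order shown above.

```julia
# Hypothetical helper: long-to-wide pivot with `missing` for absent combinations.
function pivot_table(rows::AbstractVector, cols::AbstractVector, vals::AbstractVector{<:Real})
    rlev, clev = unique(rows), unique(cols)            # levels in order of first appearance
    ri = Dict(r => i for (i, r) in enumerate(rlev))    # level -> row index
    ci = Dict(c => j for (j, c) in enumerate(clev))    # level -> column index
    M = Matrix{Union{Missing, Float64}}(missing, length(rlev), length(clev))
    for (r, c, v) in zip(rows, cols, vals)
        M[ri[r], ci[c]] = v
    end
    return rlev, clev, M
end

# first three rows of the long table above
rows = ["Marie", "Marie", "Celibataire"]
cols = ["Finance", "Tech", "Industrie"]
vals = [61761.8, 50050.0, 41829.0]
rlev, clev, M = pivot_table(rows, cols, vals)
println(rlev)   # ["Marie", "Celibataire"]
println(clev)   # ["Finance", "Tech", "Industrie"]
println(M)      # Celibataire has no Finance or Tech value in this excerpt, hence missing
```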
In [76]:
# heatmap
fig = CM.Figure()
ax = CM.Axis(fig[1, 1],
xticks = (1:(DFR.ncol(res_tab)-1),names(res_tab)[2:end]),
yticks = (1:DFR.nrow(res_tab),res_tab.SitFamiliale),
title = "Revenu moyen",
    yreversed=true) # reverse the y axis so rows appear in table order, top to bottom
# convert the table to a Matrix and transpose it: heatmap puts the first index on the x axis
echelle = CM.heatmap!(ax, Matrix(res_tab[1:end,2:end])', colormap = :Blues)
CM.Colorbar(fig[1,2],echelle)
fig